Skip to main content

DBM-Tree: Trading height-balancing for performance in metric access methods

Abstract

Metric Access Methods (MAM) are employed to accelerate the processing of similarity queries, such as the range and the k-nearest neighbor queries. Current methods, such as the Slim-tree and the M-tree, improve the query performance minimizing the number of disk accesses, keeping a constant height of the structures stored on disks (height-balanced trees). However, the overlapping between their nodes has a very high influence on their performance. This paper presents a new dynamic MAM called theDBM-tree (Density-Based Metric tree), which can minimize the overlap between high-density nodes by relaxing the height-balancing of the structure. Thus, the height of the tree is larger in denser regions, in order to keep a tradeoff between breadth-searching and depth-searching. An underpinning for cost estimation on tree structures is their height, so we show a non-height dependable cost model that can be applied for DBM-tree. Moreover, an optimization algorithm calledShrink is also presented, which improves the performance of an already builtDBM-tree by reorganizing the elements among their nodes. Experiments performed over both synthetic and real world datasets showed that theDBM-tree is, in average, 50% faster than traditional MAM and reduces the number of distance calculations by up to 72% and disk accesses by up to 66%. After performing the Shrink algorithm, the performance improves up to 40% regarding the number of disk accesses for range andk-nearest neighbor queries. In addition, theDBM-tree scales up well, exhibiting linear performance with growing number of elements in the database.

References

  1. Ricardo A. Baeza-Yates, Walter Cunto, Udi Manber, and Sun Wu. Proximity matching using fixedqueries trees. In5th Annual Symposium on Combinatorial Pattern Matching (CPM), volume 807 of LNCS, pages 198–212, Asilomar, USA, 1994. Springer Verlag.

  2. Tolga Bozkaya and Meral zsoyoglu. Distance-based indexing for high-dimensional metric spaces. InProceedings of the ACM International Conference on Management of Data (SIGMOD), pages 357–368, 1997.

  3. Tolga Bozkaya and Meral zsoyoglu. Indexing large metric spaces for similarity search queries.ACM Transactions on Database Systems (TODS), 24(3):361–404, sep 1999.

    Article  Google Scholar 

  4. Sergey Brin. Near neighbor search in large metric spaces. InProceedings of the International Conference on Very Large Data Bases (VLDB), pages 574-584, Zurich, Switzerland, 1995. Morgan Kaufmann.

  5. W. A. Burkhard and R. M. Keller. Some approaches to best-match file searching.Communications of the ACM, 16(4):230–236, apr 1973.

    Article  MATH  Google Scholar 

  6. Fabio J. T. Chino, Marcos R. Vieira, Agma J. M. Traina, and Caetano Traina Jr. Mamview: A visual tool for exploring and understanding metric access methods. InProceedings of the 20th Annual ACM Symposium on Applied Computing (SAC), page 6p, Santa Fe, New Mexico, USA, 2005. ACM Press.

  7. Edgar Chvez, Gonzalo Navarro, Ricardo Baeza-Yates, and Jos Luis Marroqun. Searching in metric spaces.ACM Computing Surveys (CSUR), 33(3):273–321, sep 2001.

    Article  Google Scholar 

  8. P. Ciaccia, M. Patella, and P. Zezula. A cost model for similarity queries in metric spaces. InACM Symposium on Principles of Database Systems (PODS), pages 59–68, 1998.

  9. Paolo Ciaccia, Marco Patella, and Pavel Zezula. M-tree: An efficient access method for similarity search in metric spaces. InProceedings of International Conference on Very Large Data Bases (VLDB), pages 426–435, Athens, Greece, 1997. Morgan Kaufmann.

  10. Roberto F. Santos Filho, Agma J. M. Traina, Caetano Traina Jr., and Christos Faloutsos. Similarity search without tears: The OMNI family of allpurpose access methods. InIEEE International Conference on Data Engineering (ICDE), pages 623–630, Heidelberg, Germany, 2001.

  11. Volker Gaede and Oliver Gnther. Multidimensional access methods.ACM Computing Surveys (CSUR), 30(2):170–231, 1998.

    Article  Google Scholar 

  12. A. Guttman. R-tree : A dynamic index structure for spatial searching. InACM International Conference on Data Management (SIGMOD), pages 47–57, Boston, USA, 1984.

  13. Gisli R. Hjaltason and Hanan Samet. Index-driven similarity search in metric spaces.ACM Transactions on Database Systems (TODS), 28(4):517–580, dec 2003.

    Article  Google Scholar 

  14. Caetano Traina Jr., Agma J. M. Traina, Christos Faloutsos, and Bernhard Seeger. Fast indexing and visualization of metric datasets using slim-trees.IEEE Transactions on Knowledge and Data Engineering (TKDE), 14(2):244–260, 2002.

    Article  MathSciNet  Google Scholar 

  15. Caetano Traina Jr., Agma J. M. Traina, Bernhard Seeger, and Christos Faloutsos. Slim-trees: High performance metric trees minimizing overlap between nodes. InInternational Conference on Extending Database Technology (EDBT), volume 1777 ofLNCS, pages 51–65, Konstanz, Germany, 2000. Springer.

  16. K. A. Ross, I. Sitzmann, and P. J. Stuckey. Costbased unbalanced R-trees. InIEEE International Conference on Scientific and Statistical Database Management (SSDBM), pages 203–212, 2001.

  17. Y. Theodoridis, E. Stefanakis, and T. K. Sellis. Efficient cost models for spatial queries using R-trees.IEEE Transactions on Knowledge and Data Engineering (TKDE), 12(1):19–32, 2000.

    Article  Google Scholar 

  18. Agma J. M. Traina, Caetano Traina Jr., Josiane M. Bueno, and Paulo M. de A. Marques. The metric histogram: A new and efficient approach for content-based image retrieval. InSixth IFIP Working Conference on Visual Database Systems (VDB), Brisbane, Australia, 2002.

  19. Jeffrey K. Uhlmann. Satisfying general proximity/ similarity queries with metric trees.Information Processing Letters, 40(4):175–179, 1991.

    Article  MATH  Google Scholar 

  20. Marcos R. Vieira, Caetano Traina Jr., Fabio J. T. Chino, and Agma J. M. Traina. DBM-tree: A dynamic metric access method sensitive to local density data. InXIX Brazilian Symposium on Databases (SBBD), pages 163–177, Brasília, Brazil, 2004.

  21. Peter N. Yianilos. Data structures and algorithms for nearest neighbor search in general metric spaces. InProceedings of the ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 311–321, Austin, USA, 1993.

Download references

Author information

Authors and Affiliations

Authors

Additional information

This work has been supported by FAPESP (São Paulo State Research Foundation) under grants 01/11987-3, 01/12536-5 and 02/07318-1 and by CNPq (Brazilian National Council for Supporting Research) under grants 52.1685/98-6, 860.068/00-7, 50.0780/2003-0 and 35.0852/94-4.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License ( https://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Reprints and permissions

About this article

Cite this article

Vieira, M.R., Traina, C., Chino, F.J.T. et al. DBM-Tree: Trading height-balancing for performance in metric access methods. J Braz Comp Soc 11, 37–51 (2005). https://doi.org/10.1007/BF03192381

Download citation

  • Issue Date:

  • DOI: https://doi.org/10.1007/BF03192381

Key words