- Open Access
DBM-Tree: Trading height-balancing for performance in metric access methods
Journal of the Brazilian Computer Society volume 11, pages 37–51 (2005)
Metric Access Methods (MAM) are employed to accelerate the processing of similarity queries, such as the range and the k-nearest neighbor queries. Current methods, such as the Slim-tree and the M-tree, improve the query performance minimizing the number of disk accesses, keeping a constant height of the structures stored on disks (height-balanced trees). However, the overlapping between their nodes has a very high influence on their performance. This paper presents a new dynamic MAM called theDBM-tree (Density-Based Metric tree), which can minimize the overlap between high-density nodes by relaxing the height-balancing of the structure. Thus, the height of the tree is larger in denser regions, in order to keep a tradeoff between breadth-searching and depth-searching. An underpinning for cost estimation on tree structures is their height, so we show a non-height dependable cost model that can be applied for DBM-tree. Moreover, an optimization algorithm calledShrink is also presented, which improves the performance of an already builtDBM-tree by reorganizing the elements among their nodes. Experiments performed over both synthetic and real world datasets showed that theDBM-tree is, in average, 50% faster than traditional MAM and reduces the number of distance calculations by up to 72% and disk accesses by up to 66%. After performing the Shrink algorithm, the performance improves up to 40% regarding the number of disk accesses for range andk-nearest neighbor queries. In addition, theDBM-tree scales up well, exhibiting linear performance with growing number of elements in the database.
Ricardo A. Baeza-Yates, Walter Cunto, Udi Manber, and Sun Wu. Proximity matching using fixedqueries trees. In5th Annual Symposium on Combinatorial Pattern Matching (CPM), volume 807 of LNCS, pages 198–212, Asilomar, USA, 1994. Springer Verlag.
Tolga Bozkaya and Meral zsoyoglu. Distance-based indexing for high-dimensional metric spaces. InProceedings of the ACM International Conference on Management of Data (SIGMOD), pages 357–368, 1997.
Tolga Bozkaya and Meral zsoyoglu. Indexing large metric spaces for similarity search queries.ACM Transactions on Database Systems (TODS), 24(3):361–404, sep 1999.
Sergey Brin. Near neighbor search in large metric spaces. InProceedings of the International Conference on Very Large Data Bases (VLDB), pages 574-584, Zurich, Switzerland, 1995. Morgan Kaufmann.
W. A. Burkhard and R. M. Keller. Some approaches to best-match file searching.Communications of the ACM, 16(4):230–236, apr 1973.
Fabio J. T. Chino, Marcos R. Vieira, Agma J. M. Traina, and Caetano Traina Jr. Mamview: A visual tool for exploring and understanding metric access methods. InProceedings of the 20th Annual ACM Symposium on Applied Computing (SAC), page 6p, Santa Fe, New Mexico, USA, 2005. ACM Press.
Edgar Chvez, Gonzalo Navarro, Ricardo Baeza-Yates, and Jos Luis Marroqun. Searching in metric spaces.ACM Computing Surveys (CSUR), 33(3):273–321, sep 2001.
P. Ciaccia, M. Patella, and P. Zezula. A cost model for similarity queries in metric spaces. InACM Symposium on Principles of Database Systems (PODS), pages 59–68, 1998.
Paolo Ciaccia, Marco Patella, and Pavel Zezula. M-tree: An efficient access method for similarity search in metric spaces. InProceedings of International Conference on Very Large Data Bases (VLDB), pages 426–435, Athens, Greece, 1997. Morgan Kaufmann.
Roberto F. Santos Filho, Agma J. M. Traina, Caetano Traina Jr., and Christos Faloutsos. Similarity search without tears: The OMNI family of allpurpose access methods. InIEEE International Conference on Data Engineering (ICDE), pages 623–630, Heidelberg, Germany, 2001.
Volker Gaede and Oliver Gnther. Multidimensional access methods.ACM Computing Surveys (CSUR), 30(2):170–231, 1998.
A. Guttman. R-tree : A dynamic index structure for spatial searching. InACM International Conference on Data Management (SIGMOD), pages 47–57, Boston, USA, 1984.
Gisli R. Hjaltason and Hanan Samet. Index-driven similarity search in metric spaces.ACM Transactions on Database Systems (TODS), 28(4):517–580, dec 2003.
Caetano Traina Jr., Agma J. M. Traina, Christos Faloutsos, and Bernhard Seeger. Fast indexing and visualization of metric datasets using slim-trees.IEEE Transactions on Knowledge and Data Engineering (TKDE), 14(2):244–260, 2002.
Caetano Traina Jr., Agma J. M. Traina, Bernhard Seeger, and Christos Faloutsos. Slim-trees: High performance metric trees minimizing overlap between nodes. InInternational Conference on Extending Database Technology (EDBT), volume 1777 ofLNCS, pages 51–65, Konstanz, Germany, 2000. Springer.
K. A. Ross, I. Sitzmann, and P. J. Stuckey. Costbased unbalanced R-trees. InIEEE International Conference on Scientific and Statistical Database Management (SSDBM), pages 203–212, 2001.
Y. Theodoridis, E. Stefanakis, and T. K. Sellis. Efficient cost models for spatial queries using R-trees.IEEE Transactions on Knowledge and Data Engineering (TKDE), 12(1):19–32, 2000.
Agma J. M. Traina, Caetano Traina Jr., Josiane M. Bueno, and Paulo M. de A. Marques. The metric histogram: A new and efficient approach for content-based image retrieval. InSixth IFIP Working Conference on Visual Database Systems (VDB), Brisbane, Australia, 2002.
Jeffrey K. Uhlmann. Satisfying general proximity/ similarity queries with metric trees.Information Processing Letters, 40(4):175–179, 1991.
Marcos R. Vieira, Caetano Traina Jr., Fabio J. T. Chino, and Agma J. M. Traina. DBM-tree: A dynamic metric access method sensitive to local density data. InXIX Brazilian Symposium on Databases (SBBD), pages 163–177, Brasília, Brazil, 2004.
Peter N. Yianilos. Data structures and algorithms for nearest neighbor search in general metric spaces. InProceedings of the ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 311–321, Austin, USA, 1993.
This work has been supported by FAPESP (São Paulo State Research Foundation) under grants 01/11987-3, 01/12536-5 and 02/07318-1 and by CNPq (Brazilian National Council for Supporting Research) under grants 52.1685/98-6, 860.068/00-7, 50.0780/2003-0 and 35.0852/94-4.
About this article
Cite this article
Vieira, M.R., Traina, C., Chino, F.J.T. et al. DBM-Tree: Trading height-balancing for performance in metric access methods. J Braz Comp Soc 11, 37–51 (2005). https://doi.org/10.1007/BF03192381
- Metric Access Method
- Metric Tree
- Similarity Queries