Similarity evaluation on tree-structured data

Rui Yang, Panos Kalnis, Anthony K.H. Tung*

*Corresponding author for this work

Research output: Contribution to journalArticlepeer-review

112 Scopus citations

Abstract

Tree-structured data are becoming ubiquitous nowadays and manipulating them based on similarity is essential for many applications. The generally accepted similarity measure for trees is the edit distance. Although similarity search has been extensively studied, searching for similar trees is still an open problem due to the high complexity of computing the tree edit distance. In this paper, we propose to transform tree-structured data into an approximate numerical multidimensional vector which encodes the original structure information. We prove that the L 1 distance of the corresponding vectors, whose computational complexity is 0(|T 1| + |T 2|), forms a lower bound for the edit distance between trees. Based on the theoretical analysis, we describe a novel algorithm which embeds the proposed distance into a filter-and-refine framework to process similarity search on tree-structured data. The experimental results show that our algorithm reduces dramatically the distance computation cost. Our method is especially suitable for accelerating similarity query processing on large trees in massive datasets.

Original languageEnglish (US)
Pages (from-to)754-765
Number of pages12
JournalProceedings of the ACM SIGMOD International Conference on Management of Data
StatePublished - 2005
Externally publishedYes

ASJC Scopus subject areas

  • Software
  • Information Systems

Fingerprint Dive into the research topics of 'Similarity evaluation on tree-structured data'. Together they form a unique fingerprint.

Cite this