A sequence-based tree similarity search
Title: | A sequence-based tree similarity search |
---|---|
Authors: | Alsayed Algergawy, Friederike Klan |
Source: | 9th IEEE International Conference on Research Challenges in Information Science (RCIS) |
Place: | Greece |
Date: | 2015-05-11 |
Type: | Conference Paper |
Abstract: | Tree-structured data are pervasively growing and exploiting them based on similarity is essential for a broad number of applications. Therefore, there has been a growing need to develop high-performance techniques to efficiently look for similar trees across a large number of trees. To this end, in this paper, we present a new sequence-based approach for tree similarity search that exploits both the structural and the content characteristics of tree-structured data. In particular, we transform tree data into sequence representations using a modified Prüfer sequence that constructs a one-to-one mapping between tree data and their sequence representations. We introduce a new tree sequence distance based on the structural information of the data tree, which filters out a set of false positive candidates. We then introduce a refinement step exploiting the content information of data trees. The preliminary experimental results show that our algorithm achieves high performance. Our method is especially suitable for accelerating similarity computation in clustering and/or classification of large numbers of trees in massive datasets. |
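
The abstract relies on a Prüfer-style sequence giving a one-to-one mapping between a tree and its sequence. The paper's modified encoding is not described in this record, so the sketch below only illustrates the classical Prüfer sequence for a labeled tree with nodes 1..n. The function names `prufer_encode` and `prufer_decode` are hypothetical, chosen for this example, and this is not the authors' algorithm.

```python
import heapq
from collections import defaultdict


def prufer_encode(n, edges):
    """Classical Pruefer sequence of a labeled tree on nodes 1..n (n >= 2)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    # Min-heap of current leaves, so the smallest-labeled leaf is removed first.
    leaves = [v for v in range(1, n + 1) if len(adj[v]) == 1]
    heapq.heapify(leaves)
    seq = []
    for _ in range(n - 2):
        leaf = heapq.heappop(leaves)
        parent = next(iter(adj[leaf]))  # a leaf has exactly one neighbor
        seq.append(parent)
        adj[parent].discard(leaf)
        adj[leaf].clear()
        if len(adj[parent]) == 1:  # the parent has just become a leaf
            heapq.heappush(leaves, parent)
    return seq


def prufer_decode(seq):
    """Rebuild the edge list of the tree from its classical Pruefer sequence."""
    n = len(seq) + 2
    # Each node appears in the sequence (degree - 1) times.
    degree = [1] * (n + 1)
    for v in seq:
        degree[v] += 1
    leaves = [v for v in range(1, n + 1) if degree[v] == 1]
    heapq.heapify(leaves)
    edges = []
    for v in seq:
        leaf = heapq.heappop(leaves)
        edges.append((leaf, v))
        degree[v] -= 1
        if degree[v] == 1:
            heapq.heappush(leaves, v)
    # Exactly two leaves remain; they form the last edge.
    edges.append((heapq.heappop(leaves), heapq.heappop(leaves)))
    return edges


# Example tree: nodes 1, 2, 3 attached to 4; path 4 - 5 - 6.
example_edges = [(1, 4), (2, 4), (3, 4), (4, 5), (5, 6)]
example_seq = prufer_encode(6, example_edges)
```

Decoding `example_seq` with `prufer_decode` returns the same set of edges, which is the one-to-one property the abstract refers to. A similarity search along the lines described would then compare such sequences first and inspect node content only for the surviving candidates.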