Fast, flexible text search using genomic short-read mapping model

Sung Hwan Kim; Hwan Gue Cho

Journal Article, Open Access

ETRI Journal (2016) 38(3) 518-528

DOI: 10.4218/etrij.16.0115.0594

Abstract

Searching a large document database for documents that are locally similar to a given query document, and then detecting the similar regions between those documents, is an essential task in information retrieval and data management. In this paper, we present a framework for this task. The framework employs short-read mapping, a method used in bioinformatics to reveal similarities between genomic sequences. Documents are treated as biological objects, and the edit operations that separate locally similar documents are viewed as an evolutionary process. This view allows evolution tracing to be applied to the detection of similar regions between documents. In addition, we propose heuristic methods that address issues arising at different stages of the framework, for example, a frequency-based fragment ordering method and a locality-aware interval aggregation method. Extensive experiments covering various search scenarios indicate that the proposed framework outperforms existing methods.
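The abstract names the stages without giving details, so the following is only a minimal sketch of the general seed-based idea, not the authors' implementation. It assumes fixed-length character fragments, an inverted index over the documents, a "rarest fragment first" reading of frequency-based ordering, and a simple gap threshold as a stand-in for locality-aware interval aggregation. All function names and the parameters `k`, `gap`, and `max_fragments` are hypothetical.

```python
from collections import defaultdict


def build_index(docs, k):
    """Map every k-character fragment to the (doc_id, offset) pairs where it occurs."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        for i in range(len(text) - k + 1):
            index[text[i:i + k]].append((doc_id, i))
    return index


def merge_intervals(starts, k, gap):
    """Merge fragment hits that start within `gap` characters of the previous interval's end.

    Assumption: this stands in for the paper's locality-aware aggregation.
    Returns half-open (start, end) intervals.
    """
    merged = []
    for s in sorted(starts):
        if merged and s <= merged[-1][1] + gap:
            merged[-1][1] = max(merged[-1][1], s + k)
        else:
            merged.append([s, s + k])
    return [tuple(iv) for iv in merged]


def find_similar_regions(query, docs, k=5, gap=3, max_fragments=None):
    """Return {doc_id: [(start, end), ...]} for regions that share fragments with the query."""
    index = build_index(docs, k)
    fragments = {query[i:i + k] for i in range(len(query) - k + 1)}
    # Assumption: "frequency-based ordering" is read as rarest fragment first,
    # since rare fragments produce fewer spurious candidate positions.
    ordered = sorted((f for f in fragments if f in index),
                     key=lambda f: len(index[f]))
    if max_fragments is not None:
        ordered = ordered[:max_fragments]
    hits = defaultdict(list)
    for fragment in ordered:
        for doc_id, offset in index[fragment]:
            hits[doc_id].append(offset)
    return {doc_id: merge_intervals(starts, k, gap)
            for doc_id, starts in hits.items()}


if __name__ == "__main__":
    docs = ["the quick brown fox jumps",
            "lorem ipsum dolor sit amet",
            "a quick brown dog"]
    for doc_id, regions in sorted(find_similar_regions("quick brown", docs).items()):
        for start, end in regions:
            print(doc_id, (start, end), repr(docs[doc_id][start:end]))
```

Exact fragment matching is used here for brevity. A real short-read mapper tolerates mismatches and gaps, so a fuller sketch would extend each merged interval with an approximate alignment step.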

Kim, S. H., & Cho, H. G. (2016). Fast, flexible text search using genomic short-read mapping model. ETRI Journal, 38(3), 518–528. https://doi.org/10.4218/etrij.16.0115.0594
