Fast, flexible text search using genomic short-read mapping model

0Citations
Citations of this article
8Readers
Mendeley users who have this article in their library.

This article is free to access.

Abstract

The searching of an extensive document database for documents that are locally similar to a given query document, and the subsequent detection of similar regions between such documents, is considered as an essential task in the fields of information retrieval and data management. In this paper, we present a framework for such a task. The proposed framework employs the method of short-read mapping, which is used in bioinformatics to reveal similarities between genomic sequences. In this paper, documents are considered biological objects; consequently, edit operations between locally similar documents are viewed as an evolutionary process. Accordingly, we are able to apply the method of evolution tracing in the detection of similar regions between documents. In addition, we propose heuristic methods to address issues associated with the different stages of the proposed framework, for example, a frequency-based fragment ordering method and a locality-aware interval aggregation method. Extensive experiments covering various scenarios related to the search of an extensive document database for documents that are locally similar to a given query document are considered, and the results indicate that the proposed framework outperforms existing methods.

Cite

CITATION STYLE

APA

Kim, S. H., & Cho, H. G. (2016). Fast, flexible text search using genomic short-read mapping model. ETRI Journal, 38(3), 518–528. https://doi.org/10.4218/etrij.16.0115.0594

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free