This study explores an alternative way of storing text files in a different format that speeds up searching. The input file is decomposed into two parts, a filter and a payload. The filter is composed of the k most informative bits of each byte of the original file; the remaining bits form the payload. The most informative bits are selected according to their entropy. When a pattern is to be searched in the new file structure, the same decomposition is applied to the pattern. The filter part of the pattern is searched for in the filter part of the file, followed by verification of the payload at the matching positions. Experiments conducted on natural language texts, plain ASCII DNA sequences, and random byte sequences showed that search with the proposed scheme is on average two times faster than the tested exact pattern matching algorithms. © 2011 Springer Science+Business Media B.V.
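The decomposition and two-phase search described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, entropy is assumed to be the empirical binary entropy of each bit position over the whole file, and the filter and payload values are stored one per byte rather than bit-packed, so the sketch shows the logic of the scheme and not its speed-up.

```python
import math
from typing import List, Tuple


def bit_entropy(data: bytes, bit: int) -> float:
    """Empirical binary entropy of one bit position (assumed selection criterion)."""
    if not data:
        return 0.0
    p = sum((b >> bit) & 1 for b in data) / len(data)
    if p == 0.0 or p == 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def select_filter_bits(data: bytes, k: int) -> List[int]:
    """Pick the k bit positions (0 = least significant) with the highest entropy."""
    ranked = sorted(range(8), key=lambda bit: bit_entropy(data, bit), reverse=True)
    return sorted(ranked[:k])


def extract(byte: int, bits: List[int]) -> int:
    """Gather the chosen bits of a byte into a compact integer."""
    value = 0
    for i, bit in enumerate(bits):
        value |= ((byte >> bit) & 1) << i
    return value


def decompose(data: bytes, filter_bits: List[int]) -> Tuple[bytes, bytes]:
    """Split every byte into a k-bit filter value and an (8-k)-bit payload value."""
    payload_bits = [b for b in range(8) if b not in filter_bits]
    filt = bytes(extract(b, filter_bits) for b in data)
    payload = bytes(extract(b, payload_bits) for b in data)
    return filt, payload


def search(filt: bytes, payload: bytes, pattern: bytes,
           filter_bits: List[int]) -> List[int]:
    """Find candidates in the filter, then verify each one against the payload."""
    p_filt, p_payload = decompose(pattern, filter_bits)
    matches = []
    pos = filt.find(p_filt)
    while pos != -1:
        if payload[pos:pos + len(pattern)] == p_payload:
            matches.append(pos)
        pos = filt.find(p_filt, pos + 1)
    return matches


text = b"abracadabra abracadabra"
bits = select_filter_bits(text, 4)
filt, payload = decompose(text, bits)
hits = search(filt, payload, b"abra", bits)
```

Because the filter and payload bits together cover all eight bits of each byte, a position passes both phases exactly when the original bytes match, so the result is the same as an exact search on the original file. A practical version would pack the k-bit values densely so that the filter scan touches fewer bytes.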
CITATION STYLE
Külekci, M. O., Vitter, J. S., & Xu, B. (2010). Boosting pattern matching performance via k-bit filtering. In Lecture Notes in Electrical Engineering (Vol. 62 LNEE, pp. 27–32). https://doi.org/10.1007/978-90-481-9794-1_6