Suppose we have a large dictionary of strings. Each entry starts with a figure of merit (popularity). We wish to find the kbest matches for a substring, s, in a dictinoary, dict. That is, grep s dict | sort -n | head -k, but we would like to do this in sublinear time. Example applications: (1) web queries with popularities, (2) products with prices and (3) ads with click through rates. This paper proposes a novel index, k-best suffix arrays, based on ideas borrowed from suffix arrays and kdtrees. A standard suffix array sorts the suffixes by a single order (lexicographic) whereas k-best suffix arrays are sorted by two orders (lexicographic and popularity). Lookup time is between log N and sqrt N.
CITATION STYLE
Church, K., Thiesson, B., & Ragno, R. (2007). K-best suffix arrays. In NAACL-HLT 2007 - Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, Companion Volume: Short Papers (pp. 17–20). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1614108.1614113
Mendeley helps you to discover research relevant for your work.