This article discusses the advanced features of the newly developed search engine of the “Tugan tel” corpus management system. The corpus consists of texts written in the Tatar language. The new features include executing complex queries with arbitrary logical formulas for direct and reverse search, executing complex queries that use a thesaurus or a list of word forms or lemmas, and extracting some types of named entities. Complex queries make it possible to automatically extract and annotate semantic data from the corpus for linguistic applications. These options improve the search process and also make it possible to test the lexicon and collocations in the corpus.
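The abstract does not specify the system's query language. The following is therefore only a minimal sketch of what evaluating a query with an arbitrary logical formula over morphologically annotated tokens might look like. The `Token` fields, the tag names (`N`, `PL`, `SG`), the tuple-based formula encoding, and the helper `any_lemma` (which imitates a query over a lemma list) are all hypothetical illustrations. They are not the actual API or syntax of “Tugan tel”.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class Token:
    # Hypothetical token annotation: surface form, lemma, and a set of grammatical tags.
    form: str
    lemma: str
    tags: FrozenSet[str] = frozenset()


# Hypothetical formula encoding as nested tuples:
#   ("form", value), ("lemma", value), ("tag", value),
#   ("not", f), ("and", f1, f2, ...), ("or", f1, f2, ...)
def matches(token: Token, formula: tuple) -> bool:
    """Recursively evaluate a logical formula against a single token."""
    op = formula[0]
    if op == "form":
        return token.form == formula[1]
    if op == "lemma":
        return token.lemma == formula[1]
    if op == "tag":
        return formula[1] in token.tags
    if op == "not":
        return not matches(token, formula[1])
    if op == "and":
        return all(matches(token, f) for f in formula[1:])
    if op == "or":
        return any(matches(token, f) for f in formula[1:])
    raise ValueError(f"unknown operator: {op!r}")


def any_lemma(lemmas: Iterable[str]) -> tuple:
    """Hypothetical helper: expand a lemma list into a disjunction of lemma conditions."""
    return ("or", *[("lemma", lemma) for lemma in lemmas])


def search(tokens: Iterable[Token], formula: tuple) -> List[Token]:
    """Return the tokens that satisfy the formula, in corpus order."""
    return [t for t in tokens if matches(t, formula)]


# Toy annotated corpus (illustrative tags only).
corpus = [
    Token("китаплар", "китап", frozenset({"N", "PL"})),
    Token("китап", "китап", frozenset({"N", "SG"})),
    Token("укучы", "укучы", frozenset({"N", "SG"})),
]

# Lemma "китап" AND NOT plural.
query = ("and", ("lemma", "китап"), ("not", ("tag", "PL")))
print([t.form for t in search(corpus, query)])  # ['китап']
```

Representing the formula as a tree and evaluating it recursively lets AND, OR, and NOT nest to any depth, which is one plausible way to support "arbitrary" logical formulas. A thesaurus or lemma list can then be compiled into a disjunction, as `any_lemma` does here.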
Mukhamedshin, D., Nevzorova, O., & Khusainov, A. (2017). Complex Search Queries in the Corpus Management System. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10449 LNAI, pp. 407–416). Springer Verlag. https://doi.org/10.1007/978-3-319-67077-5_39