Indexing of textual databases based on lexical resources: A case study for Serbian

3Citations
Citations of this article
6Readers
Mendeley users who have this article in their library.
Get full text

Abstract

In this paper we describe an approach to improvement of information retrieval results for large textual databases by pre-indexing documents using bag-of-words and named entity recognition. The approach was applied on a database of geological projects financed by the Republic of Serbia for several decades now. Each document within this database is described by a summary report, consisting of metadata on the geological project, such as title, domain, keywords, abstract, and geographical location. A bag of words was produced from these metadata with the help of morphological dictionaries and transducers, while named entities were recognized using a rule-based system. Both were then used for pre-indexing documents for information retrieval purposes where ranking of retrieved documents was based on several tf idf based measures. Evaluation of ranked retrieval results based on data obtained by pre-indexing were compared to results obtained by informational retrieval without pre-indexing with precision-recall curve, showing a significant improvement in terms of the mean average precision measure.

Cite

CITATION STYLE

APA

Stanković, R., Krstev, C., Obradović, I., & Kitanović, O. (2015). Indexing of textual databases based on lexical resources: A case study for Serbian. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9398, pp. 167–181). Springer Verlag. https://doi.org/10.1007/978-3-319-27932-9_15

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free