This paper presents a named entity recognition method that finds predetermined entities in unstructured text. The method maps those entities onto words in the text using word similarities based on typical word transformations (lemmatization and stemming), word embeddings, and character-level similarity. The approach itself is language independent, although the components it uses for lemmatization, stemming and word embedding are language dependent, and it works on any given set of entities. Special attention is given to entities that are organized hierarchically by the hypernymy-hyponymy relation. The proposed method has the following advantages: it finds the normalized form of the recognized entity name; it is easy to adjust to a new domain; it respects the hierarchical organization of entities; and, thanks to its modular design, it can be improved continually simply by updating the components for lemmatization, stemming or word embedding. The method was tested on a set of tourist queries and hierarchical entities collected from the Slovenia.info tourist portal.
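The general idea of matching text against a hierarchical gazetteer can be illustrated with a minimal sketch. It is not the authors' implementation: the toy gazetteer, the `crude_stem` suffix stripper, the fixed scores and the 0.8 threshold are all assumptions made for illustration, the embedding-based similarity is left out entirely, and the character-level similarity is approximated with `difflib.SequenceMatcher` from the Python standard library.

```python
from difflib import SequenceMatcher

# Hypothetical toy gazetteer: normalized entity name -> hypernym (None for a root).
GAZETTEER = {
    "accommodation": None,
    "hotel": "accommodation",
    "campsite": "accommodation",
    "activity": None,
    "hiking": "activity",
}


def crude_stem(word):
    """Toy suffix stripper standing in for a real, language-dependent stemmer."""
    word = word.lower()
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def similarity(word, entity):
    """Score a word against an entity name (scores are illustrative assumptions)."""
    if word.lower() == entity:
        return 1.0  # exact match
    if crude_stem(word) == crude_stem(entity):
        return 0.9  # match after the word transformation
    # Fall back to character-level similarity.
    return SequenceMatcher(None, word.lower(), entity).ratio()


def hypernym_chain(entity):
    """Return the ancestors of an entity, nearest hypernym first."""
    chain = []
    parent = GAZETTEER[entity]
    while parent is not None:
        chain.append(parent)
        parent = GAZETTEER[parent]
    return chain


def recognize(text, threshold=0.8):
    """Map each word to its best-matching entity if the score passes the threshold."""
    results = []
    for token in text.split():
        word = token.strip(".,!?;:")
        if not word:
            continue
        best = max(GAZETTEER, key=lambda entity: similarity(word, entity))
        if similarity(word, best) >= threshold:
            # Report the surface word, the normalized entity name, and its hypernyms.
            results.append((word, best, hypernym_chain(best)))
    return results


matches = recognize("Cheap hotels near hikng trails")
```

In this sketch the plural "hotels" and the misspelled "hikng" are both mapped to a normalized entity name together with its hypernym. Swapping `crude_stem` for a real stemmer or lemmatizer, or adding an embedding-based score to `similarity`, would not require changing `recognize`, which is the kind of modularity the abstract describes.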
CITATION STYLE
Štravs, M., & Zupančič, J. (2019). Named entity recognition using gazetteer of hierarchical entities. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11606 LNAI, pp. 768–776). Springer Verlag. https://doi.org/10.1007/978-3-030-22999-3_65