Liner2 - a customizable framework for proper names recognition for Polish


Abstract

In this paper we present Liner2, a customizable, open-source framework for proper names recognition. The framework provides several universal methods for sequence chunking: dictionary look-up, pattern matching and statistical processing. The statistical processing uses Conditional Random Fields (CRF) with a rich set of features, including morphological, lexical and semantic information. We present an application of the framework to the task of recognizing proper names in Polish texts, covering 5 common categories: first names, surnames, city names, road names and country names. The Liner2 framework was also used to train an extended model that recognizes 56 categories of proper names; this model was used to bootstrap the manual annotation of the KPWr corpus. We also present a CRF-based model integrated with a heterogeneous named entity similarity function, and show that adding the similarity function to the best configuration improved the final result in cross-domain evaluation. The last section presents NER-WS, a web service for proper names recognition in Polish texts that uses the Liner2 framework and the model for 56 categories of proper names. The web service can be tested using a web-based demo available at http://nlp.pwr.wroc.pl/inforex/. © 2013 Springer-Verlag Berlin Heidelberg.
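
As an illustration of the simplest of the chunking methods named in the abstract, dictionary look-up, the following is a minimal sketch in Python. It is not Liner2's API or implementation: the toy gazetteer, the function name `dictionary_chunk`, the greedy longest-match strategy and the BIO label scheme are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical toy gazetteer. The category names echo the abstract;
# the entries themselves are illustrative only.
GAZETTEER: Dict[Tuple[str, ...], str] = {
    ("Jan",): "first_name",
    ("Kowalski",): "surname",
    ("Wrocław",): "city",
    ("Zielona", "Góra"): "city",
    ("Polska",): "country",
}


def dictionary_chunk(tokens: List[str]) -> List[str]:
    """Assign BIO labels to tokens by greedy longest-match look-up."""
    max_len = max(len(key) for key in GAZETTEER)
    labels = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first so multi-word names win.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            category = GAZETTEER.get(tuple(tokens[i:i + n]))
            if category is not None:
                labels[i] = "B-" + category
                for j in range(i + 1, i + n):
                    labels[j] = "I-" + category
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return labels


if __name__ == "__main__":
    sentence = ["Jan", "Kowalski", "odwiedził", "Wrocław"]
    for token, label in zip(sentence, dictionary_chunk(sentence)):
        print(token, label)
```

Exact-match look-up of this kind misses inflected forms, which are common in Polish, and cannot resolve ambiguous entries. That is one plausible reason to combine it with pattern matching and a statistical model, as the abstract describes.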

Cite

Marcińczuk, M., Kocoń, J., & Janicki, M. (2013). Liner2 - a customizable framework for proper names recognition for Polish. Studies in Computational Intelligence, 467, 231–253. https://doi.org/10.1007/978-3-642-35647-6_17
