We have extended an existing lemmatizer, which relies on a lexicon of about 1.2 million forms in which lemmas are indexed by rich PoS tags, with a sequence of cascading filters, each in charge of specific issues related to out-of-dictionary words. The last two filters are devoted to resolving semantic ambiguities between words of the same syntactic category by querying external resources: an enriched index built on the Italian Wikipedia and the Google index. © Springer-Verlag Berlin Heidelberg 2013.
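The cascade described in the abstract can be pictured with a minimal sketch. It is not the Tanl implementation: the toy lexicon, the PoS tag strings, the `PREFIXES` list, and the functions `lexicon_filter`, `prefix_filter`, and `lemmatize` are all hypothetical names chosen for illustration. The sketch assumes that each filter either returns a lemma or declines, and that the first filter to answer wins. The two filters that query Wikipedia and Google are not modelled.

```python
from typing import Callable, Optional, Sequence

# Toy lexicon (assumption): (form, PoS tag) -> lemma. The tags are placeholders,
# not the actual tagset used by the lemmatizer.
LEXICON = {
    ("gatti", "NOUN-m-p"): "gatto",
    ("mercati", "NOUN-m-p"): "mercato",
}

# Hypothetical list of productive prefixes handled by the second filter.
PREFIXES = ("super", "anti")

# A filter takes a form and its PoS tag and returns a lemma, or None to decline.
Filter = Callable[[str, str], Optional[str]]


def lexicon_filter(form: str, pos: str) -> Optional[str]:
    """Look the form up in the lexicon, indexed by PoS tag."""
    return LEXICON.get((form.lower(), pos))


def prefix_filter(form: str, pos: str) -> Optional[str]:
    """Hypothetical out-of-dictionary filter: strip a known prefix,
    look up the remainder, and re-attach the prefix to its lemma."""
    lowered = form.lower()
    for prefix in PREFIXES:
        if lowered.startswith(prefix):
            base = LEXICON.get((lowered[len(prefix):], pos))
            if base is not None:
                return prefix + base
    return None


def lemmatize(
    form: str,
    pos: str,
    filters: Sequence[Filter] = (lexicon_filter, prefix_filter),
) -> str:
    """Run the filters in order and return the first lemma produced.
    If every filter declines, fall back to the form itself."""
    for apply_filter in filters:
        lemma = apply_filter(form, pos)
        if lemma is not None:
            return lemma
    return form
```

With this layout, `lemmatize("supermercati", "NOUN-m-p")` misses in the lexicon and is resolved by the prefix filter, while a form that no filter recognises is returned unchanged. Adding a further stage only requires appending another function to `filters`.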
CITATION STYLE
Attardi, G., Dei Rossi, S., & Simi, M. (2013). The Tanl lemmatizer enriched with a sequence of cascading filters. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7689 LNAI, pp. 257–265). https://doi.org/10.1007/978-3-642-35828-9_28