Published experiments with shallow parsing for Slavic languages have been characterised by the small size of the corpora used. The publication of the National Corpus of Polish (NCP) opened a new opportunity: to test several chunking algorithms on the 1-million-token manually annotated subcorpus of the NCP. We test three Machine Learning techniques: Decision Tree induction, Memory-Based Learning and Conditional Random Fields. We also investigate the influence of tagging errors on overall chunker performance, which turns out to be quite substantial. © 2012 Springer-Verlag.
CITATION STYLE
Radziszewski, A., & Pawlaczek, A. (2012). Large-scale experiments with NP chunking of Polish. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7499 LNAI, pp. 143–149). https://doi.org/10.1007/978-3-642-32790-2_17