The present work introduces It-CMC, a fine-grained POS tagset that aims to combine linguistic accuracy with computational sustainability. It-CMC is tailored to Italian data from Computer-Mediated Communication (CMC), and throughout the paper it is systematically compared with the current tagset of the La Repubblica corpus. After an early stage of performance monitoring carried out with Schmid’s TreeTagger, the tagset is now part of a workflow aimed at creating an Italian parameter file for RFTagger.
Towards It-CMC: A fine-grained POS tagset for Italian linguistic analysis. (2016). In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9924 LNCS, pp. 62–73). Springer Verlag. https://doi.org/10.1007/978-3-319-45510-5_8