Part-of-speech (POS) tagging is a key process for various natural language processing related tasks, in which each word of a sentence is assigned a uniquely interpretable label (called a POS tag). There are many proposed methodologies for this task, such as Hidden Markov Models, Conditional Random Fields, Maximum Entropy classifiers etc. Such methods are primarily intended for English which, in comparison to highly inflectional languages has a relatively small tagset inventory. One of the well-known methods used for large tagset labeling (referred to as morpho-syntactic descriptors or MSDs) is called Tiered Tagging (Tufiş, 1999), (Tufiş and Dragomirescu, 2006) and it exploits a reduced set of tags from which context irrelevant features (e.g. gender information) which can be deduced trough the word form's flectional analysis are stripped. In our previous work we presented an alternative method to Tiered Tagging, in which we performed multi-class classification with a feed-forward neural network. Our methodology has the advantage that it does not require extensive linguistic knowledge as implied by the previously mentioned approach. We extend our work by testing our tool on Czech and successfully experimenting with a genetic algorithm designed to find a better network topology. © Springer-Verlag Berlin Heidelberg 2013.
CITATION STYLE
Boroş, T., & Dumitrescu, S. D. (2013). Improving the RACAI Neural Network MSD Tagger. In Communications in Computer and Information Science (Vol. 383 CCIS, pp. 42–51). Springer Verlag. https://doi.org/10.1007/978-3-642-41013-0_5
Mendeley helps you to discover research relevant for your work.