Part-of-speech (POS) tagging for morphologically rich languages normally requires the use of handcrafted features that encapsulate clues about the language’s morphology. In this work, we tackle Portuguese POS tagging using a deep neural network that employs a convolutional layer to learn character-level representation of words. We apply the network to three different corpora: the original Mac-Morpho corpus; a revised version of the Mac-Morpho corpus; and the Tycho Brahe corpus. Using the proposed approach, while avoiding the use of any handcrafted feature, we produce state-of-the-art POS taggers for the three corpora: 97.47% accuracy on the Mac-Morpho corpus; 97.31% accuracy on the revised Mac-Morpho corpus; and 97.17% accuracy on the Tycho Brahe corpus. These results represent an error reduction of 12.2%, 23.6% and 15.8%, respectively, on the best previous known result for each corpus.
CITATION STYLE
dos Santos, C. N., & Zadrozny, B. (2014). Training State-of-the-Art Portuguese POS Taggers without Handcrafted Features (pp. 82–93). https://doi.org/10.1007/978-3-319-09761-9_8
Mendeley helps you to discover research relevant for your work.