This paper presents a recurrent neural network (RNN) for part-of-speech (POS) tagging. The RNN variant used is a bidirectional Long Short-Term Memory (BiLSTM) architecture, which addresses two crucial problems: the vanishing gradient phenomenon, which is architecture-specific, and the dependence of POS labels on sequential information both preceding and following them, which is task-specific. The approach is attractive compared to other machine learning approaches in that it requires neither hand-crafted features nor purpose-built resources such as a morphological dictionary. The study presents preliminary results on the BulTreeBank corpus, with a tagset of 153 labels. One of its main contributions is the training of distributed word representations (word embeddings) on a large corpus of Bulgarian text. Another is the complementing of the word-embedding input vectors with distributed morphological representations (suffix embeddings), which are shown to significantly improve the system's accuracy.
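The pipeline the abstract describes (concatenated word and suffix embeddings fed to a bidirectional recurrent layer that scores each token over the 153-label tagset) can be sketched as follows. This is an illustrative sketch only, with made-up dimensions and randomly initialized, untrained weights; a plain tanh recurrence stands in for the paper's LSTM cells, and none of the names below come from the paper itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes; the paper's actual dimensions are not given here.
VOCAB, SUFFIXES, N_TAGS = 100, 20, 153   # 153 matches the BulTreeBank tagset size
D_WORD, D_SUF, D_HID = 8, 4, 16

# Embedding tables: one for words, one for suffixes (the paper's two input types).
E_word = rng.normal(size=(VOCAB, D_WORD))
E_suf = rng.normal(size=(SUFFIXES, D_SUF))

D_IN = D_WORD + D_SUF
# Forward and backward recurrences; a real system would use LSTM cells here.
W_f, U_f = rng.normal(size=(D_HID, D_IN)), rng.normal(size=(D_HID, D_HID))
W_b, U_b = rng.normal(size=(D_HID, D_IN)), rng.normal(size=(D_HID, D_HID))
W_out = rng.normal(size=(N_TAGS, 2 * D_HID))

def tag_probs(word_ids, suffix_ids):
    """Return per-token tag probabilities for one sentence."""
    # Each token's input concatenates its word embedding and its suffix embedding.
    x = np.concatenate([E_word[word_ids], E_suf[suffix_ids]], axis=1)
    T = len(word_ids)
    h_f, h_b = np.zeros((T, D_HID)), np.zeros((T, D_HID))
    h = np.zeros(D_HID)
    for t in range(T):                       # left-to-right (preceding context)
        h = np.tanh(W_f @ x[t] + U_f @ h)
        h_f[t] = h
    h = np.zeros(D_HID)
    for t in reversed(range(T)):             # right-to-left (subsequent context)
        h = np.tanh(W_b @ x[t] + U_b @ h)
        h_b[t] = h
    # Both directions are concatenated before the output layer, so every
    # tag decision sees context on both sides of the token.
    logits = np.concatenate([h_f, h_b], axis=1) @ W_out.T
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# A three-token "sentence" of arbitrary word and suffix ids.
probs = tag_probs(np.array([3, 14, 15]), np.array([1, 5, 9]))
```

Each row of `probs` is a distribution over the 153 tags for one token; in the trained system these weights would be learned, with the embedding tables pre-trained on the large Bulgarian corpus.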
Popov, A. (2016). Deep learning architecture for part-of-speech tagging with word and suffix embeddings. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9883 LNAI, pp. 68–77). Springer Verlag. https://doi.org/10.1007/978-3-319-44748-3_7