Lexical ambiguity makes it difficult to compute various useful statistics of a corpus. A given word form might represent any of several morphological feature bundles. One can, however, use unsupervised learning (as in EM) to fit a model that probabilistically disambiguates word forms. We present such an approach, which employs a neural network to smoothly model a prior distribution over feature bundles (even rare ones). Although this basic model does not consider a token's context, that very property allows it to operate on a simple list of unigram type counts, partitioning each count among different analyses of that unigram. We discuss evaluation metrics for this novel task and report results on 5 languages.
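The idea of partitioning each unigram type count among its candidate analyses can be illustrated with a small EM loop. This is a simplified sketch, not the authors' model. The neural-network prior over feature bundles is replaced here by a plain categorical prior with add-alpha smoothing. Each form's candidate analyses are assumed to be given as a list. The function name `em_partition`, the `alpha` parameter, and the toy English data are all hypothetical.

```python
from collections import defaultdict


def em_partition(type_counts, analyses, iters=50, alpha=0.1):
    """Split each word form's count among its candidate feature bundles.

    Hypothetical sketch: a smoothed categorical prior stands in for the
    neural prior described in the abstract.
    type_counts: dict mapping form -> unigram count
    analyses:    dict mapping form -> list of candidate bundles (strings)
    Returns (prior, partition), where partition[form][bundle] is a
    fractional count.
    """
    bundles = sorted({b for cands in analyses.values() for b in cands})
    prior = {b: 1.0 / len(bundles) for b in bundles}  # uniform start
    partition = {}
    for _ in range(iters):
        # E-step: the posterior over a form's analyses is proportional to the
        # prior, restricted to that form's candidates.
        expected = defaultdict(float)
        partition = {}
        for form, count in type_counts.items():
            cands = analyses[form]
            z = sum(prior[b] for b in cands)
            partition[form] = {b: count * prior[b] / z for b in cands}
            for b, share in partition[form].items():
                expected[b] += share
        # M-step: re-estimate the prior from expected counts (add-alpha
        # smoothing keeps rare bundles from collapsing to zero probability).
        total = sum(expected.values()) + alpha * len(bundles)
        prior = {b: (expected[b] + alpha) / total for b in bundles}
    return prior, partition


# Toy example: "walked" is syncretic between past tense and past participle.
counts = {"walked": 10, "ate": 30, "eaten": 10}
cands = {"walked": ["PST", "PTCP"], "ate": ["PST"], "eaten": ["PTCP"]}
prior, partition = em_partition(counts, cands)
print(partition["walked"])
```

Because the unambiguous forms favor past tense, most of the count of "walked" is assigned to `PST`. Like the model in the abstract, this sketch ignores token context, so it needs only the type-count list.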
Cotterell, R., Kirov, C., Mielke, S. J., & Eisner, J. (2018). Unsupervised disambiguation of syncretism in inflected lexicons. In NAACL HLT 2018 - 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies - Proceedings of the Conference (Vol. 2, pp. 548–553). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/n18-2087