Latent-variable modeling of string transductions with finite-state methods

Markus Dreyer; Jason R. Smith; Jason Eisner

EMNLP 2008 - 2008 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference: A Meeting of SIGDAT, a Special Interest Group of the ACL (2008) 1080-1089

DOI: 10.3115/1613715.1613856

Citations: 68

Readers: 106

Abstract

String-to-string transduction is a central problem in computational linguistics and natural language processing. It occurs in tasks as diverse as name transliteration, spelling correction, pronunciation modeling, and inflectional morphology. We present a conditional log-linear model for string-to-string transduction, which employs overlapping features over latent alignment sequences, and which learns latent classes and latent string pair regions from incomplete training data. We evaluate our approach on morphological tasks and demonstrate that latent variables can dramatically improve results, even when trained on small data sets. On the task of generating morphological forms, we outperform a baseline method, reducing the error rate by up to 48%. On a lemmatization task, we reduce the error rates in Wicentowski (2002) by 38-92%. © 2008 Association for Computational Linguistics.
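
The model family named in the abstract can be written out as a short sketch. This formulation is illustrative and is not quoted from the paper. All notation is assumed here: x is the input string, y the output string, a a latent alignment of the two (taken to carry any latent classes and regions as well), A(x, y) the set of alignments consistent with the pair, f a feature vector, θ the weight vector, and Z_θ(x) the normalizer. Under those assumptions, a conditional log-linear model with latent alignments sums over every alignment of the pair:

```latex
p_\theta(y \mid x)
  = \frac{1}{Z_\theta(x)} \sum_{a \in A(x, y)} \exp\bigl(\theta \cdot f(x, y, a)\bigr),
\qquad
Z_\theta(x)
  = \sum_{y'} \sum_{a' \in A(x, y')} \exp\bigl(\theta \cdot f(x, y', a')\bigr)
```

In this reading, the training data are "incomplete" because only the pairs (x, y) are observed, so the alignment a has to be summed out rather than supplied.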

Citation (APA)

Dreyer, M., Smith, J. R., & Eisner, J. (2008). Latent-variable modeling of string transductions with finite-state methods. In EMNLP 2008 - 2008 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference: A Meeting of SIGDAT, a Special Interest Group of the ACL (pp. 1080–1089). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1613715.1613856

Readers' Seniority

PhD / Post grad / Masters / Doc: 40 (63%)
Researcher: 15 (23%)
Professor / Associate Prof.: 6
Lecturer / Post doc: 3

Readers' Discipline

Computer Science: 51 (80%)
Linguistics: 8 (13%)
Engineering: 3
Agricultural and Biological Sciences: 2
