Latent-variable modeling of string transductions with finite-state methods

Abstract

String-to-string transduction is a central problem in computational linguistics and natural language processing. It occurs in tasks as diverse as name transliteration, spelling correction, pronunciation modeling, and inflectional morphology. We present a conditional log-linear model for string-to-string transduction that employs overlapping features over latent alignment sequences and learns latent classes and latent string pair regions from incomplete training data. We evaluate our approach on morphological tasks and demonstrate that latent variables can dramatically improve results, even when trained on small data sets. On the task of generating morphological forms, we outperform a baseline method, reducing the error rate by up to 48%. On a lemmatization task, we reduce the error rates reported by Wicentowski (2002) by 38-92%. © 2008 Association for Computational Linguistics.
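
As a reading aid, the general shape of a conditional log-linear model with latent variables can be written as below. This is only a sketch of the standard form, not the paper's exact formulation. The symbols are assumptions introduced here: x is the input string, y the output string, a a latent annotation (standing in for the alignment, latent classes, and latent regions the abstract mentions), f a feature vector, and θ the weight vector.

```latex
% Assumed notation: x = input string, y = output string,
% a = latent annotation of the pair (alignment, classes, regions),
% \mathcal{A}(x, y) = set of annotations consistent with (x, y),
% f = feature vector, \theta = weights.
\[
  p_\theta(y \mid x)
    = \frac{1}{Z_\theta(x)}
      \sum_{a \in \mathcal{A}(x, y)} \exp\bigl(\theta \cdot f(x, y, a)\bigr),
  \qquad
  Z_\theta(x)
    = \sum_{y'} \sum_{a' \in \mathcal{A}(x, y')}
      \exp\bigl(\theta \cdot f(x, y', a')\bigr).
\]
```

Because a is never observed in the training pairs, the numerator sums over all annotations consistent with (x, y). Finite-state methods are one way to compute such sums efficiently.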

Citation (APA)

Dreyer, M., Smith, J. R., & Eisner, J. (2008). Latent-variable modeling of string transductions with finite-state methods. In EMNLP 2008 - 2008 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference: A Meeting of SIGDAT, a Special Interest Group of the ACL (pp. 1080–1089). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1613715.1613856
