Algorithms for minimum risk chunking

Abstract

Stochastic finite automata are useful for identifying sub-strings (chunks) within larger units of text. Relevant applications include tokenization, base-NP chunking, named entity recognition, and other information extraction tasks. For a given input string, a stochastic automaton represents a probability distribution over strings of labels encoding the location of chunks. For chunking and extraction tasks, the quality of predictions is evaluated in terms of precision and recall of the chunked/extracted phrases when compared against some gold standard. However, traditional methods for estimating the parameters of a stochastic finite automaton and for decoding the best hypothesis do not pay attention to the evaluation criterion, which we take to be the well-known F-measure. We are interested in methods that remedy this situation, both in training and decoding. Our main result is a novel algorithm for efficiently evaluating expected F-measure. We present the algorithm and discuss its applications for utility/risk-based parameter estimation and decoding.
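To make the evaluation criterion concrete, the following minimal Python sketch computes the chunk-level F-measure of a hypothesis label string against a gold label string, and then the expected F-measure of a hypothesis under a distribution over label strings. It is not the algorithm of the paper: the expectation is computed by brute-force enumeration, which is exponential in the input length and only illustrates the quantity that an efficient algorithm would evaluate. The B/I/O label encoding, the function names, the convention that two empty chunk sets score 1.0, and the position-wise independent toy distribution (a stand-in for a stochastic automaton) are all assumptions made for illustration.

```python
from itertools import product


def chunks(labels):
    """Return the set of (start, end) spans encoded by a B/I/O label string.

    Assumed encoding: B begins a chunk, I continues it, O is outside.
    An I with no open chunk is ignored (an illustrative choice).
    """
    spans, start = set(), None
    for i, lab in enumerate(labels):
        if lab != "I" and start is not None:  # the open chunk ends before i
            spans.add((start, i))
            start = None
        if lab == "B":
            start = i
    if start is not None:  # a chunk running to the end of the string
        spans.add((start, len(labels)))
    return spans


def f_measure(hyp, gold, beta=1.0):
    """Chunk-level F-measure of hypothesis labels against gold labels."""
    h, g = chunks(hyp), chunks(gold)
    if not h and not g:
        return 1.0  # assumed convention when neither side contains a chunk
    true_pos = len(h & g)  # chunks with identical boundaries
    # Equivalent to (1 + beta^2) * P * R / (beta^2 * P + R)
    return (1 + beta**2) * true_pos / (beta**2 * len(g) + len(h))


def expected_f(hyp, position_probs, beta=1.0):
    """Brute-force expected F-measure of `hyp`.

    `position_probs` is a hypothetical toy model: one dict per position
    mapping each label in "BIO" to its probability, with positions
    treated as independent. Every label string is enumerated.
    """
    total = 0.0
    for labels in product("BIO", repeat=len(hyp)):
        prob = 1.0
        for lab, dist in zip(labels, position_probs):
            prob *= dist[lab]
        if prob > 0.0:
            total += prob * f_measure(hyp, "".join(labels), beta)
    return total


if __name__ == "__main__":
    # 2/3: one of the two predicted chunks matches the single gold chunk
    print(f_measure("BIOB", "BIOO"))
```

In this sketch, decoding for maximum expected F-measure would amount to picking the hypothesis string with the largest `expected_f` value, which is only feasible for very short inputs when done by enumeration.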

Cite

Jansche, M. (2006). Algorithms for minimum risk chunking. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4002 LNAI, pp. 97–109). Springer Verlag. https://doi.org/10.1007/11780885_11
