Data-driven approaches for information structure identification

Abstract

This paper investigates the automatic identification of Information Structure (IS) in texts. The experiments use the Prague Dependency Treebank, which is annotated with IS following the Praguian approach of Topic-Focus Articulation. We automatically detect t(opic) and f(ocus), using node attributes from the treebank as basic features, together with derived features inspired by the annotation guidelines. We present the performance of decision tree (C4.5), maximum entropy, and rule induction (RIPPER) classifiers on all tectogrammatical nodes. We compare the results against a baseline system that always assigns f(ocus) and against a rule-based system. The best system achieves an accuracy of 90.69%, a 44.73% relative improvement over the baseline (62.66%), or 28.03 percentage points in absolute terms. © 2005 Association for Computational Linguistics.
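The improvement figure in the abstract is easiest to read as a gain relative to the baseline accuracy. The following minimal sketch checks that arithmetic and illustrates what an always-f(ocus) baseline computes. The function names and the five-label toy list are hypothetical illustrations, not taken from the paper or the treebank; only the two accuracy values come from the abstract.

```python
from typing import Sequence


def always_focus_accuracy(gold_labels: Sequence[str]) -> float:
    """Accuracy of a baseline that predicts 'f' (focus) for every node."""
    if not gold_labels:
        raise ValueError("gold_labels must not be empty")
    correct = sum(1 for label in gold_labels if label == "f")
    return correct / len(gold_labels)


def absolute_improvement(system_acc: float, baseline_acc: float) -> float:
    """Gain in percentage points."""
    return system_acc - baseline_acc


def relative_improvement(system_acc: float, baseline_acc: float) -> float:
    """Gain expressed as a percentage of the baseline accuracy."""
    return (system_acc - baseline_acc) / baseline_acc * 100.0


# Accuracies reported in the abstract (in percent).
best_acc = 90.69
baseline_acc = 62.66

abs_gain = absolute_improvement(best_acc, baseline_acc)
rel_gain = relative_improvement(best_acc, baseline_acc)

print(f"absolute gain: {abs_gain:.2f} percentage points")
print(f"relative gain: {rel_gain:.2f}%")

# Hypothetical toy labels, only to show how the baseline is scored.
toy_gold = ["f", "t", "f", "f", "t"]
print(f"toy baseline accuracy: {always_focus_accuracy(toy_gold):.2f}")
```

Under this reading, the 44.73% figure is (90.69 − 62.66) / 62.66, not a difference in percentage points.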

Citation (APA)

Postolache, O., Kruijff-Korbayová, I., & Kruijff, G. J. M. (2005). Data-driven approaches for information structure identification. In HLT/EMNLP 2005 - Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference (pp. 9–16). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1220575.1220577
