An automatic approach for efficient text segmentation

Keke Cai; Jiajun Bu; Chun Chen; Peng Huang

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2006), Vol. 4251 LNAI-I, pp. 417–424

DOI: 10.1007/11892960_51

Citations: 1

Readers: 8

Abstract

This paper presents a domain-independent approach for partitioning text documents into topic-coherent segments whose structure reflects the pattern of subtopics in the processed document. The approach uses a similarity analysis based on Shannon information theory to determine the topic distribution in text documents, without relying on thesaurus information or other auxiliary knowledge bases. It first examines how consistently each individual word is distributed across the document and constructs a number of segmentation proposals accordingly. It then applies K-means clustering to derive a consensus from these proposals and finally partitions the text into a set of topic-coherent paragraphs. Extensive experiments on real and synthetic data sources illustrate the effectiveness of the approach for text segmentation. © Springer-Verlag Berlin Heidelberg 2006.
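
The consensus step can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how proposals are represented or how the number of segments is chosen. The sketch assumes each proposal is a list of sentence indices at which a boundary is suggested, and that the number of consensus boundaries `k` is given. The names `kmeans_1d` and `consensus_boundaries` are hypothetical.

```python
from typing import List, Sequence


def kmeans_1d(points: Sequence[float], k: int, iters: int = 100) -> List[float]:
    """Plain Lloyd's k-means on scalars; returns the sorted cluster centers."""
    pts = sorted(points)
    if not 1 <= k <= len(pts):
        raise ValueError("k must be between 1 and the number of points")
    # Deterministic initialisation: spread the initial centers over the sorted data.
    centers = [pts[(2 * i + 1) * len(pts) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        # Assignment step: attach every point to its nearest center.
        groups: List[List[float]] = [[] for _ in range(k)]
        for p in pts:
            nearest = min(range(k), key=lambda c: abs(p - centers[c]))
            groups[nearest].append(p)
        # Update step: move each center to the mean of its group
        # (an empty group keeps its previous center).
        new_centers = [
            sum(g) / len(g) if g else centers[i] for i, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return sorted(centers)


def consensus_boundaries(proposals: Sequence[Sequence[int]], k: int) -> List[int]:
    """Pool the boundary positions of all proposals and cluster them into k
    consensus boundaries (assumed representation: sentence indices)."""
    pooled = [b for proposal in proposals for b in proposal]
    return [round(c) for c in kmeans_1d(pooled, k)]


if __name__ == "__main__":
    # Four hypothetical per-word proposals that roughly agree on two boundaries.
    proposals = [[4, 11], [5, 10], [5, 12], [6, 11]]
    print(consensus_boundaries(proposals, k=2))
```

Pooling the proposed positions and clustering them is only one way to read "get a consensus from these proposals"; the paper itself may cluster a different representation.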

Cite

Cai, K., Bu, J., Chen, C., & Huang, P. (2006). An automatic approach for efficient text segmentation. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4251 LNAI-I, pp. 417–424). Springer Verlag. https://doi.org/10.1007/11892960_51
