Improving text segmentation using latent semantic analysis: A reanalysis of Choi, Wiemer-Hastings, and Moore (2001)

Yves Bestgen

Journal article (open access)



Computational Linguistics (2006) 32(1) 5-12

DOI: 10.1162/coli.2006.32.1.5


Abstract

Choi, Wiemer-Hastings, and Moore (2001) proposed using Latent Semantic Analysis (LSA) to extract semantic knowledge from corpora in order to improve the accuracy of a text segmentation algorithm. By comparing the accuracy of the same algorithm with and without this complementary semantic knowledge, they showed the benefit derived from such knowledge. In their experiments, however, the semantic knowledge was acquired from a corpus that contained the very texts to be segmented in the test phase. If this hyper-specificity of the LSA corpus accounts for most of the benefit, one may ask whether LSA can be used to acquire generic semantic knowledge that helps segment new texts. The two experiments reported here show that the presence of the test materials in the LSA corpus has an important effect, but also that generic semantic knowledge derived from large corpora clearly improves segmentation accuracy. © 2006 Association for Computational Linguistics.
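
The core idea, replacing plain word overlap between sentences with similarity in an LSA space, can be illustrated with a minimal sketch. It assumes (as an illustration, not as the paper's exact setup) that LSA word vectors come from a truncated SVD of a log-weighted term-by-document matrix, that a sentence is represented by the sum of its word vectors, and that sentences are compared by cosine. The toy corpus, the weighting, `k`, and the `weakest_gap` boundary rule are all hypothetical; `weakest_gap` is a deliberately simplified heuristic and not the segmentation algorithm evaluated in the paper. The sentences being segmented are not themselves documents of the LSA corpus, which loosely mirrors the question of generic versus test-specific semantic knowledge.

```python
import numpy as np


def lsa_word_vectors(docs, k):
    """Build a term-by-document count matrix from `docs` (lists of tokens)
    and return {word: k-dimensional LSA vector} via truncated SVD."""
    vocab = sorted({w for d in docs for w in d})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for w in doc:
            counts[index[w], j] += 1
    # Assumed weighting: log(1 + tf); other LSA weightings (e.g. entropy) are omitted.
    weighted = np.log1p(counts)
    u, s, _ = np.linalg.svd(weighted, full_matrices=False)
    word_vecs = u[:, :k] * s[:k]  # keep k dimensions, scaled by singular values
    return {w: word_vecs[i] for w, i in index.items()}


def sentence_vector(tokens, vecs, k):
    """Assumed sentence representation: sum of the LSA vectors of known words."""
    v = np.zeros(k)
    for w in tokens:
        if w in vecs:
            v += vecs[w]
    return v


def cosine(a, b):
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    return float(a @ b / (na * nb)) if na and nb else 0.0


def similarity_matrix(sentences, vecs, k):
    """Pairwise cosine similarity between sentences in the LSA space."""
    sv = [sentence_vector(s, vecs, k) for s in sentences]
    n = len(sv)
    return np.array([[cosine(sv[i], sv[j]) for j in range(n)] for i in range(n)])


def weakest_gap(sim):
    """Hypothetical simplified boundary rule: return i such that the boundary
    between sentence i and i + 1 has the lowest adjacent-sentence similarity."""
    gaps = [sim[i, i + 1] for i in range(len(sim) - 1)]
    return int(np.argmin(gaps))


# Toy LSA corpus with two topics (illustrative data only).
corpus = [
    "bank loan interest rate".split(),
    "interest rate market stock".split(),
    "stock market bank investor".split(),
    "football team goal match".split(),
    "match referee goal player".split(),
    "player team coach football".split(),
]
k = 2
vecs = lsa_word_vectors(corpus, k)

# New "text" to segment: two finance sentences followed by two sport sentences.
sentences = [
    ["bank", "interest", "investor"],
    ["stock", "loan", "rate"],
    ["goal", "coach", "referee"],
    ["team", "player", "match"],
]
sim = similarity_matrix(sentences, vecs, k)
print("boundary after sentence index", weakest_gap(sim))  # expected: 1
```

Note that the two test sentences on each side of the boundary share few or no words, so their similarity comes from the co-occurrence structure captured by LSA rather than from direct lexical overlap.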


Bestgen, Y. (2006). Improving text segmentation using latent semantic analysis: A reanalysis of Choi, Wiemer-Hastings, and Moore (2001). Computational Linguistics, 32(1), 5–12. https://doi.org/10.1162/coli.2006.32.1.5
