Latent Semantic Analysis (LSA) is an automatic statistical method for representing the meanings of words and text passages. A growing body of evidence supports the reliability of LSA as a tool for assessing semantic similarity between units of discourse, and LSA has proved comparable to human judgments of document similarity. Before analyzing a linguistic corpus composed of digitized documents, the tool builds a mathematical representation of the texts. The main objective of this study was to determine what properties (general, condensed, diversified, and base corpus) different linguistic corpora should have so that LSA's assessment of summaries matches the assessment made by four human raters as closely as possible. Three hundred and ninety Spanish middle and high school students (14-16 years old) and undergraduate students read a narrative text and later summarized it. Findings indicate that corpora need not be as general and as large as those used at Boulder (made up of millions of texts and over one million words), nor too specific (fewer than 300 texts and 5,000 words), for the assessment to be satisfactorily efficient.
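The mathematical representation the abstract alludes to is conventionally a term-document matrix reduced by truncated singular value decomposition, with document similarity then measured by cosine in the latent space. A minimal numpy-only sketch of that pipeline follows; the toy corpus, the number of dimensions `k`, and the helper names are illustrative assumptions, not the authors' actual corpora or settings:

```python
# Minimal LSA sketch (illustrative only; not the study's pipeline or corpus).
import numpy as np

docs = [
    "the student read the narrative text",
    "the student summarized the text",
    "the corpus contains many documents",
]

# Term-document count matrix: one row per vocabulary word, one column per document.
vocab = sorted({w for d in docs for w in d.split()})
X = np.array([[d.split().count(w) for d in docs] for w in vocab], dtype=float)

# Truncated SVD: keep k latent dimensions (k is an arbitrary choice here).
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
doc_vecs = (np.diag(s[:k]) @ Vt[:k]).T  # each row is a document in latent space

def cosine(a, b):
    """Cosine similarity between two latent-space vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Documents 0 and 1 share most words; document 2 overlaps only on "the".
sim_01 = cosine(doc_vecs[0], doc_vecs[1])
sim_02 = cosine(doc_vecs[0], doc_vecs[2])
```

In an assessment setting like the one studied, a student summary and the source text would each be projected into the latent space and their cosine similarity compared against human raters' scores.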
CITATION STYLE
Olmos, R., León, J. A., Escudero, I., & Jorge-Botana, G. (2009). Análisis del tamaño y especificidad de los corpus en la evaluación de resúmenes mediante el LSA. Un análisis comparativo entre LSA y jueces expertos. Revista Signos, 42(69), 71–81. https://doi.org/10.4067/s0718-09342009000100004