Term similarity and weighting framework for text representation

Sadiq Sani; Nirmalie Wiratunga; Stewart Massie; Robert Lothian

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2011) 6880 LNAI 304-318

DOI: 10.1007/978-3-642-23291-6_23

Abstract

Expressiveness of natural language is a challenge for text representation since the same idea can be expressed in many different ways. Therefore, terms in a document should not be treated independently of one another since together they help to disambiguate and establish meaning. Term-similarity measures are often used to improve representation by capturing semantic relationships between terms. Another consideration for representation involves the importance of terms. Feature selection techniques address this by using statistical measures to quantify term usefulness for retrieval. In this paper we present a framework that combines term-similarity and weighting for text representation. This allows us to comparatively study the impact of term similarity, term weighting and any synergistic effect that may exist between them. Study of term similarity is based on approaches that exploit term co-occurrences within document and sentence contexts whilst term weighting uses the popular Chi-squared test. Our results on text classification tasks show that the combined effect of similarity and weighting is superior to each technique independently and that this synergistic effect is obtained regardless of co-occurrence context granularity. © 2011 Springer-Verlag.
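To make the idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes documents are sets of tokens, that term similarity is the cosine between binary document-occurrence vectors (one possible document-level co-occurrence measure), and that the combined representation multiplies each term's similarity-smoothed value by its Chi-squared score. The toy corpus, the function names (`cooccurrence_similarity`, `chi_squared`, `represent`) and the way the two components are combined are all illustrative assumptions; the abstract does not specify them.

```python
import math

def cooccurrence_similarity(docs, vocab):
    """Cosine similarity between terms, from binary document-occurrence vectors.
    (Assumed measure; the paper's exact similarity function is not given here.)"""
    occ = {t: [1.0 if t in d else 0.0 for d in docs] for t in vocab}
    # Vectors are binary, so the squared norm equals the plain sum.
    norm = {t: math.sqrt(sum(v)) for t, v in occ.items()}
    sim = {}
    for a in vocab:
        for b in vocab:
            dot = sum(x * y for x, y in zip(occ[a], occ[b]))
            denom = norm[a] * norm[b]
            sim[a, b] = dot / denom if denom else 0.0
    return sim

def chi_squared(docs, labels, term, positive):
    """Chi-squared statistic of the 2x2 term-presence / class contingency table."""
    n = len(docs)
    a = sum(1 for d, y in zip(docs, labels) if term in d and y == positive)
    b = sum(1 for d, y in zip(docs, labels) if term in d and y != positive)
    c = sum(1 for d, y in zip(docs, labels) if term not in d and y == positive)
    d_ = n - a - b - c
    denom = (a + b) * (c + d_) * (a + c) * (b + d_)
    return n * (a * d_ - b * c) ** 2 / denom if denom else 0.0

def represent(doc, vocab, sim, weights):
    """Similarity-smoothed, weighted vector for one document (assumed combination):
    value(t) = weight(t) * sum of sim(u, t) over the terms u present in doc."""
    return [weights[t] * sum(sim.get((u, t), 0.0) for u in doc) for t in vocab]

# Hypothetical toy corpus, for illustration only.
docs = [{"cheap", "flight"}, {"cheap", "ticket"},
        {"goal", "match"}, {"goal", "team"}]
labels = ["travel", "travel", "sport", "sport"]
vocab = sorted(set().union(*docs))

sim = cooccurrence_similarity(docs, vocab)
weights = {t: chi_squared(docs, labels, t, "travel") for t in vocab}
vec = represent(docs[0], vocab, sim, weights)

for term, value in zip(vocab, vec):
    print(f"{term:8s} {value:.3f}")
```

In this sketch the first document contains neither "ticket" nor any sport term, yet it receives a non-zero value for "ticket" because "ticket" co-occurs with "cheap" elsewhere in the corpus, while unrelated terms such as "goal" stay at zero. Swapping the document list for a list of sentences would give the sentence-level co-occurrence context mentioned in the abstract.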

APA

Sani, S., Wiratunga, N., Massie, S., & Lothian, R. (2011). Term similarity and weighting framework for text representation. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6880 LNAI, pp. 304–318). https://doi.org/10.1007/978-3-642-23291-6_23