Random-walk term weighting for improved text classification

Samer Hassan; Carmen Banea

Conference Proceedings


Proceedings of TextGraphs: The 1st Workshop on Graph-Based Methods for Natural Language Processing (2020) 53-60

DOI: 10.1142/9789819818525_0003


Abstract

This paper describes a new approach for estimating term weights in a text classification task. The approach uses term co-occurrence as a measure of dependency between word features. A random-walk model is applied to a graph whose vertices are words and whose edges encode co-occurrence dependencies, yielding scores that quantify how much a particular word feature contributes to a given context. We argue that by modeling feature weights with these scores, rather than with traditional frequency-based scores, we can achieve better results in a text classification task. Experiments on four standard classification datasets show that the new random-walk-based approach outperforms the traditional term-frequency approach to feature weighting.

Cite (APA)

Hassan, S., & Banea, C. (2020). Random-walk term weighting for improved text classification. In Proceedings of TextGraphs: The 1st Workshop on Graph-Based Methods for Natural Language Processing (pp. 53–60). Association for Computational Linguistics. https://doi.org/10.1142/9789819818525_0003
