Abstract
This paper describes a new approach to estimating term weights for text classification. The approach uses term co-occurrence as a measure of dependency between word features. A random-walk model is applied to a graph that encodes words and their co-occurrence dependencies, producing scores that quantify how much a particular word feature contributes to a given context. We argue that modeling feature weights with these scores, rather than with traditional frequency-based scores, yields better text classification results. Experiments on four standard classification datasets show that the new random-walk-based approach outperforms the traditional term-frequency approach to feature weighting.
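The idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the graph is built or how the walk is scored, so the co-occurrence window of 2 tokens, the PageRank-style update with a damping factor of 0.85, the fixed iteration count, and the function names `cooccurrence_graph` and `random_walk_scores` are all assumptions made for illustration.

```python
from collections import defaultdict


def cooccurrence_graph(tokens, window=2):
    """Link each word to the words that follow it within `window` tokens.

    The window size is an illustrative assumption, not a value from the paper.
    """
    graph = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + window + 1]:
            if other != word:
                # Undirected edge: co-occurrence is treated as symmetric here.
                graph[word].add(other)
                graph[other].add(word)
    return graph


def random_walk_scores(graph, damping=0.85, iterations=50):
    """PageRank-style iteration on an undirected, unweighted graph.

    The damping factor and iteration count are assumed defaults.
    """
    scores = {word: 1.0 for word in graph}
    for _ in range(iterations):
        # Each word receives an equal share of every neighbour's score.
        scores = {
            word: (1 - damping)
            + damping * sum(scores[n] / len(graph[n]) for n in graph[word])
            for word in graph
        }
    return scores


tokens = "graph based term weighting uses a graph of term co-occurrence".split()
weights = random_walk_scores(cooccurrence_graph(tokens))
```

In this sketch, the resulting `weights` would take the place of raw term-frequency counts in a document's feature vector. Words that co-occur with many other well-connected words receive higher scores than words that appear only at the periphery of the graph.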
Cite
Hassan, S., & Banea, C. (2020). Random-walk term weighting for improved text classification. In Proceedings of TextGraphs: The 1st Workshop on Graph-Based Methods for Natural Language Processing (pp. 53–60). Association for Computational Linguistics. https://doi.org/10.1142/9789819818525_0003