Short text is one of the predominant forms of communication, with unique characteristics such as short length, high sparsity, and a lack of shared context and word co-occurrence. These characteristics distinguish short text from general text and make short text classification a challenging task. Term weighting is an important pre-processing step for text classification in the vector space model. In this paper, we propose three modifications to existing state-of-the-art term weighting schemes (ifn-tp-icf, RFR, and modOR) and a new term weighting scheme, ifn-modRF. We compare the proposed schemes with ten existing unsupervised and supervised schemes using three datasets of informally written short text: a self-labelled dataset of real-world events from Twitter, a Yahoo! questions dataset, and a dataset of product reviews. Based on the experimental results using three popular classifiers, we observe that the proposed scheme ifn-modRF achieves the best F1-scores on the Twitter dataset, while the proposed modification modOR is a consistent performer with the best scores in most of the experiments. The proposed modification ifn-tp-icf also outperforms the original scheme in most experiments.
Samant, S. S., Bhanu Murthy, N. L., & Malapati, A. (2019). Improving Term Weighting Schemes for Short Text Classification in Vector Space Model. IEEE Access, 7, 166578–166592. https://doi.org/10.1109/ACCESS.2019.2953918