Imbalanced data is common in the real world, especially in sentiment corpora, and it makes training a classifier to distinguish latent sentiment in text difficult. We observe that people often express a transition in emotion between two adjacent discourses with discourse markers such as “but”, “though”, and “while”, and that the head discourse and the tail discourse usually indicate opposite emotional tendencies. Based on this observation, we propose a novel plug-and-play method. It first samples discourses according to transitional discourse markers and then validates their sentiment polarities with a pre-trained attention-based model. Our method increases sample diversity and yields an expanded dataset with a relatively low imbalance ratio, so it can serve as an upstream preprocessing step in data augmentation. We conduct experiments on three public sentiment datasets with several frequently used algorithms. Results show that our method is consistently effective, even in highly imbalanced scenarios. It can also be easily integrated with oversampling methods to boost performance on imbalanced sentiment classification.
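The sketch below illustrates the sampling idea only. It splits a review at the first transitional marker and checks that the head and tail discourses receive opposite polarities. It keeps them as new samples when they do, then compares the imbalance ratio (majority-class count divided by minority-class count) before and after. Everything here is an assumption for illustration and not the authors' implementation:

- the marker list;
- the function names (`split_on_marker`, `augment`, `imbalance_ratio`);
- the toy word-lexicon scorer `toy_polarity`, which merely stands in for the pre-trained attention-based validator.

```python
import re
from collections import Counter

# Hypothetical marker list; the abstract names "but", "though", "while", etc.
MARKERS = ("but", "though", "while")
MARKER_RE = re.compile(r"\b(" + "|".join(MARKERS) + r")\b", re.IGNORECASE)

# Toy lexicon: a placeholder for the pre-trained attention-based validator (assumption).
POS_WORDS = {"good", "great", "tasty", "friendly"}
NEG_WORDS = {"bad", "slow", "rude", "bland"}


def toy_polarity(text):
    """Return 'pos', 'neg', or None by counting lexicon hits (placeholder validator)."""
    words = re.findall(r"[a-z]+", text.lower())
    score = sum(w in POS_WORDS for w in words) - sum(w in NEG_WORDS for w in words)
    if score > 0:
        return "pos"
    if score < 0:
        return "neg"
    return None


def split_on_marker(text):
    """Split at the first transitional marker into (head, tail), or return None."""
    match = MARKER_RE.search(text)
    if match is None:
        return None
    head = text[: match.start()].strip(" ,;")
    tail = text[match.end():].strip(" ,;.")
    return (head, tail) if head and tail else None


def augment(corpus, validate=toy_polarity):
    """Return corpus plus head/tail discourses whose validated polarities are opposite."""
    extra = []
    for text, _label in corpus:
        parts = split_on_marker(text)
        if parts is None:
            continue
        head, tail = parts
        head_label, tail_label = validate(head), validate(tail)
        # Keep the pair only if the validator assigns opposite polarities.
        if head_label and tail_label and head_label != tail_label:
            extra.extend([(head, head_label), (tail, tail_label)])
    return corpus + extra


def imbalance_ratio(corpus):
    """Majority-class count divided by minority-class count."""
    counts = Counter(label for _, label in corpus)
    return max(counts.values()) / min(counts.values())


if __name__ == "__main__":
    corpus = [
        ("The food was great and the staff was friendly.", "pos"),
        ("Tasty dishes, good prices.", "pos"),
        ("Great view.", "pos"),
        ("The food was tasty, but the service was slow.", "neg"),
    ]
    print(imbalance_ratio(corpus))           # 3.0
    print(imbalance_ratio(augment(corpus)))  # 2.0
```

In this toy run, the one negative review containing "but" yields a positive head and a negative tail. Adding both lowers the ratio from 3.0 to 2.0. Because the output is just a larger labelled list, it could be passed to an oversampler afterwards, which is the kind of upstream placement the abstract describes.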
Zhang, T., Wu, X., Lin, M., Han, J., & Hu, S. (2019). Imbalanced Sentiment Classification Enhanced with Discourse Marker. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11730 LNCS, pp. 117–129). Springer Verlag. https://doi.org/10.1007/978-3-030-30490-4_11