Filtering method for twitter streaming data using human-in-the-loop machine learning

Yu Suzuki

Journal Article (Open Access)


Journal of Information Processing (2019), Vol. 27, pp. 404–410

DOI: 10.2197/ipsjjip.27.404

15 Citations

16 Readers

Abstract

A large number of texts are posted daily on social media. However, only a small portion of these texts is informative for a specific purpose. For example, to collect tweets for a marketing strategy, we need to gather a large number of tweets related to a specific topic with high accuracy. If we filter the texts accurately, we can continuously obtain fresh and useful information in real time. In a keyword-based approach, filters are constructed from keywords, but selecting appropriate keywords is often difficult. In this work, we propose a method for filtering texts related to specific topics using a classification method based on crowdsourcing and machine learning. In our approach, we construct a text classifier using fastText and then use crowdsourcing to annotate whether tweets are related to the topics. Constructing an accurate classifier normally requires a large amount of training data, but preparing it is costly and time-consuming. To construct an accurate classifier from a small amount of training data, we consider two strategies for selecting the tweets that crowdsourcing participants should assess: an optimistic approach and a pessimistic approach. We then reconstruct the text classifier using the annotated texts and classify the texts again. By repeating this loop, the accuracy of the classifier improves, and we obtain useful information without having to specify keywords. Experimental results demonstrate that the proposed system is suitable for filtering social media streams. Moreover, we found that the pessimistic approach outperforms the optimistic approach.
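The abstract describes a loop: train a classifier, select tweets, have the crowd label them, retrain, and repeat. The sketch below illustrates that loop under several stated assumptions. A tiny Naive Bayes classifier stands in for fastText so the example has no dependencies. The function `simulated_crowd` replaces real crowdsourcing workers. The two selection strategies follow one plausible reading that is not taken from the paper: "optimistic" picks the tweets the classifier most confidently scores as relevant, and "pessimistic" picks the tweets it is least certain about (probability nearest 0.5). All names (`NaiveBayes`, `select`, `human_in_the_loop`, `simulated_crowd`) and the toy data are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Whitespace tokenizer; real tweets would need better handling (URLs, hashtags, Japanese text).
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes used as a dependency-free STAND-IN for fastText (not fastText)."""

    def fit(self, texts, labels):
        self.counts = {0: Counter(), 1: Counter()}  # per-class token counts
        self.docs = Counter(labels)                  # per-class document counts
        for text, label in zip(texts, labels):
            self.counts[label].update(tokenize(text))
        self.vocab = set(self.counts[0]) | set(self.counts[1])
        return self

    def prob_relevant(self, text):
        """Return P(label = 1 | text) with add-one (Laplace) smoothing."""
        n_docs = sum(self.docs.values())
        log_scores = {}
        for label in (0, 1):
            total = sum(self.counts[label].values())
            score = math.log((self.docs[label] + 1) / (n_docs + 2))  # smoothed class prior
            for word in tokenize(text):
                # the extra +1 in the denominator reserves probability mass for unseen words
                score += math.log((self.counts[label][word] + 1) / (total + len(self.vocab) + 1))
            log_scores[label] = score
        # normalize the two log-scores into a probability (subtract the max for stability)
        m = max(log_scores.values())
        e0 = math.exp(log_scores[0] - m)
        e1 = math.exp(log_scores[1] - m)
        return e1 / (e0 + e1)


def select(model, pool, k, strategy):
    """Pick k unlabeled tweets for the crowd (hypothetical strategy definitions)."""
    scored = [(model.prob_relevant(t), t) for t in pool]
    if strategy == "optimistic":      # assumed: most confidently relevant first
        scored.sort(key=lambda pair: -pair[0])
    elif strategy == "pessimistic":   # assumed: most uncertain (p closest to 0.5) first
        scored.sort(key=lambda pair: abs(pair[0] - 0.5))
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [t for _, t in scored[:k]]


def human_in_the_loop(seed_texts, seed_labels, pool, ask_crowd,
                      rounds=3, k=2, strategy="pessimistic"):
    """Train, select, crowd-annotate, retrain; repeated for `rounds` iterations."""
    texts, labels, pool = list(seed_texts), list(seed_labels), list(pool)
    model = NaiveBayes().fit(texts, labels)
    for _ in range(rounds):
        if not pool:
            break
        for tweet in select(model, pool, k, strategy):
            texts.append(tweet)
            labels.append(ask_crowd(tweet))  # crowd worker's relevance judgment (0 or 1)
            pool.remove(tweet)
        model = NaiveBayes().fit(texts, labels)  # rebuild the classifier with the new labels
    return model, texts, labels


# Usage with toy data and a simulated crowd (both hypothetical).
seed = ["new phone camera review", "weather is sunny today"]
seed_labels = [1, 0]
unlabeled = ["phone battery life test", "camera lens comparison",
             "lunch was tasty", "rain tomorrow", "phone camera price drop"]
topic_words = {"phone", "camera", "battery", "lens"}


def simulated_crowd(tweet):
    # stands in for a human worker: relevant if any topic word appears
    return int(any(w in topic_words for w in tokenize(tweet)))


model, texts, labels = human_in_the_loop(seed, seed_labels, unlabeled, simulated_crowd,
                                         rounds=2, k=2, strategy="pessimistic")
print(len(texts), round(model.prob_relevant("phone camera"), 2))  # prints: 6 0.89
```

Under this reading, the pessimistic variant is essentially uncertainty sampling: the crowd's effort goes to tweets near the decision boundary, where a label changes the model most. Using fastText itself would mean replacing `NaiveBayes` with a fastText supervised model while keeping the loop unchanged.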


Cite (APA style)

Suzuki, Y. (2019). Filtering method for twitter streaming data using human-in-the-loop machine learning. Journal of Information Processing, 27, 404–410. https://doi.org/10.2197/ipsjjip.27.404

Readers' Seniority

PhD / Post grad / Masters / Doc: 4 (67%)
Researcher: 2 (33%)

Readers' Discipline

Computer Science: 3 (50%)
Decision Sciences: 1 (17%)
Business, Management and Accounting: 1 (17%)
Engineering: 1 (17%)

Article Metrics

Social Media (shares, likes & comments): 25



Cited by

TwHIN-BERT: A Socially-Enriched Pre-trained Language Model for Multilingual Tweet Representations at Twitter

Effects and mitigation of out-of-vocabulary in universal language models

Improved Multilingual Language Model Pretraining for Social Media Text via Translation Pair Prediction
