Filtering method for Twitter streaming data using human-in-the-loop machine learning

Abstract

A large number of texts are posted daily on social media, but only a small portion of them is informative for any specific purpose. For example, to collect tweets that inform a marketing strategy, we need to gather many tweets related to a specific topic with high accuracy. If the texts are filtered accurately, fresh and useful information can be obtained continuously in real time. In a keyword-based approach, filters are constructed from keywords, but selecting appropriate keywords is often difficult. In this work, we propose a method for filtering texts related to specific topics using classification based on crowdsourcing and machine learning. In our approach, we construct a text classifier using fastText and then use crowdsourcing to annotate whether tweets are related to the topics. Constructing an accurate classifier normally requires a large amount of training data, which is costly and time-consuming to prepare. To construct an accurate classifier from a small amount of training data, we consider two strategies for selecting the tweets that the crowdsourcing participants should assess: an optimistic approach and a pessimistic approach. We then rebuild the text classifier using the annotated texts and classify the tweets again. As this loop is repeated, the accuracy of the classifier will improve, and useful information can be obtained without having to specify keywords. Experimental results demonstrate that the proposed system is adequate for filtering social media streams. Moreover, we found that the pessimistic approach performs better than the optimistic approach.
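The loop described in the abstract (classify, select tweets for annotation, retrain, repeat) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation. The abstract does not define the two selection strategies, so the sketch assumes that "optimistic" means sending the tweets the classifier scores as most likely relevant to the annotators, and "pessimistic" means sending the tweets whose scores are closest to 0.5. A toy word log-odds scorer stands in for fastText so the example runs without extra dependencies, and the `oracle` callback is a hypothetical stand-in for the crowdsourcing step. All function names are illustrative.

```python
import math
from collections import Counter


def train(labeled):
    """Toy stand-in for a fastText classifier (assumption, not the paper's model).

    Learns a smoothed log-odds weight per word from (text, is_relevant) pairs.
    """
    pos, neg = Counter(), Counter()
    for text, label in labeled:
        (pos if label else neg).update(text.lower().split())
    vocab = set(pos) | set(neg)
    n_pos, n_neg, v = sum(pos.values()), sum(neg.values()), len(vocab)
    return {
        w: math.log((pos[w] + 1) / (n_pos + v)) - math.log((neg[w] + 1) / (n_neg + v))
        for w in vocab
    }


def score(model, text):
    """Relevance score in (0, 1): logistic of the summed word weights."""
    z = sum(model.get(w, 0.0) for w in text.lower().split())
    return 1.0 / (1.0 + math.exp(-z))


def select(model, pool, k, strategy):
    """Pick k tweets for annotation. Both definitions are assumptions."""
    if strategy == "optimistic":
        # Assumed: highest predicted relevance first.
        key = lambda t: -score(model, t)
    elif strategy == "pessimistic":
        # Assumed: scores closest to 0.5 (least certain) first.
        key = lambda t: abs(score(model, t) - 0.5)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return sorted(pool, key=key)[:k]


def run_loop(seed, pool, oracle, rounds, k, strategy):
    """Retrain, select, annotate, repeat; returns the final model and labels."""
    labeled, pool = list(seed), list(pool)
    for _ in range(rounds):
        if not pool:
            break
        model = train(labeled)
        batch = select(model, pool, k, strategy)
        # `oracle` is a hypothetical stand-in for crowdsourced annotation.
        labeled += [(t, oracle(t)) for t in batch]
        pool = [t for t in pool if t not in batch]
    return train(labeled), labeled


seed = [("new phone camera review", True), ("lunch with friends today", False)]
pool = ["phone camera battery", "friends lunch", "weather today", "phone review"]
model, labeled = run_loop(seed, pool, lambda t: "phone" in t,
                          rounds=2, k=1, strategy="pessimistic")
```

In a real deployment the toy scorer would be replaced by an actual text classifier and `oracle` by a crowdsourcing task; only the selection step differs between the two strategies.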

Cite

Suzuki, Y. (2019). Filtering method for Twitter streaming data using human-in-the-loop machine learning. Journal of Information Processing, 27, 404–410. https://doi.org/10.2197/ipsjjip.27.404