SMAD: Text Classification of Arabic Social Media Dataset for News Sources


Abstract

Advances in technology have made social media the most popular means of propagating news. Many news items are published on platforms such as Facebook, Twitter, and Instagram, but they are not categorized into domains such as politics, education, finance, art, sports, and health. Text classification is therefore needed to assign news to domains, which reduces the volume of news a reader must sift through on social media, saves the time and effort of recognizing a category by hand, and presents the data in a way that improves search. Most existing datasets have not been pre-processed or filtered and are not organized according to classification standards, so they are not ready for use. Arabic Natural Language Processing (ANLP) phases are therefore used to pre-process, normalize, and categorize the news into the right domain. This paper proposes an Arabic Social Media Dataset (SMAD) for text classification over social media using ANLP steps. The SMAD dataset consists of 15,240 categorized Arabic news items from the Facebook social network. The experimental results show that the SMAD corpus gives an accuracy of about 98% in five domains (Art, Education, Health, Politics, and Sport). The SMAD dataset has been trained, tested, and is ready for use.
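The abstract does not list the individual ANLP steps, so the following is only a minimal sketch of what a pre-processing and normalization pass for Arabic text commonly looks like, not the authors' pipeline. The function name `normalize_arabic` and the specific rules (stripping diacritics and tatweel, unifying alef variants, dropping non-Arabic characters) are assumptions chosen for illustration.

```python
import re

# Assumed normalization rules for illustration; the paper's actual steps may differ.
_DIACRITICS = re.compile(r"[\u064B-\u0652]")           # tashkeel marks (fathatan .. sukun)
_TATWEEL = "\u0640"                                    # elongation character
_ALEF_VARIANTS = re.compile(r"[\u0622\u0623\u0625]")   # alef with madda / hamza above / hamza below
_NON_ARABIC = re.compile(r"[^\u0621-\u064A\s]")        # anything outside the basic Arabic letters


def normalize_arabic(text: str) -> str:
    """Return a cleaned, whitespace-collapsed version of an Arabic string."""
    text = _DIACRITICS.sub("", text)            # remove diacritics first
    text = text.replace(_TATWEEL, "")           # remove tatweel (it sits inside the letter range)
    text = _ALEF_VARIANTS.sub("\u0627", text)   # map alef variants to bare alef
    text = _NON_ARABIC.sub(" ", text)           # replace digits, punctuation, Latin text with spaces
    return " ".join(text.split())               # collapse runs of whitespace
```

A cleaned string of this kind would typically be tokenized and passed to a classifier; which classifier the authors used is not stated in the abstract.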

Cite

Gaber, A. M., Gaber, A. M., & Moussa, H. (2021). SMAD: Text Classification of Arabic Social Media Dataset for News Sources. International Journal of Advanced Computer Science and Applications, 12(10), 508–516. https://doi.org/10.14569/IJACSA.2021.0121058
