SMAD: Text Classification of Arabic Social Media Dataset for News Sources

Amira M. Gaber; Amira M. Gaber; Hanan Moussa

Journal Article, Open Access

International Journal of Advanced Computer Science and Applications (2021) 12(10) 508-516

DOI: 10.14569/IJACSA.2021.0121058

2 Citations

16 Readers

Abstract

Due to advances in technology, social media has become the most popular means of propagating news. Many news items are published on social media platforms such as Facebook, Twitter, and Instagram, but they are not categorized into domains such as politics, education, finance, art, sports, and health. Text classification is therefore needed to sort the news into domains in order to cope with the huge amount of news available on social media, reduce the time and effort needed to recognize an item's category or domain, and present the data in a way that improves searching. Most existing datasets have not been pre-processed or filtered and are not organized according to classification standards, so they are not ready for use. Thus, the phases of Arabic Natural Language Processing (ANLP) are used to pre-process, normalize, and categorize the news into the right domain. This paper proposes an Arabic Social Media Dataset (SMAD) for text classification of social media content, built using ANLP steps. The SMAD dataset consists of 15,240 categorized Arabic news items from the Facebook social network. The experimental results show that the SMAD corpus gives an accuracy of about 98% across five domains (Art, Education, Health, Politics, and Sport). The SMAD dataset has been trained and tested and is ready for use.
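The abstract does not list the individual ANLP steps, so the following is only a minimal sketch of what a normalization pass over an Arabic news item might look like. The function name `normalize_arabic` and the specific choices (stripping diacritics and tatweel, unifying alef variants, dropping non-Arabic characters) are assumptions based on common Arabic text-processing practice, not the authors' documented pipeline.

```python
import re

# Assumed normalization rules (not taken from the paper):
# diacritics (tashkeel) U+064B..U+0652 plus superscript alef U+0670
_DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")
# tatweel (kashida) is a purely typographic elongation character
_TATWEEL = "\u0640"
# alef with madda / hamza above / hamza below, unified to bare alef
_ALEF_VARIANTS = re.compile(r"[\u0622\u0623\u0625]")
# anything that is neither an Arabic letter nor whitespace
_NON_ARABIC = re.compile(r"[^\u0621-\u063A\u0641-\u064A\s]")


def normalize_arabic(text: str) -> str:
    """Return a cleaned, whitespace-collapsed version of an Arabic string."""
    text = _DIACRITICS.sub("", text)           # remove short-vowel marks
    text = text.replace(_TATWEEL, "")          # remove elongation
    text = _ALEF_VARIANTS.sub("\u0627", text)  # unify alef forms
    text = _NON_ARABIC.sub(" ", text)          # drop digits, Latin, punctuation
    return " ".join(text.split())              # collapse runs of whitespace
```

A pass like this would typically run before tokenization and feature extraction, so that spelling variants of the same word map to a single token when a classifier is trained on the five domains.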

Cite

Gaber, A. M., Gaber, A. M., & Moussa, H. (2021). SMAD: Text Classification of Arabic Social Media Dataset for News Sources. International Journal of Advanced Computer Science and Applications, 12(10), 508–516. https://doi.org/10.14569/IJACSA.2021.0121058
