Enhancing Information Preservation in Social Media Text Analytics Using Advanced and Robust Pre-processing Techniques

Shah M. Emaduddin; Rafi Ullah; Ibtesam Mazahir; Muhammad Zain Uddin

Journal Article (Open Access)

International Journal of Media and Information Literacy (2022) 7(1) 60-70

DOI: 10.13187/ijmil.2022.1.60

Abstract

Data mining has become an essential element of today's information world. Industries and other sources produce a huge amount of data every day. For textual analysis, internet users generate large volumes of data in the form of tweets on Twitter; updates, posts, and comments on Facebook and blogs; short messages; and emails. Analyzing such data yields valuable information and insights about the subject under study, but social media text is available only in a very raw form. Social media users usually do not write in the particular format that analytics algorithms require. Social media text typically contains misspelt words, links, hashtags, mentions of people, short forms of words and phrases, word elongations, emotional symbols, and many other raw forms. When the available text pre-processing techniques (tokenization, lowercasing, stemming, lemmatization, stop-word removal, and normalization) are applied to this raw, uncleaned data, the removal of many words and phrases results in information loss or information modification. As a result, the curse of dimensionality is removed, but it becomes difficult to extract as many insights as possible from the data. We propose advanced and robust pre-processing techniques that increase information preservation in social media text while keeping the semantics of the data unchanged.
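The information loss described in the abstract can be illustrated with a small sketch. The code below is not the authors' method, which the abstract does not specify; it is a hypothetical example of the general idea of replacing raw social media elements with placeholder tokens instead of deleting them. The function name `preserve_preprocess`, the placeholder tokens such as `<url>`, and the three-entry emoticon map are all assumptions made for illustration.

```python
import re
from typing import List

# Hypothetical emoticon map and placeholder tokens: assumptions for
# illustration only, not taken from the paper.
EMOTICONS = {":)": "<smile>", ":(": "<sad>", ":D": "<laugh>"}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")
# A letter repeated three or more times (digits are excluded so that
# numbers such as "1000" are left untouched).
ELONGATION_RE = re.compile(r"([A-Za-z])\1{2,}")


def preserve_preprocess(text: str) -> List[str]:
    """Normalize a raw post while keeping one token per social media signal."""
    text = URL_RE.sub(" <url> ", text)               # keep the fact that a link was shared
    text = MENTION_RE.sub(" <mention> ", text)       # keep the fact that someone was addressed
    text = HASHTAG_RE.sub(r" <hashtag> \1 ", text)   # keep the hashtag word itself
    for emoticon, tag in EMOTICONS.items():
        text = text.replace(emoticon, f" {tag} ")    # map emoticons to sentiment-bearing tokens
    text = ELONGATION_RE.sub(r"\1\1", text)          # "Sooooo" -> "Soo": emphasis stays visible
    return text.lower().split()


if __name__ == "__main__":
    post = "Sooooo happy!!! :) #MondayMotivation @anna see https://example.com/x"
    print(preserve_preprocess(post))
```

For the sample post, the sketch returns `['soo', 'happy!!!', '<smile>', '<hashtag>', 'mondaymotivation', '<mention>', 'see', '<url>']`. A conventional pipeline that strips links, mentions, hashtags, and symbols would discard most of these signals, which is the kind of loss the abstract refers to.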

Cite

Emaduddin, S. M., Ullah, R., Mazahir, I., & Uddin, M. Z. (2022). Enhancing Information Preservation in Social Media Text Analytics Using Advanced and Robust Pre-processing Techniques. International Journal of Media and Information Literacy, 7(1), 60–70. https://doi.org/10.13187/ijmil.2022.1.60
