Exploring feature selection technique in detecting sybil accounts in a social network

Abstract

Machine learning (ML) provides techniques for extracting meaningful insights from the information embedded in datasets by having the machine learn from the data. Different ML techniques are available for different purposes. A typical supervised ML workflow consists of preprocessing, feature selection, building the prediction model, and testing and validating the model. Various ML techniques are used to detect fake and spambot accounts on a number of Online Social Networks (OSNs). These fake/spambot accounts, especially Sybil accounts, appear in these networks with malicious intent to disrupt or hijack the very purpose of these networks. In this paper, we train several prediction models on appropriate real-time datasets to detect the presence of Sybil accounts on online social media. Because the data is collected from various sources, the dataset requires preprocessing, which is carried out mainly to (a) remove noise from the data and/or (b) normalize the values of various features. Next, three feature selection techniques are used to select the optimal set of features from the superset of features, removing features that are redundant or irrelevant to making accurate predictions: Correlation Matrix with Heatmap, Feature Importance, and Recursive Feature Elimination with Cross-Validation. K-Nearest Neighbor (KNN), Random Forest (RF), and Support Vector Machine (SVM) classifiers are then used to train the proposed prediction models for predicting the presence of Sybil accounts in the OSN dataset. The performance of the proposed prediction models is analyzed using six standard metrics. We conclude that the prediction model based on the Random Forest classifier provides the best results in predicting the presence of Sybil accounts in the dataset of an OSN.
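
To make the idea of removing redundant features concrete, here is a minimal Python sketch of a correlation-based filter, in the spirit of the correlation-matrix technique the abstract names. It is not the authors' implementation: the feature names (`followers`, `followers_x2`, `account_age_days`), the toy values, the helper names `pearson` and `select_by_correlation`, and the 0.9 redundancy threshold are all assumptions made for illustration, and it uses only the Python standard library.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)


def select_by_correlation(features, labels, redundancy_threshold=0.9):
    """Rank features by |correlation with the label|, then keep each one
    only if it is not strongly correlated with a feature already kept."""
    ranked = sorted(
        features,
        key=lambda name: abs(pearson(features[name], labels)),
        reverse=True,
    )
    kept = []
    for name in ranked:
        redundant = any(
            abs(pearson(features[name], features[other])) >= redundancy_threshold
            for other in kept
        )
        if not redundant:
            kept.append(name)
    return kept


# Hypothetical toy data: six accounts, label 1 = Sybil, 0 = genuine.
features = {
    "followers": [5, 3, 8, 400, 350, 500],
    "followers_x2": [10, 6, 16, 800, 700, 1000],  # deliberately redundant copy
    "account_age_days": [10, 300, 20, 250, 30, 400],
}
labels = [1, 1, 1, 0, 0, 0]

selected = select_by_correlation(features, labels)
print(selected)
```

On this toy data, only one of `followers` and `followers_x2` survives, because the two columns are perfectly correlated with each other, while `account_age_days` is kept because it is not a near-duplicate of either. A filter of this kind only addresses redundancy; techniques such as feature importance or recursive feature elimination, which the abstract also lists, additionally judge features by how much they help a trained model.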

Cite

Sharma, S., & Sood, M. (2021). Exploring feature selection technique in detecting sybil accounts in a social network. In Advances in Intelligent Systems and Computing (Vol. 1166, pp. 695–708). Springer. https://doi.org/10.1007/978-981-15-5148-2_61
