A hybrid technique for health insurance fraud detection on highly imbalanced dataset


Abstract

The health insurance industry produces a massive amount of heterogeneous data, and detecting fraud in these data is a challenging task. The data are highly imbalanced: fraudulent cases make up less than 10% of the whole dataset, and classifying imbalanced data is a critical issue for fraud detection methodologies. In this study, we use highly imbalanced data and propose a hybrid method that addresses the class imbalance problem by combining SMOTE, cross-validation, and Random Forest. We applied various sampling techniques to Medicare data and then built a classification model. We observed that SMOTE combined with Random Forest and cross-validation produced excellent results. The model should be capable of identifying all the relevant (fraud) instances, i.e., it should have a high recall value. SMOTE with Random Forest achieved an average recall of 86% and an overall accuracy of 90%, which could be considered good among the existing models.
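The two ideas the abstract relies on, oversampling the minority class by interpolation (the core of SMOTE) and judging the model by recall rather than accuracy, can be illustrated with a minimal sketch. This is not the authors' pipeline: the toy points, the function names, and the parameter `k` are assumptions made for illustration, and the Random Forest and cross-validation steps are omitted. In practice a library implementation of SMOTE would typically be fitted on the training folds only.

```python
import math
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    point and one of its k nearest minority neighbours (SMOTE's core idea)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that were predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


def accuracy(y_true, y_pred):
    """Fraction of all predictions that match the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Toy data (made up): 4 fraud points among 40 records, i.e. 10% fraud.
fraud = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.3), (1.1, 1.1)]
new_fraud = smote_oversample(fraud, n_new=32)
balanced_fraud_count = len(fraud) + len(new_fraud)  # now matches the 36 legitimate records

# A classifier that never flags fraud still looks accurate on imbalanced data.
y_true = [1] * 4 + [0] * 36
y_pred = [0] * 40
baseline_accuracy = accuracy(y_true, y_pred)
baseline_recall = recall(y_true, y_pred)
```

On this toy set the never-flag-fraud baseline is right on 36 of 40 records yet finds none of the fraud cases, which is why the abstract treats recall as the metric that matters.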


Shamitha, S. K., & Ilango, V. (2019). A hybrid technique for health insurance fraud detection on highly imbalanced dataset. International Journal of Innovative Technology and Exploring Engineering, 8(11), 3498–3501. https://doi.org/10.35940/ijitee.K2489.0981119
