Gradient Boosted Decision Tree Algorithms for Medicare Fraud Detection

Abstract

Employing machine learning algorithms to identify health insurance fraud is an application of artificial intelligence in healthcare. Insurance fraud spuriously inflates the cost of healthcare and can therefore limit, or even deny, patients' access to necessary care and treatment. We use Medicare claims data as input to various algorithms to gauge their performance in fraud detection. The claims data contain categorical features, some of which have thousands of possible values. To the best of our knowledge, this is the first study to use CatBoost and LightGBM to encode categorical data for Medicare fraud detection. We show that CatBoost outperforms the other algorithms at Medicare fraud detection, attaining a mean AUC of 0.77452. At a 99% confidence level (with a p-value of 0), our analysis shows that this result is significantly better than the mean AUC of 0.76132 that LightGBM yields. A second contribution is to show that when we include an additional categorical feature (healthcare provider state), CatBoost yields a mean AUC of 0.88245, which is also significantly better than LightGBM's mean AUC of 0.85137. Our empirical evidence indicates that CatBoost is a better alternative to other classifiers for Medicare fraud detection, especially when incorporating categorical features.
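CatBoost can handle high-cardinality categorical features, such as provider state, without one-hot encoding because it replaces each category with a target statistic computed in an "ordered" way. The sketch below is a simplified, pure-Python illustration of that idea and rests on several assumptions. It is not the authors' pipeline and not CatBoost's exact implementation. The function name `ordered_target_statistics` and its `prior`, `prior_weight`, and `seed` parameters are hypothetical. The core idea is that each row is encoded using only the labels of rows that precede it in a random permutation, so a row's own fraud label never leaks into its own feature value.

```python
import random
from typing import Dict, Hashable, List, Sequence


def ordered_target_statistics(
    categories: Sequence[Hashable],
    labels: Sequence[int],
    prior: float = 0.5,
    prior_weight: float = 1.0,
    seed: int = 0,
) -> List[float]:
    """Hypothetical helper: encode one categorical column with ordered target statistics.

    Simplified illustration of the idea behind CatBoost's categorical encoding;
    not the library's exact formula.
    """
    if len(categories) != len(labels):
        raise ValueError("categories and labels must have the same length")

    # Random processing order; each row only "sees" rows earlier in this order.
    order = list(range(len(categories)))
    random.Random(seed).shuffle(order)

    sums: Dict[Hashable, float] = {}   # category -> sum of labels seen so far
    counts: Dict[Hashable, int] = {}   # category -> number of rows seen so far
    encoded = [0.0] * len(categories)

    for i in order:
        c = categories[i]
        s, n = sums.get(c, 0.0), counts.get(c, 0)
        # Smoothed mean of previously seen labels; an unseen category gets the prior.
        encoded[i] = (s + prior_weight * prior) / (n + prior_weight)
        # Only now add this row's label, so it never influences its own encoding.
        sums[c] = s + labels[i]
        counts[c] = n + 1

    return encoded


if __name__ == "__main__":
    # Toy, made-up data: provider state per claim and a binary fraud label.
    states = ["FL", "FL", "CA", "FL", "CA"]
    fraud = [1, 0, 0, 1, 0]
    print(ordered_target_statistics(states, fraud, seed=42))
```

The prior term keeps rare categories (those seen only a few times so far) close to a neutral value instead of jumping to 0 or 1. That matters for features with thousands of distinct values, where many categories appear only a handful of times.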

Hancock, J. T., & Khoshgoftaar, T. M. (2021). Gradient Boosted Decision Tree Algorithms for Medicare Fraud Detection. SN Computer Science, 2(4). https://doi.org/10.1007/s42979-021-00655-z