Using machine learning algorithms to identify health insurance fraud is an application of artificial intelligence in healthcare. Insurance fraud spuriously inflates the cost of healthcare and can therefore limit, or even deny, patients' access to necessary care and treatment. We use Medicare claims data as input to various algorithms to gauge their performance in fraud detection. The claims data contain categorical features, some of which have thousands of possible values. To the best of our knowledge, this is the first study on using CatBoost and LightGBM to encode categorical data for Medicare fraud detection. We show that CatBoost outperforms the other algorithms on this task, attaining a mean AUC value of 0.77452. At a 99% confidence level (with a p-value of 0), our analysis shows that this result is significantly better than the mean AUC value of 0.76132 that LightGBM yields. As a second contribution, we show that when we include an additional categorical feature (healthcare provider state), CatBoost yields a mean AUC value of 0.88245, which is also significantly better than the mean AUC value of 0.85137 that LightGBM yields. Our empirical evidence indicates that CatBoost is a better alternative to other classifiers for Medicare fraud detection, especially when incorporating categorical features.
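The comparison described in the abstract rests on two quantities: an AUC score per experimental run, and a significance test on the difference between two classifiers' mean AUC values. The abstract does not say which test was used, so the following is only a minimal sketch that assumes a two-sided permutation test on the difference of means. The function names `auc` and `permutation_p_value` are hypothetical, and the AUC lists in the usage note are made-up toy values, not results from the paper.

```python
import random


def auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    labels: iterable of 0/1 (1 = fraudulent), scores: predicted fraud scores.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied scores and give each the average 1-based rank.
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(label for _, label in pairs)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative label")
    pos_rank_sum = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def permutation_p_value(a, b, n_perm=10_000, seed=0):
    """Two-sided Monte Carlo permutation test on the difference of means.

    Assumption: this stands in for whatever test the study actually used.
    """
    rng = random.Random(seed)
    n_a = len(a)
    observed = abs(sum(a) / n_a - sum(b) / len(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        left, right = pooled[:n_a], pooled[n_a:]
        diff = abs(sum(left) / len(left) - sum(right) / len(right))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)
```

As a usage illustration with toy numbers, `permutation_p_value([0.80, 0.81, 0.79, 0.82, 0.80], [0.70, 0.71, 0.69, 0.72, 0.70])` compares two hypothetical sets of per-run AUC scores; a result below 0.01 would correspond to significance at the 99% confidence level mentioned above. Because of the add-one correction, this estimator never returns exactly 0, so a p-value of 0 should be read as a value too small to display at the reported precision.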
CITATION STYLE
Hancock, J. T., & Khoshgoftaar, T. M. (2021). Gradient Boosted Decision Tree Algorithms for Medicare Fraud Detection. SN Computer Science, 2(4). https://doi.org/10.1007/s42979-021-00655-z