An analysis of the proportion of feature subsampling on XG boost - A case study of claim prediction in car insurance

Wafíyatul Khusna; Hendri Murfí

Conference Proceedings, Open Access

AIP Conference Proceedings (2020) 2296

DOI: 10.1063/5.0031366

Abstract

Claim prediction is an important task in insurance. As the frequency of claims increases, the volume of claim data grows to the scale of big data, so insurance companies need suitable machine learning methods to manage it efficiently. XGBoost is a machine learning model based on decision trees, and it can be applied to claim prediction as a two-class or multi-class classification problem. When the data has a large number of features, the XGBoost model can be built on a subset of them. In this paper, we examine how the proportion of features used affects the accuracy of the XGBoost model. Our simulations show that a model built on a randomly chosen 1/5 of the features achieves accuracy comparable to that of a model that uses all features. This suggests that the XGBoost model is scalable with respect to the proportion of features.
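To make the idea of feature subsampling concrete, the following is a minimal sketch of drawing a random 1/5 of the feature columns for each tree. It is an illustration only, not XGBoost's internal code and not the authors' experimental setup; the function name `sample_feature_subset`, the feature count of 50, and the seed are all assumptions introduced here.

```python
import random


def sample_feature_subset(n_features, proportion, rng):
    """Return a sorted random subset of column indices.

    Hypothetical helper for illustration; `proportion` is the fraction
    of features to keep, e.g. 1/5.
    """
    if not 0.0 < proportion <= 1.0:
        raise ValueError("proportion must be in (0, 1]")
    # Keep at least one feature so that a tree can always be built.
    k = max(1, int(proportion * n_features))
    # Sample k distinct column indices without replacement.
    return sorted(rng.sample(range(n_features), k))


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n_features = 50         # assumed feature count, not taken from the paper

# Draw one independent subset per tree, each holding 1/5 of the features.
subsets = [sample_feature_subset(n_features, 1 / 5, rng) for _ in range(3)]
for tree_index, columns in enumerate(subsets):
    print(tree_index, len(columns), columns)
```

In the XGBoost Python package, per-tree column subsampling of this kind is controlled by the `colsample_bytree` parameter (for example `colsample_bytree=0.2`). The abstract does not say which subsampling setting the authors varied, so that mapping is an assumption.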

Cite

Khusna, W., & Murfí, H. (2020). An analysis of the proportion of feature subsampling on XG boost - A case study of claim prediction in car insurance. In AIP Conference Proceedings (Vol. 2296). American Institute of Physics Inc. https://doi.org/10.1063/5.0031366
