The purpose of this study is to analyze tobacco spending in Georgia using several machine learning methods applied to a sample of 10,757 households from the Integrated Household Survey collected by GeoStat in 2016. Previous research has shown that smoking is the leading cause of death among people aged 35–69. In addition, tobacco expenditures may account for as much as 17% of a household's budget. Five algorithms (ordinary least squares, random forest, two gradient boosting methods, and deep learning) were trained on 8,173 households (76.0% of the sample) in the training set, and out-of-sample predictions were then obtained for the remaining 2,584 households in the test set. Under default settings, the random forest algorithm performed best, improving root-mean-square error (RMSE) by more than 10%. The improved accuracy and growing availability of machine learning tools in R call for their active use by policy makers and researchers in health economics, public health, and related fields.
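The evaluation design summarized above (fit models on a training subset, predict out of sample, compare RMSE) can be sketched as follows. This is a minimal Python illustration, not the authors' code (the study itself used R). The random-permutation split, the seed, and the helper names `train_test_split_indices`, `rmse`, and `relative_improvement` are hypothetical. The abstract does not say which model the 10% RMSE improvement is measured against, so `relative_improvement` simply compares any two RMSE values.

```python
import math
import random


def train_test_split_indices(n, n_train, seed=0):
    """Randomly partition row indices 0..n-1 into train and test sets.

    Assumption: a simple seeded random permutation; the abstract does not
    describe how the 8,173 / 2,584 split was drawn.
    """
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    return idx[:n_train], idx[n_train:]


def rmse(y_true, y_pred):
    """Root-mean-square error of predictions against observed values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def relative_improvement(rmse_baseline, rmse_model):
    """Percentage reduction in RMSE of a model relative to a baseline model."""
    return 100.0 * (rmse_baseline - rmse_model) / rmse_baseline


# Reproduce the sample sizes reported in the abstract.
train, test = train_test_split_indices(10_757, 8_173)
print(len(train), len(test))                 # 8173 2584
print(round(100 * len(train) / 10_757, 1))   # 76.0

# Hypothetical RMSE values, for illustration only (not results from the study).
print(relative_improvement(10.0, 9.0))       # 10.0
```

In practice each algorithm would be fit on the `train` rows only, and `rmse` would be evaluated on predictions for the `test` rows, so that the comparison reflects out-of-sample accuracy rather than in-sample fit.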
Obrizan, M., Torosyan, K., & Pignatti, N. (2019). Tobacco spending in Georgia: Machine learning approach. In Advances in Intelligent Systems and Computing (Vol. 836, pp. 103–114). Springer Verlag. https://doi.org/10.1007/978-3-319-97885-7_11