Real application datasets are often a stumbling block for machine learning algorithms, making it difficult to produce good results on prediction or classification problems. Imbalanced data is the major reason for this difficulty, together with missing values, small data size, and highly skewed data distributions. This paper presents an empirical study that used Automated Machine Learning (AML) based on Genetic Programming (GP), namely AML TPOT. It is a very recent AML tool developed as an open source Python library, and the few researchers who have tested it report it as a promising model. Nevertheless, most work on AML TPOT has been conducted on common or benchmark datasets for machine learning testing. This paper instead focuses on a real and deviant dataset, collected on tax avoidance among government-linked companies in Malaysia. A comparison of AML performance on this dataset under different GP parameter settings is provided. The paper thus offers fundamental knowledge on the experimental design and findings that will be useful for the future improvement of GP-based AML.
CITATION STYLE
Masrom, S., Rahman, R. A., Baharun, N., & Rahman, A. S. A. (2020). Automated Machine Learning with Genetic Programming on Real Dataset of Tax Avoidance Classification Problem. In ACM International Conference Proceeding Series (pp. 139–143). Association for Computing Machinery. https://doi.org/10.1145/3383923.3383942