This paper investigates the use of batch and incremental classifiers, namely logistic regression, neural networks, C5, naive Bayes updateable, IBk (instance-based learner, k-nearest neighbour) and raced incremental LogitBoost, to identify the best classifier for improving the predictive accuracy of consumers' credit card risk at a bank in Malaysia. Before the models are generated for comparison, the initial data set is loaded into an ETL (extraction, transformation, loading) system developed to perform feature selection, or attribute relevancy analysis, using the ID3 algorithm, compiling the subset of data with the highest information gain and gain ratio. An extended test applies equal-length binning to some attributes to find out whether it affects the relevancy of each attribute. The selected subset of data, covering 24 months, is used to generate various data mining models using different training and testing sizes and binning sizes. C5 emerged consistently as the technique that generated the best models, with an average predictive accuracy as high as 94.68%. Sample sizes, equal-length binning sizes, and training and testing sizes are all shown to affect accuracy, to differing degrees.
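The attribute relevancy step described above can be illustrated with a minimal Python sketch. It is not the authors' ETL system: the function names, the toy balances and the labels are hypothetical, "equal-length binning" is assumed to mean equal-width intervals, and gain ratio is computed in the usual way, as information gain divided by the split information of the attribute.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain_and_ratio(values, labels):
    """Return (information gain, gain ratio) for one categorical attribute."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Expected entropy of the class labels after splitting on the attribute.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    # Split information: entropy of the attribute's own value distribution.
    split_info = entropy(values)
    ratio = gain / split_info if split_info > 0 else 0.0
    return gain, ratio


def equal_width_bins(xs, k):
    """Map numeric values to bin indices 0..k-1 using k equal-width intervals."""
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / k
    if width == 0:
        return [0] * len(xs)
    # The maximum value would fall into bin k, so clamp it to the last bin.
    return [min(int((x - lo) / width), k - 1) for x in xs]


# Hypothetical toy data: six account balances and their risk labels.
balances = [100, 200, 300, 900, 1000, 1100]
risk = ["good", "good", "good", "bad", "bad", "bad"]

binned = equal_width_bins(balances, 2)
gain, ratio = info_gain_and_ratio(binned, risk)
```

Changing `k` changes how the numeric attribute is discretised and therefore its information gain and gain ratio, which is the kind of effect the extended binning test is designed to measure.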
CITATION STYLE
Ling, K. S., & Teh, Y. W. (2013). A comparative study of data mining techniques in predicting consumers credit card risk in banks. African Journal of Business Management, 5(20), 8307–8312. https://doi.org/10.5897/ajbm11.476