Abstract
Big data analytics has become central to large-scale data processing, and machine learning on big data holds great promise for prediction. Churn prediction is one application area of big data analytics. Its main benefit is preventing customer attrition, especially in the telecom industry. Churn is a day-to-day concern involving millions, so a solution that prevents customer attrition can yield large savings. This paper compares three machine learning techniques, the Decision Tree, Random Forest and Gradient Boosted Tree algorithms, using Apache Spark. Apache Spark is a big data processing engine that supports in-memory processing, which increases processing speed. The analysis is carried out by extracting features from the data set and training the models on them. Scala is a powerful programming language that combines object-oriented and functional programming. The analysis is implemented on Apache Spark, and the modelling is done with Spark ML in Scala. The Decision Tree model achieved an accuracy of 86%, the Random Forest model 87% and the Gradient Boosted Tree model 85%.
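The comparison described above can be sketched with Spark ML in Scala. This is a minimal illustration, assuming Spark 2.x or 3.x, and not the authors' code: the column names (`callMinutes`, `serviceCalls`, `intlPlan`, `churn`), the synthetic rows, the 70/30 split and the hyperparameters (`setNumTrees(20)`, `setMaxIter(20)`) are hypothetical stand-ins for the unpublished data set and settings, so the accuracies it prints say nothing about the reported results.

```scala
import org.apache.spark.ml.{Pipeline, PipelineStage}
import org.apache.spark.ml.classification.{DecisionTreeClassifier, GBTClassifier, RandomForestClassifier}
import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
import org.apache.spark.ml.feature.{StringIndexer, VectorAssembler}
import org.apache.spark.sql.SparkSession

object ChurnComparison {

  // Trains three tree-based classifiers on the same features and returns
  // test-set accuracy (between 0.0 and 1.0) keyed by model name.
  def compare(spark: SparkSession): Map[String, Double] = {
    import spark.implicits._

    // Hypothetical synthetic rows standing in for the telecom data set:
    // (callMinutes, serviceCalls, intlPlan, churn). Not the paper's data.
    val rows = (1 to 300).map { i =>
      val minutes = (i * 37 % 300).toDouble
      val calls = (i % 7).toDouble
      val intlPlan = if (i % 5 == 0) "yes" else "no"
      val churn = if (calls >= 5 || minutes > 260) "True" else "False"
      (minutes, calls, intlPlan, churn)
    }
    val df = rows.toDF("callMinutes", "serviceCalls", "intlPlan", "churn")

    // Feature extraction: index the categorical column and the label,
    // then assemble the numeric columns into a single feature vector.
    val intlIndexer = new StringIndexer().setInputCol("intlPlan").setOutputCol("intlPlanIdx")
    val labelIndexer = new StringIndexer().setInputCol("churn").setOutputCol("label")
    val assembler = new VectorAssembler()
      .setInputCols(Array("callMinutes", "serviceCalls", "intlPlanIdx"))
      .setOutputCol("features")

    val Array(train, test) = df.randomSplit(Array(0.7, 0.3), seed = 42L)

    // The three algorithms compared in the paper; hyperparameters are assumptions.
    val classifiers: Seq[(String, PipelineStage)] = Seq(
      "Decision tree" -> new DecisionTreeClassifier()
        .setLabelCol("label").setFeaturesCol("features"),
      "Random forest" -> new RandomForestClassifier()
        .setLabelCol("label").setFeaturesCol("features").setNumTrees(20),
      "Gradient-boosted trees" -> new GBTClassifier()
        .setLabelCol("label").setFeaturesCol("features").setMaxIter(20)
    )

    val evaluator = new MulticlassClassificationEvaluator()
      .setLabelCol("label")
      .setPredictionCol("prediction")
      .setMetricName("accuracy")

    // Same preprocessing stages for every model; only the classifier changes.
    classifiers.map { case (name, clf) =>
      val pipeline = new Pipeline()
        .setStages(Array[PipelineStage](intlIndexer, labelIndexer, assembler, clf))
      val predictions = pipeline.fit(train).transform(test)
      name -> evaluator.evaluate(predictions)
    }.toMap
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ChurnComparison")
      .master("local[*]")
      .getOrCreate()
    try {
      for ((name, acc) <- compare(spark))
        println(f"$name%-24s accuracy = ${acc * 100}%.1f%%")
    } finally spark.stop()
  }
}
```

Because all three models share the same feature-extraction stages and the same train/test split, only the final pipeline stage differs between runs, which keeps the accuracy comparison like-for-like.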
Citation
Malleswari, M., Maniraj, R., Kumar, P., & Murugan. (2018). Comparative analysis of machine learning techniques to Identify churn for telecom data. International Journal of Engineering and Technology (UAE), 7(3.34 Special Issue 34), 291–295. https://doi.org/10.14419/ijet.v7i3.2.14422