In this chapter, we discuss machine learning and Big Data, along with sample applications, the machine learning process, and commonly used machine learning techniques such as classification and clustering. These techniques are used to explore, evaluate, and leverage data. We also discuss tools and techniques that can be used to develop machine learning schemes that learn from data (or Big Data). In addition, the role of distributed computing platforms such as Apache Spark in applying machine learning to Big Data is presented in detail. Apache Spark is an open-source, general-purpose cluster computing framework built on the principle of distributed processing and designed for fast computation: it can begin processing data as soon as it is received, handling historical data through batch processing and incoming data through real-time processing. Machine learning is a subfield of Artificial Intelligence whose main focus is on models that learn from experience (which, in the case of machines, is data). For example, a machine learning model can learn to recognize an image of a dog by being shown many images of dogs. In this chapter, we assume that the reader has a basic understanding of machine learning. On going through this chapter, readers will be able to:
i. Understand machine learning with Big Data, including its characteristics, sources, and applications.
ii. Understand the comparative working of Apache Spark.
iii. Analyze various types of problems to identify suitable techniques.
iv. Develop models using open-source tools like Skill Network Lab and IBM Cloud.
v. Explore Big Data problems using machine learning techniques with Apache Spark.
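The idea of "learning from experience" described above can be sketched with a toy nearest-centroid classifier in plain Python. This is only an illustration of the concept, not the chapter's Spark-based workflow; the feature names, values, and labels below are made up for demonstration.

```python
# Toy illustration of learning from examples: a nearest-centroid
# classifier that learns to tell "dog" from "cat" feature vectors.
# All features and labels here are hypothetical.

def centroid(points):
    """Average each feature across a list of feature vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def train(examples):
    """examples: {label: [feature_vector, ...]} -> {label: centroid}"""
    return {label: centroid(vecs) for label, vecs in examples.items()}

def predict(model, x):
    """Assign x to the label whose centroid is closest (squared Euclidean)."""
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(model, key=lambda label: dist2(model[label], x))

# Hypothetical features: [ear length, snout length]
examples = {
    "dog": [[4.0, 6.0], [5.0, 7.0], [4.5, 6.5]],
    "cat": [[3.0, 2.0], [2.5, 1.5], [3.5, 2.5]],
}
model = train(examples)
print(predict(model, [4.8, 6.8]))  # near the "dog" centroid
```

The more examples the model is shown, the more representative each centroid becomes; at Big Data scale, the same averaging step is what frameworks like Apache Spark distribute across a cluster.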
CITATION STYLE
Prajapati, G. L., & Raghuwanshi, R. (2021). Study of Big Data Analytics Tool: Apache Spark. In Big Data Analytics in Cognitive Social Media and Literary Texts: Theory and Praxis (pp. 65–100). Springer Nature. https://doi.org/10.1007/978-981-16-4729-1_4