Big data is generally characterised by 5 Vs: Volume, Velocity, Variety, Veracity and Variability. Many studies have focused on machine learning as a powerful tool for big data processing. In the machine learning context, learning algorithms are typically evaluated in terms of accuracy, efficiency, interpretability and stability. These four dimensions relate closely to veracity, volume, variety and variability, and are affected by both the nature of the learning algorithms and the characteristics of the data. This chapter analyses in depth how the quality of computational models is affected by data characteristics as well as by the strategies involved in learning algorithms. It also introduces a unified framework for the control of machine learning tasks, aimed at the appropriate employment of algorithms and the efficient processing of big data. In particular, the framework is designed to support effective data pre-processing: selecting relevant attributes, sampling representative training and test data, and dealing appropriately with missing values and noise. More importantly, the framework allows suitable machine learning algorithms to be employed on the training data produced by the pre-processing stage, towards the building of accurate, efficient and interpretable computational models.
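The pre-processing and learning stages described above can be sketched as a pipeline. The following is a minimal illustrative sketch, not the authors' implementation: it assumes scikit-learn, uses mean imputation for missing values, univariate feature selection for relevant attributes, a stratified split for representative training and test samples, and a shallow decision tree as an example of an interpretable model.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Toy data; inject a few missing values so the imputation step is exercised
X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)
X = X.copy()
X[rng.random(X.shape) < 0.05] = np.nan

# Stratified split -> representative training and test samples
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Pre-processing (missing values, attribute selection) feeding a learner
model = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),    # deal with missing values
    ("select", SelectKBest(f_classif, k=2)),       # keep relevant attributes
    ("learn", DecisionTreeClassifier(max_depth=3,  # shallow tree: interpretable
                                     random_state=0)),
])
model.fit(X_train, y_train)
print(f"test accuracy: {accuracy_score(y_test, model.predict(X_test)):.2f}")
```

The chapter's framework additionally covers noise handling and algorithm selection driven by the pre-processed data; this sketch only illustrates how the pre-processing choices and the learner compose into a single controllable workflow.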
Liu, H., Gegov, A., & Cocea, M. (2017). Unified Framework for Control of Machine Learning Tasks Towards Effective and Efficient Processing of Big Data. In Studies in Big Data (Vol. 24, pp. 123–140). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-319-53474-9_6