A data structure to speed-up machine learning algorithms on massive datasets

Francisco Padillo; J. M. Luna; Alberto Cano; Sebastián Ventura

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2016) 9648 365-376

DOI: 10.1007/978-3-319-32034-2_31

9 Citations

12 Readers

Abstract

Fast, efficient data processing is an important capability in machine learning, especially given the growing interest in data storage. The exponential growth in data size has hampered traditional techniques for data analysis and processing, giving rise to a new set of methodologies under the term Big Data. Many efficient machine learning algorithms have been proposed to cope with runtime and main-memory requirements. Nevertheless, processing can still become hard when the number of features or records is extremely high. In this paper, the goal is not to propose new efficient algorithms but a new data structure that can be used by a variety of existing algorithms without modifying their original schemata. Moreover, the proposed data structure enables sparse datasets to be massively reduced by efficiently transforming the input data into the new structure. The results demonstrate that the proposed data structure is highly promising, reducing the amount of storage and improving query performance.

Cite

Padillo, F., Luna, J. M., Cano, A., & Ventura, S. (2016). A data structure to speed-up machine learning algorithms on massive datasets. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9648, pp. 365–376). Springer Verlag. https://doi.org/10.1007/978-3-319-32034-2_31
