Before a machine learning algorithm can be applied, the data must be stored in memory, which may consume a large amount of it. Reducing the memory used to represent a dataset can also reduce the number of operations required to process it. Most libraries represent data in traditional structures (e.g., vectors or matrices), which forces iteration over the entire dataset to obtain a result. In this paper we present a technique for processing categorical data previously encoded in blocks of arbitrary size. The method processes the data block by block, which can reduce the number of iterations over the original dataset while keeping performance similar to traditional processing. The data must still be stored in memory, but in an encoded form that optimizes both the memory consumed by the representation and the operations required to process it. The results of our experiments show slightly lower processing times than those obtained with traditional implementations, yielding good performance.
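The abstract does not specify the encoding, so the following is only a hypothetical sketch of the general idea: categorical codes are bit-packed into fixed-size machine-word blocks, and an operation such as frequency counting then scans the compact blocks rather than a full-width array. All names (`encode`, `count_categories`, `BITS`, `PER_BLOCK`) are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch (not the paper's actual method): pack 2-bit
# categorical codes into 64-bit blocks, then count category
# frequencies by scanning the packed blocks.

BITS = 2                 # assumed bits per categorical value (4 categories)
PER_BLOCK = 64 // BITS   # values packed into one 64-bit block

def encode(values):
    """Pack a list of small integer codes into 64-bit integer blocks."""
    blocks = []
    for i in range(0, len(values), PER_BLOCK):
        block = 0
        for j, v in enumerate(values[i:i + PER_BLOCK]):
            block |= (v & ((1 << BITS) - 1)) << (j * BITS)
        blocks.append(block)
    return blocks, len(values)

def count_categories(blocks, n, n_categories=4):
    """Frequency count over the encoded representation, block by block."""
    counts = [0] * n_categories
    seen = 0
    for block in blocks:
        for j in range(PER_BLOCK):
            if seen >= n:
                break
            counts[(block >> (j * BITS)) & ((1 << BITS) - 1)] += 1
            seen += 1
    return counts

data = [0, 1, 2, 3, 1, 1, 0, 2]
blocks, n = encode(data)
print(count_categories(blocks, n))  # -> [2, 3, 2, 1]
```

With 2-bit codes, 32 values fit in a single 64-bit block, so the encoded form uses a fraction of the memory of a conventional integer array, which is the kind of saving the abstract describes.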
CITATION STYLE
Salvador-Meneses, J., Ruiz-Chavez, Z., & Garcia-Rodriguez, J. (2018). Categorical Big Data Processing. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11314 LNCS, pp. 245–252). Springer Verlag. https://doi.org/10.1007/978-3-030-03493-1_26