Integrating DBMS and parallel data mining algorithms for modern many-core processors

3Citations
Citations of this article
5Readers
Mendeley users who have this article in their library.
Get full text

Abstract

Relational DBMSs (RDBMSs) remain the most popular tool for processing structured data in data intensive domains. However, most of stand-alone data mining packages process flat files outside a RDBMS. In-database data mining avoids export-import data/results bottleneck as opposed to use stand-alone mining packages and keeps all the benefits provided by a RDBMS. The paper presents an approach to data mining inside a RDBMS based on a parallel implementation of user-defined functions (UDFs). Such an approach is implemented for PostgreSQL and modern Intel MIC (Many Integrated Core) architecture. The UDF performs a single mining task on data from the specified table and produces a resulting table. The UDF is organized as a wrapper of an appropriate mining algorithm, which is implemented in C language and is parallelized by the OpenMP technology and thread-level parallelism. The heavy-weight parts of the algorithm are additionally parallelized by intrinsic functions for MIC platforms to reach the optimal loop vectorization manually. The library of such UDFs supports a cache of precomputed mining structures to reduce costs of further computations. In the experiments, the proposed approach shows good scalability and overtakes R data mining package.

Cite

CITATION STYLE

APA

Rechkalov, T., & Zymbler, M. (2018). Integrating DBMS and parallel data mining algorithms for modern many-core processors. In Communications in Computer and Information Science (Vol. 822, pp. 230–245). Springer Verlag. https://doi.org/10.1007/978-3-319-96553-6_17

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free