Data indexing is commonly used in data mining when working with high-dimensional, large-scale data sets. Hadoop, an open-source cloud computing project that implements the MapReduce framework in Java, has attracted significant interest for distributed data mining. To address the problems of globalization, random writes, and duration in Hadoop, a data indexing approach using the Java Persistence API (JPA) is elaborated through the implementation of a KD-tree algorithm on Hadoop. An improved intersection algorithm for distributed data indexing on Hadoop is also proposed; it runs in O(M + log N) time and is well suited to cases involving multiple intersections. We evaluate the data indexing algorithms on an open dataset and a synthetic dataset in a modest cloud environment. The results show that the algorithms are feasible for large-scale data mining. © 2010 IFIP.
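The abstract does not spell out the intersection procedure, but one common way to approach O(M + log N) behavior when intersecting a small sorted query list (size M) against a large sorted index (size N) is to pay for a single binary search to find the starting position and then finish with a linear merge. The Java sketch below illustrates that idea only; the class name SortedIntersection and the sample data are illustrative assumptions, and this is not presented as the authors' exact algorithm.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public final class SortedIntersection {

    // Intersects a small sorted query list (size M) with a large sorted index
    // array (size N). One binary search (O(log N)) locates the first relevant
    // position in the index; a linear merge from there costs O(M) plus the
    // stretch of the index actually covered by the query keys, so for
    // clustered keys the overall cost behaves like O(M + log N).
    public static List<Long> intersect(long[] query, long[] index) {
        List<Long> result = new ArrayList<>();
        if (query.length == 0 || index.length == 0) {
            return result;
        }
        // Binary search for the first query key inside the large index.
        int pos = Arrays.binarySearch(index, query[0]);
        if (pos < 0) {
            pos = -pos - 1; // insertion point when the key is absent
        }
        // Linear merge of the two sorted sequences from that position on.
        int q = 0;
        while (q < query.length && pos < index.length) {
            if (query[q] == index[pos]) {
                result.add(query[q]);
                q++;
                pos++;
            } else if (query[q] < index[pos]) {
                q++;
            } else {
                pos++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        long[] query = {42, 43, 47};
        long[] index = {1, 5, 9, 13, 42, 43, 44, 47, 90, 120};
        System.out.println(intersect(query, index)); // prints [42, 43, 47]
    }
}

Compared with running a separate binary search for every query key (O(M log N)), this variant pays the logarithmic cost only once and then streams through both sequences, which is one reason a design of this kind suits repeated intersections over clustered key ranges.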
Lai, Y., & Zhongzhi, S. (2010). An efficient data indexing approach on Hadoop using Java persistence API. In IFIP Advances in Information and Communication Technology (Vol. 340 AICT, pp. 213–224). Springer Science and Business Media, LLC. https://doi.org/10.1007/978-3-642-16327-2_27