A Two-sample Kolmogorov-Smirnov-like test for big data

Abstract

Exploratory data analysis (EDA) is an important component of modern data analysis and data mining. The Big Data setting has made many traditional and useful EDA tools impractical and ineffective. Among such useful tools is the two-sample Kolmogorov-Smirnov (TS-KS) goodness-of-fit (GoF) test for assessing whether or not two samples arose from the same population. A TS-KS-like testing procedure is constructed using the chunked-and-averaged (CA) estimation paradigm. The procedure is named the TS-CAKS GoF test. Distributed and streamed implementations of the TS-CAKS procedure are discussed. The consistency of the TS-CAKS test is proved. A numerical study is provided to demonstrate the effectiveness and computational efficiency of the procedure.
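
The abstract does not give the details of the TS-CAKS construction, so the following is only a minimal sketch of the general chunked-and-averaged idea, not the paper's procedure. It assumes each sample is split into contiguous, equal-sized chunks, the classical two-sample KS statistic is computed on each pair of chunks, and the per-chunk statistics are averaged. The function names `ks_statistic` and `chunked_average_ks`, the chunking scheme, and the simulated data are all hypothetical choices made for illustration.

```python
import bisect
import random


def ks_statistic(x, y):
    """Classical two-sample KS statistic: max over t of |F_x(t) - F_y(t)|."""
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    # The largest gap between the two empirical CDFs occurs at an observed point.
    for t in xs + ys:
        fx = bisect.bisect_right(xs, t) / len(xs)
        fy = bisect.bisect_right(ys, t) / len(ys)
        d = max(d, abs(fx - fy))
    return d


def chunked_average_ks(x, y, n_chunks):
    """Average of per-chunk KS statistics (assumed chunking scheme).

    Assumption of this sketch: chunks are contiguous and equal-sized, and
    any remainder observations at the end of each sample are dropped.
    """
    size_x, size_y = len(x) // n_chunks, len(y) // n_chunks
    if size_x == 0 or size_y == 0:
        raise ValueError("n_chunks exceeds a sample size")
    stats = []
    for k in range(n_chunks):
        chunk_x = x[k * size_x:(k + 1) * size_x]
        chunk_y = y[k * size_y:(k + 1) * size_y]
        # Each chunk pair could be processed on a separate worker or as it streams in.
        stats.append(ks_statistic(chunk_x, chunk_y))
    return sum(stats) / n_chunks


if __name__ == "__main__":
    rng = random.Random(0)
    a = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
    b = [rng.gauss(0.0, 1.0) for _ in range(10_000)]  # same population as a
    c = [rng.gauss(1.0, 1.0) for _ in range(10_000)]  # mean shifted by 1
    print("same population:   ", chunked_average_ks(a, b, n_chunks=10))
    print("shifted population:", chunked_average_ks(a, c, n_chunks=10))
```

Because each chunk pair is handled independently, only one chunk needs to be held in memory at a time, which is what makes distributed or streamed evaluation plausible. The sketch stops at the averaged statistic: it does not cover how to calibrate critical values or p-values, and the average of per-chunk statistics should not be compared against the classical KS null distribution.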

Cite

Nguyen, H. D. (2018). A Two-sample Kolmogorov-Smirnov-like test for big data. In Communications in Computer and Information Science (Vol. 845, pp. 89–106). Springer Verlag. https://doi.org/10.1007/978-981-13-0292-3_6
