Exploratory data analysis (EDA) is an important component of modern data analysis and data mining. The Big Data setting has made many traditional and useful EDA tools impractical and ineffective. Among such tools is the two-sample Kolmogorov-Smirnov (TS-KS) goodness-of-fit (GoF) test for assessing whether two samples arose from the same population. A TS-KS-like testing procedure, named the TS-CAKS GoF test, is constructed using the chunked-and-averaged (CA) estimation paradigm. Distributed and streamed implementations of the TS-CAKS procedure are discussed, and the consistency of the TS-CAKS test is proved. A numerical study is provided to demonstrate the effectiveness and computational efficiency of the procedure.
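The chunked-and-averaged idea can be illustrated with a minimal sketch. It is not the paper's implementation: the function names `ks_statistic` and `chunked_averaged_ks`, the strided chunking scheme, and the use of a plain arithmetic mean over chunks are all assumptions made for illustration. Calibrating the averaged statistic (its null distribution and critical values) is not attempted here.

```python
import bisect
import random


def ks_statistic(x, y):
    """Two-sample KS statistic: max over t of |F_x(t) - F_y(t)|."""
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    # The supremum of the ECDF difference is attained at an observed point.
    for t in xs + ys:
        fx = bisect.bisect_right(xs, t) / len(xs)
        fy = bisect.bisect_right(ys, t) / len(ys)
        d = max(d, abs(fx - fy))
    return d


def chunked_averaged_ks(x, y, n_chunks):
    """Average of per-chunk KS statistics (hypothetical helper).

    Assumption: chunks are formed by strided slicing, so each chunk is
    small enough to process on its own and chunks are disjoint.
    """
    if n_chunks < 1 or n_chunks > min(len(x), len(y)):
        raise ValueError("n_chunks must be between 1 and the smaller sample size")
    total = 0.0
    for k in range(n_chunks):
        # Each chunk could be handled by a separate worker or stream batch.
        total += ks_statistic(x[k::n_chunks], y[k::n_chunks])
    return total / n_chunks


if __name__ == "__main__":
    rng = random.Random(0)
    same_a = [rng.gauss(0.0, 1.0) for _ in range(2000)]
    same_b = [rng.gauss(0.0, 1.0) for _ in range(2000)]
    shifted = [rng.gauss(1.0, 1.0) for _ in range(2000)]
    print("same population:   ", chunked_averaged_ks(same_a, same_b, 10))
    print("shifted population:", chunked_averaged_ks(same_a, shifted, 10))
```

Because each chunk's statistic depends only on that chunk, the per-chunk computations are independent. That independence is what makes a distributed or streamed evaluation plausible: only a running sum and a chunk count need to be kept.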
Nguyen, H. D. (2018). A Two-sample Kolmogorov-Smirnov-like test for big data. In Communications in Computer and Information Science (Vol. 845, pp. 89–106). Springer Verlag. https://doi.org/10.1007/978-981-13-0292-3_6