A coreset is a compact subset of a data set such that a model trained on the coreset fits the data nearly as well as a model trained on the full data set. By using coresets, we can reduce a large data set to a much smaller one and thereby lower the computational cost of a machine learning problem. In recent years, data scientists have investigated various methods for constructing coresets. Two state-of-the-art algorithms proposed in 2018 are ProTraS by Ros & Guillaume and Lightweight Coreset by Bachem et al. In this paper, we briefly introduce these two algorithms and compare them to identify the benefits and drawbacks of each.
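To make the idea of shrinking a data set concrete, the following is a minimal sketch of importance sampling in the spirit of Lightweight Coreset, as commonly described for k-means: each point is drawn with a probability that mixes a uniform term with a term proportional to its squared distance from the data mean, and each sampled point receives a weight inversely proportional to that probability. The function name `lightweight_coreset`, its parameters, and the synthetic data are illustrative assumptions, not taken from the paper, and ProTraS is not covered here.

```python
import random


def lightweight_coreset(points, m, seed=0):
    """Sample a weighted subset of size m from a list of equal-length tuples.

    Illustrative sketch only; names and defaults are assumptions.
    """
    rng = random.Random(seed)
    n = len(points)
    dim = len(points[0])

    # Mean of the full data set, one coordinate at a time.
    mean = [sum(p[j] for p in points) / n for j in range(dim)]

    # Squared Euclidean distance of every point to the mean.
    sq_dist = [sum((p[j] - mean[j]) ** 2 for j in range(dim)) for p in points]
    total = sum(sq_dist)

    # Sampling distribution: half uniform, half proportional to squared distance.
    if total > 0:
        q = [0.5 / n + 0.5 * d / total for d in sq_dist]
    else:
        q = [1.0 / n] * n  # all points identical, fall back to uniform

    # Draw m indices with replacement according to q.
    idx = rng.choices(range(n), weights=q, k=m)
    coreset = [points[i] for i in idx]

    # Weights make weighted sums over the sample unbiased for the full data.
    weights = [1.0 / (m * q[i]) for i in idx]
    return coreset, weights


# Hypothetical usage on synthetic 2-D Gaussian data.
data_rng = random.Random(1)
data = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(1000)]
coreset, weights = lightweight_coreset(data, m=50)
```

A weighted clustering algorithm, for example weighted k-means, can then be run on `coreset` with `weights` instead of on the full data. Because every sampling probability is at least `0.5 / n`, no weight exceeds `2 * n / m`, which keeps any single sampled point from dominating.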
Hoang, N. L., Dang, T. K., & Trang, L. H. (2019). A Comparative Study of the Use of Coresets for Clustering Large Datasets. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11814 LNCS, pp. 45–55). Springer. https://doi.org/10.1007/978-3-030-35653-8_4