Clustering has become an important tool for data scientists because it supports exploratory data analysis and the summarization of large amounts of data. For text data in particular, clustering faces additional challenges that stem from the high-dimensional space in which the data is represented. Furthermore, despite the important contributions already made, scalability remains a major challenge: the whole-data-in-memory approach is no longer valid in real scenarios where data is collected in massive volumes. This chapter reviews recent contributions to high-dimensional text data clustering, with particular emphasis on scalability issues and on the impact of the curse of dimensionality on distance-based clustering methods.
Zamora, J. (2017). Recent advances in high-dimensional clustering for text data. In Studies in Fuzziness and Soft Computing (Vol. 349, pp. 323–337). Springer Verlag. https://doi.org/10.1007/978-3-319-48317-7_20