Text clustering is a well-known technique for improving quality in information retrieval. In today's real world, data is not organized in the manner required for precise mining; given a large collection of unstructured text documents, it is essential to organize it into clusters of related documents. Extracting compact and meaningful insights from large collections of unstructured text documents is a contemporary challenge. Although many frequent item mining algorithms have been developed, most do not scale to “Big Data” and take considerable processing time. This paper presents a highly scalable, fast, and efficient map-reduce based augmented clustering algorithm that uses bivariate n-gram frequent items to reduce high dimensionality and derive high-quality clusters for big text documents. A comparative analysis on sample text datasets shows that the proposed algorithm performs better with stop word removal than without it.
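To make the idea of map-reduce based frequent bigram extraction concrete, the following is a minimal single-process sketch in Python. It is not the authors' algorithm: the stop word list, the function names (`map_bigrams`, `reduce_support`, `frequent_bigrams`), the `min_support` threshold, and the sample documents are all assumptions chosen for illustration. The map step emits (bigram, document id) pairs after optional stop word removal, and the reduce step keeps only bigrams that occur in at least `min_support` documents, which could then serve as a reduced feature space for clustering.

```python
import re
from collections import defaultdict

# Illustrative stop word list (an assumption, not taken from the paper).
STOP_WORDS = {"a", "an", "and", "the", "is", "of", "in", "for", "to"}


def map_bigrams(doc_id, text, remove_stop_words=True):
    """Map step: emit ((word1, word2), doc_id) for each adjacent word pair."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    for pair in zip(tokens, tokens[1:]):
        yield pair, doc_id


def reduce_support(pairs, min_support):
    """Reduce step: group by bigram and keep those found in enough documents."""
    docs_by_bigram = defaultdict(set)
    for bigram, doc_id in pairs:
        docs_by_bigram[bigram].add(doc_id)
    return {b: d for b, d in docs_by_bigram.items() if len(d) >= min_support}


def frequent_bigrams(docs, min_support=2, remove_stop_words=True):
    """Run the map and reduce steps in-process over a dict of doc_id -> text."""
    emitted = (
        kv
        for doc_id, text in docs.items()
        for kv in map_bigrams(doc_id, text, remove_stop_words)
    )
    return reduce_support(emitted, min_support)


# Hypothetical sample documents.
docs = {
    "d1": "Clustering of text documents is a data mining task",
    "d2": "Text documents need clustering for data mining",
    "d3": "The weather is fine",
}
frequent = frequent_bigrams(docs)
```

Here `frequent` maps each frequent bigram to the set of documents containing it, so documents sharing frequent bigrams become natural candidates for the same cluster. In a real deployment the two steps would run as distributed map and reduce tasks rather than in a single process.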
CITATION STYLE
Kanimozhi, K. V., & Venkatesan, M. (2018). A novel map-reduce based augmented clustering algorithm for big text datasets. In Advances in Intelligent Systems and Computing (Vol. 542, pp. 427–436). Springer Verlag. https://doi.org/10.1007/978-981-10-3223-3_41