A probe on document clustering methodologies and its performance metrics

Abstract

With the rapid growth of internet usage, the volume of information in circulation has also increased, which leads to the problem of information congestion. Clustering is considered one of the most important problems in unsupervised learning: it focuses on identifying structure in an unlabeled data collection. Clustering is a method in which data items are grouped according to their resemblance to one another and separated from dissimilar objects. Large data volume, high dimensionality and complicated semantics are the main difficulties of document clustering. As for what makes a clustering good, it can be shown that there is no absolute “best” criterion independent of the final objective of the clustering. The primary objective of a good document clustering scheme is to minimize the intra-cluster distance between documents while maximizing the inter-cluster distance (using a suitable document distance measure). A distance measure (or, dually, a measure of resemblance) is therefore at the core of document clustering. This survey gives an overview of the different methods (Vector Space Model, Latent Semantic Indexing, Latent Dirichlet Allocation, Singular Value Decomposition, Doc2Vec Model, Graph Model), distance measures (Euclidean Distance, Cosine Similarity, Jaccard Coefficient, Pearson Correlation Coefficient) and evaluation parameters of document clustering. The work is theoretical in nature and aims to outline the overall procedure of document clustering.
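As a rough illustration of the four distance and similarity measures named in the abstract, the sketch below computes each one for a pair of short documents. It is not taken from the paper: the whitespace tokenizer, the raw term-count vectors, the set-based form of the Jaccard coefficient, and all function names are assumptions made for this example only.

```python
import math
from collections import Counter


def term_counts(text):
    """Count terms after lowercasing and splitting on whitespace (a naive, assumed tokenizer)."""
    return Counter(text.lower().split())


def to_vectors(a, b):
    """Align two term-count mappings on their combined vocabulary."""
    vocab = sorted(set(a) | set(b))
    # Counter returns 0 for terms that a document does not contain.
    return [a[t] for t in vocab], [b[t] for t in vocab]


def euclidean_distance(x, y):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(x, y)))


def cosine_similarity(x, y):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(p * q for p, q in zip(x, y))
    norm = math.sqrt(sum(p * p for p in x)) * math.sqrt(sum(q * q for q in y))
    return dot / norm if norm else 0.0


def jaccard_coefficient(a, b):
    """Set-based Jaccard: shared terms divided by all distinct terms (assumed variant)."""
    terms_a, terms_b = set(a), set(b)
    union = terms_a | terms_b
    return len(terms_a & terms_b) / len(union) if union else 0.0


def pearson_correlation(x, y):
    """Pearson correlation of two equal-length vectors; 0.0 if either has no variance."""
    n = len(x)
    if n == 0:
        return 0.0
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((p - mean_x) * (q - mean_y) for p, q in zip(x, y))
    dev_x = math.sqrt(sum((p - mean_x) ** 2 for p in x))
    dev_y = math.sqrt(sum((q - mean_y) ** 2 for q in y))
    return cov / (dev_x * dev_y) if dev_x and dev_y else 0.0


if __name__ == "__main__":
    doc_a = term_counts("document clustering groups similar documents")
    doc_b = term_counts("clustering groups documents by similarity")
    vec_a, vec_b = to_vectors(doc_a, doc_b)
    print("euclidean:", euclidean_distance(vec_a, vec_b))
    print("cosine:   ", cosine_similarity(vec_a, vec_b))
    print("jaccard:  ", jaccard_coefficient(doc_a, doc_b))
    print("pearson:  ", pearson_correlation(vec_a, vec_b))
```

Euclidean distance is a dissimilarity (smaller means closer), while the other three are similarities (larger means closer), so a clustering routine would typically convert one form into the other before comparing them.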

Cite

Kalpana, P., & Tamije Selvy, P. (2019). A probe on document clustering methodologies and its performance metrics. International Journal of Recent Technology and Engineering, 8(2), 2938–2942. https://doi.org/10.35940/ijrteB2624.078219