Abstract
The number of textual documents on the internet is growing rapidly. Electronic structured databases archive offline and online documents, e-mails, webpages, blog posts, and social network posts. Without appropriate ranking and on-demand clustering, and with no specific classification available, it is difficult to retain and access these documents. K-means is one of the most frequently used clustering methods. However, the distance-based K-means method still has flaws when it comes to determining the proximity of meaning, or semantics, between data items. To get around this issue, semantic similarity can be estimated by measuring the level of similarity between objects in a cluster. This research provides a method for clustering documents based on semantic similarity. The approach defines document synopses drawn from the IMDB and Wikipedia databases using the NLTK dictionary. We provide a semantic-based K-means clustering approach that assesses not only the similarity of the data represented as a vector space model with TF-IDF, but also the semantic similarity of the data. Using precision, recall, and F-measure, we demonstrate how well the semantic-based K-means clustering technique works, based on experimental findings from the IMDB and Wikipedia top 100 movies datasets.
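The abstract does not state how the semantic similarity is combined with the TF-IDF representation, so the following is only a minimal sketch of one plausible reading, not the paper's method. It assumes that words are first mapped to a shared concept and that documents are then clustered with a cosine-based K-means over TF-IDF vectors. The `SYNONYMS` map is a hypothetical stand-in for a dictionary lookup such as WordNet through NLTK, and the four synopses, the smoothed IDF formula, and the fixed seed indices are illustrative assumptions.

```python
import math
from collections import Counter

# Hypothetical synonym map standing in for a dictionary lookup (assumption).
SYNONYMS = {"film": "movie", "cop": "police", "detective": "police",
            "spaceship": "spacecraft", "starship": "spacecraft"}

def tokenize(text):
    # Lowercase, split on whitespace, and map each word to its concept.
    return [SYNONYMS.get(word, word) for word in text.lower().split()]

def tfidf_vectors(docs):
    # Build L2-normalised TF-IDF vectors as sparse dicts (term -> weight).
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(t for toks in tokenized for t in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = {t: (c / len(toks)) * (math.log((1 + n) / (1 + df[t])) + 1)
               for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors

def cosine(a, b):
    # Both inputs are unit length, so the dot product is the cosine.
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def mean_vector(vectors):
    # Sum the member vectors and rescale the result to unit length.
    total = {}
    for vec in vectors:
        for t, w in vec.items():
            total[t] = total.get(t, 0.0) + w
    norm = math.sqrt(sum(w * w for w in total.values()))
    return {t: w / norm for t, w in total.items()}

def kmeans(vectors, seeds, iterations=10):
    # Cosine-based K-means; `seeds` are indices of the initial centroids.
    centroids = [vectors[i] for i in seeds]
    labels = []
    for _ in range(iterations):
        labels = [max(range(len(centroids)),
                      key=lambda k: cosine(vec, centroids[k]))
                  for vec in vectors]
        for k in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:
                centroids[k] = mean_vector(members)
    return labels

# Illustrative synopses, not taken from the paper's datasets.
docs = [
    "a cop hunts a killer in the city",
    "the detective chases a killer",
    "a starship crew explores a distant planet",
    "the spaceship lands on a strange planet",
]
labels = kmeans(tfidf_vectors(docs), seeds=[0, 2])
print(labels)
```

Fixing the seeds keeps the toy run deterministic. A real experiment would more likely use random or K-means++ initialisation and score the resulting clusters against reference labels with precision, recall, and F-measure.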
Cite
Salih, N. M. (2022). Semantic-Based K-Means Clustering for IMDB Top 100 Movies. Journal of Applied Science and Technology Trends, 3(2), 112–115. https://doi.org/10.38094/jastt302138