The task of finding semantically related words in a text corpus has applications in lexicon induction, word sense disambiguation, and information retrieval, to name a few. Real-world text, for example from the World Wide Web, need not be grammatical, so methods that rely on parsing or part-of-speech tagging will not perform well in these applications. Further, even if the text is grammatically correct, these methods may not scale well to large corpora. The task of building semantically related sets of words from a corpus of documents, along with allied problems, has been studied extensively in the literature, and most of these techniques rely on part-of-speech or parse information. In this paper, we explore a less expensive method for finding semantically related words from a corpus without parsing or part-of-speech tagging, to address the above problems. This work focuses on building sets of semantically related words from a corpus of documents using traditional data clustering techniques. We examine some key results and possible applications of this work. © Springer-Verlag Berlin Heidelberg 2006.
CITATION STYLE
Deepak, P., Rao, D., & Khemani, D. (2006). Building clusters of related words: An unsupervised approach. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4099 LNAI, pp. 474–483). Springer Verlag. https://doi.org/10.1007/978-3-540-36668-3_51