In this paper, we present an approach based on random indexing that identifies semantically related information, disambiguates the user query, and improves the retrieval efficiency for news documents. User query terms are expanded with terms of similar word sense, discovered by implicitly considering the “associatedness” of the document context with that of the given query. This notion of associatedness is guided by word-space models, as described by Kanerva et al. (2000). The word-space model computes the meaning of terms by implicitly utilizing the distributional patterns (contexts) of words collected over large text data. These distributional patterns express semantic similarity between words as spatial proximity in the context space: words are represented by context vectors whose relative directions are assumed to indicate semantic similarity. Following the distributional hypothesis, words with similar meanings are assumed to occur in similar contexts; for example, if two words consistently occur in the same context, we are justified in assuming that they mean similar things. The word-space methodology thus makes semantics computable, and the underlying models require no linguistic or semantic expertise. Experiments on the FIRE news collection show that the proposed approach effectively captures term contexts using higher-order term associations across the collection of news documents and uses this information to assist the retrieval of documents.
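The idea of building context vectors by random indexing can be sketched as follows. This is a minimal pure-Python illustration, not the authors' implementation: the corpus, the vector dimensionality (512), the number of non-zero entries per index vector (8), and the context window size (2) are all illustrative assumptions. Each word receives a sparse ternary index vector; its context vector is the sum of the index vectors of its neighbours, and cosine similarity between context vectors then approximates distributional similarity.

```python
import math
import random
import zlib

def index_vector(word, dim=512, nonzeros=8):
    """Sparse ternary random index vector: a few +/-1 entries, rest zeros.
    Seeded by crc32 of the word so the vector is stable across runs."""
    rng = random.Random(zlib.crc32(word.encode("utf-8")))
    v = [0.0] * dim
    for pos in rng.sample(range(dim), nonzeros):
        v[pos] = rng.choice([1.0, -1.0])
    return v

def build_context_vectors(sentences, dim=512, nonzeros=8, window=2):
    """Accumulate, for each word, the index vectors of its neighbours
    within a sliding window -- the core step of random indexing."""
    vocab = {w for s in sentences for w in s}
    index = {w: index_vector(w, dim, nonzeros) for w in vocab}
    context = {w: [0.0] * dim for w in vocab}
    for s in sentences:
        for i, w in enumerate(s):
            for j in range(max(0, i - window), min(len(s), i + window + 1)):
                if j != i:
                    for k, x in enumerate(index[s[j]]):
                        context[w][k] += x
    return context

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical toy corpus: "doctor" and "nurse" share their contexts,
# while "fruit" appears in an unrelated context.
corpus = [
    "the doctor treats the patient".split(),
    "the nurse treats the patient".split(),
    "the market sells fresh fruit".split(),
]
ctx = build_context_vectors(corpus)
```

In a retrieval setting of the kind the paper describes, query terms would be expanded with their nearest neighbours under this cosine measure; here, `doctor` and `nurse` end up with near-identical context vectors, while `doctor` and `fruit` do not.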
CITATION STYLE
Prasath, R., Sarkar, S., & O’Reilly, P. (2014). RI for IR: Capturing term contexts using random indexing for comprehensive information retrieval. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 8856, 104–112. https://doi.org/10.1007/978-3-319-13647-9_12