Analyzing distillation process of hidden terms in web documents for IR

  • Pradeepa M
  • Deisy C

Abstract

Previous work on web-based applications has addressed web content mining, pattern recognition, and similarity measures between web documents. This paper analyzes web documents in an enhanced way and explores their distillation, which it presents as the next step in hypertext mining. A sparse document carries very little data on the web, so it may face problems such as sparseness and different words with identical or nearly identical meanings. These problems are major obstacles for natural language processing (NLP) and information retrieval (IR). Mining hidden terms discovers search queries from large external datasets (universal datasets), which helps to handle unseen data better. The goals of this web document mining are efficient information finding, filtering of information based on the user query, and discovery of more topic-focused keywords from the rich source of global information datasets. The proposed method uses the Distillation model, which integrates a probabilistic generative model, the Gibbs sampling algorithm, and a deployment method. This model can be applied to different natural languages and data domains to achieve these goals. © 2012 Published by Elsevier Ltd.
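
The abstract names a probabilistic generative model and Gibbs sampling, and the keywords point to latent Dirichlet allocation (LDA). As a rough illustration only, the following is a minimal sketch of textbook collapsed Gibbs sampling for LDA. It is not the authors' Distillation model, whose details the abstract does not give. The function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, the iteration count, and the toy corpus are all assumptions made for this example.

```python
import random

def lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Generic collapsed Gibbs sampler for LDA (illustrative, hypothetical helper).

    docs is a list of documents, each a list of integer word ids in
    range(vocab_size). Returns (doc_topic, topic_word, assign).
    """
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]                  # tokens per (doc, topic)
    topic_word = [[0] * vocab_size for _ in range(num_topics)]    # tokens per (topic, word)
    topic_total = [0] * num_topics                                # tokens per topic
    assign = []                                                   # topic of every token

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(zs)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Unnormalized conditional probability of each topic for this token.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                # Resample the topic and restore the counts.
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic, topic_word, assign

# Toy corpus (made up for illustration): word ids index into vocab.
vocab = ["web", "mining", "query", "topic", "sparse", "data"]
docs = [[0, 1, 1, 2], [3, 3, 4, 5], [0, 2, 2, 1], [4, 5, 5, 3]]
doc_topic, topic_word, assign = lda_gibbs(docs, num_topics=2, vocab_size=len(vocab))
```

In a sketch like this, the highest counts in each row of `topic_word` suggest candidate topic-focused keywords, and `doc_topic` gives a rough topic mix per document. A hidden-term approach of the kind the abstract describes would estimate such counts on a large external dataset rather than on the sparse documents themselves.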

Author-supplied keywords

  • Gibbs sampler and clustering
  • Hidden terms
  • Latent Dirichlet allocation (LDA)
  • Sparse data
  • Web mining

