Probabilistic models of text and link structure for hypertext classification

Lise Getoor; Eran Segal; Ben Taskar; Daphne Koller

IJCAI workshop on text learning: beyond supervision (2001) 24-29

Abstract

Most text classification methods treat each document as an independent instance. However, in many text domains, documents are linked and the topics of linked documents are correlated. For example, web pages of related topics are often connected by hyperlinks and scientific papers from related fields are commonly linked by citations. We propose a unified probabilistic model for both the textual content and the link structure of a document collection. Our model is based on the recently introduced framework of Probabilistic Relational Models (PRMs), which allows us to capture correlations between linked documents. We show how to learn these models from data and use them efficiently for classification. Since exact methods for classification in these large models are intractable, we utilize belief propagation, an approximate inference algorithm. Belief propagation automatically induces a very natural behavior, where our knowledge about one document helps us classify related ones, which in turn help us classify others. We present preliminary empirical results on a dataset of university web pages.
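The propagation behavior described in the abstract can be illustrated with a small sketch. It is not the paper's PRM: it assumes a simplified pairwise Markov network in which each document's text evidence is already summarized as a per-class score `node_pot` (for instance, the output of a naive Bayes text classifier), every link shares one symmetric compatibility matrix `edge_pot` that favors linked documents having the same topic, and messages are updated synchronously for a fixed number of iterations `iters`. The function name `loopy_bp` and all parameters are hypothetical, chosen for illustration only.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def loopy_bp(
    node_pot: Sequence[Sequence[float]],
    edges: Sequence[Tuple[int, int]],
    edge_pot: Sequence[Sequence[float]],
    iters: int = 20,
) -> List[List[float]]:
    """Approximate per-document class marginals with sum-product belief propagation.

    Assumes edge_pot is symmetric, since links are treated as undirected.
    """
    n, k = len(node_pot), len(node_pot[0])
    nbrs: Dict[int, List[int]] = defaultdict(list)
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    # msg[(i, j)][c]: message from document i to document j about j having class c.
    msg = {(i, j): [1.0 / k] * k for i in list(nbrs) for j in nbrs[i]}
    for _ in range(iters):
        new = {}
        for (i, j) in msg:
            # Combine i's own text evidence with messages from all neighbors except j.
            pre = list(node_pot[i])
            for u in nbrs[i]:
                if u != j:
                    pre = [p * m for p, m in zip(pre, msg[(u, i)])]
            # Sum out i's class through the link compatibility matrix.
            out = [sum(pre[a] * edge_pot[a][b] for a in range(k)) for b in range(k)]
            z = sum(out)
            new[(i, j)] = [v / z for v in out]
        msg = new  # synchronous ("flooding") update of all messages
    beliefs = []
    for i in range(n):
        # Belief = own evidence times all incoming messages, normalized.
        b = list(node_pot[i])
        for u in nbrs[i]:
            b = [p * m for p, m in zip(b, msg[(u, i)])]
        z = sum(b)
        beliefs.append([v / z for v in b])
    return beliefs


# Three pages linked in a chain 0-1-2; classes are [course, faculty] (hypothetical).
beliefs = loopy_bp(
    node_pot=[[0.9, 0.1], [0.5, 0.5], [0.5, 0.5]],
    edges=[(0, 1), (1, 2)],
    edge_pot=[[0.8, 0.2], [0.2, 0.8]],
)
print([round(b[0], 3) for b in beliefs])  # → [0.9, 0.74, 0.644]
```

In this toy chain, page 0's confident text evidence raises page 1's belief in the first class to 0.74, and page 1 in turn raises page 2's to 0.644, even though the text of pages 1 and 2 is uninformative. On a tree such as this chain the messages converge to the exact marginals; on link graphs with cycles, which are typical of the web, the same updates give only an approximation and are not guaranteed to converge.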

Citation (APA)

Getoor, L., Segal, E., Taskar, B., & Koller, D. (2001). Probabilistic models of text and link structure for hypertext classification. IJCAI Workshop on Text Learning: Beyond Supervision, 24–29. Retrieved from http://homes.cs.washington.edu/~taskar/pubs/ijcai01-ws.pdf