Learning Probabilistic Models of Link Structure

by Lise Getoor, Nir Friedman, Daphne Koller, Benjamin Taskar
Journal of Machine Learning Research

Abstract

Most real-world data is heterogeneous and richly interconnected. Examples include the Web, hypertext, bibliometric data and social networks. In contrast, most statistical learning methods work with "flat" data representations, forcing us to convert our data into a form that loses much of the link structure. The recently introduced framework of probabilistic relational models (PRMs) embraces the object-relational nature of structured data by capturing probabilistic interactions between attributes of related entities. In this paper, we extend this framework by modeling interactions between the attributes and the link structure itself. An advantage of our approach is a unified generative model for both content and relational structure. We propose two mechanisms for representing a probabilistic distribution over link structures: reference uncertainty and existence uncertainty. We describe the appropriate conditions for using each model and present learning algorithms for each. We present experimental results showing that the learned models can be used to predict link structure and, moreover, the observed link structure can be used to provide better predictions for the attributes in the model.

Journal of Machine Learning Research 3 (2002) 679-707. Submitted 12/01; Published 12/02.
© 2002 Lise Getoor, Nir Friedman, Daphne Koller and Benjamin Taskar.

Lise Getoor (GETOOR@CS.UMD.EDU), Computer Science Dept. and UMIACS, University of Maryland, College Park, MD 20742
Nir Friedman (NIR@CS.HUJI.AC.IL), School of Computer Sci. & Eng., Hebrew University, Jerusalem, 91904, Israel
Daphne Koller (KOLLER@CS.STANFORD.EDU), Computer Science Dept., Stanford University, Stanford, CA 94305
Benjamin Taskar (BTASKAR@CS.STANFORD.EDU), Computer Science Dept., Stanford University, Stanford, CA 94305

Keywords: Probabilistic Relational Models, Bayesian Networks, Relational Learning

1. Introduction

In recent years, we have witnessed an explosion in the amount of information that is available to us in digital form. More and more data is being stored, and more and more data is being made accessible, through traditional interfaces such as corporate databases and, of course, via the Internet and the World Wide Web. There is much to be gained by applying machine learning techniques to these data, in order to extract useful information and patterns.

Most often, the objects in these data do not exist in isolation: there are "links" or relationships that hold between them. For example, there are links from one web page to another, a scientific paper cites another paper, and an actor is linked to a movie by the appearance relationship. Most work in machine learning, however, has focused on "flat" data, where the instances are independent and identically distributed. The main exception to this rule has been the work on inductive logic programming (Muggleton, 1992, Lavrač and Džeroski, 1994). This line of work focuses on the problem of inferring link structure from a logical perspective.

More recently, there has been a growing interest in combining the statistical approaches that have been so successful when learning from a collection of homogeneous independent instances (typically represented as a single table in a relational database) with relational learning methods. Slattery and Craven (1998) were the first to consider combining a statistical approach with a first-order relational learner (FOIL) for the task of web page classification. Chakrabarti et al. (1998) explored methods for hypertext classification which used both the content of the current page and information from related pages. Popescul et al. (2002) use a relational learner to guide the construction of features to be used by a (statistical) propositional learner. Yang et al. (2002) identify certain categories of relational regularities and explore the conditions under which they can be exploited to improve classification accuracy.

Here, we propose a unified statistical framework for content and links.
Our framework builds on the recent work on probabilistic relational models (PRMs) (Poole, 1993, Ngo and Haddawy, 1995, Koller and Pfeffer, 1998). PRMs extend the standard attribute-based Bayesian network representation to incorporate a much richer relational structure. These models allow properties of an entity to depend probabilistically on properties of other related entities. The model represents a generic dependence for a class of objects, which is then instantiated for particular sets of entities and relations between them. Friedman et al. (1999) adapt machinery for learning Bayesian networks from a set of unrelated homogeneous instances to the task of learning PRMs from structured relational data.

The original PRM framework focused on modeling the distribution over the attributes of the objects in the model. It took the relational structure itself (the relational links between entities) to be background knowledge, determined outside the probabilistic model. This assumption implies that the model cannot be used to predict the relational structure itself. A more subtle yet very important point is that the relational structure is informative in and of itself. For example, the links from and to a web page are very informative about the type of web page (Craven et al., 1998), and the citation links between papers are very informative about paper topics (Cohn and Hofmann, 2001).

The PRM framework can be naturally extended to address this limitation. By making links first-class citizens in the model, the PRM language easily allows us to place a probabilistic model directly over them. In other words, we can extend our framework to define probability distributions over the presence of relational links between objects in our model. The concept of a probabilistic model over relational structure was introduced by Koller and Pfeffer (1998) under the name structural uncertainty.
They defined several variants of structural uncertainty, and presented algorithms for doing probabilistic inference in models involving structural uncertainty.

In this paper, we show how a probabilistic model of relational structure can be learned directly from data. Specifically, we provide two simple probabilistic models of link structure. The first is an extension of the reference uncertainty model of Koller and Pfeffer (1998), which makes it suitable for a learning framework; the second is a new type of structural uncertainty, called existence uncertainty. We present a clear semantics for these extensions, and propose a method for learning such models from a relational database. We present empirical results on real-world data, showing that these models can be used to predict the link structure, as well as use the presence of (observed) links in the model to provide better predictions about attribute values. Interestingly, these benefits are obtained even with the very simple models of link structure that we propose in this paper. Thus, even simplistic models of link uncertainty provide us with increased predictive accuracy.

2. Probabilistic Relational Models

A probabilistic relational model (PRM) specifies a template for a probability distribution over a database. The template describes the relational schema for the domain, and the probabilistic dependencies between attributes in the domain. A PRM, together with a particular database of objects and relations, defines a probability distribution over the attributes of the objects and the relations.

2.1 Relational Schema

A schema S for a relational model describes a set of classes, 𝒳 = {X_1, ..., X_n}. Each class is associated with a set of descriptive attributes and a set of reference slots.[1] The set of descriptive attributes of a class X is denoted 𝒜(X). Attribute A of class X is denoted X.A, and its domain of values is denoted 𝒱(X.A). For example, the Actor class might have the descriptive attribute Gender, with domain {male, female}. For simplicity, we assume in this paper that attribute domains are finite; this is not a fundamental limitation of our approach. The set of reference slots of a class X is denoted ℛ(X). We use X.ρ to denote the reference slot ρ of X. Each reference slot ρ is typed: it has the domain type Dom[ρ] = X and the range type Range[ρ] = Y, where Y is some class in 𝒳.
A slot ρ denotes a function from Dom[ρ] = X to Range[ρ] = Y. For example, we might have a class Role with the reference slots Actor, whose range is the class Actor, and Movie, whose range is the class Movie. For each reference slot ρ, we can define an inverse slot ρ^-1, which is interpreted as the inverse function of ρ. For example, we can define an inverse slot for the Actor slot of Role and call it Roles. Note that this is not a one-to-one relation; it returns a set of Role objects. Finally, we define the notion of a slot chain, which allows us to compose slots, defining functions from objects to other objects to which they are indirectly related. More precisely, we define a slot chain τ = ρ_1, ..., ρ_k to be a sequence of slots (inverse or otherwise) such that for all i, Range[ρ_i] = Dom[ρ_i+1]. We say that Range[τ] = Range[ρ_k].

We note that the functional nature of slots does not prevent us from having many-to-many relations between classes. We simply use a standard transformation where we introduce a class corresponding to the relationship object. This class will have an object for every tuple of related objects. Each instance of the relationship has functional reference slots to the objects it relates. The Role class above is an example of this transformation.

It is useful to distinguish between an entity and a relationship, as in entity-relationship diagrams. In our language, classes are used to represent both entities and relationships. Thus, a relationship such as Role, which relates actors to movies, is also represented as a class, with reference slots to the class Actor and the class Movie. We use 𝒳_E to denote the set of classes that represent entities, and 𝒳_R to denote those that represent relationships. We use the generic term object to refer both to entities and to relationships.

[1] There is a direct mapping between our notion of class and the tables in a relational database: descriptive attributes correspond to standard table attributes, and reference slots correspond to foreign keys (key attributes of another table).

The semantics of this language is straightforward. A complete instantiation I specifies the set of objects in each class X, and the values for each attribute and each reference slot of each object. Thus, a complete instantiation I is a set of objects with no missing values and no dangling references. It describes the set of objects, the relationships that hold between the objects and all the values of the attributes of the objects. For example, Figure 1 shows an instantiation of our simple movie schema. It specifies a particular set of actors, movies and roles, along with values for each of their attributes and references.

ACTOR
name    gender
fred    male
ginger  female
bing    male

MOVIE
name  genre
m1    drama
m2    comedy

ROLE
role  movie  actor   role-type
r1    m1     fred    hero
r2    m1     ginger  heroine
r3    m1     bing    villain
r4    m2     bing    hero
r5    m2     ginger  love-interest

Figure 1: An instantiation of the relational schema for a simple movie domain.

As discussed in the introduction, our goal in this paper is to construct probabilistic models over instantiations. To do so, we need to provide enough background knowledge to circumscribe the set of possible instantiations. Friedman et al. (1999) assume that the entire relational structure is given as background knowledge. More precisely, they assume that they are given a relational skeleton, σ_r, which specifies the set of objects in all classes, as well as all the relationships that hold between them; in other words, it specifies the values for all of the reference slots. In our simple movie example, the relational skeleton would contain all of the information except for the gender of the actors, the genre of the movies, and the nature of the role.
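To make the schema vocabulary concrete, the following is a minimal Python sketch of the Figure 1 instantiation. The class layout, the helper names `roles_of` and `genres_of`, and the dictionary form of the skeleton are illustrative assumptions and are not part of the paper. The sketch shows reference slots as object fields, the set-valued inverse slot Roles, a slot chain, and a relational skeleton that keeps objects and reference slots but drops descriptive attributes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Actor:
    name: str
    gender: str          # descriptive attribute Actor.Gender


@dataclass(frozen=True)
class Movie:
    name: str
    genre: str           # descriptive attribute Movie.Genre


@dataclass(frozen=True)
class Role:
    name: str
    movie: Movie         # reference slot Role.Movie (range: Movie)
    actor: Actor         # reference slot Role.Actor (range: Actor)
    role_type: str       # descriptive attribute Role.Role-Type


# The complete instantiation of Figure 1.
fred, ginger, bing = Actor("fred", "male"), Actor("ginger", "female"), Actor("bing", "male")
m1, m2 = Movie("m1", "drama"), Movie("m2", "comedy")
roles = [
    Role("r1", m1, fred, "hero"),
    Role("r2", m1, ginger, "heroine"),
    Role("r3", m1, bing, "villain"),
    Role("r4", m2, bing, "hero"),
    Role("r5", m2, ginger, "love-interest"),
]


def roles_of(actor):
    """Inverse slot of Role.Actor (called Roles in the text); it is set-valued."""
    return [r for r in roles if r.actor == actor]


def genres_of(actor):
    """Slot chain Actor.Roles.Movie followed by the attribute Genre (a multiset)."""
    return [r.movie.genre for r in roles_of(actor)]


# Relational skeleton: the objects and the values of all reference slots,
# without any descriptive attribute (gender, genre, role-type).
skeleton = {
    "Actor": [a.name for a in (fred, ginger, bing)],
    "Movie": [m.name for m in (m1, m2)],
    "Role": {r.name: {"movie": r.movie.name, "actor": r.actor.name} for r in roles},
}
```

For example, `roles_of(bing)` returns the two Role objects r3 and r4, which is why the inverse slot cannot be treated as a single-valued function.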
2.2 Probabilistic Model for Attributes

A probabilistic relational model Π specifies a probability distribution over a set of instantiations I of the relational schema. More precisely, given a relational skeleton σ_r, it specifies a distribution over all complete instantiations I that extend the skeleton σ_r.

A PRM consists of a qualitative dependency structure, G, and the parameters associated with it, θ_G. The dependency structure is defined by associating with each attribute X.A a set of formal parents Pa(X.A). These correspond to formal parents; they will be instantiated in different ways for different objects in X. Intuitively, the parents are attributes that are "direct influences" on X.A. The attribute X.A can depend on another probabilistic attribute B of X. It can also depend on attributes of related objects, X.τ.B, where τ is a slot chain. In the case where X.τ.B is not single-valued and (possibly) refers to a multiset of objects, we use an aggregate function and define a dependence on the computed aggregate value. This is described in greater detail in Getoor (2001).
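As an illustration of a dependence on an aggregate, the sketch below follows the slot chain Movie.Roles.Actor.Gender in the Figure 1 data and reduces the resulting multiset to a single value. The choice of the mode as the aggregate, the alphabetical tie-break, and the names `actor_genders` and `mode` are assumptions made for this example; the text above does not fix a particular aggregate function.

```python
from collections import Counter

# Reference slots from Figure 1: role -> (movie, actor).
ROLES = {
    "r1": ("m1", "fred"),
    "r2": ("m1", "ginger"),
    "r3": ("m1", "bing"),
    "r4": ("m2", "bing"),
    "r5": ("m2", "ginger"),
}
GENDER = {"fred": "male", "ginger": "female", "bing": "male"}


def actor_genders(movie):
    """Values reached through the slot chain Movie.Roles.Actor.Gender (a multiset)."""
    return [GENDER[actor] for (m, actor) in ROLES.values() if m == movie]


def mode(values):
    """Most frequent value; ties are broken alphabetically (an arbitrary choice)."""
    if not values:
        return None  # no related objects, so there is nothing to aggregate
    counts = Counter(values)
    top = max(counts.values())
    return min(v for v, c in counts.items() if c == top)
```

With this sketch, a hypothetical dependence of Movie.Genre on the aggregated gender of its cast would condition on `mode(actor_genders("m1"))`, a single value, rather than on a variable number of parents.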
The quantitative part of the PRM specifies the parameterization of the model. Given a set of parents for an attribute, we can define a local probability model by associating with it a conditional probability distribution (CPD). For each attribute we have a CPD that specifies P(X.A | Pa(X.A)). Each CPD in our PRM is legal, i.e., the entries are positive and sum to 1.

Definition 1 A probabilistic relational model (PRM) Π for a relational schema S defines for each class X ∈ 𝒳 and each descriptive attribute A ∈ 𝒜(X), a set of formal parents Pa(X.A), and a conditional probability distribution (CPD) that represents P(X.A | Pa(X.A)).

Given a relational skeleton σ_r, a PRM Π specifies a distribution over a set of instantiations I consistent with σ_r. This specification is done by mapping the dependencies in the class-level PRM to the actual objects in the domain. For a class X, we use σ_r(X) to denote the objects of X, as specified by the relational skeleton σ_r. (In general we will use the notation σ(X) to refer to the set of objects of each class as defined by any type of domain skeleton.) Let X.A be an attribute in the schema, and let x be some object in σ_r(X). The PRM allows two types of formal parents: X.B and X.τ.C. For a formal parent of the form X.B, the corresponding actual parent of x.A is x.B. For a formal parent of the form X.τ.C, the corresponding actual parent of x.A is y.C, where y ∈ x.τ in σ_r. Thus, the class-level dependencies in the PRM are instantiated according to the relational skeleton, to define object-level dependencies. The parameters specified by the PRM are used for each object in the skeleton, in the obvious way.

Thus, for a given skeleton, the PRM basically defines a ground Bayesian network. The qualitative structure of the network is defined via an instance dependency graph G_{σ_r}, whose nodes correspond to descriptive attributes x.A of entities in the skeleton.
These are the random variables in our model. We have a directed edge from y.B to x.A if y.B is an actual parent of x.A, as defined above. The quantitative parameters of the network are defined by the CPDs in the PRM, with the same CPD used multiple times in the network. This ground Bayesian network leads to the following chain rule, which defines a distribution over the instantiations compatible with our particular skeleton σ_r:

$$P(I \mid \sigma_r, \Pi) = \prod_{X \in \mathcal{X}} \; \prod_{x \in \sigma_r(X)} \; \prod_{A \in \mathcal{A}(X)} P(x.A \mid \mathrm{Pa}(x.A)) \qquad (1)$$

For this definition to specify a coherent probability distribution over instantiations, we must ensure that our probabilistic dependencies are acyclic. In particular, we must verify that each random variable x.A does not depend, directly or indirectly, on its own value. In other words, G_{σ_r} must be acyclic. We say that a dependency structure G is acyclic relative to a relational skeleton σ_r if the directed graph G_{σ_r} is acyclic.

Theorem 2 (Friedman et al., 1999) Let Π be a PRM with an acyclic instance dependency graph, and let σ_r be a relational skeleton. Then, Eq. (1) defines a coherent distribution over instances that extend σ_r.

The definition of the instance dependency graph is specific to the particular skeleton at hand: the existence of an edge from y.B to x.A depends on whether y ∈ x.τ, which in turn depends on the interpretation of the reference slots. Thus, it allows us to determine the coherence of a PRM only relative to a particular relational skeleton. When we are evaluating different possible PRMs as part of our learning algorithm, we want to ensure that the dependency structure G we choose results in coherent probability models for any skeleton.
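The grounding step, the acyclicity condition and the chain rule of Eq. (1) can be sketched in Python on the Figure 1 skeleton. Everything specific here is an assumption made for illustration: the dependency structure (Role.Role-Type depending on Role.Actor.Gender and Role.Movie.Genre), the CPD entries, and the helper names `ground_parents`, `is_acyclic` and `log_joint`. None of these values come from the paper, and the sketch is not the authors' learning procedure.

```python
import math
from graphlib import CycleError, TopologicalSorter

# Relational skeleton of Figure 1: objects plus reference-slot values only.
ACTORS = ["fred", "ginger", "bing"]
MOVIES = ["m1", "m2"]
ROLES = {"r1": ("m1", "fred"), "r2": ("m1", "ginger"), "r3": ("m1", "bing"),
         "r4": ("m2", "bing"), "r5": ("m2", "ginger")}


def ground_parents():
    """Instantiate the class-level parents for every object: node -> actual parents."""
    parents = {(a, "gender"): [] for a in ACTORS}
    parents.update({(m, "genre"): [] for m in MOVIES})
    for r, (m, a) in ROLES.items():
        # Hypothetical formal parents: Role.Actor.Gender and Role.Movie.Genre.
        parents[(r, "role_type")] = [(a, "gender"), (m, "genre")]
    return parents


def is_acyclic(parents):
    """True if the instance dependency graph admits a topological order."""
    try:
        list(TopologicalSorter(parents).static_order())
        return True
    except CycleError:
        return False


# Illustrative CPDs, one per attribute and shared by all objects of the class.
# Keys are tuples of parent values; each row is positive and sums to 1.
CPDS = {
    "gender": {(): {"male": 0.5, "female": 0.5}},
    "genre": {(): {"drama": 0.5, "comedy": 0.5}},
    "role_type": {
        ("male", "drama"): {"hero": 0.4, "heroine": 0.1, "villain": 0.4, "love-interest": 0.1},
        ("male", "comedy"): {"hero": 0.5, "heroine": 0.1, "villain": 0.2, "love-interest": 0.2},
        ("female", "drama"): {"hero": 0.1, "heroine": 0.5, "villain": 0.2, "love-interest": 0.2},
        ("female", "comedy"): {"hero": 0.1, "heroine": 0.4, "villain": 0.1, "love-interest": 0.4},
    },
}

# Attribute values of the complete instantiation in Figure 1.
VALUES = {("fred", "gender"): "male", ("ginger", "gender"): "female", ("bing", "gender"): "male",
          ("m1", "genre"): "drama", ("m2", "genre"): "comedy",
          ("r1", "role_type"): "hero", ("r2", "role_type"): "heroine",
          ("r3", "role_type"): "villain", ("r4", "role_type"): "hero",
          ("r5", "role_type"): "love-interest"}


def log_joint(values, parents, cpds):
    """Log of Eq. (1): sum of log P(x.A | Pa(x.A)) over all ground variables."""
    total = 0.0
    for (obj, attr), pa in parents.items():
        pa_values = tuple(values[p] for p in pa)
        total += math.log(cpds[attr][pa_values][values[(obj, attr)]])
    return total
```

Because the same CPD is reused for every object of a class, the ground network here has ten random variables but only three distinct CPDs. Checking `is_acyclic(ground_parents())` tests coherence only for this one skeleton, which is the limitation the closing paragraph above points out.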
