Hierarchically Supervised Latent Dirichlet Allocation

by Nicholas Bartlett, Frank Wood, Adler Perotte
Advances in Neural Information Processing Systems 24 (2011)

Available from www.stat.columbia.edu
Adler Perotte, Nicholas Bartlett, Noémie Elhadad, Frank Wood
Columbia University, New York, NY 10027, USA
{ajp9009@dbmi, bartlett@stat, noemie@dbmi, fwood@stat}.columbia.edu
Abstract
We introduce hierarchically supervised latent Dirichlet allocation (HSLDA), a
model for hierarchically and multiply labeled bag-of-word data. Examples of such
data include web pages and their placement in directories, product descriptions
and associated categories from product hierarchies, and free-text clinical records
and their assigned diagnosis codes. Out-of-sample label prediction is the primary
goal of this work, but improved lower-dimensional representations of the bag-
of-word data are also of interest. We demonstrate HSLDA on large-scale data
from clinical document labeling and retail product categorization tasks. We show
that leveraging the structure from hierarchical labels improves out-of-sample label
prediction substantially when compared to models that do not.
1 Introduction
There exist many sources of unstructured data that have been partially or completely categorized
by human editors. In this paper we focus on unstructured text data that has been, at least in part,
manually categorized. Examples include but are not limited to webpages and curated hierarchical
directories of the same [1], product descriptions and catalogs, and patient records and diagnosis
codes assigned to them for bookkeeping and insurance purposes. In this work we show how to
combine these two sources of information, the unstructured text and its manual categorization,
using a single model that allows one to categorize new text documents automatically, suggest
labels that might be inaccurate, compute improved similarities between documents for information
retrieval purposes, and more. The models and techniques that we develop in this paper are
applicable to other data as well, namely, any unstructured representations of data that have been
hierarchically classified (e.g., image catalogs with bag-of-feature representations).
There are several challenges entailed in incorporating a hierarchy of labels into the model. Among
them, given a large set of potential labels (often thousands), each instance has only a small number
of labels associated with it. Furthermore, there are no naturally occurring negative labelings in the
data, and the absence of a label cannot always be interpreted as a negative labeling.
Our work operates within the framework of topic modeling. Our approach learns topic models of the
underlying data and labeling strategies in a joint model, while leveraging the hierarchical structure
of the labels. For the sake of simplicity, we focus on “is-a” hierarchies, but the model can be applied
to other structured label spaces. We extend supervised latent Dirichlet allocation (sLDA) [6] to
take advantage of hierarchical supervision. We propose an efficient way to incorporate hierarchical
information into the model. We hypothesize that the context of labels within the hierarchy provides
valuable information about labeling.
We demonstrate our model on large, real-world datasets in the clinical and web retail domains. We
observe that hierarchical information is valuable when incorporated into the learning and improves
our primary goal of multi-label classification. Our results show that a joint, hierarchical model
outperforms a classification with unstructured labels as well as a disjoint model, where the topic
model and the hierarchical classification are inferred independently of each other.
Figure 1: HSLDA graphical model
The remainder of this paper is organized as follows. Section 2 introduces hierarchically supervised LDA
(HSLDA), while Section 3 details a sampling approach to inference in HSLDA. Section 4 reviews
related work, and Section 5 shows results from applying HSLDA to health care and web retail data.
2 Model
HSLDA is a model for hierarchically, multiply-labeled, bag-of-word data. We will refer to individual
groups of bag-of-word data as documents. Let $w_{n,d} \in \Sigma$ be the $n$th observation in the $d$th document.
Let $w_d = \{w_{1,d}, \ldots, w_{N_d,d}\}$ be the set of $N_d$ observations in document $d$. Let there be $D$ such
documents and let the size of the vocabulary be $V = |\Sigma|$. Let the set of labels be
$L = \{l_1, l_2, \ldots, l_{|L|}\}$.
Each label $l \in L$, except the root, has a parent $\mathrm{pa}(l) \in L$ also in the set of labels. We will for
exposition purposes assume that this label set has hard “is-a” parent-child constraints (explained later),
although this assumption can be relaxed at the cost of more computationally complex inference.
Such a label hierarchy forms a multiply rooted tree. Without loss of generality we will consider a
tree with a single root $r \in L$. Each document has a variable $y_{l,d} \in \{-1, 1\}$ for every label which
indicates whether the label is applied to document $d$ or not. In most cases $y_{l,d}$ will be unobserved,
in some cases we will be able to fix its value because of constraints on the label hierarchy, and in the
relatively minor remainder its value will be observed. In the applications we consider, only positive
labels are observed.
The constraints imposed by an is-a label hierarchy are that if the $l$th label is applied to document
$d$, i.e., $y_{l,d} = 1$, then all labels in the label hierarchy up to the root are also applied to document $d$,
i.e., $y_{\mathrm{pa}(l),d} = 1, y_{\mathrm{pa}(\mathrm{pa}(l)),d} = 1, \ldots, y_{r,d} = 1$. Conversely, if a label $l'$ is marked as not applying
to a document then no descendant of that label may be applied to the same document. We assume that at least
one label is applied to every document. This is illustrated in Figure 1 where the root label is always
applied but only some of the descendant labelings are observed as having been applied (diagonal
hashing indicates that potentially some of the plated variables are observed).
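The is-a constraints above can be illustrated with a small sketch. It is not the authors' code: the toy hierarchy `PARENT`, the label names, and the helper `close_labels` are hypothetical, chosen only to show how observed positive labels fix their ancestors to 1 and how a negative label fixes its descendants to -1, while every other label stays unobserved.

```python
from typing import Dict, Iterable, Optional, Set

# Hypothetical toy hierarchy (child -> parent); the root has parent None.
PARENT: Dict[str, Optional[str]] = {
    "root": None,
    "A": "root",
    "B": "root",
    "A1": "A",
    "A2": "A",
    "B1": "B",
}


def ancestors(label: str, parent: Dict[str, Optional[str]]) -> Set[str]:
    """All proper ancestors of `label`, up to and including the root."""
    out: Set[str] = set()
    node = parent[label]
    while node is not None:
        out.add(node)
        node = parent[node]
    return out


def descendants(label: str, parent: Dict[str, Optional[str]]) -> Set[str]:
    """All proper descendants of `label`."""
    out: Set[str] = set()
    frontier = [label]
    while frontier:
        node = frontier.pop()
        for child, par in parent.items():
            if par == node:
                out.add(child)
                frontier.append(child)
    return out


def close_labels(
    observed_positive: Iterable[str],
    parent: Dict[str, Optional[str]],
    known_negative: Iterable[str] = (),
) -> Dict[str, int]:
    """Return the labels whose value y is fixed (+1 or -1) for one document.

    Labels missing from the result are unobserved.
    """
    # The root label is always applied.
    y = {l: 1 for l, par in parent.items() if par is None}
    # A positive label implies every ancestor up to the root is positive.
    for label in observed_positive:
        y[label] = 1
        for anc in ancestors(label, parent):
            y[anc] = 1
    # A negative label implies every descendant is negative.
    for label in known_negative:
        for node in {label} | descendants(label, parent):
            if y.get(node) == 1:
                raise ValueError(f"label {node!r} cannot be both +1 and -1")
            y[node] = -1
    return y


fixed = close_labels({"A1"}, PARENT)
print(sorted(fixed.items()))
```

In the applications described here only positive labels are observed, so `known_negative` would normally be empty; it is included only to show the converse constraint.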
In HSLDA, documents are modeled using the LDA mixed-membership mixture model with global
topic estimation. Label responses are generated using a conditional hierarchy of probit regressors.
The HSLDA graphical model is given in Figure 1. In the model, $K$ is the number of LDA “topics”
(distributions over the elements of $\Sigma$), $\phi_k$ is a distribution over “words,” $\theta_d$ is a document-specific
distribution over topics, $\beta$ is a global distribution over topics, $\mathrm{Dir}_K(\cdot)$ is a $K$-dimensional Dirichlet
distribution, $\mathcal{N}_K(\cdot)$ is the $K$-dimensional Normal distribution, $I_K$ is the $K$-dimensional identity
matrix, $1_d$ is the $d$-dimensional vector of all ones, and $I(\cdot)$ is an indicator function that takes the
value 1 if its argument is true and 0 otherwise. The following procedure describes how to generate
from the HSLDA generative model.
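The procedure itself is not reproduced in this excerpt. As a rough illustration only, the description above (LDA with a global topic distribution, plus a conditional hierarchy of probit regressors) might be sketched as follows. Several details are assumptions rather than statements from the text: the hyperparameter names `alpha`, `alpha_prime`, `gamma`, `mu`, and `sigma`, the use of `sigma` as a standard deviation, the choice $\theta_d \sim \mathrm{Dir}_K(\alpha\beta)$, and the use of the empirical topic proportions of a document as the probit regressor input with unit noise variance.

```python
import numpy as np


def sample_priors(rng, K, V, n_labels, alpha_prime=1.0, gamma=1.0, mu=0.0, sigma=1.0):
    """Draw global parameters (all hyperparameter names are assumptions)."""
    phi = rng.dirichlet(gamma * np.ones(V), size=K)      # (K, V) topic-word distributions
    eta = rng.normal(mu, sigma, size=(n_labels, K))      # one probit coefficient vector per label
    beta = rng.dirichlet(alpha_prime * np.ones(K))       # global distribution over topics
    return phi, eta, beta


def sample_document(rng, phi, eta, beta, parent_index, n_words, alpha=5.0):
    """Generate words, topic assignments, and labels for one document.

    parent_index[l] is the index of pa(l), or -1 for the root. Labels are
    assumed to be ordered so that every parent precedes its children.
    """
    K, V = phi.shape
    theta = rng.dirichlet(alpha * beta)                  # document-specific topic distribution
    z = rng.choice(K, size=n_words, p=theta)             # topic assignment for each word
    w = np.array([rng.choice(V, p=phi[k]) for k in z])   # observed words
    z_bar = np.bincount(z, minlength=K) / n_words        # empirical topic proportions

    y = np.empty(len(parent_index), dtype=int)
    for l, pa in enumerate(parent_index):
        if pa < 0:
            y[l] = 1                                     # the root label is always applied
        elif y[pa] == -1:
            y[l] = -1                                    # is-a constraint: negative parent forces negative child
        else:
            a = rng.normal(z_bar @ eta[l], 1.0)          # probit auxiliary variable (unit variance assumed)
            y[l] = 1 if a > 0 else -1
    return w, z, y


rng = np.random.default_rng(0)
parent_index = [-1, 0, 0, 1, 1, 2]                       # hypothetical six-label tree
phi, eta, beta = sample_priors(rng, K=3, V=20, n_labels=len(parent_index))
w, z, y = sample_document(rng, phi, eta, beta, parent_index, n_words=50)
print(w[:10], y)
```

Visiting labels in parent-before-child order keeps the sketch simple: a child is only sampled once the value of its parent is known, so the is-a constraint holds by construction.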