CUNY BLENDER TAC-KBP2011 Temporal Slot Filling System Description
Abstract
In this paper we describe the participation of the CUNY-BLENDER team in the Temporal Slot Filling (TSF) pilot task organized as part of the TAC-KBP2011 evaluation. Our team submitted results for both the “diagnostic” and “full” TSF subtasks, obtaining the top score in the diagnostic subtask. We implemented a “structured” and a “flat” approach to the classification of temporal expressions. The structured approach captures long syntactic contexts surrounding the query entity, slot fill and temporal expression using a dependency path kernel tailored to this task. The flat approach exploits information such as the lexical context and shallow dependency features. In order to provide enough training data for these classifiers we used a distant supervision approach to automatically generate a large number of training instances from the Web. This data was further refined by applying logistic regression models for instance relabeling and feature selection methods.
Javier Artiles, Qi Li, Taylor Cassidy, Suzanne Tamang and Heng Ji
Computer Science Department and Linguistics Department
Queens College and Graduate Center
City University of New York
New York, NY 11367, USA
{javart, hengjicuny}@gmail.com
1 Introduction
This paper presents the CUNY-BLENDER participation
in the KBP2011 Temporal Slot Filling (TSF)
pilot task (Ji et al., 2011).
Our approach to the TSF task was to reformulate
it as two problems: the classification of temporal
expressions and the aggregation of the resulting
temporal information. Classification is applied to
identify the role of temporal expressions that appear in
the context of a particular entity and attribute value.
For instance, in “Harry married Sally in 1995”
a classifier should determine that “1995” indicates
the beginning of the attribute spouse. Given the output
of this classification, the TSF system proceeds
to aggregate the available temporal information and
provide a final answer. We developed and tested two
approaches to the temporal classification problem:
a structured approach and a flat approach. The
structured approach captures long syntactic contexts
surrounding the query entity, slot fill and temporal
expression using a dependency path kernel tailored
to this task. The flat approach exploits surface
lexical context and shallow dependency features. For
the aggregation of the temporal information we rely
on a simple but effective iterative algorithm.
Given the expensive nature of human-assessed
training data for this task, we used distant supervision
to acquire large amounts of annotated data from
the Web without human intervention. We explored
the reduction of the feature space to speed up training
and eliminate noisy or unnecessary features.
Additionally, we tested the impact of relabeling training
instances based on a small set of hand-labeled data.
The rest of this paper is structured as follows.
Section 2 briefly summarizes the task definition and
scoring metric. The different components of our TSF
system are described in Section 3. In Section 4 we
present the distant supervision approach used to obtain
training data for the classification of temporal
expressions. In Section 5 we include the results obtained
on the KBP2011 TSF test data. Related work
is described in Section 7. Finally we provide conclusions
and future work plans in Section 8.
2 Task Definition and Scoring Metric
The TSF task is best characterized as an extension
of the existing KBP regular Slot Filling task. Given
an entity and a large document collection, Slot
Filling aims to extract values for attributes such as
employee, spouse, member, etc. The TSF task focuses
on the subset of these attributes whose value may
change over time. Systems take as input an entity,
slot type and slot value as well as the source document
where the slot value was found. In the “diagnostic”
subtask correct slot values were provided,
while in the “full” subtask participants were required
to run their own Slot Filling system. The output expected
from the systems is a start/end date for each
entity/attribute pair.
The KBP2011 temporal representation model
consists of a 4-tuple whose elements are dates (day,
month and year), <t1, t2, t3, t4>. A tuple represents
the set of possible beginnings and endings of
an event. t1 and t2 represent the lower and upper
bounds, respectively, for the beginning of the event,
while t3 and t4 represent the lower and upper bounds
for the end of the event. This allows the representation
of different temporal granularities. For instance, one
might only know that an event began in a certain
year, and in that case t1 will be set to the first day of
that year and t2 to the last day.
Given the entity name Jose Padilha and its slot fill
Film Maker for the slot type per:title, a diagnostic
temporal slot filling system may discover the temporal
tuple <−∞, 2007-12-26, 2007-12-26, +∞>
to represent the temporal boundaries.
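As an illustration of this representation, the following minimal sketch encodes the 4-tuple in Python. It assumes, following the year example and the Jose Padilha tuple, that t1 and t2 bracket the beginning of the event and t3 and t4 bracket its end. The names `TemporalTuple` and `started_in_year`, and the use of `None` for an unconstrained (infinite) bound, are hypothetical choices made for this sketch and are not part of the task definition.

```python
from datetime import date
from typing import NamedTuple, Optional


class TemporalTuple(NamedTuple):
    """Hypothetical container for <t1, t2, t3, t4>; None means unconstrained."""
    t1: Optional[date]  # earliest possible start (None stands for -infinity)
    t2: Optional[date]  # latest possible start (None stands for +infinity)
    t3: Optional[date]  # earliest possible end (None stands for -infinity)
    t4: Optional[date]  # latest possible end (None stands for +infinity)


def started_in_year(year: int) -> TemporalTuple:
    """Encode 'the event began some time in `year`' and nothing else."""
    return TemporalTuple(date(year, 1, 1), date(year, 12, 31), None, None)


# The Jose Padilha example: the title is known to hold on 2007-12-26, so the
# start is no later than that day and the end is no earlier than that day.
padilha = TemporalTuple(None, date(2007, 12, 26), date(2007, 12, 26), None)
```

Coarser or finer granularities (a month, an exact day) only change how far apart t1 and t2, or t3 and t4, are set.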
The official scoring metric Q(S) for the task compares
a system’s output S = <t1, t2, t3, t4> against
a gold standard tuple Sg = <g1, g2, g3, g4>, based
on the absolute distances between ti and gi:
\[
Q(S) = \frac{1}{4} \sum_{i=1}^{4} \frac{1}{1 + |t_i - g_i|}
\]
When there is no constraint on t1 or t3 a value of
−∞ is assigned; similarly a value of +∞ is assigned
to an unconstrained t2 or t4.
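The metric can be sketched in a few lines of Python. This is an illustrative sketch, not the official scorer: it assumes each tuple element has already been converted to a number (here, fractional years, which is an assumption about the distance unit), that unconstrained bounds are represented by `float("-inf")` and `float("inf")`, and that two matching infinite bounds count as distance zero. The function name `quality` is hypothetical.

```python
import math


def quality(system, gold):
    """Q(S): average of 1 / (1 + |t_i - g_i|) over the four tuple elements.

    `system` and `gold` are 4-element sequences of floats. Infinite values
    stand for unconstrained bounds (an assumption of this sketch).
    """
    if len(system) != 4 or len(gold) != 4:
        raise ValueError("both tuples must have exactly four elements")
    total = 0.0
    for t, g in zip(system, gold):
        if t == g:
            # Exact match, including two identical infinite bounds.
            total += 1.0
        elif math.isinf(t) or math.isinf(g):
            # Only one side is unconstrained: infinite distance, zero credit.
            total += 0.0
        else:
            total += 1.0 / (1.0 + abs(t - g))
    return total / 4.0
```

For example, a system tuple that is off by one unit on a single element receives 1/2 for that element and 1 for the other three, giving Q(S) = 0.875.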
Let {G1, G2, ..., GN} be the set of gold standard
tuples and {S1, S2, ..., SM} the set of system output
tuples. For each unique slot fill i, there is the 4-tuple
Gi := <g1, g2, g3, g4>, and Sj := <t1, t2, t3, t4>.
Then Precision, Recall and F-measure scores are
calculated as follows:
\[
\mathrm{Precision} = \frac{\sum_{S_i \in C(S)} Q(S_i)}{M}
\]
\[
\mathrm{Recall} = \frac{\sum_{S_i \in C(S)} Q(S_i)}{N}
\]
\[
F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}
\]
where C(S) is the set of all instances in system
output which have correct slot filling answers, and
Q(S) is the quality value of S. In the diagnostic
task, precision, recall, and F1 values are the same
since we are provided with correct slot filling values
as part of the system input.
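A minimal Python sketch of these three scores follows. It assumes the quality values Q(S) of the system tuples with correct slot fills have already been computed; the function name `precision_recall_f1` and its parameters are hypothetical and chosen for this illustration only.

```python
def precision_recall_f1(qualities, num_system, num_gold):
    """Compute Precision, Recall and F1 from per-tuple quality values.

    qualities  -- Q(S) for each system tuple in C(S), i.e. with a correct fill
    num_system -- M, the number of system output tuples
    num_gold   -- N, the number of gold standard tuples
    """
    total = sum(qualities)
    precision = total / num_system if num_system else 0.0
    recall = total / num_gold if num_gold else 0.0
    denom = precision + recall
    f1 = 2.0 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

When M equals N and every system tuple has a correct slot fill, as in the diagnostic subtask, the three returned values coincide.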
3 System Overview
In Figure 1 we summarize our system pipeline. Each
relevant source document is fully processed using
the Stanford CoreNLP toolkit (Finkel et al., 2005) to
tokenize, segment sentences, detect named entities,
build coreference chains and analyze the syntactic
dependencies within sentences.
Note that in the diagnostic TSF subtask slot values
and their corresponding source documents are
provided by the organizers and are known to be correct.
The full TSF subtask, on the other hand, requires
participants to run their own Slot Filling (SF)
system to obtain the slot values associated with each
entity in the KBP source document collection. In the
full task we search the source collection using each
query name and its slot value to find documents related
to the query in addition to those that support
the SF output. This set of related documents is augmented
using a Lucene index to search for the top
10 most relevant documents in the KBP source collection
containing each entity/slot fill pair found by
our SF system (Chen et al., 2010).
The first application of this annotation is to find
sentences that mention both the entity and the slot
value. String matching only provides very limited
coverage and so we use named entity recognition
and coreference results to expand this set of relevant
sentences. We apply those coreference chains that
contain the provided slot value or entity name to select
sentences that mention both.
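The selection step can be sketched as follows. This is a simplified illustration, not our actual implementation: it assumes sentences are plain strings and that each coreference chain is a list of (sentence index, mention text) pairs, a hypothetical structure that does not mirror the output format of any particular toolkit. The function name `select_sentences` is likewise hypothetical.

```python
def select_sentences(sentences, chains, entity, slot_value):
    """Return indices of sentences that mention both entity and slot value.

    A sentence mentions a name if the name occurs in it as a string, or if
    it contains any mention from a coreference chain that includes the name.
    """
    def sentences_for(name):
        # Direct string matches.
        hits = {i for i, sent in enumerate(sentences) if name in sent}
        # Expand with every sentence touched by a chain containing the name.
        for chain in chains:
            if any(name in text for _, text in chain):
                hits.update(idx for idx, _ in chain)
        return hits

    return sorted(sentences_for(entity) & sentences_for(slot_value))


sentences = [
    "Harry met Sally in 1990.",
    "He married her in 1995.",
    "It rained all day.",
]
chains = [
    [(0, "Harry"), (1, "He")],
    [(0, "Sally"), (1, "her")],
]
selected = select_sentences(sentences, chains, "Harry", "Sally")
```

In this toy input, string matching alone finds only the first sentence, while the coreference chains also bring in the second one, which contains the temporal expression of interest.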
Our next step is to represent each temporal expression
in the context of the entity and slot value as
a classification instance. For example, the following