Abstract
The paper presents a methodology for training an event argument extraction system in a semi-supervised setting. We use Wikipedia and Wikidata to automatically obtain a small, noisily labeled dataset and a large unlabeled dataset. The data consists of event clusters containing Wikipedia pages in multiple languages. The unlabeled data is labeled iteratively using semi-supervised learning combined with probabilistic soft logic, which infers the pseudo-label of each example from the predictions of multiple base learners. The proposed methodology is applied to Wikipedia pages about earthquakes and terrorist attacks in a cross-lingual setting. Our experiments show that the proposed methodology improves the results: the system achieves an F1-score of 0.79 when trained only on the automatically labeled dataset, and an F1-score of 0.84 when trained according to the methodology with semi-supervised learning combined with probabilistic soft logic.
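The abstract does not give the probabilistic soft logic (PSL) rules, rule weights, or base learners the authors use. The following is only a minimal sketch, under assumed rules, of how a PSL-style model can turn several base learners' predictions into one pseudo-label per candidate argument. It assumes one hypothetical rule per learner, `Predicts_i(x, l) -> Label(x, l)`, a weak negative prior `!Label(x, l)`, and a hard constraint that the truth values of `Label(x, l)` sum to 1. MAP inference then minimizes the weighted squared Łukasiewicz distances to satisfaction, and only candidates whose top label reaches a confidence threshold are kept as pseudo-labeled training examples. The role names, weights, prior, and threshold are all illustrative assumptions.

```python
from typing import Dict, List, Tuple


def _best_truth_value(preds: List[float], weights: List[float],
                      prior: float, lam: float, iters: int = 60) -> float:
    """Minimise sum_i w_i*max(0, p_i - y)^2 + prior*y^2 + lam*y over y in [0, 1].

    Assumed rules: Predicts_i(x, l) -> Label(x, l) has Lukasiewicz distance to
    satisfaction max(0, p_i - y); the negative prior !Label(x, l) has distance y.
    lam is the Lagrange multiplier of the sum-to-one constraint.
    """
    def grad(y: float) -> float:
        g = 2.0 * prior * y + lam
        for p, w in zip(preds, weights):
            g -= 2.0 * w * max(0.0, p - y)
        return g  # non-decreasing in y because the objective is convex

    if grad(0.0) >= 0.0:
        return 0.0
    if grad(1.0) <= 0.0:
        return 1.0
    lo, hi = 0.0, 1.0
    for _ in range(iters):  # bisection on the derivative
        mid = (lo + hi) / 2.0
        if grad(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2.0


def infer_label_distribution(base_preds: Dict[str, List[float]], weights: List[float],
                             prior: float = 0.1, iters: int = 60) -> Dict[str, float]:
    """MAP inference for one candidate with the hard constraint sum_l Label(x, l) = 1."""
    def solve(lam: float) -> Dict[str, float]:
        return {label: _best_truth_value(p, weights, prior, lam)
                for label, p in base_preds.items()}

    # Each truth value is non-increasing in lam: at lo every label is 1, at hi every label is 0.
    lo, hi = -2.0 * prior, 2.0 * sum(weights)
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if sum(solve(mid).values()) > 1.0:
            lo = mid
        else:
            hi = mid
    return solve((lo + hi) / 2.0)


def select_pseudo_labels(unlabeled: Dict[str, Dict[str, List[float]]], weights: List[float],
                         threshold: float = 0.7) -> List[Tuple[str, str]]:
    """Keep candidates whose inferred top label is confident enough to add to the training set."""
    selected = []
    for example_id, base_preds in unlabeled.items():
        dist = infer_label_distribution(base_preds, weights)
        label, truth = max(dist.items(), key=lambda kv: kv[1])
        if truth >= threshold:
            selected.append((example_id, label))
    return selected


# Hypothetical candidates: label -> probabilities from three base learners.
unlabeled = {
    "cand-1": {"location": [0.9, 0.8, 0.7], "date": [0.1, 0.1, 0.2], "none": [0.0, 0.1, 0.1]},
    "cand-2": {"location": [0.9, 0.2, 0.1], "date": [0.1, 0.7, 0.8], "none": [0.0, 0.1, 0.1]},
}
print(select_pseudo_labels(unlabeled, weights=[1.0, 1.0, 1.0]))  # → [('cand-1', 'location')]
```

In this sketch the learners agree on `cand-1`, so its inferred truth value for `location` (about 0.79) passes the threshold, while the learners disagree on `cand-2`, so its best label reaches only about 0.53 and the candidate stays unlabeled for the next iteration. Squared hinge potentials are chosen here because they make each per-label subproblem strictly convex, so plain bisection is enough; a real PSL system would use its own inference engine.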
Citation
Zajec, P., & Mladenić, D. (2022). Using Semi-Supervised Learning and Wikipedia to Train an Event Argument Extraction System. Informatica (Slovenia), 46(1), 121–128. https://doi.org/10.31449/inf.v46i1.3577