Supervised named entity recognition systems often suffer from inadequate training data when dealing with domain-specific corpora, e.g., documents in medicine and healthcare. For these domains, obtaining some seed words or phrases is not very difficult. Positive instances obtained through distant supervision based on these seeds can then be used to learn recognition models. However, with a limited number of training samples and no negative ones, the classification results may be unsatisfactory. In this paper, we leverage the conjunction-and-comma writing style as a list constraint to enlarge the set of training instances. Unlike earlier studies, we formulate two kinds of constraints, namely a soft list constraint and a mention constraint, as regularizers. We then incorporate the constraints into a unified discriminative learning framework and propose a joint optimization algorithm. The experimental results show that our model is superior to state-of-the-art baselines on a large collection of documents about drugs.
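The intuition behind using lists to enlarge the training set can be illustrated with a minimal sketch. It is not the paper's method: the paper formulates the list constraint as a soft regularizer inside a learning framework, whereas the code below applies a hard heuristic that simply proposes the non-seed items of any comma/conjunction list containing a seed. The seed lexicon, the drug names, and the function names `list_items` and `expand_by_list` are all hypothetical and chosen for illustration only.

```python
import re

# Hypothetical seed lexicon; the drug names are illustrative only.
SEEDS = {"aspirin", "ibuprofen"}

# Split on commas (optionally followed by "and"/"or") or on a bare "and"/"or".
_SPLIT = re.compile(r"\s*,\s*(?:(?:and|or)\s+)?|\s+(?:and|or)\s+")


def list_items(phrase):
    """Split a comma/conjunction-separated phrase into its items."""
    return [item.strip() for item in _SPLIT.split(phrase) if item.strip()]


def expand_by_list(phrase, seeds):
    """Return non-seed items that share a list with at least one seed.

    This is a hard heuristic used only for illustration; it assumes the
    phrase has already been identified as a list of candidate mentions.
    """
    items = list_items(phrase)
    lowered = [item.lower() for item in items]
    if not any(low in seeds for low in lowered):
        return []
    return [item for item, low in zip(items, lowered) if low not in seeds]


if __name__ == "__main__":
    print(expand_by_list("aspirin, naproxen and celecoxib", SEEDS))
```

A hard rule like this would propagate labels to every list item, including wrong ones, which is one reason to treat list membership as a soft constraint instead.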
CITATION STYLE
Tu, H., Ma, Z., Sun, A., Xu, Z., & Wang, X. (2017). Entity recognition by distant supervision with soft list constraint. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10604 LNAI, pp. 681–694). Springer Verlag. https://doi.org/10.1007/978-3-319-69179-4_48