Multi-dataset-multi-task neural sequence tagging for information extraction from tweets

Abstract

Multi-task learning reduces the data required to learn a task while remaining competitive in accuracy with single-task learning. We study the effectiveness of multi-dataset-multi-task learning for training neural models on four sequence tagging tasks for Twitter data: part-of-speech (POS) tagging, chunking, super sense tagging, and named entity recognition (NER). We utilize 20 publicly available tagged datasets: 7 for POS, 10 for NER, 1 for chunking, and 2 for super sense tagging. We use a multi-dataset-multi-task neural model based on pre-trained contextual text embeddings and compare it against single-dataset-single-task and multi-dataset-single-task models. Even within a task, tagging schemes may differ across datasets; the model learns from this tagging diversity across all datasets for a task. The multi-dataset-multi-task models outperform the single-dataset/single-task models, yielding significant improvements for POS (1-2% accuracy, 7 datasets), NER (1-10% F1, 9 datasets), and chunking (4%). For super sense tagging, F1 improves by 2% on out-of-domain data. Our models and tools can be found at https://socialmediaie.github.io/.
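The abstract describes a shared encoder with per-dataset output layers, so that each dataset can keep its own tagging scheme within a task. The following is a minimal numpy sketch of that idea; the encoder, the dataset names, and the tag-set sizes are illustrative assumptions, not details taken from the paper (which uses pre-trained contextual embeddings as the shared representation).

```python
import numpy as np

# Hypothetical sketch: one shared encoder, one output head per
# (task, dataset) pair, so each dataset keeps its own tagging scheme.
rng = np.random.default_rng(0)

EMB_DIM, HID_DIM = 8, 16

# Per-dataset tag inventory sizes (schemes may differ within a task).
# Dataset names and sizes here are purely illustrative.
HEADS = {
    ("pos", "ark"): 25,
    ("pos", "ptb"): 45,
    ("ner", "wnut17"): 13,
    ("chunk", "conll"): 23,
}

# Shared encoder weights (stands in for a pre-trained contextual encoder).
W_enc = rng.normal(scale=0.1, size=(EMB_DIM, HID_DIM))
# One classification head per dataset.
W_head = {key: rng.normal(scale=0.1, size=(HID_DIM, n))
          for key, n in HEADS.items()}

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def tag_probs(token_embs, task, dataset):
    """Per-token tag distributions under the given dataset's scheme."""
    hidden = np.tanh(token_embs @ W_enc)              # shared across all heads
    return softmax(hidden @ W_head[(task, dataset)])  # dataset-specific tags

# A 5-token "tweet" as random embeddings.
tweet = rng.normal(size=(5, EMB_DIM))
probs = tag_probs(tweet, "ner", "wnut17")
print(probs.shape)  # (5, 13): one distribution per token over 13 NER tags
```

Because all heads read from the same encoder, gradients from every dataset would update the shared parameters during training, which is the mechanism by which multi-dataset-multi-task learning can transfer signal across tagging schemes.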

Citation (APA)

Mishra, S. (2019). Multi-dataset-multi-task neural sequence tagging for information extraction from tweets. In HT 2019 - Proceedings of the 30th ACM Conference on Hypertext and Social Media (pp. 283–284). Association for Computing Machinery, Inc. https://doi.org/10.1145/3342220.3344929
