Sequence-to-set semantic tagging for complex query reformulation and automated text categorization in biomedical IR using self-attention

3 citations · 66 Mendeley readers

Abstract

Novel contexts, comprising a set of terms referring to one or more concepts, often arise in complex querying scenarios such as evidence-based medicine (EBM) involving biomedical literature. These terms may not explicitly refer to entities or canonical concept forms occurring in a fact-based knowledge source, e.g. the UMLS ontology. Moreover, hidden associations between concepts that are meaningful in the current context may not exist within a single document, but only across documents in the collection. Predicting semantic concept tags of documents can therefore serve to associate documents related in unseen contexts, or to categorize them, in information filtering or retrieval scenarios. Thus, inspired by the success of sequence-to-sequence neural models, we develop a novel sequence-to-set framework with attention for learning document representations in a unique unsupervised setting, using no human-annotated document labels or external knowledge resources and only corpus-derived term statistics to drive the training. This can effect term transfer within a corpus for semantically tagging a large collection of documents. To the best of our knowledge, our sequence-to-set modeling approach to predicting semantic tags achieves the state of the art on both an unsupervised query expansion (QE) task for the TREC CDS 2016 challenge dataset, when evaluated on an Okapi BM25-based document retrieval system, and over the MLTM system baseline (Soleimani and Miller, 2016) for supervised and semi-supervised multi-label prediction tasks on the del.icio.us and Ohsumed datasets. We make our code and data publicly available.
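The QE evaluation described above scores expanded queries with an Okapi BM25 retrieval system. As a minimal illustration of that retrieval side (not the authors' sequence-to-set model itself), the sketch below implements standard Okapi BM25 scoring in plain Python and shows a query expanded with one extra tag term; the toy documents, terms, and parameter values (k1=1.5, b=0.75) are illustrative assumptions.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized document against the query with Okapi BM25.

    docs: list of token lists; returns one score per document.
    """
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N  # average document length
    df = Counter()                          # document frequency per term
    for d in docs:
        for t in set(d):
            df[t] += 1
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for q in query_terms:
            if q not in tf:
                continue
            # Robertson-Sparck Jones IDF with +1 smoothing (keeps IDF positive)
            idf = math.log((N - df[q] + 0.5) / (df[q] + 0.5) + 1)
            norm = tf[q] + k1 * (1 - b + b * len(d) / avgdl)
            s += idf * tf[q] * (k1 + 1) / norm
        scores.append(s)
    return scores

# Toy collection (hypothetical biomedical snippets).
docs = [
    "aspirin reduces fever and pain".split(),
    "antibiotics treat bacterial infection".split(),
    "fever in bacterial infection cases".split(),
]

# Query "fever" expanded with a predicted tag term "infection":
# the document matching both expanded terms now ranks first.
print(bm25_scores(["fever", "infection"], docs))
```

In the expanded-query setting, predicted semantic tags are simply appended to the original query terms before BM25 scoring, so documents sharing the tagged concepts rise in the ranking even when they lack the original surface terms.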

Citation (APA)
Das, M., Li, J., Fosler-Lussier, E., Lin, S., Rust, S., Huang, Y., & Ramnath, R. (2020). Sequence-to-set semantic tagging for complex query reformulation and automated text categorization in biomedical IR using self-attention. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 14–27). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2020.bionlp-1.2
