Sequence-to-set semantic tagging for complex query reformulation and automated text categorization in biomedical IR using self-attention

3 citations · 66 Mendeley readers

Abstract

Novel contexts, comprising a set of terms referring to one or more concepts, often arise in complex querying scenarios such as evidence-based medicine (EBM) involving biomedical literature. These terms may not explicitly refer to entities or canonical concept forms occurring in a fact-based knowledge source, e.g. the UMLS ontology. Moreover, hidden associations between concepts that are meaningful in the current context may not exist within a single document, but only across documents in the collection. Predicting semantic concept tags of documents can therefore serve to associate documents related in unseen contexts, or to categorize them, in information filtering or retrieval scenarios. Thus, inspired by the success of sequence-to-sequence neural models, we develop a novel sequence-to-set framework with attention for learning document representations in a unique unsupervised setting, using no human-annotated document labels or external knowledge resources and only corpus-derived term statistics to drive the training. This can effect term transfer within a corpus for semantically tagging a large collection of documents. To the best of our knowledge, our sequence-to-set modeling approach to predicting semantic tags achieves the state of the art on both an unsupervised query expansion (QE) task for the TREC CDS 2016 challenge dataset, when evaluated on an Okapi BM25-based document retrieval system, and over the MLTM system baseline (Soleimani and Miller, 2016) for supervised and semi-supervised multi-label prediction tasks on the del.icio.us and Ohsumed datasets. We make our code and data publicly available.
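The QE evaluation described above scores expanded queries with an Okapi BM25 retrieval system. As a minimal illustration of that retrieval side (not the authors' sequence-to-set model itself), the sketch below implements standard Okapi BM25 scoring in plain Python and shows a query expanded with one extra tag term; the toy documents, terms, and parameter values (k1=1.5, b=0.75) are illustrative assumptions.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized document against the query with Okapi BM25.

    docs: list of token lists; returns one score per document.
    """
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N  # average document length
    df = Counter()                          # document frequency per term
    for d in docs:
        for t in set(d):
            df[t] += 1
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for q in query_terms:
            if q not in tf:
                continue
            # Robertson-Sparck Jones IDF with +1 smoothing (keeps IDF positive)
            idf = math.log((N - df[q] + 0.5) / (df[q] + 0.5) + 1)
            norm = tf[q] + k1 * (1 - b + b * len(d) / avgdl)
            s += idf * tf[q] * (k1 + 1) / norm
        scores.append(s)
    return scores

# Toy collection (hypothetical biomedical snippets).
docs = [
    "aspirin reduces fever and pain".split(),
    "antibiotics treat bacterial infection".split(),
    "fever in bacterial infection cases".split(),
]

# Query "fever" expanded with a predicted tag term "infection":
# the document matching both expanded terms now ranks first.
print(bm25_scores(["fever", "infection"], docs))
```

In the expanded-query setting, predicted semantic tags are simply appended to the original query terms before BM25 scoring, so documents sharing the tagged concepts rise in the ranking even when they lack the original surface terms.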

Citation (APA)
Das, M., Li, J., Fosler-Lussier, E., Lin, S., Rust, S., Huang, Y., & Ramnath, R. (2020). Sequence-to-set semantic tagging for complex query reformulation and automated text categorization in biomedical IR using self-attention. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 14–27). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2020.bionlp-1.2
