Many recent Natural Language Processing (NLP) task formulations, such as question answering and fact verification, are implemented as a two-stage cascading architecture. In the first stage, an IR system retrieves "relevant" documents containing the knowledge, and in the second stage, an NLP system performs reasoning to solve the task. Optimizing the IR system for retrieving relevant documents ensures that the NLP system has sufficient information to operate over. These recent NLP task formulations raise interesting and exciting challenges for IR, where the end user of an IR system is not a human with an information need, but another system that exploits the retrieved documents to perform reasoning and address the user's information need. Among these challenges, as we will show, is that noise from the IR system, such as spurious or irrelevant retrieved documents, can negatively impact the accuracy of the downstream reasoning module. Hence, there is a need to balance maximizing relevance against minimizing noise in the IR system. This paper presents experimental results on two NLP tasks implemented as a two-stage cascading architecture. We show how spurious or irrelevant results retrieved in the first stage can induce errors in the second stage. We use these results to ground our discussion of the research challenges that the IR community should address in the context of these knowledge-intensive NLP tasks.
CITATION STYLE
Sauchuk, A., Thorne, J., Halevy, A., Tonellotto, N., & Silvestri, F. (2022). On the Role of Relevance in Natural Language Processing Tasks. In SIGIR 2022 - Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 1785–1789). Association for Computing Machinery, Inc. https://doi.org/10.1145/3477495.3532034