The use of stopwords has been thoroughly studied in traditional Information Retrieval systems, but remains unexplored in the context of neural models. Neural re-ranking models take the full text of both the query and the document into account. Naturally, removing tokens that carry no relevance information offers an opportunity to improve effectiveness by reducing noise, and to lower the storage requirements for cached document representations. In this work we propose a novel contextualized stopword detection mechanism for neural re-ranking models. The mechanism trains a sparse gating vector that filters document tokens out of the ranking decision. This vector is learned end-to-end from the contextualized document representations, allowing the model to filter terms on a per-occurrence basis. Because it reduces noise, this also leads to a more explainable model. We integrate our component into the state-of-the-art interaction-based TK neural re-ranking model. Our experiments on the MS MARCO passage collection and queries from the TREC 2019 Deep Learning Track show that filtering out traditional stopwords prior to the neural model reduces its effectiveness, while learning to filter contextualized representations improves it.
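To make the mechanism concrete, here is a minimal PyTorch sketch of the idea the abstract describes: a small learned scorer assigns each contextualized document token a gate value, tokens scoring below a threshold are zeroed out of the downstream matching, and a sparsity penalty is trained jointly with the ranking loss. The module name `ContextualizedStopwordGate`, the ReLU-threshold gate, and the exact regularizer are illustrative assumptions, not the paper's verified architecture.

```python
import torch
import torch.nn as nn

class ContextualizedStopwordGate(nn.Module):
    """Sketch of a learned per-occurrence stopword filter (assumed design)."""

    def __init__(self, dim: int, threshold: float = 0.1):
        super().__init__()
        self.scorer = nn.Linear(dim, 1)  # per-token relevance score
        self.threshold = threshold       # gate values below this become exactly zero

    def forward(self, doc_repr: torch.Tensor, doc_mask: torch.Tensor):
        # doc_repr: (batch, doc_len, dim) contextualized token vectors
        # doc_mask: (batch, doc_len), 1.0 for real tokens, 0.0 for padding
        scores = self.scorer(doc_repr).squeeze(-1)              # (batch, doc_len)
        gate = torch.relu(scores - self.threshold) * doc_mask   # hard zero below threshold
        gated = doc_repr * gate.unsqueeze(-1)                   # filtered representations
        sparsity_loss = gate.sum() / doc_mask.sum()             # L1-style pressure toward sparsity
        return gated, sparsity_loss
```

In such a setup, `gated` would replace the raw document representations fed into the re-ranker's interaction matching (e.g., TK's kernel pooling), and `sparsity_loss` would be added, suitably weighted, to the ranking loss so that removing tokens is learned end-to-end rather than dictated by a fixed stopword list.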
Hofstätter, S., Lipani, A., Zlabinger, M., & Hanbury, A. (2020). Learning to Re-Rank with Contextualized Stopwords. In International Conference on Information and Knowledge Management, Proceedings (pp. 2057–2060). Association for Computing Machinery. https://doi.org/10.1145/3340531.3412079