Inseq: An interpretability toolkit for sequence generation models


Abstract

Past work on natural language processing interpretability has focused mainly on popular classification tasks while largely overlooking generation settings, partly due to a lack of dedicated tools. In this work, we introduce Inseq, a Python library to democratize access to interpretability analyses of sequence generation models. Inseq enables intuitive and optimized extraction of models' internal information and feature importance scores for popular decoder-only and encoder-decoder Transformer architectures. We showcase its potential by adopting it to highlight gender biases in machine translation models and locate factual knowledge inside GPT-2. Thanks to its extensible interface supporting cutting-edge techniques such as contrastive feature attribution, Inseq can drive future advances in explainable natural language generation, centralizing good practices and enabling fair and reproducible model evaluations.
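To make the notion of "feature importance scores" concrete, here is a toy, library-free sketch of occlusion-based (leave-one-out) attribution, one of the simplest perturbation techniques in this family. This is an illustration of the general idea only, not Inseq's actual API: the `toy_model_score` function and its hand-picked word weights are hypothetical stand-ins for a real model's output score.

```python
# Toy illustration of occlusion-based feature attribution (leave-one-out).
# NOT Inseq's API: the "model" below is a hypothetical stand-in that just
# sums hand-picked word weights to produce a prediction score.

def toy_model_score(tokens):
    """Stand-in for a model's output score for some target prediction."""
    weights = {"the": 0.1, "doctor": 2.0, "said": 0.3, "she": 1.5}
    return sum(weights.get(t, 0.0) for t in tokens)

def occlusion_attribution(tokens):
    """Importance of each token = score drop when that token is removed."""
    full_score = toy_model_score(tokens)
    return {
        tok: full_score - toy_model_score(tokens[:i] + tokens[i + 1:])
        for i, tok in enumerate(tokens)
    }

scores = occlusion_attribution(["the", "doctor", "said", "she"])
# "doctor" carries the largest weight in the toy model, so it receives
# the largest attribution score.
```

Real attribution libraries apply the same input-perturbation logic (or gradient-based variants) to actual model logits rather than a fixed weight table, and a contrastive variant attributes the score *difference* between two candidate outputs instead of a single score.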

Cite

CITATION STYLE

APA

Sarti, G., Feldhus, N., Sickert, L., & van der Wal, O. (2023). Inseq: An interpretability toolkit for sequence generation models. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (Vol. 3, pp. 421–435). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.acl-demo.40
