Saliency methods (techniques that identify how important each input feature is to a model's output) are a common step in understanding neural network behavior. However, interpreting saliency requires tedious manual inspection to identify and aggregate patterns in model behavior, resulting in ad hoc or cherry-picked analysis. To address these concerns, we present Shared Interest: metrics for comparing model reasoning (via saliency) to human reasoning (via ground truth annotations). By providing quantitative descriptors, Shared Interest enables ranking, sorting, and aggregating inputs, thereby facilitating large-scale systematic analysis of model behavior. We use Shared Interest to identify eight recurring patterns in model behavior, such as cases where contextual features or a subset of ground truth features are most important to the model. Working with representative real-world users, we show how Shared Interest can be used to decide whether a model is trustworthy, uncover issues missed in manual analyses, and enable interactive probing.
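Concretely, the abstract describes comparing the set of input features a saliency method marks as important against the set a human annotated as ground truth. The sketch below illustrates that kind of set-overlap comparison in Python; the function name and the three scores (IoU, ground-truth coverage, saliency coverage) are illustrative assumptions based on standard overlap measures, not necessarily the paper's exact definitions.

```python
import numpy as np

def shared_interest_scores(saliency_mask: np.ndarray, ground_truth_mask: np.ndarray) -> dict:
    """Compare a binarized saliency mask against a human ground-truth mask.

    Both inputs are boolean arrays of the same shape (e.g., pixels or tokens
    marked important). Returns set-overlap scores in [0, 1].
    """
    s = saliency_mask.astype(bool)
    g = ground_truth_mask.astype(bool)
    intersection = np.logical_and(s, g).sum()
    union = np.logical_or(s, g).sum()
    return {
        # Overall agreement between model- and human-identified features.
        "iou": intersection / union if union else 0.0,
        # Fraction of ground-truth features the model also found salient.
        "ground_truth_coverage": intersection / g.sum() if g.sum() else 0.0,
        # Fraction of salient features that fall inside the ground truth.
        "saliency_coverage": intersection / s.sum() if s.sum() else 0.0,
    }

# Example: a model that attends to part of the annotated region plus some context.
saliency = np.array([1, 1, 0, 0, 1, 0], dtype=bool)
ground_truth = np.array([1, 1, 1, 1, 0, 0], dtype=bool)
print(shared_interest_scores(saliency, ground_truth))
```

Scores like these give each input a quantitative descriptor, which is what makes the ranking, sorting, and aggregation described above possible at scale.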
CITATION STYLE
Boggust, A., Hoover, B., Satyanarayan, A., & Strobelt, H. (2022). Shared Interest: Measuring Human-AI Alignment to Identify Recurring Patterns in Model Behavior. In Proceedings of the CHI Conference on Human Factors in Computing Systems. Association for Computing Machinery. https://doi.org/10.1145/3491102.3501965