Trade-offs in Sampling and Search for Early-stage Interactive Text Classification

Abstract

For many automated classification tasks, collecting labeled data is the key barrier to training a useful supervised model. Interfaces for interactive labeling tighten the loop of labeled data collection and model development, enabling a subject-matter expert to quickly establish the feasibility of a classifier to address a problem of interest. These interactive machine learning (IML) interfaces iteratively sample unlabeled data for annotation, train a new model, and display feedback on the model's estimated performance. Different sampling strategies affect both the rate at which the model improves and the bias of performance estimates. We compare the performance of three sampling strategies in the "early stage" of label collection, starting from zero labeled data. By simulating a user's interactions with an IML labeling interface, we demonstrate a trade-off between improving a text classifier's performance and computing unbiased estimates of that performance. We show that supplementing early-stage sampling with user-guided text search can effectively "seed" a classifier with positive documents without compromising generalization performance, particularly for imbalanced tasks where positive documents are rare. We argue for the benefits of incorporating search alongside active learning in IML interfaces and identify design trade-offs around the use of non-random sampling strategies.
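The estimation side of this trade-off, and the seeding effect of search, can be illustrated with a small simulation. The sketch below is not the authors' simulation and does not retrain a model; it assumes a fixed classifier represented by hypothetical per-document scores, a synthetic pool in which 5% of documents are positive, and a hypothetical keyword query that matches most positives and few negatives. All names (`make_pool`, `uncertainty_sample`, `search_sample`, and so on) and all distribution parameters are illustrative assumptions. It compares accuracy estimated from a uniform random sample, from an active-learning-style uncertainty sample, and the share of positives returned by search.

```python
import random
from dataclasses import dataclass


@dataclass
class Doc:
    score: float       # hypothetical classifier score in [0, 1]
    has_keyword: bool  # whether a (hypothetical) user search query matches this doc
    label: int         # hidden ground-truth label (1 = positive), revealed on annotation


def make_pool(n, pos_rate, seed=0):
    """Build a synthetic, imbalanced unlabeled pool (all distributions are assumptions)."""
    rng = random.Random(seed)
    pool = []
    for _ in range(n):
        label = 1 if rng.random() < pos_rate else 0
        # Overlapping score distributions, so the fixed classifier makes mistakes.
        score = min(1.0, max(0.0, rng.gauss(0.7 if label else 0.3, 0.15)))
        # Assumed query behavior: matches most positives but few negatives.
        has_keyword = rng.random() < (0.6 if label else 0.02)
        pool.append(Doc(score, has_keyword, label))
    return pool


def accuracy(docs, threshold=0.5):
    """Fraction of docs whose thresholded score agrees with the true label."""
    return sum((d.score >= threshold) == (d.label == 1) for d in docs) / len(docs)


def positive_rate(docs):
    """Fraction of docs that are truly positive."""
    return sum(d.label for d in docs) / len(docs)


def random_sample(pool, k, rng):
    """Uniform random sampling: labeled docs are representative of the pool."""
    return rng.sample(pool, k)


def uncertainty_sample(pool, k, threshold=0.5):
    """Active-learning-style sampling: the k docs closest to the decision boundary."""
    return sorted(pool, key=lambda d: abs(d.score - threshold))[:k]


def search_sample(pool, k):
    """User-guided search: up to k docs that match the query."""
    return [d for d in pool if d.has_keyword][:k]


rng = random.Random(1)
pool = make_pool(n=10_000, pos_rate=0.05)
k = 200

rand = random_sample(pool, k, rng)
unc = uncertainty_sample(pool, k)
hits = search_sample(pool, k)

print(f"true accuracy on full pool:       {accuracy(pool):.3f}")
print(f"estimate from random sample:      {accuracy(rand):.3f}")
print(f"estimate from uncertainty sample: {accuracy(unc):.3f}")
print(f"positive rate, random sample:     {positive_rate(rand):.3f}")
print(f"positive rate, search results:    {positive_rate(hits):.3f}")
```

Under these assumptions, the random-sample estimate tracks the true accuracy, while the uncertainty-sample estimate falls well below it because documents near the decision boundary are the hardest cases; the labels that are most useful for training are therefore the least representative for evaluation. The search results, by contrast, contain a far higher share of positives than a random sample, which is the "seeding" effect the abstract describes for rare-positive tasks.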

Citation (APA)

Levonian, Z., Lee, C. J., Murdock, V., & Harper, F. M. (2022). Trade-offs in Sampling and Search for Early-stage Interactive Text Classification. In International Conference on Intelligent User Interfaces, Proceedings IUI (pp. 566–583). Association for Computing Machinery. https://doi.org/10.1145/3490099.3511134
