Technology-Assisted Review for Spreadsheets and Noisy Text

Abstract

In a large-scale eDiscovery effort, human assessors participated in a technology-assisted review ("TAR") process employing a modified version of Grossman and Cormack's Continuous Active Learning® ("CAL®") tool to review Excel spreadsheets and poor-quality OCR text (defined as 30-50% Markov error rate). In the legal industry, these documents are typically considered inappropriate for the application of TAR and, consequently, are usually the subject of exhaustive manual review. Our results assuage this concern by showing that a CAL TAR process, using feature engineering techniques adapted from spam filtering, can achieve satisfactory results on Excel spreadsheets and noisy OCR text. Our findings are cause for optimism in the legal industry: adding these document classes to TAR datasets will make large reviews more manageable and less costly.

Citation

O’Halloran, T., McManus, B., Harbison, A., Grossman, M. R., & Cormack, G. V. (2023). Technology-Assisted Review for Spreadsheets and Noisy Text. In DocEng 2023 - Proceedings of the 2023 ACM Symposium on Document Engineering. Association for Computing Machinery, Inc. https://doi.org/10.1145/3573128.3609341
