Abstract
In a large-scale eDiscovery effort, human assessors participated in a technology-assisted review ("TAR") process employing a modified version of Grossman and Cormack's Continuous Active Learning® ("CAL®") tool to review Excel spreadsheets and poor-quality OCR text (defined as 30-50% Markov error rate). In the legal industry, these documents are typically considered inappropriate for the application of TAR and, consequently, are usually the subject of exhaustive manual review. Our results assuage this concern by showing that a CAL TAR process, using feature engineering techniques adapted from spam filtering, can achieve satisfactory results on Excel spreadsheets and noisy OCR text. Our findings are cause for optimism in the legal industry: adding these document classes to TAR datasets will make large reviews more manageable and less costly.
Cite
O’Halloran, T., McManus, B., Harbison, A., Grossman, M. R., & Cormack, G. V. (2023). Technology-Assisted Review for Spreadsheets and Noisy Text. In DocEng 2023 - Proceedings of the 2023 ACM Symposium on Document Engineering. Association for Computing Machinery, Inc. https://doi.org/10.1145/3573128.3609341