Novel Benchmark Data Set for Automatic Error Detection and Correction

Corina Masanti; Hans Friedrich Witschel; Kaspar Riesen

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2023) 13913 LNCS 511-521

DOI: 10.1007/978-3-031-35320-8_38

Abstract

This paper introduces a novel benchmark data set for automatic error detection and error correction in text documents, whether based on language models or other techniques. The data set contains a large number of sentences from various domains, annotated with several types of errors (orthographic, grammatical, punctuation, and typographical). The paper presents the method used to collect and annotate the documents, provides statistical analyses of the data set’s properties, and evaluates two preliminary baseline models for automatic error detection on a specific benchmark task. On the one hand, the results show that the proposed data set is effective for evaluating automatic error detection systems. On the other hand, these initial analyses also reveal that the data set contains challenging cases in which errors are difficult to detect. Finally, the paper discusses potential applications of the proposed data set in the research and development of error detection and error correction systems.

Cite

Masanti, C., Witschel, H. F., & Riesen, K. (2023). Novel Benchmark Data Set for Automatic Error Detection and Correction. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13913 LNCS, pp. 511–521). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-35320-8_38
