Abstract
It is a widely accepted belief in natural language processing research that naturally occurring data is the best (and perhaps the only appropriate) data for testing text mining systems. This paper compares code coverage using a suite of functional tests and using a large corpus and finds that higher class, line, and branch coverage is achieved with structured tests than with even a very large corpus.
Cohen, K. B., Baumgartner, W. A., & Hunter, L. (2008). Software testing and the naturally occurring data assumption in natural language processing. In ACL-08: HLT - Software Engineering, Testing, and Quality Assurance for Natural Language Processing (pp. 23–30). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1622110.1622116