The Effect of Corpora Size on Performance of Named Entity Recognition

Zeinab Liaghat

Book Chapter

Springer Science and Business Media Deutschland GmbH, (2018), 93-105

DOI: 10.1007/978-3-319-60255-4_8

Abstract

The amount of online text available is growing continuously and has reached hundreds of billions of words. Much research has used this data to improve results on a range of problems. Yet algorithms are continuously optimized, tested, and compared after training on corpora of only one million words or fewer. Most research focuses on the accuracy of the results these algorithms produce, often overlooking their running time or the cost of running them. The main goal of this paper is to show the effect that large data has on the running time and performance of Natural Language Processing algorithms. To this end, three Named Entity Recognition (NER) tools were selected, and we evaluated the trade-off between quality and running time, along with the effect of increasing data size on performance, across a varied selection of tools in the NER domain. The results show that the existing tools cannot cope with increasing data size. Moreover, as data size increases, quality increases but performance decreases, which renders the existing tools inefficient. Optimizing these tools makes it possible to process large data sizes; unfortunately, latency remains high.

Cite

Liaghat, Z. (2018). The Effect of Corpora Size on Performance of Named Entity Recognition. In Studies in Big Data (Vol. 27, pp. 93–105). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-319-60255-4_8
