Pipeline efficiency

Charles Darwin

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2015) 9383 123-182

DOI: 10.1007/978-3-319-25741-9_4

Abstract

The importance of run-time efficiency is still often disregarded in approaches to text analysis tasks, which limits their use in industrial-scale text mining applications (Chiticariu et al. 2010b). Search engines avoid efficiency problems by analyzing input texts at indexing time (Cafarella et al. 2005). However, this is impossible for ad-hoc text analysis tasks. To both manage and benefit from the ever-increasing amounts of text in the world, we need not only to scale existing approaches up to large amounts of text (Agichtein 2005), but also to develop novel approaches at large scale (Glorot et al. 2011). Standard text analysis pipelines execute computationally expensive algorithms on most parts of the input texts, as we have seen in Sect. 3.1. While one way to enable scalability is to rely only on cheap but less effective algorithms (Pantel et al. 2004; Al-Rfou’ and Skiena 2012), in this chapter we present ways to speed up arbitrary pipelines significantly, by as much as an order of magnitude or more. As a consequence, more effective algorithms can be employed in large-scale text mining.

Cite

Darwin, C. (2015). Pipeline efficiency. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9383, pp. 123–182). Springer Verlag. https://doi.org/10.1007/978-3-319-25741-9_4
