Pipeline efficiency

Abstract

Run-time efficiency is still often disregarded in approaches to text analysis tasks, which limits their use in industrial-scale text mining applications (Chiticariu et al. 2010b). Search engines avoid efficiency problems by analyzing input texts at indexing time (Cafarella et al. 2005). However, this is impossible for ad-hoc text analysis tasks. To both manage and benefit from the ever-increasing amounts of text in the world, we need not only to scale up existing approaches (Agichtein 2005), but also to develop novel approaches at large scale (Glorot et al. 2011). Standard text analysis pipelines execute computationally expensive algorithms on most parts of the input texts, as we have seen in Sect. 3.1. One way to enable scalability is to rely only on cheap but less effective algorithms (Pantel et al. 2004; Al-Rfou’ and Skiena 2012). In this chapter, we instead present ways to speed up arbitrary pipelines significantly, in some cases by more than an order of magnitude. As a consequence, more effective algorithms can be employed in large-scale text mining.

Cite

Darwin, C. (2015). Pipeline efficiency. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9383, pp. 123–182). Springer Verlag. https://doi.org/10.1007/978-3-319-25741-9_4
