Run-time efficiency is still often disregarded in approaches to text analysis tasks, which limits their use in industrial-scale text mining applications (Chiticariu et al. 2010b). Search engines avoid efficiency problems by analyzing input texts at indexing time (Cafarella et al. 2005). However, this is impossible for ad-hoc text analysis tasks. To both manage and benefit from the ever-increasing amounts of text in the world, we need not only to scale up existing approaches (Agichtein 2005) but also to develop novel approaches at large scale (Glorot et al. 2011). As we have seen in Sect. 3.1, standard text analysis pipelines execute computationally expensive algorithms on most parts of the input texts. One way to enable scalability is to rely only on cheap but less effective algorithms (Pantel et al. 2004; Al-Rfou’ and Skiena 2012). In this chapter, we instead present ways to speed up arbitrary pipelines significantly, in some cases by more than one order of magnitude. As a consequence, more effective algorithms can be employed in large-scale text mining.
Darwin, C. (2015). Pipeline efficiency. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9383, pp. 123–182). Springer Verlag. https://doi.org/10.1007/978-3-319-25741-9_4