The Beauty of Small Data: An Information Retrieval Perspective

Abstract

This chapter focuses on Data Science problems, which we will refer to as “Small Data” problems. We have over the past 20 years accumulated considerable experience with working on Information Retrieval applications that allow effective search on collections that do not exceed in size the order of tens or hundreds of thousands of documents. In this chapter we want to highlight a number of lessons learned in dealing with such document collections. The better-known term “Big Data” has in recent years created a lot of buzz, but also frequent misunderstandings. To use a provocative simplification, the magic of Big Data often lies in the fact that sheer volume of data will necessarily bring redundancy, which can be detected in the form of patterns. Algorithms can then be trained to recognize and process these repeated patterns in the data streams. Conversely, “Small Data” approaches do not operate on volumes of data big enough to exploit repetitive patterns to a successful degree. While there have been spectacular applications of Big Data technology, we are convinced that there are and will remain countless, equally exciting, “Small Data” tasks, across all industrial and public sectors, and also for private applications. They have to be approached in a very different manner to Big Data problems. In this chapter, we will first argue that the task of retrieving documents from large text collections (often termed “full text search”) can become easier as the document collection grows. We then present two exemplary “Small Data” retrieval applications and discuss the best practices that can be derived from such applications.

Braschler, M. (2019). The Beauty of Small Data: An Information Retrieval Perspective. In Applied Data Science: Lessons Learned for the Data-Driven Business (pp. 233–250). Springer International Publishing. https://doi.org/10.1007/978-3-030-11821-1_13
