An Introduction to Data Profiling

Abstract

One of the crucial requirements before consuming datasets for any application is to understand the dataset at hand and its metadata. The process of metadata discovery is known as data profiling. Profiling activities range from ad hoc approaches, such as eyeballing random subsets of the data or formulating aggregation queries, to the systematic inference of metadata via profiling algorithms. In this course, we discuss the importance of data profiling as part of any data-related use case, and shed light on the area by classifying data profiling tasks and reviewing state-of-the-art data profiling systems and techniques. In particular, we discuss hard problems in data profiling, such as algorithms for dependency discovery, and their application in data management and data analytics. We conclude with directions for future research in the area of data profiling.
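
To make the two ends of that range concrete, the following is a minimal Python sketch, not taken from the article. It assumes a table held in memory as a list of dictionaries. The helper names `profile_column` and `fd_holds`, and the sample rows, are hypothetical and chosen for illustration only. The first helper computes simple aggregate metadata for one column; the second naively checks whether a single functional dependency holds, which is the kind of test that dependency discovery algorithms must perform over many candidate column combinations.

```python
def profile_column(rows, column):
    """Basic single-column metadata: row count, null count, distinct count."""
    values = [row.get(column) for row in rows]
    non_null = [v for v in values if v is not None]
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
    }


def fd_holds(rows, lhs, rhs):
    """Naive check that the functional dependency lhs -> rhs holds in rows.

    lhs is a list of column names, rhs a single column name. The dependency
    holds if rows that agree on all lhs columns also agree on rhs.
    """
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        value = row[rhs]
        # Remember the first rhs value seen for this key; any later
        # disagreement violates the dependency.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample data, for illustration only.
rows = [
    {"zip": "10115", "city": "Berlin", "name": "Ada"},
    {"zip": "10115", "city": "Berlin", "name": "Bob"},
    {"zip": "14482", "city": "Potsdam", "name": None},
]

name_profile = profile_column(rows, "name")
zip_determines_city = fd_holds(rows, ["zip"], "city")
city_determines_name = fd_holds(rows, ["city"], "name")
```

Checking one dependency takes a single pass over the data. Discovery is hard because the number of candidate left-hand sides grows exponentially with the number of columns, so a sketch like this does not scale beyond small tables.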

Cite

Abedjan, Z. (2018). An Introduction to Data Profiling. In Lecture Notes in Business Information Processing (Vol. 324, pp. 1–20). Springer Verlag. https://doi.org/10.1007/978-3-319-96655-7_1
