What is Data Quality and Why Should We Care?

  • Herzog T
  • Scheuren F
  • Winkler W

Abstract

Caring about data quality is key to safeguarding and improving it. As stated, this sounds like a very obvious proposition. But can we, as the expression goes, "recognize it when we see it"? Considerable analysis and much experience make it clear that the answer is "no." Discovering whether data are of acceptable quality is a measurement task, and not a very easy one. This observation becomes all the more important in this information age, when explicit and meticulous attention to data is of growing importance if information is not to become misinformation.

This chapter provides foundational material for the specifics that follow in later chapters about ways to safeguard and improve data quality. [1] After identifying when data are of high quality, we give reasons why we should care about data quality and discuss how one can obtain high-quality data. Experts on quality (such as Redman [1996], English [1999], and Loshin [2001]) have been able to show companies how to improve their processes by first understanding the basic procedures the companies use and then showing new ways to collect and analyze quantitative data about those procedures in order to improve them. Here, we take as our primary starting point the work of Deming, Juran, and Ishikawa.

2.1. When Are Data of High Quality?

Data are of high quality if they are "Fit for Use" in their intended operational, decision-making and other roles. [2] In many settings, especially for intermediate products, it is also convenient to define quality as "Conformance to Standards" that have been set, so that fitness for use is achieved. These two criteria link the

[1] It is well recognized that quality must have undoubted top priority in every organization. As Juran and Godfrey [1999; pages 4-20, 4-21, and 34-9] make clear, quality has several dimensions, including meeting customer needs, protecting human safety, and protecting the environment. We restrict our attention to the quality of data, which can affect efforts to achieve quality in all three of these overall quality dimensions.

[2] Juran and Godfrey [1999].

Cite

CITATION STYLE

APA

Herzog, T. N., Scheuren, F. J., & Winkler, W. E. (2007). What is Data Quality and Why Should We Care? In Data Quality and Record Linkage Techniques (pp. 7–15). Springer New York. https://doi.org/10.1007/0-387-69505-2_2
