Modeling Information Quality Risk for Data Mining and Case Studies

  • Su Y

Abstract

Today, information is a vital business asset. For institutional and individual processes that depend on information, information quality (IQ) is one of the key determinants of the quality of their decisions and actions (Hand et al., 2001; W. Kim et al., 2003; Mucksch et al., 1996). Data mining (DM) technology can discover hidden relationships, patterns and interdependencies and generate rules to predict the correlations in data warehouses (Y. Su et al., 2009c). However, only a few companies have implemented these technologies, because they cannot clearly measure the quality of the data and, consequently, the quality risk of the information derived from the data warehouse (Fisher et al., 2003). Without this ability, it is difficult for companies to estimate the cost of poor information to the organization (D. Ballou, Madnick, & Wang, 2003). For these reasons, risk management of IQ for DM has been identified as a critical issue for companies. Therefore, we develop a methodology to model the quality risk of information based on the quality of the source databases and the associated DM processes. The rest of this chapter is organized as follows. After a review of the relevant literature in Section 2, Section 3 introduces a formal model for data warehousing and DM that attempts to support quality risks at different levels. Section 4 discusses the different quality risks that need to be considered for the output of the Restriction, Projection and Cubic product operators. Section 5 describes an information quality assurance exercise undertaken for a finance company as part of a larger project in auto finance marketing. A methodology to estimate the effects of data accuracy, completeness and consistency on the aggregate functions Count, Sum and Average is presented (Y. Su et al., 2009a).
The methodology should be of specific interest to quality assurance practitioners on projects that harvest warehouse data for management decision support. The assessment comprised ten checks in three broad categories to ensure the quality of information collected over 1103 attributes. It discovered four critical gaps in the data that had to be corrected before the data could be transitioned to the analysis phase. Section 6 applies the above methodology to evaluate two information quality characteristics, accuracy and completeness, for the HIS database. Four quantitative measures are introduced to assess the risk of medical information quality. The methodology is illustrated through a medical domain: infection control. The results show that the methodology was effective in detecting and averting risk factors (Y. Su et al., 2009b).
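
As a rough illustration of why completeness matters for the aggregate functions Count, Sum and Average, the following minimal sketch computes a simple completeness ratio and shows how missing values distort each aggregate. It is not the chapter's model: the function names, the sample data, and the assumption that missing values resemble the observed ones are all hypothetical, and accuracy and consistency are not modeled here.

```python
from typing import Optional, Sequence


def completeness(values: Sequence[Optional[float]]) -> float:
    """Fraction of records whose value is present (not None)."""
    if not values:
        return 0.0
    return sum(v is not None for v in values) / len(values)


def observed_aggregates(values: Sequence[Optional[float]]) -> dict:
    """Count, Sum and Average over the values that are actually present."""
    present = [v for v in values if v is not None]
    count = len(present)
    total = sum(present)
    average = total / count if count else None
    return {"count": count, "sum": total, "average": average}


def scaled_sum(values: Sequence[Optional[float]]) -> Optional[float]:
    """Naive estimate of the true Sum.

    Assumption (hypothetical, for illustration only): missing values
    behave like the observed ones, so the observed average is applied
    to every record, present or missing.
    """
    agg = observed_aggregates(values)
    if agg["average"] is None:
        return None
    return agg["average"] * len(values)


# Hypothetical loan amounts; None marks a missing value.
amounts = [10.0, 20.0, None, 30.0]
ratio = completeness(amounts)
observed = observed_aggregates(amounts)
estimate = scaled_sum(amounts)
```

In this sketch Count and Sum are understated whenever values are missing, while Average is unaffected only under the stated assumption that missing values look like the observed ones. If that assumption fails, Average is biased as well.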

Cite

CITATION STYLE

APA

Su, Y. (2011). Modeling Information Quality Risk for Data Mining and Case Studies. In New Fundamental Technologies in Data Mining. InTech. https://doi.org/10.5772/13928
