The identification and selection of good quality data using pedigree matrix

Abstract

Most data-based studies require significant amounts of data to support their decision-making process. Beyond increasing data quantity, scientists tend to be aware that the quality of data influences the robustness of the results. A Pedigree matrix method is presented to characterize data quality aspects and quantify a quality rating. Five quality aspects (reliability, completeness, and temporal, geographical and technological representativeness) are defined as the characteristics that describe how well the reference data fit the underlying study. Reference rules for allocating the quality rating are defined subjectively; they enable a computer to select appropriate data effectively from among different data sources. The overall data quality rating, which reflects the quality level, is calculated and converted to a four-parameter Beta probability distribution for uncertainty quantification. This is complemented by Monte Carlo simulation, which identifies uncertainty hotspots so that the quality of the identified data can be further improved. The study provides an effective way to identify good-quality data through the definition of reference rules. Such rules help users capture the descriptive information regarding data quality and then assess quality levels consistently. The four-parameter Beta distribution is used for the quantitative transformation because it is appropriate for representing expert judgement; the definition of its parameters is therefore flexible and depends on the expert's understanding of uncertainty. This strength extends the application of the method to different data systems. Further research can focus on the development of reference rules for different quality aspects, as well as the integration of the Pedigree matrix in various data systems.
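
The workflow summarized in the abstract (score five aspects, select the best-rated source, map the rating to a four-parameter Beta distribution, then run a Monte Carlo simulation to locate uncertainty hotspots) can be illustrated with a minimal Python sketch. Nothing below is taken from the paper itself: the 1 (best) to 5 (worst) score scale, the use of a simple mean as the overall rating, the `BETA_BY_LEVEL` lookup table, the additive model of independent inputs, and all function and source names are assumptions made only for illustration.

```python
import random
import statistics

# The five quality aspects named in the abstract.
ASPECTS = ("reliability", "completeness", "temporal", "geographical", "technological")

# Hypothetical lookup (an assumption, not from the paper): rating level ->
# (alpha, beta, relative half-width of the range around the reported value).
BETA_BY_LEVEL = {
    1: (5.0, 5.0, 0.10),
    2: (4.0, 4.0, 0.20),
    3: (3.0, 3.0, 0.30),
    4: (2.0, 2.0, 0.40),
    5: (1.0, 1.0, 0.50),
}


def overall_rating(scores):
    """Aggregate aspect scores (assumed 1 = best, 5 = worst) with a simple mean."""
    missing = [a for a in ASPECTS if a not in scores]
    if missing:
        raise ValueError(f"missing aspect scores: {missing}")
    return sum(scores[a] for a in ASPECTS) / len(ASPECTS)


def select_best(sources):
    """Return the name of the source with the lowest (best) overall rating."""
    return min(sources, key=lambda name: overall_rating(sources[name]))


def beta_parameters(value, rating):
    """Map a positive value and its rating to (alpha, beta, low, high)."""
    level = min(5, max(1, round(rating)))
    alpha, beta, spread = BETA_BY_LEVEL[level]
    return alpha, beta, value * (1.0 - spread), value * (1.0 + spread)


def sample_beta4(alpha, beta, low, high, rng):
    """Draw from a four-parameter Beta by rescaling a standard Beta to [low, high]."""
    return low + (high - low) * rng.betavariate(alpha, beta)


def variance_shares(inputs, n=5000, seed=0):
    """Monte Carlo estimate of each input's share of the total variance.

    Assumes the model output is the sum of independent inputs, so the
    output variance is the sum of the input variances.
    """
    rng = random.Random(seed)
    variances = {}
    for name, (value, rating) in inputs.items():
        params = beta_parameters(value, rating)
        draws = [sample_beta4(*params, rng) for _ in range(n)]
        variances[name] = statistics.pvariance(draws)
    total = sum(variances.values())
    return {name: v / total for name, v in variances.items()}


# Hypothetical data sources and inputs, for illustration only.
sources = {
    "database_a": dict(zip(ASPECTS, (1, 2, 1, 2, 1))),
    "database_b": dict(zip(ASPECTS, (3, 4, 3, 4, 3))),
}
best = select_best(sources)

shares = variance_shares({"electricity": (100.0, 4.0), "transport": (100.0, 1.0)})
hotspot = max(shares, key=shares.get)
```

In this sketch, a poorer rating widens the range and flattens the Beta shape, so the worst-rated input tends to dominate the variance and is flagged as the hotspot. An expert could replace the lookup table with parameters that reflect their own understanding of the uncertainty, which is the flexibility the abstract attributes to the four-parameter Beta distribution.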

Cite

Chen, X., & Lee, J. (2021). The identification and selection of good quality data using pedigree matrix. In Smart Innovation, Systems and Technologies (Vol. 200, pp. 13–25). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-981-15-8131-1_2
