Data cleaning and transformation Using the AJAX framework

6Citations
Citations of this article
15Readers
Mendeley users who have this article in their library.
Get full text

Abstract

Data quality problems arise in different application contexts and require appropriate handling so that information becomes reliable. Examples of data anomalies are: missing values, the existence of duplicates, misspellings, data inconsistencies and wrong data formats. Current technologies handle data quality problems through: (i) software programs written in a programming language (e.g., C or Java) or an RDBMS programming language, (ii) the integrity constraints mechanisms offered by relational database management systems; or (iii) using a commercial data quality tool. None of these approaches is appropriate when handling non-conventional data applications dealing with large amounts of information. In fact, the existing technology is not able to support the design of a data flow graph that effectively and efficiently produce clean data. AJAX is a data cleaning and transformation tool that overcomes these aspects. In this paper, we present an overview of the entire set of functionalities supported by the AJAX system. First, we explain the logical and physical levels of the AJAX framework, and the advantages brought in terms of specification and optimization of data cleaning programs. Second, the set of logical data cleaning and transformation operators is described and exemplified, using the declarative language proposed. Third, we illustrate the purpose of the debugging facility and how it is supported by the exception mechanism offered by logical operators. Finally, the architecture of the AJAX system is presented and experimental validation of the prototype is briefly referred. © Springer-Verlag Berlin Heidelberg 2006.

Cite

CITATION STYLE

APA

Galhardas, H. (2006). Data cleaning and transformation Using the AJAX framework. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4143 LNCS, pp. 327–343). Springer Verlag. https://doi.org/10.1007/11877028_12

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free