Abstract
Integration of data sources to build a data warehouse (DW) refers to the task of developing a common schema, as well as data transformation solutions, for a number of data sources with related content. The large number and size of modern data sources make the integration process cumbersome, so in such cases the dimensionality of the data is reduced before the DW is populated. Attribute subset selection on the basis of relevance analysis is one way to reduce dimensionality. Relevance analysis of attributes is performed by means of correlation analysis, which detects redundant attributes, i.e., attributes that do not contribute significantly to the characteristics of the data as a whole. A redundant attribute, or an attribute strongly correlated with some other attribute, is then excluded from the DW. Automated tools based on existing attribute subset selection methods may not yield an optimal set of attributes, which may degrade the performance of the DW. Various researchers have used the genetic algorithm (GA) as an optimization tool, but most of them use it to search for the best technique among the available attribute selection techniques. This paper formulates and validates a method for selecting an optimal correlation-based attribute subset, in which the GA itself is used as the search tool over subsets of attributes.
General Terms: Data Warehousing, Data Mining, Genetic Algorithms.
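The abstract describes the approach only at a high level: correlation analysis flags redundant attributes, and a GA searches over candidate attribute subsets. The following is a minimal Python sketch of that idea under stated assumptions, not the authors' method. The fitness function (keep as many attributes as possible, but penalise every kept pair whose absolute Pearson correlation exceeds a threshold), the 0.9 threshold, and the GA settings (bit-mask encoding, tournament selection, one-point crossover, bit-flip mutation, elitism) are all assumptions chosen for illustration, and the names `fitness` and `select_attributes` are hypothetical.

```python
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # treat a constant column as uncorrelated
    return cov / (vx * vy) ** 0.5


def correlation_matrix(columns):
    """Pairwise Pearson correlations between all attribute columns."""
    k = len(columns)
    return [[pearson(columns[i], columns[j]) for j in range(k)] for i in range(k)]


def fitness(mask, corr, threshold):
    """Hypothetical fitness: +1 per kept attribute, -2 per kept redundant pair."""
    kept = [i for i, bit in enumerate(mask) if bit]
    redundant = sum(1 for i, j in combinations(kept, 2)
                    if abs(corr[i][j]) > threshold)
    return len(kept) - 2 * redundant


def select_attributes(columns, threshold=0.9, pop_size=30, generations=60,
                      mutation_rate=0.05, seed=0):
    """Search bit-mask attribute subsets with a simple GA; return kept indices."""
    rng = random.Random(seed)
    corr = correlation_matrix(columns)
    k = len(columns)

    def score(mask):
        return fitness(mask, corr, threshold)

    # Each individual is a bit mask: bit i == 1 means attribute i is kept.
    population = [[rng.randint(0, 1) for _ in range(k)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        next_gen = population[:2]  # elitism: the two best survive unchanged
        while len(next_gen) < pop_size:
            # Tournament selection: best of three random individuals, twice.
            p1 = max(rng.sample(population, 3), key=score)
            p2 = max(rng.sample(population, 3), key=score)
            # One-point crossover (no cut possible with a single attribute).
            cut = rng.randrange(1, k) if k > 1 else 0
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation.
            child = [1 - b if rng.random() < mutation_rate else b for b in child]
            next_gen.append(child)
        population = next_gen
    best = max(population, key=score)
    return [i for i, bit in enumerate(best) if bit]


if __name__ == "__main__":
    a = [1, 2, 3, 4, 5, 6, 7, 8]
    b = [2 * x + 0.1 * (-1) ** x for x in a]  # nearly a linear copy of a
    c = [5, 1, 4, 2, 8, 3, 7, 6]              # weakly correlated with a and b
    # Expected: attribute 2 plus exactly one of attributes 0 and 1.
    print(select_attributes([a, b, c]))
```

With a penalty of 2 per redundant pair, dropping one attribute of a redundant pair loses 1 point but removes at least 2 points of penalty, so the search is pushed toward subsets in which no two kept attributes are strongly correlated while otherwise keeping as many attributes as possible.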
Citation
Tiwari, R., & Singh, M. P. (2010). Correlation-based Attribute Selection using Genetic Algorithm. International Journal of Computer Applications, 4(8), 28–34. https://doi.org/10.5120/847-1182