The vast range of data mining algorithms available for learning classification problems has encouraged a trial-and-error approach to finding the best model. This problem is exacerbated by the fact that little is known about which techniques are suited to which types of problems. This paper provides some insights into the data characteristics that suit particular data mining algorithms. Our approach consists of four main stages. First, the performance of six leading data mining algorithms is examined across a collection of 57 well-known classification problems from the machine learning literature. Secondly, a collection of statistics that describe each of the 57 problems in terms of data complexity is collated. Thirdly, a self-organising map (SOM) is used to cluster the 57 problems based on these measures of complexity. Each cluster represents a group of classification problems with similar data characteristics. The performance of each data mining algorithm within each cluster is then examined in the final stage to provide both quantitative and qualitative insights into which techniques perform best on certain problem types.
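The third and fourth stages can be illustrated with a minimal sketch. It is not the authors' implementation: the SOM grid size, the learning schedule, the number of complexity measures (five here) and every function name are assumptions made for illustration, and the input arrays are random placeholders standing in for the real complexity statistics and accuracy figures, which the abstract does not give.

```python
import numpy as np

def train_som(X, rows=3, cols=3, n_iter=500, lr0=0.5, sigma0=1.5, seed=0):
    """Train a small rectangular SOM; returns weights of shape (rows, cols, n_features)."""
    rng = np.random.default_rng(seed)
    weights = rng.random((rows, cols, X.shape[1]))
    # Grid coordinates of every node, used by the neighbourhood function.
    grid = np.stack(
        np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1
    )
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 1e-3     # shrinking neighbourhood radius
        x = X[rng.integers(len(X))]              # one randomly chosen problem
        # Best matching unit: the node whose weight vector is closest to x.
        dists = np.linalg.norm(weights - x, axis=-1)
        bmu = np.unravel_index(np.argmin(dists), dists.shape)
        # Gaussian neighbourhood centred on the best matching unit.
        grid_d2 = np.sum((grid - np.array(bmu)) ** 2, axis=-1)
        h = np.exp(-grid_d2 / (2.0 * sigma ** 2))
        weights += lr * h[..., None] * (x - weights)
    return weights

def assign_clusters(X, weights):
    """Map each problem to the index of its nearest SOM node."""
    flat = weights.reshape(-1, weights.shape[-1])
    d = np.linalg.norm(X[:, None, :] - flat[None, :, :], axis=-1)
    return np.argmin(d, axis=1)

def mean_accuracy_per_cluster(clusters, accuracy):
    """Average each algorithm's accuracy over the problems in each cluster."""
    return {int(c): accuracy[clusters == c].mean(axis=0) for c in np.unique(clusters)}

# Placeholder inputs (random, NOT the paper's data):
# 57 problems x 5 hypothetical complexity measures, and 57 problems x 6 algorithms.
rng = np.random.default_rng(42)
complexity = rng.random((57, 5))
accuracy = rng.random((57, 6))

# Min-max scale each measure so no single statistic dominates the distance.
span = complexity.max(axis=0) - complexity.min(axis=0)
scaled = (complexity - complexity.min(axis=0)) / span

weights = train_som(scaled)
clusters = assign_clusters(scaled, weights)
summary = mean_accuracy_per_cluster(clusters, accuracy)

for node, means in summary.items():
    print(f"node {node}: best algorithm index = {int(np.argmax(means))}")
```

Each populated SOM node plays the role of one cluster of similar problems, and the per-node mean accuracies give the kind of quantitative comparison the final stage describes. With real data, the placeholder arrays would be replaced by the measured complexity statistics and the observed performance of the six algorithms.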
Smith, K. A., Woo, F., Ciesielski, V., & Ibrahim, R. (2002). Matching Data Mining Algorithm Suitability to Data Characteristics Using a Self-Organizing Map. In Hybrid Information Systems (pp. 169–179). Physica-Verlag HD. https://doi.org/10.1007/978-3-7908-1782-9_13