To conduct the research reported in this monograph, the Galaxy Zoo and SDSS data sets and the various algorithms utilised must be analysed extensively in order to establish the project's requirements. The principal requirement is to identify successfully the actual morphologies of the galaxies labelled as Uncertain in the Galaxy Zoo data set. In this chapter, the adopted methodology is analysed and shown to be the best fit for this project. The chapter also reviews the K-Means algorithm and the entropy-based Information Gain feature selection technique, which were chosen for clustering and for assessing the importance of the features, respectively. The innovative heuristic algorithm developed in this project to obtain the best attribute selection is presented and discussed in detail, along with the pre- and post-processing methods used throughout the data mining process.
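To make the entropy-based Information Gain criterion concrete, the following is a minimal sketch, not the authors' implementation, of how features can be ranked by the information they carry about a set of class labels (for example, morphology classes or cluster assignments). It assumes continuous features are discretised into equal-width bins before computing the gain; the bin count and all function names (`entropy`, `discretize`, `information_gain`, `rank_features`) are hypothetical choices for illustration. The gain for a feature X and labels Y is computed as IG(Y; X) = H(Y) − Σ_v P(X = v) · H(Y | X = v).

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def discretize(values, bins=4):
    """Map continuous values to equal-width bin indices (an assumed discretisation)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    # min(..., bins - 1) keeps the maximum value inside the last bin
    return [min(int((v - lo) / width), bins - 1) for v in values]


def information_gain(feature, labels, bins=4):
    """IG(Y; X) = H(Y) - sum_v P(X=v) * H(Y | X=v) for one discretised feature."""
    x = discretize(feature, bins)
    n = len(labels)
    groups = {}
    for xv, y in zip(x, labels):
        groups.setdefault(xv, []).append(y)
    # Weighted entropy of the labels within each bin of the feature
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(columns, labels, bins=4):
    """Return (name, IG) pairs sorted from most to least informative."""
    scores = {name: information_gain(vals, labels, bins)
              for name, vals in columns.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy usage: invented values, "E" = elliptical, "S" = spiral (hypothetical data)
labels = ["E", "E", "E", "S", "S", "S"]
columns = {
    "colour": [0.9, 0.85, 0.8, 0.2, 0.25, 0.3],  # separates the classes cleanly
    "noise": [1, 2, 1, 2, 1, 2],                 # only weakly related to the classes
}
ranking = rank_features(columns, labels)
print(ranking)  # "colour" ranks first
```

In this toy example, the perfectly separating feature receives the maximum gain of 1 bit, while the weakly related feature receives a gain close to zero. A feature selection step could keep the top-ranked attributes, although the heuristic used in this project to choose the final attribute set is not reproduced here.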
Edwards, K. J., & Gaber, M. M. (2014). Adopted Data Mining Methods. In Studies in Big Data (Vol. 6, pp. 31–42). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-319-06599-1_4