Machine learning for genomic prediction of growth traits in aquaculture: a case study of the Australasian snapper (Chrysophrys auratus)

Ze Chen; Julie Blommaert; Yi Mei; Linley Jesson; Maren Wellenreuther; Mengjie Zhang

Journal Article, Open Access

BMC Bioinformatics (2025) 26(1)

DOI: 10.1186/s12859-025-06287-x

Abstract

Background: Chrysophrys auratus (family: Sparidae), commonly known as Australasian snapper, is a warm-water species being developed as a candidate for aquaculture in New Zealand. Genomic selection of elite snapper offers significant potential to accelerate genetic gains in aquaculture; however, the complexity of the genetic architecture, together with missing data and high dimensionality, poses significant hurdles. Machine learning techniques have emerged as powerful tools in genomic selection programmes because they are flexible and can model complex, polygenic and non-linear relationships between genotypes and traits. This study aims to develop a comprehensive machine learning framework to evaluate imputation methods and genomic prediction models and to identify single-nucleotide polymorphisms associated with growth traits in snapper, ultimately contributing to the advancement of selective breeding programmes.
Results: We evaluated multiple approaches for each component of the machine learning framework. We developed the Domain Knowledge-based K-nearest neighbour (DK-KNN) imputation method, which achieved an imputation accuracy of 98.33% in simulation testing and outperformed two alternative imputation methods. Among the feature selection and classification combinations evaluated for growth prediction, Chi-squared feature selection paired with Distance-Weighted Discrimination (Chi2-DWD) achieved 60% prediction accuracy, comparable to genomic best linear unbiased prediction (60.3%) but without requiring the genomic relationship matrix. Adding Domain Knowledge-based Pre-filtering (DK Pre-filtering) as a first stage did not substantially change prediction accuracy, and it reduced the dimensionality of the feature space.
Conclusions: Integrating domain knowledge into machine learning frameworks effectively addresses the missing-value and high-dimensionality challenges in snapper genomic data. The evaluated framework shows that Chi2-DWD is a promising combination for genomic prediction tasks. DK Pre-filtering removes redundant features without affecting model performance. Biological analysis confirmed that the selected features are associated with growth traits, providing valuable insights for selective breeding programmes.
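
As a rough illustration of two steps the abstract names, the sketch below shows (i) a plain k-nearest-neighbour genotype imputation scored by masking known calls, and (ii) ranking SNPs with a Pearson chi-squared test against a binary growth class. It is not the paper's DK-KNN or Chi2-DWD pipeline: the domain-knowledge component and the DWD classifier are omitted, and the toy genotypes (coded 0/1/2), the class labels, the `MISSING` marker and all function names are assumptions made for illustration only.

```python
from collections import Counter

MISSING = None  # hypothetical marker for an uncalled genotype


def mismatch_rate(a, b):
    """Share of differing calls over SNPs observed in both samples."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not MISSING and y is not MISSING]
    if not pairs:
        return 1.0  # no shared SNPs: treat the samples as maximally distant
    return sum(x != y for x, y in pairs) / len(pairs)


def knn_impute(genotypes, k=3):
    """Fill each missing call with the most common call among the k nearest samples."""
    imputed = [row[:] for row in genotypes]
    for i, row in enumerate(genotypes):
        for j, call in enumerate(row):
            if call is not MISSING:
                continue
            # Only samples with an observed call at SNP j can act as donors.
            donors = sorted(
                (mismatch_rate(row, other), idx)
                for idx, other in enumerate(genotypes)
                if idx != i and other[j] is not MISSING
            )
            votes = Counter(genotypes[idx][j] for _, idx in donors[:k])
            if votes:
                imputed[i][j] = votes.most_common(1)[0][0]
    return imputed


def masked_accuracy(truth, imputed, masked_cells):
    """Fraction of deliberately masked calls that imputation recovered."""
    hits = sum(imputed[i][j] == truth[i][j] for i, j in masked_cells)
    return hits / len(masked_cells)


def chi2_score(snp_column, labels):
    """Pearson chi-squared statistic of the genotype-by-class contingency table."""
    n = len(labels)
    observed = Counter(zip(snp_column, labels))
    geno_totals = Counter(snp_column)
    class_totals = Counter(labels)
    stat = 0.0
    for g, g_n in geno_totals.items():
        for c, c_n in class_totals.items():
            expected = g_n * c_n / n
            stat += (observed[(g, c)] - expected) ** 2 / expected
    return stat


def top_k_snps(genotypes, labels, k):
    """Indices of the k SNPs with the largest chi-squared statistic."""
    scores = [chi2_score(col, labels) for col in zip(*genotypes)]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


# Toy data (assumed): 6 fish x 5 SNPs, labels 0 = slower growth, 1 = faster growth.
genotypes = [
    [0, 0, 2, 2, 1],
    [0, 0, 2, 2, 0],
    [0, 2, 2, 2, 1],
    [2, 2, 0, 0, 1],
    [2, 2, 0, 0, 0],
    [2, 2, 0, 2, 0],
]
labels = [0, 0, 0, 1, 1, 1]

# Hide two known calls, impute them, and score the recovery.
masked_cells = [(1, 0), (4, 2)]
masked = [row[:] for row in genotypes]
for i, j in masked_cells:
    masked[i][j] = MISSING

imputed = knn_impute(masked, k=3)
imputation_accuracy = masked_accuracy(genotypes, imputed, masked_cells)
selected = top_k_snps(genotypes, labels, k=2)

print("imputation accuracy on masked calls:", imputation_accuracy)
print("selected SNP indices:", selected)
```

The masking step mirrors the general idea of testing imputation in simulation: calls whose true value is known are hidden, imputed, and compared with the truth. The selected SNP indices would then be passed to a classifier of choice.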

Citation (APA)

Chen, Z., Blommaert, J., Mei, Y., Jesson, L., Wellenreuther, M., & Zhang, M. (2025). Machine learning for genomic prediction of growth traits in aquaculture: a case study of the Australasian snapper (Chrysophrys auratus). BMC Bioinformatics, 26(1). https://doi.org/10.1186/s12859-025-06287-x
