A core collection is an ideal resource for genome-wide association studies (GWAS), and a subcore collection is a subset of a core collection. A strategy based on Monte Carlo simulation was proposed for finding the optimal sampling percentage of plant subcore collections. A cotton germplasm group of 168 accessions with 20 quantitative traits was used to construct subcore collections. A mixed linear model approach was used to eliminate the environment effect and the genotype × environment (GE) effect. Subcore collections were constructed by the least distance stepwise sampling (LDSS) method, combining 6 commonly used genetic distances with the unweighted pair-group method with arithmetic mean (UPGMA) cluster method. A homogeneous population assessment method was used to assess the validity of 7 evaluating parameters of subcore collections. Monte Carlo simulation was conducted over the sampling percentage, the number of traits, and the evaluating parameters. A new method for "distilling free-form natural laws from experimental data" was then adopted to find the formula that best determines the optimal sampling percentage. The results showed that the coincidence rate of range (CR) was the most valid evaluating parameter and was suitable to serve as a threshold for finding the optimal sampling percentage. Principal component analysis showed that subcore collections constructed with the optimal sampling percentages calculated by the present strategy were representative. © 2014 Jiancheng Wang et al.
Wang, J., Guan, Y., Wang, Y., Zhu, L., Wang, Q., Hu, Q., & Hu, J. (2014). A strategy for finding the optimal scale of plant core collection based on Monte Carlo simulation. The Scientific World Journal, 2014. https://doi.org/10.1155/2014/503473
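The abstract names LDSS, CR, and a Monte Carlo search over sampling percentages without spelling out the computations. The Python sketch below illustrates one reading of that pipeline on simulated data; it is not the authors' implementation. Several assumptions are involved. Euclidean distance on z-scored traits stands in for the six genetic distances. Each LDSS step is taken to discard one member of the closest remaining pair of accessions, which is the first group UPGMA would merge. CR is computed as the mean ratio of subcore to core trait ranges, times 100. The simulated data, the 80% threshold, and all function names are hypothetical. The paper's fitted formula for the optimal percentage is not reproduced.

```python
import math
import random
import statistics


def standardize(data):
    """Z-score each trait (column) so traits on different scales weigh equally."""
    cols = list(zip(*data))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard against constant traits
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in data]


def distance_matrix(data):
    """Euclidean distances on standardized traits (assumed stand-in for a genetic distance)."""
    z = standardize(data)
    return [[math.dist(a, b) for b in z] for a in z]


def ldss(dist, fraction, rng):
    """Hypothetical LDSS sketch.

    Assumption: the least-distance group of a UPGMA tree is the closest remaining
    pair, so instead of re-clustering we find that pair directly, keep one member
    at random, and discard the other until `fraction` of accessions remain.
    """
    keep = list(range(len(dist)))
    target = max(2, round(fraction * len(dist)))
    while len(keep) > target:
        pairs = ((a, b) for i, a in enumerate(keep) for b in keep[i + 1:])
        a, b = min(pairs, key=lambda p: dist[p[0]][p[1]])
        keep.remove(rng.choice((a, b)))
    return keep  # stays sorted because we only remove


def coincidence_rate_of_range(data, subset):
    """CR (%): mean over traits of (subset range / full-collection range) x 100."""
    ratios = []
    for col in zip(*data):
        full = max(col) - min(col)
        sub = [col[k] for k in subset]
        ratios.append((max(sub) - min(sub)) / full if full else 1.0)
    return 100.0 * statistics.mean(ratios)


def mean_cr_curve(data, fractions, reps, seed=0):
    """Monte Carlo estimate of mean CR at each sampling fraction (LDSS is randomized)."""
    rng = random.Random(seed)
    dist = distance_matrix(data)
    return {
        f: statistics.mean(
            coincidence_rate_of_range(data, ldss(dist, f, rng)) for _ in range(reps)
        )
        for f in fractions
    }


def smallest_fraction_meeting(curve, threshold):
    """Smallest sampling fraction whose mean CR reaches `threshold` (None if none does)."""
    for f in sorted(curve):
        if curve[f] >= threshold:
            return f
    return None


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical simulated collection: 60 accessions x 5 normally distributed traits.
    data = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(60)]
    curve = mean_cr_curve(data, [0.1, 0.2, 0.3, 0.5], reps=5)
    for f, cr in sorted(curve.items()):
        print(f"{f:.0%}: mean CR = {cr:.1f}%")
    print("smallest fraction with mean CR >= 80%:", smallest_fraction_meeting(curve, 80.0))
```

Averaging CR over repeated randomized LDSS runs is what makes the estimate a Monte Carlo one. Because LDSS discards one member of each near-duplicate pair, it tends to retain accessions with extreme trait values, so CR can stay high even at small sampling percentages.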