The Accuracy of Genomic Prediction between Environments and Populations for Soft Wheat Traits


Genomic selection (GS) uses training population (TP) data to estimate the value of lines in a selection population. In breeding, the TP and selection population are often grown in different environments, which can cause low prediction accuracy when the correlation of genetic effects between the environments is low. Subsets of TP data may be more predictive than the full TP data set. Our objectives were (i) to evaluate the effect of using subsets of TP data on GS accuracy between environments, and (ii) to assess the accuracy of models incorporating marker × environment interaction (MEI). Two wheat (Triticum aestivum L.) populations were phenotyped for 11 traits in independent environments and genotyped with single-nucleotide polymorphism markers. Within each population-trait combination, environments were clustered. Data from one cluster were used as the TP to predict the value of the same lines in the other cluster(s) of environments. Models were built using all TP data or subsets of markers selected for their effect and stability. The GS accuracy using all TP data was >0.25 for 9 of 11 traits. The between-environment accuracy was generally greatest using a subset of stable and significant markers; accuracy increased by up to 48% relative to using all TP data. We also assessed accuracy using each population as the TP and the other as the selection population. Neither the subsets of TP data nor the MEI models improved accuracy between populations. Using optimized subsets of markers within a population can improve GS accuracy by reducing noise in the prediction data set.
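The between-environment scheme can be illustrated with a minimal sketch. The abstract does not name the prediction model or the exact marker-selection thresholds, so the following are assumptions. Ridge regression on markers (an RR-BLUP-style shrinkage model) stands in for the GS model. Accuracy is taken as the Pearson correlation between the genomic estimated breeding values (GEBVs) fitted on one environment cluster and the observed values of the same lines in another cluster. The "stable and significant" filter uses a hypothetical rule: a marker's effect must have the same sign in every TP environment, and its mean absolute effect must fall in the top `top_frac`. The function names, `lam`, `top_frac`, and the synthetic data are illustrative only.

```python
import numpy as np


def ridge_effects(X, y, lam=1.0):
    """Ridge-regression marker effects (assumed RR-BLUP-style shrinkage).

    X: (lines x markers) genotype matrix; y: phenotypes of those lines.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    p = X.shape[1]
    return np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)


def gs_accuracy(X, y_tp, y_target, markers=None, lam=1.0):
    """Fit on the TP cluster (y_tp), then correlate the GEBVs of the same
    lines with their observed values in another cluster (y_target)."""
    Xs = X if markers is None else X[:, markers]
    beta = ridge_effects(Xs, y_tp, lam)
    gebv = (Xs - Xs.mean(axis=0)) @ beta
    return np.corrcoef(gebv, y_target)[0, 1]


def stable_significant_markers(X, Y_tp_envs, lam=1.0, top_frac=0.2):
    """Hypothetical filter: keep markers whose effect has the same sign in
    every TP environment (stable) and whose mean |effect| is in the top
    `top_frac` of markers (a stand-in for significance)."""
    B = np.column_stack(
        [ridge_effects(X, Y_tp_envs[:, e], lam) for e in range(Y_tp_envs.shape[1])]
    )
    same_sign = np.all(B > 0, axis=1) | np.all(B < 0, axis=1)
    mean_abs = np.abs(B.mean(axis=1))
    cutoff = np.quantile(mean_abs, 1 - top_frac)
    return np.where(same_sign & (mean_abs >= cutoff))[0]


# Synthetic illustration: 200 lines, 500 markers coded -1/0/1, 20 causal markers.
rng = np.random.default_rng(0)
n_lines, n_markers = 200, 500
X = rng.choice([-1.0, 0.0, 1.0], size=(n_lines, n_markers))
true_effects = np.zeros(n_markers)
true_effects[:20] = rng.normal(0.0, 1.0, 20)
g = X @ true_effects

# Cluster A (three TP environments) predicts cluster B (one environment).
Y_A = g[:, None] + rng.normal(0.0, 3.0, size=(n_lines, 3))
y_A = Y_A.mean(axis=1)
y_B = g + rng.normal(0.0, 3.0, size=n_lines)

acc_all = gs_accuracy(X, y_A, y_B)
subset = stable_significant_markers(X, Y_A)
acc_sub = gs_accuracy(X, y_A, y_B, markers=subset)
print(f"all markers: {acc_all:.3f}  stable+significant subset "
      f"({subset.size} markers): {acc_sub:.3f}")
```

Filtering markers on sign consistency across TP environments is one simple way to express "stability". Whether a subset beats the full marker set depends on the data, and this sketch does not reproduce the reported gains.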