Center for Integrated Breeding Research, University of Göttingen
Disclosure: Disclosure information not submitted.
Crop performance depends on complex interactions between genetic background and environmental conditions. Weather-related covariates can be used to characterize environments in plant breeding trials. Machine learning techniques may be capable of identifying the environmental variables that most strongly affect important traits. In multi-environment datasets, machine learning approaches can also sometimes predict phenotypes more accurately than linear models. To explore the promise of machine learning for identifying critical weather-related variables and using this information to predict complex traits in maize hybrids, we leveraged the publicly available resources of the Maize Genomes to Fields (G2F) Initiative. We derived a set of basic environmental variables covering four maize growth stages: early vegetative, late vegetative, flowering, and grain filling. Additional environmental variables that exploit prior knowledge of crop physiology were also used. As expected, many variables were highly correlated, for example heat stress covariates with minimum, maximum, and average temperatures. Therefore, feature selection methods (features are referred to here as variables) were compared in order to obtain a subset of the most relevant variables to use as covariates in our models. Recursive feature elimination and Lasso regularization identified a common set of environmental variables explaining most of the trait variability, for instance heat stress at the flowering stage. In addition, genotyping-by-sequencing (GBS) data of inbred lines were processed to obtain synthetic genotypic data for approximately 2,200 maize hybrids, with 250,000 SNPs remaining for further analyses. Our initial results indicate that models with environmental covariates can sometimes outperform models that use year-location covariates for prediction. In this talk, we further dissect predictability by describing the potential of machine learning methodologies for feature selection and hybrid prediction.
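As a rough illustration of how stage-specific environmental variables of the kind described above might be derived, the following minimal sketch summarizes daily temperatures into per-stage covariates. The stage windows (in days after planting), the 30 °C heat-stress threshold, and the names `STAGES`, `HEAT_THRESHOLD_C`, and `stage_covariates` are all assumptions made for illustration; they are not the definitions used in the study.

```python
from statistics import mean

# Hypothetical stage windows in days after planting (illustrative only;
# not the boundaries used in the study).
STAGES = {
    "early_vegetative": (0, 30),
    "late_vegetative": (30, 60),
    "flowering": (60, 80),
    "grain_filling": (80, 120),
}
HEAT_THRESHOLD_C = 30.0  # assumed cutoff on daily maximum temperature


def stage_covariates(tmin, tmax, stages=STAGES, threshold=HEAT_THRESHOLD_C):
    """Summarize daily temperatures into per-stage covariates.

    tmin, tmax: equal-length lists of daily minimum and maximum
    temperature (deg C), indexed by days after planting.
    """
    covariates = {}
    for stage, (start, end) in stages.items():
        lo = tmin[start:end]
        hi = tmax[start:end]
        if not hi:
            continue  # season is shorter than this stage window
        covariates[f"{stage}_tmin_mean"] = mean(lo)
        covariates[f"{stage}_tmax_mean"] = mean(hi)
        covariates[f"{stage}_tavg_mean"] = mean(
            (a + b) / 2 for a, b in zip(lo, hi)
        )
        # Count of days whose maximum temperature exceeds the threshold.
        covariates[f"{stage}_heat_days"] = sum(1 for t in hi if t > threshold)
    return covariates
```

Covariates built this way are correlated by construction (heat-stress days track the maximum and average temperatures within the same stage), which is the situation that motivates a variable selection step before modeling.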