Evaluating Traditional and Machine Learning-Based GWAS Methods in the Context of Wheat Breeding

  • Milek, J. (Speaker)
  • Sebastian Michel (Author)
  • Alexander Buchelt (Author)
  • Andreas Holzinger (Author)
  • Molin, E. M. (Author)

Activity: Talk without proceedings / Lecture; Presentation at a scientific conference / workshop

Description

Modern genetic research methods, such as genome-wide association studies (GWAS), have
significantly advanced our understanding of the genetic architecture underlying important
agricultural traits, enabling the identification of key genes and regulatory networks. Despite
numerous methodological variants, GWAS still faces long-standing statistical challenges that machine learning
(ML) methods show promise in overcoming. In this study, we systematically benchmarked
GWAS tools and ML methods for identifying marker–trait associations (MTAs) in wheat, using
the publicly available CIMMYT dataset. Traditional GWAS tools, including GAPIT, GCTA,
GEMMA, sommer, and TASSEL, were evaluated with respect to computational efficiency, model
performance, and the consistency of detected associations. In parallel, ML approaches, such
as Elastic Net, Extreme Gradient Boosting, Random Forest, and the hybrid TSLRF model, were
assessed based on feature importance and functional annotation of selected markers. Despite
relying on similar mixed linear model frameworks, the traditional tools displayed notable
differences in runtime and in the number and overlap of detected MTAs. Several models also
exhibited high genomic inflation, complicating downstream interpretation. ML methods
successfully recovered known associations and additionally identified novel, potentially nonlinear
or epistatic signals overlooked by conventional approaches. In conclusion, our results
demonstrate that ML offers a powerful and complementary approach to traditional GWAS
methodologies in wheat genomics, enhancing the detection of relevant genetic signals and
providing robust, model-agnostic metrics for marker relevance. Whilst mixed linear models
remain well suited for correcting population structure and controlling false positives, they are
inherently limited in uncovering complex, non-additive genetic signals. Our results advocate for
the integration of ML into routine GWAS workflows in plant breeding, thereby enhancing trait
dissection and accelerating marker-assisted selection under complex genomic architectures.
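
Genomic inflation, reported above as a problem for several models, is commonly summarized by the inflation factor lambda_GC: the median of the observed 1-df association chi-square statistics divided by the median expected under the null (about 0.455). The following is a minimal, standard-library-only Python sketch, assuming two-sided p-values from 1-df tests; the function name `genomic_inflation` is hypothetical and is not part of GAPIT, GCTA, GEMMA, sommer or TASSEL. Values near 1 suggest well-calibrated test statistics, whereas values clearly above 1 point to inflation, for example from unaccounted population structure, cryptic relatedness or strong polygenicity.

```python
from statistics import NormalDist, median


def genomic_inflation(p_values):
    """Estimate the genomic inflation factor lambda_GC from GWAS p-values.

    Each two-sided p-value is mapped back to a 1-df chi-square statistic
    (z squared), and the observed median is divided by the median expected
    under the null hypothesis.
    """
    std_normal = NormalDist()
    q75 = std_normal.inv_cdf(0.75)
    expected_median = q75 * q75  # median of chi-square(1), about 0.4549
    chi2_stats = []
    for p in p_values:
        if not 0.0 < p <= 1.0:
            raise ValueError(f"p-value out of range: {p}")
        # Lower-tail normal quantile for p/2; its sign is irrelevant once squared.
        z = std_normal.inv_cdf(p / 2.0)
        chi2_stats.append(z * z)
    return median(chi2_stats) / expected_median


# Evenly spaced, null-like p-values give lambda_GC close to 1.
null_like = [(i + 0.5) / 1001 for i in range(1001)]
print(round(genomic_inflation(null_like), 3))  # 1.0
# Squaring the p-values makes every test look more significant, so lambda_GC rises above 1.
print(round(genomic_inflation([p * p for p in null_like]), 3))
```

Because lambda_GC condenses the whole p-value distribution into a single number, it is usually read together with a QQ plot rather than on its own.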
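
The overlap of detected MTAs between tools can be quantified in several ways. One simple option, assumed here for illustration rather than taken from the study, is the pairwise Jaccard index of each tool's set of significant markers. In the sketch below, the Bonferroni threshold, the helper names `significant_markers` and `pairwise_jaccard`, and the toy marker sets are all hypothetical placeholders, not results from the benchmark.

```python
from itertools import combinations


def significant_markers(p_values, alpha=0.05):
    """Return the marker IDs whose p-value passes a Bonferroni-corrected threshold.

    p_values maps marker ID -> p-value for one tool (hypothetical input format).
    """
    threshold = alpha / len(p_values)
    return {marker for marker, p in p_values.items() if p < threshold}


def pairwise_jaccard(mta_sets):
    """Jaccard index |A & B| / |A | B| for every pair of tools.

    mta_sets maps tool name -> set of significant marker IDs.
    Two empty sets are treated as identical (index 1.0).
    """
    scores = {}
    for (tool_a, set_a), (tool_b, set_b) in combinations(sorted(mta_sets.items()), 2):
        union = set_a | set_b
        scores[(tool_a, tool_b)] = len(set_a & set_b) / len(union) if union else 1.0
    return scores


# Toy example with made-up marker IDs and tool labels (not study results).
toy_p_values = {"m1": 1e-8, "m2": 0.04, "m3": 3e-7}
toy_mtas = {
    "tool_A": significant_markers(toy_p_values),  # {"m1", "m3"}
    "tool_B": {"m2", "m3", "m4"},
    "tool_C": {"m5"},
}
for pair, score in pairwise_jaccard(toy_mtas).items():
    print(pair, round(score, 2))
```

A Jaccard index of 1 means two tools report identical MTA sets and 0 means they share none; treating two empty sets as identical is a convention chosen for this sketch.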

Period: 22 Sept. 2025
Event title: 8th AMICI Symposium & Austrian Bioinformatics Workshop 2025
Event type: Conference
Location: Vienna
Degree of recognition: International

Research Field

  • Exploration of Biological Resources