The training population is a critical component of genomic selection (GS): it
supplies the genotypes and phenotypes used to build the prediction models that
estimate the breeding values of individuals from their genotypic information.
Its value lies in how well it captures the genetic diversity and phenotypic
variation present in the breeding germplasm. Key aspects of the training
population, and important considerations for its creation, are outlined below:
Representativeness:
· The training population should represent the genetic diversity present in the
breeding germplasm. It should include individuals from diverse genetic
backgrounds, including different breeding lines, landraces, wild relatives, and
elite cultivars.
· Care should be taken to ensure that the training population adequately covers
the range of genetic variation for the traits of interest, including both
favorable and unfavorable alleles.
Phenotypic Data:
· Phenotypic data collected on individuals in the training population are
essential for establishing the relationship between marker genotypes and
phenotypic traits.
· Phenotypic data should be collected under relevant environmental conditions
and using standardized protocols to ensure consistency and accuracy.
· Traits measured should be heritable and economically important for the
breeding program. Where possible, data should be collected on multiple traits,
since information from correlated traits can enhance the prediction accuracy of
the model.
Marker Density:
· The marker density in the training population should be sufficient to capture
the genetic variation present in the breeding germplasm.
· High-density genotyping platforms, such as single nucleotide polymorphism
(SNP) arrays or genotyping-by-sequencing (GBS), are often used to genotype
individuals in the training population to ensure comprehensive coverage of the
genome.
Population Size:
· The size of the training population should be large enough to capture the
genetic complexity of the traits being targeted.
· A larger training population generally leads to more accurate prediction
models. The gain is most important for traits with low heritability or traits
controlled by many genes, which need more records to reach a given accuracy.
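The relationship between population size, heritability, and accuracy can be made concrete with a widely cited deterministic approximation, usually attributed to Daetwyler and colleagues: r = sqrt(N h² / (N h² + Me)), where N is the training population size, h² the trait heritability, and Me the number of independent chromosome segments. The sketch below is only an illustration of that approximation; the function name `expected_accuracy` and the value of Me are assumptions chosen for the example, not values from this article.

```python
import math


def expected_accuracy(n_train: int, h2: float, m_e: float) -> float:
    """Approximate expected prediction accuracy r = sqrt(N*h2 / (N*h2 + Me)).

    n_train: training population size (N)
    h2:      trait heritability, between 0 and 1
    m_e:     number of independent chromosome segments (Me), assumed known
    """
    if n_train < 0 or not 0.0 <= h2 <= 1.0 or m_e <= 0:
        raise ValueError("need n_train >= 0, 0 <= h2 <= 1 and m_e > 0")
    signal = n_train * h2
    return math.sqrt(signal / (signal + m_e))


# Illustrative value only: Me depends on the species and population.
ASSUMED_ME = 1000.0

for h2 in (0.2, 0.6):
    for n in (200, 500, 1000, 2000):
        r = expected_accuracy(n, h2, ASSUMED_ME)
        print(f"h2={h2:.1f}  N={n:4d}  expected accuracy={r:.2f}")
```

Under this approximation a low-heritability trait needs a considerably larger training population than a high-heritability trait to reach the same expected accuracy, which is the pattern described in the bullets above.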
Population Structure and Relatedness:
· Population structure and relatedness among individuals in the training
population can influence the accuracy of genomic predictions.
· Strategies such as controlling for population structure using principal
component analysis (PCA) or incorporating kinship matrices into prediction
models can help account for genetic relatedness and population stratification.
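Both strategies can be sketched from a marker matrix alone. The example below builds a genomic relationship (kinship) matrix using the commonly used VanRaden method 1 formula and then extracts principal components from it. It is a minimal sketch on simulated data: the 0/1/2 genotype coding, the function names, and the two simulated subpopulations are all assumptions made for illustration.

```python
import numpy as np


def vanraden_grm(genotypes):
    """Genomic relationship matrix (VanRaden method 1).

    genotypes: (n_individuals, n_markers) array coded 0/1/2 as the count
    of one allele per marker (an assumed coding for this sketch).
    """
    m = np.asarray(genotypes, dtype=float)
    p = m.mean(axis=0) / 2.0              # allele frequency per marker
    z = m - 2.0 * p                       # center each marker column
    denom = 2.0 * np.sum(p * (1.0 - p))   # scales G to a kinship-like scale
    if denom == 0.0:
        raise ValueError("all markers are monomorphic")
    return z @ z.T / denom


def top_principal_components(grm, n_components=2):
    """PC scores from the eigendecomposition of a symmetric matrix."""
    eigvals, eigvecs = np.linalg.eigh(grm)
    order = np.argsort(eigvals)[::-1][:n_components]
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale


# Simulated data: two subpopulations with independent allele frequencies.
rng = np.random.default_rng(0)
n_per_group, n_markers = 20, 500
freq_a = rng.uniform(0.05, 0.95, n_markers)
freq_b = rng.uniform(0.05, 0.95, n_markers)
geno = np.vstack([
    rng.binomial(2, freq_a, size=(n_per_group, n_markers)),
    rng.binomial(2, freq_b, size=(n_per_group, n_markers)),
])

grm = vanraden_grm(geno)
pcs = top_principal_components(grm, n_components=2)
print("mean PC1, group A:", pcs[:n_per_group, 0].mean())
print("mean PC1, group B:", pcs[n_per_group:, 0].mean())
```

In practice the leading principal components can be added as covariates, or the relationship matrix can be supplied directly to a mixed model such as GBLUP; which is preferable depends on the population and the model.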
Cross-Validation:
· Cross-validation techniques, such as leave-one-out cross-validation or k-fold
cross-validation, are commonly used to assess the predictive ability of the
model and validate its performance.
· The training population is typically divided into training and validation
sets, with the prediction model trained on the training set and evaluated on
the validation set to estimate prediction accuracy.
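The procedure described above can be sketched as k-fold cross-validation around a simple ridge regression on markers, which stands in here for an RR-BLUP-style model. Everything in the example is an assumption for illustration: the simulated genotypes and phenotypes, the function names, and the fixed shrinkage parameter `lam`, which in practice would be estimated from variance components or tuned. Predictive ability is measured as the correlation between predicted and observed phenotypes in each validation fold.

```python
import numpy as np


def ridge_fit(x, y, lam):
    """Ridge regression of phenotype on markers; returns (effects, intercept).

    Uses the dual (n x n) form, which is cheaper when there are more
    markers than individuals.
    """
    x_mean, y_mean = x.mean(axis=0), y.mean()
    xc, yc = x - x_mean, y - y_mean
    n = xc.shape[0]
    alpha = np.linalg.solve(xc @ xc.T + lam * np.eye(n), yc)
    beta = xc.T @ alpha
    return beta, y_mean - x_mean @ beta


def kfold_accuracy(x, y, k=5, lam=1.0, seed=0):
    """Correlation between predicted and observed phenotype in each fold."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    accuracies = []
    for i, valid_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        beta, intercept = ridge_fit(x[train_idx], y[train_idx], lam)
        predicted = x[valid_idx] @ beta + intercept
        accuracies.append(np.corrcoef(predicted, y[valid_idx])[0, 1])
    return np.array(accuracies)


# Simulated example: 200 individuals, 300 markers, 20 causal markers.
rng = np.random.default_rng(1)
n, m = 200, 300
X = rng.binomial(2, 0.5, size=(n, m)).astype(float)
effects = np.zeros(m)
effects[:20] = rng.normal(size=20)
g = X @ effects
y = g + rng.normal(scale=g.std(), size=n)   # heritability of roughly 0.5

accs = kfold_accuracy(X, y, k=5, lam=float(m))
print("per-fold predictive ability:", np.round(accs, 2))
print("mean predictive ability:", round(float(accs.mean()), 2))
```

Random folds like these tend to give optimistic estimates when close relatives fall on both sides of the split, so folds are sometimes formed by family or by environment instead.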
Long-Term Stability:
· The training population should be maintained over time to ensure the
long-term stability and relevance of the prediction models.
· Regular updates to the training population may be necessary to incorporate
new germplasm, phenotypic data, or advances in genotyping technologies.
In summary, the training population is a foundational
element of genomic selection, providing the genetic and phenotypic data needed
to develop accurate prediction models for estimating the breeding values of
individuals. Careful consideration of representativeness, phenotypic data
quality, marker density, population size, population structure,
cross-validation, and long-term stability is essential during the creation of a
suitable training population to ensure the success of genomic selection in
plant breeding programs.