Abstract
In many cases, traditional analysis of breeding trials based on analysis of variance (ANOVA) do not allow a suitable genetic evaluation. Alternatively, mixed model-based approaches create the possibility of dealing with unbalanced data and modeling spatial trends. The aims of this study were to compare the goodness-of-fit of the model and the genotype ranking through different residual modeling approaches and to select the best performing tropical wheat genotypes based on the best-fitting model. A panel of tropical wheat genotypes was evaluated in three field trials conducted between 2020 and 2021 for grain yield. Linear mixed model analyses were used on the data to estimate the genetic parameters and to predict the genotypic values in analyses of single- and multi-environment trials. Accounting for spatial trends in the analyses of single- and multi-environment trials provides better outcomes than the compound symmetry model does.
Keywords: BLUP; mixed-model; spatial analysis; Triticum aestivum L
INTRODUCTION
Elite wheat selection candidates need to be evaluated in multi-site multi-year breeding trials to be recommended. Generally, data from multi-environment trials (MET) are unbalanced and have heterogeneous variances due to differences in environmental conditions, factors that hamper ordinary least square-based inferences. These difficulties are easily overcome using linear mixed models (Henderson 1975). This method deals with statistical and genotypic imbalance, and allows the modeling of covariance structures. Using linear mixed models, all decisions are based on restricted maximum likelihood (REML) estimates and best linear unbiased predictors (BLUP), which penalizes the adjusted mean by the amount of available information, increasing the correlation between the true and predicted genotypic values.
Linear mixed models are also fitted to deal with spatially correlated data. Dependence between plots can be caused by external sources of variation, such as soil heterogeneity, disease or pest outbreaks, and inappropriate crop management or experimental designs (Burgueño et al. 2018). The first approaches proposed to deal with spatial trends consisted of adjusting plot results for spatial variability using information from neighboring plots (Wilkinson et al. 1983). Later, statistical models were proposed to sequentially fit a class of autoregressive integrated moving averages to the plot errors in row or column directions (Gleeson and Cullis 1987). This model was further extended to two directions, assuming that rows and columns in the field are regularly spaced (Cullins and Gleeson 1991). Currently, one of the most used models to account for spatial trends is the first-order separable autoregressive model for rows and columns, which models the residual variance-covariance matrix in a two-dimensional process and decomposes the residual variation into correlated and uncorrelated residuals (Gilmour et al. 1997).
Controlling external sources of variation by modeling the residual term in single- and multi-environment trials can increase the reliability of selection and genetic gains (Lado et al. 2013, Gogel et al. 2018). Nevertheless, this is often overlooked in tropical wheat breeding programs. In this study, the objective was to show how residual modeling in single- and multi-environment trials can be beneficial for wheat breeding. For that purpose, we compared the goodness-of-fit of the model and the genotype ranking through different residual modeling approaches, and selected the best performing tropical wheat genotypes for grain yield based on the best-fitting model.
MATERIAL AND METHODS
Genotypes and field trials
We evaluated 42 wheat lines from the UFV wheat breeding program and eight commercial checks for grain yield (kg ha-1) in three field trials (FT1, FT2, and FT3) in the Professor “Diogo Alves de Mello” experimental field (lat 20º 45’ 14” S, long 42º 52’ 55” W, and alt 648 m asl) at the Universidade Federal de Viçosa, Viçosa, Minas Gerais, Brazil. The soil is classified as an Oxisol (Santos et al. 2018), and the climate of the region is a monsoon-influenced humid subtropical climate with wet winters and hot summers, average annual rainfall between 1300 and 1600 mm, and average annual temperature of 21 ºC (Alvares et al. 2013).
FT1 was conducted during the summer of 2020, and FT2 and FT3 were conducted during the winter of 2020 and 2021, respectively. We laid out the trials in a randomized complete block design with three replications. Plots consisted of five 5-m rows spaced at 0.20 m and had a population density of 350 seeds m-2. We performed agronomic practices according to the technical recommendations for wheat growing in the Brazilian South Central region.
Genetic and statistical analyses
We estimated the variance components through the restricted maximum likelihood (REML) method and predicted the genotypic values through best linear unbiased prediction (BLUP) (Patterson and Thompson 1971, Henderson 1975).
Single-environment trial analysis
We considered the following model for the individual analysis of each trial:
where is the vector of phenotypic observations; is the vector of fixed effects of blocks added to the overall mean; is the vector of random effects of genotypes, ~ (0; ), where is the genotypic variance; and is the vector of random residual effects, ~ (0; ), where is the residual variance. and are the incidence matrices for those effects.
We fitted four residual covariance structures for modeling the residual effects: the first model (NSPM) did not account for correlations among the rows or columns, i.e., , where is the residual variance, and and are identity matrices representing spatial independence in the row and column directions, respectively. The second model (SPM1) accounted for spatial correlations among rows, i.e., , where is the variance of spatially correlated residuals, is the first-order autoregressive correlation matrix for rows, and is the Kronecker product. The third model (SPM2) accounted for spatial correlations among columns, i.e., , where is the first-order autoregressive correlation matrix for columns. Finally, the fourth model (SPM3) accounted for correlations in both directions (rows and columns), i.e., .
Multi-environment trial analysis
For the multi-environment trial analyses, we used the following model:
where is the vector of phenotypic observations; is the vector of fixed effects of trials added to the overall mean; is the vector of fixed effects of replicates nested within trials; is the vector of random effects of genotypes, ~ (0; ); is the vector of the random effects of the genotype-by-environment interactions, ~ (0; ), where is the genotype-by-environment interaction variance; and is the vector of random effects of the residuals ~ (0; ). , , , and are the incidence matrices for those effects.
We modeled the residuals in the joint analyses using three covariance structures. The first approach (SSM1) assumed a homogeneous residual variance across the trials, i.e., , where is the residual variance and is an identity matrix whose dimension is the number of trials. The second approach (SSM2) assumed heterogeneous residual variances, i.e., , where is a diagonal matrix containing the residual variance of each trial. Given the results of the single-environment trial analyses, where the best-fitting model for each field trial considered spatial correlations among columns, i.e., SPM2, the third approach for the multi-environment trial analysis (SSM3) considered heterogeneous residual variances and the spatial correlation in column directions in each field trial, i.e., .
Model selection
We selected the best-fitting model using the Akaike Information Criterion (AIC) (Akaike 1974):
where is the logarithm of the maximum of the restricted likelihood function; is the number of parameters estimated; and is the number of observations. The best-fitting model is the one with the lowest AIC value.
Likelihood ratio test
We tested the significances of the genotype and the genotype-by-environment interaction effects using the likelihood ratio test (LRT) (Wilks 1938), given by:
where is the logarithm of the restricted likelihood function of the full model, and is the logarithm of the restricted likelihood function of the reduced model. The significance of the random effects was tested using the chi-square distribution, considering 5% and 1% probabilities.
Genetic parameters
We estimated broad sense heritability as follows (Cullis et al. 2006):
where is the mean variance of a difference of two BLUPs; and is the genotypic variance.
We estimated accuracy as follows:
where is the prediction error variance; and is the genotypic variance.
Selection gain
For selection of the top performers across the trials, we considered a 20% selection proportion. The predicted genetic gain from selection was calculated as follows:
where is the mean of the selected genotypes; and is the original mean.
Linear regression and Kappa coefficient
To investigate the degree of dissimilarity among the outcomes of each model, we fitted a linear regression model using the BLUPs of the SSM1, SSM2, and SSM3 as follows:
where is the BLUP of the model; is the intercept; is the angular coefficient of the regression; is the BLUP of the model; and is the error. We also computed the Kappa coefficient (K) (Cohen 1960) to evaluate the agreement among models in the ranking of the 20% best selected genotypes:
where is the proportion of matching selected genotypes; and is the proportion of matching selected genotypes expected by chance.
Software
We performed all analysis in the R version 4.2.2 software (R Core Team 2022). Mixed-model analyses were fitted using the asreml package, version 4.2 (Butler 2023), and plots were prepared with the ggplot2 package (Wickham 2016).
RESULTS AND DISCUSSION
The genotype effects were significant at 1% probability by the Chi-square test in all three field trials considering all four mixed models fitted (Table 1). In FT1, heritability estimates ranged from 0.62 (NSPM) to 0.68 (SPM3) and accuracy estimates from 0.78 (NSPM) to 0.82 (SPM3). The autocorrelation coefficient estimates for rows were -0.13 (SMP1) and -0.15 (SPM3), and the autocorrelation coefficient estimates for columns were 0.23 (SPM2) and 0.25 (SPM3). In FT2, heritability estimates ranged from 0.65 (NSPM) to 0.68 (SPM3) and accuracy estimates from 0.81 (NSPM) to 0.82 (SPM3). The autocorrelation coefficient estimates for rows were -0.24 (SPM1) and -0.16 (SPM3), and the autocorrelation coefficient estimates for columns were 0.40 (SPM2) and 0.38 (SPM3). Lastly, in FT3, heritability estimates ranged from 0.57 (NSPM) to 0.58 (SPM2) and accuracy estimates from 0.74 (NSPM) to 0.75 (SPM1). The autocorrelation coefficient estimates for rows were 0.03 (SPM1) and 0.01 (SPM3), and the autocorrelation coefficient estimate for columns considering models SPM2 and SPM3 was 0.30.
The results of the single-environment trial analyses obtained in this study demonstrate that accounting for spatial trends in column directions (SPM2) provide better outcomes than the compound symmetry model (NSPM) does. This is confirmed by the low AIC estimates obtained by SPM2 in all field trials. Additionally, modeling the spatial trends also had positive impacts on the heritability and accuracy estimates. Although those parameters can be considered moderate to high by NSPM (Resende and Alves 2022), fitting SPM2 allowed for a slight increase in the heritability and accuracy estimates, indicating that accounting for residual correlations might increase the accuracy and, consequently, leverage genetic gains from selection. Those results are consistent with previous reports aiming to evaluate the suitability of spatial analysis for genetic evaluation of soybean, maize, and common bean field trials (Bernardeli et al. 2021, Salvador et al. 2022).
The variograms of the best-fitting model (SPM2) for the three field trials showed peaks of field spatial dependencies along columns for most of the trials (Figure 1). These results are consistent with the autocorrelation coefficient estimates for the three field trials fitting SPM2 and SPM3, i.e., there is a certain correlation among residuals in the column direction. The results obtained from the variograms and the moderate autocorrelation coefficient estimates in the single-environment trial analyses indicate the presence of heterogeneity patterns of adjacent plots in the column direction (Burgueño et al. 2018). Additionally, the positive autocorrelation estimates mean that the plots were under the same environmental conditions (Bernardeli et al. 2021). The low magnitude of the autocorrelation coefficient estimates for row direction shows the presence of undefined patterns of spatial variability and, because those coefficients were negative, it indicates a certain competition among plots (Andrade et al. 2020).
Variograms obtained from the best-fitting model (SPM2) used for analysis of field trial 1 (A), field trial 2 (B), and field trial 3 (C).
Examining the presence of spatial dependencies through variogram inspection and estimation of autocorrelation coefficients by single-environment trial analyses might be useful for further modeling of the residual effects, i.e., including the row and column effects in the fixed and random parts of a global model. This approach has been adopted as part of a two-stage strategy in maize, common bean, and elephant grass; and it is able to enhance prediction of the genotypic values (Salvador et al. 2022, Ferreira et al. 2022).
Traditionally, single-stage analysis is considered the gold-standard approach (Smith et al. 2001), since it provides best linear unbiased estimators (BLUE) of all fixed effects and BLUP of all random effects under the assumed single-stage model (Piepho et al. 2012). Nevertheless, a two-stage approach might be suitable, especially when a large number of environments need to be analyzed. This strategy can leverage speed, simplicity, and computational efficiency (Möhring and Piepho 2009). Commonly, in the two-stage approach, the BLUE and the weights obtained in the first stage are used in the second stage for the predictions. However, if a two-stage analysis needs to be implemented, it is reasonable to consider the genetic effects as random in the first stage (Verbyla 2023). After that, the estimates obtained in the first stage need to be de-regressed and used in the second stage, along with a full weight matrix.
In the multi-environment trial analyses, the genotype effects were significant at 1% probability and the genotype-by-environment interaction effects were not significant by the Chi-square test (Table 1). Those results indicate suitable genetic variability and a consistent response of the genotypes across the environments. The absence of the genotype-by-environment interaction might be explained by the fact that the trials were conducted in the same location. Increasing the number of environments (locations, years, and seasons) could potentially lead to differential responses of the genotypes. Consequently, robust biometric approaches would be required to quantify the genotype-by-environment interaction and recommend the genotypes according to their overall performance and stability (Chaves et al. 2023).
The heritability estimates were 0.66, 0.67, and 0.65 for the models SSM1, SSM2, and SSM3, respectively. The accuracy estimates were 0.81, 0.82, and 0.79 for the models SSM1, SSM2, and SSM3, respectively. The predicted genetic gain from selection of the 20% best performing genotypes were 15.25%, 12.11%, and 11.21% for the models SSM1, SSM2, and SSM3, respectively. The highest predicted genetic gain was observed for SSM1, which is the compound symmetry model. Nevertheless, the AIC obtained for both SSM2 and SSM3 shows that modeling the residual variance across environments and accounting for spatial trends might be better options for multi-environment trial analyses. The AIC has been used as a standard criterion for selection of non-nested models (Verbyla 2019), and in this case in particular, SSM3 had the lowest AIC, exhibiting the most reliable results. Thus, the predicted genetic gains observed for SSM1 and SSM2 might be overestimated and might not be achieved in practice.
Considering that the residual variances are heterogeneous across the environments is a strong assumption since the trials might experience different environmental conditions (Araújo et al. 2023). This is true even in experiments carried out in the same site across the years or seasons. Additionally, it is reasonable that plots located close to each other are more likely to be under similar environmental conditions, leading to spatial dependence (Resende and Sturion 2003). These facts reinforce the need to model residual variances in multi-environment trial analyses.
Eight genotypes (VI 19007, Tbio Astro, BRS 264, ORS Guardião, ORS Senna, VI 14327, VI 14001, and VI 131313) were simultaneously selected by the three models fitted in the multi-environment trial analyses (Figure 2). Nevertheless, the main practical consequence of fitting different models is modification in the rankings provided by each model. This reinforces the need to use a criterion for selection of the best-fitting model, which will allow for accurate selection and recommendation of superior genotypes. Since the AIC indicated SSM3 as the best-fitting model, the 20% best performing genotypes (top 10 genotypes) should be selected. These genotypes are: BRS 264, TBIO Astro, VI 14001, ORS Senna, ORS Guardião, VI 19007, VI 131313, VI 14060, VI 14118, and VI 14327 (Figure 3). These genotypes had been previously evaluated regarding genetic diversity (Casagrande et al. 2020, Lima et al. 2021) and they showed suitable performance for grain yield and other important agronomic traits, e.g., yield components, disease resistance, short cycle, and plant height.
Genotypes selected by joint analysis of three field trials using three residual modeling strategies: A, homogeneous residual variance across the trials (SSM1); B, heterogeneous residual variances across the trials (SSM2); C, heterogeneous residual variances with spatial column adjustment (SSM3). The number beside the genotype represents the number of criteria for which this genotype appears in the top ten.
The coefficient of the linear regression between the BLUP from SSM1 and SSM2 (Figure 4A) was 0.88, and agreement in the ranking of the 20% best performing genotypes was 0.03. The coefficient of the linear regression between the BLUP from SSM1 and SSM3 was 0.86, and agreement in the ranking of the 20% best performing genotypes was also 0.03 (Figure 4B). The coefficient of the linear regression between the BLUP of SSM2 and SSM3 was 0.97, and agreement in the ranking of the 20% best performing genotypes was 0.46 (Figure 4C). SSM3 showed high correlation and good agreement with SSM2 (Figure 4C) in the ranking of the best performing genotypes, indicating that modeling residual effects can lead to better genetic parameter estimation.
Kappa coefficient (K) and simple linear regression among the predicted genotypic values of the genotypes using three modeling strategies: homogeneous residual variance across the trials (SSM1), heterogeneous residual variances across trials (SSM2), and heterogeneous residual variances with spatial column adjustment (SSM3) in joint analysis of three wheat field trials.
Since the first-order separable autoregressive model proposal (Gilmour et al. 1997), several studies have shown the benefits of accounting for spatial trends in different crops, as discussed above. To the best of our knowledge, this study is the first attempt to provide insights on residual modeling under the mixed-model framework in a tropical wheat breeding program in Brazil, and it confirms that using such approaches can provide for better outcomes. The residual modeling approaches presented here might also be combined in different strategies aiming to accurately predict the performance of the genotypes and to reach desired genetic gains. The genotypes selected through SSM3 could be included in future crossing blocks for the development of base populations with enough variability for carrying out selection in the UFV Wheat Breeding Program.
CONCLUSION
Fitting a first-order autoregressive model for columns in the single-environment trial analyses allowed for slight increases in heritability and accuracy estimates in most scenarios compared to the compound symmetry model, and proved to be the best-fitting model. In the multi-environment trail analyses, considering the variances across the trials as heterogenous and fitting a first-order autoregressive model for columns led to better performance than the compound symmetry model did. The best genotypes for grain yield performance were BRS 264, TBIO Astro, VI 14001, ORS Senna, ORS Guardião, VI 19007, VI 131313, VI 14060, VI 14118, and VI 14327.
ACKNOWLEDGMENTS
This study was funded by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001, by the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) - Finance Code 312791/2021-6, and by the Fundação de Amparo à Pesquisa do Estado de Minas Gerais (FAPEMIG) - Finance Code APQ 00520-21 and APQ-02668-21. We also acknowledge the UFV Wheat Breeding Program interns who assisted in conducting the field trails and in collecting data.
REFERENCES
- Akaike H1974 A new look at the statistical model identification. IEEE Transactions on Automatic Control 19:716-723
- Alvares CA, Stape JL, Sentelhas PC, Moraes Gonçalves JL, Sparovek G2013 Köppen’s climate classification map for Brazil. Meteorologische Zeitschrift 22:711-728
- Andrade MHML, Fernandes Filho CC, Fernandes MO, Bastos AJRB, Guedes ML, Marçal TS, Gonçalves FMA, Pinto CABP, Zoratelli L2020 Accounting for spatial trends to increase the selection efficiency in potato breeding. Crop Science 60:2354-2372
- Araújo MS, Chaves SFS, Damasceno‐Silva KJ, Dias LAS, Rocha MM2023 Modeling covariance structures for genetic and non‐genetic effects in cowpea multi‐environment trials. Agronomy Journal 115:1248-1256
- Bernardeli A, Rocha JRASC, Borém A, Lorenzoni R, Aguiar R, Silva JNB, Bueno RD, Alves RS, Jarquin D, Ribeiro C, Lamas Costa MD2021 Modeling spatial trends and enhancing genetic selection: An approach to soybean seed composition breeding. Crop Science 61:976-988
- Burgueño J, Glaz B, Yeater KM2018 Spatial analysis of field experiments. In Glaz A and Yeater KM (eds) Applied statistics in agricultural, biological, and environmental sciences. American Society of Agronomy, Crop Science Society of America and Soil Science Society of America, Madison, p. 319-344
-
Butler DG, Cullis BR, Gilmour AR, Gogel BG, Thompson R2023 ASReml-R reference manual version 4.2. Available at <Available at https://asreml.kb.vsni.co.uk/wp-content/uploads/sites/3/ASReml-R-Reference-Manual-4.2.pdf >. Accessed on November 11, 2023.
» https://asreml.kb.vsni.co.uk/wp-content/uploads/sites/3/ASReml-R-Reference-Manual-4.2.pdf - Casagrande CR, Mezzomo HC, Cruz CD, Borém A, Nardino M2020 Choosing parent tropical wheat genotypes through genetic dissimilarity based on REML/BLUP. Crop Breeding and Applied Biotechnology 20:1-10
- Chaves SFS, Evangelista JSPC, Trindade RS, Dias LAS, Guimarães PE, Guimarães LJM, Alves RS, Bhering LL, Dias KOG2023 Employing factor analytic tools for selecting high‐performance and stable tropical maize hybrids. Crop Science 63:1114-1125
- Cohen J1960 A coefficient of agreement for nominal scales. Educational and Psychological Measurements 20:37-46
- Cullis BR, Gleeson AC1991 Spatial analysis of field experiments - an extension to two dimensions. Biometrics 47:1449-1460
- Cullis BR, Smith AB, Coombes NE2006 On the design of early generation variety trials with correlated data. Journal of Agricultural, Biological, and Environmental Statistics 11:381-393
- Ferreira FM, Leite RV, Malikouski RG, Peixoto MA, Bernardeli A, Alves RS, Magalhães Júnior WCP, Andrade RG, Bhering LL, Machado JC2022 Bioenergy elephant grass genotype selection leveraged by spatial modeling of conventional and high-throughput phenotyping data. Journal of Cleaner Production 363:132286-132295
- Gilmour AR, Cullis BR, Verbyla AP1997 Accounting for natural and extraneous variation in the analysis of field experiments. Journal of Agricultural, Biological, and Environmental Statistics 2:269-293
- Gleeson AC, Cullis BR1987 Residual maximum likelihood (REML) estimation of a neighbour model for field experiments. Biometrics 43:277-287
- Gogel B, Smith A, Cullis B2018 Comparison of a one- and two-stage mixed model analysis of Australia’s National Variety Trial Southern Region wheat data. Euphytica 214:44-65
- Henderson CR1975 Best linear unbiased estimation and prediction under a selection model. Biometrics 31:423-447
- Lado B, Matus I, Rodríguez A, Inostroza L, Poland J, Belzile F, del Pozo A, Quincke M, Castro M, von Zitzewitz J2013 Increased genomic prediction accuracy in wheat breeding through spatial adjustment of field trial data. G3 Genes, Genomes, Genetics 3:21052114
- Lima GW, Silva CM, Mezzomo HC, Casagrande CR, Olivoto T, Borem A, Nardino M2022 Genetic diversity in tropical wheat germplasm and selection via multitrait index. Agronomy Journal 114:887-899
- Möhring J, Piepho HP2009 Comparison of weighting in two-stage analysis of plant breeding trials. Crop Science 49:1977-1988
- Patterson HD, Thompson R1971 Recovery of inter-block information when block sizes are unequal. Biometrika 58:545
- Piepho HP, Möhring J, Schulz-Streeck T, Ogutu JO2012 A stage-wise approach for the analysis of multi-environment trials. Biometrical Journal 54:844-860
-
R Core Team2023 R: The R project for statistical computing. R-project.org. Available at <Available at https://www.r-project.org/ >. Accessed on October 16, 2023.
» https://www.r-project.org/ - Resende MDV, Alves RS2022 Statistical significance, selection accuracy, and experimental precision in plant breeding. Crop Breeding and Applied Biotechnology 22:1-19
- Resende MDV, Sturion JA2003 Análise estatística espacial de experimentos via modelos mistos individuais com erros modelados por processos ARIMA em duas dimensões. Revista de Matemática e Estatística 21:7-33
- Salvador FV, Pereira GS, Souza MH, Silva LMB, Santana AS, Paula IG, Steckling SM, Fernandes RS, Marçal TS, Carneiro APS, Carneiro PCS, Carneiro JES2022 Correcting experimental data for spatial trends in a common bean breeding program. Crop Science 62:825-838
- Santos HG, Jacomine PKT, Anjos LHC, Oliveira VÁ, Lumbreras JF, Coelho MR, Almeida JÁ, Filho JC, Oliveira JB, Cunha TJF2018 Brazilian soil classification system. Embrapa Soils, Brasília, 303p
- Smith A, Cullis B, Thompson R2001 Analyzing variety by environment data using multiplicative mixed models and adjustments for spatial field trend. Biometrics 57:1138-1147
- Verbyla AP2019 A note on model selection using information criteria for general linear models estimated using REML. Australian & New Zealand Journal of Statistics 61:39-50
- Verbyla AP2023 On two-stage analysis of multi-environment trials. Euphytica 219:121-140
- Wickham H2016 ggplot2: elegant graphics for data analysis. Springer New York, New York, 260p
- Wilkinson GN, Eckert SR, Hancock TW, Mayo O1983 Nearest neighbor (NN) analysis of field experiments. Journal of the Royal Statistical Society 45:151-211
- Wilks SS1938 The large-sample distribution of the likelihood ratio for testing composite hypotheses. The Annals of Mathematical Statistics 9:60-62
Publication Dates
-
Publication in this collection
07 June 2024 -
Date of issue
2024
History
-
Received
15 Nov 2023 -
Accepted
06 Jan 2024 -
Published
06 Feb 2024