Implications of removing model parameters on the linear relationships of trials with oat

Implicações da remoção dos parâmetros do modelo nas relações lineares de ensaios com a aveia

ABSTRACT:

This study was conducted with the aim of analyzing the implications of removing the parameters from the mathematical model on the results of path analysis, with oats grown in different years and agricultural scenarios (with and without fungicide). For this, two field trials were conducted in southern Brazil, in five years of growth. The experimental design used in trial I (with fungicide application) was randomized complete blocks (RCB), in a 22 × 4 bifactorial arrangement, characterized by twenty-two oat cultivars and four fungicide applications. For trial II (without fungicide application) the RCB design was used, and the treatments were characterized by twenty-two oat cultivars, with three replications. The traits measured were panicle length, panicle mass, number of spikelets, number of grains, grain mass, and grain yield. For each year, data group, and scenario, the correlation coefficients between the explanatory variables and grain yield were calculated. The diagnosis of multicollinearity indicated violation of the statistical assumption, so it was necessary to proceed with a path analysis under multicollinearity (ridge). The removal of parameters from the mathematical model caused changes in the linear relationships between the oat yield traits, with the maintenance of the linear correlation coefficients in 3.30% and 20% of the situations, for the scenarios with and without fungicide application, respectively. Regarding the path coefficients, it was observed that the direct effects were maintained in 3.30% and 30% and indirect effects in 7.33% and 24.67% of the situations, for the scenarios with and without fungicide application, respectively.

Key words: Avena sativa; multicollinearity; path analysis; simple correlation

RESUMO:

O estudo foi conduzido com o intuito de analisar as implicações da remoção dos parâmetros do modelo matemático sobre os resultados da análise de trilha, com aveia cultivada em diferentes anos e cenários agrícolas (com e sem fungicida). Para isso, dois ensaios de campo foram conduzidos no sul do Brasil, em cinco anos de cultivo. O delineamento experimental empregado no ensaio I (com aplicação de fungicida) foi o de blocos completos ao acaso, sendo um bifatorial 22 × 4, caracterizado por vinte e duas cultivares de aveia e quatro aplicações de fungicidas. Para o ensaio II (sem aplicação de fungicida) foi empregado o delineamento em blocos completos ao acaso, sendo os tratamentos caracterizados por vinte e duas cultivares de aveia, com três repetições. Os caracteres mensurados foram comprimento da panícula, massa da panícula, número de espiguetas, número de grãos, massa de grãos e rendimento de grãos. Para cada ano, grupo de dados e cenário foram calculados os coeficientes de correlação entre as variáveis explicativas e a produtividade de grãos. O diagnóstico de multicolinearidade indicou a violação do pressuposto estatístico, sendo necessário proceder a análise de trilha sob multicolinearidade (em crista). A remoção dos parâmetros do modelo matemático promoveu alterações nas relações lineares entre os caracteres de rendimento da aveia, sendo verificado a manutenção no padrão dos coeficientes de correlação linear em 3,30% e 20% das situações, para os cenários com e sem aplicação de fungicida, respectivamente. Com relação aos coeficientes de trilha, foi verificado manutenção na direção e magnitude dos efeitos diretos em 3,30% e 30% e os efeitos indiretos em 7,33% e 24,67% das situações, para os cenários com e sem aplicação de fungicida, respectivamente.

Palavras-chave: Avena sativa; análise de trilha; correlação simples; multicolinearidade

INTRODUCTION

Oat (Avena sativa L.) is one of the main winter cereals grown in the world, with about 9,442,749 hectares (ha) of cultivated area, production of 23,132,209 tons of grains and average yield of 2,450 kg ha-1 (FAO, 2022). Brazil stands out as the second largest producer of oat, with an 8% increase in the planted area between the 2021 and 2022 harvests, production of 1,262.6 thousand tons of grain, and yield of 2,321 kg ha-1. In turn, Rio Grande do Sul accounts for 71.2% of the area destined for oat cultivation (387,600 ha), with production of 937,200 tons of grains and yield of 2,418 kg ha-1, which is higher than the national average (CONAB, 2022).

The increase in the area destined for the cultivation of the oat crop is directly related to its use in human food and animal feed, vegetation cover, and straw for the no-tillage system (MARCHIORO et al., 2001; BORTOLINI et al., 2005; BUERSTMAYR et al., 2007; FLOSS et al., 2007; ACHLEITNER et al., 2008; FONTANELI et al., 2009; CASTRO et al., 2012). The demand for oat for human consumption has increased considerably over time, due to the benefits generated for the diet, such as whole grains, source of soluble fiber, balanced energy, and nutritional supply, having in its chemical constitution amino acids, fatty acids, vitamins and essential mineral salts for the human body (ACHLEITNER et al., 2008; DUDA et al., 2021; GUTKOSKI et al., 2009). Additionally, the consumption of oats in the diet has been related to reduction of risks and improvement in the conditions of numerous diseases such as diabetes, hyperglycemic complications, dyslipidemia, and hypercholesterolemia, reducing the risk of cardiovascular diseases and gastrointestinal disorders, improving immune functions, and aiding in the control of overweight and obesity (GUIMARÃES et al., 2021).

Studies have been carried out to enhance oat production systems through either genetic improvement programs aimed at the development of new materials (ALESSI et al., 2018; CAIERÃO et al., 2006; KLEIN et al., 2019; MANTAI et al., 2017; MEIRA et al., 2019a) or the improvement of cultural management and practices (DORNELLES et al., 2020; KRAISIG et al., 2020; MANTAI et al., 2020a, 2020b). To obtain superior genotypes it is essential to carry out efficient selection, which can often be laborious and time-consuming when performed directly on the trait of interest. This difficulty can be overcome by selecting materials based on their yield components and/or other adaptive traits that indirectly favor the main trait of interest. However, indirect selection requires a high correlation between the trait under selection and the target trait (FALCONER & MACKAY, 1997).

Simple correlation analyses make it possible to identify the direction (+ or -) and magnitude of the linear association between two traits. However, they do not indicate a cause-and-effect relationship between the traits (CRUZ et al., 2012; FERREIRA, 2009; SARI et al., 2018; VENCOVSKI & BARRIGA, 1992). When the object of study involves more than two traits of interest, path analysis (PA) becomes more suitable, as it provides information about the interrelationships between the traits. In PA, the simple linear correlation coefficients are broken down into direct and indirect effects, allowing the measurement of the influence of one trait over the other (CRUZ et al., 2012). Therefore, it is essential to know and use adequate statistical techniques that allow the identification of cause-and-effect relationships between traits, as they generate reliable information, even indicating traits that can be used in the indirect selection of genotypes (BELLO et al., 2010; CARGNELUTTI FILHO et al., 2015; NARDINO et al., 2016).

Conversely, for the results obtained through the PA to be reliable, it is essential that the statistical assumptions of the model are met (HAIR et al., 2009), especially the absence of multicollinearity between the explanatory traits (OLIVOTO et al., 2017; SARI et al., 2018). OLIVOTO et al. (2017), when comparing the results of estimated traditional PA with plot means and estimated traditional PA considering each observation in the plot, found that violating the assumption generates biased path coefficients, with little biological interpretation. Similarly, TOEBE et al. (2017), working with corn, reported that carrying out traditional path analysis with severe multicollinearity between the explanatory traits can result in inaccurate estimates of path coefficients, indicated by the obtaining of direct effects above |1|.

Additionally, studies have shown that the production performance of the oat crop is influenced by the genetic constitution of the materials used and the cultivation environment (HOLLAND et al., 2000; BENIN et al., 2003a). Another limiting factor for oat production is leaf rust disease (MARTINELI, 2003), which influences the quantitative and qualitative performance of the genotypes and can generate reductions of more than 50% in grain yield, especially under unfavorable environmental conditions (BENIN et al., 2003a), increasing the magnitude of the interaction between genetic materials and the environment (BENIN et al., 2005a). The control of this disease requires frequent applications of fungicides, which considerably increases production costs (MARTINELI, 2003). In addition, the application of fungicide affects the parameters of adaptability, responsiveness, and stability of genetic materials, indicating that for genetic improvement and correct recommendations, production performance should be studied considering the characteristics of the environment with and without fungicide application (scenarios) in an individualized way (BENIN et al., 2005a; LORENCETTI et al., 2004).

When carrying out multivariate analyses, such as PA, the parameters of the mathematical model related to the design and the treatment characteristics are not considered in the use of the technique. Generally, one works with the average values of each treatment or repetition, which arise from a sum of effects, without separating the influence of factors (for factorials), blocks (randomized block design), and interactions (factorials), aspects that can bias the results obtained. Considering the scarcity of information on the subject and the importance of the PA technique for genetic improvement programs, as well as the influence of contrasting scenarios on oat performance, the following hypotheses were formulated: i) the removal of parameters from the mathematical model generates divergent results compared to traditional path analysis; ii) the results obtained by the traditional path methods and with the removal of model parameters are influenced by the different agricultural scenarios. To test these hypotheses, the study aimed to analyze the effect of removing the parameters from the mathematical model on the results of path analysis, with the oat crop grown in different years and agricultural scenarios, with and without fungicide applications.

MATERIALS AND METHODS

Study area and experimental design

The study was carried out with results of experiments conducted during the agricultural years 2015, 2016, 2017, 2018, and 2019, in the city of Augusto Pestana, in southern Brazil, under geographic coordinates of 28º 26’ 30” S and 54º 00’ 58” W, with an altitude of 400 meters (m) above sea level. According to Köppen’s climate classification, the climate of the region is Cfa, characterized by an average air temperature of 19.1 °C, ranging from 0 to 38 °C, and accumulated rainfall of 2,040 mm (ALVARES et al., 2013). The soil of the experimental area is classified as Latossolo Vermelho distrófico típico (Oxisol) (TEDESCO et al., 1995).

Trial I - with fungicide application

The experimental design used was randomized complete blocks, in a 22 × 4 bifactorial arrangement, characterized by twenty-two oat cultivars: URS Altiva, URS Brava, URS Guará, URS Estampa, URS Corona, URS Torena, URS Charrua, URS Guria, URS Tarimba, URS Taura, URS 21, FAEM 007, FAEM 006, FAEM 5 Chiarasul, FAEM 4 Carlasul, Brisasul, Barbarasul, Fapa Slava, IPR Afrodite, UPFPS Farroupilha, UPFA Ouro and UPFA Gaudéria, and four fungicide applications: 1 (application performed at 60 days after emergence [DAE]), 2 (applications performed at 60 DAE and 75 DAE), 3 (applications performed at 60, 75, and 90 DAE), and 4 (applications performed at 60, 75, 90 and 105 DAE), with three repetitions.

Trial II - without fungicide application

The experimental design used was randomized complete blocks, in a unifactorial arrangement, characterized by twenty-two oat cultivars: URS Altiva, URS Brava, URS Guará, URS Estampa, URS Corona, URS Torena, URS Charrua, URS Guria, URS Tarimba, URS Taura, URS 21, FAEM 007, FAEM 006, FAEM 5 Chiarasul, FAEM 4 Carlasul, Brisasul, Barbarasul, Fapa Slava, IPR Afrodite, UPFPS Farroupilha, UPFA Ouro and UPFA Gaudéria, with three repetitions.

Crop management

The sowing of oats in all agricultural years was carried out from May 15 to June 15, following the technical recommendations for the crop. Harvesting was carried out from late October to early November in all agricultural years. Production performance was analyzed by collecting plants from three central rows, 5 m long, selected on the day of harvest. The number of spikelets per panicle (NSP) and the number of grains per panicle (NGP) were determined by counting. Panicle length (PL) was measured with a graduated ruler. Panicle dry mass (PDM) and grain weight per panicle (GWP) were determined by weighing on a precision scale. Panicle harvest index (HI) was determined by the ratio between grain weight and panicle dry mass. Grain yield (yield) was determined by weighing grains from the usable plot, with moisture adjusted to 13%, and later converting the results to kg ha-1.

Statistical analysis

To carry out the statistical analyses, the five agricultural years were approached as being five environments: environment 1 refers to the year 2015, environment 2 to the year 2016, environment 3 to the year 2017, environment 4 to the year 2018, and environment 5 to the year 2019. Initially, the statistical assumption of multicollinearity between the explanatory traits was tested. The diagnosis of multicollinearity was made considering the variance inflation factor (VIF) and the condition number (CN).

Pearson’s correlation analysis was performed, generating the matrix of correlation coefficients; subsequently, the CN was obtained as the ratio between the largest and the smallest eigenvalue of the X’X correlation matrix. CN ≤ 100 indicates weak multicollinearity, 100 < CN < 1,000 indicates moderate to severe multicollinearity, and CN ≥ 1,000 indicates severe multicollinearity (MONTGOMERY & PECK, 1982). The VIF of each variable was obtained from the diagonal of the inverse of the X’X correlation matrix; VIF > 10 was taken to indicate severe multicollinearity (HAIR et al., 2009). The occurrence of multicollinearity between the explanatory variables was defined by obtaining values of CN ≥ 1,000 and VIF > 10.
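The two diagnostics can be illustrated with a short, dependency-free Python sketch. It is not the R workflow used in the study: the helper names (invert, dominant_eigenvalue, vif, condition_number, classify_cn) and the example matrix are hypothetical, and the eigenvalues are approximated by power iteration, which is assumed adequate only for a symmetric, positive-definite correlation matrix.

```python
def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment with the identity: [A | I] is reduced to [I | A^-1].
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def dominant_eigenvalue(matrix, iterations=2000):
    """Largest eigenvalue of a symmetric positive-definite matrix (power iteration)."""
    n = len(matrix)
    v = [1.0 / (i + 1) for i in range(n)]  # uneven start vector (assumption)
    value = 0.0
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        value = max(abs(x) for x in w)
        v = [x / value for x in w]
    return value


def vif(corr):
    """VIFs are the diagonal elements of the inverse correlation matrix."""
    inverse = invert(corr)
    return [inverse[i][i] for i in range(len(corr))]


def condition_number(corr):
    """Ratio between the largest and the smallest eigenvalue of corr."""
    largest = dominant_eigenvalue(corr)
    # The smallest eigenvalue of corr is the reciprocal of the largest of its inverse.
    smallest = 1.0 / dominant_eigenvalue(invert(corr))
    return largest / smallest


def classify_cn(cn):
    """Thresholds quoted in the text (MONTGOMERY & PECK, 1982)."""
    if cn <= 100:
        return "weak"
    if cn < 1000:
        return "moderate to severe"
    return "severe"


# Hypothetical correlation matrix of three explanatory traits.
corr = [[1.00, 0.95, 0.30],
        [0.95, 1.00, 0.25],
        [0.30, 0.25, 1.00]]
print([round(v, 2) for v in vif(corr)])
print(classify_cn(condition_number(corr)))
```

For a production analysis, a dedicated eigenvalue routine would be preferable to power iteration, which converges slowly when the leading eigenvalues are close to each other.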

In each environment, path analysis was performed, considering yield as the main trait, depending on the explanatory traits (PL, PDM, NSP, NGP, GWP, and HI) (CRUZ et al., 2012). The direct and indirect effects of the explanatory traits on yield were estimated by means of path analysis under multicollinearity conditions (ridge path analysis). Therefore, it was considered that each explanatory trait has a direct effect on yield and acts indirectly through its effects on the other explanatory traits.

In the ridge path analysis, the six explanatory traits (PL, PDM, NSP, NGP, GWP, and HI) were considered to estimate the direct and indirect effects on yield. However, a constant k was added to the diagonal of the correlation matrix X’X to reduce the variance associated with the least squares estimator of the path coefficients. Thus, the system of normal equations $X'X\hat{\beta} = X'Y$ became $(X'X + kI)\hat{\beta} = X'Y$, where $I$ is the identity matrix. Increasing values of the constant k were tested, and the lowest value from which the path coefficients stabilized was chosen (CRUZ et al., 2012).
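As an illustration of the ridge system described above, the following Python sketch solves the normal equations with k added to the diagonal and then splits each correlation into direct and indirect effects. The function names (solve, ridge_path), the two-trait correlation values, and the grid of k values are hypothetical; the indirect effect of trait i via trait j is taken as r_ij times the direct effect of j, the usual path analysis decomposition.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        known = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - known) / aug[i][i]
    return x


def ridge_path(corr_xx, corr_xy, k=0.0):
    """Direct and indirect effects from (R + kI) beta = r_xy."""
    n = len(corr_xy)
    ridge = [[corr_xx[i][j] + (k if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    direct = solve(ridge, corr_xy)
    # Indirect effect of trait i via trait j (j != i): r_ij * direct_j.
    indirect = [[corr_xx[i][j] * direct[j] if i != j else 0.0
                 for j in range(n)] for i in range(n)]
    return direct, indirect


# Hypothetical correlations: two explanatory traits and yield.
corr_xx = [[1.0, 0.5],
           [0.5, 1.0]]
corr_xy = [0.6, 0.5]

# Inspect how the direct effects behave as k grows (hypothetical grid).
for k in (0.0, 0.05, 0.10):
    direct, _ = ridge_path(corr_xx, corr_xy, k)
    print(k, [round(d, 4) for d in direct])
```

With k = 0 the direct and indirect effects of a trait add up exactly to its correlation with yield; with k > 0 this identity no longer holds exactly, which is the price paid for more stable coefficients.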

Additionally, the effects of the parameters of the mathematical model (cultivar, application, and block for trial I; cultivar and block for trial II) were isolated and removed, and ridge path analysis was then performed on the resulting data. These results were compared with those observed in the “traditional” path analysis under multicollinearity, to identify whether the removal of model effects changes the results of the path analysis.

Removing the effects of the model parameters makes the data comparable to those of uniformity trials (trials without the application of treatments), in which each observation is composed of the overall mean plus the random effect of the error. The data group from which the effects of the parameters of the model were removed was designated as predicted, and the data group in which the effects of the parameters were maintained was designated as original.

The mathematical model for trial I, which is characterized as a two-factor experiment (fixed effect) under a randomized block design, is shown in equation 1:

$Y_{ijk} = m + a_i + d_j + (ad)_{ij} + b_k + e_{ijk}$ (1)

Where: $Y_{ijk}$ is the observation in block k (k = 1, 2, and 3) for level i (i = 1, ..., 22) of factor A (cultivar) combined with level j (j = 1, ..., 4) of factor D (fungicide applications); $m$ is the overall mean of the experiment; $a_i$ is the effect of level i of factor A; $d_j$ is the effect of level j of factor D; $(ad)_{ij}$ is the effect of the interaction of level i of factor A with level j of factor D; $b_k$ is the random effect of block k; $e_{ijk}$ is the random effect of experimental error.

The mathematical model for trial II, which is characterized as a one-factor experiment (fixed effect) under a randomized block design, is shown in equation 2:

$Y_{ij} = m + t_i + b_j + e_{ij}$ (2)

Where: $Y_{ij}$ is the observed value of the variable Y in the experimental unit that received level i (i = 1, ..., 22) of treatment t, in block j (j = 1, 2, and 3); $m$ is the overall mean of the experiment; $t_i$ is the effect of level i of the treatment; $b_j$ is the random effect of block j; $e_{ij}$ is the experimental error effect.
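The removal of model effects can be sketched for the one-factor model of trial II. The text does not state the estimator used, so this Python example assumes a balanced design and least-squares estimates, in which the treatment and block effects are the deviations of the treatment and block means from the overall mean. The function name remove_effects_rcb and the small data table are hypothetical.

```python
def remove_effects_rcb(table):
    """Return m + e_ij for a balanced randomized complete block design.

    table[i][j] is the observation of treatment i in block j. Each value has
    the estimated treatment effect (treatment mean - overall mean) and block
    effect (block mean - overall mean) subtracted, leaving the overall mean
    plus the residual.
    """
    n_trt, n_blk = len(table), len(table[0])
    grand = sum(map(sum, table)) / (n_trt * n_blk)
    trt_mean = [sum(row) / n_blk for row in table]
    blk_mean = [sum(table[i][j] for i in range(n_trt)) / n_trt
                for j in range(n_blk)]
    return [[table[i][j] - (trt_mean[i] - grand) - (blk_mean[j] - grand)
             for j in range(n_blk)] for i in range(n_trt)]


# Hypothetical yields: 2 treatments (rows) x 2 blocks (columns).
observed = [[10.0, 12.0],
            [14.0, 20.0]]
adjusted = remove_effects_rcb(observed)
print(adjusted)  # [[15.0, 13.0], [13.0, 15.0]]
```

After the adjustment every treatment mean and every block mean equals the overall mean (14.0 in this example), so only the error variation remains. Under the same assumptions, the two-factor model of trial I would additionally subtract the fungicide application and interaction effects.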

In all statistical analyses, the level of 5% probability of error was adopted, and all analyses were performed using Excel software and R software (R CORE TEAM, 2021). The analyses were performed using the following packages: stats (R CORE TEAM, 2021), car (FOX & WEISBERG, 2019), MVN (KORKMAZ et al., 2014), pracma (BORCHERS, 2021), faraway (FARAWAY, 2016), Hmisc (HARRELL, 2021), biotools (SILVA et al., 2017), rpanel (BOWMAN et al., 2007) and tkrplot (TIERNEY, 2021).

Weather conditions

The data of air temperature (minimum, average, and maximum) and accumulated rainfall were obtained from a mobile automatic weather station, located approximately 200 meters away from the experimental area. Figure 1 presents the weather conditions of the experimental period. During the period of oat cultivation in 2015, the average air temperature was 17.4 °C, ranging from -0.16 °C to 33.0 °C, and the accumulated rainfall was 798.3 mm (Figure 1). In 2016, the average air temperature was 16.5 °C, ranging from -0.3 °C to 34.7 °C, and the accumulated rainfall was 711.2 mm. In 2017, the average air temperature was 19.1 °C, ranging from -4.2 °C to 34.3 °C, and the accumulated rainfall was 715.5 mm. For 2018, the average air temperature was 16.5 °C, ranging from -1.60 °C to 32.30 °C, and the accumulated rainfall was 687.4 mm. During cultivation in 2019, the average air temperature was 17.0 °C, ranging from -4.2 °C to 36.1 °C, and the accumulated rainfall was 645.5 mm. Oats require temperatures between 0 °C and 35 °C for growth and development to occur (LEITE et al., 2012). Therefore, in all years of cultivation, the minimum and maximum temperatures exceeded or were very close to the limits established for the crop. In addition, in 2015 a period without rainfall was observed soon after crop fertilization, together with high temperatures during anthesis, when the development of the reproductive system is particularly sensitive to water stress; these conditions may have impacted crop performance.

Figure 1
(A) Air temperature (minimum, average and maximum) and (B) accumulated rainfall, during the oat cultivation period (June to October), in five agricultural years: 2015, 2016, 2017, 2018 and 2019. Error bars indicate the standard deviation of air temperatures during the 10-day-period.

RESULTS

Simple correlation

When analyzing the linear relationships between yield components and yield of oat, high levels of significant correlations (76.7%) were observed for the original data group with fungicide application, with r values ranging from |0.02| to |0.41|. Conversely, for the group of original data without fungicide application, statistical significance was observed in 23.3% of the coefficients, with values ranging from |0.33| to |0.00|. For the predicted data group, the lowest rates of statistical significance (3.30%) were observed, regardless of the scenario (with and without fungicide application), with r values ranging from |0.37| to |0.00| (Table 1).

Table 1
Pearson correlation coefficients of the explanatory variables with the yield of oats, cultivated with and without fungicide application and in five environments, considering the original and predicted data groups. Each environment corresponds to an agricultural year: 1 = 2015; 2 = 2016, 3 = 2017; 4 = 2018; and 5 = 2019.

When analyzing the group of original data with fungicide application, a positive and significant linear relationship of GWP, HI, PDM, and NGP with grain yield was observed, in most environments studied (95%). PL and NSP, on the other hand, showed coefficients of low magnitude in most environments and without statistical significance (60%). For the group of original data without fungicide application, PL and HI showed no significant correlation with yield, in all cultivation environments. PDM was significantly and positively correlated with yield in environments 2 and 3. NSP was negatively and significantly associated with yield only in environment 4. NGP was significantly correlated with yield in environments 1 and 3, while GWP had a significant influence in environments 2 and 3. However, for the group of predicted data, significance was observed only for NSP in environment 5, for the scenario with fungicide application. For the scenario without fungicide application, a significant correlation was observed only for PL, in environment 2.

When individually analyzing the estimates of the correlation coefficients obtained from the combination of each explanatory variable with yield in the five environments, considering the different scenarios and data groups, it is possible to observe different situations: changes in the direction of the associations (positive or negative sign), drastic changes in the magnitude of the coefficients (>50%), changes of lesser magnitude in the coefficients (<50%), and maintenance of the association pattern in both direction and magnitude. In general, the response pattern was maintained in 11.70% of the combinations, the direction (sign) was inverted in 45% of the combinations, and the absolute value of the coefficient changed by more than 50%, with the sign maintained, in 43.30% of the combinations.
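The classification of each original/predicted pair of coefficients can be sketched as follows. This is one possible reading of the three categories reported above, not the authors' procedure: a sign inversion is checked first, then a relative change in absolute value above 50% (relative to the original coefficient), and everything else counts as maintenance. The function names (classify_change, summarize) and the example pairs are hypothetical.

```python
from collections import Counter


def classify_change(original, predicted, threshold=0.5):
    """Classify how a coefficient changed between the two data groups."""
    if original * predicted < 0:
        return "direction reversed"
    if original == 0:
        # Relative change is undefined; treat any nonzero value as a change.
        return "maintained" if predicted == 0 else "magnitude changed"
    relative = abs(abs(predicted) - abs(original)) / abs(original)
    return "magnitude changed" if relative > threshold else "maintained"


def summarize(pairs):
    """Percentage of (original, predicted) pairs falling in each category."""
    counts = Counter(classify_change(o, p) for o, p in pairs)
    return {label: 100.0 * count / len(pairs) for label, count in counts.items()}


# Hypothetical pairs of correlation coefficients (original, predicted).
pairs = [(0.40, -0.10), (0.40, 0.10), (0.40, 0.35), (0.20, 0.22)]
print(summarize(pairs))
```

The same function could be applied to direct and indirect path coefficients, since the categories used for them in the results are the same.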

When analyzing the effect of removing the model parameters on the correlation coefficients, for the scenario with fungicide application, a change in the direction of the associations was observed in 56.70% of the combinations, a change greater than 50% in the magnitude of the coefficients, with the sign maintained, in 40% of the combinations, and maintenance of the response pattern in 3.30% of the combinations. For the scenario without fungicide application, the direction of the associations was inverted in 33.30% of the combinations, the magnitude of the coefficients changed by more than 50% in 46.70% of the combinations, and the response pattern was maintained in 20% of the combinations.

Multicollinearity

The multicollinearity diagnoses indicated a violation of the statistical assumption for all environments and data groups evaluated (Table 2). The diagnosis of multicollinearity based on VIF indicated that the variables PL, NSP, and NGP do not have a high correlation with the other explanatory variables (VIF < 10), regardless of the environment, scenario, and data group. For the variables PDM, GWP, and HI, the VIF statistic indicated the existence of multicollinearity, regardless of the data group, scenario, or cultivation environment. Similar responses were observed in the diagnosis based on the CN statistic, which indicated the existence of moderate to severe multicollinearity between the explanatory variables, regardless of the data group (Table 3).

Table 2
Variance inflation factor (VIF) for the explanatory variables of yield of oats cultivated with and without fungicide application and in five environments, considering the original and predicted data groups. Each environment corresponds to an agricultural year: 1 = 2015; 2 = 2016, 3 = 2017; 4 = 2018; and 5 = 2019.

Table 3
Condition number (CN) for the explanatory variables of yield of oats cultivated with and without fungicide application and in five environments, considering the original and predicted data groups. Each environment corresponds to an agricultural year: 1 = 2015; 2 = 2016, 3 = 2017; 4 = 2018; and 5 = 2019.

Traditional path analysis under multicollinearity vs Modified path analysis under multicollinearity

The traits PL, PDM, NSP, NGP, GWP, and HI showed explanatory capacity of 22.7%, 18.2%, 18.7%, 12.7%, and 11.5% of the variance in oat yield for environments 1, 2, 3, 4 and 5, respectively, for the original data group with fungicide application (Table 4, Table 5, Table 6 and Table 7). For the group of predicted data with fungicide application, the traits could explain 3.0%, 2.9%, 2.0%, 2.0%, and 4.9% of the variance in yield for environments 1, 2, 3, 4, and 5, respectively. Thus, removing parameters from the mathematical model resulted in an average change of 82.10% in the explanatory capacity of the traits.

Table 4
Direct and indirect effects of the explanatory variables on the yield of oats cultivated with fungicide application and in five environments (Env.), considering the original (Orig.) and predicted (Pred.) data groups, with the addition of a k value on the diagonal of the X'X correlation matrix. Each environment corresponds to an agricultural year: 1 = 2015; 2 = 2016, 3 = 2017; 4 = 2018; and 5 = 2019 (continuation in Table 5).

Table 5
Direct and indirect effects of the explanatory variables on the yield of oats cultivated with fungicide application and in five environments (Env.), considering the original (Orig.) and predicted (Pred.) data groups, with the addition of a k value on the diagonal of the X'X correlation matrix. Each environment corresponds to an agricultural year: 1 = 2015; 2 = 2016, 3 = 2017; 4 = 2018; and 5 = 2019 (continuation of Table 4).

Table 6
Direct and indirect effects of the explanatory variables on the yield of oats cultivated without fungicide application and in five environments (Env.), considering the original (Orig.) and predicted (Pred.) data groups, with the addition of a k value on the diagonal of the X'X correlation matrix. Each environment corresponds to an agricultural year: 1 = 2015; 2 = 2016, 3 = 2017; 4 = 2018; and 5 = 2019 (continuation in Table 7).

Table 7
Direct and indirect effects of the explanatory variables on the yield of oats cultivated without fungicide application and in five environments (Env.), considering the original (Orig.) and predicted (Pred.) data groups, with the addition of a k value on the diagonal of the X'X correlation matrix. Each environment corresponds to an agricultural year: 1 = 2015; 2 = 2016, 3 = 2017; 4 = 2018; and 5 = 2019 (continuation of Table 6).

For the scenario with fungicide application, the removal of the parameters from the mathematical model caused a change in the direction of the path coefficients of the direct effects in 63.30% of the combinations, a change greater than 50% in the magnitude of the direct effects in 43.30%, and maintenance of the response pattern in 3.30% of the combinations. For the indirect effects, a change in the direction of the path coefficients was observed in 48% of the combinations, a change in the magnitude of the coefficients (>50% of the absolute value) in 44.67%, and maintenance of the response pattern in 7.33%.

For the condition without fungicide application, the traits were able to explain 14.2%, 8.0%, 17.8%, 15.7%, and 6.0% of the accumulated variance in environments 1, 2, 3, 4, and 5, respectively, considering the original data group. For the predicted data group, the traits showed explanatory power of 6.1%, 27.2%, 16.7%, 5.3%, and 10.5% of the accumulated variance. Thus, the removal of model parameters reduced the explanatory capacity by an average of 61.6% in environments 1 and 4 and increased it by an average of 157.5% in environments 2 and 5, while it was practically maintained in environment 3 (Table 6 and Table 7).
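The percentages of change quoted above can be checked with a short calculation. The text does not state how they were aggregated; the Python sketch below assumes that the relative change was computed per environment and then averaged within each group of environments, which reproduces the reported 61.6% and 157.5%. The function names are hypothetical; the values are those given for the scenario without fungicide application.

```python
# Explained variance (%) per environment, scenario without fungicide.
original = {1: 14.2, 2: 8.0, 3: 17.8, 4: 15.7, 5: 6.0}
predicted = {1: 6.1, 2: 27.2, 3: 16.7, 4: 5.3, 5: 10.5}


def relative_change(env):
    """Change of the predicted group relative to the original group, in %."""
    return 100.0 * (predicted[env] - original[env]) / original[env]


def mean_change(envs):
    """Average of the per-environment relative changes."""
    return sum(relative_change(e) for e in envs) / len(envs)


print(round(mean_change([1, 4]), 1))  # -61.6 (average reduction)
print(round(mean_change([2, 5]), 1))  # 157.5 (average increase)
print(round(relative_change(3), 1))   # -6.2 (close to maintenance)
```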

When analyzing the direct effects, the removal of the model parameters resulted in a 13.33% change in the direction of the path coefficients, a change in the magnitude of the coefficients (>50%) in 56.67% of the combinations and maintenance of the response pattern in 30% of the situations. For the indirect effects, a change of 28% in the direction of the path coefficients, a change greater than 50% in the absolute values of the coefficients in 57.30%, and maintenance of the response pattern in 24.67% of the situations were observed.

In order to analyze more specifically the impacts of removing parameters from the mathematical model on the direction and magnitude of the path coefficients, Pearson’s correlation was estimated between the values of the direct and indirect effects of each trait, between the groups of data (original and predicted) in each scenario. After estimating Pearson’s correlation between the original and predicted data, in the scenario with fungicide application, positive and significant correlations were found for the traits PDM, NSP, NGP, and GWP, ranging from moderate to strong (0.48 to 0.62) (Table 8). For HI there was a strong negative and significant correlation, while for PL a weak correlation was obtained. In the scenario without fungicide application, positive correlations were observed for all traits under study. However, only PL and NSP showed statistical significance, with moderate correlations (0.40 to 0.51).
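The comparison between data groups described above amounts to a Pearson correlation between paired vectors of path coefficients. A minimal Python sketch is given below; the function name pearson and the coefficient values are hypothetical and only illustrate the calculation, not the values of Table 8.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two paired samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical direct and indirect effects of one trait in the two data groups.
orig_effects = [0.21, -0.05, 0.10, 0.02, -0.12, 0.30]
pred_effects = [0.08, -0.02, 0.12, -0.01, -0.03, 0.15]
print(round(pearson(orig_effects, pred_effects), 2))
```

A coefficient close to 1 would indicate that removing the model parameters preserved the ranking and direction of the effects of that trait, whereas a negative coefficient, as reported for HI with fungicide application, indicates an inversion of the pattern.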

Table 8
Pearson correlation coefficients between the path coefficients (direct and indirect effects) obtained between the data groups (original and predicted), for the scenarios with fungicide application and without fungicide application (n = 30).

DISCUSSION

Simple correlation and multicollinearity

When analyzing the linear relationships between panicle components and oat yield, high levels of significant correlation were found for the scenario with fungicide application, regardless of the data group (original and predicted). Obtaining correlation coefficients of low magnitude, but with statistical significance, is related to the sensitivity of the correlation coefficient to the number of observations (n) used in the estimates (LÚCIO et al., 2013). This occurs because the sample size n enters both the equation for estimating the correlation coefficient and the equation for the minimum absolute value that the coefficient must reach to be significant (FILHO & JÚNIOR, 2009). Thus, when n is small, the correlation coefficient needs to show a high magnitude (close to |1|) to be statistically significant (CARGNELUTTI FILHO et al., 2010; HAIR et al., 2009; STEVENSON, 2001). Conversely, situations may occur, such as those in the present study, in which the magnitudes of the correlations are relatively low (<0.40) and statistical significance is nevertheless identified, especially for the scenario with fungicide application, in which n is larger (n = 264).
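The dependence of the significance threshold on n can be made concrete with an approximation. The Python sketch below uses Fisher's z transformation with a normal quantile, which is an assumption made here for simplicity and not the test used in the study (the exact t-based threshold differs slightly). The function name critical_r is hypothetical, n = 264 is the value given in the text, and n = 66 is assumed for the scenario without fungicide application (22 cultivars × 3 blocks).

```python
from math import sqrt, tanh
from statistics import NormalDist


def critical_r(n, alpha=0.05):
    """Approximate smallest |r| that is significant (two-sided) for sample size n."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # Fisher's z = atanh(r) has standard error of about 1 / sqrt(n - 3).
    return tanh(z / sqrt(n - 3))


for n in (264, 66):
    print(n, round(critical_r(n), 3))
```

Under this approximation, coefficients of about 0.12 are already significant with n = 264, whereas roughly 0.24 is required with n = 66, which is consistent with the higher frequency of significant but weak correlations in the scenario with fungicide application.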

For the scenario with fungicide application, positive and significant correlations were found between the yield components GWP, HI, PDM, and NGP with yield, regardless of the data group (Table 1). There was no response pattern between the data groups for the scenario without fungicide application. However, a tendency to obtain a non-significant correlation between traits and yield was observed. CAIERÃO et al. (2001), when studying the linear relationships of the traits NGP, PDM, and thousand-grain weight (TGW), observed a trend of positive associations with yield, with r of 0.62, 0.72, and 0.36, respectively, values with greater magnitudes than those observed in the present study. In addition, the authors point out that the linear correlation coefficient between panicle weight and yield (0.72) provides good perspectives for indirect selection via panicle weight, especially if one considers that the grains represent about 80 to 85% of the panicle weight, unlike other cereals, in which the percentage is lower.

BENIN et al. (2005b), when analyzing different plant selection methods, found correlations of 0.15, 0.47, and 0.27 between PDM and yield for selection based on individual plant yield, selection based on the average weight of grains, and combined selection, respectively. The same authors observed similar results for the NGP trait, obtaining correlation coefficients of 0.12, 0.58, and 0.23 for the methods based on individual plant yield, average grain weight, and combined selection, respectively. MANTAI et al. (2016), when studying the performance of oats subjected to different doses of N, found low correlations of the traits PDM, NSP, NGP, and PL with yield at the lowest doses of N applied (30 and 60 kg N ha-1). In contrast, GWP (mean r = 0.66) and HI (mean r = 0.86) showed high correlations.

Similar responses were observed by MANTAI et al. (2020a), who analyzed the linear relationships between the panicle components and the yield of oats, grown in different succession systems and with different N doses applied, and observed weak correlations for the soybean/oat system with PL, NSP, NGP, and PDM, with values of r ≤ 0.35. For the variables GWP and HI, correlations of greater magnitude were found (0.37 ≤ r ≤ 0.53). In the corn/oat succession system, associations of low magnitude were obtained for NSP, NGP, and PDM with yield, regardless of the N rate applied (0, 30, 60, and 120 kg ha-1). However, PL, GWP, and HI were significantly associated with yield. Also, the authors observed that PL negatively influences yield, that is, panicles with greater length result in less productive plants, regardless of the dose of N supplied (MANTAI et al., 2020b). The existence of significant correlations indicated the viability of indirect selection to obtain gains in the most important trait, which also directly depends on the heritability of the considered trait (CRUZ et al., 2012).

Oat grain yield showed a linear relationship with most of the analyzed yield components, mainly for the group of original data with fungicide application. In addition, the main purpose of this research was to investigate the implications of removing the parameters from the model on the cause-and-effect relationships, and the statistical significance of the correlation coefficient is sensitive to the number of observations. For these reasons, it was decided to maintain all explanatory traits in the path analysis. It was therefore necessary to diagnose multicollinearity between the explanatory traits, to avoid obtaining biased results.

The multicollinearity diagnoses indicated a violation of the statistical assumption in all environments under study, data group, and scenarios evaluated (Table 2 and Table 3), and the traits PDM, GWP, and HI caused severe multicollinearity. The occurrence of severe multicollinearity between the explanatory traits is a recurrent result in the literature, being found for tomato (RODRIGUES et al., 2010; SARI et al., 2017), corn (ENTRINGER et al., 2014; OLIVOTO et al., 2017; TOEBE et al., 2017), soybean (CARVALHO et al., 2002; NOGUEIRA et al., 2012; ZUFFO et al., 2018), jabuticaba (SALLA et al., 2015), black oat (MEIRA et al., 2019b), wheat (GONDIM et al., 2008), canola (AMORIM et al., 2008), among others. Among the negative aspects caused by multicollinearity, the inflation of the variance of the estimates of the path coefficients can be highlighted, leading to values that are very high, imprecise, and without biological interpretation (SARI et al., 2017; TOEBE et al., 2017).

To overcome the problems related to the occurrence of multicollinearity between the explanatory traits, some strategies can be adopted, such as excluding from the model the non-additive traits that generate multicollinearity or carrying out a path analysis under multicollinearity (ridge) (CRUZ et al., 2012; MONTGOMERY et al., 2012). Under the first strategy, it would be necessary to investigate the removal of PDM, GWP, and HI. However, studies suggest that these traits have a high correlation with yield and a good perspective for indirect selection via PDM (CAIERÃO et al., 2001; CAIERÃO et al., 2006; MANTAI et al., 2020a). The same studies describe that the grain weight corresponds to 80% to 85% of the panicle mass and that the panicle harvest index is obtained by the ratio between PDM and GWP. Thus, it was decided to carry out the ridge path analysis without removing any variable from the database.
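A minimal sketch of a ridge path analysis is given below for illustration. It assumes the usual formulation in which the direct effects p solve (R_xx + kI) p = r_xy, the indirect effect of trait i via trait j is r_ij · p_j, and R² is the sum of p_i · r_iy. The correlation values, the constant k = 0.1, and the names `solve` and `ridge_path` are hypothetical and are not taken from the study.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ridge_path(r_xx, r_xy, k=0.0):
    """Return direct effects, indirect effects, and R2 for ridge constant k."""
    n = len(r_xy)
    ridged = [[r_xx[i][j] + (k if i == j else 0.0) for j in range(n)]
              for i in range(n)]
    direct = solve(ridged, r_xy)
    # indirect[i][j]: effect of trait i on yield via trait j (0 on the diagonal)
    indirect = [[r_xx[i][j] * direct[j] if i != j else 0.0 for j in range(n)]
                for i in range(n)]
    r2 = sum(p * r for p, r in zip(direct, r_xy))
    return direct, indirect, r2


# Hypothetical correlations among three traits; the first two are nearly collinear.
R_XX = [[1.00, 0.95, 0.30],
        [0.95, 1.00, 0.40],
        [0.30, 0.40, 1.00]]
R_XY = [0.30, 0.35, 0.40]  # hypothetical correlations of each trait with yield

for k in (0.0, 0.1):
    direct, indirect, r2 = ridge_path(R_XX, R_XY, k)
    print(f"k = {k}: direct effects = {[round(p, 3) for p in direct]}, R2 = {r2:.3f}")
```

With k = 0 the direct and indirect effects of each trait sum exactly to its correlation with yield; with k > 0 the coefficients are shrunk, trading a small bias for lower variance when the traits are nearly collinear.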

Path analysis under multicollinearity vs Path analysis under multicollinearity with removal of model parameters

Removing parameters from the mathematical model resulted in changes in the ability to explain the variance in oat yield, especially for the scenario with fungicide application. In general, the traits explained an average of 12.2% and 3.0% of the variance for the original and predicted data groups, respectively, in the scenario with fungicide application, and 12.3% and 13.2% for the original and predicted data groups, respectively, in the scenario without fungicide application. CAIERÃO et al. (2001), when analyzing the cause-and-effect relationships of TGW, NGP, PDM, plant height (PH), days from emergence to flowering (DEF), days from emergence to maturation (DEM) and days from flowering to maturation (DFM) with the yield of oat genotypes, found a determination coefficient of 0.59, indicating that about 60% of the observed variation in yield comes from the effects of the analyzed traits. Furthermore, it is interesting to point out that the coefficient of determination is restricted to these levels because the main trait is quantitative, controlled by many genes of small effect, and shows considerable environmental variance, which reduces its heritability (VESOHOSKI et al., 2011).

In the literature, there are studies that obtained determination coefficients of low magnitude and, consequently, a high residual effect, for example in the industrial grain yield of oats grown in succession to corn, in which the residual effects ranged from 0.62 to 0.86 for the traits panicle length, number of spikelets per panicle, number of grains per panicle, panicle mass, panicle grain mass, and panicle harvest index (MANTAI et al., 2020b). BENIN et al. (2003) analyzed the cause-and-effect relationships of the traits number of panicles per plant, panicle weight, number of grains per panicle, average grain weight, vegetative cycle and plant height in relation to grain production per oat plant and observed a residual effect of 0.50. In the black oat crop, the study of the cause-and-effect relationships of the traits plant height, number of leaves per plant, and number of tillers per plant on the fresh mass and dry mass produced indicated residual effects between 0.40 and 0.72 (CARGNELUTTI FILHO et al., 2015). For soybean and sweet potato crops, estimates of the direct and indirect effects of secondary traits on primary traits indicated high residual effects, ranging from 0.69 to 0.97 and from 0.82 to 0.92, respectively (CAVALCANTE et al., 2006; NOGUEIRA et al., 2012).

The determination or explanation coefficient is an indicator of the goodness of fit of the adopted model. In situations where the determination coefficient values are close to or equal to unity (1), it is accepted that variations in the dependent trait are explained by variations in the explanatory traits (BORGES et al., 2011; KAVALCO et al., 2014). The coefficient of determination values of the path analysis model observed in the present study were of low magnitude and the residual effects were high, indicating that the independent traits considered as predictors explain only a small fraction of the variation observed for the dependent trait. This shows that, for the conditions of the present study, the independent traits have little influence on the yield variance, so there are other traits that may provide a greater impact in terms of selection (CRUZ et al., 2012) and should be included in path diagrams (NOGUEIRA et al., 2012). Thus, it was decided not to discuss the results of the path analyses, considering only the implications of removing parameters from the mathematical model on the path coefficients, in each scenario and environment.
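The link between a low coefficient of determination and a high residual effect can be made explicit with the usual path-analysis identities, assumed here to be the ones underlying the reported values. The symbols p_i (direct effect of trait i), r_iy (correlation of trait i with yield), and p_ε (residual effect) are notation introduced for this illustration. The two worked values use the average R² of 12.2% reported above for the original data with fungicide application and the R² of 0.59 from CAIERÃO et al. (2001).

```latex
\begin{align*}
R^2 &= \sum_{i} p_i \, r_{iy}, &
p_\varepsilon &= \sqrt{1 - R^2} \\
R^2 &= 0.122 \;\Rightarrow\; p_\varepsilon = \sqrt{0.878} \approx 0.94, &
R^2 &= 0.59 \;\Rightarrow\; p_\varepsilon = \sqrt{0.41} \approx 0.64
\end{align*}
```

Because the residual effect is a square root, it stays high even for moderate values of R².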

The removal of parameters from the mathematical model resulted in changes in the direction and magnitude (>50%) of the path coefficients, for all environments and scenarios studied. In general, maintenance of the response pattern of direct effects was observed in 3.30% and 30% of the combinations, for the scenarios with fungicide application and without fungicide application, respectively (Table 4, Table 5, Table 6 and Table 7). For the indirect effects, maintenance of the response pattern was observed in 7.33% and 24.67% of the combinations, for the scenarios with fungicide application and without fungicide application, respectively. Furthermore, the Pearson correlation performed for the path coefficients in each scenario and for each analyzed trait indicated the influence of the data group, confirming the initially indicated results.

The removal of parameters from the mathematical models and the stratification within each scenario are strategies that must be considered by the researcher during the planning of the experiment, to avoid results that may not show the real relationship between the measured variables. Thus, for situations in which one seeks to expand the scope of the information generated about the cause-and-effect relationships between the different variables measured in agricultural trials, the use of the proposed new approach is strongly suggested, as it allows for removing the influences of treatments and design on the observations and, consequently, on the path coefficients and their interpretations. In this way, it is possible to reduce the possible bias in the estimates of the coefficients, highlighting the real relationship between those variables.
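As an illustration of what removing the influences of treatments and design can mean in practice, the sketch below assumes a balanced randomized complete block model, y_ij = μ + τ_i + b_j + e_ij, and subtracts the estimated treatment and block effects from each observation. This is an assumption made for the example and not a description of the exact procedure used in the study; the function name `remove_effects` and the data are hypothetical.

```python
from statistics import mean


def remove_effects(y):
    """Remove estimated treatment and block effects from a balanced RCB layout.

    y[i][j] is the observation for treatment i in block j. The returned value
    for each cell is y_ij - tau_hat_i - b_hat_j, i.e. grand mean plus residual.
    """
    grand = mean(v for row in y for v in row)
    trt_eff = [mean(row) - grand for row in y]        # tau_hat_i
    blk_eff = [mean(col) - grand for col in zip(*y)]  # b_hat_j
    return [[y[i][j] - trt_eff[i] - blk_eff[j] for j in range(len(y[i]))]
            for i in range(len(y))]


# Hypothetical yields: 3 treatments (rows) x 3 blocks (columns).
yields = [[3.1, 3.4, 3.0],
          [2.5, 2.9, 2.6],
          [3.8, 4.1, 3.9]]

adjusted = remove_effects(yields)
for row in adjusted:
    print([round(v, 3) for v in row])
```

After the adjustment every treatment mean and every block mean equals the grand mean, so correlations computed from the adjusted values of two traits no longer reflect differences among treatments or blocks.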

CONCLUSION

The removal of parameters from the mathematical model caused changes in the direction and magnitude of the linear associations between oat yield traits, with general maintenance of the response pattern being obtained in 3.30% and 20% of the situations, for the scenarios with fungicide application and without fungicide application, respectively.

The removal of parameters from the mathematical model implied changes in the direction and magnitude of the path coefficients, with maintenance in the response pattern of the direct effects of 3.30% and 30% and indirect effects of 7.33% and 24.67%, for the scenarios with fungicide application and without fungicide application, respectively.

The use of the new approach proposed for path analysis is recommended for situations in which variables were measured in experiments that contain treatments and/or test networks in which the experimental design is not supported in multi-environments, situations that would otherwise require the researcher to carry out a new path analysis for each environment and treatment and interpret each one separately.

Furthermore, further research should be conducted to study the implications and feasibility of removing parameters from the mathematical model in other multivariate statistical techniques, to differentiate the results and expand the scope of the information generated.

ACKNOWLEDGEMENTS

The authors are grateful to the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES-Brasil) for the financial support to the first author (Finance code 001 - Process No. 88887.499817/2020-00). The authors also thank the members of the research group on Technical Systems of Agricultural Production at the Universidade Regional do Noroeste do Estado do Rio Grande do Sul (UNIJUÍ) and the research group on Agricultural Experimentation at the Universidade Federal de Santa Maria (UFSM) for their help in this project.

REFERENCES

  • CR-2023-0379.R1

Publication Dates

  • Publication in this collection
    26 Aug 2024
  • Date of issue
    2024

History

  • Received
    14 July 2023
  • Accepted
    18 Apr 2024
  • Reviewed
    24 June 2024
Universidade Federal de Santa Maria, Centro de Ciências Rurais, 97105-900 Santa Maria, RS, Brazil. Tel.: +55 55 3220-8698, Fax: +55 55 3220-8695
E-mail: cienciarural@mail.ufsm.br