Evaluation of methods for treating left-censored data using stochastic simulation

RBRH, ISSN 1414-381X, 2318-0331, Associação Brasileira de Recursos Hídricos

ABSTRACT The article evaluates the influence of series size, the percentage of censored data, and the coefficients of variation used to generate synthetic series on the estimation of means, standard deviations, coefficients of variation, and medians in series with censored data. Seven techniques for treating censored data were applied to synthetic series in 180 scenarios (four series sizes, nine censoring percentages, and five coefficients of variation): substitution by values proportional to the DL (zero, DL/2, DL/√2, and DL), and the parametric (MLE), robust (ROS), and Kaplan-Meier methods. The predictions were analyzed with four performance metrics (MPE, MAPE, KGE, and RMSE). The percentage of censored data and the coefficient of variation were found to significantly alter the quality of the predictions. Substitution by DL/2, substitution by DL/√2, and ROS were found to be the most suitable techniques for estimating the variables described, with ROS standing out for parametric variables and substitution by DL/√2 for medians.

INTRODUCTION Time series resulting from water quality monitoring may have several records with analytical concentrations below the detection limit (DL) of the measuring device. The DL is the minimum concentration of a substance that can be reported and whose value is greater than zero with 99% confidence (US Environmental Protection Agency, 2016). Series with values below the DL are referred to as left-censored data. One of the problems associated with left-censored data is the calculation of time-series statistics, such as the mean, median, and standard deviation. Statistics computed only from values above the DL do not accurately represent the series. One way to deal with this problem is to apply methods that reduce bias and uncertainty in estimating statistics, such as means and standard deviations, as observed in George et al. 
(2021), and increase the reliability of hypothesis tests, as mentioned in Mohamed et al. (2021). In addition to enabling the analysis of water quality (Cantoni et al., 2020), the handling of censored data helps evaluate the risk of disease caused by microorganisms (Canales et al., 2018), analyze breast cancer patients (Faucheux et al., 2021), spatially interpolate measurements in riverbeds (Mohamed et al., 2021), and model genetic modifications in fish meat (Fusek et al., 2020), among other applications. Left-censored data can be treated with different methods. The most commonly used are substitution methods, which replace values below the DL with values proportional to the DL (0, DL/2, DL/√2, and DL). There are also parametric methods, such as the maximum likelihood estimator (MLE), which requires choosing an adequate probability distribution, as well as semiparametric or robust methods (ROS) and nonparametric methods (Kaplan-Meier, KM). Detailed descriptions of the above methods can be found in Helsel et al. (2020); Hall Junior et al. (2020); Nostbaken et al. (2021); Bahk & Lee (2021). The unsatisfactory treatment of censored data can significantly influence the results obtained, reducing confidence in decision-making processes such as those related to projects to reduce and control pollution, the establishment of frameworks for water bodies, and the revitalization of rivers. Stochastic simulation is one of the techniques used to evaluate censored data treatment methods. The use of synthetic series allows methods to be evaluated considering the effects of series size and the percentage of censored data on the statistical estimates. Table 1 describes studies that used stochastic simulations, primarily with synthetic series, to evaluate statistical estimates. 
This table presents the authors of the works (Authors), the treatment methods for censored data (Methods), the number of elements in the randomly drawn samples (Elements), the number of random samples drawn (Random Samples), the probability distributions used to draw the random samples (Distribution), the percentage of censored data (Censoring Percentage), the accuracy measures adopted (Accuracy Measure), the statistics evaluated (Evaluated Stats), and the conclusions obtained (Conclusions).

Table 1 Stochastic simulations using series with censored data. Conclusions are those related to the log-normal distribution.

Helsel & Cohn (1988). Methods: ZDL, HDL, DL, MLE, ROS. Elements: 25. Random Samples: 500. Distribution: log-normal, mixture of two log-normals, delta. Censoring Percentage: 60. Accuracy Measure: RMSE, bias. Evaluated Stats: mean, median, standard deviation, interquartile ranges. Conclusions: MLE: significant bias in the estimates of means and standard deviations.

Kroll & Stedinger (1996). Methods: MLE, ROS. Elements: 10, 25, 50. Random Samples: 5000. Distribution: log-normal, mixture of two log-normals, gamma, delta. Censoring Percentage: 20, 60, 80. Accuracy Measure: RMSE. Evaluated Stats: 10th and 90th percentiles, mean, standard deviation, interquartile ranges. Conclusions: MLE: suitable for estimating quantiles and interquartile ranges in highly censored data; ROS: suitable for estimating means and standard deviations in medium to long time series with short to medium censoring.

She (1997). Methods: HDL, KM, MLE, ROS. Elements: 21. Random Samples: 1000. Distribution: log-normal, gamma. Censoring Percentage: three, chosen randomly between 10 and 80. Accuracy Measure: bias, standard error. Evaluated Stats: mean, standard deviation. Conclusions: HDL: best for CV = 1.00 and 2.00; KM: second-best technique, similar to MLE; MLE: best for CV = 0.25 and 0.50; means: worse estimates for higher CV values.

Shunway et al. (2002). Methods: MLE, ROS. Elements: 20, 50. Random Samples: 500. Distribution: log-normal, gamma. Censoring Percentage: 50, 80. Accuracy Measure: bias, confidence interval. Evaluated Stats: mean, variance. Conclusions: ROS: no bias for the log-normal distribution, but larger standard error for highly asymmetrical series; MLE: recommended to use a bias corrector.

Hewett & Ganser (2007). Methods: HDL, LR2, DL, KM, MLE, ROS. Elements: 5-19, 20-100. Random Samples: 100. Distribution: log-normal, contaminated log-normal. Censoring Percentage: 1-50, 50-80. Accuracy Measure: bias, RMSE. Evaluated Stats: mean, 95th quantile. Conclusions: MLE: recommended for all scenarios; ROS: recommended for estimating averages; KM: presented poor estimates; DL: overestimated the mean and underestimated the 95th percentile.

Antweiller & Taylor (2008). Methods: ZDL, HDL, DL, KM, MLE, ROS. Elements: 34-841. Random Samples: 44. Distribution: no specific distributions. Censoring Percentage: randomly between 14 and 95. Accuracy Measure: bias. Evaluated Stats: mean, 25th, 50th and 75th percentiles, standard deviation, interquartile range. Conclusions: KM: achieved the best results for censoring up to 70%, except when estimating the median; ROS and HDL: yielded reasonable results; no method yielded suitable results for censoring greater than 70%.

Niemann (2016). Methods: ZDL, HDL, LR2, DL, KM, MLE. Elements: 50. Random Samples: 10000. Distribution: log-normal. Censoring Percentage: 5 to 60. Accuracy Measure: bias, RMSE, confidence interval. Evaluated Stats: mean. Conclusions: HDL, LR2: good for censoring rates up to 30%; MLE: exhibited significant bias and high RMSE; HDL: stood out for censoring rates exceeding 50%, providing unbiased estimates and low RMSE.

Tekindal et al. (2017). Methods: LR2, DL, KM, MLE, ROS. Elements: 20, 80, 140, 200, 260. Random Samples: 10000. Distribution: log-normal, exponential, Weibull. Censoring Percentage: 5, 25, 45, 65. Accuracy Measure: bias. Evaluated Stats: mean, median, standard deviation. Conclusions: ROS: recommended for estimating mean values; LR2: exhibited less bias when estimating medians; KM, DL: demonstrated similar performance, with the overestimation of means and the underestimation of standard deviations; MLE: worst scenario.

Canales et al. (2018). Methods: LR2, DL, KM, MLE, ROS. Elements: 100. Random Samples: 10000. Distribution: log-normal. Censoring Percentage: < 10, 35, 65, 90, 97. Accuracy Measure: bias, RMSE. Evaluated Stats: mean. Conclusions: ROS: performed better in series with a high percentage of censored data; MLE: showed poor performance, with a high RMSE, especially in series with pronounced asymmetry.

George et al. (2021). Methods: HDL, MLE, ROS, KM. Elements: 20, 50. Random Samples: 1000. Distribution: log-normal, moderately and highly asymmetrical. Censoring Percentage: 30, 50, 80. Evaluated Stats: mean, standard deviation. Conclusions: KM: overestimated means and underestimated standard deviations, performing less poorly in highly skewed distributions; ROS: demonstrated the best performance; HDL: provided reasonable estimates for means but performed poorly for standard deviations; MLE: performed poorly in asymmetrical series.

The treatment methods for censored data evaluated in the studies shown in Table 1 are substitution methods proportional to the limit of detection (ZDL = 0, HDL = DL/2, LR2 = DL/√2, and DL), the maximum likelihood estimator (MLE), robust methods (ROS), and the Kaplan-Meier (KM) approach. The number of elements in the synthetic series ranges from 5 to 260, and the number of series drawn ranges from 100 to 10,000. The distributions used to draw the synthetic series are log-normal, exponential, Weibull, gamma, delta, a mixture of two log-normals, contaminated log-normal, and moderately and highly asymmetric log-normal. The log-normal is the most commonly used distribution. In the studies presented in Table 1, the statistics evaluated are the mean, median, variance, standard deviation, interquartile ranges, 10th and 90th percentiles, and 95th quantile. The mean and the standard deviation are the statistics evaluated most often. All studies in Table 1 investigate how the intervening factors described earlier influence the estimated means. Standard deviation and variance are addressed in seven studies; medians and interquartile ranges are addressed to a lesser extent. 
In this regard, the present study aims to fill a scientific gap: how best to estimate the coefficient of variation using censoring treatment techniques. Despite its recognized importance in various areas, such as reliability analyses (Zhang et al., 2023), this quantity has not yet been addressed in the stochastic simulations described. In Table 1, statistical estimates from the censored data are compared with the uncensored values using the root mean square error (RMSE), bias, standard error, and confidence interval. Hewett & Ganser (2007) used bias and the RMSE when analyzing the mean and 95th quantile estimates produced by six methods for handling censored data. These authors observed that the MLE method did not exhibit a very high RMSE in mean and 95th quantile estimates for series drawn from the log-normal distribution containing between 20 and 100 elements with a censoring percentage of up to 50%. They also recommended a robust method for estimating means in series with these characteristics. Shunway et al. (2002) and Niemann (2016) reported the need for bias correction in mean estimates obtained using the MLE, with the former extending the conclusion to variance predictions. These examples illustrate how analyses improve when different performance metrics are employed. Morley et al. (2018) state that the usefulness of a model is determined by how accurately the estimated quantities are predicted. Several metrics are available for performance analysis, and there are various perspectives on what constitutes a good prediction. Given these observations, it is worthwhile to analyze the quality of the estimates obtained by methods for handling censored data using multiple performance indicators, which can support more accurate conclusions about the most suitable technique for each studied scenario. Tekindal et al. 
(2017) found similar tendencies for the KM and DL methods, with overestimated means and underestimated standard deviations, and found that the robust methods and substitution by DL/√2 provided more accurate estimates of means. By adopting higher coefficients of variation in the generation of synthetic series (CV = 0.473, 1.27), the authors observed a rise in the bias of the mean and median estimates. For example, using the robust method, the average bias values increased from 5% to 20% in the means. Finally, the authors highlighted the need for a more adequate method for log-normal series generated with CV = 1.27 when 65% censoring was applied. Therefore, it is important to analyze the coefficient of variation (CV) used in generating synthetic series through the Monte Carlo method, as it directly influences the first- and second-order moments of the two-parameter log-normal distribution. For instance, George et al. (2021) generated synthetic series with two different CVs (0.53 and 3.45) and found that the mean and standard deviation estimates obtained with the MLE and KM methods had a low level of accuracy in series with greater asymmetry. The use of ROS can provide more reliable predictions in such situations. Other studies have also employed different CVs to generate log-normal series (She, 1997; Tekindal et al., 2017). Despite relevant observations on the topic, the studies in Table 1 do not thoroughly explore the influence of the log-normal distribution parameters under different percentages of censoring in synthetic series. Some studies have combined five censoring percentages with two distinct parameters (Tekindal et al., 2017) or four parameters and three percentages (She, 1997). 
However, it is possible to conduct simulations with more levels of these variables to determine whether the coefficient of variation used in generating synthetic series from the log-normal distribution significantly influences the estimation of the statistics of interest across different censoring percentages. This article aims to analyze the influence of the treatment method, the percentage of censored data, the size of the time series, and the coefficients of variation used in synthetic series generation on the estimation of means, medians, standard deviations, and coefficients of variation. These objectives are achieved using different performance analysis metrics: mean percentage error (MPE), mean absolute percentage error (MAPE), root mean square error (RMSE), and Kling Gupta Efficiency (KGE). The analysis is based on synthetic series randomly drawn from five two-parameter log-normal distributions (CV = 0.10, 0.25, 0.40, 0.80, and 1.60), with each scenario having 10,000 reference sets.

Censored data treatment techniques

Substitution methods use values between zero and the DL to fill in censored data (ZDL = 0, HDL = DL/2, LR2 = DL/√2, and DL). However, adopting these methods can introduce bias in estimated means, medians, and standard deviations (Tekindal et al., 2017); lead to means that fall outside the confidence interval of observed values (Niemann, 2016); affect quantile regressions (Wang et al., 2022); and distort correlations between variables and spatiotemporal trend analyses (Christófaro & Leão, 2014). The use of a single value introduces biases that do not exist in the observed samples (Tekindal et al., 2017), increasing the probability of the replaced value occurring. Moreover, a single value reduces the variability of the data and alters how the monitored data are represented by the probability density function. Niemann (2016) tested filling censored data with randomly chosen values below the DL. 
While this procedure increased the variability of the series and reduced bias in the estimates, it generated very high and uncertain results (averages greater than the maximum values of the series) due to the wide amplitude of the confidence intervals. When multiple DLs exist in historical series, other techniques, such as the Kaplan-Meier method, are used (Helsel et al., 2020). Despite its limitations, substitution is the most commonly used technique in Brazil due to its simplicity and ease of understanding (Von Sperling et al., 2020), and its adoption is recommended for series with up to 20% censored data. In contrast, Tran et al. (2021) suggested a threshold of up to 10%. Brasil (2021) recommends using HDL to fill censored water quality data. Additionally, Mora et al. (2022) used HDL for water quality parameters where the DL was close to the maximum allowable value (MAV). These authors replaced censored data with the DL where DL << MAV, arguing that the choice has little relevance for environmental pollution in that case. Pinto et al. (2019) and Soares et al. (2021) employed the DL method because it represents the most critical situation in terms of negative environmental effects. Parametric methods (MLE) depend on two factors: the adherence of observed data to a recommended probability distribution and the use of the maximum likelihood estimator to calculate the parameters of the likelihood function by maximizing it (Naghettini, 2017). This procedure depends on the percentage of censored data and the values above the DL (Helsel et al., 2020). 
Given a set of n observations (y1, y2,…, yn) extracted from a population with a probability density function, fy (θ1,…,θk), involving k parameters, the likelihood function is given as follows:

$$L(\theta_1,\ldots,\theta_k) = f_y(y_1;\theta_1,\ldots,\theta_k)\cdots f_y(y_n;\theta_1,\ldots,\theta_k) = \prod_{i=1}^{n} f_y(y_i;\theta_1,\ldots,\theta_k) \tag{1}$$

To maximize this function, the partial derivative with respect to each parameter θi is taken, and each derivative is set equal to zero. Solving these equations yields the vector of maximum likelihood estimators [θi]. The parametric method is suitable when a good fit exists between the observed data and the recommended probability distribution. However, the method does not produce accurate results when estimating means and standard deviations in short series, as it can introduce biases, mainly when logarithmic transformations are applied. Christófaro & Leão (2014) noted that the MLE is highly sensitive to outliers, which are common in environmental data, and this sensitivity helps explain the poor results of this method in mean estimations, as observed by Niemann (2016). Furthermore, Canales et al. (2018) mentioned that using the MLE can result in estimated means that deviate significantly from reality, particularly in highly asymmetric series, and in She (1997), the best estimates were obtained in series with lower coefficients of variation. Helsel & Hirsch (2002) reported that the MLE best estimates medians and interquartile ranges (IQR) in symmetric series or those with positive asymmetry. The application of robust methods involves two steps. In the first step, an asymmetric distribution (e.g., log-normal) is fitted to the uncensored data using the Weibull plotting position, which provides unbiased exceedance probabilities (Naghettini, 2017). 
The fitted probability density function is then extrapolated to the lower portion, assigning values to the censored data on the fitted straight line (Figure 1). To do this, the percentile corresponding to the DL (z) is divided by the number of censored elements (m), yielding zi (z/m). Subsequently, the censored data receive values corresponding to quantiles b * zi, where b is a positive integer less than m.

Figure 1 Representation of the robust method.

Christófaro & Leão (2014) explain that in semiparametric methods, only the observed data points are used to calculate the desired statistics. In contrast, the MLE uses the entire fitted curve for these calculations. These authors note that the ROS method is more suitable than the MLE for estimating means and standard deviations, particularly in shorter series (n < 50) and with higher censoring percentages (50-80%), as the ROS method is less sensitive to the distribution fitted to the monitored data and avoids biases from logarithmic transformations. This observation is also supported by Shunway et al. (2002), who assessed bias in mean and variance estimations in series with a high censoring percentage (50-80%), which can affect the adherence of the data to the log-normal distribution. Furthermore, Kroll & Stedinger (1996) examined this aspect using the RMSE. The Kaplan-Meier (KM) method is a nonparametric method that was initially used to analyze right-censored data by estimating the survival function (Equation 2), which is subsequently employed to calculate the desired statistics. For example, the mean can be obtained by integrating this function, which can be approximated by a summation as the integration steps tend toward zero (Equation 3). 
$$S(t) = \prod_{j:\, t_j < t} \frac{r_j - d_j}{r_j} \tag{2}$$

$$\mu_{KM} = \int_{0}^{t_{max}} S(t)\,dt \approx \sum_{j} S(t_{j-1})\,(t_j - t_{j-1}) \tag{3}$$

where tj: Set of death times observed in the sample; rj: Number of individuals at risk immediately before the jth time of death; dj: Number of deaths up to tj; S(t): Survival function; tmax: Maximum survival time.

To adapt the Kaplan-Meier (KM) method to left-censored data series, all elements are transformed into right-censored data by subtracting them from a fixed value greater than the maximum observed value. The KM method is primarily utilized in survival analysis (Zhan et al., 2022) and equipment failure time studies (Daneshkhah & Menzemer, 2018). Christófaro & Leão (2014) noted that this technique offers the advantage of being robust against outliers since it relies solely on the ordering of values and their positions within the series. As a result, this technique can be directly applied to correlations, hypothesis tests, and nonparametric trend analyses. The authors note that the KM method is particularly suitable for short series (n < 50), which aligns with the findings of She (1997). However, the KM method may introduce significant bias in the mean and standard deviation estimates (Tekindal et al., 2017; George et al., 2021).

Accuracy measurement methods

The accuracy of the estimates from the stochastic simulation was assessed with metrics relating censored and uncensored values. Morley et al. (2018) list some desirable characteristics for performance evaluators: remaining meaningful for data spanning different orders of magnitude, penalizing underestimation and overestimation by the same factor, being easy to interpret, and being robust to outliers and incorrect data. No single metric offers all of these characteristics simultaneously, which justifies joint analyses and a discussion of the limitations of each metric. 
In this study, we consider the following accuracy measures to evaluate how closely a given estimated statistic fits the true value: mean percent error (MPE), mean absolute percentage error (MAPE), root mean square error (RMSE), and Kling Gupta Efficiency (KGE). These measures are defined as follows:

$$MPE = \frac{100}{n} \sum_{j=1}^{n} \frac{x_i - x_j}{x_i} \tag{4}$$

$$MAPE = \frac{1}{n} \sum_{j=1}^{n} \frac{\left| x_i - x_j \right|}{x_i} \tag{5}$$

$$RMSE = \sqrt{\frac{\sum_{i=1}^{n} \left( x_i - x_j \right)^2}{n}} \tag{6}$$

$$KGE = 1 - \sqrt{(r - 1)^2 + \left( \frac{\sigma_{sim}}{\sigma_{obs}} - 1 \right)^2 + \left( \frac{\mu_{sim}}{\mu_{obs}} - 1 \right)^2} \tag{7}$$

where n: Number of elements of the generated synthetic series; xi: Value of the reference series; xj: Value estimated by the methods of treatment of the censored data in each series; r: Linear correlation coefficient; σsim, σobs: Standard deviation of simulated and observed statistical quantities of interest, respectively; µsim, µobs: Mean of simulated and observed statistical quantities of interest, respectively.

The mean percent error (Equation 4) is a bias indicator. Negative values indicate overestimation, and positive values indicate underestimation. The mean absolute percentage error (Equation 5) considers errors in absolute value, and its domain is the range [0, +∞). Morley et al. (2018) point out that the mean percent error has limitations, such as asymmetry regarding under- and overestimation, positive asymmetry, and sensitivity to outliers. The root mean square error (Equation 6) relates the estimated and observed values through Euclidean distance and is an indirect measure of error variance. The RMSE is highly sensitive to outliers due to the quadratic term in the numerator. As a result, a few highly disparate estimates can significantly distort the final result. The RMSE has the same units as the original variable, and its domain is the range [0, +∞). The Kling Gupta Efficiency is a widely used performance indicator for evaluating hydrological models, as it incorporates terms in its formulation that assess the bias, correlation, and variability of the estimated values (Liu et al., 2022). 
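As an illustration of Equations 4 to 7, the following minimal Python sketch computes the four metrics for a pair of arrays. The function names and the use of NumPy are assumptions made here for illustration and are not taken from the study. The MAPE is multiplied by 100, which is also an assumption: Equation 5 is written without this factor, but the tabulated MAPE values appear to be on a percent scale.

```python
import numpy as np


def _pair(ref, est):
    """Convert reference and estimated values to float arrays."""
    return np.asarray(ref, dtype=float), np.asarray(est, dtype=float)


def mpe(ref, est):
    """Mean percent error (Eq. 4): positive = underestimation."""
    ref, est = _pair(ref, est)
    return 100.0 * np.mean((ref - est) / ref)


def mape(ref, est):
    """Mean absolute percentage error (Eq. 5), scaled by 100 (assumption)."""
    ref, est = _pair(ref, est)
    return 100.0 * np.mean(np.abs(ref - est) / np.abs(ref))


def rmse(ref, est):
    """Root mean square error (Eq. 6), in the units of the variable."""
    ref, est = _pair(ref, est)
    return np.sqrt(np.mean((ref - est) ** 2))


def kge(ref, est):
    """Kling Gupta Efficiency (Eq. 7): 1 is a perfect match."""
    ref, est = _pair(ref, est)
    r = np.corrcoef(ref, est)[0, 1]      # linear correlation
    alpha = np.std(est) / np.std(ref)    # variability ratio
    beta = np.mean(est) / np.mean(ref)   # bias ratio
    return 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
```

Here `ref` would hold the statistic computed on each uncensored reference series and `est` the same statistic estimated after treating the censored version, so each metric summarizes one scenario and one treatment method.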
Although combining these terms adds robustness to the KGE, the indicator loses some of the simplicity of interpreting results through a single value. The domain of the KGE is the range (-∞, 1].

MATERIALS AND METHODS

The methodology started by drawing 10,000 random samples of 25, 40, 70, and 100 elements, using the Monte Carlo procedure, from five two-parameter (2P) log-normal distributions (mean = 1.0 and standard deviation = 0.10, 0.25, 0.40, 0.80, and 1.60). The degree of uncertainty decreases as the number of simulated series increases. The range of coefficients of variation was based on the works listed in Table 1 and on She (1997), who stated that most environmental data adhering to log-normal distributions have coefficients of variation between 0.25 and 2.00. Simulations were performed with five coefficients of variation because previous studies used at most four. After generating the reference series, we simulated thirty-six scenarios, corresponding to four variations in the number of elements (25, 40, 70, and 100) and nine censoring percentages (10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, and 90%) for each series. In each scenario, the uncensored series were replicated seven times, and the percentage of data under analysis was censored in each replica. The mean, median, standard deviation, and coefficient of variation were then estimated for each of the censored series using the seven censored data treatment methods: the ZDL (0), HDL (DL/2), LR2 (DL/√2), DL, MLE, ROS, and KM methods. By comparing the estimated statistics (means, standard deviations, coefficients of variation, and medians) from the censored series with the actual statistics of the uncensored series, the MPE, MAPE, RMSE, and KGE values were calculated for each of the 36 scenarios and each of the seven censored data treatment techniques. 
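The procedure described above can be sketched in Python for the four substitution methods and the mean only. This is a simplified illustration, not the code used in the study. It assumes that the DL of each series is the smallest value left uncensored (the text does not state how the DL was set), it uses fewer series than the 10,000 of the study by default, and all function names are hypothetical. The log-normal parameters follow from the standard relations sigma² = ln(1 + CV²) and mu = ln(mean) - sigma²/2.

```python
import numpy as np

# Substitution factors applied to the DL (ZDL = 0, HDL = DL/2, LR2 = DL/sqrt(2), DL).
SUBSTITUTES = {"ZDL": 0.0, "HDL": 0.5, "LR2": 1.0 / np.sqrt(2.0), "DL": 1.0}


def lognormal_params(mean, cv):
    """Log-space (mu, sigma) of a 2P log-normal with the given mean and CV."""
    sigma2 = np.log(1.0 + cv ** 2)
    mu = np.log(mean) - 0.5 * sigma2
    return mu, np.sqrt(sigma2)


def censor(series, fraction):
    """Flag the lowest `fraction` of values as censored.

    Assumption: the DL is the smallest value that stays uncensored.
    """
    k = int(round(fraction * series.size))
    dl = np.sort(series)[k]
    return dl, series < dl


def substitute(series, dl, mask, factor):
    """Replace censored values with factor * DL."""
    filled = series.copy()
    filled[mask] = factor * dl
    return filled


def simulate_mpe_of_mean(n_series=1000, n=25, cv=0.25, fraction=0.3, seed=0):
    """MPE (%) of the estimated mean for each substitution method."""
    rng = np.random.default_rng(seed)
    mu, sigma = lognormal_params(1.0, cv)
    ref = np.empty(n_series)
    est = {name: np.empty(n_series) for name in SUBSTITUTES}
    for s in range(n_series):
        x = rng.lognormal(mu, sigma, size=n)   # uncensored reference series
        dl, mask = censor(x, fraction)
        ref[s] = x.mean()
        for name, factor in SUBSTITUTES.items():
            est[name][s] = substitute(x, dl, mask, factor).mean()
    return {name: 100.0 * np.mean((ref - e) / ref) for name, e in est.items()}


if __name__ == "__main__":
    for name, value in simulate_mpe_of_mean().items():
        print(f"{name}: MPE = {value:.2f}%")
```

Under these assumptions, substitution by zero always underestimates the mean (positive MPE) and substitution by the DL always overestimates it (negative MPE), with HDL and LR2 falling in between. Extending the sketch to the other statistics, the MLE, ROS, and KM methods, and the full grid of scenarios would follow the same loop structure.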
Finally, we compared the results from each simulation to establish the influence of the censoring percentage, number of elements in the series, censored data treatment, and CVs of the series in estimating the statistics. Initially, the results obtained with the series generated with CV = 0.25 were emphasized because a study is being developed that uses monitoring data that follow the log-normal distribution and exhibit the described characteristics. Then, Tables 2, 3, 4, and 5 were prepared for each estimated variable (mean, standard deviation, coefficient of variation, and median), showing how each estimation performance evaluation indicator varies according to the number of elements, censoring percentage, and censored data treatment techniques. Table 2 Performance metrics for estimated means. Censoring percentage RMSE (25) Censoring percentage MAPE (25) Censoring percentage MPE (25) Censoring percentage KGE (25) ZDL HDL DL LR2 KM ROS MLE ZDL HDL DL LR2 KM ROS MLE ZDL HDL DL LR2 KM ROS MLE ZDL HDL DL LR2 KM ROS MLE 10% 0.029 0.013 0.004 0.007 0.010 0.006 0.019 10% 2.66 1.18 0.31 0.57 0.76 0.46 1.23 10% 2.66 1.18 -0.31 0.56 -0.76 0.21 -0.53 10% 0.970 0.987 0.996 0.994 0.990 0.996 0.981 20% 0.092 0.035 0.025 0.013 0.034 0.016 0.024 20% 8.33 3.14 2.06 1.06 2.92 1.16 1.53 20% 8.33 3.14 -2.06 0.99 -2.92 0.35 -0.28 20% 0.905 0.966 0.971 0.990 0.958 0.990 0.961 30% 0.172 0.059 0.059 0.018 0.073 0.029 0.031 30% 15.58 5.27 5.05 1.37 6.37 2.10 2.32 30% 15.58 5.27 -5.05 0.99 -6.37 0.66 1.19 30% 0.824 0.945 0.923 0.983 0.901 0.978 0.937 40% 0.234 0.075 0.091 0.021 0.110 0.041 0.046 40% 21.24 6.67 7.89 1.55 9.61 2.99 3.68 40% 21.24 6.67 -7.89 0.64 -9.61 0.99 3.00 40% 0.756 0.929 0.879 0.975 0.847 0.956 0.929 50% 0.304 0.090 0.132 0.026 0.158 0.058 0.072 50% 27.56 8.01 11.55 1.88 13.79 4.25 6.07 50% 27.56 8.01 -11.55 -0.09 -13.79 1.52 5.66 50% 0.682 0.917 0.813 0.951 0.770 0.913 0.905 60% 0.425 0.110 0.221 0.049 0.260 0.098 0.139 60% 38.55 9.60 19.36 3.35 22.78 7.17 12.14 60% 
38.55 9.59 -19.36 -2.40 -22.78 2.93 11.93 60% 0.545 0.900 0.659 0.883 0.589 0.765 0.842 70% 0.567 0.118 0.360 0.103 0.427 0.167 0.248 70% 51.63 10.07 31.64 7.53 37.44 12.40 22.02 70% 51.63 9.99 -31.64 -7.25 -37.44 5.18 21.77 70% 0.377 0.874 0.391 0.741 0.256 0.401 0.702 80% 0.682 0.115 0.507 0.172 0.609 0.255 0.355 80% 61.90 9.36 44.29 13.27 53.11 18.94 31.46 80% 61.90 8.80 -44.29 -13.19 -53.11 8.08 31.09 80% 0.238 0.816 0.099 0.571 -0.139 -0.106 0.510 90% 0.813 0.108 0.749 0.306 0.948 0.433 0.505 90% 73.95 8.19 65.15 24.41 81.87 32.81 44.22 90% 73.95 4.40 -65.15 -24.41 -81.87 8.66 42.58 90% 0.066 0.612 -0.500 0.186 -1.052 -1.359 -0.124 Censoring percentage RMSE (40) Censoring percentage MAPE (40) Censoring percentage MPE (40) Censoring percentage KGE (40) ZDL HDL DL LR2 KM ROS MLE ZDL HDL DL LR2 KM ROS MLE ZDL HDL DL LR2 KM ROS MLE ZDL HDL DL LR2 KM ROS MLE 10% 0.038 0.015 0.008 0.007 0.012 0.006 0.017 10% 3.40 1.37 0.66 0.54 0.98 0.42 1.11 10% 3.40 1.37 -0.66 0.53 -0.98 0.12 -0.54 10% 0.962 0.985 0.991 0.995 0.987 0.995 0.980 20% 0.090 0.033 0.026 0.011 0.032 0.012 0.019 20% 8.20 2.97 2.27 0.87 2.80 0.87 1.26 20% 8.20 2.97 -2.27 0.80 -2.80 0.15 -0.26 20% 0.907 0.968 0.968 0.992 0.960 0.992 0.962 30% 0.155 0.052 0.054 0.014 0.063 0.020 0.023 30% 14.09 4.66 4.76 1.04 5.53 1.49 1.72 30% 14.09 4.66 -4.76 0.76 -5.53 0.26 0.86 30% 0.839 0.950 0.931 0.987 0.917 0.981 0.952 40% 0.231 0.071 0.093 0.017 0.105 0.032 0.042 40% 21.02 6.37 8.27 1.22 9.35 2.34 3.37 40% 21.02 6.37 -8.27 0.31 -9.35 0.51 3.00 40% 0.758 0.933 0.870 0.970 0.851 0.956 0.930 50% 0.319 0.089 0.148 0.025 0.165 0.049 0.078 50% 29.05 7.95 13.15 1.73 14.65 3.58 6.69 50% 29.05 7.95 -13.15 -0.79 -14.65 1.03 6.56 50% 0.661 0.918 0.782 0.936 0.755 0.895 0.899 60% 0.419 0.102 0.226 0.047 0.250 0.075 0.135 60% 38.20 9.03 20.14 3.41 22.27 5.53 11.92 60% 38.20 9.03 -20.14 -3.05 -22.27 2.06 11.86 60% 0.551 0.905 0.643 0.871 0.601 0.767 0.841 70% 0.537 0.108 0.339 0.091 0.376 0.121 0.223 70% 48.92 9.42 30.11 7.04 
33.38 8.95 19.92 70% 48.92 9.40 -30.11 -6.97 -33.38 3.95 19.88 70% 0.416 0.881 0.429 0.757 0.355 0.501 0.747 80% 0.677 0.099 0.520 0.178 0.583 0.204 0.349 80% 61.49 8.10 46.12 14.60 51.63 15.15 31.22 80% 61.49 7.69 -46.12 -14.60 -51.63 7.00 31.11 80% 0.245 0.799 0.054 0.536 -0.095 -0.090 0.537 90% 0.847 0.095 0.861 0.371 1.010 0.402 0.531 90% 76.96 6.75 75.93 31.15 88.70 30.41 47.10 90% 76.96 0.52 -75.93 -31.15 -88.70 11.39 46.24 90% 0.015 0.469 -0.835 -0.042 -1.287 -1.809 -0.173 Censoring percentage RMSE (70) Censoring percentage MAPE (70) Censoring percentage MPE (70) Censoring percentage KGE (70) ZDL HDL DL LR2 KM ROS MLE ZDL HDL DL LR2 KM ROS MLE ZDL HDL DL LR2 KM ROS MLE ZDL HDL DL LR2 KM ROS MLE 10% 0.037 0.014 0.009 0.005 0.011 0.004 0.013 10% 3.32 1.28 0.76 0.45 0.95 0.32 0.93 10% 3.32 1.28 -0.76 0.44 -0.95 0.02 -0.56 10% 0.963 0.986 0.991 0.995 0.988 0.995 0.980 20% 0.089 0.032 0.027 0.009 0.031 0.009 0.014 20% 8.09 2.84 2.41 0.71 2.71 0.66 1.00 20% 8.09 2.84 -2.41 0.67 -2.71 0.00 -0.27 20% 0.908 0.969 0.965 0.992 0.961 0.992 0.965 30% 0.153 0.050 0.055 0.011 0.060 0.015 0.019 30% 13.96 4.51 4.95 0.80 5.39 1.11 1.41 30% 13.96 4.51 -4.95 0.59 -5.39 0.02 0.87 30% 0.840 0.952 0.927 0.986 0.920 0.981 0.953 40% 0.229 0.068 0.095 0.013 0.102 0.024 0.038 40% 20.85 6.15 8.54 0.91 9.16 1.73 3.15 40% 20.85 6.15 -8.54 0.06 -9.16 0.18 3.02 40% 0.759 0.935 0.866 0.968 0.856 0.956 0.934 50% 0.316 0.085 0.151 0.022 0.160 0.037 0.075 50% 28.82 7.64 13.55 1.55 14.41 2.68 6.57 50% 28.82 7.64 -13.55 -1.14 -14.41 0.60 6.55 50% 0.666 0.921 0.775 0.931 0.760 0.896 0.901 60% 0.417 0.097 0.229 0.046 0.243 0.056 0.134 60% 38.00 8.68 20.63 3.54 21.87 4.12 11.94 60% 38.00 8.68 -20.63 -3.46 -21.87 1.39 11.93 60% 0.553 0.908 0.640 0.868 0.612 0.767 0.843 70% 0.534 0.101 0.344 0.092 0.365 0.091 0.220 70% 48.68 8.89 30.91 7.60 32.82 6.73 19.88 70% 48.68 8.88 -30.91 -7.60 -32.82 2.90 19.88 70% 0.417 0.879 0.413 0.745 0.371 0.490 0.745 80% 0.672 0.086 0.526 0.180 0.562 0.155 0.345 80% 
61.20 7.15 47.25 15.49 50.46 11.50 31.19 80% 61.20 6.97 -47.25 -15.49 -50.46 6.00 31.18 80% 0.251 0.787 0.024 0.513 -0.064 -0.074 0.563 90% 0.842 0.075 0.878 0.380 0.960 0.313 0.524 90% 76.72 5.32 78.52 33.05 85.84 23.60 47.01 90% 76.72 -0.90 -78.52 -33.05 -85.84 11.40 46.92 90% 0.016 0.436 -0.904 -0.092 -1.146 -1.758 0.035 Censoring percentage RMSE (100) Censoring percentage MAPE (100) Censoring percentage MPE (100) Censoring percentage KGE (100) ZDL HDL DL LR2 KM ROS MLE ZDL HDL DL LR2 KM ROS MLE ZDL HDL DL LR2 KM ROS MLE ZDL HDL DL LR2 KM ROS MLE 10% 0.036 0.014 0.009 0.005 0.011 0.004 0.011 10% 3.30 1.25 0.80 0.41 0.93 0.26 0.82 10% 3.30 1.25 -0.80 0.40 -0.93 -0.01 -0.54 10% 0.963 0.986 0.990 0.996 0.988 0.996 0.981 20% 0.088 0.031 0.028 0.008 0.030 0.008 0.012 20% 8.05 2.78 2.48 0.63 2.69 0.55 0.84 20% 8.05 2.78 -2.48 0.60 -2.69 -0.08 -0.26 20% 0.909 0.970 0.965 0.993 0.962 0.991 0.966 30% 0.153 0.049 0.056 0.009 0.059 0.013 0.017 30% 13.91 4.45 5.01 0.69 5.31 0.93 1.25 30% 13.91 4.45 -5.01 0.53 -5.31 -0.06 0.88 30% 0.841 0.952 0.926 0.986 0.921 0.982 0.952 40% 0.228 0.067 0.096 0.010 0.100 0.020 0.037 40% 20.78 6.07 8.63 0.76 9.06 1.45 3.07 40% 20.78 6.07 -8.63 -0.02 -9.06 0.01 3.01 40% 0.761 0.937 0.863 0.965 0.856 0.956 0.935 50% 0.315 0.083 0.151 0.020 0.158 0.031 0.074 50% 28.73 7.52 13.68 1.49 14.28 2.24 6.55 50% 28.73 7.52 -13.68 -1.26 -14.28 0.36 6.54 50% 0.668 0.922 0.775 0.931 0.763 0.894 0.901 60% 0.416 0.095 0.230 0.045 0.239 0.047 0.133 60% 37.94 8.58 20.78 3.61 21.65 3.40 11.98 60% 37.94 8.58 -20.78 -3.58 -21.65 1.09 11.98 60% 0.553 0.908 0.639 0.868 0.620 0.763 0.847 70% 0.532 0.098 0.346 0.092 0.360 0.076 0.219 70% 48.55 8.67 31.21 7.85 32.53 5.61 19.86 70% 48.55 8.67 -31.21 -7.85 -32.53 2.56 19.86 70% 0.420 0.878 0.403 0.737 0.374 0.501 0.751 80% 0.670 0.081 0.528 0.181 0.552 0.131 0.345 80% 61.13 6.80 47.69 15.82 49.89 9.76 31.29 80% 61.13 6.72 -47.69 -15.82 -49.89 5.41 31.29 80% 0.250 0.787 0.019 0.510 -0.038 -0.069 0.572 90% 0.839 0.066 
0.882 0.382 0.940 0.270 0.519 90% 76.57 4.66 79.51 33.79 84.63 20.37 46.92 90% 76.57 -1.47 -79.51 -33.79 -84.63 11.41 46.90 90% 0.025 0.408 -0.954 -0.131 -1.132 -1.758 0.044 Table 3 Performance metrics for estimated standard deviations. Censoring percentage RMSE (25) Censoring percentage MAPE (25) Censoring percentage MPE (25) Censoring percentage KGE (25) ZDL HDL DL LR2 KM ROS MLE ZDL HDL DL LR2 KM ROS MLE ZDL HDL DL LR2 KM ROS MLE ZDL HDL DL LR2 KM ROS MLE 10% 0.044 0.018 0.006 0.008 0.011 0.008 0.133 10% 7.28 2.90 0.73 1.32 1.53 1.04 16.59 10% -7.28 -2.90 0.73 -1.30 1.52 -0.23 -13.26 10% 0.927 0.971 0.993 0.987 0.984 0.987 0.779 20% 0.115 0.039 0.027 0.014 0.034 0.017 0.229 20% 19.19 6.39 4.17 2.00 5.17 2.28 31.26 20% -19.19 -6.39 4.17 -1.69 5.17 0.11 -30.44 20% 0.815 0.938 0.959 0.984 0.947 0.961 0.585 30% 0.182 0.054 0.058 0.016 0.065 0.028 0.369 30% 30.20 8.85 8.99 2.23 10.02 3.62 51.42 30% -30.20 -8.85 8.99 -0.90 10.02 0.49 -51.24 30% 0.715 0.915 0.913 0.990 0.900 0.930 0.225 40% 0.221 0.06 0.083 0.02 0.09 0.037 0.493 40% 36.61 9.72 13.03 2.59 14.08 4.76 69.04 40% -36.61 -9.72 13.03 0.40 14.08 0.91 -68.96 40% 0.658 0.908 0.874 0.993 0.860 0.901 -0.069 50% 0.253 0.062 0.112 0.029 0.12 0.048 0.658 50% 41.77 9.79 17.68 3.62 18.73 6.16 90.84 50% -41.77 -9.77 17.68 2.41 18.73 1.40 -90.81 50% 0.612 0.908 0.830 0.975 0.814 0.866 -0.534 60% 0.283 0.053 0.168 0.057 0.176 0.069 1.035 60% 46.65 8.00 26.34 7.54 27.37 8.89 136.48 60% -46.65 -7.76 26.34 7.29 27.37 2.56 -136.48 60% 0.568 0.928 0.746 0.928 0.728 0.792 -1.746 70% 0.281 0.037 0.240 0.106 0.249 0.101 1.925 70% 46.36 5.01 37.67 15.32 38.91 12.95 222.52 70% -46.36 -2.12 37.67 15.30 38.91 5.15 -222.52 70% 0.568 0.973 0.635 0.850 0.610 0.685 -5.511 80% 0.250 0.056 0.303 0.158 0.314 0.134 4.044 80% 40.80 6.81 47.19 23.40 48.53 17.03 360.63 80% -40.80 5.12 47.19 23.40 48.53 8.30 -360.63 80% 0.615 0.940 0.538 0.769 0.506 0.572 -15.996 90% 0.170 0.128 0.389 0.238 0.411 0.200 73.533 90% 27.35 17.76 60.21 35.92 62.65 
25.21 1046.54 90% -27.35 17.70 60.21 35.92 62.63 17.47 -1046.54 90% 0.733 0.817 0.396 0.639 0.341 0.411 -395.676 Censoring percentage RMSE (40) Censoring percentage MAPE (40) Censoring percentage MPE (40) Censoring percentage KGE (40) ZDL HDL DL LR2 KM ROS MLE ZDL HDL DL LR2 KM ROS MLE ZDL HDL DL LR2 KM ROS MLE ZDL HDL DL LR2 KM ROS MLE 10% 0.052 0.019 0.01 0.008 0.013 0.007 0.132 10% 8.35 3.01 1.38 1.11 1.86 0.87 17.16 10% -8.35 -3.01 1.38 -1.07 1.86 -0.03 -14.93 10% 0.913 0.969 0.986 0.989 0.980 0.985 0.753 20% 0.11 0.035 0.028 0.011 0.031 0.013 0.214 20% 17.60 5.61 4.22 1.48 4.77 1.64 29.90 20% -17.60 -5.61 4.22 -1.23 4.77 0.30 -29.31 20% 0.823 0.943 0.957 0.987 0.950 0.963 0.609 30% 0.164 0.048 0.052 0.012 0.056 0.02 0.323 30% 26.48 7.58 8.04 1.57 8.62 2.57 46.80 30% -26.48 -7.58 8.04 -0.61 8.62 0.73 -46.61 30% 0.741 0.925 0.919 0.994 0.911 0.937 0.380 40% 0.213 0.055 0.082 0.017 0.086 0.029 0.472 40% 34.28 8.70 12.74 2.07 13.32 3.68 68.24 40% -34.28 -8.70 12.74 0.86 13.32 1.24 -68.20 40% 0.668 0.914 0.873 0.990 0.863 0.903 -0.009 50% 0.252 0.056 0.119 0.03 0.123 0.04 0.671 50% 40.51 8.67 18.46 3.79 19.03 5.05 96.01 50% -40.51 -8.66 18.46 3.40 19.03 1.89 -96.00 50% 0.612 0.916 0.816 0.965 0.806 0.856 -0.607 60% 0.275 0.046 0.164 0.055 0.167 0.054 0.977 60% 43.97 6.86 25.40 7.54 25.92 6.79 135.99 60% -43.97 -6.78 25.40 7.49 25.92 2.78 -135.98 60% 0.581 0.934 0.748 0.925 0.737 0.798 -1.595 70% 0.279 0.031 0.220 0.093 0.224 0.075 1.510 70% 44.42 4.15 33.96 13.44 34.50 9.36 199.48 70% -44.42 -2.76 33.96 13.44 34.50 4.30 -199.48 70% 0.578 0.970 0.664 0.867 0.651 0.717 -3.617 80% 0.247 0.050 0.295 0.153 0.300 0.107 3.161 80% 39.00 6.14 45.41 22.92 45.98 13.31 352.29 80% -39.00 5.40 45.41 22.92 45.98 7.75 -352.29 80% 0.625 0.940 0.548 0.772 0.529 0.592 -12.202 90% 0.139 0.149 0.404 0.258 0.414 0.178 27.803 90% 21.36 21.56 61.62 38.84 62.66 22.00 1368.21 90% -21.36 21.56 61.62 38.84 62.66 17.44 -1368.21 90% 0.782 0.779 0.372 0.606 0.335 0.377 -168.826 Censoring 
percentage RMSE (70) Censoring percentage MAPE (70) Censoring percentage MPE (70) Censoring percentage KGE (70) ZDL HDL DL LR2 KM ROS MLE ZDL HDL DL LR2 KM ROS MLE ZDL HDL DL LR2 KM ROS MLE ZDL HDL DL LR2 KM ROS MLE 10% 0.049 0.017 0.01 0.006 0.012 0.005 0.117 10% 7.69 2.65 1.49 0.85 1.76 0.63 15.77 10% -7.69 -2.65 1.49 -0.82 1.76 0.12 -14.38 10% 0.917 0.972 0.984 0.991 0.980 0.983 0.736 20% 0.105 0.033 0.028 0.008 0.03 0.01 0.201 20% 16.43 5.07 4.21 1.10 4.50 1.22 28.79 20% -16.43 -5.07 4.21 -0.94 4.50 0.43 -28.48 20% 0.828 0.946 0.956 0.990 0.951 0.965 0.621 30% 0.159 0.045 0.051 0.009 0.053 0.015 0.31 30% 24.88 6.92 7.84 1.10 8.13 1.91 45.39 30% -24.88 -6.92 7.84 -0.32 8.13 0.86 -45.32 30% 0.747 0.928 0.918 0.996 0.913 0.937 0.407 40% 0.207 0.052 0.08 0.014 0.082 0.022 0.454 40% 32.29 7.93 12.30 1.67 12.59 2.73 66.67 40% -32.29 -7.93 12.30 1.11 12.59 1.35 -66.65 40% 0.677 0.919 0.872 0.988 0.867 0.904 0.059 50% 0.246 0.051 0.116 0.028 0.118 0.031 0.648 50% 38.19 7.83 17.75 3.67 18.03 3.81 94.44 50% -38.19 -7.83 17.75 3.59 18.03 1.98 -94.44 50% 0.624 0.921 0.818 0.963 0.811 0.861 -0.465 60% 0.27 0.042 0.16 0.052 0.162 0.042 0.932 60% 41.95 6.18 24.43 7.46 24.68 5.12 133.65 60% -41.95 -6.17 24.43 7.46 24.68 2.87 -133.65 60% 0.591 0.938 0.751 0.924 0.744 0.807 -1.411 70% 0.274 0.024 0.215 0.090 0.217 0.059 1.420 70% 42.63 3.13 32.83 13.29 33.09 7.21 198.16 70% -42.63 -2.28 32.83 13.29 33.09 4.46 -198.16 70% 0.586 0.974 0.666 0.866 0.657 0.719 -3.350 80% 0.243 0.044 0.286 0.148 0.288 0.084 2.657 80% 37.62 5.62 43.55 22.22 43.77 10.39 344.19 80% -37.62 5.39 43.55 22.22 43.77 7.33 -344.19 80% 0.634 0.941 0.557 0.776 0.546 0.602 -9.505 90% 0.139 0.142 0.392 0.251 0.395 0.145 12.896 90% 21.13 21.00 59.50 37.78 59.83 18.15 1150.64 90% -21.13 21.00 59.50 37.78 59.83 15.81 -1150.64 90% 0.782 0.784 0.385 0.614 0.362 0.379 -82.282 Censoring percentage RMSE (100) Censoring percentage MAPE (100) Censoring percentage MPE (100) Censoring percentage KGE (100) ZDL HDL DL LR2 KM 
ROS MLE ZDL HDL DL LR2 KM ROS MLE ZDL HDL DL LR2 KM ROS MLE ZDL HDL DL LR2 KM ROS MLE 10% 0.048 0.016 0.01 0.005 0.011 0.004 0.109 10% 7.40 2.51 1.52 0.74 1.69 0.51 14.76 10% -7.40 -2.51 1.52 -0.73 1.69 0.15 -13.64 10% 0.918 0.972 0.984 0.992 0.981 0.985 0.722 20% 0.103 0.032 0.028 0.007 0.029 0.008 0.194 20% 15.95 4.84 4.22 0.92 4.41 1.04 28.07 20% -15.95 -4.84 4.22 -0.80 4.41 0.51 -27.83 20% 0.830 0.948 0.954 0.991 0.951 0.964 0.628 30% 0.157 0.044 0.051 0.007 0.052 0.013 0.304 30% 24.22 6.67 7.71 0.90 7.91 1.62 44.79 30% -24.22 -6.67 7.71 -0.24 7.91 0.88 -44.75 30% 0.749 0.930 0.918 0.997 0.914 0.938 0.418 40% 0.205 0.05 0.08 0.012 0.081 0.019 0.447 40% 31.56 7.66 12.12 1.51 12.30 2.38 66.11 40% -31.56 -7.66 12.12 1.19 12.30 1.44 -66.10 40% 0.680 0.920 0.873 0.987 0.869 0.908 0.095 50% 0.243 0.05 0.115 0.027 0.116 0.027 0.638 50% 37.36 7.55 17.45 3.65 17.63 3.29 93.84 50% -37.36 -7.55 17.45 3.62 17.63 2.04 -93.84 50% 0.627 0.922 0.817 0.962 0.813 0.861 -0.428 60% 0.268 0.04 0.158 0.051 0.159 0.036 0.91 60% 41.14 5.96 23.96 7.40 24.13 4.44 132.54 60% -41.14 -5.95 23.96 7.40 24.13 2.91 -132.54 60% 0.593 0.939 0.750 0.923 0.745 0.806 -1.300 70% 0.272 0.021 0.212 0.089 0.213 0.051 1.383 70% 41.68 2.70 32.08 13.11 32.21 6.16 196.97 70% -41.68 -2.10 32.08 13.11 32.21 4.32 -196.97 70% 0.592 0.975 0.670 0.866 0.664 0.734 -3.140 80% 0.242 0.042 0.282 0.146 0.283 0.074 2.499 80% 37.10 5.47 42.75 21.92 42.83 9.24 339.27 80% -37.10 5.37 42.75 21.92 42.83 7.24 -339.27 80% 0.636 0.941 0.559 0.776 0.550 0.605 -8.803 90% 0.139 0.139 0.385 0.246 0.386 0.129 10.619 90% 21.08 20.51 58.12 37.00 58.23 16.24 1112.10 90% -21.08 20.51 58.12 37.00 58.23 14.84 -1112.10 90% 0.782 0.789 0.394 0.62 0.377 0.389 -69.5198 Table 4 Performance metrics for estimated coefficient of variations. 
Censoring percentage RMSE (25) Censoring percentage MAPE (25) Censoring percentage MPE (25) Censoring percentage KGE (25) ZDL HDL DL LR2 KM ROS MLE ZDL HDL DL LR2 KM ROS MLE ZDL HDL DL LR2 KM ROS MLE ZDL HDL DL LR2 KM ROS MLE 10% 0.0571 0.0233 0.0074 0.0113 0.0148 0.0104 0.106 10% 10.23 4.13 1.03 1.90 2.26 1.50 15.30 10% -10.23 -4.13 1.03 -1.87 2.26 -0.45 -12.46 10% 0.886 0.955 0.988 0.980 0.973 0.966 0.703 20% 0.1678 0.0561 0.0366 0.0199 0.0467 0.0241 0.191 20% 30.14 9.86 6.08 3.10 7.83 3.43 30.33 20% -30.14 -9.86 6.08 -2.72 7.83 -0.30 -29.77 20% 0.683 0.899 0.934 0.973 0.912 0.891 0.592 30% 0.3045 0.0857 0.078 0.0243 0.0897 0.0408 0.322 30% 54.57 14.97 13.30 3.60 15.30 5.70 52.70 30% -54.57 -14.97 13.30 -1.96 15.30 -0.35 -52.63 30% 0.446 0.851 0.862 0.975 0.838 0.790 0.287 40% 0.413 0.1018 0.112 0.0283 0.125 0.0556 0.447 40% 74.05 17.68 19.27 4.04 21.45 7.72 73.66 40% -74.05 -17.68 19.27 -0.33 21.45 -0.42 -73.64 40% 0.264 0.828 0.805 0.968 0.779 0.694 -0.022 50% 0.54 0.114 0.151 0.0379 0.165 0.0752 0.619 50% 96.68 19.51 26.01 5.23 28.35 10.37 101.50 50% -96.68 -19.50 26.01 2.35 28.35 -0.76 -101.50 50% 0.049 0.811 0.739 0.943 0.712 0.559 -0.527 60% 0.7864 0.118 0.2202 0.0711 0.2349 0.121 1.035 60% 140.57 19.62 37.95 10.11 40.45 16.10 166.83 60% -140.57 -19.51 37.95 9.19 40.45 -2.03 -166.83 60% -0.362 0.801 0.624 0.873 0.596 0.246 -1.854 70% 1.153 0.099 0.303 0.133 0.320 0.200 1.995 70% 206.34 15.25 52.17 20.68 54.99 25.96 305.64 70% -206.34 -14.01 52.17 20.55 54.99 -4.58 -305.64 70% -0.989 0.806 0.480 0.761 0.448 -0.336 -5.930 80% 1.544 0.080 0.366 0.196 0.384 0.311 3.862 80% 275.83 11.30 62.85 31.70 65.73 38.57 540.32 80% -275.83 -4.82 62.85 31.69 65.73 -10.10 -540.32 80% -1.655 0.792 0.359 0.645 0.324 -1.155 -15.860 90% 2.246 0.122 0.442 0.289 0.464 0.580 17.203 90% 400.95 16.43 75.28 47.63 78.72 67.70 1493.23 90% -400.95 12.75 75.28 47.63 78.72 -21.86 -1493.23 90% -2.858 0.686 0.189 0.460 0.140 -3.148 -115.349 Censoring percentage RMSE (40) Censoring percentage 
MAPE (40) Censoring percentage MPE (40) Censoring percentage KGE (40) ZDL HDL DL LR2 KM ROS MLE ZDL HDL DL LR2 KM ROS MLE ZDL HDL DL LR2 KM ROS MLE ZDL HDL DL LR2 KM ROS MLE 10% 0.0701 0.026 0.0128 0.0106 0.0175 0.0093 0.108 10% 12.18 4.45 2.02 1.67 2.81 1.30 16.04 10% -12.18 -4.45 2.02 -1.61 2.81 -0.16 -14.16 10% 0.866 0.952 0.977 0.983 0.968 0.961 0.664 20% 0.1619 0.0517 0.0381 0.0158 0.0441 0.0181 0.183 20% 28.17 8.85 6.33 2.36 7.34 2.50 29.17 20% -28.17 -8.85 6.33 -2.06 7.34 0.11 -28.77 20% 0.701 0.908 0.932 0.979 0.919 0.903 0.593 30% 0.2722 0.0754 0.0722 0.0181 0.0791 0.0292 0.290 30% 47.39 12.88 12.18 2.59 13.36 4.03 47.73 30% -47.39 -12.88 12.18 -1.41 13.36 0.38 -47.61 30% 0.514 0.871 0.873 0.981 0.859 0.824 0.402 40% 0.4041 0.0951 0.114 0.0231 0.122 0.0434 0.442 40% 70.38 16.17 19.32 3.17 20.63 5.94 73.04 40% -70.38 -16.17 19.32 0.50 20.63 0.52 -73.02 40% 0.292 0.841 0.802 0.968 0.787 0.715 0.033 50% 0.5664 0.108 0.163 0.038 0.172 0.0623 0.661 50% 98.71 18.17 27.79 5.20 29.22 8.48 109.21 50% -98.71 -18.17 27.79 4.06 29.22 0.40 -109.20 50% 0.020 0.820 0.720 0.930 0.703 0.548 -0.597 60% 0.774 0.108 0.2223 0.0698 0.2312 0.090 1.026 60% 134.14 17.60 37.69 10.30 39.18 12.10 166.65 60% -134.14 -17.58 37.69 10.06 39.18 -0.25 -166.65 60% -0.317 0.818 0.625 0.873 0.608 0.318 -1.736 70% 1.066 0.092 0.289 0.120 0.299 0.139 1.701 70% 184.83 14.18 48.94 18.81 50.55 18.15 270.86 70% -184.83 -13.75 48.94 18.79 50.55 -1.95 -270.86 70% -0.802 0.819 0.512 0.784 0.494 -0.097 -4.251 80% 1.530 0.064 0.369 0.198 0.379 0.231 3.612 80% 264.90 8.77 62.25 32.31 63.94 28.85 540.32 80% -264.90 -3.00 62.25 32.31 63.94 -5.45 -540.32 80% -1.582 0.800 0.364 0.644 0.344 -0.929 -14.010 90% 2.523 0.146 0.464 0.320 0.476 0.537 20.314 90% 436.42 21.06 77.74 52.74 79.72 60.15 2177.32 90% -436.42 20.30 77.74 52.74 79.72 -19.88 -2177.32 90% -3.247 0.647 0.152 0.409 0.126 -3.587 -140.350 Censoring percentage RMSE (70) Censoring percentage MAPE (70) Censoring percentage MPE (70) Censoring 
percentage KGE (70) ZDL HDL DL LR2 KM ROS MLE ZDL HDL DL LR2 KM ROS MLE ZDL HDL DL LR2 KM ROS MLE ZDL HDL DL LR2 KM ROS MLE 10% 0.0672 0.0238 0.0138 0.0085 0.0165 0.007 0.098 10% 11.40 3.99 2.23 1.30 2.67 0.95 14.82 10% -11.40 -3.99 2.23 -1.26 2.67 0.09 -13.64 10% 0.873 0.957 0.975 0.987 0.969 0.961 0.622 20% 0.1575 0.0486 0.039 0.0125 0.0423 0.0137 0.176 20% 26.72 8.15 6.45 1.83 7.01 1.87 28.23 20% -26.72 -8.15 6.45 -1.62 7.01 0.41 -28.01 20% 0.714 0.914 0.931 0.983 0.924 0.910 0.576 30% 0.2667 0.0715 0.0729 0.0136 0.0767 0.022 0.283 30% 45.24 11.98 12.16 1.87 12.80 2.97 46.48 30% -45.24 -11.98 12.16 -0.93 12.80 0.78 -46.44 30% 0.531 0.878 0.872 0.984 0.865 0.835 0.416 40% 0.3973 0.0902 0.115 0.0183 0.119 0.0323 0.435 40% 67.32 15.04 19.16 2.44 19.87 4.36 71.62 40% -67.32 -15.04 19.16 1.02 19.87 1.06 -71.61 40% 0.317 0.850 0.803 0.968 0.795 0.736 0.089 50% 0.5584 0.102 0.164 0.0355 0.169 0.0467 0.656 50% 94.51 16.81 27.48 4.96 28.26 6.28 107.74 50% -94.51 -16.81 27.48 4.62 28.26 1.13 -107.74 50% 0.055 0.832 0.723 0.933 0.714 0.589 -0.472 60% 0.7664 0.101 0.2228 0.0684 0.2279 0.067 1.009 60% 129.61 16.37 37.23 10.48 38.07 8.81 164.67 60% -129.61 -16.37 37.23 10.46 38.07 0.95 -164.67 60% -0.283 0.829 0.628 0.875 0.619 0.384 -1.557 70% 1.058 0.082 0.290 0.120 0.296 0.100 1.677 70% 179.13 12.58 48.51 19.26 49.44 13.18 270.45 70% -179.13 -12.44 48.51 19.25 49.44 0.33 -270.45 70% -0.765 0.834 0.514 0.785 0.504 0.006 -3.999 80% 1.522 0.048 0.369 0.199 0.375 0.165 3.443 80% 257.00 6.47 61.43 32.40 62.38 20.80 536.95 80% -257.00 -2.00 61.43 32.40 62.38 -1.99 -536.95 80% -1.528 0.824 0.372 0.651 0.362 -0.703 -12.191 90% 2.518 0.141 0.463 0.321 0.470 0.372 15.473 90% 425.74 21.33 77.03 52.86 78.08 42.33 2069.65 90% -425.74 21.19 77.03 52.86 78.08 -9.81 -2069.65 90% -3.195 0.662 0.172 0.423 0.161 -3.017 -99.851 Censoring percentage RMSE (100) Censoring percentage MAPE (100) Censoring percentage MPE (100) Censoring percentage KGE (100) ZDL HDL DL LR2 KM ROS MLE ZDL HDL DL LR2 
KM ROS MLE ZDL HDL DL LR2 KM ROS MLE ZDL HDL DL LR2 KM ROS MLE 10% 0.066 0.0229 0.0142 0.0076 0.016 0.0057 0.092 10% 11.06 3.80 2.29 1.16 2.59 0.77 13.92 10% -11.06 -3.80 2.29 -1.14 2.59 0.15 -12.96 10% 0.876 0.958 0.975 0.988 0.971 0.965 0.598 20% 0.1558 0.0472 0.0396 0.0108 0.0419 0.0117 0.172 20% 26.12 7.85 6.53 1.56 6.91 1.57 27.59 20% -26.12 -7.85 6.53 -1.42 6.91 0.58 -27.41 20% 0.721 0.918 0.929 0.985 0.925 0.913 0.560 30% 0.2644 0.0701 0.0729 0.0114 0.0756 0.0186 0.280 30% 44.36 11.66 12.09 1.55 12.53 2.50 45.94 30% -44.36 -11.66 12.09 -0.79 12.53 0.91 -45.92 30% 0.539 0.881 0.873 0.984 0.868 0.842 0.420 40% 0.3947 0.0883 0.115 0.016 0.118 0.0275 0.432 40% 66.21 14.65 19.07 2.11 19.55 3.72 71.11 40% -66.21 -14.65 19.07 1.18 19.55 1.35 -71.11 40% 0.326 0.853 0.805 0.971 0.799 0.752 0.118 50% 0.5551 0.099 0.165 0.0343 0.168 0.0393 0.653 50% 92.98 16.35 27.32 4.92 27.87 5.29 107.19 50% -92.98 -16.35 27.32 4.78 27.87 1.51 -107.19 50% 0.066 0.836 0.724 0.933 0.717 0.603 -0.432 60% 0.7637 0.098 0.2225 0.0675 0.2261 0.055 1.002 60% 127.88 15.98 36.96 10.53 37.54 7.35 163.77 60% -127.88 -15.98 36.96 10.53 37.54 1.46 -163.77 60% -0.272 0.831 0.629 0.873 0.623 0.398 -1.470 70% 1.054 0.078 0.290 0.120 0.294 0.082 1.666 70% 176.21 11.98 48.11 19.32 48.72 10.76 269.40 70% -176.21 -11.92 48.11 19.32 48.72 0.94 -269.40 70% -0.744 0.838 0.517 0.785 0.510 0.063 -3.848 80% 1.519 0.041 0.368 0.198 0.372 0.134 3.370 80% 254.31 5.45 61.07 32.40 61.68 17.17 533.55 80% -254.31 -1.65 61.07 32.40 61.68 -0.42 -533.55 80% -1.514 0.825 0.376 0.652 0.370 -0.621 -11.689 90% 2.516 0.139 0.463 0.321 0.467 0.292 14.471 90% 420.61 21.34 76.46 52.63 77.15 34.42 2056.53 90% -420.61 21.30 76.46 52.63 77.15 -6.25 -2056.53 90% -3.170 0.666 0.179 0.427 0.173 -2.671 -92.370 Table 5 Performance metrics for estimated medians. 
Censoring percentage RMSE (25) Censoring percentage MAPE (25) Censoring percentage MPE (25) Censoring percentage KGE (25) ZDL HDL DL LR2 ROS MLE HDL DL LR2 ROS MLE HDL DL LR2 ROS MLE HDL DL LR2 ROS MLE 60% 0.954 0.424 0.138 0.209 0.334 0.432 60% 43.87 12.26 20.76 32.03 44.02 60% 43.87 -12.26 20.62 31.54 44.02 60% 0.365 0.772 0.677 0.373 0.290 70% 0.951 0.326 0.352 0.120 0.332 0.612 70% 32.52 35.03 10.42 29.77 63.64 70% 32.48 -35.03 4.52 25.34 63.64 70% 0.447 0.368 0.662 -0.046 0.055 80% 0.956 0.249 0.546 0.166 0.377 0.749 80% 23.04 55.30 13.89 33.15 78.05 80% 22.35 -55.30 -9.81 22.10 78.05 80% 0.462 0.010 0.492 -0.562 -0.134 90% 0.950 0.175 0.840 0.350 0.511 0.868 90% 15.05 85.92 31.96 45.59 91.22 90% 7.04 -85.92 -31.46 13.60 91.22 90% 0.368 -0.675 0.070 -1.875 -0.355 Censoring percentage RMSE (40) Censoring percentage MAPE (40) Censoring percentage MPE (40) Censoring percentage KGE (40) ZDL HDL DL LR2 ROS MLE HDL DL LR2 ROS MLE HDL DL LR2 ROS MLE HDL DL LR2 ROS MLE 60% 0.943 0.411 0.141 0.194 0.305 0.427 60% 43.22 13.57 19.71 30.14 44.57 60% 43.22 -13.57 19.69 30.00 44.57 60% 0.377 0.742 0.691 0.373 0.290 70% 0.945 0.330 0.316 0.105 0.291 0.578 70% 33.97 32.06 9.28 26.91 60.75 70% 33.97 -32.06 6.62 25.30 60.75 70% 0.451 0.393 0.679 0.030 0.096 80% 0.946 0.225 0.559 0.157 0.315 0.747 80% 21.42 57.74 13.42 27.96 78.75 80% 21.13 -57.74 -11.54 20.63 78.75 80% 0.461 -0.074 0.445 -0.593 -0.133 90% 0.946 0.140 0.961 0.422 0.459 0.893 90% 11.87 99.78 41.29 40.76 94.40 90% 0.11 -99.78 -41.26 14.91 94.40 90% 0.274 -1.034 -0.164 -2.342 -0.409 Censoring percentage RMSE (70) Censoring percentage MAPE (70) Censoring percentage MPE (70) Censoring percentage KGE (70) ZDL HDL DL LR2 ROS MLE HDL DL LR2 ROS MLE HDL DL LR2 ROS MLE HDL DL LR2 ROS MLE 60% 0.940 0.404 0.143 0.185 0.284 0.426 60% 42.79 14.41 19.10 28.79 44.86 60% 43.22 -13.57 19.69 30.00 44.57 60% 0.376 0.731 0.687 0.375 0.287 70% 0.940 0.320 0.319 0.085 0.256 0.578 70% 33.41 33.18 7.55 24.21 61.22 70% 33.97 -32.06 6.62 
25.30 60.75 70% 0.439 0.395 0.665 0.052 0.086 80% 0.940 0.208 0.562 0.145 0.259 0.746 80% 20.57 59.01 13.06 23.13 79.22 80% 21.13 -57.74 -11.54 20.63 78.75 80% 0.462 -0.105 0.429 -0.574 -0.147 90% 0.940 0.109 0.976 0.426 0.357 0.891 90% 9.18 102.68 43.31 31.62 94.73 90% 0.11 -99.78 -41.26 14.91 94.40 90% 0.255 -1.119 -0.217 -2.314 -0.423 Censoring percentage RMSE (100) Censoring percentage MAPE (100) Censoring percentage MPE (100) Censoring percentage KGE (100) ZDL HDL DL LR2 ROS MLE HDL DL LR2 ROS MLE HDL DL LR2 ROS MLE HDL DL LR2 ROS MLE 60% 0.938 0.402 0.142 0.182 0.274 0.425 60% 42.70 14.60 18.97 28.22 45.02 60% 42.70 -14.60 18.97 28.22 45.02 60% 0.377 0.723 0.688 0.373 0.286 70% 0.937 0.315 0.321 0.075 0.240 0.577 70% 33.17 33.66 6.73 23.35 61.40 70% 33.17 -33.66 5.49 23.10 61.40 70% 0.450 0.364 0.664 0.053 0.085 80% 0.938 0.201 0.563 0.140 0.232 0.745 80% 20.30 59.44 12.98 20.94 79.37 80% 20.28 -59.44 -12.74 18.59 79.37 80% 0.460 -0.118 0.420 -0.570 -0.143 90% 0.937 0.093 0.979 0.426 0.308 0.889 90% 7.91 103.71 44.04 27.30 94.88 90% -1.85 -103.71 -44.04 15.25 94.88 90% 0.241 -1.142 -0.236 -2.272 -0.428 To facilitate visualization and enable a comparison of the values, graphs containing the described information were prepared. Since the number of elements did not significantly affect the quality of the results and to better manage the article's size, the simulations for series containing 40 elements were chosen for illustration. A detailed analysis was performed based on the performance indicators to choose the most appropriate technique for estimating the studied statistics (mean, standard deviation, coefficient of variation, and median). This analysis was described for CV = 0.25, with the reasoning being extended to CV = 0.10, 0.40, 0.80, and 1.60, as summarized in Table 6. Table 6 Best methods for estimating statistics. 
CV Variables Censoring percentage 10 20 30 40 50 60 70 80 90 0.1 Mean ROS ROS ROS ROS ROS LR2 LR2 LR2 LR2 SD ROS ROS ROS ROS ROS ROS LR2 LR2 HDL CV ROS ROS ROS ROS ROS ROS LR2 LR2 LR2 Median -- -- -- -- -- DL DL ROS LR2 0.25 Mean ROS ROS LR2 LR2 LR2 LR2 HDL HDL HDL SD LR2ROS LR2 ROS LR2 LR2 LR2 HDL ROS HDL HDL HDL CV ROS LR2 LR2 LR2 LR2 LR2 HDL HDL HDL Median -- -- -- -- -- DL LR2 LR2 HDL 0.4 Mean LR2 LR2 LR2 ROS HDL HDL HDL HDL HDL SD ROS ROS ROS HDL HDL HDL HDL ROS ZDL LR2 LR2 LR2 CV LR2 LR2 LR2 LR2 HDL HDL HDL HDL HDL Median -- -- -- -- -- LR2 LR2 HDL HDL 0.8 Mean HDL HDL HDL HDL HDL ROS ROS ROS ROS SD HDL HDL HDL HDL ROS ROS ROS ROS ZDL ROS ROS ROS ROS CV LR2 HDL HDL HDL HDL HDL ROS ROS ROS Median -- -- -- -- -- LR2 HDL ROS ROS 1.6 Mean HDL ROS HDL HDL ROS HDL ROS ROS ROS ROS ROS ROS ROS SD ROS ROS ROS ROS ROS ROS ROS ROS ZDL CV HDL HDL ROS ROS ROS ROS ROS ROS ROS Median -- -- -- -- -- LR2 HDL HDL ROS The last step consisted of evaluating the possibility of making reasonable estimates in series with a high percentage of censoring (80%) for all asymmetries. The analyses carried out to choose the most appropriate forecasting methods are described. RESULTS AND DISCUSSION In general, the censoring percentage, unlike the number of elements in synthetic series, significantly influenced the quality of predictions. Increasing the number of elements under the DL method led to an increase in MPE, MAPE, and RMSE values and a decrease in KGE value, with few exceptions. The results are described numerically and categorized into value ranges, as shown in Tables 2 to 5. The benchmark of the metrics depends on the objectives, inherent difficulties in the process, and error propagation in subsequent analyses. For example, this study used the threshold values described in Towner et al. (2019) for KGE estimation. 
The above authors used negative values to describe very poor estimates (in orange), values between 0 and 0.50 to describe poor estimates (in yellow), values between 0.50 and 0.75 to describe intermediate estimates (in brown), and values > 0.75 to describe good estimates (in blue). The limits established for MPE and MAPE were < 10% (blue), between 10% and 20% (brown), between 20% and 30% (yellow), and higher than 30% (orange). The same limits were adopted for the RMSE, which requires dividing the absolute value by the adopted mean (1.00 mg/L) to obtain dimensionless values.

Means

Quality of the estimates for CV = 0.25

Replacing the censored data with DL resulted in a negative bias in the estimated means (Figure 2) since it represents the largest possible value among censored values. Using the KM approach produced similar results to those obtained through DL substitution in different scenarios and statistical summaries. Both methods overestimated the means, as observed by George et al. (2021) and Tekindal et al. (2017), as they assign zero weight to values below the DL when estimating the mean (Zhan et al., 2022).

Figure 2 Performance indicators in estimating means in synthetic series with 40 elements (CV = 0.25).

Positive biases were observed in the ZDL and ROS methods, and the HDL method also showed positive bias in almost all censoring percentages. The semiparametric method (ROS) had the smallest bias up to 80% censoring (less than 10% in magnitude), and the HDL method performed well at 90%. The LR2 method showed a similar performance from 10% to 70%, and, in general, LR2 is widely accepted for estimating means, with some nuances: Niemann (2016), Canales et al. (2018), and Tekindal et al. (2017) suggest its use for censoring up to 30%, up to 50%, and for any censoring percentage, respectively, along with the ROS method. Table 2 shows that errors are below 10% for the HDL, LR2, and ROS methods in almost all censoring percentages.
The MAPE values coincided with the MPE values in magnitude for three of the seven treatment techniques (ZDL, DL, and KM). Three techniques (HDL, LR2, and MLE) exhibited similar values for the MPE and MAPE, indicating a bias with the same sign in most simulations. The robust method showed significant differences between the MPE and MAPE because the sign of its bias alternated between simulations in most scenarios. This observation highlights the importance of using the MAPE to evaluate the estimates.

Figure 2 shows MAPE values increasing with the censoring percentage, except for the HDL method above 60% censoring. The best performances were observed for LR2/ROS up to 60% and for HDL from 60%. For censoring percentages up to 50%, all techniques except ZDL produced reasonable estimates, and the highest KGE values were obtained with the HDL and LR2 techniques. The HDL technique showed good results from 60% to 80%, similar to the results in George et al. (2021) and Niemann (2016).

Observations for the RMSE were similar to those for the MAPE in the described scenario, with the LR2 and ROS techniques performing better at up to 60% censoring and the HDL performing well above 50%. The MLE had intermediate performance, while the KM and DL techniques showed similar results.

In summary, the ROS and LR2 techniques could be recommended for estimating means at 10% and 20% censoring because they performed best in all metrics. However, the MPE of the ROS method is four times lower than that of the LR2 method, so the semiparametric method was chosen for these percentages. The LR2 technique is suggested from 30% to 60% because it had a lower RMSE than ROS. Above 60%, the HDL technique is recommended, as it performed significantly better than the other techniques. Overall, the quality of the estimates from the selected methods was satisfactory, except at 90% censoring, where only a moderate KGE value was observed for the HDL technique.
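The relationship between MPE and MAPE discussed above can be illustrated with a small sketch. It assumes that the percentage error is computed as (true - estimate) / true, a sign convention that is consistent with the positive MPE reported for ZDL (which underestimates the mean) but is not stated in this section. The function names, the toy estimates, and the treatment of values lying exactly on a band limit are illustrative choices, not the authors' code.

```python
def mpe(true_value, estimates):
    """Mean percentage error (assumed convention: positive = underestimation)."""
    errors = [(true_value - e) / true_value for e in estimates]
    return 100.0 * sum(errors) / len(errors)


def mape(true_value, estimates):
    """Mean absolute percentage error."""
    errors = [abs(true_value - e) / abs(true_value) for e in estimates]
    return 100.0 * sum(errors) / len(errors)


def error_band(percent):
    """Colour band used in the text for MPE, MAPE, and normalized RMSE.

    Values exactly on a limit are assigned to the worse band (assumption).
    """
    magnitude = abs(percent)
    if magnitude < 10:
        return "blue"
    if magnitude < 20:
        return "brown"
    if magnitude < 30:
        return "yellow"
    return "orange"


# Toy estimates of a true mean of 1.0 (hypothetical numbers).
one_sided = [0.90, 0.80, 0.95]    # every estimate is too low
mixed = [0.90, 1.10, 0.95, 1.05]  # errors alternate in sign

for label, est in (("one-sided", one_sided), ("mixed", mixed)):
    print(label, round(mpe(1.0, est), 2), round(mape(1.0, est), 2))
```

When every simulation errs in the same direction, MAPE equals the magnitude of MPE, as reported for ZDL, DL, and KM. When signs alternate, as with ROS, the errors cancel in the MPE but not in the MAPE, so a small MPE alone does not guarantee accurate individual estimates.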
Estimates with 80% censoring

Figure 3 illustrates the variation in performance indicators based on the CV used to generate synthetic series, with a censoring percentage of 80%. MPE values approached asymptotic values in all depicted curves, with final biases ranging from -53% in the KM method to 10% in the ZDL method. The DL and KM methods exhibited negative biases, while the ZDL and ROS methods had positive biases. The LR2 and HDL methods showed alternating signs across different CVs.

Figure 3 Performance indicators of the averages estimated in different log-normal synthetic series (Censoring percentage = 80%).

The ROS method demonstrated low errors (< 10% in magnitude) in all situations, consistent with the findings of Shunway et al. (2002), who observed no bias in estimated means at high censoring percentages (50% and 80%). Canales et al. (2018) recommended using the ROS method for highly asymmetric series and studied sets with censoring percentages above 80%. Tekindal et al. (2017) suggested using the LR2 and ROS methods for mean estimation at CV = 0.473 and 1.27. The HDL method exhibited the lowest biases at CV = 0.25 and 0.40, deviating slightly from the findings of She (1997), who obtained better results with the HDL method in higher asymmetries (CV = 1.00 and 2.00). That author used three randomly sampled censoring percentages between 10% and 80%, which may explain this discrepancy.

The MLE generated significantly disparate values, particularly in series with CV = 0.40, 0.80, and 1.60, similar to the findings of Niemann (2016) and Canales et al. (2018). Although Niemann (2016) did not specify the CV of the generated log-normal series, they obtained poor mean estimates with the MLE, with biases above 40% and root mean squared errors more than five times the value of the true mean at 50% censoring, likely due to the generation of highly asymmetric series.
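The 80% censoring scenario can be sketched for the four substitution methods (ZDL, HDL, LR2, and DL). The sketch makes several assumptions that are not taken from the article: the detection limit is placed at the theoretical quantile of the log-normal distribution, a single series is generated per CV, and all function names are hypothetical. The log-normal parameters follow from the standard relations sigma^2 = ln(1 + CV^2) and mu = ln(mean) - sigma^2 / 2. KM, ROS, and MLE are not implemented here.

```python
import math
import random
from statistics import NormalDist


def lognormal_params(mean, cv):
    """Return (mu, sigma) of ln(X) for a log-normal X with the given mean and CV."""
    sigma2 = math.log(1.0 + cv ** 2)
    mu = math.log(mean) - sigma2 / 2.0
    return mu, math.sqrt(sigma2)


def substituted_means(values, dl):
    """Mean of the series after replacing values below dl by a fraction of dl."""
    detected = [v for v in values if v >= dl]
    n_censored = len(values) - len(detected)
    factors = {"ZDL": 0.0, "HDL": 0.5, "LR2": 1.0 / math.sqrt(2.0), "DL": 1.0}
    return {
        name: (sum(detected) + n_censored * factor * dl) / len(values)
        for name, factor in factors.items()
    }


def simulate(cv, censoring, n=40, mean=1.0, seed=0):
    """Generate one synthetic series and censor it at the theoretical quantile."""
    rng = random.Random(seed)
    mu, sigma = lognormal_params(mean, cv)
    series = [rng.lognormvariate(mu, sigma) for _ in range(n)]
    # Assumption: DL is the theoretical quantile for the target censoring fraction.
    dl = math.exp(mu + sigma * NormalDist().inv_cdf(censoring))
    return substituted_means(series, dl)


for cv in (0.10, 0.25, 0.40, 0.80, 1.60):
    means = simulate(cv, censoring=0.80)
    print(cv, {name: round(value, 3) for name, value in means.items()})
```

Because of these simplifications, the printed means are not expected to reproduce the values in the tables. The sketch only shows how the substituted means are always ordered ZDL ≤ HDL ≤ LR2 ≤ DL and how the gap between them depends on the detection limit implied by each CV.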
Series with higher asymmetry were more likely to contain lower values and means, reducing bias in the ZDL method from 73% at CV = 0.10 to 10% at CV = 1.60. In the DL method, MPE values changed from -20% (CV = 0.10) to -45% (CV = 1.60). This pattern also occurred in the KM and LR2 methods, with LR2 exhibiting positive bias in series generated from CV = 0.10 and negative bias in the other series. At the same time, the HDL method showed positive biases in series with CV = 0.10 and 0.25 and negative biases in the other series.

MAPE and MPE values were similar or coincident in all techniques, except for the HDL method at CV = 0.40 and the ROS method in all situations. This finding explains the better performance of the HDL for CV = 0.25 and 0.40 and the similar values from the ROS method for CV = 0.80 when considering MAPE. The best results for each CV were almost below 18%.

KGE values increased after CV = 0.40 and reached high values at CV = 1.60, particularly for the ZDL (0.953), HDL (0.916), and ROS (0.975) methods. For the methods that yielded the best results, the estimates were classified as either good or intermediate. We recommend the use of the LR2 method at CV = 0.10 (0.710), the HDL method at CV = 0.25 (0.799) and CV = 0.40 (0.778), and the ROS method at CV = 0.80 (0.896) and CV = 1.60.

The lowest RMSE values coincided with the scenarios where the KGE showed the highest values. However, the variance observed at CV = 0.80 and 1.60 (~0.30 mg/L) was three times higher than the RMSE that occurred in other CVs.

In summary, it is possible to estimate means in highly censored synthetic series (80%) with acceptable errors, mainly in lower asymmetries. The LR2 method was the preferred method for estimating means at CV = 0.10, the HDL method was the choice for CV = 0.25 and 0.40, and the ROS method was chosen for CV = 0.80 and CV = 1.60. Three metrics returned good results (MPE and MAPE < 15% in magnitude, and KGE > 0.70).
Up to CV = 0.40, the RMSE value was lower than 0.15 mg/L, but the variance could prevent good estimates in higher asymmetries. Although Antweiller & Taylor (2008) did not achieve satisfactory results for censoring above 70%, they used monitored series that did not have a specific probability distribution function (PDF).

Standard deviations

Quality of the estimates for CV = 0.25

Increasing the censoring percentage reduces KGE values and increases most MPE and MAPE values, as observed in Tekindal et al. (2017). In all simulations, the ZDL and MLE exhibit negative bias. Higher asymmetries result in better estimates with the ZDL and worse results when using the MLE. The estimates obtained with MLE stand out negatively, with MPE values below -1,000% under 90% censoring. This observation aligns with the significant biases observed in the most asymmetric series simulated by Tekindal et al. (2017) and George et al. (2021).

Simulations using the robust method yield the smallest, typically positive, biases in most scenarios, only exceeding 10% in magnitude in series with 90% undetectable values. When adopting DL/√2 substitution (LR2), the bias exhibits a negative sign at 10%, 20%, and 30% censoring and becomes positive above 30%. The LR2 method exhibits small biases (< 8% in magnitude) up to 60% and shows results similar to those of the ROS method.

Among the papers listed in Table 1, Tekindal et al. (2017) was the only one that employed the LR2 method to estimate standard deviations. This technique demonstrated satisfactory results, ranking second-best among the employed techniques, with MPE values below 15% at 65% censoring for series generated with CV = 0.473. In simulations with CV = 1.27, the MPE reached 68% for the same censoring level and in short series (20 elements).

Estimates using robust methods exhibited MPE values below 10% in magnitude up to 70% censoring. In Tekindal et al.
(2017), the ROS method was the recommended method for estimating standard deviations, particularly for more asymmetric series, where the MPE was below 4%.

Estimates using the HDL method also yielded satisfactory results, with MPE values below 10% at up to 80% censoring (CV = 0.25). The HDL method demonstrated the best performance for censoring percentages above 70% (CV = 0.25). Simulations conducted with the HDL method exhibited variable biases, both positive and negative. George et al. (2021) obtained biases below 10% in series generated with CV = 0.53, while an approximate 30% underestimation was observed in series with CV = 3.45.

The MPE coincides with the MAPE in magnitude in the ZDL, DL, and KM methods, indicating that all 10,000 synthetic series exhibit the same bias direction (positive or negative), unlike the ROS method, for which the two metrics differ substantially. This finding helps explain the smallest biases observed with the ROS method in Tekindal et al. (2017). In the HDL, MLE, and LR2 methods, the MPE and MAPE values are very close, indicating that almost all forecasts behave similarly within the same studied scenario. Despite these differences, the technique that yields the better estimates of standard deviation is the same in most scenarios, whether judged by the MPE or by the MAPE.

KGE values exceeded 0.75 for censoring up to 60% for five techniques, excluding the ZDL and MLE techniques. The HDL and LR2 techniques displayed the best results in these situations, with only the HDL technique having good estimates in higher censoring percentages.

The lowest RMSE values were observed in the LR2 and ROS techniques for 10% and 20% censoring percentages. From 30% to 60%, the LR2 technique exhibited the best results, and the HDL technique is recommended for censoring percentages above 50%. The MLE is not included in Figure 4 because the indicator could be up to two orders of magnitude higher than those obtained with other techniques.
The ZDL technique ranked as the second-worst technique for estimating standard deviations up to 70%. Starting from 60% censoring, the KM and DL techniques displayed very high values. Antweiller & Taylor (2008) used actual samples with 32% of values below the DL to assess the performance of methods when handling censored data and obtained results similar to those of the present study, with the highest bias being in the ZDL and MLE methods and the lowest bias being in the robust and HDL methods. The authors did not test substitution by DL/2^0.5 in that research.

Figure 4 Performance indicators in estimating standard deviations in synthetic series with 40 elements (CV = 0.25).

The LR2 method proved adequate for estimating standard deviations up to 60% censoring, regardless of the performance metric used (KGE > 0.870; MPE, MAPE < 11% in magnitude; RMSE < 0.08 mg/L). Semiparametric methods can also be suitable, especially at low censoring levels (10% and 20%). Although three of their performance indicators were similar to those of the LR2 method, their RMSE values were higher. The HDL curves displayed an inflection point near 60% censoring, and the values decreased afterward. From 60% to 90%, the HDL method was the best technique, and although the errors increased at 90% censoring, the results were satisfactory (KGE > 0.750; MPE, MAPE < 21% in magnitude; RMSE < 0.150 mg/L).

Estimates with 80% censorship

Figure 5 shows the performance indicators in estimating standard deviations for different log-normal synthetic series (censoring percentage = 80%). The ROS method exhibited low biases (< 5% in absolute value) at CV = 0.40, 0.80, and 1.60, the LR2 method exhibited low biases at CV = 0.10 (5.00%), and the HDL method exhibited low biases at CV = 0.25 (5.40%). The biases almost stabilized at low values in higher asymmetries (< 6% in absolute value), particularly in the ROS method, which presented a value of 0.15%.
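To make the substitution techniques compared in this section concrete, the sketch below applies them to a single synthetic log-normal series and computes the resulting standard deviations. It is an illustration under stated assumptions, not the study's simulation code: LR2 is assumed to denote substitution by DL/2^0.5, the DL is taken as an order statistic so that exactly 60% of the values fall below it, and the helper names, the seed, and the unit mean are hypothetical.

```python
import math
import random
import statistics


def lognormal_series(n, mean, cv, rng):
    """Draw n values from a two-parameter log-normal with the given mean and CV."""
    sigma = math.sqrt(math.log(1.0 + cv ** 2))
    mu = math.log(mean) - 0.5 * sigma ** 2
    return [rng.lognormvariate(mu, sigma) for _ in range(n)]


def substituted_sd(series, dl, factor):
    """Replace every value below the DL by factor * DL, then take the sample SD."""
    filled = [x if x >= dl else factor * dl for x in series]
    return statistics.stdev(filled)


rng = random.Random(42)  # arbitrary seed, for reproducibility only
series = lognormal_series(40, mean=1.0, cv=0.25, rng=rng)

# 24 of the 40 values (60%) lie strictly below this detection limit.
dl = sorted(series)[24]

# Substitution factors; the mapping of labels to factors is an assumption.
factors = {"ZDL": 0.0, "HDL": 0.5, "LR2": 1.0 / math.sqrt(2.0), "DL": 1.0}
sd_by_method = {name: substituted_sd(series, dl, f) for name, f in factors.items()}
true_sd = statistics.stdev(series)

print("uncensored SD:", round(true_sd, 4))
for name, value in sd_by_method.items():
    print(name, round(value, 4))
```

Because every uncensored value is at least as large as the DL, raising the substituted constant from zero towards the DL can only pull the censored block closer to the rest of the series. The estimated standard deviation therefore decreases from ZDL to DL, and substitution by the DL itself always falls below the uncensored value.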
The KM and DL methods underestimated standard deviations to a greater extent at CV = 0.10 (~60%) and by less than 6% at CV = 1.60. The ZDL and MLE methods had negative biases, while other techniques had positive biases, except for the HDL method at CV = 0.10. Figure 5 does not represent the MLE due to its high errors, as reported by Helsel & Cohn (1988). The MAPE exhibited similar behavior to that of the MPE, except in the ROS method, although the MAPE of the ROS method remained the lowest (< 9%) at CV = 0.40, 0.80, and 1.60.

Figure 5 Performance indicators of the standard deviations estimated in different log-normal synthetic series (Censoring percentage = 80%).

KGE values showed a systematic increase and were consistently high (> 0.890) in standard deviation estimations for CV = 0.80 and CV = 1.60, except for the maximum likelihood method (MLE). The errors associated with the MLE were high, rendering any estimation impossible. The best techniques in each CV returned good predictions (KGE > 0.75). The LR2 method exhibited the best performance at CV = 0.10 (0.766), the HDL method exhibited the best performance at CV = 0.25 (0.940) and CV = 0.40 (0.917), and the ROS method exhibited the best performance at CV = 0.80 (0.993) and CV = 1.60 (0.999). RMSE values were reasonable across all asymmetries, with the LR2 method performing best at CV = 0.10 (0.023 mg/L), the HDL method performing best at CV = 0.25 (0.050 mg/L) and CV = 0.40 (0.123 mg/L), and the ROS method performing best at CV = 0.40 (0.126 mg/L), CV = 0.80 (0.133 mg/L), and CV = 1.60 (0.288 mg/L). In summary, the use of the LR2 method for CV = 0.10, the HDL method for CV = 0.25 and CV = 0.40, and the ROS method for CV = 0.40, 0.80, and 1.60 when estimating standard deviations is recommended. These methods consistently performed the best across all four metrics. However, RMSE values increased with censoring but did not hinder reasonable estimates.
The maximum value was 0.288 mg/L for the series generated with a CV of 1.6. Kroll & Stedinger (1996) emphasized using the ROS method to estimate standard deviations in situations involving short and medium-level censoring. They reached this conclusion, however, by pooling the results of series generated from four different coefficients of variation. According to the results presented here, the robust technique can be employed even in scenarios with a high percentage of undetectable values.

Coefficients of variation

Quality of the estimates for CV = 0.25

Figure 6 shows positive biases in the DL and KM methods due to the overestimation of means and underestimation of standard deviations, as also occurred in Tekindal et al. (2017) and George et al. (2021). The ZDL, HDL, and MLE methods have negative biases, while the ROS and LR2 methods present variable signs.

Figure 6 Performance indicators in estimating variation coefficients in synthetic series with 40 elements (CV = 0.25).

The smallest biases occurred in the ROS (up to 70%) and HDL (80% and 90%) methods. The LR2 method presented good results up to 50% (MPE < 3% in magnitude). Up to a censoring percentage of 80%, the smallest errors were always below 12%, and they were approximately 20% at 90%. The results were consistent with those of the underlying statistics (mean and standard deviation) regarding the best techniques for estimating the variables. Overestimation in the ZDL and MLE methods led to MPE values below -400% and -1,400%, respectively, due to the low accuracy in estimating the means (ZDL) and standard deviations (MLE). The MPEs obtained in the DL and KM methods were close to each other and reasonable up to 40% censoring, with magnitudes not exceeding 21%. Using the means and standard deviations presented by George et al. (2021), it was observed that the magnitude of the asymmetry of the simulated series influenced the bias value and direction.
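The mechanism described above for the DL method, in which an overestimated mean and an underestimated standard deviation combine into a biased coefficient of variation, can be sketched as follows. This is a hedged illustration rather than the study's procedure: it assumes the sign convention (reference - estimate) / reference, uses the statistics of each complete synthetic series as the reference, and relies on hypothetical helper names, seed, and number of replicates.

```python
import math
import random
import statistics


def lognormal_series(n, mean, cv, rng):
    """Draw n values from a two-parameter log-normal with the given mean and CV."""
    sigma = math.sqrt(math.log(1.0 + cv ** 2))
    mu = math.log(mean) - 0.5 * sigma ** 2
    return [rng.lognormvariate(mu, sigma) for _ in range(n)]


def substitute_dl(series, fraction):
    """Censor the lowest `fraction` of the values and replace them by the DL."""
    k = round(fraction * len(series))
    dl = sorted(series)[k]  # k values lie strictly below this limit
    return [max(x, dl) for x in series]


def mpe(references, estimates):
    """Mean percentage error; assumed convention: (reference - estimate) / reference."""
    errors = [(ref - est) / ref for ref, est in zip(references, estimates)]
    return 100.0 * statistics.fmean(errors)


rng = random.Random(2024)  # arbitrary seed
true_stats = {"mean": [], "sd": [], "cv": []}
est_stats = {"mean": [], "sd": [], "cv": []}

for _ in range(500):  # hypothetical number of replicates
    full = lognormal_series(40, mean=1.0, cv=0.25, rng=rng)
    filled = substitute_dl(full, fraction=0.40)
    for data, store in ((full, true_stats), (filled, est_stats)):
        m = statistics.fmean(data)
        s = statistics.stdev(data)
        store["mean"].append(m)
        store["sd"].append(s)
        store["cv"].append(s / m)

mpe_by_statistic = {name: mpe(true_stats[name], est_stats[name]) for name in true_stats}
for name, value in mpe_by_statistic.items():
    print(name, round(value, 2))
```

Under this convention the mean shows a negative MPE (overestimation) and the standard deviation a positive one (underestimation). Because the CV divides the smaller standard deviation by the larger mean, its relative error exceeds that of the standard deviation in every series.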
For the coefficients of variation estimated by the MLE and ROS methods, moderately asymmetric series (CV = 0.45) showed a positive bias (MPE > 0). In contrast, asymmetric series (CV = 3.45) showed a negative bias. In the synthetic series generated with CV = 0.473, Tekindal et al. (2017) showed an overestimation of the coefficients of variation at 5% and 25% censorship levels and an underestimation at 65% when adopting the LR2 method. In the series generated with CV = 1.27, underestimation was observed at all censorship levels. For the robust methods, there were overestimates at all censorship levels in the most skewed series and no clear pattern in the series with a moderate level of skewness. MAPE values coincided with MPE values in three methods of censoring treatment (ZDL, DL, and KM methods), differed little in three others (MLE, HDL, and LR2 methods), and differed significantly in the ROS method. The MAPE panel in Figure 6 omits the ZDL and MLE methods, whose values are inconsistent. The KGE also confirms the complete inadequacy of the estimates of parametric variables using the ZDL and MLE methods, as in Niemann (2016), Tekindal et al. (2017), Canales et al. (2018), and George et al. (2021). KGE values were high (> 0.7) at up to 40% censoring, except in the ZDL, ROS, and MLE methods. The LR2 method was more suitable at up to 60%, and the HDL method was more suitable from 70% censoring. The recommended methods gave good estimates at up to 80% censoring. At 90% censoring, the KGE was classified as intermediate. The KM and DL methods had similar values, as observed throughout this research. Above 50% censoring, the ROS method yielded low KGE values (< 0.50). The RMSE increased with the censoring percentage, except in the HDL method (above 60%). The LR2 method had the best performance at up to 60%, and the HDL method had the best performance from 60% to 90%.
Unrealistic error variances were observed when adopting the ZDL and MLE methods for CV simulations. The ROS method had good results (RMSE < 0.100 mg/L) at up to 50%. According to the preceding analysis, the ROS method is recommended at 10% because this technique performed best in three metrics, the exception being the KGE. At 20%, the LR2 method is suggested because it produced the best MAPE, RMSE, and KGE results. Moreover, this technique returned a slight bias. From 30% to 50%, the LR2 method presented better results than did the ROS method in terms of the RMSE and KGE, even though the MPE and MAPE values were similar. At 60%, the ROS and LR2 methods had similar performance in terms of the MAPE and RMSE. However, the KGE of the ROS method (0.318) was far lower than that of the LR2 method (0.818), although the MPE of the ROS method was reasonable (10.06%). From 70% to 90%, the HDL method is recommended due to its best performance in terms of the MAPE, RMSE, and KGE and a reasonable bias. The results obtained by the selected techniques were satisfactory for all censoring percentages, with performance indicator values similar to those observed for standard deviations.

Estimates with 80% censorship

Figure 7 illustrates the performance indicators in estimating the coefficients of variation for different log-normal synthetic series. The ROS method had the lowest errors (< 15% in absolute value), along with the LR2 method at CV = 0.10 (-4.12%) and the HDL method at CV = 0.25 (-3.00%) and CV = 0.40 (13.17%). The biases stabilize at higher asymmetries, reaching reasonable values in the ZDL, HDL, and ROS methods (smaller than 15% in absolute value). The ZDL and MLE methods had negative bias, the DL and KM methods had positive bias, and the HDL, LR2, and ROS methods had alternating bias signs. The MAPE values were similar to or coincided with those of the MPE, except in the ROS, LR2 (CV = 0.10), and HDL (CV = 0.25) methods.
Figure 7 Performance indicators of the coefficients of variation estimated in different log-normal synthetic series (Censoring percentage = 80%).

The KGE curves increased with the CV: the best value at CV = 1.60 was 0.925 (ROS method), compared with a best value of 0.601 at CV = 0.10 (LR2 method). The best values at CV = 0.25 (0.800), CV = 0.40 (0.833), and CV = 0.80 (0.806) were obtained using the HDL method. The best estimates were good, except at CV = 0.10, where they were classified as intermediate. The RMSE presented the highest values at CV = 0.80, except in the HDL method. Significant differences between these values and those observed at CV = 1.60 were observed in the ZDL and ROS methods. The smallest values occurred in the LR2 method at CV = 0.10 (0.033 mg/L), in the HDL method at CV = 0.25 (0.064 mg/L), CV = 0.40 (0.163 mg/L), and CV = 0.80 (0.411 mg/L), and in the ROS method at CV = 0.80 (0.406 mg/L) and CV = 1.60 (0.242 mg/L). In summary, we recommend using the LR2 method to estimate the coefficient of variation at CV = 0.10, the HDL method at CV = 0.25 and 0.40, and the ROS method at CV = 1.60, as they are the best methods for all performance metrics. Under these conditions, the estimates showed satisfactory results, with absolute errors and biases below 20%, variances less than 0.250 mg/L, and KGE values greater than 0.60. Using the HDL method, the results were similar to those of the ROS method in higher asymmetries. She (1997) described adequate mean and standard deviation estimates when using the HDL model in series with CV = 1.00 and 2.00. The coefficient of variation may repeat this behavior because it is derived from these two variables. For CV = 0.80, the semiparametric method was the most suitable because it had the lowest MPE, MAPE, and RMSE values. Although its KGE value was lower than that obtained with the HDL method, the value was still very good (0.650).
However, we do not recommend any estimation method in this case because the RMSE value was too high (> 0.400 mg/L).

Median

Quality of the estimates for CV = 0.25

Only series with censoring percentages above 60% were used to estimate the medians. At lower percentages, this variable is already known. The KM method only works with data ordering and does not provide median estimates; thus, it was excluded from this analysis. Figure 8 shows the MPE, MAPE, KGE, and RMSE variations according to the censoring percentage.

Figure 8 Performance indicators in estimating medians in synthetic series with 40 elements (CV = 0.25).

There was overestimation in the DL method, underestimation in the ROS, MLE, and HDL methods, and alternating bias signs in the LR2 method (Figure 8). The lowest values were obtained using the substitution methods, with the DL method at 60% censoring, the LR2 method at 70% and 80%, and the HDL method at 90%. The smallest biases were always less than 15% in absolute value in each scenario. The MPE and MAPE values in the DL and MLE methods coincided. There were substantial differences between the two metrics in the HDL and ROS methods in some scenarios. The estimates had good values, less than 20% in magnitude for the best methods in each situation. According to KGE values, the techniques returned good estimates only when the LR2 method was used at 60% and 70% censoring and the DL method at 60%. The worst values occurred in the ROS method and are not shown in the graph because they were far below the range represented on the vertical axis (they reached approximately -2.30). The best simulations occurred using the DL method at 60% censoring, the LR2 method at 70% and 80%, and the HDL method at 80% and 90%. Based on the last analysis, the best methods to estimate medians were the DL method at 60%, the LR2 method at 80%, and the HDL method at 90% censoring because the four metrics pointed to the same technique.
The choice of the LR2 method at 70% censoring was made because this technique had the best performance in terms of the MPE, MAPE, and RMSE and a KGE value similar to that of the HDL method. The results were satisfactory at 60% and 70% censorship. At 80% and 90% censorship, KGE showed low values (< 0.50), and thus, results must be evaluated before they can be used in other contexts.

Estimates with 80% censorship

Figure 9 shows the performance indicators at a censoring percentage of 80%. Positive biases were observed in the MLE and ROS methods, and negative biases in the DL method, the LR2 method (CV = 0.25), and the HDL method (CV = 0.40). The smallest MPE values in magnitude occurred in the ROS method at CV = 0.10 (11.40%), 0.80 (32.96%), and 1.60 (28.00%), in the LR2 method at CV = 0.25 (-11.54%), and in the HDL method at CV = 0.40 (-0.63%). MAPE and MPE values were similar or coincident, except for those in the ROS method. The MAPE had the smallest values in the LR2 method at CV = 0.10 (15.53%) and 0.25 (13.42%), the HDL method at CV = 0.40 (18.18%), and the ROS method at CV = 0.80 (59.91%) and 1.60 (85.40%). These errors in high asymmetry (CV of 0.80 or more) may hinder median estimation.

Figure 9 Performance indicators of the medians in different log-normal synthetic series (Censoring percentage = 80%).

KGE indicated good predictions for CV values up to 0.40, with values greater than 0.45. However, there was a significant decrease at CV = 0.80 and 1.60, with most values being negative. The best results were obtained with the LR2 method at CV = 0.10 (0.498), the HDL method at CV = 0.25 (0.461) and 0.40 (0.524), and the ROS method at CV = 0.80 (-0.270) and 1.60 (-0.487). The last two results were too low. For comparison, if the mean of the reference values were used as the prediction for every value, the KGE would be -0.41 (Knoben et al., 2019). The RMSE values significantly increased after CV = 0.40 in the HDL, LR2, and DL methods. The ZDL method provided unreliable estimates.
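The benchmark of -0.41 attributed above to Knoben et al. (2019) follows directly from the KGE definition. The derivation below is a worked sketch that assumes the standard form of the metric, with correlation r, variability ratio α, and bias ratio β, and that a constant prediction is assigned r = 0; the subscripts "est" and "ref" are notation introduced here for illustration.

```latex
\begin{align*}
\mathrm{KGE} &= 1 - \sqrt{(r - 1)^2 + (\alpha - 1)^2 + (\beta - 1)^2},
  \qquad \alpha = \frac{\sigma_{\mathrm{est}}}{\sigma_{\mathrm{ref}}},
  \qquad \beta = \frac{\mu_{\mathrm{est}}}{\mu_{\mathrm{ref}}} \\[4pt]
% A constant prediction equal to the reference mean has no variability
% (alpha = 0), no bias (beta = 1), and is taken to have r = 0.
\mathrm{KGE}_{\mathrm{mean}} &= 1 - \sqrt{(0 - 1)^2 + (0 - 1)^2 + (1 - 1)^2}
  = 1 - \sqrt{2} \approx -0.41
\end{align*}
```

The values of -0.270 and -0.487 reported for the ROS method at CV = 0.80 and 1.60 lie on either side of this benchmark, which is why they are treated as too low to support a recommendation.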
However, the smallest RMSE values were observed in the ROS method at CV = 0.80 (0.435 mg/L) and in the MLE method at CV = 1.60 (0.301 mg/L), while among the substitution methods the smallest values occurred in the LR2 method at CV = 0.10 (0.163 mg/L) and 0.25 (0.157 mg/L) and in the HDL method at CV = 0.40 (0.197 mg/L). In summary, the LR2 method had the best performance at CV = 0.25, the HDL method had the best performance at CV = 0.40, and the ROS method had the best performance at CV = 0.80, as these methods demonstrated the best performance according to all four metrics. The recommendation to use the LR2 method at CV = 0.10 is based on its higher KGE value (0.498) compared to the KGE value for the ROS method (-1.029) and similar performance in the other three indicators. At CV = 1.60, the ROS method returned the best results in three metrics and the second-best RMSE value. The results were satisfactory up to CV = 0.40, with MPE and MAPE values below 20%, RMSE < 0.200 mg/L, and KGE > 0.440. No method is recommended for higher asymmetries, as the absolute errors exceeded 59%, RMSE > 0.300 mg/L, and KGE < -0.250.

Best methods for estimating statistics

Table 6 presents the best methods for estimating means, standard deviations, coefficients of variation, and medians based on comparing the results obtained using the described metrics. The choice depends on the censoring percentage, the estimated variable, and the asymmetry that generated the synthetic series. Three techniques stood out for estimating mean values (HDL, ROS, and LR2 methods). The semiparametric method was more frequent and appeared mainly in higher asymmetries (CV = 0.8 and 1.6), similar to the finding in Shunway et al. (2002). The semiparametric method also appeared in low censoring percentages (up to 50%) in lower asymmetries.
The LR2 method appeared mainly in low asymmetries (CV = 0.10, 0.25, and 0.40), and the HDL method appeared at high censoring percentages associated with medium asymmetry (CV = 0.25, 0.40), and at lower percentages for CV = 1.60. We recommended four methods for estimating standard deviations: the ZDL, HDL, ROS, and LR2 methods. The robust technique was more frequently recommended, indicating its adequacy for smaller asymmetries (CV = 0.10, 0.25, and 0.40) at lower censoring percentages and higher asymmetries (CV = 0.80 and 1.60) at up to 80% of undetected values. For censoring percentages above 60%, substitution methods may be better than the ROS method. It is essential to mention that the errors in estimating standard deviations at CV = 0.80 and 1.60 are very low, even at high censoring percentages. Another important observation concerns the series generated with CV = 0.40, where the HDL method is recommended from 40% to 70%. This choice was made mainly because the ROS method had an RMSE value that was at least 40% higher than that in the HDL method. George et al. (2021) simulated series with CV = 0.50 and censoring percentage = 50% and found better results for estimating standard deviations using the ROS method, possibly due to the CV difference and the use of bias instead of other performance metrics. Tekindal et al. (2017) simulated series with CV = 0.473 and censoring percentage = 25% and found a bias with the ROS method that was 50% higher than with the LR2 method, similar to the findings in the present research. However, their paper recommends both techniques for CV = 0.40 and censoring percentages up to 30% based on the MAPE results (LR2: 1.22% < ROS: 1.82%), KGE values (LR2 ~ ROS), and other excellent metric values. The recommended methods for estimating coefficients of variation include the HDL, ROS, and LR2 methods with almost the same frequency.
The LR2 method was recommended at high censoring percentages associated with slight skewness and up to 50% non-detectable values and intermediate CVs (0.25, 0.40, and 0.80). The HDL method was suggested in high censorship and/or asymmetry scenarios, while the LR2 was the best method. The semiparametric technique was recommended at the ends of the table (CV = 0.10 and percentages up to 60%) and in higher asymmetries at specific censoring percentages. To estimate medians, the DL method was chosen for small percentages and levels of asymmetry, where there is a smaller density of lower values. The substitution methods are distributed in this table using this logic. The LR2 method is associated with higher percentages and/or more asymmetric series than is the DL method, and the HDL method is related to higher percentages and/or more asymmetric series than is the LR2 method. Tekindal et al. (2017) obtained the best results using the LR2 method (bias ~ 40%) at 65% censoring in series generated with CV = 0.473 and bias ~ 45% to estimate medians. These observations are in line with Table 6, which was built using four different metrics. Antweiller & Taylor (2008) analyzed the median values of series with more than 70% censored data and obtained poor estimates. Among the methods examined in that study, the ROS method yielded relatively better results (MPE = -49.5% and MAPE = 63.3%). In the current research, the performance of the ROS method was superior, possibly because the authors of that study used monitored series without verifying their adherence to a probability distribution. The summary presented in Table 6 should not be used indiscriminately. This study is restricted to monitored series that fit the log-normal (2P) distribution with a CV ranging from 0.10 to 1.60.

CONCLUSIONS

From the results of the simulations, the following conclusions can be drawn:

The use of four metrics to select the best estimation method was appropriate, as they complement each other.
In certain situations, when the results do not converge, it is important to compare them to draw more accurate conclusions;
The use of the coefficients of variation of environmental series that fit a log-normal distribution was essential to appropriately select the best technique for estimating statistics;
The semiparametric technique produced significant differences between MPE and MAPE values, indicating the presence of bias with varying signs; if bias alone were used to select the best method for estimating variables, choosing the ROS method would lead to an incorrect forecast;
Substitution by DL/2, substitution by DL/2^0.5, and the ROS method were the most appropriate techniques for estimating the variables described, with the ROS method standing out for parametric variables and substitution by DL/2^0.5 for medians;
The recommended techniques for estimating the coefficient of variation differed from those most suitable for forecasting means and standard deviations, especially in highly skewed series; therefore, this statistic must be studied separately and incorporated into stochastic simulation studies for censored data treatment;
It is possible to estimate the statistical summaries of interest with moderate errors, even at high censoring percentages (80%), except for the median in synthetic series generated with CV = 0.80;
Despite the limitations reported in the literature regarding imputation methods, such as their recommended use for small percentages of censoring and the lack of scientific basis, these techniques have provided more accurate estimates in several studied scenarios, even at high percentages of censoring;
The number of elements in the synthetic series did not significantly influence the quality of the results, unlike the censoring percentage.

REFERENCES

Antweiller, R. C., & Taylor, H. E. (2008). Evaluation of statistical treatments of left-censored environmental data using coincident uncensored data sets: I. Summary statistics. Environmental Science & Technology, 42(10), 3732-3738. http://dx.doi.org/10.1021/es071301c.
Bahk, G. J., & Lee, H. J. (2021). Microbial-Maximum Likelihood estimation tool for microbial quantification in food from left-censored data using maximum likelihood. Frontiers in Microbiology, 12, 730733. http://dx.doi.org/10.3389/fmicb.2021.730733.
Brasil. Ministério da Saúde. (2021, 4 de maio). Portaria GM/MS nº 888, de 4 de maio de 2021. Altera o Anexo XX da Portaria de Consolidação GM/MS nº 5, de 28 de setembro de 2017, para dispor sobre os procedimentos de controle e de vigilância da qualidade da água para consumo humano e seu padrão de potabilidade. Diário Oficial da República Federativa do Brasil, Brasília.
Canales, R. A., Wilson, A. M., Pearce-Walker, J. I., Verhougstraete, M. P., & Reynolds, K. A. (2018).
Methods for handling left-censored data in quantitative microbial risk assessment. Applied and Environmental Microbiology, 84(20), 1-10. http://dx.doi.org/10.1128/AEM.01203-18.
Cantoni, B., Compagni, R. D., Turola, A., Epifani, I., & Antonelli, M. A. (2020). Statistical assessment of micropollutants occurrence, time trend, fate and human health risk using left-censored water quality data. Chemosphere, 257, 1-11. https://doi.org/10.1016/j.chemosphere.2020.127095.
Christófaro, C., & Leão, M. D. (2014). Tratamento de dados censurados em estudos ambientais. Quimica Nova, 37(1), 104-110. http://dx.doi.org/10.1590/S0100-40422014000100019.
Daneshkhah, A. R., & Menzemer, C. C. (2018). Lifetime statistical analysis of welded aluminum light pole structures under cyclic loading. Journal of Structural Engineering, 144(9), 1-8. http://dx.doi.org/10.1061/(ASCE)ST.1943-541X.0002159.
Faucheux, L., Resche-Rigon, M., Curis, E., Soumellis, V., & Chevret, S. (2021). Clustering with missing and left-censored data: A simulation study comparing multiple-imputation-based procedures.
Biometrical Journal. Biometrische Zeitschrift, 63, 372-393. http://dx.doi.org/10.1002/bimj.201900366.
Fusek, M., Michálek, J., Bunkóva, L., & Bunka, F. (2020). Modelling biogenic amines in fish meat in Central Europe using censored distributions. Chemosphere, 251, 1-7, Article 126390. http://dx.doi.org/10.1016/j.chemosphere.2020.126390.
George, B. G., Gains-German, L., Broms, K., Black, K., Furman, M., Hays, M. D., Thomas, K. W., & Simmons, J. E. (2021). Censoring trace-level environmental data: statistical analysis considerations to limit bias. Environmental Science & Technology, 55, 3786-3795. http://dx.doi.org/10.1021/acs.est.0c02256.
Hall Junior, L. W., Perry, E., & Anderson, R. D. (2020). A comparison of different statistical methods for addressing censored left data in temporal trends analysis of pyrethroids in a California stream. Archives of Environmental Contamination and Toxicology, 79, 508-523. http://dx.doi.org/10.1007/s00244-020-00769-0.
Helsel, D. R., & Cohn, T. A. (1988).
Estimation of descriptive statistics for multiply censored water quality data. Water Resources Research, 24(12), 1997-2004. http://dx.doi.org/10.1029/WR024i012p01997.
Helsel, D. R., & Hirsch, R. M. (2002). Statistical methods in water resources – Chapter A3: techniques of water-resources investigations - Book 4. Reston: United States Geological Survey.
Helsel, D. R., Hirsch, R. M., Ryberg, K. R., Archfield, S. A., & Gilroy, E. J. (2020). Statistical Methods in Water Resources. In Department of Interior (Ed). Hydrologic Analysis and Interpretation. Reston: United States Geological Survey. https://doi.org/10.3133/tm4A3.
Hewett, P., & Ganser, G. (2007). A comparison of several methods for analyzing censored data. The Annals of Occupational Hygiene, 51(7), 611-632. http://dx.doi.org/10.1093/annhyg/mem045.
Knoben, W. J. M., Freer, J. E., & Woods, R. A. (2019). Technical note: inherent benchmark or not? Comparing Nash–Sutcliffe and Kling–Gupta efficiency scores. Hydrology and Earth System Sciences, 23, 4323-4331. http://dx.doi.org/10.5194/hess-23-4323-2019.
Kroll, C. N., & Stedinger, J. R. (1996). Estimation of moments and quantiles using censored data. Water Resources Research, 32(4), 1005-1012. http://dx.doi.org/10.1029/95WR03294.
Liu, Y., Ortega, J. F., Mudarra, M., & Hartman, A. (2022). Pitfalls and a feasible solution for using KGE as an informal likelihood function in MCMC methods: DREAM(ZS) as an example. Hydrology and Earth System Sciences, 26(20), 5341-5355. http://dx.doi.org/10.5194/hess-26-5341-2022.
Mohamed, R. A. B., Brooks, S. C., Tsai, C. H., Ahmed, T., Rucker, D. F., Ulery, A. L., Pierce, E. M., & Carroll, K. C. (2021). Geostatistical interpolation of streambed hydrologic attributes with addition of left censored data and anisotropy. Journal of Hydrology, 599, 126474. http://dx.doi.org/10.1016/j.jhydrol.2021.126474.
Mora, M., Walker, T. R., & Willis, R. (2022). Spatiotemporal characterization of petroleum hydrocarbons and polychlorinated biphenyls in small craft harbours sediments in Nova Scotia, Canada. Marine Pollution Bulletin, 177, 1-14.
http://dx.doi.org/10.1016/j.marpolbul.2022.113524. Morley S. K. Brito T. V. Welling D. T. 2018 Measures of model performance based on the log accuracy ratio Space Weather 16 69 88 http://dx.doi.org/10.1002/2017SW001669 Morley, S. K., Brito, T. V., & Welling, D. T. (2018). Measures of model performance based on the log accuracy ratio. Space Weather, 16, 69-88. http://dx.doi.org/10.1002/2017SW001669. Naghettini M. 2017 Fundamentals of statistical hydrology. Switzerland Springer Cham https://doi.org/10.1007/978-3-319-43561-9 Naghettini, M. (2017). Fundamentals of statistical hydrology. Switzerland: Springer Cham. https://doi.org/10.1007/978-3-319-43561-9. Niemann J. 2016 Statistical modelling of environmental data with non-detects Retrieved in 2023, August 20 from https://www.causeweb.org/usproc/sites/default/files/usresp/2016/december/jennifer-niemann.pdf Niemann, J. (2016). Statistical modelling of environmental data with non-detects. Retrieved in 2023, August 20, from https://www.causeweb.org/usproc/sites/default/files/usresp/2016/december/jennifer-niemann.pdf. Nostbaken O. J. Rasinger J. D. Hannisdal R. Sanden M. Froyland M. Duinker A. Frantzen S. Dahl L. M. Lundebye A. K. Madsen L. 2021 Levels of omega 3 fatty acids, vitamin D, dioxins and dioxin-like PCBs in oily fish; a new perspective on the reporting of nutrient and contaminant data for risk–benefit assessments of oily seafood Environment International 147 106322 http://dx.doi.org/10.1016/j.envint.2020.106322 Nostbaken, O. J., Rasinger, J. D., Hannisdal, R., Sanden, M., Froyland, M., Duinker, A., Frantzen, S., Dahl, L. M., Lundebye, A. K., & Madsen, L. (2021). Levels of omega 3 fatty acids, vitamin D, dioxins and dioxin-like PCBs in oily fish; a new perspective on the reporting of nutrient and contaminant data for risk–benefit assessments of oily seafood. Environment International, 147, 106322, http://dx.doi.org/10.1016/j.envint.2020.106322. Pinto C. C. Calazans G. M. Oliveira S. C. 
2019 Assessment of spatial variations in the surface water qualityof the Velhas River Basin, Brazil, using multivariate statistical analysis and nonparametric statistics Environmental Monitoring and Assessment 191 164 1 13 http://dx.doi.org/10.1007/s10661-019-7281-y Pinto, C. C., Calazans, G. M., & Oliveira, S. C. (2019). Assessment of spatial variations in the surface water qualityof the Velhas River Basin, Brazil, using multivariate statistical analysis and nonparametric statistics. Environmental Monitoring and Assessment, 191(164), 1-13. http://dx.doi.org/10.1007/s10661-019-7281-y. She N. 1997 Analyzing censored water quality data using a nonparametric approach Journal of the American Water Resources Association 33 615 624 http://dx.doi.org/10.1111/j.1752-1688.1997.tb03536.x She, N. (1997). Analyzing censored water quality data using a nonparametric approach. Journal of the American Water Resources Association, 33, 615-624. http://dx.doi.org/10.1111/j.1752-1688.1997.tb03536.x. Shunway R. Azari R. Kayhanian M. 2002 Statistical approaches to estimating mean water quality concentrations with detection limits Environmental Science & Technology 36 3345 3353 http://dx.doi.org/10.1021/es0111129 Shunway, R., Azari, R., & Kayhanian, M. (2002). Statistical approaches to estimating mean water quality concentrations with detection limits. Environmental Science & Technology, 36, 3345-3353. http://dx.doi.org/10.1021/es0111129. Soares A. L. C. Pinto C. C. Cordova J. E. Gomes L. N. L. Oliveira S. M. A. C. 2021 Water quality assessment of a multiple use reservoir in southeastern Brazil: case study of the Vargem das Flores reservoir Environmental Earth Sciences 80 210 1 21 http://dx.doi.org/10.1007/s12665-021-09474-0 Soares, A. L. C., Pinto, C. C., Cordova, J. E., Gomes, L. N. L., & Oliveira, S. M. A. C. (2021). Water quality assessment of a multiple use reservoir in southeastern Brazil: case study of the Vargem das Flores reservoir. Environmental Earth Sciences, 80(210), 1-21. 
http://dx.doi.org/10.1007/s12665-021-09474-0. Tekindal M. A. Erdogan B. D. Yavuz Y. 2017 Evaluating left-censored data through substitution, parametric, semiparametric, and nonparametric methods: a simulation study Interdisciplinary Sciences, Computational Life Sciences 9 2 153 172 http://dx.doi.org/10.1007/s12539-015-0132-9 Tekindal, M. A., Erdogan, B. D., & Yavuz, Y. (2017). Evaluating left-censored data through substitution, parametric, semiparametric, and nonparametric methods: a simulation study. Interdisciplinary Sciences, Computational Life Sciences, 9(2), 153-172. http://dx.doi.org/10.1007/s12539-015-0132-9. Towner J. Cloke H. L. Zsoter E. Flamig Z. Hoch J. M. Bazo J. Coughlan de Perez E. Stephens E. M. 2019 Assessing the performance of global hydrological models for capturing peak river flows in the Amazon basin Hydrology and Earth System Sciences 23 7 3057 3080 http://dx.doi.org/10.5194/hess-23-3057-2019 Towner, J., Cloke, H. L., Zsoter, E., Flamig, Z., Hoch, J. M., Bazo, J., Coughlan de Perez, E., & Stephens, E. M. (2019). Assessing the performance of global hydrological models for capturing peak river flows in the Amazon basin. Hydrology and Earth System Sciences, 23(7), 3057-3080. http://dx.doi.org/10.5194/hess-23-3057-2019. Tran T. M. P. Abrams S. Aerts M. Maertens K. Hens N. 2021 Measuring association among censored antibody titer data Statistics in Medicine 40 3740 3761 http://dx.doi.org/10.1002/sim.8995 Tran, T. M. P., Abrams, S., Aerts, M., Maertens, K., & Hens, N. (2021). Measuring association among censored antibody titer data. Statistics in Medicine, 40, 3740-3761. http://dx.doi.org/10.1002/sim.8995. US Environmental Protection Agency 2016 Definition and procedure for the determination of the method detection limit, revision 2 Washington, DC US Environmental Protection Agency US Environmental Protection Agency. (2016). Definition and procedure for the determination of the method detection limit, revision 2. 
Washington, DC: US Environmental Protection Agency. Von Sperling M. Verbyla M. E. Oliveira S. M. A. C. 2020 Assessment of treatment plant performance and water quality data: a guide for students, researchers and practitioners. London IWA Publishing Von Sperling, M., Verbyla, M. E., & Oliveira, S. M. A. C. (2020). Assessment of treatment plant performance and water quality data: a guide for students, researchers and practitioners. London: IWA Publishing. Wang X. Guoyou Q. Xinyuan S. Yanlin T. 2022 Censored quantile regression based on multiply robust propensity scores Statistical Methods in Medical Research 31 3 475 487 http://dx.doi.org/10.1177/09622802211060520 Wang, X., Guoyou, Q., Xinyuan, S., & Yanlin, T. (2022). Censored quantile regression based on multiply robust propensity scores. Statistical Methods in Medical Research, 31(3), 475-487. http://dx.doi.org/10.1177/09622802211060520. Zhan H. N. Zaman Q. Azmi F. Shahzada G. Jakovljevic M. 2022 Methods for improving the variance of Kaplan-Meier survival function, when there is no, mderate and heavy censoring-applied in oncological datasets Frontiers in Public Health 10 1 13 https://doi.org/10.3389/fpubh.2022.793648 Zhan, H. N., Zaman, Q., Azmi, F., Shahzada, G., & Jakovljevic, M. (2022).Methods for improving the variance of Kaplan-Meier survival function, when there is no, mderate and heavy censoring-applied in oncological datasets. Frontiers in Public Health, 10, 1-13. https://doi.org/10.3389/fpubh.2022.793648. Zhang W. Gu X. Hong L. Han L. Wang L. 2023 Comprehensive review of machine learning in geotechnical reliability analysis: Algorithms, applications and further challenges Applied Soft Computing 136 1 18 http://dx.doi.org/10.1016/j.asoc.2023.110066 Zhang, W., Gu, X., Hong, L., Han, L., & Wang, L. (2023). Comprehensive review of machine learning in geotechnical reliability analysis: Algorithms, applications and further challenges. Applied Soft Computing, 136, 1-18. 
http://dx.doi.org/10.1016/j.asoc.2023.110066.
Associação Brasileira de Recursos Hídricos Av. Bento Gonçalves, 9500, CEP: 91501-970, Tel: (51) 3493 2233, Fax: (51) 3308 6652 - Porto Alegre - RS - Brazil
E-mail: rbrh@abrh.org.br