ABSTRACT
The distribution of the externally studentized midrange was derived from the original studentization procedures of Student and was inspired by the distribution of the externally studentized range. The widespread use of the externally studentized range in multiple comparisons was a further motivation for developing this new distribution. This work aimed to derive analytic expressions for the distribution of the externally studentized midrange, obtaining its cumulative distribution, probability density and quantile functions, and to generate random values from it. The authors could not find any report of this distribution in the literature. A second objective was to build an R package that computes the probability density, cumulative distribution and quantile functions numerically and to make it available to the scientific community. The algorithms were implemented in R using Gauss-Legendre quadrature and the Newton-Raphson method, resulting in the SMR package, available for download from CRAN. The implemented routines showed high accuracy, verified by Monte Carlo simulations and by comparing results obtained with different numbers of quadrature points. To obtain precise quantiles when the degrees of freedom are close to 1 and the percentiles are close to 100%, more than 64 quadrature points are recommended.
Index terms:
Distribution function; density function; Gauss-Legendre quadrature; Newton-Raphson method; R.
INTRODUCTION
Many problems in statistical investigations are based on studies of sample order statistics. Important cases of order statistics are the minimum and maximum values of a sample. Among other functions of order statistics, the range and the midrange are of special interest here; they correspond to the difference between the maximum and the minimum and to the average of these two extreme values, respectively. Several authors have studied and applied this subject (Tippet, 1925; Pearson and Hartley, 1942; Gumbel, 1944, 1946; Wilks, 1948; Yalcin et al., 2014; Wan et al., 2014; Barakat et al., 2015; Bland, 2015; Mansouri, 2015; Li and Mansouri, 2016), and some of these studies are discussed below.
Tippet (1925) studied the first four moments of the range. Pearson and Hartley (1942) tabulated the cumulative probabilities for several range values in small samples (n = 2 to 20) drawn from normal populations. Gumbel (1944, 1946) established the independence of the extreme values in large samples from several continuous distributions, as well as the distributions of the range and midrange. Wilks (1948) reviewed several articles on order statistics and suggested several examples of their application to statistical inference.
Studies related to studentization were initially proposed by Student (1927), and the particular case of the studentized range distribution has been widely used in multiple comparison procedures in different areas of scientific research since the pioneering works in this area (Duncan, 1952, 1955; Tukey, 1949, 1953). The studentized range is the random variable defined as the range divided by the sample standard deviation, where both terms of this ratio are independently distributed random variables computed from samples drawn from the normal distribution.
Let $Y_1, Y_2, \ldots, Y_n$ be the order statistics of a sample of size $n$, defined by sorting the original sample variables $X_1, X_2, \ldots, X_n$ in increasing order. The sample $X_1, X_2, \ldots, X_n$ is drawn from a population with distribution function $F(x)$ and probability density function $f(x)$. The range is defined by $W = Y_n - Y_1$. The cumulative distribution function (cdf) and the probability density function (pdf) of the range are given by Equations 1 and 2,

$$F_W(w) = n \int_{-\infty}^{\infty} f(y)\,[F(y+w) - F(y)]^{n-1}\,dy \tag{1}$$

and

$$f_W(w) = n(n-1) \int_{-\infty}^{\infty} f(y)\,f(y+w)\,[F(y+w) - F(y)]^{n-2}\,dy, \tag{2}$$

respectively, as shown in David and Nagaraja (2003) and in Gumbel (1947).
Considering now samples of size $n$ from the normal distribution with mean $\mu$ and standard deviation $\sigma$, the externally studentized range is defined by the ratio $Q = W/S = W'/X$, where $W' = W/\sigma$ is the standardized sample range, $X = S/\sigma$, and $S^2$ is an independent and unbiased estimator of $\sigma^2$ associated with $\nu$ degrees of freedom. The cumulative distribution and probability density functions, according to David and Nagaraja (2003), are given by Equations 3 and 4,

$$F(q; n, \nu) = \int_0^\infty \int_{-\infty}^{\infty} n\,\phi(y)\,[\Phi(xq + y) - \Phi(y)]^{n-1} f(x;\nu)\,dy\,dx \tag{3}$$

and

$$f(q; n, \nu) = \int_0^\infty \int_{-\infty}^{\infty} n(n-1)\,x\,\phi(y)\,\phi(xq+y)\,[\Phi(xq+y) - \Phi(y)]^{n-2} f(x;\nu)\,dy\,dx, \tag{4}$$
where $\phi(y)$ and $\Phi(y)$ are the probability density and cumulative distribution functions of the standard normal distribution evaluated at $y \in (-\infty, \infty)$, and $f(x;\nu)$ is the probability density function of $X = S/\sigma$. If $S^2$ is computed from a sample of size $\nu + 1$ from the normal distribution, then it is well known that $\nu S^2/\sigma^2 \sim \chi^2_\nu$, i.e., it has a chi-square distribution with $\nu$ degrees of freedom (Mood et al., 1974). Hence, $\nu X^2 = \nu S^2/\sigma^2$. Therefore, it can be concluded that $\nu X^2 \sim \chi^2_\nu$.
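Theorem 1. If $X = S/\sigma$ is computed from a sample of size $\nu + 1$ from a normal distribution with mean $\mu$ and variance $\sigma^2$, then its probability density function is given by Equation 5,

$$f(x;\nu) = \frac{\nu^{\nu/2}}{\Gamma(\nu/2)\,2^{\nu/2-1}}\,x^{\nu-1} e^{-\nu x^2/2}, \quad x \geq 0. \tag{5}$$

Proof. If $U = \nu X^2 \sim \chi^2_\nu$, as discussed above, then the distribution of $X = S/\sigma$ can be obtained from the transformation $U = \nu X^2$. The Jacobian of this transformation is given by

$$J = \frac{du}{dx} = 2\nu x,$$

for $x > 0$.
The density function of $X$, from samples of a normal population, is obtained from the chi-square density $f_U(u) = u^{\nu/2-1} e^{-u/2} / [2^{\nu/2}\,\Gamma(\nu/2)]$ by

$$f(x;\nu) = f_U(\nu x^2)\,|J| = \frac{(\nu x^2)^{\nu/2-1} e^{-\nu x^2/2}}{2^{\nu/2}\,\Gamma(\nu/2)}\,2\nu x,$$

resulting in Equation 5.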
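The density of $X = S/\sigma$ can be checked numerically. The following Python sketch is not part of the SMR package; the names `density_x` and `midpoint_integral` are illustrative, and the formula coded is the one that follows from the chi-square transformation $U = \nu X^2$. It verifies that the density integrates to 1 and that $E[X^2] = 1$, as implied by $E[U] = \nu$.

```python
import math

def density_x(x, nu):
    """Density of X = S/sigma when nu * X**2 follows a chi-square(nu) law.

    Assumed form, obtained from the transformation U = nu * X**2:
    f(x; nu) = nu**(nu/2) / (Gamma(nu/2) * 2**(nu/2 - 1))
               * x**(nu - 1) * exp(-nu * x**2 / 2),  x > 0.
    The evaluation is done on the log scale to avoid overflow for large nu.
    """
    if x <= 0.0:
        return 0.0
    log_f = (0.5 * nu * math.log(nu) - math.lgamma(0.5 * nu)
             - (0.5 * nu - 1.0) * math.log(2.0)
             + (nu - 1.0) * math.log(x) - 0.5 * nu * x * x)
    return math.exp(log_f)

def midpoint_integral(func, a, b, m=20000):
    """Composite midpoint rule with m equal subintervals on [a, b]."""
    h = (b - a) / m
    return h * sum(func(a + (i + 0.5) * h) for i in range(m))

nu = 5
# The upper limit 10 is a truncation of the infinite range (assumption);
# the density is negligible beyond it for nu >= 1.
total = midpoint_integral(lambda x: density_x(x, nu), 0.0, 10.0)
second_moment = midpoint_integral(lambda x: x * x * density_x(x, nu), 0.0, 10.0)
print(total, second_moment)  # both values should be close to 1
```

For $\nu = 1$ the same function reduces to the half-normal density $\sqrt{2/\pi}\,e^{-x^2/2}$, which gives a second quick check of the normalizing constant.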
Another order statistic of interest is the midrange, introduced, among others, by Gumbel (1947) and Rider (1957).
Definition 1. The midrange is defined as the mean of the minimum and the maximum order statistics by Equation 6,

$$\bar{R} = \frac{Y_1 + Y_n}{2}. \tag{6}$$

The externally studentized midrange is defined following the original studentization procedures of Student (1927) and was inspired by the definition of the externally studentized range.
Definition 2. The externally studentized midrange is defined by

$$\bar{Q} = \frac{\bar{R}}{S},$$

where $\bar{R}$ is given in Equation 6 and $S$ is an estimator of the population standard deviation $\sigma$ with $\nu$ degrees of freedom, obtained independently of $\bar{R}$. It should be noticed that $S$ is a random variable.
However, few studies address the midrange distribution, and none was found on the externally studentized midrange, for normal or non-normal populations. Studies of the distribution of the externally studentized midrange could be of great importance in the analysis of experiments. Rider (1957), among others, showed that the midrange is a more efficient estimator than the sample mean in platykurtic distributions. Another potentially useful development is the proposal of multiple comparison procedures based on the externally studentized midrange, which could perform better than the traditional tests based on the studentized range.
This work aimed to obtain the externally studentized midrange distribution. It intended to develop analytic expressions for the distribution of $\bar{Q}$, obtaining the cumulative distribution, probability density and quantile functions, and to generate random values. A second objective was to build an R package (R Development Core Team, 2017) that computes the probability density, cumulative distribution and quantile functions numerically, using Gaussian quadrature and the Newton-Raphson method, and to make the package available to the scientific community.
MATERIAL AND METHODS
The externally studentized normal midrange distribution
Let $Y_1, Y_2, \ldots, Y_n$ be the order statistics of a sample of size $n$, defined by sorting the original sample variables $X_1, X_2, \ldots, X_n$ in increasing order. The sample $X_1, X_2, \ldots, X_n$ is drawn from a population with distribution function $F(x)$. The distribution of the midrange is given by the following theorem.
Theorem 2. The probability density function and the cumulative distribution function of $\bar{R}$ (Definition 1), from a random sample $X_1, X_2, \ldots, X_n$ of size $n$, where $X_j$ has distribution function $F(x)$ and probability density function $f(x)$, $j = 1, 2, \ldots, n$, are given by Equations 7 and 8,

$$f_{\bar{R}}(\bar{r}) = 2n(n-1) \int_{-\infty}^{\bar{r}} f(z)\,f(2\bar{r} - z)\,[F(2\bar{r} - z) - F(z)]^{n-2}\,dz \tag{7}$$

and

$$F_{\bar{R}}(\bar{r}) = n \int_{-\infty}^{\bar{r}} f(z)\,[F(2\bar{r} - z) - F(z)]^{n-1}\,dz, \tag{8}$$

respectively.
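Proof. Let the joint density of $Y_1$ and $Y_n$ (David and Nagaraja, 2003) be given by Equation 9,

$$f_{Y_1,Y_n}(u, x) = n(n-1)\,f(u)\,f(x)\,[F(x) - F(u)]^{n-2}, \tag{9}$$

for $u < x$. To obtain the distribution of $\bar{R}$, the transformations $\bar{R} = (Y_1 + Y_n)/2$ and $Z = Y_1$ should be considered, so that $Y_1 = Z$ and $Y_n = 2\bar{R} - Z$. The Jacobian of these transformations is

$$|J| = \left|\det\begin{pmatrix} \partial y_1/\partial \bar{r} & \partial y_1/\partial z \\ \partial y_n/\partial \bar{r} & \partial y_n/\partial z \end{pmatrix}\right| = \left|\det\begin{pmatrix} 0 & 1 \\ 2 & -1 \end{pmatrix}\right| = 2.$$

Therefore, the joint density of $\bar{R}$ and $Z$, using Equation 9, is given by

$$f_{\bar{R},Z}(\bar{r}, z) = 2n(n-1)\,f(z)\,f(2\bar{r} - z)\,[F(2\bar{r} - z) - F(z)]^{n-2}, \quad z < \bar{r}.$$

The required density is obtained by integrating the above joint density with respect to $z$, resulting in

$$f_{\bar{R}}(\bar{r}) = \int_{-\infty}^{\bar{r}} 2n(n-1)\,f(z)\,f(2\bar{r} - z)\,[F(2\bar{r} - z) - F(z)]^{n-2}\,dz.$$

The distribution function can be obtained by Equation 10,

$$F_{\bar{R}}(\bar{r}) = \int_{-\infty}^{\bar{r}} \int_{-\infty}^{t} 2n(n-1)\,f(z)\,f(2t - z)\,[F(2t - z) - F(z)]^{n-2}\,dz\,dt. \tag{10}$$

Equation 10 can be simplified by inverting the order of integration. Because $\bar{R}$ and $Z$ are dependent, for fixed $z$ the variable $t$ varies in the interval $[z, \bar{r}]$. Note that the upper limit of the interval, $\bar{r}$, comes from the definition of the cumulative distribution function. Therefore, changing the order of integration results in

$$F_{\bar{R}}(\bar{r}) = \int_{-\infty}^{\bar{r}} n\,f(z) \int_{z}^{\bar{r}} 2(n-1)\,f(2t - z)\,[F(2t - z) - F(z)]^{n-2}\,dt\,dz.$$

Note that

$$\frac{d}{dt}[F(2t - z) - F(z)]^{n-1} = 2(n-1)\,f(2t - z)\,[F(2t - z) - F(z)]^{n-2}.$$

So,

$$F_{\bar{R}}(\bar{r}) = \int_{-\infty}^{\bar{r}} n\,f(z)\,\Big\{[F(2\bar{r} - z) - F(z)]^{n-1} - [F(z) - F(z)]^{n-1}\Big\}\,dz.$$

As, for $n \geq 2$,

$$[F(z) - F(z)]^{n-1} = 0,$$

the resulting cumulative distribution function is

$$F_{\bar{R}}(\bar{r}) = \int_{-\infty}^{\bar{r}} n\,f(z)\,[F(2\bar{r} - z) - F(z)]^{n-1}\,dz,$$

as shown in Gumbel (1958).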
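The cumulative distribution function of the midrange can be checked numerically in a case with a known answer. The Python sketch below is illustrative only (the names `midrange_cdf`, `norm_pdf` and `norm_cdf` are not from the paper). It assumes a standard normal population, evaluates $n\int_{-\infty}^{\bar{r}} \phi(z)[\Phi(2\bar{r}-z)-\Phi(z)]^{n-1}dz$ with composite Simpson's rule, and truncates the lower limit where $\phi$ is negligible. For $n = 2$ the midrange is the mean of two standard normal variables, so the result should equal $\Phi(\bar{r}\sqrt{2})$.

```python
import math

def norm_pdf(y):
    """Standard normal density."""
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)

def norm_cdf(y):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))

def midrange_cdf(r, n, m=2000):
    """cdf of the midrange of n independent standard normal variables.

    Integrates n * phi(z) * [Phi(2r - z) - Phi(z)]**(n - 1) for z in
    (-inf, r] with composite Simpson's rule. The lower limit is replaced
    by a finite truncation point (assumption) and m must be even.
    """
    a = min(r, 0.0) - 10.0          # phi(z) is negligible below this point
    h = (r - a) / m

    def integrand(z):
        return n * norm_pdf(z) * (norm_cdf(2.0 * r - z) - norm_cdf(z)) ** (n - 1)

    total = integrand(a) + integrand(r)
    for i in range(1, m):
        total += (4.0 if i % 2 else 2.0) * integrand(a + i * h)
    return total * h / 3.0

r = 0.5
print(midrange_cdf(r, 2), norm_cdf(r * math.sqrt(2.0)))  # should agree closely
print(midrange_cdf(0.0, 5))                              # symmetry gives 0.5
```

Simpson's rule is used here only to keep the check short; it is not the quadrature rule adopted in the paper.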
RESULTS AND DISCUSSION
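In the particular case of a standard normal population, Equations 7 and 8 can be rewritten as Equations 11 and 12,

$$f_{\bar{R}}(\bar{r}) = \int_{-\infty}^{\bar{r}} 2n(n-1)\,\phi(y)\,\phi(2\bar{r} - y)\,[\Phi(2\bar{r} - y) - \Phi(y)]^{n-2}\,dy \tag{11}$$

and

$$F_{\bar{R}}(\bar{r}) = \int_{-\infty}^{\bar{r}} n\,\phi(y)\,[\Phi(2\bar{r} - y) - \Phi(y)]^{n-1}\,dy, \tag{12}$$

respectively.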
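If samples from a normal distribution with mean 0 and variance $\sigma^2$ are considered, then the distribution of $\bar{R}$ will depend on $\sigma$. The cumulative distribution function of $\bar{R}$ in this case is obtained directly from Equation 12 as Equation 13,

$$F_{\bar{R}}(\bar{r}) = \int_{-\infty}^{\bar{r}} n\,\phi_{0,\sigma^2}(z)\,[\Phi_{0,\sigma^2}(2\bar{r} - z) - \Phi_{0,\sigma^2}(z)]^{n-1}\,dz, \tag{13}$$

where $\phi_{0,\sigma^2}(z)$ and $\Phi_{0,\sigma^2}(z)$ are the probability density and cumulative distribution functions, respectively, of the normal distribution with mean 0 and variance $\sigma^2$. The probability density function $\phi_{0,\sigma^2}(z)$ is related to the probability density function of the standard normal distribution by $\phi_{0,\sigma^2}(z) = \phi(z/\sigma)/\sigma$. In the same way, the relationship between the cumulative distribution functions is $\Phi_{0,\sigma^2}(z) = \Phi(z/\sigma)$. Hence, if $Z/\sigma$ is denoted by $Y$, the cumulative distribution function in Equation 13 can be rewritten as Equation 14,

$$F_{\bar{R}}(\bar{r}) = \int_{-\infty}^{\bar{r}/\sigma} n\,\phi(y)\,[\Phi(2\bar{r}/\sigma - y) - \Phi(y)]^{n-1}\,dy. \tag{14}$$

The same procedure can be applied to Equation 11.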
Definition 3. The standardized midrange is defined by Equation 15,

$$\bar{W} = \frac{\bar{R}}{\sigma}, \tag{15}$$

where $\bar{R}$ is the midrange from Definition 1 and $\sigma$ is the population standard deviation.
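Theorem 3. The probability density function and cumulative distribution function of $\bar{W}$, from Equation 15, are given by Equations 16 and 17,

$$f_{\bar{W}}(\bar{w}) = \int_{-\infty}^{\bar{w}} 2n(n-1)\,\phi(y)\,\phi(2\bar{w} - y)\,[\Phi(2\bar{w} - y) - \Phi(y)]^{n-2}\,dy \tag{16}$$

and

$$F_{\bar{W}}(\bar{w}) = \int_{-\infty}^{\bar{w}} n\,\phi(y)\,[\Phi(2\bar{w} - y) - \Phi(y)]^{n-1}\,dy, \tag{17}$$

respectively.
Proof. The cumulative distribution function of $\bar{W}$ is obtained from the cumulative distribution function given in Equation 14. Making the transformation of variables $\bar{W} = \bar{R}/\sigma$ and $Y$, with Jacobian $J = \sigma$, the cumulative distribution function of $\bar{W}$ is given by

$$F_{\bar{W}}(\bar{w}) = \int_{-\infty}^{\bar{w}} n\,\phi(y)\,[\Phi(2\bar{w} - y) - \Phi(y)]^{n-1}\,dy,$$

and the probability density function of $\bar{W}$ is obtained by $f_{\bar{W}}(\bar{w}) = dF_{\bar{W}}(\bar{w})/d\bar{w}$, that is, by the Leibniz rule,

$$f_{\bar{W}}(\bar{w}) = n\,\phi(\bar{w})\,[\Phi(\bar{w}) - \Phi(\bar{w})]^{n-1} + \int_{-\infty}^{\bar{w}} n\,\phi(y)\,\frac{\partial}{\partial \bar{w}}[\Phi(2\bar{w} - y) - \Phi(y)]^{n-1}\,dy.$$

Solving

$$\frac{\partial}{\partial \bar{w}}[\Phi(2\bar{w} - y) - \Phi(y)]^{n-1} = 2(n-1)\,\phi(2\bar{w} - y)\,[\Phi(2\bar{w} - y) - \Phi(y)]^{n-2},$$

and noting that the boundary term vanishes for $n \geq 2$, the probability density function of $\bar{W}$ is

$$f_{\bar{W}}(\bar{w}) = \int_{-\infty}^{\bar{w}} 2n(n-1)\,\phi(y)\,\phi(2\bar{w} - y)\,[\Phi(2\bar{w} - y) - \Phi(y)]^{n-2}\,dy,$$

as expected.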
It is worth noting that the probability density functions 11 and 16 and the cumulative distribution functions 12 and 17 are equal. Therefore, the distribution of the standardized midrange from normal populations with mean zero and variance $\sigma^2$ is the same as that of the midrange from standard normal populations.
However, the objective is to find the distribution of the externally studentized normal midrange ($\bar{Q}$, Definition 2), where $\bar{Q}$ can also be defined by Equation 18,

$$\bar{Q} = \frac{\bar{R}}{S} = \frac{\bar{R}/\sigma}{S/\sigma} = \frac{\bar{W}}{X}, \tag{18}$$

where $\bar{W}$ and $S$, with $\nu$ degrees of freedom, are independently distributed. This occurs, for example, when $\bar{W}$ is obtained from a random sample of size $n$ from a normal population and $S$, the standard deviation, comes from another random sample of size $\nu + 1$. The independence also holds when $\bar{W}$ is a function of factor means in an experimental design with $n$ levels and $S = \sqrt{MSE/r}$, where MSE is the mean square error with $\nu$ degrees of freedom and $r$ is the number of replications associated with each treatment mean, as can be found, e.g., in Searle (1987).
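Theorem 4. The probability density function and cumulative distribution function of $\bar{Q}$ from Definition 2 are given by Equations 19 and 20,

$$f(\bar{q}; n, \nu) = \int_0^\infty \int_{-\infty}^{x\bar{q}} 2n(n-1)\,x\,\phi(y)\,\phi(2x\bar{q} - y)\,[\Phi(2x\bar{q} - y) - \Phi(y)]^{n-2} f(x;\nu)\,dy\,dx \tag{19}$$

and

$$F(\bar{q}; n, \nu) = \int_0^\infty \int_{-\infty}^{x\bar{q}} n\,\phi(y)\,[\Phi(2x\bar{q} - y) - \Phi(y)]^{n-1} f(x;\nu)\,dy\,dx, \tag{20}$$

respectively.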
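Proof. The distribution of $\bar{Q}$, given in Equation 18, is obtained from the joint distribution of $\bar{W}$, given in Equation 16, and $X = S/\sigma$, given in Equation 5. As the variables $\bar{W}$ and $X$ are independently distributed, the joint density is the product of their marginal densities, given by Equation 21,

$$f(\bar{w}, x) = f(x;\nu) \int_{-\infty}^{\bar{w}} 2n(n-1)\,\phi(y)\,\phi(2\bar{w} - y)\,[\Phi(2\bar{w} - y) - \Phi(y)]^{n-2}\,dy, \tag{21}$$

where $f(x;\nu)$ is given in Equation 5.
Considering the transformations $\bar{Q} = \bar{W}/X$ and $X$, so that $\bar{W} = X\bar{Q}$, the Jacobian is

$$|J| = \left|\frac{\partial \bar{w}}{\partial \bar{q}}\right| = x.$$

Therefore, from the above transformations and using the joint distribution of $\bar{W}$ and $X$ given in Equation 21, the joint distribution of $X$ and $\bar{Q}$ is given by Equation 22,

$$f(\bar{q}, x) = x\,f(x;\nu) \int_{-\infty}^{x\bar{q}} 2n(n-1)\,\phi(y)\,\phi(2x\bar{q} - y)\,[\Phi(2x\bar{q} - y) - \Phi(y)]^{n-2}\,dy. \tag{22}$$

Integrating Equation 22 with respect to $x$, for $0 < x < \infty$, the probability density function of $\bar{Q}$ is obtained. Thus,

$$f(\bar{q}; n, \nu) = \int_0^\infty \int_{-\infty}^{x\bar{q}} 2n(n-1)\,x\,\phi(y)\,\phi(2x\bar{q} - y)\,[\Phi(2x\bar{q} - y) - \Phi(y)]^{n-2} f(x;\nu)\,dy\,dx.$$

The cumulative distribution function can be obtained by integrating Equation 19 over $(-\infty, \bar{q}]$. Changing the variable of integration to $z$, the cumulative distribution function of $\bar{Q}$ is given by

$$F(\bar{q}; n, \nu) = \int_{-\infty}^{\bar{q}} \int_0^\infty \int_{-\infty}^{xz} 2n(n-1)\,x\,\phi(y)\,\phi(2xz - y)\,[\Phi(2xz - y) - \Phi(y)]^{n-2} f(x;\nu)\,dy\,dx\,dz.$$

Changing the order of integration and fixing the smallest order statistic $y$, the lower limit with respect to $z$ is now $y/x$. So,

$$F(\bar{q}; n, \nu) = \int_0^\infty \int_{-\infty}^{x\bar{q}} n\,\phi(y) \left\{\int_{y/x}^{\bar{q}} 2(n-1)\,x\,\phi(2xz - y)\,[\Phi(2xz - y) - \Phi(y)]^{n-2}\,dz\right\} f(x;\nu)\,dy\,dx.$$

Since

$$\frac{d}{dz}[\Phi(2xz - y) - \Phi(y)]^{n-1} = 2(n-1)\,x\,\phi(2xz - y)\,[\Phi(2xz - y) - \Phi(y)]^{n-2},$$

then,

$$\int_{y/x}^{\bar{q}} 2(n-1)\,x\,\phi(2xz - y)\,[\Phi(2xz - y) - \Phi(y)]^{n-2}\,dz = [\Phi(2x\bar{q} - y) - \Phi(y)]^{n-1} - [\Phi(y) - \Phi(y)]^{n-1}.$$

Solving, with $[\Phi(y) - \Phi(y)]^{n-1} = 0$ for $n \geq 2$, the cumulative distribution function of $\bar{Q}$ is given by

$$F(\bar{q}; n, \nu) = \int_0^\infty \int_{-\infty}^{x\bar{q}} n\,\phi(y)\,[\Phi(2x\bar{q} - y) - \Phi(y)]^{n-1} f(x;\nu)\,dy\,dx,$$

as expected.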
Equations 19 and 20 are the probability density and cumulative distribution functions, respectively, of $\bar{Q}$, the externally studentized normal midrange. This is a novel distribution for which the authors could not find any report in the literature.
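The cumulative distribution function of the externally studentized normal midrange can be evaluated with a nested quadrature. The Python sketch below is a hedged illustration and not the SMR implementation: the function names are hypothetical, and the infinite ranges are handled by truncation at points where the integrands are negligible (an assumption made here for brevity, in place of the variable transformations used in the paper). It assumes the cdf has the form $\int_0^\infty \int_{-\infty}^{xq} n\,\phi(y)[\Phi(2xq-y)-\Phi(y)]^{n-1} f(x;\nu)\,dy\,dx$ with $\nu \geq 1$. For $n = 2$ the variable $\sqrt{2}\,\bar{Q}$ follows Student's $t$ distribution with $\nu$ degrees of freedom, which gives closed-form checks for $\nu = 1$ and $\nu = 2$.

```python
import math

def gauss_legendre(s):
    """Nodes and weights of the s-point Gauss-Legendre rule on [-1, 1]."""
    nodes, weights = [], []
    for k in range(1, s + 1):
        x = math.cos(math.pi * (k - 0.25) / (s + 0.5))   # initial guess
        for _ in range(100):                              # Newton iterations
            p_prev, p = 1.0, x
            for j in range(2, s + 1):                     # Legendre recurrence
                p_prev, p = p, ((2 * j - 1) * x * p - (j - 1) * p_prev) / j
            dp = s * (x * p - p_prev) / (x * x - 1.0)     # derivative of P_s
            step = p / dp
            x -= step
            if abs(step) < 1e-14:
                break
        nodes.append(x)
        weights.append(2.0 / ((1.0 - x * x) * dp * dp))
    return nodes, weights

def norm_pdf(y):
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)

def norm_cdf(y):
    return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))

def density_x(x, nu):
    """Density of X = S/sigma, with nu * X**2 distributed as chi-square(nu)."""
    log_f = (0.5 * nu * math.log(nu) - math.lgamma(0.5 * nu)
             - (0.5 * nu - 1.0) * math.log(2.0)
             + (nu - 1.0) * math.log(x) - 0.5 * nu * x * x)
    return math.exp(log_f)

def std_midrange_cdf(w, n, nodes, weights):
    """Inner integral: cdf of the standardized normal midrange at w."""
    a = min(w, 0.0) - 9.0                 # truncated lower limit (assumption)
    half, mid = 0.5 * (w - a), 0.5 * (w + a)
    total = 0.0
    for t, wt in zip(nodes, weights):
        y = mid + half * t
        total += wt * n * norm_pdf(y) * (norm_cdf(2.0 * w - y) - norm_cdf(y)) ** (n - 1)
    return half * total

def studentized_midrange_cdf(q, n, nu, s=64):
    """Outer integral over x = S/sigma on a truncated range (assumption)."""
    nodes, weights = gauss_legendre(s)
    lo = max(0.0, 1.0 - 8.0 / math.sqrt(nu))
    hi = 1.0 + 8.0 / math.sqrt(nu)
    half, mid = 0.5 * (hi - lo), 0.5 * (hi + lo)
    total = 0.0
    for t, wt in zip(nodes, weights):
        x = mid + half * t
        total += wt * density_x(x, nu) * std_midrange_cdf(x * q, n, nodes, weights)
    return half * total

t = math.sqrt(2.0) * 1.0
print(studentized_midrange_cdf(1.0, 2, 2), 0.5 + t / (2.0 * math.sqrt(2.0 + t * t)))
print(abs(studentized_midrange_cdf(1.0, 5, 10, s=64) - studentized_midrange_cdf(1.0, 5, 10, s=32)))
```

The last line compares two node counts, which gives a rough estimate of the quadrature error when no closed form is available.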
In the next section, numerical methods for obtaining the probability density and cumulative distribution functions are described. The quantile functions are also taken into account. The same approaches will be considered for the distribution of $\bar{W}$, Equations 16 and 17, which is the standard normal midrange.
Gauss-Legendre quadrature
The basic idea of Gauss-Legendre quadrature of a function $f(x)$ is to write the integral as in Equation 23,

$$\int_a^b f(x)\,dx = \int_a^b w(x)\,g(x)\,dx \approx \sum_{k=1}^{s} w_k\,g(x_k), \tag{23}$$

where $f(x) \equiv w(x)g(x)$, $w(x)$ is the weight function of the Gaussian quadrature, and $x_k$ and $w_k$ are the nodes and weights, respectively, of an $s$-point Gaussian quadrature rule, for $k = 1, 2, \ldots, s$. The weight function is $w(x) = 1$ in the Gauss-Legendre quadrature, thus $f(x) = g(x)$. The set $\{x_k, w_k\}$ is determined such that Equation 23 yields an exact result for polynomials of degree $2s - 1$ or less. For non-polynomial functions the Gauss-Legendre quadrature error is defined by Equation 24,
Gauss-Legendre quadrature was used to compute the functions 16, 17, 19 and 20. However, these functions depend on integrals over infinite intervals. An integral over an infinite range must be changed into an integral over $[-1, 1]$ by using Equations 25, 26 and 27,
Therefore, the integrals 25, 26 and 27 were computed by applying the Gauss-Legendre quadrature rule in these transformed variables by Equations 28, 29 and 30.
Details of how to compute the nodes and weights of the Gauss-Legendre quadrature can be found in Hildebrand (1974).
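One common way to obtain the nodes and weights is sketched below in Python. It is an illustration under stated assumptions, not the code of the SMR package: the nodes are found as roots of the Legendre polynomial $P_s$ by Newton iterations started from a standard cosine approximation, and the weights are taken as $w_k = 2/[(1-x_k^2)P_s'(x_k)^2]$. The names `gauss_legendre` and `integrate` are hypothetical.

```python
import math

def gauss_legendre(s):
    """Nodes and weights of the s-point Gauss-Legendre rule on [-1, 1]."""
    nodes, weights = [], []
    for k in range(1, s + 1):
        # Initial approximation to the k-th root of the Legendre polynomial P_s.
        x = math.cos(math.pi * (k - 0.25) / (s + 0.5))
        for _ in range(100):
            # Three-term recurrence: after the loop p = P_s(x), p_prev = P_{s-1}(x).
            p_prev, p = 1.0, x
            for j in range(2, s + 1):
                p_prev, p = p, ((2 * j - 1) * x * p - (j - 1) * p_prev) / j
            dp = s * (x * p - p_prev) / (x * x - 1.0)   # derivative P_s'(x)
            step = p / dp                               # Newton correction
            x -= step
            if abs(step) < 1e-14:
                break
        nodes.append(x)
        weights.append(2.0 / ((1.0 - x * x) * dp * dp))
    return nodes, weights

def integrate(func, a, b, s):
    """Approximate the integral of func over the finite interval [a, b]."""
    nodes, weights = gauss_legendre(s)
    half, mid = 0.5 * (b - a), 0.5 * (b + a)   # affine map from [-1, 1] to [a, b]
    return half * sum(w * func(mid + half * x) for x, w in zip(nodes, weights))

# A 4-point rule integrates polynomials up to degree 7 exactly.
print(integrate(lambda x: x ** 6, -1.0, 1.0, 4), 2.0 / 7.0)
print(integrate(math.sin, 0.0, math.pi, 10))   # exact value is 2
```

The affine map in `integrate` only covers finite intervals; integrals over infinite ranges still need a change of variable or a truncation before the rule can be applied.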
An R package (R Development Core Team, 2017) named SMR (Batista; Ferreira, 2012) was developed to apply Gauss-Legendre quadrature to compute the cumulative distribution functions 17 and 20 and the probability density functions 16 and 19. The numerical transformations given by Equations 28, 29 and 30 and the methods for computing the nodes and weights of the Gauss-Legendre quadrature, as described in Hildebrand (1974) and Gil, Segura and Temme (2007), were used in the R code of the implemented package. The quantiles were computed with the Newton-Raphson method, solving the equations formed by equating the functions 17 or 20 to $p$, where $0 < p < 1$ is the known cumulative probability (Gil, Segura and Temme, 2007). These computations make use of numerical quadratures of the respective probability density functions 16 or 19, which are the first derivatives of 17 or 20.
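The Newton-Raphson step for a quantile is $q_{k+1} = q_k - [F(q_k) - p]/f(q_k)$, where $F$ is the cumulative distribution function and $f$ its derivative, the density. The Python sketch below illustrates this iteration with a hypothetical helper `newton_quantile`. As a stand-in with closed-form functions, it is demonstrated on the standard normal distribution; in the setting of the paper, `cdf` and `pdf` would be the quadrature-based functions of the midrange distributions.

```python
import math

def newton_quantile(cdf, pdf, p, start=0.0, tol=1e-12, max_iter=100):
    """Solve cdf(q) = p for q by Newton-Raphson, using pdf as the derivative."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    q = start
    for _ in range(max_iter):
        step = (cdf(q) - p) / pdf(q)   # Newton correction
        q -= step
        if abs(step) < tol:
            return q
    raise RuntimeError("Newton-Raphson did not converge")

def norm_pdf(y):
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)

def norm_cdf(y):
    return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))

# Stand-in example: the 97.5% quantile of the standard normal distribution.
q = newton_quantile(norm_cdf, norm_pdf, 0.975)
print(q)   # close to 1.96
```

Starting at the center of a symmetric unimodal distribution keeps the iterates on one side of the root, which is why `start=0.0` is a reasonable default here; other distributions may need a different starting value.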
Besides computing the cumulative distribution, probability density and quantile functions, the SMR package generates random samples of size $N$ by Monte Carlo simulation. For this, random samples of size $n$, $X_1, X_2, \ldots, X_n$, are generated, where the $X_i$'s are independent and identically distributed standard normal variables, $N(0, 1)$, for $i = 1, 2, \ldots, n$. A random variable $U$, distributed as a chi-square variable, $U \sim \chi^2_\nu$, is simulated independently of the $X_i$'s, where $\nu > 0$ is the number of degrees of freedom. Finally, the following transformation is performed:

$$\bar{Q} = \frac{[\max(X_i) + \min(X_i)]/2}{\sqrt{U/\nu}}.$$

With infinite degrees of freedom, $\nu = \infty$, it suffices to compute

$$\bar{W} = \frac{\max(X_i) + \min(X_i)}{2}.$$

The process is repeated $N$ times to obtain the required sample values of $\bar{Q}$ or of $\bar{W}$. The package SMR provides the following functions, where np is the number of nodes and weights of the Gauss-Legendre quadrature:
dSMR(x, n, nu, np=32): computes values of the probability density function, given in (16) or (19);
pSMR(x, n, nu, np=32): computes values of the cumulative distribution function, given in (17) or (20);
qSMR(p, n, nu, np=32): computes quantiles of the externally studentized normal midrange;
rSMR(N, n, nu=Inf): draws random samples of the externally studentized normal midrange.
The user can set the argument nu to a finite or infinite value. If nu=Inf, values of the probability density, cumulative distribution and quantile functions of the normal midrange (standard normal midrange) are computed. If the argument nu is not specified in the rSMR function, the default value Inf is used and random samples from the normal midrange distribution are drawn.
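The simulation procedure described above can be sketched in Python as follows. This is an independent illustration and not the rSMR source code; the function name `r_studentized_midrange` and its `rng` argument are hypothetical. The chi-square variable is drawn as a gamma variable with shape $\nu/2$ and scale 2.

```python
import math
import random

def r_studentized_midrange(N, n, nu=math.inf, rng=None):
    """Draw N values of the externally studentized normal midrange.

    With nu = inf the plain midrange of n standard normal variables
    (the standardized normal midrange) is returned instead.
    """
    rng = rng or random.Random()
    values = []
    for _ in range(N):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        midrange = 0.5 * (max(x) + min(x))
        if math.isinf(nu):
            values.append(midrange)
        else:
            # chi-square(nu) is a gamma variable with shape nu/2 and scale 2
            u = rng.gammavariate(0.5 * nu, 2.0)
            values.append(midrange / math.sqrt(u / nu))
    return values

sample = r_studentized_midrange(20000, 2, rng=random.Random(1))
mean = sum(sample) / len(sample)
var = sum((v - mean) ** 2 for v in sample) / len(sample)
print(mean, var)   # for n = 2 and nu = inf: mean near 0, variance near 0.5
```

For $n = 2$ and infinite degrees of freedom the midrange is the mean of two standard normal variables, so its variance is 1/2, which provides a simple check of the generator.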
Performance
The accuracy of the quadratures for computing the cumulative distribution functions 17 and 20 and for computing quantiles was evaluated with the SMR package. There are no published cumulative probabilities or quantiles of the externally studentized normal midrange for comparison, since reports of this distribution were not found in the literature. Two strategies were used to verify the accuracy. First, Monte Carlo simulations were used to obtain quantiles and cumulative probabilities, which were compared with those obtained by Gauss-Legendre quadrature in the SMR package. Second, two different numbers of quadrature points were used to compute these quantities, and the quadrature errors were computed by comparing the two values, as shown in Equation 24.
The algorithm was validated by Monte Carlo simulation using the R software with the SMR package (Batista; Ferreira, 2012). A random sample of the externally studentized normal midrange, $\bar{Q}$, of size $N$ = 1,000,001 was simulated using the rSMR function of the SMR package, following the procedure described above. If the degrees of freedom are $\infty$, this function generates a random sample from the standard normal midrange, $\bar{W}$.
Therefore, given a quantile $\bar{q}$ or $\bar{w}$, the cumulative probability is estimated, respectively, by

$$\hat{P}(\bar{Q} \leq \bar{q}) = \frac{1}{N}\sum_{i=1}^{N} I(\bar{Q}_i \leq \bar{q}) \quad \text{and} \quad \hat{P}(\bar{W} \leq \bar{w}) = \frac{1}{N}\sum_{i=1}^{N} I(\bar{W}_i \leq \bar{w}),$$

where $I(\cdot)$ is the indicator function.
Table 1 shows the values of the distribution function computed with 64 Gauss-Legendre quadrature points and by Monte Carlo simulation (with $N$ = 1,000,001 observations in the Monte Carlo sample). Several combinations of $n$ and $\nu$ were used for some particular choices of the quantiles $\bar{q}$ or $\bar{w}$. The cumulative probabilities computed by Gauss-Legendre quadrature were close to those obtained by Monte Carlo simulation (MC). The MC error is proportional to $1/\sqrt{N}$ (Ciftja and Wexler, 2003), which in this case is 0.001. The two methods differed in the third and fourth decimal places (Table 1), as expected. These results show, in principle, that the Gauss-Legendre quadrature has at least three significant digits of accuracy.
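The size of the Monte Carlo error can be illustrated in a case where the exact cumulative probability is known. The Python sketch below is not the validation code of the paper, and the names `simulate` and `empirical_cdf` are hypothetical. It uses $n = 2$ and $\nu = 2$, for which $\sqrt{2}\,\bar{Q}$ follows Student's $t$ distribution with 2 degrees of freedom, whose cdf is $1/2 + t/(2\sqrt{2 + t^2})$. A smaller $N$ than in the paper is used to keep the run short.

```python
import math
import random

def simulate(N, n, nu, seed=2017):
    """Simulate N values of the externally studentized normal midrange."""
    rng = random.Random(seed)
    out = []
    for _ in range(N):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        u = rng.gammavariate(0.5 * nu, 2.0)      # chi-square with nu df
        out.append(0.5 * (max(x) + min(x)) / math.sqrt(u / nu))
    return out

def empirical_cdf(sample, q):
    """Proportion of simulated values not exceeding q."""
    return sum(1 for v in sample if v <= q) / len(sample)

N = 50000
sample = simulate(N, n=2, nu=2)
q = 1.0
t = math.sqrt(2.0) * q
exact = 0.5 + t / (2.0 * math.sqrt(2.0 + t * t))   # Student t cdf, 2 df
estimate = empirical_cdf(sample, q)
print(estimate, exact, abs(estimate - exact), 1.0 / math.sqrt(N))
```

The observed difference should be of the same order as $1/\sqrt{N}$ or smaller, in line with the argument used to interpret Table 1.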
The quadrature error ε can also be computed by increasing the number of points and calculating the difference between the two results. Cumulative probabilities computed using $s = 64$ and $i = 250$ quadrature points, and the errors obtained as the differences between these values, are shown in Table 2. The maximum observed error is of the order of $10^{-10}$, showing that 64 quadrature points achieve high precision for computing Equations 17 or 20. Several other combinations of $n$, $\nu$ and $\bar{q}$ or $\bar{w}$ were used to compute these errors, and their maximum value remained the same.
Table 3 shows quantiles for a fixed cumulative probability of 0.95, obtained with the qSMR function of the SMR package (Batista; Ferreira, 2012, 2014). Almost all the quantiles were computed using 64 quadrature points, except for the case of $\nu = 1$, where this number of points was insufficient. In such circumstances, there are two alternatives: a) refine the quadrature, dividing the integration interval into small subintervals and approximating the integral by the sum of the quadratures on each subinterval; or b) increase the number of quadrature points. In this case, we opted for the latter, using 250 quadrature points. The results are shown with 3 decimal places, for $\nu$ degrees of freedom of 1(1)20(5)30, 50(50)200, 300, 1,000 and ∞, and for sample sizes $n$ of 2(1)10(5)50(25)100. When $\nu \to \infty$, $S^2 \to \sigma^2$ and the cumulative distribution function of the studentized midrange from standard normal populations, $F(\bar{q}; n, \nu)$, given in Equation 20, tends to $F_{\bar{W}}(\bar{w})$, the cumulative distribution function of the standardized midrange, given in Equation 17. If the desired value of $\bar{q}$ cannot be found in Table 3, the SMR package can be used or linear interpolation can be applied to compute this quantile.
In Table 3, if the number of degrees of freedom and the cumulative probability are fixed, the quantiles decrease as the sample size increases. This is not observed for the quantiles of the studentized normal range under the same circumstances (Newman, 1939): fixing the number of degrees of freedom and the probability, the studentized normal range quantiles increase as the sample size increases. This difference can be observed in Figure 1, where the studentized range and midrange probability density functions are plotted for $\nu = 10$ with $n = 10$ and $n = 1,000$. Another interesting observation is that, fixing the sample size and the cumulative probability, the studentized normal midrange quantiles decrease as the number of degrees of freedom increases (Table 3). The same can be observed for the studentized normal range quantiles. In Figure 1(a), for $n = 10$ and $n = 1,000$, with $\nu = 10$, the quantiles for $P(Q \leq q) = F(q; n, \nu) = 0.95$ are $q_{n=10} = 5.6$ and $q_{n=1,000} = 10.5$, respectively. As the sample size increases, the range probability density function shifts to the right. This occurs because the range increases as the sample size increases.
Figure 1: Probability density function (pdf) of the externally studentized normal range (a) and midrange (b), considering ν = 10 with n = 10 and n = 1,000.
In Figure 1(b), for $n = 10$ and $n = 1,000$, with $\nu = 10$, the quantiles for $P(\bar{Q} \leq \bar{q}) = F(\bar{q}; n, \nu) = 0.95$ are $\bar{q}_{n=10} = 0.78$ and $\bar{q}_{n=1,000} = 0.45$, respectively. As the sample size increases, the distribution becomes more concentrated around its center without changing its central position. This concentration decreases the value of the quantile in larger samples for the same cumulative probability.
Applications
The development of the externally studentized normal midrange distribution enables several applications. One existing result of this distribution is the midrangeMCP package, which performs four multiple comparison tests based on it (Batista; Ferreira, 2014). The tests developed and implemented in this package are versions similar to the Tukey, SNK and Scott-Knott tests. The first two, in their original versions, are based on the distribution of the externally studentized normal range. The Scott-Knott test is based on the likelihood ratio as a criterion for separating groups of means. In the midrangeMCP package, however, these tests are based on the distribution of the externally studentized normal midrange, and they showed high performance when evaluated for type I error rates and power, although these results are still in the publishing process. Another interesting application of this distribution is the construction of control charts in statistical quality control, replacing the standardized normal range or the studentized range.
CONCLUSIONS
The analytical equations of the distribution of the externally studentized normal midrange ($\bar{Q}$), as well as of the distribution of the normal midrange ($\bar{W}$), were obtained. Probability density, cumulative distribution and quantile functions were obtained computationally for $\bar{Q}$ and $\bar{W}$. The algorithms were implemented in R using Gauss-Legendre quadrature and the Newton-Raphson method, resulting in the SMR package, available for download from CRAN. The implemented routines showed high accuracy, verified by Monte Carlo simulations and by comparing results obtained with different numbers of quadrature points. To obtain precise quantiles when the degrees of freedom are close to 1 and the percentiles are close to 100%, more than 64 quadrature points are recommended.
ACKNOWLEDGMENTS
We would like to thank CNPq and CAPES for their financial support.
REFERENCES
- BATISTA, B. D. O.; FERREIRA, D. F. SMR: An R package for computing the externally studentized normal midrange distribution. The R Journal, 6(2):123-136, 2014.
- BATISTA, B. D. O.; FERREIRA, D. F. SMR: Externally Studentized Midrange Distribution, 2012. R package version 1.0.0.
- BARAKAT, H. M.; NIGM, E. M.; ELSAWAH, A. M. Asymptotic distributions of the generalized range, midrange, extremal quotient, and extremal product, with a comparison study. Communications in Statistics - Theory and Methods, 44(5):900-913, 2015.
- BLAND, M. Estimating mean and standard deviation from the sample size, three quartiles, minimum, and maximum. International Journal of Statistics in Medical Research, 4:57-64, 2015.
- CIFTJA, O.; WEXLER, C. Monte Carlo simulation method for Laughlin-like states in a disk geometry. Physical Review B, 67(7):075304, 2003.
- DAVID, H. A.; NAGARAJA, H. N. Order Statistics. John Wiley & Sons, Canada, 2003. p.458.
- DUNCAN, D. B. Multiple range and multiple F tests. Biometrics, 11:1-42, 1955.
- DUNCAN, D. B. On the properties of the multiple comparisons test. Virginia Journal of Science, 3:50-67, 1952.
- GIL, A.; SEGURA, J.; TEMME, N. M. Numerical methods for special functions. SIAM, 2007. 417p.
- GUMBEL, E. J. On the independence of the extremes in a sample. The Annals of Mathematical Statistics, 17(1):78-80, 1946.
- GUMBEL, E. J. Statistics of Extremes. Columbia University Press, New York, 1958. p.375.
- GUMBEL, E. J. Range and midranges. The Annals of Mathematical Statistics, 15(4):414-422, 1944.
- GUMBEL, E. J. The distribution of the range. The Annals of Mathematical Statistics, 18(3):384-412, 1947.
- HILDEBRAND, F. B. Introduction to Numerical Analysis. McGraw-Hill, New York, 2 edition, 1974. p.669.
- YALCIN, G. C.; ROBLEDO, A.; GELL-MANN, M. Incidence of q statistics in rank distributions. Proceedings of the National Academy of Sciences, 111(39):14082-14087, 2014.
- LI, B.; MANSOURI, H. Simultaneous rank tests for detecting differentially expressed genes. Journal of Statistical Computation and Simulation, 86(5):959-972, 2016.
- MANSOURI, H. Simultaneous inference based on rank statistics in linear models. Journal of Statistical Computation and Simulation, 85(4):660-674, 2015.
- MOOD, A. M.; GRAYBILL, F. A.; BOES, D. C. Introduction to the Theory of Statistics. McGraw-Hill, New York, 1974. p.564.
- NEWMAN, D. The distribution of range in samples from a normal population, expressed in terms of an independent estimate of standard deviation. Biometrika, 31(1-2):20-30, 1939.
- PEARSON, E. S.; HARTLEY, H. O. The probability integral of the range in samples of n observations from a normal population. Biometrika, 32:301-310, 1942.
- R DEVELOPMENT CORE TEAM. R: A Language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria, 2017.
- RIDER, P. R. The midrange of a sample as an estimator of the population midrange. Journal of the American Statistical Association, 52(280):537-542, Dec. 1957.
- SEARLE, S. R. Linear models for unbalanced data. New York: Wiley, 1987. 560p.
- STUDENT. Errors in routine analysis. Biometrika, 19:151-164, 1927.
- TIPPET, L. H. C. On the extreme individuals and the range of samples taken from a normal population. Biometrika, 17:365-387, 1925.
- TUKEY, J. W. Comparing individual means in the analysis of variance. Biometrics, 5:99-114, 1949.
- TUKEY, J. W. The problem of multiple comparisons. Unpublished Dittoed Notes, Princeton University, 1-300, 1953.
- WAN, X. et al. Estimating the sample mean and standard deviation from the sample size, median, range and/or interquartile range. BMC Medical Research Methodology, 14:135, 2014.
- WILKS, S. S. Order statistics. Bulletin of the American Mathematical Society, 54:6-50, 1948.
Publication Dates
- Publication in this collection: Jul-Aug 2017
History
- Received: 15 Dec 2016
- Accepted: 15 Feb 2017