Abstract
The gamma distribution has been extensively used in many areas of application. In this paper, within a Bayesian framework, we provide necessary and sufficient conditions to check whether improper priors lead to proper posterior distributions. We also discuss sufficient conditions to verify whether the resulting posterior moments are finite. An interesting aspect of our findings is that one can check whether the posterior is proper, and whether its moments are finite, by looking directly at the behavior of the improper prior. To illustrate the proposed methodology, these results are applied to several objective priors.
Key words: gamma distribution; improper prior; objective prior; posterior property
1 - INTRODUCTION
The gamma distribution is one of the best-known distributions used in statistical analysis. It arises naturally in many areas, such as environmental analysis, reliability analysis, clinical trials, signal processing and other physical settings. Let $X$ be a non-negative random variable with the gamma distribution given by

$$f(x\mid\alpha,\beta)=\frac{\beta^{\alpha}}{\Gamma(\alpha)}\,x^{\alpha-1}\exp(-\beta x),\qquad x>0, \tag{1}$$

where $\alpha>0$ and $\beta>0$ are unknown shape and rate (inverse scale) parameters, respectively, and $\Gamma(\alpha)=\int_{0}^{\infty}t^{\alpha-1}e^{-t}\,dt$ is the gamma function. Frequentist methods of inference for the gamma distribution are standard in the statistical literature. Under the Bayesian approach, where a prior distribution must be assigned, different objective priors for the gamma distribution have been discussed by Miller 1980, Sun & Ye 1996, Berger et al. 2015 and Louzada & Ramos 2018. Although these priors are constructed by formal rules (see Kass & Wasserman 1996, Ramos et al. 2019), they are improper, i.e., they do not correspond to proper probability distributions, and they may lead to improper posteriors, which is undesirable. Northrop & Attalides 2016 argued that “… there is no general theory providing simple conditions under which an improper prior yields a proper posterior for a particular model, so this must be investigated case-by-case”. In this study, assuming that the sample is independent and identically distributed (iid), we address this problem by providing simple necessary and sufficient conditions to check whether these objective priors lead to proper posterior distributions. Even when the posterior distribution is proper, the posterior moments of the parameters can be infinite; we therefore also provide sufficient conditions to verify whether the posterior moments are finite. As a result, one can check whether the posterior is proper, and whether its moments are finite, directly from the behavior of the improper prior. The methodology is illustrated on more than ten objective priors, including independent uniform priors, Jeffreys’ rule (Kass & Wasserman 1996), Jeffreys’ prior (Jeffreys 1946), the maximal data information (MDI) prior (Zellner 1977, 1984), reference priors (Berger et al. 2015) and matching priors (Mukerjee & Dey 1993, Tibshirani 1989).
Finally, the effect of these priors on the posterior distribution is compared via numerical simulation. We consider only improper objective priors; when prior information is available, one may use an elicited prior instead (see, for instance, Dey & Moala 2018).
The remainder of this paper is organized as follows. Section 2 presents a theorem that provides necessary and sufficient conditions for the posterior distribution to be proper, together with sufficient conditions to check whether the posterior moments of the parameters are finite. Section 3 applies our main theorem to different objective priors. In Section 4, a simulation study is conducted to identify the most efficient estimation procedure. Finally, Section 5 summarizes the study.
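For concreteness, the following minimal Python sketch evaluates the gamma log-density and the iid log-likelihood. It assumes the shape–rate form $f(x\mid\alpha,\beta)=\beta^{\alpha}x^{\alpha-1}e^{-\beta x}/\Gamma(\alpha)$; `gamma_logpdf` and `gamma_loglik` are hypothetical helper names, not functions from any library.

```python
import math


def gamma_logpdf(x: float, alpha: float, beta: float) -> float:
    """Log-density of the gamma distribution with shape alpha and rate beta (assumed form)."""
    if x <= 0 or alpha <= 0 or beta <= 0:
        raise ValueError("x, alpha and beta must be positive")
    return (alpha * math.log(beta) - math.lgamma(alpha)
            + (alpha - 1.0) * math.log(x) - beta * x)


def gamma_loglik(data, alpha: float, beta: float) -> float:
    """iid log-likelihood; it depends on the data only through n, sum(x) and sum(log x)."""
    n = len(data)
    sum_x = sum(data)
    sum_log_x = sum(math.log(x) for x in data)
    return (n * alpha * math.log(beta) - n * math.lgamma(alpha)
            + (alpha - 1.0) * sum_log_x - beta * sum_x)


# Gamma(1, beta) is the exponential distribution with rate beta, so f(0.5) = 2 * exp(-1) for beta = 2.
print(math.exp(gamma_logpdf(0.5, 1.0, 2.0)))
```

Because the likelihood only involves $n$, $\sum x_i$ and $\sum\log x_i$, the integral over $\beta$ in the posterior can be carried out in closed form, which is what makes the conditions of Section 2 tractable.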
2 - PROPER POSTERIOR
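Let $X_1,\ldots,X_n$ be an iid sample where $X_i\sim\text{Gamma}(\alpha,\beta)$, $i=1,\ldots,n$. Then the joint posterior distribution for $\boldsymbol{\theta}=(\alpha,\beta)$ is given by the product of the likelihood function and the prior distribution divided by a normalizing constant $d(\boldsymbol{x})$, resulting in

$$p(\alpha,\beta\mid\boldsymbol{x})=\frac{\pi(\alpha,\beta)}{d(\boldsymbol{x})}\,\frac{\beta^{n\alpha}}{\Gamma(\alpha)^{n}}\prod_{i=1}^{n}x_i^{\alpha-1}\exp\left(-\beta\sum_{i=1}^{n}x_i\right), \tag{3}$$

where $d(\boldsymbol{x})=\int_{\mathcal{A}}\pi(\alpha,\beta)\,\frac{\beta^{n\alpha}}{\Gamma(\alpha)^{n}}\prod_{i=1}^{n}x_i^{\alpha-1}\exp\left(-\beta\sum_{i=1}^{n}x_i\right)d\alpha\,d\beta$ and $\mathcal{A}=(0,\infty)\times(0,\infty)$ is the parameter space of $\boldsymbol{\theta}$. For any prior distribution of the form $\pi(\alpha,\beta)\propto\pi(\alpha)\beta^{k}$, our purpose is to find necessary and sufficient conditions for this class of posteriors to be proper, i.e., $d(\boldsymbol{x})<\infty$. The following propositions will be useful to attain this objective. In what follows, $\overline{\mathbb{R}}$ denotes the extended real number line $\mathbb{R}\cup\{-\infty,\infty\}$, and the subscript $+$ in $\mathbb{R}^{+}$ and $\overline{\mathbb{R}}^{+}$ denotes the exclusion of negative numbers from these sets.

Definition 2.1. Let $g:U\to\overline{\mathbb{R}}^{+}$ and $h:U\to\overline{\mathbb{R}}^{+}$, where $U\subset\mathbb{R}$. We say that $g(x)\propto h(x)$ if there exist $c_0>0$ and $c_1>0$ such that $c_0\,h(x)\le g(x)\le c_1\,h(x)$ for every $x\in U$.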
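Definition 2.2. Let $a\in\overline{\mathbb{R}}$, $g:U\to\mathbb{R}^{+}$ and $h:U\to\mathbb{R}^{+}$, where $U\subset\mathbb{R}$. We say that $g(x)\underset{x\to a}{\propto}h(x)$ if

$$0<\liminf_{x\to a}\frac{g(x)}{h(x)}\le\limsup_{x\to a}\frac{g(x)}{h(x)}<\infty.$$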
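The relations $g(x)\underset{x\to a}{\lesssim}h(x)$ and $g(x)\underset{x\to a}{\gtrsim}h(x)$ for $a\in\overline{\mathbb{R}}$ are defined analogously, requiring only $\limsup_{x\to a}g(x)/h(x)<\infty$ and only $\liminf_{x\to a}g(x)/h(x)>0$, respectively. Note that, from the above definition, if $\lim_{x\to a}g(x)/h(x)=c$ for some $c>0$, then it follows that $g(x)\underset{x\to a}{\propto}h(x)$. The following proposition is a direct consequence of the above definition.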
Proposition 2.3. For and , let and . Then we have that
The following proposition gives us a relation between Definition 2.1 and Definition 2.2.
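Proposition 2.4. Let $g$ and $h$ be positive continuous functions on $(a,b)\subset\mathbb{R}$, where $a\in\overline{\mathbb{R}}$ and $b\in\overline{\mathbb{R}}$. Then $g(x)\propto h(x)$ if and only if $g(x)\underset{x\to a}{\propto}h(x)$ and $g(x)\underset{x\to b}{\propto}h(x)$.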
Proposition 2.5. Let and be continuous functions in , where and , and let . Then, if either or , it will follow respectively that
Theorem 2.6. Let the behavior of $\pi(\alpha,\beta)$ be given by $\pi(\alpha,\beta)\propto\pi(\alpha)\beta^{k}$, for some $k\in\mathbb{R}$. Then we have that:
-
If $k<-1$, then the posterior distribution (3) is improper.
-
If and then the posterior distribution (3) is improper.
-
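If $k\ge -1$ and the behavior of $\pi(\alpha)$ is given by

$$\pi(\alpha)\underset{\alpha\to 0^{+}}{\propto}\alpha^{r_0}\quad\text{and}\quad\pi(\alpha)\underset{\alpha\to\infty}{\propto}\alpha^{r_\infty},$$

where $r_0\in\mathbb{R}$ and $r_\infty\in\mathbb{R}$, then the posterior distribution (3) is proper if and only if $n>-r_0$ in case $k=-1$, and is proper if and only if $n>-r_0-1$ in case $k>-1$.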
Proof. See Appendix A. ◻
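The case analysis of Theorem 2.6 hinges on integrating $\beta$ out of (3) in closed form. Assuming the shape–rate form of (1) and a prior $\pi(\alpha)\beta^{k}$, the identity $\int_0^\infty\beta^{n\alpha+k}e^{-\beta s}\,d\beta=\Gamma(n\alpha+k+1)/s^{n\alpha+k+1}$ with $s=\sum_i x_i$, valid when $n\alpha+k+1>0$, gives the unnormalized marginal of $\alpha$ computed below. `log_marginal_alpha` and its `log_prior_alpha` argument are hypothetical names used only for illustration.

```python
import math


def log_marginal_alpha(alpha, data, k, log_prior_alpha):
    """log of the integral over beta of pi(alpha) * beta**k * L(alpha, beta; x).

    Assumes the shape-rate form of the gamma density. Returns math.inf when
    n*alpha + k + 1 <= 0, i.e. when the beta-integral diverges.
    """
    n = len(data)
    sum_x = sum(data)
    sum_log_x = sum(math.log(x) for x in data)
    a = n * alpha + k + 1.0  # shape of the gamma kernel in beta
    if a <= 0:
        return math.inf
    return (log_prior_alpha(alpha) + math.lgamma(a) - n * math.lgamma(alpha)
            + (alpha - 1.0) * sum_log_x - a * math.log(sum_x))


# Uniform prior (k = 0, pi(alpha) = 1), one observation x = 2 and alpha = 1:
# the integral of beta * exp(-2 * beta) over (0, inf) equals 1/4.
print(math.exp(log_marginal_alpha(1.0, [2.0], 0, lambda a: 0.0)))
```

Returning `math.inf` when $n\alpha+k+1\le 0$ mirrors case i): for $k<-1$ this happens on a whole interval of small $\alpha$, so the posterior cannot be normalized.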
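Theorem 2.7. Let $\pi(\alpha,\beta)\propto\pi(\alpha)\beta^{k}$, and suppose the behavior of $\pi(\alpha)$ is given by

$$\pi(\alpha)\underset{\alpha\to 0^{+}}{\propto}\alpha^{r_0}\quad\text{and}\quad\pi(\alpha)\underset{\alpha\to\infty}{\propto}\alpha^{r_\infty},$$

for $k\in\mathbb{R}$, $r_0\in\mathbb{R}$ and $r_\infty\in\mathbb{R}$. Then, if the posterior under $\pi$ is proper, the posterior means of $\alpha$ and $\beta$ are finite for this prior, as well as all of their moments.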
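Proof. Since the posterior is proper, by Theorem 2.6 we have that $k\ge -1$, and moreover $n>-r_0$ if $k=-1$ and $n>-r_0-1$ if $k>-1$.
Now let $\pi_1(\alpha,\beta)=\alpha\,\pi(\alpha,\beta)$. Then $\pi_1(\alpha,\beta)\propto\pi_1(\alpha)\beta^{k}$, where $\pi_1(\alpha)=\alpha\,\pi(\alpha)$, and it follows that

$$\pi_1(\alpha)\underset{\alpha\to 0^{+}}{\propto}\alpha^{r_0+1}\quad\text{and}\quad\pi_1(\alpha)\underset{\alpha\to\infty}{\propto}\alpha^{r_\infty+1}.$$

Therefore, since $k\ge -1$, and since $n>-r_0>-(r_0+1)$ if $k=-1$ and $n>-r_0-1>-(r_0+1)-1$ if $k>-1$, it follows from Theorem 2.6 that the posterior relative to the prior $\pi_1$ is proper. Therefore

$$E[\alpha\mid\boldsymbol{x}]=\frac{1}{d(\boldsymbol{x})}\int_{\mathcal{A}}\alpha\,\pi(\alpha,\beta)\,L(\alpha,\beta;\boldsymbol{x})\,d\alpha\,d\beta<\infty,$$

where $L(\alpha,\beta;\boldsymbol{x})$ denotes the likelihood function. Proceeding analogously with $\pi_2(\alpha,\beta)=\beta\,\pi(\alpha,\beta)\propto\pi(\alpha)\beta^{k+1}$, for which $k+1>-1$ and $n>-r_0-1$ in both cases, it also follows that $E[\beta\mid\boldsymbol{x}]<\infty$. Therefore we have proved that if a prior satisfying the assumptions of the theorem leads to a proper posterior, then the priors $\alpha\,\pi(\alpha,\beta)$ and $\beta\,\pi(\alpha,\beta)$ also lead to proper posteriors, and it follows by induction that $\alpha^{i}\beta^{j}\pi(\alpha,\beta)$ also leads to a proper posterior for any $i$ and $j$ in $\mathbb{N}$, which concludes the proof. ◻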
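Under power-law behavior, Theorem 2.6 (i) and (iii) and the induction in the proof of Theorem 2.7 reduce to simple inequalities in $(k, r_0, n)$, which the following sketch encodes. Multiplying the prior by $\alpha^{i}\beta^{j}$ shifts $(r_0,k)$ to $(r_0+i,k+j)$, so checking a moment of order $(i,j)$ is the same propriety test applied to the shifted exponents. The function names are hypothetical. The sketch also assumes $n\ge 2$ observations that are not all equal, since the arithmetic–geometric mean step in Appendix A needs the geometric mean to be strictly smaller than the arithmetic mean to control the tail $\alpha\to\infty$; under that assumption $r_\infty$ does not enter the test.

```python
def case_iii_proper(k: float, r0: float, n: int) -> bool:
    """Propriety test for pi(alpha, beta) ∝ pi(alpha) * beta**k with pi(alpha) ∝ alpha**r0 near 0.

    Sketch of Theorem 2.6 (i) and (iii); assumes n >= 2 observations that are not all equal.
    """
    if n < 2:
        raise ValueError("sketch assumes at least two distinct observations")
    if k < -1:
        return False        # case i: the beta-integral diverges for small alpha
    if k == -1:
        return n > -r0      # Gamma(n*alpha) ~ 1/(n*alpha) removes one power of alpha
    return n > -r0 - 1      # k > -1: Gamma(n*alpha + k + 1) stays bounded as alpha -> 0


def moments_finite(k: float, r0: float, n: int, max_order: int = 4) -> bool:
    """Check every prior alpha**i * beta**j * pi(alpha, beta), 0 <= i, j <= max_order (Theorem 2.7)."""
    if not case_iii_proper(k, r0, n):
        raise ValueError("the posterior itself is improper")
    return all(case_iii_proper(k + j, r0 + i, n)
               for i in range(max_order + 1)
               for j in range(max_order + 1))


# Exponents used in Section 3: (k, r0) = (-1, -1/2) for the Jeffreys prior
# and (-1, -1) for the Jeffreys rule prior 1/(alpha * beta).
for name, k, r0 in [("Jeffreys", -1, -0.5), ("Jeffreys rule", -1, -1)]:
    print(name, [case_iii_proper(k, r0, n) for n in (2, 3)], moments_finite(k, r0, 2))
```

The shifted tests can never fail once the base test passes, because raising $r_0$ or $k$ only relaxes the inequalities; this is the content of the induction in the proof of Theorem 2.7.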
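Proposition 2.8. Suppose $\pi_i(\alpha,\beta)$ leads to a proper posterior for $i=1,\ldots,m$, where $m\in\mathbb{N}$, and consider constants $c_i>0$ for $i=1,\ldots,m$. Then
-
$\sum_{i=1}^{m}c_i\,\pi_i(\alpha,\beta)$ leads to a proper posterior;
-
$\prod_{i=1}^{m}\pi_i(\alpha,\beta)^{c_i}$ leads to a proper posterior if additionally $\sum_{i=1}^{m}c_i=1$.
Proof. The first item is a direct consequence of the linearity of the Lebesgue integral, while the second is a direct consequence of Hölder’s inequality. ◻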
3 - APPLICATION
In this section, we apply the proposed theorems to different objective priors.
3.1 - Uniform prior
A simple noninformative prior can be obtained by considering uniform priors on the interval $(0,\infty)$. This prior is usually unattractive because of its lack of invariance under reparameterisation. The uniform prior is given by $\pi(\alpha,\beta)\propto 1$. The joint posterior distribution for $\alpha$ and $\beta$, produced by the uniform prior, is
Theorem 3.1. The posterior distribution (4) is proper for any sample size, in which case the posterior moments for $\alpha$ and $\beta$ are finite.
Proof. Since $\pi(\alpha)=1$ and $\pi(\beta)=1=\beta^{0}$, it follows that $k=0$, $r_0=0$ and $r_\infty=0$ are valid constants for application of Theorem 2.6. Thus, since $k>-1$ and $n>-1=-r_0-1$ for all $n\ge 1$, the result follows from Theorem 2.6 and Theorem 2.7. ◻
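The marginal posterior distribution for $\alpha$ is

$$p(\alpha\mid\boldsymbol{x})\propto\frac{\Gamma(n\alpha+1)}{\Gamma(\alpha)^{n}}\,\frac{\prod_{i=1}^{n}x_i^{\alpha-1}}{\left(\sum_{i=1}^{n}x_i\right)^{n\alpha+1}}.$$

The conditional posterior distribution for $\beta$ is given by

$$\beta\mid\alpha,\boldsymbol{x}\sim\text{Gamma}\left(n\alpha+1,\ \sum_{i=1}^{n}x_i\right).$$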
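Because $\beta$ can be integrated out exactly, posterior means under the uniform prior only need a one-dimensional integral over $\alpha$. The sketch below assumes the shape–rate form of (1), under which the marginal of $\alpha$ is proportional to $\Gamma(n\alpha+1)\prod x_i^{\alpha-1}/\{\Gamma(\alpha)^{n}(\sum x_i)^{n\alpha+1}\}$ and $\beta\mid\alpha,\boldsymbol{x}\sim\text{Gamma}(n\alpha+1,\sum x_i)$ in the rate form, so that $E[\beta\mid\boldsymbol{x}]=E[(n\alpha+1)/\sum x_i\mid\boldsymbol{x}]$. The $\alpha$-integral is approximated by a midpoint rule on $(0,\text{upper}]$; the truncation point, grid size, function name and example data are assumptions for illustration.

```python
import math


def posterior_means_uniform(data, upper=50.0, steps=20000):
    """Posterior means of (alpha, beta) under pi(alpha, beta) ∝ 1 (shape-rate form assumed).

    The alpha-integral uses a midpoint rule on (0, upper]; `upper` should leave
    negligible posterior mass beyond it.
    """
    n, sum_x = len(data), sum(data)
    sum_log_x = sum(math.log(x) for x in data)
    h = upper / steps
    grid = [h * (i + 0.5) for i in range(steps)]
    # Unnormalised log marginal of alpha after integrating beta out.
    log_m = [math.lgamma(n * a + 1.0) - n * math.lgamma(a) + (a - 1.0) * sum_log_x
             - (n * a + 1.0) * math.log(sum_x) for a in grid]
    top = max(log_m)
    w = [math.exp(v - top) for v in log_m]  # rescaled weights, avoids overflow
    z = sum(w)
    mean_alpha = sum(a * wi for a, wi in zip(grid, w)) / z
    # E[beta | alpha, x] = (n * alpha + 1) / sum_x for the Gamma(n*alpha + 1, rate sum_x) conditional.
    mean_beta = sum((n * a + 1.0) / sum_x * wi for a, wi in zip(grid, w)) / z
    return mean_alpha, mean_beta


data = [0.8, 1.1, 1.9, 2.4, 3.0, 0.5]
print(posterior_means_uniform(data))
```

Doubling `steps` and checking that the means barely change is a cheap way to confirm that the grid is fine enough.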
3.2 - Jeffreys rule
Jeffreys considered different procedures for constructing objective priors. For a parameter $\theta\in(0,\infty)$ (see Kass & Wasserman 1996), Jeffreys suggested the prior $\pi(\theta)\propto\theta^{-1}$. The main justification for this choice was its invariance under power transformations of the parameter. Since both parameters of the gamma distribution lie in the interval $(0,\infty)$, the prior obtained from the Jeffreys rule (Miller 1980) is

$$\pi(\alpha,\beta)\propto\frac{1}{\alpha\beta}.$$
The joint posterior distribution for $\alpha$ and $\beta$ produced by the Jeffreys rule prior is given by
Theorem 3.2. The posterior density (7) is proper if and only if $n\ge 2$, in which case the posterior moments for $\alpha$ and $\beta$ are finite.
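Proof. Since $\pi(\alpha)=\alpha^{-1}$ and $\pi(\beta)=\beta^{-1}$, then $k=-1$, $r_0=-1$ and $r_\infty=-1$ are valid constants for application of Theorem 2.6. Thus, since $k=-1$, and since the inequality $n>-r_0=1$ holds if and only if $n\ge 2$, the result follows from Theorem 2.6 and Theorem 2.7. ◻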
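The marginal posterior distribution for $\alpha$ is given by

$$p(\alpha\mid\boldsymbol{x})\propto\frac{1}{\alpha}\,\frac{\Gamma(n\alpha)}{\Gamma(\alpha)^{n}}\,\frac{\prod_{i=1}^{n}x_i^{\alpha-1}}{\left(\sum_{i=1}^{n}x_i\right)^{n\alpha}}.$$

The conditional posterior distribution for $\beta$ is

$$\beta\mid\alpha,\boldsymbol{x}\sim\text{Gamma}\left(n\alpha,\ \sum_{i=1}^{n}x_i\right). \tag{8}$$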
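The conditional (8) suggests a simple way to simulate from posteriors of the form $\pi(\alpha)/\beta$, such as the Jeffreys rule prior: draw $\alpha$ from its marginal and then $\beta$ given $\alpha$. The sketch below discretizes the marginal of $\alpha$ on a grid, which is an approximation and not the MCMC scheme used in Section 4, and assumes the shape–rate form of (1). Note that Python's `random.gammavariate` takes a shape and a *scale*, so the rate $\sum x_i$ enters as $1/\sum x_i$. `sample_posterior`, its arguments and the example data are hypothetical.

```python
import math
import random


def sample_posterior(data, log_prior_alpha, draws=5000, upper=50.0, steps=5000, seed=1):
    """Approximate draws of (alpha, beta) for priors pi(alpha) / beta (shape-rate form assumed)."""
    rng = random.Random(seed)
    n, sum_x = len(data), sum(data)
    sum_log_x = sum(math.log(x) for x in data)
    h = upper / steps
    grid = [h * (i + 0.5) for i in range(steps)]  # midpoints of (0, upper]
    # Unnormalised log marginal of alpha: log pi(alpha) + log Gamma(n a) - n log Gamma(a) + ...
    log_m = [log_prior_alpha(a) + math.lgamma(n * a) - n * math.lgamma(a)
             + (a - 1.0) * sum_log_x - n * a * math.log(sum_x) for a in grid]
    top = max(log_m)
    weights = [math.exp(v - top) for v in log_m]  # rescaled to avoid overflow
    alphas = rng.choices(grid, weights=weights, k=draws)
    # beta | alpha, x ~ Gamma(shape n*alpha, rate sum_x); gammavariate takes (shape, scale).
    return [(a, rng.gammavariate(n * a, 1.0 / sum_x)) for a in alphas]


data = [0.8, 1.1, 1.9, 2.4, 3.0, 0.5]
sample = sample_posterior(data, lambda a: -math.log(a))  # Jeffreys rule: pi(alpha) = 1/alpha
print(sum(a for a, _ in sample) / len(sample), sum(b for _, b in sample) / len(sample))
```

Since $E[\beta\mid\alpha,\boldsymbol{x}]=n\alpha/\sum x_i$, the average of the $\beta$ draws should be close to $n/\sum x_i$ times the average of the $\alpha$ draws, which is a quick sanity check of the parameterization.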
3.3 - Jeffreys prior
In a further study, Jeffreys 1946 proposed a general rule to obtain an objective prior. This prior is obtained from the square root of the determinant of the Fisher information matrix and has been widely used because of its invariance under one-to-one transformations. For the gamma distribution, the Jeffreys prior (see Miller 1980) is given by

$$\pi(\alpha,\beta)\propto\frac{\sqrt{\alpha\,\psi'(\alpha)-1}}{\beta},$$

where $\psi'(\alpha)=\frac{d^{2}}{d\alpha^{2}}\log\Gamma(\alpha)$ is the trigamma function.
The joint posterior distribution for $\alpha$ and $\beta$ produced by the Jeffreys prior is
Theorem 3.3. The posterior density (10) is proper for any sample size, in which case the posterior moments for $\alpha$ and $\beta$ are finite.
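Proof. Here, we have $\pi(\alpha)=\sqrt{\alpha\psi'(\alpha)-1}$ and $k=-1$. Following Abramowitz & Stegun 1972, we have that $\psi'(\alpha)=\alpha^{-2}+\psi'(1)+O(\alpha)$ as $\alpha\to 0^{+}$, and thus

$$\lim_{\alpha\to 0^{+}}\frac{\pi(\alpha)}{\alpha^{-1/2}}=\lim_{\alpha\to 0^{+}}\sqrt{\alpha^{2}\psi'(\alpha)-\alpha}=1,$$

which implies that $\pi(\alpha)\underset{\alpha\to 0^{+}}{\propto}\alpha^{-1/2}$. Moreover, following Abramowitz & Stegun 1972, we also have that $\psi'(\alpha)=\alpha^{-1}+\tfrac{1}{2}\alpha^{-2}+O(\alpha^{-3})$ as $\alpha\to\infty$, and thus $\lim_{\alpha\to\infty}\pi(\alpha)/\alpha^{-1/2}=\lim_{\alpha\to\infty}\sqrt{\alpha^{2}\psi'(\alpha)-\alpha}=1/\sqrt{2}$, which implies that $\pi(\alpha)\underset{\alpha\to\infty}{\propto}\alpha^{-1/2}$. Therefore, $k=-1$, $r_0=-1/2$ and $r_\infty=-1/2$ are valid constants for application of Theorem 2.6, and since $n>1/2=-r_0$ for all $n\ge 1$, the posterior is proper for any sample size and the posterior moments are finite by Theorems 2.6 and 2.7. ◻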
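The proofs of Theorems 3.3, 3.6 and 3.7 rest on the power-law behavior of the prior near $\alpha=0$ and as $\alpha\to\infty$. A quick numerical cross-check is to estimate the local exponent $r$ from the slope of $\log\pi(\alpha)$ against $\log\alpha$. The sketch below does this for the $\alpha$-part $\sqrt{\alpha\psi'(\alpha)-1}$ of the Jeffreys prior and for the $\alpha$-parts $\sqrt{(\alpha\psi'(\alpha)-1)/\alpha}$ and $\sqrt{\psi'(\alpha)}$ assumed here for the reference priors (13) and (15). `trigamma` is a hand-rolled helper (recurrence plus the standard asymptotic series), not a library call, and the evaluation points $10^{-4}$ and $10^{4}$ are arbitrary choices.

```python
import math


def trigamma(x: float) -> float:
    """psi'(x) for x > 0: recurrence psi'(x) = psi'(x + 1) + 1/x**2, then asymptotic series."""
    acc = 0.0
    while x < 6.0:
        acc += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    # 1/x + 1/(2x^2) + 1/(6x^3) - 1/(30x^5) + 1/(42x^7) - 1/(30x^9)
    return acc + inv + 0.5 * inv2 + inv * inv2 * (1/6 - inv2 * (1/30 - inv2 * (1/42 - inv2 / 30)))


def local_exponent(f, a: float) -> float:
    """Slope of log f against log t between t = a and t = 2a; approximates r when f(t) ∝ t**r."""
    return math.log(f(2.0 * a) / f(a)) / math.log(2.0)


def jeffreys(a):           # alpha-part of the Jeffreys prior
    return math.sqrt(a * trigamma(a) - 1.0)


def reference_alpha(a):    # assumed alpha-part of the reference prior (13)
    return math.sqrt((a * trigamma(a) - 1.0) / a)


def reference_beta(a):     # assumed alpha-part of the reference prior (15)
    return math.sqrt(trigamma(a))


for name, f in [("Jeffreys", jeffreys), ("reference (13)", reference_alpha),
                ("reference (15)", reference_beta)]:
    print(name, round(local_exponent(f, 1e-4), 3), round(local_exponent(f, 1e4), 3))
```

The estimates should be close to $(r_0,r_\infty)=(-1/2,-1/2)$, $(-1,-1)$ and $(-1,-1/2)$, respectively, which are the constants used in the proofs. A numerical slope is only a sanity check; the proofs themselves rely on the asymptotic expansions.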
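The conditional posterior distribution for $\beta$ is (8). The marginal posterior distribution for $\alpha$ is given by

$$p(\alpha\mid\boldsymbol{x})\propto\sqrt{\alpha\,\psi'(\alpha)-1}\,\frac{\Gamma(n\alpha)}{\Gamma(\alpha)^{n}}\,\frac{\prod_{i=1}^{n}x_i^{\alpha-1}}{\left(\sum_{i=1}^{n}x_i\right)^{n\alpha}}.$$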
3.4 - Miller prior
Miller 1980 discussed three objective priors for the parameters of the gamma distribution, the first two being the Jeffreys rule and the Jeffreys prior. However, the author chose a third prior, on the grounds that it requires fewer computational subroutines. This prior is given by
Since then, computational methods have advanced considerably, and many of these limitations have been overcome, especially after Gelfand & Smith 1990 successfully applied Gibbs sampling to Bayesian analysis.
The joint posterior distribution for $\alpha$ and $\beta$ produced by Miller’s prior is
Theorem 3.4. The posterior density (12) is proper for any sample size, in which case the posterior moments for $\alpha$ and $\beta$ are finite.
Proof. Since and , then and are valid constants for application of Theorem 2.6. Therefore, since and for all , the result follows directly from the Theorem 2.6 and Theorem 2.7. ◻
The conditional posterior distribution for $\beta$ is (8). The marginal posterior distribution for $\alpha$ is given by
3.5 - Reference prior
Bernardo 1979 proposed obtaining an objective prior by maximizing the expected Kullback–Leibler divergence between the posterior distribution and the prior. This yields a class of noninformative priors known as reference priors. Reference priors provide posterior distributions with interesting properties, such as invariance under one-to-one transformations, consistent marginalization and consistent sampling properties (Bernardo 2005). The procedure to obtain reference priors is described as follows.
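Corollary 3.5 (Bernardo 2005). Let $\boldsymbol{\theta}=(\theta_1,\theta_2)$ be the vector of parameters and let $p(\boldsymbol{\theta}\mid\boldsymbol{x})$ be the posterior distribution, with asymptotic normal distribution and dispersion matrix $S(\boldsymbol{\theta})=H^{-1}(\boldsymbol{\theta})$, where $H(\boldsymbol{\theta})$ is the Fisher information matrix. Moreover, let $\theta_1$ be the parameter of interest and $\theta_2$ the nuisance parameter. Then, if the parameter space of $\theta_2$ is independent of $\theta_1$ and the functions $S_{11}^{-1/2}(\boldsymbol{\theta})$ and $H_{22}^{1/2}(\boldsymbol{\theta})$ factorize in the form

$$S_{11}^{-1/2}(\boldsymbol{\theta})=f_1(\theta_1)\,g_1(\theta_2)\quad\text{and}\quad H_{22}^{1/2}(\boldsymbol{\theta})=f_2(\theta_1)\,g_2(\theta_2),$$

it follows that $\pi(\boldsymbol{\theta})\propto f_1(\theta_1)\,g_2(\theta_2)$, and there is no need for compact approximations.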
3.5.1 - Reference prior when $\alpha$ is the parameter of interest
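From Corollary 3.5, the reference prior when $\alpha$ is the parameter of interest and $\beta$ is the nuisance parameter is given by

$$\pi(\alpha,\beta)\propto\frac{1}{\beta}\sqrt{\frac{\alpha\,\psi'(\alpha)-1}{\alpha}}. \tag{13}$$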
Therefore, the joint posterior distribution for $\alpha$ and $\beta$, produced by the reference prior (13), is given by
Theorem 3.6. The posterior density (14) is proper if and only if $n\ge 2$, in which case the posterior moments for $\alpha$ and $\beta$ are finite.
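Proof. We proved in Theorem 3.3 that $\sqrt{\alpha\psi'(\alpha)-1}\underset{\alpha\to 0^{+}}{\propto}\alpha^{-1/2}$ and $\sqrt{\alpha\psi'(\alpha)-1}\underset{\alpha\to\infty}{\propto}\alpha^{-1/2}$. It follows that

$$\pi(\alpha)=\sqrt{\frac{\alpha\,\psi'(\alpha)-1}{\alpha}}\underset{\alpha\to 0^{+}}{\propto}\alpha^{-1}\quad\text{and}\quad\pi(\alpha)\underset{\alpha\to\infty}{\propto}\alpha^{-1}.$$

Then $k=-1$, $r_0=-1$ and $r_\infty=-1$, and since $n>-r_0=1$ if and only if $n\ge 2$, the result follows directly from Theorem 2.6 and Theorem 2.7. ◻

The conditional posterior distribution for $\beta$ is (8). The marginal posterior distribution for $\alpha$ is given by

$$p(\alpha\mid\boldsymbol{x})\propto\sqrt{\frac{\alpha\,\psi'(\alpha)-1}{\alpha}}\,\frac{\Gamma(n\alpha)}{\Gamma(\alpha)^{n}}\,\frac{\prod_{i=1}^{n}x_i^{\alpha-1}}{\left(\sum_{i=1}^{n}x_i\right)^{n\alpha}}.$$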
3.5.2 - Reference prior when $\beta$ is the parameter of interest
The reference prior when $\beta$ is the parameter of interest and $\alpha$ is the nuisance parameter is given by

$$\pi(\alpha,\beta)\propto\frac{\sqrt{\psi'(\alpha)}}{\beta}. \tag{15}$$
The joint posterior distribution for $\alpha$ and $\beta$, produced by the reference prior (15), is given by
Theorem 3.7. The posterior density (16) is proper if and only if $n\ge 2$, in which case the posterior moments for $\alpha$ and $\beta$ are finite.
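Proof. Following Abramowitz & Stegun 1972, we have that $\psi'(\alpha)=\alpha^{-2}+\psi'(1)+O(\alpha)$ as $\alpha\to 0^{+}$ and $\psi'(\alpha)=\alpha^{-1}+\tfrac{1}{2}\alpha^{-2}+O(\alpha^{-3})$ as $\alpha\to\infty$. Thus, $\sqrt{\psi'(\alpha)}\underset{\alpha\to 0^{+}}{\propto}\alpha^{-1}$ and $\sqrt{\psi'(\alpha)}\underset{\alpha\to\infty}{\propto}\alpha^{-1/2}$. Therefore we conclude that $k=-1$, $r_0=-1$ and $r_\infty=-1/2$ are valid constants for application of Theorem 2.6. Thus, since $n>-r_0=1$ if and only if $n\ge 2$, the result follows from Theorem 2.6 and Theorem 2.7. ◻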
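The conditional posterior distribution for $\beta$ is (8). The marginal posterior distribution for $\alpha$ is given by

$$p(\alpha\mid\boldsymbol{x})\propto\sqrt{\psi'(\alpha)}\,\frac{\Gamma(n\alpha)}{\Gamma(\alpha)^{n}}\,\frac{\prod_{i=1}^{n}x_i^{\alpha-1}}{\left(\sum_{i=1}^{n}x_i\right)^{n\alpha}}.$$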
There are different ways to derive the same reference priors in the presence of nuisance parameters; see, e.g., Liseo 1993, Sun & Ye 1996 and Moala et al. 2013.
3.5.3 - Overall reference prior
The reference priors presented so far assume the presence of a nuisance parameter. However, in many situations we are simultaneously interested in all parameters of the model. Sun & Ye 1996 considered the two-parameter exponential family of Bar-Lev & Reiser 1982 and presented a straightforward procedure to derive overall reference priors. Since the gamma distribution belongs to Bar-Lev and Reiser’s two-parameter exponential family, the overall reference prior (Berger et al. 2015) is given by
which is the same as the reference prior when is the parameter of interest and is the nuisance parameter.

3.6 - Maximal Data Information prior
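Zellner 1977, 1984 introduced another objective prior, whose information is weak compared with the information provided by the data. This prior is known as the Maximal Data Information (MDI) prior and can be obtained as

$$\pi(\boldsymbol{\theta})\propto\exp\left\{\int_{0}^{\infty}f(x\mid\boldsymbol{\theta})\log f(x\mid\boldsymbol{\theta})\,dx\right\}. \tag{18}$$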
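Therefore, the MDI prior (18) for the gamma distribution (1) is given by

$$\pi(\alpha,\beta)\propto\frac{\beta}{\Gamma(\alpha)}\exp\{(\alpha-1)\psi(\alpha)-\alpha\}, \tag{19}$$

where $\psi(\alpha)=\frac{d}{d\alpha}\log\Gamma(\alpha)$ is the digamma function.

The joint posterior distribution for $\alpha$ and $\beta$, produced by the MDI prior, is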
Moala et al. 2013 argued that the posterior distribution (20) is improper. However, the authors did not present a proof of this result. The following theorem provides a rigorous proof that confirms this conjecture.
Theorem 3.8. The joint posterior density (20) is improper for any $n\ge 1$.
Proof. Following Abramowitz & Stegun 1972, and . Thus,
Since and , the result follows from the Theorem 2.6. ◻
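Theorem 3.8 hinges on the $\alpha$-part of the MDI prior exploding as $\alpha\to 0$. Assuming the shape–rate form of (1), criterion (18) gives $\pi(\alpha,\beta)\propto\beta\exp\{(\alpha-1)\psi(\alpha)-\alpha\}/\Gamma(\alpha)$, and since $\psi(\alpha)\approx-1/\alpha$ near zero the $\alpha$-part grows like $e^{1/\alpha}$. The sketch below evaluates $m\log\alpha+\log\pi(\alpha)$ for the $\alpha$-part with the arbitrary choice $m=10$ and illustrates that it keeps increasing as $\alpha$ decreases, so even multiplying by $\alpha^{10}$ does not make the prior bounded near zero. `digamma` is a hand-rolled helper (recurrence plus asymptotic series), not a library function.

```python
import math


def digamma(x: float) -> float:
    """psi(x) for x > 0: recurrence psi(x) = psi(x + 1) - 1/x, then asymptotic series."""
    acc = 0.0
    while x < 6.0:
        acc -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # log x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    return acc + math.log(x) - 0.5 / x - inv2 * (1/12 - inv2 * (1/120 - inv2 / 252))


def log_mdi_alpha(a: float) -> float:
    """log of the alpha-part exp{(a - 1) psi(a) - a} / Gamma(a) of the MDI prior (shape-rate form)."""
    return (a - 1.0) * digamma(a) - a - math.lgamma(a)


m = 10  # power of alpha used to try to damp the prior near zero
for a in (0.1, 0.05, 0.02, 0.01):
    print(a, m * math.log(a) + log_mdi_alpha(a))
```

A numerical check for one $m$ is only an illustration; the proof covers every power because $e^{1/\alpha}$ dominates any $\alpha^{-m}$ as $\alpha\to 0$.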
3.6.1 - Modified MDI prior
Moala et al. 2013 introduced a modified maximal data information (MMDI) prior given by
The joint posterior distribution for $\alpha$ and $\beta$, produced by the MMDI prior, is
Theorem 3.9. The posterior density (23) is proper for every $n\ge 1$, in which case the posterior moments for $\alpha$ and $\beta$ are finite.
Proof. Following Abramowitz & Stegun 1972, and . Thus and
On the other hand, and by the Stirling approximation (see Abramowitz & Stegun 1972) we have and . Then
Now, define
Then, from (24) and (25) we have and , which implies that from Proposition 2.4. However, and the prior leads to a proper posterior as well as posterior moments for every by Theorem 2.6 and Theorem 2.7. Therefore also leads to a proper posterior for every , and which proves the result. ◻
The marginal posterior distribution for $\alpha$ is given by
The conditional posterior distribution for $\beta$ is given by
3.7 - Tibshirani priors
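Tibshirani 1989 discussed an alternative method to derive a class of objective priors $\pi(\theta_1,\theta_2)$, where $\theta_1$ is the parameter of interest, such that the credible interval for $\theta_1$ has coverage error $O(n^{-1})$ in the frequentist sense, i.e.,

$$P\left[\theta_1\le\theta_1^{1-a}(\pi;\boldsymbol{X})\mid\boldsymbol{\theta}\right]=1-a+O(n^{-1}), \tag{27}$$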
where $\theta_1^{1-a}(\pi;\boldsymbol{X})$ denotes the $(1-a)$th quantile of the posterior distribution of $\theta_1$. The class of priors satisfying (27) is known as matching priors up to $O(n^{-1})$. Mukerjee & Dey 1993 discussed necessary and sufficient conditions for a class of Tibshirani priors to be matching priors up to .

Sun & Ye 1996 proved that the reference prior (13) is also a Tibshirani prior when $\alpha$ is the parameter of interest and $\beta$ is the nuisance parameter, and the Tibshirani prior when is the parameter of interest and is the nuisance parameter with order . They also proved that, when is the parameter of interest, there is no matching prior up to order . Finally, they presented a Tibshirani prior, when is the parameter of interest, that is a matching prior up to order ; this prior is given as follows
The joint posterior distribution for $\alpha$ and $\beta$, produced by the Tibshirani prior (28), is given by
Theorem 3.10. The posterior density (29) is proper if and only if , in which case the posterior moments for $\alpha$ and $\beta$ are finite.
Proof. We proved in Theorem 3.3 that and that . From that, it follows that
Thus and , therefore the result follows directly from Theorem 2.6 and Theorem 2.7. ◻

The conditional posterior distribution for $\beta$ is (8). The marginal posterior distribution for $\alpha$ is given by
3.8 - Consensus prior
A rather natural approach to finding an objective prior is to start with a collection of objective priors and take their average. Berger et al. 2015 discussed this prior-averaging approach under the two most natural averages, the geometric mean and the arithmetic mean.
3.8.1 - Geometric mean
Let $\pi_1(\alpha,\beta),\ldots,\pi_m(\alpha,\beta)$ be a collection of objective priors. These priors were chosen because of their invariance under one-to-one transformations. Then, our geometric mean (GM) prior is given by
Since this prior is constructed as a geometric mean of priors that are invariant under one-to-one transformations, it is also invariant under one-to-one transformations.
The joint posterior distribution for $\alpha$ and $\beta$, produced by the GM prior, is
Theorem 3.11. The posterior density (31) is proper if and only if , in which case the posterior moments for $\alpha$ and $\beta$ are finite.
Proof. The result follows directly from Proposition 2.8 and Theorem 2.7. ◻
The conditional posterior distribution for $\beta$ is (8). The marginal posterior distribution for $\alpha$ is given by
3.8.2 - Arithmetic mean
Let $\pi_1(\alpha,\beta),\ldots,\pi_m(\alpha,\beta)$ be a collection of objective priors. Then, our arithmetic mean (AM) prior is given by
where
The joint posterior distribution for $\alpha$ and $\beta$, produced by the AM prior, is
Theorem 3.12. The posterior density (32) is proper if and only if , in which case the posterior moments for $\alpha$ and $\beta$ are finite.
Proof. The result follows directly from Proposition 2.8 and Theorem 2.7. ◻
The conditional posterior distribution for $\beta$ is (8). The marginal posterior distribution for $\alpha$ is given by
4 - NUMERICAL EVALUATION
A simulation study is presented to compare the influence of different objective priors on the posterior distributions and to select an objective prior that returns good results in terms of the mean relative error (MRE) and the mean squared error (MSE), given by

$$\text{MRE}(\hat{\theta}_i)=\frac{1}{N}\sum_{j=1}^{N}\frac{\hat{\theta}_{i,j}}{\theta_i}\qquad\text{and}\qquad\text{MSE}(\hat{\theta}_i)=\frac{1}{N}\sum_{j=1}^{N}\left(\hat{\theta}_{i,j}-\theta_i\right)^{2},$$

where $\boldsymbol{\theta}=(\theta_1,\theta_2)=(\alpha,\beta)$ and $N$ is the number of estimates obtained through the posterior means of $\alpha$ and $\beta$. The coverage probability (CP) of the credibility intervals for $\alpha$ and $\beta$ is also evaluated. Under this approach, the best estimators show MRE closer to one and MSE closer to zero. In addition, for a large number of experiments at a given nominal level, the proportion of intervals that cover the true values of $\boldsymbol{\theta}$ should be close to that nominal level.

The results were computed using the software R. For reasons of space, results are presented for only one choice of the parameter values. Using MCMC methods, we computed the posterior means and the credibility intervals for both parameters. From a decision-theoretic standpoint, we adopted the squared error loss function (SELF), under which the Bayes estimator is the posterior mean. Moreover, the posterior mean is finite whenever the corresponding posterior is proper (see Section 3) and has an optimality property under the Kullback–Leibler divergence. Tables I and II in Appendix B present the MREs, MSEs and CPs of the different estimators of $\alpha$ and $\beta$.
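A minimal sketch of the three evaluation criteria for one parameter, assuming the MRE is the average of the ratios $\hat{\theta}/\theta$ (consistent with the statement that the best estimators have MRE close to one) and that an interval has already been computed for each replication. The function name and the three example replications are hypothetical, not results from the study.

```python
def mre_mse_cp(estimates, intervals, true_value):
    """MRE, MSE and coverage probability over N simulated data sets (sketch).

    estimates: posterior means, one per simulated sample
    intervals: (lower, upper) credible limits for the same samples
    """
    n_rep = len(estimates)
    mre = sum(e / true_value for e in estimates) / n_rep
    mse = sum((e - true_value) ** 2 for e in estimates) / n_rep
    cp = sum(lo <= true_value <= hi for lo, hi in intervals) / len(intervals)
    return mre, mse, cp


# Three hypothetical replications for a parameter whose true value is 2.0.
print(mre_mse_cp([1.8, 2.2, 2.0], [(1.5, 2.5), (2.1, 2.9), (1.0, 3.0)], 2.0))
```

In the study, the same function would be applied separately to $\alpha$ and $\beta$ for each prior, with the estimates and intervals coming from the posterior samples.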
These results show that, for both parameters, the posterior mean under the Tibshirani prior performs better than the posterior means obtained with the other priors in terms of MRE and MSE. The better performance of this approach is also confirmed by the coverage probabilities of the credibility intervals. Note that the near-nominal frequentist coverage of the Tibshirani prior is a consequence of its construction. Although we have presented only one scenario for the parameters, the results were similar for other choices of $\alpha$ and $\beta$. Overall, we conclude that the posterior distribution obtained with the Tibshirani prior should be used to make inference on the parameters of the gamma distribution.
5 - DISCUSSION
In this study, we presented a theorem that provides simple conditions under which an improper prior yields a proper posterior for the gamma distribution. Further, we provided sufficient conditions to verify whether the posterior moments of the parameters are finite. An interesting aspect of our findings is that one can check whether the posterior is proper, and whether its posterior moments are finite, by looking directly at the behavior of the improper prior.
The proposed methodology was applied to several objective priors. The MDI prior was the only one that yields an improper posterior for every sample size. An extensive simulation study showed that the posterior distribution obtained under the Tibshirani prior provided more accurate results in terms of MRE, MSE and coverage probabilities. Therefore, this posterior distribution should be used to make inference on the unknown parameters of the gamma distribution. This study can be extended to other models. For instance, in a homogeneous Poisson process, the inter-arrival times can be modeled by an exponential distribution Exp($\lambda$) with the following hierarchical structure
In this case, the posterior distribution depends on three parameters (see Papadopoulos 1989). Although the results presented here cannot be used to select the best prior in this setting because of the additional parameter, a similar approach will be considered in future research.

ACKNOWLEDGMENTS
The authors are thankful to the Editorial Board and two reviewers for their valuable comments and suggestions, which led to this improved version. Pedro L. Ramos is grateful to the São Paulo State Research Foundation (FAPESP Proc. 2017/25971-0). Eduardo Ramos acknowledges financial support from the São Paulo State Research Foundation (FAPESP Proc. 2019/27636-9). Francisco Louzada is supported by the Brazilian agencies CNPq (grant number 301976/2017-1) and FAPESP (grant number 2013/07375-0).
REFERENCES
- ABRAMOWITZ M & STEGUN IA. 1972. Handbook of Mathematical Functions. 10th ed. Washington, D.C.: NBS, p. 1046.
- BAR-LEV SK & REISER B. 1982. An exponential subfamily which admits UMPU tests based on a single test statistic. Ann Stat 979-989.
- BERGER JO, BERNARDO JM & SUN D. 2015. Overall objective priors. Bayesian Anal 10(1): 189-221.
- BERNARDO JM. 1979. Reference posterior distributions for Bayesian inference. J Roy Stat Soc B p. 113-147.
- BERNARDO JM. 2005. Reference analysis. Handb Stat 25: 17-90.
- DEY S & MOALA FA. 2018. Objective and subjective prior distributions for the Gompertz distribution. An Acad Bras Cienc 90: 2643-2661.
- FOLLAND GB. 1999. Real analysis: modern techniques and their applications. 2nd ed. New York: Wiley, 408 p.
- GELFAND AE & SMITH AF. 1990. Sampling-based approaches to calculating marginal densities. J Am Stat Assoc 85(410): 398-409.
- JEFFREYS H. 1946. An invariant form for the prior probability in estimation problems. P Roy Soc A-Math Phy 186(1007): 453-461.
- KASS RE & WASSERMAN L. 1996. The selection of prior distributions by formal rules. J Am Stat Assoc 91(435): 1343-1370.
- LISEO B. 1993. Elimination of nuisance parameters with reference priors. Biometrika 80(2): 295-304.
- LOUZADA F & RAMOS PL. 2018. Efficient closed-form maximum a posteriori estimators for the gamma distribution. J Stat Comput Sim 88(6): 1134-1146.
- MILLER RB. 1980. Bayesian analysis of the two-parameter gamma distribution. Technometrics 22(1): 65-69.
- MOALA FA, RAMOS PL & ACHCAR JA. 2013. Bayesian Inference for Two-Parameter Gamma Distribution Assuming Different Noninformative Priors. Rev Colomb Estad 36(2): 321-338.
- MUKERJEE R & DEY DK. 1993. Frequentist validity of posterior quantiles in the presence of a nuisance parameter: higher order asymptotics. Biometrika 80(3): 499-505.
- NORTHROP P & ATTALIDES N. 2016. Posterior propriety in Bayesian extreme value analyses using reference priors. Stat Sinica 26(2).
- PAPADOPOULOS AG. 1989. A hierarchical approach to the study of the exponential failure model. Commun Stat-Theor M 18(12): 4375-4392.
- RAMOS PL, ALMEIDA MP, TOMAZELLA VL & LOUZADA F. 2019. Improved Bayes estimators and prediction for the Wilson-Hilferty distribution. An Acad Bras Cienc 91: e20190002.
- SUN D & YE K. 1996. Frequentist validity of posterior quantiles for a two-parameter exponential family. Biometrika 83(1): 55-65.
- TIBSHIRANI R. 1989. Noninformative priors for one parameter of many. Biometrika 76(3): 604-608.
- ZELLNER A. 1977. Maximal Data Information Prior Distributions. New Meth Appli Bay Meth 211-232.
- ZELLNER A. 1984. Maximal Data Information Prior Distributions. Bas Iss Econ, 334 p.
APPENDIX A
PROOF OF THEOREM 2.6
Proof. Let
Since the integrand of $d(\boldsymbol{x})$ is non-negative, by the Fubini–Tonelli theorem (see Folland 1999) we have
The rest of the proof is divided into three cases, given below:
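Case i): Suppose $k<-1$. Notice that $\int_{0}^{\infty}\beta^{c}e^{-\beta s}\,d\beta=\infty$ for any $c\le -1$ and $s>0$. Then, for $0<\alpha\le-(k+1)/n$ we have $n\alpha+k\le -1$, and it follows that
and case i) is proved.

Now suppose $k\ge -1$. Denoting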
we have that by the inequality of the arithmetic and geometric means, and where and .Then if and only if and . These results lead us to the two remaining cases.
Case ii): Suppose and . From Abramowitz & Stegun (1972), we have . Then, if
where the last equality comes from the fact that . Therefore, if .

On the other hand, if then for , which implies and
Therefore, if and case ii) is proved.

Case iii): Suppose that $k\ge -1$ and the behavior of $\pi(\alpha)$ is given by
where $r_0\in\mathbb{R}$ and $r_\infty\in\mathbb{R}$. Following Abramowitz & Stegun 1972, p. 260, we obtain that and for . Then and
Therefore
i.e., for all $r_\infty\in\mathbb{R}$. Therefore .

Now, following the same steps as in case ii), if $k=-1$ we have
i.e., $d(\boldsymbol{x})<\infty$ if and only if $n>-r_0$ when $k=-1$. On the other hand, if $k>-1$, then $d(\boldsymbol{x})<\infty$ if and only if $n>-r_0-1$, and the proof is complete. ◻

APPENDIX B
Publication Dates
Publication in this collection: 03 Dec 2021
Date of issue: 2021
History
Received: 22 Nov 2019
Accepted: 8 Feb 2020