
The Zero-Adjusted Log-Symmetric Distributions: Point and Intervalar Estimation

Abstract

In this paper, a new class of semi-continuous distributions, called zero-adjusted log-symmetric, is introduced and studied. Some properties are derived, the parameters are estimated by the maximum likelihood method, and confidence intervals (CIs) are developed. A simulation study is conducted to evaluate the properties of the maximum likelihood estimators under lighter- and heavier-tailed distributions. Finally, an application to a real data set is presented to illustrate the flexibility of the proposed class of distributions.

Key words
lighter/heavier-tailed distributions; maximum likelihood method; semi-continuous distributions; zero true value

Introduction

In practice, semi-continuous variables commonly occur in biometrics, ecology and insurance expenditure data sets, and they are characterized by the presence of true zeros together with positive continuous behavior. For example, in ecology it is common to find a large proportion of zero values caused by real ecological effects in counts of abundance, proportional occupancy rates or continuous population densities, so that the data do not readily fit standard distributions such as the normal, Poisson or beta distributions (Martin et al. 2005). Another example is the Medical Expenditure Panel Survey (MEPS), which contains health care expenditures of adults in the USA and in which many individuals record no medical expenditures (zero response) over the years. The usual approach of log-transforming the response variable cannot be used in the presence of zeros, and ignoring the zeros is a poor strategy: it makes it impossible to predict the probability of a zero and leads to poor inference for the other parameters.

A strategy for modeling this kind of data is to use a mixture distribution with two components: a continuous distribution whose support is the interval $(0,\infty)$ and a degenerate (discrete) distribution at zero. For example, Aitchison & Brown (1957) introduced a mixture of a degenerate distribution at zero and the log-normal distribution, named the delta distribution. Duan et al. (1983) proposed the two-part model, in which the first stage is a probit model for the binary event of having zero or positive expenses and the second stage is a log-linear model for the positive expenses. Heller et al. (2006) presented the zero-adjusted inverse Gaussian distribution for modelling insurance claim sizes. Rodrigues-Motta et al. (2015) proposed a framework for zero-augmented positive distributions considering the two-parameter exponential family as the continuous component, including the log-normal, Weibull, gamma and inverse Gaussian distributions. Leiva et al. (2016) proposed the zero-adjusted reparameterized Birnbaum–Saunders distribution for fatigue data with true zeros. Recently, Hashimoto et al. (2019) proposed the zero-spiked gamma-Weibull regression model.

The continuous components of the distributions mentioned above are among the most flexible in the literature. However, according to Lawless (2003), the log-normal distribution has been used as a model in diverse applications in engineering, medicine and other areas, which makes it one of the most widely used distributions for modeling variables of interest that may include outliers. Jones (2008) investigated the log-symmetric class, which is a generalization of the log-normal distribution. This class includes bimodal distributions and distributions with lighter or heavier tails than the log-normal one, for example the log-Student-t, type-I-log-Logistic, type-II-log-Logistic, log-contaminated-normal and log-power-exponential distributions. Furthermore, according to Puig (2008), the log-symmetric distributions have two very desirable properties, closure under change of scale and closure under reciprocals, which are useful for describing data that are ratios of positive magnitudes. The class has the median and the skewness as parameters, which makes it very flexible. Also, according to Vanegas & Paula (2016), the hazard function of the log-symmetric distributions is considerably more flexible than that of other distributions such as the gamma or the inverse Gaussian, and it can take various shapes, for instance increasing, decreasing, upside-down bathtub or bathtub. There are several studies on the log-symmetric class. For example, Vanegas & Paula (2016) studied and discussed some properties of the class, Medeiros & Ferrari (2017) derived hypothesis tests in symmetric and log-symmetric linear regression models, and Ventura et al. (2019) conducted a Monte Carlo simulation study to investigate the accuracy of popular information criteria in log-symmetric regression models.

Motivated by the work on semi-continuous distributions and by the advantage of the log-symmetric class in accommodating outliers, we propose a semi-continuous distribution based on the log-symmetric class, whose support is the interval $[0,\infty)$. In the literature there are different names for this type of distribution: “zero-inflated” as in Lambert (1992), “zero-adjusted” as in Heller et al. (2006) and “zero-augmented” as in Rodrigues-Motta et al. (2015). According to Lambert (1992), zero-inflated models add probability mass to the outcome zero, so this name is more common when discrete distributions are involved. In the present work the term zero-adjusted is used, and the new class is therefore called zero-adjusted log-symmetric. The zero-adjusted log-symmetric distributions have several desirable statistical properties. For instance, the quantile function is easy to calculate, the parameters are orthogonal, and in some cases the class may involve an extra parameter or a vector of extra parameters.

This paper is organized as follows. First, the zero-adjusted log-symmetric (ZALS) class and some of its properties are presented. Next, the maximum likelihood estimators of the parameters and asymptotic confidence intervals are developed. A Monte Carlo (MC) simulation study is then conducted to evaluate the performance of the maximum likelihood estimators under distributions with lighter and heavier tails than the zero-adjusted log-normal distribution, as well as the coverage of the confidence interval for each parameter. An application to a real data set is presented to illustrate the proposed methodology. Finally, conclusions are presented.

The zero-adjusted log-symmetric distributions

Let $Y$ be a random variable following a distribution of the log-symmetric class, whose support is the interval $(0,\infty)$. The log-symmetric class with parameters $\eta>0$, $\phi>0$ and density generator $g(\cdot)$, satisfying $g(u)>0$ for $u>0$ and $\int_0^\infty u^{-1/2}g(u)\,du=1$, has probability density function (PDF) given by

$$f_Y(y)=\frac{g(\tilde{y}^2)}{y\sqrt{\phi}},\quad y>0, \tag{1}$$
where $\tilde{y}=\log[(y/\eta)^{1/\sqrt{\phi}}]$. The variable $Y$ is obtained from a random variable $T$ by setting $Y=\exp(T)$, where the distribution of $T$ belongs to the symmetric class (Fang et al. 1990), denoted $T\sim S(\mu,\phi,g(\cdot))$, and $-\infty<\mu<\infty$ is the location parameter. If a random variable follows the log-symmetric class, this is denoted by $Y\sim LS(\eta,\phi,g(\cdot))$, where $\eta=\exp(\mu)$ and $\phi$ are the scale and power parameters, respectively. In some cases the density generator $g(\cdot)$ is indexed by an extra parameter or parameter vector, denoted by $\nu$, which is considered fixed in this work. Distributions such as the Birnbaum-Saunders (Birnbaum & Saunders 1969), log-normal, log-slash, log-Student-t, log-power-exponential, type-I-log-Logistic, type-II-log-Logistic, log-contaminated-normal and generalized Birnbaum-Saunders (Díaz-Gárcia & Leiva 2005, Leiva et al. 2008) are included in this class.

When zeros occur in the data, the log-symmetric distributions are not appropriate. An alternative is to use a mixed discrete-continuous distribution. Let $W$ be a random variable whose distribution is a mixture of two components: a discrete component, a point mass at zero selected by a Bernoulli variable, and a continuous component. In this work we take the continuous component to belong to the log-symmetric class, and write $W\sim \mathrm{ZALS}(\eta,\phi,\pi,g(\cdot))$. The cumulative distribution function (CDF) of $W$ is given by

$$F_W(w)=\begin{cases}\pi, & \text{if } w=0,\\ \pi+(1-\pi)F_Y(w), & \text{if } w>0,\end{cases} \tag{2}$$
where $F_Y(w)$ is the CDF of the log-symmetric class and $0<\pi<1$ is the mixture parameter. Because $Y=\exp(T)$, the CDF of $Y$ can be computed as $F_Y(y)=F_Z(\tilde{y})$, where $Z=(T-\mu)/\sqrt{\phi}\sim S(0,1,g(\cdot))$. The corresponding probability density function (PDF) is given by
$$f_W(w)=\begin{cases}\pi, & \text{if } w=0,\\ (1-\pi)f_Y(w), & \text{if } w>0,\end{cases} \tag{3}$$
where $f_Y(w)$ is the PDF given in Equation (1).
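
As a minimal sketch of Equations (2) and (3), the Python functions below evaluate the CDF and the mixed density for one member of the class only, the zero-adjusted log-normal, for which $F_Z$ is the standard normal CDF. The function names `zaln_cdf` and `zaln_pdf` are illustrative and are not part of the original work.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # Z ~ S(0, 1, g) for the normal generator


def zaln_cdf(w, eta, phi, pi):
    """Equation (2) with a log-normal continuous component."""
    if w < 0:
        return 0.0
    if w == 0:
        return pi  # point mass at zero
    z = math.log(w / eta) / math.sqrt(phi)  # standardised log-value
    return pi + (1.0 - pi) * STD_NORMAL.cdf(z)


def zaln_pdf(w, eta, phi, pi):
    """Equation (3): mass pi at zero, (1 - pi) * f_Y(w) for w > 0."""
    if w < 0:
        return 0.0
    if w == 0:
        return pi
    z = math.log(w / eta) / math.sqrt(phi)
    g = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # g(z^2), normal generator
    return (1.0 - pi) * g / (w * math.sqrt(phi))  # Equation (1) scaled by (1 - pi)
```

At $w=\eta$ the continuous component sits at its median, so `zaln_cdf(eta, eta, phi, pi)` equals $\pi+(1-\pi)/2$ for any $\phi$.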

Members of this class are characterized by $g(\cdot)$, and for each member a weight function is defined by $v(w)=-2g'(w^2)/g(w^2)$. Thus, the choice of $g(\cdot)$ also determines the weight function $v(\cdot)$, which plays an important role in estimating the parameters by the maximum likelihood (ML) method, as will be seen later. Some properties of the zero-adjusted log-symmetric class are presented below.

  • (P1) If $W\sim \mathrm{ZALS}(\eta,\phi,\pi,g(\cdot))$, then $cW\sim \mathrm{ZALS}(c\eta,\phi,\pi,g(\cdot))$ for $c>0$.

  • (P2) The quantile function of $W\sim \mathrm{ZALS}(\eta,\phi,\pi,g(\cdot))$ is obtained from Castellacci (2012):

    $$F_W^{-1}(q;\eta,\phi,\pi,g(\cdot))=\begin{cases}0, & \text{if } \pi\ge q,\\ \eta\exp\left(\sqrt{\phi}\,Z_\xi\left(\dfrac{q-\pi}{1-\pi}\right)\right), & \text{if } \pi<q,\end{cases} \tag{4}$$

    where $0<q<1$ and $Z_\xi(a)$ is the $100a\%$ quantile of $Z\sim S(0,1,g(\cdot))$.

  • (P3) The $r$-th moment of $W$ exists if the moment generating function of $T$, $M_T(r)$, exists. Then $E(W^r)=(1-\pi)M_T(r)$.

  • (P4) If $M_T(k)$ exists for every integer $k\ge 1$, then, since $E(W^k)=(1-\pi)M_T(k)$ by (P3), the characteristic function of $W$ can be expanded as the Taylor series

    $$\varphi(r)=1+(1-\pi)\sum_{k=1}^{\infty}M_T(k)\frac{(ir)^k}{k!}. \tag{5}$$

  • (P5) The hazard function of $W$ is $r_W(w)=1$ for $w=0$ and $r_W(w)=r_Y(w)=f_Y(w)/[1-F_Y(w)]$ for $w>0$. As in Vanegas & Paula (2016), $r_W(w)$ for the zero-adjusted log-symmetric class is quite flexible: it can assume increasing, decreasing, upside-down bathtub and bathtub shapes. By contrast, the gamma hazard function is increasing, constant or decreasing, and the inverse Gaussian hazard function has a ∩-shape (upside-down bathtub).

  • (P6) The Shannon entropy of $W$, denoted $ET(W)$, is defined by $E[-\log f_W(W)]$. It can be obtained from $E[-\log f_W(W)]=E\{E[-\log f_W(W)\mid I_{\{0\}}(W)]\}$. Conditioning on the two cases $W=0$ and $W>0$ and using Equations (1) and (3), $ET(W)$ can be expressed as $ET(W)=-\pi\log\pi-(1-\pi)\log(1-\pi)+(1-\pi)[\log(\eta\sqrt{\phi})+ET(Z)]$, where $ET(Z)=E[-\log g(Z^2)]$; it exists only if $ET(Z)$ exists.
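
Properties (P2) and (P3) can be illustrated with a short Python sketch. It is restricted to the zero-adjusted log-normal member, for which $Z_\xi$ is the standard normal quantile and $M_T(r)=\exp(r\log\eta+r^2\phi/2)$. The names `zaln_quantile`, `zaln_sample` and `zaln_moment` are hypothetical helpers introduced here for illustration.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # Z ~ S(0, 1, g) for the normal generator


def zaln_quantile(q, eta, phi, pi):
    """Quantile function (4) for the zero-adjusted log-normal member."""
    if not 0.0 <= q < 1.0:
        raise ValueError("q must lie in [0, 1)")
    if q <= pi:
        return 0.0  # the mass at zero covers every level up to pi
    a = (q - pi) / (1.0 - pi)  # level rescaled to the continuous component
    return eta * math.exp(math.sqrt(phi) * STD_NORMAL.inv_cdf(a))


def zaln_sample(n, eta, phi, pi, rng):
    """Inverse-CDF sampling: plug uniform draws into the quantile function."""
    return [zaln_quantile(rng.random(), eta, phi, pi) for _ in range(n)]


def zaln_moment(r, eta, phi, pi):
    """(P3): E(W^r) = (1 - pi) * M_T(r), with T ~ N(log eta, phi)."""
    return (1.0 - pi) * math.exp(r * math.log(eta) + 0.5 * r * r * phi)


if __name__ == "__main__":
    rng = random.Random(2023)
    draws = zaln_sample(50_000, eta=2.0, phi=0.25, pi=0.4, rng=rng)
    print("empirical mean:", sum(draws) / len(draws))
    print("(P3) mean     :", zaln_moment(1, 2.0, 0.25, 0.4))
```

With a large number of draws, the empirical mean and the proportion of exact zeros should be close to $(1-\pi)M_T(1)$ and $\pi$, respectively.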

The functions $g(u)$ and $v(w)$ for some members of the zero-adjusted log-symmetric class are presented below.

Zero-adjusted log-normal

$$g(u)=\frac{1}{\sqrt{2\pi}}\exp[-u/2],\qquad v(w)=1$$

Zero-adjusted log-Student-t

$$g(u)=\frac{\nu^{\nu/2}}{B(1/2,\nu/2)}(\nu+u)^{-\frac{\nu+1}{2}},\quad \nu>0,\qquad v(w)=\frac{\nu+1}{\nu+w^2}$$

Zero-adjusted log-power-exponential

$$g(u)=C(\nu)\exp\left[-\tfrac{1}{2}u^{1/(1+\nu)}\right],\quad -1<\nu\le 1,\qquad v(w)=\frac{|w|^{-(2\nu)/(\nu+1)}}{1+\nu}$$
where $C(\nu)^{-1}=\Gamma\left(1+\frac{1+\nu}{2}\right)2^{1+(1+\nu)/2}$. Special cases are the zero-adjusted log-normal ($\nu=0$) and the zero-adjusted log-Laplace ($\nu=1$).

Zero-adjusted type-I-log-Logistic

$$g(u)=c\,\frac{e^{-u}}{(1+e^{-u})^2},\qquad v(w)=2\tanh(w^2/2)$$
where $c\approx 1.484300029$ is a normalizing constant that satisfies $\int_0^\infty u^{-1/2}g(u)\,du=1$.

Zero-adjusted type-II-log-Logistic

$$g(u)=\frac{e^{-u^{1/2}}}{(1+e^{-u^{1/2}})^2},\qquad v(w)=\frac{e^{|w|}-1}{|w|(1+e^{|w|})}$$

Zero-adjusted log-contaminated-normal

$$g(u)\propto\sqrt{\nu_2}\exp\left[-\tfrac{1}{2}\nu_2 u\right]+\frac{(1-\nu_1)}{\nu_1}\exp\left[-\tfrac{1}{2}u\right],\qquad v(w)=\frac{\nu_2^{3/2}\nu_1\exp[(1-\nu_2)(w^2/2)]+(1-\nu_1)}{\nu_2^{1/2}\nu_1\exp[(1-\nu_2)(w^2/2)]+(1-\nu_1)}$$
where $0<\nu_1<1$ and $0<\nu_2<1$. The zero-adjusted log-contaminated-normal distribution has PDF $f_W(w)=\pi^{I_{\{0\}}(w)}\{(1-\pi)[\nu_1 f_{Y_1}(w)+(1-\nu_1)f_{Y_2}(w)]\}^{1-I_{\{0\}}(w)}$, where $Y_1$ follows a log-normal distribution with parameters $(\eta,\phi/\nu_2)^t$ and $Y_2$ follows a log-normal distribution with parameters $(\eta,\phi)^t$.
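
The weight functions above follow from $v(w)=-2g'(w^2)/g(w^2)$, that is, $-2$ times the derivative of $\log g(u)$ at $u=w^2$, so normalizing constants cancel. The Python sketch below codes four of the closed forms and checks them against a numerical derivative of $\log g$. The function names and the finite-difference helper are illustrative assumptions, not taken from the original work.

```python
import math


def v_student_t(w, nu):
    return (nu + 1.0) / (nu + w * w)


def v_power_exponential(w, nu):
    # Undefined at w = 0 when nu > 0 (the weight diverges there).
    return abs(w) ** (-2.0 * nu / (nu + 1.0)) / (1.0 + nu)


def v_logistic_type1(w):
    return 2.0 * math.tanh(w * w / 2.0)


def v_logistic_type2(w):
    a = abs(w)
    if a == 0.0:
        return 0.5  # limit of the expression as w -> 0
    return (math.exp(a) - 1.0) / (a * (1.0 + math.exp(a)))


def weight_from_log_generator(log_g, w, h=1e-6):
    """Numerical v(w) = -2 * d/du log g(u) at u = w^2 (central difference).

    log_g may omit additive constants; requires w^2 > h.
    """
    u = w * w
    return -2.0 * (log_g(u + h) - log_g(u - h)) / (2.0 * h)


# log g(u), up to additive constants, for the four generators above
def log_g_student_t(u, nu):
    return -0.5 * (nu + 1.0) * math.log(nu + u)


def log_g_power_exponential(u, nu):
    return -0.5 * u ** (1.0 / (1.0 + nu))


def log_g_logistic_type1(u):
    return -u - 2.0 * math.log1p(math.exp(-u))


def log_g_logistic_type2(u):
    s = math.sqrt(u)
    return -s - 2.0 * math.log1p(math.exp(-s))
```

For the Student-t generator the weight decreases in $|w|$, so observations far from the median on the log scale are downweighted, whereas the log-normal weight is constant at 1.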

Figure 1
Densities for some distributions of the zero-adjusted log-symmetric class: (a) zero-adjusted log-normal ($\eta=2$, $\pi=0.4$), (b) zero-adjusted log-Student-t ($\eta=2$, $\pi=0.4$, $\nu=4$) and (c) zero-adjusted log-power-exponential ($\eta=2$, $\pi=0.4$, $\nu=0.5$).

Parameter estimation

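Let $W_1,\ldots,W_n$ be $n$ independent random variables with $W_i\sim \mathrm{ZALS}(\eta,\phi,\pi,g(\cdot))$. The corresponding likelihood function for $\theta=(\eta,\phi,\pi)^t$ can be written as

$$L(\theta)=\prod_{i=1}^{n}f_W(w_i)=\prod_{i=1}^{n}\pi^{I_{\{0\}}(w_i)}(1-\pi)^{1-I_{\{0\}}(w_i)}f_Y(w_i)^{1-I_{\{0\}}(w_i)}.$$

The log-likelihood function can be expressed as $\ell(\theta)=\ell_1(\pi)+\ell_2(\eta,\phi)$, where

$$\ell_1(\pi)=W^*\log(\pi)+(n-W^*)\log(1-\pi),$$

$$\ell_2(\eta,\phi)=(n-W^*)\left(-\tfrac{1}{2}\log(\phi)\right)+\sum_{i:w_i>0}\log[g(\tilde{w}_i^2)]-\sum_{i:w_i>0}\log(w_i),$$

with $\tilde{w}_i=\log[(w_i/\eta)^{1/\sqrt{\phi}}]$, and $W^*=\sum_{i=1}^{n}I_{\{0\}}(w_i)$ is the number of zeros in a sample of size $n$. The last term of $\ell_2$ follows from Equation (1) and does not depend on the parameters. The log-likelihood function thus separates into two terms, one that depends only on $\pi$ and another that depends only on the parameters of $Y\sim LS(\eta,\phi,g(\cdot))$. Therefore, maximum likelihood estimation can be performed separately for $(\eta,\phi)^t$ and $\pi$ (Pace & Salvan 1997). The score function is $U(\theta)=(U_\eta(\theta),U_\phi(\theta),U_\pi(\theta))^t$ with

$$U_\eta(\theta)=\frac{1}{\eta\phi}\sum_{i:w_i>0}\log(w_i/\eta)\,v(\tilde{w}_i),\qquad U_\phi(\theta)=\frac{-(n-W^*)+\sum_{i:w_i>0}v(\tilde{w}_i)\tilde{w}_i^2}{2\phi},\qquad U_\pi(\theta)=\frac{W^*-n\pi}{\pi(1-\pi)}.$$

The Fisher information matrix is given by $K(\theta)=\mathrm{diag}(k_{\eta\eta},k_{\phi\phi},k_{\pi\pi})$, where $k_{\eta\eta}=n(1-\pi)d_g/(\phi\eta^2)$, $k_{\phi\phi}=n(1-\pi)(f_g-1)/(4\phi^2)$ and $k_{\pi\pi}=n/[\pi(1-\pi)]$, with $d_g=E[v^2(Z)Z^2]$ and $f_g=E[v^2(Z)Z^4]$ for $Z\sim S(0,1,g(\cdot))$. Therefore, the maximum likelihood estimators (MLEs) of the components of $\theta$ are asymptotically independent.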
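
The score expressions can be checked numerically. The following Python sketch, restricted to the Student-t generator with fixed $\nu$, codes the log-likelihood (including the parameter-free term $-\sum\log w_i$ implied by Equation (1)) and the analytic score, so that the two can be compared by finite differences. The function names are hypothetical.

```python
import math


def zalst_loglik(data, eta, phi, pi, nu):
    """Log-likelihood l1(pi) + l2(eta, phi) for the zero-adjusted log-Student-t."""
    n = len(data)
    positives = [w for w in data if w > 0]
    n_zero = n - len(positives)  # W*, the number of zeros
    ell1 = n_zero * math.log(pi) + (n - n_zero) * math.log(1.0 - pi)
    # log of nu^(nu/2) / B(1/2, nu/2), the Student-t generator constant
    log_beta = math.lgamma(0.5) + math.lgamma(nu / 2.0) - math.lgamma((nu + 1.0) / 2.0)
    log_const = 0.5 * nu * math.log(nu) - log_beta
    ell2 = 0.0
    for w in positives:
        z = math.log(w / eta) / math.sqrt(phi)
        log_g = log_const - 0.5 * (nu + 1.0) * math.log(nu + z * z)
        ell2 += log_g - 0.5 * math.log(phi) - math.log(w)
    return ell1 + ell2


def zalst_score(data, eta, phi, pi, nu):
    """Analytic score (U_eta, U_phi, U_pi) with weights v(z) = (nu + 1) / (nu + z^2)."""
    n = len(data)
    positives = [w for w in data if w > 0]
    n_zero = n - len(positives)
    u_eta = 0.0
    weighted_z2 = 0.0
    for w in positives:
        z = math.log(w / eta) / math.sqrt(phi)
        v = (nu + 1.0) / (nu + z * z)
        u_eta += v * math.log(w / eta)
        weighted_z2 += v * z * z
    u_eta /= eta * phi
    u_phi = (weighted_z2 - (n - n_zero)) / (2.0 * phi)
    u_pi = (n_zero - n * pi) / (pi * (1.0 - pi))
    return u_eta, u_phi, u_pi
```

Perturbing one parameter at a time by a small step and differencing `zalst_loglik` should reproduce the corresponding component of `zalst_score` up to numerical error.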

Table I
Values of $d_g$ and $f_g$ for some distributions $Z$ of the symmetric class.

The MLE of $\pi$ is $\hat{\pi}=W^*/n$, the proportion of zeros in the sample. The MLEs of $\eta$ and $\phi$ do not have closed-form expressions in general, so a nonlinear optimization algorithm is used, for instance the Rigby and Stasinopoulos (RS) or the Cole and Green (CG) algorithms (Stasinopoulos & Rigby 2007, Stasinopoulos et al. 2019) available in the statistical computing environment R.
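
For illustration only, the score equations can also be solved by a simple fixed-point (iteratively reweighted) scheme: setting $U_\eta=0$ gives $\log\eta=\sum v_i\log w_i/\sum v_i$, and setting $U_\phi=0$ gives $\phi=\sum v_i(\log w_i-\log\eta)^2/(n-W^*)$, with sums over the positive observations. The Python sketch below applies this to the Student-t generator, where it coincides with the familiar EM-type reweighting for the Student-t location-scale model. It is a stand-in for, not a reimplementation of, the RS and CG algorithms, and `fit_zalst` is a hypothetical name.

```python
import math


def fit_zalst(data, nu, tol=1e-10, max_iter=1000):
    """Fixed-point solution of the score equations for the zero-adjusted
    log-Student-t with fixed nu. Returns (eta_hat, phi_hat, pi_hat)."""
    n = len(data)
    logs = [math.log(w) for w in data if w > 0]
    m = len(logs)  # n - W*, the number of positive observations
    if m < 2:
        raise ValueError("need at least two positive observations")
    pi_hat = (n - m) / n  # closed-form MLE of pi

    # Start from the log-normal (v = 1) solution.
    mu = sum(logs) / m
    phi = sum((t - mu) ** 2 for t in logs) / m
    if phi == 0.0:
        raise ValueError("positive observations must not all be equal")

    for _ in range(max_iter):
        weights = [(nu + 1.0) / (nu + (t - mu) ** 2 / phi) for t in logs]
        mu_new = sum(v * t for v, t in zip(weights, logs)) / sum(weights)
        phi_new = sum(v * (t - mu_new) ** 2 for v, t in zip(weights, logs)) / m
        converged = abs(mu_new - mu) < tol and abs(phi_new - phi) < tol
        mu, phi = mu_new, phi_new
        if converged:
            break
    return math.exp(mu), phi, pi_hat
```

Because the weights shrink for large $|\tilde{w}_i|$, a single extreme observation moves $\hat{\eta}$ and $\hat{\phi}$ less than it would under the log-normal fit.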

Since the regularity conditions are satisfied (see Vanegas & Paula 2016), $\hat{\theta}$ is a consistent estimator of $\theta$ with asymptotic distribution given by $\sqrt{n}(\hat{\theta}-\theta)\xrightarrow{D}\mathcal{N}_3(0,nK^{-1}(\theta))$, where $nK^{-1}(\theta)$ does not depend on $n$ because every entry of $K(\theta)$ is proportional to $n$. Hence, for large $n$, $K^{-1}(\theta)$ is the approximate variance–covariance matrix of $\hat{\theta}$, and a consistent estimator of it is obtained by evaluating $K^{-1}(\cdot)$ at $\hat{\theta}$. Asymptotic confidence intervals (CIs) for $\theta$ can be obtained from this asymptotic distribution. From the delta method (Lehmann & Casella 1998, Sect. 1.9), the asymptotic distribution of a differentiable transformation $h(\theta)$ of the three parameters is given by $\sqrt{n}(h(\hat{\theta})-h(\theta))\xrightarrow{D}\mathcal{N}_3(0,n\nabla h(\theta)^tK^{-1}(\theta)\nabla h(\theta))$, where $\nabla h(\theta)=\partial h(\theta)/\partial\theta$.
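
As a worked instance of the delta method, consider $h(\theta)=(\log\eta,\log\phi,\mathrm{logit}(\pi))^t$ with $\mathrm{logit}(\pi)=\log[\pi/(1-\pi)]$. The sketch below assumes the diagonal Fisher information with entries $k_{\eta\eta}=n(1-\pi)d_g/(\phi\eta^2)$, $k_{\phi\phi}=n(1-\pi)(f_g-1)/(4\phi^2)$ and $k_{\pi\pi}=n/[\pi(1-\pi)]$, and gives the approximate variances for large $n$.

```latex
\begin{align*}
\nabla h(\theta) &= \operatorname{diag}\!\left(\frac{1}{\eta},\ \frac{1}{\phi},\ \frac{1}{\pi(1-\pi)}\right),\\[4pt]
\operatorname{var}\bigl(\log\hat{\eta}\bigr) &\approx \frac{1}{\eta^{2}}\,k_{\eta\eta}^{-1}
  = \frac{1}{\eta^{2}}\cdot\frac{\phi\,\eta^{2}}{n(1-\pi)\,d_g}
  = \frac{\phi}{n\,d_g\,(1-\pi)},\\[4pt]
\operatorname{var}\bigl(\log\hat{\phi}\bigr) &\approx \frac{1}{\phi^{2}}\,k_{\phi\phi}^{-1}
  = \frac{1}{\phi^{2}}\cdot\frac{4\phi^{2}}{n(1-\pi)(f_g-1)}
  = \frac{4}{n\,(f_g-1)\,(1-\pi)},\\[4pt]
\operatorname{var}\bigl(\operatorname{logit}(\hat{\pi})\bigr) &\approx \frac{1}{\pi^{2}(1-\pi)^{2}}\,k_{\pi\pi}^{-1}
  = \frac{1}{\pi^{2}(1-\pi)^{2}}\cdot\frac{\pi(1-\pi)}{n}
  = \frac{1}{n\,\pi\,(1-\pi)}.
\end{align*}
```

Because $K(\theta)$ is diagonal and each component of $h$ depends on a single parameter, the transformed estimators remain asymptotically independent.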

Simulation study and results

To evaluate the performance of the MLEs of the parameters of the ZALS class, a Monte Carlo simulation study is conducted. In this study we consider distributions with heavier and lighter tails, namely the zero-adjusted log-Student-t (ZALSt) with $\nu=4$ and the zero-adjusted log-power-exponential (ZALPE) with $\nu=-0.5$. In each scenario we consider sample sizes $n=10$, 30 and 50, the skewness (or relative dispersion) parameter of the continuous component $\phi\in\{1,2,3\}$ and the proportion parameter $\pi\in\{0.1,0.2,0.3,0.4\}$. The parameter $\eta$ is fixed at 2 and we consider 5000 Monte Carlo replications. Samples are generated by the inverse CDF method using (P2). All simulations were performed using the gamlss package in the R language (R Core Team 2019). All code was developed by the authors and can be obtained from them upon request.
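
The structure of such a study can be sketched in a few lines of Python. The example below is not the authors' R code: as a simplifying assumption it uses the zero-adjusted log-normal member, whose MLEs are available in closed form, and it draws each observation as a zero with probability $\pi$ and as a log-normal value otherwise, which is equivalent in distribution to the inverse CDF method. The names `zaln_mle` and `simulate_bias` are hypothetical.

```python
import math
import random


def zaln_mle(sample):
    """Closed-form MLEs (eta_hat, phi_hat, pi_hat) for the zero-adjusted
    log-normal; returns None if phi cannot be estimated."""
    n = len(sample)
    logs = [math.log(w) for w in sample if w > 0]
    m = len(logs)
    if m < 2:
        return None
    mu = sum(logs) / m
    phi = sum((t - mu) ** 2 for t in logs) / m  # ML estimator, divisor m
    if phi == 0.0:
        return None
    return math.exp(mu), phi, (n - m) / n


def simulate_bias(n, eta, phi, pi, reps, seed):
    """Monte Carlo estimate of the bias of (eta_hat, phi_hat, pi_hat)."""
    rng = random.Random(seed)
    mu, sigma = math.log(eta), math.sqrt(phi)
    totals = [0.0, 0.0, 0.0]
    used = 0
    for _ in range(reps):
        sample = [
            0.0 if rng.random() < pi else math.exp(rng.gauss(mu, sigma))
            for _ in range(n)
        ]
        estimate = zaln_mle(sample)
        if estimate is None:
            continue  # skip samples with fewer than two positive values
        used += 1
        for j, (est, true) in enumerate(zip(estimate, (eta, phi, pi))):
            totals[j] += est - true
    return tuple(t / used for t in totals), used


if __name__ == "__main__":
    bias, used = simulate_bias(n=30, eta=2.0, phi=1.0, pi=0.2, reps=5000, seed=1)
    print("replications used:", used)
    print("bias of (eta, phi, pi):", bias)
```

Replacing `zaln_mle` with an estimator for another member of the class, and the log-normal draw with the corresponding sampler, gives the same loop for the ZALSt or ZALPE scenarios.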

Tables II and III present descriptive measures of the simulation results, namely the empirical mean, median, standard deviation (SD), mean square error (MSE) and bias of $\hat{\theta}$, for the ZALSt and ZALPE distributions, respectively. For the ZALSt distribution (Table II), the empirical mean of $\hat{\eta}$ is affected by increasing $\phi$, $\pi$ or both; the median is not affected by increasing $\pi$, but it is affected by increasing $\phi$. Similar behavior can be observed for the ZALPE distribution (see Table III). For both distributions and for each scenario considered, the empirical mean and the median get closer to the true value as the sample size increases. In particular, the lowest bias of $\hat{\eta}$ is obtained under the ZALPE distribution. For $\hat{\phi}$, the empirical mean and the median are affected by increasing the value of $\pi$; in addition, as $n$ increases, $\hat{\phi}$ tends to the true value. For the MLE of $\pi$, the empirical mean and the median are close to the true value in each scenario regardless of the sample size $n$; that is, the results are similar for all values of $n$ considered. The MLE of $\eta$ overestimates the true value in all scenarios for the ZALSt and ZALPE distributions. As expected, the bias decreases as the sample size increases. Note that the ZALPE distribution presents better properties in relation to the bias of $\hat{\eta}$. In Table III, the MLE of $\phi$ underestimates the true value; this does not happen for the ZALSt distribution, for which the estimator is less biased regardless of the value of $\pi$. For the ZALPE distribution, the (negative) bias of $\hat{\phi}$ decreases as $\pi$ increases and moves toward zero as the sample size increases; that is, the estimator becomes less biased as $n$ increases. The MLE of $\pi$ shows little bias and underestimates the true value in the majority of the scenarios for each distribution.
With respect to the SD and the MSE, the ZALPE distribution has better properties than the ZALSt distribution for the MLEs of $\eta$ and $\phi$. As expected, these values increase as $\phi$, $\pi$ or both increase, and they decrease for larger sample sizes $n$.

Tables IV and V provide the lower (LC), upper (UC) and coverage (CP) probabilities, in %, of the CIs for the parameters $\log(\eta)$, $\log(\phi)$ and $\mathrm{logit}(\pi)$, with $\mathrm{var}(\log(\hat{\eta}))=\hat{\phi}/\{n\,d_g(1-\hat{\pi})\}$, $\mathrm{var}(\log(\hat{\phi}))=4/\{n(f_g-1)(1-\hat{\pi})\}$ and $\mathrm{var}(\mathrm{logit}(\hat{\pi}))=1/\{n\hat{\pi}(1-\hat{\pi})\}$, at the 0.90, 0.95 and 0.99 confidence levels for the ZALSt and ZALPE distributions, respectively. For the parameters $\log(\eta)$ and $\log(\phi)$, the coverage probabilities of the CIs at the three confidence levels are smaller under the ZALPE distribution than under the ZALSt distribution. Also, as the value of $\pi$ increases, the coverage probabilities for these two parameters are affected, in the sense that they fall further below the nominal confidence level. For $\mathrm{logit}(\pi)$, the coverage probabilities have similar properties regardless of the distribution, because the MLE of $\mathrm{logit}(\pi)$ is the same for each distribution. As expected, the coverage probabilities increase as $n$ increases. Note that the confidence intervals for $\log(\eta)$ are balanced, that is, the values of LC and UC are similar. The same behavior can be observed in the intervals for $\mathrm{logit}(\pi)$ as the value of $\pi$ increases. An unbalanced behavior is noted in the confidence intervals for $\log(\phi)$. This occurs for the distributions and for the confidence levels considered.
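
A minimal Python sketch of these intervals is given below. It takes the three variance expressions above as given and uses normal quantiles. The default values $d_g=1$ and $f_g=3$ correspond to the normal generator only (there $v\equiv 1$, so $d_g=E[Z^2]=1$ and $f_g=E[Z^4]=3$); for other members the appropriate values must be supplied by the caller. The function name `zals_asymptotic_cis` is an illustrative choice.

```python
import math
from statistics import NormalDist


def zals_asymptotic_cis(eta_hat, phi_hat, pi_hat, n, level=0.95, d_g=1.0, f_g=3.0):
    """Asymptotic CIs for log(eta), log(phi) and logit(pi).

    Requires 0 < pi_hat < 1. Defaults d_g = 1, f_g = 3 hold for the
    normal generator (zero-adjusted log-normal) only.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # two-sided normal quantile
    centre_and_se = {
        "log_eta": (
            math.log(eta_hat),
            math.sqrt(phi_hat / (n * d_g * (1.0 - pi_hat))),
        ),
        "log_phi": (
            math.log(phi_hat),
            math.sqrt(4.0 / (n * (f_g - 1.0) * (1.0 - pi_hat))),
        ),
        "logit_pi": (
            math.log(pi_hat / (1.0 - pi_hat)),
            math.sqrt(1.0 / (n * pi_hat * (1.0 - pi_hat))),
        ),
    }
    return {
        name: (centre - z * se, centre + z * se)
        for name, (centre, se) in centre_and_se.items()
    }
```

Intervals on the original scales can be obtained by applying `math.exp` to the limits for $\log(\eta)$ and $\log(\phi)$ and the inverse logit $x\mapsto 1/(1+e^{-x})$ to the limits for $\mathrm{logit}(\pi)$.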

Next, Figures 2 and 3 display the empirical (histogram) and asymptotic distributions of $\log(\hat{\eta})$, $\log(\hat{\phi})$ and $\mathrm{logit}(\hat{\pi})$ for each sample size in the scenario $\theta=(2,4,0.4)^t$ for the ZALSt and ZALPE distributions, respectively. Additionally, straight-line segments represent the computed asymptotic CIs, and the vertical line represents the true value of the parameter. The empirical distribution of $\log(\hat{\eta})$ is apparently symmetric for the distributions considered, whereas those of $\log(\hat{\phi})$ and $\mathrm{logit}(\hat{\pi})$ present slight skewness.


Figure 2. Histograms of the MLEs of $\log(\eta)$, $\log(\phi)$ and $\mathrm{logit}(\pi)$ with the respective asymptotic CIs under the ZALSt distribution for fixed $\theta=(2,4,0.4)$ with $\nu=4$.

Figure 3. Histograms of the MLEs of $\log(\eta)$, $\log(\phi)$ and $\mathrm{logit}(\pi)$ with the respective asymptotic CIs under the ZALPE distribution for fixed $\theta=(2,4,0.4)$ with $\nu=-0.5$.

Table II
Empirical statistics for the MLEs of $\phi$ and $\pi$ with $\eta=2$ and $n=10, 30, 50$ for the zero-adjusted log-Student-t distribution with $\nu=4$.

Table III
Empirical statistics for the MLEs of $\phi$ and $\pi$ with $\eta=2$ and $n=10, 30, 50$ for the zero-adjusted log-power-exponential distribution with $\nu=-0.5$.

Table IV
$100\times[1-\alpha]\%$ asymptotic CIs and coverage probabilities for $\log(\eta)$, $\log(\phi)$ and $\log(\pi/(1-\pi))$ with $n=10, 30, 50$ for the zero-adjusted log-Student-t distribution with $\nu=4$.

Table V
$100\times[1-\alpha]\%$ asymptotic CIs and coverage probabilities for $\log(\eta)$, $\log(\phi)$ and $\log(\pi/(1-\pi))$ with $n=10, 30, 50$ for the zero-adjusted log-power-exponential distribution with $\nu=-0.5$.

Application

We consider the allowances and expenses data set of elected councilors, including the City Mayor, in Leicester City, UK, for the period 2012/2013. The data were extracted from the Leicester City Council database, available at https://data.leicester.gov.uk/pages/home/. We are interested in the Special Responsibility Allowance (£), an additional allowance received by some councilors for specific responsibilities such as Committee Chair. The sample contains 18 zero values, which represent 32.73% of the observations. Table VI presents some descriptive statistics, such as the sample size ($n$), minimum and maximum values, mean ($\bar{y}$), median, standard deviation (SD), first quartile (Q1), third quartile (Q3), interquartile range (IQR), coefficient of variation (CV), coefficient of skewness (CS) and coefficient of kurtosis (CK). Note that the data have positive skewness (asymmetry) and large kurtosis.

Table VI
Some statistics for the Special Responsibility Allowance of councillors in Leicester City, period 2012/2013 dataset.

Also, Figure 4 presents the histogram including the zero values and an adjusted boxplot of the positive values only. We can observe positive skewness and the presence of two outliers in Figure 4. Therefore, we fit three distributions of the ZALS class, namely the zero-adjusted log-normal (ZALN), zero-adjusted log-Student-t (ZALSt) and zero-adjusted log-power-exponential (ZALPE), and compare them with the zero-adjusted inverse Gaussian (ZAIG), zero-adjusted gamma (ZAG) and zero-adjusted reparameterized Birnbaum–Saunders (ZARBS) distributions. The extra parameter of the ZALSt and ZALPE distributions was chosen by minimizing the AIC over the grids $\nu\in[4,10]$ and $\nu\in\{\pm0.3,\pm0.2,\pm0.1\}$, respectively. Based on this criterion, the extra parameters are $\nu=4$ and $\nu=0.3$ for the ZALSt and ZALPE distributions, respectively.


Figure 4. Histogram, adjusted-boxplot and fitted log-Student-t distribution hazard function for the Special Responsibility Allowance of councilors in Leicester City, period 2012/2013.

Table VII presents the MLEs of the parameters of the fitted distributions with standard errors (in parentheses), the value of $-2\log(\hat{L})$, the Akaike information criterion ($\mathrm{AIC}=2p-2\log(\hat{L})$) and the Bayesian information criterion ($\mathrm{BIC}=p\log(n)-2\log(\hat{L})$), where $n$ is the number of observations, $p$ is the number of estimated parameters and $\hat{L}=L(\hat{\theta})$ is the likelihood evaluated at the estimated parameters. Additionally, the Kolmogorov–Smirnov (KS) test is used to assess goodness of fit. The fitted ZALSt distribution presents the lowest values of AIC and BIC. In addition, the KS test indicates that, for the three fitted ZALS distributions and the fitted gamma distribution, there is not enough evidence to reject the hypothesis that the data are drawn from the reference distribution. The same conclusion does not hold for the fitted reparameterized Birnbaum–Saunders and inverse Gaussian distributions. Additionally, the hazard function of the selected fitted continuous distribution, displayed in Figure 4, is decreasing. Finally, fitted densities and Q-Q plots for each distribution are displayed in Figure 5.
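
The two information criteria are simple functions of the maximized log-likelihood. The Python sketch below codes them exactly as defined above; the function names are illustrative and the numerical values in the usage example are made up, not taken from Table VII.

```python
import math


def aic(loglik_hat, p):
    """AIC = 2p - 2 log(L_hat), with p estimated parameters."""
    return 2.0 * p - 2.0 * loglik_hat


def bic(loglik_hat, p, n):
    """BIC = p log(n) - 2 log(L_hat), with n observations."""
    return p * math.log(n) - 2.0 * loglik_hat


if __name__ == "__main__":
    # Hypothetical maximized log-likelihoods for two candidate models
    # fitted to the same n observations (illustrative numbers only).
    candidates = {"model_A": (-410.2, 3), "model_B": (-408.9, 3)}
    n = 55
    for name, (loglik_hat, p) in candidates.items():
        print(name, aic(loglik_hat, p), bic(loglik_hat, p, n))
```

When the candidate models have the same number of parameters and are fitted to the same data, both criteria rank them in the same order as $-2\log(\hat{L})$.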


Figure 5. Histogram with fitted densities and Q-Q plots for the considered distributions for Special Responsibility Allowance dataset.
Table VII
Estimated parameters and statistics for the considered fitted distributions for the Special Responsibility Allowance of councilors in Leicester City, period 2012/2013 dataset.

Conclusions

We have proposed a new class of distributions, named zero-adjusted log-symmetric (ZALS), for asymmetric positive data with light or heavy tails that contain true zeros. Some properties are presented, for example the moments, the quantile function and the Shannon entropy, among others. The maximum likelihood method was used to estimate the parameters of the ZALS class, and asymptotic confidence intervals were developed. A Monte Carlo simulation study was performed to examine the properties of the MLEs of the parameters under the ZALPE and ZALSt distributions. The MLEs perform well, and their empirical distributions are close to the normal distribution. Finally, an application to a real data set was presented in order to show the flexibility and variety of the ZALS class. The fitted ZALSt distribution performs better than the competing distributions according to the AIC and BIC criteria.

ACKNOWLEDGMENTS

This research work was supported by Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) and FACEPE from Brazil. The authors would like to thank the Associate Editor and anonymous referees for their constructive comments on an earlier version of the manuscript, which resulted in this improved version.

REFERENCES

  • AITCHISON J & BROWN JAC. 1957. The Lognormal distribution with special reference to its uses in economics, 1st ed., Cambridge University Press.
  • BIRNBAUM ZW & SAUNDERS SC. 1969. A new family of life distributions. J Appl Probab 6: 319-327.
  • CASTELLACCI G. 2012. A formula for the quantiles of mixtures of distributions with disjoint supports. URL http://ssrn.com/abstract=2055022
  • DÍAZ-GÁRCIA JA & LEIVA V. 2005. A new family of life distributions based on elliptically contoured distributions. J Stat Plan Inference 128(2): 445-327.
  • DUAN N, MANNING WG, MORRIS CN & NEWHOUSE JP. 1983. A comparison of alternative models for the demand for medical care. J Bus Econ Stat 1(2): 115-327.
  • FANG KT, KOTZ S & NG KW. 1990. Symmetric Multivariate and Related Distributions. London: Chapman & Hall.
  • HASHIMOTO EM, ORTEGA EMM, CORDEIRO GM, CANCHO VG & KLAUBERG C. 2019. Zero-spiked regression models generated by gamma random variables with application in the resin oil production. J Stat Comput Simul 89(1): 52-70.
  • HELLER G, STASINOPOULOS M & RIGBY B. 2006. The zero-adjusted inverse Gaussian distribution as a model for insurance claims. Proc 21th Int Work Stat Model 1984: 1-8. URL http://en.scientificcommons.org/58942171.
  • JONES MC. 2008. On reciprocal symmetry. J Stat Plan Inference 138(10): 3039-3043.
  • LAMBERT D. 1992. Zero-inflated Poisson regression, with an application to defects in manufacturing. Technometrics 34(1): 1-14.
  • LAWLESS JF. 2003. Statistical Models and Methods for Lifetime Data, 2nd ed., New York: John Wiley & Sons.
  • LEHMANN EL & CASELLA G. 1998. Theory of point estimation, 2nd ed., New York: Springer.
  • LEIVA V, RIQUELME M, BALAKRISHNAN N & SANHUEZA A. 2008. Lifetime analysis based on the generalized Birnbaum-Saunders distribution. Comp Statis Data Analys 52(4): 2079-2097.
  • LEIVA V, SANTOS-NETO M, CYSNEIROS FJA & BARROS M. 2016. A methodology for stochastic inventory models based on a zero-adjusted Birnbaum-Saunders distribution. Appl Stoch Models Bus Ind 32(1): 74-89.
  • MARTIN TG, WINTLE BA, RHODES JR, KUHNERT PM, FIELD SA, LOW-CHOY SJ & POSSINGHAM HP. 2005. Zero tolerance ecology: improving ecological inference by modelling the source of zero observations. Ecol Lett 8(11): 1235-1246.
  • MEDEIROS FM & FERRARI SL. 2017. Small-sample testing inference in symmetric and log-symmetric linear regression models. Stat Neerl 71(3): 200-224.
  • PACE L & SALVAN A. 1997. Principles of Statistical Inference from a Neo-Fisherian Perspective. World Scientific Publishing: Singapore.
  • PUIG P. 2008. A note on the harmonic law: a two-parameter family of distributions for ratios. Stat Probab Lett 78(3): 320-326.
  • R CORE TEAM. 2019. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. Vienna, Austria. URL https://www.R-project.org/
  • RODRIGUES-MOTTA M, GALVIS SOTO DM, LACHOS VH, VILCA F, BALTAR VT, JUNIOR EV, FISBERG RM & LOBO MARCHIONI DM. 2015. A mixed-effect model for positive responses augmented by zeros. Stat Med 34: 1761-1778.
  • STASINOPOULOS DM & RIGBY RA. 2007. Generalized additive models for location scale and shape (GAMLSS) in R. J Stat Softw 23(7): 1-46.
  • VANEGAS LH & PAULA GA. 2016. Log-symmetric distributions: statistical properties and parameter estimation. Braz J Probab Stat 30(2): 196-220.
  • VENTURA M, SAULO H, LEIVA V & MONSUETO S. 2019. Log-symmetric regression models: information criteria and application to movie business and industry data with economic implications. Appl Stoch Models Bus Ind 35(4): 963-977.

Publication Dates

  • Publication in this collection
    31 July 2023
  • Date of issue
    2023

History

  • Received
    8 June 2020
  • Accepted
    22 Feb 2021
Academia Brasileira de Ciências, Rua Anfilófio de Carvalho, 29, 3º andar, 20030-060 Rio de Janeiro RJ, Brasil. Tel: +55 21 3907-8100
E-mail: aabc@abc.org.br