
OCCURRENCE OF RAILWAY ACCIDENTS RELATED TO SOME PERSONAL AND PROFESSIONAL TRAIN CONDUCTOR FACTORS: USE OF STATISTICAL MODELS IN THE PRESENCE OF EXCESS OF ZEROS

ABSTRACT

The main goal of this research was to identify personal and professional factors of train drivers associated with the occurrence of accidents at a Brazilian logistics company in the period from 2014 to 2016. In this study, accidents denote situations of non-compliance with procedures by the train driver, that is, unsafe acts that could cause major accidents. The research involved 348 drivers and several independent variables associated with each worker: detachment distances (railway sections), driver's age, working time (time working in the company), marital status, number of trips (number of train trips as a driver), months of work (months of work in the company) and total hours driving trains. The responses of interest are the accident rate per month of work and the accident count. Under a Bayesian approach, Beta and Poisson regression models adapted for the presence of excess zeros were assumed. For the accident rate per month of work, the covariates number of trips, detachment distances and age showed some evidence of significant effects. For the accident count, the covariate hours driving trains showed some evidence of a significant effect. These results may be of interest to managers of rail logistics companies who want to improve rail safety using Bayesian modeling approaches in the presence of excess zeros.

Keywords:
railroad accidents; beta regression in the presence of excess zeros; Bayesian analysis

1 INTRODUCTION

Trains are usually a safe way to travel. However, railway disasters do occur. Each year there are thousands of train accidents with injuries worldwide, and hundreds of people are killed in them. Some of the most common causes of train accidents (https://www.burge-law.com/what-are-the-causes-of-railway-accidents/) include:

  • Train operator error: Human error can be an important factor in railway accidents (poor training, inexperience, reckless behavior, or a combination of these). Reckless behavior includes, for example, operating the train too fast.

  • Track problems: Track owners are responsible for keeping their tracks maintained and in good repair.

  • Lack of warning signals: Some parts of the railway may lack warning signals, so motorists, bicyclists, and pedestrians may not realize that a train is coming.

  • Warning signal defects: The warning signals may not have been properly maintained, or they may malfunction.

  • Obstructed view of the railroad crossing: Sometimes trees and other vegetation become overgrown, which can obstruct the view of the crossing.

  • Stalled vehicles: Some train-vehicle crashes happen because a vehicle stalls on top of the track, often due to a mechanical failure.

  • Distractions: Some railway accidents are caused by distractions such as sending texts or other smartphone activity.

  • Faulty equipment: A train accident can happen due to some type of mechanical defect.

As noted by Shappell & Wiegmann (2000), other factors, such as organizational aspects, supervision characteristics, physical and technological factors, and the condition of the operator (such as mental and physical state and limitations) and of the team, can also cause train accidents. In this study, train accidents denote situations of non-compliance with procedures. Non-compliance with procedures is treated in the literature as violations of procedures, errors and unsafe acts. Violations of procedures can have different outcomes, ranging from violations with no consequences, through violations that cause incidents, to violations that cause major accidents. Some studies (Tavares et al. (2021); Evans (2011); Kyriakidis et al. (2015)) identify unsafe acts by train drivers (unintentional failures in the mental or physical activities of individuals) and violations (disobedience of existing operational procedures in the organization) as the main cause of accidents in the railway sector. Many other studies on railway accidents and their causes can be found in the literature. Wasnik (2010) analyzed railway fatalities in Central India; San Kim & Yoon (2013) proposed an accident causation model for the railway industry and applied it to 80 rail accident investigation reports from the UK; Shi et al. (2020) carried out a correlation analysis of the causes of railway accidents based on an improved Apriori algorithm; Wang et al. (2020) studied correlation factors in railway accidents using association rule learning algorithms; Aher & Tiwari (2018) studied the causes, effects and management of railway accidents in India; Bala & Bhasin (2018) reviewed the analysis of railway traffic accidents using data mining techniques; Dhaygude et al. (2019a, 2019b) presented different statistical analyses of railway accidents; and Tavares et al. (2021) studied the worker's profile and its relationship with the occurrence of unsafe acts, considering the case of train drivers of a logistics company in Brazil.

Hong et al. (2023) carried out a literature review on railway accident causation analysis, investigating the application of Natural Language Processing (NLP) to assist the analysis; Rad et al. (2023) addressed the lack of a comprehensive review of the literature on the systemic modeling of railway accidents, analyzing railway accidents from 2000 to 2022; Wang et al. (2023) proposed a method based on graph knowledge theory to analyze the correlation of hazards in railway accidents, identifying several important hazards; Liu et al. (2024) proposed a method called Comprehensive-Biased Random Walk with Different Restart (CBDRWR) to analyze the potential risk in the railway accident generation process; and Yan et al. (2023) presented a railway accident prevention method based on a reinforcement learning model and multi-modal data to achieve active accident prevention strategies.
In Brazil, the performance of this type of transport is worrying: the country has a railway network in operation of approximately 30,000 km (Murta et al., 2023), used almost entirely by freight trains, and its railway accident rate is higher than that of the European Union, whose railway network has 250,000 km. Personal, technical and structural investments in Brazilian railways have not been sufficient to contain railway accidents. In this study, we address some important issues concerning the personal and professional characteristics of train conductors, such as the level of stress, fatigue, decision-making capacity and emotional state, which could affect the occurrence of non-compliance with procedures by train conductors and thus lead to accidents. Once the characteristics of the conductors are known, it is possible to establish training and management programs for such circumstances. Promoting safety in the railway sector requires broad knowledge of the factors that contribute to a significant reduction in accidents on railway lines, so that a culture that values and encourages safety practices among railway professionals can be promoted. Another important factor for safer rail travel would be the drafting and implementation of laws and regulations based on the personal characteristics of rail operators. Among the studies on Brazilian railways, we highlight Georgiou (2009), who mapped the dynamics of Brazilian railway development and identified two problems: the misappropriation of public resources and a degenerative feedback system in decision-making; and Keretch & De Paiva (2016), who studied the number of accidents on Brazilian railways due to different causes. They observed that, despite the increase in transported products, the number of accidents decreased from 1,638 in 2006 to 866 in 2013, according to a report from the Brazilian National Land Transport Agency (ANTT). In another study, Araújo & da Silva Sousa (2023) analyzed, from an economic perspective, the reasons for the elimination of many railway branches in recent decades. Souza et al. (2019) showed that wheelset assembly, engine and traction assembly installation, and wheel machining are the main failures that cause locomotives to derail; Souza et al. (2023) also studied factors that could contribute to the occurrence and severity of these accidents. Maintenance is essential for any means of transport to avoid accidents; among the railway maintenance proposals, we highlight the study by Arruda et al. (2022), who evaluated the “Hot Box” failure, which arises from overheating of the bearings. de Almeida Eleutério & Rosa (2023) proposed a mathematical model to plan resource routes to meet maintenance orders, maximizing the number of maintenance orders fulfilled in the planned period.

The main objective of this study is to discover possible relationships between the occurrence of accidents and the personal and professional characteristics of train drivers, together with some factors related to the structure of a railroad located in southern Brazil. The quantitative research involved 348 train conductors, the accidents that occurred in the period from 2014 to 2016, and some independent variables (factors) associated with each train driver. The responses of interest are the accident rate per month of work (number of accidents/months of work) and the total accident count for each train conductor. The dataset shows an excess of zeros, that is, many workers had no accidents in the period, which is common in the railway area, where accidents are not frequent. Since the positive accident rates per month of work lie in the interval (0,1), we analyze them with a beta regression model adapted for the presence of excess zeros (zero-inflated Beta or ZIB model). For the accident count of each worker, we assume a Poisson regression model, also adapted for the presence of excess zeros (zero-inflated Poisson or ZIP model). Given the difficulty of obtaining the usual classical inferences (maximum likelihood estimators), we use a Bayesian inference approach and MCMC (Markov Chain Monte Carlo) methods to simulate samples from the joint posterior distributions of interest.

The remainder of the article is organized as follows: Section 2 presents the data and a preliminary statistical analysis; Section 3 introduces the proposed methodology; Section 4 presents the results; finally, Section 5 presents some concluding remarks.

2 DATA AND PRELIMINARY STATISTICAL ANALYSIS

The data set contains information on 348 train conductors regarding the occurrence of accidents, together with some personal and professional covariates (independent variables) associated with each worker: detachment distances (railway sections), train conductor age, job time (time working in the company), marital status (1: married; 2: single), trip count (number of train rides as conductor), work months (months working in the company) and total hours conducting trains (Appendix 1). The responses of interest are the accident rate per month of work (number of accidents/months of work) and the total accident count for each driver.

As a preliminary analysis, we first consider the binary response (occurrence or not of accidents), denoting as success the non-occurrence of accidents for a train conductor (Y=1) and as failure the occurrence of one or more accidents (Y=0), using the full data set (n=348 observations). The binary random variable Y has a Bernoulli distribution with success probability p, $P(Y=y)=p^{y}(1-p)^{1-y}$, $y=0,1$, and we assume the logistic regression model (see, for example, Montgomery & Runger (2010))

$$\operatorname{logit}(p_i)=\log\frac{p_i}{1-p_i}=\beta_0+\beta_1\,\text{detachment.distances}_i+\beta_2\,\text{age}_i+\beta_3\,\text{job.time}_i+\beta_4\,\text{marital.status}_i+\beta_5\,\text{trip.count}_i+\beta_6\,\text{work.months}_i+\beta_7\,\text{hours.conducting.trains}_i \tag{1}$$

where i=1, 2, ..., 348.

From the data set, we observed 135 failures (train conductors with accidents) and 213 successes (train conductors without accidents). Table 1 shows the maximum likelihood estimates (MLE) of the regression parameters associated with the covariates detachment distances, train conductor age, job time, marital status, trip count, work months and total hours conducting trains, obtained with the Minitab® software.

Table 1
MLE, standard errors (SE) and p-values (logistic regression).

From the results in Table 1, we observe that the covariates detachment distances, trip count and work months have significant effects on the probability of no accidents, since the associated p-values are smaller than 0.05. From the signs of the MLEs, we conclude that larger detachment distances (negative estimate) imply smaller probabilities p of no accidents, that is, a larger probability 1-p of accidents (possibly because long distances with few stops increase train speeds and hence the chance of accidents). Likewise, larger trip counts (negative estimate) imply smaller probabilities that a train conductor has no accidents (a large number of trips possibly leads to stress for the conductor). In contrast, more work months (months working in the company) increase (positive estimate) the probability that a train conductor has no accidents (more experience increases the probability of success, that is, of no accidents).
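Table 1 was produced with Minitab®. To illustrate what maximum likelihood estimation of model (1) involves, the sketch below fits a logistic regression by Newton-Raphson in plain Python; the function names `solve` and `fit_logistic` are hypothetical, and this is not the Minitab routine. Fitted with an intercept only, the MLE must satisfy logit(p̂)=log(213/135), that is, p̂=213/348, the observed proportion of conductors without accidents.

```python
import math

def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fit_logistic(X, y, max_iter=50, tol=1e-10):
    """Newton-Raphson MLE for logit(p_i) = x_i' beta.

    X: rows of covariates, each starting with 1.0 for the intercept beta_0.
    y: 0/1 outcomes (1 = no accident, as in the text).
    """
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        grad = [0.0] * k                        # score vector X'(y - p)
        info = [[0.0] * k for _ in range(k)]    # information matrix X'WX
        for xi, yi in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-sum(bj * xj for bj, xj in zip(beta, xi))))
            for r in range(k):
                grad[r] += (yi - p) * xi[r]
                for c in range(k):
                    info[r][c] += p * (1.0 - p) * xi[r] * xi[c]
        step = solve(info, grad)                # Newton step: info^{-1} * score
        beta = [bj + s for bj, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta

# Intercept-only fit: 213 conductors without accidents (y=1), 135 with accidents (y=0).
beta0 = fit_logistic([[1.0]] * 348, [1] * 213 + [0] * 135)[0]
print(beta0, 1.0 / (1.0 + math.exp(-beta0)))   # logit(213/348) and 213/348
```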

Since the main goal of this study concerns the accident rate per month of work (number of accidents/months of work) and the total accident count for each driver, in relation to the covariates detachment distances, train conductor age, job time, marital status, trip count, work months and total hours conducting trains, we need more elaborate statistical models for rates and counts. We therefore assume beta regression models for the accident rates and Poisson regression models for the accident counts of each train conductor.

As the dataset shows an excess of zero responses, we use zero-inflated statistical models, which treat the data as a mixture of two components, one consisting of zero responses and another consisting of non-zero responses; we also need to check the possible dependence of the responses (rates or counts) on each covariate. Figure 1 shows scatter plots of the two responses (accident rate per month of work and total accident count for each driver) against each covariate, considering only the observations with a non-zero count of accidents.

Figure 1
Scatter plots of the two responses (accident rates by months worked and total accident counts for each conductor) against each covariate.

From Figure 1, it is difficult to conclude which covariates affect the responses (accident rate per month worked and total accident count for each train conductor). Possibly trip count and detachment distances affect both the accident rate per month worked and the total accident count for each train conductor, but a suitable statistical model is needed to discover which covariates affect the two responses.

The main goals of this study are:

  • To verify statistically whether some covariate affects the accident rate per month worked, assuming a beta regression model adapted for the presence of excess zeros (zero-inflated Beta or ZIB model).

  • To verify statistically whether some covariate affects the number of accidents, assuming a Poisson regression model adapted for the presence of excess zeros (zero-inflated Poisson or ZIP model).

3 METHODS

In this section, we present the statistical models used in the data analysis.

3.1 The zero-inflated Poisson (ZIP) model

An important property of the Poisson distribution is that the variance of the count outcome equals its mean. In practice this assumption may not hold, that is, the data may show 'overdispersion'. The zero-inflated Poisson (ZIP) model is one alternative to deal with this problem. This model assumes that there are two different types of individuals in the data:

  • Individuals with a zero count (no occurrence of accidents), with probability p (0-group).

  • Individuals with counts (number of accidents different from zero) that can be predicted by the standard Poisson distribution (non-0-group).

A zero count can come from either of the two groups: if the zero comes from the 0-group, the observation has no probability of a positive outcome (Scott Long (1997); Hall (2000)). The overall model is a mixture of the probabilities from the two groups, which accounts for both the overdispersion and the excess zeros that cannot be predicted by the standard Poisson model.

Membership in the 0-group can be modeled by a Bernoulli distribution with success probability p, so the probability of not belonging to the 0-group is 1-p. For an observation not belonging to the 0-group, we assume a standard Poisson distribution with probability mass function

$$f(y)=P(Y=y)=\frac{e^{-\mu}\mu^{y}}{y!},\qquad y=1,2,3,\ldots \tag{2}$$

where µ is the conditional mean given that the observation belongs to the non-0-group. The mixed probabilities for the ZIP model are then expressed as follows:

  • Zero counts in the 0-group: $P(Y=0)=p$

  • Non-zero counts in the non-0-group: $P(Y=y)=(1-p)e^{-\mu}\mu^{y}/y!$

  • Overall, we have

$$P(Y=y)=\begin{cases} p, & \text{if } y=0,\\ (1-p)\,e^{-\mu}\mu^{y}/y!, & \text{if } y>0. \end{cases} \tag{3}$$

Since 0≤p≤1, the overall mean of the ZIP model, E(Y)=µ(1-p), is no larger than the conditional mean µ. The ZIP structure also shows overdispersion, since the overall variance var(Y)=µ(1-p)(1+µp) is at least as large as the mean (see Erdman et al. (2008)).
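The mean and variance quoted above are those of the standard ZIP mixture, in which a zero can come from either group, so that P(Y=0)=p+(1-p)e^{-µ}. The short check below is a sketch assuming that pmf: it sums the pmf numerically and compares the results with µ(1-p) and µ(1-p)(1+µp). The values µ=1.5 and p=0.6 are arbitrary illustrative choices, and the function names are hypothetical.

```python
import math

def zip_pmf(y, mu, p):
    """Standard zero-inflated Poisson pmf: zeros come from the 0-group or from the Poisson part."""
    poisson = math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))   # e^{-mu} mu^y / y!
    return p + (1.0 - p) * poisson if y == 0 else (1.0 - p) * poisson

def zip_moments(mu, p, ymax=150):
    """Total probability, mean and variance obtained by summing the pmf up to ymax."""
    probs = [zip_pmf(y, mu, p) for y in range(ymax + 1)]
    mean = sum(y * q for y, q in enumerate(probs))
    var = sum((y - mean) ** 2 * q for y, q in enumerate(probs))
    return sum(probs), mean, var

mu, p = 1.5, 0.6
total, mean, var = zip_moments(mu, p)
print(total, mean, mu * (1 - p))                # numeric vs closed-form mean
print(var, mu * (1 - p) * (1 + mu * p))         # numeric vs closed-form variance
```

Because var(Y)/E(Y)=1+µp>1 whenever µp>0, the printed variance exceeds the mean, which is the overdispersion mentioned in the text.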

Assuming an indicator variable δ=1 if Y=0 and δ=0 if Y>0, the contribution of one observation to the likelihood function for µ and p is given by,

$$L(\mu,p)=p^{\delta}\left[(1-p)\,\frac{e^{-\mu}\mu^{y}}{y!}\right]^{1-\delta} \tag{4}$$

In the presence of a vector of p covariates $x=(x_1,x_2,\ldots,x_p)$, we assume the regression model $\mu=\exp(x'\beta)$, where $x'\beta=\beta_0+\beta_1x_1+\beta_2x_2+\cdots+\beta_px_p$.
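As a sketch of how the likelihood (4) and the log link µ=exp(x'β) combine into a log-likelihood (the quantity that the MCMC sampler explores together with the priors), the function below evaluates the contribution of one conductor exactly as written in (4). The function names are hypothetical, and x is assumed to carry a leading 1 for the intercept β0.

```python
import math

def zip_loglik_contribution(y, x, beta, p):
    """log L(mu, p) of equation (4), with the log link mu = exp(x'beta)."""
    if y == 0:                                   # delta = 1
        return math.log(p)
    mu = math.exp(sum(bj * xj for bj, xj in zip(beta, x)))
    # delta = 0: log[(1 - p) * e^{-mu} * mu^y / y!]
    return math.log(1.0 - p) - mu + y * math.log(mu) - math.lgamma(y + 1)

def zip_loglik(ys, xs, beta, p):
    """Sum of the contributions over all conductors."""
    return sum(zip_loglik_contribution(y, x, beta, p) for y, x in zip(ys, xs))

# Hypothetical example: intercept-only model with mu = 1.5 and p = 0.4.
print(zip_loglik([0, 2], [[1.0], [1.0]], [math.log(1.5)], 0.4))
```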

3.2 The zero-inflated Beta (ZIB) model

Following the same arguments presented in Section 3.1 for the ZIP model, we now assume a Beta distribution for the rates (accident rates per month of work), a continuous random variable defined in the interval (0,1). The probability density function of Y (rate) under the Beta distribution is

$$f(y)=c\,y^{a-1}(1-y)^{b-1},\qquad 0<y<1 \tag{5}$$

where $c=1/B(a,b)=\Gamma(a+b)/[\Gamma(a)\Gamma(b)]$ is the normalizing constant, $B(a,b)$ being the Beta function, and the conditional mean given that the observation belongs to the non-0-group is $\mu=E(Y)=a/(a+b)$. The conditional variance is $\operatorname{var}(Y)=ab/[(a+b)^{2}(a+b+1)]$.
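The normalizing constant c=Γ(a+b)/[Γ(a)Γ(b)] and the two moment formulas can be checked numerically. The sketch below integrates the density (5) with a simple midpoint rule (an illustrative choice, not part of the study's method), using a=2 and b=5 as arbitrary values; the function names are hypothetical.

```python
import math

def beta_pdf(y, a, b):
    """Beta density (5) with normalizing constant c = Gamma(a+b) / (Gamma(a) Gamma(b))."""
    log_c = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_c + (a - 1) * math.log(y) + (b - 1) * math.log(1 - y))

def beta_moments_numeric(a, b, n=200_000):
    """Total mass, mean and variance of (5) by the midpoint rule on (0, 1)."""
    h = 1.0 / n
    grid = [(i + 0.5) * h for i in range(n)]
    dens = [beta_pdf(y, a, b) for y in grid]
    mass = h * sum(dens)
    mean = h * sum(y * d for y, d in zip(grid, dens))
    var = h * sum((y - mean) ** 2 * d for y, d in zip(grid, dens))
    return mass, mean, var

a, b = 2.0, 5.0
mass, mean, var = beta_moments_numeric(a, b)
print(mass, mean, a / (a + b), var, a * b / ((a + b) ** 2 * (a + b + 1)))
```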

The mixed probabilities for the ZIB model are then expressed as follows:

  • For the zeros in the 0-group we have the probabilities

$P(Y=0)=p$ and $P(Y>0)=1-p$

  • For the positive rates in the non-0-group we have the probability density function

$f_1(y)=(1-p)\,\dfrac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\,y^{a-1}(1-y)^{b-1}$

  • Overall

$$P(Y=y)=\begin{cases} p, & \text{if } y=0,\\ (1-p)\,c\,y^{a-1}(1-y)^{b-1}, & \text{if } y>0. \end{cases} \tag{6}$$

Since the 0-group contributes only zeros, the overall mean of the ZIB model is E(Y)=p·0+(1-p)µ=µ(1-p), where µ=a/(a+b).
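A quick Monte Carlo check of E(Y)=µ(1-p): the sketch below draws from the ZIB model (6) using Python's `random.betavariate` for the positive part and compares the sample mean with µ(1-p). The parameter values are arbitrary illustrative choices, and `simulate_zib` is a hypothetical helper.

```python
import random

def simulate_zib(n, a, b, p, seed=0):
    """Draw n values from the ZIB model (6): 0 with probability p, otherwise Beta(a, b)."""
    rng = random.Random(seed)
    return [0.0 if rng.random() < p else rng.betavariate(a, b) for _ in range(n)]

a, b, p = 2.0, 5.0, 0.6                      # arbitrary illustrative values
ys = simulate_zib(200_000, a, b, p)
mu = a / (a + b)
print(sum(ys) / len(ys), mu * (1 - p))       # Monte Carlo mean vs mu(1-p)
print(ys.count(0.0) / len(ys), p)            # observed vs nominal proportion of zeros
```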

Assuming an indicator variable δ=1 if Y=0 and δ=0 if Y>0, the contribution of one observation to the likelihood function for a, b and p is given by,

$$L(a,b,p)=p^{\delta}\left[(1-p)\,c\,y^{a-1}(1-y)^{b-1}\right]^{1-\delta} \tag{7}$$

In the presence of a vector of covariates associated with each unit, a regression model is assumed using a reparametrized form of the beta density (5), with µ=a/(a+b) and Φ=a+b (Ferrari & Cribari-Neto (2004), Jørgensen (1997), da Silva et al. (2021)). Then a=Φµ, b=(1-µ)Φ, E(Y)=µ and var(Y)=V(µ)/(1+Φ), where V(µ)=µ(1-µ), so that µ is the mean of the response variable and Φ can be interpreted as a precision parameter: for fixed µ, the larger the value of Φ, the smaller the variance of Y. In the new parameterization, the probability density function of Y can be written as

$$f(y\mid\mu,\Phi)=\frac{\Gamma(\Phi)}{\Gamma(\Phi\mu)\,\Gamma((1-\mu)\Phi)}\,y^{\Phi\mu-1}(1-y)^{(1-\mu)\Phi-1} \tag{8}$$

where 0<µ<1 and Φ>0.

Assuming a covariate vector x=(x1, x2, ..., xp)' with p covariates associated with each observation, the following regression model is assumed for the mean (Cepeda-Cuervo et al. (2014)):

$$\operatorname{logit}(\mu)=\log\frac{\mu}{1-\mu}=\beta'x=\beta_0+\beta_1x_1+\beta_2x_2+\cdots+\beta_px_p \tag{9}$$

where β=(β0, β1, β2, ..., βp )’ is a vector of regression parameters.
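Putting (7), (8) and (9) together, the log-likelihood contribution of one observation can be sketched as below. This is an illustrative implementation under stated assumptions (x carries a leading 1 for β0; the function name is hypothetical); the study itself fitted the model in OpenBugs.

```python
import math

def zib_loglik_contribution(y, x, beta, phi, p):
    """Log of (7) for one observation, using the mean/precision form (8) and logit link (9)."""
    if y == 0:                                        # delta = 1: zero from the 0-group
        return math.log(p)
    eta = sum(bj * xj for bj, xj in zip(beta, x))     # linear predictor beta'x
    mu = 1.0 / (1.0 + math.exp(-eta))                 # inverse logit
    a, b = mu * phi, (1.0 - mu) * phi                 # a = mu*Phi, b = (1 - mu)*Phi
    log_f = (math.lgamma(phi) - math.lgamma(a) - math.lgamma(b)
             + (a - 1.0) * math.log(y) + (b - 1.0) * math.log(1.0 - y))
    return math.log(1.0 - p) + log_f                  # delta = 0: log[(1 - p) f(y | mu, Phi)]

# Hypothetical values: beta0 = log(1/3) gives mu = 0.25; with Phi = 4 this is Beta(1, 3).
print(zib_loglik_contribution(0.5, [1.0], [math.log(1 / 3)], 4.0, 0.6))
```

In the example, Beta(1, 3) has density 3(1-y)², equal to 0.75 at y=0.5, so the printed value should be close to log(0.4 × 0.75)=log(0.3).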

We carry out a Bayesian analysis for both classes of models (accident rates and accident counts). Combining the joint prior distribution of the parameters of each model with the likelihood function, the joint posterior distribution of the parameters is obtained using Bayes' formula (Box & Tiao (1973)). The posterior summaries of interest are obtained using Markov Chain Monte Carlo (MCMC) simulation methods such as the Gibbs sampling algorithm or the Metropolis-Hastings algorithm (Gelfand & Smith (1990); Chib & Greenberg (1995)), using the free OpenBugs software (Lunn et al. (2000)).

4 RESULTS

Among the 348 train conductors, we observed 213 with zero accidents, that is, workers without accidents. Thus, the proportion of zeros is 0.6121 (61.21%) and the proportion of individuals with accidents is 1-p=0.3879 (38.79%).

4.1 First response of interest: accident rate per months worked

Since all positive accident rates per month worked lie in the interval (0,1) (the number of accidents per worker, a rare event, is always smaller than the number of months worked in the company), we work with the rates on the logit scale, that is, with responses y=log[rate/(1-rate)].

We first assume the beta regression model of Section 3.2 without covariates, defined by (5) and (6). For the Bayesian analysis, we assume independent uniform prior distributions a~U(0, 100), b~U(0, 1000) and p~U(0, 1), where U(α, β) denotes a uniform distribution on the interval (α, β); these are non-informative prior distributions for the parameters a, b and p. Using the OpenBugs software, we first generated 11,000 Gibbs samples, which were discarded as burn-in to eliminate the effect of the initial values in the iterative simulation of the joint posterior distribution for a, b and p. Next, we simulated another 10,000 samples and kept every 10th, for a total of 1,000 samples used to compute the posterior summaries of interest. Convergence of the simulation algorithm was verified from plots of the samples generated for each parameter. Table 2 shows the posterior summaries of interest (posterior means, posterior standard deviations and 95% credible intervals for each parameter). The posterior means are the Bayes estimators of the parameters under a quadratic (squared-error) loss function.

Table 2
Posterior summaries (accident rates/months of work without covariates).
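Because the likelihood (7) factorizes into p^δ(1-p)^{1-δ} times a factor that does not involve p, with the U(0,1) prior the marginal posterior of p is proportional to p^{213}(1-p)^{135}, a Beta(214, 136) distribution with mean 214/350≈0.611. This makes p a convenient target for illustrating the burn-in and thinning scheme described above. The sketch below uses a random-walk Metropolis sampler in plain Python as a stand-in for OpenBugs (it is not the sampler OpenBugs uses); the step size, seed and function names are arbitrary assumptions.

```python
import math
import random

def log_post_p(p, n0=213, n_pos=135):
    """Log posterior of p up to a constant: p^n0 (1-p)^n_pos times a U(0, 1) prior."""
    if not 0.0 < p < 1.0:
        return -math.inf                      # outside the support of the uniform prior
    return n0 * math.log(p) + n_pos * math.log(1.0 - p)

def metropolis(log_post, start, burn_in=11_000, n_kept=1_000, thin=10, step=0.05, seed=2014):
    """Random-walk Metropolis: discard burn_in draws, then keep every thin-th draw."""
    rng = random.Random(seed)
    x, lp = start, log_post(start)
    kept = []
    for it in range(burn_in + n_kept * thin):
        prop = x + rng.gauss(0.0, step)       # symmetric proposal
        lp_prop = log_post(prop)
        if math.log(1.0 - rng.random()) < lp_prop - lp:   # accept with prob min(1, ratio)
            x, lp = prop, lp_prop
        if it >= burn_in and (it - burn_in + 1) % thin == 0:
            kept.append(x)
    return kept

samples = metropolis(log_post_p, start=0.5)
print(len(samples), sum(samples) / len(samples), 214 / 350)
```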

The conditional mean (only for workers with accidents) of the beta distribution (5) is a/(a+b)=4.571/(4.571+98.22)=0.04446887, while the conditional sample mean obtained from the data is 5.99722/135=0.04442385. The proportion of zeros estimated by the model is 0.6113, and the proportion of zeros in the sample is 0.6121; we conclude that the proposed model fits the data well.

The unconditional sample mean obtained from the data is 0.0172334, and the unconditional mean estimated by the model is µ(1-p)=0.0172850, where µ=a/(a+b)=4.571/(4.571+98.22)=0.04446887, again indicating a good fit of the model to the data.
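The comparisons above amount to simple arithmetic, reproduced below from the quoted numbers. The conditional sample mean divides the sum of the positive rates, 5.99722, by the 135 conductors with accidents (348-213), while the unconditional mean divides the same sum by all 348 conductors, since the zeros add nothing to the sum.

```python
# Posterior means quoted in the text for the model without covariates (Table 2).
a_hat, b_hat, p_hat = 4.571, 98.22, 0.6113
mu_hat = a_hat / (a_hat + b_hat)          # conditional mean a/(a+b)
overall_hat = mu_hat * (1.0 - p_hat)      # overall ZIB mean mu(1-p)

# Sample quantities quoted in the text.
n, n_zero, sum_rates = 348, 213, 5.99722
cond_sample = sum_rates / (n - n_zero)    # mean over the 135 conductors with accidents
overall_sample = sum_rates / n            # mean over all 348 conductors

print(f"conditional mean: model {mu_hat:.6f}, sample {cond_sample:.6f}")
print(f"overall mean:     model {overall_hat:.6f}, sample {overall_sample:.6f}")
print(f"zero proportion:  model {p_hat:.4f}, sample {n_zero / n:.4f}")
```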

With the covariates detachment distances, conductor age, conductor company time, marital status, number of train rides, sum of work months and hours conducting trains, we assume the regression model defined by (5), (6), (8) and (9), that is,

$$\operatorname{logit}(\mu_i)=\log\frac{\mu_i}{1-\mu_i}=\beta_0+\beta_1\,\text{detachment.distances}_i+\beta_2\,\text{age}_i+\beta_3\,\text{job.time}_i+\beta_4\,\text{marital.status}_i+\beta_5\,\text{trip.count}_i+\beta_6\,\text{work.months}_i+\beta_7\,\text{hours.conducting.trains}_i \tag{10}$$

where i=1, ..., 348. For the Bayesian analysis, we assume independent normal prior distributions for the regression parameters, β0~N(0, 1) and βj~N(0, 0.1), j=1, ..., 7, where N(0, 1) denotes a normal distribution with mean zero and variance one, and uniform prior distributions p~U(0, 1) and Φ~U(1, 10) for the parameters p and Φ. Again using the OpenBugs software, with a burn-in of 11,000 samples followed by 50,000 samples of which every 50th was kept (1,000 samples in total), we obtain the posterior summaries of interest shown in Table 3.

Table 3
Posterior summaries (accident rates/months of work).

From the results in Table 3, we conclude that the covariate number of train trips (trip count) shows evidence of being a significant covariate for the response accident rates/months of work, because zero is not included in the 95% credibility interval for the regression parameter β5, (-0.6222; -0.0664). The estimate of β5 is negative (-0.2892); thus, as the number of train trips increases, the accident rate per month worked decreases. The covariates detachment distances and age also show some evidence of effects on the response, since zero lies near the upper limit of the 95% credibility interval for β1, (-0.4384; 0.0981), and near the lower limit of the interval for β2, (-0.0868; 0.6272). The covariate detachment distances has a negative effect (β1 is estimated by a negative value): increasing the distances between stops decreases the accident rate per month worked. The covariate age has a positive effect (β2 is estimated by a positive value): increasing the age of the driver increases the accident rate per month worked. All other covariates show no significant effects on the response, as zero is included and well centered in the corresponding 95% credibility intervals.
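The decision rule used in the paragraph above, reading a covariate as relevant when zero falls outside its credibility interval, can be made explicit. The helpers `excludes_zero` and `zero_position` below are hypothetical; `zero_position` reports where zero falls inside an interval as a fraction of its length (0 at the lower limit, 1 at the upper limit), which quantifies statements such as zero lying near one end of the interval. The intervals are the 95% intervals quoted in the text.

```python
def excludes_zero(lower, upper):
    """True when the interval (lower, upper) lies entirely on one side of zero."""
    return lower > 0.0 or upper < 0.0


def zero_position(lower, upper):
    """Relative position of zero in (lower, upper): 0 = lower limit, 1 = upper limit."""
    return (0.0 - lower) / (upper - lower)


# 95% credibility intervals quoted in the text for the accident rate model.
intervals = {
    "beta1 (detachment distances)": (-0.4384, 0.0981),
    "beta2 (age)": (-0.0868, 0.6272),
    "beta5 (trip count)": (-0.6222, -0.0664),
}

for name, (lo, hi) in intervals.items():
    if excludes_zero(lo, hi):
        print(f"{name}: zero outside the interval")
    else:
        print(f"{name}: zero inside, at relative position {zero_position(lo, hi):.2f}")
```

For β1 zero sits about 82% of the way along the interval and for β2 about 12%, which is the sense in which these intervals almost exclude zero.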

4.2 Second response of interest: number of accidents per train conductor

Considering now the counts of accidents per train conductor, where zero (no accidents) is the most frequent value, we analyze the count data with the zero-inflated Poisson (ZIP) model introduced in Section 3.1.

Initially we assume the zero-inflated Poisson (ZIP) model defined in Section 3.1 without covariates, given by (2) and (3). For a Bayesian analysis, we assume independent prior distributions, a gamma prior µ~G(0.1, 0.1) and a uniform prior p~U(0, 1), where G(α, β) denotes a gamma distribution with mean α/β and variance α/β². Thus, we are assuming non-informative prior distributions for the parameters µ and p. Using the OpenBUGS software, we initially generated 1,000 Gibbs samples, which were discarded (burn-in) to eliminate the effect of the initial values on the iterative simulation of samples from the joint posterior distribution of µ and p. Next, we simulated another 10,000 samples and kept every 10th generated sample, giving a total of 1,000 samples used to obtain the posterior summaries of interest. The convergence of the simulation algorithm was verified from plots of the samples generated for each parameter. Table 4 presents the posterior summaries of interest.
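The claim that G(0.1, 0.1) is a vague prior for µ follows from the moment formulas just given: the prior mean is 1 while the prior variance is 10. A minimal check in Python, assuming the shape-rate form stated in the text (the function name `gamma_moments` is hypothetical):

```python
import math


def gamma_moments(alpha, beta):
    """Mean alpha/beta and variance alpha/beta**2 of G(alpha, beta), shape-rate form."""
    return alpha / beta, alpha / beta ** 2


mean, var = gamma_moments(0.1, 0.1)  # prior for the Poisson mean mu
print(f"prior mean = {mean:.3f}, prior variance = {var:.3f}, prior sd = {math.sqrt(var):.3f}")
```

A prior standard deviation of about 3.16, more than three times the prior mean, spreads the prior mass over a wide range of values of µ.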

Table 4
Posterior summaries (accident counts without covariates).

The conditional mean µ of the Poisson distribution (2) (only for workers with at least one recorded accident) is estimated to be 1.384. The conditional sample mean obtained from the data is 187/135=1.38519. The proportion of zeros estimated by the model is 0.6115, close to the sample proportion of zeros, 0.6121; we conclude that the model fits the data well.

The unconditional sample mean obtained from the data is 0.5374, and the unconditional mean estimated by the model is E(Y)=µ(1-p)=1.384(1-0.6115)=0.537684, again indicating an excellent fit of the model to the data.
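As for the beta model, the comparison above can be reproduced directly. The sketch assumes 187 accidents in total, recorded for 135 of the 348 conductors (the figures behind the sample means quoted above); the function name `zip_mean` is hypothetical.

```python
def zip_mean(mu, p):
    """Unconditional mean of the ZIP model as used in the text: E(Y) = mu * (1 - p)."""
    return mu * (1.0 - p)


total_accidents, n_nonzero, n_total = 187, 135, 348

model_mean = zip_mean(1.384, 0.6115)              # posterior means from Table 4
sample_conditional = total_accidents / n_nonzero  # mean count among conductors with accidents
sample_unconditional = total_accidents / n_total  # mean count over all conductors

print(f"conditional sample mean = {sample_conditional:.5f}")
print(f"unconditional: model {model_mean:.6f}, sample {sample_unconditional:.4f}")
```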

In the presence of the covariates detachment distances, conductor age, conductor time in the company, marital status, number of train trips, months of work and hours conducting trains, we assume the regression model defined by (2), (3) and µ=exp(x'β), where x'β=β0 + β1 x1 + β2 x2 + ... + βp xp, that is,

$$\log(\mu_i) = \beta_0 + \beta_1\,\text{detachment.distances}_i + \beta_2\,\text{age}_i + \beta_3\,\text{job.time}_i + \beta_4\,\text{marital.status}_i + \beta_5\,\text{trip.count}_i + \beta_6\,\text{work.months}_i + \beta_7\,\text{hours.conducting.trains}_i \qquad (11)$$

where i=1, ..., 348. For a Bayesian analysis, we assume independent normal prior distributions for the regression parameters, that is, β0~N(0, 1) and βj~N(0, 0.1), j=1, ..., 7, and a uniform prior distribution p~U(0, 1) for the parameter p. Again using the OpenBUGS software, with a longer simulation scheme (311,000 samples as burn-in, and 1,000 additional samples chosen from a total of 400,000 samples by keeping one in every 100), we obtain the posterior summaries of interest shown in Table 5.
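The likelihood behind model (11) can likewise be sketched. This assumes the standard ZIP form (see, for example, Hall 2000), P(Y=0) = p + (1 - p)e^(-µ) and P(Y=k) = (1 - p)e^(-µ)µ^k/k! for k ≥ 1, with a constant p and the log link µ_i = exp(x_i'β); it illustrates that standard form rather than transcribing (2) and (3). The function names are hypothetical.

```python
import math


def zip_logpmf(y, mu, p):
    """Log pmf of a standard zero-inflated Poisson with zero-inflation probability p."""
    if y == 0:
        # A zero can come from the structural-zero state or from the Poisson part.
        return math.log(p + (1.0 - p) * math.exp(-mu))
    return math.log(1.0 - p) - mu + y * math.log(mu) - math.lgamma(y + 1)


def zip_regression_loglik(beta, p, X, y):
    """Sum of ZIP log pmfs with log link mu_i = exp(b0 + sum_j bj * x_ij).

    Each row of X holds the seven covariates of (11) in order, without the intercept.
    """
    total = 0.0
    for x_i, y_i in zip(X, y):
        eta = beta[0] + sum(bj * xj for bj, xj in zip(beta[1:], x_i))
        total += zip_logpmf(y_i, math.exp(eta), p)
    return total
```

Summing `exp(zip_logpmf(k, mu, p))` over k recovers 1, and the corresponding mean recovers µ(1 - p), the expression used for E(Y) in this section.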

Table 5
Posterior summaries (accident counts with covariates).

We conclude that the covariate hours conducting trains shows some evidence of being a significant covariate for the accident count response: zero lies just outside the 80% credibility interval for the regression parameter β7, whose upper limit (-0.0006) is very close to zero. The Bayesian estimate of β7 is negative; thus, as the hours conducting trains increase, the accident count decreases. This suggests that more experienced conductors have fewer accidents on the railway. All other covariates show no significant effects on the response, as zero is included and well centered in the corresponding 80% credibility intervals.

5 CONCLUDING REMARKS

This study identified some personal and professional factors of train conductors of a railway logistics company associated with the occurrence of accidents from 2014 to 2016, where the responses of interest are the accident rate per month of work (number of accidents/months of work) and the accident count. From a statistical analysis of the dataset using Beta and Poisson regression models in the presence of excess zeros, under a Bayesian approach with MCMC simulation methods, it was possible to identify the following important results:

  • For the response accident rates/months of work, the covariate number of train trips (trip count) shows evidence of being a significant covariate. The covariates detachment distances and conductor age also show some evidence of effects on the accident rates/months of work.

  • For the response accident count, the covariate hours conducting trains shows some evidence that it is a significant covariate in the accident count.

These results could be of great interest to the managers of the railway logistics company for improving railway safety.

The proposed Beta and Poisson regression models could also be applied in other situations, especially industrial and transport accidents, where there are many zero values (no occurrence of accidents) among workers. It is important to point out that standard statistical models that ignore the excess of zeros are not appropriate for the analysis of this type of data.

Under a Bayesian approach using MCMC methods, it is possible to obtain accurate inferences for the assumed mixture models. The free software OpenBUGS simplifies the simulation of samples from the joint posterior distribution and does not require great computational knowledge. Another advantage of the Bayesian approach in applications is the possibility of using informative prior distributions elicited from experts in the railway sector, which could lead to more accurate results.

References

  • AHER SB & TIWARI DR. 2018. Railway disasters in India: causes, effects and management. Int J Rev Res Soc Sci, 6(2): 125-32.
  • ARAÚJO MCM & DA SILVA SOUSA MA. 2023. Ferrovias brasileiras: Histórico e processo de estagnação. Engenharia, Gestão e Inovação, 8: 42-51.
  • ARRUDA J, FRANCO L & VASCONCELOS L. 2022. Redução de custo com inspeção e manutenção em rodeiros de vagões ferroviários: Um estudo de casos nos rolamentos de rodeiros ferroviários. In: Gestão e Manutenção Industrial e Mineração, vol. 2. Poisson.
  • BALA M & BHASIN A. 2018. A Review on Analysis of Railway Traffic Accident with Data Mining Techniques. International Journal of Computer Sciences and Engineering, 6(6): 1251-1256.
  • BOX GE & TIAO GC. 1973. Bayesian inference in statistical analysis. Reading, MA: Addison-Wesley.
  • CEPEDA-CUERVO E, ACHCAR JA & LOPERA LG. 2014. Bivariate beta regression models: joint modeling of the mean, dispersion and association parameters. Journal of Applied Statistics, 41(3): 677-687.
  • CHIB S & GREENBERG E. 1995. Understanding the metropolis-hastings algorithm. The American Statistician, 49(4): 327-335.
  • DA SILVA LRL, ACHCAR JA & HERMOSILLA JLG. 2021. Longitudinal work ability index (wai): a case study with workers in an agricultural research company. Independent Journal of Management & Production, 12(2): 470-485.
  • DE ALMEIDA ELEUTÉRIO G & ROSA RA. 2023. Planejamento das rotas dos recursos ferroviários para realização da manutenção ferroviária considerando sincronismo, precedência e prioridade. Transportes, 31(2): e2644-e2644.
  • DHAYGUDE A, DEOKAR Y, BASTE N, DENGALE R & DARADE M. 2019a. Statistical Analysis of Railway Accidents. International Research Journal of Engineering and Technology (IRJET), 6: 4539-4544.
  • DHAYGUDE A, DEOKAR Y, BASTE N, DENGALE R & DARADE M. 2019b. Study of railway accidents. International Journal of Scientific & Engineering Research (IJSER), 10(4): 223-230.
  • ERDMAN D, JACKSON L, SINKO A et al. 2008. Zero-inflated Poisson and zero-inflated negative binomial models using the COUNTREG procedure. SAS Global Forum, 2008: 1-11.
  • EVANS AW. 2011. Fatal train accidents on Europe’s railways: 1980-2009. Accident Analysis & Prevention, 43(1): 391-401.
  • FERRARI S & CRIBARI-NETO F. 2004. Beta regression for modelling rates and proportions. Journal of Applied Statistics, 31(7): 799-815.
  • GELFAND AE & SMITH AF. 1990. Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association, 85(410): 398-409.
  • GEORGIOU I. 2009. Mapping railway development prospects in Brazil. Transport Reviews, 29(6): 685-714.
  • HALL DB. 2000. Zero-inflated Poisson and binomial regression with random effects: a case study. Biometrics, 56(4): 1030-1039.
  • HONG WT, CLIFTON G & NELSON JD. 2023. Railway accident causation analysis: current approaches, challenges and potential solutions. Accident Analysis & Prevention, 186: 107049.
  • JØRGENSEN B. 1997. Proper dispersion models. Brazilian Journal of Probability and Statistics, 11(2): 89-128.
  • KERETCH EM & DE PAIVA CEL. 2016. Rail Accidents Caused by Failure on the Permanent Way. Journal of Traffic and Logistics Engineering, 4(1). Available at: https://api.semanticscholar.org/CorpusID:115020659
  • KYRIAKIDIS M, PAK KT & MAJUMDAR A. 2015. Railway accidents caused by human error: historic analysis of UK railways, 1945 to 2012. Transportation Research Record, 2476(1): 126-136.
  • LIU Y, LI K & YAN D. 2024. Quantification analysis of potential risk in railway accidents: A new random walk based approach. Reliability Engineering & System Safety, 242: 109778.
  • LUNN DJ, THOMAS A, BEST N & SPIEGELHALTER D. 2000. WinBUGS-a Bayesian modelling framework: concepts, structure, and extensibility. Statistics and Computing, 10: 325-337.
  • MONTGOMERY DC & RUNGER GC. 2010. Applied Statistics and Probability for Engineers. John Wiley & Sons.
  • MURTA ALS, MURTA MDPA, DE FREITAS MS & DAS NEVES FMDA. 2023. CO2 reduction in brazilian road and rail transport. Revista Valore, 8: 8015. Available at: https://api.semanticscholar.org/CorpusID:258751889
  • RAD MA, LEFSRUD LM & HENDRY MT. 2023. Application of systems thinking accident analysis methods: A review for railways. Safety Science, 160: 106066.
  • SAN KIM D & YOON WC. 2013. An accident causation model for the railway industry: Application of the model to 80 rail accident investigation reports from the UK. Safety Science, 60: 57-68.
  • SCOTT LONG J. 1997. Regression Models for Categorical and Limited Dependent Variables. Advanced Quantitative Techniques in the Social Sciences, 7.
  • SHAPPELL SA & WIEGMANN DA. 2000. The human factors analysis and classification system - HFACS, US Department of Transportation, 1-15.
  • SHI J, WANG Y & ZHENG W. 2020. Correlation Analysis of Causes of Railway Accidents Based on Improved Apriori Algorithm. In: 2020 13th International Symposium on Computational Intelligence and Design (ISCID). pp. 274-277. IEEE.
  • SOUZA APD, TAVARES FM, BONIKOWSKI RTR & PEREIRA VHM. 2019. Reduction of number of railroad accidents with locomotive as the main cause. Final Project (Rail Transportation Management) - Deutsche Bahn; Instituto de Transporte e Logística, Brasília.
  • SOUZA E, BITTENCOURT T, AMES I, RIBEIRO D & CARVALHO H. 2023. Drive-by damage detection methodology for high-speed railway bridges applying Mel-frequency cepstral coefficients. In: Life-Cycle of Structures and Infrastructure Systems. pp. 252-259. CRC Press.
  • TAVARES FM, HERMOSILLA JLG, ACHCAR JA & DA SILVA E. 2021. O perfil do trabalhador e sua relação com a ocorrência de atos inseguros: o caso de maquinistas de trens de uma empresa de logística. Conjecturas, 21(3): 98-121.
  • WANG N, YANG X, CHEN J, WANG H & WU J. 2023. Hazards correlation analysis of railway accidents: A real-world case study based on the decade-long UK railway accident data. Safety Science, 166: 106238.
  • WANG Y, ZHENG W, DONG H & GAO P. 2020. Factors correlation mining on railway accidents using association rule learning algorithm. In: 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC). pp. 1-6. IEEE.
  • WASNIK R. 2010. Analysis of railway fatalities in central India. Governing Council 2010-2012, 32: 311.
  • YAN D, LI K, ZHU Q & LIU Y. 2023. A railway accident prevention method based on reinforcement learning-Active preventive strategy by multi-modal data. Reliability Engineering & System Safety, 234: 109136.

APPENDIX 1

Publication Dates

  • Publication in this collection
    02 Sept 2024
  • Date of issue
    2024

History

  • Received
    15 Jan 2024
  • Accepted
    21 May 2024
Sociedade Brasileira de Pesquisa Operacional Rua Mayrink Veiga, 32 - sala 601 - Centro, 20090-050 Rio de Janeiro RJ - Brasil, Tel.: +55 21 2263-0499, Fax: +55 21 2263-0501 - Rio de Janeiro - RJ - Brazil
E-mail: sobrapo@sobrapo.org.br