Testing and preventive maintenance scheduling optimization for aging systems modeled by generalized renewal process
Vinícius Correa Damaso (I, *); Pauli Adriano de Almada Garcia (II)
(I) Centro Tecnológico do Exército (CTEx), Rio de Janeiro, RJ. damaso@ctex.eb.br
(II) Universidade Federal Fluminense (UFF), Volta Redonda, RJ. pauliadriano@gmail.com
ABSTRACT
Stochastic point processes are routinely used to model the reliability of repairable systems and to establish survival measures in failure versus repair scenarios. However, the traditional processes do not consider the actual state in which an item returns to operational condition. The traditional renewal process assumes an "as-good-as-new" philosophy, while a non-homogeneous Poisson process is based on the minimal repair concept. In this work, an approach based on the Generalized Renewal Process (GRP) is presented, which generalizes both the renewal process and the non-homogeneous Poisson process. A stochastic model is presented for system availability analysis, including the scheduling of testing and/or preventive maintenance. To validate the proposed approach, a case study was performed on a hypothetical auxiliary feed-water system of a nuclear power plant, using a genetic algorithm as the optimization tool.
Keywords: availability; generalized renewal process; genetic algorithms.
RESUMO
O uso de processos estocásticos pontuais para modelar a confiabilidade de sistemas reparáveis tem sido constante para estabelecer medidas de sobrevivência em cenários de falha versus reparo. Entretanto, os processos tradicionais não consideram o real estado no qual um item retorna à condição operacional. O processo de renovação tradicional considera uma filosofia de "tão-bom-quanto-novo", enquanto um processo não-homogêneo de Poisson é baseado no conceito de reparo mínimo. Neste trabalho, é apresentada uma abordagem baseada no conceito de Processo de Renovação Generalizado (PRG), que é uma generalização de processo de renovação e de processo não-homogêneo de Poisson. Uma modelagem estocástica será apresentada para análise de disponibilidade de sistemas, incluindo planejamento de testes e/ou manutenção preventiva. Para validar a abordagem proposta, um estudo de caso foi desenvolvido para um hipotético sistema auxiliar de água de alimentação de uma usina nuclear, usando algoritmo genético como ferramenta de otimização.
Palavras-chave: disponibilidade; processo de renovação generalizado; algoritmos genéticos.
1. Introduction
Evolutionary techniques are well suited to optimizing the testing and/or preventive maintenance schedule of system components because such problems are typically combinatorial. The combinatorial aspect arises because the best policy must be obtained simultaneously for all of the system's components in order to maximize the availability of the system under analysis, which depends on the availability of each of its components. Therefore, this kind of problem requires a superimposed stochastic process that combines the stochastic processes of the individual components.
With the objective of maximizing the availability of safety systems of nuclear power plants, different approaches using genetic algorithms as the solution search tool have been adopted. The focus of these approaches is to optimize the periodic testing and preventive maintenance scheduling (Sanchez et al., 2009; Garcia et al., 2008; Rao et al., 2007; Martorell et al., 2007; Damaso et al., 2007; Martorell et al., 2006; Garcia et al., 2005; Martorell et al., 2000; Lapa et al., 2000). Some of these references deal with the problem through a multiobjective optimization approach. In the present work, no criterion other than availability is considered, i.e., safety and cost are not treated as objectives to be optimized.
Therefore, in the present paper, the availability of each component is modeled individually by means of a generalized renewal process (Kijima & Sumita, 1986). The modeling of components and system availability, as well as the genetic modeling, is applied to a hypothetical nuclear power plant safety system, more precisely an auxiliary feed-water system (AFWS). The main objective is to maximize the mission (average) availability along the operation time of the system, through the establishment of a testing policy, using a genetic algorithm (GA) as an auxiliary optimization tool. It is considered that during the tests some minor preventive maintenance actions are performed, contributing to the renewal process.
2. Generalized Renewal Process
A system is considered repairable if, after its failure, it can be restored to operating condition by any procedure other than its complete replacement. The probabilistic models most commonly used to deal with repair actions are the renewal process (RP) and the non-homogeneous Poisson process (NHPP). Both are stochastic point processes, that is, the repair times are negligible compared with the time to failure and are not considered in the reliability analysis.
The probabilistic model used in this work to approach the imperfect repair actions is the generalized renewal process (GRP). Nevertheless, it is necessary to define the concept of virtual age (Vn) to have a complete understanding of GRP.
The virtual age, Vn, corresponds to the estimated age of the equipment item after the n-th repair action. Kijima & Sumita (1986) proposed two types of virtual age models. The first, expressed by Equation 1, is commonly named the Type I Kijima model and consists essentially of the idea that the n-th repair action only removes damage that occurred in the exposure interval after the (n-1)-th repair action. Thus, the system's virtual age increases proportionally with time:
$$V_i = V_{i-1} + q\,Y_i \qquad (1)$$
where Vi is the virtual age immediately after the i-th repair action and Yi is the time between the (i-1)-th and i-th failures.
The Type II Kijima model, expressed by Equation 2, assumes that the repair acts to recover the system from the damage accumulated over all previous exposure intervals since the beginning of operation. In this model, the virtual age undergoes proportional increments throughout the accumulated exposure interval.
$$V_i = q\,(V_{i-1} + Y_i) \qquad (2)$$
In both the Type I and Type II Kijima models, the parameter q can be defined as a repair effectiveness factor, which corresponds to the quality of an intervention involving the component. The parameter q can be related to the types of repair as follows: q = 0 corresponds to a perfect repair, as the virtual age is reset to zero after every repair action; q = 1 corresponds to a minimal repair, and the virtual age is exactly equal to the actual age; and 0 < q < 1 corresponds to an imperfect repair, and the virtual age is a fraction of the actual age.
Figure 1, adapted from Jacopino (2005), shows a schematic representation of the relationship between actual age and virtual age, where yi and xi correspond to the component's virtual age at the moment immediately before and after the i-th repair, respectively.
Other values for the parameter q are also possible, such as q < 0 and q > 1, which correspond to an intervention that causes the item to become "better-than-new" in the first case and "worse-than-old" in the second. However, in this work, only values in the interval [0,1] are considered.
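The two virtual age updates can be illustrated with a minimal Python sketch. The function name `virtual_ages` and its arguments are illustrative assumptions, not part of the original work; the updates follow the standard Kijima Type I and Type II recursions described above.

```python
def virtual_ages(intervals, q, model="I", v0=0.0):
    """Return the virtual age after each repair (hypothetical helper).

    intervals: times between successive failures (Y_1, Y_2, ...)
    q:         repair effectiveness factor
    model:     "I" for Kijima Type I, "II" for Kijima Type II
    v0:        initial virtual age
    """
    if model not in ("I", "II"):
        raise ValueError("model must be 'I' or 'II'")
    ages = []
    v = v0
    for y in intervals:
        if model == "I":
            # Type I: the repair only affects damage from the last interval
            v = v + q * y
        else:
            # Type II: the repair affects all damage accumulated so far
            v = q * (v + y)
        ages.append(v)
    return ages


if __name__ == "__main__":
    print(virtual_ages([10, 10, 10], q=0.5, model="I"))
    print(virtual_ages([10, 10, 10], q=0.5, model="II"))
```

With q = 0 both models return zero after every repair (perfect repair), and with q = 1 both return the actual accumulated age (minimal repair), which matches the interpretation of q given above.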
Based on the virtual age concept, it is possible to evaluate the conditional probability of the component's failure by Equation 3:
$$F(t \mid V_i) = \frac{F(V_i + t) - F(V_i)}{1 - F(V_i)} \qquad (3)$$
where F(t) is the cumulative distribution function.
Assuming, without loss of generality, that the times between failures are modeled by a Weibull distribution, the GRP parameter estimation problem consists of estimating the scale (α) and shape (β) parameters of the Weibull distribution, and the parameter q, according to the model shown in Equation 4.
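As a hedged illustration, the conditional failure probability can be written for the Weibull case as follows. The sketch assumes the common scale parameterization F(t) = 1 - exp[-(t/α)^β], under which conditioning on survival up to virtual age v gives F(t | v) = 1 - exp[(v/α)^β - ((v + t)/α)^β]. The function names are illustrative, and the exact form printed in the original Equation 4 may be arranged differently.

```python
import math


def weibull_cdf(t, alpha, beta):
    """Weibull cdf with scale alpha and shape beta (assumed parameterization)."""
    return 1.0 - math.exp(-((t / alpha) ** beta))


def conditional_failure_prob(t, v, alpha, beta):
    """Probability of failing within t more time units, given survival
    up to virtual age v (Weibull times between failures)."""
    return 1.0 - math.exp((v / alpha) ** beta - ((v + t) / alpha) ** beta)


if __name__ == "__main__":
    # Illustrative values only, not taken from Table 1
    print(conditional_failure_prob(t=30.0, v=50.0, alpha=100.0, beta=2.0))
```

For v = 0 the expression reduces to the ordinary Weibull cdf, and for β = 1 (constant failure rate) it no longer depends on the virtual age.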
3. Availability Modeling
A preventive maintenance policy plays an important role in achieving higher levels of availability in any industrial system. However, in many industrial plants there are safety systems whose components are in standby mode. In standby mode, there is no way to know a priori whether or not the system's components are in operational condition. Therefore, periodic tests should be scheduled so that hidden failures can be revealed and the proper maintenance procedures taken.
To model the availability of a system subject to a periodic testing policy (with minor preventive actions), the following considerations were made:
i. For each repairable component, a binary availability model is considered (available or unavailable);
ii. After a test, the component does not necessarily return to an as-good-as-new condition;
iii. There is a probability that a test will show the failure of a component and that it needs to be repaired (corrective maintenance), while the component remains unavailable not only during the testing interval, but also during the repair time;
iv. The intervals between tests are not necessarily constant.
Based on these assumptions, the model to calculate the availability of components as a function of a testing policy is presented (Damaso, 2006), considering that the system is on standby during the entire mission time. Traditionally, when the downtime due to an intervention can be neglected in comparison with the mission time, one can consider that availability tends to be the same as reliability (Lewis, 1987). In the present case, this consideration cannot be made because the model considers downtimes that, a priori, are not negligible. This downtime concerns only the test, and it must be noted that the component can be found in a failed condition or can fail during the test period. If one also considers the mean time to repair (MTTR) associated with a revealed failure or a failure occurring during a test period, the downtime will be even higher.
Assuming, without loss of generality, that the component failure process is given by a power law model, and that the times between failures are modeled by a Weibull distribution, then the availability until the first test, A0(t), is given by
where T'i < t < Ti is the i-th test interval.
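The cumulative distribution function, F(t), is given by Equation 6 and the failure rate, λ(t), is given by Equation 7.
$$F(t) = 1 - \exp\left[-\left(\frac{t}{\alpha}\right)^{\beta}\right] \qquad (6)$$
$$\lambda(t) = \frac{\beta}{\alpha}\left(\frac{t}{\alpha}\right)^{\beta - 1} \qquad (7)$$
where α is the scale parameter and β is the shape parameter of a Weibull distribution.
Note that any other distribution function could be used instead.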
In the availability equation, one should consider the probability of a component being unavailable after the test and, consequently, of needing repair. This probability, PÃ, corresponds to the component's unavailability at the moment immediately preceding the beginning of the test, plus the probability of failure during the test interval, given that the component was available at its beginning. Identifying each PÃ with its corresponding cycle up to the i-th test, one has
or
where the notation f(t → x-) means the limit of the function f(t) when t tends to x from the left.
The availability expression can be generalized to the i-th term Ai(t), as
where i = 1, 2,..., n; n is the number of tests; ν is the repair rate; and F(t) follows Equation 3.
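Thus, the mission availability of the component, A*, is given by
$$A^{*} = \frac{1}{T_f}\int_{0}^{T_f} A(t)\,dt \qquad (11)$$
where Tf is the mission time and A(t) is the availability function formed by the terms Ai(t) over their respective intervals.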
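Since the mission availability is the time average of the availability function over the mission, it can be approximated numerically. The following is a minimal sketch using the composite trapezoidal rule; the function name `mission_availability` and the example availability curve are assumptions for illustration, not the component model of this paper.

```python
import math


def mission_availability(avail, t_f, steps=10_000):
    """Average avail(t) over [0, t_f] with the composite trapezoidal rule."""
    h = t_f / steps
    total = 0.5 * (avail(0.0) + avail(t_f))
    for k in range(1, steps):
        total += avail(k * h)
    return total * h / t_f


if __name__ == "__main__":
    # Hypothetical untested component with an exponential survival curve
    print(mission_availability(lambda t: math.exp(-t / 1000.0), 540.0))
```

For an availability curve with abrupt drops at the test instants, a finer grid (or integrating each interval separately) would be advisable, because the trapezoidal rule smooths over discontinuities.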
4. Genetic Modeling
The genetic algorithm (GA) is used to find an optimized test schedule for each component, in order to optimize the whole system's availability over the mission time. The instants selected by the GA for the tests to be performed on a certain component should follow a distribution pattern. For instance, it is reasonable to suppose that the interval between maintenances will become shorter as the system ages. This assumption motivates a model that looks for periodic test plans with some ordering in the distribution of test instants (Damaso, 2006). The benefit of such an approach is to limit the universe of solutions to be considered, eliminating those that do not make practical sense. Thus, the search process is more efficient and takes less computational time.
In this work, a proportional distribution is adopted, where the intervals between tests follow a geometric progression (GP). The first interval, ΔT1, starts at the beginning of operation (t = 0) and goes until the final instant of the first intervention, T1:
$$\Delta T_1 = T_1 \qquad (12)$$
The subsequent intervals are given as
$$\Delta T_{i+1} = r\,\Delta T_i \qquad (13)$$
where i = 1, 2,..., n; ΔTi is the i-th time interval; and r is the proportionality factor (common ratio of the GP).
A unit value of r means that the test instants are distributed evenly, that is, the intervals between interventions are all the same.
The last interval, ΔTn+1, is set between the final instant of the last intervention, Tn, and the operation end, Tf, and it is given as
$$\Delta T_{n+1} = T_f - T_n \qquad (14)$$
Considering that n tests are foreseen during the component's operating time, the expression to calculate ΔT1, as a function of Tf, n and r, is given by
Thus, the other intervals can be evaluated starting from the known value of ΔT1, by Equation 13.
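A minimal sketch of this schedule construction is given below. It assumes that all n + 1 intervals (including the last one) belong to the same geometric progression and sum to Tf, which gives ΔT1 = Tf (1 - r) / (1 - r^(n+1)) for r ≠ 1 and ΔT1 = Tf / (n + 1) for r = 1. This closed form is a reconstruction from the description above, and the function name `gp_schedule` is hypothetical. The displacement parameter d is not modeled here.

```python
def gp_schedule(t_f, n, r):
    """Return (intervals, test_end_times) for n tests over mission time t_f.

    Assumes the n + 1 intervals form a geometric progression with ratio r
    that sums to t_f (a reconstruction, not the authors' code).
    """
    if n < 0 or r <= 0 or t_f <= 0:
        raise ValueError("require n >= 0, r > 0 and t_f > 0")
    if abs(r - 1.0) < 1e-12:
        first = t_f / (n + 1)              # evenly spaced tests
    else:
        first = t_f * (1.0 - r) / (1.0 - r ** (n + 1))
    intervals = [first * r ** k for k in range(n + 1)]
    # Final instant of each intervention: cumulative sum of the first n intervals
    times = []
    elapsed = 0.0
    for dt in intervals[:-1]:
        elapsed += dt
        times.append(elapsed)
    return intervals, times


if __name__ == "__main__":
    intervals, times = gp_schedule(540.0, 4, 0.9)
    print(intervals)
    print(times)
```

Values of r below 1 shorten the intervals as the mission progresses, which corresponds to testing more often as the component ages.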
For each component, the following values are determined: number of tests, n; proportionality factor, r; and displacement of the test scheduling, d, the latter allowing advancement or delay of the scheduled tests to avoid undesirable coincidences between component unavailabilities. Such a set of parameters is metaphorically called a phenotype, and it represents a candidate solution. The candidate solution is encoded in a structure called a genotype. One coding possibility is to use a superstructure in which 16 bits per parameter are reserved in the GA employed.
Figure 2 shows the schematic representation of this type of genotype/phenotype structure.
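The genotype-to-phenotype mapping can be sketched as follows for a single component, with 16 bits per parameter as stated above. The linear scaling and the parameter bounds in `BOUNDS` are hypothetical choices made only for illustration; the paper does not state the ranges or the decoding rule used.

```python
BITS = 16  # bits reserved per parameter, as stated in the text

# Hypothetical search ranges for (n, r, d); not taken from the paper
BOUNDS = {"n": (0.0, 20.0), "r": (0.5, 1.5), "d": (-5.0, 5.0)}


def decode_gene(bits, low, high):
    """Map a bit string linearly onto the interval [low, high]."""
    value = int(bits, 2)
    max_value = 2 ** len(bits) - 1
    return low + (high - low) * value / max_value


def decode_genotype(genotype, bounds=BOUNDS):
    """Split a bit string into 16-bit genes and decode each parameter."""
    if len(genotype) != BITS * len(bounds):
        raise ValueError("genotype length does not match the parameter count")
    phenotype = {}
    for k, (name, (low, high)) in enumerate(bounds.items()):
        gene = genotype[k * BITS:(k + 1) * BITS]
        phenotype[name] = decode_gene(gene, low, high)
    phenotype["n"] = int(round(phenotype["n"]))  # number of tests is an integer
    return phenotype


if __name__ == "__main__":
    print(decode_genotype("1000000000000000" * 3))
```

A full system genotype would concatenate one such block per component.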
5. Case Study
The present case study entails optimization of the periodic testing policies for an auxiliary feed-water system of a typical nuclear power plant, more precisely of the two-loop pressurized water reactor (PWR) type.
The auxiliary feed-water system (AFWS) is composed of two subsystems: one of them has a turbo-driven pump (TDP) with capacity to feed the two steam generators (SGs), and the other subsystem is composed of two motor-driven pumps (MDPs), each one with capacity to feed one of the SGs. In normal alignment, the pumps use the water supplied from the auxiliary feed-water tank (AFWT). In a typical PWR type plant, the AFWS should carry out the following basic functions:
- To feed the SGs in case of failure of the feed water system;
- To maintain the water level in the SGs in order to remove the heat generated by the reactor, while the power level remains higher than 10% or while there is residual heat being generated.
To simplify the analysis, but still maintain the necessary information to validate the proposed methodology, the following considerations were made:
- The feed water used by all the pumps comes only from the AFWT;
- The valve sets are grouped in a single valve in each feed line of the generators;
- The components of redundant groups possess the same structural and operational characteristics;
- The system is considered to have failed if it is not possible to feed both generators, that is, if at some moment the lines that feed one of the generators (or both) are not available;
- The SGs are not components of the AFWS.
The objective of the optimization process is to maximize the mission availability of the system, proposing test policies for the AFWT, the pumps and the valves. In this study, the mission time considered is 540 days and the test downtime, for all the components, is established as one day. Considering only the downtime associated with the tests, each one-day test of a component takes up approximately 0.18% of the mission time. To preserve the accuracy of the study, this figure cannot be neglected. If one also considers the MTTR associated with a hidden failure or a failure occurring during a test period, the downtime will be higher than that mentioned above. Note that these considerations apply to the present case study and justify working with availability rather than reliability.
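The downtime figure follows from simple arithmetic: one day out of 540 is about 0.185% of the mission time, and the share grows linearly with the number of tests. A small sketch, with a hypothetical helper name:

```python
def downtime_fraction(n_tests, test_days=1.0, mission_days=540.0):
    """Fraction of the mission time spent in tests (repairs not included)."""
    return n_tests * test_days / mission_days


if __name__ == "__main__":
    for n in (1, 5, 10):
        print(n, f"{downtime_fraction(n):.4%}")
```

The number of tests used in the loop is illustrative only; the actual schedule for each component is the one obtained by the optimization.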
Figure 3 shows, in a schematic form, the simplified structure of the AFWS used in this work.
The equation for the AFWS availability as a function of time is obtained starting from the availabilities of its components and is expressed by Equation 16.
Although it is not explicit in Equation 16, all of the component availabilities are also functions of time. The average value of this function during mission time is the target to be maximized by the GA.
Table 1 presents the parameters to calculate the availability of each component of the system, also considering initial virtual ages (V0). These values were estimated from data obtained from a typical nuclear power plant, using a maximum likelihood approach, as in Yañes, Jolgar & Modarres (2002). Other approaches to estimate the parameters can be found in Moura et al. (2008) and Jacopino (2005).
6. Results and Discussion
The optimization process was carried out through an extensive search using a simple genetic algorithm code. The fitness function (Fobj) was applied to minimize the average value of the system unavailability function. Fobj is given by
$$F_{obj} = \frac{1}{T_f}\int_{0}^{T_f} \tilde{A}_{AFWS}(t)\,dt \qquad (17)$$
where ÃAFWS(t) is the system unavailability function, defined by ÃAFWS(t) = 1 − AAFWS(t).
The best solution obtained from the testing policy optimization reduces the average system unavailability to 4.1981 × 10⁻⁴. The average unavailabilities of the components and of the system for this solution are shown in Table 2. This system average unavailability is the optimized value, i.e., after an extensive search, which included several runs with different genetic parameters, the genetic algorithm converged to it. For the best solution, the genetic parameters were: crossover probability of 0.9, mutation probability of 0.01, population size of 100, and 75 generations without change in the best value of the fitness function as the stopping criterion. The crossover type was PMX, as proposed by Goldberg (1989).
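To make the role of these parameters concrete, the following is a minimal binary GA sketch that uses the same crossover probability, mutation probability, population size and stagnation-based stopping rule. It is not the code used in this study: one-point crossover, binary tournament selection and elitism are swapped in here as simple assumptions in place of the PMX operator, and the toy fitness function is only a stand-in for the average system unavailability.

```python
import random


def tournament(pop, fitness, rng):
    """Binary tournament: return the better of two random individuals."""
    a, b = rng.sample(pop, 2)
    return a if fitness(a) <= fitness(b) else b


def run_ga(fitness, n_bits, pop_size=100, p_cross=0.9, p_mut=0.01,
           stall_limit=75, seed=0):
    """Minimize fitness over bit lists; stop after stall_limit generations
    without improvement of the best fitness value (requires n_bits >= 2)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = min(pop, key=fitness)
    best_fit = fitness(best)
    stall = 0
    while stall < stall_limit:
        children = [best[:]]                      # elitism (assumption)
        while len(children) < pop_size:
            p1 = tournament(pop, fitness, rng)
            p2 = tournament(pop, fitness, rng)
            if rng.random() < p_cross:
                cut = rng.randint(1, n_bits - 1)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation, applied independently to each gene
            child = [1 - g if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children
        gen_best = min(pop, key=fitness)
        gen_fit = fitness(gen_best)
        if gen_fit < best_fit:
            best, best_fit, stall = gen_best[:], gen_fit, 0
        else:
            stall += 1
    return best, best_fit


if __name__ == "__main__":
    # Toy objective: number of ones in the string (minimum is all zeros)
    print(run_ga(sum, n_bits=16))
```

In an application to the present problem, `fitness` would decode the bit string into the (n, r, d) values of every component and return the average system unavailability over the mission time.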
Substituting each component's mean availability in Equation 16 produces an unavailability of the whole system of 4.1981 × 10⁻⁴.
Table 3 displays the test scheduling of the AFWS components. This planning is related to the best solution obtained and indicates the number of days elapsed from the beginning of the mission until the test date (including this last one).
It should be noted that no test was scheduled for the AFWT during the system mission time. This fact is coherent with its constant failure rate, which has small value, indicating the high reliability of the component.
Since limitations regarding costs and maintenance teams were not considered, it is natural that, although the valves have lower failure rates than the pumps, the corresponding valves are tested whenever the pumps are tested. Another point to be highlighted is the coincidence of the schedules for the motor-driven pump lines. This is compatible with the expected behavior when there is no restriction on the minimum point unavailability. On the other hand, the fact that the number of tests for the turbo-driven pump is the same as for the motor-driven pumps can be attributed to a possible synchronism effect that makes it advantageous to maintain the same renewal frequency for the redundant lines, since this tendency was observed in the optimization process.
Figures 4 to 11 show the availability versus time for the components under the test plan given in Table 3. Figure 12 depicts the availability versus time for the AFWS, highlighting the mission availability obtained as the best solution.
Figure 13 shows a comparative view of the curves of the AFWS unavailability with the test policy proposed and for a situation without tests.
In the curves of the component availabilities, the instants when the tests happen are evident, because the availability value suddenly falls to zero. In the system availability curve (Figure 12), some points of abrupt fall are present, which correspond to the component tests.
However, at no time is the system's availability zero, which indicates that the testing policy always preserves at least one aligned feed line for each steam generator. Considering that this study deals with a safety system, this fact is considered extremely positive.
Figure 13 shows the benefit of the periodic testing policy. At the end of the mission time, the unavailability without testing would reach 1.5456 × 10⁻², while under the proposed policy (including the small preventive actions associated with the tests) it falls to 5.4066 × 10⁻⁴, a reduction of 96.5%.
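The quoted reduction can be checked directly from the two end-of-mission unavailability values reported above; the variable names below are illustrative.

```python
# End-of-mission unavailability values quoted in the text
without_tests = 1.5456e-2
with_tests = 5.4066e-4

# Relative reduction achieved by the proposed testing policy
reduction = (without_tests - with_tests) / without_tests

if __name__ == "__main__":
    print(f"{reduction:.1%}")
```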
At this point, it is interesting to mention that, although this has not been investigated here, there is a relationship among the parameters r (proportionality factor), β (Weibull shape parameter) and q (repair effectiveness factor). The closer to 1 the q parameter is, and the bigger the β parameter is, the greater will be the tendency to reduce the intervals between interventions over time (lower values of r). Further work should be carried out to study this relationship.
7. Conclusions
This work presented an approach to model availability of aging systems in standby mode, based on the generalized renewal process, to be used to optimize the scheduling of testing and preventive maintenance.
The genetic modeling employed included defining the number of tests and their distribution over the system's mission time for each of the components, considering that the intervals between tests would follow a geometric progression to compensate for aging.
The results are satisfactory and compatible with the system studied, demonstrating the effectiveness of the proposed approach for this kind of problem. Maximizing availability is especially crucial for safety systems.
As an extension of this work, it is suggested to consider costs and a limited number of maintenance teams, besides expanding the system studied to a larger number of components. Other optimization tools can be used in place of genetic algorithms.
Further work is underway to consider longer mission times, with the objective of verifying in more detail the aging effects on system availability and the proposed testing/maintenance policies. Another theme of interest is to apply the approach to systems that have simultaneous standby and operational components.
Acknowledgements
We thank the Brazilian Army Technological Center (Centro Tecnológico do Exército, CTEx) and Fluminense Federal University (Universidade Federal Fluminense, UFF) for supporting this study. We also gratefully acknowledge the anonymous referees of Pesquisa Operacional for the important recommendations for improvement.
Received November 2008; accepted August 2009
References
- (1) Damaso, V.C.; Pereira, C.M.N.A. & Frutuoso e Melo, P.F.F. (2007). Application of Integrated Optimization to the Control Room Air-Conditioning System of the Angra I Nuclear Power Plant. 2007 International Nuclear Atlantic Conference, CD-ROM, Santos, SP, Brazil.
- (2) Damaso, V.C. (2006). An integrated optimization modeling for safety system availability using genetic algorithms. D.Sc. Thesis, Federal University of Rio de Janeiro, Nuclear Engineering Program [in Portuguese]
- (3) Garcia, P.A.A.; Damaso, V.C.; Sant'Ana, M.C. & Frutuoso e Melo, P.F.F. (2008). Genetic algorithm optimization of preventive maintenance scheduling for repairable systems modeled by GRP. ESREL 2008, CD-ROM, Valencia, Spain.
- (4) Garcia, P.A.A.; Jacinto, C.M.C. & Droguett, E.A.L. (2005). A multiobjective genetic algorithm for blowout preventer test scheduling optimization. In: Proceeding of Applied Simulation and Modelling, Benalmádena, Spain.
- (5) Goldberg, D.E. (1989). Genetic algorithms in search, optimization, and machine learning. Addison-Wesley Professional, USA.
- (6) Jacopino, A.G. (2005). Generalization and bayesian solution of the general renewal process for modeling the reliability effect of imperfect inspection and maintenance based imprecise data. Ph.D. Thesis, Department of Mechanical Engineering, University of Maryland, Maryland, USA.
- (7) Kijima, M. & Sumita, N. (1986). A useful generalization of renewal theory: counting process governed by non-negative Markovian increments. Journal of Applied Probability, 23, 71-88.
- (8) Lapa, C.M.F.; Pereira, C.M.N.A. & Mol, A.C.A. (2000). Maximization of a nuclear system availability through maintenance scheduling optimization using a genetic algorithm. Nuclear Engineering and Design, 196, 219-231.
- (9) Lewis, E.E. (1987). Introduction to reliability engineering. John Wiley & Sons.
- (10) Martorell, S.; Carlos, S.; Villanueva, J.F.; Sanchez, A.L.; Kushwaha, H.S.; Verma, A.K. & Srividya, A. (2007). Test interval optimization of safety systems of nuclear Power plant using fuzzy-genetic approach. Reliability Engineering and System Safety, 92, 895-901.
- (11) Martorell, S.; Carlos, S.; Villanueva, J.F.; Sanchez, A.L.; Galvan, B.; Salazar, D. & Cepin, M. (2006). Use of multiple-objective evolutionary algorithms in optimizing surveillance requirements. Reliability Engineering and System Safety, 91, 1027-1038.
- (12) Martorell, S.; Carlos, S.; Sánchez, A. & Serradell, V. (2000). Constrained optimization of test interval using a steady-state genetic algorithm. Reliability Engineering and System Safety, 67, 215-232.
- (13) Moura, M.C.; Rocha, S.P.V.; Droguett, E.A.L. & Jacinto, C.M.C. (2008). Avaliação bayesiana da eficácia da manutenção via processo de renovação generalizado. Pesquisa Operacional, 27, 569-589.
- (14) Rao, D.K.; Gopika, V.; Kushwaha, H.S.; Verma, A.K. & Srividya, A. (2007). Test interval optimization of safety system of nuclear power plant using fuzzy-genetic approach. Reliability Engineering and System Safety, 92, 895-901.
- (15) Sanchez, A.; Carlos, S.; Martorell, S. & Villanueva, J.F. (2009). Addressing imperfect maintenance modelling uncertainty in unavailability and cost based optimization. Reliability Engineering and System Safety, 94, 22-32.
- (16) Yañes, M.; Jolgar, F. & Modarres, M. (2002). Generalized renewal process for analysis of repairable systems with limited failure experience. Reliability Engineering and System Safety, 77, 167-180.
Publication Dates
- Publication in this collection: 03 Feb 2010
- Date of issue: Dec 2009
History
- Accepted: Aug 2009
- Received: Nov 2008