operations & maintenance
Follow the jumbo jet
2 July 2010
The US regulations that cover nuclear new-build no longer allow utilities to build plants first, and licence them later. But they still do not go far enough in specifying how those plants will be maintained for trouble-free operation. Nuclear power should look to the commercial airplane industry, where such systems have been working for more than 40 years.
By J. K. August
The U.S. nuclear design framework spreads across many rules, guidelines, standards and policies, starting with Part 50 of Title 10 of the Code of Federal Regulations (10 CFR 50). Some of these overlap or conflict while others are ambiguous, complex or just unclear (see box, right). Within this framework, utilities built one-of-a-kind plants differently site-by-site, interpreting the same rules and requirements differently one at a time. Designers used different approaches and methods for operations and maintenance, even their documentation. As a result, even U.S. plants of similar design evolved highly customized methods at different sites.
Now U.S. nuclear operations depend uniquely on many different programmes. An effective design foundation, in contrast, would give plants clear, standard processes and common practices to operate within. That requires a simple, common framework. To construct an efficient, common framework requires a clear understanding of related safety information.
Reliability supports Probabilistic Risk Assessment (PRA) and operating goals. Reliability not only validates the license, but also ensures that plant operations comply with it. In practice, reliable equipment doesn’t fail unexpectedly. However, assuring reliability requires effective tasks that support systems, structures and components (SSC) to provide overall plant reliability. These depend on (1) the critical SSC functions, which can vary based on application; (2) the SSC risks, which depend on the inherent characteristics of the SSC and on how it is designed and constructed, and from what materials (“the design” of the SSC); and (3) the SSC context, which includes factors like how it is used (continuous duty or cycled), operating environment and physical installation stresses like abrasion wear or high-temperature baking. All of these factors require evaluating SSC for their installed application functions based on engineering analysis of likely failure mechanisms.
In the 1980s, accumulating delays in nuclear new-build construction prompted drafting of a new US regulation, Part 52, which changed the law to require licensing to occur before construction. Surprisingly, the only formal reliability assurance programme (RAP) requirement in Part 52 today is identifying in-scope SSC (Section 17 of the Standard Review Plan in NUREG-0800). Guidance is silent on how to find effective RAP tasks to perform. Today, a master equipment list (MEL) meets that regulatory requirement. For actually operating a plant reliably, that is incomplete. Current rules let Generation III/IV plant workers develop plans using the same disjointed reliability approaches used in existing Generations I and II. Back then, incomplete monitoring and maintenance presented plants with substantial operational risks. One need only browse the US Nuclear Regulatory Commission’s (NRC’s) key generic communications or NUREG-0737, or read the Kemeny report on the Three Mile Island accident. Thirty years of experience starting up and operating nuclear plants have shown that incomplete operating guidance poses grave risks.
For new nuclear construction, Part 52 could significantly improve safety performance by assuring certification applicants have effective plans at startup. It is obvious that lenders cannot risk billions on nuclear plants that can’t be licensed. What is less obvious is that banks will not lend billions to build plants that don’t run reliably. In the past, the public utilities commissions (PUCs) paid off historical nuclear costs (and allowed overruns) as ‘stranded assets,’ plants that couldn’t be competitive on their own. Generally speaking, the plant costs for these stranded assets were allowed to be recovered preferentially through electricity rates. Thus, Part 52 addresses a minimum hurdle; nuclear plants must also run better than projected. An effective RAP gives financial certainty. It provides the implementation guidance to meet Part 52’s laudable reliability goals.
Improving Part 52’s RAP would provide other benefits, too. With those details, projects can better estimate costs, so that lenders can commit funds for new plants – the last major barrier to U.S. nuclear construction. Fully disclosing plant requirements for the ITAACs (inspections, tests, analyses and acceptance criteria), which are objective measures of compliance, makes the costs of construction and operations well known. These costs help lenders project realistic cash flows in competitive, non-utility finance environments following startup. RAP development will actually lower nuclear costs, judging by maintenance experience in the airplane manufacturing industry. Implementing similar design control programmes from 1965 through the present, airlines saw operating costs decline. This they attributed to better design and higher availability from designing scheduled maintenance plans over an airframe’s operating life cycle.
Air Transport Association of America standard MSG-3, “Operator/Manufacturer Scheduled Maintenance Development” (2004), is effective for developing general maintenance programmes, yet the U.S. nuclear industry is unaware of it. (One similar standard in the nuclear industry is Part 50.49, Environmental Qualification of Class 1E Electrical Equipment Important to Safety, developed in the early 1980s. It requires equipment to be qualified for its environment and certified for lifetime, subject to replacement under rigorous controls. However, its scope is limited to Class 1E electrical equipment.) MSG-3 works so well, in fact, because it is a mature aviation/aerospace standard, first developed 40 years ago (in 1968). It tries to answer a question that is fundamental to RAP: “What is the most effective way to develop (and deliver) a maintenance programme that covers everything, yet is cost-effective?”
The airplane standard
The intent of MSG-3 is to develop a maintenance programme for each type of aircraft before it enters into commercial airline service. MSG-3 uses a logic flowchart approach to maintenance development. To begin the process, users require a list of all the equipment in the plane – an equipment list. Suppliers of the equipment must provide not only the equipment design information, but also a breakdown of all the parts in each of the supplied components and their function. This enables a developer to begin a systematic review of the equipment, system by system. With the design information, engineers can study the contribution each item makes to the plane, and evaluate risk consequences to the plane’s mission on several levels. The highest consequences have a direct safety risk, while the lowest merely influence cost. A systematic review of component failure modes (based on existing knowledge, experience or comparison analysis for components that have never seen service) completes the process.
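As a rough illustration of the review described above, the sketch below sorts a handful of failure modes so that those with direct safety consequences are reviewed first and cost-only ones last. It is not the MSG-3 logic flowchart itself: the three consequence levels (the middle one in particular), the class names and the sample equipment are all assumptions made for the example.

```python
from enum import IntEnum
from typing import List, NamedTuple


class Consequence(IntEnum):
    """Simplified consequence levels (assumed, not taken from MSG-3)."""
    COST = 1         # lowest: failure merely influences cost
    OPERATIONAL = 2  # assumed middle level, not named in the article
    SAFETY = 3       # highest: direct safety risk


class FailureMode(NamedTuple):
    system: str
    component: str
    mode: str
    consequence: Consequence


def review_order(modes: List[FailureMode]) -> List[FailureMode]:
    """Order failure modes for review: highest consequence first,
    then system by system, then component by component."""
    return sorted(modes, key=lambda m: (-m.consequence, m.system, m.component))


# Hypothetical entries from an equipment list
modes = [
    FailureMode("cabin", "reading light", "lamp burns out", Consequence.COST),
    FailureMode("hydraulics", "pump A", "seal leak", Consequence.SAFETY),
    FailureMode("fuel", "transfer valve", "sticks closed", Consequence.OPERATIONAL),
]

for m in review_order(modes):
    print(m.consequence.name, m.system, m.component, m.mode)
```

The ordering only decides what gets looked at first; the tasks that manage each failure mode still have to come from engineering analysis.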
For new safety-related nuclear equipment, everyone understands the relevance of standards like ISO 9001, “Quality Management Systems,” or (U.S.) NRC 10 CFR 50 Appendix B (also ASME NQA-1), “Quality Assurance Criteria for Nuclear Power Plants.” They give ample assurance that a part or component specified for safety service can capably perform its design function according to its specifications. Where all quality and reliability programmes struggle is in making the connection over time. Not only must parts and components meet requirements when new, they have to meet those same requirements over time. While a quality assurance programme (QAP) assures the first goal, the purpose of a reliability assurance programme (RAP) is to meet the second.
Industry interprets the RAP historically. In the past, vendors gave the licensees a master equipment list (MEL), and the licensees determined how to maintain the equipment. Clearly, licensees had to follow their license and technical specifications, including surveillance test requirements. (Surveillance tests, an element of a scheduled maintenance plan, provide reasonable assurance of performance by setting appropriate test intervals for redundant or other systems with hidden failures whose chances of failure are unpredictable.) However, beyond that stipulation, licensees could develop scheduled maintenance to perform (or not) without specific requirements, as they had before with their fossil plants.
The standard of excellence called for by the Institute of Nuclear Power Operations, on the other hand, requires more than the minimum. RAP should provide actionable guidance for operators to make equipment reliable. An effective RAP should reasonably answer, “What activities make this SSC equipment reliable?” Identifying reliability tasks has proven difficult for nuclear industry experts, including the NRC and its independent safety oversight group, the Advisory Committee on Reactor Safeguards (ACRS).
For example, the Maintenance Rule (10 CFR 50.65), rolled out fully in 1997, does not develop specific guidance. The Maintenance Rule lets plants set their own ‘goals’ – success performance criteria. The Maintenance Rule standard goals only require that plants meet two metrics to pass the mark: the number of maintenance-preventable function failures (MPFFs) (<2 per quarter per system) and safety system unavailability (<2.5% per quarter). A plant not meeting the goals would seem to be violating its PRA, so action will have to be taken. But the NRC’s solution – to propose adjusting the metrics – would not include any information about how to do so. Similarly, the International Atomic Energy Agency’s approach, as published in IAEA-TECDOC-1264, “Reliability Assurance Program Guidebook for Advanced Light Water Reactors,” simply echoes the NRC’s position about the importance of performance monitoring. It requires classic computerized maintenance management system (CMMS) elements, yet fails to address how plants develop the tasks that must populate it.
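To make the two standard goals concrete, here is a minimal sketch of the pass/fail check they imply, using the thresholds quoted above. The record layout, the system name and the hours used are hypothetical, and unavailability is assumed to be unavailable hours divided by required hours for the quarter.

```python
from dataclasses import dataclass

# Thresholds as quoted in the article; plants may set their own goals.
MPFF_LIMIT_PER_QUARTER = 2   # MPFF count must be below this
UNAVAILABILITY_LIMIT = 0.025  # unavailability must be below 2.5%


@dataclass
class QuarterlyRecord:
    system: str               # hypothetical system name
    mpff_count: int           # maintenance-preventable function failures
    hours_unavailable: float
    hours_required: float

    @property
    def unavailability(self) -> float:
        # Assumed definition: fraction of required hours the system was out
        return self.hours_unavailable / self.hours_required


def meets_goals(rec: QuarterlyRecord) -> bool:
    """True only if both metrics are below their limits."""
    return (rec.mpff_count < MPFF_LIMIT_PER_QUARTER
            and rec.unavailability < UNAVAILABILITY_LIMIT)


rec = QuarterlyRecord("auxiliary feedwater", mpff_count=1,
                      hours_unavailable=40.0, hours_required=2190.0)
print(rec.system, "meets goals:", meets_goals(rec))
```

Note what the check returns: a yes or no. Nothing in it says which task to add or change when the answer is no, which is the gap the article describes.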
Without different guidance, plant staff will perform as they have in the past. The same unstructured development iteration will take 10 or 20 years to optimize, again. Because the U.S. nuclear industry believes that nuclear maintenance is too customized and complicated, it doesn’t consider the benefits of integrated maintenance processes. It is too independent and focused on the past to certify results for certified designs, which would be of great benefit. That is partly because these methods were not well-documented in the era in which Generation I/II plants were designed and built. It is also partly due to the closed culture of the nuclear industry, which (in the U.S. at least) is only recently migrating towards technologies like digital controls and rotary air compressors that have been used elsewhere since the 1980s.
NRC specifies RAP requirements in two parts, design (D-RAP) and operations (O-RAP). These two parts reflect two sources of reliability: design (including construction) and operations. Combined operating license RAPs include site-specific requirements. Eventually, when construction is complete, a plant goes into operations. At that time, RAP translates into existing operating requirements, especially the Maintenance Rule. This transitions D-RAP into O-RAP.
O-RAP includes operating requirements like the Maintenance Rule and technical specifications reporting requirements, according to a modern industry-regulator consensus. D-RAP has two components: (1) the certified design and (2) the site-specific interface that “parks” a certified design on a physical plant site. The way the rules are now, the owner is responsible for the final implementation of both.
The division between D-RAP and O-RAP splits the responsible organizations (designers and owners) and scopes (certified design and site specific COL). This continues historical RAP confusion. The additional work to develop guidance for reliability plans would be complex to integrate into an operating NPP. However, the responsibility for reliability plan development under standard nuclear design certification should rest with the equipment supplier. Thus, for certified plant designs, D-RAP development begins with certification, just like PRA. An architect-engineer, as the owner’s agent, must still contribute specific plans. Together, designers and operators (owners) must still translate their finished design’s D-RAP into the O-RAP management systems to implement the overall scheduled maintenance reliability plan.
Maintenance is important
Design reliability assurance comes from knowing what roles SSC play in failures, and developing plans that address them with proactive tasks. This improves the understanding of probable failures, their symptoms and causes, the nature of their occurrence and options for how to develop the tasks that manage them best. These are just the elements that develop a scheduled maintenance plan.
Traditionally, maintenance plans had PM (preventive maintenance) and CM (corrective maintenance) components. ‘PMs’ (PM work orders) were scheduled; ‘CMs’ (CM work orders) were initiated by people finding problems, usually operators. PMs should have been planned, but often were not. They should have had actionable guidance with acceptance criteria to guide the performer. Often, they did not. PMs ran the full gamut, but typically covered tasks such as oil changes, checking levels or filter replacements.
Predictive maintenance takes this a step further. Engineers can now use objective criteria that, in many instances, avoid tearing equipment apart. Consider, for example, the task ‘check vibration level on air compressor (<3.0 mm)’. This type of request moves beyond a general concept of ‘doing things on time’ to condition assessment and condition-directed maintenance. The scheduled part is the condition assessment. The consequent action depends on the outcome. If the compressor’s vibration exceeds 3.0 mm, then it needs to be rebuilt. The action part of the maintenance is based on the condition, hence the name. Condition-directed maintenance differs from corrective maintenance in that the initiating action tends to be highly structured. It originates from the highly scripted, actionable parts of a scheduled maintenance plan or operations monitoring, both of which have clear guidance and acceptance criteria. Corrective maintenance, by contrast, has neither. It presumes operators will just know when something is wrong, and initiate a work order to fix it. In a modern programme, an equipment failure discovered during a PM is only important if the system it supports has a functional failure as a result. In modern programmes, that almost never happens. Thus, I could have a very effective, scripted scheduled maintenance programme that did a high fraction of maintenance as condition-directed maintenance. In aerospace, this represents an ideal. As long as I have no functional failures (or, that is, manage them well), my programme works effectively.
The discovery of a failure as part of this programme is actually highly planned, and shouldn’t be charged as a maintenance failure. But from a traditional view, such as that of corrective maintenance, it would be. Both the US NRC and the Institute of Nuclear Power Operations (INPO) use a traditional maintenance model that is obsolete. To them, these failures require corrective maintenance; in fact, too much corrective maintenance. According to them, an ideal state is one where nothing ‘breaks,’ and so nothing needs fixing with corrective maintenance.
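The air compressor example can be written down as a small sketch: a scheduled condition assessment with an acceptance criterion, and a scripted follow-up action that is raised only when the criterion is not met. The class and function names are hypothetical, and treating a reading exactly at the limit as unacceptable is an assumption (the task says <3.0 mm).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConditionTask:
    equipment: str
    measurement: str
    limit: float             # acceptance criterion: reading must be below this
    units: str
    action_if_exceeded: str  # the scripted, condition-directed action


def assess(task: ConditionTask, reading: float) -> Optional[str]:
    """Return a follow-up work request, or None if the reading is acceptable."""
    if reading < task.limit:
        return None
    return (f"{task.action_if_exceeded}: {task.equipment} "
            f"({task.measurement} {reading} {task.units}, "
            f"limit {task.limit} {task.units})")


vibration_check = ConditionTask(
    equipment="air compressor",
    measurement="vibration",
    limit=3.0,
    units="mm",
    action_if_exceeded="Rebuild",
)

print(assess(vibration_check, 2.4))  # within criterion, no action
print(assess(vibration_check, 3.5))  # criterion exceeded, rebuild requested
```

The scheduled item is the assessment; the rebuild exists only as a planned consequence of its outcome, which is why finding the degraded compressor this way is a success of the plan rather than a maintenance failure.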
RAP: Essential elements
A good RAP is very simple, in principle. It is just a list of the equipment that supports reliability, ranked in the context of the nuclear plant installation, and the specific tasks (such as checks, tests, inspections, and simple rework/repair tasks), including implementation schedule, required to ensure reliability. To implement RAP, industry should certify licensed reactor designs that specify their scheduled maintenance, as airframe suppliers have done for 40 years.
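As a data structure, the description above amounts to very little. The sketch below is one possible layout, not a format prescribed by any regulation or standard: equipment tags, ranks, task kinds and intervals are all invented for illustration, and the fixed-interval scheduling rule is a deliberate simplification.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Task:
    description: str
    kind: str           # e.g. "check", "test", "inspection", "rework"
    interval_days: int  # implementation schedule, simplified to a fixed interval


@dataclass
class RapItem:
    tag: str            # hypothetical equipment tag
    rank: int           # 1 = most important to reliability in this installation
    tasks: List[Task] = field(default_factory=list)


def due_tasks(plan: List[RapItem], day: int) -> List[Tuple[str, str]]:
    """List (tag, task description) pairs due on a given day,
    highest-ranked equipment first."""
    due = []
    for item in sorted(plan, key=lambda i: i.rank):
        for task in item.tasks:
            if day % task.interval_days == 0:
                due.append((item.tag, task.description))
    return due


plan = [
    RapItem("P-101B", rank=2, tasks=[Task("Change oil", "rework", 90)]),
    RapItem("DG-1", rank=1, tasks=[
        Task("Start test", "test", 30),
        Task("Visual inspection", "inspection", 7),
    ]),
]

print(due_tasks(plan, 90))
```

A supplier-certified plan of this kind could in principle be delivered once with the design and loaded at each site, rather than rebuilt plant by plant.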
To close the RAP gap requires developing and implementing a scheduled maintenance plan that addresses all critical function safety characteristics. To develop and implement a plan, reliability engineering developers need three tools:
1. A process that identifies critical equipment, its characteristics and causes of degradation
2. Processes that develop and integrate effective task controls, efficiently
3. Scheduling systems to implement and control the resulting plans
Critical equipment identification is an established part of nuclear safety design. Existing rules and standards (ANS-53.1, Safety Design of Modular Helium Reactors, for example) provide processes. MSG-3 (2004) provides a method for identifying and integrating effective tasks into organized work and procedure content, and there are others. Scheduling software implements and tracks finished reliability assurance plans. Two examples are IBM (MRO) Maximo and Ventyx (formerly Indus) Passport. These are not endorsements, but are intended to give readers information and representative examples for review. All are available today, and similar products manage operating rounds. Software to develop and load the routines, tasks and finished work order content into these systems is also available. Integrating and automating all design basis requirements only requires the agreement of nuclear equipment suppliers, regulators, owners and their lenders to assure complete, comprehensive reliability plans.
US legislation splits RAP into design (D-RAP) and operational (O-RAP) elements. D-RAP results must not only include a list of high-risk equipment, but also a plan to address risks. Actionable tasks that monitor and maintain the equipment will address D-RAP. That programme has three parts:
1. Operations monitoring rounds (shiftly, daily, weekly)
2. Scheduled maintenance tasks (organized as “work scopes” or task blocks for scheduling efficiency), loaded into a CMMS for scheduling and management, and largely performed by maintenance and other technical support groups
3. Action plans for equipment identified as failing or failed in the steps above. That is, once a piece of equipment is in a failure state, how must the plant react? Technical specifications already address this at a high level. They do not provide detailed consequences of failure at the equipment level and how to react to it at the equipment level.
The most fundamental part of any operational RAP is a manufacturer-approved scheduled maintenance and monitoring plan. Certainly, owner-operators will need to revise and update that plan over the 60-year life of the plant as technologies and equipment change. And regulators will also need to monitor it – the essence of the Maintenance Rule.
To me, it makes no sense that ten plants of the same fundamental design would each have a separate scheduled maintenance programme. I feel so strongly that this is a safety issue that I have offered to debate the U.S. NRC, and have presented it to the Advisory Committee on Reactor Safeguards (the U.S. NRC’s independent nuclear safety reviewer) as an unresolved safety issue. I am still waiting for their response.
JK August, professional engineer, VP of operations at nuclear reliability plan developer CORE, Inc. Email: firstname.lastname@example.org, tel +1 (303) 425-7408. August has written two books on RCM: Applied Reliability-Centered Maintenance (2000) and RCM Guidebook (2003). He is participating in ASME and ANS committees developing RAP standards. The author wishes to thank J.J. Hunter, SRO (ret) for his help reviewing this article.
ANS Standards Board, Nuclear Facility Standards, Subcommittee 28, ANS-53.1, Nuclear Safety Criteria for the Design of Modular Helium-Cooled Reactor Plants (one of two standards that are in draft).
ASME, ASME-RAM-2010, Reliability Program Design Standards (the second of two standards that are in draft).
10 CFR 50, Domestic Licensing of Production and Utilization Facilities, www.tinyurl.com/y4maqgw
10 CFR 52, Licenses, Certifications and Approvals for Nuclear Power Plants, www.tinyurl.com/y3v9sgn
Hunter, J.J. SRO (ret). Project manager.
IAEA-TECDOC-1264, "Reliability Assurance Program Guidebook for Advanced Light Water Reactors", December 2001. www.tinyurl.com/y2kja8j
Kemeny, John. Report of the President's Commission on the Accident at Three Mile Island, 1979. http://www.pddoc.com/tmi2/kemeny/.
Rogovin, Mitchell. Three Mile Island: A Report To The Commissioners And To The Public, 1980. (www.tinyurl.com/dlqgcm for part 1; parts 2, 3 and 4 are available by changing the last digit on the long real URL from 150 to 151, 152 and 153 respectively).
Moubray, John. Reliability-Centred Maintenance, Butterworth-Heinemann; 2nd edition (May 1999)
MSG-3 Operator/Manufacturer Scheduled Maintenance Development (2004), ATA. (www.tinyurl.com/y5edylo)
NUREG-0737, Clarification of TMI Action Plan Requirements (1980), www.tinyurl.com/y4ucr8t
NUREG-0800, Standard Review Plan for the Review of Safety Analysis Reports for Nuclear Power Plants: LWR Edition (formerly issued as NUREG-75/087). www.tinyurl.com/y4jasop
SAE JA1011 "Evaluation Criteria for Reliability-Centered Maintenance (RCM) Processes referencing Message Specification"
Smith, Anthony M. Reliability-Centered Maintenance, McGraw-Hill (1992).
In the United States, Title 10 of the Code of Federal Regulations, Part 52, licenses new nuclear plants to be constructed and operated under its so-called Combined Operating License (COL). Construction under a combined licence requires building a Part 52 certified design on a licensed site. The owner must meet site-specific criteria beyond the scope of the certified design.
In 1989, the RAP requirement was identified: a recognition that if a plant didn't meet its probabilistic risk assessment (PRA), the licence assumptions were invalid. Therefore, new plants needed some way to meet their PRA, and that was the RAP. A licensing document known as the Standard Review Plan (SRP) provides the US NRC approach to verifying that a rule is followed in the field. It lays out the requirement for a RAP. Section 17 of the Design Control Document (DCD) addresses a certified design's Quality Assurance Program (QAP), which specifies a RAP in Section 17.4. A prospective licensee's ITAACs (Inspections, Tests, Analyses and Acceptance Criteria: objective, measurable licensing criteria) verify final conformance to specifications that assure safe operations. Completing the ITAACs allows the finished certified licensed plant to operate. One ITAAC addresses a key part of design: the RAP.
Other key rules and their associated standards within the predecessor rule 10 CFR Part 50 are:
50.34, Contents of applications; technical information
50.36, Technical specifications
50.49, Environmental qualification of electric equipment important to safety for nuclear power plants
50.65, Requirements for monitoring the effectiveness of maintenance at nuclear power plants
50.55a, Codes and standards