Putting people in the mix: part 2
21 July 2014
A finding of ‘human error’ is really just the tip of the iceberg in understanding why people operating in complex systems such as nuclear power plants did what they did. By Ken Ellis
In general, operation of a nuclear power plant is an exercise in risk-informed decision making -- or it should be. Risk is not simply the summation of individual inputs; it may be the product of improperly-aligned or poorly-integrated activities. Managers with good risk awareness can see clusters of precursors building and stop work in time. (But to see them, they must be given enough time to get out into the field and enforce standards, rather than being stuck in meetings dealing with administrative issues.)
Some accident precursors include:
- Leadership, or lack thereof
- Night-time operations: we are not a nocturnal species
- Time pressure (real or perceived)
- Unofficial messages from supervisors or corporate offices
- Acceptance of abnormalities
- Sense of invulnerability
- Lack of a proper identification of risk (probability times consequence)
- Insidious degradation of standards.
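The definition of risk as probability times consequence can be made concrete with a small sketch. This is an illustration only: the precursor names are taken from the list above, but every probability and consequence score below is an invented, hypothetical value, not data from the article.

```python
# Hypothetical scores for illustration -- not from the article.
# Each precursor gets an assumed (probability, consequence) pair,
# with consequence on an arbitrary 1-5 severity scale.
precursors = {
    "time pressure": (0.4, 3),
    "night-time operations": (0.2, 4),
    "acceptance of abnormalities": (0.3, 5),
}

def risk_score(probability: float, consequence: float) -> float:
    """Risk defined as probability times consequence."""
    return probability * consequence

# Rank precursors by risk so that management attention goes
# to the highest-risk items first.
ranked = sorted(
    precursors.items(),
    key=lambda item: risk_score(*item[1]),
    reverse=True,
)
for name, (p, c) in ranked:
    print(f"{name}: risk = {risk_score(p, c):.2f}")
```

Even this toy ranking shows why the definition matters: a moderately likely precursor with a severe consequence can outrank a more frequent but milder one.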
Complacency is a big problem in successful plants. Good performance and a strong track record lead to confidence that mistakes won't happen -- a sense of invulnerability, which in turn lets standards slip. When people say, "We have so many safety systems to protect the reactor that it can't break down," you know that standards are slipping, because those systems should not be relied upon. Complacency, and too much focus on internal day-to-day concerns, can lead to reactive decision-making, and that could lead to a far more significant event.
Reactive decision-making is a risky business; you only find out what was an acceptable risk once it has been encountered (see Figure). But when faced with unusual circumstances, and there is uncertainty about whether or not to proceed, someone has to make a decision. In the nuclear industry, these situations are often covered by infrequently performed test or evolution (IPTE) procedures and documents such as INPO SOER 91-01. These procedures demand that a cross-functional team meet to discuss the risks and consequences, create procedures if necessary, and develop mitigating responses should something go wrong.
In complex systems, safety margins can be compromised by many kinds of common human behaviours. Deviations from procedure can become normal; there can be a kind of insidious acceptance of slowly-degrading standards. The risk appetite of the organization is personified in the decisions of senior managers and supervisors; is the corporation aware of the safety implications of its work planning? Individual staff members' tolerance to and rationalization of risk, and their technical depth (or lack thereof) can also contribute to promoting unsafe work practices.
The actions of leaders set the tone for the company's workers; actions speak louder than words. As WANO's 'Principles for Excellence in Human Performance' says, "A leader's values and beliefs are readily recognized by simply observing his or her actions [in terms of] what is paid attention to, measured, or controlled, and reactions to incidents or crises..." [WANO-GL 2002-02].
The goal is to narrow the band of what constitutes acceptable risk. The work space within which the human actors can navigate freely is bounded by administrative rules, and functional and safety-related constraints. Rather than striving to control behaviour by fighting deviations from a particular pre-planned path, the focus should be on the control of behaviour by making the boundaries explicit and known, and giving workers opportunities to develop coping skills at boundaries.
Standard nuclear industry guidance such as the popular Canadian 'Event-free Tool' lists good practices for routine work that can help improve safety:
- Pre-job briefing
- Procedure use and adherence: what should the worker's attitude be? Are the procedures for information only? Should they be referred to periodically? Or should they be followed verbatim -- and what about placekeeping during procedure execution?
- Three-way communication: instructions are confirmed at each stage to reduce the risk of misunderstanding. After having received a specific direction (one), the worker replies, restating his or her understanding of the command (two); then that understanding is confirmed (three).
- Conservative decision-making: any action that is taken, particularly in an unknown situation, is progressively safer.
- Questioning attitude: Never proceed in the face of uncertainty. Seek advice and get clarification.
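The three-way communication step can be sketched as a simple check: the exchange succeeds only when the worker's read-back matches the original direction, otherwise it must be repeated. This is a minimal illustration of the three-step pattern described above; the valve tag used in the example is invented.

```python
def three_way_exchange(instruction: str, readback: str) -> bool:
    """Model the three steps of three-way communication.

    Step 1: the sender issues the instruction.
    Step 2: the receiver restates their understanding (the read-back).
    Step 3: the sender confirms only if the read-back matches;
            otherwise the exchange must be repeated.
    """
    # Ignore case and surrounding whitespace; any substantive
    # difference means the understanding was wrong.
    return instruction.strip().lower() == readback.strip().lower()

# A faithful read-back is confirmed ("V-101" is a made-up tag).
assert three_way_exchange("Open valve V-101", "open valve v-101")
# A garbled read-back is rejected, forcing the exchange to restart.
assert not three_way_exchange("Open valve V-101", "Open valve V-110")
```

The point of the pattern is that confirmation happens on the receiver's restated understanding, not on the original instruction, so a mishearing is caught before any action is taken.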
I have enormous faith in the ability of people to learn and grow. I believe in people -- with our ability to assess and adapt to changing environments -- to help keep complex systems safe. We are the only ones who can find our way through the maze of pressures that exist in actual operating conditions, not the theoretical worlds that exist in our blueprints and computer models.
For more than 30 years, the question that has sat in my personal tool box, every bit as valuable as the hard hat on my head or the procedure in my hand, is "What's the worst that could happen?" It's a question that can never be fully answered, for there are no limits to our imaginations. But it gets people to understand whether the step they are working on is critical. By asking what could go wrong, we try to use all of the operating experience at our disposal, the detail of the procedure, and the staff experience at hand to assemble a potential accident. Armed with that knowledge, we can prepare a mitigation strategy. The process is a bit like assembling a jigsaw puzzle without knowing the picture: the pieces are parts of processes or systems, the picture is an accident, and the goal is to prevent it from forming from the given pieces. Post-accident investigations work the other way around: they break down how the big picture was built up from process or system details. It is a sad truth that accident investigations often conclude that the accident was lying dormant within faulty organisational procedures, waiting for an opportunity to emerge.
Suggestions for managing human-performance risk
- Frequently talk about risk, about complexity, and about their relationships
- Carry out gap analysis between expectations and observation of field behaviour. There is often a variance between expectations and reality, as expressed by the adage 'you get what you inspect, not what you expect'
- Actively solicit divergent opinion to avoid intentional blindness
- Debate what the acceptable boundaries should be
- Discuss antecedents for people's behaviour, including unofficial corporate messages
- Never allow doubt and uncertainty to go unchallenged
- Demand proof that a system is sufficiently safe to operate; not sufficiently unsafe to shut down
- Demand that operational decision-making (ODM) forums discuss the issues in this box when faced with an unusual operating challenge
- Beware that if a root cause investigation concludes with a finding of negligence, only part of the story has been revealed
- Expand the scope of defence-in-depth strategies to include the concept of complex systems and their inherent non-linearity
- Create a nuclear safety culture monitoring panel that meets quarterly. It should discuss the issues mentioned in this list, and also actively attempt to head off accidents by considering how actions or systems might act as precursors.
About the author
Ken Ellis is managing director of the World Association of Nuclear Operators. Initially an aerospace engineer with the Canadian Air Force, he has since had 32 years of nuclear power industry experience in maintenance, operations, engineering and senior management.