High reliability organizations

From OSHWiki



Raphaël Gallis, Gerard Zwetsloot, TNO, the Netherlands

Introduction

High Reliability Organizations (HROs) are organizations that constantly face serious and complex (safety) risks yet succeed in achieving excellent safety performance. In such situations, acceptable levels of safety cannot be achieved by traditional safety management alone. HROs manage safety risks in a specific way, supported by an associated organizational culture. A key concept here is resilience. The HRO concept can be used to improve safety and reliability in other industries. There is a clear link with Resilience Engineering and the four abilities of resilient organizations (to anticipate, to monitor, to respond and to learn).

The origin of the HRO concept

The concept of High Reliability Organizations (HRO) stems from the question of how dangerous environments such as aircraft carriers could have an excellent safety record. Researchers of this phenomenon included Karl Weick, among others [1]. Although they may seem diverse, HROs have a number of characteristics in common:

  • They operate in unforgiving social and political environments;
  • Their technologies are risky and present the potential for error;
  • The scale of possible consequences when things go wrong precludes learning through experimentation;
  • To avoid failures these organizations use complex processes to manage complex technologies and complex work.

Roberts [2] initially proposed that HROs are a subset of hazardous organizations that have enjoyed a record of high safety over long periods. More recent definitions emphasize the dynamic nature of producing reliability. The focus nowadays is on thinking of HROs as reliability-seeking rather than reliability-achieving. Reliability-seeking organizations are not distinguished by their absolutely low error or accident rates, but rather by their “effective management of innately risky technologies through organizational control of both hazard and probability” (p 166-177) [2]. Consequently, the concept of ‘high reliability’ has come to mean that (1) high risk and high effectiveness can co-exist; (2) some organizations must perform well under very challenging conditions; and (3) it takes intensive and continuous effort to do so.

HROs are distinctive because they try to organize themselves in such a way that the quality of attention across the organization is increased. This enhances people’s alertness and awareness so that they can detect and act on subtle variations that may indicate potential safety risks (i.e. collective mindfulness). Mindful organizing forms a basis for developing, refining and updating a shared understanding of the situation and of the capabilities needed to act on it. Mindful organizing requires that leaders and organizational members pay close attention to shaping the social and relational infrastructure of the organization, and to establishing a set of interrelated organizing processes and practices, which jointly contribute to the system’s (e.g., team, unit, organization) overall culture of safety.

The limitations of traditional OSH Management in complex and hazardous industries

Snowden [3] proposed the CYNEFIN framework for risk management. This model distinguishes between four different decision contexts for risk management:

  • Simple (or understood as such): causes and consequences are known and can be anticipated; decision-making consists of identifying the risk, categorizing it and applying known responses;
  • Complicated: causes and consequences can be determined, but it takes effort; insufficient data can lead to wrong choices in applying responses;
  • Complex: causes and consequences cannot be determined beforehand (nonlinear relationships); decisions are made by probing, exploring alternatives and implementing flexible strategies;
  • Chaotic: causes and consequences cannot (by definition) be identified; the approach is to take action and constantly observe the results until one can make sense of the situation.

Simple and complicated contexts are linear systems. Complex and chaotic contexts are non-linear (e.g. nonlinear cause-effect relationships). Traditional OSH management functions very well in simple or complicated environments, but not in complex or chaotic contexts. This implies that there is a need for new approaches to accident prevention in the changing world of work [4].

HROs, however, demonstrate that it is also possible to achieve excellent safety performance in complex and chaotic environments. This implies that for companies that seriously strive for ‘zero accidents’ [5] and operate under complex or chaotic conditions, it is essential to practise the principles of HROs and Resilience Engineering.

Figure 1: The CYNEFIN framework
Source: [3]

The achievements of HROs are in contrast with Perrow’s so-called Normal Accident Theory (NAT). Perrow [6] hypothesized that, regardless of the effectiveness of management and operations, accidents in systems that are characterized by tight coupling and interactive complexity will be ‘normal’ or inevitable, as they often cannot be foreseen or prevented. The HRO paradigm accepts that not all risks can be fully foreseen, but holds that ‘organisational mindfulness’ creates the ability to recognise them in time, and so to prevent accidents from happening (see also Zero accident vision).


The characteristics of HROs

The concept of the high reliability organisation (HRO) is still in development. Weick describes five characteristics [1] that have been identified as responsible for the "mindfulness" that keeps HROs working well when facing unexpected situations.

  • Preoccupation with failure: Although the organization’s performance is high, there is a constant ‘unease’ and a constant questioning of the status quo. Are things different? Did we miss something? Do we need to make adjustments? These questions stem not from ignorance, but from the wish to avoid complacency;
  • Reluctance to simplify interpretations: This refers to the drive to resist the natural tendency to accept simple explanations and look no further. Even though this is known to take time and effort, it is regarded as necessary;
  • Sensitivity to operations: The conviction that operational excellence is at the core of the organization;
  • Commitment to resilience: The notion that the organization needs to be flexible enough to anticipate and respond to (sudden) changes. Hence a certain amount of redundancy is inevitable;
  • Deference to expertise: This refers to the ability to change the command structure from authority to expertise when the situation calls for this. It implies that professional and tacit knowledge at the shop floor is regarded as more relevant in safety critical situations than the expertise of experts who are not involved in the production process.

HROs also share four organizational characteristics that limit the frequency and impact of accidents or failures [2]:

  1. Prioritization of both safety and performance and shared goals across the organization;
  2. A “culture” of reliability (or, better, an attitude toward reliability) that simultaneously decentralizes and centralizes operations, allowing decision-making authority to migrate toward lower-ranking members;
  3. A learning organization that uses “trial-and-error” learning to change for the better following accidents, incidents and, most importantly, near misses;
  4. A strategy of redundancy that goes beyond technology and extends to behaviours, such as one person stepping in when a task needs completion.

Roberts [2] emphasises the relevance of three organisational strategies. HROs:

  1. Aggressively seek to know what they don’t know;
  2. Design their reward and incentive systems to recognize costs of failures as well as benefits of reliability;
  3. Communicate consistently the big picture of what the organization seeks to do, and try to get everyone to communicate with each other about how they fit into the big picture.

Another way to describe the characteristics of HROs is [7]:

  1. Process auditing – The organization constantly evaluates itself for unexpected problems that can lead to safety faults, much like continuous quality management systems that evaluate for threats to quality. The HRO system easily overlaps with quality management, as both safety faults and quality defects result from error; the difference is basically whether the error affects a person, a product or a process. Process auditing permits HROs to identify weaknesses in their systems.
  2. Vigilance for quality degradation – Performance can drift over time, particularly during a protracted period of successes, and an HRO’s quality may degrade or become inferior. In addition, growth in performance can level off if the organization uses only itself as the quality and safety referent. An HRO compares itself to a referent system, generally in the same field initially. With time and the development of expertise, a successful organization may compare itself to other, quite different, HROs.
  3. Reward systems – This is the payoff that an individual or an organization gets for behaving one way or another. In the HRO we are concerned with risky behaviour but, while straightforward, reward systems can become nuanced and produce unexpected results. The reward system that exists inter-organizationally also influences the behaviour of organizations.
  4. Perception of risk – Risk must be recognized and perceived in order to be acted upon. HROs recognize that the hidden, latent, or missed risk may be most dangerous and that the most dangerous risks may be the most attractive. Risk, then, must not only be acknowledged but must be acted upon.
  5. Command and control – This refers to the ability of an organization both to lead and command staff and to maintain control during a crisis, or to begin taking control of the environment during a crisis.


Broader application of high reliability in organizations

The HRO approach is not only relevant for high-hazard industries in a demanding organisational environment, but can be useful for other organisations as well. Businesses that have to deal with complexity but do not operate in a harsh, unforgiving environment may seek high performance as well. In these organizations, failure, while not catastrophic, may take the form of poor service rendered to the customer or the public.

In this respect, Van Dalen et al. [8] distinguished four conditions for reliability seeking:

  1. An informed culture, where managers and employees communicate openly and share information;
  2. Common points of reference that allow people to negotiate shared values and concepts;
  3. Redundancy, where the organization won’t be immediately stopped when certain parts fail;
  4. Keeping a central focus on relationships, because a collective attitude is not a self-evident feature of an organization but a pattern in which individuals carefully and cautiously adjust their actions to one another and treat each other with respect.


Resilience engineering

High Reliability Organisations (HRO) and Resilience Engineering (RE) are closely related approaches that provide a new vision of risk management by addressing the capacity of organizations to face mostly unforeseen risky situations while maintaining their essential missions. While the dominant safety theories regard ‘deviations from normal’ in the production process as dangerous, in the RE view this is too rigid. Instead, variance should be accepted as a normal dynamic phenomenon, inherent to most production processes. Variance is not regarded as inherently dangerous, as any improvement also starts with a ‘deviation’ from what was hitherto regarded as the best situation.

The concept of resilience in itself is not new, but its use in safety and risk management or in supply chain management is relatively new. Like HRO, the concept of resilience is most relevant in contexts that are complex or chaotic, i.e. in a changing world, where networks replace chains, complex interactions emerge, and traditional solutions fail to take safety to the next level.

Whereas conventional risk management approaches are based on learning from failures and mostly rely on lagging indicators, RE looks for ways to enhance the ability of organizations to create processes that are robust yet flexible. This requires them to monitor and revise risk models, to learn also from ‘positive’ variation, and to use resources proactively in the face of disruptions or ongoing production and economic pressures. In Resilience Engineering, failures do not stand for a breakdown or malfunctioning of normal system functions, but rather represent the converse of the adaptations necessary to cope with real-world complexity. Individuals and organizations must always adjust their performance to the current conditions. Because resources and time are finite, it is inevitable that such adjustments are approximate. Success has been ascribed to the ability of groups, individuals, and organizations to anticipate the changing shape of risk before damage occurs; failure is simply the temporary or permanent absence of that ability. Today’s definition of resilience is “the intrinsic ability of a system to adjust its functioning prior to, during, or following changes and disturbances, so that it can sustain required operations under both expected and unexpected conditions” (p xxxvi) [9]. This definition emphasizes the ability to continue functioning, rather than simply to react to and recover from disturbances, and the ability to deal with diverse conditions of functioning, expected as well as unexpected.

Modern businesses need to adapt to diverse dynamics in their markets, supply chains and technologies. In addition, businesses need to manage variability within their organizations (for example in team performance) continuously. As organizational systems grow ever more complex and become more dependent on other organizations and systems, this growing complexity of networks and technology introduces new, complex risks. Current safety practices and insights are insufficient for coping with these issues. RE addresses these risks by building more adaptive organizations and processes. This not only helps to cope with new complex risks but may also strengthen business continuity through the synthesis of safety management processes and the core business. The most important qualities of HRO and RE are that they help organisations to:

  • Deal safely with variety in business processes and with unexpected events;
  • Anticipate and respond to unforeseen risks and changing trends;
  • Adapt to a changing business environment or changes in production processes;
  • Develop lean, smart and flexible safety management;
  • Synthesize safety management and the core business.

To develop these capabilities, management needs to build resilience at all levels of the organisation (organisation, departments, teams, individual managers and employees) and also in the interaction with its business partners (e.g. in supply chains) and networks.

The four abilities of resilient organizations

Hollnagel [10] identified four abilities that are regarded as indispensable for resilience:

  1. The ability to respond: This is an ability of socio-technical systems. It requires preparedness for responding to unforeseen situations, both in terms of risk reduction and prevention, and in terms of quick recovery ‘just in case’.
  2. The ability to monitor: Variance is regarded as normal, and complexity means that unpredictable variations may occur. Monitoring is essential for identifying variation, especially those variations that may imply a serious threat. For this purpose it is important to use leading key performance indicators and to continuously improve the ability to recognize ‘early warnings’ or ‘precursors’ of serious safety threats. This requires organizational mindfulness, which is vital for recognizing the importance of such unforeseen and often ambiguous variations.
  3. The ability to anticipate: When unforeseen variance occurs in the production process, the first question that arises is whether it is meaningful for safety. If the answer is yes, i.e. it is (probably) safety-relevant, the next challenge is to take adequate action before it is too late. Usually there is not sufficient time to consult a higher manager or an off-line expert. This ability therefore requires that first-line operators (as individuals or as a team) are sufficiently competent and empowered to take action. This means they have to be able to influence the process in such a way that a return to normal variance is the most likely result. This often requires that technical means (tools, safeguards) and adequate real-time information on the state of the process are available.
  4. The ability to learn: When unforeseen situations occur and are dealt with, this implies new experiences that could trigger a learning process. What was unforeseen yesterday has happened today; perhaps it will occur again tomorrow or the day after tomorrow. Lessons for the future can be drawn from such scenarios. It is also important to learn from what did not go well enough. In dealing rapidly with the complexities of hazardous processes it is unavoidable that (human) errors are made. These can be evaluated and serve as learning opportunities. It is also important to learn from misunderstandings in communication or from overconfidence in safety systems. Finally, it is important to learn from successes, not only from failures.

Resilience engineering (RE) and high reliability organisations (HRO) represent new and promising perspectives for dealing with safety under conditions of complexity and chaos. Serious improvements in safety have been achieved by HROs under such conditions. However, HRO and RE are rather young developments. There is still a range of challenges that are now subject to research and practical experimentation in industrial organisations. These challenges include:

  • How can organisations, teams and individuals be smarter in recognising weak signals?
  • How can organisations, teams and individuals detect a gradual decline of safety margins?
  • How can the complex system state be understood or modelled during unexpected events?
  • How can organisations learn better from positive variations in production, and to what extent do these help to avoid negative variations?

Conclusions

The two closely related concepts of High Reliability Organisations (HRO) and Resilience Engineering (RE) are innovative ways to achieve safety in conditions of dynamic complexity or even in chaotic contexts. The usual approaches to OSH management are based on risk assessment and planned risk control. It is often not fully realised that this approach is excellent for simple and complicated (but still linear) conditions, but is not adequate under complex or chaotic conditions, in which not all risks can be foreseen. Under such conditions, risk management can only partly be planned in advance and needs to be flexible in order to deal effectively with unplanned situations. For such conditions, both HRO and RE are promising approaches to realising excellent safety performance. In large organisations (e.g. a hospital) a lot can be achieved with traditional risk management approaches. However, in complex situations these may lead to many procedures, too much focus on OSH paperwork and limited effectiveness. The HRO and RE concepts may then have added value in developing preparedness to deal with unforeseeable risks and in reducing the paperwork. In this way the characteristics of HROs and the abilities of RE are relevant for all organisations that have to deal with complex or chaotic contexts or processes. However, HRO and RE are relatively new approaches that are still in development; there is still a great need for practical approaches and tools based on these theories.


References

  1. Weick, K.E., Sutcliffe, K.M., Managing the Unexpected, Jossey Bass, San Francisco, 2007.
  2. Roberts, K.H., ‘Some Characteristics of High-Reliability Organizations’, Organization Science, 1990, pp. 160-177.
  3. Snowden, D., ‘Cynefin: a sense of time and space, the social ecology of knowledge management’, 2000, In: Snowden D.J., Boone M.E., A leader’s framework for decision making, Harvard Business Review, November, 1-7, 2007.
  4. EU-OSHA – European Agency for Safety and Health at work, ‘New trends in accident prevention due to the changing world of work’, 2002. Available at: [1]
  5. Zwetsloot, GIJM, Aaltonen M, Wybo, JL, Saari J, Kines P, Op De Beeck R, ‘The case for research into the zero accident vision’, Safety Science 2013, 58, pp 41-48.
  6. Perrow C., Normal accidents, living with high risk technologies, Basic Books, NY, 1984
  7. Libuser, C., Organization structure and risk mitigation, Los Angeles, University of California, 1994.
  8. Dalen van B., Slagmolen B., Taen R., Mindful organizing: How to manage unexpected events and unwanted processes, Nijmegen, Apollo 13 Consult, 2009.
  9. Hollnagel, E., Paries, J., Woods, D. D. & Wreathall, J. (Eds.), Resilience engineering in practice: A guidebook, Farnham, UK, Ashgate, 2011. (p xxxvi)
  10. Hollnagel, E., Woods, D., Leveson, N. (Eds.), Resilience Engineering, concepts and precepts, Ashgate, 2005.

Links for further reading

Resilience Innovation Lab (no date). Home page. Retrieved on 19 March 2013, from: [2]

Resilience Engineering Association (no date). Home page. Retrieved on 21 March 2013, from: [3]

HSE – Health and Safety Executive (no date). RR899 – High reliability organisations – A review of the literature. Retrieved on 21 March 2013, from: [4]