According to PHMSA, in the 20-year span from 2000 through 2019, there were a total of 1,193 injuries resulting from pipeline incidents, an average of 60 per year. Just as startling, there were a total of 282 fatalities in the same time span, an average of 14 per year. Depending on the incidents in question, some years saw much lower numbers, while others saw numbers much higher than average.
When the National Transportation Safety Board (NTSB) began studying operators’ procedures in light of related incidents, it found that 25% of surveyed companies did not prioritize their alarms. Likewise, 33% of surveyed companies did not review alarms regularly. In light of today’s regulations, the idea of operators neglecting to prioritize alarms for their controllers and not having procedures to review alarms for system issues is difficult to comprehend; however, the oil and gas industry before the CRM Rule focused on safety without integrating human factors.
Human factors are those that result from human physiology and cognition and are to a large degree uncontrollable, though considered somewhat manageable. Examples of human factors are how we scan information, how we respond to certain situations when presented with information, and how we physically and mentally process fatigue. It’s easy to see that these issues can quickly affect a controller’s response at a console. As research developed, the NTSB and oil and gas organizations recognized the significance of regulating these areas of influence in pipeline safety.
When PHMSA came on board, it published what is now the Control Room Management (CRM) Rule, formally known as 49 CFR 195.446 and 49 CFR 192.631 for hazardous liquids and gas, respectively. However, before its historic release, two publications preceded it:
- ANSI/ISA 18.2 Management of Alarm Systems for the Process Industries
- API 1167 Recommended Practice for Pipeline Alarm Management
Working from ANSI/ISA 18.2’s publication in 2009 through PHMSA’s release of its CRM regulations, it is clear that ANSI/ISA and API both set the stage for the first federal regulations for CRM and continue to influence its interpretation today.
Gramercy, Louisiana Incident
Alarm management drew particular interest early in the development of standards and practices, due in part to an incident in Gramercy, Louisiana in 1996. After a pipeline ruptured, SCADA alarmed for high pressure and automatic shutdown of pumps for low suction, followed by a line balance alarm. Rather than fully investigating the situation, however, the controller acknowledged all of the alarms without reading the messages. Because similar alarms in that sequence had occurred with other operations that did not involve a leak, the controller assumed that there was no reason for concern. When a second line balance alarm occurred, the controller began an investigation and determined there was a leak, which by that time had been spilling gasoline into a right-of-way and marsh for over an hour.
An official investigation found that a contractor had damaged the operator’s pipeline and not properly reported it, which caused the leak. However, the consequences of the leak were exacerbated by the controller’s delay in recognizing the rupture, diagnosing potential causes of the alarms beyond the anticipated scenario, and properly investigating alarms. When the investigation was complete, it was found that over 11,000 barrels of gasoline had been released, which caused the death of fish, wildlife, and vegetation. The total cost in damages was over $7 million, and the incident marked a prime example in the oil and gas industry of much needed change.
ANSI/ISA 18.2 Management of Alarm Systems for the Process Industries
The National Transportation Safety Board (NTSB) responded to alarm-related industry incidents with a set of recommendations, many of which addressed how operators should manage their alarms. Among the recommendations were prioritizing alarms to assist controllers, providing clear alarm descriptions for easier processing, implementing alarm suppression, and managing alarms to avoid controller distraction and confusion. Despite its authority within transportation, however, the NTSB did not have the power to enforce regulations, leading other organizations to step in.
The American National Standards Institute (ANSI) and the International Society of Automation (ISA) combined forces to draft a standard for alarm management, which they released in 2009, titled Management of Alarm Systems for the Process Industries. Relevant to the wide arena of process industries, it was not specific to oil and gas but did create the foundation for future standards by outlining key aspects of alarm management. Among its most influential contributions to alarm management, ANSI/ISA 18.2 introduced the concept of the alarm lifecycle, which guides operators in the development, management, maintenance, and assessment of alarm systems in general.
The lifecycle not only provides the various stages of alarm management but also shows how the process can vary depending on the alarm in question, providing opportunities for continuous assessment and maintenance along with the cyclical review of alarms. By breaking up the lifecycle into stages, ANSI/ISA 18.2 creates a framework within which operators can manage alarms from philosophy to audit. All other alarm management aspects fall within this alarm management lifecycle, which aids operators in connecting the individual steps to a full picture of how to manage alarms in different phases of life. Whether it’s existing alarms that are becoming obsolete or new alarms being introduced into the alarm system, the process ensures operators are being thorough in their alarm management.
ANSI/ISA 18.2 also provides performance metrics that give operators key performance indicators by which to evaluate their progress in alarm management. More than a decade after their publication, the metrics that outline acceptable levels of manageability are still treated as objectives, because most operators struggle to reach them given the nature of their operations. However, perhaps most helpful are the maximum manageability metrics, which provide limits to how many alarms per day, per hour, and per 10-minute period a controller can manage. Such manageability metrics set the standard for evaluating alarm management progress in terms of how much information a controller can process.
API 1167 Recommended Practice for Pipeline Alarm Management
With the foundation of alarm management already in motion for the process industries, the American Petroleum Institute took on the task of drafting a recommended practice for alarm management in the oil and gas industry. In 2010, just a year after ANSI/ISA released 18.2, API released its first edition of API 1167 Recommended Practice for Pipeline Alarm Management. Keeping control rooms in mind, API provided guidance on several aspects pertaining to alarm management, including alarm design, alarm rationalization, priority and distribution, nuisance alarm handling, and safety-related alarms, aiming to establish best practices for key areas of alarm management in control rooms.
Controllers must be able to understand the type of alarm they are receiving and what that indicates regarding operations and safety. By determining and applying consistent alarm design, operators are able to establish a set of alarm types that controllers can learn and decipher to help guide their reactions to alarms. Alarm design dictates the categorization of alarms and must be consistent throughout an alarm management system. Inconsistency within any aspect of an alarm system can cause confusion, but any inconsistency in alarm design will carry through the remainder of an alarm management lifecycle. To aid operators in employing reliable alarm design that provides the foundation for a more comprehensively consistent alarm system, API 1167 does not give a set list of alarm types or functions but provides a list of examples to guide operators in evaluating alarm categories and functions. Examples of alarm design categories are leak detection alarms, deviation alarms, SCADA status alarms, and emergency shutdown (ESD) alarms. The categories that are implemented will depend on an operator’s assets and operations; however, as API 1167 asserts, whatever criteria are used to classify alarms in their design should be applied consistently to maintain reliable alarm design throughout an alarm system.
Rationalizing alarms is a necessary step in the alarm lifecycle presented in ANSI/ISA 18.2 and requires operators to propose new alarms or alter existing ones to be consistent with an alarm philosophy. However, rationalization is more than simply confirming that alarms exist for certain situations: it has several steps and can detract from the effectiveness of an alarm system if performed improperly. By justifying the need for an alarm and providing the causes, corrective actions, and consequences of each alarm, operators are able to provide the necessary information for controllers to evaluate and respond to alarms. API 1167 outlines purposes for alarm rationalization, thereby establishing a framework within which operators can determine not only how but when to perform alarm rationalization. Among the reasons to perform alarm rationalization are configuring or reconfiguring alarms on new or existing SCADA systems; identifying and reducing duplicate alarms; ensuring proper and meaningful alarm set points and priorities; configuring alarms on points added or modified; providing detailed alarm information for use by controllers; and creating a master alarm database.
While today’s industry expects alarm rationalization to be a standard practice for operators, previous decades saw alarms that were not always rationalized or even reviewed, which contributed to issues within alarm systems. Such scenarios not only have the potential to provide controllers with inaccurate information but create confusion that could lead to incidents that otherwise could be avoided. Alarm rationalization, however, addresses these issues by requiring operators to verify an alarm is not only necessary but clearly defined with information that communicates to controllers what is occurring, what could be the cause, and what is likely to happen if the alarm is ignored. Moreover, it establishes a foothold for effective assessment of an alarm system.
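The rationalization outputs described above (a justification backed by causes, corrective actions, and consequences, plus duplicate reduction) can be pictured as one record per alarm in a master alarm database. The following is a minimal sketch only: the `AlarmRecord` fields, the tag names, and both helper functions are hypothetical and are not taken from API 1167 or any SCADA product.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AlarmRecord:
    """One rationalized alarm in a hypothetical master alarm database."""
    tag: str                      # SCADA point the alarm is configured on
    condition: str                # e.g. "HIGH" or "LOW"
    set_point: float
    priority: str
    cause: str = ""
    corrective_action: str = ""
    consequence: str = ""


def missing_rationalization(record):
    """Return the rationalization fields that are still blank."""
    required = ("cause", "corrective_action", "consequence")
    return [name for name in required if not getattr(record, name).strip()]


def find_duplicates(records):
    """Group records that alarm on the same point and condition."""
    groups = defaultdict(list)
    for record in records:
        groups[(record.tag, record.condition)].append(record)
    return {key: group for key, group in groups.items() if len(group) > 1}


# Illustrative data only; tags and values are made up.
records = [
    AlarmRecord("PT-101", "HIGH", 950.0, "high",
                cause="Downstream valve closed",
                corrective_action="Reduce pump rate and verify valve position",
                consequence="Overpressure of the line segment"),
    AlarmRecord("PT-101", "HIGH", 960.0, "medium"),
]

for record in records:
    print(record.tag, record.set_point, missing_rationalization(record))
print(list(find_duplicates(records)))
```

A check like this would flag the second record twice: it duplicates the first alarm on the same point and condition, and it carries none of the information a controller needs to respond.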
Alarm Priority & Distribution
As was seen in NTSB research, alarm priority was not employed by all operators, meaning that some controllers, when faced with a set of alarms coming in at the same time, would have a more difficult and time-consuming task of determining which alarms would have more severe consequences and needed to be addressed first. This not only takes away valuable time in which abnormal or emergency operations could be addressed but adds to controllers’ cognitive load. API 1167 emphasizes the significance of alarm priority, specifically to indicate to controllers the severity of a situation as well as the time available to respond and the consequences of not responding appropriately. Now a standard requirement for all SCADA systems, alarm priorities are often classified in no more than five categories, ranging from critical to low, to avoid increasing cognitive load further by requiring controllers to remember more than five categories. Furthermore, as with alarm design, consistency is key in priorities to ensure that alarms can be understood by controllers in a way that does not negatively affect actions or procedures.
While how priorities are assigned is important, how each priority is distributed across all alarms is also a significant factor in proper alarm management. Even with adequate prioritization categories, a critical-heavy distribution can cause major issues for operations. If too many alarms are prioritized as critical, controllers can become overwhelmed as it becomes difficult to determine which alarm to address first. To avoid this issue, API 1167 recommends alarm systems use a distribution of less than 1% as critical alarms, around 5% as high alarms, around 15% as medium alarms, and 80% or more as low alarms.
This distribution should be achievable for all operations, regardless of size; however, that does not mean that all operations will fit into the distribution at all times. As operations are in flux, some distributions are likely to move to either side of each percentage, similar to how performance metrics are an objective that is not always met but operates as a goal. As with the KPIs, as long as operations are working to reach the recommended distributions, operators are likely moving in the right direction of alarm management.
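To make the recommended distribution concrete, the sketch below computes the share of configured alarms at each priority and compares it with the percentages quoted above. The function names and the sample counts are hypothetical, and the two-percentage-point tolerance used to interpret "around" is an assumption, not a figure from API 1167.

```python
from collections import Counter

# Assumption: how far from 5% / 15% still counts as "around" (percentage points).
AROUND_TOLERANCE = 2.0


def priority_distribution(priorities):
    """Return the percentage of configured alarms at each priority level."""
    counts = Counter(priorities)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no alarms supplied")
    return {level: 100.0 * counts.get(level, 0) / total
            for level in ("critical", "high", "medium", "low")}


def check_distribution(dist):
    """Flag, per priority, whether the share matches the recommended shape."""
    return {
        "critical": dist["critical"] < 1.0,
        "high": abs(dist["high"] - 5.0) <= AROUND_TOLERANCE,
        "medium": abs(dist["medium"] - 15.0) <= AROUND_TOLERANCE,
        "low": dist["low"] >= 80.0,
    }


# A deliberately critical-heavy configuration (made-up counts).
configured = (["critical"] * 60 + ["high"] * 90
              + ["medium"] * 150 + ["low"] * 300)
dist = priority_distribution(configured)
print(dist)
print(check_distribution(dist))
```

Because the distribution is a goal and not a hard limit, a result like this is better read as a direction for rationalization work than as a pass/fail verdict.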
Nuisance Alarm Handling
Even the most experienced controller can become overwhelmed with nuisance alarms, especially amid abnormal operations when they are trying to determine the cause of an issue. Discussing primarily chattering, fleeting, and flood alarms, API 1167 explains that each creates a different scenario that can not only distract controllers from other operations but also indicate issues or errors within a system or equipment. Chattering alarms have the potential to distract controllers as they repeatedly alarm, while fleeting alarms can distract controllers as they turn on and off again quickly without the controller responding to them. However, alarm floods are likely to do more than distract controllers, overwhelming them with multiple alarms within a ten-minute period.
As a result, API 1167 states that nuisance alarms must be monitored, addressed, and reduced to avoid controller distractions and overwhelm. While how an operator accomplishes these tasks is open-ended, the recognized methods include suppression, shelving, and taking alarms off scan. However, it is important for operators to understand that each of these methods has the potential to create its own issues if not handled appropriately, and, while they might reduce controller issues, if not implemented accurately with training, they can make matters worse. Ensuring the right method is implemented for the right reason is a key part of alarm management.
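Monitoring nuisance alarms starts with finding them in the alarm log. The sketch below, under stated assumptions, looks for two of the patterns described above: ten-minute windows with many alarms (floods) and single tags that re-alarm repeatedly in a short time (chattering). The thresholds, the one-minute chatter window, the fixed (not sliding) ten-minute buckets, and the `(timestamp, tag)` event format are all placeholders that an operator would set in its own alarm philosophy.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta

# Placeholder thresholds (assumptions, not quoted from any standard).
FLOOD_THRESHOLD = 10                   # alarms per ten-minute window
CHATTER_THRESHOLD = 3                  # activations of one tag ...
CHATTER_WINDOW = timedelta(minutes=1)  # ... within this span


def flood_windows(events, threshold=FLOOD_THRESHOLD):
    """Return start times of fixed ten-minute windows with more than `threshold` alarms."""
    buckets = Counter()
    for timestamp, _tag in events:
        start = timestamp.replace(minute=timestamp.minute - timestamp.minute % 10,
                                  second=0, microsecond=0)
        buckets[start] += 1
    return sorted(start for start, count in buckets.items() if count > threshold)


def chattering_tags(events, threshold=CHATTER_THRESHOLD, window=CHATTER_WINDOW):
    """Return tags that activate at least `threshold` times within `window`."""
    by_tag = defaultdict(list)
    for timestamp, tag in events:
        by_tag[tag].append(timestamp)
    found = set()
    for tag, times in by_tag.items():
        times.sort()
        for i in range(len(times) - threshold + 1):
            if times[i + threshold - 1] - times[i] <= window:
                found.add(tag)
                break
    return found


# Made-up log: one level switch chattering, plus a burst of other alarms.
base = datetime(2024, 1, 1, 8, 0, 0)
events = [(base + timedelta(seconds=15 * i), "LS-7") for i in range(4)]
events += [(base + timedelta(minutes=1, seconds=30 * i), f"PT-{i}") for i in range(8)]
events.append((base + timedelta(minutes=25), "FT-2"))

print(flood_windows(events))
print(chattering_tags(events))
```

Tags surfaced this way are candidates for investigation first; whether suppression, shelving, or taking a point off scan is appropriate remains a procedural decision.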
The significance of safety-related alarms cannot be overstated, especially in alarm management, where they not only need to be properly identified but also monitored so that issues can be identified and rectified as soon as possible. Which alarms are designated as safety-related alarms is up to the operator; however, API 1167 provides criteria for determining what qualifies as a safety-related alarm. The recommended practice establishes a safety-related alarm framework that includes protecting the public, property, and the environment; indicating failure of safety systems; indicating malfunction of equipment on a safety-related preventive maintenance schedule; and indicating conditions threatening product containment of hazardous materials. While the criteria are specific enough to eliminate obviously non-safety-related alarms, they are flexible enough to allow operators to make their own determinations on what does and does not qualify as a safety-related alarm within their operations.
The classification of alarms as being safety-related is a major step in alarm management because it brings about a new level of understanding to such alarms. All aspects of such an alarm’s management are automatically more escalated due to the severity of the consequence if they are not properly addressed. As such, the management of safety-related alarms is paramount to addressing safety in the pipeline operations.
PHMSA 49 CFR 195.446 & 192.631 Regulations
When API 1167 was released, it marked the oil and gas industry’s first step toward establishing best practices for alarm management; however, as is true of all standards and recommended practices, it was published after expert drafting and review but does not hold the weight of regulation. In legal terms, beyond contracts between companies where they are used for agreed-upon metrics or procedures, such documents are not enforceable. Despite the effort put into the publication and the backing of API’s reputation, operators were not required to follow API 1167’s recommendations.
But it was only a matter of months before PHMSA released the first set of control room regulations, known as 49 CFR 195.446 and 192.631 for hazardous liquid and gas, respectively. This move ushered in legal requirements regarding CRM for pipeline operators. While the regulation encompassed a range of CRM topics, one section continues to be dedicated to alarm management, outlining the requirements of qualified operators in managing and maintaining their alarm systems. Though they preceded the PHMSA regulations, neither ANSI/ISA 18.2 nor API 1167 is specifically referenced in the regulations; however, their influence and relationship with the regulations are easy to identify.
Alarm Management Plan
PHMSA’s regulations require operators to have an alarm management plan, and while there are no specifics provided on what that must contain, time has shown that PHMSA prefers it to incorporate aspects of ANSI/ISA 18.2’s alarm lifecycle, such as an alarm philosophy. All other aspects of the alarm management requirements must be discussed in the plan as well. Additionally, PHMSA requires that the alarm management plan be reviewed at least once every calendar year to ensure its effectiveness within operations, which also fits into the alarm lifecycle phase of assessment.
Similar to API 1167’s discussion of safety-related alarms, PHMSA’s requirement addresses the review of safety-related alarm operations as they pertain to SCADA. Much like the requirements for an alarm management plan, the review process is not clearly defined, which leaves operators flexibility. The implication is that operators must assess all aspects of safety-related alarms, including categorization, descriptions, and procedures. So, while the regulation has only a single item devoted to them, its impact on safety-related alarms in pipeline operations is wide.
Monthly Review of Safety Points
Depending on the operator, some procedures allow for alarms to be taken off-scan or be inhibited, as is the case with nuisance alarms, which are discussed in API 1167. As the recommended practice implies, this can be dangerous if not well regulated within a control room, which is why the regulations require a monthly review of safety-focused points that have been taken off-scan and do not report into SCADA as well as those that have been inhibited and do not alarm to the controller in SCADA. PHMSA went on to add to this set by including safety points that generate false alarms or have had forced or manual values beyond expected activity time. These review criteria solidified the need to keep safety-related points at the forefront of operational considerations and established a framework within which to confirm the need for timely review in alarm management.
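The four review criteria named above (off-scan, inhibited, generating false alarms, and forced or manual values held too long) lend themselves to a simple monthly report. The sketch below is one possible shape for it; the `SafetyPoint` fields, the tag names, and the 12-hour limit on manual values are hypothetical and would come from an operator’s own procedures, not from the regulation text.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass
class SafetyPoint:
    """Status of one safety-related SCADA point (hypothetical fields)."""
    tag: str
    off_scan: bool = False
    inhibited: bool = False
    false_alarm_count: int = 0
    manual_value_age: Optional[timedelta] = None  # None when no forced/manual value is set


def review_reasons(point, max_manual_age):
    """List every reason this point belongs in the monthly review."""
    reasons = []
    if point.off_scan:
        reasons.append("off-scan")
    if point.inhibited:
        reasons.append("inhibited")
    if point.false_alarm_count > 0:
        reasons.append("false alarms")
    if point.manual_value_age is not None and point.manual_value_age > max_manual_age:
        reasons.append("forced/manual value overdue")
    return reasons


def monthly_review(points, max_manual_age):
    """Map each flagged tag to its review reasons; unflagged points are omitted."""
    report = {}
    for point in points:
        reasons = review_reasons(point, max_manual_age)
        if reasons:
            report[point.tag] = reasons
    return report


# Made-up month of data; the 12-hour limit is an assumed procedural value.
points = [
    SafetyPoint("PT-101"),
    SafetyPoint("LS-7", inhibited=True, false_alarm_count=4),
    SafetyPoint("FT-2", off_scan=True),
    SafetyPoint("TT-9", manual_value_age=timedelta(hours=30)),
]
report = monthly_review(points, max_manual_age=timedelta(hours=12))
for tag, reasons in report.items():
    print(tag, reasons)
```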
Verify Safety-Related Alarms Annually and with Changes
While establishing review of safety-related alarms and their points is a necessary step in proper alarm management, concern crops up when it comes to changes to SCADA as well as in the field. Communication is the key to safety, but there are times when it is not executed fully, leaving some operational groups in the dark. When it comes to changes to field equipment or SCADA, this can spell disaster if the control room is not included in discussions. PHMSA recognized this issue and mandated that safety-related alarm set point values and descriptions be verified when field equipment is changed or calibrated. A point-to-point, as the process is known, verifies that any changes to the equipment are reflected in the SCADA data, ensuring that controllers are seeing the correct information. Understanding that not all operators will require a point-to-point annually as the result of equipment changes, the regulation added that a point-to-point must be performed at least annually, a requirement that is satisfied if one is performed for replaced or recalibrated equipment. As with the annual review of the alarm management plan, the point-to-point requirement fits into ANSI/ISA 18.2’s alarm lifecycle via the management of change phase, bringing together changes in the field with control room updates to ensure controllers are able to react appropriately.
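An actual point-to-point involves exercising the field device and confirming what the controller sees, so it cannot be reduced to software. The comparison step at its core, though, can be sketched: check that each safety-related point’s set point and description agree between the field record and SCADA. The dictionary layout, tag names, and `point_to_point` function below are hypothetical illustrations.

```python
def point_to_point(field_points, scada_points, tolerance=0.0):
    """Compare field and SCADA records; return (tag, problem) pairs."""
    discrepancies = []
    for tag, field in field_points.items():
        scada = scada_points.get(tag)
        if scada is None:
            discrepancies.append((tag, "missing in SCADA"))
            continue
        if abs(field["set_point"] - scada["set_point"]) > tolerance:
            discrepancies.append((tag, "set point mismatch"))
        if field["description"] != scada["description"]:
            discrepancies.append((tag, "description mismatch"))
    return discrepancies


# Made-up records: PT-101 was recalibrated in the field but SCADA was not updated.
field_points = {
    "PT-101": {"set_point": 950.0, "description": "Station A discharge pressure high"},
    "LS-7": {"set_point": 12.5, "description": "Sump level high"},
}
scada_points = {
    "PT-101": {"set_point": 975.0, "description": "Station A discharge pressure high"},
}

for tag, problem in point_to_point(field_points, scada_points):
    print(tag, problem)
```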
A controller’s workload consists of multiple duties, including time spent interpreting and responding to alarms; however, because this time is not solely devoted to responding to alarms and given that alarm numbers can quickly become overwhelming if not properly managed, PHMSA included in the regulations a section addressing workload. Operators are required to monitor controller activity by looking at not only what controllers are responding to but how often alarms are sounding and what’s required of controllers to adequately process and address the information. The regulations’ objective is to assure that controllers have the necessary time to analyze alarms and react appropriately.
How an operator is to know what constitutes sufficient time, however, is tricky. Operations depend on the individual assets, and a controller’s workload depends on how often product is moved, how complex an asset is, and how many assets are on a given console. But ANSI/ISA 18.2 outlines the objectives for alarm workload in its performance metrics by providing what it calls maximum manageability. Anything beyond these metrics is considered unmanageable for controllers. If an operator is beyond these metrics, it is presumed that controllers are unable to adequately respond to all alarms; however, being below them without having actually reached the acceptable manageability metrics does not put an operator in violation. ANSI/ISA 18.2’s lower-end metrics are the objective but not necessarily the rule, and while PHMSA does not state that the standard should be used as the measuring stick, the industry has accepted it as such.
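Comparing a console’s alarm load with manageability limits comes down to simple rate arithmetic over an observation period. The sketch below computes average alarms per day, per hour, and per ten minutes and reports which caller-supplied limits are exceeded. The limit values in the example are placeholders chosen for illustration, not the ANSI/ISA 18.2 figures, and the function names are hypothetical.

```python
from datetime import datetime, timedelta


def alarm_rates(timestamps, period_start, period_end):
    """Average alarms per day, per hour, and per ten minutes over the period."""
    duration = period_end - period_start
    if duration <= timedelta(0):
        raise ValueError("period_end must be after period_start")
    count = sum(1 for t in timestamps if period_start <= t < period_end)
    hours = duration.total_seconds() / 3600
    return {
        "per_day": count / (hours / 24),
        "per_hour": count / hours,
        "per_10_min": count / (hours * 6),
    }


def exceeded_limits(rates, limits):
    """Return the names of the rates that are above their limit."""
    return [name for name, limit in limits.items() if rates[name] > limit]


# Made-up console log: one alarm every five minutes for a full day.
start = datetime(2024, 1, 1)
end = start + timedelta(days=1)
timestamps = [start + timedelta(minutes=5 * i) for i in range(288)]

# Placeholder limits; substitute the values adopted in the alarm management plan.
limits = {"per_day": 300, "per_hour": 10, "per_10_min": 2}

rates = alarm_rates(timestamps, start, end)
print(rates)
print(exceeded_limits(rates, limits))
```

Averages hide bursts, so a workload assessment would normally pair figures like these with a look at peak ten-minute periods.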
Alarm management standards, practices, and regulations have developed tremendously in the past decade, looking at not only how alarms are best managed amid oil and gas operations but the necessary steps for supporting controllers’ responsibilities and promoting pipeline safety. With the introduction of the alarm lifecycle concept via ANSI/ISA 18.2 along with performance metrics, the stage was set for regulations that helped establish the requirements within the standard’s framework. Likewise, API 1167 made available references for why metrics should be met, how alarm rationalization should be achieved, and how often review processes should be completed. The standard and recommended practice made a solid foundation for the first CRM regulations, which not only addressed alarm management but also considered aspects of human factors that have direct effects on safety, such as workload. As human factors continue to play a major role in the management of control room safety, available standards and recommended practices are reviewed and updated, with regulations often following suit.