Title: Testing Redundant and Backup Systems

Deck: Locate missed opportunities to validate redundant and backup control systems

= = = = =

Many of us are inclined to ignore the time-honored adage, "If it ain't broke, don't fix it." Sometimes our handyman instincts can't leave well enough alone. Yet that same intuition also encourages us to believe, "If it ain't broke, don't test it." This reluctance is especially evident when it comes to testing the redundant and backup features of our critical control systems. At best, failures resulting from inadequate testing will only cost you lost production. At worst, insufficient testing can cost you your job.

Get hip to R&B

Although the terms "redundant" and "backup" (R&B) are often interchanged, each represents a different aspect of reliability design. A redundant system uses multiple similar components in a configuration that permits simultaneous performance of the same (or similar) function. A redundancy failure causes no reduction of system operation or capability. Simple examples include parallel power supplies and series shutdown valves. A more sophisticated example is a redundant PLC system: a microchip fails, a warning light comes on, and production continues normally. A key aspect of redundant systems is that multiple components do the same job at the same time.
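As a rough illustration of why "same job at the same time" pays off, the Python sketch below computes the chance that at least one of several identical components is working. The function name and the 0.95 figure are assumptions made up for this example, and the arithmetic assumes the components fail independently. Real components often share a common cause of failure, so treat the results as optimistic.

```python
def parallel_availability(unit_availability: float, units: int) -> float:
    """Chance that at least one of `units` identical, independent components works.

    Assumption (not from the article): failures are independent, so the
    function is lost only when every unit has failed at the same time.
    """
    if not 0.0 <= unit_availability <= 1.0:
        raise ValueError("unit_availability must be between 0 and 1")
    if units < 1:
        raise ValueError("units must be at least 1")
    # Probability that all units are down together, subtracted from 1.
    return 1.0 - (1.0 - unit_availability) ** units


if __name__ == "__main__":
    # Hypothetical power supply that is available 95% of the time.
    for n in (1, 2, 3):
        print(n, round(parallel_availability(0.95, n), 6))
```

Under these assumptions, a second parallel unit shrinks the chance of losing the function from 5% to 0.25%. The calculation says nothing about whether the second unit actually works when called upon, which is the reason for testing.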

A backup system takes a different approach to reliability by providing an independent means of performing all or part of the overall control function, usually in a "primary" and "standby" configuration. Manual or automatic transfer mechanisms determine which component takes the lead. For increased reliability, backup systems can use alternate configurations and technologies to improve resistance to single point and common mode failures. For example, a simple local controller that can operate without assistance from a plant-wide control system is a common instance of backup technology. The local system may lack bells and whistles, but at least it can maintain safe production should the primary system go offline.
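The transfer mechanism described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the Controller class, its healthy flag, and the select_active function are hypothetical names invented for illustration, and a real transfer scheme would also have to deal with timing, bumpless transfer, and failures of the health signal itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Controller:
    """Hypothetical stand-in for a primary or standby control unit."""
    name: str
    healthy: bool = True


def select_active(primary: Controller,
                  standby: Controller,
                  manual_override: Optional[Controller] = None) -> Controller:
    """Decide which controller takes the lead.

    A manual selection wins; otherwise the primary leads while healthy,
    and the standby takes over when the primary is not.
    """
    if manual_override is not None:
        return manual_override
    if primary.healthy:
        return primary
    if standby.healthy:
        return standby
    raise RuntimeError("no healthy controller available")


if __name__ == "__main__":
    plant_dcs = Controller("plant-wide system")
    local = Controller("local controller")
    print(select_active(plant_dcs, local).name)
    plant_dcs.healthy = False  # simulate loss of the primary
    print(select_active(plant_dcs, local).name)
```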

Backups can also be found within the control system's support structure. A frequent example is an uninterruptible power supply (UPS) that delivers reliable energy to many electronic control systems. If the primary power goes away, the UPS instantly takes over to maintain essential control functions--at least as long as the batteries hold out.

Note that redundancy and backup are not mutually exclusive. Many control systems contain separate elements of both, and some even combine them into "redundant backup" systems with high levels of fault tolerance. Such systems include two or more similar control entities, each having full capability, but based on different technologies. Having two independent and diverse control systems is often considered the best protection against unanticipated failures.

In addition to improving reliability, R&B controls can simplify routine maintenance of an operating facility. R&B concepts allow portions of the control system to be repaired offline while the controlled process remains in service. Special operating modes such as manual supervision may be required, but the ability to perform online testing of items such as relief valves and meter runs is a valuable benefit of high-reliability systems.

Why test R&B?

Some industries, such as aerospace and nuclear, routinely test their redundant and backup systems because reliable technology is essential to their high-risk business. But less-risky industrial users don't always adopt a "mission critical" approach to testing R&B performance. Everyone in industry has heard war stories of redundant and backup systems that failed to do their job. "The UPS should have kept us going" or "the redundant processor had an outdated program." The subsequent diagnosis is often performed through a rear-view mirror, with perhaps some adjustment to future maintenance procedures. But in reality, proper testing of R&B systems remains on the back burner of many maintenance programs.

Of course, the less exotic elements of R&B systems, such as inputs and outputs, are often tested during routine maintenance of the control system. However, such checks are often limited to calibration and physical care. Such maintenance may test the heart of an R&B system, but not its soul. A true test requires simulation of the special transient conditions for which the R&B systems are designed. Proper R&B testing requires more than simply faking a process fault to verify that the system performs its normal role. It should also challenge the system's unique "non-stop" features to verify reliable performance even while partially disabled.

Further, the requirement to routinely verify R&B operation is becoming increasingly important because of safety-related standards such as IEC 61511 and ANSI/ISA S84-2004. These internationally accepted guidelines define Safety Integrity Levels (SIL) and Safety Instrumented Systems (SIS) that generally rely on redundant and/or backup systems. Merely designing controls to meet those standards isn't sufficient to satisfy existing and pending regulations. Proper testing and verification of specific redundant and backup features is essential in meeting both the spirit and letter of those standards.

How to test?

A proper test of redundancy and backup requires creating operating conditions that mimic failures of the control system itself, and also of its various support systems. Such tests must go beyond the manual or automatic diagnostics built into many R&B systems (e.g., the UPS "test" button). Those diagnostics are generally local to the device and may not adequately test responses to external problems. So although built-in tests help verify operation of an R&B component, they cannot verify reliable system operation for situations that involve interconnected units.

So how can the R&B functions themselves best be tested? There's no easy answer here--every redundant and backup system has its own special requirements. But a common theme is to simulate fault conditions that are unrelated to the controlled machine or process. A significant goal is to test the redundant or backup system's ability to maintain operation during and after a transient condition that interrupts normal conditions, including loss of the primary control system. Therefore, testing one part of an R&B system usually requires disabling other parts under conditions that simulate real-world failures.

Another key testing goal is to validate the R&B system's ability to alert operators to a partial failure. In addition to seamlessly maintaining operations, the R&B system must accurately indicate that it or its partner is impaired. Without such notification, corrective action may be delayed or overlooked until after the remaining portions fail.
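The two goals described so far, continued operation and accurate notification, can be written down as a single pass/fail check. The Python sketch below uses a toy RedundantPair class as a stand-in for the system under test. Every name in it is hypothetical, and in a real test the "still running" and "alarm raised" answers would come from observing the actual process and the operator interface rather than from a model.

```python
class RedundantPair:
    """Toy model (assumed for illustration) of two units doing the same job."""

    def __init__(self):
        self._healthy = {"A": True, "B": True}

    def disable(self, unit):
        """Simulate the failure of one unit."""
        self._healthy[unit] = False

    def is_running(self):
        """The function survives as long as any one unit is healthy."""
        return any(self._healthy.values())

    def impaired_units(self):
        """Units the system reports to the operator as impaired."""
        return sorted(u for u, ok in self._healthy.items() if not ok)


def check_partial_failure(system, unit):
    """Disable one unit, then record both results a proper test must confirm."""
    system.disable(unit)
    return {
        "still_running": system.is_running(),
        "alarm_raised": unit in system.impaired_units(),
    }


if __name__ == "__main__":
    print(check_partial_failure(RedundantPair(), "A"))
```

The point of recording both answers is that a test which only confirms "still running" would pass a system that silently lost half of its redundancy.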

Fortunately, functional testing of R&B systems is usually more "fun" than routine maintenance work. Rather than calibrating transmitters or greasing actuators, we get to kill half of a redundant system and suffer nothing beyond a warning light. Or we can disable a remote speed control and watch the lowly backup governor maintain operation. And then there's everyone's favorite--pulling the plug on a UPS and grinning when nothing bad happens. Can testing really be that simple? Maybe not.

The UPS example just cited may seem like a good idea, but many UPS manufacturers will disagree. An often-overlooked effect of "pulling the plug" is disconnection of important ground and neutral references that help the UPS monitor primary power. A better UPS test procedure is to remove power at the circuit breaker or other convenient point to introduce transients similar to a power outage. Only then can a true test of the UPS's ability to sense, switch, and supply be performed in the field.

Likewise, simulations that merely "pull the plug" on an input, communications link, or processor may not represent realistic R&B failure modes. Input signals don't usually go away, but they do drift out of specification. Similarly, communications links don't always go quiet--in fact, they're more likely to get noisy when failed. And processors are rarely known to leap from their happy home in the electronics rack. A more realistic procedure will mess with the power or communications going into a processor, or to an output coming from the processor, to determine if its R&B control partner can carry on.
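For tests that can be run against simulated or recorded signals, the difference between "pulling the plug" and a realistic fault is easy to express in code. The Python sketch below shows two hypothetical fault injectors, one that makes a signal drift and one that adds noise, along with a helper that reports where the signal leaves its allowed band. The function names, the 12.0 reading, and the limits are assumptions chosen only for illustration.

```python
import random


def inject_drift(samples, drift_per_sample):
    """Return a copy of the signal that slowly drifts away from its true value."""
    return [value + i * drift_per_sample for i, value in enumerate(samples)]


def inject_noise(samples, amplitude, seed=0):
    """Return a copy of the signal with bounded random noise added."""
    rng = random.Random(seed)  # seeded so a test run can be repeated
    return [value + rng.uniform(-amplitude, amplitude) for value in samples]


def out_of_spec(samples, low, high):
    """Return the indexes of samples that fall outside the allowed band."""
    return [i for i, value in enumerate(samples) if not low <= value <= high]


if __name__ == "__main__":
    steady = [12.0] * 5                      # hypothetical healthy reading
    drifting = inject_drift(steady, 0.5)     # input drifts instead of vanishing
    noisy = inject_noise(steady, 0.2)        # link gets noisy instead of quiet
    print(drifting)
    print(out_of_spec(drifting, 11.0, 13.0))
    print(len(noisy))
```

Feeding the drifting or noisy signal to one half of an R&B pair, rather than simply removing the signal, is closer to the failure modes described above.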

Establishing adequate test procedures therefore requires careful consideration and planning. The tests can't merely be convenient or arbitrary--they need to be realistic. And they need to be part of the facility's regular maintenance plan.

When to test?

In theory, we should be able to test redundant and backup systems anytime we want--if they work properly, there's nothing to fear. But in reality, R&B testing for a "non-shutdown" rarely occurs until after the system fails to perform. Perhaps the lapse is due to fear that the R&B system won't work--no one wants an unexpected shutdown noted in their permanent file. The logical solution is to combine R&B testing with other maintenance procedures in which an unexpected shutdown can be tolerated.

For example, many offline maintenance activities begin with a functional test of the emergency shutdown (ESD) system. Few maintenance tasks are more satisfying than watching an automatic control system stop a complex machine or process in a safe, organized sequence. We expect nothing less when we push the big red button, yet it's still a kick to watch the dominoes tumble toward a happy ending. Similarly, a planned shut-in is an ideal time to test the failure modes of redundant and backup systems to verify that they don't shut down a process. Therefore, functional R&B tests are usually best accomplished just prior to performing the scheduled ESD.

What's next?

If you suspect shortcomings in your R&B maintenance, consider building a multi-disciplined team to raise awareness and evaluate your needs. Proper testing will likely require input from many sources. Be sure to include the usual suspects such as Plant Utilities, Communications, Engineering, and Operations. But also include lesser players such as Safety, Training, and Administration, all of whom share your interest in seeing redundancy and backup systems perform as planned. There's little doubt that attainable goals can be set. But chances are, the path to those goals begins with you.

= = = = = = = = = = = = = = = = = = = = = = = = = = = = =

Bio information for Arthur Zatarain, PE

Arthur Zatarain, PE, consults in technology and intellectual property through Artzat Consulting, LLC. He is also Vice President of TEST Automation & Controls, a provider of industrial systems worldwide. He can be reached through www.artzat.com.

 

 


Artzat Consulting is owned by Arthur Zatarain, PE, in Metairie, Louisiana, a suburb of New Orleans. Artzat provides consulting and expert witness services to attorneys, insurers, and end users. Typical projects relate to equipment, automation, instrumentation, and control systems. Service is available nationwide with engineering licenses held in Louisiana, Alabama, California, and Alaska.

Forensic Engineer

A forensic engineer performs analysis and reporting on technical matters that are typically being processed through some form of legal proceeding. However, a legal environment isn't required for a forensic examination. The analysis may be performed merely to determine the cause of a specific event or condition. For example, a forensic examination may be made on a control system to determine why an accident occurred, or why a system did not perform as expected. The forensic analysis may be of software code such as ladder logic in a PLC, or it may involve hard-wired relay logic, electrical controls, power distribution, or instrumentation. Forensic engineering is therefore useful in a variety of situations regardless of the legal entanglement.

Industrial Equipment

Typical equipment includes the programmable logic controller (PLC), the distributed control system (DCS), and electric relay logic. PLC systems use ladder logic for most operations, while a DCS will often use function block programming. The concepts of PLC and DCS have merged into a unified control platform based on open architecture interfaces. The use of ladder logic is widespread due to its earlier application to relay logic circuits.

An expert witness is used to investigate and evaluate the technical and commercial aspects of accidents, intellectual property, and commercial matters. Artzat Consulting can assist clients in all these areas, with experience in steam boilers, paper mills, steel mills, burner management, and telemetry SCADA. Other areas include medical devices, flow measurement, meters, power distribution, and refrigeration.

Expert Witness Services

Expert witness service can be provided in any state, with experience in Louisiana, California, Alabama, and Alaska. Other states include North Carolina, Oklahoma, Illinois, Indiana, and Texas. Michigan has also been served, along with Washington, Colorado, Oregon, and the District of Columbia. Any state such as New York or New Jersey can also be served by expert witness service. Professional credentials are important, such as licensed engineer or registered engineer. Also important is a master's degree in engineering or a similar field. A PhD is not a necessity for an expert witness because career experience and expert witness experience are more useful to the client than a PhD with no relevant experience.

Product Liability

A forensic engineer is useful for matters of product liability and product defects. Artzat Consulting has experience with product liability for industrial and commercial equipment. Product liability has also been analyzed for control systems, programmable controllers, ladder logic, and engineering design. Product liability can result from an original equipment manufacturer (OEM), or from a systems integrator who combines components into a complete system.

Forensic Engineering Locations

Service in Louisiana, Mississippi, Texas, and Alabama is efficient due to the proximity of Metairie to those areas. However, an airplane will take Artzat anywhere within the USA in a matter of hours. Travel to Alabama areas such as Birmingham, Montgomery, or Mobile is easy, with Huntsville also accessible by car. Visits to Houston, Dallas, San Antonio, and Austin are also less than one day away by car. A PhD is not unusual for an expert witness, but is not really important when compared to real-life experience with equipment, controls, and automation with PLC and DCS control system equipment.

Service in California includes Los Angeles, San Francisco, and San Diego as well as outlying Bakersfield and Antioch. Seattle is a bit far, but the airline does most of the heavy lifting. Travel to New York City occurs easily on JetBlue and Delta. Once in NYC the entire tri-state area is easily accessible, as is upstate New York.

Service to New England is welcomed, so please inquire with your technical requirements for an expert witness. Travel to New England cities such as Boston is by JetBlue or other carriers, which can then lead to other New England cities.

Engineer for Machine Accident

An engineer may be required to serve as an expert witness or forensic engineer for a machine accident such as with a conveyor, power press, steel mill, or extraction machine. The instance could be an equipment accident, or it could be a process accident. A typical example is an expert engineer for a manufacturing accident. This could be an expert engineer or forensic engineer in an assembly plant, or an expert engineer in a production line or on a vehicle assembly line.

Oilfield Accident

An expert engineer can be useful to evaluate an oilfield or oil and gas accident. Those events may include oil and gas or the related products such as water, CO2, H2S, and sulfates. The accidents occur on oil wells, gas wells, pipelines, storage tanks, and production vessels such as separators, treaters, waste heat recovery units, and water treating facilities. Such events can be generally divided into an oil and gas drilling accident or an oil and gas production accident. An oilfield accident requiring an expert engineer can occur onshore or offshore. The expert engineer can be for a control system, production system, safety system, automation system, or instrumentation. The system can be electrical, electronic, hydraulic, or pneumatic. A computer control system can also require an expert engineer. An industrial engineer can also be used if the matter involves safety and production systems.

Automatic Control

An expert engineer may be required for an accident involving automatic control. That expert could be an electrical engineer, control system engineer, or automation engineer. A mechanical engineer or someone with experience in mechanical engineering can also be useful for an automatic control accident. A certified systems integrator is someone who can be an expert engineer for automatic control. Systems integration involves combining multiple pieces of equipment and technology into a single control system. This involves design, programming, fabrication, testing, installation, and maintenance.

Industrial Accident

An industrial accident may require an expert engineer or forensic engineer to analyze and evaluate the control system connected with the event. The accident may have nothing to do with the control system. Still, a forensic engineer may be required to analyze the system to determine that the control system was not at fault.

Equipment Accident

An equipment accident can require an expert engineer or expert witness to help evaluate the circumstances and situation, including the mechanical and electrical components of the equipment. This can be industrial equipment, process equipment, a manufacturing system, commercial equipment such as a heater or dryer, or a pump or compressor. Industrial equipment also includes flow meters, electrical switchgear, control switches, buttons, and instrumentation. End devices measure pressure, temperature, level, and other physical quantities. Much equipment is used for food production, packaging, transportation, storage, and conveying. Other settings include metal processing such as a steel mill, as well as paper mills, refineries, petrochemical plants, and tank farms. A vehicle can also be equipment itself, or it can contain devices related to an equipment accident.