Today's manufacturing plants are a mosaic of many moving parts. In some cases, the same part is used to do different things. In one application, failure of the part may have only a small consequence, while in a different application, the failure may cause a major catastrophe, resulting in considerable loss of time and product.
As each plant is different, with its own unique set of potential problems and mitigating tasks, the identification and mitigation of high-consequence failures requires conscious effort, experience and knowledge. A strong focus on reliability is vital to maintaining first-pass, first-quality production.
It is important to remember that reliability is not a characteristic but rather a continuous journey. One of the most important objectives of this journey is to identify and eliminate defects that can compromise systems and equipment.
Where reliability is concerned, a defect is anything that is not perfect, whether that condition is related to a single piece of equipment or an entire operation. A defect can be as simple as a broken gear tooth or as complex as an entire production line that cannot handle the demands being placed upon it. What is most important is to track the defects back to their sources and address the issues to prevent the defect's return.
Defects are caused by a variety of sources, including normal degradation of items in their environment, design deficiencies, improper manufacture or use of the item, exposure to contaminants, improper repairs, collateral damage from other failures or inadequate preventive maintenance. A defect condition may lie dormant for years, only to be revealed when the defective item is stressed in an abnormal way.
While the principles of defects and their sources can be described broadly, specific failures are defined in terms of their causes and their consequences. A pump impeller failure will have different causes from a motor bearing failure. Depending on the location of a part within a system, a failure could cause anything from a reduction in product quality to a complete line shutdown. Improving the reliability of a manufacturing facility therefore depends on understanding how its items can fail and what effect each failure has. Once you understand the effects of the failures, you can design an appropriate mitigating task.
Mitigating tasks are all around us, and they protect us in many different ways. Some common mitigating tasks include applying sunscreen to prevent sunburn, using a speedometer to avoid getting a speeding ticket or washing your hands to avoid getting sick. In other words, mitigating tasks are the things you do to prevent or minimize harmful consequences. Mitigating tasks do not happen automatically, nor are they free. They are part of a reasoned analysis done by an expert. The expert makes a recommendation for mitigating tasks based on a cost-benefit analysis of the consequences.
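The cost-benefit reasoning behind a mitigating task can be illustrated with a minimal Python sketch. It assumes a very simple model in which the expected annual loss is failure frequency multiplied by cost per failure, and the task removes a fixed fraction of that loss. The function names, the parameters and every figure are hypothetical and chosen only for illustration; a real analysis would also weigh safety, environmental and quality consequences.

```python
def expected_annual_loss(failures_per_year, cost_per_failure):
    """Expected yearly loss from a failure mode: frequency times consequence."""
    return failures_per_year * cost_per_failure


def task_is_justified(failures_per_year, cost_per_failure,
                      risk_reduction, task_cost_per_year):
    """Return True when the loss avoided by the task exceeds its yearly cost.

    risk_reduction is the assumed fraction (0 to 1) of the expected loss
    that the mitigating task removes.
    """
    if not 0.0 <= risk_reduction <= 1.0:
        raise ValueError("risk_reduction must be between 0 and 1")
    baseline = expected_annual_loss(failures_per_year, cost_per_failure)
    avoided_loss = baseline * risk_reduction
    return avoided_loss > task_cost_per_year


# Hypothetical figures: the same part used in two different applications,
# failing equally often but with very different consequences.
low_consequence = task_is_justified(0.5, 2_000, 0.8, 1_500)
high_consequence = task_is_justified(0.5, 150_000, 0.8, 1_500)
print(low_consequence, high_consequence)
```

With these made-up numbers, the same task is worth its cost only in the high-consequence application, which is why the analysis has to be done application by application rather than part by part.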
Furthermore, once these high-consequence items are identified, systems must be designed to reduce the consequence. Once the systems are in place, they must stay active to protect against future failure events. The effort needs to be ongoing so that as things change, the analysis is repeated to ensure the systems are still adequate.
The old saying goes, "The journey of a thousand miles begins with a single step." This is also true of the journey to reliability. But what is the first step?
A popular approach to reliability improvement is to identify those items that seem subject to chronic failure and tag them as "bad actors." Metrics to look at in determining bad actors include mean time between failures (MTBF), emergency work and maintenance costs, either in total or averaged over a period of time, for individual equipment items or for groups of equipment in the same area.
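As a rough illustration of ranking by these metrics, the Python sketch below computes MTBF as operating hours divided by the number of failures and sorts equipment so that the lowest MTBF comes first, with maintenance cost as a tie-breaker. The record layout, the equipment tags and all of the figures are hypothetical; a plant would pull the equivalent data from its own maintenance system and might weight the metrics differently.

```python
from dataclasses import dataclass


@dataclass
class EquipmentHistory:
    tag: str                 # hypothetical equipment identifier
    operating_hours: float   # hours in service over the review period
    failures: int            # failures recorded over the same period
    maintenance_cost: float  # total maintenance spend over the period


def mtbf(item):
    """Mean time between failures: operating hours per recorded failure."""
    if item.failures == 0:
        return float("inf")  # no failures observed in the period
    return item.operating_hours / item.failures


def rank_bad_actors(history, top_n=3):
    """Lowest MTBF first; among equal MTBF, highest maintenance cost first."""
    ordered = sorted(history, key=lambda h: (mtbf(h), -h.maintenance_cost))
    return ordered[:top_n]


# Made-up history for one year of operation.
history = [
    EquipmentHistory("P-101", 8000, 8, 42_000),
    EquipmentHistory("P-102", 8000, 8, 61_000),
    EquipmentHistory("M-204", 8000, 2, 9_000),
    EquipmentHistory("C-310", 8000, 0, 1_500),
]

for item in rank_bad_actors(history):
    print(item.tag, mtbf(item), item.maintenance_cost)
```

A ranking like this only nominates candidates. It says nothing about why an item fails, which is what the questions that follow are meant to uncover.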
In studying bad actors, the following questions can help uncover the root cause of the chronic failure: "When did the problem begin?" or "Was it always like this?" or "Was there some initiating event that caused the level of reliability to change?"
Another popular practice is known as "predictive maintenance," though this is often a misnomer. In general, this refers to a set of diagnostic tests routinely conducted on plant equipment with the goal of providing information on incipient failure of the machine under test. However, in practice, predictive maintenance does not provide a prediction, but rather it flags equipment showing signs of trouble. The routine diagnostic tests are cost-effective, and the information they produce is highly valuable if used to make timely decisions about the equipment's maintenance requirements. A more accurate name, and one that inspires proper use of the tools, is "condition monitoring," which can be a valuable failure mitigation task. While it does not prevent the failure, it often gives clues as to the cause and future prevention of the failure.
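The distinction between predicting a failure and flagging a condition can be seen in a minimal Python sketch. It assumes a series of evenly spaced routine readings (for example, vibration amplitude) and flags the equipment when the latest reading reaches an alert limit or when the readings trend upward faster than an allowed rate. The limits, the status labels and the readings are hypothetical; real programs take their limits from equipment history, vendor guidance or applicable standards.

```python
def trend_slope(values):
    """Least-squares slope of evenly spaced readings (units per reading)."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def flag_condition(readings, alert_limit, max_slope):
    """Classify a reading history as 'alert', 'watch' or 'ok'.

    This flags signs of trouble; it does not predict a failure date.
    """
    if len(readings) < 2:
        raise ValueError("need at least two readings to assess a trend")
    if readings[-1] >= alert_limit:
        return "alert"  # latest reading is already at or over the limit
    if trend_slope(readings) > max_slope:
        return "watch"  # under the limit but rising quickly
    return "ok"


# Hypothetical vibration readings from three machines.
print(flag_condition([2.0, 2.1, 2.0, 2.1], alert_limit=4.5, max_slope=0.2))
print(flag_condition([2.0, 2.4, 2.9, 3.5], alert_limit=4.5, max_slope=0.2))
print(flag_condition([2.0, 2.4, 2.9, 4.8], alert_limit=4.5, max_slope=0.2))
```

The value of such a flag comes from acting on it in time; the output is a prompt for a maintenance decision, not a forecast.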
Reliability engineers can evaluate the failure history of equipment items to find out if an item is performing as expected or if it is subject to quality issues from poor quality parts or poor assembly techniques. This process is called distribution analysis, and it is a powerful failure analysis technique that, if applied correctly, can identify and ultimately eliminate chronic failures.
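The article does not name a particular distribution. One common choice for this kind of analysis is the Weibull distribution, and the Python sketch below assumes it. It fits the shape (beta) and scale (eta) parameters to a set of complete failure times using median rank regression with Bernard's approximation. A shape below 1 is conventionally read as early-life failures, which is the pattern associated with poor parts or assembly, while a shape well above 1 points to wear-out. The function names and the classification thresholds are hypothetical, and the sketch ignores suspended (unfailed) items, which a real analysis would need to handle.

```python
import math


def weibull_fit(failure_times):
    """Estimate Weibull shape (beta) and scale (eta) by median rank regression."""
    times = sorted(failure_times)
    n = len(times)
    if n < 3 or times[0] <= 0:
        raise ValueError("need at least three positive failure times")
    # Linearized Weibull CDF: ln(-ln(1 - F)) = beta * ln(t) - beta * ln(eta)
    xs = [math.log(t) for t in times]
    ys = [math.log(-math.log(1 - (i - 0.3) / (n + 0.4)))  # Bernard's ranks
          for i in range(1, n + 1)]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    den = sum((x - mean_x) ** 2 for x in xs)
    if den == 0:
        raise ValueError("failure times must not all be identical")
    beta = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / den
    intercept = mean_y - beta * mean_x
    eta = math.exp(-intercept / beta)
    return beta, eta


def failure_pattern(beta, tolerance=0.1):
    """Rough reading of the shape parameter; the tolerance is an assumption."""
    if beta < 1 - tolerance:
        return "early-life"  # consistent with quality or assembly problems
    if beta > 1 + tolerance:
        return "wear-out"
    return "random"


# Hypothetical hours-to-failure for one type of component.
beta, eta = weibull_fit([120, 340, 560, 900, 1500, 2600])
print(round(beta, 2), round(eta), failure_pattern(beta))
```

With only a handful of failures the estimates are uncertain, so the result is better treated as a pointer toward the likely failure pattern than as a precise measurement.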
Maintenance and reliability depend on bringing multiple effective resources together at the right time. A great deal of skill and attention to detail is required to organize resources so the right people with the right tools use the right procedures.
Experts in root cause failure analysis categorize the data they collect about a failure into five categories, called the five "Ps":
Reliability engineers look at the five "Ps" whenever they are investigating the reasons for poor reliability. Whatever defects are causing low reliability can be found within these five "Ps."
Once the appropriate systems and organizational structure to support those systems are in place, the design of the equipment must be considered. Ideally, this would be done before the production line is built, but companies often inherit production lines that were designed years ago. Does this mean they are stuck with poorly designed equipment and systems? Not if there is an opportunity to improve the design. The question is not "can we afford to do it?" but rather "can we afford not to do it?"
One of the tools that can help uncover weaknesses in the design of a system is reliability, availability and maintainability (RAM) modeling. Modern RAM models can show the value of design improvements along with structured preventive and predictive tasks that are part of an overall reliability improvement strategy. Models also give an indication of the benefits of a proposed improvement before the improvement is made. This can contribute to a significant reduction of risk.
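Commercial RAM tools are far more detailed, but the basic idea of testing an improvement on paper can be sketched in a few lines of Python. The sketch assumes steady-state availability computed as MTBF / (MTBF + MTTR), independent failures, items in series when the line needs all of them, and items in parallel when any one of them is enough. The equipment and all of the figures are hypothetical.

```python
import math


def availability(mtbf_hours, mttr_hours):
    """Steady-state availability from mean time between failures and repair time."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def series(availabilities):
    """All items must be up: multiply the availabilities."""
    return math.prod(availabilities)


def parallel(availabilities):
    """Any one item is enough: one minus the chance that all are down."""
    return 1 - math.prod(1 - a for a in availabilities)


# Hypothetical line: one pump feeding one mixer.
pump = availability(mtbf_hours=900, mttr_hours=100)   # 0.90
mixer = availability(mtbf_hours=950, mttr_hours=50)   # 0.95

current_line = series([pump, mixer])
# Proposed improvement: install a second, identical pump as a standby.
improved_line = series([parallel([pump, pump]), mixer])

print(f"current:  {current_line:.4f}")
print(f"improved: {improved_line:.4f}")
```

Comparing the two figures before any money is spent is the kind of evidence a model can offer; the real gain would depend on assumptions the sketch leaves out, such as common-cause failures and switchover reliability.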
Where the analysis indicates that a weakness exists, a focused improvement can help. A focused improvement effort can help to substantiate the idea that "we can do this." Once effectively addressed, bad actors (usually one at a time) cease to cause problems. This gives the organization a chance to embrace requisite changes, implementing them slowly over time.
As the organization begins to understand that the changes are positive, resistance drops and the pace of change can accelerate.
Keep in mind that once these short-term goals in reliability have been achieved, the journey is not over. Reliability depends on gaining ground against defects and then setting new goals in a process of continuous improvement. Once accomplished, the steps described above are merely the groundwork for achieving improved and consistent product quality, uptime that meets or exceeds demand, and safety records that may at one time have been considered out of reach.