How to Implement Reliability-Centered Maintenance

Jonathan Trout, Noria Corporation

Below, we'll discuss how reliability-centered maintenance works, how to implement it and how it can benefit you.

What Is Reliability-Centered Maintenance?

First established in the aviation industry, reliability-centered maintenance (RCM) is the process of identifying potential problems with your assets and determining what you should do to make sure those assets continue to produce at maximum capacity. Put another way, RCM analyzes how assets break down in order to determine the maintenance methods and schedules best suited to each individual asset.

Reliability-centered maintenance is often mistaken for preventive maintenance; however, there is one key difference: preventive maintenance isn't selective like RCM, making it less efficient. When performed correctly, reliability-centered maintenance reduces inefficiency by looking at each individual asset carefully before assigning maintenance tasks.

Reliability-centered maintenance uses a general workflow involving four steps:

Choose the asset you want to evaluate.
Evaluate the chosen asset based on the Society of Automotive Engineers (SAE) JA1011 standard.
Choose the type of maintenance to perform (preventive, proactive or condition-based).
Repeat the RCM process for other equipment or critical assets.

Reliability-centered maintenance takes a bit more time to implement initially, but it helps your plant operate effectively in terms of production availability, spare-parts inventory and other factors directly related to overall cost.

Assessment Criteria for Reliability-Centered Maintenance

When beginning the reliability-centered maintenance process, it's a good idea to start with your most critical asset or the one that will cause the biggest headache if it breaks down.

So, you've chosen an asset you want to evaluate using RCM. It's time to assess it based on seven standard questions set forth by the SAE, the standards organization that publishes the evaluation criteria for reliability-centered maintenance, among other engineering standards. Let's take a look at each question laid out in SAE's JA1011 standard.

How well should this piece of equipment perform? Once you've determined the piece of equipment you want to analyze, you need to look at the primary function in terms of how the machine meets manufacturing goals (or customer needs in some cases). In other words, what does the machine do and what do you want it to do? You can determine this by looking at and comparing the machine's past performance and maintenance data.
For example, say your bottle-capping machine can cap 1,000 bottles an hour at optimum capacity. You see from past maintenance data that the machine has needed to be repaired every 800 hours on average (sometimes referred to as mean time between failures, or MTBF). Each time it needs to be worked on, it's shut down for three hours. If you run the machine 20 hours a week, every 40 weeks (800/20) you'll experience downtime and lose about 3,000 capped and finished bottles (1,000 bottles x 3 hours). Based on the machine's data, it should be able to go approximately 1,200 hours before needing repairs. If you can increase the average time between repairs by 50% to reach that figure, a failure will occur only every 60 weeks (1,200/20). That cuts the loss to about 2,000 bottles per 40-week period, so you should see roughly 1,000 more finished bottles every 40 weeks.
In what ways can this piece of equipment fail? The second question deals with the "what ifs." SAE standards refer to these as "failure modes." A machine running 24/7 might experience the failure mode of fatigue as it nears the end of its life. Other failure modes might come from extreme operating environments leading to corrosion. These are two of the most common failure modes, but it's important to consider others like human error, organizational strategy and design or manufacturing flaws.
What causes each failure? Once you figure out possible failure modes, you need to determine the root cause of each failure or potential failure. In what ways could the bottle-capping machine fail? Human error, a broken belt and gearbox or motor breakdowns are a few failure modes you might see, but don't stop there. Dive into each cause a bit more. Why did human error occur? It could be caused by poor training. Why did the gearbox break down? It wasn't maintained properly (poor lubrication, lack of oil changes, etc.).
What happens when a failure occurs? In other words, you need to identify the effects of a potential failure of this machine. How will it affect the end product and overall operating cost? Putting a strong reliability-centered maintenance plan together helps eliminate things like production loss, high-cost repairs and unplanned downtime.
Why does each failure matter? Simply put, what are the consequences of each failure? Strive to answer how a failure would affect employee safety, environmental safety, production processes and the physical condition of the asset. This is similar to question four, but here you'll want to break down negative effects and quantify each one. For example, how much will the increased labor and repair costs be due to this failure? With the capping machine down for three hours, labor would cost about $22 per hour and parts for the repair approximately $400. What about the decrease in productivity? Downtime could set you back $400 per hour.
If the bottle capper breaks down causing that three-hour downtime, you might be looking at a total bill of $1,666, with $66 for labor ($22 per hour x 3 hours), $400 for parts and $1,200 for a decrease in productivity ($400 per hour x 3 hours). Quantifying these costs will help you forecast expenses brought on by failures.
What proactive tasks should be done to prevent these failures from happening? This question brings home the point of reliability-centered maintenance. What can be done to prevent the $1,666 breakdown in the previous example? After the failure is fixed, you'll know the cause, allowing you to schedule appropriate maintenance to prevent a recurrence. If the gearbox on the bottle capper broke down due to dirty oil and low oil levels causing wear over time, you will now know to monitor oil levels and the oil condition regularly and perform oil filtering on a pre-planned schedule.
What should be done if a suitable preventive task can't be found? In other words, if you can't use a preventive or predictive maintenance task to solve the issue at hand, is there anything you can do? If your bottle-capping machine is nearing the end of its life, the smartest choice is probably letting it run until it dies. Knowing this, you can order a new machine and have it assembled and waiting for when the time comes. Because the capping machine runs only 20 hours a week, you can replace the old machine with the new one during its idle hours. This eliminates downtime that would otherwise occur waiting on the new machine to be shipped to your facility.
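
The arithmetic behind questions one and five can be checked with a short script. This is a minimal sketch that uses only the figures quoted above for the bottle capper; the function and constant names are illustrative and are not taken from the SAE standard.

```python
# Figures quoted in the bottle-capper example above.
RATE_PER_HOUR = 1_000      # bottles capped per hour at optimum capacity
RUN_HOURS_PER_WEEK = 20    # hours the machine runs each week
REPAIR_HOURS = 3           # downtime per failure
LABOR_RATE = 22            # dollars per hour of repair labor
PARTS_COST = 400           # dollars per repair
LOST_OUTPUT_RATE = 400     # dollars of lost productivity per downtime hour


def weeks_between_failures(mtbf_hours):
    """Convert an MTBF in running hours to calendar weeks."""
    return mtbf_hours / RUN_HOURS_PER_WEEK


def bottles_lost(mtbf_hours, period_weeks):
    """Average bottles lost to repair downtime over a period."""
    failures = period_weeks / weeks_between_failures(mtbf_hours)
    return failures * REPAIR_HOURS * RATE_PER_HOUR


def cost_per_failure():
    """Labor + parts + lost productivity for one three-hour breakdown."""
    labor = LABOR_RATE * REPAIR_HOURS
    lost_output = LOST_OUTPUT_RATE * REPAIR_HOURS
    return labor + PARTS_COST + lost_output


current = bottles_lost(800, 40)     # MTBF observed in past data
improved = bottles_lost(1_200, 40)  # MTBF the machine should achieve
print(f"Bottles lost per 40 weeks now:      {current:,.0f}")
print(f"Bottles lost per 40 weeks improved: {improved:,.0f}")
print(f"Bottles gained per 40 weeks:        {current - improved:,.0f}")
print(f"Cost of one breakdown:              ${cost_per_failure():,}")
```

Swapping in your own rate, run hours and repair history gives a quick estimate of what a longer interval between repairs is worth.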

3 Basic Principles of Implementing Reliability-Centered Maintenance

When it comes to implementing reliability-centered maintenance, the seven questions discussed above can be broken down into three phases: decision, analysis and act.

Phase 1 – Decision

The first three questions combine to make up the decision phase. To avoid wasting time, justify and plan the RCM implementation before you begin. Discuss readiness, needs and desired outcomes with your maintenance staff, project leaders, subject-matter experts and executives. The decision-making phase should be reserved for outlining goals in line with the budget, timeline and management concerns.

When it comes to choosing the equipment for RCM analysis, think about which pieces are most critical to operations as well as the repair vs. replacement cost, and then look at past data to get a snapshot of how much you've spent on previous maintenance. Include the following questions in your decision-making phase:

Would a failure on this machine be difficult to detect during normal maintenance or operation?
Would a failure of this machine affect safety?
How would a failure on this machine impact operations?
How would a failure on this machine impact spending?
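
One way to turn the four questions above into a ranking is to score each asset on every question and sort by the total. The sketch below assumes a simple 1-to-5 scale with equal weights. The scale, the asset names other than the bottle capper and all of the scores are hypothetical placeholders, not values from SAE JA1011.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    name: str
    hard_to_detect: int  # 1 = failure is obvious, 5 = failure is hidden
    safety: int          # 1 = no safety impact, 5 = severe safety impact
    operations: int      # 1 = minor disruption, 5 = line stops
    spending: int        # 1 = cheap to fix, 5 = very costly

    def criticality(self) -> int:
        """Equal-weight total of the four decision questions (assumed scheme)."""
        return self.hard_to_detect + self.safety + self.operations + self.spending


# Hypothetical scores for illustration only.
assets = [
    Asset("label printer", hard_to_detect=1, safety=1, operations=2, spending=1),
    Asset("bottle capper", hard_to_detect=3, safety=2, operations=5, spending=4),
    Asset("conveyor", hard_to_detect=2, safety=3, operations=4, spending=2),
]

# Highest total first: this is the asset to take through RCM first.
ranked = sorted(assets, key=lambda a: a.criticality(), reverse=True)
for asset in ranked:
    print(f"{asset.criticality():>2}  {asset.name}")
```

If safety should outweigh spending at your plant, replacing the plain sum with a weighted sum is a small change.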

Defining a data-driven list of the machine's functions helps your team set the capacity at which it wants the machine to run and compare that target with the machine's actual performance.

Phase 2 – Analysis

Questions four through six help you analyze or actually conduct the RCM study. First, your team should identify functional failures. These can include poor performance, performing unnecessary functions or complete failure. For example, a total functional failure would be the belt on the bottle capper breaking, causing the machine to stop completely.

The next step in the analysis phase is identifying and evaluating the effects of the failure(s). Your team should document what can be observed or what actually happens during a failure. How does it affect overall production? How does it affect safety?

The last step in the analysis phase is identifying failure modes, or what causes each failure. A popular technique to uncover these causes is failure mode and effects analysis (FMEA). This analysis technique breaks down all possible failures that could occur in the design, manufacturing or assembly process, as well as in a product or service. Ask questions such as:

How does this failure affect safety?
How does this failure impact operation and overall production?
Does this failure cause full or partial outages?

FMEA looks at failure modes, root causes, failure indicators, failure criticalities, failure probabilities and effects by considering asset history and team/employee experiences. Most FMEA programs use the information gathered to drive the planning of mitigation tasks that help detect failures early or prevent them altogether.
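
A common way to combine criticality and probability in an FMEA worksheet is the risk priority number (RPN), the product of severity, occurrence and detection ratings. The RPN is a widely used convention rather than something this article prescribes, and the 1-to-10 ratings below are invented for illustration; only the failure modes themselves come from the bottle-capper example.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    description: str
    severity: int    # 1 = negligible effect, 10 = hazardous
    occurrence: int  # 1 = rare, 10 = almost certain
    detection: int   # 1 = almost always caught early, 10 = practically undetectable

    @property
    def rpn(self) -> int:
        """Risk priority number: severity x occurrence x detection."""
        return self.severity * self.occurrence * self.detection


# Illustrative ratings only; a real study would draw them from
# asset history and the team's experience.
modes = [
    FailureMode("Belt breaks", severity=7, occurrence=4, detection=3),
    FailureMode("Gearbox wear from dirty or low oil", severity=8, occurrence=5, detection=6),
    FailureMode("Operator error from poor training", severity=5, occurrence=6, detection=4),
]

# Address the highest-RPN failure modes first.
ranked_modes = sorted(modes, key=lambda m: m.rpn, reverse=True)
for mode in ranked_modes:
    print(f"{mode.rpn:>4}  {mode.description}")
```

The ranking is only as good as the ratings behind it, so treat the numbers as a way to order the discussion rather than as a precise measurement.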

Many companies automate the analysis process using a computerized maintenance management system (CMMS). By generating tasks and scheduling inspections, a CMMS helps with planning and reduces the chances that your team will miss scheduled work or overlook a developing equipment failure.
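
The scheduling side of a CMMS can be pictured as generating due dates at a fixed interval and flagging any that pass without a completed work order. The sketch below is a toy illustration of that idea using only Python's standard library; the function names and dates are hypothetical, and it does not represent the interface of any particular CMMS product.

```python
from datetime import date, timedelta


def inspection_dates(start, interval_days, count):
    """Return `count` due dates spaced `interval_days` apart, beginning at `start`."""
    return [start + timedelta(days=interval_days * i) for i in range(count)]


def overdue(due_dates, completed, today):
    """Return due dates on or before `today` that have no completed inspection."""
    return [d for d in due_dates if d <= today and d not in completed]


# Hypothetical weekly inspection plan for one asset.
due = inspection_dates(date(2024, 1, 1), interval_days=7, count=4)
done = {date(2024, 1, 1), date(2024, 1, 8)}

for missed in overdue(due, done, today=date(2024, 1, 16)):
    print(f"Inspection due {missed.isoformat()} has not been completed")
```

A real system would also track who is assigned, which parts are needed and what the inspection found, but the core idea of comparing due dates against completed work is the same.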

Phase 3 – Act

Phase 3 incorporates the seventh question (select maintenance tasks). After planning, making decisions and analyzing, it's time to act on the information you've analyzed to update your maintenance tasks and system procedures and improve asset design. Think about grouping your failure management techniques into two groups: proactive tasks and default actions.

Proactive tasks: These include predictive and preventive maintenance techniques that head off failures before they happen. Preventive maintenance tasks are scheduled in advance, helping mitigate the risk of failure, while predictive maintenance tasks, or condition monitoring, help detect developing problems before they turn into failures.
Default actions: These refer to reactive maintenance or letting a machine run until it fails and then fixing the issue. You may have heard this referred to as "run to failure" maintenance.

Deciding which technique is best for your situation depends on your RCM analysis and on understanding how your failure modes affect your assets and impact your overall production.
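
One simple way to frame that decision is to compare the expected yearly cost of each approach. The sketch below reuses the bottle-capper figures from earlier in the article ($1,666 per breakdown, a failure every 40 weeks today and every 60 weeks if the machine reaches 1,200 hours between repairs). The cost and frequency of the proactive task are assumptions made up for illustration, and a real decision would also weigh safety and environmental consequences, not just dollars.

```python
WEEKS_PER_YEAR = 52
COST_PER_FAILURE = 1_666  # dollars, from the bottle-capper example


def run_to_failure_cost(weeks_between_failures):
    """Expected yearly cost of fixing the machine only when it breaks."""
    failures_per_year = WEEKS_PER_YEAR / weeks_between_failures
    return failures_per_year * COST_PER_FAILURE


def proactive_cost(tasks_per_year, cost_per_task, weeks_between_failures):
    """Expected yearly cost of scheduled tasks plus the failures that still occur."""
    return tasks_per_year * cost_per_task + run_to_failure_cost(weeks_between_failures)


def choose(tasks_per_year, cost_per_task):
    """Pick the cheaper option under the assumptions stated above."""
    default = run_to_failure_cost(weeks_between_failures=40)
    proactive = proactive_cost(tasks_per_year, cost_per_task, weeks_between_failures=60)
    return "proactive task" if proactive < default else "default action"


# Assumed: a quarterly oil check and filtering job at $75 per visit.
print(choose(tasks_per_year=4, cost_per_task=75))
# Assumed: the same job done every week, which costs more than it saves.
print(choose(tasks_per_year=52, cost_per_task=75))
```

When the proactive task costs more than the failures it prevents, and the failure carries no safety or environmental consequence, run to failure can be the rational choice.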

Benefits of Implementing Reliability-Centered Maintenance

Some of the biggest benefits you'll see when implementing a reliability-centered maintenance plan include minimizing the frequency of overhauls, reducing equipment failures, refocusing maintenance tasks on critical assets, increasing component reliability and more. What do all of these benefits have in common? They all affect your bottom line. Let's take a look at a couple of real-world examples.

NASA's Marshall Space Flight Center

NASA's Marshall Space Flight Center brought in a contractor to design and implement a permanent reliability-centered maintenance plan for its facilities and collateral equipment, most notably the pressurized systems. The RCM plan decreased the flight center's maintenance costs, extended the life of aging equipment, made work conditions safer by managing risk, decreased energy consumption and reduced the environmental impact, all resulting in taxpayer savings of more than $300,000.

National Ignition Facility (NIF)

The NIF is a federally funded laser research facility located at the Lawrence Livermore National Laboratory (LLNL). According to the National Ignition Facility & Photon Science website, it is funded by the U.S. Department of Energy's National Nuclear Security Administration (NNSA) and uses 192 laser beams to "routinely create temperatures and pressures similar to those that exist only in the cores of stars and giant planets and inside nuclear weapons." By doing so, it helps the NNSA maintain the reliability and safety of the U.S. nuclear deterrent without full-scale testing.

The NIF lasers essentially make up one giant laser the size of three football fields, so you can imagine how expensive and dangerous a breakdown might be. Using reliability-centered maintenance saved the NIF nearly $80,000 in one isolated instance alone, according to the NIF's former facilities and maintenance manager, Nick Jize. Through an RCM program, a motor in the laser-amplifying cooling system was put on a watch list and scheduled for weekly vibration analysis. The vibration analysis revealed that the bearings were deteriorating and loosening, allowing the NIF to replace the motor before it failed. Proactively replacing this motor prevented almost eight hours of "shot delays" for a one-time savings of $80,000, according to Jize. The NIF continuously updates its RCM procedures.

Bottom Line

Reliability-centered maintenance is designed to be a continuous process rather than a one-time analysis. It is a valuable tool that enables you to extend the life of your assets, maintain their integrity, minimize or eliminate unplanned downtime and reduce maintenance costs. Reliability-centered maintenance can help you:

Align maintenance tasks with business goals and objectives;
Achieve regulatory compliance, safety and environmental responsibility goals;
Define your plant's real performance objectives, including your equipment's true performance capabilities;
Identify potential risks and hazards that come with hitting your performance objectives;
Determine the most efficient and effective methods of mitigating risks; and
Document the whole process for continuous performance assessment and future RCM improvements.
