Root Cause Problem Elimination

Christer Idhammar, IDCON INC

In good maintenance organizations, 20 to 30% of all maintenance hours are used on Root Cause Problem Elimination (RCPE). This includes the time to find the root cause of problems and to eliminate them so they are not repeated. In the context of maintenance management, the results can include everything from training employees to improving work processes, equipment design, or the way equipment is operated.

People sometimes wonder what the difference is between Root Cause Failure Analysis (RCFA) and Root Cause Problem Elimination (RCPE). It’s simple. RCPE takes us all the way to eliminating the problem, while an analysis only highlights the root cause of a problem. If you only do the analysis and find the root of the problem, but do not take the next step to eliminate it, you have had an interesting exercise but no results.

In keeping with IDCON’s beliefs, we also prefer the term “problem” to “failure,” because “failure” draws attention to equipment and maintenance. “Problem” is more inclusive, as it also covers issues in operations, not only maintenance.


To give you an idea of what can be accomplished with RCPE, let’s have a look at Sven Wingquist, the Swedish industrial magnate who founded SKF and co-founded Saab AB.

At the turn of the twentieth century, Wingquist was a young engineer working at a textile mill in Gothenburg. His job was to ensure that all machines always ran as they should. Today his title would likely be reliability engineer.

The textile mill had a problem: the bearings for the main shafts broke down frequently. Wingquist did what any good engineer would and looked for the root of the problem. He discovered that the mill stood on loose, clay-based soil, which caused the whole structure to shift by as much as a few millimeters. That movement skewed and misaligned the drive shafts, and the resulting constant friction overheated the bearings. Not only did this increase the fire risk (a common problem in those days, especially in a textile mill), but the constant downtime caused by the overheated bearings also took a toll on throughput.

Wingquist again did what good engineers do and began looking for a solution. He soon learned that the problem could be solved with spherical roller bearings that could handle the misalignment of the drive shafts. The dilemma was that, back then, these bearings were only manufactured in Germany and deliveries often took several months, but what choice did they have? When the bearings finally arrived, they were often of poor quality and did not work as advertised. Wingquist was now forced to find another solution, and he did in 1906. It took him a few models, but his second patent would prove wildly successful. His design was simple: two rings and a double row of balls. The inside of the outer ring had a spherical shape, which allowed the ring to adjust to misalignment, and the load was evenly distributed between the balls. The new bearing design worked and eliminated the frequent shutdowns of the mill.

In the beginning, the technical experts of the era did not believe in the invention, and the leading German producers of simple bearings dismissed Sven Wingquist’s ideas, so he started his own business. Svenska Kullagerfabriken, more commonly known as SKF, was founded in 1907. More than a hundred years later, SKF reported annual revenue of more than eight billion dollars. So root cause problem elimination, problem solving, and a few inventions can take us all very far.

I don’t know exactly which steps Wingquist took in his assessment, or who else worked with him. Judging by his enormous success, though, he must not only have performed a successful RCPE and eliminated the problem, but also have been able to set up great partnerships within the industries he founded and ran.

The RCPE process is also an important component in building a solid partnership and work system between operations and maintenance.

I believe that most problems can be solved by the operations and maintenance frontline organization. Electricians, operators, or mechanics are often the first people to notice and visit the site of a problem. It could be a tripped electric motor or a sudden change in the mix of chemical components in a product. You can look at it as a crime scene investigation: because these people are often the first to arrive at the scene, they should be taught how to collect evidence that helps determine what might have happened. As with crime, time is of the essence. Collecting evidence before it has been contaminated or the scene has changed is crucial.

You need to have employees trained in the RCPE process to facilitate the RCPE events. This is often a major part of a reliability engineer’s job. He or she puts together a mixed team of employees with some exposure to the problem as well as some people with applicable experience.  


Critical thinking. Not everyone will be great at RCPE. We all have some tendency to jump to conclusions, sometimes too fast. When participating in an RCPE, your mind must be like a parachute: it has to be open to function. Only then can you follow a logical process without falling victim to assumptions.

It is human nature, even with little information and few facts, to form an opinion, become locked in on it, and find arguments to defend our reasoning. Of course, sometimes you will be right, but often it leads you down the wrong path and the root cause of the problem remains. As a general rule, you should investigate and document when and where the problem occurred, determine what happened, and then find out why it happened.

Troubleshooting and root cause problem elimination are often mixed up. Troubleshooting results in finding a problem and fixing it, while RCPE goes deeper: it identifies the root cause (why a component broke down) and then eliminates the risk of it happening again.

There are many different tools to use for RCPE. The Fault Tree, the Cause & Effect Diagram, and the 5-Whys are all good if you use them right. Whatever tool you choose, it should map cause-and-effect relationships to yield a complete RCPE. Unfortunately, most software used for problem solving is restrictive. It slows the process down and hampers creative and critical thinking. The Fishbone Diagram, for example, does not aid in root cause problem solving; it is more of a brainstorming tool. Personally, I think the Fishbone Diagram becomes overwhelming with detail and does not create a clear path to a solution. We have found that something as simple as using sticky notes on a board is the best way to do RCPE. It is visible, engages participants, and is easy to change while the problem-solving session is in progress.

In our quest to find solutions to problems, it’s important to have a good RCPE work process. You cannot solve all problems, so you need to be selective. Identify triggers that select the problems that should be solved, within 72 hours for example. Triggers typically include safety incidents, environmental damage, or a spike in the cost of lost production and/or maintenance.

As a general rule, do not set the triggers too low when you start this process. If the triggers are too low, you risk not having the time to solve all the problems they select. It’s better to set high triggers and then lower them as you get fewer problems to solve.
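As an illustration only, the trigger idea can be sketched in a few lines of Python. The cost levels, field names, and function name below are assumptions made up for this sketch; every plant would set its own triggers and start with high ones.

```python
from dataclasses import dataclass

# Hypothetical trigger levels (assumed for this sketch, not recommended values).
LOST_PRODUCTION_COST_TRIGGER = 50_000  # currency units per incident
MAINTENANCE_COST_TRIGGER = 20_000      # currency units per incident


@dataclass
class Incident:
    description: str
    safety_incident: bool = False
    environmental_damage: bool = False
    lost_production_cost: float = 0.0
    maintenance_cost: float = 0.0


def triggered_reasons(incident: Incident) -> list[str]:
    """Return the triggers this incident meets; an empty list means no RCPE."""
    reasons = []
    if incident.safety_incident:
        reasons.append("safety")
    if incident.environmental_damage:
        reasons.append("environment")
    if incident.lost_production_cost >= LOST_PRODUCTION_COST_TRIGGER:
        reasons.append("lost production cost")
    if incident.maintenance_cost >= MAINTENANCE_COST_TRIGGER:
        reasons.append("maintenance cost")
    return reasons


if __name__ == "__main__":
    stop = Incident("Main shaft bearing failure", lost_production_cost=80_000)
    print(triggered_reasons(stop))
```

Lowering a trigger later is then a one-line change, which fits the advice to start high and tighten as the number of triggered problems falls.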

In a reactive maintenance organization, time for RCPE will be sporadic at best. Unless it’s a dire problem, RCPE often gets low priority. The frontline organization will not be involved on a regular basis because it is too busy reacting to urgent maintenance work. So you have to free up time for RCPE.

The very basics of reliability and maintenance management must be in place and performing well before you can free up time for successful RCPE. If you start RCPE with the intention of making it a big part of your reliability engineer’s job, with the involvement of the frontline organization, and only 50 to 60 percent of work is planned and scheduled, you will not find much time to do good RCPE.

As much as 90 percent of all corrective maintenance work should be executed as planned and scheduled work. If your organization is more reactive than that, it will be difficult to find time to do the RCPE work well.
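To see where an organization stands against these figures, the share of planned and scheduled work can be calculated from work order data. The sketch below assumes the share is measured in labor hours; the function name and the numbers in the example are hypothetical.

```python
def planned_share(planned_hours: float, total_hours: float) -> float:
    """Percentage of corrective maintenance hours done as planned and scheduled work."""
    if total_hours <= 0:
        raise ValueError("total_hours must be positive")
    return 100.0 * planned_hours / total_hours


if __name__ == "__main__":
    # Made-up example: 550 of 1,000 corrective hours were planned and scheduled.
    share = planned_share(550, 1000)
    print(f"{share:.0f} percent planned and scheduled")  # prints "55 percent planned and scheduled"
```

An organization at 55 percent sits in the 50 to 60 percent range described above, well short of the 90 percent level, so time for RCPE will be hard to find.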

Your craftspeople should be very involved because they are often the first on the “crime scene” and they have the hands-on experience. Having them participate in RCPE often leads them to reveal knowledge and ideas they may not have been aware they had. It is also motivating for craftspeople to see improvements they were part of get implemented.

When your basics are done with excellence and you gradually increase the number of good quality RCPE events, your organization will evolve into a learning, continuously improving, long-term-thinking, problem-solving organization.

Communication is crucial in getting to the bottom of any problem. “Keep it simple” is key here. Try to use one object and one problem, and make sure the problem statement is a fact. The problem statement is the first logical answer to the question “How can this problem occur?”


  • “The motor is hot.” This is a good statement: the motor is the object and the problem is that it is hot.
  • “The motor is hot and is noisy.” This is less clear because it is one object and two problems.
  • “The motors on assembly line two are hot.” This can lead to confusion, as there are several objects with the same problem.


So what if a pump is cavitating, it is hot, and it is noisy? You want that information, but not in the problem statement. It needs to be listed in the evidence section. If you put it all in the problem statement, you assume the three symptoms are related to the same problem. It is likely they are, but never assume. The same goes for having several objects in the statement: avoid it, but if you do include them, be very careful not to assume that the symptom of one object is the same as that of another.
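One way to picture the rule of one object, one problem, and everything else as evidence is a small record like the Python sketch below. The class and field names are hypothetical, and choosing cavitation as the stated problem for the pump is only an assumption for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ProblemStatement:
    obj: str      # exactly one object
    problem: str  # exactly one observed, factual problem
    evidence: list[str] = field(default_factory=list)  # other symptoms, kept separate

    def text(self) -> str:
        return f"The {self.obj} is {self.problem}."


if __name__ == "__main__":
    statement = ProblemStatement(
        obj="pump",
        problem="cavitating",
        evidence=["The pump is hot.", "The pump is noisy."],
    )
    print(statement.text())  # prints "The pump is cavitating."
    for item in statement.evidence:
        print("Evidence:", item)
```

Keeping the heat and the noise in the evidence list records them without assuming they share a cause with the cavitation.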

Just as when preparing a case for trial, we must make sure that our data includes only facts and that our evidence has been properly collected.

Draw a timeline with your evidence and facts, and be sure that there is always an answer to the question “How can?” for each statement. The timeline helps to catch time-related gaps in our logic. Be careful not to include symptoms that occurred after the incident.

Broken parts, product samples, lubricants, any tools used, interview logs, and production data are important evidence and should be bagged, tagged, and stored in a designated place until the RCPE is completed. It is just like murder cases, where the police say that the chances of solving a case drop drastically after 48 hours. And just like a detective, make sure to note the names of people who were present during the incident (witnesses) so they can participate in the RCPE event.

Write down or voice-record any interviews, and include these questions:

  • What exactly did you observe?
  • Where is the issue, and exactly where on the object is the problem?
  • When was it first noticed, and is there a pattern?
  • Were there changes in time?
  • Are there other similar objects or issues that can be used in comparison and studied?
  • Were there changes in, say, operating speed, pressure, or chemistry?

In sum, develop possible causes, then evaluate and select them based on the evidence. When the detective work is done, the evidence gathered, the interviews conducted, the timelines recorded, and the Root Cause Problem identified, it’s time to eliminate it.

There are three levels of root causes, which we identify as technical, human, and work systems. You can always dig deeper and ask several more levels of “How could that happen?”

Some general guidelines for RCPE:

  • If solutions aren’t implemented, the RCPE analyses only add to the company’s waste.
  • If RCPE generates an improvement project, use project management guidelines.
  • RCPE events should be done using work orders, which in turn should be prioritized in the backlog together with all other work.

When it comes to selecting solutions, this step is, surprisingly, often seen as obvious, and we typically solve things the way we’ve always done them. Don’t fall for it. Practice your creative and critical thinking and discuss all possible solutions.

In reporting our solutions it’s important to present how we arrived at the root causes. We may know we are right, but we also have to prove that to others. Carefully present your RCPE with easy-to-follow tools (How-can diagram), possible and recommended solutions, project plans and desired timeline, and perhaps do a case study if the problem is big. Be clear who is responsible and accountable for each item.

Typically, the hardest part of an RCPE investigation is managing the behavior and thinking of the RCPE group. Remember to be objective, use critical and creative thinking, evaluate by using evidence, look for changes over time, compare similar objects, break the problem down into its different parts, and most of all, don’t forget that we humans tend to have preconceived ideas about everything, so encourage thinking outside the box.

Thinking outside the box has been one of IDCON’s and my own strongest assets, starting with examples like the Volvo contract back in 1973. We offered a different way, got to the root of things, proved our case, won the bid, and eliminated enough problems to get the production rate up to 98 percent. RCPE is a bit like humor. It is often the unexpected twist that makes a joke work.

And don’t forget that sometimes things aren’t all that complicated and a little listening can go a long way. The solution to a problem may be right in front of us, and an RCPE may not even be necessary. One biotechnical plant in the US was plagued with electric power outages and set out to solve the problem. At the very beginning of the RCPE event an operator said: “I know what’s causing the power outages,” but no one listened. “It’s the cows. I have told you, the cows are rubbing themselves against one of the loose power poles. This is causing the power lines, especially in hot weather, to swing and come in contact with other cables. That’s what’s causing the outages. We can of course try to train the cows to stop rubbing themselves against this particular power pole, but the best elimination of this problem is to stabilize the pole and protect the pole so the cows cannot rub themselves against it anymore.” He was right and it was an easy fix.


Christer Idhammar is the founder of IDCON, Inc., a management consulting firm (idcon.com).  This article was excerpted from a recent book authored by Mr. Idhammar entitled Knocking Bolts.  More information can be found on this book at https://www.idcon.com/reliability-and-maintenance-books/


About the Author

Christer Idhammar is president of IDCON, Inc., a Raleigh, N.C.-based reliability and maintenance management consulting firm which specializes in education, tra...