MTTR Explained

Jonathan Trout, Noria Corporation
Tags: maintenance and reliability

MTTR is a metric used by maintenance departments to measure the average time needed to determine the cause of and fix failed equipment.

What Is MTTR?

Anytime you see the phrase "mean time to," it means you're looking at the average time between two events. Mean time to repair (MTTR) is a metric used by maintenance departments to measure the average time needed to determine the cause of and fix failed equipment. It gives a snapshot of how quickly the maintenance team can respond to and repair unplanned breakdowns. It's important to remember that the MTTR calculation covers the period from the beginning of the incident to the time the equipment or system returns to production. This includes:

Notifying maintenance technicians
Diagnosing the issue
Fixing the issue
Reassembling, aligning and validating equipment
Resetting, testing and starting up the equipment or system for production

The MTTR formula does not take into account lead time for spare parts and is not meant to be used for planned maintenance tasks or shutdowns.

MTTR, as it pertains to maintenance, is a good baseline for figuring out how to increase efficiency and limit unplanned downtime, which in turn improves the bottom line. It also highlights where repairs are taking longer than normal. Addressing those cases gets critical equipment up and running faster, minimizing missed orders and improving customer service. In the interest of efficiency, MTTR analysis provides insight into how your team purchases equipment, schedules maintenance and handles maintenance tasks.

Even though MTTR measures reactive maintenance, tracking it gives you a look into how effective and efficient your preventive maintenance program and tasks are. Equipment with a lengthy repair time might have underlying root causes that contribute to the failure, and MTTR can help you start investigating those root causes and get you on your way to a solution. For example, if you notice MTTR increasing for a particular asset, it may be because preventive maintenance tasks aren't standardized. A technician might get a work order telling them to lubricate a certain part, but it may not lay out which lubricant to use or how much, leading to further equipment failures.

MTTR analysis is also helpful when it comes to making decisions on whether to repair or replace an asset. If a piece of equipment takes longer to repair as it gets older, it might be more economical to replace it. MTTR history can also be used to help predict lifecycle costs of new equipment or systems.
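One way to see whether an asset takes longer to repair as it ages is to average its repair times by year. The following is a minimal sketch under stated assumptions: the function name mttr_by_year and the repair log are hypothetical, and the figures are illustrative rather than taken from a real asset.

```python
from collections import defaultdict


def mttr_by_year(repairs):
    """Average repair hours per year.

    `repairs` is an iterable of (year, repair_hours) pairs.
    """
    hours_by_year = defaultdict(list)
    for year, repair_hours in repairs:
        hours_by_year[year].append(repair_hours)
    return {
        year: sum(hours) / len(hours)
        for year, hours in sorted(hours_by_year.items())
    }


# Hypothetical repair log for a single asset: (year, hours to repair).
repair_log = [
    (2021, 2.0), (2021, 3.0),
    (2022, 4.0), (2022, 5.0),
    (2023, 6.5), (2023, 7.5),
]

yearly_mttr = mttr_by_year(repair_log)
for year, hours in yearly_mttr.items():
    print(f"{year}: MTTR {hours:.1f} hours")
```

A steadily rising yearly average, as in this made-up log, is the kind of pattern that would prompt a closer look at the repair-or-replace decision.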

Mean Time to Repair vs. Mean Time to Recovery

You'll often hear the "R" in MTTR used interchangeably for "repair" and "recovery." The difference between the two terms is that mean time to recovery covers not only the repair itself but also the testing period and the time it takes to return to normal operation. Many people define MTTR by lumping the two together, as we did above. The only time you'll need to distinguish between the two is in the context of maintenance contracts or service level agreements (SLAs), so that everyone knows exactly what they need to be measuring.

MTTR Calculation

As we touched on earlier, the MTTR formula is the total unplanned maintenance time divided by the total number of repairs (failures). MTTR is most commonly represented in hours. Keep in mind, MTTR assumes tasks are performed sequentially and by trained maintenance personnel.

Total unplanned maintenance time / Total number of repairs = MTTR

A simple example of MTTR might look like this: if you have a pump that fails four times in one workday and you spend a total of one hour repairing those four failures, your MTTR would be 15 minutes (60 minutes / 4 = 15 minutes).

Another example could involve an asset that experiences five outages in a 90-day period. The outage times (time of detection to time the asset is back to production) are 24, 51, 79, 56 and 12 minutes. The MTTR for this 90-day period is about 44 minutes (222 minutes / 5 = 44.4 minutes). That is the average time from the detection of the issue to the recovery of the asset.
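The two calculations above can be reproduced with a few lines of Python. This is a minimal sketch; the function names mttr_from_total and mttr_from_durations are illustrative and do not come from any particular maintenance system.

```python
def mttr_from_total(total_repair_minutes, repair_count):
    """Total unplanned maintenance time divided by the number of repairs."""
    if repair_count <= 0:
        raise ValueError("at least one repair is needed to compute MTTR")
    return total_repair_minutes / repair_count


def mttr_from_durations(outage_minutes):
    """MTTR from a list of individual outage durations."""
    return mttr_from_total(sum(outage_minutes), len(outage_minutes))


# Pump calculation: 60 minutes of repair time across four failures.
pump_mttr = mttr_from_total(60, 4)

# The outage durations (in minutes) listed for the 90-day period.
asset_mttr = mttr_from_durations([24, 51, 79, 56, 12])

print(f"Pump MTTR: {pump_mttr:.1f} minutes")    # 15.0
print(f"Asset MTTR: {asset_mttr:.1f} minutes")  # 44.4
```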

There are two assumptions to keep in mind when calculating MTTR:

Usually, every instance of failure varies in severity, so while some breakdowns require days to repair, others might only take minutes. Therefore, MTTR gives you an average of what to expect.
It's important that every instance of failure is attended to by competent and properly trained maintenance personnel who follow standardized procedures. This ensures reliable results.

It's been said that some of the best maintenance teams in the world have an MTTR of less than five hours, but it's almost impossible to benchmark your facility's MTTR against another's because of the number of variables involved. MTTR depends on factors like the type of asset you're analyzing, its age, its criticality and the training of the maintenance team.

MTTR vs. MTBF: What's the Difference?

When dealing with systems or equipment that can be repaired, MTTR and MTBF are two metrics often analyzed and compared when looking into failures that can result in costly downtime. So, what's the difference between the two? Mean time between failures (MTBF) is a prediction of the time between the innate failures of a piece of machinery during normal operating hours, or how long a piece of equipment operates without interruption. It's calculated by taking the total time an asset is running (uptime) and dividing it by the number of breakdowns that happened over that same period of time.

MTBF = Total uptime / # of Breakdowns

MTBF analysis helps maintenance departments strategize on how to increase the time between failures. Together, MTBF and MTTR determine uptime. To calculate a system's uptime with these two metrics, use the following formula:

Uptime = MTBF / (MTBF + MTTR)

Consider the following scenario: Your system is supposed to be up and running for 200 hours, but it wasn't working for 28 of those hours. It was available for only 172 hours, and a total of five failures occurred. Using our uptime formula, we'll first calculate MTBF by taking (200 - 28) / 5 = 172 / 5 = 34.4. Next, we'll calculate MTTR by taking 28 / 5 = 5.6. So, to calculate uptime, our formula would look like this:

34.4 / (34.4 + 5.6) = 0.86 (86%)
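The same arithmetic can be checked in Python. This is a sketch only, with illustrative function names; the inputs (172 hours of uptime, 28 hours of downtime, five failures) are the ones that yield the 34.4 and 5.6 figures above.

```python
def mean_time_between_failures(uptime_hours, failures):
    """MTBF = total uptime / number of breakdowns."""
    return uptime_hours / failures


def mean_time_to_repair(downtime_hours, failures):
    """MTTR = total unplanned maintenance time / number of repairs."""
    return downtime_hours / failures


def uptime_fraction(mtbf_hours, mttr_hours):
    """Uptime = MTBF / (MTBF + MTTR), as a fraction between 0 and 1."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


mtbf_hours = mean_time_between_failures(uptime_hours=172, failures=5)
mttr_hours = mean_time_to_repair(downtime_hours=28, failures=5)
uptime = uptime_fraction(mtbf_hours, mttr_hours)

print(f"MTBF: {mtbf_hours:.1f} hours")  # 34.4
print(f"MTTR: {mttr_hours:.1f} hours")  # 5.6
print(f"Uptime: {uptime:.0%}")          # 86%
```

Because both metrics are divided by the same number of failures, the result is simply uptime hours over total scheduled hours (172 / 200), which is a quick sanity check on the formula.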

How to Improve MTTR

MTTR is seen as a key performance indicator (KPI). Therefore, maintenance teams should always strive to improve it. The benefits of reducing MTTR are fairly obvious – less downtime means stable production, happy customers and reduced maintenance costs. So, what are some steps you can take to help improve your organization's MTTR? The best place to start is understanding the four stages of MTTR and taking steps to reduce each of them.

Identification - the period of time from when the failure occurs to when a technician becomes aware of the issue. Things like wireless sensors and alert systems are great ways to shorten the identification time period of MTTR.
Knowledge - the period of time after the failure has been identified but before repairs have started. Figuring out or diagnosing the problem is generally the most time-consuming part of MTTR.
Fix - the period of time it takes to actually fix the issue at hand. Reducing the time it takes to fix an issue can be accomplished by standardizing procedures to guide well-trained technicians who are tasked with solving the problem.
Verify - the period of time it takes to ensure the applied fix is actually working. A real-time monitoring system is a helpful tool to quickly gather data and reports to show the fix is working.
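To see which of the four stages dominates your own MTTR, you can average the time spent in each stage across incidents. The sketch below assumes each incident record carries five timestamps; the field names (failed, detected, repair_started, repair_finished, verified) and the two incidents are hypothetical and chosen only for illustration.

```python
from datetime import datetime
from statistics import mean

# Each stage is the span between two timestamps on an incident record.
STAGES = [
    ("identification", "failed", "detected"),
    ("knowledge", "detected", "repair_started"),
    ("fix", "repair_started", "repair_finished"),
    ("verify", "repair_finished", "verified"),
]


def stage_minutes(incident):
    """Minutes spent in each stage for a single incident."""
    return {
        name: (incident[end] - incident[start]).total_seconds() / 60
        for name, start, end in STAGES
    }


def mean_stage_minutes(incidents):
    """Average minutes per stage across all incidents."""
    per_incident = [stage_minutes(incident) for incident in incidents]
    return {
        name: mean(stages[name] for stages in per_incident)
        for name, _, _ in STAGES
    }


# Hypothetical incident records.
incidents = [
    {
        "failed": datetime(2024, 1, 8, 9, 0),
        "detected": datetime(2024, 1, 8, 9, 10),
        "repair_started": datetime(2024, 1, 8, 10, 0),
        "repair_finished": datetime(2024, 1, 8, 10, 30),
        "verified": datetime(2024, 1, 8, 10, 40),
    },
    {
        "failed": datetime(2024, 1, 15, 14, 0),
        "detected": datetime(2024, 1, 15, 14, 20),
        "repair_started": datetime(2024, 1, 15, 15, 30),
        "repair_finished": datetime(2024, 1, 15, 15, 50),
        "verified": datetime(2024, 1, 15, 16, 0),
    },
]

averages = mean_stage_minutes(incidents)
total = sum(averages.values())
for name, minutes in averages.items():
    print(f"{name}: {minutes:.0f} min ({minutes / total:.0%} of MTTR)")
```

The four stage averages add up to the overall MTTR, so the percentages show directly where a reduction effort would have the most effect.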

Diagnosing the cause of the failure is the most time-consuming aspect of MTTR. In fact, 80 percent of MTTR is spent figuring out what caused the asset or system to fail. Documenting, managing and having a machine ledger on hand with things like maintenance schedules, repaired/replaced components and a history from equipment monitoring systems will be vital to being able to quickly narrow down possible causes of failure. In a failure scenario, critical time is lost when phone calls are being made, meetings are being called and incorrect diagnoses are happening, leading to fixes that fail.

In the same failure scenario, having proper documentation and an asset history lets you quickly examine all causal factors that may have contributed to the failure. Management can look at the maintenance calendar to see if the machine has been consistently maintained, see when the machine last had a component repaired or replaced, and check to see where that particular machine has had problems in the past.

Training and Procedures

Detailed written procedures should be made available to all maintenance personnel and followed precisely to mitigate the risk of trial and error when it comes to making repairs. Procedures provide technicians with a structured sequence of actions that help minimize the time it takes to fix an issue.

All the documenting and preplanning in the world will not help reduce your MTTR if your technicians aren't properly trained with the right skill set needed to repair your equipment. Implementing continuous training exercises and sharing them with the team is vital. Discussing recurrence matrices and introducing one-point lessons are great ways to do this.

One-point lessons are short, visual lessons on a single point. They're intended to improve job-specific knowledge and skills by showing specific problems and how to fix them. One-point lessons can deal with safety, basic knowledge, improvement or trouble areas. To help reduce MTTR, one-point lessons can be used to work through actual breakdown scenarios, either as they're happening (most effective) or in a mock trial.
Recurrence matrices track weekly breakdowns, when a breakdown analysis is complete and when countermeasures are applied. Making sure your team understands the breakdown trends shown in a recurrence matrix helps them learn how to determine whether a breakdown was forced or from natural deterioration.

Spare Parts

Even though the MTTR formula generally doesn't consider lead time for spare parts, it's important to acknowledge how the availability of spare parts affects MTTR. In his dissertation, "A structured approach for the reduction of mean time to repair of blast furnace D, ArcelorMittal, South Africa, Vanderbijlpark," Alex Thulani Madonsela discusses human factors contributing to MTTR, one of which is spare parts. "Timely availability of spare parts affects the duration of maintenance tasks," he explains. "Without proper support of equipment when required, executing maintenance becomes difficult for maintenance personnel. The lack of spare parts and knowledge of where to find them negatively affects the MTTR when maintenance has to be performed." Madonsela goes on to detail an approach to help minimize MTTR by having an organized inventory of spare parts.

Compile a functional location structure: This step involves compiling a list of plant equipment based on their location or where a maintenance task would be performed according to hierarchy.
Compile equipment inventory: Based on the plant's design, put together an inventory of the equipment.
Develop a naming and coding standard: This is important for maintenance technicians to be able to locate and maintain stock. This ensures the correct spare parts are ordered and stored correctly every time. It also ensures efficiency, since maintenance technicians will know the exact location of spare parts.
Perform criticality evaluations of spare parts: Each spare part on hand should be evaluated based on its criticality in supporting the maintenance strategy for each piece of equipment.
Finalize inventory: Once the previous steps have been completed, a finalized inventory list should be made available and be easily accessible by everyone.
Develop a storage standard: Implement the original equipment manufacturer (OEM) recommendations for each spare part to ensure the quality of the parts does not deteriorate.
Quality assurance: Make sure items stored as "readily available" meet the correct standards. For those that haven't been checked, store them in a separate storage area. Any spare parts returned to the storage area should be checked for quality.
Audits: Auditing ensures your system is working properly and is adding value. The audit team might consist of the storage/warehouse manager, maintenance supervisor and planners.

Technology

Perhaps the best chance for an organization to reduce its MTTR is by implementing modern monitoring technologies. Onsite or remote monitoring done via a smartphone or tablet gives you a 24/7 look into how your system is performing. This real-time data can be used to track metrics like MTTR and let plant engineers design preventive maintenance plans and plan for failures in advance.

Modern computerized maintenance management systems (CMMS) help you easily track data like labor hours spent on maintenance, number of breakdowns and operational time, which is used to monitor high-level failure statistics. CMMS can even calculate MTTR and MTBF automatically for you. You may have heard of the internet of things (IoT) – the interconnection of everyday devices to the internet. It's already taking over the world of consumerism in the form of smart homes, as you now can control your heating and air conditioning units, lights, and locks all from your smartphone. But this is also creeping into the industrial world.

The industrial internet of things (IIoT) introduces automation, real-time data analysis and smart decision-making into the world of manufacturing. Machine-to-machine technology is combined with the IIoT to offer real-time data analysis. This allows for things like tracking failure data in real-time when equipment breaks down and automatically gathering, aggregating and analyzing data before sending a recommended action to technicians. Failure data, like the asset's operating condition before the failure occurred, and historical repair data from your CMMS can be used to direct repairs. In other words, the IIoT can greatly reduce the diagnosing phase discussed earlier, the part of MTTR that takes the longest.