Near-zero downtime: Overview and trends

Hai Qiu, the University of Cincinnati; Dr. Jay Lee, the University of Cincinnati

1. Maintenance Technologies Overview:

Many manufacturing companies are pushing their production equipment for every ounce of capacity while, at the same time, trying to cut their overhead costs. This has put a strong emphasis on the importance of quality maintenance services used to care for their systems. Service and maintenance are becoming essential for companies to sustain their manufacturing productivity and customer satisfaction at the highest possible level. Aftermarket support of products is increasingly becoming the key factor in determining the profitability and dependability of a company. The importance of maintenance functions, and therefore of maintenance management, has grown tremendously.

Maintenance technologies aim to

  • Increase the device reliability and reduce production downtime
  • Increase the throughput
  • Increase life expectancy of assets
  • Improve safety and quality conditions

Looking back on the history of maintenance technologies and forecasting where they are heading, the road map to excellence in maintenance can be illustrated as in Figure 1.

Figure 1. The development of maintenance technologies.

1.1 No Maintenance

There are two kinds of situations in which no maintenance will occur.

  • No way to fix it: The maintenance technique is not available for some special application, or the maintenance technique is not well-developed at the early stage.
  • Isn’t worth fixing: Some machines are designed to be used only once. Compared with the cost of maintenance, it might be more cost-effective just to discard them.

Neither of these scenarios is within the scope of the discussion here.

1.2 Reactive Maintenance

In plain English, the aim of reactive maintenance is just to “fix it after it’s broken,” since most of the time a machine breaks down without warning and it is urgent for the maintenance crew to put it back to work. This is also referred to as “firefighting”.

Reactive maintenance happens because some operations have developed through the years with very little attention given to the proper care of the machinery involved. Essentially, little to no maintenance is conducted, and the machinery operates until a failure occurs. At that point, appropriate personnel are contacted to assess the situation and make the repairs as expeditiously as possible. Hence the expression "putting out the fires" or "firefighting."

In a situation where damage to equipment is not a critical factor, plenty of downtime is available, and the value of the assets is not a concern, the firefighting mode may prove to be an acceptable option. Of course, one must consider the additional cost of making repairs on an emergency basis, since soliciting bids to obtain reasonable costs may not be possible in these situations. Due to market competition and environmental and safety issues, the trend is toward adopting an organized and efficient maintenance program as opposed to firefighting.

1.3 Preventive Maintenance

Preventive maintenance is an equipment maintenance strategy based on replacing, overhauling or remanufacturing an item at a fixed interval, regardless of its condition at the time. Scheduled Restoration tasks and Scheduled Discard tasks are both examples of preventive maintenance tasks.

Preventive maintenance (PM) can be divided into two categories:

Minor PM is basic maintenance, which is simply the act of performing the most fundamental equipment service (lubrication, cleaning, routine adjustments, etc.), that is essential to assuring the continued operation of the equipment. This activity is quite simple with just a few machines, adequate downtime and sufficient funds. A problem begins to occur when there are a lot of machines and no organized program to schedule and control the work tasks. The solution is to implement a minor preventive maintenance program to be certain that the machinery’s basic needs are addressed in a timely and efficient manner. Such a program fulfills the minimum requirement for continued operation, but does nothing to anticipate potential future failures.

Major PM not only includes Minor PM but also begins to address potential failures. With this option, machinery is scheduled to be out of service so that more involved tasks can be performed. Based on run hours or some equivalent time factor, components such as bearings, shafts, sensors, gears, piping, etc., are replaced in anticipation of potential failure in the near future. The time factor is usually determined through experience and is statistical in nature. With this practice, though, it is possible to replace components that are still in good condition as well as risking the introduction of a problem through improper maintenance. As a result, cost can sometimes increase without benefit. However, both Minor and Major PM are critical to assuring equipment reliability and so a combination of the two is frequently practiced.

1.4 Predictive Maintenance

Predictive maintenance (PdM) is a right-on-time maintenance strategy. It may be best described as a process that requires technologies and people skills, and that combines all available diagnostic and performance data, maintenance histories, operator logs and design data to make timely decisions about the maintenance requirements of major or critical equipment. It is the integration of various data, information and processes that leads to the success of a PdM program. PdM analyzes the trend of measured physical parameters against known engineering limits for the purpose of detecting, analyzing and correcting a problem before a failure occurs. A maintenance plan is made based on the prediction results derived from condition-based monitoring. This can cost more up front than PM because of the additional investment in monitoring hardware and software, staffing, tooling and education required to establish a predictive maintenance program. However, it offers increased equipment reliability and enough advance information to improve planning, thereby reducing unexpected downtime and operating costs.
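The idea of trending a measured parameter against a known engineering limit can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the readings, the 80-degree limit and the function names are hypothetical, and a straight-line least-squares fit stands in for whatever trend model a real PdM program would use.

```python
def fit_line(times, values):
    """Ordinary least-squares line through (time, value) pairs."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    intercept = mean_v - slope * mean_t
    return slope, intercept


def time_to_limit(times, values, limit):
    """Estimate when a rising trend crosses `limit`.

    Assumes the limit is an upper bound. Returns None when the fitted
    trend is flat or falling, because no crossing can be predicted.
    """
    slope, intercept = fit_line(times, values)
    if slope <= 0:
        return None
    crossing = (limit - intercept) / slope
    # Never report a crossing earlier than the latest reading.
    return max(crossing, times[-1])


# Hypothetical readings: operating hours vs. bearing temperature (deg C).
hours = [0, 100, 200, 300, 400]
temps = [60.0, 62.0, 64.0, 66.0, 68.0]
print(time_to_limit(hours, temps, limit=80.0))
```

The returned time is an estimate of when the limit would be reached if the current trend continued, which is the kind of advance information a planner can schedule against.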

Figure 2 shows the different elements of the PdM program that are integrated to assist in maintenance decisions.

*Source: Augustine DiGiovanni, Vice-President CSI Services, Maintenance Optimization by Integrating Technologies and Process Change

Figure 2: Elements of a PdM program.

The key concepts of PdM are:

  • Combine all information
  • Analyze information for equipment degradation
  • Determine corrective action
  • Apply prediction algorithms
  • Determine when to take corrective action
  • Feedback action taken for maintenance history and/or root cause failure analysis
  • Be proactive.

1.5 Proactive Maintenance

Proactive maintenance, in general terms, encompasses any tasks used to predict or prevent equipment failure. To be more specific, there are two working directions.

  • Change from a failure-reactive to a failure-proactive approach by avoiding the underlying conditions that lead to machine faults and degradation. Proactive maintenance focuses on analyzing the root cause, and not just the symptoms. Once it identifies the root cause, it seeks to prevent or fix the failure at the source. One of the most popular illustrations of proactive maintenance concerns heart disease in the human body. With reactive maintenance, a response is made only after the patient is sent to the hospital emergency room. With preventive maintenance, the patient might have bypass or transplant surgery followed by continued checkups. With predictive maintenance, heart disease can be detected using EKG or ultrasonic technology, possibly with a device installed for continuous monitoring. With proactive maintenance, disease control would involve cholesterol and blood pressure monitoring along with diet control.
  • Feed the maintenance information back to the design and operation department. Failure prevention should also be conducted in the design and operation department. The maintenance crew’s job is not only to fix a machine or change parts, but they should also help by suggesting how to improve a machine’s design and operation so that the failures are prevented proactively.

There is still some debate about the efficiency and failure response speed of proactive maintenance, but there is no doubt that there has been a lack of communication between maintenance and design.

1.6 Self-maintenance

Self-maintenance is a new design and system methodology. A self-maintaining machine can monitor and diagnose itself, and if any kind of failure or degradation happens, it can still maintain its functions for a while. A self-maintaining machine belongs not to the conventional physical maintenance concept but to the functional maintenance concept. Functional maintenance aims to recover the required function of a degrading machine by trading off functions, whereas traditional repair (physical maintenance) aims to recover the initial physical state by replacing faulty components, cleaning, etc. The way to achieve self-maintenance is to add intelligence to the machine, making it clever enough for functional maintenance. In other words, self-maintainability would be appended to an existing machine as an additional embedded reasoning system.

Another system approach to creating the self-maintaining ability is to add a self-service trigger function to a machine. The machine will then self-monitor, self-diagnose and self-trigger the service request with detailed and clear maintenance requirements. The maintenance task is still conducted by a maintenance crew, but the seamless integration of the machine, maintenance schedule, dispatch system and inventory management system will minimize maintenance costs and raise customer satisfaction to the highest level.

2. Where Are We Now?

Most of the traditional manufacturing industries are still struggling to reduce the firefighting nature of their maintenance tasks. One major U.S. automotive manufacturer has a maintenance staff of between 15,000 and 18,000 across all of its plants. According to the company, “85 percent to 90 percent [of their maintenance work] is crisis work” (breakdowns). Some other companies have already successfully adopted preventive maintenance programs in their factories. One automotive parts supplier said that nearly 80 percent of its maintenance tasks are scheduled maintenance and only 20 percent are firefighting. For most manufacturing industries, the ideal ratio of planned to unplanned work is 19:1, which many of them consider to be “world class.” So, if a company already reaches a 90 percent or higher level of scheduled maintenance, is that good enough from the point of view of cost savings and productivity improvement? Actually, the key point here is whether that 90 percent of scheduled maintenance is really necessary, which leads to our main discussion topic: moving from preventive maintenance to predictive maintenance.
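As a quick check on the figures above, a planned-to-unplanned ratio can be converted to a share of total work. The function name below is hypothetical; the arithmetic is the only point.

```python
def planned_share(planned, unplanned):
    """Fraction of all maintenance work that is planned."""
    return planned / (planned + unplanned)


print(f"{planned_share(19, 1):.0%}")   # 19:1 ratio -> 95%
print(f"{planned_share(80, 20):.0%}")  # 80/20 split -> 80%, a 4:1 ratio
```

So the 19:1 "world class" target corresponds to 95 percent planned work, while the 80/20 split reported by the parts supplier is a 4:1 ratio.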

2.1 Shift From Reactive and Preventive Maintenance to Predictive Maintenance

Reactive maintenance, performed only when equipment fails, results in both high production costs and significant service downtime caused by equipment and process breakdowns. Preventive maintenance is intended to eliminate machine or process breakdowns and reduce downtimes by scheduling maintenance operations regardless of the actual state of a machine or process. Preventive maintenance intervals are determined using reliability theory and information about the machine or process life cycle.

This practice often results in an unnecessary loss of productivity, either because maintenance is performed when the process or machine is still functioning at an acceptable level, or because unpredicted breakdowns occur before scheduled maintenance operations are performed. According to a Forbes Magazine study, one out of every three dollars spent on preventive maintenance is wasted. A major overhaul facility reports that “60 percent of hydraulic pumps sent in for rebuild had nothing wrong with them.” These inefficiencies are the result of maintenance performed in accordance with a schedule (fixed and based on guesswork) as opposed to the machine’s true condition and need (flexible and dynamic). So, even if we have already achieved a nearly perfect preventive maintenance level, its cost still represents a sizeable portion of total operating expenses and leaves a lot of room for improvement and cost savings. Therefore, in contemporary markets, it becomes increasingly important to predict and prevent failures based on the current and past behavior of a piece of equipment, thus ensuring its maintenance only when needed and exactly when needed.

Preventive maintenance is often compared to the service schedule for an automobile. If you change the oil in your car every 3,000 miles whether it needs it or not, you are following a preventive maintenance policy. Predictive maintenance is when you sample the oil from time to time, check for changes in its characteristics and predict when your vehicle should go in for service. You may find that you need to change the oil more often, or that you can keep driving for another thousand miles without changing it. By using this more accurate maintenance technique, you will not only take better care of your automobile but also reduce costs by avoiding unnecessary service.

For these reasons, we propose a paradigm shift from the traditional approaches of detecting and quantifying failure toward an approach centered around detecting, quantifying and predicting the performance degradation of a process, machine or service. Performance degradation is a harbinger of system failure, so it can predict unacceptable system performance (in a process, machine or service) before it occurs. The traditional fail-and-fix practice can thus be replaced by the new predict-and-prevent process.

2.2 The Benefits of Predictive Maintenance

The benefits of predictive maintenance can be categorized as follows:

1. Improve productivity

  • Minimizes or eliminates costly downtime and increases profitable uptime.
  • Reduces unscheduled maintenance – repairs can be made at times that least affect production.
  • Optimizes machinery performance – machinery always operates within specifications.
  • Reduces the time required to make machinery repairs – advance notice of machinery condition permits more efficient organization of the repair process.
  • Reduces overtime required to make up for lost production due to broken down or poorly performing machinery.
  • Increases the speed that machinery can be operated, if desirable.
  • Increases the ease of operation of machinery.

2. Reduce the overall costs

  • Reduces unnecessary machinery repairs – machines are repaired only when their performance is less than optimal.
  • Reduces spare parts inventories – many parts can be purchased just in time for repairs to be made during scheduled machinery shutdowns.
  • Reduces depreciation of capital investment caused by poor machinery maintenance – well maintained machinery lasts longer and performs better.
  • Reduces excessive electric power consumption caused by inefficient machinery performance – saves money on energy requirements.
  • Reduces need for standby equipment or additional floor space to cover excessive downtime – less capital investment required for equipment or plant.

3. Improve customer relationships and satisfaction

  • Reduces the number of dissatisfied customers or lost customers due to poor quality – with less than optimal machine performance, quality always suffers.
  • Just-in-time service reduces the customers’ waiting time and downtime.
  • Possibility of identifying the service demand before the customers notice the problem.
  • Reduces penalties that result from late deliveries caused by broken down or poorly performing machinery.
  • Reduces warranty claims due to poor product quality caused by poorly performing machinery.

4. Increase machinery safety

  • Reduces the injuries caused by poorly performing machinery.
  • Reduces safety penalties levied against a company for unsafe machinery.
  • Reduces insurance rates because well-maintained machinery increases safety.

2.3 Requirements for Predictive Maintenance

To implement predictive maintenance technology, the management group must consider two investments:

  • Investment in condition-based monitoring and diagnostic equipment.
  • Investment in training of staff.

3. Predictive Maintenance Methodologies

3.1 Condition-Based Monitoring and Performance Assessment

The basis of predictive maintenance is condition-based monitoring. Without constantly checking a machine’s operating status and tracking its tendency for degradation, it is impossible to make a precise predictive maintenance plan.

There are dozens of predictive maintenance technologies constructed on the basis of the condition-based monitoring or constant test mechanism, and some have become standards in many industries. Those standard and widely used technologies include vibration analysis, oil analysis, wear-particle analysis, ultrasound, thermography and acoustic emission analysis. The following table shows the ways maintenance professionals have traditionally used these predictive technologies for different applications.

Detection Method: Vibration Analysis
Failure Mode: Out of Balance, Bearing Defect, Gear Defect
Application: Rotating Machinery

Detection Method: Oil and Wear Particle Analysis
Failure Mode: Lubrication Failure, Abnormal Wear
Application: Mechanical Component

Detection Method: Ultrasound
Failure Mode: Leak Detection, Loose Connection, Corona Discharge, Bearing Defect
Application: Hydraulic Pump, Air/Steam/Vacuum System, Power Distribution, Electrical Switchgear and Overhead Transmission

Detection Method: Thermography
Failure Mode: Abnormal Hot Component
Application: Electrical Component, Mechanical Component, Structural Component

Detection Method: Acoustic Emission
Failure Mode: Stress Crack
Application: ... and Transfer Equipment

Vibration analysis is used primarily with rotating machinery to find problems such as bearing defects, out-of-balance conditions and misalignment. Prior to the use of vibration analysis, maintenance technicians had to wait until a bearing failed to realize that there was a problem. By using vibration analysis, however, periodic readings can be taken and recorded. Maintenance personnel can then compare these readings to baseline readings. When wear reaches a certain level, the bearing is scheduled for replacement before it fails. This reduces the amount of reactive maintenance and ensures the replacement occurs with minimum impact on the production or facility schedule. In large rotating machinery, online condition monitoring systems have been widely adopted. The vibration information from each bearing section is collected and the current machine performance is evaluated based on that. Furthermore, future maintenance is scheduled according to that evaluation and its prediction of machine performance. That way, the machine would only be opened when it is really necessary.
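The baseline comparison described above might look like the following minimal sketch. It assumes a single overall RMS amplitude per reading; the alert and alarm ratios are hypothetical placeholders, not values from any standard, and the function names are illustrative only.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of one vibration reading."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def assess_reading(samples, baseline_rms, alert_ratio=1.5, alarm_ratio=2.5):
    """Classify a periodic reading by its growth over the baseline.

    alert_ratio and alarm_ratio are assumed thresholds; a real program
    would set them per machine from experience or applicable standards.
    """
    ratio = rms(samples) / baseline_rms
    if ratio >= alarm_ratio:
        return "schedule replacement"
    if ratio >= alert_ratio:
        return "watch closely"
    return "normal"


# Hypothetical readings taken on the same bearing at different dates.
baseline = rms([1.0, -1.0, 1.0, -1.0])
print(assess_reading([1.1, -1.0, 0.9, -1.0], baseline))
print(assess_reading([3.0, -3.0, 3.0, -3.0], baseline))
```

In practice, analysts usually look at the vibration spectrum rather than a single overall level, but the compare-to-baseline logic is the same.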

Vibration analysis is also used to diagnose some non-mechanical problems in fluid power systems and surge or fluid excitation faults in large centrifugal compressors. For example, restrictions or disturbances in a fluid handling system create turbulence and unique vibration signatures that can help identify a problem.

Ultrasound is used primarily for leak detection, particularly for steam and air leaks. These leaks can be expensive, and yet many companies allow them to go unnoticed.

Common applications for ultrasound include leak detection for pneumatic and other gas systems, vacuum systems, gaskets and seals, and steam traps. Ultrasound also detects valve blow-through and is the most common way to detect cavitation problems in hydraulic pumps.

Ultrasound is also used for inspections of electrical switchgear and overhead transmission lines, where routine inspection is time consuming and hazardous. These areas are monitored for corona discharge, and when the instruments "hear" the discharge, technicians can quickly find the problem with little time wasted. Thus, technicians are able to find small problems before they become critical and cause equipment failure.

Oil and Wear-Particle Analysis are two different technologies which are widely used to detect lubrication-related faults. Oil analysis determines the condition of a lubricant. Wear-particle analysis determines the condition of equipment based on the concentration of wear particles in the lubricant.

For example, consider a gear case that is showing signs of abnormal wear (e.g., noise or overheating). An oil sample could be checked for wear particles. Considering the types and condition of particles found, it is possible to isolate a number of possible problems and their causes (e.g., operating the equipment beyond design speed or capacity or filter failure). Once the problem has been identified, the appropriate maintenance action can be scheduled, again with minimum impact on operations or the facility.

Some unique applications will involve the analysis of a lubricant itself or the wear particles in the lubricant. For example, wear particles can show when there is insufficient lubrication. "Insufficient lubrication" does not necessarily mean the absence of a lubricant in a system. The lubrication system on an enclosed drive, for example, could have a clogged spray nozzle, preventing proper lubrication from reaching a hard-to-inspect area. While the visible part of the drive may be getting proper lubrication, the other area that is lacking lubrication would produce wear particles that indicate that condition. The samples can also indicate conditions such as additive failure, lubricant contamination or excessive loading that exceeds the rating of the lubricant.

Thermography is used primarily to locate electrical components that are hotter than normal. Such a condition usually indicates wear or looseness. Thus, thermography allows technicians to perform maintenance on only the electrical components that need attention without requiring that all components get the same level of attention.

In utilities, for example, the correct torque is essential on electrical components to ensure that no heat is generated from a loose connection. Before thermography, it was necessary for each connection in a control panel to be checked manually for correct torque. By using thermography, only the connections that are hot receive attention. This reduces the staff necessary to perform preventive maintenance on the connections.

Other applications include the monitoring of outdoor wiring, such as overhead transmission lines, which wear due to environmental conditions. Thermography also serves to measure transformer temperatures to find problems indicated when certain areas are hotter than others. In addition, it supports maintenance in industries that have high-temperature processes. The technology helps pinpoint areas where refractory material is wearing and allows repairs prior to catastrophic failures.

Another less-used application for thermography is checking coupling alignment without major shutdowns of the equipment. As a misaligned coupling rotates, it generates heat. The greater the temperature difference, the greater the misalignment. By using thermography, maintenance personnel can observe the temperature rise across a coupling. Some companies have used this technique long enough to develop profiles on the temperature rise for each type of coupling. Using this profile, they can determine the amount of misalignment (not what plane it is in). Then, the technicians can proactively schedule the coupling for realignment.
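A temperature-rise profile of this kind could be applied as a simple lookup with linear interpolation. The sketch below is an assumption about how such a profile might be stored; the profile values, units and names are entirely hypothetical.

```python
from bisect import bisect_left

# Hypothetical profile for one coupling type:
# (temperature rise across the coupling in deg C, misalignment in mm).
PROFILE = [(0.0, 0.0), (5.0, 0.10), (10.0, 0.25), (20.0, 0.60)]


def estimate_misalignment(temp_rise, profile=PROFILE):
    """Interpolate the misalignment amount from the observed temperature rise.

    Values outside the profile are clamped to its end points. The result
    gives the amount of misalignment only, not the plane it lies in.
    """
    rises = [rise for rise, _ in profile]
    if temp_rise <= rises[0]:
        return profile[0][1]
    if temp_rise >= rises[-1]:
        return profile[-1][1]
    i = bisect_left(rises, temp_rise)
    (r0, m0), (r1, m1) = profile[i - 1], profile[i]
    return m0 + (m1 - m0) * (temp_rise - r0) / (r1 - r0)


print(estimate_misalignment(7.5))
```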

Acoustic emission (AE) is the class of phenomena whereby an elastic wave, in the ultrasonic range usually between 20 kilohertz and 1 megahertz, is generated by the rapid release of energy from a source within a material. The elastic wave propagates through the solid to the surface, where it can be recorded by one or more sensors. Each sensor is a transducer that converts the mechanical wave into an electrical signal. In this way, information about the existence and location of possible sources is obtained. The basis for quantitative methods is a localization technique that extracts the source coordinates of the AE events as accurately as possible.
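For the simplest case of two sensors on a straight line, source location reduces to one formula. With sensors at positions 0 and L, a source at x and a constant wave speed v, the arrival times are t1 = x/v and t2 = (L - x)/v, so x = (L + v(t1 - t2))/2. The sketch below assumes a known, constant wave speed and a source between the sensors; the numbers and the function name are hypothetical.

```python
def locate_source_1d(sensor_spacing, wave_speed, delta_t):
    """Locate an AE source between two sensors on a line.

    sensor_spacing: distance L between sensor 1 (at 0) and sensor 2 (at L), m
    wave_speed:     assumed constant propagation speed v, m/s
    delta_t:        arrival time at sensor 1 minus arrival time at sensor 2, s
    Returns the source position measured from sensor 1, in metres.
    """
    x = (sensor_spacing + wave_speed * delta_t) / 2.0
    if not 0.0 <= x <= sensor_spacing:
        raise ValueError("time difference implies a source outside the sensor pair")
    return x


# Hypothetical case: sensors 2 m apart, assumed wave speed 5000 m/s,
# hit arrives 0.2 ms earlier at sensor 1 than at sensor 2.
print(locate_source_1d(2.0, 5000.0, -2e-4))
```

Locating sources on a plate or in a volume uses more sensors and the same time-difference principle, with the added difficulty that wave speed is rarely uniform in real structures.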

AE analysis differs from ultrasonic testing, which actively probes the structure. AE analysis listens for emissions from active defects and is very sensitive to defect activity when a structure is loaded beyond its service load in a proof test.

AE analysis is a useful method for the investigation of local damage in materials. One of its advantages over other nondestructive evaluation (NDE) techniques is its potential to observe damage processes during the entire load history without any disturbance to the specimen.

AE analysis is used successfully in a wide range of applications including: detecting and locating faults in pressure vessels or leakage in storage tanks or pipe systems, monitoring welding applications, corrosion processes, partial discharges from components subjected to high voltage and the removal of protective coatings. Areas where research and development of AE applications is currently being pursued, among others, are process monitoring and global or local long-term monitoring of civil-engineering structures (e.g., bridges, pipelines, off-shore platforms, etc.). Another area where numerous AE applications have been published is fiber-reinforced polymer-matrix composites, in particular glass-fiber-reinforced parts or structures (e.g., fan blades). AE systems also have the capability of detecting acoustic signals created by leaks.

The disadvantage of AE analysis is that commercial AE systems can only estimate qualitatively how much damage there is to the material and approximately how long the components will last. Therefore, other NDE methods are still needed to do more thorough examinations and provide quantitative results. Moreover, service environments are generally very noisy, and the AE signals are usually very weak. Thus, signal discrimination and noise reduction are very difficult, yet extremely important for successful AE applications.

3.2 Watchdog Agent

Currently, the prevalent condition-based maintenance (CBM) approach involves estimating a machine's current condition based upon the recognition of indications of failure. Recently, several predictive CBM techniques within this failure-centered paradigm have been proposed. However, implementing these predictive CBM techniques requires expertise and a priori knowledge about the assessed machine or process, because the corresponding failure modes must be known in order to assess the current performance of the machine or process. For this reason, these CBM methods are application-specific and non-robust.

The Center for Intelligent Maintenance Systems proposed a new CBM paradigm for performance assessment and prediction based on the Watchdog Agent. This new approach utilizes the performance-related information obtained from signatures extracted from multiple sensor inputs through generic signal processing, feature extraction and sensor fusion techniques. Performance assessment in this case is made by matching the signatures representing the most recent performance with those observed during normal system behavior. A close match between these signatures would indicate good performance, while a greater disparity between them would indicate performance degradation and the need for maintenance.

Since no failure data is needed for this CBM technique to be operational, and since the nature of the employed methods is generic, the need for expert knowledge is greatly reduced. However, if failure data describing some failure mode is available, the most recent process signatures can also be matched against those failure-related signatures with the resulting match bearing significant diagnostic information.
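One way to picture the signature matching described above is to model a single feature as a Gaussian during normal behavior and during the most recent period, then measure how much the two distributions overlap. The sketch below uses the Bhattacharyya coefficient for that overlap. This is an illustrative stand-in, not the Watchdog Agent's actual algorithm, and the feature values are hypothetical.

```python
import math
import statistics


def gaussian_overlap(normal_feature, recent_feature):
    """Overlap (0..1) between two feature samples modeled as 1-D Gaussians.

    Uses the Bhattacharyya coefficient: 1.0 means the recent signature
    matches normal behavior, values near 0 mean strong degradation.
    Each sample needs at least two points and nonzero variance.
    """
    mu1 = statistics.mean(normal_feature)
    mu2 = statistics.mean(recent_feature)
    var1 = statistics.variance(normal_feature)
    var2 = statistics.variance(recent_feature)
    total = var1 + var2
    scale = math.sqrt(2.0 * math.sqrt(var1 * var2) / total)
    return scale * math.exp(-((mu1 - mu2) ** 2) / (4.0 * total))


# Hypothetical feature values (for example, RMS vibration per cycle).
normal = [1.00, 1.10, 0.90, 1.00, 1.05, 0.95]
recent_ok = [1.02, 0.98, 1.08, 0.93, 1.00, 1.04]
recent_bad = [1.40, 1.50, 1.30, 1.45, 1.35, 1.50]

print(round(gaussian_overlap(normal, recent_ok), 3))
print(round(gaussian_overlap(normal, recent_bad), 3))
```

Note that only data from normal operation is needed to compute the index, which mirrors the point that no failure data is required for this kind of assessment.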

Figure 3 illustrates this CBM technique centered on describing and quantifying the process degradation instead of process failure. Finally, historical behavior of process signatures can be utilized to predict their behavior and thus forecast the process performance. Based on the forecasted performance, proactive maintenance is possible through the prediction of process degradation and prevention of potential failure before it occurs. Thus, the Watchdog Agent is enabled to yield the information about when unacceptable system performance will occur, why the performance degradation occurred and what component in the system needs to be maintained. This information will ultimately lead to optimal maintenance policies and actions that will proactively prevent downtime.

This entire infrastructure of multi-sensor performance assessment and prediction could be even further enhanced if Watchdog Agents mounted on identical products operating under similar conditions could exchange information and thus assist each other in building a world model. Furthermore, this communication can be used to benchmark the performance of “brother-products” and thus rapidly and efficiently identify underperforming units before they cause any serious damage and losses. This paradigm of communication and benchmarking between identical products operating in similar conditions is referred to as the “peer-to-peer” (P2P) paradigm. Figure 8 illustrates the aforementioned Watchdog Agent functionalities supported by the P2P communication and benchmarking paradigm.
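Peer benchmarking of this kind can be sketched as a comparison of each unit's health index against the rest of the fleet. The example below flags units that fall well below the fleet median, using the median absolute deviation as a robust spread measure. The unit names, health values and the factor k are all hypothetical, and this is not presented as the method the authors use.

```python
import statistics


def underperformers(health_by_unit, k=3.0):
    """Return units whose health is more than k MADs below the fleet median.

    health_by_unit maps a unit name to a health index where higher is better.
    """
    values = list(health_by_unit.values())
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        return []  # fleet is uniform; no basis for flagging anyone
    threshold = median - k * mad
    return sorted(unit for unit, v in health_by_unit.items() if v < threshold)


# Hypothetical health indices reported by identical machines.
fleet = {
    "unit_a": 0.95,
    "unit_b": 0.93,
    "unit_c": 0.96,
    "unit_d": 0.60,
    "unit_e": 0.94,
}
print(underperformers(fleet))
```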

Figure 3: Performance assessment based on the overlap between signatures.

According to the standard for Open System Architecture for Condition-Based Maintenance (OSA-CBM), a typical CBM system consists of the following seven layers:

  • Sensor module
  • Signal processing
  • Condition monitoring
  • Health assessment
  • Prognostics
  • Decision-making support
  • Presentation

The Watchdog functionality expands this standard topology to a multi-sensor level and realizes sensory processing, condition monitoring, health assessment and prognostics layers of the CBM scheme. The sensors and decision making layers within an Intelligent Maintenance System are realized outside the Watchdog Agent.
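The flow of data through these layers can be illustrated with a toy pipeline in which each layer is one function. Everything below is a hypothetical sketch meant to show the hand-off between layers: the feature (RMS), the health mapping, the one-step prognosis and the thresholds are all assumptions, not part of the OSA-CBM standard or the Watchdog Agent.

```python
import math


def signal_processing(samples):
    """Reduce raw sensor samples to one feature (RMS amplitude)."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def condition_monitoring(feature, baseline):
    """Compare the feature with its baseline value."""
    return feature / baseline


def health_assessment(ratio):
    """Map the ratio to a 0..1 health index (1 = as good as baseline)."""
    return max(0.0, min(1.0, 2.0 - ratio))


def prognostics(history, floor=0.5):
    """Cycles left until health reaches `floor`, assuming the last drop repeats."""
    if len(history) < 2:
        return None
    drop = history[-2] - history[-1]
    if drop <= 0:
        return None
    return max(0.0, (history[-1] - floor) / drop)


def decision_support(cycles_left, lead_time=3):
    """Recommend maintenance when the predicted margin is within the lead time."""
    if cycles_left is not None and cycles_left <= lead_time:
        return "schedule maintenance"
    return "continue operation"


def presentation(health, advice):
    """Format the result for the operator."""
    return f"health={health:.2f}, advice={advice}"


def run_pipeline(batches, baseline):
    """Push each batch from the sensor module through the remaining layers."""
    history, advice, report = [], "continue operation", ""
    for samples in batches:
        ratio = condition_monitoring(signal_processing(samples), baseline)
        history.append(health_assessment(ratio))
        advice = decision_support(prognostics(history))
        report = presentation(history[-1], advice)
    return history, advice, report


# Hypothetical sensor batches with growing amplitude.
batches = [[1.0, -1.0], [1.2, -1.2], [1.4, -1.4]]
print(run_pipeline(batches, baseline=1.0)[2])
```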


In today’s competitive market, production costs, lead time and optimal machine utilization are crucial issues for companies. Near-zero-downtime is the goal for a maintenance crew to maintain a company’s throughput and high productivity. Reactive maintenance, performed only when equipment fails, results in both high production costs and significant service downtime caused by equipment and process breakdowns. Preventive maintenance is intended to eliminate machine or process breakdowns and downtimes through maintenance operations scheduled regardless of the actual state of the machine or process. Therefore, in contemporary markets, it becomes increasingly important to predict and prevent failures based on the current and past behavior of the equipment, thus ensuring its maintenance only when needed and exactly when needed.

For these reasons, the shift from the traditional reactive maintenance and preventive maintenance to predictive maintenance should be the development direction of maintenance technology. Based on the condition-based monitoring technology, the traditional fail-and-fix practice can and eventually must be replaced by the new predict-and-prevent paradigm.

About the authors:

Hai Qiu and Jay Lee help direct the NSF Industry/University Cooperative Research Center on Intelligent Maintenance Systems (IMS) at the University of Cincinnati. To learn more, visit www.imscenter.net.

