Creating a maintenance plan is generally not difficult to do. But creating a comprehensive maintenance program that is effective poses some interesting challenges. It would be difficult to appreciate the subtleties of what makes a maintenance plan effective without understanding how the plan forms part of the total maintenance environment.
This article explains what makes the difference between an ordinary maintenance plan and a good, effective maintenance program.
Maintenance practitioners across industry use many maintenance terms to mean different things. To level the playing field, this section explains how a few of these terms are used throughout this article. It must be emphasized, however, that this is the author’s preferred interpretation of these terms, and it should not necessarily be taken as gospel truth.
In sporting parlance, the maintenance policy defines the “rules of the game”, whereas the maintenance strategy defines the “game plan” for that game or season.
Maintenance policy – Highest-level document, typically applies to the entire site.
Maintenance strategy – Next level down, typically reviewed and updated every 1 to 2 years.
Maintenance program – Applies to an equipment system or work center, describes the total package of all maintenance requirements to care for that system.
Maintenance checklist – List of maintenance tasks (preventive or predictive) typically derived through some form of analysis, generated automatically as work orders at a predetermined frequency.
Short-term maintenance plan (sometimes called a “schedule of work”) – Selection of checklists and other ad-hoc work orders grouped together to be issued to a workshop team for completion during a defined maintenance period, typically spanning one week or one shift.
Figure 1 below describes the flow of maintenance information and how the various aspects fit together.
Figure 1 – Maintenance Information Loop
The large square block indicates the steps that take place within the computerized maintenance management system, or CMMS.
It is good practice to conduct some form of analysis to identify the appropriate maintenance tasks to care for your equipment. RCM2 is probably the most celebrated methodology, but there are many variations.
The analysis will result in a list of tasks that need to be sorted and grouped into sensible chunks, each of which forms the content of a checklist. Sometimes it may be necessary to smooth and streamline these groups of tasks in an iterative manner.
The most obvious next step is to schedule the work orders generated by the system into a plan of work for the workshop teams.
Less common, however, is to use this checklist data to create a long-range plan of forecasted maintenance work. This maintenance plan serves two purposes:
The results can be used to determine future labour requirements, and
They feed into the production plan.
The schedule of planned jobs is issued to the workshop and the work is completed. Feedback from these work orders, together with details of any equipment failures, is captured in the CMMS for historical reporting purposes.
A logical response to this shop floor feedback is that the content of the checklists should be refined to improve the quality of the preventive maintenance, especially to prevent the recurrence of failures.
A common mistake, however, is to jump straight from the work order feedback to changing the words on the checklists. When this happens, the integrity of the preventive maintenance program is immediately compromised because the revised words on the checklist have no defensible scientific basis. This should be avoided wherever possible.
The far better approach to avoid this guessing game is to route all the checklist amendments through the same analysis as was used originally to create the initial checklists. This means that the integrity of the maintenance program is sustained over the long term. Implicit in this approach, however, is the need to have a robust system in which the content of the analysis can be captured and updated easily.
Finally, all the information that gets captured in the CMMS must be put to good use; otherwise, capturing it is a waste of time. This is the value of the management reports that can be created from maintenance information.
Without describing the complete RCM analytical process, it is instructive at this stage to point out a few details that are important to the content of such an analysis because of the way they can impact the overall maintenance plan.
Table 1 – Information captured in the RCM-style analysis
Equipment hierarchy down to component level
Root cause of failure
Analytical tool to select:
Failure effect category
Preventive/ corrective maintenance tasks (as appropriate)
The center column is what will be found in any typical RCM-style analysis.
In addition to that, there is value in constructing a hierarchy of the equipment system showing assemblies, subassemblies and individual components. This helps to keep track of which section of the system is being considered at any time, and the list of components also helps to identify the spare parts requirements for the system.
Of vital importance is the clear identification of the root cause of each failure, as this will affect the selection of a suitable maintenance task. To illustrate this point, consider a seized gearbox. “Seized” is an effect. There could be several root causes of this failure mode, each of which can be addressed in a different way through the maintenance plan. There is usually no value in aiming maintenance at the effect of a failure.
Also important from a planning perspective is to identify the time it will take to carry out each task independently. The sum total of these task times gives a good indication of how long the total work order will take.
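As a minimal sketch of this point, the snippet below sums the independent task times on a checklist to estimate the duration of the resulting work order. The `Task` structure, the task descriptions and the times are hypothetical and are used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Task:
    description: str
    minutes: int  # estimated time to carry out this task on its own


def work_order_minutes(tasks):
    """Sum the independent task times to estimate the whole work order."""
    return sum(task.minutes for task in tasks)


# Hypothetical checklist content, for illustration only.
checklist = [
    Task("Inspect gearbox oil level", 10),
    Task("Check coupling alignment", 25),
    Task("Lubricate drive chain", 15),
]

total = work_order_minutes(checklist)
print(f"Estimated work order duration: {total} min")
```

The estimate is only as good as the individual task times, which is why those times need to be accurate and meaningful.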
All of the above depends on the production process and the site’s operating context, so these comments should be taken simply as a guideline.
The following are a few points to consider when constructing a preventive maintenance program:
Preventive maintenance tasks must:
Wherever possible, aim for predictive rather than preventive tasks
“Check and replace, if necessary” tasks destroy planned times
Frequencies and estimated times for each task must be accurate and meaningful
Try wherever possible to only plan shutdown time for “non-running” tasks. Keep “running” tasks to be done during periods of normal production. Structure the maintenance program to allow for this.
After analysing all the maintenance requirements for the equipment system, these individual tasks would be grouped together to create the checklists, based on common criteria for:
In order to smooth the PM workload, a robust approach is to base the spread of PM activities on the checklists arising from the RCM-style analysis. This assumes that the analysis has been conducted thoroughly and that it is in a format that can be amended easily.
The graph in Figure 2 below illustrates how it is possible to arrange the occurrence of the PM work orders in such a way to create the smoothest possible flow of regular preventive maintenance work, while still leaving enough time to carry out those “follow-on” corrective maintenance tasks that were identified from conducting the preventive/predictive checks during the last maintenance stop.
It is important to notice that just because two checklists may have the same frequency, it is not necessary to schedule them to be done at the same time. Sometimes, of course, it does make practical sense to schedule PMs for the same day, but don’t assume that this is always true. As a general rule, in an automated or continuous process production environment, the total amount of work on one checklist or work planned for one maintenance period should not exceed 80 percent of the total time available.
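The 80 percent guideline can be expressed as a simple check. The sketch below assumes that planned work and available time are both measured in hours for the same maintenance period; the function names and the six-hour window figures are illustrative assumptions.

```python
MAX_LOADING = 0.80  # general rule from the text: plan at most 80% of the time available


def loading(planned_hours, available_hours):
    """Fraction of the available time that is taken up by planned work."""
    if available_hours <= 0:
        raise ValueError("available_hours must be positive")
    return planned_hours / available_hours


def is_overloaded(planned_hours, available_hours, limit=MAX_LOADING):
    """True when the planned work exceeds the loading limit."""
    return loading(planned_hours, available_hours) > limit


# Hypothetical six-hour maintenance window.
print(is_overloaded(4.5, 6.0))  # 75% loaded: within the guideline
print(is_overloaded(5.0, 6.0))  # about 83% loaded: over the guideline
```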
Figure 2 – Smoothing the PM workload
In order to achieve this smoothed workload pattern, it may be necessary to return to the timings, frequencies, groupings, start dates, etc., that were specified in the original analysis and rework some of the data. This is the iterative approach that was mentioned earlier in the description of Figure 1.
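One way to explore alternative start dates is a simple greedy heuristic, sketched below. This is an illustrative assumption, not the method used to produce Figure 2: it places the longest checklists first and, for each one, picks the start week that gives the lowest load in the weeks where that checklist falls. The checklist names, intervals and hours are hypothetical.

```python
def smooth_start_weeks(checklists, horizon_weeks):
    """Greedy staggering of PM checklists.

    checklists: list of (name, interval_weeks, hours) tuples.
    Returns (offsets, load): the chosen start week per checklist and the
    resulting hours of PM work in each week of the horizon.
    """
    load = [0.0] * horizon_weeks
    offsets = {}
    # Place the longest checklists first; they are the hardest to fit.
    for name, interval, hours in sorted(checklists, key=lambda c: -c[2]):
        best_offset, best_peak = 0, None
        for offset in range(min(interval, horizon_weeks)):
            weeks = range(offset, horizon_weeks, interval)
            # Highest load this checklist would create in the weeks it lands on.
            peak = max(load[w] + hours for w in weeks)
            if best_peak is None or peak < best_peak:
                best_offset, best_peak = offset, peak
        for w in range(best_offset, horizon_weeks, interval):
            load[w] += hours
        offsets[name] = best_offset
    return offsets, load


# Three hypothetical four-weekly checklists over an eight-week horizon.
checklists = [("Lubrication", 4, 2.0), ("Electrical", 4, 3.0), ("Safety", 4, 1.0)]
offsets, load = smooth_start_weeks(checklists, horizon_weeks=8)
print(offsets)
print(load)
```

In this example all three checklists share the same frequency, yet the heuristic spreads them over different weeks, so the busiest week carries three hours of PM work rather than the six hours that would result from scheduling them all together.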
It is well-recognized in modern maintenance circles that there is great value in planning the maintenance workload at a macro level over a long-term horizon as well as at a detailed level over a short horizon. These two activities serve significantly different purposes.
Regular work orders are created automatically in Maximo every night from the work order templates in the PM Master table. These fresh work orders are generated typically 30 days prior to the Target Start date specified on the PM. Other work orders are also created manually by the system users, such as craftsmen and engineers.
All these work orders need to be prioritized according to the importance and urgency of the tasks, and they need to be planned into the weekly workload of the maintenance crews to ensure that a well-balanced selection of work is assigned to each crew without them becoming overloaded.
An example layout of the weekly maintenance work schedule is shown in Figure 3 below.
Figure 3 – Example of weekly maintenance work schedule
Most often, a CMMS will only produce report data in text or numerical format. Because engineers like to see things in a graphical or pictorial representation, however, it may be necessary to combine the use of the CMMS with another package that has graphics capability, such as a spreadsheet. The following descriptions rely on the ability of the CMMS to produce a “flat file” from a report, which can then be imported into a spreadsheet and manipulated further.
If possible, it would be preferable to retain all the raw data within the CMMS and simply produce all the graphs and reports from that environment. There are two obstacles to this approach, however:
Very few CMMS packages have graphical capability;
Very few CMMS packages will capture or provide the full spectrum of data that may be required to construct the desired selection of graphs.
The alternative solution, therefore, is to copy the required selection of data from the CMMS to the spreadsheet environment where it can be manipulated further.
Some sites enjoy the luxury of having regular, fixed maintenance windows built into the production plans. For example, it could be agreed that every Tuesday morning Production Unit 1 will stop production and the equipment will be made available to the maintenance crew for six hours. During this six-hour window, the maintenance crew has the opportunity to assign as many people as required to complete all the planned maintenance activities in that work center. Thereafter, the system is handed back to the production team until the next week.
In many cases, however, there is no such regular routine in place. Opportunities for the maintenance teams to conduct planned maintenance need to be negotiated and agreed with the production teams on an “as-needed” basis. Unfortunately, this is very often reduced to the maintenance department begging for access to the equipment. Furthermore, this plea is often met with the unsympathetic response from the production teams that they have to run the equipment in order to meet their targets and therefore cannot afford to release it for maintenance. This is a very short-sighted view in my opinion.
The generation of a long-range maintenance plan that shows the number of hours of preventive maintenance work to be done in each work center over an 18- to 24-month horizon is a valuable tool. It gives the production schedulers visibility of the amount of time that is required for this preventive maintenance so that they can proactively plan to release the equipment for those periods. This makes the job of planning the maintenance activities so much simpler.
The nature of the production environment at the author’s site makes it difficult to implement a regular, fixed pattern of maintenance windows as described above. For this reason a long-range maintenance plan is produced to give the production teams as much advance warning as possible of the anticipated maintenance requirements. This plan shows the forecasted maintenance hours for each operating unit, by craft type, in weekly chunks over a 24-month horizon.
Table 2 below illustrates what the structure of a long-range maintenance plan might look like. A flat file is created from the master data table in Maximo which contains details of all the maintenance tasks and checklists with their corresponding equipment details, duration, frequencies, crafts, next due dates, etc. This information is imported into a spreadsheet, which uses a series of filters and formulae to produce the long-range plan.
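The same filter-and-sum logic can also be sketched in code. The example below assumes that each row of the flat file has been read into a dictionary with hypothetical field names (`unit`, `craft`, `hours`, `frequency_weeks`, `next_due`); these are not actual Maximo column names. It projects each PM forward from its next due date and totals the hours by operating unit, craft and week.

```python
from collections import defaultdict
from datetime import date, timedelta


def week_start(d):
    """Monday of the week containing date d."""
    return d - timedelta(days=d.weekday())


def long_range_plan(pm_records, start, horizon_weeks):
    """Forecast PM hours keyed by (unit, craft, week start) over the horizon."""
    end = start + timedelta(weeks=horizon_weeks)
    plan = defaultdict(float)
    for rec in pm_records:
        if rec["frequency_weeks"] < 1:
            raise ValueError("frequency_weeks must be at least 1")
        step = timedelta(weeks=rec["frequency_weeks"])
        due = rec["next_due"]
        # Repeat the PM at its fixed frequency until the horizon ends.
        while due < end:
            if due >= start:
                plan[(rec["unit"], rec["craft"], week_start(due))] += rec["hours"]
            due += step
    return dict(plan)


# Hypothetical PM master records, for illustration only.
pm_records = [
    {"unit": "Unit 1", "craft": "Mechanical", "hours": 4.0,
     "frequency_weeks": 4, "next_due": date(2024, 1, 3)},
    {"unit": "Unit 1", "craft": "Electrical", "hours": 2.0,
     "frequency_weeks": 2, "next_due": date(2024, 1, 10)},
]

plan = long_range_plan(pm_records, start=date(2024, 1, 1), horizon_weeks=8)
for (unit, craft, week), hours in sorted(plan.items()):
    print(unit, craft, week.isoformat(), hours)
```

A 24-month plan would use the same function with a horizon of about 104 weeks.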
Table 2 – Example layout of long-range maintenance plan
Based on this report, the production planners make the necessary allowances in the production calendars so that the equipment will be made available for maintenance. This allowance is initially made at a macro level. The exact dates and times for maintenance will be agreed in the week or two before it is due.
This arrangement of the numbers can also be used to help smooth the workload across the weeks by adjusting the due dates of the maintenance tasks in the CMMS as described earlier.
The above explanations describe how to identify the anticipated number of maintenance hours in a production area. This next section covers the approach to verifying that there is sufficient manpower available to carry out all the work.
In order to ensure that each team on site has adequate craftsman resources available to cover all the work that will arise in their areas, a long-range workload vs. manpower forecast can be produced. This amounts to a graph that compares the hours of work to be done each month with the corresponding man-hours of labor available. A graph is constructed for each craft group within each workshop team, spanning the next 18- to 24-month horizon.
If the long-term prediction shows that the level of maintenance activity is about to increase beyond the level that can be accomplished with the existing resources, this advance warning will ensure that there will be sufficient time to recruit and train additional resources before the situation goes out of control. Similarly, a decrease in the predicted level of maintenance activity will give sufficient advance visibility of the opportunity to reassign craftsman resources to other teams or activities. This proactive approach will lead to improved manpower utilization and less panic.
Listed below are some of the categories of data that are used to construct the graphs:
Workload (i.e. everything that will occupy the craftspeople’s time)
Manpower (i.e. net man-hours available)
The sum of the workload hours for each month draws the workload line. The sum of the manpower hours draws the labor capacity line. Where the workload exceeds the labor capacity, the load must be smoothed, or additional resources may be required.
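The comparison between the two lines can be sketched as follows. The data layout is an assumption made for illustration: workload is held as hours per category per month, and manpower as the net hours available per craftsperson per month. All of the figures are hypothetical.

```python
def capacity_gaps(workload_by_month, manpower_by_month):
    """Return {month: shortfall in hours} for months where workload exceeds capacity."""
    gaps = {}
    for month, categories in workload_by_month.items():
        workload = sum(categories.values())        # point on the workload line
        capacity = sum(manpower_by_month[month])   # point on the labor capacity line
        if workload > capacity:
            gaps[month] = workload - capacity
    return gaps


# Hypothetical figures for one craft group in one workshop team.
workload_by_month = {
    "2024-01": {"preventive": 120, "breakdowns": 40, "corrective": 30},
    "2024-02": {"preventive": 160, "breakdowns": 40, "corrective": 30},
}
manpower_by_month = {
    "2024-01": [100, 100],  # net hours available per craftsperson
    "2024-02": [100, 100],
}

gaps = capacity_gaps(workload_by_month, manpower_by_month)
print(gaps)
```

Months that appear in the result are the ones where the load must be smoothed or additional resources considered.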
The preventive maintenance hours from the CMMS are obtained from the totals from the long-range maintenance plan described in the previous section. The allowances for breakdowns, corrective work, etc., are calculated as a rolling 12-month average of the demonstrated actual data from the CMMS. Data for other allowances may be sourced from elsewhere if not contained in the CMMS.
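A rolling 12-month average of this kind can be calculated as in the sketch below. It assumes that the actual hours have been exported from the CMMS as a list of monthly totals, oldest first; the breakdown figures shown are hypothetical.

```python
def rolling_average(monthly_hours, window=12):
    """Mean of the most recent `window` months (fewer if the history is shorter)."""
    if not monthly_hours:
        raise ValueError("monthly_hours must not be empty")
    recent = monthly_hours[-window:]
    return sum(recent) / len(recent)


# Hypothetical actual breakdown hours for the last 13 months, oldest first.
breakdown_hours = [30, 42, 35, 38, 40, 36, 44, 39, 41, 37, 43, 35, 47]

allowance = rolling_average(breakdown_hours)
print(f"Monthly breakdown allowance: {allowance} h")
```

Only the most recent 12 months contribute, so the oldest month in the list is ignored and the allowance follows the demonstrated trend.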
Manpower is basically the effective number of man-hours available for each craft in the crew.
Some example graphs are shown in Figure 4 below.
Figure 4 – Example workload vs. labor-capacity graphs
Where the manpower exceeds the workload, everything is in control. Where the workload exceeds the manpower, it will be necessary either to reduce some of the non-essential activities at that time or to increase the manpower available.
Feedback information returning from the shop floor, either by way of the planned work order responses or from equipment failures, will be captured in the CMMS. This information can be summarized on a report such as the one shown in Figure 5 below. The key recipients of these reports are the reliability engineers who look after each equipment system.
Ideally, the engineer should look at every work order that was raised in his area, but this is not always feasible, so a summary report such as this is useful. The reliability engineer must then decide on the appropriate course of action in response to each failure or observation.
Figure 5 – Example weekly failures report
The algorithm shown in Figure 6 below describes the thought process that should be going through the minds of the reliability professionals every time they review the failure work orders as shown on the summary report in Figure 5 above.
It must be remembered, however, that every time the “Amend Checklists” option is selected, this amendment should be routed through the original RCM analysis to ensure the integrity of the maintenance program is not violated. Amending the checklists without running through the method and structure of the original analysis is a mistake. Regardless of the approach that has been used to record the original analysis, it is worth it in the long run to force the reliability engineers to route every amendment through the analysis and record the results for future reference.
If a spreadsheet has been identified as the most appropriate option, then it should be structured in a robust and user-friendly fashion. If it is clumsy to update, it will fall into disrepair, and the integrity of the program is lost. A database system is a far better option for this purpose, if a suitable one is available.
Figure 6 – “What broke” algorithm
The purpose of maintenance measures should be to monitor the health of the maintenance organisation. Where everything is in control, the metrics will reflect the success that has been achieved. Conversely, they should also be used to highlight problem areas and irregularities in order to drive the desired behaviours or areas for improvement.
The graphs in Figure 7 below illustrate some of the benefits that have been realized on the author’s site as a result of having a well-functioning maintenance organisation. These graphs form just part of the regular reporting metrics by which the maintenance activities are managed.
The first graph shows the conformance to the weekly planned maintenance schedule. The target is set at 95 percent and is consistently being exceeded across all of the engineering teams.
Graphs 2 and 3 show how the number of failures has been decreasing month-on-month in one particular work center over the past 12 months, and correspondingly, the mean time between failures has been increasing over the same period.
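The relationship between these two measures can be illustrated with a short calculation. The sketch assumes one common definition of mean time between failures, namely operating hours divided by the number of failures in the period; the monthly figures are hypothetical and are not the data behind Figure 7.

```python
def mtbf_hours(operating_hours, failure_count):
    """Mean time between failures, or None if there were no failures in the period."""
    if failure_count == 0:
        return None  # MTBF is undefined when nothing failed
    return operating_hours / failure_count


# Hypothetical monthly data: (month, operating hours, number of failures).
monthly = [("Jan", 600, 12), ("Feb", 600, 10), ("Mar", 600, 8)]

trend = [mtbf_hours(hours, failures) for _, hours, failures in monthly]
for (month, _, failures), mtbf in zip(monthly, trend):
    print(f"{month}: {failures} failures, MTBF {mtbf} h")
```

With constant operating hours, a falling failure count produces a rising MTBF, which is the pattern described for Graphs 2 and 3.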
The last two graphs show machine availability in two of the key work centers where a full re-analysis of all the maintenance requirements was recently conducted using an adapted RCM2 approach. In both cases, the equipment availability was far out of control before the improvement activity started; since then, availability has stabilized and is still tracking consistently above 90 percent. This has been the result of a few things:
one is improving the quality of the preventive maintenance routines, and
another is good maintenance planning
Figure 7 – Sample graphs showing the benefits of an effective maintenance program