Why Data Is the Foundation for Reliability

Khalid A. Al-Jabr, Qadeer Ahmed and Dahham Al-Anazi, Saudi Aramco
Tags: maintenance and reliability, IIoT

In today’s technological era, data is key to decision-making. The discipline of turning data into insight is known as “data science.” Companies can take advantage of technology by collecting, analyzing and utilizing data to make informed decisions. One research group predicts that, at the current rate of growth, the total volume of data will reach 163 zettabytes by 2025. To put this number in perspective, one zettabyte equals one trillion gigabytes. Growth on this scale raises questions about data storage, quality and management.

This article will discuss the importance of data and its usage in performing meaningful reliability studies. The common definition of reliability is the probability that a piece of equipment, system or facility will operate without failure for a given period of time under specific operating conditions. Therefore, accurate historical failure data and its proper analysis are critical for any reliability analysis. Data analytics provides an opportunity to examine huge amounts of data and extract useful information that can then support better decision-making. This is only possible if one has reasonable confidence in the data, because poor data can lead to poor decisions.
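To make the definition of reliability concrete, the short Python sketch below computes the probability of failure-free operation over a given period. It assumes a constant failure rate (the exponential model), which is a common simplification and not something this article prescribes. The function name and the pump figures are hypothetical.

```python
import math


def reliability(mission_time_hours: float, mtbf_hours: float) -> float:
    """Probability of running mission_time_hours without failure.

    Assumes a constant failure rate (exponential model), so
    R(t) = exp(-t / MTBF). Real equipment may not follow this model.
    """
    if mtbf_hours <= 0:
        raise ValueError("mtbf_hours must be positive")
    if mission_time_hours < 0:
        raise ValueError("mission_time_hours cannot be negative")
    return math.exp(-mission_time_hours / mtbf_hours)


# Hypothetical pump: one year of continuous operation (8,760 hours).
ONE_YEAR_HOURS = 8760
good_data = reliability(ONE_YEAR_HOURS, mtbf_hours=20_000)
# The same pump if poor records understate the MTBF by half.
poor_data = reliability(ONE_YEAR_HOURS, mtbf_hours=10_000)
print(f"MTBF 20,000 h: {good_data:.3f}   MTBF 10,000 h: {poor_data:.3f}")
```

Under this assumption, the only input taken from the failure history is the MTBF, so any error in the recorded failures feeds directly into the predicted reliability.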

Benefits of Data Analysis

Reliability analysis is an effective way to help management and engineers make technical and financial decisions. Among other things, data analysis assists in optimizing project designs, lowering costs, predicting component lifespans, investigating failures, evaluating warranty intervals, implementing effective inspection periods and determining key performance indicators (KPIs). Accurate data is vital to performing a comprehensive reliability study.

Data collection and filtration are important responsibilities of any reliability engineer. Data collection is the process of systematically gathering and evaluating information on variables of interest in order to answer specified research questions, evaluate hypotheses, and estimate and support outcomes. It is therefore a phase common to all research, and ensuring that data is collected accurately and honestly is an objective every study shares.

Many tools and techniques are available to process data in ways that make it more accurate and reliable, for example by identifying and eliminating outliers that can skew the overall results of a reliability analysis.
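One widely used way to flag outliers is the interquartile range (IQR) rule. The Python sketch below is an illustration only; the article does not name a specific technique, and the repair times and the 1.5 multiplier are assumptions chosen for the example.

```python
import statistics


def split_outliers(values, k=1.5):
    """Separate values into (kept, flagged) using the IQR rule.

    A value is flagged when it lies more than k * IQR below the first
    quartile or above the third quartile. k = 1.5 is a conventional
    default, not a figure from the article.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    kept = [v for v in values if low <= v <= high]
    flagged = [v for v in values if v < low or v > high]
    return kept, flagged


# Hypothetical repair durations in hours; 60 may be a data-entry error
# or a genuinely long repair.
repair_hours = [4, 5, 5, 6, 6, 7, 60]
kept, flagged = split_outliers(repair_hours)
print("kept:", kept)
print("flagged for review:", flagged)
```

Flagged values are best reviewed by an engineer before removal, since an unusually long repair may be a real event that the study needs to reflect.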

Establishing Robust Data

In any operating facility, accurate and reliable data, which includes asset maintenance and failure records, operating windows, etc., can provide the foundation for reliability engineering studies. Unfortunately, not all companies have the systems, processes and culture required for data collection and management.

One requirement for establishing a robust database is to ensure that all meaningful data points are collected and stored. A database that collects only some important data may offer an incomplete and perhaps even misleading picture of current operations and asset conditions. Using validated collection tools, meaning methods that have been assessed and shown to capture reliable data, is a useful practice. For example, a large company in Finland reported that approximately one of every six closed maintenance reports (17.2 percent) didn’t include a failure mode. Also, none of the closed maintenance reports recorded the number and type of spare parts. These observations suggest that this particular company has a limited database that would offer only a narrow perspective on equipment failure and maintenance history, with critical information missing, such as the location of failures and their impact.
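A completeness check of the kind behind the figures in this example can be sketched in a few lines of Python. The field names and sample records below are hypothetical and do not come from the company cited.

```python
REQUIRED_FIELDS = ("failure_mode", "spare_parts", "failure_location")


def missing_field_rates(reports, required=REQUIRED_FIELDS):
    """Share of closed reports in which each required field is empty."""
    closed = [r for r in reports if r.get("status") == "closed"]
    if not closed:
        return {field: 0.0 for field in required}
    return {
        field: sum(1 for r in closed if not r.get(field)) / len(closed)
        for field in required
    }


# Six hypothetical closed reports: one lacks a failure mode, and none
# records spare parts.
reports = [
    {"status": "closed", "failure_mode": "seal leak", "failure_location": "drive end"},
    {"status": "closed", "failure_mode": "vibration", "failure_location": "bearing"},
    {"status": "closed", "failure_mode": "", "failure_location": "casing"},
    {"status": "closed", "failure_mode": "overheating", "failure_location": "motor"},
    {"status": "closed", "failure_mode": "seal leak", "failure_location": "drive end"},
    {"status": "closed", "failure_mode": "corrosion", "failure_location": "shaft"},
    {"status": "open", "failure_mode": "", "failure_location": ""},
]
rates = missing_field_rates(reports)
for field, rate in rates.items():
    print(f"{field}: {rate:.1%} of closed reports missing")
```

Tracking these rates over time gives a simple indicator of whether reporting discipline is improving.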

An additional requirement for effective data analytics is the timely reporting of data. Maintenance departments that report their findings weekly or even monthly are more likely to lose critical data and activities than those organizations that deploy a dynamic system which consolidates data continuously.

Another best practice is to ensure that the data collection and storage system defines what counts as a high-quality data entry, with as much automation as possible, to promote consistency in reporting and database searchability. A maintenance reporting system that depends on open text fields essentially turns data analytics into a manual process. While open text fields have a place in any well-designed database, they should be used to provide more detail and clarification.

Instead, the data collection and storage system should have separate cells for each meaningful data point, utilizing as many drop-down menus as possible to ensure consistency of description and reporting. Reliability engineers will only be able to perform extensive reliability studies when the data is searchable and consistently described throughout the system.

Defining the types of reports and analysis required from a database will determine the data fields to include. Thus, the first step in getting high-quality data is defining the question to be answered and ensuring that the data collected is appropriate for that purpose. For reliability studies, database system fields should collect maintenance information on spare parts, failure modes, man-hours, major inspection findings, damaged components and routine activities. Moreover, controlling the consistency of the reporting in these fields through comprehensive drop-down menus will enable software applications to perform key functions, such as calculating mean time between failures (MTBF), availability and other reliability KPIs.
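Once failure records are stored in consistent fields, KPIs such as MTBF and availability reduce to simple arithmetic. The Python sketch below shows one way to compute them for a single asset; the record structure, field names and sample values are hypothetical, and the formulas are the basic textbook forms.

```python
from dataclasses import dataclass


@dataclass
class FailureRecord:
    equipment_id: str
    failure_mode: str    # ideally selected from a drop-down list
    repair_hours: float  # downtime attributed to this failure


def reliability_kpis(records, period_hours):
    """Basic KPIs for one asset over an observation period."""
    failures = len(records)
    downtime = sum(r.repair_hours for r in records)
    uptime = period_hours - downtime
    if failures == 0:
        return {"mtbf_hours": None, "mttr_hours": None, "availability": 1.0}
    return {
        "mtbf_hours": uptime / failures,    # mean operating time between failures
        "mttr_hours": downtime / failures,  # mean time to repair
        "availability": uptime / period_hours,
    }


# Hypothetical compressor observed for one year (8,760 hours).
records = [
    FailureRecord("K-101", "seal leak", 24.0),
    FailureRecord("K-101", "high vibration", 48.0),
]
kpis = reliability_kpis(records, period_hours=8760)
print(kpis)
```

The calculation is only as good as the fields feeding it: a missing repair duration or an inconsistent equipment identifier changes every figure it returns.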

Data Quality Factors  

Tools and Technology

A myriad of tools is available to meet data quality objectives, including tools for reducing duplication, integrating and migrating data across and between platforms, and performing data analytics. Data analysis tools enable the user to extract meaning from data, such as combining and categorizing data to reveal trends and patterns. Many technologies are now mobile enabled. These technologies can minimize human and system errors in data collection. The adoption of these new tools and technologies can help improve data quality.

People and Processes

Each employee at every level of company operations, from the maintenance crew to the engineer and management, must share a common understanding of the role of data in the company. This includes what data will be collected, how often and for what purposes the data will be used. Along with training, clear processes must be established to ensure reliable and consistent data collection and storage.

Organizational Culture

Management support and company culture play a vital role in data quality, and the KPIs reported to management should include measures of data quality. When an organization launches a new project or initiative to improve performance, increase the number of opportunities or address significant issues, it often must make changes to processes, job roles, organizational structures and types, and the use of technology. Procedures and work processes must be updated and aligned with best practices. Continuous improvement will be the driver for success, and it depends on both the quality and the quantity of data. Continuous training can build awareness of the importance of data among personnel, which in turn helps to strengthen the organizational culture.

Data’s Impact on Reliability

For an illustration of the importance of data quality, consider the following case study. A facility launched a project to increase oil production by installing a new gas-oil separation package (GOSP) with crude stabilization units. The GOSPs would be composed of separation traps, wet-crude handling facilities, a water oil separator, gas compression facilities, a flare system, transfer/shipping pumps and stabilization facilities.

A reliability, availability and maintainability (RAM) study was conducted to predict the production availability of the facilities and compare it against the target availability. The study would also be used to identify areas that limit production throughput, recommend measures to attain the availability needed to deliver the production business targets, confirm the operating and maintenance philosophies adopted to meet the total system availability, and define remedial actions or potential design changes.

The raw maintenance data is summarized in Table 1. It is based on interviews with maintenance crews from existing operating facilities. The data collected for the study had problems in a number of areas, beginning with the flawed use of maintenance data from old assets to determine operating envelopes for the new facility.

For example, Table 1 suggests that every 10 months a compressor will be out of service for 30 days because of mechanical seal issues. This estimate implies that a compressor will spend roughly 10 percent of its lifetime undergoing maintenance due to mechanical seal issues. This assumption is incorrect since the facility will adopt new technologies. In addition, many lessons learned from older facilities will be reflected in the new design.

Another incorrect assumption extracted from the data is the impact of corrosion. The raw data appears to suggest that the compressor is kept under maintenance for 30 days every four years (48 months) as a result of shaft pitting. The use of upgraded material in the compressor shaft will eliminate these types of problems.

Table 1 further indicates that the mean time to repair (MTTR) due to vibration is 60 days. Compare this assumption with the average expected MTTR for new compressors of only four days due to improved spare part management.

As this example illustrates, the assumptions extracted from data that may be accurate for aging facilities with old equipment are not accurate when applied to new facilities designed with upgraded materials and more efficient technologies.
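The arithmetic behind the raw-data figures quoted above can be checked with a short Python sketch. It treats a month as 30 days, which is a simplifying assumption, and it offers two readings of the stated interval because the text does not say whether the 10 months include the outage. The function name is hypothetical.

```python
DAYS_PER_MONTH = 30  # simplifying assumption, not from the source data


def downtime_fraction(interval_months, outage_days, interval_includes_outage=True):
    """Fraction of time an asset is out of service for one failure mode.

    If interval_includes_outage is False, the interval is read as running
    time between outages, so each cycle is interval plus outage.
    """
    interval_days = interval_months * DAYS_PER_MONTH
    if interval_days <= 0 or outage_days < 0:
        raise ValueError("interval must be positive and outage non-negative")
    cycle_days = interval_days if interval_includes_outage else interval_days + outage_days
    return outage_days / cycle_days


# Figures quoted from the raw data in this article.
seal = downtime_fraction(10, 30)        # 30 days out of every 300
seal_alt = downtime_fraction(10, 30, interval_includes_outage=False)  # 30 of 330
pitting = downtime_fraction(48, 30)     # 30 days out of every 1,440
print(f"mechanical seal: {seal:.1%} (or {seal_alt:.1%} on the alternative reading)")
print(f"shaft pitting:   {pitting:.1%}")
```

Either reading puts the seal-related downtime near the 10 percent figure cited, which is why this single assumption weighs so heavily on the predicted availability.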

Table 1. Raw field data collected for a reliability, availability and maintainability study

Table 2 summarizes the same data set corrected by reliability engineers. Accessing the same data provided to the third-party vendor, the engineers filtered the raw data to eliminate all the maintenance issues that could be automatically corrected by process instrumentation. The data was then categorized by maintenance and operational management strategies to identify issues related to design flaws, such as bottlenecks, limited capacity and availability.

The corrected data can be applied to the new facility and be used to make decisions for design optimization. For example, the failure modes for gas compressors now show an MTBF of eight years due to dry seals and an MTTR of three days. Also, the corrosion assumptions for compressor shafts were eliminated because of the upgraded materials in the new facility design.
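The effect of the corrected figures on a single failure mode can be illustrated with the standard steady-state availability formula, MTBF / (MTBF + MTTR). The Python sketch below is a simplified illustration, not the RAM model used in the study; it treats the raw 10-month interval as time between failures, a month as 30 days and a year as 365 days.

```python
def steady_state_availability(mtbf_days, mttr_days):
    """Long-run availability for one failure mode: MTBF / (MTBF + MTTR)."""
    if mtbf_days <= 0 or mttr_days < 0:
        raise ValueError("mtbf_days must be positive and mttr_days non-negative")
    return mtbf_days / (mtbf_days + mttr_days)


# Raw data for mechanical seals: a 30-day outage every 10 months.
raw_seal = steady_state_availability(mtbf_days=10 * 30, mttr_days=30)
# Corrected data for dry seals: MTBF of eight years, MTTR of three days.
corrected_seal = steady_state_availability(mtbf_days=8 * 365, mttr_days=3)

print(f"raw seal data:       {raw_seal:.2%}")
print(f"corrected seal data: {corrected_seal:.2%}")
```

A full RAM study combines many such failure modes across all equipment, so these single-mode values are not expected to match the system-level results reported for the project.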

Table 2. Filtered data collected for the reliability, availability and maintainability study

The availability and capacity for both designs are represented in Figure 1, which illustrates how the outcomes of the two models differ depending on the data set provided. The original data put the availability of the new facility at 77.34 percent due to a long MTTR and short MTBF, while the corrected data set yielded an overall availability of 99 percent, which represents the actual situation.

The same approach was applied to the other equipment on the project. Because of the high predicted availability, the project management team (PMT) was told to eliminate spare equipment. The results were used to optimize the design configuration for full system utilization. As this case study illustrates, employing corrected data can have a huge impact on the capital and construction costs of new projects by eliminating unnecessary equipment, shortening the project completion time and avoiding costs.

Figure 1. Results of the reliability, availability and maintainability (RAM) study

Figure 2. The relationship between input data, design and simulator results

Meaningful results for any reliability software or simulator depend on the quality of the input data and design. As the saying goes, “garbage in, garbage out.” Figure 2 shows the relationship between design and input data with the RAM simulation outcomes. Once the RAM model is built based on input data, potential optimization can be introduced. The data is the key element in the model and other reliability performance measurements.

The same is true for focused reliability studies. Reliability engineers spend much of their time analyzing data in operations. For example, engineers may conduct a reliability study on specific bad-actor items, which are defined as components, equipment or systems with high maintenance costs and high failure rates. The results of this assessment are used to focus limited resources on high-impact items with the greatest benefit to field operations in terms of maintenance costs and availability. If engineers have unrepresentative data or not enough data, the results and recommendations will not address the real problems. This represents a lost opportunity to add value to maintenance planning, spare part management, maintenance budgeting and technical challenges. Thus, quality data requires efficient data collection systems that clearly identify the types and quantity of data needed to support the decisions the organization must make.
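A bad-actor screening of this kind can be sketched as a simple Pareto ranking. In the Python example below, the equipment tags, costs and the 80 percent cost threshold are all hypothetical; an actual study would choose its own criteria and would normally weigh failure rates as well as costs.

```python
from collections import defaultdict


def rank_bad_actors(work_orders, cost_share=0.8):
    """Return the highest-cost items that together reach cost_share of spend.

    work_orders is an iterable of (equipment_id, cost) pairs, one per
    corrective work order. Each result is (equipment_id, total_cost, failures).
    """
    totals = defaultdict(lambda: {"cost": 0.0, "failures": 0})
    for equipment_id, cost in work_orders:
        totals[equipment_id]["cost"] += cost
        totals[equipment_id]["failures"] += 1

    ranked = sorted(totals.items(), key=lambda item: item[1]["cost"], reverse=True)
    grand_total = sum(entry["cost"] for _, entry in ranked)

    bad_actors, running = [], 0.0
    for equipment_id, entry in ranked:
        if grand_total == 0 or running >= cost_share * grand_total:
            break
        bad_actors.append((equipment_id, entry["cost"], entry["failures"]))
        running += entry["cost"]
    return bad_actors


# Hypothetical corrective work orders for one year.
work_orders = [
    ("K-101", 60_000), ("K-101", 30_000),
    ("P-201", 5_000), ("P-202", 3_000), ("P-203", 2_000),
]
bad_actors = rank_bad_actors(work_orders)
print(bad_actors)
```

If work orders are charged to the wrong equipment or costs are left blank, the ranking changes, which is how unrepresentative data steers resources away from the real problems.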

3 Key Steps to Improve Data Quality

1. Deploy the Right Database Platform

The solution selected for the organization shouldn’t close any maintenance notifications or work orders until all required fields are completed. In other words, the selected platform should disable shortcuts to ensure consistency of the data collected.
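The rule described in this step can be expressed as a simple validation that runs before a work order is closed. The Python sketch below is a minimal illustration; the field names are hypothetical and a real maintenance platform would enforce this through its own configuration.

```python
REQUIRED_TO_CLOSE = ("failure_mode", "damaged_component", "spare_parts", "man_hours")


def missing_fields(work_order):
    """List required fields that are absent or blank.

    An empty list for spare_parts is accepted as an explicit record
    that no parts were used; zero man-hours is also accepted.
    """
    return [f for f in REQUIRED_TO_CLOSE if work_order.get(f) in (None, "")]


def close_work_order(work_order):
    """Return a closed copy of the work order, or refuse if incomplete."""
    missing = missing_fields(work_order)
    if missing:
        raise ValueError("Cannot close work order; missing: " + ", ".join(missing))
    return {**work_order, "status": "closed"}


complete = {
    "id": "WO-0001",
    "failure_mode": "seal leak",
    "damaged_component": "mechanical seal",
    "spare_parts": ["seal kit"],
    "man_hours": 16,
}
closed = close_work_order(complete)
print(closed["status"])
```

Rejecting the close action, rather than accepting it and reporting gaps later, is what removes the shortcut for the person entering the data.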

2. Integrate Existing Functions in a Comprehensive Solution

The platform should incorporate all reliability functions into one solution to better integrate data and reduce the number of systems deployed in an organization. For example, if any spare parts were taken from the storehouse, they should be charged against a specific notification. This would require a platform that assimilates spare part management with maintenance activities.  

3. Implement a Data Quality Assurance Program

Quality assurance activities for the deployed solution should include a periodic audit of data quality across the organization. For example, the quality assurance team could randomly audit 5 percent of the maintenance notifications and work orders for each operating facility to evaluate the quality of the collected data. The outcomes of this assessment could then be used to further improve the utilization of the solution and ensure an effective database.
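Drawing the audit sample described above is straightforward to automate. The Python sketch below selects a random 5 percent of work orders per facility, rounding up and auditing at least one order per facility; those two rounding rules, the field names and the facility labels are assumptions made for the example.

```python
import math
import random
from collections import defaultdict


def audit_sample(work_orders, percent=5, seed=None):
    """Randomly pick about `percent` of work orders from each facility."""
    rng = random.Random(seed)  # pass a seed to make the draw repeatable
    by_facility = defaultdict(list)
    for order in work_orders:
        by_facility[order["facility"]].append(order)

    sample = {}
    for facility, orders in by_facility.items():
        count = max(1, math.ceil(len(orders) * percent / 100))
        sample[facility] = rng.sample(orders, count)
    return sample


# Hypothetical population: 200 work orders at facility A and 40 at B.
work_orders = (
    [{"id": f"A-{i}", "facility": "A"} for i in range(200)]
    + [{"id": f"B-{i}", "facility": "B"} for i in range(40)]
)
sample = audit_sample(work_orders, percent=5, seed=1)
print({facility: len(orders) for facility, orders in sample.items()})
```

Sampling each facility separately keeps a small site from being crowded out of the audit by a larger one.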

Data Is the Cornerstone

Complete asset maintenance and repair-history data must be collected, stored and analyzed correctly. Front-line employees, including maintenance crews and operations personnel involved in data collection, must also understand the importance of their role in data quality. Remember, data is the cornerstone for decision-making in any company, and data quality is at the heart of all reliability studies. If you have high-quality data, you can utilize it confidently for effective advocacy, meaningful research, strategic planning and management delivery.  

About the Authors

Khalid A. Al-Jabr is a reliability engineering specialist for Saudi Aramco who has more than 18 years of industrial experience with a focus on equipment reliability and challenges. He holds a Ph.D., is a chartered engineer and is certified as a professional in engineering management and data analysis.

Qadeer Ahmed works as a consulting reliability engineer for Saudi Aramco and has 18 years of experience in reliability engineering. A chartered engineer, he holds a Ph.D. and is a Certified Maintenance & Reliability Professional (CMRP) and a Six Sigma Black Belt.

Dahham Al-Anazi is a reliability engineering leader for Saudi Aramco’s consulting services department. He has more than 25 years of technical experience and holds a doctorate in mechanical engineering.