The Strategy First: Technology in Service of Reliability

Gerardo Trujillo


In recent years, I’ve noticed a maintenance gap that has been getting progressively wider. This gap is not just technological. It’s strategic.

Many teams are impressed by the number of tools available today that claim to improve reliability. Sensors, connectivity, dashboards, analytics, algorithms, systems that “recommend” actions and other advanced tools are becoming increasingly common. All of these should add value. But without a framework to guide their deployment and use, those investments often deliver the opposite of what they promise: they distract, create a false sense of security, fragment decisions, undermine experience and know-how, and often increase costs or produce a negative ROI.

Let’s clarify where these tools belong in a reliability program. Reliability is the result of a well-designed maintenance strategy and the disciplined execution of that strategy to sustain performance over time. It is also not static; it changes as organizational objectives shift and the operating context changes with them.

Criticality must also be carefully considered. Many teams understand the concept, but few use it in a practical way to define which assets truly must not fail. As a result, they fail to design a strategy that achieves the required level of reliability. To succeed, the goal must be explicit: which assets must not fail, and why?

Critical assets are not always the assets people label as “important” or the most expensive ones. Truly critical assets are those whose failure creates unacceptable consequences: safety incidents, environmental impact, loss of operational continuity, reduced quality, higher energy use, compliance failures, increased total cost, reputational damage, regulatory exposure, and more.

That is the focus that should govern the organization’s strategy. Everything else is a set of tools in service of that objective: sensors, communications, software, analytics, and, when appropriate, machine learning and AI. They support the strategy and help maximize ROI.

 

Turning the Criticality Matrix into Objectives and Strategy

A criticality matrix only works when it stops being a theoretical ranking exercise and becomes a practical set of operational decisions. The real deliverable is not an abstract “AAA/AA/A” grade, but something far more concrete: a defined reliability objective for each critical asset or system, based on the consequence of its failure. That objective must then be translated into a maintenance strategy that eliminates or controls the dominant failure modes. If the criticality matrix does not change priorities, tasks, frequencies, alarm criteria, and response rules, then it doesn’t define a strategy; it only documents an opinion.

“These assets or systems must not fail because of these consequences. Therefore, this is their reliability objective, and this is the strategy to achieve it.”
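
The statement above can be made concrete as a data structure. The sketch below is a hypothetical illustration of turning a criticality ranking into explicit reliability objectives and strategy assignments; the asset names, targets, and tasks are assumptions, not values from any real plant.

```python
from dataclasses import dataclass

@dataclass
class CriticalAsset:
    name: str
    consequence: str            # why failure is unacceptable
    reliability_target: float   # e.g., required availability
    dominant_failure_modes: list
    strategy: str               # task set that controls those modes

# Illustrative record: the deliverable is not an abstract grade but
# an objective tied to a consequence and a strategy that protects it.
assets = [
    CriticalAsset(
        name="feedwater-pump-01",
        consequence="loss of operational continuity; safety exposure",
        reliability_target=0.995,
        dominant_failure_modes=["bearing wear", "mechanical seal leak"],
        strategy="biweekly vibration route; monthly oil analysis",
    ),
]

# The matrix only "works" if it changes priorities: rank actionable
# records by objective, not by an AAA/AA/A label.
for a in sorted(assets, key=lambda a: -a.reliability_target):
    print(f"{a.name}: must not fail ({a.consequence}); "
          f"target {a.reliability_target:.1%}; strategy: {a.strategy}")
```

If a record cannot be filled in — no stated consequence, no target, no strategy — the matrix is still documenting an opinion rather than defining a strategy.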

 

Designing the Strategy to Achieve the Target Reliability

An effective strategy is not simply a list of tasks. It is a set of activities designed to control or eliminate failure causes in critical assets.

To succeed, the strategy must be built around relevant failures and must identify their causes, symptoms, and consequences. Condition monitoring standards should be followed, and the selected methods should be those best suited to prevent the onset or progression of those failure modes and to deliver repeatable results. In practical terms, the strategy must answer these questions:

  1. What function must the asset perform, and at what point does loss of that function become unacceptable because of its consequences?
  2. Which failure modes contribute most to the asset or system’s risk?
  3. What symptoms are present, and what is the actual P–F window under the asset’s typical operating conditions?
  4. Which task eliminates, controls, or detects that failure mode in time, and is that task technically and economically viable?
  5. In which cases is run-to-failure acceptable, and what is the justification in terms of consequence and cost?
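
The five questions above imply a decision sequence for each failure mode. The sketch below follows common RCM-style practice; the function name, inputs, and policy labels are illustrative assumptions, not a standard.

```python
def select_task(consequence_unacceptable: bool,
                detectable_in_pf_window: bool,
                task_viable: bool) -> str:
    """Pick a maintenance policy for a single failure mode."""
    if not consequence_unacceptable:
        # Question 5: run-to-failure is justified only when the
        # consequence and cost of failure are acceptable.
        return "run-to-failure (document the justification)"
    if detectable_in_pf_window and task_viable:
        # Questions 3-4: a detectable symptom exists, the P-F window
        # is long enough, and the task is technically and
        # economically viable.
        return "on-condition monitoring task"
    if task_viable:
        # No usable symptom, but a time- or usage-based task works.
        return "scheduled preventive task"
    # No viable task controls the mode: change the design or context.
    return "redesign / one-time change"
```

For example, a failure mode with unacceptable consequences, a detectable symptom, and a viable task resolves to an on-condition monitoring task; the same mode with no viable task at all forces a redesign decision rather than more monitoring.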

This is not excessive standardization, nor is it theory. It is the minimum required for a maintenance strategy to achieve its reliability objective and deliver a return. ROI is maximized by optimizing everything else: doing what is necessary, with the right technology, at the right frequency, using the right judgment, and acting in time.

 

Technology Is a Lever, Not a Substitute

Once the strategy is defined, then it is time to talk about which technologies should be used to implement it. And it is worth naming each tool precisely rather than grouping everything under the broad label of “AI.” The tools used to support the maintenance strategy vary with the situation, and each one should serve a specific function within the decision and execution system.

Online sensors and devices capture signals from the asset and its environment, turning condition, operation, and process data into tangible evidence. Their true value lies not in simply “measuring,” but in anticipating the onset or progression of relevant failure modes and confirming when asset condition is changing.

Communications and OT–IT integration come next. They form the bridge between the shop floor and the systems where decisions are made and work is executed. That bridge ensures data arrives complete and on time, and it is where data quality is defined in practice: link reliability, acceptable latency, signal integrity and synchronization.

Data and work management platforms are where information is turned into action. Data historian systems preserve asset behavior over time so teams can analyze trends, deviations, and events. CMMS/EAM systems translate that information into work: plans, work orders, resources and costs, all of them traceable. Analytics and visualization then connect condition, process, and operation so the right information reaches the right person at the right time.

Analytical models are the rules that transform data into repeatable decision criteria. They may be simple or sophisticated, but their purpose is the same: define thresholds, detect abnormal trends, correlate variables, and generate alerts or priorities using logic aligned with the failure mode and its consequence.
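
A minimal sketch of such a rule, assuming a fixed alarm limit plus a simple rising-trend check over recent readings. The limit value and window size here are arbitrary assumptions for illustration, not standards values.

```python
def evaluate(readings, limit=7.1, window=3):
    """Return an alert string, or None, for a series of condition readings."""
    latest = readings[-1]
    if latest >= limit:
        # Threshold rule: the level itself is unacceptable.
        return f"ALERT: level {latest} exceeds limit {limit}"
    recent = readings[-window:]
    # Trend rule: every step in the window is rising, even though
    # the absolute level is still below the limit.
    if len(recent) == window and all(b > a for a, b in zip(recent, recent[1:])):
        return f"WARN: rising trend over last {window} readings"
    return None

print(evaluate([2.1, 2.0, 2.3]))   # stable and below limit: no alert
print(evaluate([2.1, 2.6, 3.4]))   # rising trend: early warning
print(evaluate([2.1, 2.6, 8.0]))   # limit exceeded: alert
```

Whether simple like this or far more sophisticated, the logic only earns its keep when the limit and the trend rule are aligned with a specific failure mode and its consequence.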

Machine learning and AI should only be used when they add real value beyond rules and conventional statistics. For example, these tools are often effective at recognizing subtle patterns. That can make them useful for classifying behaviors, prioritizing events, reducing false positives, and supporting recommendations. But they must operate within a defined strategy. Used outside of that strategy, they only accelerate directionless decisions.

Technology provides three clear benefits when a strategy exists.

Visibility, because it replaces assumptions with evidence. It does more than detect change; it helps explain the context of that change: load, process, environment, operating regime. That visibility matters when it is tied to a specific failure mode and a real consequence.

Speed, because it reduces the time between a change in condition and the resulting decision. It helps detect problems earlier, correlates variables that used to be analyzed separately, and helps prioritize what requires attention. Speed is not about reacting faster out of anxiety; it is about acting decisively within the true P–F window.

Discipline, because it standardizes criteria by making them a consistent part of the process. It defines rules, creates traceability, forces documentation, and strengthens execution. That is how repeatable results are achieved.

Without strategy, technology delivers something far less useful: noise. It generates more signals, more alerts, more activity and the same level of risk. When that noise isn’t followed by clear objectives, failure-mode identification, and the right response, it simply accumulates. And accumulated noise without action eventually has consequences.

 

Governance Is the Difference That Matters Most

Across different plants, I repeatedly see three distinct approaches to technology integration, along with a fourth that is less common today but has the potential to become a dangerous future scenario:

A) Technology without governance

Technology is implemented before reliability objectives are defined by system. Generic “catalog” configurations are applied, rules are copied, and execution proceeds because the organization wants to “be digital” or reduce dependence on people. There is activity and reporting, but not necessarily results.

B) Dependence on the expert

The strategy holds together because experts interpret context and correct the course. This can work well, but it is neither optimal nor sustainable, because results depend on key individuals rather than on a system that makes performance repeatable.

C) Intelligent integration (the desired state)

First, reliability objectives are defined by asset or system, and the dominant failure modes are identified. Then the strategy is designed, including tasks, frequencies, criteria, and response rules. After that, technology is used to accelerate and sustain execution. Here the algorithm supports; it does not command.

D) The system as authority (a dangerous future scenario)

No one can offer a data-supported reason for what is being done; the only answer is “the model recommended it.” At that point, governance is lost. And when operating conditions change, the system keeps running but stops protecting the real objective.

 

When Technology Aligns with Strategy

In a well-designed reliability plan, the objective must lead. Strategy then translates that objective into action, knowledge validates the action, technology enables efficient execution, and governance sustains the objective over time. Reliability is not something that can simply be purchased. It must be deliberately designed and consistently executed. Technology—sensors, communications, software, analytics, and when appropriate, AI—is the lever that accelerates and sustains that result. If the program cannot clearly answer what must not fail, what the reliability objective is, and what strategy protects it, then there is no strategy. There is only unfocused activity.