What is Risk-based Maintenance?

Posted: 10/12/2021


The overarching goal of maintenance management is to minimize unexpected equipment failures in a cost-effective manner. Risk-based maintenance (RbM) is an approach that uses an asset’s risk of failure to allocate maintenance resources. This article provides an overview of risk-based maintenance.

Risk-based maintenance (RbM) is a maintenance methodology that uses risk assessment principles to optimize maintenance tasks and the allocation of resources. It involves systematically identifying an asset’s criticality, failure modes, and risk of failure to create a maintenance plan that minimizes the risk of failure.

Using a risk-based maintenance approach, maintenance efforts are redirected from assets with the lowest risk of failure to those assets with the greatest failure risk. High-risk assets vary by industry and organization. Examples include:

Major systems (e.g., electrical, plumbing, HVAC, etc.) in buildings and facilities
Vital production lines in manufacturing plants
Heavy equipment used in infrastructure construction and maintenance
Fleet vehicles used to transport goods

Why Risk-based Maintenance is Important

Unplanned downtime is a major cost for businesses. Studies have found that unplanned downtime events can cost as much as $250,000 per hour and reduce productivity by as much as 20%.

To minimize such losses, maintenance teams employ a variety of maintenance strategies and techniques, but are pressured to do so at a low cost. This prompts organizations to evolve their maintenance strategies beyond traditional corrective maintenance (CM), time-based maintenance (TbM), and preventive maintenance (PM).

Risk-based maintenance provides organizations with a systematic way to determine the type and frequency of maintenance each asset receives. Instead of wasting time and energy maintaining equipment that doesn’t need it (which can actually do more harm than good), organizations can allocate sufficient maintenance resources to assets whose failures have the most impact on the organization.

Other Ways to Improve Reliability

There are many ways that organizations can improve reliability while also lowering maintenance costs. Risk-based maintenance is just one such approach.

Is Risk-based Maintenance Right for Me?

A risk-based maintenance approach might be a fit for your organization if it:

Relies on highly expensive equipment that is difficult to replace
Manages a tight maintenance budget or limited maintenance resources
Owns remote assets that make regular maintenance difficult due to travel requirements
Runs mission-critical equipment for which there are no viable alternatives or substitutes
Wants to improve return on investment (ROI) by optimizing its current maintenance plan

How to Implement Risk-Based Maintenance

Conducting a risk-based maintenance assessment is a systematic process, meaning that there is a generally accepted sequence of steps to follow. The two main parts of this process are: 1) performing a criticality analysis, and 2) performing a risk assessment.

Before we start, it is important to note that implementing risk-based maintenance is a technical process that involves getting input from a cross-functional team including operations, maintenance, engineering, safety, and others.

The following steps outline a simplified version of a risk-based maintenance methodology. Readers seeking a more robust, thorough explanation should refer to the ISO 31000 standard on risk management or the United States Department of Defense standard MIL-STD-1629A.

Step 1: Gather Maintenance Data

In order to implement risk-based maintenance, you need to gather asset data such as:

Identity (i.e., asset name and number)
Acquisition cost
Age
Mean Time Between Failures (MTBF) history
Mean Time to Repair (MTTR) history
Cost of unplanned downtime
Frequency of maintenance

This information is readily available in most computerized maintenance management system (CMMS) software. If you are not familiar with what a CMMS is, read our "What is a CMMS?" article or view our handy infographic.
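As a rough illustration, the data points listed above could be held in one record per asset. The sketch below is an assumption, not a CMMS export format: the class name, field names, and units are hypothetical, and MTBF and MTTR are simplified to single values rather than full histories.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class AssetRecord:
    """Inputs for one asset's assessment (hypothetical schema, not a CMMS format)."""
    name: str
    asset_number: str
    acquisition_cost: Optional[float] = None        # purchase price
    age_years: Optional[float] = None
    mtbf_hours: Optional[float] = None              # Mean Time Between Failures
    mttr_hours: Optional[float] = None              # Mean Time to Repair
    downtime_cost_per_hour: Optional[float] = None  # cost of unplanned downtime
    pm_tasks_per_year: Optional[float] = None       # frequency of maintenance


def missing_fields(record: AssetRecord) -> List[str]:
    """Return the names of the fields that still need to be gathered."""
    return [f.name for f in fields(record) if getattr(record, f.name) is None]


# Example values are made up for illustration.
boiler = AssetRecord(name="Boiler", asset_number="B-001", age_years=12, mttr_hours=6.5)
print(missing_fields(boiler))
```

A completeness check like this makes it easy to see which assets are ready for analysis and which still need data.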

Step 2: Determine Asset Criticality


Risk-based maintenance prioritizes maintenance work on critical assets. Criticality is a measure of an asset’s importance to the organization. Critical assets generally affect an organization broadly or represent a single point of critical failure. For example, a boiler is critical to the operations of a facility.

Organizations use a criticality analysis to evaluate how severely an asset failure affects the organization. A common tool for performing a criticality analysis is a criticality matrix, like the one shown below. Failure events are ranked on the matrix by severity in multiple categories such as safety, production, and cost.

Since there are multiple ways an asset can fail, each carrying a different amount of risk, you need to set a baseline. Choose the one failure event that you consider plausible and that has the most severe consequences. Use the matrix to rate that failure in each category.

Matrix ranking severity of failure for 5 different categories on a 1 – 5 scale for risk-based maintenance


These ratings are used to generate an asset criticality rating (ACR). The ACR can be calculated by multiplying the category ratings together, adding them together, or simply taking the highest rating in any category. For example, suppose the severity of a failure is rated as follows:

Safety = 2
Environmental = 1
Production = 3
Equipment = 1
Cost = 1

Therefore, the ACR would be:

6, if multiplying (2 x 1 x 3 x 1 x 1 = 6)
8, if adding (2 + 1 + 3 + 1 + 1 = 8)
3, if taking the highest categorical rating (Production, in this example)

Whichever way you choose, a higher score means the asset is more critical, relative to the others you’ve analyzed. Record this score, as it will be used in a later step.
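The three ACR calculations above can be sketched in a few lines. This is a minimal example using the ratings from this step; the function name and the method labels are hypothetical.

```python
from math import prod
from typing import Dict

# Severity ratings from the worked example in this step.
ratings = {"Safety": 2, "Environmental": 1, "Production": 3, "Equipment": 1, "Cost": 1}


def asset_criticality_rating(category_ratings: Dict[str, int], method: str = "max") -> int:
    """Combine per-category severity ratings into a single ACR."""
    values = list(category_ratings.values())
    if method == "multiply":
        return prod(values)
    if method == "add":
        return sum(values)
    if method == "max":
        return max(values)
    raise ValueError(f"Unknown method: {method!r}")


for method in ("multiply", "add", "max"):
    print(method, asset_criticality_rating(ratings, method))
# multiply 6
# add 8
# max 3
```

Whichever method you pick, apply the same one to every asset so the scores stay comparable.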

Step 3: Determine the Likelihood of Failure

Once criticality is known, you determine the probability of failure. As with criticality, rate the likelihood of failure on a scale of 1 to 5 (or use a larger scale if you prefer). The example below uses a 5-point scale, where:

1 = Very unlikely to fail (Expected to fail less than once every 2 years, on average)
2 = Unlikely to fail (Expected to fail less than once per year, on average)
3 = Occasional failure (Expected to fail 1 – 2 times per year, on average)
4 = Likely to fail (Expected to fail more than twice per year, on average)
5 = Fails frequently (Expected to fail frequently)

Record this score.
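If you have a failure history, the scale above can be applied to an average failure rate. The sketch below is one possible mapping, with two assumptions: the function name is hypothetical, and because the scale does not say how many failures count as "frequent," the cutoff for a rating of 5 is a placeholder parameter (12 failures per year) that you should set yourself.

```python
def likelihood_rating(failures_per_year: float, frequent_threshold: float = 12.0) -> int:
    """Map an average failure rate to the 5-point likelihood scale.

    frequent_threshold is an assumed cutoff for "fails frequently";
    the scale itself does not define one.
    """
    if failures_per_year < 0:
        raise ValueError("failures_per_year cannot be negative")
    if failures_per_year < 0.5:   # less than once every 2 years
        return 1
    if failures_per_year < 1:     # less than once per year
        return 2
    if failures_per_year <= 2:    # 1 to 2 times per year
        return 3
    if failures_per_year < frequent_threshold:  # more than twice per year
        return 4
    return 5                      # fails frequently


# Example: 3 failures recorded over 4 years of service.
print(likelihood_rating(3 / 4))  # 2
```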

Step 4: Calculate the Risk Priority Number

A risk priority number (RPN) is a numerical value that quantifies an asset’s risk of failure. It is calculated by multiplying the asset criticality rating by the probability of failure rating. More advanced calculations also factor in a detection rating, which quantifies the likelihood of detecting an imminent failure before it occurs. For our purposes, we will ignore detection.

The table below shows the calculated risk priority number for 3 different assets.

Chart showing the calculated risk priority number for multiple asset failures.

In this example, the severity of failure was determined by using the highest severity rating score in any given category from the criticality matrix shown earlier. The probability of failure follows the 5-point scale from the previous step.
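The RPN calculation itself is a single multiplication. The sketch below uses a hypothetical function name; the optional detection factor follows the common severity × occurrence × detection form of the RPN and is left out by default, as in this article.

```python
from typing import Optional


def risk_priority_number(criticality: int, probability: int,
                         detection: Optional[int] = None) -> int:
    """Multiply the criticality rating by the probability rating.

    If a detection rating is supplied, it is multiplied in as well.
    """
    factors = [criticality, probability]
    if detection is not None:
        factors.append(detection)
    if any(rating < 1 for rating in factors):
        raise ValueError("ratings must be 1 or greater")
    rpn = 1
    for rating in factors:
        rpn *= rating
    return rpn


# Asset 2 from this article: severity 5, probability 1.
print(risk_priority_number(criticality=5, probability=1))  # 5
```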

Step 5: Analyze Your Findings

Based on the RPN calculations above, we can draw the following conclusions:

Asset 1 is the most likely to fail, but the consequences are relatively minor. It is possible that reliability issues are due to aging equipment or inadequate preventive maintenance, but further investigation is needed.
Based on its RPN, Asset 2 carries the least amount of risk. However, while the probability is low, the severity is high. In this case, the failure is worth preventing.
Asset 3 carries the most risk, according to its RPN. Failure happens on a regular basis and leads to relatively severe consequences. You should prioritize this asset.
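Comparisons like these are easier to make when the failures are sorted by RPN. The sketch below ranks three made-up assets; the names and ratings are purely illustrative and are not the values from the chart above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureRisk:
    asset: str
    severity: int     # asset criticality rating
    probability: int  # likelihood of failure rating

    @property
    def rpn(self) -> int:
        return self.severity * self.probability


# Hypothetical assets and ratings, for illustration only.
risks = [
    FailureRisk("Conveyor", severity=2, probability=5),
    FailureRisk("Chiller", severity=5, probability=1),
    FailureRisk("Press", severity=4, probability=4),
]

# Highest RPN first.
ranked = sorted(risks, key=lambda r: r.rpn, reverse=True)
for r in ranked:
    print(f"{r.asset}: severity={r.severity}, probability={r.probability}, RPN={r.rpn}")
```

Keeping severity and probability next to the RPN in the output helps you spot cases where a low RPN hides a severe consequence.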

Step 6: Prioritize Asset Failures

Risk priority numbers make it easy to compare the risks that failures pose, relative to one another – but which failures require action? One tool that can be used is a risk matrix, like the one shown below.

Risk matrix comparing a failure’s severity and probability.

This grid shows all possible risk priority number scores using a 5-point scale, color-coded by risk level. Find where your severity and probability ratings intersect to determine the RPN and observe the color code.

In this grid, scores in green represent assets with the lowest amount of risk and of low priority. Yellow and orange scores represent assets that are low to medium risk and medium to high risk, respectively. A score in red indicates that asset failure carries high risk and should be a priority for the maintenance team to address.

While the matrix is a useful decision-making tool, it should not replace other forms of evaluation. Recall Asset 2 from earlier. It has an RPN of 5 which, according to the matrix, makes it low priority. However, its severity is rated as a 5. Even if you do not expect this failure to occur, it is still worth trying to prevent, especially if it may cause fatal injuries, destroy the equipment, or create an environmental crisis.
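A risk matrix and the severity check described above can be sketched as follows. The color cutoffs (5, 10, and 15) are assumptions chosen for illustration, not the boundaries used in the matrix pictured; the function names are hypothetical as well.

```python
def risk_band(rpn: int, cutoffs=(5, 10, 15)) -> str:
    """Map an RPN to a color band using assumed cutoffs."""
    low, medium, high = cutoffs
    if rpn <= low:
        return "green"
    if rpn <= medium:
        return "yellow"
    if rpn <= high:
        return "orange"
    return "red"


def needs_attention(severity: int, probability: int, max_severity: int = 5) -> bool:
    """Flag red-band failures and any failure with the maximum severity rating."""
    return risk_band(severity * probability) == "red" or severity == max_severity


# Every possible RPN on a 5-point scale: rows are severity, columns are probability.
grid = [[severity * probability for probability in range(1, 6)]
        for severity in range(1, 6)]
for severity, row in enumerate(grid, start=1):
    cells = "  ".join(f"{value:>2}:{risk_band(value):<6}" for value in row)
    print(f"severity {severity} | {cells}")

# Asset 2: low RPN, but still flagged because its severity is 5.
print(risk_band(5 * 1), needs_attention(severity=5, probability=1))  # green True
```

The second check is one way to keep a low RPN from hiding a high-severity failure.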

Step 7: Create a Risk Mitigation Plan

Now that you’ve identified which asset failures pose the biggest threat to the organization, it’s time to create a maintenance plan to prevent future failures by choosing a maintenance technique for each one.

When deciding which technique to use, consider the following questions:

What maintenance resources do I currently have?
What are the manufacturer’s maintenance recommendations?
How old is the asset and what is its life expectancy?
What is the cost to replace the asset?
Is it cost-effective to prevent the failure?
What is the risk of not preventing this failure?
What other changes do I need to make to support this strategy?

Step 8: Continuously Improve

Optimizing your maintenance program following a risk-based approach is not a one-time event. You should update criticality and risk ratings as your key asset management metrics improve. After you address assets with the highest risk, you’ll be able to turn your focus to new assets and repeat the process over and over again. This process can also be used to prioritize specific failure events within the same asset group.

Risk-based Maintenance and CMMS

CMMS software enables you to easily collect, store, and track maintenance data required to perform a risk-based maintenance assessment.

In terms of criticality, a CMMS provides access to asset service history, work orders, and historical performance data which help you select assets for analysis. Downtime tracking and maintenance reports provide further insight into the effects of failure. CMMS software also provides useful asset data including:

The number of failures an asset has experienced
The amount of unplanned downtime attributed to an asset
Key performance indicators such as Mean Time to Repair (MTTR), preventive maintenance effectiveness, and work order performance
Asset maintenance costs

In terms of risk, or the probability of failure, a CMMS provides useful information including historical asset data and measures of reliability such as Mean Time Between Failures (MTBF) calculations.
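For readers who want to see how these reliability measures are derived, here is a minimal sketch using the common definitions: MTBF is operating time divided by the number of failures, and MTTR is total repair time divided by the number of repairs. The function names and sample figures are hypothetical, and a CMMS may calculate these values differently.

```python
from typing import List, Optional


def mtbf_hours(operating_hours: float, failure_count: int) -> Optional[float]:
    """Mean Time Between Failures: operating time divided by number of failures."""
    if failure_count == 0:
        return None  # undefined when no failures were recorded
    return operating_hours / failure_count


def mttr_hours(repair_durations: List[float]) -> Optional[float]:
    """Mean Time to Repair: average duration of the recorded repairs."""
    if not repair_durations:
        return None
    return sum(repair_durations) / len(repair_durations)


# Made-up history: three repairs during one year (8,760 hours) of service.
repairs = [4.0, 6.0, 2.0]
operating = 8760 - sum(repairs)  # hours the asset was actually running

mtbf = mtbf_hours(operating, len(repairs))
mttr = mttr_hours(repairs)
print(mtbf, mttr)  # 2916.0 4.0
```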

Lower the Impact of Asset Failure with FTMaintenance Select

Risk-based maintenance provides a structured process for allocating maintenance resources to the failures that most threaten your organization. FTMaintenance Select provides a powerful platform for planning, managing, and tracking your maintenance program, enabling you to make data-driven decisions that lower the risk and reduce the consequences of asset failure. Request a demo today to learn more about how to get started with FTMaintenance Select.
