Reliability centered maintenance (RCM) is the ongoing, systematic process of matching critical systems with the most cost-effective maintenance strategy to maximize overall reliability.
There’s no such thing as a one-size-fits-all solution in maintenance management because assets can fail in so many different ways. In fact, each asset can have its own ways of failing, reasons behind those failures, consequences of the failures, and strategies for predicting and avoiding them.
RCM is all about finding what works and then getting it working for you.
Let’s start with some basic definitions.
What is reliability centered maintenance (RCM)?
Reliability centered maintenance is the process of finding the best possible maintenance strategy for every asset in your organization. The guiding principle is that different assets require different styles of maintenance management. Some demand continuous high-tech monitoring, while others are best left to the run-to-failure model. For a lot of your assets, your best bet is preventive maintenance.
Quick side note: run-to-failure often has a bad name in maintenance, but there are times when it’s your best choice. The classic example is light bulbs, which almost always have the lowest level of criticality. They are cheap to buy and carry in inventory. When they fail, there’s little to no safety risk and you’re not running the risk of lowered productivity. And even the most inexperienced tech can replace them.
The process of finding the best strategy begins with looking at your history of breakdowns and the steps you’ve been taking to maintain and repair your assets. From there, you choose the best maintenance strategy. The end goal of reliability centered maintenance is achieving consistently high levels of reliability at the lowest possible costs. The non-technical expression is “getting the most bang for your buck.”
What are the reliability centered maintenance principles?
Reliability centered maintenance started in the aviation industry, which is unsurprising given the numerous parts and components that make up aviation equipment, their heavy use, and the risks and potentially catastrophic consequences of aviation equipment failures. Over time, organizations across industries have implemented the minimum criteria set out for RCM methods in technical standard SAE JA1011, Evaluation Criteria for Reliability-Centered Maintenance (RCM) Processes.
What are the reliability centered maintenance principles? They’re actually a set of seven questions.
What is the asset or equipment supposed to do, and what are the associated performance standards?
Here, you’re trying to identify the functions of the system or equipment. In other words, you need to know how the equipment performs and whether it can meet company needs within the parameters of environmental safety and government standards. You can find this information in manufacturer documentation. You want to know the scope of the functions as well as their limitations and methods of use relating to safety and environmental measures.
For example, an industrial scale may have a weight limit. As soon as you exceed it, that scale starts becoming inaccurate or stops functioning. The documentation also explains how to use the asset to ensure both safety and accuracy. There could be instructions on how to place or handle the items you want to weigh and where to keep the scale.
What does this asset do, how much of that is it doing currently, and how much of that would I like it to be doing?
For example, you have a conveyor belt that moves boxes. Currently, it’s moving 5000 boxes between breakdowns, and each of those breakdowns lasts about three hours. Based on a combination of what the belt’s manufacturer says, what your maintenance team says, and data in your CMMS software, you think you can get that number up to 7000 boxes between breakdowns. You can also reduce each breakdown from three to two hours.
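To see what that improvement is worth, you can compare total downtime over the same amount of work. The sketch below uses the figures from the conveyor example; the 70,000-box horizon and the function name are assumptions chosen only to keep the arithmetic simple.

```python
def downtime_hours(total_boxes, boxes_between_breakdowns, hours_per_breakdown):
    """Estimate the hours of downtime incurred while moving total_boxes."""
    breakdowns = total_boxes / boxes_between_breakdowns
    return breakdowns * hours_per_breakdown

# Assumed horizon for illustration; any volume of boxes works.
HORIZON = 70_000

current = downtime_hours(HORIZON, 5000, 3)  # 14 breakdowns at 3 hours each
target = downtime_hours(HORIZON, 7000, 2)   # 10 breakdowns at 2 hours each

print(current, target)  # 42.0 20.0
```

Under these assumptions, reaching the target would cut downtime over the same volume of boxes by roughly half.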
In what ways can equipment fail to provide the required functions?
Simply put, this means being able to identify failure modes in a piece of equipment. In other words, it involves determining the nature of the equipment failure. For example, does the failure relate to one part or is it a systemic failure? The key is to identify exactly how a piece of equipment has failed, how often, and whether the same part is involved each time. In companies with several pieces of the same type of equipment, it is important to determine whether a particular failure is occurring systematically on all pieces or is limited to only one piece.
What are the events that cause each failure?
Closely related to finding equipment failure modes is identifying the causes of the failures. It’s important to determine why, when, and how equipment failures most typically happen. This is particularly true of heavy-use equipment, which could suffer from operating fatigue. Also, you need to know when equipment is most likely to fail and the nature of the failure.
For example, you might run a water pump continuously, and at some point, the equipment starts to fatigue from the constant use. Another common type of equipment stress leading to failure is exposure to harsh environmental conditions such as heat, cold, or moisture. There is also human error as well as inherent design or manufacturing flaws that cause equipment failure. Finding out the cause of the failure is important to understanding how to prevent or minimize it.
What happens when each failure occurs?
To improve your operations, you need to do more than just identify equipment failures. You also need to know their effects, which can range from nearly undetectable to complete losses of function. For example, a failing piece of equipment might lead to a decrease in output speed or quality. Or, it might smoke, stutter, and seize. In the end, all forms of equipment failure impact productivity, operations, and capital costs. They also lead to unplanned disruptions in production and expensive repairs you wish you had avoided.
In what way does each failure matter?
Here you’re looking for failure fallout. Apart from the financial and logistic consequences of equipment failure, you need to think about safety risks for operators as well as possible environmental impacts.
You also need to consider how a failure affects the integrity and condition of an asset overall.
What systematic task can I do proactively to prevent or diminish the consequence of the failure?
The answer to this question is hiding inside the asset’s maintenance and repair history. By looking at who did what and when they did it, you can start to see breakdown patterns. Once you have the pattern, you can start to slot in proactive preventive measures between breakdowns. For example, the conveyor belt generally runs fine for about 5000 boxes before requiring some sort of repairs. If you add visual inspections after every 4500 boxes, you have a good chance of stretching out your uptime.
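A usage-based trigger like this is easy to express as a simple check. The sketch below assumes you can read a running box count for the belt, for example from a counter or your CMMS; the function and constant names are hypothetical.

```python
# Interval taken from the conveyor example: inspect after every 4500 boxes.
INSPECTION_INTERVAL = 4500

def inspection_due(boxes_since_last_inspection, interval=INSPECTION_INTERVAL):
    """Return True once the belt has moved at least `interval` boxes."""
    return boxes_since_last_inspection >= interval

print(inspection_due(4499))  # False
print(inspection_due(4500))  # True
```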
But be careful; the wording of this question can be a bit misleading. It’s about what you can do, but you also have to consider what you should do. There are situations when you should take steps to avoid breakdowns. But there are also situations where it’s going to be better to simply continue to use the run-to-failure maintenance strategy. When the cost and trouble of avoiding breakdown are more than the value of the increased uptime, it makes more sense just to let things run until they fail. Back to the classic example, think light bulbs.
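That trade-off can be sketched as a one-line comparison. The dollar figures below are made up for illustration, and the function name and strategy labels are assumptions; a real decision would also weigh safety and environmental consequences, not just cost.

```python
def pick_strategy(annual_prevention_cost, annual_uptime_value):
    """Prefer run-to-failure when preventing breakdowns costs more than the uptime it buys."""
    if annual_prevention_cost > annual_uptime_value:
        return "run-to-failure"
    return "proactive"

# Hypothetical figures for illustration only.
print(pick_strategy(annual_prevention_cost=500, annual_uptime_value=50))      # run-to-failure
print(pick_strategy(annual_prevention_cost=2000, annual_uptime_value=15000))  # proactive
```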
What should I do if I can’t find a suitable preventive task?
Here we’re dealing with a very specific situation: the best maintenance strategy is not run-to-failure, but at the same time we can’t find a good proactive preventive maintenance plan to apply. Imagine you have an old A/C unit in your machine shop. In fact, it’s so old that you can’t source parts for it anymore. And it runs on a coolant that used to be common but is now in the process of being phased out through legislation. You can’t maintain it by refilling the coolant and you can’t repair it by switching in new parts.
Because you can’t set up a maintenance strategy, all you can do is have a plan in place for when the A/C inevitably dies. That might mean having money already set aside in the budget to buy a replacement. It might mean borrowing a unit from another department’s inventory. There’s no perfect answer, but you want a solution that can be implemented quickly, with the least amount of disruption.
As you work through the seven questions, you find the best possible maintenance strategy for each asset. It’s important to remember that answers can change over time. Any given asset can shift in criticality, and the costs associated with different maintenance strategies can increase or drop due to many factors, both internal and external.
What are the differences between risk based maintenance and reliability centered maintenance?
Now that we’ve firmly established a definition of reliability centered maintenance, we can quickly clear up any confusion between it and risk based maintenance management.
With RCM, we’re choosing the best maintenance strategy for each asset. So, with light bulbs, it’s run-to-failure, but for a forklift, it’s more likely preventive maintenance.
With risk based maintenance, we start with some unavoidable truths about maintenance:
- There are always more things to do than time to do them.
- We have more work than workers.
- No matter how generous, the maintenance budget has limits.
And that means we have to prioritize which assets get our time and attention. We can’t do it all. Risk based maintenance is a process of deciding how we use our limited resources by:
- Establishing criticality for each asset
- Developing a risk-based maintenance program
- Planning maintenance based on risk reduction
- Allocating parts and repairs based on risk
Assets that carry a higher consequence of failure (CoF) get more attention. Assets with lower criticality get less. There’s a spectrum between the highest and lowest, and each has a corresponding level of maintenance.
Reliability centered maintenance implementation
Organizations need to begin by looking at their assets in terms of criticality. Basically, you should ask, “How bad is it if this asset fails?” Then start to look at other factors, such as costs for maintenance and labor, risk of injury, environmental damage, lost productivity, and compliance-related fines. Once you’ve determined criticality, rank your assets from most to least critical.
Then, starting from the top of your list, use the seven RCM questions on each asset. Based on the answers, you can determine the best maintenance strategy for each asset.
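The ranking step can be sketched in a few lines. The 1-to-10 criticality scores below are hypothetical placeholders; in practice, you would derive them from the factors listed above, such as risk of injury, lost productivity, and compliance-related fines.

```python
from dataclasses import dataclass

@dataclass
class Asset:
    name: str
    criticality: int  # hypothetical score: 1 (low) to 10 (high)

assets = [
    Asset("light bulbs", 1),
    Asset("forklift", 7),
    Asset("conveyor belt", 9),
]

# Most critical first, so the seven RCM questions start at the top of the list.
ranked = sorted(assets, key=lambda asset: asset.criticality, reverse=True)

print([asset.name for asset in ranked])
# ['conveyor belt', 'forklift', 'light bulbs']
```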
Crucially, RCM is an ongoing process. Organizations need to periodically revisit earlier decisions, ensuring that their maintenance strategies change as business goals, asset criticality, and failure histories evolve. For example, the best maintenance strategy for an asset early in its useful life is different from the one that’s the best fit 15 years later. And even though predictive maintenance did not make economic sense for an asset five years ago, it might be the best choice after the price of sensors has dropped.
Risk-free RCM implementation
But it’s not just parts and processes. Successful implementation also depends on people, all the way from your maintenance teams to stakeholders across departments, up and down the corporate ladder.
In a recent Hippo webinar, Rolling the Dice with Reliability, Michelle Ledet Henley of The Manufacturing Game explains that people are often the real challenge, and that it’s always easier to set up the technical systems than it is to convince people to use them.
Part of the reason is that every new change brings its own set of risks, and people are generally risk averse. The standard isn’t “If it ain’t broke, don’t fix it.” Instead, many default to “Even if it is broke, let’s not do anything that could make it worse.”
So, one way to ease people into new programs is to remove the risk altogether. Henley’s Manufacturing Game, a series of seminars with various exercises and roleplays, gives everyone a chance to test drive the new processes risk free. Once the stakes are lowered, it’s much easier for everyone to jump on board.
Gamifying these early steps also helps people remember them. In a lecture-style lesson, people listen passively and most likely forget most of the information, so Henley says it’s better to facilitate discussions and experiences. The more hands-on, the better. That makes a lot of sense, especially for your maintenance team, who learned their trades by doing, not watching.
The more you know about an asset’s maintenance and repairs history, the easier it is to use RCM to determine the best maintenance strategy.
Modern CMMS software makes capturing, safeguarding, and leveraging asset data easier, helping you ensure you have answers you can trust. With paper and spreadsheets, where mistakes can quickly creep in, you can never really trust your data.
If you’re ready to make the jump to modern maintenance management, it’s time to reach out to CMMS providers and get the conversation started.
Executive summary on reliability centered maintenance
Reliability centered maintenance is a process for finding the best maintenance strategy for each of your assets. For example, it makes more sense to use run-to-failure for light bulbs. But for a forklift, you would want to use preventive maintenance. RCM is different from risk based maintenance management, which helps you allocate resources based on criticality. For reliability centered maintenance, you also need to look closely at the types of failures, ways to prevent them, and the asset’s maintenance and repair histories.
RCM is an ongoing process, and you should periodically re-evaluate your strategy for each asset.