Predictive Maintenance in the Data Center

Shutdowns and maintenance outages are toxic to the efficiency of data centers. Operators are now using predictive maintenance to optimize their facilities.

When looking to the future, we have long since swapped the mystic’s crystal ball for the IT-based disciplines of analytics and business intelligence. These modern methodologies are typically used to mine historical data, draw lessons from the past, and thus understand the present. Today, they are being succeeded by data science, which aims to offer insights into the future that would challenge the fabled Oracle of Delphi. But while the Oracle only spoke on the seventh day of each month (outside of the annual winter shutdown), this modern mathematical and analytical functionality is available 24/7.

The simplest form of analytics is pure statistics, i.e., the collection and visualization of data for a chosen range of indicators. Analyzing the same data across additional dimensions significantly enhances its information value; this multidimensional approach is a classic data warehouse-based technology known as online analytical processing (OLAP). For greater insight and relevance, we can use predictive analytics, where future system behavior is calculated using models derived from historical data and logical relationships. The accuracy of a prediction is, of course, entirely dependent on the quality of the data and model used. The three-day weather forecast, for instance, is relatively accurate because the model is sufficiently sophisticated and reliable. Longer-term forecasts, on the other hand, involve too many factors that are still considered incalculable.

The data center is another place where predictive analytics is increasingly important. The main applications in this case are predictive maintenance models that operators use to eliminate fault-related downtime and to optimize service intervals based on specific local requirements. The rate at which an air-conditioning filter becomes blocked with particles, for example, is dependent on the concentration of dust in the air passing through it. Data centers that use free-air cooling and are located in urban areas (or areas with high volcanic activity) have a clear disadvantage here compared with traditional sealed facilities. Rather than creating maintenance agreements with the same fixed intervals for the replacement or cleaning of filters regardless of operating environment, it is therefore more efficient to calculate specific service dates based on actual local conditions. As with filters, it is possible to automatically adjust the maintenance intervals for batteries, UPSs, pumps, generators, and many other data center components.
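To make the filter example concrete, here is a minimal sketch of how a site-specific service interval could be estimated from local dust levels. The function name, its parameters, and all numeric values are assumptions chosen for illustration; real figures would come from the filter datasheet and on-site air-quality measurements.

```python
def filter_service_interval_days(dust_capacity_g, airflow_m3_per_h,
                                 dust_mg_per_m3, capture_efficiency=0.9):
    """Estimate days until a filter has collected its rated dust load.

    All parameters are hypothetical inputs for illustration; real values
    would come from the filter datasheet and on-site air-quality sensors.
    """
    if min(dust_capacity_g, airflow_m3_per_h, dust_mg_per_m3) <= 0:
        raise ValueError("capacity, airflow and dust level must be positive")
    if not 0 < capture_efficiency <= 1:
        raise ValueError("capture_efficiency must be in (0, 1]")
    # Dust captured per day in grams:
    # mg/m3 * m3/h * 24 h/day * efficiency, converted from mg to g.
    grams_per_day = (dust_mg_per_m3 * airflow_m3_per_h * 24
                     * capture_efficiency / 1000)
    return dust_capacity_g / grams_per_day


# Same filter and airflow, two assumed dust concentrations.
sealed = filter_service_interval_days(500, 10_000, 0.01)
urban_free_air = filter_service_interval_days(500, 10_000, 0.05)
print(f"sealed: {sealed:.0f} days, urban free-air: {urban_free_air:.0f} days")
```

With these assumed inputs, the free-air site reaches the filter's capacity five times sooner than the sealed one, which is the kind of difference a fixed-interval maintenance agreement cannot reflect.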

Outages by numbers

Since it is obviously unwise to wait until a filter is completely blocked and the device shuts down, operators should calculate the optimum service date in advance, taking account of the lead times required for planning and implementing the work. In addition to general indicators, such as running time (number of hours in operation), there are more sophisticated and reliable methods for calculating the condition of a component, which are based on measured data and comparisons. A blocked filter, for example, reduces the flow of air, which can easily be monitored using a customized algorithm that compares the rotational speed of the fans with the resulting air-flow rate. Historical changes in these measurements, meanwhile, provide the basis for future predictions: some systems are equipped with machine learning functionality, and an ever-increasing number of influencing factors are being incorporated into the models.
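The comparison of fan speed with air-flow rate, and the use of past changes to predict a service date, can be sketched as follows. This is an illustrative simplification and not the algorithm of any particular product: it assumes the air flow delivered per fan revolution declines roughly linearly as the filter clogs, fits a straight line to past readings, and subtracts the planning lead time from the day on which the trend crosses a limit. All function names, readings, and thresholds are hypothetical.

```python
def flow_per_rpm(airflow_m3h, fan_rpm):
    """Air flow delivered per fan revolution; falls as the filter clogs."""
    if fan_rpm <= 0:
        raise ValueError("fan_rpm must be positive")
    return airflow_m3h / fan_rpm


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_service_day(days, ratios, limit, lead_time_days):
    """Day index on which to start the service work, or None if the
    readings show no downward trend (assumed linear degradation)."""
    slope, intercept = fit_line(days, ratios)
    if slope >= 0:
        return None
    crossing_day = (limit - intercept) / slope  # trend line reaches the limit
    return crossing_day - lead_time_days


# Hypothetical readings: (day, air flow in m3/h, fan speed in rpm).
readings = [(0, 5000, 1000), (10, 4900, 1000),
            (20, 4800, 1000), (30, 4700, 1000)]
days = [day for day, _, _ in readings]
ratios = [flow_per_rpm(flow, rpm) for _, flow, rpm in readings]

service_day = predict_service_day(days, ratios, limit=4.0, lead_time_days=14)
print(f"schedule service around day {service_day:.0f}")
```

A straight-line fit is the simplest possible model. The machine learning approaches mentioned above would replace it with models that take further influencing factors into account, but the principle of extrapolating measured condition data toward a limit stays the same.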

“Past changes provide the basis for future predictions as some systems are equipped with machine learning functionality and an ever-increasing number of influencing factors are incorporated into models – and it is already available today.”
Oliver Lindner, Head of Business Line DCIM

With the processing power available today, most of these calculations are performed in real time. The major benefit of predictive maintenance is the ability to detect emerging technical issues before they cause a system to shut down. Compared with fixed-interval servicing, predictive maintenance offers a reduction in enforced downtime as well as shorter outage periods while system components are being serviced. It also reduces the cost of complying with agreed service levels, since operators only have to perform tasks that are actually required and need fewer replacement parts to do so.

In today’s data centers, predictive maintenance is used primarily in conjunction with the new generation of cooling systems. These systems have already proven their reliability and worth, so it is reasonable to assume that in the not-too-distant future predictive maintenance will be widely deployed as an off-the-shelf technology. And that is good news for the industry, because when data centers are maintained with this level of efficiency they can be used more effectively for really important tasks. Such as 14-day weather forecasts.