The Story of the Data Center with the 1.2 Million Dollar Screwdriver

Shouldn’t data center technicians be our heroes? While they are responsible for maintaining the technical environment, they also improve uptime and ensure availability – and in fact make IT services for every important business process possible. But just as every human does from time to time, they make small mistakes which can have a serious impact.

This is exactly what happened in Zurich…

It was a typical day for one of the two data centers that provide the IT services coordinating all rail traffic at Zurich’s main station. That was, until a network technician walked through the aisles to fulfill his next work order and accidentally pulled out a network cable with his screwdriver. All regional trains stopped immediately and the systems in the coordination center crashed. Unfortunately, the second, redundant data center, which would have taken over in this blackout scenario, was unavailable due to maintenance.

It was perfect chaos.

This outage could have happened to anyone. No matter how good your recovery strategies are, the underlying cause is that many companies don’t have a proper move-add-change process in place. Consequently, technicians have to work with insufficient work order management, which leads to a high failure rate. Outdated information on the physical network increases the number of network problems. According to Gartner, more than half of such network problems are caused by physical layer issues due to lack of documentation and patching mistakes. Other sources claim that up to 40% of switch ports are stranded resources due to insufficient documentation. Put simply, data center operators don’t always have a complete understanding of their physical infrastructure, but they could if they had the right tools in place.

Real life problems and how they can easily be solved

How was the chaos in Zurich corrected? The technician tried to fix the problem immediately by looking up the correct position of the cable in the current documentation. Unfortunately, that didn’t work, so he called the service desk for help. Even with all hands mobilized, it took the team 2 hours to fix the problem and restart the systems. Not one single train traveled in or out of the main railway station during this time. According to the New Journal of Zurich, Zurich’s main station is served by more than 2,900 trains daily and handled about 414,000 passengers per day in 2014. Assuming passengers are spread evenly across the day, roughly 34,500 travelers were affected during the 2 hours. With an average ticket costing about $37.00, this roughly equals a loss of $1,276,500. Needless to say, the service interruption caused a huge loss of capital and reputation.
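
The estimate above can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch: the function name is made up for illustration, and it assumes that passengers are spread evenly over 24 hours, which the figures in this story imply but do not state.

```python
# Rough estimate of the ticket revenue affected by an outage.
# Assumption (ours, for illustration): passengers are spread evenly over 24 hours.

def estimated_loss(passengers_per_day: int, avg_ticket_usd: float, outage_hours: float) -> float:
    """Return the ticket revenue tied to passengers affected during the outage."""
    passengers_per_hour = passengers_per_day / 24
    affected_passengers = passengers_per_hour * outage_hours
    return affected_passengers * avg_ticket_usd

loss = estimated_loss(passengers_per_day=414_000, avg_ticket_usd=37.00, outage_hours=2)
print(f"${loss:,.0f}")  # prints $1,276,500
```

A real outage would look different depending on the time of day, since rush hour traffic is far above the daily average.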

This mishap goes to show why data centers are a core component in the value chain for any digitized business, and the demands on them continue to increase rapidly. Data center management systems are essential for provisioning and managing a large number of assets and connections. All activities and changes must be logged for later reference and auditing without wasting time on documentation activities. Unauthorized changes should automatically generate alarms, thus enabling rapid fault location and correction. To optimize infrastructure operation and support precise capacity planning, the system should also provide tailored reports and analysis.

What is AIM and how does it work?

Comprehensive Automated Infrastructure Management (AIM) solutions combine software with hardware-based monitoring. On the software side, the benefits include a complete inventory of the physical infrastructure, the ability to plan and control changes through proper work order management, graphical views of configuration items (CIs) and network components, signal tracing, and better use of installed capacity thanks to meaningful analytics. On the hardware side, the benefits include real-time monitoring of physical connectivity, automatic database updates to ensure 100% accurate documentation, automatic tracking of all changes, and alerts on any unsolicited changes for greater security.
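
To make the idea of alerts on unsolicited changes more concrete, here is a minimal sketch of the underlying comparison: the documented connections are checked against what the hardware currently senses, and anything that differs without being part of a planned change is flagged. All names here (Connection, find_unsolicited_changes, the port labels) are hypothetical and chosen for illustration; this is not the interface of any particular AIM product.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Connection:
    """One patched link between a patch panel port and a switch port (hypothetical model)."""
    panel_port: str
    switch_port: str

def find_unsolicited_changes(documented, observed, planned):
    """Compare documented and observed connections.

    Returns (removed, added): connections that disappeared or appeared
    without being listed in the set of planned changes.
    """
    removed = documented - observed - planned
    added = observed - documented - planned
    return removed, added

documented = {Connection("PP1:01", "SW1:01"), Connection("PP1:02", "SW1:02")}
observed = {Connection("PP1:01", "SW1:01")}  # the cable on PP1:02 was pulled

removed, added = find_unsolicited_changes(documented, observed, planned=set())
for conn in removed:
    print(f"ALERT: unplanned disconnect {conn.panel_port} -> {conn.switch_port}")
```

In the Zurich scenario, a check like this would have pointed straight at the port that lost its cable instead of leaving the technician to search the documentation.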

That’s where R&M and FNT Software come in. The two internationally active companies developed a standard interface between R&MinteliPhy and FNT Command to deliver unprecedented transparency and control of IT assets and network resources.

“We see capacity management in the data center becoming an increasingly important topic for infrastructure and operations managers,” explains Dr. Thomas Wellinger, Market Manager Data Center at R&M. “The impact that poor connectivity visibility can have on data availability or uptime makes the automation of connectivity documentation a priority, whether from an availability, operations, risk or compliance perspective, as it supports critical capacity and compliance decisions and increases data center efficiency.”

How AIM Could Work in Your Environment

First of all, implementing a professional AIM system isn’t as difficult as you might think. Network managers can retrofit their existing physical infrastructure for fully automatic management with just a small number of components, e.g. patch cord connectors, sensorbars for patch panels and analyzers for the network cabinets. The RFID-based cable tracking solution, R&MinteliPhy, monitors the connections and automatically sends the data to FNT Command as the central cable management solution, thereby providing real-time information on physical connectivity and port capacity, up-to-date documentation, and security alerts.

The standard interface works in both directions: all changes and work orders can be planned in the central cable management system and scheduled for a specific date and time. The task can then be sent to the sensorbars and patch panels for visual guidance, task tracking, and permission configuration.
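
As a rough illustration of how scheduled work orders and detected changes fit together, the sketch below checks whether a change sensed on a port falls inside the time window of a planned work order. The WorkOrder class, its fields, and the function name are assumptions made up for this example; they do not describe the actual interface between R&MinteliPhy and FNT Command.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class WorkOrder:
    """A planned change on one port within a time window (hypothetical model)."""
    order_id: str
    port: str
    start: datetime
    end: datetime

def matching_work_order(port, when, work_orders):
    """Return the work order that covers a change on `port` at time `when`, or None."""
    for order in work_orders:
        if order.port == port and order.start <= when <= order.end:
            return order
    return None

orders = [WorkOrder("WO-1", "PP1:02", datetime(2015, 1, 1, 9), datetime(2015, 1, 1, 11))]

# A change inside the planned window is matched to its work order.
planned = matching_work_order("PP1:02", datetime(2015, 1, 1, 10), orders)
# The same change one hour after the window closes has no matching order.
unplanned = matching_work_order("PP1:02", datetime(2015, 1, 1, 12), orders)

print(planned.order_id if planned else "no work order: raise an alert")
print(unplanned.order_id if unplanned else "no work order: raise an alert")
```

A change with no matching work order is exactly the kind of event that should trigger an alarm rather than go unnoticed.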

Overall, this combined solution gives network managers complete control of their physical network and lets them carry out work orders faster, more efficiently and with a near-zero fault rate. Thanks to the central documentation, infrastructure and operations decision makers have all the information they need to make better-informed capacity and compliance decisions. This increases data center efficiency and helps ensure that networks are ready for upcoming challenges.