The Strategic Role of DCIM in Banking Data Center Security

On October 14th, there was a significant disruption that affected the banking services of DBS and Citibank in Singapore.

The disruption highlights the technical challenges that can arise in critical infrastructure, especially data centers. A technical issue with the chilled water system during a planned upgrade at a data center caused temperatures to rise in some rooms, which led to the disconnection of essential services and affected operations and, consequently, numerous customers.

Problem analysis.

The interruption appears to have originated during a routine update to the cooling system, specifically the chilled water system. Everything indicates that an incorrect adjustment was made, causing several data center rooms to overheat. The incident demonstrates the fragility of the infrastructure and highlights the need for more robust preventive measures.

Impact on customers.

The Monetary Authority of Singapore (MAS) ordered an investigation into the incident, which disconnected many banking services at both entities, including mobile banking, ATMs, and online transactions, from 3:00 PM on Saturday, October 14th, until Sunday morning.

The lack of access resulted in over 810,000 failed attempts to access banking platforms and more than 2.5 million incomplete transactions, causing significant frustration and concern among their customers.

Cause of the incident.

Everything indicates that the root cause of the incident was human error during the routine update. One of the operators sent an incorrect parameter during the update; specifically, they mistakenly closed valves belonging to the water tanks, leading to an unplanned increase in temperature in the data center. This resulted in the automatic disconnection of servers to prevent damage.

At first, it may seem that the error did not cause any damage to the critical infrastructure. However, the direct impact on customers, from the interruption of essential services to the inability to carry out transactions, underscores the economic and reputational cost of the incident for both entities.

All this data shows the importance of robust preventive measures:

Improvements in Update Management:

Implement more rigorous update processes, including thorough testing and validation of changes, to avoid incorrect adjustments that may impact the stability of the system.

Reinforcement of Cooling Infrastructure:

Improve cooling systems in data centers, considering advanced technologies and redundancies to ensure operational stability even during critical updates.

Greater Transparency and Communication:

Establish transparent communication protocols with customers during interruptions, providing timely and clear updates on the situation and corrective measures.

Review of Disaster Recovery Plans:

Evaluate and improve disaster recovery plans to ensure a faster and more effective response in emergency situations, including the efficient implementation of backup data centers.

Regulatory Compliance and Rigorous Oversight:

Strengthen compliance with regulations established by authorities, such as the Monetary Authority of Singapore (MAS), and undergo more rigorous oversight to ensure conformity with security standards and maximum allowable downtime.

As technology advances every day, more thorough digital management becomes essential.


At Bjumper we present some proposals that significantly improve the management of critical IT infrastructure, making it more resilient and offering our clients technology that minimizes the inconvenience that interruptions may cause them:

1º Implementation of Data Center Infrastructure Management (DCIM) Systems:

  • The adoption of DCIM enables proactive monitoring and management of the data center infrastructure. It provides real-time visibility into critical components, including cooling systems, power, and server performance.

2º Continuous Monitoring of Environmental Conditions:

  • An efficient DCIM constantly monitors environmental conditions, including temperature in server rooms. In the case we mentioned, early detection of anomalies would have allowed for a quick response before high temperatures affected operations.
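As an illustration of what early detection of anomalies can mean in practice, here is a minimal sketch in Python. It is not the API of any DCIM product: the function name `first_alert`, the threshold values, and the sample readings are all assumptions chosen for the example. It flags a room either when the temperature crosses an absolute limit or when it is rising faster than an allowed rate, whichever happens first.

```python
from typing import List, Optional, Tuple

# Illustrative thresholds only; real limits depend on the equipment and site.
WARN_TEMP_C = 27.0           # absolute temperature that triggers an alert
MAX_RISE_C_PER_MIN = 0.5     # rate of rise that triggers an alert


def first_alert(
    readings: List[Tuple[int, float]],
    warn_temp_c: float = WARN_TEMP_C,
    max_rise_c_per_min: float = MAX_RISE_C_PER_MIN,
) -> Optional[Tuple[int, str]]:
    """Return (minute, reason) for the first reading that should alert, or None.

    Each reading is (minutes since start, room temperature in Celsius),
    and readings are assumed to be sorted by time.
    """
    previous = None
    for minute, temp_c in readings:
        if temp_c >= warn_temp_c:
            return (minute, "threshold")
        if previous is not None:
            prev_minute, prev_temp_c = previous
            elapsed = minute - prev_minute
            if elapsed > 0 and (temp_c - prev_temp_c) / elapsed > max_rise_c_per_min:
                return (minute, "rate_of_rise")
        previous = (minute, temp_c)
    return None


# Hypothetical readings: the room warms quickly after cooling is lost.
readings = [(0, 22.0), (1, 22.1), (2, 22.9), (3, 24.0), (4, 27.5)]
alert = first_alert(readings)
print(alert)
```

With these sample readings, the rate-of-rise check fires at minute 2, two minutes before the absolute threshold would be crossed at minute 4. That margin is the point of monitoring the trend and not only the current value.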

3º Advanced Update and Change Management:

  • DCIM systems facilitate centralized management of updates and changes in the infrastructure. They provide tools to plan and execute updates in a more controlled manner, reducing the risk of incorrect adjustments.
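One way to picture a more controlled change is a pre-change check that refuses a plan which would leave too little cooling in service. The sketch below is an assumption made for illustration, not a feature of any specific DCIM: the function `validate_change`, the valve identifiers, and the minimum-open rule are all hypothetical.

```python
from typing import List, Set


def validate_change(
    open_valves: Set[str],
    valves_to_close: Set[str],
    min_open: int,
) -> List[str]:
    """Return a list of problems with a planned change; empty means it passes.

    open_valves: valves currently open (hypothetical identifiers).
    valves_to_close: valves the planned change would close.
    min_open: minimum number of valves that must stay open.
    """
    problems = []

    # A plan that names a valve which is not open probably targets the wrong asset.
    not_open = valves_to_close - open_valves
    if not_open:
        problems.append(f"not currently open or unknown: {sorted(not_open)}")

    # Check how many valves would remain open after the change.
    remaining = open_valves - valves_to_close
    if len(remaining) < min_open:
        problems.append(
            f"only {len(remaining)} valve(s) would stay open; "
            f"at least {min_open} required"
        )
    return problems


# Hypothetical plant with three chilled water valves, two of which must stay open.
current = {"CHW-1", "CHW-2", "CHW-3"}
print(validate_change(current, {"CHW-1"}, min_open=2))           # passes
print(validate_change(current, {"CHW-1", "CHW-2"}, min_open=2))  # rejected
```

The check runs before anything is sent to the plant, so a mistaken parameter is caught at planning time instead of being discovered through rising temperatures.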

4º Enhanced Capacity Planning:

  • The predictive capacity offered by a DCIM allows for more accurate planning, avoiding overload during updates, and maintaining optimal performance under all conditions.
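The arithmetic behind avoiding overload during updates can be sketched very simply: before taking cooling units offline, compare the capacity that remains with the heat load it must absorb. The function name `cooling_headroom_kw` and all figures below are assumptions for illustration; a real DCIM would use measured loads and per-unit ratings.

```python
def cooling_headroom_kw(
    unit_capacity_kw: float,
    units_total: int,
    units_offline: int,
    it_load_kw: float,
) -> float:
    """Return remaining cooling capacity minus IT load, in kW.

    A negative result means the planned maintenance would leave the room
    without enough cooling for the current load.
    """
    if not 0 <= units_offline <= units_total:
        raise ValueError("units_offline must be between 0 and units_total")
    available_kw = unit_capacity_kw * (units_total - units_offline)
    return available_kw - it_load_kw


# Hypothetical room: four 300 kW cooling units serving an 800 kW IT load.
print(cooling_headroom_kw(300.0, 4, 1, 800.0))  # one unit offline
print(cooling_headroom_kw(300.0, 4, 2, 800.0))  # two units offline
```

In this made-up room, taking one unit offline still leaves 100 kW of headroom, while taking two offline leaves a 200 kW shortfall, so the second plan would need load shedding or a different maintenance window.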

5º Integration with Disaster Response Systems:

  • A well-integrated DCIM collaborates with disaster response systems, ensuring a smooth transition to backup data centers in emergency situations and minimizing downtime.

6º Enhanced Regulatory Compliance:

  • The detailed reports generated by a DCIM facilitate regulatory compliance, providing documented evidence of the infrastructure's status and actions taken to address any issues.

The inclusion of DCIM in the critical infrastructure management strategy emerges as a key solution to mitigate risks and strengthen resilience.

Beyond immediate corrective measures, implementing advanced management technologies like DCIM is an essential investment to ensure operational continuity, security, and reliability in critical environments such as banking data centers.

In an increasingly digital world, the adoption of innovative solutions becomes imperative to prevent future interruptions and safeguard customer trust and the integrity of the financial system.

Nowadays, having a DCIM is essential, but even more vital is having tools that can interpret in real time everything a DCIM offers. Do you know any of these tools? If not, contact us, and we will show you how to interpret all your DCIM data in real time and implement preventive actions that not only improve the sustainability of the infrastructure but also maximize its benefits.


                                                                    

                          Let It Work for you!

