Management through DCIM...

 DCIM and Uptime Operational Sustainability Certification

I'll start with a brief introduction to share my perspective on what data center operations entail, although most readers will already be familiar with them.

Data center operations are built on continuity, a topic covered extensively in industry regulations and articles. However, that coverage typically focuses on the electromechanical infrastructure.

Clearly, continuity of power and cooling creates the operating conditions for the rest of the infrastructure: the IT equipment, which consumes power and generates heat that the data center's cooling systems then remove.

I'd like to add an additional factor: the continuity of the services hosted on the IT infrastructure.

In summary, for Bjumper, data center continuity means having power and cooling available so that the IT equipment can operate and, in turn, support the services hosted on it, which are typically what generate revenue for businesses.

Colocation companies, in turn, earn revenue by providing the infrastructure and electromechanical continuity for their customers to deliver applications and services to end-users.

In essence, data center industry revenue comes from services and applications accessible to business customers or end-users. Examples range from online game subscriptions and cable TV channels to applications such as online banking and cloud-based management systems.

With that said, the goal of data center operations should not overlook the management of the IT infrastructure and, of course, its proper functioning.

Moving on to the subject of this article, we will analyze the Uptime Operational Sustainability Certification from an application and effectiveness standpoint.

Starting with a simple analysis of a data center, we can summarize day-to-day tasks by role, from leadership levels down to field operations.

 CxOs:

  1. Establish the annual budget

  2. Cost control

  3. Growth of results

  4. Define the investment plan

  5. Results analysis for presentation to the board of directors

 Electromechanical Infrastructure or Maintenance:

  1.      Keep measured values within the appropriate equipment operating thresholds

  2.      Provide power to new racks

  3.      Preventive and predictive maintenance

  4.      Corrective maintenance

  5.      Coordination with manufacturers and other suppliers to escalate support under a specified SLA (Service Level Agreement)

  6.      Renewal of components or related management. Control of spare parts inventory.

  7.      Submit resource requests to CxOs

IT Staff:

  1. Change management for servers, electronics, and network connections under a specified SLA

  2. Labeling of racks, equipment, and networking cables

  3. Submit resource requests to CxOs

  4. Remote Hands Services

If we analyze operations role by role, we lose the holistic view of operations.

As these activities show, they are all interconnected, so we need to analyze the whole rather than the parts.

The Uptime Operational Sustainability Certification could be similar to those applied in other sectors, and therefore, at Bjumper, we believe that some aspects need to be defined with more precision, as is done in other sectors.

Defining and documenting processes is the basis of any operational framework. However, automating those processes is crucial, and using technology to ensure they are actually applied is the best outcome a good operational framework can deliver.

 Next, I would like to present some situations that I believe may be familiar to many of you in order to convey the message more clearly.

  • Server name labels do not match the Excel or Word documentation.

  • Network cable labeling varies among different technicians.

  • Server locations do not match the Excel or Word documentation.

  • Some areas of the Data Center run hot even though their racks appear fairly empty, while other areas are colder.

  • IT equipment is relocated due to a lack of network ports or power.

  • Physical port names of servers, blade chassis, and especially switches differ from the names indicated in work orders.

  • Monthly manual measurement of power consumption per rack.

  • Occasional thermography scans.

  • Rework in the management of equipment additions and removals, and of cabling, due to incomplete or incorrect information.

  • Difficulty in configuring growth forecasts or requirements.

  • No quick way to know the maintenance status.
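Several of the situations above come down to documentation that no longer matches the floor. As a minimal sketch of how that comparison could be automated, the Python below reconciles a documented inventory against a field audit. The `AssetRecord` fields, the serial-number key, and the sample data are all assumptions made for illustration; they are not taken from any specific DCIM product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssetRecord:
    """One server as documented or as found on the floor (hypothetical schema)."""
    serial: str
    label: str
    rack: str
    position_u: int


def find_discrepancies(documented, audited):
    """Compare documented records with field-audit records, matched by serial."""
    doc_by_serial = {a.serial: a for a in documented}
    aud_by_serial = {a.serial: a for a in audited}
    issues = []
    for serial in sorted(doc_by_serial.keys() | aud_by_serial.keys()):
        doc = doc_by_serial.get(serial)
        aud = aud_by_serial.get(serial)
        if doc is None:
            issues.append((serial, "found in the field but not documented"))
        elif aud is None:
            issues.append((serial, "documented but not found in the field"))
        else:
            if doc.label != aud.label:
                issues.append(
                    (serial, f"label mismatch: {doc.label!r} vs {aud.label!r}")
                )
            if (doc.rack, doc.position_u) != (aud.rack, aud.position_u):
                issues.append(
                    (serial,
                     f"location mismatch: {doc.rack}/U{doc.position_u} "
                     f"vs {aud.rack}/U{aud.position_u}")
                )
    return issues


# Illustrative data only: what the spreadsheet says vs. what a walk-through found.
documented = [
    AssetRecord("SN001", "web-01", "R12", 10),
    AssetRecord("SN002", "db-01", "R12", 14),
    AssetRecord("SN003", "app-01", "R13", 5),
]
audited = [
    AssetRecord("SN001", "web-01", "R12", 10),
    AssetRecord("SN002", "db-1", "R14", 14),
    AssetRecord("SN004", "backup-01", "R13", 7),
]

for serial, problem in find_discrepancies(documented, audited):
    print(serial, "-", problem)
```

Matching on serial number rather than on the label is a deliberate choice here: labels are exactly what tends to drift between technicians, so they are treated as data to verify, not as the key.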

Certification aims to help identify the processes that a Data Center should have and the documentation required to execute them. However, it often does not detail how to carry them out, which leaves room for complying through outdated and inefficient methods that cope poorly with the amount of information to be handled and the need for immediate issue resolution.

For instance, maintenance status can be tracked in an Excel sheet to meet certification requirements. However, when you need to know that status, the file may be managed solely by individuals in the electromechanical area, and anyone else on the team who can access it still has to validate that it is indeed up-to-date. Data reliability comes into play.

If control is automated and centralized in software, the information is accessible to all stakeholders, and they can trust that it is up-to-date because a defined and automated process maintains it.
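To make the idea concrete, here is a minimal Python sketch of maintenance status computed from centralized records instead of being read from a manually updated sheet. The `MaintenanceTask` fields, the 14-day warning window, and the equipment names are assumptions for illustration only.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class MaintenanceTask:
    """A recurring preventive maintenance task (hypothetical schema)."""
    equipment: str
    interval_days: int
    last_done: Optional[date]  # None means no completion has been recorded


def task_status(task, today, warning_days=14):
    """Return 'never done', 'overdue', 'due soon' or 'ok' for one task."""
    if task.last_done is None:
        return "never done"
    next_due = task.last_done + timedelta(days=task.interval_days)
    if next_due < today:
        return "overdue"
    if next_due - today <= timedelta(days=warning_days):
        return "due soon"
    return "ok"


# Illustrative data only; "today" is fixed so the example is reproducible.
today = date(2024, 6, 1)
tasks = [
    MaintenanceTask("UPS-A", 180, date(2023, 11, 1)),
    MaintenanceTask("CRAC-1", 90, date(2024, 3, 10)),
    MaintenanceTask("GEN-1", 365, date(2024, 1, 15)),
    MaintenanceTask("PDU-7", 180, None),
]

report = {t.equipment: task_status(t, today) for t in tasks}
for equipment, status in report.items():
    print(f"{equipment}: {status}")
```

Because the status is derived from the recorded completion dates, nobody has to remember to update a status column, and every role sees the same answer.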

That's why Bjumper advocates the use of technology to automate the processes defined in the operational framework, using centralized data in a DCIM platform that consolidates the data needed by all Data Center roles. 

DCIM can receive information from the electromechanical infrastructure over various protocols and provide complete asset management for IT equipment as well as for power and network cabling. It can also include workflow management for both IT change management and maintenance operations on power and air conditioning equipment.

Concentrating this information in DCIM provides the technological foundation for automating Data Center operations and for reliably supplying the information that each role in the Data Center needs to make decisions.
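Once power, network, and space data live in one place, a question such as "which rack can take this server?" can be answered before the equipment is moved. The following Python sketch shows one way such a check could look. The `Rack` and `Equipment` fields and the sample figures are hypothetical, and the space check is a simplification that ignores whether the free rack units are contiguous.

```python
from dataclasses import dataclass


@dataclass
class Rack:
    """Consolidated view of one rack (hypothetical schema)."""
    name: str
    power_capacity_w: float
    power_used_w: float
    free_network_ports: int
    free_units: int


@dataclass
class Equipment:
    """Requirements of a device to be installed (hypothetical schema)."""
    name: str
    power_w: float
    network_ports: int
    height_u: int


def placement_blockers(rack, equipment):
    """List the resources the rack lacks for this device; empty means it fits."""
    blockers = []
    if equipment.power_w > rack.power_capacity_w - rack.power_used_w:
        blockers.append("power")
    if equipment.network_ports > rack.free_network_ports:
        blockers.append("network ports")
    # Simplification: total free units only, contiguity is not checked.
    if equipment.height_u > rack.free_units:
        blockers.append("rack space")
    return blockers


def candidate_racks(racks, equipment):
    """Names of the racks that can host the device without any blocker."""
    return [r.name for r in racks if not placement_blockers(r, equipment)]


# Illustrative data only.
racks = [
    Rack("R12", 5000, 4800, 6, 10),
    Rack("R13", 5000, 2000, 1, 20),
    Rack("R14", 5000, 3000, 8, 4),
]
server = Equipment("db-02", 450, 2, 2)

for rack in racks:
    print(rack.name, placement_blockers(rack, server) or "fits")
print("Candidates:", candidate_racks(racks, server))
```

A check like this addresses the relocations caused by a lack of network ports or power: the shortage is detected when the change is planned, not when the technician is already at the rack.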

Our role is not to certify; in fact, we are not an accredited company for certification. We are a company committed to making certifications useful in day-to-day operations and assisting in the definition, analysis, and continuous improvement of these processes.

It's simple: well-defined processes, facilitated and automated by technology, together with people who know those processes and how to use the systems, will ensure proper management of your infrastructure.


                                                                      

Let it work for you!

 
