Survival guide for a data center power shutdown
Imagine you are running a data center (DC) housing critical applications and solutions. Now imagine that
you are told that this DC has to sustain an extended power outage. The big question for all involved is,
"How much restful sleep can I expect until the DC is back online?"
What is the impact of an improperly executed DC shutdown?
Failure to shut down a DC gracefully can cause severe downtime, lost revenue and significant customer
dissatisfaction. DC outages should be avoided whenever possible. But if you find yourself facing one,
following these tips might save you a lot of headaches.
Planning: Strong medication to preserve your sleep cycle
• Visibility
• Collaboration
• Coordination
• Contingency and business continuity
For the first dose, you absolutely must have a complete understanding of the DC environment and its
dependencies. It's tremendously beneficial to have visibility into your physical, virtual, network and
storage infrastructure. Ideally, you are using IBM Cloud and Smarter Infrastructure (CSI) monitoring
products such as IBM Tivoli Monitoring (ITM), IBM Tivoli Network Manager IP Edition (ITNM-IP) and IBM
Tivoli Storage Productivity Center (TPC) to manage your DC.
Such an arduous and complex task cannot be accomplished by a single individual or team; it requires a
major collaborative effort and cross-team coordination to be successful. Knowing the roles and
responsibilities of every DC service, application and asset owner will help the data center
shutdown and restore go as smoothly as possible.
Even with the best-laid plans, some systems may not come back up. Having a strong business continuity
plan and backups for critical services will go a long way to ensure restful sleep in the nights after the
shutdown!
Execution: Shutting down with grace and style
If the team has planned and coordinated well, this step should flow easily. Knowing the sequence for
shutting down services, servers, storage and network devices is crucial. For example, shutting down
the Domain Name System (DNS) servers too early can break connectivity to your monitoring database
before you're ready to shut it down gracefully!
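
The DNS example is really a dependency-ordering problem, so here is a minimal sketch of how a shutdown order could be derived from a dependency map. The service names and the map itself are hypothetical and not taken from the CSI-IT environment; the sketch only assumes Python 3.9 or later for the standard-library graphlib module.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each service lists what it needs in order to run.
DEPENDS_ON = {
    "dns": [],
    "monitoring_db": ["dns"],
    "monitoring_server": ["monitoring_db", "dns"],
    "web_app": ["monitoring_db"],
}


def startup_order(depends_on):
    """Return services so that every dependency starts before its dependents."""
    return list(TopologicalSorter(depends_on).static_order())


def shutdown_order(depends_on):
    """Return services so that every dependent stops before what it relies on."""
    return list(reversed(startup_order(depends_on)))


if __name__ == "__main__":
    for service in shutdown_order(DEPENDS_ON):
        print("shut down:", service)
```

With a map like this, DNS always lands at the end of the shutdown order because everything else relies on it, directly or indirectly.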
The following is the general sequence that the CSI-IT team follows whenever we go through a planned
data center outage:
1. Make sure you have a successful backup of all physical and virtual systems.
2. For physical systems:
a. Shut down the application or services gracefully.
b. Shut down databases and other middleware.
c. Shut down the physical systems.
3. For virtual systems:
a. Shut down your applications or services.
b. Shut down databases and other middleware.
c. Shut down the virtual instances – virtual machines (VMs) or logical partitions (LPARs).
d. Shut down your Virtual I/O Servers (VIOS), vCenter or kernel-based virtual machine
(KVM) hosts.
e. Shut down Hardware Management Consoles (HMCs) if in use.
f. Power down cloud infrastructure racks.
4. Shut down storage infrastructure.
5. Shut down network services infrastructure.
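
As a rough illustration, the sequence above can be written down as an ordered runbook that refuses to start without verified backups and stops at the first phase that fails. This is a sketch under stated assumptions, not the CSI-IT team's tooling: the physical and virtual tracks are merged into one list for brevity, and the function name, phase names and placeholder actions are all hypothetical.

```python
from typing import Callable, List, Tuple

# A phase is a name plus an action that returns True on success (placeholder).
Phase = Tuple[str, Callable[[], bool]]

# Phase names paraphrase the sequence above; merging the physical and
# virtual tracks into one list is a simplification made for this sketch.
SHUTDOWN_SEQUENCE = [
    "applications and services",
    "databases and middleware",
    "physical systems and virtual instances",
    "virtualization hosts and management consoles",
    "cloud infrastructure racks",
    "storage infrastructure",
    "network services infrastructure",
]


def run_shutdown(phases: List[Phase], backups_verified: bool) -> List[str]:
    """Run phases in order; return the names of the phases that completed."""
    if not backups_verified:
        raise RuntimeError("Refusing to shut down: backups are not verified")
    completed = []
    for name, action in phases:
        if not action():
            break  # stop here so an operator can investigate before going on
        completed.append(name)
    return completed


if __name__ == "__main__":
    demo = [(name, lambda: True) for name in SHUTDOWN_SEQUENCE]
    print(run_shutdown(demo, backups_verified=True))
```

Stopping at the first failed phase keeps the lower layers (storage and network) up while someone looks at what went wrong.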
Power's back up; is the data center?
Just like the shutdown of the DC, powering it back up must be systematic and, essentially, the reverse of
the shutdown sequence. This is where the value of CSI monitoring products shines through: they provide
insight into service, server, appliance and device state, which helps expedite the restoration of
services.
A key caveat when powering racks back online is to pace yourself so as not to trip a power breaker and
risk an unexpected outage for the DC. It's never fun recovering from the chaos resulting from a lack of
patience!
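
One way to build that patience into the process is to power racks on in small batches with a pause between them. The sketch below is only an illustration: the batch size, the settle time and the power_on callback are assumptions, and the right values depend on your own breakers and power distribution.

```python
import time
from typing import Callable, List, Sequence


def power_up_racks(
    racks: Sequence[str],
    power_on: Callable[[str], None],
    batch_size: int = 2,           # assumed value, tune to your breaker capacity
    settle_seconds: float = 60.0,  # assumed pause to let inrush current subside
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Power racks on in batches, pausing between batches; return the order used."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    groups = [racks[i:i + batch_size] for i in range(0, len(racks), batch_size)]
    powered = []
    for index, group in enumerate(groups):
        for rack in group:
            power_on(rack)
            powered.append(rack)
        if index < len(groups) - 1:
            sleep(settle_seconds)  # no pause needed after the final batch
    return powered


if __name__ == "__main__":
    racks = ["rack-1", "rack-2", "rack-3"]  # hypothetical rack names
    power_up_racks(racks, lambda r: print("power on:", r), settle_seconds=1.0)
```

Passing the sleep function in as a parameter makes it easy to dry-run the plan without actually waiting.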
Conclusion
Given the magnitude and complexity of the CSI data center environments, power outage events require
close coordination, communication and flawless execution to be successful.
In our own experience with a recent power shutdown, CSI-IT followed these rules of thumb and
managed to minimize impact to the DC. We shut down over 4,000 physical and virtual assets
and experienced a less than 0.001 percent failure rate!
How will you avert sleepless nights when dealing with your own data center shutdowns? This blog post
was a collaborative effort by the CSI-IT Lab and Network Services Organization. Please leave a comment
or contact us on Twitter (@ShakeMan_A or @GigaManz) or by email (salmuabe@us.ibm.com,
hestrada@us.ibm.com or mfarid@us.ibm.com).