2. Data Center Evolution
[Timeline, 1960–2010: compute evolution from mainframes with terminals, through client/server, to Internet computing with thin clients (HTTP); network evolution from TCP/IP through data center networking and content networking to continuous data center network optimization]
Networked data center phases:
1. Consolidation
2. Integration
3. Virtualization
4. High Availability
3. N-Tier Applications
[Diagram: a front-end network connecting web servers, application servers, and DB servers; application/server optimization with a content switch and cache; mainframe operations and IP communications]
4. Today’s Data Center: Integration of Many Systems and Services
[Diagram: a scalable infrastructure integrating application and server optimization; FC SANs with FC switches, VSANs, RAID storage, and tape; NAS; data center security with firewalls and IDS; a resilient IP metro network (DWDM/SONET/Ethernet); and distributed, secondary, and DR data centers reached over the MAN, WAN, and Internet]
5. Distributed Data Centers
• Required for disaster recovery and business continuance
• Avoid a single, concentrated data repository
• High availability of applications and data access
• Load balancing together with performance scalability
• Better response and optimal content routing: proximity to clients
9. Disaster Recovery
• Recovery of data and resumption of service: ensuring the business can recover and continue after a failure or disaster
• Ability of a business to adapt, change, and continue when confronted with various outside impacts
• Mitigating the impact of a disaster
11. Disaster Recovery Planning
• Business Impact Analysis (BIA)
Determines the impacts of various disasters on specific business functions and company assets
• Risk analysis
Identifies important functions and assets that are critical to the company’s operations
• Disaster Recovery Plan (DRP)
Restores operability of the target systems, applications, or computing facility at the secondary data center after the disaster
12. Disaster Recovery Objectives
• Recovery Point Objective (RPO)
The point in time (prior to the outage) to which systems and data must be restored; the tolerable loss of data in the event of a disaster or failure
Reflects the impact of data loss and the cost associated with that loss
• Recovery Time Objective (RTO)
The period of time after an outage within which systems and data must be restored to the predetermined RPO; the maximum tolerable outage time
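To make the two objectives concrete, here is a minimal arithmetic sketch (illustrative only, not from the deck; the replication interval and recovery-step durations are assumed values). With periodic replication, the worst-case RPO is one full replication interval, and the RTO is the sum of the recovery steps:

```python
# Illustrative RPO/RTO arithmetic (assumed values, not from the deck).

def worst_case_rpo(replication_interval_min: float) -> float:
    """With periodic replication, the worst case loses one full interval
    of writes: a disaster just before the next replication cycle."""
    return replication_interval_min

def rto(detect_min: float, failover_min: float, validate_min: float) -> float:
    """RTO is the end-to-end time until systems are back at the
    predetermined RPO: detection + failover + validation."""
    return detect_min + failover_min + validate_min

print(f"RPO (15-min periodic replication): {worst_case_rpo(15):.0f} min of data at risk")
print(f"RTO (5 detect + 20 failover + 10 validate): {rto(5, 20, 10):.0f} min")
```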
13. Recovery Point/Time vs. Cost
[Chart: a disaster strikes at time t1; critical data is recovered to its state at time t0 (the recovery point), and systems are recovered and operational at time t2 (the recovery time)]
Recovery point options, from seconds to days of data loss (decreasing cost): synchronous replication, asynchronous replication, periodic replication, tape backup
Recovery time options, from seconds to weeks of outage (decreasing cost): extended cluster, manual migration, tape restore
Smaller RPO/RTO: higher cost (replication, hot standby)
Larger RPO/RTO: lower cost (tape backup/restore, cold standby)
18. Site Failures
[Diagram: data centers reached through the Internet via Service Provider A and Service Provider B]
• Partial site failure
-Application maintenance
-Application migration
-Scheduled application downtime
-DR exercise
• Complete site failure
-Disaster
19. Warm Standby
• A data center equipped with hardware and communications interfaces capable of providing backup operating support
• The latest backups from the production data center must be delivered
• Network access needs to be activated
• Applications need to be started manually
21. Hot Standby
• A data center that is environmentally ready and has sufficient hardware and software to provide data processing service with little downtime
• A hot backup offers disaster recovery with little or no human intervention
• Application data is replicated from the primary site
• A hot backup site provides better RTO/RPO than warm standby but costs more to implement
• Supports business continuance
23. Active/Active DR Design: Multiple Tiers of Application
[Diagram: presentation, application, and storage tiers active in both data centers, reached through the Internet via Service Provider A and Service Provider B]
25. Site Selection Mechanisms
• Site selection mechanisms depend on the
technology or mix of technologies adopted
for request routing:
1. HTTP redirect
2. DNS-based
3. L3 Routing with Route Health Injection (RHI)
• The health of servers and/or applications needs to be taken into account
• Optionally, other metrics (such as load) can be measured and used for a better selection; a minimal selection sketch follows
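As a toy illustration of health-aware selection (not from the deck; the site names, endpoints, and load metric are assumed), the sketch below probes each site with a TCP connect as a stand-in for a keepalive and returns the healthy site with the lowest reported load:

```python
# Minimal health-aware site selection sketch (assumed sites/endpoints).
import socket

SITES = {
    "dc1": ("dc1.example.com", 80),  # hypothetical data center endpoints
    "dc2": ("dc2.example.com", 80),
}

def is_healthy(host: str, port: int, timeout: float = 1.0) -> bool:
    """Keepalive stand-in: a TCP connect to the service port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def select_site(loads: dict[str, float]) -> str | None:
    """Return the least-loaded healthy site, or None if all are down."""
    healthy = [s for s, (h, p) in SITES.items() if is_healthy(h, p)]
    return min(healthy, key=lambda s: loads.get(s, 0.0), default=None)

print(select_site({"dc1": 0.7, "dc2": 0.3}))
```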
27. DNS-Based Site Selection: Traffic Flow
[Diagram: a client resolving http://www.cisco.com/ sends its query to a DNS proxy, which walks the root/.com name server and the authoritative name servers for cisco.com and www.cisco.com (steps 1–10); the authoritative server maintains keepalives to Data Center 1 and Data Center 2 and answers over UDP:53, after which the client opens its HTTP connection (TCP:80) to the selected data center]
28. Route Health Injection Implementation
[Diagram: Location B advertises the preferred, low-cost route for VIP x.y.w.z, while Location A advertises a backup, very-high-cost route; clients A and B reach the VIP through routers 10–13, which follow the lowest-cost advertisement]
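To illustrate the mechanism (a simulation sketch, not device configuration; the route table, costs, and health flag are assumed), the snippet below shows how withdrawing the low-cost route when a site's health check fails makes routers fall back to the backup advertisement:

```python
# RHI concept sketch: routers pick the lowest-cost advertised route;
# a site stops advertising the VIP when its servers fail health checks.

advertisements = {
    "location_b": {"vip": "x.y.w.z", "cost": 10, "healthy": True},    # preferred
    "location_a": {"vip": "x.y.w.z", "cost": 1000, "healthy": True},  # backup
}

def best_route(vip: str) -> str | None:
    """Only healthy sites inject the VIP route; routers choose the
    lowest cost among the routes that remain."""
    live = {site: ad for site, ad in advertisements.items()
            if ad["vip"] == vip and ad["healthy"]}
    return min(live, key=lambda s: live[s]["cost"], default=None)

print(best_route("x.y.w.z"))                      # location_b (preferred)
advertisements["location_b"]["healthy"] = False   # health check fails
print(best_route("x.y.w.z"))                      # location_a takes over
```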
30. Cluster Overview
[Diagram: clustered web servers, application servers, and database servers]
Load-balancing cluster: multiple copies of the same application run against the same data set, usually read-only
High-availability cluster: multiple copies of an application require access to a common data repository, usually read-write
Clustering provides benefits for availability, reliability, scalability, and manageability
31. High Availability Cluster Design
[Diagram: each node runs a stack of application, cluster software, cluster enabler, and OS]
Public network: client/application requests
Private network: interconnection between nodes
Storage disk: shared storage array, NAS, or SAN
32. HA Cluster Application View
[Diagram: Node1 and Node2]
Active/standby: the standby takes over when the active node fails; two-node or multi-node (see the failover sketch after this list)
Active/active: database requests are load-balanced across all nodes; a lock mechanism ensures data integrity
Shared everything: each node mounts all storage resources, providing a single layout reference system for all nodes
Shared nothing: each node mounts only its “semi-private” storage; data stored on a peer’s storage is accessed via peer-to-peer communication
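A minimal active/standby failover loop, assuming a heartbeat message arrives periodically from the active node over the private network (the class, timeout, and promotion logic are illustrative, not from the deck):

```python
# Toy active/standby failover: the standby promotes itself when
# heartbeats from the active node stop arriving (illustrative only).
import time

HEARTBEAT_TIMEOUT = 3.0  # seconds without a heartbeat before failover

class StandbyNode:
    def __init__(self):
        self.last_heartbeat = time.monotonic()
        self.role = "standby"

    def on_heartbeat(self):
        """Called whenever a heartbeat arrives over the private network."""
        self.last_heartbeat = time.monotonic()

    def check(self):
        """Promote to active if the active node has gone silent."""
        if self.role == "standby" and \
           time.monotonic() - self.last_heartbeat > HEARTBEAT_TIMEOUT:
            self.role = "active"   # take over the service and storage
        return self.role

node = StandbyNode()
time.sleep(0.1)
print(node.check())  # still "standby" while heartbeats are fresh
```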
33. Geo-Clusters: Clusters That Span Multiple Data Centers
[Diagram: Node1 in the local data center and Node2 in the remote data center, connected over the WAN; disk replication (synchronous or asynchronous) costs 2 × RTT per write]
Considerations and challenges: split brain, L2 heartbeats, storage
34. HA Cluster Challenges: Split-Brain
[Diagram: Node1 and Node2 both active, concurrently writing to the same disk]
Split-brain: active nodes concurrently accessing the same disk leads to data corruption
Resolution: use a quorum, a tie-breaker for gaining access to the disk (see the sketch below)
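A minimal sketch of quorum-based tie-breaking (illustrative; the vote count and witness are assumed): a node may take the disk only if it can assemble a strict majority of votes, so two partitioned nodes can never both win.

```python
# Quorum tie-breaker sketch: only a strict majority may own the disk,
# so a partition can never produce two writers (illustrative values).

TOTAL_VOTES = 3  # e.g., two nodes plus a quorum disk/witness

def may_own_disk(votes_reachable: int) -> bool:
    """A node takes ownership only with a strict majority of votes."""
    return votes_reachable > TOTAL_VOTES // 2

# During a split, each side counts the votes it can still reach:
print(may_own_disk(2))  # True: node + quorum witness -> takes the disk
print(may_own_disk(1))  # False: isolated node must stand down
```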
35. Layer 2 Heartbeats
[Diagram: Node1 (local data center) and Node2 (remote data center) joined by public and private Layer 2 networks over the WAN, with synchronous or asynchronous disk replication]
Extended L2 network: L2 adjacency is required for node heartbeats, but extending a VLAN across sites is hazardous
Resolution: L3 capability for cluster heartbeats, or EoMPLS to carry L2 heartbeats across DR sites
36. Storage Disk Zoning
[Diagram: an active node (read-write) and a standby node (write-disabled) attached to replicated disk arrays (sym1320, sym1291) across an extended SAN]
Storage zoning: the storage disk array must be taken over when the active node fails
Resolution: the cluster software communicates with the cluster enabler, which instructs the disk array to perform a failover when a failure is detected
37. Storage for Applications
• Presentation tier
Unrelated small data files commonly stored on
internal disks
Manual distribution
• Application processing tier
Transitional, unrelated data
Small files residing on file systems
May use RAID to spread data over multiple disks
• Storage tier
Large, permanent data files or raw data
Large batch updates, most likely real time
Log and data on separate volumes
38. Replication: Modes of Operation
Synchronous
All data is written to the local and remote arrays before the I/O is complete and acknowledged to the host
Speed of light c = 3 × 10⁸ m/s (vacuum) ≈ 3.3 µs/km; speed through fiber ≈ ⅔ c ≈ 5 µs/km; at 2 round trips per write I/O, that is 20 µs per km of separation
Asynchronous
The write is acknowledged and the I/O is complete after the write to the local array; changes (writes) are replicated to the remote array asynchronously
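The distance arithmetic above translates directly into added write latency; the sketch below (illustrative; the distances are chosen arbitrarily) computes the propagation penalty per synchronous write:

```python
# Propagation delay added to each synchronous write (illustrative).
US_PER_KM_FIBER = 5.0   # ~2/3 c through fiber, one way
RTT_PER_WRITE = 2       # two round trips per write I/O (per the slide)

def sync_write_penalty_us(distance_km: float) -> float:
    """Added latency per write: 2 RTTs = 4 one-way fiber traversals."""
    return RTT_PER_WRITE * 2 * US_PER_KM_FIBER * distance_km

for km in (10, 100, 1000):
    print(f"{km:>5} km -> {sync_write_penalty_us(km)/1000:.1f} ms per write")
```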
39. Synchronous vs. Asynchronous Trade-Off
Synchronous: impact on application performance; distance limited (are both sites within the same threat radius?); no data loss
Asynchronous: no application performance impact; unlimited distance (second site outside the threat radius); exposure to possible data loss
Enterprises must evaluate the trade-offs:
• Maximum tolerable distance, ascertained by assessing each application
• Cost of data loss
40. Data Replication with DB Example
[Diagram: control files, datafiles, and redo log files]
• Control files identify the other files making up the database (DB name, creation date, backups performed, redo log time period, datafile state) and record the content and state of the db
• Datafiles hold the table spaces, indexes, and data dictionary, and are only updated periodically
• Redo logs record db changes resulting from transactions
Used to play back changes that may not have been written to the datafiles when the failure occurred
Typically archived as they fill, to local and DR-site destinations
41. Data Replication with DB Example (Cont.)
[Diagram: a hot backup of the datafiles and control files is taken at time t0; online and archived redo logs cover the interval up to the failure]
Failure or disaster occurs at time t1:
• Media failure (e.g., disk)
• Human error (datafile deletion)
• Database corruption
The database is restored to its state at the time of failure (t1) by:
1. Restoring the control files and datafiles from the last hot backup (time t0)
2. Sequentially replaying the changes from subsequent redo logs (archived and online), i.e., the changes made between t0 and t1 (see the sketch below)
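A schematic of that two-step recovery (purely illustrative; the backup/log structures and function names are invented for the sketch, not any particular database's API):

```python
# Point-in-time recovery sketch: restore the last hot backup, then
# replay redo records up to the failure time (illustrative structures).

def restore(backup: dict) -> dict:
    """Step 1: restore control files/datafiles from the t0 hot backup."""
    return dict(backup["datafiles"])

def replay(db: dict, redo_logs: list, until: float) -> dict:
    """Step 2: apply redo records in order, up to the failure time t1."""
    for t, key, value in redo_logs:        # records sorted by time
        if t <= until:
            db[key] = value
    return db

backup = {"taken_at": 0.0, "datafiles": {"balance": 100}}
redo_logs = [(1.0, "balance", 120), (2.0, "balance", 90)]
print(replay(restore(backup), redo_logs, until=2.0))  # state as of t1
```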
42. Data Replication with DB Example (Cont.)
[Diagram: cyclic redo logs at the primary site are synchronously replicated for zero loss over the SAN extension transport; archive logs (a copy of every committed transaction) and point-in-time database copies, taken at time t0 when the DB is quiescent, are replicated/copied to the secondary site alongside earlier DB backups]
A mixture of sync and async replication technologies is commonly used:
• Usually only the redo logs are synchronously replicated to the remote site
• Archive logs are created from a redo log and copied when the redo log switches
• Point-in-time (PiT) copies of the datafiles and control files are copied periodically (e.g., nightly)
44. Data Center Transport Options
[Chart: transport options by increasing distance, from data center/campus through metro and regional to national reach: dark fiber, CWDM, and DWDM (limited by optics power budget and by BB_Credits), then SONET/SDH, then MDS 9000 FCIP; synchronous replication (2 Gbps lambda, 1 Gbps+ subrate, Metro Ethernet) is confined to the shorter reaches, while asynchronous replication (1 Gbps+, FCIP) extends to national distances]
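Since the chart flags BB_Credits as a limit for long synchronous links: FC flow control permits one outstanding frame per buffer credit, so sustaining line rate requires enough credits to cover the frames in flight over the round trip. A rough sizing sketch (an approximation for illustration, not from the deck; it ignores line coding overhead):

```python
# Rough BB_Credit sizing for a long FC link (approximation only).
US_PER_KM = 5.0          # one-way fiber propagation, ~2/3 c

def bb_credits_needed(distance_km: float, gbps: float,
                      frame_bytes: int = 2148) -> int:
    """Credits must cover the frames in flight over the round trip."""
    rtt_us = 2 * distance_km * US_PER_KM
    frame_time_us = frame_bytes * 8 / (gbps * 1000)  # serialization time
    return int(rtt_us / frame_time_us) + 1

print(bb_credits_needed(100, 2.0))  # ~100 km at 2 Gbps -> roughly 1/km
```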
45. Cisco Data Center Vision
CONSOLIDATION: centralization and standardization to lower costs and improve efficiency and uptime
VIRTUALIZATION: management of resources independent of the underlying physical infrastructure, to increase utilization, efficiency, and flexibility
AUTOMATION: dynamic provisioning and autonomic Information Lifecycle Management (ILM) to enable business agility
[Diagram: an on-demand, service-oriented Intelligent Information Network spanning compute (server fabric network, HPC cluster, grid), storage (SAN, storage network), and the LAN/MAN/WAN data network, aligning enterprise applications with business policies]
46. Today’s Data Centers Require an Architectural Approach to…
Protect with business resilience
• Tighten security
• Improve business continuance
Optimize with consolidation
• Improve operational efficiency and resource utilization
• Lower complexity and cost of ownership
Grow towards a services-oriented infrastructure
• Align virtualized resources with business demands
• Automate the infrastructure to respond dynamically
47. The Big Picture—The Cisco Data Center
[Diagram: the emerging data center architecture. The Catalyst 6500 family provides embedded intelligent network services (SSL termination, VPN termination, firewall services, intrusion detection, server balancing). The Topspin family provides server fabric switching with embedded intelligent virtualization services (server virtualization, virtual I/O, low-latency RDMA services, clustering) for the enterprise grid (grid/utility computing across UNIX/Windows blade servers and virtual private servers). The MDS 9000 family provides enterprise SAN switching with multiprotocol gateway services (enterprise tape and disk storage, mainframe connectivity, NAS) and embedded intelligent storage services (fabric routing services, data replication services, storage virtualization, virtual fabrics/VSANs) across multiple fabrics]