2. Data Center Evolution
[Timeline, 1960–2010: compute evolution from mainframes with terminals, through client/server, to Internet computing with thin clients (HTTP); network evolution from TCP/IP through data center networking and content networking to continuous data center network optimization]
Networked data center phases:
1. Consolidation
2. Integration
3. Virtualization
4. High Availability
3. N-Tier Applications
[Diagram: a front-end network connecting web servers, application servers, and DB servers; application/server optimization with a content switch and cache; mainframe operations and IP communications]
4. Today’s Data Center: Integration of Many Systems and Services
[Diagram: a scalable infrastructure integrating application and server optimization; FC SANs with FC switches, VSANs, RAID storage, and tape; NAS; data center security with firewalls and IDS; a resilient IP metro network (DWDM/SONET/Ethernet); and distributed, secondary, and DR data centers reached over the MAN, WAN, and Internet]
5. Distributed Data Centers
• Required for disaster recovery and business continuance
• Avoid a single, concentrated data repository
• High availability of applications and data access
• Load balancing together with performance scalability
• Better response and optimal content routing: proximity to clients
9. Disaster Recovery
• Recovery of data and resumption of service: ensuring the business can recover and continue after a failure or disaster
• Ability of a business to adapt, change, and continue when confronted with various outside impacts
• Mitigating the impact of a disaster
11. Disaster Recovery Planning
• Business Impact Analysis (BIA)
Determines the impacts of various disasters on specific business functions and company assets
• Risk analysis
Identifies important functions and assets that are critical to the company’s operations
• Disaster Recovery Plan (DRP)
Restores operability of the target systems, applications, or computing facility at the secondary data center after the disaster
12. Disaster Recovery Objectives
• Recovery Point Objective (RPO)
The point in time (prior to the outage) to which systems and data must be restored; the tolerable loss of data in the event of a disaster or failure
Reflects the impact of data loss and the cost associated with that loss
• Recovery Time Objective (RTO)
The period of time after an outage within which systems and data must be restored to the predetermined RPO; the maximum tolerable outage time
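To make the two objectives concrete, here is a minimal arithmetic sketch (illustrative only, not from the deck; the replication interval and recovery-step durations are assumed values). With periodic replication, the worst-case RPO is one full replication interval, and the RTO is the sum of the recovery steps:

```python
# Illustrative RPO/RTO arithmetic (assumed values, not from the deck).

def worst_case_rpo(replication_interval_min: float) -> float:
    """With periodic replication, the worst case loses one full interval
    of writes: a disaster just before the next replication cycle."""
    return replication_interval_min

def rto(detect_min: float, failover_min: float, validate_min: float) -> float:
    """RTO is the end-to-end time until systems are back at the
    predetermined RPO: detection + failover + validation."""
    return detect_min + failover_min + validate_min

print(f"RPO (15-min periodic replication): {worst_case_rpo(15):.0f} min of data at risk")
print(f"RTO (5 detect + 20 failover + 10 validate): {rto(5, 20, 10):.0f} min")
```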
13. Recovery Point/Time vs. Cost
[Chart: a disaster strikes at time t1; critical data is recovered to its state at time t0 (the recovery point), and systems are recovered and operational at time t2 (the recovery time)]
Recovery point options, from seconds to days of data loss (decreasing cost): synchronous replication, asynchronous replication, periodic replication, tape backup
Recovery time options, from seconds to weeks of outage (decreasing cost): extended cluster, manual migration, tape restore
Smaller RPO/RTO: higher cost (replication, hot standby)
Larger RPO/RTO: lower cost (tape backup/restore, cold standby)
18. Site Failures
[Diagram: data centers reached through the Internet via Service Provider A and Service Provider B]
• Partial site failure
-Application maintenance
-Application migration
-Scheduled application downtime
-DR exercise
• Complete site failure
-Disaster
19. Warm Standby
• A data center equipped with hardware and communications interfaces capable of providing backup operating support
• The latest backups from the production data center must be delivered
• Network access needs to be activated
• Applications need to be started manually
21. Hot Standby
• A data center that is environmentally ready and has sufficient hardware and software to provide data processing service with little downtime
• A hot backup offers disaster recovery with little or no human intervention
• Application data is replicated from the primary site
• A hot backup site provides better RTO/RPO than warm standby but costs more to implement
• Supports business continuance
23. Active/Active DR Design: Multiple Tiers of Application
[Diagram: presentation, application, and storage tiers active in both data centers, reached through the Internet via Service Provider A and Service Provider B]
25. Site Selection Mechanisms
• Site selection mechanisms depend on the
technology or mix of technologies adopted
for request routing:
1. HTTP redirect
2. DNS-based
3. L3 Routing with Route Health Injection (RHI)
• The health of servers and/or applications needs to be taken into account
• Optionally, other metrics (such as load) can be measured and used for a better selection; a minimal selection sketch follows
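As a toy illustration of health-aware selection (not from the deck; the site names, endpoints, and load metric are assumed), the sketch below probes each site with a TCP connect as a stand-in for a keepalive and returns the healthy site with the lowest reported load:

```python
# Minimal health-aware site selection sketch (assumed sites/endpoints).
import socket

SITES = {
    "dc1": ("dc1.example.com", 80),  # hypothetical data center endpoints
    "dc2": ("dc2.example.com", 80),
}

def is_healthy(host: str, port: int, timeout: float = 1.0) -> bool:
    """Keepalive stand-in: a TCP connect to the service port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def select_site(loads: dict[str, float]) -> str | None:
    """Return the least-loaded healthy site, or None if all are down."""
    healthy = [s for s, (h, p) in SITES.items() if is_healthy(h, p)]
    return min(healthy, key=lambda s: loads.get(s, 0.0), default=None)

print(select_site({"dc1": 0.7, "dc2": 0.3}))
```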
27. DNS-Based Site Selection: Traffic Flow
[Diagram: a client resolving http://www.cisco.com/ sends its query to a DNS proxy, which walks the root/.com name server and the authoritative name servers for cisco.com and www.cisco.com (steps 1–10); the authoritative server maintains keepalives to Data Center 1 and Data Center 2 and answers over UDP:53, after which the client opens its HTTP connection (TCP:80) to the selected data center]
28. Route Health Injection Implementation
[Diagram: Location B advertises the preferred, low-cost route for VIP x.y.w.z, while Location A advertises a backup, very-high-cost route; clients A and B reach the VIP through routers 10–13, which follow the lowest-cost advertisement]
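To illustrate the mechanism (a simulation sketch, not device configuration; the route table, costs, and health flag are assumed), the snippet below shows how withdrawing the low-cost route when a site's health check fails makes routers fall back to the backup advertisement:

```python
# RHI concept sketch: routers pick the lowest-cost advertised route;
# a site stops advertising the VIP when its servers fail health checks.

advertisements = {
    "location_b": {"vip": "x.y.w.z", "cost": 10, "healthy": True},    # preferred
    "location_a": {"vip": "x.y.w.z", "cost": 1000, "healthy": True},  # backup
}

def best_route(vip: str) -> str | None:
    """Only healthy sites inject the VIP route; routers choose the
    lowest cost among the routes that remain."""
    live = {site: ad for site, ad in advertisements.items()
            if ad["vip"] == vip and ad["healthy"]}
    return min(live, key=lambda s: live[s]["cost"], default=None)

print(best_route("x.y.w.z"))                      # location_b (preferred)
advertisements["location_b"]["healthy"] = False   # health check fails
print(best_route("x.y.w.z"))                      # location_a takes over
```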
30. Cluster Overview
[Diagram: clustered web servers, application servers, and database servers]
Load-balancing cluster: multiple copies of the same application run against the same data set, usually read-only
High-availability cluster: multiple copies of an application require access to a common data repository, usually read-write
Clustering provides benefits for availability, reliability, scalability, and manageability
31. High Availability Cluster Design
[Diagram: each node runs a stack of application, cluster software, cluster enabler, and OS]
Public network: client/application requests
Private network: interconnection between nodes
Storage disk: shared storage array, NAS, or SAN
32. HA Cluster Application View
[Diagram: Node1 and Node2]
Active/standby: the standby takes over when the active node fails; two-node or multi-node (see the failover sketch after this list)
Active/active: database requests are load-balanced across all nodes; a lock mechanism ensures data integrity
Shared everything: each node mounts all storage resources, providing a single layout reference system for all nodes
Shared nothing: each node mounts only its “semi-private” storage; data stored on a peer’s storage is accessed via peer-to-peer communication
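A minimal active/standby failover loop, assuming a heartbeat message arrives periodically from the active node over the private network (the class, timeout, and promotion logic are illustrative, not from the deck):

```python
# Toy active/standby failover: the standby promotes itself when
# heartbeats from the active node stop arriving (illustrative only).
import time

HEARTBEAT_TIMEOUT = 3.0  # seconds without a heartbeat before failover

class StandbyNode:
    def __init__(self):
        self.last_heartbeat = time.monotonic()
        self.role = "standby"

    def on_heartbeat(self):
        """Called whenever a heartbeat arrives over the private network."""
        self.last_heartbeat = time.monotonic()

    def check(self):
        """Promote to active if the active node has gone silent."""
        if self.role == "standby" and \
           time.monotonic() - self.last_heartbeat > HEARTBEAT_TIMEOUT:
            self.role = "active"   # take over the service and storage
        return self.role

node = StandbyNode()
time.sleep(0.1)
print(node.check())  # still "standby" while heartbeats are fresh
```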
33. Geo-Clusters: Clusters That Span Multiple Data Centers
[Diagram: Node1 in the local data center and Node2 in the remote data center, connected over the WAN; disk replication (synchronous or asynchronous) costs 2 × RTT per write]
Considerations and challenges: split brain, L2 heartbeats, storage
34. HA Cluster Challenges: Split-Brain
[Diagram: Node1 and Node2 both active, concurrently writing to the same disk]
Split-brain: active nodes concurrently accessing the same disk leads to data corruption
Resolution: use a quorum, a tie-breaker for gaining access to the disk (see the sketch below)
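A minimal sketch of quorum-based tie-breaking (illustrative; the vote count and witness are assumed): a node may take the disk only if it can assemble a strict majority of votes, so two partitioned nodes can never both win.

```python
# Quorum tie-breaker sketch: only a strict majority may own the disk,
# so a partition can never produce two writers (illustrative values).

TOTAL_VOTES = 3  # e.g., two nodes plus a quorum disk/witness

def may_own_disk(votes_reachable: int) -> bool:
    """A node takes ownership only with a strict majority of votes."""
    return votes_reachable > TOTAL_VOTES // 2

# During a split, each side counts the votes it can still reach:
print(may_own_disk(2))  # True: node + quorum witness -> takes the disk
print(may_own_disk(1))  # False: isolated node must stand down
```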
35. Layer 2 Heartbeats
[Diagram: Node1 (local data center) and Node2 (remote data center) joined by public and private Layer 2 networks over the WAN, with synchronous or asynchronous disk replication]
Extended L2 network: L2 adjacency is required for node heartbeats, but extending a VLAN across sites is hazardous
Resolution: L3 capability for cluster heartbeats, or EoMPLS to carry L2 heartbeats across DR sites
36. Storage Disk Zoning
[Diagram: an active node (read-write) and a standby node (write-disabled) attached to replicated disk arrays (sym1320, sym1291) across an extended SAN]
Storage zoning: the storage disk array must be taken over when the active node fails
Resolution: the cluster software communicates with the cluster enabler, which instructs the disk array to perform a failover when a failure is detected
37. Storage for Applications
• Presentation tier
Unrelated small data files commonly stored on
internal disks
Manual distribution
• Application processing tier
Transitional, unrelated data
Small files residing on file systems
May use RAID to spread data over multiple disks
• Storage tier
Large, permanent data files or raw data
Large batch updates, most likely real time
Log and data on separate volumes
38. Replication: Modes of Operation
Synchronous
All data is written to the local and remote arrays before the I/O is complete and acknowledged to the host
Speed of light c = 3 × 10⁸ m/s (vacuum) ≈ 3.3 µs/km; speed through fiber ≈ ⅔ c ≈ 5 µs/km; at 2 round trips per write I/O, that is 20 µs per km of separation
Asynchronous
The write is acknowledged and the I/O is complete after the write to the local array; changes (writes) are replicated to the remote array asynchronously
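The distance arithmetic above translates directly into added write latency; the sketch below (illustrative; the distances are chosen arbitrarily) computes the propagation penalty per synchronous write:

```python
# Propagation delay added to each synchronous write (illustrative).
US_PER_KM_FIBER = 5.0   # ~2/3 c through fiber, one way
RTT_PER_WRITE = 2       # two round trips per write I/O (per the slide)

def sync_write_penalty_us(distance_km: float) -> float:
    """Added latency per write: 2 RTTs = 4 one-way fiber traversals."""
    return RTT_PER_WRITE * 2 * US_PER_KM_FIBER * distance_km

for km in (10, 100, 1000):
    print(f"{km:>5} km -> {sync_write_penalty_us(km)/1000:.1f} ms per write")
```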
39. Synchronous vs. Asynchronous Trade-Off
Synchronous: impact on application performance; distance limited (are both sites within the same threat radius?); no data loss
Asynchronous: no application performance impact; unlimited distance (second site outside the threat radius); exposure to possible data loss
Enterprises must evaluate the trade-offs:
• Maximum tolerable distance, ascertained by assessing each application
• Cost of data loss
40. Data Replication with DB Example
[Diagram: control files, datafiles, and redo log files]
• Control files identify the other files making up the database (DB name, creation date, backups performed, redo log time period, datafile state) and record the content and state of the db
• Datafiles hold the table spaces, indexes, and data dictionary, and are only updated periodically
• Redo logs record db changes resulting from transactions
Used to play back changes that may not have been written to the datafiles when the failure occurred
Typically archived as they fill, to local and DR-site destinations
41. Data Replication with DB Example (Cont.)
[Diagram: a hot backup of the datafiles and control files is taken at time t0; online and archived redo logs cover the interval up to the failure]
Failure or disaster occurs at time t1:
• Media failure (e.g., disk)
• Human error (datafile deletion)
• Database corruption
The database is restored to its state at the time of failure (t1) by:
1. Restoring the control files and datafiles from the last hot backup (time t0)
2. Sequentially replaying the changes from subsequent redo logs (archived and online), i.e., the changes made between t0 and t1 (see the sketch below)
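A schematic of that two-step recovery (purely illustrative; the backup/log structures and function names are invented for the sketch, not any particular database's API):

```python
# Point-in-time recovery sketch: restore the last hot backup, then
# replay redo records up to the failure time (illustrative structures).

def restore(backup: dict) -> dict:
    """Step 1: restore control files/datafiles from the t0 hot backup."""
    return dict(backup["datafiles"])

def replay(db: dict, redo_logs: list, until: float) -> dict:
    """Step 2: apply redo records in order, up to the failure time t1."""
    for t, key, value in redo_logs:        # records sorted by time
        if t <= until:
            db[key] = value
    return db

backup = {"taken_at": 0.0, "datafiles": {"balance": 100}}
redo_logs = [(1.0, "balance", 120), (2.0, "balance", 90)]
print(replay(restore(backup), redo_logs, until=2.0))  # state as of t1
```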
42. Data Replication with DB Example (Cont.)
[Diagram: cyclic redo logs at the primary site are synchronously replicated for zero loss over the SAN extension transport; archive logs (a copy of every committed transaction) and point-in-time database copies, taken at time t0 when the DB is quiescent, are replicated/copied to the secondary site alongside earlier DB backups]
A mixture of sync and async replication technologies is commonly used:
• Usually only the redo logs are synchronously replicated to the remote site
• Archive logs are created from a redo log and copied when the redo log switches
• Point-in-time (PiT) copies of the datafiles and control files are copied periodically (e.g., nightly)
44. Data Center Transport Options
[Chart: transport options by increasing distance, from data center/campus through metro and regional to national reach: dark fiber, CWDM, and DWDM (limited by optics power budget and by BB_Credits), then SONET/SDH, then MDS 9000 FCIP; synchronous replication (2 Gbps lambda, 1 Gbps+ subrate, Metro Ethernet) is confined to the shorter reaches, while asynchronous replication (1 Gbps+, FCIP) extends to national distances]
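Since the chart flags BB_Credits as a limit for long synchronous links: FC flow control permits one outstanding frame per buffer credit, so sustaining line rate requires enough credits to cover the frames in flight over the round trip. A rough sizing sketch (an approximation for illustration, not from the deck; it ignores line coding overhead):

```python
# Rough BB_Credit sizing for a long FC link (approximation only).
US_PER_KM = 5.0          # one-way fiber propagation, ~2/3 c

def bb_credits_needed(distance_km: float, gbps: float,
                      frame_bytes: int = 2148) -> int:
    """Credits must cover the frames in flight over the round trip."""
    rtt_us = 2 * distance_km * US_PER_KM
    frame_time_us = frame_bytes * 8 / (gbps * 1000)  # serialization time
    return int(rtt_us / frame_time_us) + 1

print(bb_credits_needed(100, 2.0))  # ~100 km at 2 Gbps -> roughly 1/km
```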
45. Cisco Data Center Vision
CONSOLIDATION: centralization and standardization to lower costs and improve efficiency and uptime
VIRTUALIZATION: management of resources independent of the underlying physical infrastructure, to increase utilization, efficiency, and flexibility
AUTOMATION: dynamic provisioning and autonomic Information Lifecycle Management (ILM) to enable business agility
[Diagram: an on-demand, service-oriented Intelligent Information Network spanning compute (server fabric network, HPC cluster, grid), storage (SAN, storage network), and the LAN/MAN/WAN data network, aligning enterprise applications with business policies]
46. Today’s Data Centers Require an Architectural Approach to…
Protect with business resilience
• Tighten security
• Improve business continuance
Optimize with consolidation
• Improve operational efficiency and resource utilization
• Lower complexity and cost of ownership
Grow towards a services-oriented infrastructure
• Align virtualized resources with business demands
• Automate the infrastructure to respond dynamically
47. The Big Picture—The Cisco Data Center
[Diagram: the emerging data center architecture. The Catalyst 6500 family provides embedded intelligent network services (SSL termination, VPN termination, firewall services, intrusion detection, server balancing). The Topspin family provides server fabric switching with embedded intelligent virtualization services (server virtualization, virtual I/O, low-latency RDMA services, clustering) for the enterprise grid (grid/utility computing across UNIX/Windows blade servers and virtual private servers). The MDS 9000 family provides enterprise SAN switching with multiprotocol gateway services (enterprise tape and disk storage, mainframe connectivity, NAS) and embedded intelligent storage services (fabric routing services, data replication services, storage virtualization, virtual fabrics/VSANs) across multiple fabrics]