Big Data Hadoop on UCS seminar, May 2013
1. © 2013 Cisco and/or its affiliates. All rights reserved. Cisco Confidential NDA Required.
Designing Hadoop
Infrastructure with Cisco
Data Center Solutions,
Blueprint for Success.
Ami Ben-Amram, amib@cisco.com
Data Center Architecture Leader,
Cisco
2.
Massively Parallel Processing; RDBMS for EDW
Unstructured
Key-Value Store Database
Document Database
Apache Open Source Project
Manage and Process Massive Amounts of Data
NoSQL, MPP Databases, Hadoop
Cisco has partnered with leading software providers to offer a comprehensive
infrastructure and management solution for Big Data.
3.
Database
NoSQL Database
Tested and Validated Reference Architectures, Joint engineering Lab
Solution Bundles
Technical Collaterals
Apache-Hadoop reengineered
UCS is the exclusive hardware reference
Several joint engagements
MPP Column store
UCS is exclusive hardware reference
UCS is the only partner platform
Commercial, distributed key-value database
MPP row store
Apache-Hadoop software and services
Few 100 node production cluster (UCSM)
Commercial document-oriented database
4.
5.
Small Flows/Messaging (heartbeats, keep-alives, delay-sensitive application messaging)
Small – Medium Incast (Hadoop Shuffle, Scatter-Gather, Distributed Storage)
Large Flows (HDFS Insert, File Copy)
Large Incast (Hadoop Replication, Distributed Storage)
7.
Analyze: simulated with Shakespeare Wordcount [10s-20s Mbps]
Extract Transform Load (ETL): simulated with Yahoo TeraSort [larger than 1 Gbps]
Extract Transform Load (ETL): simulated with Yahoo TeraSort with output replication [2 – 4 Gbps]
Job Patterns have varying impact on network utilization
Job Pattern - network graph of data coming into one node.
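The Wordcount job pattern named above can be sketched in a few lines. This is a single-process illustration, not Hadoop code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the two-reducer default are assumptions made for the example. The shuffle step is the part that matters for the network, because every reducer collects data from every mapper (many-to-one traffic, the incast pattern listed earlier).

```python
from collections import defaultdict

def map_phase(split):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]

def shuffle(mapped_outputs, num_reducers):
    """Shuffle: route each pair to a reducer chosen by hashing its key.

    Every reducer can receive pairs from every mapper (many-to-one).
    """
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for pairs in mapped_outputs:
        for word, count in pairs:
            partitions[hash(word) % num_reducers][word].append(count)
    return partitions

def reduce_phase(partition):
    """Reduce: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in partition.items()}

def word_count(splits, num_reducers=2):
    """Run map, shuffle, and reduce over a list of text splits."""
    mapped = [map_phase(split) for split in splits]
    result = {}
    for partition in shuffle(mapped, num_reducers):
        result.update(reduce_phase(partition))
    return result

counts = word_count(["to be or not to be", "that is the question"])
print(counts)
```

In a real cluster the mapped pairs cross the network during the shuffle, so the amount of intermediate data per job largely determines which of the traffic levels above a job falls into.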
8.
[Figure: MapReduce data flow. Map 1, Map 2, Map 3 … Map N feed Reducer 1, Reducer 2, Reducer 3 … Reducer N through the shuffle; reducer output is written to HDFS with output replication.]
[Figure: HBase. Clients send reads and updates to region servers while major compaction runs.]
9.
HBase During Major Compaction
[Chart: average latency (us, axis 0 to 9000) over time. Series: UPDATE, READ, QoS UPDATE, QoS READ.]
Read/Update Latency Comparison of Non-QoS vs. QoS Policy: ~45% Improvement for Read
Switch Buffer Usage With Network QoS Policy to Prioritize HBase Update/Read Operations
Every 24 hours HBase wakes up and runs a major compaction: a stampede of elephants
that makes a massive push into HDFS.
10.
Validated 96 Node Hadoop Cluster
• Network
  Three racks, each with 32 nodes
  Distribution layer – Nexus 7000 or Nexus 5000
  ToR – FEX or Nexus 3000
  2 FEX per rack
  Each rack with either 32 single- or dual-attached hosts
• Hadoop Framework
  Apache 0.20.2
  Linux 6.2
  Slots – 10 maps & 2 reducers per node
• Compute – UCS C200 M2
  Cores: 12
  Processor: 2 x Intel(R) Xeon(R) CPU X5670 @ 2.93GHz
  Disk: 4 x 2TB (7.2K RPM)
  Network: 1G: LOM, 10G: Cisco UCS P81E
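The cluster figures above imply some totals that are useful when sizing jobs. The following is a rough arithmetic sketch only. It assumes all 96 nodes are data nodes and that HDFS runs with its default replication factor of 3, neither of which the slide states; it also ignores space reserved for the operating system and temporary data.

```python
# Figures taken from the validated cluster description
NODES = 96                  # 3 racks x 32 nodes
RACKS = 3
MAP_SLOTS_PER_NODE = 10
REDUCE_SLOTS_PER_NODE = 2
DISKS_PER_NODE = 4
DISK_TB = 2

# Assumption: HDFS default replication factor (not stated on the slide)
HDFS_REPLICATION = 3

nodes_per_rack = NODES // RACKS
map_slots = NODES * MAP_SLOTS_PER_NODE
reduce_slots = NODES * REDUCE_SLOTS_PER_NODE
raw_tb = NODES * DISKS_PER_NODE * DISK_TB
usable_tb = raw_tb / HDFS_REPLICATION

print(f"nodes per rack:   {nodes_per_rack}")
print(f"map slots:        {map_slots}")
print(f"reduce slots:     {reduce_slots}")
print(f"raw capacity:     {raw_tb} TB")
print(f"after 3x replica: {usable_tb:.0f} TB")
```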
[Figure: Traditional DC Design Nexus 55xx/2248. Two Nexus 5548 switches with 2248TP-E fabric extenders; Name Node (Cisco UCS C200, single NIC); Data Nodes 1 – 48 and Data Nodes 49 – 96 (Cisco UCS C200, single NIC).]
[Figure: Nexus 7K-N3K based Topology. Two Nexus 7000 switches with two Nexus 3000 switches; Name Node (Cisco UCS C200, single NIC); Data Nodes 1 – 48 and Data Nodes 49 – 96 (Cisco UCS C200, single NIC).]
11.
12.
• Companies are often challenged by the complexities of traditional
server solutions.
• Big data solutions must enable high performance and scale as the
business demands.
• To meet these requirements, Cisco designed a comprehensive
solution: Cisco® Common Platform Architecture (CPA) for Big
Data.
• Cisco CPA for Big Data includes compute, storage, connectivity,
and unified management features that enable rapid deployment,
predictable performance, and reduced total cost of ownership
(TCO).
• In addition to these benefits, Cisco CPA for Big Data offers unique
data and management integration with enterprise applications
hosted on the Cisco Unified Computing System™ (Cisco UCS®).
13.
TECHNICAL LEADERSHIP
• Unified Infrastructure
• Management Automation
• Design Flexibility
• Optimized for virtualization
• Best Cloud Infrastructure
MARKET MOMENTUM
• 61 industry benchmark world records
• $2 billion revenue run rate
• 20,000 customers: almost 50% of Fortune 500
• #2 US blade server market share by revenue
• #3 WW blade server market share by revenue
• More than 200 customers in Israel
14.
UCS 6200 Series Fabric Interconnects:
High-speed connectivity and management, integration with enterprise applications on blades
Nexus 2232 Fabric Extenders:
Scalability at lower cost
UCS Manager
UCS C240 M3 Servers:
Compute, storage
LAN, SAN, Management
Building Blocks
Cisco Big Data Common Platform Architecture (CPA) is a highly scalable architecture
designed to meet a variety of scale-out application demands
UCS Central
15.
Solution Bundles
Optimized for Cost, Tested and Validated for Performance and Rapid Deployments

Big Data High Performance Rack (UCS-EZ-BD-HP)
Balanced Compute and IO Bandwidth; Price-Performance Optimized
(2) UCS 96-Port 6296 Fabric Interconnect
(2) Nexus 2232 PP
(16) UCS C240 M3 Servers w/ dual Intel Xeon E5-2665 2.4 GHz Processors, 256GB of Memory, 1 x MegaRAID 9266-CV-8i Card, 24 x 1TB 7.2K SATA HDDs
Additional Racks: 2 x N2K-UCS2232PF, 16 x UCS-EZ-C240-2665

MPP High Performance Half-Rack (UCS-EZ-BD-MPP)
High Performance Compute and IO Bandwidth and IOPS (under $10K/GBPS)
(2) UCS 48-Port 6248 Fabric Interconnect
(2) Nexus 2232 PP
(8) UCS C240 M3 Servers w/ dual Intel Xeon E5-2690 2.9 GHz Processors, 256GB of Memory, 1 x MegaRAID 9266-CV-8i Card, 24 x 600GB 10K SAS HDDs
Additional Servers: UCS-EZ-C240-2690

Big Data High Capacity Rack (UCS-EZ-BD-HC)
Storage Density Optimized; Low $/TB (under $500/TB)
(2) UCS 96-Port 6296 Fabric Interconnect
(2) Nexus 2232 PP
(16) UCS C240 M3 Servers w/ dual Intel Xeon E5-2640 2.5 GHz Processors, 128GB of Memory, 1 x MegaRAID 9266-CV-8i Card, 12 x 3TB 7.2K SAS HDDs
Additional Racks: 2 x N2K-UCS2232PF, 16 x UCS-EZ-C240-2640
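To compare the three bundles on storage, the raw capacity of each can be computed from the server and drive counts listed above. This is a small sketch; the `Bundle` class and its field names are made up for the example, and the result is raw capacity before RAID, file system, or HDFS replication overhead.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Bundle:
    """Drive configuration of one solution bundle (illustrative helper)."""
    name: str
    servers: int
    drives_per_server: int
    drive_tb: float

    @property
    def raw_tb(self) -> float:
        # Raw capacity only: no RAID, file system, or replication overhead
        return self.servers * self.drives_per_server * self.drive_tb

BUNDLES = [
    Bundle("UCS-EZ-BD-HP", servers=16, drives_per_server=24, drive_tb=1.0),
    Bundle("UCS-EZ-BD-MPP", servers=8, drives_per_server=24, drive_tb=0.6),
    Bundle("UCS-EZ-BD-HC", servers=16, drives_per_server=12, drive_tb=3.0),
]

raw = {bundle.name: bundle.raw_tb for bundle in BUNDLES}
for name, tb in raw.items():
    print(f"{name}: {tb:.1f} TB raw")
```

The High Capacity rack reaches the largest raw capacity with half as many drives per server, which is consistent with its "storage density optimized" positioning.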
16.
             Performance        Price-Performance  Capacity
             Optimized (SAS)    Optimized (SATA)   Optimized (SAS)
             C240 M3 (SFF)      C240 M3 (SFF)      C240 M3 (LFF)    C220 M3 (SFF)
RU           2                  2                  2                2
CPU          E5-2690            E5-2665            E5-2640          E5-2680
Cores        16                 16                 12               16
Memory       256GB              256GB              128GB            256GB
Disk Drives  24 x (300 GB 15K,  24 x 1TB 7.2K      12 x 3TB 7.2K    External
             600GB 10K,
             900GB 15K)
Compute Units
NOSH
Compute
17.
Differentiation 0:
Big Data Benefits
• Unified Management - UCS Manager
• Unified Fabric - “Single Wire Management”
• Seamless management integration and data integration
• Direct SAN access
[Figure: UCS fabric with blade servers. Fabric switches 6200 Fabric A and 6200 Fabric B with uplink ports to SAN A, SAN B, ETH 1, and ETH 2, plus OOB management (MGMT); server ports connect to fabric extenders FEX A and FEX B in Chassis 1 and Chassis 2; half/full-width B200 compute blades with CNA virtualized adapters.]
[Figure: the same fabric with a rack server added. 6200 Fabric A and Fabric B, Chassis 1 with a B200 blade, and a C240 rack-mount server whose CNA connects through FEX A and FEX B.]
18.
Differentiation 1:
Unified Management
A Single Unified System for Blade and Rack Servers
C-Series Rack Optimized Servers
Big Data
• Dozens to 100s of servers are typical
• 20–50% annual growth
UCSM Enables
• Global view of the cluster
• Proactive monitoring of health
• 1-click software, BIOS, and firmware upgrades
• 1-click BIOS settings
• 1-click tunables like jumbo frames
UCS Central Enables
• Scaling to large clusters
• Application isolation
19.
Differentiation 2:
Traditional UCS Service Profile
Big Data Benefits
• Optimized service profile templates for CPA enable quick and consistent
deployments
• One-click PowerShell script to configure CPA
[Figure: settings captured in a service profile, with LAN and SAN connections]
• RAID settings
• Disk scrub actions
• Number of vHBAs
• HBA WWN assignments
• FC Boot Parameters
• HBA firmware
• FC Fabric assignments for HBAs
• QoS settings
• Border port assignment per vNIC
• NIC Transmit/Receive Rate Limiting
• VLAN assignments for NICs
• VLAN tagging config for NICs
• Number of vNICs
• PXE settings
• NIC firmware
• Advanced feature settings
• Remote KVM IP settings
• Call Home behavior
• Remote KVM firmware
• Server UUID
• Serial over LAN settings
• Boot order
• IPMI settings
• BIOS scrub actions
• BIOS firmware
• BIOS Settings
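The idea behind a service profile template, one definition stamped out onto many servers, can be illustrated with a small data structure. This sketch is not the UCS Manager API or its object model: the `ServiceProfile` class, its fields, the default values, and the `instantiate` helper are all hypothetical and cover only a few of the settings listed for this slide.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class ServiceProfile:
    """Hypothetical subset of the settings a service profile carries."""
    name: str
    boot_order: tuple = ("pxe", "local-disk")   # illustrative values
    raid_setting: str = "example-raid-policy"   # illustrative value
    num_vnics: int = 2
    vlan_ids: tuple = (10,)
    mtu: int = 9000                             # jumbo frames
    bios_firmware: str = "example-version"      # illustrative value

def instantiate(template, count, prefix="node"):
    """Create `count` profiles that differ from the template only by name."""
    return [
        replace(template, name=f"{prefix}{index:02d}")
        for index in range(1, count + 1)
    ]

template = ServiceProfile(name="cpa-template")
profiles = instantiate(template, 16)   # one rack of 16 servers

# Every server gets identical settings, so drift between nodes is avoided
settings = {replace(profile, name="") for profile in profiles}
print(len(profiles), "profiles,", len(settings), "distinct configuration")
```

Because the per-server settings come from a single definition, changing a tunable such as the MTU means changing the template once rather than touching each server.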
20.
Differentiation 3:
Big Data Benefits
• “Single Wire Management”
• Fully redundant active-active fabric cluster interconnect
• Can be configured for direct SAN access
Traditional Unified Fabric
10 Gigabit Ethernet
Cisco VIC Technology
66% Fewer Switch Ports and Cables
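The slide does not say how the 66% figure is derived. One plausible reading, shown here as a worked example with hypothetical per-server cable counts, is that a traditional design uses redundant pairs of LAN, SAN, and management links, while a unified fabric carries all three over one redundant pair.

```python
# Hypothetical per-server cable counts; the source does not give these numbers.
TRADITIONAL_CABLES = {"lan": 2, "san": 2, "management": 2}
UNIFIED_CABLES = {"unified fabric": 2}   # data, storage, and management on one pair

traditional_total = sum(TRADITIONAL_CABLES.values())
unified_total = sum(UNIFIED_CABLES.values())
reduction = 1 - unified_total / traditional_total

print(f"traditional: {traditional_total} cables per server")
print(f"unified:     {unified_total} cables per server")
print(f"reduction:   {reduction:.1%}")
```

Each cable ends in a switch port, so under these assumptions the same ratio applies to switch ports.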
21.
Differentiation 4:
Cisco UCS Big Data Common Platform Architecture:
Extending Enterprise Application Ecosystem to Big Data
[Figure: Data Center Applications (Cisco UCS B-Series Blade Servers with a SAN array) and Big Data Applications (Hadoop, NoSQL, MPP Database on the Cisco Big Data Common Platform Architecture using C-Series Rack-Mount Servers) share Unified Fabric, Unified Management, and Integrated Data Management; Data Feeds and Data Integration Using Connectors link the two sides.]
22.
Differentiation 5:
No Additional Switching for up to 10 Racks (160 Servers)
10,000 using UCS Central
Example Configuration:
Servers Per Domain              North-Bound Bandwidth  Any Node to Any Node Bandwidth
(Pair of Fabric Interconnects)  (Gbits/sec)            (Gbits/sec)
160                             320                    10
144                             480                    10
128                             640                    10
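The three rows of the example configuration can be reproduced with simple port arithmetic. The sketch below assumes that each rack of 16 servers consumes 8 ports on each 96-port fabric interconnect (one Nexus 2232 per fabric per rack, with 8 uplinks), and that every remaining port is a 10 Gbit/s north-bound uplink. That assumption is not stated in the source; it is chosen because it matches the table.

```python
FI_PORTS = 96                 # ports per UCS 6296 fabric interconnect
FABRIC_INTERCONNECTS = 2      # one pair per domain
PORT_GBPS = 10
SERVERS_PER_RACK = 16
UPLINKS_PER_RACK_PER_FI = 8   # assumption: 8 FEX uplinks to each fabric interconnect

def northbound_gbps(servers):
    """North-bound bandwidth left after connecting the racks to the pair."""
    racks = servers // SERVERS_PER_RACK
    free_ports = FI_PORTS - racks * UPLINKS_PER_RACK_PER_FI
    return free_ports * FABRIC_INTERCONNECTS * PORT_GBPS

def oversubscription(servers):
    """Ratio of aggregate 10 Gbit/s server bandwidth to north-bound bandwidth."""
    return servers * PORT_GBPS / northbound_gbps(servers)

for servers in (160, 144, 128):
    print(
        f"{servers} servers: {northbound_gbps(servers)} Gbit/s north-bound, "
        f"{oversubscription(servers):.0f}:1 oversubscription"
    )
```

Under this model, giving up one rack frees 8 ports per fabric interconnect, which is why north-bound bandwidth rises by 160 Gbit/s per step in the table.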
23.
24.
• Tested and Validated Reference Architectures
• Joint engineering Lab
• Solution Bundles
• Technical Collaterals
NoSQL Database
NoSQL, MPP Databases
Hadoop
25. Workload automation facilitates the flow of data
costs
[Figure: data flow with stages labeled Gather Data, Data Integration, Load Data, Data Analysis, and Report Generation and Distribution]
Data sources: Twitter feeds, call logs, web clicks
Jobs and tools: MapReduce, Hive, Sqoop, SQL, BI analytics, Informatica, Business Objects, Cognos
Connections: Web Services, SSH, DB/JDBC, ERP/CRM, Data Mover
26.
Manages Enterprise Workloads
[Figure: workloads span four areas. ERP/CRM apps and databases (ERP applications, CRM applications, DB); data exchange systems (file drop box, FTP/SFTP/FTPS, SaaS, AWS, FTP server, DB, API feeds from Twitter, FB, LI, etc.); ETL/DW/Big Data/BI systems and applications (data integration, DW, Big Data); business intelligence applications (reports, dashboards, analytics, OLAP, alerts).]
27.
Data Acquisition
Data Load
Analysis of Sales Data
Export to Enterprise
Generate Report
[Figure: workflow diagram with steps numbered 1 to 4]
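The workflow on this slide is a chain of dependent jobs: each step should run only if the previous one succeeded. A minimal sketch of that behavior follows. It is not the Cisco workload automation product; `run_workflow` and the placeholder jobs `succeed` and `fail` are hypothetical stand-ins for real jobs.

```python
STEP_NAMES = [
    "Data Acquisition",
    "Data Load",
    "Analysis of Sales Data",
    "Export to Enterprise",
    "Generate Report",
]

def run_workflow(steps):
    """Run (name, job) pairs in order and stop at the first failure.

    Returns the names of the steps that completed.
    """
    completed = []
    for name, job in steps:
        try:
            job()
        except Exception as exc:
            print(f"{name} failed: {exc}; downstream steps skipped")
            break
        completed.append(name)
    return completed

def succeed():
    """Placeholder for a real job that finishes normally."""

def fail():
    """Placeholder for a real job that raises an error."""
    raise RuntimeError("source unavailable")

# All steps succeed
done = run_workflow([(name, succeed) for name in STEP_NAMES])

# The second step fails, so the remaining steps never start
jobs = [succeed, fail, succeed, succeed, succeed]
partial = run_workflow(list(zip(STEP_NAMES, jobs)))
print(done)
print(partial)
```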
28.
Integrated Cisco UCS Server Management
Integrated Network Management w/ Fabric Interconnect and Nexus Switches
Integrated Data Management
Cisco UCS B-series
Cisco UCS C-series w/ Direct Attach Storage
Data Center Applications, Big Data Applications
Cisco Workload Automation Delivers Automated Business Processing Abstraction Layer
Data Feeds
Big Data Jobs
Data Center Applications
Automated Backup and Storage In/out of Big Data Grids
Rapid, error-free deployment – service profile
Maintenance activities like BIOS and FW upgrades across the cluster
Monitoring of health and power
Seamless data movement
30.
• Hadoop has many building blocks. At its core, it is an architecture to
store and process unstructured and semi-structured data.
Hadoop Distributed File System (HDFS): at the base is a self-healing clustered storage system.
Map-Reduce: distributed data processing
Top-level abstractions and interfaces: PIG, Hive, Sqoop; ETL tools, BI reporting, RDBMS
HBase: database with real-time access
31.
Extreme Performance: optimized for fast query execution and unmatched data loading
Elastic Scalability: expand capacity and performance
Highly Available: fully redundant and reliable configuration
Unified Networking: converged data and management plane networking
Rapidly Deployable: pre-validated configuration, rapid deployment via service profiles
Unified Management: power of UCS Manager to manage the compute, networking, I/O
Industry Leading Partnerships: joint solutions with major software players
Enterprise Application Integration: seamless integration with enterprise applications on blades