1. Architectural Considerations for Big Data Workloads on OpenStack
OpenStack Summit, Barcelona
October 27, 2016
Jonathan Chiang - Cloud Architect, Comcast
James Saint-Rossy - Principal Engineer, Comcast
3. Agenda
•What we do
•Comcast’s journey with OpenStack
•Big data use cases at Comcast
•Our application profiles
•Key Objectives of Modern workloads
•Disaggregated vs Hyper-Converged
•Recommended Approaches for the different use cases
•HDFS and S3 working together
4. A Fortune 50 Company Uniquely Positioned at the Intersection of Media and Technology
TV, Internet, Voice and Home
Cable Networks
Film
Broadcast Television
Theme Parks
5. Stretching the Comcast Elastic Cloud | Our Journey with OpenStack
•Petabyte of Memory and One Million vCPU Cores in 2016
•Multi-Petabyte Ceph Block and Object Storage
•Multi-Terabyte SSD Block Storage
•Deployed across 34 Regions
• National and Regional Data Centers
•Icehouse Release Today, Moving Directly to Mitaka
6. Community Contributions
•Lines of code: 95,000
•Commits: 1200
•Core Developers and Reviewers on Multiple Projects
•Since the Vancouver Summit (Kilo), Comcast has doubled its upstream contributions
7. Big Data Use Cases at Comcast
•Real-time Telemetry Data Streaming
•Image Recognition
•Statistical Data Analysis
•Machine Learning
•NoSQL Databases (Pulsar)
8. Application Profile – Kafka
• Designed for 100% sequential writes, with reads served from the OS page cache
• Writes: relatively low IOPS, high throughput, large block sizes, sequential
• Reads from disk are intermittent, occurring only when latent (lagging) consumers exist; when they do occur, they are typically random, small-block, high-IOPS reads
• Kafka is somewhat latency sensitive, but more tolerant than a NoSQL database, for example
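One way to approximate this write profile when qualifying candidate storage is an fio job file; the following is a sketch (job names and parameter values are illustrative, not the test plan from the talk):

```ini
; Approximates a Kafka broker's log-segment appends:
; sequential, large-block, throughput-bound, low IOPS.
[kafka-log-append]
rw=write            ; sequential writes, like log appends
bs=1m               ; large block size
ioengine=libaio
iodepth=4           ; modest queue depth: low IOPS, high throughput
direct=1            ; bypass the page cache to measure the device itself
size=10g
runtime=120
time_based
```

A second job with `rw=randread` and `bs=4k` would approximate the lagging-consumer read pattern described above.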
9. Application Profile – Pulsar
• Internal cloud NoSQL database
• Medium/high IOPS, small block sizes, random reads and writes
• Designed to support low-latency read and write use cases, and therefore latency sensitive
• The read/write mix and block size are use-case dependent; the typical observed distribution in a standard key-value cluster is 70% reads / 30% writes
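The 70/30 key-value mix above maps naturally onto an fio mixed-workload job; again a sketch with assumed parameter values, useful mainly for comparing latency percentiles across backends:

```ini
; Approximates the key-value cluster's mix:
; random, small-block, medium/high IOPS, 70% reads / 30% writes.
[pulsar-kv]
rw=randrw
rwmixread=70        ; the 70r/30w distribution noted above
bs=4k               ; small block size
ioengine=libaio
iodepth=32          ; medium/high IOPS
direct=1
size=10g
runtime=120
time_based
```

For a latency-sensitive database, fio's completion-latency percentiles (p99, p99.9) matter more than raw throughput.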
10. Application Profile – Hadoop
• HDFS DataNode: low IOPS, very large blocks, sequential reads and writes, not extremely latency sensitive
• YARN NodeManager temp space: medium IOPS, higher throughput, tends toward more random write patterns and slightly more sequential read patterns
• High-performance admin nodes (NameNodes, JournalNodes, ZooKeeper nodes): high IOPS, small block sizes, random reads and writes; these nodes typically perform better, and improve overall cluster performance, when backed by high-performance storage
11. Key Objectives for Modern Workloads
• Performance
• Availability, Reliability, Resiliency
• Manageability, APIs, Integrations
• Workload Isolation
• Data Intensive Applications
14. Recommended Approach for Kafka
Divide and Conquer
• Use HDDs for Collectors
• Use SSDs for Aggregates
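On an OpenStack cloud, one way to express this split is with Cinder volume types, one per media tier; the type names and sizes below are hypothetical, and assume separate HDD and SSD Cinder backends already exist:

```shell
# Hypothetical volume-type names; assumes one Cinder backend per media tier.
openstack volume type create hdd
openstack volume type create ssd

# Collector brokers: large, cheap sequential-write capacity on HDD.
openstack volume create --type hdd --size 2000 kafka-collector-data-0

# Aggregate brokers: SSD for the more latency-sensitive aggregate tier.
openstack volume create --type ssd --size 500 kafka-aggregate-data-0
```

Attaching the right volume type per broker role keeps the two tiers on appropriate media without separate clusters of hypervisors.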
15. Recommended Approach for Pulsar
Disaggregated, if:
• The backend can handle a high number of IOPS
• It meets the capacity requirements
• Network latency issues can be mitigated
Hyper-Converged, if:
• Compute nodes have local SSDs/NVMe drives
• There is enough local capacity
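The decision criteria above can be written down as a small helper; this is purely illustrative (the function and parameter names are assumptions, not anything from the talk), but it makes the precedence explicit:

```python
def choose_pulsar_deployment(
    remote_iops_ok: bool,       # disaggregated backend can absorb the IOPS
    remote_capacity_ok: bool,   # disaggregated backend meets capacity needs
    latency_mitigated: bool,    # network latency issues can be mitigated
    local_flash: bool,          # compute nodes have local SSDs/NVMe
    local_capacity_ok: bool,    # enough local capacity for the dataset
) -> str:
    """Illustrative encoding of the slide's criteria, not a real policy engine."""
    if remote_iops_ok and remote_capacity_ok and latency_mitigated:
        return "disaggregated"
    if local_flash and local_capacity_ok:
        return "hyper-converged"
    return "revisit requirements"

# Example: remote storage can't hide its latency, but compute has local NVMe.
print(choose_pulsar_deployment(True, True, False, True, True))  # hyper-converged
```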
21. Testing and Validation
Approach
The test plans for each application platform are designed to represent typical use cases for those
applications and test their performance, latency, and storage capacity.
Hadoop Big Data Platform
• Benchmark Tools
• Application Testing
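The standard Hadoop benchmark tools can be invoked along these lines; the jar paths and flag spellings vary by distribution and version (older TestDFSIO releases use `-fileSize` in MB instead of `-size`), so treat this as a sketch:

```shell
# TestDFSIO: sequential HDFS write/read throughput (jar path varies by distro).
hadoop jar hadoop-mapreduce-client-jobclient-*-tests.jar TestDFSIO \
    -write -nrFiles 16 -size 1GB
hadoop jar hadoop-mapreduce-client-jobclient-*-tests.jar TestDFSIO \
    -read -nrFiles 16 -size 1GB

# TeraGen/TeraSort: end-to-end MapReduce sort of ~100 GB (10^9 rows x 100 bytes).
hadoop jar hadoop-mapreduce-examples.jar teragen 1000000000 /tmp/terasort-in
hadoop jar hadoop-mapreduce-examples.jar terasort /tmp/terasort-in /tmp/terasort-out
```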
Kafka Stream Data Platform
• Use internally developed automation to deploy and test Kafka clusters.
• Test Configuration and Scenarios
• ZooKeepers
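Kafka ships its own load generators, which fit the test scenarios above; the broker address and topic name here are placeholders:

```shell
# Produce-side throughput/latency against a test topic
# (broker1:9092 and perf-test are placeholder names).
kafka-producer-perf-test.sh \
    --topic perf-test \
    --num-records 10000000 \
    --record-size 1024 \
    --throughput -1 \
    --producer-props bootstrap.servers=broker1:9092 acks=all

# Consume-side throughput for the same topic.
kafka-consumer-perf-test.sh \
    --bootstrap-server broker1:9092 \
    --topic perf-test \
    --messages 10000000
```

`--throughput -1` removes the producer's rate limit, so the run measures what the cluster and its storage can actually sustain.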
23. Operations and Support at Scale
• Noisy neighbor: which one is it?
• Where is the handoff between Ops and Engineering?
• Do you have DevOps?
• When things start to break
• Synthetic workloads