Containerized Hadoop beyond Kubernetes

Partha Seetala
CTO, Robin Systems

Who am I?
SAMPLE CUSTOMER DEPLOYMENTS
11 billion security events ingested and analyzed a day
(Elasticsearch, Logstash, Kibana, Kafka)
6 Petabytes under active management in a single Robin cluster
(Cloudera, Impala, Kafka, Druid)
400 Oracle RAC databases managed by a single Robin cluster
(Oracle, Oracle RAC)
CTO of Robin Systems; before that, Distinguished Engineer at Veritas/Symantec
We have solved some fundamental problems that keep containers and Kubernetes from running
complex Big Data, NoSQL, database and AI/ML workloads
Robin is the Kubernetes platform for big data, databases and AI/ML

Why containerize Big Data, NoSQL, and Databases?

Why containerize: developers' perspective?
1. Dislike opening an IT ticket and waiting weeks for their apps to be ready for use
They want their apps to be available now
2. Want to experiment with tools, but dislike the complexity of setting them up
For example, which of RDBMS, NoSQL, DocumentDB, or GraphDB is better for the app?
3. Want to run apps where it makes the most sense
On their laptop, in an on-prem datacenter, in the public cloud, etc.
Containers offer deployment agility and infrastructure independence

Why containerize: infrastructure perspective?
Resource utilization on Big Data clusters is pretty low: around 40%

Why containerize – infrastructure perspective?
Utilization worsens with every hardware refresh

           4 years ago   Today
CPU        20 cores      40 cores
Memory     128 GiB       512 GiB
Storage    48 TiB        144 TiB
Network    2x10 GbE      2x40 GbE

Other server configurations shown:
› 24 cores, 256 GiB memory, 540 TiB storage, 4x40 GbE network
› 36 cores, 768 GiB memory, 122 TiB storage, 2x100 GbE network, 8x NVIDIA V100 GPUs
(Price labels on the slide: $34 K, $141 K, $25 K)
Modern hardware offers a lot more resources per rack unit, which must be kept busy to realize ROI
Containers allow you to maximize infrastructure utilization

Why Containers, not Virtualization?
› Get the benefits of virtualization without any of its overhead
› Containers run applications directly on baremetal without virtualizing hardware
› A resource given to a hypervisor is a resource that is taken away from your Big Data application
› Applications are being packaged and shipped as container images, not VMs
› You must leverage and adapt to this shift in application packaging
› Containers avoid the need for specialized storage stacks for deduplicating VM images

What are the challenges with containerizing
Big Data, NoSQL and Databases?

Challenges with containers
Incomplete cgroups virtualization causes many Big Data applications and databases to misbehave
CPU
› Contiguous core IDs, CPU ID mapping (Kudu), accurate threads:cores mapping (DB)
› NUMA-aware assignment (HANA)
Memory
› The JVM sees the entire host memory even if you cap the container's memory (any JVM app)
› Memory allocation inconsistencies (hugepages, shared page cache) (Oracle)
Storage
› Apps that need raw block devices need correct WWN management (e.g., Oracle, MapR)
› blkio cgroup settings are ineffective at preventing noisy-neighbor problems (all apps)
Confidential – Restricted Distribution
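The memory gap above can be observed from inside a container. The following is a minimal Python sketch, assuming a Linux host with cgroups mounted at the usual /sys/fs/cgroup location (the exact paths are an assumption and vary by runtime). It compares what a non-cgroup-aware process sees through sysconf with the limit recorded in the container's cgroup. The helper names are hypothetical.

```python
import os
from pathlib import Path
from typing import Optional

# Assumed default locations of the container's memory limit; the actual
# mount point depends on the host and the container runtime.
CGROUP_LIMIT_FILES = (
    Path("/sys/fs/cgroup/memory.max"),                    # cgroup v2
    Path("/sys/fs/cgroup/memory/memory.limit_in_bytes"),  # cgroup v1
)


def parse_cgroup_limit(raw: str, host_bytes: int) -> Optional[int]:
    """Return the cgroup memory limit in bytes, or None if effectively unlimited."""
    raw = raw.strip()
    if raw == "max":  # cgroup v2 spelling of "no limit"
        return None
    value = int(raw)
    # cgroup v1 reports a huge sentinel value when no limit is set, so any
    # value at or above physical memory is treated as "no limit".
    return None if value >= host_bytes else value


def host_memory_bytes() -> int:
    """Physical memory reported by sysconf: what a non-cgroup-aware runtime sees."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def container_memory_bytes() -> Optional[int]:
    """Memory limit from the first cgroup file found, or None."""
    host = host_memory_bytes()
    for path in CGROUP_LIMIT_FILES:
        if path.exists():
            return parse_cgroup_limit(path.read_text(), host)
    return None


if __name__ == "__main__":
    gib = 2 ** 30
    limit = container_memory_bytes()
    print(f"memory visible via sysconf: {host_memory_bytes() / gib:.1f} GiB")
    print("cgroup memory limit:", "none" if limit is None else f"{limit / gib:.1f} GiB")
    # The CPU view has the same gap: os.cpu_count() reports every host CPU,
    # while the affinity mask reflects a cpuset restriction (Linux only).
    # CFS CPU quotas show up in neither.
    print("os.cpu_count():", os.cpu_count())
    if hasattr(os, "sched_getaffinity"):
        print("CPUs in affinity mask:", len(os.sched_getaffinity(0)))
```

In a container capped at, say, 4 GiB, the first line would typically still report the host's full memory. That is the number a runtime sizing its heap from physical memory would pick up.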

Challenges with container orchestration platforms
They are very opinionated and architected around a microservices-oriented philosophy
› They expect apps to be brought up trivially, within milliseconds, during crash recovery
› They scale by adding more containers and registering them with a load balancer to spread the load
› They recommend modeling your app as a collection of stateless containers, each serving a single service
But you are dealing with applications that carry decades of built-in assumptions
› Big Data and databases are not written as microservices applications
› You can't stop and restart them rapidly
› You have to worry about both storage and network state to ensure high availability
› There is significant investment in custom scripting that assumes SSH access to the hosts running the apps

Storage and Networking challenges
2018 CNCF survey says Storage (48%) and Networking (44%) are the biggest challenges in Kubernetes
https://www.cncf.io/blog/2017/06/28/survey-shows-kubernetes-leading-orchestration-platform

Storage and Networking challenges
› Latest 2018 CNCF survey: 48% say Storage is a big challenge and 44% say Networking is a challenge in Kubernetes
› There are 27 storage vendors and 21 network vendors providing storage and networking solutions for containers and Kubernetes [1]
Despite so many vendor solutions, why is it still a challenge for so many people?
(Figure: logos of storage vendors and network vendors)
[1] https://github.com/cncf/landscape

Operational challenges to overcome
Storage
› Unpredictable performance when consolidating Big Data and database apps
› Data locality requirements (both performance and datacenter network bandwidth constraints)
› Anti/affinity and isolation constraints
Networking
› Services running inside K8S are often consumed by applications running in different L3 subnets
› Putting a load balancer between apps and services is unnecessary and hurts performance for most Big Data, NoSQL, and database applications
› 90% of the apps used in real life require their IP addresses to be preserved across restarts
Spending time setting up storage and networking is a drag on user productivity

Don’t miss the forest for the trees
(Figure: a stack of Users, Applications, Infrastructure)
› Most vendors are looking at the problem from the infrastructure up: the focus is on infra components (StatefulSets, Deployments, Persistent Volume Claims, Services, CSI, CNI, HPA)
› Whereas we should be looking at it from the users down: focus on user-app interaction, and let apps drive infrastructure to meet user requirements

Can Kubernetes alone get us to the promised land?
› Management (kubectl, helm)
› Services (Ingress, Proxy, LB)
› Storage (CSI)
› Networking (CNI, Overlay)
› Monitoring (Heapster, HPA)
› Configuration (ConfigMap, Secrets)
› UI
› Server infra (Baremetal, On-prem VM, AWS, Azure, GCP)
› Data (PV, PVC)
› Troubleshooting (Logging, Events)
› Containers (docker, LXC)

Time to reframe our thinking
Let applications drive infrastructure to meet user requirements
(in this model application workflows configure Kubernetes, Networking and Storage)

Integrated App-aware Storage + Docker, LXC, Kubernetes + Integrated Networking + Application-aware Workflow Manager

When you elevate your thinking to Applications
You do less of this
› Deployments, ReplicaSets, and StatefulSets
› Persistent Volumes and Claims
› Service endpoints, proxy
› Ingress and Egress routes
› Secrets and Configmaps
› Heapster, CSI, CNI, and CRI
And do more of this
› Time-travel application states
› Clone entire Applications with their data
› Backup and restore entire apps, any app
› Upgrade applications in a failsafe manner
› Control QoS of apps to meet performance SLAs
› Make applications and data mobile across clouds

Give your users a managed service experience
› Specify data-locality, anti/affinity constraints and placement hints
› Enable service components
› Specify compute
› Specify storage
› Specify scale
Just minutes from click to use
› A 64-node Hadoop cluster with 1408 CPU cores, 4.5 TB of memory, and 1.5 PB of storage takes just 23 mins
› Services enabled: Atlas, Spark, Hive, Kerberos, Sentry, HDFS, NameNode HA
› K8S components auto-created (StatefulSets, PVC, Services, …)
› Data-locality and anti/affinity policies enforced
› Any Big Data, NoSQL, Database, AI/ML app
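To put the 64-node figure in perspective, a quick back-of-the-envelope calculation gives the average resources per node and the provisioning rate. This sketch assumes decimal units (1 TB = 1000 GB, 1 PB = 1000 TB) and treats the nodes as identical, which the slide does not state.

```python
# Cluster totals quoted on the slide; decimal units and identical nodes
# are assumptions made for this estimate.
NODES = 64
TOTAL_CORES = 1408
TOTAL_MEMORY_TB = 4.5
TOTAL_STORAGE_PB = 1.5
PROVISION_MINUTES = 23


def per_node(total: float, nodes: int = NODES) -> float:
    """Average share of a cluster-wide total per node."""
    return total / nodes


cores_per_node = per_node(TOTAL_CORES)
memory_gb_per_node = per_node(TOTAL_MEMORY_TB * 1000)
storage_tb_per_node = per_node(TOTAL_STORAGE_PB * 1000)
nodes_per_minute = NODES / PROVISION_MINUTES

print(f"{cores_per_node:.0f} cores, {memory_gb_per_node:.1f} GB memory, "
      f"{storage_tb_per_node:.1f} TB storage per node")
print(f"~{nodes_per_minute:.1f} nodes provisioned per minute")
```

Under these assumptions, each node averages 22 cores, about 70 GB of memory and about 23 TB of storage, and nodes come up at roughly 2.8 per minute.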

Adjust resources to meet changing priorities
› Application priorities change with time
  › Faster ingest during daytime
  › Faster querying for end-of-quarter reporting
› Trade resources between adjacent applications dynamically
› Adjust CPU, Memory, GPU, Network and IOPS dynamically
› Scaling resources vs. scaling the entire service
› K8S' Horizontal Pod Autoscaler (HPA) is not suitable for data applications:
  › It works by adding more Pods to scale horizontally
  › Great for stateless apps
  › Not so good for Big Data, NoSQL and databases
  › It results in data rebalancing, which is costly and permanent; scaling down is very hard
(Figure: Kafka, Hadoop1, Hadoop2 and Druid are each assigned their own resource quota (CPU, Mem, IOPS). Over a timeline from 12 AM to 11 PM, resources are shifted from Hadoop2 to Hadoop1 with 1-click, then from Hadoop1 back to Hadoop2 with 1-click.)
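The 1-click shift in the figure amounts to moving quota between two applications while leaving the pool total unchanged. Unlike HPA, no Pods are added or removed. The sketch below is a hypothetical model of that bookkeeping. The `Quota` class, the `shift` function and all quota values are illustrative, and enforcing the new limits (for example, by updating cgroups) is not modeled.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Quota:
    """Hypothetical per-application resource quota."""
    cpu: int       # cores
    mem_gib: int   # memory in GiB
    iops: int      # storage IOPS


def shift(quotas: Dict[str, Quota], src: str, dst: str,
          cpu: int = 0, mem_gib: int = 0, iops: int = 0) -> None:
    """Move part of src's quota to dst in place; the pool total is unchanged.

    No Pods are added or removed, so no data has to be rebalanced.
    """
    if min(cpu, mem_gib, iops) < 0:
        raise ValueError("amounts must be non-negative")
    s, d = quotas[src], quotas[dst]
    if cpu > s.cpu or mem_gib > s.mem_gib or iops > s.iops:
        raise ValueError(f"{src} cannot give away more than its quota")
    s.cpu, d.cpu = s.cpu - cpu, d.cpu + cpu
    s.mem_gib, d.mem_gib = s.mem_gib - mem_gib, d.mem_gib + mem_gib
    s.iops, d.iops = s.iops - iops, d.iops + iops


def total_cpu(quotas: Dict[str, Quota]) -> int:
    """Cores across all apps; a shift never changes this."""
    return sum(q.cpu for q in quotas.values())


# Illustrative starting quotas for the four apps in the figure.
quotas = {
    "kafka": Quota(cpu=8, mem_gib=32, iops=5000),
    "hadoop1": Quota(cpu=16, mem_gib=64, iops=10000),
    "hadoop2": Quota(cpu=16, mem_gib=64, iops=10000),
    "druid": Quota(cpu=8, mem_gib=32, iops=5000),
}

# Daytime: favour Hadoop1 at Hadoop2's expense.
shift(quotas, "hadoop2", "hadoop1", cpu=8, mem_gib=32, iops=5000)
print(quotas["hadoop1"], quotas["hadoop2"])
# Later: give the resources back.
shift(quotas, "hadoop1", "hadoop2", cpu=8, mem_gib=32, iops=5000)
```

Because only limits change, the operation is reversible at any time. Horizontal scaling, by contrast, would leave data spread across new Pods.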

Application-centric resource management and QoS
› We enhanced K8S' cgroups management capabilities
  › More comprehensive procfs and sysfs virtualization
  › Virtualized the sysinfo(2) system call
› We implemented application-topology-aware MIN and MAX storage QoS
  › Predictable performance for mission-critical workloads
  › Eliminates noisy-neighbor challenges when consolidating workloads
(Figure: Robin Application-aware Storage serving IO for Hadoop, MongoDB, Kafka, MySQL and Postgres)
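One way to reason about MIN and MAX storage QoS is as a water-filling allocation. Every app first receives its MIN, which provides the predictability guarantee. Leftover device capacity is then shared among apps still below their MAX, and the MAX cap is what contains noisy neighbors. This is a generic sketch of that idea, not Robin's implementation. The function name and the IOPS figures are hypothetical, and the actual IO throttling is not modeled.

```python
from typing import Dict, Tuple


def allocate_iops(capacity: float,
                  limits: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Split `capacity` IOPS among apps with (min, max) limits by water-filling.

    Every app first receives its MIN (the guarantee); leftover capacity is
    shared equally among apps still below their MAX, and whatever a capped
    app cannot absorb is redistributed in the next round.
    """
    if any(lo < 0 or hi < lo for lo, hi in limits.values()):
        raise ValueError("each app needs 0 <= min <= max")
    alloc = {app: float(lo) for app, (lo, _) in limits.items()}
    remaining = capacity - sum(alloc.values())
    if remaining < 0:
        raise ValueError("MIN guarantees exceed device capacity")
    active = {app for app, (lo, hi) in limits.items() if hi > lo}
    while remaining > 1e-9 and active:
        share = remaining / len(active)
        for app in sorted(active):  # sorted() copies, so discarding is safe
            grant = min(share, limits[app][1] - alloc[app])
            alloc[app] += grant
            remaining -= grant
            if limits[app][1] - alloc[app] <= 1e-9:
                active.discard(app)
    return alloc


alloc = allocate_iops(10_000, {
    "hadoop": (2_000, 8_000),
    "mongodb": (1_000, 2_000),
    "kafka": (3_000, 3_000),
})
print(alloc)  # {'hadoop': 5000.0, 'mongodb': 2000.0, 'kafka': 3000.0}
```

If every app reaches its MAX before capacity runs out, the remainder is simply left unallocated. This is the intended behavior: a MAX cap holds even when the device is idle.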

Operational challenges for Big Data, NoSQL, and Databases
extend beyond just provisioning and scaling

Elevating experience to Applications
› Time machine for applications: time travel across multiple application states
› Clone and share entire applications for running reports, tests, and what-if analysis
› Backup and restore entire applications: avoid fear of app+data loss
› Safely upgrade applications without fear of service disruption due to version incompatibilities
› Migrate entire applications with their data to the cloud

1-click Application-consistent Snapshots
(Figure: timeline of Snapshot 1, Snapshot 2, Snapshot 3, Snapshot 4, Current)

(Figure: Snapshot 1, 4 months ago; Snapshot 2, 2 weeks ago; Snapshot 3, 3 days ago; Snapshot 4, yesterday; Current, now)
1-click Ready-to-use Clones
› RoW (redirect-on-write) based cloning (ultra fast)
› Clone gets a different network identity
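Redirect-on-write explains why snapshots and clones can be nearly instantaneous. A write never overwrites a block that a snapshot may still reference. Instead, it is redirected to a new location, so taking a snapshot or clone only copies block-mapping metadata. The toy volume below illustrates the generic technique. It is a sketch, not Robin's storage layer: the class and method names are hypothetical, and garbage collection, application-consistent quiescing across volumes, and the clone's new network identity are not modeled.

```python
from typing import Dict, List, Optional


class Volume:
    """Toy block volume with redirect-on-write (RoW) snapshots and clones.

    Data blocks live in an append-only store shared by a volume and its clones.
    A volume, snapshot or clone is only a map from logical block number to a
    slot in that store. Writes never overwrite a slot: they are redirected to
    a new slot and only the writer's map changes, so snapshots and clones copy
    metadata, never data.
    """

    def __init__(self, store: Optional[List[bytes]] = None,
                 block_map: Optional[Dict[int, int]] = None) -> None:
        self.store = store if store is not None else []
        self.block_map: Dict[int, int] = dict(block_map or {})
        self.snapshots: Dict[str, Dict[int, int]] = {}

    def write(self, lbn: int, data: bytes) -> None:
        self.store.append(data)  # redirect to a fresh slot
        self.block_map[lbn] = len(self.store) - 1

    def read(self, lbn: int) -> bytes:
        return self.store[self.block_map[lbn]]

    def snapshot(self, name: str) -> None:
        self.snapshots[name] = dict(self.block_map)  # freeze the mapping

    def rollback(self, name: str) -> None:
        self.block_map = dict(self.snapshots[name])  # "time travel"

    def clone(self, name: str) -> "Volume":
        # Writable copy that shares every existing data block with the source.
        return Volume(self.store, self.snapshots[name])


vol = Volume()
vol.write(0, b"v1")
vol.snapshot("snap1")
vol.write(0, b"v2")                # lands in a new slot; snap1 still sees v1
clone = vol.clone("snap1")
clone.write(1, b"report-table")    # private to the clone
print(vol.read(0), clone.read(0))  # b'v2' b'v1'
```

The clone and the source diverge only through their own writes. This is what makes it cheap to hand a full copy of an application's data to a reporting or test team.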

See demos at booth G3
www.RobinSystems.com
1-click Provision
1-click Scale
1-click QoS Control
1-click Snapshots
1-click Clones
1-click Backup
1-click Upgrade
1-click Migrate
