Partha Seetala is the CTO of Robin Systems, which provides a Kubernetes platform for running big data, NoSQL, database, and AI/ML workloads. Robin addresses challenges with containerizing these applications, such as resource management and storage and networking issues. Robin's solution allows applications to drive infrastructure configuration for improved user experience with capabilities like one-click provisioning, scaling, cloning, backup, and migration of applications across clouds.
2. Who am I?
SAMPLE CUSTOMER DEPLOYMENTS
11 billion security events ingested and analyzed a day (Elasticsearch, Logstash, Kibana, Kafka)
6 Petabytes under active management in a single Robin cluster (Cloudera, Impala, Kafka, Druid)
400 Oracle RAC databases managed by a single Robin cluster (Oracle, Oracle RAC)
CTO of Robin Systems, before that Distinguished Engineer at Veritas/Symantec
We have solved some fundamental problems to enable containers and Kubernetes for running complex Big Data, NoSQL, Database and AI/ML workloads
Robin is The Kubernetes platform for big data, databases and AI/ML
4. Why containerize: developer's perspective?
1. Dislike opening an IT ticket and waiting weeks for their apps to be ready for use
They want their apps to be available now
2. Want to experiment with tools, but dislike the complexity of setting them up
For example, which of RDBMS, NoSQL, DocumentDB, or GraphDB is better for the app?
3. Want to run apps where it makes the most sense
Their laptop, on-prem datacenter, public cloud, etc.
Containers offer deployment agility and infrastructure independence
6. Why containerize – infrastructure perspective?
Utilization worsens with every hardware refresh
Price points shown: $34 K, $141 K, $25 K
          4 years ago    Today
CPU       20 Cores       40 Cores
Memory    128 GiB        512 GiB
Storage   48 TiB         144 TiB
Network   2x10 GbE       2x40 GbE
CPU 24 Cores, Memory 256 GiB, Storage 540 TiB, Network 4x40 GbE
CPU 36 Cores, Memory 768 GiB, Storage 122 TiB, Network 2x100 GbE, GPU 8x NVIDIA V100
Modern hardware offers a lot more resources per rack unit, which must be kept busy to realize ROI
Containers allow you to maximize infrastructure utilization
7. Why Containers, not Virtualization?
› Get the benefits of virtualization without any of its overhead
› Containers run applications directly on bare metal without virtualizing hardware
› A resource given to a hypervisor is a resource that is taken away from your Big Data application
› Applications are being packaged and shipped as container images, not VMs
› You must leverage and adapt to this shift in application packaging
› Containers avoid the need for specialized storage stacks for deduplicating VM images
8. What are the challenges with containerizing Big Data, NoSQL and Databases?
9. Challenges with containers
Incomplete cgroups virtualization causes many Big Data and database applications to misbehave
CPU:
› Contiguous core IDs, CPU ID mapping (Kudu), accurate threads:cores mapping (DB)
› NUMA-aware assignment (HANA)
Memory:
› The JVM sees the entire host memory even if you cap the container's memory (any JVM app)
› Memory allocation inconsistencies (hugepages, shared page cache) (Oracle)
Storage:
› Apps that need raw block devices need correct WWN management (e.g., Oracle, MapR)
› The blkio cgroups setting is useless for avoiding noisy-neighbor problems (all apps)
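The memory point can be illustrated with a minimal Python sketch. It assumes the usual Linux conventions: /proc/meminfo reports MemTotal in kB for the whole host, and the container's cap is read from a cgroup file such as memory.max (cgroup v2) or memory.limit_in_bytes (cgroup v1). The function names are hypothetical and the file contents are passed in as strings, so this is not how any particular runtime does it.

```python
from typing import Optional

UNLIMITED_V2 = "max"  # cgroup v2 writes "max" to memory.max when no cap is set


def host_memory_bytes(meminfo_text: str) -> int:
    """Parse MemTotal (reported in kB) from the text of /proc/meminfo."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1]) * 1024
    raise ValueError("MemTotal not found in meminfo text")


def cgroup_limit_bytes(raw: str) -> Optional[int]:
    """Parse a cgroup memory limit file; None means no cap (cgroup v2 'max')."""
    raw = raw.strip()
    return None if raw == UNLIMITED_V2 else int(raw)


def effective_memory_bytes(meminfo_text: str, cgroup_raw: str) -> int:
    """Memory an app should size itself against: the smaller of host and cap."""
    host = host_memory_bytes(meminfo_text)
    limit = cgroup_limit_bytes(cgroup_raw)
    # cgroup v1 reports a very large number when unlimited, so clamp to host size.
    if limit is None or limit > host:
        return host
    return limit


# Hypothetical values: a 512 GiB host and a container capped at 8 GiB.
meminfo = "MemTotal:       536870912 kB\nMemFree:        1024 kB\n"
print(host_memory_bytes(meminfo))                     # what a naive app sees
print(effective_memory_bytes(meminfo, "8589934592"))  # what it should see
```

An application that sizes its heap from the first number instead of the second will overshoot its container cap, which is the misbehavior the slide describes.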
Confidential – Restricted Distribution
10. Challenges with container orchestration platforms
Very opinionated and architected with a microservices-oriented philosophy
› Expects that apps can be brought up trivially within milliseconds during crash recovery
› Scales by adding more containers and registering them with a load balancer to spread load around
› Recommends modeling your app as a collection of stateless containers, each serving a single service
But you are dealing with applications that have decades of built-in assumptions
› Big Data and databases are not written as microservices applications
› You can’t stop and restart them rapidly
› You have to worry about both storage and network state to ensure high availability
› There is significant investment in custom scripting that assumes SSH access to the hosts running the apps
11. Storage and Networking challenges
2018 CNCF survey says Storage and Networking are the biggest challenges in Kubernetes
https://www.cncf.io/blog/2017/06/28/survey-shows-kubernetes-leading-orchestration-platform
Storage: 48%, Networking: 44%
12. Storage and Networking challenges
› Latest 2018 CNCF survey: 48% say Storage is a big challenge, 44% say Networking is a challenge in Kubernetes
› There are 27 Storage vendors and 21 Network vendors providing Storage & Networking solutions for containers and Kubernetes [1]
[1] https://github.com/cncf/landscape
Despite so many vendor solutions, why is it still a challenge for so many people?
13. Operational challenges to overcome
Storage
› Performance unpredictability when consolidating Big Data and Database apps
› Data locality requirements (both performance and datacenter network bandwidth constraints)
› Anti/affinity and isolation constraints
Networking
› Services running inside K8S are often consumed by applications running in different L3 subnets
› Putting a load balancer between apps and services is unnecessary and less performant for most Big Data, NoSQL, and Database applications
› 90% of the apps used in real life require their IP address to be preserved across restarts
Spending time setting up Storage and Networking is a drag on user productivity
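The storage constraints listed above (data locality and anti-affinity) can be sketched as a small placement routine. This is a minimal illustration under stated assumptions, not Robin's or Kubernetes' scheduler: the Node type, its fields, and place_replicas are hypothetical names, anti-affinity is modeled as "at most one replica per rack", and data locality as a preference for nodes that already hold the data.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Node:
    name: str
    rack: str
    has_local_data: bool


def place_replicas(nodes: List[Node], replicas: int) -> List[Node]:
    """Pick one node per rack (anti-affinity), preferring data-local nodes."""
    chosen: List[Node] = []
    used_racks = set()
    # Data-local nodes sort first; the name breaks ties so results are stable.
    for node in sorted(nodes, key=lambda n: (not n.has_local_data, n.name)):
        if node.rack in used_racks:
            continue  # anti-affinity: never put two replicas in the same rack
        chosen.append(node)
        used_racks.add(node.rack)
        if len(chosen) == replicas:
            return chosen
    raise ValueError("not enough distinct racks to satisfy anti-affinity")


cluster = [
    Node("n1", "rackA", False),
    Node("n2", "rackA", True),
    Node("n3", "rackB", False),
    Node("n4", "rackC", True),
]
print([n.name for n in place_replicas(cluster, 3)])
```

Expressing the constraints once per application, rather than wiring them by hand per volume and per pod, is the productivity point the slide is making.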
14. Don’t miss the forest for the trees
Layers: Users, Applications, Infrastructure
Most vendors are looking at the problem in this direction: focus is on infra components (StatefulSets, Deployments, Persistent Volume Claims, Services, CSI, CNI, HPA)
Whereas we should be looking at it in this direction: focus on user-app interaction; let apps drive infrastructure to meet user requirements
15. Can Kubernetes alone get us to the promised land?
MANAGEMENT (kubectl, helm)
SERVICES (Ingress, Proxy, LB)
STORAGE (CSI)
NETWORKING (CNI, Overlay)
MONITORING (Heapster, HPA)
CONFIGURATION (ConfigMap, Secrets)
UI
SERVER INFRA (Bare metal, On-prem VM, AWS, Azure, GCP)
DATA (PV, PVC)
TROUBLESHOOTING (Logging, Events)
CONTAINERS (docker, LXC)
16. Time to reframe our thinking
Let applications drive infrastructure to meet user requirements
(in this model application workflows configure Kubernetes, Networking and Storage)
17. Robin is The Kubernetes platform for big data, databases and AI/ML
Integrated App-aware Storage + Docker, LXC, Kubernetes + Integrated Networking + Application-aware Workflow Manager
Application workflows configure Kubernetes, Networking and Storage
18. When you elevate your thinking to Applications
You do less of this
› Deployments, ReplicaSets, and StatefulSets
› Persistent Volumes and Claims
› Service endpoints, proxy
› Ingress and Egress routes
› Secrets and ConfigMaps
› Heapster, CSI, CNI, and CRI
And do more of this
› Time-travel application states
› Clone entire Applications with their data
› Backup and restore entire apps, any app
› Upgrade applications in a failsafe manner
› Control QoS of apps to meet performance SLAs
› Make applications and data mobile across clouds
19. Give your users a managed service experience
SPECIFY DATA-LOCALITY, ANTI/AFFINITY CONSTRAINTS AND PLACEMENT HINTS
ENABLE SERVICE COMPONENTS
SPECIFY COMPUTE
SPECIFY STORAGE
SPECIFY SCALE
Just minutes from click to use
64 node Hadoop Cluster with 1408 CPU Cores, 4.5 TB of Memory, 1.5 PB of Storage takes just 23 mins
Services enabled: Atlas, Spark, Hive, Kerberos, Sentry, HDFS, namenode HA
K8S components auto created (StatefulSets, PVC, Services, …)
Data-locality, anti/affinity policies enforced
Any Big Data, NoSQL, Database, AI/ML app
20. Adjust resources to meet changing priorities
› Application priorities change with time
› Faster ingest during daytime
› Faster querying for end-of-quarter reporting
› Trade resources between adjacent applications dynamically
› Adjust CPU, Memory, GPU, Network and IOPS dynamically
› Scaling resources vs scaling entire service
› K8S’ Horizontal Pod Autoscaler (HPA) is not suitable for data applications:
› Works by adding more Pods to scale horizontally
› Great for stateless apps
› Not so good for Big Data, NoSQL and Databases
› Results in data rebalancing, which is costly and permanent. Scaling down is very hard.
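To get a feel for why adding nodes to a data service triggers rebalancing, here is a minimal sketch. It assumes a deliberately naive placement rule (key modulo node count), which is an illustration only and not how Hadoop, Kafka, or any specific product places data; owner and keys_moved are hypothetical names.

```python
def owner(key: int, num_nodes: int) -> int:
    """Naive placement: the node that holds a key is key modulo node count."""
    return key % num_nodes


def keys_moved(num_keys: int, old_nodes: int, new_nodes: int) -> int:
    """Count keys whose owning node changes when the node count changes."""
    return sum(
        1
        for key in range(num_keys)
        if owner(key, old_nodes) != owner(key, new_nodes)
    )


# Growing a hypothetical 4-node service to 5 nodes, with 1000 keys.
moved = keys_moved(1000, 4, 5)
print(f"{moved} of 1000 keys change owner")
```

Smarter placement schemes move less data than this, but some movement is unavoidable when the node count changes. A stateless service pays no such cost, which is why horizontal autoscaling suits it better.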
Figure: Kafka, Hadoop1, Hadoop2, and Druid sharing a cluster over a day (time markers: 12 AM, 8 AM, 3 PM, 11 PM)
Assign each app its own resource quota (CPU, Mem, IOPS)
Shift resources from Hadoop2 to Hadoop1 with 1-Click
Shift resources from Hadoop1 to Hadoop2 with 1-Click
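The idea of shifting resources between adjacent applications, rather than scaling out, can be sketched as a transfer between per-app quotas that keeps the cluster total constant. This is a minimal sketch: the Quota type, shift_quota, and the numbers are hypothetical and do not represent Robin's API.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Quota:
    cpu: int
    mem_gib: int
    iops: int


def shift_quota(quotas: Dict[str, Quota], src: str, dst: str,
                cpu: int = 0, mem_gib: int = 0, iops: int = 0) -> None:
    """Move resources from src to dst in place; the cluster total is unchanged."""
    giver, taker = quotas[src], quotas[dst]
    if cpu > giver.cpu or mem_gib > giver.mem_gib or iops > giver.iops:
        raise ValueError(f"{src} does not have that much to give")
    giver.cpu -= cpu
    giver.mem_gib -= mem_gib
    giver.iops -= iops
    taker.cpu += cpu
    taker.mem_gib += mem_gib
    taker.iops += iops


quotas = {
    "Hadoop1": Quota(cpu=16, mem_gib=128, iops=20000),
    "Hadoop2": Quota(cpu=16, mem_gib=128, iops=20000),
}
# Daytime: favor Hadoop1. Reversing src and dst shifts the resources back.
shift_quota(quotas, "Hadoop2", "Hadoop1", cpu=8, mem_gib=64, iops=10000)
print(quotas)
```

Because no pods are added or removed, no data has to move, which is the contrast with HPA drawn on the slide.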
21. Application-centric resource management and QoS
› We enhanced K8S’ cgroups management capabilities
› More comprehensive procfs and sysfs virtualization
› Virtualize the sysinfo(2) system call
› We implemented application-topology-aware MIN and MAX storage QoS
› Predictable performance for mission-critical workloads
› Eliminate noisy-neighbor challenges when consolidating workloads
Figure: Hadoop, MongoDB, Kafka, MySQL, and Postgres issuing IO to Robin Application-aware Storage
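One way to picture MIN and MAX storage QoS is as an allocation problem: every app is first guaranteed its minimum IOPS, and spare capacity is then handed out until apps hit their maximum. The sketch below is a minimal illustration of that idea under stated assumptions. It is not Robin's algorithm; allocate_iops, the equal-share policy, and the numbers are all hypothetical.

```python
from typing import Dict, Tuple


def allocate_iops(capacity: int,
                  apps: Dict[str, Tuple[int, int]]) -> Dict[str, int]:
    """apps maps name -> (min_iops, max_iops); returns name -> granted IOPS."""
    total_min = sum(lo for lo, _ in apps.values())
    if total_min > capacity:
        raise ValueError("cannot honor every MIN guarantee with this capacity")
    # Step 1: every app gets its guaranteed minimum.
    alloc = {name: lo for name, (lo, _) in apps.items()}
    spare = capacity - total_min
    active = {name for name, (lo, hi) in apps.items() if hi > lo}
    # Step 2: split the spare capacity equally, never exceeding an app's MAX.
    while spare > 0 and active:
        share = spare // len(active)
        if share == 0:
            break  # leftover is smaller than one IOPS per app
        for name in sorted(active):
            grant = min(share, apps[name][1] - alloc[name])
            alloc[name] += grant
            spare -= grant
            if alloc[name] == apps[name][1]:
                active.discard(name)  # capped apps stop receiving spare IOPS
    return alloc


demo = {"oracle": (4000, 6000), "kafka": (1000, 8000), "hadoop": (500, 2000)}
print(allocate_iops(10000, demo))
```

The MIN gives mission-critical workloads predictable performance, while the MAX keeps any one app from becoming the noisy neighbor.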
23. Elevating experience to Applications
› Time machine for applications
Time travel across multiple application states
› Clone and share entire applications
for running reports, tests, and what-if analysis
› Backup and restore entire applications
to avoid fear of app+data loss
› Safely upgrade applications
without fear of service disruption due to version incompatibilities
› Migrate entire applications with data to cloud
24. Elevating experience to Applications
1-click Application-consistent Snapshots
Snapshot 1 Snapshot 2 Snapshot 3 Snapshot 4 Current
25. Elevating experience to Applications
Snapshot 1 (4 months ago), Snapshot 2 (2 weeks ago), Snapshot 3 (3 days ago), Snapshot 4 (yesterday), Current (now)
1-click Ready-to-use Clones
RoW-based cloning (ultra fast)
Clone gets a different network identity
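Assuming RoW here stands for redirect-on-write, the reason such clones are fast is that a snapshot or clone copies only the block map, not the data; later writes go to fresh blocks and the map is redirected to them. The sketch below is a toy in-memory model of that technique. BlockStore and Volume are hypothetical names and this is not Robin's storage implementation.

```python
from typing import Dict, Optional


class BlockStore:
    """Append-only pool of data blocks shared by volumes, snapshots, clones."""

    def __init__(self) -> None:
        self.blocks: Dict[int, bytes] = {}
        self._next_id = 0

    def put(self, data: bytes) -> int:
        block_id = self._next_id
        self._next_id += 1
        self.blocks[block_id] = data
        return block_id


class Volume:
    def __init__(self, store: BlockStore,
                 block_map: Optional[Dict[int, int]] = None) -> None:
        self.store = store
        # Maps logical block address -> block id in the shared store.
        self.block_map: Dict[int, int] = dict(block_map or {})

    def write(self, lba: int, data: bytes) -> None:
        # Redirect-on-write: never overwrite, point the map at a new block.
        self.block_map[lba] = self.store.put(data)

    def read(self, lba: int) -> bytes:
        return self.store.blocks[self.block_map[lba]]

    def snapshot(self) -> Dict[int, int]:
        # Metadata-only copy; no data blocks are duplicated.
        return dict(self.block_map)


store = BlockStore()
prod = Volume(store)
prod.write(0, b"orders-v1")
snap = prod.snapshot()
prod.write(0, b"orders-v2")   # production moves on
clone = Volume(store, snap)   # clone starts from the snapshot's map
print(prod.read(0), clone.read(0))
```

The clone's cost is proportional to the size of the map rather than the size of the data, and writes to the clone do not disturb the original.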
28. See demos at booth G3
Robin is The Kubernetes platform for big data, databases and AI/ML
www.RobinSystems.com
1-click Provision
1-click Scale
1-click QoS Control
1-click Snapshots
1-click Clones
1-click Backup
1-click Upgrade
1-click Migrate
Editor's notes
Containers are taking the world by storm. Everyone seems to be doing them. They are the biggest thing since virtualization. Most software vendors are now releasing their software as a Docker image. Heck, even Microsoft has released SQL Server for Linux as a Docker image. It seems that the industry has accepted that, going forward, software will be shipped and run inside containers.
So it is only natural to ask: how about running Hadoop inside containers?