SlideShare une entreprise Scribd logo
1  sur  39
Running secured Spark job in
Kubernetes compute cluster and
integrating with Kerberized HDFS
Joy Chakraborty
June 21, 2018
Email: joychak1@[yahoo/gmail].com]]
• To run Spark job on Elastic
Compute platform (Kubernetes)
accessing data stored in HDFS
Essentially what we will be doing -
3
Who am I ???
I learn and apply ….
- Design and write software for living
- for last 18 years …
4
Disclaimer
We will be just scratching the Surface !!!
Agenda
5
Kubernetes as Elastic Compute1
HDFS as secured distributed storage2
Configuring Spark to run in Kubernetes & accessing HDFS3
Demo - Build & setup the Kubernetes/Spark/HDFS environment4
6
Why Kubernetes?
7
Compute Requirements
8
Compute Requirements
• Support Elasticity
• Flexibility and variability in work load
• Process massive amount of data in parallel
• Support Reliability & Multitenancy
• Ease in Accessibility without compromising
Security
9
Distributed
Containerized
10
• Container Management
• Scheduling
• Resource Management
Distributed Containerized System
11
• Container Management
• Scheduling
• Resource Management
Kubernetes
Distributed Containerized System
12
what?
13
Kubernetes
An open-source system for automating deployment, scaling,
and management of containerized applications.
• Portable
• Extensible: modular, plug-able, compos-able
• Self-healing: auto-placement, auto-restart, auto-
replication, auto-scaling
• Latest Release: 1.8.13
• Community driven completely
• Written in GoLang
14
Kubernetes High Level Architecture
15
Kubernetes High Level Architecture
• The Kubernetes Master is a collection
of three processes that run on a single
node in your cluster, which is
designated as the master node.
• kube-apiserver
• kube-controller-manager
• kube-scheduler
• Each individual non-master node in
your cluster runs two processes –
• kubelet, which communicates with
the Kubernetes Master
• kube-proxy, a network proxy which
reflects Kubernetes networking
services on each node.
Kube-apiserver
Kube-Controller-manager
16
Kubernetes - basic
components
17
Kubernetes Object/Resource (out of the box)
• Namespace
• Pod (a basic unit of work)
• Service
• Volume
18
Kubernetes Object/Resource (out of the box)
• Namespace
• Pod (a basic unit of work)
• Service
• Volume
• ReplicaSet
• Deployment
• Job
19
Kubernetes Object/Resource (out of the box)
• Namespace
• Pod (a basic unit of work)
• Service
• Volume
• ReplicaSet
• Deployment
• Job
• RBAC
20
Kubernetes Object/Resource (out of the box)
• Namespace
• Pod (a basic unit of work)
• Service
• Volume
• ReplicaSet
• Deployment
• Job
• RBAC
*** Custom Resource
21
Kubernetes Pod
• Pod is the basic building block of Kubernetes–
• the smallest and simplest unit in the Kubernetes object model
that you create or deploy
• represents a running process on your cluster.
• encapsulates an application container (or, in some cases,
multiple containers), storage resources, a unique network IP,
and options that govern how the container(s) should run
• Docker is the most common container runtime used in a Pod,
but Pods support other container runtimes as well
22
How to Interact with Kubernetes
23
How to Interact with Kubernetes
• REST API
• Curl or browser
• Command Line Interface
• Kubectl
• Kube Config
• Created during cluster
creation
• Programmatic
• GoLang, Python, Scala
24
How Secured
(kerberized) HDFS
works?
25
KDC
AS TGS
Active Directory
Client Machine
Client Application
HDFS
Namenode
Data
Node
Data
Node
Data
Node
Data
Node
Secured/Kerberized HDFS
26
KDC
AS TGS
Active Directory
Client Machine
Client Application
HDFS
Namenode
Data
Node
Data
Node
Data
Node
Data
Node
2. Client-app requests Ticket
3. KDC sends TGT
1. Service
Principles/Keys
6. Sends Service Ticket and requests for Authentication
5. User Authenticated
using Service Principle/key
Retrieves
User roles/permissions
Secured/Kerberized (keytab) HDFS
27
KDC
AS TGS
Active Directory
Client Machine
Client Application
HDFS
Namenode
Data
Node
Data
Node
Data
Node
Data
Node
2. Client-app requests Ticket
3. KDC sends TGT
1. Service
Principles/Keys
6. Sends Service Ticket and requests for Delegation Token
7. User Authenticated
using Service Principle/key
Retrieves
User roles/permissions
Secured/Kerberized (delegation token) HDFS
8. Name node sends delegation token
28
Spark + Kubernetes +
HDFS
29
Spark Running in Kubernetes
Kubernetes ClusterKube Master
30
Spark Running in Kubernetes
Kubernetes Cluster
Spark-Submit
(--master k8s://…)
Kube Master
31
Spark Running in Kubernetes
Kubernetes Cluster
Worker
Pod
Worker
Pod
Worker
Pod
Worker
Pod
Spark Job
Driver
Pod
Spark-Submit
Spark
Master
Kube Master
32
Spark Running in Kubernetes
Kubernetes Cluster
Worker
Pod
Worker
Pod
Worker
Pod
Worker
Pod
Spark Job
Driver
Pod
Docker Hub
Spark Docker Images
Spark-Submit
Spark
Master
Kube Master
33
Spark Running in Kubernetes
Kubernetes Cluster
Worker
Pod
Worker
Pod
Worker
Pod
Worker
Pod
Spark Job
Driver
Pod
Docker Hub
Spark Docker Images
Spark-Submit
Spark
Master
Kube Master
HDFS
Namenode
Data
Node
Data
Node
Data
Node
Data
Node
34
Spark Running in Kubernetes
Kubernetes Cluster
Worker
Pod
Worker
Pod
Worker
Pod
Worker
Pod
Spark Job
Driver
Pod
Docker Hub
Spark Docker Images
Spark-Submit
Spark
Master
Kube Master
HDFS
Namenode
Data
Node
Data
Node
Data
Node
Data
Node
Kube Secret
Keytab
35
Spark Running in Kubernetes
Kubernetes Cluster
Worker
Pod
Worker
Pod
Worker
Pod
Worker
Pod
Spark Job
Driver
Pod
Docker Hub
Spark Docker Images
Spark-Submit
Spark
Master
Kube Master
HDFS
Namenode
Data
Node
Data
Node
Data
Node
Data
Node
Kube Secret
Keytab
36
Lets build the
Kubernetes/Spark/HDFS
environment !!!
37
What to use
• To install Kubernetes
• vagrant-kubeadm (https://github.com/c9s/vagrant-kubeadm)
• Run local Docker Hub (https://docs.docker.com/registry/deploying )
• Run Docker registry container and map localhost to
some ip (10.0.2.2)
• Build Spark Docker image from Spark download Dockerfile
under Kubernetes directory (spark-2.3.0-bin-hadoop2.7/ kubernetes)
• Publish the image to Docker hub
• Run Spark-submit
38
Demo !!!
Email: joychak1@[yahoo/gmail].com]
Thank You
39

Contenu connexe

Tendances

Data processing at the speed of 100 Gbps@Apache Crail (Incubating)
Data processing at the speed of 100 Gbps@Apache Crail (Incubating)Data processing at the speed of 100 Gbps@Apache Crail (Incubating)
Data processing at the speed of 100 Gbps@Apache Crail (Incubating)DataWorks Summit
 
Troubleshooting Kerberos in Hadoop: Taming the Beast
Troubleshooting Kerberos in Hadoop: Taming the BeastTroubleshooting Kerberos in Hadoop: Taming the Beast
Troubleshooting Kerberos in Hadoop: Taming the BeastDataWorks Summit
 
Securing data in hybrid environments using Apache Ranger
Securing data in hybrid environments using Apache RangerSecuring data in hybrid environments using Apache Ranger
Securing data in hybrid environments using Apache RangerDataWorks Summit
 
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...DataWorks Summit
 
Rich placement constraints: Who said YARN cannot schedule services?
Rich placement constraints: Who said YARN cannot schedule services?Rich placement constraints: Who said YARN cannot schedule services?
Rich placement constraints: Who said YARN cannot schedule services?DataWorks Summit
 
Efficient Spark Analytics on Encrypted Data with Gidon Gershinsky
 Efficient Spark Analytics on Encrypted Data with Gidon Gershinsky Efficient Spark Analytics on Encrypted Data with Gidon Gershinsky
Efficient Spark Analytics on Encrypted Data with Gidon GershinskyDatabricks
 
Realizing the promise of portable data processing with Apache Beam
Realizing the promise of portable data processing with Apache BeamRealizing the promise of portable data processing with Apache Beam
Realizing the promise of portable data processing with Apache BeamDataWorks Summit
 
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016alanfgates
 
Running Enterprise Workloads in the Cloud
Running Enterprise Workloads in the CloudRunning Enterprise Workloads in the Cloud
Running Enterprise Workloads in the CloudDataWorks Summit
 
Bootstrapping state in Apache Flink
Bootstrapping state in Apache FlinkBootstrapping state in Apache Flink
Bootstrapping state in Apache FlinkDataWorks Summit
 
Big Data Analytics from Edge to Core
Big Data Analytics from Edge to CoreBig Data Analytics from Edge to Core
Big Data Analytics from Edge to CoreDataWorks Summit
 
HDFS Tiered Storage: Mounting Object Stores in HDFS
HDFS Tiered Storage: Mounting Object Stores in HDFSHDFS Tiered Storage: Mounting Object Stores in HDFS
HDFS Tiered Storage: Mounting Object Stores in HDFSDataWorks Summit
 
Ozone: Evolution of HDFS scalability & built-in GDPR compliance
Ozone: Evolution of HDFS scalability & built-in GDPR complianceOzone: Evolution of HDFS scalability & built-in GDPR compliance
Ozone: Evolution of HDFS scalability & built-in GDPR complianceDinesh Chitlangia
 
Uncovering an Apache Spark 2 Benchmark - Configuration, Tuning and Test Results
Uncovering an Apache Spark 2 Benchmark - Configuration, Tuning and Test ResultsUncovering an Apache Spark 2 Benchmark - Configuration, Tuning and Test Results
Uncovering an Apache Spark 2 Benchmark - Configuration, Tuning and Test ResultsDataWorks Summit
 
Securing Data in Hadoop at Uber
Securing Data in Hadoop at UberSecuring Data in Hadoop at Uber
Securing Data in Hadoop at UberDataWorks Summit
 
What's new in apache hive
What's new in apache hive What's new in apache hive
What's new in apache hive DataWorks Summit
 
Lessons Learned from Dockerizing Spark Workloads: Spark Summit East talk by T...
Lessons Learned from Dockerizing Spark Workloads: Spark Summit East talk by T...Lessons Learned from Dockerizing Spark Workloads: Spark Summit East talk by T...
Lessons Learned from Dockerizing Spark Workloads: Spark Summit East talk by T...Spark Summit
 
CBlocks - Posix compliant files systems for HDFS
CBlocks - Posix compliant files systems for HDFSCBlocks - Posix compliant files systems for HDFS
CBlocks - Posix compliant files systems for HDFSDataWorks Summit
 
HPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journeyHPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journeyPeter Clapham
 

Tendances (20)

Data processing at the speed of 100 Gbps@Apache Crail (Incubating)
Data processing at the speed of 100 Gbps@Apache Crail (Incubating)Data processing at the speed of 100 Gbps@Apache Crail (Incubating)
Data processing at the speed of 100 Gbps@Apache Crail (Incubating)
 
Troubleshooting Kerberos in Hadoop: Taming the Beast
Troubleshooting Kerberos in Hadoop: Taming the BeastTroubleshooting Kerberos in Hadoop: Taming the Beast
Troubleshooting Kerberos in Hadoop: Taming the Beast
 
Securing data in hybrid environments using Apache Ranger
Securing data in hybrid environments using Apache RangerSecuring data in hybrid environments using Apache Ranger
Securing data in hybrid environments using Apache Ranger
 
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
 
Rich placement constraints: Who said YARN cannot schedule services?
Rich placement constraints: Who said YARN cannot schedule services?Rich placement constraints: Who said YARN cannot schedule services?
Rich placement constraints: Who said YARN cannot schedule services?
 
Efficient Spark Analytics on Encrypted Data with Gidon Gershinsky
 Efficient Spark Analytics on Encrypted Data with Gidon Gershinsky Efficient Spark Analytics on Encrypted Data with Gidon Gershinsky
Efficient Spark Analytics on Encrypted Data with Gidon Gershinsky
 
Realizing the promise of portable data processing with Apache Beam
Realizing the promise of portable data processing with Apache BeamRealizing the promise of portable data processing with Apache Beam
Realizing the promise of portable data processing with Apache Beam
 
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
 
Running Enterprise Workloads in the Cloud
Running Enterprise Workloads in the CloudRunning Enterprise Workloads in the Cloud
Running Enterprise Workloads in the Cloud
 
Bootstrapping state in Apache Flink
Bootstrapping state in Apache FlinkBootstrapping state in Apache Flink
Bootstrapping state in Apache Flink
 
Big Data Analytics from Edge to Core
Big Data Analytics from Edge to CoreBig Data Analytics from Edge to Core
Big Data Analytics from Edge to Core
 
HDFS Tiered Storage: Mounting Object Stores in HDFS
HDFS Tiered Storage: Mounting Object Stores in HDFSHDFS Tiered Storage: Mounting Object Stores in HDFS
HDFS Tiered Storage: Mounting Object Stores in HDFS
 
Ozone: Evolution of HDFS scalability & built-in GDPR compliance
Ozone: Evolution of HDFS scalability & built-in GDPR complianceOzone: Evolution of HDFS scalability & built-in GDPR compliance
Ozone: Evolution of HDFS scalability & built-in GDPR compliance
 
Uncovering an Apache Spark 2 Benchmark - Configuration, Tuning and Test Results
Uncovering an Apache Spark 2 Benchmark - Configuration, Tuning and Test ResultsUncovering an Apache Spark 2 Benchmark - Configuration, Tuning and Test Results
Uncovering an Apache Spark 2 Benchmark - Configuration, Tuning and Test Results
 
Securing Data in Hadoop at Uber
Securing Data in Hadoop at UberSecuring Data in Hadoop at Uber
Securing Data in Hadoop at Uber
 
What's new in apache hive
What's new in apache hive What's new in apache hive
What's new in apache hive
 
Lessons Learned from Dockerizing Spark Workloads: Spark Summit East talk by T...
Lessons Learned from Dockerizing Spark Workloads: Spark Summit East talk by T...Lessons Learned from Dockerizing Spark Workloads: Spark Summit East talk by T...
Lessons Learned from Dockerizing Spark Workloads: Spark Summit East talk by T...
 
CBlocks - Posix compliant files systems for HDFS
CBlocks - Posix compliant files systems for HDFSCBlocks - Posix compliant files systems for HDFS
CBlocks - Posix compliant files systems for HDFS
 
Ingest and Stream Processing - What will you choose?
Ingest and Stream Processing - What will you choose?Ingest and Stream Processing - What will you choose?
Ingest and Stream Processing - What will you choose?
 
HPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journeyHPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journey
 

Similaire à Running secured Spark job in Kubernetes compute cluster and integrating with Kerberized HDFS

01 - VMUGIT - Lecce 2018 - Fabio Rapposelli, VMware
01 - VMUGIT - Lecce 2018 - Fabio Rapposelli, VMware01 - VMUGIT - Lecce 2018 - Fabio Rapposelli, VMware
01 - VMUGIT - Lecce 2018 - Fabio Rapposelli, VMwareVMUG IT
 
LISA2017 Kubernetes: Hit the Ground Running
LISA2017 Kubernetes: Hit the Ground RunningLISA2017 Kubernetes: Hit the Ground Running
LISA2017 Kubernetes: Hit the Ground RunningChris McEniry
 
Container orchestration k8s azure kubernetes services
Container orchestration  k8s azure kubernetes servicesContainer orchestration  k8s azure kubernetes services
Container orchestration k8s azure kubernetes servicesRajesh Kolla
 
Building Cloud-Native Applications with Kubernetes, Helm and Kubeless
Building Cloud-Native Applications with Kubernetes, Helm and KubelessBuilding Cloud-Native Applications with Kubernetes, Helm and Kubeless
Building Cloud-Native Applications with Kubernetes, Helm and KubelessBitnami
 
Nodeless and serverless kubernetes
Nodeless and serverless kubernetesNodeless and serverless kubernetes
Nodeless and serverless kubernetesNills Franssens
 
Kubernetes-Presentation-Syed-Murtaza-Hassan
Kubernetes-Presentation-Syed-Murtaza-HassanKubernetes-Presentation-Syed-Murtaza-Hassan
Kubernetes-Presentation-Syed-Murtaza-HassanSyed Murtaza Hassan
 
Fabio rapposelli pks-vmug
Fabio rapposelli   pks-vmugFabio rapposelli   pks-vmug
Fabio rapposelli pks-vmugVMUG IT
 
Container orchestration and microservices world
Container orchestration and microservices worldContainer orchestration and microservices world
Container orchestration and microservices worldKarol Chrapek
 
Why Kubernetes as a container orchestrator is a right choice for running spar...
Why Kubernetes as a container orchestrator is a right choice for running spar...Why Kubernetes as a container orchestrator is a right choice for running spar...
Why Kubernetes as a container orchestrator is a right choice for running spar...DataWorks Summit
 
Kubermatic CNCF Webinar - start.kubermatic.pdf
Kubermatic CNCF Webinar - start.kubermatic.pdfKubermatic CNCF Webinar - start.kubermatic.pdf
Kubermatic CNCF Webinar - start.kubermatic.pdfLibbySchulze
 
Pro2516 10 things about oracle and k8s.pptx-final
Pro2516   10 things about oracle and k8s.pptx-finalPro2516   10 things about oracle and k8s.pptx-final
Pro2516 10 things about oracle and k8s.pptx-finalMichel Schildmeijer
 
Cloud Native PostgreSQL - APJ
Cloud Native PostgreSQL - APJCloud Native PostgreSQL - APJ
Cloud Native PostgreSQL - APJEDB
 
RedisConf18 - Redis Enterprise on Cloud Native Platforms
RedisConf18 - Redis Enterprise on Cloud  Native  Platforms RedisConf18 - Redis Enterprise on Cloud  Native  Platforms
RedisConf18 - Redis Enterprise on Cloud Native Platforms Redis Labs
 
Docker, Atomic Host and Kubernetes.
Docker, Atomic Host and Kubernetes.Docker, Atomic Host and Kubernetes.
Docker, Atomic Host and Kubernetes.Jooho Lee
 
Why kubernetes for Serverless (FaaS)
Why kubernetes for Serverless (FaaS)Why kubernetes for Serverless (FaaS)
Why kubernetes for Serverless (FaaS)Krishna-Kumar
 

Similaire à Running secured Spark job in Kubernetes compute cluster and integrating with Kerberized HDFS (20)

01 - VMUGIT - Lecce 2018 - Fabio Rapposelli, VMware
01 - VMUGIT - Lecce 2018 - Fabio Rapposelli, VMware01 - VMUGIT - Lecce 2018 - Fabio Rapposelli, VMware
01 - VMUGIT - Lecce 2018 - Fabio Rapposelli, VMware
 
LISA2017 Kubernetes: Hit the Ground Running
LISA2017 Kubernetes: Hit the Ground RunningLISA2017 Kubernetes: Hit the Ground Running
LISA2017 Kubernetes: Hit the Ground Running
 
Container orchestration k8s azure kubernetes services
Container orchestration  k8s azure kubernetes servicesContainer orchestration  k8s azure kubernetes services
Container orchestration k8s azure kubernetes services
 
Building Cloud-Native Applications with Kubernetes, Helm and Kubeless
Building Cloud-Native Applications with Kubernetes, Helm and KubelessBuilding Cloud-Native Applications with Kubernetes, Helm and Kubeless
Building Cloud-Native Applications with Kubernetes, Helm and Kubeless
 
Nodeless and serverless kubernetes
Nodeless and serverless kubernetesNodeless and serverless kubernetes
Nodeless and serverless kubernetes
 
Kubernetes Introduction
Kubernetes IntroductionKubernetes Introduction
Kubernetes Introduction
 
Kubernetes Intro @HaufeDev
Kubernetes Intro @HaufeDev Kubernetes Intro @HaufeDev
Kubernetes Intro @HaufeDev
 
Kubernetes-Presentation-Syed-Murtaza-Hassan
Kubernetes-Presentation-Syed-Murtaza-HassanKubernetes-Presentation-Syed-Murtaza-Hassan
Kubernetes-Presentation-Syed-Murtaza-Hassan
 
Fabio rapposelli pks-vmug
Fabio rapposelli   pks-vmugFabio rapposelli   pks-vmug
Fabio rapposelli pks-vmug
 
Container orchestration and microservices world
Container orchestration and microservices worldContainer orchestration and microservices world
Container orchestration and microservices world
 
Why Kubernetes as a container orchestrator is a right choice for running spar...
Why Kubernetes as a container orchestrator is a right choice for running spar...Why Kubernetes as a container orchestrator is a right choice for running spar...
Why Kubernetes as a container orchestrator is a right choice for running spar...
 
Kubermatic.pdf
Kubermatic.pdfKubermatic.pdf
Kubermatic.pdf
 
Kubermatic CNCF Webinar - start.kubermatic.pdf
Kubermatic CNCF Webinar - start.kubermatic.pdfKubermatic CNCF Webinar - start.kubermatic.pdf
Kubermatic CNCF Webinar - start.kubermatic.pdf
 
Pro2516 10 things about oracle and k8s.pptx-final
Pro2516   10 things about oracle and k8s.pptx-finalPro2516   10 things about oracle and k8s.pptx-final
Pro2516 10 things about oracle and k8s.pptx-final
 
Cloud Native PostgreSQL - APJ
Cloud Native PostgreSQL - APJCloud Native PostgreSQL - APJ
Cloud Native PostgreSQL - APJ
 
RedisConf18 - Redis Enterprise on Cloud Native Platforms
RedisConf18 - Redis Enterprise on Cloud  Native  Platforms RedisConf18 - Redis Enterprise on Cloud  Native  Platforms
RedisConf18 - Redis Enterprise on Cloud Native Platforms
 
Introduction to Virtual Kubelet
Introduction to Virtual KubeletIntroduction to Virtual Kubelet
Introduction to Virtual Kubelet
 
Big data and Kubernetes
Big data and KubernetesBig data and Kubernetes
Big data and Kubernetes
 
Docker, Atomic Host and Kubernetes.
Docker, Atomic Host and Kubernetes.Docker, Atomic Host and Kubernetes.
Docker, Atomic Host and Kubernetes.
 
Why kubernetes for Serverless (FaaS)
Why kubernetes for Serverless (FaaS)Why kubernetes for Serverless (FaaS)
Why kubernetes for Serverless (FaaS)
 

Plus de DataWorks Summit

Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureDataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudDataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 

Plus de DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Dernier

Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 

Dernier (20)

Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 

Running secured Spark job in Kubernetes compute cluster and integrating with Kerberized HDFS

  • 1. Running secured Spark job in Kubernetes compute cluster and integrating with Kerberized HDFS Joy Chakraborty June 21, 2018 Email: joychak1@[yahoo/gmail].com]]
  • 2. • To run Spark job on Elastic Compute platform (Kubernetes) accessing data stored in HDFS Essentially what we will be doing -
  • 3. 3 Who am I ??? I learn and apply …. - Design and write software for living - for last 18 years …
  • 4. 4 Disclaimer We will be just scratching the Surface !!!
  • 5. Agenda 5 Kubernetes as Elastic Compute1 HDFS as secured distributed storage2 Configuring Spark to run in Kubernetes & accessing HDFS3 Demo - Build & setup the Kubernetes/Spark/HDFS environment4
  • 8. 8 Compute Requirements • Support Elasticity • Flexibility and variability in work load • Process massive amount of data in parallel • Support Reliability & Multitenancy • Ease in Accessibility without compromising Security
  • 10. 10 • Container Management • Scheduling • Resource Management Distributed Containerized System
  • 11. 11 • Container Management • Scheduling • Resource Management Kubernetes Distributed Containerized System
  • 13. 13 Kubernetes An open-source system for automating deployment, scaling, and management of containerized applications. • Portable • Extensible: modular, plug-able, compos-able • Self-healing: auto-placement, auto-restart, auto- replication, auto-scaling • Latest Release: 1.8.13 • Community driven completely • Written in GoLang
  • 14. 14 Kubernetes High Level Architecture
  • 15. 15 Kubernetes High Level Architecture • The Kubernetes Master is a collection of three processes that run on a single node in your cluster, which is designated as the master node. • kube-apiserver • kube-controller-manager • kube-scheduler • Each individual non-master node in your cluster runs two processes – • kubelet, which communicates with the Kubernetes Master • kube-proxy, a network proxy which reflects Kubernetes networking services on each node. Kube-apiserver Kube-Controller-manager
  • 17. 17 Kubernetes Object/Resource (out of the box) • Namespace • Pod (a basic unit of work) • Service • Volume
  • 18. 18 Kubernetes Object/Resource (out of the box) • Namespace • Pod (a basic unit of work) • Service • Volume • ReplicaSet • Deployment • Job
  • 19. 19 Kubernetes Object/Resource (out of the box) • Namespace • Pod (a basic unit of work) • Service • Volume • ReplicaSet • Deployment • Job • RBAC
  • 20. 20 Kubernetes Object/Resource (out of the box) • Namespace • Pod (a basic unit of work) • Service • Volume • ReplicaSet • Deployment • Job • RBAC *** Custom Resource
  • 21. 21 Kubernetes Pod • Pod is the basic building block of Kubernetes– • the smallest and simplest unit in the Kubernetes object model that you create or deploy • represents a running process on your cluster. • encapsulates an application container (or, in some cases, multiple containers), storage resources, a unique network IP, and options that govern how the container(s) should run • Docker is the most common container runtime used in a Pod, but Pods support other container runtimes as well
  • 22. 22 How to Interact with Kubernetes
  • 23. 23 How to Interact with Kubernetes • REST API • Curl or browser • Command Line Interface • Kubectl • Kube Config • Created during cluster creation • Programmatic • GoLang, Python, Scala
  • 25. 25 KDC AS TGS Active Directory Client Machine Client Application HDFS Namenode Data Node Data Node Data Node Data Node Secured/Kerberized HDFS
  • 26. 26 KDC AS TGS Active Directory Client Machine Client Application HDFS Namenode Data Node Data Node Data Node Data Node 2. Client-app requests Ticket 3. KDC sends TGT 1. Service Principles/Keys 6. Sends Service Ticket and requests for Authentication 5. User Authenticated using Service Principle/key Retrieves User roles/permissions Secured/Kerberized (keytab) HDFS
  • 27. 27 KDC AS TGS Active Directory Client Machine Client Application HDFS Namenode Data Node Data Node Data Node Data Node 2. Client-app requests Ticket 3. KDC sends TGT 1. Service Principles/Keys 6. Sends Service Ticket and requests for Delegation Token 7. User Authenticated using Service Principle/key Retrieves User roles/permissions Secured/Kerberized (delegation token) HDFS 8. Name node sends delegation token
  • 29. 29 Spark Running in Kubernetes Kubernetes ClusterKube Master
  • 30. 30 Spark Running in Kubernetes Kubernetes Cluster Spark-Submit (--master k8s://…) Kube Master
  • 31. 31 Spark Running in Kubernetes Kubernetes Cluster Worker Pod Worker Pod Worker Pod Worker Pod Spark Job Driver Pod Spark-Submit Spark Master Kube Master
  • 32. 32 Spark Running in Kubernetes Kubernetes Cluster Worker Pod Worker Pod Worker Pod Worker Pod Spark Job Driver Pod Docker Hub Spark Docker Images Spark-Submit Spark Master Kube Master
  • 33. 33 Spark Running in Kubernetes Kubernetes Cluster Worker Pod Worker Pod Worker Pod Worker Pod Spark Job Driver Pod Docker Hub Spark Docker Images Spark-Submit Spark Master Kube Master HDFS Namenode Data Node Data Node Data Node Data Node
  • 34. 34 Spark Running in Kubernetes Kubernetes Cluster Worker Pod Worker Pod Worker Pod Worker Pod Spark Job Driver Pod Docker Hub Spark Docker Images Spark-Submit Spark Master Kube Master HDFS Namenode Data Node Data Node Data Node Data Node Kube Secret Keytab
  • 35. 35 Spark Running in Kubernetes Kubernetes Cluster Worker Pod Worker Pod Worker Pod Worker Pod Spark Job Driver Pod Docker Hub Spark Docker Images Spark-Submit Spark Master Kube Master HDFS Namenode Data Node Data Node Data Node Data Node Kube Secret Keytab
  • 37. 37 What to use • To install Kubernetes • vagrant-kubeadm (https://github.com/c9s/vagrant-kubeadm) • Run local Docker Hub (https://docs.docker.com/registry/deploying ) • Run Docker registry container and map localhost to some ip (10.0.2.2) • Build Spark Docker image from Spark download Dockerfile under Kubernetes directory (spark-2.3.0-bin-hadoop2.7/ kubernetes) • Publish the image to Docker hub • Run Spark-submit

Notes de l'éditeur

  1. Artificial intelligence is intelligence exhibited by machines. In computer science, the field of AI defines itself as the study of "intelligent agents": any device that perceives its environment and takes actions that maximize its chance of success at some goal