Running Zeppelin in Production
Vinay Shukla
Product Management, Director
Twitter: @neomythos
2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Who am I?
• Product Management
• Spark for 3+ years, Hadoop for 4 years, Zeppelin for 2 years
• Blog at www.vinayshukla.com
• Twitter: @neomythos
• Addicted to Yoga, Hiking, & Coffee
• Smallest contributor to Apache Zeppelin
Programmer > Product Management > Programmer > Product Management
Why Apache Zeppelin?
• Browser-based access to Big Data
• Make Spark accessible to more users
• Shield users from dealing with Kerberos
• Leverage built-in Spark, Livy, Hive, JDBC & 20 other interpreters
• Beautiful visualization built in, easy to extend
How does Zeppelin work?
[Diagram labels: Notebook Author; Collaborators/Report viewer; Zeppelin; Cluster (Spark | Hive | HBase); any of 30+ back ends/interpreters]
Zeppelin Architecture
Zeppelin Interpreter Modes
• Basic unit of work is the Note
– A Note has paragraphs
• 3 Modes
– Shared (all notes use the same interpreter process & interpreter group)
– Scoped (notes still share the process but use separate interpreter groups; sharing objects is possible)
– Isolated (each note runs its own interpreter process & group)
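The three modes can be pictured with a small model. The Python sketch below is only an illustration of the descriptions above, not Zeppelin's implementation: the `Placement` class, the `place` function, and the process/group labels are hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    """Where a note's code runs (labels are hypothetical, for illustration)."""
    process: str  # interpreter process label
    group: str    # interpreter group label


def place(note_id: str, mode: str) -> Placement:
    """Return the process and group a note would use under each mode."""
    if mode == "shared":
        # All notes use the same process and the same group.
        return Placement("proc-shared", "group-shared")
    if mode == "scoped":
        # Notes share one process, but each note gets its own group.
        return Placement("proc-shared", f"group-{note_id}")
    if mode == "isolated":
        # Each note gets its own process and its own group.
        return Placement(f"proc-{note_id}", f"group-{note_id}")
    raise ValueError(f"unknown mode: {mode}")


for mode in ("shared", "scoped", "isolated"):
    print(mode, place("note-a", mode), place("note-b", mode))
```

Comparing the placements of two notes per mode shows the trade-off: shared is the cheapest, isolated gives the strongest separation, and scoped sits in between.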
Deploying Zeppelin
Node Choices
• Master Node
• Worker Node
• Management Node
• Client/Gateway Node ✔
Apache Zeppelin Deployment
[Diagram labels: John Doe; LDAP; SSL; Firewall; steps 1, 2, 3; Hadoop Cluster; Spark on YARN (Ex, Ex); Hive]
Interacting with Spark
[Diagram labels: Zeppelin with Built In Spark Interpreter; Zeppelin with Livy Interpreter; Spark-Shell; Spark Thrift Server; Livy REST Server; Spark Driver (several Driver boxes); Spark on YARN (Ex)]
Apache Zeppelin Security
Apache Zeppelin: Authentication + SSL
[Diagram labels: John Doe; LDAP; SSL; Firewall; steps 1, 2, 3; Spark on YARN (Ex, Ex)]
Security in Apache Zeppelin?
Zeppelin leverages Apache Shiro for authentication/authorization
Example shiro.ini
# =======================
# Shiro INI configuration
# =======================
[main]
## LDAP/AD configuration
[users]
# The 'users' section is for simple deployments
# when you only need a small number of statically-defined
# set of User accounts.
[urls]
# The 'urls' section is used for url-based security
#
Edit with Ambari or your favorite text editor
LDAP Authentication in Zeppelin
• LDAP Bind
– Example user DN: uid=jsmith,ou=users,dc=mycompany,dc=com
– As a template: uid={0},ou=users,dc=mycompany,dc=com
– ldapRealm.userDnTemplate = uid={0},ou=users,dc=mycompany,dc=com
• LDAP Search
– ldapRealm.contextFactory.systemUsername=cn=ldap-reader,ou=ServiceUsers,dc=lab,dc=hortonworks,dc=net
– ldapRealm.contextFactory.systemPassword=SomePassw0rd
– ldapRealm.contextFactory.authenticationMechanism=simple
– ldapRealm.searchBase=OU=CorpUsers,DC=lab,DC=hortonworks,DC=net
– ldapRealm.userSearchBase=OU=CorpUsers,DC=lab,DC=hortonworks,DC=net
– ldapRealm.groupSearchBase=OU=CorpUsers,DC=lab,DC=hortonworks,DC=net
– ldapRealm.userObjectClass=person
– ldapRealm.groupObjectClass=group
– ldapRealm.userSearchAttributeName = sAMAccountName
– # Set search scopes for user and group. Values: subtree (default), onelevel, object
– ldapRealm.userSearchScope = subtree
– ldapRealm.groupSearchScope = subtree
– ldapRealm.userSearchFilter=(&(objectclass=person)(sAMAccountName={0}))
– ldapRealm.memberAttribute=member
http://bit.ly/2rMTgLw
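To make the `{0}` placeholder concrete, here is a minimal Python sketch of how a login name could be substituted into a DN template or a search filter. It is an illustration only, not how Shiro or Zeppelin does it internally; `fill_template` and `parens_balanced` are hypothetical helpers, and the sketch does not escape special characters in the user name, which a real LDAP client must do.

```python
def fill_template(template: str, username: str) -> str:
    """Substitute the {0} placeholder with the login name (no escaping)."""
    return template.replace("{0}", username)


def parens_balanced(ldap_filter: str) -> bool:
    """Check that every '(' in a filter string has a matching ')'."""
    depth = 0
    for ch in ldap_filter:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


# Values in the style of the slide; the filter is written with balanced parentheses.
USER_DN_TEMPLATE = "uid={0},ou=users,dc=mycompany,dc=com"
USER_SEARCH_FILTER = "(&(objectclass=person)(sAMAccountName={0}))"

print(fill_template(USER_DN_TEMPLATE, "jsmith"))
print(fill_template(USER_SEARCH_FILTER, "jsmith"))
print(parens_balanced(USER_SEARCH_FILTER))
```

A quick balance check like `parens_balanced` is a cheap way to catch a filter that lost a closing parenthesis before it reaches the LDAP server.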
Want to connect to LDAP over SSL?
• Change protocol to ldaps in shiro.ini
ldapRealm.contextFactory.url = ldaps://hdpqa.example.com:636
• If LDAP is using a self-signed certificate, import the certificate into the truststore of the JVM running Zeppelin
echo -n | openssl s_client -connect ldap.example.com:636 | \
sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p' > /tmp/examplecert.crt
keytool -import -keystore $JAVA_HOME/jre/lib/security/cacerts \
-storepass changeit -noprompt -alias mycert -file /tmp/examplecert.crt
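The `sed` filter in the command above keeps only the PEM certificate block from the `openssl s_client` output. The same extraction can be sketched in Python; this is an illustrative equivalent rather than part of the original procedure, and `extract_certificates` and the sample text are hypothetical.

```python
import re
from typing import List

# Matches one PEM certificate block, including its BEGIN/END marker lines.
PEM_BLOCK = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----",
    re.DOTALL,
)


def extract_certificates(s_client_output: str) -> List[str]:
    """Return every PEM certificate block found in the given text."""
    return PEM_BLOCK.findall(s_client_output)


# Made-up stand-in for s_client output; the base64 body is a placeholder.
sample = (
    "CONNECTED(00000003)\n"
    "-----BEGIN CERTIFICATE-----\n"
    "PLACEHOLDERBASE64\n"
    "-----END CERTIFICATE-----\n"
    "subject=/CN=ldap.example.com\n"
)
certs = extract_certificates(sample)
print(len(certs))
```

Everything outside the BEGIN/END markers (connection banner, subject lines) is dropped, which is what `keytool -import` needs as input.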
Avoid LDAP password in clear text in shiro.ini
• Create an entry for the AD credential
– Zeppelin leverages the Hadoop Credential API
– hadoop credential create ldapRealm.contextFactory.systemPassword -provider jceks:///etc/zeppelin/conf/credentials.jceks
Enter password:
Enter password again:
ldapRealm.contextFactory.systemPassword has been successfully created.
org.apache.hadoop.security.alias.JavaKeyStoreProvider has been updated.
• Make credentials.jceks readable only by the Zeppelin user
• chmod 400 so that only the Zeppelin process user can read it; no other user is allowed access to the credentials
• Edit shiro.ini
• ldapRealm.contextFactory.systemPassword -provider jceks:///etc/zeppelin/conf/credentials.jceks
Avoid JDBC password in shiro.ini
• Create a credential for the JDBC password in the Hadoop Credential store
hadoop credential create jdbc.password -provider jceks://file/user/zeppelin/conf/zeppelin.jceks
• Use the credential in shiro.ini
default.jceks.credentialKey jdbc.password
default.jceks.file jceks://file/user/zeppelin/conf/zeppelin.jceks
• Details at JIRA ZEPPELIN-1935
Note: a JDBC password is only needed for non-Hive IDs; Hive leverages ID propagation
Identity Propagation in Zeppelin
• Interpreter dependent
– Works for Livy (Spark), Hive (JDBC) & Shell Interpreter
Identity Propagation with Livy
[Diagram labels: John Doe; LDAP (LDAP/LDAPS); Zeppelin with Ispark Group Interpreter; Livy APIs (SPNego: Kerberos); Livy; Kerberos/RPC; Spark on Yarn; job runs as John Doe]
Authorization in Zeppelin
Authorization in Zeppelin:
• Control access to Note
– Grant permissions (Owner, Reader, Writer) to users/groups on Notes
– LDAP group integration
• Control access to Zeppelin UI
– Allow only admins to configure interpreter
– Configured in shiro.ini
Authorization at Data Level:
• For Spark with Zeppelin > Livy > Spark
– Identity propagation: jobs run as end user
• For Hive with Zeppelin > JDBC interpreter
– Leverage Ranger-based row/column security for Hive SparkSQL
• Shell Interpreter
– Runs as end user
Control Who Can Modify Interpreter Settings
• Step 1
– Define protected URL patterns in shiro.ini
– Assign URL patterns to a role
[urls]
/api/interpreter/** = authc, roles[admin_role]
/api/configurations/** = authc, roles[admin_role]
/api/credential/** = authc, roles[admin_role]
• Step 2
– Map role to LDAP group
ldapRealm.rolesByGroup = "hadoop-admins":admin_role
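The effect of these two steps can be sketched as a small access check. This Python model is a simplification for illustration only: it uses plain prefix matching instead of Shiro's own path matcher, it ignores the `authc` login requirement, and it assumes that paths with no matching pattern are open, which a real shiro.ini may not do. All function names are hypothetical.

```python
from typing import Iterable, Optional, Set

# Patterns and role copied from the [urls] example; "/**" is treated here as
# "this path and everything below it".
PROTECTED_PATTERNS = {
    "/api/interpreter/**": "admin_role",
    "/api/configurations/**": "admin_role",
    "/api/credential/**": "admin_role",
}

# Mirrors: ldapRealm.rolesByGroup = "hadoop-admins":admin_role
ROLES_BY_GROUP = {"hadoop-admins": "admin_role"}


def roles_for(groups: Iterable[str]) -> Set[str]:
    """Translate LDAP group names into roles."""
    return {ROLES_BY_GROUP[g] for g in groups if g in ROLES_BY_GROUP}


def required_role(path: str) -> Optional[str]:
    """Return the role needed for a path, or None if it is not protected."""
    for pattern, role in PROTECTED_PATTERNS.items():
        prefix = pattern[: -len("/**")]
        if path == prefix or path.startswith(prefix + "/"):
            return role
    return None


def is_allowed(path: str, groups: Iterable[str]) -> bool:
    """Assumption of this sketch: unprotected paths are open to any user."""
    role = required_role(path)
    return role is None or role in roles_for(groups)


print(is_allowed("/api/interpreter/setting", ["hadoop-admins"]))
print(is_allowed("/api/interpreter/setting", ["analysts"]))
```

A member of `hadoop-admins` is mapped to `admin_role` and passes the check; a user in any other group is refused on the three protected API paths.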
Scalability & HA
Scalability:
• Memory/Core for Zeppelin Server
– Consider 20-30 GB
• Memory/Core for Zeppelin Interpreter
– 4-8 GB
• Memory/Core for Livy
– 4-8 GB
• Memory/Core for Spark
– Depends on Spark jobs (see Spark Performance Tuning)
– https://spark.apache.org/docs/latest/tuning.html
HA:
• Horizontal Scaling
– Spin up multiple Zeppelin instances
– Need external load balancer
– Sticky sessions
• Shared Storage
• Shared Configuration
• Communication between Z & Interpreters
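Sticky sessions mean that a given user session keeps landing on the same Zeppelin instance. One way to picture this is hash-based routing, sketched below in Python. This is an assumption-laden illustration, not a description of any particular load balancer (many use cookies instead); `pick_instance` and the host names are hypothetical.

```python
import hashlib
from typing import Sequence


def pick_instance(session_id: str, instances: Sequence[str]) -> str:
    """Map a session ID to one instance, always the same one for that ID."""
    if not instances:
        raise ValueError("at least one Zeppelin instance is required")
    # A stable hash (unlike Python's built-in hash()) gives the same result
    # across processes and restarts.
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(instances)
    return instances[index]


# Hypothetical instance addresses, for illustration only.
INSTANCES = ["zeppelin-1.example.com:9995", "zeppelin-2.example.com:9995"]

print(pick_instance("session-abc", INSTANCES))
print(pick_instance("session-abc", INSTANCES))  # same instance again
```

Note that simple modulo routing reshuffles sessions when the instance list changes, which is one reason shared storage and shared configuration matter for HA.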
Using R & Python with Zeppelin
• Multiple choices: Spark Interpreter, Python Interpreter, Livy Interpreter
• Deploy R/Python binaries on all worker nodes
• Leverage Livy Interpreter for SparkR & PySpark
Zeppelin + Livy2 OOB: Job as admin user fails
• Scenario: Simple HDP 2.6 install with default config
• Failure: Livy 2 Interpreter job fails as admin user
• Reason: HDFS directory /user/admin does not exist
• Workaround: Manually create /user/admin with admin:hdfs ownership
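The workaround amounts to two HDFS shell commands. The Python sketch below only builds and prints them; it does not run anything. `home_dir_commands` is a hypothetical helper, and the commands would normally have to be run by a user with HDFS superuser rights, which is an assumption about the cluster setup.

```python
import shlex
from typing import List


def home_dir_commands(user: str, group: str = "hdfs") -> List[List[str]]:
    """Build the HDFS commands that create and assign a user's home directory."""
    home = f"/user/{user}"
    return [
        # Create the directory (and parents) if it does not exist yet.
        ["hdfs", "dfs", "-mkdir", "-p", home],
        # Give ownership to user:group, e.g. admin:hdfs.
        ["hdfs", "dfs", "-chown", f"{user}:{group}", home],
    ]


for command in home_dir_commands("admin"):
    print(shlex.join(command))
```

The same helper works for any other user whose Livy jobs fail because the home directory is missing.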
Zeppelin + Livy 500 Error with PySpark
• Scenario: Cut & paste code into Zeppelin
• Failure: Livy interpreter reports 500
• Workaround: Manually type code into the Zeppelin Livy interpreter
• Fixed in HDP 2.6.1
Other Zeppelin Livy Interpreter Issues
• matplotlib doesn’t work in the Livy pyspark interpreter
• Job progress is not shown in the frontend
• ZeppelinContext is not available in the Livy interpreter
Zeppelin Future Plans
• More Visualization
• More Stability
• More Security
– SSO Integration with Knox
– Zeppelin > Livy over SSL
– Ranger Integration
– Atlas Integration
• Integration with Data Science Experience
• HA & More Collaboration
Thank You
&
Questions
Vinay Shukla
@neomythos

Editor's notes

  1. Thank you Prasad Wagle (Twitter) & Prabhjot Singh (Hortonworks)
  2. Thank you Prasad Wagle (Twitter) & Prabhjot Singh (Hortonworks)
  3. All images from Flickr Commons