Running Zeppelin in Production
Vinay Shukla
Product Management, Director
Twitter: @neomythos
2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Whoami?
 Product Management
 Spark for 3+ years, Hadoop for 4 years, Zeppelin for 2 years
 Blog at www.vinayshukla.com
 Twitter: @neomythos
 Addicted to Yoga, Hiking, & Coffee
 Smallest contributor to Apache Zeppelin
Programmer > Product Management > Programmer > Product Management
Why Apache Zeppelin?
 Browser-based access to Big Data
 Make Spark accessible to more users
 Shield users from dealing with Kerberos
 Leverage built-in Spark, Livy, Hive, JDBC & 20 other interpreters
 Beautiful visualizations built in, easy to extend
How does Zeppelin work?
[Diagram: notebook authors and collaborators/report viewers work in Zeppelin, which connects to the cluster (Spark | Hive | HBase) through any of 30+ back ends/interpreters.]
Zeppelin Architecture
Zeppelin Interpreter Modes
 The basic unit of work is a note
– A note contains paragraphs
 3 modes
– Shared (all notes use the same interpreter process & interpreter group)
– Scoped (notes still share the same process, but each gets a separate interpreter group; objects can still be shared)
– Isolated (each note runs its own interpreter process & group)
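To make the difference between the modes concrete, here is a toy Python model (not Zeppelin code; the `Mode` enum, `binding`, `summarize`, and the `process-*`/`group-*` labels are hypothetical) that maps each note to an interpreter process and an interpreter group and counts how many of each a set of notes needs.

```python
from enum import Enum


class Mode(Enum):
    SHARED = "shared"
    SCOPED = "scoped"
    ISOLATED = "isolated"


def binding(mode: Mode, note_id: str) -> tuple[str, str]:
    """Return a hypothetical (process, interpreter group) pair for one note."""
    if mode is Mode.SHARED:
        # Every note uses the same process and the same interpreter group.
        return ("process-0", "group-0")
    if mode is Mode.SCOPED:
        # One shared process, but each note gets its own interpreter group.
        return ("process-0", f"group-{note_id}")
    # Isolated: each note gets its own process and its own group.
    return (f"process-{note_id}", f"group-{note_id}")


def summarize(mode: Mode, notes: list[str]) -> tuple[int, int]:
    """Count the distinct processes and groups needed to run the given notes."""
    pairs = [binding(mode, note) for note in notes]
    return len({proc for proc, _ in pairs}), len({group for _, group in pairs})


if __name__ == "__main__":
    notes = ["note-a", "note-b", "note-c"]
    for mode in Mode:
        procs, groups = summarize(mode, notes)
        print(f"{mode.value:<8} processes={procs} groups={groups}")
```

With three notes, shared mode needs one process and one group, scoped mode one process and three groups, and isolated mode three of each, which is why isolated mode costs the most processes (and therefore memory).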
Deploying Zeppelin
Node choices:
 Master Node
 Worker Node
 Management Node
 Client/Gateway Node ✔
Apache Zeppelin Deployment
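[Diagram: user John Doe reaches Zeppelin over SSL through a firewall, with LDAP for authentication; Zeppelin connects to Spark on YARN (executors) and Hive in the Hadoop cluster.]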
Interacting with Spark
[Diagram: Spark-Shell, the Spark Thrift Server, the Livy REST server, and Zeppelin each run a Spark driver against Spark on YARN executors. Zeppelin either runs the driver inside its built-in Spark interpreter or, with the Livy interpreter, goes through a Livy REST server that runs the driver.]
Apache Zeppelin Security
Apache Zeppelin: Authentication + SSL
[Diagram: John Doe authenticates against LDAP and connects to Zeppelin over SSL through a firewall; Zeppelin runs jobs on Spark on YARN (executors).]
Security in Apache Zeppelin?
Zeppelin leverages Apache Shiro for
authentication/authorization
Example Shiro.ini
# =======================
# Shiro INI configuration
# =======================
[main]
## LDAP/AD configuration
[users]
# The 'users' section is for simple deployments
# when you only need a small number of statically-defined
# set of User accounts.
[urls]
# The 'urls' section is used for url-based security
#
Edit with Ambari or your
favorite text editor
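Because shiro.ini is INI-shaped, a short script can load it for auditing, for example to list the `[urls]` rules in the order Shiro evaluates them. This sketch assumes Python's `configparser` is close enough to Shiro's INI dialect for simple files; it is not how Zeppelin reads the file, and the sample values (including `ldap.example.com`) are placeholders.

```python
import configparser

# Hypothetical sample content; pass the text of your real shiro.ini instead.
SAMPLE = """
[main]
ldapRealm = org.apache.zeppelin.realm.LdapRealm
ldapRealm.contextFactory.url = ldaps://ldap.example.com:636

[users]

[urls]
/api/interpreter/** = authc, roles[admin_role]
/** = authc
"""


def load_shiro_ini(text: str) -> configparser.ConfigParser:
    """Parse shiro.ini-style text for read-only inspection."""
    parser = configparser.ConfigParser(
        interpolation=None,  # leave '%' and '$' in values untouched
        strict=False,  # tolerate duplicate keys instead of raising
        delimiters=("=",),  # ':' appears inside values such as "group":role
    )
    parser.optionxform = str  # keep key case, e.g. ldapRealm.contextFactory.url
    parser.read_string(text)
    return parser


def url_rules(parser: configparser.ConfigParser) -> list[tuple[str, str]]:
    """Return the [urls] rules in file order (the order Shiro evaluates them)."""
    if not parser.has_section("urls"):
        return []
    return list(parser.items("urls"))


if __name__ == "__main__":
    ini = load_shiro_ini(SAMPLE)
    for pattern, chain in url_rules(ini):
        print(f"{pattern:25} -> {chain}")
```

Shiro's own INI parser is not identical to Python's (for example, it accepts backslash line continuations), so treat this as a read-only audit aid rather than a validator.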
LDAP Authentication in Zeppelin
 LDAP Bind
– uid=jsmith,ou=users,dc=mycompany,dc=com
– uid={0},ou=users,dc=mycompany,dc=com
– ldapRealm.userDnTemplate = uid={0},ou=users,dc=mycompany,dc=com
 LDAP Search
– ldapRealm.contextFactory.systemUsername=cn=ldap-reader,ou=ServiceUsers,dc=lab,dc=hortonworks,dc=net
– ldapRealm.contextFactory.systemPassword=SomePassw0rd
– ldapRealm.contextFactory.authenticationMechanism=simple
– ldapRealm.searchBase=OU=CorpUsers,DC=lab,DC=hortonworks,DC=net
– ldapRealm.userSearchBase=OU=CorpUsers,DC=lab,DC=hortonworks,DC=net
– ldapRealm.groupSearchBase=OU=CorpUsers,DC=lab,DC=hortonworks,DC=net
– ldapRealm.userObjectClass=person
– ldapRealm.groupObjectClass=group
– ldapRealm.userSearchAttributeName = sAMAccountName
– # Set search scopes for user and group. Values: subtree (default), onelevel, object
– ldapRealm.userSearchScope = subtree
– ldapRealm.groupSearchScope = subtree
– ldapRealm.userSearchFilter=(&(objectclass=person)(sAMAccountName={0}))
– ldapRealm.memberAttribute=member
http://bit.ly/2rMTgLw
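Shiro fills the `{0}` placeholder in `userDnTemplate` and `userSearchFilter` with the login name. The sketch below models that substitution, adds the RFC 4515 escaping a search-filter value needs, and includes a cheap parenthesis check, since a filter missing its final `)` is malformed. The function names are hypothetical and this is not Zeppelin's `LdapRealm` code; for bind DNs it simply rejects usernames containing DN special characters instead of implementing full RFC 4514 escaping.

```python
# Characters RFC 4515 requires to be escaped inside an LDAP filter value.
_FILTER_SPECIALS = {"\\", "*", "(", ")", "\0"}
# Characters with special meaning in a DN (RFC 4514); rejected here rather than escaped.
_DN_SPECIALS = set(',+"\\<>;=')


def bind_dn(user_dn_template: str, username: str) -> str:
    """Fill {0} in a userDnTemplate, e.g. uid={0},ou=users,dc=mycompany,dc=com."""
    if not username or any(ch in _DN_SPECIALS for ch in username):
        raise ValueError(f"refusing to build a bind DN from {username!r}")
    return user_dn_template.replace("{0}", username)


def escape_filter_value(value: str) -> str:
    """Escape a login name for use inside an LDAP search filter (RFC 4515)."""
    return "".join(f"\\{ord(ch):02x}" if ch in _FILTER_SPECIALS else ch for ch in value)


def search_filter(filter_template: str, username: str) -> str:
    """Fill {0} in a userSearchFilter with the escaped login name."""
    return filter_template.replace("{0}", escape_filter_value(username))


def parens_balanced(ldap_filter: str) -> bool:
    """Return True if every '(' is closed and no ')' appears before its '('."""
    depth = 0
    for ch in ldap_filter:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0
```

Escaped characters become `\28`, `\29`, and so on, so counting raw parentheses remains a valid sanity check even after substitution.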
Want to connect to LDAP over SSL?
 Change protocol to ldaps in shiro.ini
ldapRealm.contextFactory.url = ldaps://hdpqa.example.com:636
 If LDAP uses a self-signed certificate, import the certificate into the truststore of the JVM running Zeppelin
echo -n | openssl s_client -connect ldap.example.com:636 | \
sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p' > /tmp/examplecert.crt
keytool -import -keystore $JAVA_HOME/jre/lib/security/cacerts \
-storepass changeit -noprompt -alias mycert -file /tmp/examplecert.crt
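If `openssl` is not at hand, Python's standard `ssl.get_server_certificate()` can fetch the same certificate, and the sed range (print from the `BEGIN CERTIFICATE` marker through the `END CERTIFICATE` marker) is easy to reproduce for checking captured `s_client` output. A minimal sketch; the host, port, output path, and function names are placeholders.

```python
import re
import ssl

# One PEM certificate block, markers included: the same range the sed command prints.
PEM_BLOCK = re.compile(
    r"-----BEGIN CERTIFICATE-----\r?\n.*?\r?\n-----END CERTIFICATE-----",
    re.DOTALL,
)


def extract_pem_certs(s_client_output: str) -> list[str]:
    """Return every PEM certificate block found in captured s_client output."""
    return PEM_BLOCK.findall(s_client_output)


def fetch_server_pem(host: str, port: int = 636) -> str:
    """Fetch the server's certificate as PEM over a direct TLS connection.

    No CA check is done (ca_certs is left as None), which is what lets a
    self-signed certificate through; confirm its fingerprint out of band
    before importing it into the JVM truststore.
    """
    return ssl.get_server_certificate((host, port))


if __name__ == "__main__":
    pem = fetch_server_pem("ldap.example.com")  # placeholder host
    with open("/tmp/examplecert.crt", "w", encoding="ascii") as fh:
        fh.write(pem)
```

The resulting file can then be imported with the same `keytool -import` command.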
Avoid LDAP password in clear text in shiro.ini
 Create an entry for the AD credential
– Zeppelin leverages the Hadoop Credential API
– hadoop credential create ldapRealm.contextFactory.systemPassword -provider jceks://file/etc/zeppelin/conf/credentials.jceks
Enter password:
Enter password again:
ldapRealm.contextFactory.systemPassword has been successfully created.
org.apache.hadoop.security.alias.JavaKeyStoreProvider has been updated.
 Make credentials.jceks readable only by the Zeppelin user
 chown it to the Zeppelin user and chmod 400 (owner read-only), so no other user can access the credentials
 Edit shiro.ini
 ldapRealm.hadoopSecurityCredentialPath = jceks://file/etc/zeppelin/conf/credentials.jceks
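A quick way to confirm the credential store is locked down before starting Zeppelin is to check its owner and mode. This sketch assumes a Unix host (it uses the `pwd` module) and that the expected owner is the account Zeppelin runs as (commonly `zeppelin`, an assumption); it flags any group/other permission bits, which is what `chmod 400` removes. The function names are hypothetical.

```python
import os
import pwd
import stat
import tempfile


def owner_name(uid: int) -> str:
    """Resolve a uid to a user name, falling back to the numeric id."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)


def check_credential_file(path: str, expected_owner: str) -> list[str]:
    """Return a list of problems with a credential store's ownership and mode."""
    problems = []
    st = os.stat(path)
    owner = owner_name(st.st_uid)
    if owner != expected_owner:
        problems.append(f"owned by {owner}, expected {expected_owner}")
    mode = stat.S_IMODE(st.st_mode)
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        problems.append(f"mode {oct(mode)} grants group/other access; use chmod 400")
    if not mode & stat.S_IRUSR:
        problems.append(f"mode {oct(mode)} does not let the owner read it")
    return problems


if __name__ == "__main__":
    # Demo on a throwaway file; point it at credentials.jceks on a real host.
    with tempfile.NamedTemporaryFile() as tmp:
        os.chmod(tmp.name, 0o400)
        print(check_credential_file(tmp.name, owner_name(os.getuid())))
```

An empty list means the file is owned by the expected account and readable by nobody else.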
Avoid JDBC password in clear text
 Create a credential for the JDBC password in the Hadoop Credential store
hadoop credential create jdbc.password -provider jceks://file/user/zeppelin/conf/zeppelin.jceks
 Use the credential in the JDBC interpreter settings
default.jceks.credentialKey jdbc.password
default.jceks.file jceks://file/user/zeppelin/conf/zeppelin.jceks
 Details at JIRA ZEPPELIN-1935
A JDBC password is only needed for non-Hive connections; Hive leverages identity propagation
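Hadoop credential-provider URIs nest the real file system inside the `jceks` scheme: the part of the authority before `@` is the inner scheme, so `jceks://file/path` refers to `file:///path` and `jceks://hdfs@namenode:8020/path` to `hdfs://namenode:8020/path`. The helper below (a hypothetical name, written to follow that documented convention rather than copied from Hadoop) unnests such a URI, which helps when checking a `default.jceks.file` or credential path by hand.

```python
from urllib.parse import urlsplit


def unnest_provider_uri(uri: str) -> str:
    """Map a jceks:// credential-provider URI to the file-system URI it wraps.

    The authority before '@' is the inner scheme; anything after '@' is the
    inner authority, e.g.
      jceks://file/etc/zeppelin/conf/credentials.jceks
          -> file:///etc/zeppelin/conf/credentials.jceks
      jceks://hdfs@nn1.example.com:8020/zeppelin.jceks
          -> hdfs://nn1.example.com:8020/zeppelin.jceks
    """
    parts = urlsplit(uri)
    if parts.scheme != "jceks":
        raise ValueError(f"not a jceks provider URI: {uri!r}")
    if not parts.netloc:
        raise ValueError(f"missing inner scheme such as 'file' in {uri!r}")
    inner_scheme, _, inner_authority = parts.netloc.partition("@")
    return f"{inner_scheme}://{inner_authority}{parts.path}"
```

A URI that drops the inner scheme, such as `jceks:///etc/zeppelin/conf/credentials.jceks`, has an empty authority and is rejected, while `jceks://etc/zeppelin/conf/credentials.jceks` would be read as an inner scheme called `etc`.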
Identity Propagation in Zeppelin
 Interpreter Dependent
– Works for Livy (Spark), Hive (JDBC) & Shell Interpreter
Identity Propagation with Livy
[Diagram: John Doe logs in to Zeppelin via LDAP/LDAPS; the Livy interpreter group in Zeppelin calls the Livy APIs over SPNEGO (Kerberos); Livy submits to Spark on YARN over Kerberos/RPC, and the job runs as John Doe.]
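Identity propagation through Livy comes down to the session request: Livy's REST API accepts a `proxyUser` field on `POST /sessions`, and Livy (with `livy.impersonation.enabled` set and Livy configured as a Hadoop proxy user) starts the Spark application as that user. The sketch below only builds such a request; the Livy URL and user are placeholders, the `X-Requested-By` header assumes Livy's CSRF protection is on, and the SPNEGO/Kerberos step is not included (a real client needs a SPNEGO-capable HTTP library). It illustrates the idea and is not the Zeppelin Livy interpreter's code.

```python
import json
import urllib.request


def livy_session_request(livy_url: str, end_user: str, kind: str = "spark") -> urllib.request.Request:
    """Build (without sending) a Livy POST /sessions request that runs as end_user.

    kind is a Livy session kind, e.g. "spark", "pyspark" or "sparkr".
    """
    payload = {"kind": kind, "proxyUser": end_user}
    return urllib.request.Request(
        url=livy_url.rstrip("/") + "/sessions",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Required when Livy's CSRF protection is enabled (assumed here).
            "X-Requested-By": "zeppelin",
        },
    )


if __name__ == "__main__":
    req = livy_session_request("http://livy.example.com:8998", "jdoe")  # placeholders
    print(req.get_method(), req.full_url, req.data.decode("utf-8"))
    # Sending it on a Kerberized cluster needs SPNEGO, which urllib does not provide.
```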
Authorization in Zeppelin
Zeppelin-level authorization
 Control access to notes
– Grant permissions (Owner, Reader, Writer) to users/groups on notes
– LDAP group integration
 Control access to the Zeppelin UI
– Allow only admins to configure interpreters
– Configured in shiro.ini
Data-level authorization
 For Spark with Zeppelin > Livy > Spark
– Identity propagation: jobs run as the end user
 For Hive with Zeppelin > JDBC interpreter
– Leverage Ranger-based row/column security for Hive & SparkSQL
 Shell interpreter
– Runs as the end user
Control Who can modify Interpreter Settings
[urls]
/api/interpreter/** = authc, roles[admin_role]
/api/configurations/** = authc, roles[admin_role]
/api/credential/** = authc, roles[admin_role]
 Step 1
– Define protected URL patterns in shiro.ini
– Assign the URL patterns to a role
 Step 2
– Map the role to an LDAP group
ldapRealm.rolesByGroup = "hadoop-admins":admin_role
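The two steps can be modeled in a few lines: Shiro walks the `[urls]` section top-down and applies the first matching pattern, and `rolesByGroup` turns a user's LDAP groups into roles. This sketch supports only exact paths and a trailing `/**` (a small subset of the Ant-style patterns Shiro accepts), assumes a `/** = authc` catch-all rule, treats the user as already authenticated, and uses `/api/interpreter/setting` and the `analysts` group as example inputs. It models the behavior and is not Shiro's implementation.

```python
# [urls] rules in file order: (pattern, required role, or None for authc only).
URL_RULES = [
    ("/api/interpreter/**", "admin_role"),
    ("/api/configurations/**", "admin_role"),
    ("/api/credential/**", "admin_role"),
    ("/**", None),  # assumed catch-all: any authenticated user
]


def ant_match(pattern: str, path: str) -> bool:
    """Tiny subset of Ant-style matching: exact paths and a trailing '/**'."""
    if pattern.endswith("/**"):
        prefix = pattern[:-3]
        return path == prefix or path.startswith(prefix + "/")
    return path == pattern


def parse_roles_by_group(value: str) -> dict[str, set[str]]:
    """Parse a rolesByGroup value such as '"hadoop-admins":admin_role, ops:admin_role'."""
    mapping: dict[str, set[str]] = {}
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        group, sep, role = entry.rpartition(":")
        if not sep:
            raise ValueError(f"expected group:role, got {entry!r}")
        mapping.setdefault(group.strip().strip('"'), set()).add(role.strip())
    return mapping


def allowed(path: str, user_groups: set[str], roles_by_group: dict[str, set[str]],
            rules=URL_RULES) -> bool:
    """Decide access for an already-authenticated user; the first matching rule wins."""
    user_roles: set[str] = set()
    for group in user_groups:
        user_roles |= roles_by_group.get(group, set())
    for pattern, required_role in rules:
        if ant_match(pattern, path):
            return required_role is None or required_role in user_roles
    return True  # no rule matched, so no filter restricts the request
```

Because evaluation stops at the first match, the catch-all `/**` rule has to stay last; placed first, it would shadow the admin-only rules.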
Scalability & HA
Scalability
 Memory/cores for the Zeppelin server
– Consider 20-30 GB
 Memory/cores for the Zeppelin interpreter
– 4-8 GB
 Memory/cores for Livy
– 4-8 GB
 Memory/cores for Spark
– Depends on the Spark jobs (see Spark performance tuning: https://spark.apache.org/docs/latest/tuning.html)
HA
 Horizontal scaling
– Spin up multiple Zeppelin instances
– Needs an external load balancer
– Sticky sessions
 Shared storage
 Shared configuration
 Communication between Zeppelin & interpreters
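The per-component figures can be added up for a single gateway node. This back-of-the-envelope sketch assumes the Zeppelin server, its interpreter processes, and Livy all share one node, that every interpreter process takes the 4-8 GB quoted for the interpreter, and that Spark executor memory is sized separately on the cluster; the number of interpreter processes depends on the interpreter mode (one in shared or scoped mode, one per note in isolated mode). `GbRange` and `gateway_memory` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GbRange:
    """A low-high memory estimate in GB."""
    low: int
    high: int

    def __add__(self, other: "GbRange") -> "GbRange":
        return GbRange(self.low + other.low, self.high + other.high)

    def __mul__(self, count: int) -> "GbRange":
        return GbRange(self.low * count, self.high * count)


# Per-component figures from the slide.
ZEPPELIN_SERVER = GbRange(20, 30)
INTERPRETER_PROCESS = GbRange(4, 8)
LIVY_SERVER = GbRange(4, 8)


def gateway_memory(interpreter_processes: int) -> GbRange:
    """Estimate one gateway node's memory, excluding Spark executors.

    Assumes the Zeppelin server, all interpreter processes and Livy share the node.
    """
    if interpreter_processes < 0:
        raise ValueError("interpreter_processes must be >= 0")
    return ZEPPELIN_SERVER + INTERPRETER_PROCESS * interpreter_processes + LIVY_SERVER


if __name__ == "__main__":
    for n in (1, 5, 10):
        est = gateway_memory(n)
        print(f"{n:>2} interpreter process(es): {est.low}-{est.high} GB")
```

For example, five isolated interpreter processes put the node at roughly 44-78 GB before any Spark memory is counted.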
Using R & Python with Zeppelin
 Multiple choices: Spark interpreter, Python interpreter, Livy interpreter
 Deploy R/Python binaries on all worker nodes
 Leverage Livy Interpreter for SparkR & PySpark
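PySpark and SparkR code runs on the workers, so a node without the binaries cannot run those tasks. A tiny per-node check with `shutil.which` is sketched below; the binary names are assumptions (adjust them to whatever `PYSPARK_PYTHON` or your R setup points at), and running it on every worker (via ssh, pdsh, or your config-management tool) is left out.

```python
import shutil

# Hypothetical list: name the executables your notes expect on every worker.
REQUIRED_BINARIES = ("python", "Rscript")


def missing_binaries(names=REQUIRED_BINARIES) -> list[str]:
    """Return the executables from names that are not on this node's PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_binaries()
    if missing:
        raise SystemExit(f"missing on this node: {', '.join(missing)}")
    print("all required binaries found")
```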
Zeppelin + Livy2: out-of-the-box job as admin user fails
 Scenario: simple HDP 2.6 install with default config
 Failure: Livy2 interpreter job fails as the admin user
 Reason: HDFS directory /user/admin does not exist
 Workaround: manually create /user/admin with admin:hdfs ownership
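The workaround boils down to two HDFS shell commands run as the HDFS superuser. The sketch below prints them (and optionally runs them); it assumes the superuser account is `hdfs` and that `sudo` is available, and on a Kerberized cluster you would `kinit` with the HDFS principal first. The helper names are hypothetical.

```python
import shlex
import subprocess


def home_dir_commands(user: str, group: str = "hdfs", superuser: str = "hdfs") -> list[list[str]]:
    """Commands that create /user/<user> in HDFS and give it to <user>:<group>."""
    home = f"/user/{user}"
    as_superuser = ["sudo", "-u", superuser]
    return [
        as_superuser + ["hdfs", "dfs", "-mkdir", "-p", home],
        as_superuser + ["hdfs", "dfs", "-chown", f"{user}:{group}", home],
    ]


def create_home_dir(user: str, dry_run: bool = True) -> None:
    """Print the commands and, unless dry_run, execute them in order."""
    for cmd in home_dir_commands(user):
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    create_home_dir("admin")  # dry run: prints the two commands only
```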
Zeppelin + Livy: 500 error with PySpark
 Scenario: cut & paste code into Zeppelin
 Failure: Livy interpreter reports a 500 error
 Workaround: manually type the code into the Zeppelin Livy interpreter
 Fixed in HDP 2.6.1
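The slide does not say what in pasted code triggers the 500. One plausible culprit, and only an assumption here, is invisible non-ASCII characters (curly quotes, non-breaking spaces, dashes) picked up when copying from web pages or slides. A small sketch that lists such characters and swaps the common ones for ASCII equivalents; the names are hypothetical, and legitimate non-ASCII text inside string literals will be listed too.

```python
import unicodedata

# Common copy/paste substitutions mapped back to ASCII.
ASCII_FIXES = str.maketrans({
    "\u2018": "'", "\u2019": "'",  # curly single quotes
    "\u201c": '"', "\u201d": '"',  # curly double quotes
    "\u2013": "-", "\u2014": "-",  # en dash, em dash
    "\u00a0": " ",                 # non-breaking space
})


def non_ascii_chars(code: str) -> list[tuple[int, int, str]]:
    """List (line, column, Unicode name) for every non-ASCII character."""
    hits = []
    for lineno, line in enumerate(code.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                hits.append((lineno, col, unicodedata.name(ch, f"U+{ord(ch):04X}")))
    return hits


def asciify_common(code: str) -> str:
    """Replace the common look-alike characters above with ASCII equivalents."""
    return code.translate(ASCII_FIXES)
```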
Other Zeppelin Livy Interpreter Issues
 matplotlib doesn’t work in Livy pyspark interpreter
 Job progress is not shown in the frontend.
 ZeppelinContext is not available in Livy interpreter
Zeppelin Future Plans
 More Visualization
 More Stability
 More Security
– SSO Integration with Knox
– Zeppelin > Livy over SSL
– Ranger Integration
– Atlas Integration
 Integration with Data Science Experience
 HA & More Collaboration
Thank You
&
Questions
Vinay Shukla
@neomythos

Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Dernier

The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditSkynet Technologies
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...AliaaTarek5
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 

Dernier (20)

The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance Audit
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 

Running Zeppelin in Enterprise

  • 1. Running Zeppelin in Production
      Vinay Shukla, Product Management, Director
      Twitter: @neomythos
  • 2. Whoami?
      - Product Management
      - Spark for 3+ years, Hadoop for 4 years, Zeppelin for 2 years
      - Blog at www.vinayshukla.com
      - Twitter: @neomythos
      - Addicted to yoga, hiking, & coffee
      - Smallest contributor to Apache Zeppelin
      Programmer > Product Management > Programmer > Product Management
      © Hortonworks Inc. 2011 – 2016. All Rights Reserved
  • 3. Why Apache Zeppelin?
      - Browser-based access to big data
      - Makes Spark accessible to more users
      - Shields users from dealing with Kerberos
      - Leverages built-in Spark, Livy, Hive, JDBC & 20 other interpreters
      - Beautiful visualizations built in, easy to extend
  • 4. How does Zeppelin work?
      [Diagram: notebook authors and collaborators/report viewers work in Zeppelin, which connects to the cluster (Spark | Hive | HBase) through any of 30+ back ends/interpreters]
  • 5. Zeppelin Architecture
  • 6. Zeppelin Interpreter Modes
      - The basic unit of work is a note
          - A note has paragraphs
      - 3 modes
          - Shared: all notes use the same interpreter process & interpreter group
          - Scoped: notes still share the process, but each gets a separate interpreter group; objects can still be shared
          - Isolated: each note runs its own interpreter process & group
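The difference between the three modes is easiest to see by counting processes and interpreter groups. The Python below is a toy model, not Zeppelin's implementation: `place` and `summarize` are hypothetical helpers, and the process/group names are made-up labels that simply follow the descriptions on this slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    """Where a note's paragraphs run: an OS process and an interpreter group."""
    process: str
    group: str


def place(note_id: str, mode: str) -> Placement:
    """Toy model of the three interpreter modes (labels are hypothetical)."""
    if mode == "shared":
        # One process and one group for every note: state is visible to all notes.
        return Placement(process="interp-process", group="shared-group")
    if mode == "scoped":
        # Same process, but each note gets its own interpreter group.
        return Placement(process="interp-process", group=f"group-{note_id}")
    if mode == "isolated":
        # Each note gets its own process and its own group.
        return Placement(process=f"interp-process-{note_id}", group=f"group-{note_id}")
    raise ValueError(f"unknown mode: {mode!r}")


def summarize(note_ids: list[str], mode: str) -> tuple[int, int]:
    """Return (number of processes, number of interpreter groups) for the notes."""
    placements = [place(n, mode) for n in note_ids]
    return len({p.process for p in placements}), len({p.group for p in placements})


if __name__ == "__main__":
    notes = ["noteA", "noteB", "noteC"]
    for mode in ("shared", "scoped", "isolated"):
        print(mode, summarize(notes, mode))
```

For three notes the model gives one process in shared and scoped mode but three in isolated mode. That is the trade-off worth keeping in mind: isolation gives each note a clean interpreter, at the cost of one interpreter process (and its memory) per note.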
  • 7. Deploying Zeppelin
      Node choices:
      - Master node
      - Worker node
      - Management node
      - Client/gateway node ✔
  • 8. Apache Zeppelin Deployment
      [Diagram: user John Doe reaches Zeppelin over SSL through a firewall and is authenticated against LDAP (numbered steps 1-3); Zeppelin runs work on Spark on YARN executors (Ex) and Hive in the Hadoop cluster]
  • 9. Interacting with Spark
      [Diagram: with the built-in Spark interpreter, the Spark driver runs in Zeppelin, much as it does in spark-shell or the Spark Thrift Server; with the Livy interpreter, Zeppelin sends code to the Livy REST server, which hosts the Spark drivers. Executors (Ex) run on Spark on YARN.]
  • 10. Apache Zeppelin Security
  • 11. Apache Zeppelin: Authentication + SSL
      [Diagram: the same deployment, highlighting LDAP authentication of John Doe and SSL through the firewall (numbered steps 1-3); Spark on YARN executors (Ex)]
  • 12. Security in Apache Zeppelin?
      Zeppelin leverages Apache Shiro for authentication/authorization.
  • 13. Example shiro.ini
      # =======================
      # Shiro INI configuration
      # =======================
      [main]
      ## LDAP/AD configuration
      [users]
      # The 'users' section is for simple deployments
      # when you only need a small number of statically-defined
      # set of User accounts.
      [urls]
      # The 'urls' section is used for url-based security
      # Edit with Ambari or your favorite text editor
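Zeppelin hands this file to Shiro, not to Python, but Python's standard `configparser` reads the same section layout closely enough for a quick pre-deployment check. This is a sketch under assumptions: the `[main]` realm line, the `[users]` entry and the `[urls]` rules in `SHIRO_INI` are sample values, and Shiro's own INI parser accepts some syntax that `configparser` does not.

```python
import configparser

# Sample shiro.ini; every value below is illustrative, not a recommended config.
SHIRO_INI = """
[main]
## LDAP/AD configuration
ldapRealm = org.apache.zeppelin.realm.LdapRealm
securityManager.realms = $ldapRealm

[users]
# username = password, role1, role2
admin = password1, admin_role

[urls]
/api/version = anon
/api/interpreter/** = authc, roles[admin_role]
/** = authc
"""


def load_shiro_ini(text: str) -> configparser.ConfigParser:
    """Parse shiro.ini-style text, keeping key case and ignoring '%' interpolation."""
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    parser.optionxform = str  # keep keys such as 'ldapRealm' case-sensitive
    parser.read_string(text)
    return parser


if __name__ == "__main__":
    cfg = load_shiro_ini(SHIRO_INI)
    print(cfg.sections())
    # List [urls] rules in file order.
    for pattern, chain in cfg.items("urls"):
        print(f"{pattern!r} -> {chain!r}")
```

Listing the `[urls]` rules in file order is useful because Shiro evaluates them top to bottom and the first matching pattern wins, so a catch-all such as `/** = authc` must come last.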
  • 14. LDAP Authentication in Zeppelin
      - LDAP bind
          - Bind DN for user jsmith: uid=jsmith,ou=users,dc=mycompany,dc=com
          - Template: uid={0},ou=users,dc=mycompany,dc=com
          - ldapRealm.userDnTemplate = uid={0},ou=users,dc=mycompany,dc=com
      - LDAP search
          ldapRealm.contextFactory.systemUsername=cn=ldap-reader,ou=ServiceUsers,dc=lab,dc=hortonworks,dc=net
          ldapRealm.contextFactory.systemPassword=SomePassw0rd
          ldapRealm.contextFactory.authenticationMechanism=simple
          ldapRealm.searchBase=OU=CorpUsers,DC=lab,DC=hortonworks,DC=net
          ldapRealm.userSearchBase=OU=CorpUsers,DC=lab,DC=hortonworks,DC=net
          ldapRealm.groupSearchBase=OU=CorpUsers,DC=lab,DC=hortonworks,DC=net
          ldapRealm.userObjectClass=person
          ldapRealm.groupObjectClass=group
          ldapRealm.userSearchAttributeName = sAMAccountName
          # Set search scopes for user and group. Values: subtree (default), onelevel, object
          ldapRealm.userSearchScope = subtree
          ldapRealm.groupSearchScope = subtree
          ldapRealm.userSearchFilter=(&(objectclass=person)(sAMAccountName={0}))
          ldapRealm.memberAttribute=member
      http://bit.ly/2rMTgLw
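Shiro's LDAP realm performs the `{0}` substitution itself; the Python sketch below only illustrates what the bind DN and search filter look like for a given login name, and why user input should be escaped before it lands in either. The helper names are hypothetical, the `dc=mycompany` values are the slide's placeholders, and the escaping follows RFC 4515 for filters and a simplified form of RFC 4514 for DNs.

```python
# Placeholder values mirroring the bind/search settings above.
USER_DN_TEMPLATE = "uid={0},ou=users,dc=mycompany,dc=com"
USER_SEARCH_FILTER = "(&(objectclass=person)(sAMAccountName={0}))"

# RFC 4515: characters that must be escaped inside a search-filter value.
_FILTER_ESCAPES = {"\\": r"\5c", "*": r"\2a", "(": r"\28", ")": r"\29", "\0": r"\00"}


def escape_filter_value(value: str) -> str:
    """Escape a user-supplied value for an LDAP search filter (RFC 4515)."""
    return "".join(_FILTER_ESCAPES.get(ch, ch) for ch in value)


def escape_dn_value(value: str) -> str:
    """Escape an attribute value for a DN (simplified RFC 4514 rules)."""
    out = []
    last = len(value) - 1
    for i, ch in enumerate(value):
        if ch in ',+"\\<>;=':
            out.append("\\" + ch)
        elif ch == " " and i in (0, last):
            out.append("\\ ")  # leading/trailing spaces must be escaped
        elif ch == "#" and i == 0:
            out.append("\\#")
        elif ch == "\0":
            out.append("\\00")
        else:
            out.append(ch)
    return "".join(out)


def bind_dn(username: str) -> str:
    """DN used for a direct bind, as with ldapRealm.userDnTemplate."""
    return USER_DN_TEMPLATE.replace("{0}", escape_dn_value(username))


def search_filter(username: str) -> str:
    """Filter used when searching as the system user, as with ldapRealm.userSearchFilter."""
    return USER_SEARCH_FILTER.replace("{0}", escape_filter_value(username))


def parens_balanced(ldap_filter: str) -> bool:
    """Cheap sanity check: every '(' in the filter has a matching ')'."""
    depth = 0
    for ch in ldap_filter:
        depth += {"(": 1, ")": -1}.get(ch, 0)
        if depth < 0:
            return False
    return depth == 0
```

`parens_balanced` is worth running on any hand-edited filter: an `(&` that is never closed is easy to miss in a long shiro.ini line, and escaped values (`\28`, `\29`) never contribute literal parentheses.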
  • 15. Want to connect to LDAP over SSL?
      - Change the protocol to ldaps in shiro.ini:
          ldapRealm.contextFactory.url = ldaps://ldap.example.com:636
      - If the LDAP server uses a self-signed certificate, import that certificate into the truststore of the JVM running Zeppelin:
          echo -n | openssl s_client -connect ldap.example.com:636 | sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p' > /tmp/examplecert.crt
          keytool -import -keystore $JAVA_HOME/jre/lib/security/cacerts -storepass changeit -noprompt -alias mycert -file /tmp/examplecert.crt
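For readers who prefer Python to the openssl/sed pipeline, the certificate-grabbing step can be approximated with the standard `ssl` module. This is a sketch, not a replacement for the procedure above: `fetch_server_pem` needs network access to the LDAPS server (ldap.example.com is the slide's placeholder), it returns only the certificate the server presents without validating it, and the `keytool -import` step still applies to the saved file. Both function names are hypothetical.

```python
import re
import ssl

# Same idea as the sed range: keep everything from BEGIN to END CERTIFICATE.
PEM_RE = re.compile(
    r"-----BEGIN CERTIFICATE-----\r?\n.*?-----END CERTIFICATE-----", re.DOTALL
)


def extract_pem_certificates(text: str) -> list[str]:
    """Return every PEM certificate block found in text, in order."""
    return PEM_RE.findall(text)


def fetch_server_pem(host: str, port: int = 636) -> str:
    """Fetch the PEM certificate an LDAPS server presents (no trust validation)."""
    return ssl.get_server_certificate((host, port))
```

Typical use would be `fetch_server_pem("ldap.example.com", 636)`, writing the result to `/tmp/examplecert.crt` before running `keytool`; `extract_pem_certificates` is handy when the certificate arrives embedded in a longer `openssl s_client` transcript.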
  • 16. Avoid the LDAP password in clear text in shiro.ini
      - Create an entry for the AD credential
          - Zeppelin leverages the Hadoop Credential API
          hadoop credential create ldapRealm.contextFactory.systemPassword -provider jceks://file/etc/zeppelin/conf/credentials.jceks
          Enter password:
          Enter password again:
          ldapRealm.contextFactory.systemPassword has been successfully created.
          org.apache.hadoop.security.alias.JavaKeyStoreProvider has been updated.
      - Make credentials.jceks readable only by the Zeppelin user
          - chmod 400: only the Zeppelin service user (the file owner) can read it; no other user has any access to the credentials
      - Edit shiro.ini
          ldapRealm.contextFactory.systemPassword -provider jceks://file/etc/zeppelin/conf/credentials.jceks
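A small pre-flight check can catch the two mistakes this slide warns about: a provider URI that does not point at a local file, and a keystore with loose permissions. The Python sketch below is an assumption-based illustration (the helper names are hypothetical); it relies on Hadoop's convention that `jceks://file/<path>` names a keystore on the local filesystem.

```python
import os
import stat
from urllib.parse import urlparse


def local_keystore_path(provider_uri: str) -> str:
    """Map a Hadoop provider URI of the form jceks://file/<path> to <path>."""
    parsed = urlparse(provider_uri)
    if parsed.scheme != "jceks" or parsed.netloc != "file":
        raise ValueError(f"not a local-file jceks provider URI: {provider_uri}")
    return parsed.path


def is_owner_read_only(path: str, expected_uid: int) -> bool:
    """True if path is owned by expected_uid and has mode exactly 0400 (r--------)."""
    st = os.stat(path)
    return st.st_uid == expected_uid and stat.S_IMODE(st.st_mode) == 0o400
```

On the Zeppelin host the expected uid would typically come from `pwd.getpwnam("zeppelin").pw_uid`, assuming the service account is named `zeppelin`. A URI such as `jceks://etc/zeppelin/conf/credentials.jceks` is rejected because `etc` is parsed as the URI authority rather than as part of the path.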
  • 17. Avoid the JDBC password in clear text
      - Create a credential for the JDBC password in the Hadoop credential store:
          hadoop credential create jdbc.password -provider jceks://file/user/zeppelin/conf/zeppelin.jceks
      - Reference the credential in the JDBC interpreter settings:
          default.jceks.credentialKey jdbc.password
          default.jceks.file jceks://file/user/zeppelin/conf/zeppelin.jceks
      - Details in JIRA ZEPPELIN-1935
      - A JDBC password is only needed for non-Hive connections; Hive leverages identity propagation
  • 18. Identity Propagation in Zeppelin
      - Interpreter dependent
          - Works for the Livy (Spark), Hive (JDBC) & Shell interpreters
  • 19. Identity Propagation with Livy
      [Diagram: John Doe authenticates to Zeppelin via LDAP/LDAPS; Zeppelin's Livy Spark interpreter group calls the Livy APIs using SPNEGO (Kerberos); Livy talks to Spark on YARN over Kerberos/RPC, and the job runs as John Doe]
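Under the hood, the Livy interpreter talks to Livy's REST API, and Livy represents "run this session as John Doe" through the session's `proxyUser` field. The sketch below only builds the two requests involved (`POST /sessions` and `POST /sessions/{id}/statements`) with the standard library and does not send them; the host `livy.example.com`, Livy's default port 8998, the `X-Requested-By` value and the helper names are assumptions for illustration.

```python
import json
import urllib.request


def _post(url: str, body: dict) -> urllib.request.Request:
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Needed only when Livy's CSRF protection is enabled; harmless otherwise.
            "X-Requested-By": "zeppelin",
        },
        method="POST",
    )


def create_session_request(livy_url: str, proxy_user: str, kind: str = "pyspark") -> urllib.request.Request:
    """Build (not send) a request for a Livy session that runs as proxy_user."""
    return _post(f"{livy_url.rstrip('/')}/sessions", {"kind": kind, "proxyUser": proxy_user})


def statement_request(livy_url: str, session_id: int, code: str) -> urllib.request.Request:
    """Build (not send) a request that runs code in an existing Livy session."""
    return _post(f"{livy_url.rstrip('/')}/sessions/{session_id}/statements", {"code": code})
```

Actually sending these to a Kerberized Livy requires SPNEGO authentication, which plain `urllib` does not provide (tools such as `curl --negotiate -u :` or the `requests-kerberos` package do). Livy also honors `proxyUser` only when the calling principal is allowed to impersonate other users, for example by being listed in `livy.superusers`.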
  • 20. Authorization in Zeppelin
      - Control access to notes
          - Grant permissions (owner, reader, writer) to users/groups on notes
          - LDAP group integration
      - Control access to the Zeppelin UI
          - Allow only admins to configure interpreters
          - Configured in shiro.ini
      Authorization at the data level
      - For Spark with Zeppelin > Livy > Spark
          - Identity propagation: jobs run as the end user
      - For Hive with Zeppelin > JDBC interpreter
          - Leverage Ranger-based row/column security for Hive/SparkSQL
      - Shell interpreter
          - Runs as the end user
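The note-level permissions can be pictured as three sets of principals per note. The Python below is a toy model with hypothetical names; two rules in it are assumptions modeled on how Zeppelin's note authorization behaved at the time: owners implicitly have write and read access, writers implicitly have read access, and an empty list leaves that permission unrestricted. Group names would come from LDAP in a real deployment.

```python
from dataclasses import dataclass, field


@dataclass
class NotePermissions:
    owners: set[str] = field(default_factory=set)
    writers: set[str] = field(default_factory=set)
    readers: set[str] = field(default_factory=set)


def entities(user: str, groups: set[str]) -> set[str]:
    """Principals checked for a request: the user name plus its (e.g. LDAP) groups."""
    return {user} | set(groups)


def _member(who: set[str], allowed: set[str]) -> bool:
    # Assumption: an empty list leaves that permission unrestricted.
    return not allowed or bool(who & allowed)


def is_owner(p: NotePermissions, who: set[str]) -> bool:
    return _member(who, p.owners)


def can_write(p: NotePermissions, who: set[str]) -> bool:
    # Owners can always write.
    return _member(who, p.writers) or is_owner(p, who)


def can_read(p: NotePermissions, who: set[str]) -> bool:
    # Writers and owners can always read.
    return _member(who, p.readers) or can_write(p, who)
```

The empty-means-open rule is the one to remember operationally: setting only an owner does not, by itself, hide a note from other users.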
  • 21. Control Who Can Modify Interpreter Settings
      - Step 1
          - Define protected URL patterns in shiro.ini
          - Assign the URL patterns to a role
          [urls]
          /api/interpreter/** = authc, roles[admin_role]
          /api/configurations/** = authc, roles[admin_role]
          /api/credential/** = authc, roles[admin_role]
      - Step 2
          - Map the role to an LDAP group
          ldapRealm.rolesByGroup = "hadoop-admins":admin_role
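Both steps can be simulated in Python to sanity-check a rule set before deploying it. This is a simplified model, not Shiro itself: it assumes the user has already passed `authc`, approximates Shiro's Ant-style path matching, applies the first matching `[urls]` rule (Shiro's documented ordering), and treats `roles[...]` as requiring all listed roles. The catch-all `/** = authc` rule and all helper names are assumptions.

```python
import re

# [urls] rules in file order; the first matching pattern wins.
URL_RULES = [
    ("/api/interpreter/**", "authc, roles[admin_role]"),
    ("/api/configurations/**", "authc, roles[admin_role]"),
    ("/api/credential/**", "authc, roles[admin_role]"),
    ("/**", "authc"),  # assumed catch-all rule, not shown on the slide
]
ROLES_BY_GROUP_SETTING = '"hadoop-admins":admin_role'


def ant_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a simplified Ant-style URL pattern into a compiled regex."""
    out, i = [], 0
    while i < len(pattern):
        if pattern.startswith("/**", i):
            out.append("(?:/.*)?")  # matches the parent path and everything below it
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")  # '*' stays within one path segment
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out))


def parse_roles_by_group(value: str) -> dict[str, str]:
    """Parse '"group":role, "group2":role2' into {group: role} (no commas in names)."""
    mapping = {}
    for pair in value.split(","):
        if not pair.strip():
            continue
        group, _, role = pair.strip().rpartition(":")
        mapping[group.strip().strip('"')] = role.strip()
    return mapping


GROUP_TO_ROLE = parse_roles_by_group(ROLES_BY_GROUP_SETTING)


def required_roles(chain: str) -> set[str]:
    """Role names inside roles[...] in a chain such as 'authc, roles[admin_role]'."""
    m = re.search(r"roles\[([^\]]*)\]", chain)
    return {r.strip() for r in m.group(1).split(",") if r.strip()} if m else set()


def is_allowed(path: str, ldap_groups: set[str]) -> bool:
    """Decide access for an already-authenticated user with the given LDAP groups."""
    roles = {GROUP_TO_ROLE[g] for g in ldap_groups if g in GROUP_TO_ROLE}
    for pattern, chain in URL_RULES:
        if ant_to_regex(pattern).fullmatch(path):
            return required_roles(chain) <= roles
    return True  # no rule matched: no filter applies
```

Because evaluation stops at the first match, moving `/** = authc` above the `/api/...` rules would silently let every authenticated user change interpreter settings; the model makes that easy to test.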
  • 22. Scalability & HA
      Scalability
      - Memory/cores for the Zeppelin server: consider 20-30 GB
      - Memory/cores for the Zeppelin interpreter: 4-8 GB
      - Memory/cores for Livy: 4-8 GB
      - Memory/cores for Spark: depends on the Spark jobs (see Spark performance tuning, https://spark.apache.org/docs/latest/tuning.html)
      - Horizontal scaling
          - Spin up multiple Zeppelin instances
          - Needs an external load balancer
          - Sticky sessions
      HA
      - Shared storage
      - Shared configuration
      - Communication between Zeppelin & interpreters
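The ranges above add up quickly on a single gateway node, especially when several interpreter processes run at once. The arithmetic sketch below uses the slide's figures; two assumptions are labeled in the code: the 4-8 GB interpreter figure is applied per interpreter process, and the Livy server runs on the same node. Spark itself is excluded, since the slide sizes it separately.

```python
# (low, high) memory in GB, from the slide's guidance.
ZEPPELIN_SERVER_GB = (20, 30)
INTERPRETER_GB = (4, 8)  # assumption: applies to each interpreter process
LIVY_GB = (4, 8)         # assumption: Livy server runs on the same gateway node


def gateway_memory_gb(interpreter_processes: int, with_livy: bool = True) -> tuple[int, int]:
    """Rough (low, high) memory estimate for a Zeppelin gateway node, excluding Spark."""
    if interpreter_processes < 0:
        raise ValueError("interpreter_processes must be >= 0")
    low = ZEPPELIN_SERVER_GB[0] + interpreter_processes * INTERPRETER_GB[0]
    high = ZEPPELIN_SERVER_GB[1] + interpreter_processes * INTERPRETER_GB[1]
    if with_livy:
        low += LIVY_GB[0]
        high += LIVY_GB[1]
    return low, high
```

With one shared interpreter process plus Livy the estimate is 28-46 GB; with ten interpreter processes (for example ten notes in isolated mode) it grows to 64-118 GB, which is usually the point where horizontal scaling behind a load balancer becomes attractive.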
  • 23. Using R & Python with Zeppelin
      - Multiple choices: Spark interpreter, Python interpreter, Livy interpreter
      - Deploy R/Python binaries on all worker nodes
      - Leverage the Livy interpreter for SparkR & PySpark
  • 24. Zeppelin + Livy2: Out-of-the-Box Job as Admin User Fails
      - Scenario: simple HDP 2.6 install with default config
      - Failure: Livy 2 interpreter job fails as the admin user
      - Reason: the HDFS directory /user/admin does not exist
      - Workaround: manually create /user/admin with admin:hdfs ownership
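The workaround boils down to two HDFS shell commands run as the HDFS superuser. The Python sketch below builds them (and runs them only when `dry_run=False`); it assumes the superuser account is named `hdfs` and that `sudo` is available on the node, and the helper names are hypothetical. On a Kerberized cluster you would instead `kinit` as the HDFS superuser principal before running the same `hdfs dfs` commands.

```python
import subprocess


def home_dir_commands(user: str, group: str = "hdfs") -> list[list[str]]:
    """Commands that create /user/<user> in HDFS and hand it to user:group."""
    if not user or "/" in user:
        raise ValueError(f"invalid user name: {user!r}")
    home = f"/user/{user}"
    as_hdfs = ["sudo", "-u", "hdfs"]  # assumption: the HDFS superuser account is 'hdfs'
    return [
        as_hdfs + ["hdfs", "dfs", "-mkdir", "-p", home],
        as_hdfs + ["hdfs", "dfs", "-chown", f"{user}:{group}", home],
    ]


def create_home_dir(user: str, group: str = "hdfs", dry_run: bool = True) -> list[list[str]]:
    """Return the commands; execute them in order only when dry_run is False."""
    commands = home_dir_commands(user, group)
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)  # stop at the first failing step
    return commands
```

Defaulting to a dry run keeps the helper safe to call from provisioning scripts that only want to log what would change; `-mkdir -p` also makes re-running it harmless when the directory already exists.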
  • 25. Zeppelin + Livy 500 Error with PySpark
      - Scenario: cut & paste code into Zeppelin
      - Failure: the Livy interpreter reports a 500 error
      - Workaround: manually type the code into the Zeppelin Livy interpreter
      - Fixed in HDP 2.6.1
  • 26. Other Zeppelin Livy Interpreter Issues
      - matplotlib doesn't work in the Livy PySpark interpreter
      - Job progress is not shown in the frontend
      - ZeppelinContext is not available in the Livy interpreter
  • 27. Zeppelin Future Plans
      - More visualization
      - More stability
      - More security
          - SSO integration with Knox
          - Zeppelin > Livy over SSL
          - Ranger integration
          - Atlas integration
      - Integration with Data Science Experience
      - HA & more collaboration
  • 28. Thank You & Questions
      Vinay Shukla, @neomythos

Speaker Notes

  1. Thank you Prasad Wagle (Twitter) & Prabhjot Singh (Hortonworks)
  2. Thank you Prasad Wagle (Twitter) & Prabhjot Singh (Hortonworks)
  3. All images from Flickr Commons