Running Apache Zeppelin production
- 1. Running Zeppelin in Production
Vinay Shukla
Product Management, Director
Twitter: @neomythos
- 2. 2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Whoami?
Product Management
Spark for 3+ years, Hadoop for 4 years, Zeppelin for 2 years
Blog at www.vinayshukla.com
Twitter: @neomythos
Addicted to Yoga, Hiking, & Coffee
Smallest contributor to Apache Zeppelin
Programmer > Product Management > Programmer > Product Management
- 3. Why Apache Zeppelin?
Browser based access to Big Data
Make Spark accessible to more users
Shield users from dealing with Kerberos
Leverage built-in Spark, Livy, Hive, JDBC & 20 other interpreters
Beautiful visualization built in, easy to extend
- 4. How does Zeppelin work?
[Diagram: Notebook author and collaborators/report viewers > Zeppelin > Cluster (Spark | Hive | HBase, any of 30+ back ends/interpreters)]
- 5. Zeppelin Architecture
- 6. Zeppelin Interpreter Modes
Basic unit of work is Note
– Note has paragraphs
3 Modes
– Shared (All notes use same Interpreter process & Interpreter group)
– Scoped (Notes still share the process but use separate interpreter groups; sharing objects is possible)
– Isolated (Each note runs its own interpreter process & group)
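The three modes can be pictured as a mapping from notes to an interpreter process and an interpreter group. The sketch below follows only the slide's wording; the function name `assign` and the `proc-N`/`group-N` labels are hypothetical and are not Zeppelin identifiers.

```python
def assign(mode, notes):
    """Return {note: (process_id, group_id)} as described on the slide.

    Illustrative model only; Zeppelin's real bookkeeping is richer.
    """
    placement = {}
    for i, note in enumerate(notes):
        if mode == "shared":
            # Every note uses the same process and the same group.
            placement[note] = ("proc-0", "group-0")
        elif mode == "scoped":
            # One process for all notes, one group per note.
            placement[note] = ("proc-0", f"group-{i}")
        elif mode == "isolated":
            # One process and one group per note.
            placement[note] = (f"proc-{i}", f"group-{i}")
        else:
            raise ValueError(f"unknown mode: {mode}")
    return placement


if __name__ == "__main__":
    for mode in ("shared", "scoped", "isolated"):
        print(mode, assign(mode, ["note-a", "note-b"]))
```

Under this model, the number of distinct process IDs is a rough proxy for memory cost, which is why isolated mode is the most expensive of the three.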
- 7. Deploying Zeppelin
Node choices:
Master Node
Worker Node
Management Node
Client/Gateway Node ✔
- 8. Apache Zeppelin Deployment
[Diagram labels: John Doe, SSL, Firewall, LDAP, Hadoop Cluster, Spark on YARN with executors (Ex), Hive; numbered steps 1 to 3]
- 9. Interacting with Spark
[Diagram: clients of Spark on YARN (Zeppelin, Spark-Shell, Spark Thrift Server, Livy REST Server), each shown with a Spark driver and executors (Ex); two Zeppelin variants are shown, "Built-in Spark Interpreter" and "With Livy Interpreter" (via the Livy REST Server)]
- 10. Apache Zeppelin Security
- 11. Apache Zeppelin: Authentication + SSL
[Diagram labels: John Doe, SSL, Firewall, LDAP, Spark on YARN with executors (Ex); numbered steps 1 to 3]
- 12. Security in Apache Zeppelin?
Zeppelin leverages Apache Shiro for authentication/authorization
- 13. Example shiro.ini
# =======================
# Shiro INI configuration
# =======================
[main]
## LDAP/AD configuration
[users]
# The 'users' section is for simple deployments
# when you only need a small number of statically-defined
# set of User accounts.
[urls]
# The 'urls' section is used for url-based security
#
Edit with Ambari or your favorite text editor
- 14. LDAP Authentication in Zeppelin
LDAP Bind
– uid=jsmith,ou=users,dc=mycompany,dc=com
– uid={0},ou=users,dc=mycompany,dc=com
– ldapRealm.userDnTemplate = uid={0},ou=users,dc=mycompany,dc=com
LDAP Search
– ldapRealm.contextFactory.systemUsername=cn=ldap-reader,ou=ServiceUsers,dc=lab,dc=hortonworks,dc=net
– ldapRealm.contextFactory.systemPassword=SomePassw0rd
– ldapRealm.contextFactory.authenticationMechanism=simple
– ldapRealm.searchBase=OU=CorpUsers,DC=lab,DC=hortonworks,DC=net
– ldapRealm.userSearchBase=OU=CorpUsers,DC=lab,DC=hortonworks,DC=net
– ldapRealm.groupSearchBase=OU=CorpUsers,DC=lab,DC=hortonworks,DC=net
– ldapRealm.userObjectClass=person
– ldapRealm.groupObjectClass=group
– ldapRealm.userSearchAttributeName = sAMAccountName
– # Set search scopes for user and group. Values: subtree (default), onelevel, object
– ldapRealm.userSearchScope = subtree
– ldapRealm.groupSearchScope = subtree
– ldapRealm.userSearchFilter=(&(objectclass=person)(sAMAccountName={0}))
– ldapRealm.memberAttribute=member
http://bit.ly/2rMTgLw
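Both approaches above rely on the `{0}` placeholder being replaced with the login name. The sketch below illustrates that substitution; it is not Shiro's or Zeppelin's code. The helper names are hypothetical, and the RFC 4515 escaping is included as good practice, not as a claim about what Zeppelin does internally.

```python
def bind_dn(template, username):
    """LDAP bind: substitute the login name into a userDnTemplate."""
    return template.replace("{0}", username)


def escape_filter_value(value):
    """Escape characters that are special in LDAP search filters (RFC 4515)."""
    escaped = []
    for ch in value:
        if ch in "\\*()\x00":
            escaped.append("\\%02x" % ord(ch))
        else:
            escaped.append(ch)
    return "".join(escaped)


def search_filter(template, username):
    """LDAP search: substitute the escaped login name into a search filter."""
    return template.replace("{0}", escape_filter_value(username))


def balanced(filter_text):
    """Check that every '(' in a filter has a matching ')'."""
    depth = 0
    for ch in filter_text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


if __name__ == "__main__":
    print(bind_dn("uid={0},ou=users,dc=mycompany,dc=com", "jsmith"))
    flt = "(&(objectclass=person)(sAMAccountName={0}))"
    print(search_filter(flt, "jsmith"), balanced(flt))
```

A parenthesis check like `balanced` is a cheap sanity test to run on a filter before restarting Zeppelin, since an unbalanced filter typically surfaces only as a failed login.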
- 15. Want to connect to LDAP over SSL?
Change protocol to ldaps in shiro.ini
ldapRealm.contextFactory.url = ldaps://hdpqa.example.com:636
If LDAP is using a self-signed certificate, import the certificate into the truststore of the JVM running Zeppelin
echo -n | openssl s_client -connect hdpqa.example.com:636 | sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p' > /tmp/examplecert.crt
keytool -import -keystore $JAVA_HOME/jre/lib/security/cacerts -storepass changeit -noprompt -alias mycert -file /tmp/examplecert.crt
- 16. Avoid LDAP password in clear in shiro.ini
Create an entry for AD credential
– Zeppelin leverages Hadoop Credential API
– hadoop credential create ldapRealm.contextFactory.systemPassword -provider jceks://file/etc/zeppelin/conf/credentials.jceks
Enter password:
Enter password again:
ldapRealm.contextFactory.systemPassword has been successfully created.
org.apache.hadoop.security.alias.JavaKeyStoreProvider has been updated.
Make credentials.jceks readable only by the Zeppelin user
chmod 400 so that only the Zeppelin process owner can read the file; no other user is allowed access to the credentials
Edit shiro.ini
ldapRealm.contextFactory.systemPassword -provider jceks://file/etc/zeppelin/conf/credentials.jceks
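The permission step can be verified programmatically. The sketch below is a minimal POSIX example: mode 400 means owner read-only, with no access for group or others. The helper names are hypothetical, and the demo runs against a temporary file, not a real credential store.

```python
import os
import stat
import tempfile


def restrict_to_owner_read(path):
    """Equivalent of `chmod 400 path`: owner may read, nobody else has access."""
    os.chmod(path, stat.S_IRUSR)


def is_owner_read_only(path):
    """True if the permission bits of path are exactly 0o400."""
    return stat.S_IMODE(os.stat(path).st_mode) == 0o400


if __name__ == "__main__":
    # Stand-in for credentials.jceks; a real check would use the actual path.
    fd, path = tempfile.mkstemp(suffix=".jceks")
    os.close(fd)
    restrict_to_owner_read(path)
    print(path, is_owner_read_only(path))
    os.remove(path)
```

The file must also be owned by the account that runs Zeppelin; this sketch checks only the mode bits, not ownership.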
- 17. Avoid JDBC password in clear
Create a credential for the JDBC password in the Hadoop Credential store
hadoop credential create jdbc.password -provider jceks://file/user/zeppelin/conf/zeppelin.jceks
Use the credential in the JDBC interpreter properties
default.jceks.credentialKey jdbc.password
default.jceks.file jceks://file/user/zeppelin/conf/zeppelin.jceks
Details at JIRA ZEPPELIN-1935
A JDBC password is needed only for non-Hive IDs; Hive leverages identity propagation
- 18. Identity Propagation in Zeppelin
Interpreter Dependent
– Works for Livy (Spark), Hive (JDBC) & Shell Interpreter
- 19. Identity Propagation with Livy
[Diagram: John Doe authenticates to Zeppelin via LDAP/LDAPS; the Zeppelin interpreter group calls the Livy APIs over SPNego (Kerberos); Livy reaches Spark on YARN over Kerberos/RPC; the job runs as John Doe]
- 20. Authorization in Zeppelin
Authorization in Zeppelin
Control access to Note
– Grant permissions (Owner, Reader, Writer) to users/groups on Notes
– LDAP group integration
Control access to Zeppelin UI
– Allow only admins to configure interpreter
– Configured in shiro.ini
Authorization at Data Level
For Spark with Zeppelin > Livy > Spark
– Identity propagation: jobs run as end-user
For Hive with Zeppelin > JDBC interpreter
– Leverage Ranger based Row/Column Security for Hive SparkSQL
Shell Interpreter
– Runs as end-user
- 21. Control Who Can Modify Interpreter Settings
Step 1
– Define protected URL patterns in shiro.ini
– Assign URL patterns to a role
[urls]
/api/interpreter/** = authc, roles[admin_role]
/api/configurations/** = authc, roles[admin_role]
/api/credential/** = authc, roles[admin_role]
Step 2
– Map role to LDAP group
ldapRealm.rolesByGroup = "hadoop-admins":admin_role
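The two steps combine as follows: an LDAP group maps to a role, and a protected URL demands that role. The sketch below models this with hypothetical helper names. It handles only patterns ending in `/**` and treats unmatched paths as unrestricted, which is a simplification of Shiro's Ant-style matching and filter chains.

```python
def parse_roles_by_group(value):
    """Parse '"group-a":role_a, group-b:role_b' into {group: role}."""
    mapping = {}
    for item in value.split(","):
        group, sep, role = item.strip().rpartition(":")
        if not sep:
            continue  # skip entries without a group:role separator
        mapping[group.strip().strip('"')] = role.strip()
    return mapping


def required_role(path, url_rules):
    """Return the role demanded by the first matching rule, else None."""
    for pattern, role in url_rules:
        if pattern.endswith("/**"):
            base = pattern[:-3]
            if path == base or path.startswith(base + "/"):
                return role
        elif path == pattern:
            return role
    return None


def is_allowed(path, ldap_groups, roles_by_group, url_rules):
    """True if the user's LDAP groups grant the role the path requires."""
    role = required_role(path, url_rules)
    if role is None:
        return True  # simplification: unmatched paths are not restricted here
    user_roles = {roles_by_group[g] for g in ldap_groups if g in roles_by_group}
    return role in user_roles


URL_RULES = [
    ("/api/interpreter/**", "admin_role"),
    ("/api/configurations/**", "admin_role"),
    ("/api/credential/**", "admin_role"),
]
ROLES_BY_GROUP = parse_roles_by_group('"hadoop-admins":admin_role')

if __name__ == "__main__":
    print(is_allowed("/api/interpreter/setting", ["hadoop-admins"], ROLES_BY_GROUP, URL_RULES))
    print(is_allowed("/api/interpreter/setting", ["analysts"], ROLES_BY_GROUP, URL_RULES))
```

The group name `analysts` and the path `/api/interpreter/setting` are illustrative values, not taken from the slides.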
- 22. Scalability & HA
Memory/Core for Zeppelin Server: consider 20-30 GB
Memory/Core for Zeppelin Interpreter: 4-8 GB
Memory/Core for Livy: 4-8 GB
Memory/Core for Spark: depends on Spark jobs (see Spark Performance Tuning, https://spark.apache.org/docs/latest/tuning.html)
Horizontal Scaling
– Spin up multiple Zeppelin instances
– Need external load balancer
– Sticky sessions
Scalability HA
Shared Storage
Shared Configuration
Communication between Zeppelin & Interpreters
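Sticky sessions mean that a given user session always reaches the same Zeppelin instance. Load balancers usually achieve this with a cookie or a source hash; the sketch below shows the hash-based idea only. The instance names and the function `pick_instance` are hypothetical.

```python
import hashlib


def pick_instance(session_id, instances):
    """Deterministically map a session ID to one instance in the pool."""
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(instances)
    return instances[index]


if __name__ == "__main__":
    pool = ["zeppelin-1:9995", "zeppelin-2:9995", "zeppelin-3:9995"]
    for session in ("john-doe-session", "jane-roe-session"):
        print(session, "->", pick_instance(session, pool))
```

Plain modulo hashing reassigns most sessions whenever the pool size changes, which is one reason production balancers tend to prefer cookie-based stickiness or consistent hashing.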
- 23. Using R & Python with Zeppelin
Multiple choices, Spark Interpreter, Python Interpreter, Livy Interpreter
Deploy R/Python binaries on all worker nodes
Leverage Livy Interpreter for SparkR & PySpark
- 24. Zeppelin + Livy2 OOB Job as admin user fails
Scenario: Simple HDP 2.6 Install with default config
Failure : Livy 2 Interpreter job fails as admin user
Reason: HDFS dir does not exist /user/admin
Work Around: Manually create /user/admin in HDFS, owned by admin:hdfs
- 25. Zeppelin + Livy 500 Error with PySpark
Scenario: Cut & Paste code into Zeppelin
Failure : Livy interpreter reports 500
Work Around: Manually type code into Zeppelin Livy interpreter
Fixed with HDP 2.6.1
- 26. Other Zeppelin Livy Interpreter Issues
matplotlib doesn’t work in Livy pyspark interpreter
Job progress is not shown in frontend.
ZeppelinContext is not available in Livy interpreter
- 27. Zeppelin Future Plans
More Visualization
More Stability
More Security
– SSO Integration with Knox
– Zeppelin > Livy over SSL
– Ranger Integration
– Atlas Integration
Integration with Data Science Experience
HA & More Collaboration
- 28. Thank You & Questions
Vinay Shukla
@neomythos
Editor's notes
- Thank you Prasad Wagle (Twitter) & Prabhjot Singh (Hortonworks)
- All images from Flickr Commons