Row/Column-level Security in SQL for Apache Spark


Security is one of the fundamental features for enterprise adoption. For SQL users in particular, row/column-level access control is important. However, when a cluster serves as a data warehouse accessed by various user groups in different ways, it is difficult to guarantee consistent data governance. In this talk, we focus on SQL users and show how to provide row/column-level access control with common access control rules across the whole cluster and its various SQL engines, e.g., Apache Spark 2.1, Apache Spark 1.6 and Apache Hive 2.1. If any rules change, all engines are controlled consistently in near real time. Technically, we enable Spark Thrift Server to work with the identity given by the JDBC connection and take advantage of the Hive LLAP daemon as a shared and secured processing engine. We demonstrate row-level filtering, column-level filtering and various column maskings in Apache Spark with Apache Ranger, which we use as a single point of security control.

Published in: Technology

  1. Row/Column-level Security in SQL for Apache Spark
     Dongjoon Hyun, Software Engineer; Bikas Saha, Software Engineer
     April 2017
     © Hortonworks Inc. 2011 – 2016. All Rights Reserved
  2. Who am I: Dongjoon Hyun
     • Software Engineer @ Hortonworks
     • Apache REEF PMC member and committer
     • Apache Spark project contributor
     • https://github.com/dongjoon-hyun
  3. Agenda
     • Security Issues
     • Goals
     • Components
     • How it works
     • Demo
  4. Security
     • One of the fundamental features for enterprise adoption
       – Multi-tenancy: Billing team / Data science team / Marketing teams
     • Row and column-level access control for SQL users
       – Row filtering
       – Column masking
     • Shared policies must be enforced on various SQL engines simultaneously
       – E.g. Apache Spark 2.1/1.6 and Apache Hive 2.1
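To make the two controls on this slide concrete, here is a minimal sketch in plain Python of what row filtering and column masking do to a result set. It does not use Ranger, Hive, or Spark; the function names (`apply_policy`, `mask_show_last4`), the sample rows, and the "US only" rule are all hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]


def identity(value: str) -> str:
    """Default for columns without a mask: return the value unchanged."""
    return value


def mask_show_last4(value: str) -> str:
    """Hypothetical mask: replace all but the last four characters with 'x'."""
    return "x" * max(len(value) - 4, 0) + value[-4:]


def apply_policy(
    rows: List[Row],
    row_filter: Callable[[Row], bool],
    column_masks: Dict[str, Callable[[str], str]],
) -> List[Row]:
    """Drop rows the filter rejects, then mask the configured columns."""
    visible = []
    for row in rows:
        if not row_filter(row):
            continue  # row filtering: the user never sees this row
        visible.append(
            {col: column_masks.get(col, identity)(val) for col, val in row.items()}
        )
    return visible


# Illustrative data only; not taken from the talk.
customers = [
    {"name": "Alice", "region": "EU", "phone": "5550101234"},
    {"name": "Bob", "region": "US", "phone": "5550105678"},
]

result = apply_policy(
    customers,
    row_filter=lambda r: r["region"] == "US",
    column_masks={"phone": mask_show_last4},
)
print(result)
```

In the architecture described in this deck, the equivalent logic is applied on the Hive/LLAP side according to Ranger policies, so the Spark application itself does not contain any such code.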
  5. Issue 1
     • Spark reads all or nothing
       – Directory/file-based permissions are insufficient
     • Permission 777 on the warehouse?
     Security starts from storage
  6. Issue 2
     • Spark apps would have to be rewritten
       – Special data source tables
     • Duplicated data
       – Filtered rows
       – Removed or masked columns
     • SQL views
       – Maintained manually
     Overhead when setting up and maintaining security policies
  7. Goals
  8. Goal 1: Spark SQL Apps
     Support row/column-level security with the batch apps

         from pyspark.sql import SparkSession

         # Build a Hive-enabled session, then query the protected table.
         spark = (SparkSession.builder
                  .enableHiveSupport()
                  .getOrCreate())
         spark.sql("select * from db_common.t_customer").show()

     Figure labels: db_common, t_customer, …
  9. Goal 2: Spark shells (1/2)
     Support row/column-level security in all shells: spark-shell, pyspark
  10. Goal 2: Spark shells (2/2)
      Support row/column-level security in all shells: sparkR, spark-sql
  11. Goal 3: Spark Thrift Server
      Support row/column-level security with Spark Thrift Server
      Panels: login as `hive`; login as `spark`
  12. Components
  13. What is required?
      • Kerberos
      • Apache Hadoop (HDFS/YARN)
      • Apache Ranger
      • Apache Hive (LLAP)
      • Spark-LLAP: a library and patches to integrate the above (focus here)
  14. Apache Ranger
      Provides a standard authorization method across many Hadoop components
      https://hortonworks.com/apache/ranger/#section_2
  15. Apache Hive
      • Hive Ranger Plugin & Policies
        – Support row/column-level security
      • LLAP Daemon (GA in HDP 2.6)
        – Persistent query servers with intelligent in-memory caching
        – Provide a secure relational datanode view of the data
      Trusted Service
  16. Spark-LLAP (Technical Preview) Milestone
      • HDP 2.5: Spark-LLAP for Spark 1.6
        – User should use LlapContext
        – Support Scala/Java and spark-shell

            val lc = new LlapContext(sc)
            lc.sql("select * from t").show

      • HDP 2.6: Spark-LLAP for Spark 2.1
        – No need to rewrite SQL related code
        – Support all languages and shells
      • Next: Spark-LLAP for Spark 2.1
        – Support YARN cluster mode
  17. Spark-LLAP GitHub (Apache License)
  18. How it works
  19. How it works: Overview
      Case: spark-submit with YARN cluster mode
      Components: Spark, Hive (HiveServer2), Ranger, LLAP, User, Admin
      Steps:
        1. Manage policies
        2. Launch
        3. Get delegation token
        4. Get metadata
        5. Get data locations
        6. Read filtered/masked data
        7. Monitor audits
      Unnumbered arrow label: Authorize
  20. How it works: Overview (same diagram as slide 19, with a legend)
      Legend: Existing Infra / New for Spark / New for Hive (GA)
  21. Hive: Enable LLAP
  22. Admin: Manage
      Hive Database: db_common
      Table: *
      Hive Column: *
      Select User: spark
      Permissions: SELECT
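The policy on slide 22 can be read as "user `spark` may SELECT any column of any table in `db_common`". The sketch below models that reading in plain Python. It is not Ranger's policy model or API: the `AccessPolicy` class and `is_allowed` function are hypothetical, and the `*` wildcard is interpreted with the standard library's `fnmatchcase`, which is an assumption about how the wildcard behaves.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import FrozenSet


@dataclass(frozen=True)
class AccessPolicy:
    """Hypothetical, simplified stand-in for one access policy."""
    database: str
    table: str
    column: str
    users: FrozenSet[str]
    permissions: FrozenSet[str]


def is_allowed(policy: AccessPolicy, user: str, permission: str,
               database: str, table: str, column: str) -> bool:
    """Return True if the policy grants `permission` on the resource to `user`."""
    return (
        user in policy.users
        and permission in policy.permissions
        and fnmatchcase(database, policy.database)
        and fnmatchcase(table, policy.table)
        and fnmatchcase(column, policy.column)
    )


# Values copied from the slide: db_common / * / * / spark / SELECT.
slide_policy = AccessPolicy(
    database="db_common",
    table="*",
    column="*",
    users=frozenset({"spark"}),
    permissions=frozenset({"SELECT"}),
)

print(is_allowed(slide_policy, "spark", "SELECT", "db_common", "t_customer", "name"))
print(is_allowed(slide_policy, "spark", "UPDATE", "db_common", "t_customer", "name"))
```

A real deployment would add row-filter and masking policies on top of this kind of access grant; those are managed in the Ranger admin UI rather than in application code.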
  23. Admin: Audit
  24. User: Launch Spark jobs
      • spark-submit
            --jars spark-llap.jar
            --conf spark.sql.hive.llap=true
            --conf spark.yarn.security.credentials.hiveserver2.enabled=true
            --master yarn
            --deploy-mode cluster
            sql.py
      Callouts: "Easy to turn on/off"; "Only used for YARN cluster mode"
      Note: There are more static configurations related to LLAP. The `--packages` option is supported, too.
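Because the feature is switched on purely through configuration, the launch command can be assembled programmatically. The sketch below rebuilds the command from slide 24 as an argument list; the helper name `build_spark_submit` and its `llap_enabled` parameter are hypothetical, and the jar name and flags are simply the ones shown on the slide (the additional static LLAP configurations the note mentions are not included).

```python
import shlex
from typing import List


def build_spark_submit(app: str, llap_enabled: bool = True) -> List[str]:
    """Assemble the spark-submit arguments shown on the slide.

    `llap_enabled` only toggles spark.sql.hive.llap; everything else is
    copied from the slide as-is.
    """
    flag = "true" if llap_enabled else "false"
    return [
        "spark-submit",
        "--jars", "spark-llap.jar",
        "--conf", f"spark.sql.hive.llap={flag}",
        "--conf", "spark.yarn.security.credentials.hiveserver2.enabled=true",
        "--master", "yarn",
        "--deploy-mode", "cluster",
        app,
    ]


command = build_spark_submit("sql.py")
# shlex.join quotes arguments safely for display or logging.
print(shlex.join(command))
```

Passing the list (rather than a single string) to something like `subprocess.run` avoids shell-quoting problems if an application path contains spaces.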
  25. Spark: Get delegation tokens
      • HDFS Delegation Token
        – HDFSCredentialProvider gets it from namenode
      • Hive Metastore Delegation Token
        – HiveCredentialProvider gets it from Hive Metastore
      • HiveServer2 Delegation Token
        – HiveServer2CredentialProvider gets it from HiveServer2
      Legend: Existing / Spark-LLAP
      Note: Spark manages token renewal
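The provider pattern on slide 25 can be sketched as a small registry: each service has a provider that obtains a token, and a per-service configuration key decides whether it runs. The code below is a toy model in plain Python, not Spark's credential manager. The key pattern mirrors the `spark.yarn.security.credentials.hiveserver2.enabled` setting from slide 24; the function name `collect_tokens`, the placeholder token strings, and the choice to treat a missing key as enabled are assumptions made for this sketch.

```python
from typing import Callable, Dict

# Key pattern modelled on the setting shown on slide 24.
CONF_PATTERN = "spark.yarn.security.credentials.{service}.enabled"


def collect_tokens(
    providers: Dict[str, Callable[[], str]],
    conf: Dict[str, str],
) -> Dict[str, str]:
    """Ask every enabled provider for a token and return them by service name.

    Assumption of this sketch: a provider whose key is absent counts as enabled.
    """
    tokens = {}
    for service, obtain in providers.items():
        key = CONF_PATTERN.format(service=service)
        if conf.get(key, "true").lower() == "true":
            tokens[service] = obtain()
    return tokens


# Placeholder providers; real ones would contact the namenode, the Hive
# Metastore, and HiveServer2 over Kerberos-authenticated connections.
providers = {
    "hdfs": lambda: "token-from-namenode",
    "hive": lambda: "token-from-hive-metastore",
    "hiveserver2": lambda: "token-from-hiveserver2",
}

tokens = collect_tokens(
    providers,
    {"spark.yarn.security.credentials.hiveserver2.enabled": "true"},
)
print(sorted(tokens))
```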
  26. Spark: LlapMetastoreCatalog replaces MetastoreRelation with LlapRelation

          SELECT gender, count(*)
          FROM db_common.t_customer
          WHERE name LIKE '%Obama'
          GROUP BY gender

      Parsed Logical Plan (top to bottom):
        Aggregate: gender
        Filter: name like %Obama
        UnresolvedRelation
      Analyzed Logical Plan (top to bottom):
        Aggregate: gender
        Filter: name like %Obama
        SubqueryAlias
        LlapRelation
  27. Spark: LlapMetastoreCatalog replaces MetastoreRelation with LlapRelation
      Panels: Without Spark-LLAP; With Spark-LLAP
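The effect described on slides 26 and 27 is that the leaf of the analyzed plan becomes an LlapRelation instead of a MetastoreRelation, while the operators above it stay the same. The toy model below shows that substitution on a linear plan. It is not Catalyst: the `Plan` class and the `replace_relation` and `render` helpers are hypothetical, and in Spark-LLAP the substitution happens when the catalog resolves the table rather than as a later rewrite of the tree.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Plan:
    """One node of a linear (single-child) logical plan."""
    name: str
    detail: str = ""
    child: Optional["Plan"] = None


def replace_relation(plan: Optional[Plan], old: str, new: str) -> Optional[Plan]:
    """Return a copy of the plan with every node named `old` renamed to `new`."""
    if plan is None:
        return None
    name = new if plan.name == old else plan.name
    return Plan(name, plan.detail, replace_relation(plan.child, old, new))


def render(plan: Optional[Plan]) -> str:
    """Render the plan from the top operator down to the leaf relation."""
    parts = []
    while plan is not None:
        parts.append(plan.name + (f": {plan.detail}" if plan.detail else ""))
        plan = plan.child
    return " -> ".join(parts)


# Analyzed plan as it would look without Spark-LLAP (leaf is MetastoreRelation).
without_llap = Plan(
    "Aggregate", "gender",
    Plan("Filter", "name like %Obama",
         Plan("SubqueryAlias", "", Plan("MetastoreRelation"))),
)

with_llap = replace_relation(without_llap, "MetastoreRelation", "LlapRelation")
print(render(without_llap))
print(render(with_llap))
```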
  28. Spark: LlapRelation supports predicate pushdown during optimization
      Analyzed Logical Plan (top to bottom):
        Aggregate: gender
        Filter: name like %Obama
        SubqueryAlias
        LlapRelation
      Optimized Logical Plan (top to bottom):
        Aggregate: gender
        Project: gender
        Filter: EndsWith(name, Obama)
        LlapRelation
  29. Spark: LlapRelation supports predicate pushdown during optimization
      Analyzed and Optimized Logical Plans: as on slide 28
      Physical Plan (top to bottom):
        …
        HashAggregate: gender
        Project: gender
        Filter: EndsWith(name, Obama)
        Scan LlapRelation, PushedFilter: StringEndsWith(name, Obama)
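The plans above show `name LIKE '%Obama'` turning into `EndsWith(name, Obama)` and then into the pushed filter `StringEndsWith(name, Obama)`. The sketch below illustrates that kind of simplification for simple patterns in plain Python. It is a rough approximation, not Spark's optimizer rule: the function name `simplify_like` is hypothetical, the returned names follow the pushed-filter naming seen on the slide, and any pattern with `_`, an escape character, or an interior `%` is conservatively left alone.

```python
from typing import Optional, Tuple

PushedFilter = Tuple[str, str, str]  # (filter name, column, literal)


def simplify_like(column: str, pattern: str) -> Optional[PushedFilter]:
    """Map a simple LIKE pattern to a string filter, or None if it is not simple."""
    if "_" in pattern or "\\" in pattern:
        return None  # single-character wildcards and escapes are not handled
    body = pattern.strip("%")
    if not body or "%" in body:
        return None  # empty literal or a wildcard in the middle
    leading = pattern.startswith("%")
    trailing = pattern.endswith("%")
    if leading and trailing:
        return ("StringContains", column, body)
    if leading:
        return ("StringEndsWith", column, body)
    if trailing:
        return ("StringStartsWith", column, body)
    return ("EqualTo", column, body)


print(simplify_like("name", "%Obama"))
print(simplify_like("name", "O_ama"))
```

Pushing such a filter into the scan lets the data source skip non-matching rows early; the plan on slide 29 still keeps the `Filter` operator above the scan, so correctness does not depend on the source applying the pushed filter.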
  30. Spark: Read filtered and masked data from LLAP

          jobConf.set("hive.llap.zk.registry.user", "hive")
          jobConf.set("llap.if.hs2.connection", parameters("url"))
          jobConf.set("llap.if.query", queryString)
          …
          // Create Hadoop RDD and convert LLAP Row into Spark Row
          sc.sparkContext
            .hadoopRDD(…)
            .mapPartitionsWithInputSplit(…)
  31. Demo (Video)
  32. Some related SPARK issues
      • SPARK-14743 Add a configurable credential manager for Spark running on YARN
      • SPARK-15777 Catalog federation (Open)
      • SPARK-17767 Spark SQL ExternalCatalog API custom implementation support (Closed as Later)
      • SPARK-17819 Support default database in connection URIs for Spark Thrift Server
      • SPARK-18517 DROP TABLE IF EXISTS should not warn for non-exist
      • SPARK-18840 Avoid throw exception when getting token renewal interval in non HDFS security env.
      • SPARK-18857 Don't use `Iterator.duplicate` in STS
      • SPARK-19021 Generalize HDFSCredentialProvider to support non HDFS security filesystems
      • SPARK-19038 Avoid overwriting keytab configuration in yarn-client
      • SPARK-19179 Change spark.yarn.access.namenodes config and update docs
      • SPARK-19970 Table owner should be USER instead of PRINCIPAL
      • SPARK-19995 Register tokens to current UGI to avoid re-issuing of tokens in yarn client mode
  33. Summary
      • Support row/column-level security with
        – Spark apps
        – Spark shells
        – Spark Thrift Server
      • You can use the existing Spark 2.X SQL apps and scripts
      • Easy to turn on/off with only configurations
      • Ranger enforces policies on Hive and Spark simultaneously and consistently
      Spark-LLAP with HDP 2.6 is a Technical Preview (TP)
  34. Acknowledgement
      • Apache Hive / Apache Spark / Apache Ranger
      • Bikas Saha, Saisai Shao, Jason Dere, Thejas Nair, Zhan Zhang, and many others
  35. Thank you
