Dynamic Column Masking and Row-Level Filtering in HDP

© Hortonworks Inc. 2011 – 2016. All Rights Reserved
Dynamic Row Filtering and Column Masking
with Apache Ranger
Srikanth Venkat
Senior Director, Product Management

Disclaimer
⬢ This document may contain product features and technology directions that are under development, may be under development in the future, or may ultimately not be developed.
⬢ Project capabilities are based on information that is publicly available within the Apache Software Foundation project websites ("Apache"). Progress of the project capabilities can be tracked from inception to release through Apache; however, technical feasibility, market demand, user feedback, and the overarching Apache Software Foundation community development process can all affect timing and final delivery.
⬢ This document’s description of these features and technology directions does not represent a contractual commitment, promise, or obligation from Hortonworks to deliver these features in any generally available product.
⬢ Product features and technology directions are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.
⬢ Since this document contains an outline of general product development plans, customers should not rely upon it when making purchasing decisions.

Agenda
Background
Dynamic Column Masking and Row Filtering
Spark SQL Security via Hive LLAP/Ranger
Demo

Security Challenges of Today’s Data Platforms
⬢ Central repository of critical and sensitive data
– Grey Data
⬢ Data maintained over long duration
– Forever
⬢ External ecosystem is in flux
– The Zoo
⬢ Users can access and analyze data in new and different ways
– Democratization

Apache Ranger
Auditing
• Central audit location for all access requests
• Support multiple destination sources (HDFS, Solr, etc.)
• Real-time visual query interface
Ranger KMS
• Store and manage encryption keys
• Support HDFS Transparent Data Encryption
• Integration with HSM
– Safenet LUNA
Authorization
• Centralized platform to define, administer and manage security policies consistently across Hadoop components
– HDFS, Hive, HBase, YARN, Kafka, Solr, Storm, Knox, NiFi, Atlas
• Extensible Architecture
– Custom policy conditions, user context enrichers
– Easy to add new component types for authorization

Ranger Architecture
[Architecture diagram. Labels shown: Enterprise Users; Legacy Tools and Data Governance; Ranger Administration Portal; Ranger Audit Server; Ranger Policy Server; Integration API; Hadoop Components, each with a Ranger Plugin: HDFS, HBase, HiveServer2, Knox, NiFi, Solr, Kafka, YARN, Storm, Atlas; HDFS and Solr also appear as separate boxes.]

Apache Ranger - Intuitive and Granular Policy Management
⬢ Simple, intuitive UI for policy editing and setup
⬢ Fine-grained specificity by resource type, user context, tags, and operation
⬢ Supports Access, Tag Based, Dynamic Data Masking, and Row Filtering policy types

Apache Ranger Audits - Data Access
⬢ Comprehensive scalable audit logging
⬢ Audits for:
– Resource Access Events with user context
– Policy Edits/Creation/Deletion
– User session information
– Component plugin policy sync operations

Row Filtering in Hive
Control Access to Rows in Hive Tables based on Context!
Goal: Improve the reliability and robustness of HDP by providing row-level security for Hive tables and reducing the surface area of the security system
⬢ Capabilities
– Restrict data row access based on
– user characteristics (e.g. group membership) AND
– runtime context
⬢ Access restriction logic at Hive layer => No changes to apps!
– Hive applies the access restrictions every time that data access is attempted
– Seamless behind-the-scenes enforcement of row-level segmentation without having to add this logic to the predicate of the query
– No need for multiple views to filter rows for different groups and users!
⬢ Core Technologies: Ranger, Hive
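The idea of restricting rows by group membership can be illustrated with a small Python sketch. This is not Ranger or Hive code: the `User` class, the `ROW_FILTERS` mapping, the group-to-country rules, and the deny-by-default choice are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    name: str
    groups: frozenset


# Hypothetical row-filter rules: group name -> predicate over a row (a dict).
ROW_FILTERS = {
    "us_employees": lambda row: row["country"] == "US",
    "eu_employees": lambda row: row["country"] in {"Germany", "France"},
}


def visible_rows(user, rows):
    """Return only the rows allowed by at least one of the user's filters."""
    predicates = [ROW_FILTERS[g] for g in user.groups if g in ROW_FILTERS]
    # Sketch-only simplification: a user with no matching filter sees no rows.
    return [row for row in rows if any(p(row) for p in predicates)]


rows = [
    {"name": "John Doe", "country": "US"},
    {"name": "Ernie Schwarz", "country": "Germany"},
]
kate = User("kate-hr", frozenset({"us_employees", "hr"}))
ivana = User("ivana-eu-hr", frozenset({"eu_employees", "hr"}))

print([r["name"] for r in visible_rows(kate, rows)])
print([r["name"] for r in visible_rows(ivana, rows)])
```

The application code that asks for `rows` never mentions a country predicate; the filter is chosen from the caller's groups, which mirrors the "no changes to apps" point above.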

Row Filtering in Hive
Control Access to Rows in Hive Tables based on Context!
⬢ Use Cases: Cross-industry application for data protection:
Healthcare
• A hospital can create a security policy that allows doctors to view data rows only for their own patients
• Insurance claims administrators can view only specific rows for their specific site
Financial Services
• A bank can create a policy to restrict access to rows of financial data based on the employee’s business division, locale, or role
• Employees in the finance department are allowed to see customer invoices, payments, and accrual data
• European HR employees can see European employee data
Information Technology
• A multi-tenant application can create logical separation of each tenant’s data so that each tenant can see only its own data rows

Dynamic Data Masking of Hive Columns
Protect Sensitive Data in real-time with Dynamic Data Masking/Obfuscation!
Goal: Mask or anonymize sensitive columns of data (e.g. PII, PCI, PHI) in Hive query output
⬢ Benefits
– Does not physically alter the data or make a copy of it
– Original sensitive data does not leave the data store; it is obfuscated when presented to the user
– No changes are required at the application or Hive layer
– No need to produce additional protected duplicate versions of datasets
– Masking policies are simple and easy to set up
⬢ Core Technologies: Ranger, Hive

Dynamic Masking and Row Level Filtering
Source data:
Country | National ID | CC No | Name | DOB | MRN | Policy ID
US | 232323233 | 4539067047629850 | John Doe | 9/12/1969 | 8233054331 | nj23j424
US | 333287465 | 5391304868205600 | Jane Doe | 8/13/1979 | 3736885376 | cadsd984
Germany | T22000129 | 4532786256545550 | Ernie Schwarz | 3/5/1963 | 876452830A | KK-2345909

Ranger Policy Enforcement

Users from the US customer support group see row-filtered data for US persons, with CC and National ID (SSN) as masked values and MRN nullified:
Country | National ID | CC No | MRN | Name
US | xxxxx3233 | 4539 xxxx xxxx xxxx | null | John Doe
US | xxxxx7465 | 5391 xxxx xxxx xxxx | null | Jane Doe

EU Health Policy Admins view relevant columns of data unmasked but are restricted by row filtering policies to see data for EU persons only:
Country | National ID | Name | MRN
Germany | T22000129 | Ernie Schwarz | 876452830A
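The two result sets in this example can be reproduced with a short Python sketch that applies a row filter and then column masks at read time. The function names, the dictionary keys, and the `EU_COUNTRIES` set are hypothetical; the sketch only imitates the output shown on the slide and does not represent how Ranger or Hive implement it.

```python
CUSTOMERS = [
    {"country": "US", "national_id": "232323233", "cc_no": "4539067047629850",
     "name": "John Doe", "dob": "9/12/1969", "mrn": "8233054331", "policy_id": "nj23j424"},
    {"country": "US", "national_id": "333287465", "cc_no": "5391304868205600",
     "name": "Jane Doe", "dob": "8/13/1979", "mrn": "3736885376", "policy_id": "cadsd984"},
    {"country": "Germany", "national_id": "T22000129", "cc_no": "4532786256545550",
     "name": "Ernie Schwarz", "dob": "3/5/1963", "mrn": "876452830A", "policy_id": "KK-2345909"},
]

EU_COUNTRIES = {"Germany"}  # assumption: only what the sample data needs


def show_last_4(value):
    """Replace every character except the last four with 'x'."""
    return "x" * (len(value) - 4) + value[-4:]


def show_first_4_grouped(cc):
    """Keep the first four digits, mask the rest, and print in groups of four."""
    masked = cc[:4] + "x" * (len(cc) - 4)
    return " ".join(masked[i:i + 4] for i in range(0, len(masked), 4))


def us_support_view(rows):
    """US rows only; National ID and CC masked, MRN nullified."""
    return [
        {
            "country": r["country"],
            "national_id": show_last_4(r["national_id"]),
            "cc_no": show_first_4_grouped(r["cc_no"]),
            "mrn": None,
            "name": r["name"],
        }
        for r in rows
        if r["country"] == "US"
    ]


def eu_admin_view(rows):
    """EU rows only; selected columns returned unmasked."""
    columns = ("country", "national_id", "name", "mrn")
    return [{c: r[c] for c in columns} for r in rows if r["country"] in EU_COUNTRIES]


for row in us_support_view(CUSTOMERS) + eu_admin_view(CUSTOMERS):
    print(row)
```

Both views build new dictionaries, so `CUSTOMERS` itself is never modified, which matches the point that dynamic masking does not alter or copy the stored data.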

SparkSQL Security via Hive LLAP

Spark SQL Security: Row Filtering and Column Masking
⬢ Spark SQL + Hive enables users to explore very large data sets using SQL
⬢ Enterprises want to enable Spark SQL for ad-hoc analysis using BI tools with fine-grained security
⬢ Spark provides strong authentication via Kerberos and wire encryption via SSL, but as a general-purpose compute engine it has no built-in authorization subsystem
⬢ Spark also has no way to define a pluggable module that contains policies for fine-grained authorization
– For structured Hive data organized in columns and rows, fine-grained security becomes a challenge
⬢ Co-mingled data in the same table may belong to two different groups, each with its own regulatory requirements
⬢ Data may have regional restrictions, time-based availability restrictions, departmental restrictions, etc.

Hive 2 with LLAP: Open Interfaces

Key Features: Spark Column Security with LLAP
⬢ Fine-grained column-level access control for SparkSQL
⬢ Fully dynamic policies per user; views are not required
⬢ Use standard Ranger policies and tools to control access and masking policies
Flow:
1. SparkSQL gets data locations, known as “splits”, from HiveServer2 and plans the query.
2. HiveServer2 authorizes access using Ranger. Per-user policies like row filtering are applied.
3. Spark gets a modified query plan based on dynamic security policy.
4. Spark reads data from LLAP. Filtering / masking is guaranteed by the LLAP server.
[Diagram. Labels shown: Spark Client; HiveServer2 (Authorization); Hive Metastore (Data Locations, View Definitions); Ranger Server (Dynamic Policies); LLAP (Data Read, Filter Pushdown); arrows numbered 1 to 4 corresponding to the flow steps.]

Example: Per-User Row Filtering by Region in SparkSQL
Original Query:
SELECT * FROM CUSTOMERS
WHERE total_spend > 10000

Query Rewrites based on Dynamic Ranger Policies
Spark User 1 (West Region), Dynamic Rewrite: AND region = 'west'
Spark User 2 (East Region), Dynamic Rewrite: AND region = 'east'

LLAP Data Access
User ID | Region | Total Spend
1 | East | 5,131
2 | East | 27,828
3 | West | 55,493
4 | West | 7,193
5 | East | 18,193
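The per-user rewrite in this example can be sketched in Python by appending a region predicate to the original query text and evaluating the combined condition over the sample table. The `USER_REGION` mapping, the user names, and both helper functions are hypothetical, and the case-insensitive region comparison is an assumption made so that the lowercase predicate matches the capitalized sample data. The real rewrite happens inside HiveServer2 and LLAP, not through string concatenation.

```python
CUSTOMERS = [
    {"user_id": 1, "region": "East", "total_spend": 5131},
    {"user_id": 2, "region": "East", "total_spend": 27828},
    {"user_id": 3, "region": "West", "total_spend": 55493},
    {"user_id": 4, "region": "West", "total_spend": 7193},
    {"user_id": 5, "region": "East", "total_spend": 18193},
]

# Hypothetical per-user policy: Spark user -> region that user may see.
USER_REGION = {"spark_user_1": "west", "spark_user_2": "east"}

ORIGINAL_QUERY = "SELECT * FROM customers WHERE total_spend > 10000"


def rewrite(query, user):
    """Append the user's region predicate to a query that already has a WHERE clause."""
    return f"{query} AND region = '{USER_REGION[user]}'"


def run(user, min_spend=10000):
    """Evaluate the rewritten condition against the in-memory table."""
    region = USER_REGION[user]
    return [
        row["user_id"]
        for row in CUSTOMERS
        if row["total_spend"] > min_spend and row["region"].lower() == region
    ]


for user in ("spark_user_1", "spark_user_2"):
    print(rewrite(ORIGINAL_QUERY, user), "->", run(user))
```

Both users submit the same query text; only the appended predicate differs, so the West user gets customer 3 and the East user gets customers 2 and 5.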

Agenda
Demo

Demo Setup
⬢ Hortonia: a mid-size financial services company expanding from the US to international markets
⬢ Employees in EU and US
⬢ Multiple business units need access to customer data: Analysts, HR
⬢ Customer data is co-mingled as well as isolated
⬢ Needs rational security policies that provide the right level of access control to customer data across geographies and business functions, and that comply with external regulations (PII, HIPAA, EU Privacy, etc.)

Demo Data
⬢ Customer data in hortoniabank DB
• 2 customer tables: 50K customer records each, with 38 fields (PII, PHI, PCI & non-sensitive data)
– us_customers: USA person data only
– ww_customers: multi-language, multi-country, localized person data across the world
• 1 reference table: eu_countries (reference table for looking up EU country codes to country mappings, with Brexit etc.)
all user passwords: hadoop

Ranger Policies Setup for Demo
⬢ Only US employees can see data in the us_customers table, and only from locations within the US (access_us_customers)
⬢ Only US employees can see data rows of US persons in the ww_customers table (filter_ww_customers_table + access_ww_customers)
⬢ Only EU employees can see rows with EU person data in the ww_customers table (filter_ww_customers_table + access_ww_customers)
⬢ US HR team members can see all original unmasked data (PCI, PII, etc.)
⬢ Analysts can view masked versions of sensitive data from the ww_customers table but are prohibited from viewing PII data in US tables (all masking policies are under the Masking tab of resource-based policies)
⬢ Zip code, MRN, and blood group data are not permitted to be joined in combination in any query (prohibition policy)

Personas Setup for Demo
User | Group | Access Privileges
joe-analyst | us_employees, analyst | US Data Only, non-sensitive data only, rest masked or forbidden depending on sensitivity
kate-hr | us_employees, hr | US Data Only, All sensitive data (PCI, PII, PHI)
ivana-eu-hr | eu_employees, hr | EU Data Only, All sensitive data

Data Masking Policies setup for us_customers data for analyst group
Data Column | Data Column Description | Masking Type | Sample Output | Ranger Masking Policy
password | Password | Hash | 237672b21819462ff39fcea7d990c3e5 | mask_password_hash
nationalid | National ID | Show Last 4 | xx-xx-9324 | mask_nationalid_last4
ccnumber | Credit Card Number | Show First 4 | 4532xxxxxxxxxxxx | mask_ccnumber_first4
streetaddress | Street Address | Redact | nnn Xxxxxx Xxxxx | mask_streetaddress_redact
MRN | MRN | Nullify | null | mask_mrn_nullify
age | Age | CUSTOM | (Adds a random number below 20 to actual age) | mask_age_custom
birthday | Date of Birth | CUSTOM | 01-01-1987 (Keep year of birth and make date & month 01-01) | mask_dob_custom
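The masking types listed for the analyst group can be imitated with plain Python functions, as a rough sketch of what each type does to a value. All function names are hypothetical. SHA-256 as the hash, `MM-DD-YYYY` as the birthday input format, and preserving punctuation in Show Last 4 are assumptions; the slide does not specify any of them.

```python
import hashlib
import random
from datetime import datetime


def apply_hash(value):
    """Hash: replace the value with a hex digest (SHA-256 is this sketch's choice)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def apply_show_last_4(value):
    """Show Last 4: keep the last four letters/digits, mask the others with 'x'."""
    out, kept = [], 0
    for ch in reversed(value):
        if ch.isalnum() and kept < 4:
            out.append(ch)
            kept += 1
        elif ch.isalnum():
            out.append("x")
        else:
            out.append(ch)  # punctuation such as '-' is left in place
    return "".join(reversed(out))


def apply_show_first_4(value):
    """Show First 4: keep the first four characters, mask the rest with 'x'."""
    return value[:4] + "x" * max(len(value) - 4, 0)


def apply_redact(value):
    """Redact: digits become 'n', uppercase 'X', lowercase 'x'; the rest is kept."""
    def redact_char(ch):
        if ch.isdigit():
            return "n"
        if ch.isupper():
            return "X"
        if ch.islower():
            return "x"
        return ch
    return "".join(redact_char(ch) for ch in value)


def apply_nullify(value):
    """Nullify: always return None, regardless of the input."""
    return None


def apply_custom_age(age, rng=random):
    """Custom: add a random integer below 20 to the actual age."""
    return age + rng.randrange(20)


def apply_custom_dob(birthday):
    """Custom: keep the year of birth and set month and day to 01-01."""
    year = datetime.strptime(birthday, "%m-%d-%Y").year
    return f"01-01-{year:04d}"


print(apply_show_last_4("123-45-9324"))
print(apply_show_first_4("4532786256545550"))
print(apply_redact("123 Market Drive"))
print(apply_custom_dob("07-23-1987"))
```

Passing a seeded `random.Random` instance as `rng` makes the age mask repeatable in tests, while the default keeps it random per call.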

Backup
