Apache Atlas and Apache Ranger, foundational components for security and governance across the Hadoop stack, have spawned a robust partner ecosystem of tools and platforms. These partner solutions build on the extensibility of both platforms, using open, robust APIs and integration patterns to deliver innovative "better-together" capabilities. In this talk, we will showcase how the partner ecosystem is building value-added capabilities on the Apache Ranger and Apache Atlas frameworks to help organizations address GDPR. The talk will feature multiple partner demonstrations covering how to identify, map, and classify personal data; harvest and maintain metadata; track and map the movement of data through your enterprise; and enforce appropriate controls to monitor access to and usage of personal data. We will also give a short overview of the Gov Ready and Sec Ready programs and how partners can benefit from their certification process.
Speakers
Ali Bajwa, Principal Solutions Engineer, Hortonworks
Srikanth Venkat, Senior Director Product Management, Hortonworks
33. Lineage in DMX-h – ingestion to the cluster
DMX-h job executes
• In-cluster sources/targets: HDFS, Hive, S3
• Out-of-cluster sources/targets: mainframe, DBMSs, local and remote file systems (Syncsort External Datasets)
DMX-h job collects lineage information
• Source/Target File or Table level
DMX-h job lineage is published into Apache Atlas
• Connect with lineage published from other tools (REST)
Syncsort Confidential and Proprietary - do not copy or distribute
34. Syncsort DMX-h Atlas Integration
35. Govern and Track Everything for Compliance
• Metadata and data lineage for Hive, Avro and
Parquet through HCatalog
• Metadata lineage export and API from DMX/DMX-h
– Simplify audits, analytics dashboards, metrics
– Integrate with enterprise metadata repositories
• Apache Ambari integration
– Native LDAP and Kerberos support
– Secure mainframe data access through FTPS and
Connect:Direct
• Apache Atlas ingestion lineage integration
– Audit and track data from source to cluster
– Lineage & tagging of Metadata for GDPR
Compliance
36. End-to-End Data Lineage in Apache Atlas (slides 36–41 build this diagram step by step)
Data Sources
Syncsort accesses data from sources outside cluster.
Syncsort onboards data, modifies on-the-fly to match Hadoop storage model.
Data Hub
Syncsort changes, enhances, joins data in cluster with MapReduce or Spark.
Syncsort passes source-to-cluster data lineage info to Atlas.
Analytics, Visualization
Analytics and visualizations get complete data.
Data analyst gets end-to-end data lineage info from Atlas.
42. Syncsort: High Performance Import from Existing Databases
• Connect to virtually any data source, including mainframe and MPP databases.
• Move data into and out of Hadoop up to 6x faster without the need for manual scripts.
• Develop ETL processes without writing code.
• Seamlessly accelerate Hadoop performance and scalability for ETL operations in both MapReduce and Spark.
Benefits
43. Syncsort + Hortonworks Advantages
• Apache Ambari Integration
• Deploy DMX-h across cluster
• Monitor DMX-h jobs
• Process in MapReduce or Spark
• Source relational and non-relational data (including mainframes)
• Out-of-the-box integration, interoperability & certifications
• Kerberos-secured clusters
• Apache Ranger security certified
• Early beta, release certification
• Metadata lineage export from DMX
• Supports easy identification and management of GDPR-relevant metadata
Technical Benefits
62. GDPR
Be transparent with all PII data
Why not turn GDPR into a new
customer experience?
Dataworks Summit Berlin 2018
Jan-Kees Buenen, CEO
(C) 2018 SynerScope
63. "6 Steps for GDPR" expanded to unstructured enterprise data
• Discover and classify data content in full context
• Know the entire data infrastructure
• Know the entire data flow patterns
• Establish and execute remediation policies
• Apply same governance to processing
• Monitor through certified audits
Know who and what application produces and uses PII data
Know the PII data that rests in your unstructured data
Know its exact location, expiry date, consent status
Set and execute your policies based on your granular knowledge of the content
Log every event touching your data; Atlas and Ranger are integrated in fully automated processes in SynerScope
Have the data instantly available at individual record level for external (certified) audit purposes (the Big 4 love sampling)
64. GDPR compliance for all content
Transparency for governance
• Data Discovery
• Data Search
• Data Matching
• Data Context
• Data Quality
• Data Use patterns
• Audit Ready (Big Four endorsed)
Numbers, Text, IoT, Video, Audio, Ecosystem
Include the "other" 80% of the enterprise data
65. SynerScope product position for GDPR and IFRS
FAST, FLEXIBLE AND TRANSPARENT
o Fast and flexible with raw data: no cleaning, no upfront modeling
o Fast and flexible for complex new combinations of data brought in from many different silos
o Transparency at individual cell record level, with data presented in full context, allows for certified audits
o The big audit firms will play an important role between the enterprise, regulators and supervisory bodies
o Ecosystem demands independent certification of data operations
Unstructured data is the Achilles heel of true GDPR compliance
... "SynerScope's Intelligence Augmentation (IA) can handle the most complex data situations fast and reliably" (Big 4 accounting firm)
What does the ecosystem look like?
Connectors exist for Sqoop, Hive, Storm, and Kafka, as well as a custom integration method to build your own connector via a highly scalable REST API. For example, although there is no first-class connector for Spark, you can hook a snippet of code at the end of your Spark job to report lineage/metadata info into Atlas. More native connectors are being worked on for future releases: NiFi and HBase.
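The end-of-job hook idea above can be sketched roughly as follows. This is a minimal illustration, not a shipped connector: the host/port, the `@cluster1` qualified-name convention, the job and path names, and the credentials are all assumptions. It uses Atlas's v2 REST endpoint `POST /api/atlas/v2/entity` and stitches into existing lineage by referencing `hdfs_path` entities via their unique `qualifiedName`.

```python
import json
import urllib.request

# Assumed Atlas host/port for illustration.
ATLAS_URL = "http://atlas-host:21000/api/atlas/v2/entity"

def build_lineage_entity(job_name, input_paths, output_paths):
    """Build an Atlas v2 entity payload describing one job run as a Process.

    Inputs/outputs reference pre-existing hdfs_path entities by their unique
    qualifiedName, so Atlas connects this process to lineage published by
    other tools over the same files.
    """
    def ref(path):
        return {"typeName": "hdfs_path",
                "uniqueAttributes": {"qualifiedName": path}}
    return {
        "entity": {
            "typeName": "Process",  # or a custom subtype registered beforehand
            "attributes": {
                "qualifiedName": job_name + "@cluster1",  # assumed convention
                "name": job_name,
                "inputs": [ref(p) for p in input_paths],
                "outputs": [ref(p) for p in output_paths],
            },
        }
    }

def publish(payload):
    """POST the payload to Atlas, e.g. as the last action of a Spark driver.
    (Authentication handling is omitted here for brevity.)"""
    req = urllib.request.Request(
        ATLAS_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    return urllib.request.urlopen(req)

# Build (but do not send) a sample payload for one ingest job.
payload = build_lineage_entity(
    "daily_ingest",
    ["hdfs://nn/raw/events@cluster1"],
    ["hdfs://nn/clean/events@cluster1"],
)
print(payload["entity"]["attributes"]["name"])
```

In practice the call to `publish` would sit in a shutdown hook or the final stage of the Spark application, after the output datasets are committed.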
We also have a partner program for "Gov Ready" certification, and you can see a list of partners who have already built integrations.
Some interesting ones:
Talend: data pipelining done in their canvas gets faithfully converted into Atlas lineage graph so we’re able to capture all the steps/transformations/metadata for each of the processes/entities in that chain
Dataguise and Waterline do data discovery and are able to publish classifications in bulk into Atlas; the same can be done for lineage.
IGC is special: it is joined at the hip with Atlas. They will have one-to-one model equivalency on the back end and will be able to query each other for metadata, lineage, etc.
The slide shows the high-level control flow (the title is the first line of each of the three boxes): a DMX-h job runs and produces lineage info, which is later published into Atlas.
More details for each box:
DMX-h job executes – currently the product looks at lineage at the source/target level. From the perspective of Atlas, we need to categorize sources/targets that are standardized in Atlas (e.g. Hive, HDFS) versus the ones that are not, so that DMX-h can later publish the lineage around these sources/targets as expected by Atlas.
DMX-h job produces lineage information – currently this is done for ingestion only, not for distributed executions, and not at field level.
DMX-h job lineage is published into Atlas – DMX-h publishes lineage using (existing) HDFS file and Hive table entities in Atlas, as they are standardized. Other tools (e.g. Hive SQL queries) can use the same HDFS/Hive entities to publish their own lineage, thereby "connecting" to the lineage from DMX-h.
We use the REST API to publish the DMX-h lineage. The product currently uses v1 of the API, which is now legacy; v2 is the most current, so we need to update our product to v2.
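For the v1-to-v2 migration mentioned in the note, the main visible change is the endpoint layout: v1 exposed entities under `/api/atlas/entities`, while v2 moves them under `/api/atlas/v2/`. A small lookup like the sketch below can ease a gradual migration; the paths are Atlas's public REST paths, but the host/port and the idea of a fallback table are assumptions for illustration.

```python
# Endpoint paths: Atlas v1 (legacy) vs v2 (current). Host/port are placeholders.
BASE = "http://atlas-host:21000"

V1_ENDPOINTS = {
    "create_entity": BASE + "/api/atlas/entities",  # legacy v1 API
}
V2_ENDPOINTS = {
    "create_entity": BASE + "/api/atlas/v2/entity",
    "bulk_create":   BASE + "/api/atlas/v2/entity/bulk",
}

def resolve_endpoint(name):
    """Prefer the v2 endpoint when one exists; fall back to v1 otherwise."""
    return V2_ENDPOINTS.get(name) or V1_ENDPOINTS.get(name)

print(resolve_endpoint("create_entity"))
print(resolve_endpoint("bulk_create"))
```

Besides the paths, v2 also changes the payload shape (plain attribute maps and `uniqueAttributes` references instead of the v1 typed structures), so the request bodies need migrating along with the URLs.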
This is a simple DMX-h job that ingests an EBCDIC file into the cluster and converts it to ASCII on the fly.
Syncsort DMX-h is highly-efficient software with a small footprint, yet it packages the comprehensive support you need to manage, secure and govern your modern data architecture:
Manage: Full integration with Apache Ambari
Secure:
Native LDAP and Kerberos support
Integration with Apache Ranger
Secure mainframe data access through FTPS and Connect:Direct
Govern:
Tight integration with HCatalog for metadata management and data lineage
Work directly with mainframe data in its native format – preserving data lineage
Can tag metadata that contains Personally Identifiable Information (PII), which is critical for GDPR compliance (i.e. knowing where personal data is stored)
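Tagging metadata as PII in Atlas comes down to associating a classification with an existing entity via `POST /api/atlas/v2/entity/guid/{guid}/classifications`. A minimal sketch of the request URL and body follows; the `PII` tag name (which must already be defined as a classification type in Atlas), the attribute names, and the GUID are assumptions for illustration.

```python
import json

def build_pii_classifications(attrs=None):
    """Body for POST /api/atlas/v2/entity/guid/{guid}/classifications:
    a JSON list of classification objects to attach to the entity."""
    return [{"typeName": "PII", "attributes": attrs or {}}]

def classification_url(base, guid):
    """URL for attaching classifications to the entity with this GUID."""
    return f"{base}/api/atlas/v2/entity/guid/{guid}/classifications"

# Example: tag a (hypothetical) Hive column entity as PII.
url = classification_url("http://atlas-host:21000", "hypothetical-guid-123")
body = json.dumps(build_pii_classifications({"source": "mainframe"}))
print(url)
print(body)
```

Once tagged, the same PII classification can drive tag-based access policies in Apache Ranger, which is the Atlas/Ranger combination this talk centers on.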
A better way is needed – so that, just like the chef, we can have a complete view of our data, from the origin to the data hub – and know what has happened to it at every step of the way
Syncsort/Hortonworks reference architecture
Deployed by Ambari
On every node
Data movement and transformation
MapReduce or Spark
First AI powered all-in-one big data solution
Solves the big data myth once and for all
Data ingest – Organize – Search – Analyze – Extract
All in One
Ultra-fast big data visual analytics
Unlock the big data complexity
Interactive and dynamic user interface
fusing Deep Learning with a scalable Data Lake into a ready-to-go Big Data solution
DSR – Data Subject Rights (such as consent) as they flow through the data assets