In our recent Big Data Warehousing Meetup, we discussed Data Governance, Compliance and Security in Hadoop.
As the Big Data paradigm becomes more commonplace, we must apply enterprise-grade governance to critical data that is highly regulated and must adhere to stringent compliance requirements. Caserta and Cloudera shared techniques and tools that enable data governance, compliance and security on Big Data.
For more information, visit www.casertaconcepts.com
2. Why You Need Cloudera Navigator
1) Lots of Data Landing in Cloudera Enterprise
• Huge quantities
• Many different sources – structured & unstructured
• Varying levels of sensitivity
2) Many Users Working with the Data
• Administrators & compliance officers
• Analysts & data scientists
• Business users
3) Need to Effectively Control & Consume Data
• Get visibility & control over the environment
• Discover and explore data
3. Cloudera Navigator
Data Management Layer for Cloudera Enterprise
• Audit & Access Control: Ensuring appropriate permissions & reporting on data access for compliance
• Discovery & Exploration: Finding out what data is available and what it looks like
• Lineage: Tracing data back to its original source
• Lifecycle Management: Migration of data based on policies
[Diagram: Cloudera Navigator (Audit & Access Control, Discovery & Exploration, Lineage, Lifecycle Mgmt.); Enterprise Metadata Repository (business metadata, lineage metadata, operational metadata); CDH (HDFS, HBase, Hive)]
4. Cloudera Navigator 1.0
Data Audit & Access Control
• Verify Permissions: View which users and groups have access to files and directories
• Audit Configuration: Configuration of audit tracking for HDFS, HBase and Hive
• Audit Dashboard: Simple, queryable interface to view data access
• Information Export: Export audit information for integration with SIEM tools
[Diagram: Cloudera Navigator 1.0 with an Access Service (view permissions, connected to the IAM / LDAP system) and an Audit Log Service (audit log config and audit log collection for HDFS, HBase and Hive, exported to a 3rd party SIEM / GRC system)]
5. Benefits of Cloudera Navigator 1.0
Control
• Store sensitive data
• Maintain full audit history
• The first & only centralized audit tool for Hadoop
Visibility
• Verify access permissions to files & directories
• Report on data access by user and type
Integration
• View permissions for LDAP/IAM users
• Export audit data for integration with 3rd party SIEM tools
6. Navigator Subscription
• Data Management Layer for Hadoop
• Centralized audit management & access control
• 8x5 or 24x7 support
• Optional add-on to Cloudera Enterprise subscription
[Diagram labels: Cloudera Enterprise (Cloudera Support, Cloudera Manager with basic and advanced features, CDH core projects, HBase, Impala, Search, Backup & DR); Navigator Subscription (Cloudera Navigator with data audit and access mgmt)]
7. Navigator 2.0 – Q1 2014
• Manage and explore your data with Cloudera Navigator 2.0 (Q1 2014)
• Data Discovery (what data do we have?), Annotations/Tags
  • Search, explore, define, and tag data sets.
  • Important for:
    • DBAs/Data Modelers
    • Self-Service Business Analysts
    • Data Scientists
• Data Lineage (where did the data come from? where is it used?)
  • For files and tables, MR jobs, Hive queries, Impala queries, Pig scripts, Sqoop load/export.
  • Important for:
    • Risk and compliance audits.
    • BI users facing 10K tables in HDFS. Which ones are relevant to the source data I need, or the table I’m looking at?
    • Data retention policies, where you need to purge not just the source data, but any data that’s been derived from it.
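The retention case on this slide (purge the source data and everything derived from it) is, at its core, a traversal of the lineage graph. The sketch below illustrates the idea only; it is not Navigator's API. It assumes lineage has already been obtained as a plain mapping from each data set to the data sets derived directly from it, and the function name `datasets_to_purge` and all data set names are hypothetical.

```python
from collections import deque


def datasets_to_purge(lineage, source):
    """Return the source plus every data set derived from it, directly or transitively.

    `lineage` maps a data set name to the names of data sets derived directly
    from it (an assumed, simplified representation of lineage metadata).
    """
    to_purge = {source}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for child in lineage.get(current, ()):
            if child not in to_purge:  # skip data sets already reached via another path
                to_purge.add(child)
                queue.append(child)
    return to_purge


# Hypothetical lineage edges: each key feeds the data sets listed as its value.
lineage = {
    "/raw/customers.csv": ["staging.customers"],
    "staging.customers": ["marts.customer_ltv", "marts.churn_features"],
    "marts.churn_features": ["models.churn_scores"],
    "/raw/orders.csv": ["marts.customer_ltv"],
}

for name in sorted(datasets_to_purge(lineage, "/raw/customers.csv")):
    print(name)
```

Note that `marts.customer_ltv` is flagged even though it is also fed by `/raw/orders.csv`. Whether such a partially derived table should be purged or rebuilt is a policy decision the sketch does not make.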
8. Navigator 2.0 - Lineage
• Audit data access
• Verify access privileges
• Search metadata
• Visualize lineage