Developing a Strategy for Data Lake Governance
- 1. Developing a Strategy for Data Lake Governance
Tony Baer, Principal Analyst, Information Management
tony.baer@ovum.com
@TonyBaer
- 2. Ovum | TMT intelligence | informa. Copyright © Informa PLC
Agenda
Why are we having this conversation?
Why is governance critical?
How to govern the Data Lake?
- 3.
Let’s go to the polls
Where is your organization on the Data Lake journey?
Check one:
Already implementing
Starting to implement
Considering implementation
No current plans to implement
- 5.
Getting the data
Profiting from data
Seeking value from Big Data: The Journey
Core assumption: The Data Lake is a shared enterprise resource
- 6.
Group
Log analytics
Sentiment Analysis
DW offload
The journey to Data Lake starts small
- 7.
Group Multi-department
Log analytics
Sentiment Analysis
DW offload
Exploratory Analytics
Line of business (LOB) analytic applications
Operational analytics
Success spreads…
- 8.
Group Multi-department Enterprise
Log analytics
Sentiment Analysis
DW offload
Data Lake
Exploratory Analytics
Line of business analytic applications
Operational analytics
The Data Lake is the culmination of the journey, not the start
- 11.
Why is governance critical?
Costs out of control
Privacy, legal & regulatory compliance issues
Untrustworthy data
- 12.
How to govern the Data Lake
How to make the content of your data lake transparent?
- 13.
Data Lake governance reference architecture
END USER TIER
DATA INVENTORY TIER
DATA SECURITY TIER
OPTIMIZATION TIER
DATA PLATFORM TIER
Availability/Reliability (FT, HA, Backup/DR)
Monitoring & troubleshooting
Perimeter Security
Legend: Data Lake building block, Hadoop platform management, End user tool
- 14.
Data Lake governance functions
Availability/Reliability (FT, HA, Backup/DR)
Monitoring & troubleshooting
Perimeter Security
Data platform (Hadoop)
Query/Analytics tools, programs
Cost Optimization & Integration
Physical Inventory
Curation
Data-level security
Self-service tier
Legend: Data Lake building block, Hadoop platform management, End user tool
- 15.
Curation: Build your library of information
Physical Inventory: Know/manage what data is in the data lake
Data profiling, data preparation, collaborative data enrichment, catalog, match data, derive master data, record data lineage
Business & Analytics teams | Technology team
Manage data access, track data lineage, tag for security, data retention
Manage data access, tag for security, data retention, lifecycle & workflow, track data lineage
Data Inventory tier
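The inventory functions listed on this slide (catalog, security tags, data retention, data lineage) can be pictured with a small sketch. It is only an illustration under stated assumptions: `CatalogEntry`, `Catalog`, the field names, and the sample data set names are all hypothetical and do not come from the deck or from any specific catalog product.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One data set registered in a hypothetical data lake inventory."""
    name: str
    owner: str
    security_tags: set = field(default_factory=set)   # e.g. {"pii"}
    retention_days: int = 365                          # assumed default
    derived_from: list = field(default_factory=list)  # direct upstream data sets


class Catalog:
    """In-memory stand-in for a data catalog (hypothetical)."""

    def __init__(self):
        self._entries = {}

    def register(self, entry):
        self._entries[entry.name] = entry

    def lineage(self, name):
        """Return every upstream data set of `name`, nearest first, no duplicates."""
        seen = []
        queue = list(self._entries[name].derived_from)
        while queue:
            current = queue.pop(0)
            if current in seen:
                continue
            seen.append(current)
            parent = self._entries.get(current)
            if parent is not None:
                queue.extend(parent.derived_from)
        return seen


catalog = Catalog()
catalog.register(CatalogEntry("raw_web_logs", owner="platform-team"))
catalog.register(CatalogEntry("sessions", owner="analytics",
                              derived_from=["raw_web_logs"]))
catalog.register(CatalogEntry("churn_features", owner="analytics",
                              security_tags={"pii"}, derived_from=["sessions"]))

print(catalog.lineage("churn_features"))  # ['sessions', 'raw_web_logs']
```

Recording `derived_from` at registration time is what makes the lineage question answerable later; a real catalog would persist this rather than hold it in memory.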
- 16.
Data Security & Data Lake Optimization tiers
Security
Data Protection: policy-based masking, encryption
Authorization, accounting & access control (AAA)
Perimeter security & remote authentication are functions of the core data platform
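As a rough illustration of policy-based masking, the sketch below applies a per-column rule to a record unless the caller holds a privileged role. Everything named here (`MASKING_POLICY`, `mask_value`, `apply_policy`, the `pii_reader` role, the column names) is a hypothetical assumption for illustration, and a truncated unsalted hash is a placeholder, not a recommendation for protecting real data.

```python
import hashlib

# Hypothetical policy: column name -> masking rule.
MASKING_POLICY = {
    "email": "hash",
    "ssn": "redact",
}


def mask_value(value, rule):
    """Mask a single string value according to the named rule."""
    if rule == "hash":
        # Stable token so masked values can still be joined or counted.
        return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]
    if rule == "redact":
        return "*" * len(value)
    raise ValueError(f"unknown masking rule: {rule}")


def apply_policy(record, policy, user_roles):
    """Return a copy of `record`, masking policy columns unless the user
    holds the (assumed) 'pii_reader' role."""
    if "pii_reader" in user_roles:
        return dict(record)
    return {
        col: mask_value(val, policy[col]) if col in policy else val
        for col, val in record.items()
    }


record = {"name": "Ann", "email": "ann@example.com", "ssn": "123-45-6789"}
masked = apply_policy(record, MASKING_POLICY, user_roles={"analyst"})
print(masked["ssn"])  # ***********
```

Keeping the policy as data, separate from the masking code, is what makes the approach policy-based: the rules can change without touching the code that enforces them.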
Optimization
Integration with other data platforms
Import/Export
Remote/federated/pushdown query processing
Lifecycle/workflow
Data retention policy?
Storage tiering?
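The slide leaves data retention and storage tiering as open questions. One way to make them concrete is a rule that decides what to do with a data set from its age and last access date. This is a sketch only: the thresholds `HOT_DAYS` and `RETENTION_DAYS`, the function `lifecycle_action`, and the action names are assumptions for illustration, not values taken from the deck.

```python
from datetime import date

# Assumed thresholds; real values depend on regulation and cost targets.
HOT_DAYS = 30          # idle longer than this -> candidate for cheaper storage
RETENTION_DAYS = 365   # older than this -> past the retention window


def lifecycle_action(created, last_accessed, today):
    """Pick a lifecycle action for one data set."""
    age_days = (today - created).days
    idle_days = (today - last_accessed).days
    if age_days > RETENTION_DAYS:
        return "delete"
    if idle_days > HOT_DAYS:
        return "move_to_cold_storage"
    return "keep_hot"


today = date(2024, 1, 1)
print(lifecycle_action(date(2023, 6, 1), date(2023, 12, 20), today))  # keep_hot
print(lifecycle_action(date(2023, 6, 1), date(2023, 10, 1), today))   # move_to_cold_storage
print(lifecycle_action(date(2022, 1, 1), date(2023, 12, 31), today))  # delete
```

Retention is checked before tiering so that expired data is never moved to a cheaper tier just to be deleted later.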
- 17.
Governance: How Data Lakes compare to EDWs
[Chart: confidence level values of 90%, 50%, and 30%]
EDW provides good starting point
Core building blocks of governance are similar, but approaches differ
Data Inventory
Flexible, evolving schema
Quality critical, but adjust to need
Business users exert key roles
IT still provides adult supervision
Security
Greater varieties of data, use of external data sources, and (arguably) broader user constituencies demand more granular approaches to data protection
Optimization
Just as important as any EDW. Workloads must be prioritized
Lifecycle
Sleeper issue for Data Lakes
- 18.
Takeaways
The Data Lake is a shared enterprise resource
It is a later, mature stage of Hadoop adoption
Exploratory analytics is a great way to sell business users on the value proposition of the Data Lake
Why governance? Because the data lake is an enterprise data resource
Governance will adapt & extend practices from EDW
Greater variety of data sources demands greater scrutiny for security, data retention & lifecycle management practices
Data lineage is critical!!!
Like any enterprise data platform, workloads must be prioritized
- 19.
There is no silver bullet recipe for Data Lake Governance
- 20.
Thank you
Tony Baer
Ovum
(646) 546-5330
tony.baer@ovum.com
Twitter: @TonyBaer