Walking Around the Data Lake
1. Walking around the Data Lake
Fatima Naqui
Sr Big Data Engineer at Fidelity Investments
Naqui.Fatima@gmail.com
linkedin.com/in/fatimanaqui
2. Agenda
• What is a Data Lake?
• Why do I need a Data Lake?
• High Level Architecture
• Principles of a Data Lake
• Data Lake layers and components
• Ingestion
• Metadata management
• Data Lineage and Governance
• Data Curation
• Technical Architectures from Cloud providers
• Q&A
3.
4. What is a Data Lake?
Gartner’s definition of a Data Lake
“A data lake is a collection of storage instances of various data assets. These assets are stored in a near-exact, or even exact, copy of the source format and are in addition to the originating data stores.”
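The definition above can be illustrated with a minimal sketch, assuming a file-based lake. It lands a byte-for-byte copy of a source file in a raw zone and leaves the originating file in place. The function name `land_raw_copy`, the `raw/<source>/<date>/` folder layout, and the sample file are all hypothetical choices for illustration, not something the slides prescribe.

```python
import hashlib
import shutil
import tempfile
from datetime import date
from pathlib import Path


def land_raw_copy(source_file: Path, lake_root: Path, source_name: str, day: date) -> Path:
    """Copy a source file unchanged into a (hypothetical) raw zone of the lake."""
    target_dir = lake_root / "raw" / source_name / day.isoformat()
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / source_file.name
    shutil.copy2(source_file, target)  # copy only: the source file is left untouched
    return target


def sha256(path: Path) -> str:
    """Checksum used to confirm the landed copy is exact."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def demo() -> tuple[Path, bool, bool]:
    """Create a sample source file, land it, and compare both copies."""
    workdir = Path(tempfile.mkdtemp())
    source = workdir / "orders.csv"
    source.write_text("order_id,amount\n1,9.99\n")
    landed = land_raw_copy(source, workdir / "lake", "orders_db", date(2024, 1, 15))
    return landed, sha256(source) == sha256(landed), source.exists()
```

Comparing checksums is one simple way to check the "exact copy" part of the definition; a near-exact copy (for example, after a format conversion) would need a different check.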
5. Why do I need a Data Lake?
• Cost-effective way to deal with big data challenges
• Single source of truth that can be reused
• Cheaper, more flexible alternative to batch workloads
• Increased analytics and operational agility
11. Data Lineage and Governance
Data lineage deals with the data's origins and its path afterwards:
• Where does the data move over time?
• What happens to the data?
• How is the data transformed?
Data governance is the process of managing the data's:
• Availability
• Usability
• Security
• Integrity
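The lineage questions on this slide (where the data came from, where it moved, how it was transformed) can be sketched as a small in-memory log of lineage events. This is only an illustration under stated assumptions: the `LineageEvent` record, the `trace` function, and the dataset names are hypothetical, and a real lake would typically keep this information in a metadata catalog.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineageEvent:
    source: str          # dataset the data was read from
    target: str          # dataset the data was written to
    transformation: str  # what happened to the data on the way


def trace(dataset: str, events: list[LineageEvent]) -> list[LineageEvent]:
    """Return the events that produced `dataset`, ordered from origin to latest step."""
    history: list[LineageEvent] = []
    seen: set[str] = set()

    def visit(name: str) -> None:
        if name in seen:  # guard against cycles and repeated visits
            return
        seen.add(name)
        for event in events:
            if event.target == name:
                visit(event.source)    # walk upstream first
                history.append(event)  # then record this step

    visit(dataset)
    return history


# Hypothetical dataset names, for illustration only.
EVENTS = [
    LineageEvent("crm.customers", "raw/customers", "exact copy"),
    LineageEvent("raw/customers", "curated/customers", "deduplicate, mask email"),
    LineageEvent("curated/customers", "marts/churn", "aggregate by month"),
]

history = trace("marts/churn", EVENTS)
origin = history[0].source
steps = [event.transformation for event in history]
```

Here `origin` answers where the data came from, the sequence of `target` values answers where it moved, and `steps` answers how it was transformed.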
12. Data Curation
• Optimize data from its raw form so that it becomes consumable
• Source of insights
• Enrich the data with analytic and algorithmic functions
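A minimal sketch of the two curation steps named on this slide, assuming raw data arrives as CSV text: first make the raw rows consumable (typed values, unparseable rows skipped), then enrich them with derived fields. The column names, the sample rows, and the `LARGE_ORDER_THRESHOLD` value are hypothetical and chosen only for illustration.

```python
import csv
import io

# Hypothetical raw extract: all values are strings, some rows are incomplete or malformed.
RAW = """order_id,quantity,unit_price
1,2,9.50
2,,4.00
3,1,abc
4,10,3.00
"""

LARGE_ORDER_THRESHOLD = 25.0  # assumed business rule, not from the slides


def curate(raw_csv: str) -> list[dict]:
    """Turn raw CSV text into typed, enriched records."""
    curated = []
    for row in csv.DictReader(io.StringIO(raw_csv)):
        try:
            quantity = int(row["quantity"])
            unit_price = float(row["unit_price"])
        except ValueError:
            continue  # skip rows whose values cannot be parsed
        order_total = round(quantity * unit_price, 2)
        curated.append({
            "order_id": int(row["order_id"]),
            "quantity": quantity,
            "unit_price": unit_price,
            "order_total": order_total,                               # derived field
            "is_large_order": order_total >= LARGE_ORDER_THRESHOLD,  # simple enrichment
        })
    return curated


curated = curate(RAW)
```

The raw text is never modified, so the curated records can be rebuilt at any time if the parsing or enrichment rules change.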