SlideShare une entreprise Scribd logo
1  sur  6
Télécharger pour lire hors ligne
Data Lake
Enabling Data Lake Architecture with Open
Source Technologies
What is Data
Lake ?
- It's a collection of Raw, Semi Structured,
UnStructured, Structured data at one place,
enabled by low cost technologies from which
downstream applications may use and act.
- Can keep the data in its original form like
native format, streaming data and big data. It
gives high agility to configure, reconfigure
data as per needs.
- All data sources are considered to form a Data
Lake.
What is the
use of Data
Lake ?
- It eliminates silos of data and simplifies the
management.
- It eliminates redundant data movement across
the platforms.
- It facilitates a common platform for data
access, data processing, data analytics and
data presentation.
- Orchestration becomes possible
- Streaming data accomodation possible
Social Media
Sensor
Video
Files
Logs
Enterprise
Transactions
OLTP, ERP, CRM
Data Sources Ingest into
Data Lake
Distributed FS
NoSQL
Data Lake Storage
Batch
Processing
Data Access and Processing
(SparkSQL,M
LIB,SparkR)
Dashboard
Predictive
Model
Stream
Processing
RDBMS
Data Analytics and Presentation
Enabling Data Lake Architecture with Open Source
Data Pipeline
& processing.
- All the data is fed into Hadoop data Lake.
- Data preparation and enrichment as per needs.
- Store the processed data into the data lake or
use in memory db's for low latency
applications.
- Streaming data processing can be done using
Kafka and Flume. Kafka and Flume both allow
connections directly into Hive and HBase, and
Spark can ingest and process data without ever
writing to disk.
Data Pipeline
& processing
- cont.
Spark - Continuous Application
- Spark Streaming can be used for near real time
processing of streams using Structured
Streaming.
- Structured Streaming allows applications to
connect to kafka sources and apply dataset
functions on the infinite tables. Below link gives
the details of continuous application.
https://databricks.com/blog/2016/07/28/continuo
us-applications-evolving-streaming-in-apache-spark-
2-0.html

Contenu connexe

En vedette

SUN TV NETWORK LIMITED
SUN TV NETWORK LIMITEDSUN TV NETWORK LIMITED
SUN TV NETWORK LIMITEDARVIND D
 
ximena araneda - The Next Generation MAM Systems
ximena araneda - The Next Generation MAM Systemsximena araneda - The Next Generation MAM Systems
ximena araneda - The Next Generation MAM SystemsFIAT/IFTA
 
MED201 Media Ingest and Storage Solutions with AWS - AWS re: Invent 2012
MED201 Media Ingest and Storage Solutions with AWS - AWS re: Invent 2012MED201 Media Ingest and Storage Solutions with AWS - AWS re: Invent 2012
MED201 Media Ingest and Storage Solutions with AWS - AWS re: Invent 2012Amazon Web Services
 
Create a Smarter Data Lake with HP Haven and Apache Hadoop
Create a Smarter Data Lake with HP Haven and Apache HadoopCreate a Smarter Data Lake with HP Haven and Apache Hadoop
Create a Smarter Data Lake with HP Haven and Apache HadoopHortonworks
 
Designing a Real Time Data Ingestion Pipeline
Designing a Real Time Data Ingestion PipelineDesigning a Real Time Data Ingestion Pipeline
Designing a Real Time Data Ingestion PipelineDataScience
 
Creating a Modern Data Architecture
Creating a Modern Data ArchitectureCreating a Modern Data Architecture
Creating a Modern Data ArchitectureZaloni
 
Big Data: Architecture and Performance Considerations in Logical Data Lakes
Big Data: Architecture and Performance Considerations in Logical Data LakesBig Data: Architecture and Performance Considerations in Logical Data Lakes
Big Data: Architecture and Performance Considerations in Logical Data LakesDenodo
 
Open Source Framework for Deploying Data Science Models and Cloud Based Appli...
Open Source Framework for Deploying Data Science Models and Cloud Based Appli...Open Source Framework for Deploying Data Science Models and Cloud Based Appli...
Open Source Framework for Deploying Data Science Models and Cloud Based Appli...ETCenter
 
Building a Modern Data Architecture by Ben Sharma at Strata + Hadoop World Sa...
Building a Modern Data Architecture by Ben Sharma at Strata + Hadoop World Sa...Building a Modern Data Architecture by Ben Sharma at Strata + Hadoop World Sa...
Building a Modern Data Architecture by Ben Sharma at Strata + Hadoop World Sa...Zaloni
 
Incorporating the Data Lake into Your Analytic Architecture
Incorporating the Data Lake into Your Analytic ArchitectureIncorporating the Data Lake into Your Analytic Architecture
Incorporating the Data Lake into Your Analytic ArchitectureCaserta
 
10 Amazing Things To Do With a Hadoop-Based Data Lake
10 Amazing Things To Do With a Hadoop-Based Data Lake10 Amazing Things To Do With a Hadoop-Based Data Lake
10 Amazing Things To Do With a Hadoop-Based Data LakeVMware Tanzu
 
Building the Enterprise Data Lake: A look at architecture
Building the Enterprise Data Lake: A look at architectureBuilding the Enterprise Data Lake: A look at architecture
Building the Enterprise Data Lake: A look at architecturemark madsen
 
Agile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric ApproachAgile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric ApproachSoftServe
 

En vedette (18)

SUN TV NETWORK LIMITED
SUN TV NETWORK LIMITEDSUN TV NETWORK LIMITED
SUN TV NETWORK LIMITED
 
ximena araneda - The Next Generation MAM Systems
ximena araneda - The Next Generation MAM Systemsximena araneda - The Next Generation MAM Systems
ximena araneda - The Next Generation MAM Systems
 
Sun Tv
Sun TvSun Tv
Sun Tv
 
MED201 Media Ingest and Storage Solutions with AWS - AWS re: Invent 2012
MED201 Media Ingest and Storage Solutions with AWS - AWS re: Invent 2012MED201 Media Ingest and Storage Solutions with AWS - AWS re: Invent 2012
MED201 Media Ingest and Storage Solutions with AWS - AWS re: Invent 2012
 
Vidispine
VidispineVidispine
Vidispine
 
Create a Smarter Data Lake with HP Haven and Apache Hadoop
Create a Smarter Data Lake with HP Haven and Apache HadoopCreate a Smarter Data Lake with HP Haven and Apache Hadoop
Create a Smarter Data Lake with HP Haven and Apache Hadoop
 
Designing a Real Time Data Ingestion Pipeline
Designing a Real Time Data Ingestion PipelineDesigning a Real Time Data Ingestion Pipeline
Designing a Real Time Data Ingestion Pipeline
 
Creating a Modern Data Architecture
Creating a Modern Data ArchitectureCreating a Modern Data Architecture
Creating a Modern Data Architecture
 
Datalake Architecture
Datalake ArchitectureDatalake Architecture
Datalake Architecture
 
Big Data: Architecture and Performance Considerations in Logical Data Lakes
Big Data: Architecture and Performance Considerations in Logical Data LakesBig Data: Architecture and Performance Considerations in Logical Data Lakes
Big Data: Architecture and Performance Considerations in Logical Data Lakes
 
Open Source Framework for Deploying Data Science Models and Cloud Based Appli...
Open Source Framework for Deploying Data Science Models and Cloud Based Appli...Open Source Framework for Deploying Data Science Models and Cloud Based Appli...
Open Source Framework for Deploying Data Science Models and Cloud Based Appli...
 
Building a Modern Data Architecture by Ben Sharma at Strata + Hadoop World Sa...
Building a Modern Data Architecture by Ben Sharma at Strata + Hadoop World Sa...Building a Modern Data Architecture by Ben Sharma at Strata + Hadoop World Sa...
Building a Modern Data Architecture by Ben Sharma at Strata + Hadoop World Sa...
 
Incorporating the Data Lake into Your Analytic Architecture
Incorporating the Data Lake into Your Analytic ArchitectureIncorporating the Data Lake into Your Analytic Architecture
Incorporating the Data Lake into Your Analytic Architecture
 
10 Amazing Things To Do With a Hadoop-Based Data Lake
10 Amazing Things To Do With a Hadoop-Based Data Lake10 Amazing Things To Do With a Hadoop-Based Data Lake
10 Amazing Things To Do With a Hadoop-Based Data Lake
 
Building the Enterprise Data Lake: A look at architecture
Building the Enterprise Data Lake: A look at architectureBuilding the Enterprise Data Lake: A look at architecture
Building the Enterprise Data Lake: A look at architecture
 
Agile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric ApproachAgile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric Approach
 
Data Lake,beyond the Data Warehouse
Data Lake,beyond the Data WarehouseData Lake,beyond the Data Warehouse
Data Lake,beyond the Data Warehouse
 
Building a Data Lake on AWS
Building a Data Lake on AWSBuilding a Data Lake on AWS
Building a Data Lake on AWS
 

Dernier

React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkPixlogix Infotech
 
Français Patch Tuesday - Avril
Français Patch Tuesday - AvrilFrançais Patch Tuesday - Avril
Français Patch Tuesday - AvrilIvanti
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditSkynet Technologies
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observabilityitnewsafrica
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Jeffrey Haguewood
 
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Mark Simos
 
Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#Karmanjay Verma
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Nikki Chapple
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxfnnc6jmgwh
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...itnewsafrica
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesBernd Ruecker
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 

Dernier (20)

How Tech Giants Cut Corners to Harvest Data for A.I.
How Tech Giants Cut Corners to Harvest Data for A.I.How Tech Giants Cut Corners to Harvest Data for A.I.
How Tech Giants Cut Corners to Harvest Data for A.I.
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
 
Français Patch Tuesday - Avril
Français Patch Tuesday - AvrilFrançais Patch Tuesday - Avril
Français Patch Tuesday - Avril
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance Audit
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
 
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
 
Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architectures
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 

Enabling data lake architecture with open source

  • 1. Data Lake Enabling Data Lake Architecture with Open Source Technologies
  • 2. What is Data Lake ? - It's a collection of Raw, Semi Structured, UnStructured, Structured data at one place, enabled by low cost technologies from which downstream applications may use and act. - Can keep the data in its original form like native format, streaming data and big data. It gives high agility to configure, reconfigure data as per needs. - All data sources are considered to form a Data Lake.
  • 3. What is the use of Data Lake ? - It eliminates silos of data and simplifies the management. - It eliminates redundant data movement across the platforms. - It facilitates a common platform for data access, data processing, data analytics and data presentation. - Orchestration becomes possible - Streaming data accomodation possible
  • 4. Social Media Sensor Video Files Logs Enterprise Transactions OLTP, ERP, CRM Data Sources Ingest into Data Lake Distributed FS NoSQL Data Lake Storage Batch Processing Data Access and Processing (SparkSQL,M LIB,SparkR) Dashboard Predictive Model Stream Processing RDBMS Data Analytics and Presentation Enabling Data Lake Architecture with Open Source
  • 5. Data Pipeline & processing. - All the data is fed into Hadoop data Lake. - Data preparation and enrichment as per needs. - Store the processed data into the data lake or use in memory db's for low latency applications. - Streaming data processing can be done using Kafka and Flume. Kafka and Flume both allow connections directly into Hive and HBase, and Spark can ingest and process data without ever writing to disk.
  • 6. Data Pipeline & processing - cont. Spark - Continuous Application - Spark Streaming can be used for near real time processing of streams using Structured Streaming. - Structured Streaming allows applications to connect to kafka sources and apply dataset functions on the infinite tables. Below link gives the details of continuous application. https://databricks.com/blog/2016/07/28/continuo us-applications-evolving-streaming-in-apache-spark- 2-0.html