2. What is a Data Lake?
- It is a collection of raw, structured, semi-structured, and unstructured data in one place, enabled by low-cost technologies, from which downstream applications can consume data and act.
- It can keep data in its original, native format, including streaming data and big data. This gives high agility to configure and reconfigure the data as needs change.
- Any data source can contribute to a data lake.
3. What is the use of a Data Lake?
- It eliminates data silos and simplifies management.
- It eliminates redundant data movement across platforms.
- It provides a common platform for data access, data processing, data analytics, and data presentation.
- Orchestration becomes possible.
- Streaming data can be accommodated.
4. Enabling Data Lake Architecture with Open Source
- Data sources: social media, sensors, video, files, logs, enterprise transactions (OLTP, ERP, CRM) are ingested into the data lake.
- Data lake storage: distributed file system, NoSQL.
- Data access and processing: batch and stream processing (SparkSQL, MLlib, SparkR).
- Data analytics and presentation: dashboards, predictive models, RDBMS.
5. Data Pipeline & Processing
- All data is fed into the Hadoop data lake.
- Data is prepared and enriched as needed.
- The processed data is stored back into the data lake, or placed in in-memory databases for low-latency applications.
- Streaming data can be processed using Kafka and Flume. Both allow connections directly into Hive and HBase, and Spark can ingest and process data without ever writing to disk.
6. Data Pipeline & Processing (cont.)
Spark - Continuous Application
- Spark supports near-real-time processing of streams via Structured Streaming.
- Structured Streaming allows applications to connect to Kafka sources and apply Dataset operations to unbounded ("infinite") tables. The link below gives the details of continuous applications:
https://databricks.com/blog/2016/07/28/continuous-applications-evolving-streaming-in-apache-spark-2-0.html