DIRECTION
REDACTOR DATE
BIG DATA PLATFORM
INDUSTRIALIZATION
BIG DATA INITIATIVES @Renault
2014
Big Data Sandbox on old
HPC Infrastructure.
Site: Innovation LAB
POC: Quality Data
Exploration
2015
DataLab Implementation
New HP Infrastructure
Data Protection: NO
1st Level of
Industrialization
2016
Big Data Platform
Industrialization to host
both POCs and Projects in
Production.
Data Protection: YES
Big Data Deployment Production Stakes
• One Hadoop cluster with 24/7 always-on visibility of data (instead of siloing it).
• Many possibilities for crossing data
• Simplify Operations
• Design Simplicity
• Charge Back Model
• Scalability and Isolation
• Isolate Experimental applications from Production
Deployment of Quality Projects on the DataLake
Big Data
Developers
Client Server
(Hadoop Clients
Installed)
DMZ
KNOX
GATEWAY
Search
Node
Edge
Node
Name
Node
DataStore
Master DN DN DN DN DN DN DN DN DN
DN DN DN DN DN DN DN DN DN
Data Sources
LoadBalancer
Web
Applications
Web service
Import
Access GUI
Search
Node
Quality
Sales and
Marketing
Supply Chain Engineering
Consumers
Open Data
Internet of Things
Producers
Batch (RDBMS,
Files)
Messages, Logs
Streaming, Data Flow
NFS Gateway, Sqoop,
Spark SQL
FLUME
LOGSTASH
KAFKA PRODUCERS
Kafka
Broker
(Topics)
Spark
Streaming
Elasticsearch
HBASE
HIVE
HDFS
Spark SQL
Spark RDD
Big Data Ecosystem @ Renault
YARN + HDFS
Data Ingestion Scenarios : RDBMS
RDBMS
Sqoop
Spark SQL
HIVE
Flat Files
HBase
Basic data import based on column id (Integer) or timestamp
Example: SOPHIA
ELT Architecture: Extract Load and Transform
Non standard Data Import, Specific schema
Example: BLMS
Support ETL Architecture: Extract, Transform and Load
Support Ingestion directly to Elasticsearch
INSERT ONLY
NOCTURNAL BATCH
SQL QUERIES
HIGH LATENCY
INSERT ONLY
NOCTURNAL BATCH
DATA PROCESSING
Files Format: CSV,
PARQUET, AVRO
INSERT AND UPDATE
NOSQL DB (KEY-VALUE SCHEMA)
LOW LATENCY
FOR SCALING OUT RELATIONAL
DB ON HADOOP
INTERACTIVE ANALYTICS
(SPOTFIRE)
VERY LOW LATENCY (SSD DISKS)
NEAR REAL TIME ANALYTICS
(WATCHING ALERTS)
TEXT SEARCH (LOG ANALYSIS)
NESTED and PARENT/CHILD
RELATIONSHIPS
Elasticsearch
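The "basic data import based on column id (Integer) or timestamp" scenario can be sketched as a high-water-mark query: each nightly run fetches only rows whose check column is greater than the value stored by the previous run. Sqoop offers this behaviour through its incremental import options (`--incremental`, `--check-column`, `--last-value`); the sketch below does not call Sqoop. It uses an in-memory SQLite table as a stand-in for the source RDBMS, and the table name `defects`, its columns, and the helper `fetch_increment` are assumptions made for illustration.

```python
import sqlite3


def fetch_increment(conn, table, check_column, last_value):
    """Return rows whose check column is strictly greater than last_value,
    together with the new high-water mark to store for the next run."""
    # Table and column names come from trusted job configuration in this
    # sketch; only the last value is passed as a bound parameter.
    query = f"SELECT * FROM {table} WHERE {check_column} > ? ORDER BY {check_column}"
    cursor = conn.execute(query, (last_value,))
    columns = [description[0] for description in cursor.description]
    rows = [dict(zip(columns, values)) for values in cursor.fetchall()]
    new_last = rows[-1][check_column] if rows else last_value
    return rows, new_last


# Hypothetical source table standing in for an RDBMS source.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE defects (id INTEGER PRIMARY KEY, label TEXT, updated_at TEXT)"
)
conn.executemany(
    "INSERT INTO defects VALUES (?, ?, ?)",
    [
        (1, "paint", "2016-01-01 02:00:00"),
        (2, "brake", "2016-01-02 02:00:00"),
        (3, "door", "2016-01-03 02:00:00"),
    ],
)

# First nightly batch: everything above the initial mark 0.
first_batch, mark = fetch_increment(conn, "defects", "id", 0)

# A new row arrives during the day; the next batch only picks up that row.
conn.execute("INSERT INTO defects VALUES (4, 'seat', '2016-01-04 02:00:00')")
second_batch, mark = fetch_increment(conn, "defects", "id", mark)
```

The same helper works with a timestamp column (for example `updated_at`) as long as the stored values sort in chronological order. Because only rows above the mark are fetched, this pattern matches the INSERT ONLY constraint on the slide: updates to already imported rows are not seen when the check column is an id.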
Interactive SQL Data Analytics
Main Objectives
• Speeding up BI queries on Big Data Stores
• Hide the complexity of the Big Data architecture from the end user and provide only one Data
Connector for Spotfire
• Provide Interactive User Experience
• Data Virtualization (no need to systematically import RDBMS data in order to cross data)
• Spark SQL (1.6): the emerging solution for Interactive SQL Data Analytics with the Data
Source API.
Only One Data Connector
Data Processing Applications
Add In-Memory Capability
Load/Insert Load/Insert Load/Insert
Interactive SQL Data Analytics
HBASE HIVE Elasticsearch
Spark SQL
Files Parquet
DataSource API
Load/Insert
RDBMS
Load/Insert
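The "only one data connector" idea can be illustrated with a small facade: every store is registered behind the same load interface, so a client crosses data without knowing where each dataset lives. This is a plain-Python analogy under stated assumptions, not the Spark SQL Data Source API used on the platform. The class `DataConnector`, the dataset names, and the sample rows are hypothetical.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


class DataConnector:
    """Single entry point that hides which store each dataset lives in."""

    def __init__(self) -> None:
        self._loaders: Dict[str, Callable[[], Iterable[Row]]] = {}

    def register(self, name: str, loader: Callable[[], Iterable[Row]]) -> None:
        # A loader stands in for a store-specific reader (Hive, HBase, RDBMS).
        self._loaders[name] = loader

    def load(self, name: str) -> List[Row]:
        return list(self._loaders[name]())

    def cross(self, left: str, right: str, key: str) -> List[Row]:
        # Inner join on `key`, done in memory after loading both sides.
        index: Dict[object, List[Row]] = {}
        for right_row in self.load(right):
            index.setdefault(right_row[key], []).append(right_row)
        return [
            {**left_row, **right_row}
            for left_row in self.load(left)
            for right_row in index.get(left_row[key], [])
        ]


connector = DataConnector()
# Hypothetical datasets: one pretending to live in Hive, one in an RDBMS.
connector.register("hive.warranty_claims", lambda: [
    {"vin": "VF1112C0000724284", "claim": "brake noise"},
])
connector.register("rdbms.vehicles", lambda: [
    {"vin": "VF1112C0000724284", "model": "hypothetical-model"},
    {"vin": "VF1000000000000000", "model": "hypothetical-model"},
])
crossed = connector.cross("hive.warranty_claims", "rdbms.vehicles", "vin")
```

The RDBMS rows are read on demand through the connector rather than imported beforehand, which is the data virtualization point made in the objectives.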
Big Data In Action
Multitenancy
High Availability
Security
Data Governance
Policy
Continuous Delivery
Data Protection
Hadoop
Organization
POC#0
POC#1
POC#2
POC#n
Release
Management
PRODUCTION
SLA & Monitoring
Tenant 1
App 1
Tenant 2
App 1
App 2
Tenant n
App 1
Management and
Monitoring
DataLab, Open to
Data Exploration
Bi-modal Big Data Platform
One physical cost-effective platform
Hadoop Global Data Life Cycle
Data Sources
Load or archive
batch data
Stream real time
data
Mask Sensitive data
with Automated
Process
Renault Big Data
Platform
Refine, curate, process,
query data
Big Data Web
Services
INGESTION
Scheduled and
Monitored by
AITS in PROD
Policies pre-defined by
Security Officer
ADA
ARCA
subscription
request to
datasets
Data
Access
AITS: Groups
Management
DIRx Data
Owner
Validation
Sync
Users
Defined in Ranger
and Protegrity
Defined in
Falcon
Active / Active Hadoop Platform
Rack Room 1 C2 Rack Room 2 C2
Data Nodes
for Block Storage
HDFS PROD HDFS POC
High Availability Architecture
Hadoop Security Levels
OS Security
Authorization
Perimeter Level Security
Protected Zone
Data Protection
Selected Solution :
Tokenization
(Protegrity)
Tokenization Definition
Selected solution: Tokenization
• Tokenization is a form of data protection that
converts sensitive data into fake data.
• The real data can be retrieved by authorized
users.
• Protegrity: the only available solution for Hadoop
(also supports traditional data systems)
Data Protection Key Requirements:
• Ability to de-identify Personally Identifiable Data
• Restrict data access (financial data, Data
residency obligations, …)
• Provide central management and control of all
data security operations
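The definition above (sensitive data converted into fake data, real data retrievable by authorized users) can be sketched with a lookup-table vault. This is a minimal illustration and not Protegrity's algorithm or API: the class `TokenVault`, the role name `quality_analyst`, and the choice of a random, format-preserving substitution are all assumptions.

```python
import secrets
import string


class TokenVault:
    """Hypothetical vault mapping sensitive values to random fake values."""

    def __init__(self, authorized_roles):
        self._authorized = set(authorized_roles)
        self._token_by_value = {}
        self._value_by_token = {}

    @staticmethod
    def _fake_char(char):
        # Keep the shape of the data: digits stay digits, letters stay
        # letters of the same case, separators are left untouched.
        if char.isdigit():
            return secrets.choice(string.digits)
        if char.isalpha():
            alphabet = string.ascii_uppercase if char.isupper() else string.ascii_lowercase
            return secrets.choice(alphabet)
        return char

    def tokenize(self, value):
        # The same clear value always maps to the same token.
        if value in self._token_by_value:
            return self._token_by_value[value]
        token = "".join(self._fake_char(c) for c in value)
        while token in self._value_by_token:
            # Extremely unlikely collision: draw again.
            token = "".join(self._fake_char(c) for c in value)
        self._token_by_value[value] = token
        self._value_by_token[token] = value
        return token

    def detokenize(self, token, role):
        # Only roles named in the policy may recover the real value.
        if role not in self._authorized:
            raise PermissionError(f"role {role!r} may not detokenize")
        return self._value_by_token[token]


vault = TokenVault(authorized_roles={"quality_analyst"})
token = vault.tokenize("VF1112C0000724284")
original = vault.detokenize(token, "quality_analyst")
```

Because the token keeps the length and character classes of the original, downstream jobs that only need a consistent identifier (joins, counts) can run on tokenized data without access to the vault.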
Identifier | Clear | Protected | Authorized Role 1 (can see most data in the clear) | Authorized Role 2 (can see limited data in the clear)
Name | Joe Smith | csu wusoj | Joe Smith | Joe Smith
Address | 100 Main Street, Pleasantville, CA | 476 srta coetse, cysieondusbak, HA | 100 Main Street, Pleasantville, CA | “No Access”
Date of Birth | 12/25/1966 | 01/02/1966 | 12/25/1966 | 01/02/1966
VIN | VF1112C0000724284 | AB9875R8467364752 | VF1112C0000724284 | “No Access”
Credit Card Number | 3678 2289 3907 3378 | 3846 2290 3371 3890 | xxxx xxxx xxxx 3378 | 3846 2290 3371 3890
E-mail Address | joe.smith@surferdude.org | eoe.nwuer@beusorpdqo.aku | joe.smith@surferdude.org | joe.smith@surferdude.org
Telephone Number | 760-278-3389 | 998-389-2289 | 760-278-3389 | 998-389-2289
DATA PROTECTION Example 1
Fine Grained Protection
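The fine-grained protection example above amounts to a per-role, per-field policy: each field is returned in the clear, as its token, masked, or not at all. The sketch below reproduces a few rows of that example. The policy table `POLICIES`, the action names, and the function `view_for` are assumptions made for illustration; the clear and protected values are copied from the slide.

```python
CLEAR, PROTECTED, MASKED, NO_ACCESS = "clear", "protected", "masked", "no_access"

# (clear value, tokenized value) pairs copied from the slide's example record.
RECORD = {
    "name": ("Joe Smith", "csu wusoj"),
    "date_of_birth": ("12/25/1966", "01/02/1966"),
    "vin": ("VF1112C0000724284", "AB9875R8467364752"),
    "credit_card": ("3678 2289 3907 3378", "3846 2290 3371 3890"),
}

# Hypothetical policy table: one action per field and per role.
POLICIES = {
    "role_1": {"name": CLEAR, "date_of_birth": CLEAR, "vin": CLEAR,
               "credit_card": MASKED},
    "role_2": {"name": CLEAR, "date_of_birth": PROTECTED, "vin": NO_ACCESS,
               "credit_card": PROTECTED},
}


def mask_all_but_last4(value):
    """Replace every digit except the last four characters with 'x'."""
    head, tail = value[:-4], value[-4:]
    return "".join("x" if ch.isdigit() else ch for ch in head) + tail


def view_for(role, record=RECORD):
    """Return the record as the given role is allowed to see it."""
    policy = POLICIES.get(role, {})
    view = {}
    for field, (clear, token) in record.items():
        # Fields without an explicit rule default to the protected value.
        action = policy.get(field, PROTECTED)
        if action == CLEAR:
            view[field] = clear
        elif action == MASKED:
            view[field] = mask_all_but_last4(clear)
        elif action == NO_ACCESS:
            view[field] = "No Access"
        else:
            view[field] = token
    return view


role_1_view = view_for("role_1")
role_2_view = view_for("role_2")
```

Defaulting unknown roles and unlisted fields to the protected value is a design choice of this sketch: a missing rule then never exposes clear data.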
DATA PROTECTION Example 2
Data Residency
The Next Step: HDFS FEDERATION
Name Service 1
/store1/
Name Service 2
/store2/
Name
Node 1
Name
Node 2
Name
Node 3
Name
Node 4
Data
Node 1
Data
Node 2
Data
Node 3
Data
Node 4
Data
Node 5
Data
Node 6
Data
Node 7
Data
Node 8
Federation
Scale-out Data Nodes
Resources usage orchestrated by YARN
HA HA
• can see and access only /store2
• cannot access /store1
• cannot access
Name Node 1 and 2
• can see and access only /store1
• cannot access /store2
• cannot access
Name Node 3 and 4
PHYSICAL ISOLATION
Privileged Users can
access /store1 and /store2
for Crossing Data
Block Storage
Unreadable Data
Data
Node 9
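The access rules on this slide can be sketched as a mount table plus a per-group allow list: each top-level path belongs to one name service, ordinary groups reach only their own store, and privileged users reach both. The group names `tenant_a`, `tenant_b`, and `privileged`, the name service labels, and the helper functions are assumptions. In a real cluster the mapping would come from the HDFS federation configuration and the authorization policies, not from Python dictionaries.

```python
import posixpath

# Mount point -> name service, following the slide's layout.
MOUNTS = {"/store1": "nameservice1", "/store2": "nameservice2"}

# Hypothetical groups and the mount points each one may reach.
ALLOWED = {
    "tenant_a": {"/store1"},
    "tenant_b": {"/store2"},
    "privileged": {"/store1", "/store2"},
}


def mount_for(path):
    """Return the mount point that owns `path`, or None if there is none."""
    # Normalizing first prevents escaping a store with '..' segments.
    normalized = posixpath.normpath(path)
    for mount in MOUNTS:
        if normalized == mount or normalized.startswith(mount + "/"):
            return mount
    return None


def can_access(group, path):
    """True when `group` may reach the store that owns `path`."""
    mount = mount_for(path)
    return mount is not None and mount in ALLOWED.get(group, set())


def name_service_for(path):
    """Return the name service that would serve `path`, or None."""
    mount = mount_for(path)
    return MOUNTS[mount] if mount else None
```

For example, `can_access("tenant_a", "/store2/sales")` is false while `can_access("privileged", "/store2/sales")` is true, which mirrors the "privileged users can access /store1 and /store2 for crossing data" note.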
BIG DATA PROJECT PROCESS
Business
Use Case DIRx
POC PROD Project
Data
Exploration
AT
Data Sources Ingestion
Security Policies
Data Processing development
User Access Authorization
CPT
DIA - Innovation Front End deployment
Admin
@BICC
architecture
development
MEP
Implementation
Deployment
management
DAT
DATA CHANGE MANAGEMENT
• Publish real time dashboard of all data flows.
• For each data source the producer and the consumers will be displayed:
• Producer: the DIRx generating the Data
• Consumers: all the Big Data applications using this Data source
• If the Producer decides to change the data source schema, they must send a notification
from the dashboard to all consumers.
• Monitoring the logs of data ingestion jobs in real time makes failure detection
faster and more efficient.
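The producer and consumer bookkeeping described above can be sketched as a small registry: each data source has one producer and a set of consuming applications, and a schema change announced by the producer yields one notification per consumer. The class `DataFlowRegistry`, the source name, and the application names are hypothetical; how notifications are actually delivered from the dashboard is not specified in the slides.

```python
class DataFlowRegistry:
    """Hypothetical registry of data sources, producers, and consumers."""

    def __init__(self):
        self._producer = {}
        self._consumers = {}

    def register_source(self, source, producer):
        self._producer[source] = producer
        self._consumers.setdefault(source, set())

    def subscribe(self, source, application):
        if source not in self._producer:
            raise KeyError(f"unknown data source {source!r}")
        self._consumers[source].add(application)

    def consumers_of(self, source):
        return sorted(self._consumers.get(source, set()))

    def notify_schema_change(self, source, requested_by, description):
        # Only the registered producer may announce a schema change.
        if self._producer.get(source) != requested_by:
            raise PermissionError(f"{requested_by!r} does not produce {source!r}")
        return [
            {"to": application, "source": source, "message": description}
            for application in self.consumers_of(source)
        ]


registry = DataFlowRegistry()
registry.register_source("quality_defects", producer="DIR-Quality")
registry.subscribe("quality_defects", "alerting-app")
registry.subscribe("quality_defects", "bi-dashboard")
notifications = registry.notify_schema_change(
    "quality_defects", "DIR-Quality", "column 'label' renamed to 'defect_label'"
)
```

Returning the notifications as plain records keeps delivery (e-mail, dashboard banner, ticket) outside the registry, so the same list can feed whichever channel is chosen.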
COLLECT LOGS FOR (NRT) CARTO
Sqoop
Metastore
Storing Jobs
Hive
Metastore
Storing SQL Schema
Oozie
Metastore
Storing
Workflows
Oozie Logs
Yarn
Job History Logs
Storing
Projects
Hue
Database
Elasticsearch
JDBC PLUGIN
LOGSTASH
Kibana
Dynamic Relationship Portal

More Related Content

What's hot

Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...DataWorks Summit/Hadoop Summit
 
Managing Hadoop, HBase and Storm Clusters at Yahoo Scale
Managing Hadoop, HBase and Storm Clusters at Yahoo ScaleManaging Hadoop, HBase and Storm Clusters at Yahoo Scale
Managing Hadoop, HBase and Storm Clusters at Yahoo ScaleDataWorks Summit/Hadoop Summit
 
Storage Requirements and Options for Running Spark on Kubernetes
Storage Requirements and Options for Running Spark on KubernetesStorage Requirements and Options for Running Spark on Kubernetes
Storage Requirements and Options for Running Spark on KubernetesDataWorks Summit
 
Hadoop Infrastructure @Uber Past, Present and Future
Hadoop Infrastructure @Uber Past, Present and FutureHadoop Infrastructure @Uber Past, Present and Future
Hadoop Infrastructure @Uber Past, Present and FutureDataWorks Summit
 
Zero ETL analytics with LLAP in Azure HDInsight
Zero ETL analytics with LLAP in Azure HDInsightZero ETL analytics with LLAP in Azure HDInsight
Zero ETL analytics with LLAP in Azure HDInsightDataWorks Summit
 
Uncovering an Apache Spark 2 Benchmark - Configuration, Tuning and Test Results
Uncovering an Apache Spark 2 Benchmark - Configuration, Tuning and Test ResultsUncovering an Apache Spark 2 Benchmark - Configuration, Tuning and Test Results
Uncovering an Apache Spark 2 Benchmark - Configuration, Tuning and Test ResultsDataWorks Summit
 
Benefits of an Agile Data Fabric for Business Intelligence
Benefits of an Agile Data Fabric for Business IntelligenceBenefits of an Agile Data Fabric for Business Intelligence
Benefits of an Agile Data Fabric for Business IntelligenceDataWorks Summit/Hadoop Summit
 
What's the Hadoop-la about Kubernetes?
What's the Hadoop-la about Kubernetes?What's the Hadoop-la about Kubernetes?
What's the Hadoop-la about Kubernetes?DataWorks Summit
 
Scaling Hadoop at LinkedIn
Scaling Hadoop at LinkedInScaling Hadoop at LinkedIn
Scaling Hadoop at LinkedInDataWorks Summit
 
Disaster Recovery and Cloud Migration for your Apache Hive Warehouse
Disaster Recovery and Cloud Migration for your Apache Hive WarehouseDisaster Recovery and Cloud Migration for your Apache Hive Warehouse
Disaster Recovery and Cloud Migration for your Apache Hive WarehouseDataWorks Summit
 

What's hot (20)

Time-oriented event search. A new level of scale
Time-oriented event search. A new level of scale Time-oriented event search. A new level of scale
Time-oriented event search. A new level of scale
 
HPE Keynote Hadoop Summit San Jose 2016
HPE Keynote Hadoop Summit San Jose 2016HPE Keynote Hadoop Summit San Jose 2016
HPE Keynote Hadoop Summit San Jose 2016
 
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
 
Managing Hadoop, HBase and Storm Clusters at Yahoo Scale
Managing Hadoop, HBase and Storm Clusters at Yahoo ScaleManaging Hadoop, HBase and Storm Clusters at Yahoo Scale
Managing Hadoop, HBase and Storm Clusters at Yahoo Scale
 
HDFS Analysis for Small Files
HDFS Analysis for Small FilesHDFS Analysis for Small Files
HDFS Analysis for Small Files
 
HDFS tiered storage
HDFS tiered storageHDFS tiered storage
HDFS tiered storage
 
To The Cloud and Back: A Look At Hybrid Analytics
To The Cloud and Back: A Look At Hybrid AnalyticsTo The Cloud and Back: A Look At Hybrid Analytics
To The Cloud and Back: A Look At Hybrid Analytics
 
Storage Requirements and Options for Running Spark on Kubernetes
Storage Requirements and Options for Running Spark on KubernetesStorage Requirements and Options for Running Spark on Kubernetes
Storage Requirements and Options for Running Spark on Kubernetes
 
Ingest and Stream Processing - What will you choose?
Ingest and Stream Processing - What will you choose?Ingest and Stream Processing - What will you choose?
Ingest and Stream Processing - What will you choose?
 
Hadoop Platform at Yahoo
Hadoop Platform at YahooHadoop Platform at Yahoo
Hadoop Platform at Yahoo
 
Hadoop Infrastructure @Uber Past, Present and Future
Hadoop Infrastructure @Uber Past, Present and FutureHadoop Infrastructure @Uber Past, Present and Future
Hadoop Infrastructure @Uber Past, Present and Future
 
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS HadoopBreaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
 
Zero ETL analytics with LLAP in Azure HDInsight
Zero ETL analytics with LLAP in Azure HDInsightZero ETL analytics with LLAP in Azure HDInsight
Zero ETL analytics with LLAP in Azure HDInsight
 
Securing Spark Applications
Securing Spark ApplicationsSecuring Spark Applications
Securing Spark Applications
 
HDFS: Optimization, Stabilization and Supportability
HDFS: Optimization, Stabilization and SupportabilityHDFS: Optimization, Stabilization and Supportability
HDFS: Optimization, Stabilization and Supportability
 
Uncovering an Apache Spark 2 Benchmark - Configuration, Tuning and Test Results
Uncovering an Apache Spark 2 Benchmark - Configuration, Tuning and Test ResultsUncovering an Apache Spark 2 Benchmark - Configuration, Tuning and Test Results
Uncovering an Apache Spark 2 Benchmark - Configuration, Tuning and Test Results
 
Benefits of an Agile Data Fabric for Business Intelligence
Benefits of an Agile Data Fabric for Business IntelligenceBenefits of an Agile Data Fabric for Business Intelligence
Benefits of an Agile Data Fabric for Business Intelligence
 
What's the Hadoop-la about Kubernetes?
What's the Hadoop-la about Kubernetes?What's the Hadoop-la about Kubernetes?
What's the Hadoop-la about Kubernetes?
 
Scaling Hadoop at LinkedIn
Scaling Hadoop at LinkedInScaling Hadoop at LinkedIn
Scaling Hadoop at LinkedIn
 
Disaster Recovery and Cloud Migration for your Apache Hive Warehouse
Disaster Recovery and Cloud Migration for your Apache Hive WarehouseDisaster Recovery and Cloud Migration for your Apache Hive Warehouse
Disaster Recovery and Cloud Migration for your Apache Hive Warehouse
 

Viewers also liked

How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...DataWorks Summit/Hadoop Summit
 
Your transformation through Innovation & industrialization - with a Business ...
Your transformation through Innovation & industrialization - with a Business ...Your transformation through Innovation & industrialization - with a Business ...
Your transformation through Innovation & industrialization - with a Business ...Capgemini
 
Gartner eBook on Big Data
Gartner eBook on Big DataGartner eBook on Big Data
Gartner eBook on Big DataJyrki Määttä
 
2016 Spark Summit East Keynote: Ali Ghodsi and Databricks Community Edition demo
2016 Spark Summit East Keynote: Ali Ghodsi and Databricks Community Edition demo2016 Spark Summit East Keynote: Ali Ghodsi and Databricks Community Edition demo
2016 Spark Summit East Keynote: Ali Ghodsi and Databricks Community Edition demoDatabricks
 
Successes, Challenges, and Pitfalls Migrating a SAAS business to Hadoop
Successes, Challenges, and Pitfalls Migrating a SAAS business to HadoopSuccesses, Challenges, and Pitfalls Migrating a SAAS business to Hadoop
Successes, Challenges, and Pitfalls Migrating a SAAS business to HadoopDataWorks Summit/Hadoop Summit
 
Offre Transformation Digitale HMC Conseil Cortambert Consultant
Offre Transformation Digitale HMC Conseil Cortambert ConsultantOffre Transformation Digitale HMC Conseil Cortambert Consultant
Offre Transformation Digitale HMC Conseil Cortambert ConsultantHelene Courtellemont - Mery
 
Migrating Clinical Data in Various Formats to a Clinical Data Management System
Migrating Clinical Data in Various Formats to a Clinical Data Management SystemMigrating Clinical Data in Various Formats to a Clinical Data Management System
Migrating Clinical Data in Various Formats to a Clinical Data Management SystemPerficient, Inc.
 
Cloudera Impala 1.0
Cloudera Impala 1.0Cloudera Impala 1.0
Cloudera Impala 1.0Minwoo Kim
 
Building a modern Application with DataFrames
Building a modern Application with DataFramesBuilding a modern Application with DataFrames
Building a modern Application with DataFramesDatabricks
 
Building a modern Application with DataFrames
Building a modern Application with DataFramesBuilding a modern Application with DataFrames
Building a modern Application with DataFramesSpark Summit
 
Jump Start into Apache® Spark™ and Databricks
Jump Start into Apache® Spark™ and DatabricksJump Start into Apache® Spark™ and Databricks
Jump Start into Apache® Spark™ and DatabricksDatabricks
 
Spark & Cassandra at DataStax Meetup on Jan 29, 2015
Spark & Cassandra at DataStax Meetup on Jan 29, 2015 Spark & Cassandra at DataStax Meetup on Jan 29, 2015
Spark & Cassandra at DataStax Meetup on Jan 29, 2015 Sameer Farooqui
 
Scalable And Incremental Data Profiling With Spark
Scalable And Incremental Data Profiling With SparkScalable And Incremental Data Profiling With Spark
Scalable And Incremental Data Profiling With SparkJen Aman
 
Hadoop application architectures - using Customer 360 as an example
Hadoop application architectures - using Customer 360 as an exampleHadoop application architectures - using Customer 360 as an example
Hadoop application architectures - using Customer 360 as an examplehadooparchbook
 
Big Data : au delà du proof of concept et de l'expérimentation (Matinale busi...
Big Data : au delà du proof of concept et de l'expérimentation (Matinale busi...Big Data : au delà du proof of concept et de l'expérimentation (Matinale busi...
Big Data : au delà du proof of concept et de l'expérimentation (Matinale busi...Jean-Michel Franco
 
Apache spark 소개 및 실습
Apache spark 소개 및 실습Apache spark 소개 및 실습
Apache spark 소개 및 실습동현 강
 
Spark Summit San Francisco 2016 - Matei Zaharia Keynote: Apache Spark 2.0
Spark Summit San Francisco 2016 - Matei Zaharia Keynote: Apache Spark 2.0Spark Summit San Francisco 2016 - Matei Zaharia Keynote: Apache Spark 2.0
Spark Summit San Francisco 2016 - Matei Zaharia Keynote: Apache Spark 2.0Databricks
 
[D2 COMMUNITY] Spark User Group - 스파크를 통한 딥러닝 이론과 실제
[D2 COMMUNITY] Spark User Group - 스파크를 통한 딥러닝 이론과 실제[D2 COMMUNITY] Spark User Group - 스파크를 통한 딥러닝 이론과 실제
[D2 COMMUNITY] Spark User Group - 스파크를 통한 딥러닝 이론과 실제NAVER D2
 
Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...
Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...
Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...Databricks
 

Viewers also liked (20)

How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
 
Your transformation through Innovation & industrialization - with a Business ...
Your transformation through Innovation & industrialization - with a Business ...Your transformation through Innovation & industrialization - with a Business ...
Your transformation through Innovation & industrialization - with a Business ...
 
Gartner eBook on Big Data
Gartner eBook on Big DataGartner eBook on Big Data
Gartner eBook on Big Data
 
2016 Spark Summit East Keynote: Ali Ghodsi and Databricks Community Edition demo
2016 Spark Summit East Keynote: Ali Ghodsi and Databricks Community Edition demo2016 Spark Summit East Keynote: Ali Ghodsi and Databricks Community Edition demo
2016 Spark Summit East Keynote: Ali Ghodsi and Databricks Community Edition demo
 
Successes, Challenges, and Pitfalls Migrating a SAAS business to Hadoop
Successes, Challenges, and Pitfalls Migrating a SAAS business to HadoopSuccesses, Challenges, and Pitfalls Migrating a SAAS business to Hadoop
Successes, Challenges, and Pitfalls Migrating a SAAS business to Hadoop
 
Spark meetup TCHUG
Spark meetup TCHUGSpark meetup TCHUG
Spark meetup TCHUG
 
Offre Transformation Digitale HMC Conseil Cortambert Consultant
Offre Transformation Digitale HMC Conseil Cortambert ConsultantOffre Transformation Digitale HMC Conseil Cortambert Consultant
Offre Transformation Digitale HMC Conseil Cortambert Consultant
 
Migrating Clinical Data in Various Formats to a Clinical Data Management System
Migrating Clinical Data in Various Formats to a Clinical Data Management SystemMigrating Clinical Data in Various Formats to a Clinical Data Management System
Migrating Clinical Data in Various Formats to a Clinical Data Management System
 
Cloudera Impala 1.0
Cloudera Impala 1.0Cloudera Impala 1.0
Cloudera Impala 1.0
 
Building a modern Application with DataFrames
Building a modern Application with DataFramesBuilding a modern Application with DataFrames
Building a modern Application with DataFrames
 
Building a modern Application with DataFrames
Building a modern Application with DataFramesBuilding a modern Application with DataFrames
Building a modern Application with DataFrames
 
Jump Start into Apache® Spark™ and Databricks
Jump Start into Apache® Spark™ and DatabricksJump Start into Apache® Spark™ and Databricks
Jump Start into Apache® Spark™ and Databricks
 
Spark & Cassandra at DataStax Meetup on Jan 29, 2015
Spark & Cassandra at DataStax Meetup on Jan 29, 2015 Spark & Cassandra at DataStax Meetup on Jan 29, 2015
Spark & Cassandra at DataStax Meetup on Jan 29, 2015
 
Scalable And Incremental Data Profiling With Spark
Scalable And Incremental Data Profiling With SparkScalable And Incremental Data Profiling With Spark
Scalable And Incremental Data Profiling With Spark
 
Hadoop application architectures - using Customer 360 as an example
Hadoop application architectures - using Customer 360 as an exampleHadoop application architectures - using Customer 360 as an example
Hadoop application architectures - using Customer 360 as an example
 
Big Data : au delà du proof of concept et de l'expérimentation (Matinale busi...
Big Data : au delà du proof of concept et de l'expérimentation (Matinale busi...Big Data : au delà du proof of concept et de l'expérimentation (Matinale busi...
Big Data : au delà du proof of concept et de l'expérimentation (Matinale busi...
 
Apache spark 소개 및 실습
Apache spark 소개 및 실습Apache spark 소개 및 실습
Apache spark 소개 및 실습
 
Spark Summit San Francisco 2016 - Matei Zaharia Keynote: Apache Spark 2.0
Spark Summit San Francisco 2016 - Matei Zaharia Keynote: Apache Spark 2.0Spark Summit San Francisco 2016 - Matei Zaharia Keynote: Apache Spark 2.0
Spark Summit San Francisco 2016 - Matei Zaharia Keynote: Apache Spark 2.0
 
[D2 COMMUNITY] Spark User Group - 스파크를 통한 딥러닝 이론과 실제
[D2 COMMUNITY] Spark User Group - 스파크를 통한 딥러닝 이론과 실제[D2 COMMUNITY] Spark User Group - 스파크를 통한 딥러닝 이론과 실제
[D2 COMMUNITY] Spark User Group - 스파크를 통한 딥러닝 이론과 실제
 
Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...
Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...
Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...
 

Similar to Big Data Platform Industrialization

The Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
The Enterprise Guide to Building a Data Mesh - Introducing SpecMeshThe Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
The Enterprise Guide to Building a Data Mesh - Introducing SpecMeshIanFurlong4
 
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization Denodo
 
Delivering Data Democratization in the Cloud with Snowflake
Delivering Data Democratization in the Cloud with SnowflakeDelivering Data Democratization in the Cloud with Snowflake
Delivering Data Democratization in the Cloud with SnowflakeKent Graziano
 
Virtualisation de données : Enjeux, Usages & Bénéfices
Virtualisation de données : Enjeux, Usages & BénéficesVirtualisation de données : Enjeux, Usages & Bénéfices
Virtualisation de données : Enjeux, Usages & BénéficesDenodo
 
A Key to Real-time Insights in a Post-COVID World (ASEAN)
A Key to Real-time Insights in a Post-COVID World (ASEAN)A Key to Real-time Insights in a Post-COVID World (ASEAN)
A Key to Real-time Insights in a Post-COVID World (ASEAN)Denodo
 
Presentation architecting virtualized infrastructure for big data
Presentation   architecting virtualized infrastructure for big dataPresentation   architecting virtualized infrastructure for big data
Presentation architecting virtualized infrastructure for big datasolarisyourep
 
Presentation architecting virtualized infrastructure for big data
Presentation   architecting virtualized infrastructure for big dataPresentation   architecting virtualized infrastructure for big data
Presentation architecting virtualized infrastructure for big dataxKinAnx
 
Future of Data Strategy (ASEAN)
Future of Data Strategy (ASEAN)Future of Data Strategy (ASEAN)
Future of Data Strategy (ASEAN)Denodo
 
The Synapse IoT Stack: Technology Trends in IOT and Big Data
The Synapse IoT Stack: Technology Trends in IOT and Big DataThe Synapse IoT Stack: Technology Trends in IOT and Big Data
The Synapse IoT Stack: Technology Trends in IOT and Big DataInMobi Technology
 
Hadoop Big Data Lakes Keynote
Hadoop Big Data Lakes KeynoteHadoop Big Data Lakes Keynote
Hadoop Big Data Lakes KeynoteMark van Rijmenam
 
Integration intervention: Get your apps and data up to speed
Integration intervention: Get your apps and data up to speedIntegration intervention: Get your apps and data up to speed
Integration intervention: Get your apps and data up to speedKenneth Peeples
 
Self Service Analytics and a Modern Data Architecture with Data Virtualizatio...
Self Service Analytics and a Modern Data Architecture with Data Virtualizatio...Self Service Analytics and a Modern Data Architecture with Data Virtualizatio...
Self Service Analytics and a Modern Data Architecture with Data Virtualizatio...Denodo
 
Ron Kasabian - Intel Big Data & Cloud Summit 2013
Ron Kasabian - Intel Big Data & Cloud Summit 2013Ron Kasabian - Intel Big Data & Cloud Summit 2013
Ron Kasabian - Intel Big Data & Cloud Summit 2013IntelAPAC
 
Architecting virtualized infrastructure for big data presentation
Architecting virtualized infrastructure for big data presentationArchitecting virtualized infrastructure for big data presentation
Architecting virtualized infrastructure for big data presentationVlad Ponomarev
 
DEVNET-1166 Open SDN Controller APIs
DEVNET-1166	Open SDN Controller APIsDEVNET-1166	Open SDN Controller APIs
DEVNET-1166 Open SDN Controller APIsCisco DevNet
 
Customer migration to Azure SQL database, December 2019
Customer migration to Azure SQL database, December 2019Customer migration to Azure SQL database, December 2019
Customer migration to Azure SQL database, December 2019George Walters
 
Product overview 6.0 v.1.0
Product overview 6.0 v.1.0Product overview 6.0 v.1.0
Product overview 6.0 v.1.0Gianluigi Riccio
 
Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016
Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016
Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016StampedeCon
 
Splunk App for Stream - Einblicke in Ihren Netzwerkverkehr
Splunk App for Stream - Einblicke in Ihren NetzwerkverkehrSplunk App for Stream - Einblicke in Ihren Netzwerkverkehr
Splunk App for Stream - Einblicke in Ihren NetzwerkverkehrGeorg Knon
 
Enterprise guide to building a Data Mesh
Enterprise guide to building a Data MeshEnterprise guide to building a Data Mesh
Enterprise guide to building a Data MeshSion Smith
 

Similar to Big Data Platform Industrialization (20)

The Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
The Enterprise Guide to Building a Data Mesh - Introducing SpecMeshThe Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
The Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
 
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
 
Delivering Data Democratization in the Cloud with Snowflake
Delivering Data Democratization in the Cloud with SnowflakeDelivering Data Democratization in the Cloud with Snowflake
Delivering Data Democratization in the Cloud with Snowflake
 
Virtualisation de données : Enjeux, Usages & Bénéfices
Virtualisation de données : Enjeux, Usages & BénéficesVirtualisation de données : Enjeux, Usages & Bénéfices
Virtualisation de données : Enjeux, Usages & Bénéfices
 
A Key to Real-time Insights in a Post-COVID World (ASEAN)
A Key to Real-time Insights in a Post-COVID World (ASEAN)A Key to Real-time Insights in a Post-COVID World (ASEAN)
A Key to Real-time Insights in a Post-COVID World (ASEAN)
 
Presentation architecting virtualized infrastructure for big data
Presentation   architecting virtualized infrastructure for big dataPresentation   architecting virtualized infrastructure for big data
Presentation architecting virtualized infrastructure for big data
 
Presentation architecting virtualized infrastructure for big data
Big Data Platform Industrialization

  • 1. DIRECTION REDACTOR DATE BIG DATA PLATFORM INDUSTRIALIZATION
  • 2. Big Data Initiatives @Renault
      • 2014: Big Data sandbox on old HPC infrastructure. Site: Innovation Lab. POC: Quality Data Exploration.
      • 2015: DataLab implementation on new HP infrastructure. Data Protection: NO. 1st level of industrialization.
      • 2016: Big Data Platform industrialization to host both POCs and projects in production. Data Protection: YES.
  • 3. Big Data Deployment: Production Stakes
      • One Hadoop cluster with 24/7, always-on visibility of data (instead of siloing it)
      • Many possibilities for crossing data
      • Simplify operations
      • Design simplicity
      • Chargeback model
      • Scalability and isolation
      • Isolate experimental applications from production
  • 4. Deployment of Quality Projects on the Data Lake (architecture diagram labels): Big Data Developers; Client Server (Hadoop clients installed); DMZ; Knox Gateway; Search Node (two); Edge Node; Name Node; DataStore Master; Data Nodes (DN); Data Sources; Load Balancer; Web Applications; Web Service; Import; Access GUI.
  • 5. Big Data Ecosystem @ Renault (diagram labels): Quality; Sales and Marketing; Supply Chain; Engineering; Consumers; Open Data; Internet of Things; Producers; Batch (RDBMS, Files); Messages, Logs; Streaming, Data Flow; NFS Gateway, Sqoop, Spark SQL; Flume; Logstash; Kafka Producers; Kafka Broker (Topics); Spark Streaming; Elasticsearch; HBase; Hive; HDFS; Spark SQL; Spark RDD; YARN + HDFS.
  • 6. Data Ingestion Scenarios: RDBMS
      • Sqoop: basic data import based on an id column (integer) or a timestamp. Example: SOPHIA. ELT architecture: Extract, Load and Transform.
      • Spark SQL: non-standard data import with a specific schema. Example: BLMS. Supports ETL architecture (Extract, Transform and Load) and ingestion directly into Elasticsearch.
      • Hive: insert only; nocturnal batch; SQL queries; high latency.
      • Flat files: insert only; nocturnal batch; data processing; file formats: CSV, Parquet, Avro.
      • HBase: insert and update; NoSQL DB (key-value schema); low latency; for scaling out relational DBs on Hadoop; interactive analytics (Spotfire).
      • Elasticsearch: very low latency (SSD disks); near-real-time analytics (watching alerts); text search (log analysis); nested and parent/child relationships.
  • 7. Interactive SQL Data Analytics: Main Objectives
      • Speed up BI queries on Big Data stores
      • Hide the complexity of the Big Data architecture from the end user and provide only one data connector for Spotfire
      • Provide an interactive user experience
      • Data virtualization (no need to systematically import RDBMS data in order to cross data)
      → Spark SQL (1.6): the emerging solution for interactive SQL data analytics, through the Data Source API.
  • 8. Interactive SQL Data Analytics: Only One Data Connector (diagram labels): Data Processing Applications; Add In-Memory Capability; Spark SQL; DataSource API; Load/Insert links to HBase, Hive, Elasticsearch, Parquet files and RDBMS.
  • 9. Big Data in Action: Multitenancy; High Availability; Security; Data Governance Policy; Continuous Delivery; Data Protection; Hadoop Organization.
  • 10. Bi-modal Big Data Platform: one physical, cost-effective platform
      • DataLab, open to data exploration: POC #0, POC #1, POC #2, ..., POC #n
      • Release Management
      • Production (SLA & Monitoring): Tenant 1 (App 1); Tenant 2 (App 1, App 2); Tenant n (App 1)
      • Management and Monitoring
  • 11. Hadoop Global Data Life Cycle (diagram labels): Data Sources; Load or archive batch data; Stream real-time data; Mask sensitive data with an automated process; Renault Big Data Platform; Refine, curate, process, query data; Big Data Web Services; Ingestion: scheduled and monitored by AITS in PROD; Policies predefined by the Security Officer; ADA; ARCA subscription request to datasets; Data Access; AITS: Groups Management; DIRx Data Owner Validation; Sync Users; Defined in Ranger and Protegrity; Defined in Falcon.
  • 12. Active/Active Hadoop Platform: High Availability Architecture (diagram labels): Rack Room 1 C2; Rack Room 2 C2; Data Nodes for Block Storage; HDFS PROD; HDFS POC.
  • 13. Hadoop Security Levels: OS Security; Authorization; Perimeter-Level Security; Protected Zone; Data Protection. Selected solution: Tokenization (Protegrity).
  • 14. Tokenization Definition
      Selected solution: Tokenization
      • Tokenization is a form of data protection that converts sensitive data into fake data.
      • The real data can be retrieved by authorized users.
      • Protegrity: the only available solution for Hadoop (also supports traditional data systems).
      Data Protection Key Requirements:
      • Ability to de-identify personally identifiable data
      • Restrict data access (financial data, data residency obligations, …)
      • Provide central management and control of all data security operations
  • 15. Data Protection Example 1: Fine-Grained Protection
      Identifier | Clear | Protected | Authorized Role 1* | Authorized Role 2*
      Name | Joe Smith | csu wusoj | Joe Smith | Joe Smith
      Address | 100 Main Street, Pleasantville, CA | 476 srta coetse, cysieondusbak, HA | 100 Main Street, Pleasantville, CA | “No Access”
      Date of Birth | 12/25/1966 | 01/02/1966 | 12/25/1966 | 01/02/1966
      VIN | VF1112C0000724284 | AB9875R8467364752 | VF1112C0000724284 | “No Access”
      Credit Card Number | 3678 2289 3907 3378 | 3846 2290 3371 3890 | xxxx xxxx xxxx 3378 | 3846 2290 3371 3890
      E-mail Address | joe.smith@surferdude.org | eoe.nwuer@beusorpdqo.aku | joe.smith@surferdude.org | joe.smith@surferdude.org
      Telephone Number | 760-278-3389 | 998-389-2289 | 760-278-3389 | 998-389-2289
      * Role 1 can see most data in the clear; Role 2 can see limited data in the clear.
  • 16. Data Protection Example 2: Data Residency
  • 17. The Next Step: HDFS Federation
      • Name Service 1 (/store1/): Name Node 1 and Name Node 2 (HA)
      • Name Service 2 (/store2/): Name Node 3 and Name Node 4 (HA)
      • Data Nodes 1 to 9: federation with scale-out data nodes; resource usage orchestrated by YARN; block storage; unreadable data
      • Physical isolation, first user group: sees and accesses only /store2; cannot access /store1; cannot access Name Nodes 1 and 2
      • Physical isolation, second user group: sees and accesses only /store1; cannot access /store2; cannot access Name Nodes 3 and 4
      • Privileged users can access both /store1 and /store2 for crossing data
  • 18. Big Data Project Process (diagram labels): Business Use Case; DIRx; POC; PROD Project; Data Exploration; AT; Data Sources Ingestion; Security Policies; Data Processing Development; User Access Authorization; CPT; DIA - Innovation; Front-End Deployment; Admin @BICC; Architecture Development; MEP; Implementation; Deployment Management; DAT.
  • 19. Data Change Management
      • Publish a real-time dashboard of all data flows.
      • For each data source, the producer and the consumers are displayed:
          • Producer: the DIRx generating the data
          • Consumers: all the Big Data applications using this data source
      • If the producer decides to change the data source schema, it has to send a notification from the dashboard to all consumers.
      • Monitoring the logs of data ingestion jobs in real time makes failure detection more reactive and efficient.
  • 20. Collect Logs for (NRT) Carto (diagram labels): Sqoop Metastore (storing jobs); Hive Metastore (storing SQL schemas); Oozie Metastore (storing workflows); Oozie logs; YARN Job History logs; Hue database (storing projects); Logstash with JDBC plugin; Elasticsearch; Kibana.
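
The tokenization idea from slides 14 and 15 (sensitive values replaced by fake values of the same shape, with the real values retrievable only by authorized roles) can be illustrated with a small sketch. This is not Protegrity's API or algorithm: the TokenVault class, the POLICY table, the view function and the role names role_1 and role_2 are all hypothetical, and the in-memory dictionaries stand in for whatever protected store a real product would use.

```python
import secrets
import string


class TokenVault:
    """Toy vault-based tokenizer (hypothetical, for illustration only)."""

    def __init__(self):
        self._token_by_clear = {}
        self._clear_by_token = {}

    @staticmethod
    def _random_like(value):
        # Keep length, separators and character classes so that the token
        # still looks like a phone number, a VIN, a name, and so on.
        out = []
        for ch in value:
            if ch.isdigit():
                out.append(secrets.choice(string.digits))
            elif ch.isalpha():
                pool = string.ascii_uppercase if ch.isupper() else string.ascii_lowercase
                out.append(secrets.choice(pool))
            else:
                out.append(ch)
        return "".join(out)

    def tokenize(self, value):
        # The same clear value always maps to the same token.
        if value in self._token_by_clear:
            return self._token_by_clear[value]
        if not any(ch.isdigit() or ch.isalpha() for ch in value):
            return value  # nothing to substitute
        token = self._random_like(value)
        while token == value or token in self._clear_by_token:
            token = self._random_like(value)
        self._token_by_clear[value] = token
        self._clear_by_token[token] = value
        return token

    def detokenize(self, token):
        # Unknown tokens are returned unchanged.
        return self._clear_by_token.get(token, token)


# Hypothetical per-role policy: "clear", "no_access", or (by default) "token".
POLICY = {
    "role_1": {"name": "clear", "address": "clear", "vin": "clear"},
    "role_2": {"name": "clear", "address": "no_access", "vin": "no_access"},
}


def view(protected_record, role, vault):
    """Return the record as the given role is allowed to see it."""
    rules = POLICY.get(role, {})
    result = {}
    for field, token in protected_record.items():
        rule = rules.get(field, "token")
        if rule == "clear":
            result[field] = vault.detokenize(token)
        elif rule == "no_access":
            result[field] = "No Access"
        else:
            result[field] = token
    return result


vault = TokenVault()
record = {
    "name": "Joe Smith",
    "address": "100 Main Street, Pleasantville, CA",
    "vin": "VF1112C0000724284",
    "phone": "760-278-3389",
}
protected = {field: vault.tokenize(value) for field, value in record.items()}
print(view(protected, "role_1", vault))
print(view(protected, "role_2", vault))
```

Only tokens are stored in the record itself; each role's view is computed at read time, so a role with no rule for a field (here, phone) simply sees the token.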