SlideShare une entreprise Scribd logo
1  sur  28
Télécharger pour lire hors ligne
DataWorks Summit 2018 - Berlin
Building Audi's enterprise big data platform
Matthias Graunitz (AUDI AG, Germany)
Carsten Herbe (Audi Business Innovation GmbH, Germany)
AUDI AG DataWorks Summit 2018 - Building Audi's enterprise big data platform2
WHO ARE WE?
AUDI AG DataWorks Summit 2018 - Building Audi's enterprise big data platform3
Audi Group
Audi, Lamborghini, Ducati and Italdesign
AUDI AG DataWorks Summit 2018 - Building Audi's enterprise big data platform4
Vorsprung is our promise
Strategy 2025
AUDI AG DataWorks Summit 2018 - Building Audi's enterprise big data platform5
Audi Business Innovation GmbH
...is the development, establishment, sales and operation of
innovative concepts, products and services, as well the holding
of shares in the field of future mobility.
Audi mobility
innovations
Audi on demand
Audi balanced
technologies
Audi e-gas
Audi customer
IT solutions
AUDI AG DataWorks Summit 2018 - Building Audi's enterprise big data platform6
About us
Matthias Graunitz
AUDI AG
» Center of Competence Big Data & BI
» Big Data Architect
» 10+ years Data Warehousing & BI
Carsten Herbe
Audi Business Innovation GmbH
» Data Platform & Solution Architecture
» Hadoop since 2013
» 10+ years Data Warehousing & BI
AUDI AG DataWorks Summit 2018 - Building Audi's enterprise big data platform7
2 YEARS AGO…
STARTING BIG DATA
AT AUDI
AUDI AG DataWorks Summit 2018 - Building Audi's enterprise big data platform8
Analytical Capabilities by 2015
! Data Domains
Finance
Purchase
Production
Quality
Sales
Car Data
Programs Projects Data Scientists
Embed Analytics
Analyze Data
Store, Distribute and Process Data
Deliver InformationSecure
Data
Infrastruc-
ture &
Services
Provision Data
Deliver Service
Manage
Infor-
mation
Design &
Maintain
Solutions
Authentifi-
cation
Data
Encryption
Auditing
Complex
Event
Processing
Analyitcal
APIs
Dash-
boarding
Planning &
Simulation
Visual
Analytics
BI Report &
OLAP
Statistical
Methods
Analytical
Script
Data
Warehouse
Analytical
Databases
ETL
Framework
Batch
Processing
Data Access /
APIs
On-Prem
Platform
Application
Deployment
Hardware,
Network, OS
Monitoring
Lifecycle
Mgmt
Development
Process &
Methods
Master Data
Mgmt
Data Lineage
AAP – AUDI ANALYTIC PLATTFORM
AUDI AG DataWorks Summit 2018 - Building Audi's enterprise big data platform9
Analytical Capabilities by 2015
! Data Domains
Finance
Purchase
Production
Quality
Sales
Car Data
Programs Projects Data Scientists
Embed Analytics
Analyze Data
Store, Distribute and Process Data
Deliver InformationSecure
Data
Infrastruc-
ture &
Services
Provision Data
Deliver Service
Manage
Infor-
mation
Design &
Maintain
Solutions
Authentifi-
cation
Data
Encryption
Auditing
Complex
Event
Processing
Analyitcal
APIs
Dash-
boarding
Planning &
Simulation
Visual
Analytics
BI Report &
OLAP
Statistical
Methods
Analytical
Script
Data
Warehouse
Analytical
Databases
ETL
Framework
Batch
Processing
Data Access /
APIs
On-Prem
Platform
Application
Deployment
Hardware,
Network, OS
Monitoring
Lifecycle
Mgmt
Development
Process &
Methods
Master Data
Mgmt
Data Lineage
AAP – AUDI ANALYTIC PLATTFORM
AUDI AG DataWorks Summit 2018 - Building Audi's enterprise big data platform10
Analytical Capabilities by 2015
! Data Domains
Finance
Purchase
Production
Quality
Sales
Car Data
Programs Projects Data Scientists
Embed Analytics
Analyze Data
Store, Distribute and Process Data
Deliver InformationSecure
Data
Infrastruc-
ture &
Services
Provision Data
Deliver Service
Manage
Infor-
mation
Design &
Maintain
Solutions
Authentifi-
cation
Data
Encryption
Auditing
Complex
Event
Processing
Analyitcal
APIs
Dash-
boarding
Planning &
Simulation
Visual
Analytics
BI Report &
OLAP
Statistical
Methods
Analytical
Script
Data
Warehouse
Analytical
Databases
ETL
Framework
Batch
Processing
Data Access /
APIs
On-Prem
Platform
Cloud
Platform
Application
Deployment
Hardware,
Network, OS
Monitoring
Lifecycle
Mgmt
Development
Process &
Methods
Master Data
Mgmt
Data Lineage
AAP – AUDI ANALYTIC PLATTFORM
File Systems
(HDFS)
Stream
Processing
Machine
Learning
AUDI AG DataWorks Summit 2018 - Building Audi's enterprise big data platform11
Our first Hadoop Cluster 2015
Hadoop per node Sum
# data nodes 1 4
RAM 128 GB 0,5 TB
Cores 24 96
HDD* 40 TB 160 TB
DEV
* Raw Capacity without replication and FS overhead!
AUDI AG DataWorks Summit 2018 - Building Audi's enterprise big data platform12
Our first attempt to walk
with Big Data Technologies
SCREWDRIVER
ANALYSIS
COMPANY CAR
ANALYSIS
AUDI AG DataWorks Summit 2018 - Building Audi's enterprise big data platform13
ENTERPRISE INTEGRATION
VS
SPEED OF DELIVERY
AUDI AG DataWorks Summit 2018 - Building Audi's enterprise big data platform14
Securing the Cluster as multi-tenant environment
Step by step by step towards our target architecture …
Access Control:
ACLs
User Management
Local OS users
Basic Security:
iptables + ssh tunneling
Authentication:
LDAP for Hive
Protection from outside:
Knox
Protection from inside
Kerberos
Access Control
File Attributes
Dedicated network:
BI Zone
Access Control & Audit
Ranger
User Management
LDAP
AUDI AG DataWorks Summit 2018 - Building Audi's enterprise big data platform15
Legend: password required no password required next step
Password Hell
Hive
WebHDFS
SparkUI
HDFS/YARNKnox
Audi Active Directory:
[ AD User ]
Named User
Technical Hive User
DATA NODE 1 - X
NAME NODE 1 - 2
EDGE NODE 1 - 2
OS Level
[ Local User ]
OS Named User
Technical Hive User
Technical Project User
Hadoop User
SSH 2
EdgeNode
kinit
Hadoop KDC:
[ Kerberos Principal ]
Name User
Technical Hive User
Technical Project User
Hadoop User
AUDI AG DataWorks Summit 2018 - Building Audi's enterprise big data platform16
DATA
INGESTION
AUDI AG DataWorks Summit 2018 - Building Audi's enterprise big data platform17
Data ingestion:
technical requirements from projects, security and ops
» Streaming data
» Batch data
» easy writing to HDFS/DWH
INGESTION
» Data Sources should not directly be coupled to analytical backend jobs
» This allows adding new analytical jobs without changing the source
DECOUPLING
» Data ingestion must be available 24x7
» Data must be buffered (persisted) in case backend or backend job is not available
HA & BUFFERING
» Source systems must not connect directly to the data zone (Hadoop, DWH) – by IT Sec
» Authentication + Data in motion encryption (multi tenancy)
» Protocol must be auditable
» Some data sources run in the cloudSECURITY
» Amount of data will increase over time for most projects
» Number of projects will increase
SCALABILITY
AUDI AG DataWorks Summit 2018 - Building Audi's enterprise big data platform18
Solution: Kerberized Confluent Kafka Platform
FW
BI
FW
MSG
FW
MSG
FW
SRC
#1
BI Data ZoneData Source network #1 AAP Messaging Zone
authenticationLegend: encrypted (SSL) not encrypted protocol / direction
Data Source network #n FW
SRC
#n
firewall pain point
Schema Registry
HTTP HTTP
none noneKafka Client
Kerberos
BIN / push
Kafka Client
Kerberos
BIN / push
HDFS Connector
BIN / pull
Hadoop KDC
Kerberos
HDFS
Kerberos
Spark Streaming
BIN / pull
Kerberos
DataProxy KDC
Kafka Broker
Kerberos Kerberos
BIN BIN
Zookeeper
Kerberos Kerberos
AUDI AG DataWorks Summit 2018 - Building Audi's enterprise big data platform19
Edge Node
Kafka Distributed Connector: unsecured REST API
User Bob
Connector
Java
Process
Bob‘s Kafka
keytab
Bob‘s HDFS
keytab
HDFS Sink
Bob
HTTP
Bob’s
data
sink
config
Bob
topic Bob
User Eve
sink
config
Eve
File Sink
Eve
Bob’s
data
HDFS
Source
Eve
source
config
Eve
Legend: evil connection good connection
AUDI AG DataWorks Summit 2018 - Building Audi's enterprise big data platform20
TODAY
CURRENT STATE
AUDI AG DataWorks Summit 2018 - Building Audi's enterprise big data platform21
Architecture & Network Zones – Data Ingestion
Data Proxy
BI Data Zone
Messaging Zone
Data Warehouse
System A
System A
HDFS
Connector
Spark
StreamingCloud App
System
System
Legend: encrypted (SSL) not encrypted
AUDI AG DataWorks Summit 2018 - Building Audi's enterprise big data platform22
Architecture & Network Zones – User & Developer Access
PIPE
BI Data Zone
Deployment Zone
BI Application Zone
AAP Data Warehouse
Audi Office LAN
Audi Laptop
Data Mining
Dashboarding
AAP
Remote Desktop
Legend: encrypted (SSL) not encrypted
AUDI AG DataWorks Summit 2018 - Building Audi's enterprise big data platform23
Hadoop Cluster Sizing Production 2017
* Raw Capacity without replication and FS overhead!
Hadoop per node Sum
# data
nodes
1 12
RAM 512 GB 6 TB
Cores 24 288
HDD* 96 TB 9.216 TB
PROD
Kafka per node Sum
# broker
nodes
1 4
RAM 32 GB 128 GB
Cores 6 24
HDD* 4 TB 16 TB
PROD
AUDI AG DataWorks Summit 2018 - Building Audi's enterprise big data platform24
Current state
Organisational Tasks
AUDI AG DataWorks Summit 2018 - Building Audi's enterprise big data platform25
Organisational Tasks
 Data Ownership & Data Governance
(Data Domain Modell with clear responsibility in each domain)
 Lifecycle Management for each Shared Service in strong collaboration
with the projects and programs
 Defined SLAs for each Shared Service based on general availability, data
loss, confidentiality and verifiability
 Different Development Lifecycle between car and backend systems
 Use of Open Source Software and Support requirements from IT
continuity
 Balance between multi tenant environment and flexibility
 Very long lifecycle of cars > 10 years with various built in software
versions
AUDI AG DataWorks Summit 2018 - Building Audi's enterprise big data platform26
TOMORROW
WHAT’S UP NEXT
AUDI AG DataWorks Summit 2018 - Building Audi's enterprise big data platform27
Hybrid Approach for the AAP
Public Cloud
On Premise /
private Cloud
Entry Zone Application Zone Data Zone
Web
GATEWAY
Full Client
(Tableau, BO, etc.)
Web Client
(Tableau, BO, etc.)
HDP
Data
Warehouse
Messaging Zone
Kafka
Internet
RDP
GATEWAY
Business User
Ingestor 1*
Repositories
Knox
Direct Cloud Connect
Swarm VPC
KafkaData
Inventory
Analytical VPC
Ingestor
HDP
Knox
WE ARE HIRING
https://www.audi.com/corporate/de/karriere/einstieg-bei-audi.html
https://karriere.audibusinessinnovation.com/

Contenu connexe

Tendances

DataOps - The Foundation for Your Agile Data Architecture
DataOps - The Foundation for Your Agile Data ArchitectureDataOps - The Foundation for Your Agile Data Architecture
DataOps - The Foundation for Your Agile Data ArchitectureDATAVERSITY
 
Do-It-Yourself (DIY) Data Governance Framework
Do-It-Yourself (DIY) Data Governance FrameworkDo-It-Yourself (DIY) Data Governance Framework
Do-It-Yourself (DIY) Data Governance FrameworkDATAVERSITY
 
Data Strategy Best Practices
Data Strategy Best PracticesData Strategy Best Practices
Data Strategy Best PracticesDATAVERSITY
 
How to Build Data Governance Programs That Last: A Business-First Approach
How to Build Data Governance Programs That Last: A Business-First ApproachHow to Build Data Governance Programs That Last: A Business-First Approach
How to Build Data Governance Programs That Last: A Business-First ApproachPrecisely
 
The ABCs of Treating Data as Product
The ABCs of Treating Data as ProductThe ABCs of Treating Data as Product
The ABCs of Treating Data as ProductDATAVERSITY
 
IIBA® Sydney Unlocking the Power of Low Code No Code: Why BAs Hold the Key
IIBA® Sydney Unlocking the Power of Low Code No Code: Why BAs Hold the KeyIIBA® Sydney Unlocking the Power of Low Code No Code: Why BAs Hold the Key
IIBA® Sydney Unlocking the Power of Low Code No Code: Why BAs Hold the KeyAustraliaChapterIIBA
 
Data Modeling & Metadata Management
Data Modeling & Metadata ManagementData Modeling & Metadata Management
Data Modeling & Metadata ManagementDATAVERSITY
 
From Insights to Action, How to build and maintain a Data Driven Organization...
From Insights to Action, How to build and maintain a Data Driven Organization...From Insights to Action, How to build and maintain a Data Driven Organization...
From Insights to Action, How to build and maintain a Data Driven Organization...Amazon Web Services Korea
 
DAS Slides: Master Data Management – Aligning Data, Process, and Governance
DAS Slides: Master Data Management – Aligning Data, Process, and GovernanceDAS Slides: Master Data Management – Aligning Data, Process, and Governance
DAS Slides: Master Data Management – Aligning Data, Process, and GovernanceDATAVERSITY
 
Enterprise Data Architecture Deliverables
Enterprise Data Architecture DeliverablesEnterprise Data Architecture Deliverables
Enterprise Data Architecture DeliverablesLars E Martinsson
 
Real-time personalization at scale by Salesforce CDP and Interaction Studio, ...
Real-time personalization at scale by Salesforce CDP and Interaction Studio, ...Real-time personalization at scale by Salesforce CDP and Interaction Studio, ...
Real-time personalization at scale by Salesforce CDP and Interaction Studio, ...CzechDreamin
 
Big data on google cloud
Big data on google cloudBig data on google cloud
Big data on google cloudTu Pham
 
Master Data Management – Aligning Data, Process, and Governance
Master Data Management – Aligning Data, Process, and GovernanceMaster Data Management – Aligning Data, Process, and Governance
Master Data Management – Aligning Data, Process, and GovernanceDATAVERSITY
 
Data Strategy
Data StrategyData Strategy
Data Strategysabnees
 
Data Governance Best Practices
Data Governance Best PracticesData Governance Best Practices
Data Governance Best PracticesBoris Otto
 
DAS Slides: Data Modeling Case Study — Business Data Modeling at Kiewit
DAS Slides: Data Modeling Case Study — Business Data Modeling at KiewitDAS Slides: Data Modeling Case Study — Business Data Modeling at Kiewit
DAS Slides: Data Modeling Case Study — Business Data Modeling at KiewitDATAVERSITY
 
CDMP Overview Professional Information Management Certification
CDMP Overview Professional Information Management CertificationCDMP Overview Professional Information Management Certification
CDMP Overview Professional Information Management CertificationChristopher Bradley
 

Tendances (20)

DataOps - The Foundation for Your Agile Data Architecture
DataOps - The Foundation for Your Agile Data ArchitectureDataOps - The Foundation for Your Agile Data Architecture
DataOps - The Foundation for Your Agile Data Architecture
 
Do-It-Yourself (DIY) Data Governance Framework
Do-It-Yourself (DIY) Data Governance FrameworkDo-It-Yourself (DIY) Data Governance Framework
Do-It-Yourself (DIY) Data Governance Framework
 
Data Strategy Best Practices
Data Strategy Best PracticesData Strategy Best Practices
Data Strategy Best Practices
 
How to Build Data Governance Programs That Last: A Business-First Approach
How to Build Data Governance Programs That Last: A Business-First ApproachHow to Build Data Governance Programs That Last: A Business-First Approach
How to Build Data Governance Programs That Last: A Business-First Approach
 
The ABCs of Treating Data as Product
The ABCs of Treating Data as ProductThe ABCs of Treating Data as Product
The ABCs of Treating Data as Product
 
Ebook - The Guide to Master Data Management
Ebook - The Guide to Master Data Management Ebook - The Guide to Master Data Management
Ebook - The Guide to Master Data Management
 
IIBA® Sydney Unlocking the Power of Low Code No Code: Why BAs Hold the Key
IIBA® Sydney Unlocking the Power of Low Code No Code: Why BAs Hold the KeyIIBA® Sydney Unlocking the Power of Low Code No Code: Why BAs Hold the Key
IIBA® Sydney Unlocking the Power of Low Code No Code: Why BAs Hold the Key
 
Data Modeling & Metadata Management
Data Modeling & Metadata ManagementData Modeling & Metadata Management
Data Modeling & Metadata Management
 
From Insights to Action, How to build and maintain a Data Driven Organization...
From Insights to Action, How to build and maintain a Data Driven Organization...From Insights to Action, How to build and maintain a Data Driven Organization...
From Insights to Action, How to build and maintain a Data Driven Organization...
 
DAS Slides: Master Data Management – Aligning Data, Process, and Governance
DAS Slides: Master Data Management – Aligning Data, Process, and GovernanceDAS Slides: Master Data Management – Aligning Data, Process, and Governance
DAS Slides: Master Data Management – Aligning Data, Process, and Governance
 
Enterprise Data Architecture Deliverables
Enterprise Data Architecture DeliverablesEnterprise Data Architecture Deliverables
Enterprise Data Architecture Deliverables
 
Real-time personalization at scale by Salesforce CDP and Interaction Studio, ...
Real-time personalization at scale by Salesforce CDP and Interaction Studio, ...Real-time personalization at scale by Salesforce CDP and Interaction Studio, ...
Real-time personalization at scale by Salesforce CDP and Interaction Studio, ...
 
Big data on google cloud
Big data on google cloudBig data on google cloud
Big data on google cloud
 
Master Data Management – Aligning Data, Process, and Governance
Master Data Management – Aligning Data, Process, and GovernanceMaster Data Management – Aligning Data, Process, and Governance
Master Data Management – Aligning Data, Process, and Governance
 
Data Strategy
Data StrategyData Strategy
Data Strategy
 
Data Governance Best Practices
Data Governance Best PracticesData Governance Best Practices
Data Governance Best Practices
 
Cloud Transformation
Cloud TransformationCloud Transformation
Cloud Transformation
 
DAS Slides: Data Modeling Case Study — Business Data Modeling at Kiewit
DAS Slides: Data Modeling Case Study — Business Data Modeling at KiewitDAS Slides: Data Modeling Case Study — Business Data Modeling at Kiewit
DAS Slides: Data Modeling Case Study — Business Data Modeling at Kiewit
 
CDMP Overview Professional Information Management Certification
CDMP Overview Professional Information Management CertificationCDMP Overview Professional Information Management Certification
CDMP Overview Professional Information Management Certification
 
8 Steps to Creating a Data Strategy
8 Steps to Creating a Data Strategy8 Steps to Creating a Data Strategy
8 Steps to Creating a Data Strategy
 

Similaire à Audi's Big Data Platform - Building Audi's Enterprise Big Data Platform

Audi‘s Hadoop Journey into the Hybrid Cloud
Audi‘s Hadoop Journey into the Hybrid CloudAudi‘s Hadoop Journey into the Hybrid Cloud
Audi‘s Hadoop Journey into the Hybrid CloudDataWorks Summit
 
Get Started Quickly with IBM's Hadoop as a Service
Get Started Quickly with IBM's Hadoop as a ServiceGet Started Quickly with IBM's Hadoop as a Service
Get Started Quickly with IBM's Hadoop as a ServiceIBM Cloud Data Services
 
Achieving Massive Concurrency & Sub-second Query Latency on Cloud Warehouses ...
Achieving Massive Concurrency & Sub-second Query Latency on Cloud Warehouses ...Achieving Massive Concurrency & Sub-second Query Latency on Cloud Warehouses ...
Achieving Massive Concurrency & Sub-second Query Latency on Cloud Warehouses ...Alluxio, Inc.
 
Challenges of Deep Learning in the Automotive Industry and Autonomous Driving
Challenges of Deep Learning in the Automotive Industry and Autonomous DrivingChallenges of Deep Learning in the Automotive Industry and Autonomous Driving
Challenges of Deep Learning in the Automotive Industry and Autonomous DrivingJan Wiegelmann
 
Making Hadoop Ready for the Enterprise
Making Hadoop Ready for the Enterprise Making Hadoop Ready for the Enterprise
Making Hadoop Ready for the Enterprise DataWorks Summit
 
Paris FOD Meetup #5 Cognizant Presentation
Paris FOD Meetup #5 Cognizant PresentationParis FOD Meetup #5 Cognizant Presentation
Paris FOD Meetup #5 Cognizant PresentationAbdelkrim Hadjidj
 
IBM Smarter Analytics
IBM Smarter AnalyticsIBM Smarter Analytics
IBM Smarter AnalyticsAdrian Turcu
 
NA Adabas & Natural User Group Meeting April 2023
NA Adabas & Natural User Group Meeting April 2023NA Adabas & Natural User Group Meeting April 2023
NA Adabas & Natural User Group Meeting April 2023Software AG
 
Using Visualization to Succeed with Big Data
Using Visualization to Succeed with Big Data Using Visualization to Succeed with Big Data
Using Visualization to Succeed with Big Data Pactera_US
 
2016 Sept 1st - IBM Consultants & System Integrators Interchange - Big Data -...
2016 Sept 1st - IBM Consultants & System Integrators Interchange - Big Data -...2016 Sept 1st - IBM Consultants & System Integrators Interchange - Big Data -...
2016 Sept 1st - IBM Consultants & System Integrators Interchange - Big Data -...Anand Haridass
 
Customer Presentation - IBM Cloud Pak for Data Overview (Level 100).PPTX
Customer Presentation - IBM Cloud Pak for Data Overview (Level 100).PPTXCustomer Presentation - IBM Cloud Pak for Data Overview (Level 100).PPTX
Customer Presentation - IBM Cloud Pak for Data Overview (Level 100).PPTXtsigitnist02
 
Case Study: Creating a DocOps/Docs-As-Code DevPortal for C3.ai
Case Study: Creating a DocOps/Docs-As-Code DevPortal for C3.aiCase Study: Creating a DocOps/Docs-As-Code DevPortal for C3.ai
Case Study: Creating a DocOps/Docs-As-Code DevPortal for C3.aiPronovix
 
Case Study - Gordon Foods Delivers Fresh Data to the Cloud
Case Study - Gordon Foods Delivers Fresh Data to the CloudCase Study - Gordon Foods Delivers Fresh Data to the Cloud
Case Study - Gordon Foods Delivers Fresh Data to the CloudDATAVERSITY
 
A Data Fabric for All Things Intelligent
A Data Fabric for All Things IntelligentA Data Fabric for All Things Intelligent
A Data Fabric for All Things IntelligentDenodo
 
Datenstrategie der Zukunft - Technologietrends, die Sie kennen müssen
Datenstrategie der Zukunft - Technologietrends, die Sie kennen müssenDatenstrategie der Zukunft - Technologietrends, die Sie kennen müssen
Datenstrategie der Zukunft - Technologietrends, die Sie kennen müssenDenodo
 
Schnellere Digitalisierung mit einer cloudbasierten Datenstrategie
Schnellere Digitalisierung mit einer cloudbasierten DatenstrategieSchnellere Digitalisierung mit einer cloudbasierten Datenstrategie
Schnellere Digitalisierung mit einer cloudbasierten DatenstrategieMongoDB
 
The Eco-System of AI and How to Use It
The Eco-System of AI and How to Use ItThe Eco-System of AI and How to Use It
The Eco-System of AI and How to Use Itinside-BigData.com
 
Generali connection platform_full
Generali connection platform_fullGenerali connection platform_full
Generali connection platform_fullconfluent
 

Similaire à Audi's Big Data Platform - Building Audi's Enterprise Big Data Platform (20)

Audi‘s Hadoop Journey into the Hybrid Cloud
Audi‘s Hadoop Journey into the Hybrid CloudAudi‘s Hadoop Journey into the Hybrid Cloud
Audi‘s Hadoop Journey into the Hybrid Cloud
 
Get Started Quickly with IBM's Hadoop as a Service
Get Started Quickly with IBM's Hadoop as a ServiceGet Started Quickly with IBM's Hadoop as a Service
Get Started Quickly with IBM's Hadoop as a Service
 
Achieving Massive Concurrency & Sub-second Query Latency on Cloud Warehouses ...
Achieving Massive Concurrency & Sub-second Query Latency on Cloud Warehouses ...Achieving Massive Concurrency & Sub-second Query Latency on Cloud Warehouses ...
Achieving Massive Concurrency & Sub-second Query Latency on Cloud Warehouses ...
 
Challenges of Deep Learning in the Automotive Industry and Autonomous Driving
Challenges of Deep Learning in the Automotive Industry and Autonomous DrivingChallenges of Deep Learning in the Automotive Industry and Autonomous Driving
Challenges of Deep Learning in the Automotive Industry and Autonomous Driving
 
Making Hadoop Ready for the Enterprise
Making Hadoop Ready for the Enterprise Making Hadoop Ready for the Enterprise
Making Hadoop Ready for the Enterprise
 
Paris FOD Meetup #5 Cognizant Presentation
Paris FOD Meetup #5 Cognizant PresentationParis FOD Meetup #5 Cognizant Presentation
Paris FOD Meetup #5 Cognizant Presentation
 
IBM Smarter Analytics
IBM Smarter AnalyticsIBM Smarter Analytics
IBM Smarter Analytics
 
NA Adabas & Natural User Group Meeting April 2023
NA Adabas & Natural User Group Meeting April 2023NA Adabas & Natural User Group Meeting April 2023
NA Adabas & Natural User Group Meeting April 2023
 
Using Visualization to Succeed with Big Data
Using Visualization to Succeed with Big Data Using Visualization to Succeed with Big Data
Using Visualization to Succeed with Big Data
 
2016 Sept 1st - IBM Consultants & System Integrators Interchange - Big Data -...
2016 Sept 1st - IBM Consultants & System Integrators Interchange - Big Data -...2016 Sept 1st - IBM Consultants & System Integrators Interchange - Big Data -...
2016 Sept 1st - IBM Consultants & System Integrators Interchange - Big Data -...
 
Meetup Spark UDF performance
Meetup Spark UDF performanceMeetup Spark UDF performance
Meetup Spark UDF performance
 
Customer Presentation - IBM Cloud Pak for Data Overview (Level 100).PPTX
Customer Presentation - IBM Cloud Pak for Data Overview (Level 100).PPTXCustomer Presentation - IBM Cloud Pak for Data Overview (Level 100).PPTX
Customer Presentation - IBM Cloud Pak for Data Overview (Level 100).PPTX
 
Modern Thinking área digital MSKM 21/09/2017
Modern Thinking área digital MSKM 21/09/2017Modern Thinking área digital MSKM 21/09/2017
Modern Thinking área digital MSKM 21/09/2017
 
Case Study: Creating a DocOps/Docs-As-Code DevPortal for C3.ai
Case Study: Creating a DocOps/Docs-As-Code DevPortal for C3.aiCase Study: Creating a DocOps/Docs-As-Code DevPortal for C3.ai
Case Study: Creating a DocOps/Docs-As-Code DevPortal for C3.ai
 
Case Study - Gordon Foods Delivers Fresh Data to the Cloud
Case Study - Gordon Foods Delivers Fresh Data to the CloudCase Study - Gordon Foods Delivers Fresh Data to the Cloud
Case Study - Gordon Foods Delivers Fresh Data to the Cloud
 
A Data Fabric for All Things Intelligent
A Data Fabric for All Things IntelligentA Data Fabric for All Things Intelligent
A Data Fabric for All Things Intelligent
 
Datenstrategie der Zukunft - Technologietrends, die Sie kennen müssen
Datenstrategie der Zukunft - Technologietrends, die Sie kennen müssenDatenstrategie der Zukunft - Technologietrends, die Sie kennen müssen
Datenstrategie der Zukunft - Technologietrends, die Sie kennen müssen
 
Schnellere Digitalisierung mit einer cloudbasierten Datenstrategie
Schnellere Digitalisierung mit einer cloudbasierten DatenstrategieSchnellere Digitalisierung mit einer cloudbasierten Datenstrategie
Schnellere Digitalisierung mit einer cloudbasierten Datenstrategie
 
The Eco-System of AI and How to Use It
The Eco-System of AI and How to Use ItThe Eco-System of AI and How to Use It
The Eco-System of AI and How to Use It
 
Generali connection platform_full
Generali connection platform_fullGenerali connection platform_full
Generali connection platform_full
 

Plus de DataWorks Summit

Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureDataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudDataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 

Plus de DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Dernier

Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 

Dernier (20)

Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 

Audi's Big Data Platform - Building Audi's Enterprise Big Data Platform

  • 1. DataWorks Summit 2018 - Berlin Building Audi's enterprise big data platform Matthias Graunitz (AUDI AG, Germany) Carsten Herbe (Audi Business Innovation GmbH, Germany)
  • 2. AUDI AG DataWorks Summit 2018 - Building Audi's enterprise big data platform2 WHO ARE WE?
  • 3. AUDI AG DataWorks Summit 2018 - Building Audi's enterprise big data platform3 Audi Group Audi, Lamborghini, Ducati and Italdesign
  • 4. AUDI AG DataWorks Summit 2018 - Building Audi's enterprise big data platform4 Vorsprung is our promise Strategy 2025
  • 5. AUDI AG DataWorks Summit 2018 - Building Audi's enterprise big data platform5 Audi Business Innovation GmbH ...is the development, establishment, sales and operation of innovative concepts, products and services, as well the holding of shares in the field of future mobility. Audi mobility innovations Audi on demand Audi balanced technologies Audi e-gas Audi customer IT solutions
  • 6. AUDI AG DataWorks Summit 2018 - Building Audi's enterprise big data platform6 About us Matthias Graunitz AUDI AG » Center of Competence Big Data & BI » Big Data Architect » 10+ years Data Warehousing & BI Carsten Herbe Audi Business Innovation GmbH » Data Platform & Solution Architecture » Hadoop since 2013 » 10+ years Data Warehousing & BI
  • 7. AUDI AG DataWorks Summit 2018 - Building Audi's enterprise big data platform7 2 YEARS AGO… STARTING BIG DATA AT AUDI
  • 8. AUDI AG DataWorks Summit 2018 - Building Audi's enterprise big data platform8 Analytical Capabilities by 2015 ! Data Domains Finance Purchase Production Quality Sales Car Data Programs Projects Data Scientists Embed Analytics Analyze Data Store, Distribute and Process Data Deliver InformationSecure Data Infrastruc- ture & Services Provision Data Deliver Service Manage Infor- mation Design & Maintain Solutions Authentifi- cation Data Encryption Auditing Complex Event Processing Analyitcal APIs Dash- boarding Planning & Simulation Visual Analytics BI Report & OLAP Statistical Methods Analytical Script Data Warehouse Analytical Databases ETL Framework Batch Processing Data Access / APIs On-Prem Platform Application Deployment Hardware, Network, OS Monitoring Lifecycle Mgmt Development Process & Methods Master Data Mgmt Data Lineage AAP – AUDI ANALYTIC PLATTFORM
  • 9. AUDI AG DataWorks Summit 2018 - Building Audi's enterprise big data platform9 Analytical Capabilities by 2015 ! Data Domains Finance Purchase Production Quality Sales Car Data Programs Projects Data Scientists Embed Analytics Analyze Data Store, Distribute and Process Data Deliver InformationSecure Data Infrastruc- ture & Services Provision Data Deliver Service Manage Infor- mation Design & Maintain Solutions Authentifi- cation Data Encryption Auditing Complex Event Processing Analyitcal APIs Dash- boarding Planning & Simulation Visual Analytics BI Report & OLAP Statistical Methods Analytical Script Data Warehouse Analytical Databases ETL Framework Batch Processing Data Access / APIs On-Prem Platform Application Deployment Hardware, Network, OS Monitoring Lifecycle Mgmt Development Process & Methods Master Data Mgmt Data Lineage AAP – AUDI ANALYTIC PLATTFORM
  • 10. AUDI AG DataWorks Summit 2018 - Building Audi's enterprise big data platform10 Analytical Capabilities by 2015 ! Data Domains Finance Purchase Production Quality Sales Car Data Programs Projects Data Scientists Embed Analytics Analyze Data Store, Distribute and Process Data Deliver InformationSecure Data Infrastruc- ture & Services Provision Data Deliver Service Manage Infor- mation Design & Maintain Solutions Authentifi- cation Data Encryption Auditing Complex Event Processing Analyitcal APIs Dash- boarding Planning & Simulation Visual Analytics BI Report & OLAP Statistical Methods Analytical Script Data Warehouse Analytical Databases ETL Framework Batch Processing Data Access / APIs On-Prem Platform Cloud Platform Application Deployment Hardware, Network, OS Monitoring Lifecycle Mgmt Development Process & Methods Master Data Mgmt Data Lineage AAP – AUDI ANALYTIC PLATTFORM File Systems (HDFS) Stream Processing Machine Learning
  • 11. AUDI AG DataWorks Summit 2018 - Building Audi's enterprise big data platform11 Our first Hadoop Cluster 2015 Hadoop per node Sum # data nodes 1 4 RAM 128 GB 0,5 TB Cores 24 96 HDD* 40 TB 160 TB DEV * Raw Capacity without replication and FS overhead!
  • 12. AUDI AG DataWorks Summit 2018 - Building Audi's enterprise big data platform12 Our first attempt to walk with Big Data Technologies SCREWDRIVER ANALYSIS COMPANY CAR ANALYSIS
  • 13. AUDI AG DataWorks Summit 2018 - Building Audi's enterprise big data platform13 ENTERPRISE INTEGRATION VS SPEED OF DELIVERY
  • 14. AUDI AG DataWorks Summit 2018 - Building Audi's enterprise big data platform14 Securing the Cluster as multi-tenant environment Step by step by step towards our target architecture … Access Control: ACLs User Management Local OS users Basic Security: iptables + ssh tunneling Authentication: LDAP for Hive Protection from outside: Knox Protection from inside Kerberos Access Control File Attributes Dedicated network: BI Zone Access Control & Audit Ranger User Management LDAP
  • 15. AUDI AG DataWorks Summit 2018 - Building Audi's enterprise big data platform15 Legend: password required no password required next step Password Hell Hive WebHDFS SparkUI HDFS/YARNKnox Audi Active Directory: [ AD User ] Named User Technical Hive User DATA NODE 1 - X NAME NODE 1 - 2 EDGE NODE 1 - 2 OS Level [ Local User ] OS Named User Technical Hive User Technical Project User Hadoop User SSH 2 EdgeNode kinit Hadoop KDC: [ Kerberos Principal ] Name User Technical Hive User Technical Project User Hadoop User
  • 16. AUDI AG DataWorks Summit 2018 - Building Audi's enterprise big data platform16 DATA INGESTION
  • 17. AUDI AG DataWorks Summit 2018 - Building Audi's enterprise big data platform17 Data ingestion: technical requirements from projects, security and ops » Streaming data » Batch data » easy writing to HDFS/DWH INGESTION » Data Sources should not directly be coupled to analytical backend jobs » This allows adding new analytical jobs without changing the source DECOUPLING » Data ingestion must be available 24x7 » Data must be buffered (persisted) in case backend or backend job is not available HA & BUFFERING » Source systems must not connect directly to the data zone (Hadoop, DWH) – by IT Sec » Authentication + Data in motion encryption (multi tenancy) » Protocol must be auditable » Some data sources run in the cloudSECURITY » Amount of data will increase over time for most projects » Number of projects will increase SCALABILITY
  • 18. AUDI AG DataWorks Summit 2018 - Building Audi's enterprise big data platform18 Solution: Kerberized Confluent Kafka Platform FW BI FW MSG FW MSG FW SRC #1 BI Data ZoneData Source network #1 AAP Messaging Zone authenticationLegend: encrypted (SSL) not encrypted protocol / direction Data Source network #n FW SRC #n firewall pain point Schema Registry HTTP HTTP none noneKafka Client Kerberos BIN / push Kafka Client Kerberos BIN / push HDFS Connector BIN / pull Hadoop KDC Kerberos HDFS Kerberos Spark Streaming BIN / pull Kerberos DataProxy KDC Kafka Broker Kerberos Kerberos BIN BIN Zookeeper Kerberos Kerberos
  • 19. AUDI AG DataWorks Summit 2018 - Building Audi's enterprise big data platform19 Edge Node Kafka Distributed Connector: unsecured REST API User Bob Connector Java Process Bob‘s Kafka keytab Bob‘s HDFS keytab HDFS Sink Bob HTTP Bob’s data sink config Bob topic Bob User Eve sink config Eve File Sink Eve Bob’s data HDFS Source Eve source config Eve Legend: evil connection good connection
  • 20. AUDI AG DataWorks Summit 2018 - Building Audi's enterprise big data platform20 TODAY CURRENT STATE
  • 21. AUDI AG DataWorks Summit 2018 - Building Audi's enterprise big data platform21 Architecture & Network Zones – Data Ingestion Data Proxy BI Data Zone Messaging Zone Data Warehouse System A System A HDFS Connector Spark StreamingCloud App System System Legend: encrypted (SSL) not encrypted
  • 22. AUDI AG DataWorks Summit 2018 - Building Audi's enterprise big data platform22 Architecture & Network Zones – User & Developer Access PIPE BI Data Zone Deployment Zone BI Application Zone AAP Data Warehouse Audi Office LAN Audi Laptop Data Mining Dashboarding AAP Remote Desktop Legend: encrypted (SSL) not encrypted
  • 23. AUDI AG DataWorks Summit 2018 - Building Audi's enterprise big data platform23 Hadoop Cluster Sizing Production 2017 * Raw Capacity without replication and FS overhead! Hadoop per node Sum # data nodes 1 12 RAM 512 GB 6 TB Cores 24 288 HDD* 96 TB 9.216 TB PROD Kafka per node Sum # broker nodes 1 4 RAM 32 GB 128 GB Cores 6 24 HDD* 4 TB 16 TB PROD
  • 24. AUDI AG DataWorks Summit 2018 - Building Audi's enterprise big data platform24 Current state Organisational Tasks
  • 25. AUDI AG DataWorks Summit 2018 - Building Audi's enterprise big data platform25 Organisational Tasks  Data Ownership & Data Governance (Data Domain Modell with clear responsibility in each domain)  Lifecycle Management for each Shared Service in strong collaboration with the projects and programs  Defined SLAs for each Shared Service based on general availability, data loss, confidentiality and verifiability  Different Development Lifecycle between car and backend systems  Use of Open Source Software and Support requirements from IT continuity  Balance between multi tenant environment and flexibility  Very long lifecycle of cars > 10 years with various built in software versions
  • 26. AUDI AG DataWorks Summit 2018 - Building Audi's enterprise big data platform26 TOMORROW WHAT’S UP NEXT
  • 27. AUDI AG DataWorks Summit 2018 - Building Audi's enterprise big data platform27 Hybrid Approach for the AAP Public Cloud On Premise / private Cloud Entry Zone Application Zone Data Zone Web GATEWAY Full Client (Tableau, BO, etc.) Web Client (Tableau, BO, etc.) HDP Data Warehouse Messaging Zone Kafka Internet RDP GATEWAY Business User Ingestor 1* Repositories Knox Direct Cloud Connect Swarm VPC KafkaData Inventory Analytical VPC Ingestor HDP Knox