This document summarizes Audi's journey in building a hybrid Hadoop platform between their on-premise data centers and the AWS cloud. It describes how Audi formed an agile team with internal and external experts to build out the Hybrid Audi Analytic Platform (HAAP) using technologies like Hadoop, Kafka, and FreeIPA across environments. The project aimed to provide a single platform and user experience spanning on-premise and cloud while taking advantage of cloud scalability and functionality. Lessons learned included the need for strong automation, security considerations, and knowledge sharing between distributed teams.
Audi's Hybrid Hadoop Journey
1. DataWorks Summit 2019 - Barcelona
Audi’s Hadoop Journey into the Hybrid Cloud
Carsten Herbe (Audi Business Innovation GmbH, Germany)
2. AUDI AG DataWorks Summit Barcelona 2019 - Audi‘s Hadoop Journey into the Hybrid Cloud – Carsten Herbe2
About us
3. Audi AG
1.8 million cars per year*, 90,000 employees worldwide*
* source: https://www.audi.com/de/company.html
4. Audi Business Innovation GmbH
Munich-based subsidiary of Audi AG
• Audi mobility innovations
• Audi on demand
• Audi balanced technologies
• Audi e-gas
• Audi customer IT solutions
Carsten Herbe
Audi Business Innovation GmbH
» Data Platform & Solution Architecture
» Technical Product Owner & Architect for Cloud Hadoop
» 5 years Hadoop, 3 years Kafka, 1 year AWS
» 10+ years Data Warehousing & BI
5. HAAP – Hybrid Audi Analytic Platform
Big Data Capabilities & Focus data domains
[Capability map diagram; labels as extracted]
Data domains
• Finance, Purchase, Production, Quality, Sales, Car Data
Users
• Programs, Projects, Data Scientists
Capability areas
• Embed Analytics
• Analyze Data
• Store, Distribute and Process Data
• Deliver Information
• Provision Data
• Deliver Service
• Manage Information
• Design & Maintain Solutions
• Security
• Infrastructure & Services
Capabilities
• Security: Authentication, Data Encryption, Auditing
• Analytics: Complex Event Processing, Analytical APIs, Dashboarding, Planning & Simulation, Visual Analytics, BI Report & OLAP, Statistical Methods, Analytical Script, Machine Learning
• Data storage and processing: Data Warehouse, Analytical Databases, ETL Framework, Batch Processing, Stream Processing, File Systems (HDFS), Data Access / APIs
• Platform and operations: On-Prem Platform, Cloud Platform, Application Deployment, Hardware, Network, OS, Monitoring, Lifecycle Mgmt
• Governance and methods: Development Process & Methods, Master Data Mgmt, Data Lineage
6. Why cloud?
7. Audi’s motivation to extend its Hadoop platform to the cloud
Data “Locality”
• Audi is moving many applications to the cloud
• Data of one important use case is already in the cloud
Scalability
• Scaling clusters: number of nodes, node types, …
• Scaling stages: testing new features, upgrades, …
Functionality
• Adding nodes with GPUs
• Use a more flexible staging process
• Cloud services: S3, RDS, Docker Registry, …
• Reducing work on infrastructure
8. Goals
One platform as a hybrid solution
Hybrid
• Some related systems are currently only on-premise: DWH, Reporting Tool, …
• Some data sources remain on-premise (e.g. manufacturing)
One platform
• Write once, run everywhere: identical tech stack
• Single sign-on: on-prem principals used for cloud
• Data: easy data movement & shared metadata
9. Project Setup
10. Team setup & project mode
Mixed Team
• Companies: internal (Audi + ABI) + external (2 partners + HWX)
• Bases: 4 cities in 2 countries
• Nationalities: 5 different nationalities
Collaboration
• Scrum based
• Weekly 2-day on-site workshop at the Audi project office
• Tools: Jira, Bitbucket, RocketChat
Goals
• Get experts on various topics (DevOps, Hadoop, AWS) together
• Knowledge transfer from external to internal
12. Choice of Technologies
13. Finding the best fitting tech stack for Audi
AWS infrastructure setup
• Options: CloudFormation, Terraform
• Choice: Terraform
• Reason: already used by other projects
Configuration management
• Options: Terraform + Bash, Ansible, …
• Choice: Ansible
• Reasons: switched from Bash as complexity increased; already used by other projects
Hadoop deployment
• Options: Ambari Blueprints, Cloudbreak
• Choice: Ambari Blueprints
• Reasons: Cloudbreak is difficult to integrate into the existing environment; no versioning with Cloudbreak yet
User management
• Options: local users managed manually, integration with corporate AD/LDAP, our own FreeIPA
• Choice: FreeIPA
• Reasons: AD integration was not possible (yet); highest flexibility (+AD later); DNS, Certificate Authority
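The versioning argument for Ambari Blueprints can be illustrated with a minimal sketch: a blueprint is a JSON document, so it can be generated and kept in a Git repository such as Bitbucket next to the Terraform and Ansible code. The top-level keys (`Blueprints`, `host_groups`, `stack_name`, `stack_version`, `cardinality`, `components`) follow the Ambari blueprint format. The blueprint name `caap-dev`, the stack version and the component-to-host-group mapping are assumptions made for illustration only; they are not taken from the talk and do not describe a complete cluster.

```python
import json


def build_blueprint(name, stack_version, host_groups):
    """Return an Ambari blueprint document as a plain dict.

    host_groups maps a group name to (cardinality, [component names]).
    """
    return {
        "Blueprints": {
            "blueprint_name": name,
            "stack_name": "HDP",
            "stack_version": stack_version,
        },
        "host_groups": [
            {
                "name": group,
                "cardinality": str(cardinality),
                "components": [{"name": c} for c in components],
            }
            # Sort the groups so the generated file is stable under version control.
            for group, (cardinality, components) in sorted(host_groups.items())
        ],
    }


# Hypothetical layout, loosely modelled on master, data and edge node roles.
HOST_GROUPS = {
    "master": (1, ["NAMENODE", "RESOURCEMANAGER", "ZOOKEEPER_SERVER"]),
    "data": (3, ["DATANODE", "NODEMANAGER"]),
    "edge": (1, ["HDFS_CLIENT", "YARN_CLIENT"]),
}

# "caap-dev" and "2.6" are placeholder values, not the project's settings.
blueprint = build_blueprint("caap-dev", "2.6", HOST_GROUPS)
blueprint_json = json.dumps(blueprint, indent=2, sort_keys=True)
```

Because the output is deterministic (sorted groups, sorted keys), a change to the cluster layout shows up as a small, reviewable diff.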
14. Hybrid Architecture
15. HAAP Architecture – Big Picture
[Architecture diagram; labels as extracted]
• Areas: on premise, public cloud
• Zones: AAP messaging zone, AAP data zone, AAP BI App Zone
• AWS: AWS Frankfurt – CAAP VPC, AWS Frankfurt – Hub VPC, AWS Ireland
• Components: Kafka (×2), Data Warehouse, Tableau, HDP, KDC (×3), Splunk, CAAP, FreeIPA, Deploy Automation
• Firewalls and connectivity: FW XTR (×2), FW LSZ (×2), FW Cloud, DXC
16. High-level AWS network architecture
[Network diagram; labels as extracted]
• Cloud: hub VPC, Spoke VPC A, Spoke VPC B, Spoke VPC C, Spoke VPC D, VPG (×4), Cisco Router
• Connection: Direct Connect
• On-Premise: FW Cloud, WAN Distri
17. Cloud Hadoop Platform: detailed view
[Architecture diagram; labels as extracted]
• VPCs: mgmt VPC, hub VPC, blue VPC
• Subnets: mgmt public subnet, mgmt private subnet, blue public subnet, blue private hdp subnet, blue private rds subnet
• Hosts: bastion, deploy, FreeIPA, Ambari, KDC, Edge 1, Master 1, Data 1, Data 2, Data 3, LLAP 1
• Security groups: SG bastion, SG deploy, SG edge, SG IDM, SG master, SG worker, SG Ambari, SG KDC, SG hdp
• Network: Cisco Router, DXC, VPG (×2), IGW (×2), NAT GW (×2), S3 endpoint (×2)
• AWS services: RDS Postgres, ECR registry, S3 (terraform state, backup, projects), CloudWatch, CloudTrail, IAM
18. User Management & Kerberos Trust
[Trust diagram; labels as extracted]
Kerberos realms
• on-prem DEV: MIT KDC, DEV.AUDI.VWG
• on-prem PRD: MIT KDC, PRD.AUDI.VWG
• Cloud DEV: MIT KDC, DEV.CAAP.AUDI.VWG
• Cloud PRD: MIT KDC, PRD.CAAP.AUDI.VWG
• FreeIPA: KDC, CAAP.AUDI.VWG
• one-way trust (×4)
LDAP
• carsten: <dev>
• carsten-adm: <dev, prd>
OS user management
• OS: local user mgmt (×2)
• OS: FreeIPA user integration (×2)
Example commands
> kinit carsten@DEV.AUDI.VWG
> hdfs dfs -ls //ONPREMDEV:8020/user/carsten
> hdfs dfs -ls //CLOUDDEV:8020/user/carsten
> kinit carsten@CAAP.AUDI.VWG
> hdfs dfs -ls //CLOUDDEV:8020/user/carsten
> hdfs dfs -ls //CLOUDPRD:8020/user/carsten
> hdfs dfs -ls //ONPREMDEV:8020/user/carsten
[Result marks on the slide, in extraction order: ✓ ✗ ✓ ✓ ✓ ✓ ✗]
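The combination of one-way trusts and per-stage user entries can be sketched as a small access check. This is an illustrative model only, not the platform's implementation. The realm names, cluster names and LDAP entries come from the slide. The direction of the trusts is an assumption (the slide text does not state it), as is the rule that a user needs both a trusted realm and a matching stage entry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    name: str
    realm: str  # Kerberos realm of the cluster's KDC
    stage: str  # "dev" or "prd"


def can_access(user_realm, user_stages, cluster, trusts):
    """Return True if a principal from user_realm may use the cluster.

    trusts holds (trusting_realm, trusted_realm) pairs. A pair is one-way:
    (A, B) lets principals from B reach services in A, not the reverse.
    Only direct trusts are considered (no transitive paths).
    """
    realm_ok = user_realm == cluster.realm or (cluster.realm, user_realm) in trusts
    return realm_ok and cluster.stage in user_stages


# Realm names are from the slide; the trust DIRECTIONS are assumed.
TRUSTS = {
    ("DEV.CAAP.AUDI.VWG", "CAAP.AUDI.VWG"),
    ("PRD.CAAP.AUDI.VWG", "CAAP.AUDI.VWG"),
}
CLOUD_DEV = Cluster("CLOUDDEV", "DEV.CAAP.AUDI.VWG", "dev")
CLOUD_PRD = Cluster("CLOUDPRD", "PRD.CAAP.AUDI.VWG", "prd")

# Stage memberships as listed in the LDAP box on the slide.
USERS = {"carsten": {"dev"}, "carsten-adm": {"dev", "prd"}}

# Evaluate every user, authenticated in the FreeIPA realm, against both clusters.
results = {
    (user, cluster.name): can_access("CAAP.AUDI.VWG", stages, cluster, TRUSTS)
    for user, stages in USERS.items()
    for cluster in (CLOUD_DEV, CLOUD_PRD)
}
```

Under these assumptions, a principal from the FreeIPA realm reaches a cloud cluster only when the user's LDAP entry lists that cluster's stage, and the trust does not work in the opposite direction.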
19. Lessons learned
20. With great freedom come great responsibilities …
Cloud
• You can do anything you want right away!
• But you have to do it yourself: e.g. DNS, LDAP, …
• Automation pays off but requires an initial investment
• Security must be considered from the start
Project setup
• Agile
• Strong involvement of the product owner required
• Distributed teams cost a lot of travelling time
• Different experts required: Cloud (AWS), Networking, DevOps, Hadoop, …
• Fluctuation: distribute knowledge
21. Looking into the Future
22. Staging process for projects and platform
[Staging diagram; labels as extracted]
• Platform stages: DEV <platform>, feature A <platform>, feature B <platform>
• Project stages: DEV <projects>, DEV & INT <projects>, INT <projects> (×2), PRD <projects> (×2)
23. Technologies on the road map
Cloud
• On-demand nodes with GPU for machine learning
• S3/Glacier for “cold” data
• Looking into Kafka as a Service (Confluent, AWS)
Data Plane
• Data Steward Service for hybrid Data Governance
• Data Lifecycle Manager for data transfers and backup
HDP 3.x
• Using Docker under YARN for more flexibility/functionality
• Hive 3 Kafka integration
Machine Learning
• On-demand nodes with GPU for machine learning
• Data Science Workbench