The right architecture is key for any IT project. This is especially the case for big data projects, where there are no standard architectures which have proven their suitability over years. This session discusses the different Big Data Architectures which have evolved over time, including traditional Big Data Architecture, Streaming Analytics architecture as well as Lambda and Kappa architecture and presents the mapping of components from both Open Source as well as the Oracle stack onto these architectures.
The right architecture is key for any IT project. This also holds for big data projects, yet there are not many standard architectures that have proven their suitability over the years.
This session discusses the different Big Data architectures that have evolved over time, including the traditional Big Data architecture, the Event-Driven architecture, and the Lambda and Kappa architectures.
Each architecture is presented in a vendor- and technology-independent way using a standard architecture blueprint. In a second step, these architecture blueprints are used to show how a given architecture can support certain use cases and which popular open source technologies can help to implement a solution based on a given architecture.
1. Architecture of Big Data Solutions
Guido Schmutz
Frankfurt, 13.12.2017
@gschmutz guidoschmutz.wordpress.com
2. Guido Schmutz
Working at Trivadis for more than 20 years
Oracle ACE Director for Fusion Middleware and SOA
Consultant, Trainer, Software Architect for Java, Oracle, SOA and Big Data / Fast Data
Head of Trivadis Architecture Board
Technology Manager @ Trivadis
More than 30 years of software development experience
Contact: guido.schmutz@trivadis.com
Blog: http://guidoschmutz.wordpress.com
Slideshare: http://www.slideshare.net/gschmutz
Twitter: gschmutz
Architektur of Big Data Solutions
3. Agenda
1. Introduction
2. Big Data & Fast Data Reference Architectures
3. Continuous Streaming Data Ingestion
4. Big Data & Cloud
5. Microservices Architecture
6. Big Data Ecosystem – many choices sorted!
5. Big Data Definition (4 Vs)
+ Time to action? Big Data + Real-Time = Stream Processing
Characteristics of Big Data: its Volume, Velocity and Variety in combination
6. Architecture of Big Data Solutions
[Diagram: traditional data flow. Sources: Billing & Ordering, CRM / Profile, Marketing Campaigns, Location, Social, Clickstream. Processing: ETL / Stored Procedures, Enterprise Data Warehouse, Data Marts / Aggregations. Consumers: BI Tools, Segmentation & Churn Analysis, Marketing Offers.]
7. Traditional Flow Diagram - Challenges
[Diagram: the traditional data flow (sources, ETL / Stored Procedures, Enterprise Data Warehouse, Data Marts / Aggregations, BI Tools, Segmentation & Churn Analysis, Marketing Offers), annotated with the challenges below.]
Limited processing power (noted twice in the diagram)
Does not model easily to a traditional database schema
Storage scaling is very expensive
Based on sample / limited data
Loss in fidelity
Other / new data sources
High volume and velocity
8. Big Data to the rescue? Why is structuring / architecture important?
9. Why talk about Big Data Architectures?
Choosing the right architecture is key for any (big data) project
Big Data is still a rather young field and therefore a “moving target”
No standard architectures are available that have been used for years
In recent years, some architectures and best practices have evolved
Know your use cases before choosing your architecture / technologies
Having a reference architecture in place helps in choosing the right/matching technologies
10. Big Data & Fast Data Reference Architectures
11. Big Data Architecture
[Diagram: reference blueprint. Sources: Billing & Ordering, CRM / Profile, Marketing Campaigns, loaded via File Import / SQL Import. Big Data Cluster (also labeled Hadoop Cluster): Storage (Raw, Refined), Parallel Processing (Machine Learning, Graph Algorithms, Natural Language Processing), Results, accessed through SQL and Search. Consumers: BI Tools, Enterprise Data Warehouse, Search / Explore, Online & Mobile Apps.]
12. Big Data Architecture - Hadoop
[Diagram: the same blueprint components as the Big Data Architecture slide (sources, File Import / SQL Import, Storage, Parallel Processing, Results, SQL, Search, consumers). No Hadoop-specific component names survive in this transcript.]
13. Big Data Architecture - Spark
[Diagram: the same blueprint components as the Big Data Architecture slide (sources, File Import / SQL Import, Storage, Parallel Processing, Results, SQL, Search, consumers). No Spark-specific component names survive in this transcript.]
14. Event Hub for handling streaming data
[Diagram: Sources: Location, Social, Clickstream, Sensor Data, Weather Data, Mobile Apps, Call Center, Billing & Ordering, CRM / Profile, Marketing Campaigns. Ingestion: Event Hub and Data Flow. Big Data Cluster (also labeled Hadoop Cluster): Storage (Raw, Refined), Parallel Processing (Machine Learning, Graph Algorithms, Natural Language Processing), Results, accessed through SQL and Search. Consumers: BI Tools, Enterprise Data Warehouse, Search / Explore, Online & Mobile Apps.]
15. Event Hub for handling streaming data
[Diagram: repeats the Event Hub blueprint with the same components: sources, Event Hub, Data Flow, Big Data Cluster (Storage, Parallel Processing, Results), SQL, Search, and the consumers BI Tools, Enterprise Data Warehouse, Search / Explore, Online & Mobile Apps.]
16. Event Hub for handling streaming data
[Diagram: the same Event Hub blueprint (sources, Event Hub, Data Flow, Big Data Cluster, SQL, Search, consumers), annotated “high latency”.]
17. “Data at Rest” vs. “Data in Motion”
[Figure: two panels, “Data at Rest” and “Data in Motion”]
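The contrast between the two panels can be sketched in a few lines of Python. This is only an illustration under assumptions: the sample readings and the names `batch_average` and `StreamingAverage` are hypothetical and do not come from the talk. Data at rest is stored first and queried afterwards; data in motion is processed event by event as it arrives.

```python
from typing import List


def batch_average(stored: List[float]) -> float:
    """Data at rest: the full data set is stored, then queried in one pass."""
    return sum(stored) / len(stored)


class StreamingAverage:
    """Data in motion: keep only running state and update it per event."""

    def __init__(self) -> None:
        self.count = 0
        self.total = 0.0

    def on_event(self, value: float) -> float:
        self.count += 1
        self.total += value
        # An up-to-date result is available right after each event.
        return self.total / self.count


# Hypothetical sensor readings, used for both styles.
readings = [20.0, 22.0, 21.0, 25.0]

# Batch: the answer is computed once all readings have been collected.
batch_result = batch_average(readings)

# Streaming: one result per incoming reading, without storing the readings.
stream = StreamingAverage()
running = [stream.on_event(r) for r in readings]
```

Both approaches arrive at the same final value; the difference is when a result becomes available and how much data has to be kept.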
18. Streaming Analytics Architecture
[Diagram: Sources: Location, Social, Clickstream, Sensor Data, Weather Data, Mobile Apps, Call Center, Billing & Ordering, CRM / Profile, Marketing Campaigns. Ingestion: Event Hub and Data Flow. Stream Processing Cluster: Stream Analytics (Low Latency Processing, Alerting, “Real-Time” Dashboard), Reference / Models, Results. Consumers: Dashboard, BI Tools, Enterprise Data Warehouse, Search / Explore, Online & Mobile Apps.]
19. Streaming Analytics Architecture – Open Source
[Diagram: the same Streaming Analytics blueprint (sources, Event Hub, Data Flow, Stream Processing Cluster with Stream Analytics, Reference / Models and Results, Dashboard and the other consumers). No open source product names survive in this transcript.]
20. Streaming Analytics Architecture
[Diagram: the same Streaming Analytics blueprint (sources, Event Hub, Data Flow, Stream Processing Cluster, Dashboard and the other consumers), annotated “low latency without keeping raw data/events”.]
21. Keep raw event data
[Diagram: Sources: Location, Social, Clickstream, Sensor Data, Weather Data, Mobile Apps, Call Center, Billing & Ordering, CRM / Profile, Marketing Campaigns. Ingestion: Event Hub, File Import / SQL Import. Event Processing Cluster: Stream Analytics, Reference / Models, Results, Dashboard. Big Data Cluster: Storage (Raw, Refined), Parallel Processing, Results. Consumers: BI Tools, Enterprise Data Warehouse, Search / Explore, Online & Mobile Apps.]
22. “Lambda Architecture” for Big Data
[Diagram: Sources: Location, Social, Clickstream, Sensor Data, Weather Data, Mobile Apps, Call Center, Billing & Ordering, CRM / Profile, Marketing Campaigns. Ingestion: Event Hub, File Import / SQL Import. Event Processing Cluster: Stream Analytics, Reference / Models, Results, Dashboard. Big Data Cluster: Storage (Raw, Refined), Parallel Processing, Results, accessed through SQL and Search. Consumers: BI Tools, Enterprise Data Warehouse, Search / Explore, Online & Mobile Apps.]
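As a rough illustration of the Lambda idea, the sketch below combines a batch view over the complete raw data with a speed view over the events that arrived since the last batch run. It is a minimal sketch under assumptions: the clickstream events, the cutoff, and the function names `batch_view`, `speed_view` and `serve` are hypothetical, and plain Python counters stand in for the Big Data cluster and the event processing cluster.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical clickstream event: (timestamp, page)
Event = Tuple[int, str]


def batch_view(master: List[Event], up_to: int) -> Counter:
    """Batch layer: recompute counts from the complete raw data up to a cutoff."""
    return Counter(page for ts, page in master if ts <= up_to)


def speed_view(recent: List[Event], after: int) -> Counter:
    """Speed layer: count only events newer than the last batch run."""
    return Counter(page for ts, page in recent if ts > after)


def serve(batch: Counter, speed: Counter) -> Dict[str, int]:
    """Serving layer: merge both views to answer a query."""
    return dict(batch + speed)


events = [(1, "home"), (2, "cart"), (3, "home"), (4, "home"), (5, "cart")]
batch_cutoff = 3  # assumption: the last batch run covered timestamps 1..3

merged = serve(batch_view(events, batch_cutoff), speed_view(events, batch_cutoff))
```

The price of this design is that the same logic (here: counting per page) exists twice, once per layer, and both versions have to be kept consistent.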
23. “Kappa Architecture” for Big Data
[Diagram: Sources: Location, Social, Clickstream, Sensor Data, Weather Data, Mobile Apps, Call Center, Billing & Ordering, CRM / Profile, Marketing Campaigns. Ingestion: Event Hub, File Import / SQL Import. Event Processing Cluster: Stream Analytics, Reference / Models, Results, Dashboard. Big Data Cluster: Storage (Raw, Refined), Parallel Processing, Results, accessed through SQL and Search. Consumers: BI Tools, Enterprise Data Warehouse, Search / Explore, Online & Mobile Apps.]
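The Kappa approach is usually described as keeping a single stream processing path and reprocessing by replaying the retained event log through new code. The following is a minimal sketch under assumptions: the in-memory list `log` stands in for the event hub's retained log, and `run_job` and the two grouping functions are hypothetical names chosen for illustration.

```python
from collections import Counter
from typing import Callable, List

# Stand-in for the append-only event log kept by the event hub.
log: List[str] = ["home", "cart", "home", "checkout"]


def run_job(events: List[str], key: Callable[[str], str]) -> Counter:
    """One stream job: consume the log from the first offset and build a view."""
    view: Counter = Counter()
    for event in events:
        view[key(event)] += 1
    return view


# Version 1 of the logic counts events per page.
view_v1 = run_job(log, key=lambda page: page)

# Version 2 changes the logic (group pages into "shop" and "other").
# There is no separate batch layer: the same log is replayed through the new code.
view_v2 = run_job(
    log, key=lambda page: "shop" if page in {"cart", "checkout"} else "other"
)
```

Once the replay has caught up, consumers can be switched from the old view to the new one and the old job retired.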
24. “Unified Architecture” for Big Data
[Diagram: Sources: Location, Social, Clickstream, Sensor Data, Weather Data, Mobile Apps, Call Center, Billing & Ordering, CRM / Profile, Marketing Campaigns. Ingestion: Event Hub, File Import / SQL Import. Big Data Cluster with Batch Analytics (Storage: Raw, Refined; Parallel Processing; Results) and Streaming Analytics (Stream Analytics, Reference / Models), plus NoSQL, SQL and Search. Consumers: Dashboard, BI Tools, Enterprise Data Warehouse, Search / Explore, Online & Mobile Apps.]
26. Continuous Data Ingestion
[Diagram: the unified blueprint again. Sources: Location, Social, Clickstream, Sensor Data, Weather Data, Mobile Apps, Call Center, Billing & Ordering, CRM / Profile, Marketing Campaigns. Ingestion: Event Hub, File Import / SQL Import. Big Data Cluster with Batch Analytics (Storage: Raw, Refined; Parallel Processing; Results) and Streaming Analytics (Stream Analytics, Reference / Models), plus NoSQL, SQL and Search. Consumers: Dashboard, BI Tools, Enterprise Data Warehouse, Search / Explore, Online & Mobile Apps.]
27. Continuous Streaming Data Ingestion
[Diagram: Sources: DB Source, File Source, IoT Sensor, Social. Gateways: IoT GW, CDC GW, Dataflow GW, Message GW. Connection types: Connect, CDC, Log CDC, Native, REST, Log, Queue, Dataflow. Event Hub with several Topics. Targets: Big Data, Stream Processing.]
28. Continuous Streaming Data Ingestion
SQL Polling
Change Data Capture (CDC)
File Polling
File Stream (File Tailing)
File Stream (Appender)
Sensor Stream
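Of the ingestion patterns listed above, SQL polling is the simplest to sketch. The example below is an illustration only, under assumptions: an in-memory SQLite database stands in for the DB source, and the `orders` table, the `poll_new_rows` function and the id-based watermark are hypothetical. In a real pipeline each returned row would be published to an event hub topic.

```python
import sqlite3
from typing import List, Tuple

# In-memory database as a stand-in for the source system.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")


def poll_new_rows(last_seen_id: int) -> Tuple[List[Tuple[int, str]], int]:
    """Fetch rows added since the previous poll and advance the watermark."""
    rows = conn.execute(
        "SELECT id, item FROM orders WHERE id > ? ORDER BY id", (last_seen_id,)
    ).fetchall()
    if rows:
        last_seen_id = rows[-1][0]
    return rows, last_seen_id


# The source application writes two rows, then the first poll runs.
conn.executemany("INSERT INTO orders (item) VALUES (?)", [("book",), ("pen",)])
first_batch, watermark = poll_new_rows(0)

# A later write is picked up by the next poll only.
conn.execute("INSERT INTO orders (item) VALUES (?)", ("lamp",))
second_batch, watermark = poll_new_rows(watermark)
```

Polling on an increasing key like this only sees inserts. Updates and deletes go unnoticed, which is one reason to prefer log-based Change Data Capture where the source database supports it.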
29. Continuous Streaming Data Ingestion
[Diagram: repeats the ingestion overview. Sources: DB Source, File Source, IoT Sensor, Social. Gateways: IoT GW, CDC GW, Dataflow GW, Message GW. Connection types: Connect, CDC, Log CDC, Native, REST, Log, Queue, Dataflow. Event Hub with several Topics. Targets: Big Data, Stream Processing.]
30. Big Data & Cloud
31. Data Locality vs. Compute/Storage Separation
[Figure: two panels. “Data Local Compute”: a Master Node and Workers #1 to #3, each with its own Disk and Processing, connected by a Network. “Separate Compute and Storage”: a Master Node and Compute #1 to #3 with Processing only, connected over the Network to a shared Storage with several Disks.]
Separation of compute and storage: the fundamental difference
• store data in Object Storage instead of DFS
• bring up Compute nodes only for data processing
• multiple workloads on separate clusters can access same data
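The three bullet points can be illustrated with a small Python sketch. It is a toy model under assumptions: the dictionary `object_store` stands in for an object storage service, and the `ComputeCluster` class, the key `sales/2017.csv` and the two workloads are hypothetical.

```python
from typing import Callable, Dict, List

# Stand-in for an object store shared by all clusters (hypothetical key layout).
object_store: Dict[str, List[int]] = {"sales/2017.csv": [120, 80, 200]}


class ComputeCluster:
    """Compute nodes hold no data; they read from shared storage on demand."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.running = False

    def run(self, key: str, job: Callable[[List[int]], int]) -> int:
        self.running = True  # bring up compute only for the job
        try:
            return job(object_store[key])
        finally:
            self.running = False  # shut down again; the data stays in storage


# Two separate clusters with different workloads access the same data.
reporting = ComputeCluster("reporting")
analytics = ComputeCluster("analytics")

total = reporting.run("sales/2017.csv", sum)
peak = analytics.run("sales/2017.csv", max)
```

With data-local processing, by contrast, shutting down the workers would also take the disks holding the data offline.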
32. A new way to Manage Big Data
Big Data: Traditional Assumptions
• Bare-metal
• Data Locality
• HDFS on local disks
Big Data: A New Approach
• Containers and VMs
• Compute and storage separation
• Shared storage
Benefits and Value
• Big-Data-as-a-Service
• Agility and cost savings
• Faster time-to-insights
33. Big Data & Cloud - Amazon Web Services (AWS)
[Diagram: the unified blueprint (sources, Event Hub, File Import / SQL Import, Big Data Cluster with Batch Analytics and Streaming Analytics, Storage: Raw and Refined, Parallel Processing, Results, NoSQL, SQL, Search, Reference / Models, Dashboard and the other consumers). No AWS service names survive in this transcript.]
35. Asynchronous Microservice Architecture
[Diagram: Sources: Location, Social, Clickstream, Sensor Data, Weather Data, Mobile Apps, Call Center, Billing & Ordering, CRM / Profile, Marketing Campaigns. Ingestion: Event Hub, File Import / SQL Import. Big Data Cluster: Storage (Raw, Refined), Parallel Processing, Results, accessed through SQL and Search. Microservice Cluster: Microservice with State and API, connected by an Event Stream. Stream Analytics Cluster: Stream Processor with State and API, connected by an Event Stream. Consumers: Service, BI Tools, Enterprise Data Warehouse, Search / Explore, Online & Mobile Apps.]
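The interplay of event stream, state and API in such a microservice can be sketched as follows. This is a simplified illustration under assumptions: the in-memory `EventHub` class is a stand-in for a real event hub, and the `orders` topic, the `OrderTotalsService` class and the event fields are hypothetical. The producer only appends to the topic; the service reads from its own offset later, so the two sides are decoupled.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List


class EventHub:
    """Minimal in-memory stand-in for an event hub with named topics."""

    def __init__(self) -> None:
        self.topics: DefaultDict[str, List[dict]] = defaultdict(list)

    def publish(self, topic: str, event: dict) -> None:
        self.topics[topic].append(event)  # the producer never calls a consumer

    def read(self, topic: str, offset: int) -> List[dict]:
        return self.topics[topic][offset:]


class OrderTotalsService:
    """Microservice: builds its own state from the event stream, exposes an API."""

    def __init__(self, hub: EventHub) -> None:
        self.hub = hub
        self.offset = 0
        self.state: Dict[str, int] = {}

    def poll(self) -> None:
        """Consume events published since the last poll and update local state."""
        events = self.hub.read("orders", self.offset)
        for event in events:
            customer = event["customer"]
            self.state[customer] = self.state.get(customer, 0) + event["amount"]
        self.offset += len(events)

    def get_total(self, customer: str) -> int:
        """Query API, answered from local state only."""
        return self.state.get(customer, 0)


hub = EventHub()
service = OrderTotalsService(hub)

hub.publish("orders", {"customer": "alice", "amount": 30})
hub.publish("orders", {"customer": "alice", "amount": 12})
hub.publish("orders", {"customer": "bob", "amount": 5})

before = service.get_total("alice")  # events not consumed yet
service.poll()
after = service.get_total("alice")
```

Because the service answers queries from its own state, it stays available even when the producing systems are not; the trade-off is that its answers lag behind the stream until the next poll.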
36. Big Data Ecosystem – many choices sorted!
37. Big Data Ecosystem – many choices sorted!
38. Big Data Ecosystem – many choices sorted!