Batch and streaming visualization in big data reference architecture, architecture blueprints for streaming visualization, implementations of the blueprints in a fast data solution.
Most data visualisation solutions today still work on data sources that are stored persistently in a data store, following the so-called “data at rest” paradigm. More and more data sources, from IoT devices to social media streams, now provide a constant stream of data. These data streams are published at high velocity, and messages often have to be processed as quickly as possible. Stream processing solutions are available for processing and analysing this data, but they provide minimal or no visualisation capabilities. One option is to use a dedicated streaming visualisation solution; such tools are built specifically for streaming data and often do not support batch data. A much better solution would be a single tool capable of handling both batch and streaming data. This talk presents different architecture blueprints for integrating data visualisation into a fast data solution and highlights some of the products available to implement these blueprints.
1. @gschmutz
BASEL BERN BRUGG DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. GENF
HAMBURG KOPENHAGEN LAUSANNE MÜNCHEN STUTTGART WIEN ZÜRICH
Streaming Visualization
Guido Schmutz
Big Data Technology Summit 2019 – 27.2.19
@gschmutz guidoschmutz.wordpress.com
Agenda
1. Introduction
2. Stream Data Integration & Stream Analytics Ecosystem
3. 3 Blueprints for Streaming Visualization
End-to-End Demo available here:
https://github.com/gschmutz/various-demos/tree/master/streaming-visualization
Guido Schmutz
Working at Trivadis for more than 22 years
Oracle Groundbreaker Ambassador & Oracle ACE Director
Consultant, Trainer, Software Architect for Java, Oracle, SOA and Big Data / Fast Data
Head of Trivadis Architecture Board
Technology Manager @ Trivadis
More than 30 years of software development experience
Contact: guido.schmutz@trivadis.com
Blog: http://guidoschmutz.wordpress.com
Slideshare: http://www.slideshare.net/gschmutz
Twitter: gschmutz
Data Value Chain – Why Data Streaming?
Milliseconds
• Place Trade
• Serve ad
• Enrich Stream
• Approve Trans
Hundredths of Seconds
• Calculate Risk
• Leaderboard
• Aggregate
• Count
Second(s)
• Retrieve Click Stream
• Show orders
Minutes
• Backtest algo
• BI
• Daily Reports
Hours
• Algo discovery
• Log analysis
• Fraud pattern match
Keep the data in motion …
Data at Rest: Store → Analyze / Visualize → (Re)Act
vs.
Data in Motion: Analyze / Visualize → Act → Store
Big Data Reference Architecture for Modern Data Analytics Solutions
Diagram, main building blocks:
• Data sources: bulk sources (DB, DB extract, file) and event sources (location, DB, IoT data, mobile apps, social, telemetry)
• Edge node with rules, a local event hub and storage
• Ingestion: file import / SQL import, change data capture, and an event hub carrying event streams
• Big data (Hadoop cluster): parallel processing over raw and refined storage, with results exposed via SQL, search and export
• Stream analytics: a stream processor with state and an API, fed by event streams
• Microservices and enterprise apps, each with logic/state and an API
• Consumers and targets: BI tools, enterprise data warehouse, search / explore, search service
Two Types of Stream Processing
(by Gartner)
Stream Data Integration
• focuses on the ingestion and processing of data sources targeting real-time extract-transform-load (ETL) and data integration use cases
• filter and enrich the data
Stream Analytics
• targets analytics use cases
• calculating aggregates and detecting patterns to generate higher-level, more relevant summary information (complex events)
• complex events may signify threats or opportunities that require a response from the business
Gartner: Market Guide for Event Stream Processing, Nick Heudecker, W. Roy Schulte
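A minimal, framework-free Python sketch can make the two types concrete. It runs over an in-memory list instead of a real event hub; the event fields (`customer_id`, `amount`, `ts`), the `CUSTOMERS` lookup table and the 60-second tumbling window are hypothetical. As a simplification, the analytics part returns the final totals instead of emitting a result as each window closes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical reference data used for enrichment.
CUSTOMERS = {1: "Switzerland", 2: "Germany"}

def integrate(events: Iterable[dict]) -> Iterator[dict]:
    """Stream data integration: filter and enrich each event, one at a time."""
    for event in events:
        if event["amount"] <= 0:  # filter: drop invalid events
            continue
        country = CUSTOMERS.get(event["customer_id"], "unknown")
        yield {**event, "country": country}  # enrich: add reference data

def analyze(events: Iterable[dict], window_s: int = 60) -> Dict[Tuple[int, str], float]:
    """Stream analytics: sum amounts per tumbling window and country."""
    totals: Dict[Tuple[int, str], float] = defaultdict(float)
    for event in events:
        window_start = event["ts"] - event["ts"] % window_s
        totals[(window_start, event["country"])] += event["amount"]
    return dict(totals)

raw = [
    {"customer_id": 1, "amount": 10.0, "ts": 5},
    {"customer_id": 2, "amount": -1.0, "ts": 20},  # filtered out
    {"customer_id": 1, "amount": 5.0, "ts": 59},
    {"customer_id": 2, "amount": 7.5, "ts": 61},
]
print(analyze(integrate(raw)))  # {(0, 'Switzerland'): 15.0, (60, 'Germany'): 7.5}
```

Because `integrate` is a generator, the integration step stays record-at-a-time, while `analyze` has to keep state (the running totals), which is what distinguishes stream analytics from stream data integration.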
Streaming Analytics: KSQL
STREAM and TABLE as first-class citizens
• STREAM = data in motion
• TABLE = collected state of a stream
Stream processing with zero coding, using a SQL-like language
Built on top of Kafka Streams
Interactive (CLI) and headless (command file) modes
ksql> CREATE STREAM order_s
      WITH (kafka_topic='order',
            value_format='AVRO');
 Message
----------------
 Stream created
ksql> SELECT * FROM order_s
      WHERE address->country = 'Switzerland';
...
Diagram: KSQL CLI commands → KSQL Engine (built on Kafka Streams) ↔ Kafka Broker (example topic: trucking_driver)
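The STREAM/TABLE duality can be illustrated without Kafka at all: a table is what you get when you fold a keyed stream into "latest value per key". The Python sketch below uses a hypothetical driver stream and treats `None` as a delete marker, analogous to tombstones in Kafka compacted topics. It illustrates the concept only and is not how KSQL is implemented internally.

```python
from typing import Dict, Iterable, Optional, Tuple

def to_table(stream: Iterable[Tuple[str, Optional[dict]]]) -> Dict[str, dict]:
    """Collapse a keyed stream into a table: the last value per key wins.

    A value of None is treated as a tombstone and removes the key.
    """
    table: Dict[str, dict] = {}
    for key, value in stream:
        if value is None:
            table.pop(key, None)
        else:
            table[key] = value
    return table

# Hypothetical keyed stream of driver status changes.
driver_stream = [
    ("d1", {"name": "Ann", "status": "idle"}),
    ("d2", {"name": "Bob", "status": "driving"}),
    ("d1", {"name": "Ann", "status": "driving"}),
    ("d2", None),  # tombstone: driver d2 removed
]
print(to_table(driver_stream))  # {'d1': {'name': 'Ann', 'status': 'driving'}}
```

Every event of the stream is kept (data in motion), while the table only exposes the current state, which is usually what a dashboard wants to show.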
BP-1: Fast Data Store with Regular Polling from the Consumer
Data Sources → Event Hub → Stream Data Integration & Stream Analytics → Data Store (Storage) → API → Streaming Visualization (Consumer)
Data is in motion up to the data store and at rest from there on; the consumer polls the API at regular intervals.
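The polling pattern can be sketched in a few lines of Python. `FastStore` is a hypothetical in-memory stand-in for a fast data store such as Redis; the stream processor calls `upsert` for each result, and the consumer polls with a version number so that each poll returns only rows changed since the previous one. The version-counter scheme is an assumption for illustration, not a feature of any particular store.

```python
import threading
import time
from typing import Dict, List, Tuple

class FastStore:
    """Hypothetical in-memory stand-in for a fast data store."""

    def __init__(self) -> None:
        self._lock = threading.Lock()  # writer and poller may run concurrently
        self._rows: Dict[str, Tuple[int, float]] = {}  # key -> (version, value)
        self._version = 0

    def upsert(self, key: str, value: float) -> None:
        # Called by the stream processor for every result it produces.
        with self._lock:
            self._version += 1
            self._rows[key] = (self._version, value)

    def changed_since(self, version: int) -> Tuple[int, List[Tuple[str, float]]]:
        # Called by the consumer: current version plus rows changed after `version`.
        with self._lock:
            rows = [(k, v) for k, (ver, v) in self._rows.items() if ver > version]
            return self._version, sorted(rows)

def poll(store: FastStore, interval_s: float, rounds: int) -> List[List[Tuple[str, float]]]:
    """Consumer side of BP-1: ask the store for changes every interval_s seconds."""
    seen, updates = 0, []
    for _ in range(rounds):
        seen, rows = store.changed_since(seen)
        updates.append(rows)
        time.sleep(interval_s)
    return updates

store = FastStore()
store.upsert("CH", 10.0)
store.upsert("DE", 3.0)
print(poll(store, interval_s=0.0, rounds=2))  # [[('CH', 10.0), ('DE', 3.0)], []]
```

The cost of this blueprint is visible in `poll`: a change can wait up to one `interval_s` before the consumer sees it.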
BP-1: Elasticsearch / Kibana
Diagram: the BP-1 flow with Elasticsearch as the data store and Kibana as the streaming visualization.
Alternatives:
InfluxDB & Grafana
Prometheus & Grafana
Druid & Superset
…
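On every auto-refresh, a Kibana-style dashboard re-runs a search over the most recent time range. The Python sketch below only builds such a request body with the Elasticsearch query DSL (a `range` query plus a `terms` aggregation); the fields `@timestamp` and `country` (assumed to be a keyword field) and the one-minute window are assumptions, and sending the body to a `_search` endpoint is left out.

```python
import json

def dashboard_query(window: str = "now-1m",
                    time_field: str = "@timestamp",
                    group_field: str = "country") -> dict:
    """Build the search body a polling dashboard might send on each refresh."""
    return {
        "size": 0,  # only the aggregation is needed, not the individual hits
        "query": {"range": {time_field: {"gte": window}}},  # recent documents only
        "aggs": {"per_group": {"terms": {"field": group_field}}},
    }

print(json.dumps(dashboard_query(), indent=2))
```

Relative date math such as `now-1m` is evaluated by Elasticsearch on each request, so the same body can be resent unchanged at every polling interval.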
BP-1: Redis NoSQL & Custom Web
Diagram: the BP-1 flow with Redis as the data store and a custom web app as the streaming visualization.
BP-1: Kafka Streams Interactive Query & Custom Web App
Diagram: the BP-1 flow where the data store is the local state store of the Kafka Streams application, exposed through Interactive Queries to a custom web app.
BP-2: Direct Streaming to the Consumer
Data Sources → Event Hub → Stream Data Integration & Stream Analytics → Channel/Protocol (API) → Streaming Visualization (Consumer)
Data stays in motion end to end; there is no intermediate data store.
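Direct streaming can be sketched as a fan-out hub that pushes every event straight to the consumers connected at that moment. `Broadcaster` is a hypothetical Python class, not part of any product; the demo also shows the flip side of having no data store: a consumer that connects late never sees earlier events.

```python
import queue
from typing import List

class Broadcaster:
    """Hypothetical fan-out hub: pushes each event to all connected consumers."""

    def __init__(self) -> None:
        self._subscribers: List[queue.Queue] = []

    def subscribe(self) -> queue.Queue:
        q: queue.Queue = queue.Queue()
        self._subscribers.append(q)
        return q

    def publish(self, event: dict) -> None:
        # No storage: an event only reaches consumers subscribed right now.
        for q in self._subscribers:
            q.put(event)

def drain(q: queue.Queue) -> List[dict]:
    """Collect everything currently waiting in a consumer's queue."""
    items = []
    while not q.empty():
        items.append(q.get_nowait())
    return items

hub = Broadcaster()
early = hub.subscribe()
hub.publish({"order": 1})
late = hub.subscribe()  # connects after the first event
hub.publish({"order": 2})
print(drain(early), drain(late))  # [{'order': 1}, {'order': 2}] [{'order': 2}]
```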
BP-2: Kafka Connect to Slack / WhatsApp / Twitter / …
Diagram: the BP-2 flow with Kafka Connect delivering the stream to a messaging channel such as Slack, WhatsApp or Twitter.
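In practice a Kafka Connect sink connector does this without custom code. To show roughly what such a sink does per record, here is a Python sketch that maps a hypothetical alert record (`type`, `driver_id`) to a Slack incoming-webhook request whose JSON body carries a `text` field. The webhook URL is a placeholder, and the request is built but not sent.

```python
import json
import urllib.request

# Placeholder only; a real incoming-webhook URL is issued by Slack.
WEBHOOK_URL = "https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX"

def to_slack_request(record: dict) -> urllib.request.Request:
    """Turn one alert record from the stream into a Slack webhook POST request."""
    text = f":warning: {record['type']} detected for driver {record['driver_id']}"
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        WEBHOOK_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = to_slack_request({"type": "Dangerous driving", "driver_id": 11})
print(req.get_method(), json.loads(req.data))
# Sending would be urllib.request.urlopen(req); it is not executed here.
```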
BP-2: Web Sockets / SSE & Custom Modern Web App
Diagram: the BP-2 flow with WebSockets or Server-Sent Events (SSE) as the channel/protocol to a custom modern web app.
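With SSE, the server keeps an HTTP response open with content type `text/event-stream` and writes each stream record as a small text message, which the browser's `EventSource` API parses. The Python sketch below only formats such messages; the event name `position` and the payload fields are hypothetical, and wiring it to a Kafka consumer and an HTTP server is left out.

```python
import json
from typing import Optional

def sse_message(data: dict, event: Optional[str] = None,
                event_id: Optional[str] = None) -> str:
    """Format one Server-Sent Events message for a text/event-stream response."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    # Every line of the payload needs its own "data:" prefix.
    for line in json.dumps(data).splitlines():
        lines.append(f"data: {line}")
    return "\n".join(lines) + "\n\n"  # a blank line terminates the message

print(sse_message({"driver_id": 11, "speed": 87}, event="position", event_id="42"), end="")
```

SSE is one-directional (server to browser) over plain HTTP, which is usually enough for a dashboard; WebSockets add a bidirectional channel at the cost of a separate protocol upgrade.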
BP-3: Streaming SQL Result to Consumer
Data Sources → Event Hub → Stream Data Integration & Stream Analytics → API → Streaming Visualization (Consumer)
The consumer runs a streaming SQL query through the API and receives the result continuously; data stays in motion end to end.
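What distinguishes a streaming SQL result from a classic query result is that it never "finishes": each new input event produces an updated result row that is pushed to the consumer. The Python generator below emulates a continuous count per key, in the spirit of a query like `SELECT country, COUNT(*) ... GROUP BY country`; the `orders` events and the `country` field are hypothetical, and this is not the API of any streaming SQL engine.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple

def continuous_count(events: Iterable[dict], key: str) -> Iterator[Tuple[str, int]]:
    """Emulate a continuous GROUP BY ... COUNT(*) query.

    Instead of one final result set, every input event yields the updated
    row for its group, ready to be pushed to the visualization.
    """
    counts: Counter = Counter()
    for event in events:
        k = event[key]
        counts[k] += 1
        yield k, counts[k]

orders = [{"country": "CH"}, {"country": "DE"}, {"country": "CH"}]
for row in continuous_count(orders, "country"):
    print(row)  # ('CH', 1), then ('DE', 1), then ('CH', 2)
```

The consumer receives updates rather than a snapshot, so the visualization has to merge each row into what it already displays (for example, replacing the bar for `CH`).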
BP-3: KSQL and Arcadia Data
Diagram: the BP-3 flow with KSQL as the streaming SQL engine and Arcadia Data as the streaming visualization.
BP-3: KSQL with REST API to Custom Web App
Diagram: the BP-3 flow with the KSQL REST API streaming query results to a custom web app.
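A custom web app (or a small backend in front of it) can submit a query to the KSQL server's REST `/query` endpoint and read the rows as they arrive. The Python sketch below only builds that request; it assumes a KSQL server on its default port 8088 on `localhost` and reuses the `order_s` stream from the KSQL slide. It is not sent here, and the exact response format depends on the KSQL version in use.

```python
import json
import urllib.request

# Assumption: a KSQL server running locally on its default port 8088.
KSQL_URL = "http://localhost:8088/query"

def ksql_query_request(statement: str) -> urllib.request.Request:
    """Build a POST request for the KSQL REST /query endpoint."""
    body = {
        "ksql": statement,
        # Only new records are needed for a live dashboard.
        "streamsProperties": {"ksql.streams.auto.offset.reset": "latest"},
    }
    return urllib.request.Request(
        KSQL_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/vnd.ksql.v1+json; charset=utf-8"},
        method="POST",
    )

req = ksql_query_request("SELECT * FROM order_s;")
print(req.full_url, json.loads(req.data)["ksql"])
# urllib.request.urlopen(req) would keep the response open and deliver rows incrementally.
```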
BP-3: Spark Streaming & Oracle Stream Analytics
Diagram: the BP-3 flow implemented with Spark Streaming and Oracle Stream Analytics.
Summary
BP-1: Fast Store & Polling
• “classic” pattern
• Not end-to-end “data in motion”: data is “at rest” before visualization
• Slight delay might not be acceptable for a monitoring dashboard
• Can use the full power of the data store
• In-memory option
• NoSQL databases
BP-2: Stream to Consumer
• Minimal latency
• More difficult on the “client side”
• Good if the stream directly holds what should be displayed
• More difficult if data in the stream needs to be analyzed before visualization
• No historical information available
BP-3: Streaming SQL
• Minimal latency
• Power of a SQL query engine available for visualization
• Possibility for “self-service” style visualization
• Some analytics are more difficult on streaming data
• No historical information available
Technology on its own won't help you.
You need to know how to use it properly.