Most data visualisation solutions today still work on data sources that are stored persistently in a data store, following the so-called “data at rest” paradigm. More and more data sources, from IoT devices to social media streams, provide a constant stream of data. These data streams publish at high velocity, and messages often have to be processed as quickly as possible. Stream processing solutions are available for processing and analysing this data, but they provide minimal or no visualisation capabilities. One option is to first persist the data into a data store and then use a traditional data visualisation solution to present it. If latency is not an issue, such a solution might be good enough. Another question is which data store can keep up with the high read and write load. If it is a NoSQL database rather than an RDBMS, not all traditional visualisation tools may integrate with that specific data store. Another option is a streaming visualisation solution. These are built specifically for streaming data and often do not support batch data. A much better solution would be one tool capable of handling both batch and streaming data. This talk presents different architecture blueprints for integrating data visualisation into a fast data solution and then shows how the blueprints can be implemented by mapping products onto them.
2. Guido Schmutz
Working at Trivadis for more than 22 years
Consultant, Trainer, Software Architect for Java, Oracle, SOA and Big Data / Fast Data
Oracle Groundbreaker Ambassador & Oracle ACE Director
Head of Trivadis Architecture Board
Technology Manager @ Trivadis
More than 30 years of software development experience
Contact: guido.schmutz@trivadis.com
Blog: http://guidoschmutz.wordpress.com
Slideshare: http://www.slideshare.net/gschmutz
Twitter: gschmutz
167th edition
3. Agenda
1. Motivation / Introduction
2. Stream Data Integration & Stream Analytics Ecosystem
3. Three Blueprints for Streaming Visualization
End-to-End Demo available here:
https://github.com/gschmutz/various-demos/tree/master/streaming-visualization
6. Keep the data in motion …
[Figure: Data at Rest vs. Data in Motion, each showing the stages Store, Analyze, Visualize and (Re)Act]
7. Reference Architecture for Data Analytics Solutions
[Figure: Reference architecture with the components Event Sources (Location, IoT Data, Mobile Apps, Social, Telemetry), Bulk Sources (DB, File, DB Extract), Edge Node (Rules, Storage), Event Hub, Data Flow, Change Data Capture, File Import / SQL Import, Hadoop Cluster / Big Data (Parallel Processing, Raw and Refined Storage, Results), Enterprise Data Warehouse, SQL Export, Stream Analytics (Stream Processor, State, API), Microservices, Enterprise Apps, Search Service, and the consumers SQL, Search, BI Tools and Search / Explore]
8. Two Types of Stream Processing (by Gartner)
Stream Data Integration
• focuses on the ingestion and processing of data sources targeting real-time extract-transform-load (ETL) and data integration use cases
• filter and enrich the data
Stream Analytics
• targets analytics use cases
• calculating aggregates and detecting patterns to generate higher-level, more relevant summary information (complex events)
• Complex events may signify threats or opportunities that require a response from the business
Gartner: Market Guide for Event Stream Processing, Nick Heudecker, W. Roy Schulte
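To make the distinction concrete, here is a minimal Python sketch, not tied to any product. The Tweet record and the en/de filter are illustrative assumptions, and the window is count-based (N events) rather than time-based purely to keep it short. Stream data integration emits one filtered, enriched event per input event; stream analytics keeps state and emits a summary per window.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List, Tuple

@dataclass
class Tweet:
    # hypothetical event shape, loosely modelled on the tweet stream in the demo
    id: int
    lang: str
    text: str
    hashtags: List[str] = field(default_factory=list)

def integrate(tweets: Iterable[Tweet]) -> Iterator[dict]:
    """Stream data integration: filter and enrich each event on its own."""
    for t in tweets:
        if t.lang not in ("en", "de"):  # filter: keep English and German only
            continue
        # enrich: add a derived attribute; one output event per kept input event
        yield {"id": t.id, "lang": t.lang, "text": t.text,
               "hashtag_count": len(t.hashtags)}

def top_hashtags(tweets: Iterable[Tweet], window_size: int,
                 n: int = 3) -> Iterator[List[Tuple[str, int]]]:
    """Stream analytics: aggregate over tumbling windows of window_size events."""
    counts: Counter = Counter()
    seen = 0
    for t in tweets:
        counts.update(h.lower() for h in t.hashtags)
        seen += 1
        if seen == window_size:  # window closes: emit a summary, then reset state
            yield counts.most_common(n)
            counts, seen = Counter(), 0
```

A trailing, incomplete window is simply not emitted in this sketch; a real stream processor would close windows based on event time instead.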
15. Demo: KSQL for Streaming ETL
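-- register the raw tweet topic as a stream first (schema taken from the Avro value)
CREATE STREAM tweet_raw_s WITH (KAFKA_TOPIC='tweet-raw-v1',
  VALUE_FORMAT='AVRO');
-- derive a slimmed-down tweet stream, written to its own topic
CREATE STREAM tweet_s
  WITH (KAFKA_TOPIC='tweet-v1', VALUE_FORMAT='AVRO', PARTITIONS=8) AS
  SELECT id, createdAt, text, user->screenName AS screenName
  FROM tweet_raw_s;
-- split English and German tweets into lower-case words
-- (removestopwords is not a built-in KSQL function; presumably a custom UDF of the demo)
SELECT id, lang, removestopwords(split(LCASE(text), ' ')) AS word
  FROM tweet_raw_s
  WHERE lang = 'en' OR lang = 'de';
-- extract the first hashtag of each tweet, if there is one
SELECT id, LCASE(hashtagentities[0]->text) AS hashtag
  FROM tweet_raw_s
  WHERE hashtagentities[0] IS NOT NULL;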
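The per-record logic of the two ad-hoc KSQL queries can be mirrored in plain Python, which may help when testing what a visualization should receive. This is only a sketch: the stopword list is hypothetical, and the behaviour attributed to removestopwords (dropping stopwords and empty tokens) is an assumption, since the UDF itself is not shown.

```python
from typing import List, Optional

# Hypothetical stopword list: the demo's removestopwords UDF is not shown,
# so its exact list and behaviour are assumptions here.
STOPWORDS = {"a", "an", "the", "is", "and", "der", "die", "das", "und", "ist"}

def words(text: str) -> List[str]:
    """Mirrors removestopwords(split(LCASE(text), ' '))."""
    tokens = text.lower().split(" ")  # LCASE, then split on single spaces
    # assumption: stopwords and empty tokens (from double spaces) are dropped
    return [w for w in tokens if w and w not in STOPWORDS]

def first_hashtag(hashtag_entities: Optional[List[dict]]) -> Optional[str]:
    """Mirrors LCASE(hashtagentities[0]->text); None plays the role of NULL."""
    if not hashtag_entities:
        return None
    return hashtag_entities[0]["text"].lower()
```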
16. Demo using Kafka Stack for Stream Data Integration
[Figure: Data Sources → Event Hub → Stream Data Integration & Stream Analytics → ?? → Streaming Visualization (Consumer)]
Filter: #javazone, #javazone2019, #java, #kafka, …
User: @apachekafka, @javazone
19. BP1: Fast datastore with regular polling from consumer
[Figure: Data Sources → Event Hub → Stream Data Integration & Stream Analytics (data in motion) → Data Store / Storage (data at rest) → API → Streaming Visualization (Consumer)]
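The essence of BP1 is that writer and reader are decoupled by the store: the stream processing side upserts results, and the consumer re-queries on a fixed interval. A minimal sketch, where FastStore is a hypothetical in-memory stand-in for a real fast store such as Redis or Elasticsearch:

```python
import threading
import time
from typing import Callable, Dict

class FastStore:
    """Hypothetical stand-in for the fast data store (Redis, Elasticsearch, ...)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._data: Dict[str, float] = {}

    def upsert(self, key: str, value: float) -> None:
        # written by the stream processing side (data in motion -> data at rest)
        with self._lock:
            self._data[key] = value

    def snapshot(self) -> Dict[str, float]:
        # read by the consumer side; returns a copy so rendering never sees a half-update
        with self._lock:
            return dict(self._data)

def poll(store: FastStore, render: Callable[[Dict[str, float]], None],
         interval_s: float, iterations: int) -> None:
    """Consumer-side polling loop: re-query the store every interval_s seconds."""
    for _ in range(iterations):
        render(store.snapshot())
        time.sleep(interval_s)
```

The polling interval is the main design knob: what the dashboard shows can be stale by up to one interval plus the query time, while a shorter interval increases the read load on the store.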
20. BP1-1: Elasticsearch / Kibana
Alternatives:
SOLR & Banana
21. BP1-2: InfluxDB / Grafana or Chronograf
Alternatives:
Prometheus & Grafana
Druid & Superset
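For BP1-2, each event from the stream ends up as a point in InfluxDB. In InfluxDB's line protocol each point is one text line of the form measurement[,tag=value...] field=value[,...] [timestamp]. The sketch below formats a point; the measurement, tag and field names are hypothetical, and escaping is limited to the commas, spaces and equals signs the protocol requires.

```python
from typing import Dict, Optional, Union

FieldValue = Union[bool, int, float, str]

def _esc(s: str, chars: str) -> str:
    # backslash-escape every character in chars
    for c in chars:
        s = s.replace(c, "\\" + c)
    return s

def _field_value(v: FieldValue) -> str:
    if isinstance(v, bool):  # check bool before int: bool is a subclass of int
        return "true" if v else "false"
    if isinstance(v, int):
        return f"{v}i"  # integers carry an 'i' suffix
    if isinstance(v, float):
        return repr(v)
    return '"' + v.replace("\\", "\\\\").replace('"', '\\"') + '"'  # strings are quoted

def to_line(measurement: str, tags: Dict[str, str], fields: Dict[str, FieldValue],
            ts_ns: Optional[int] = None) -> str:
    """Format one point as InfluxDB line protocol."""
    head = _esc(measurement, ", ")
    for k in sorted(tags):  # InfluxDB's docs recommend sorting tags by key
        head += "," + _esc(k, ",= ") + "=" + _esc(tags[k], ",= ")
    body = ",".join(_esc(k, ",= ") + "=" + _field_value(v) for k, v in fields.items())
    line = f"{head} {body}"
    return line if ts_ns is None else f"{line} {ts_ns}"
```

In InfluxDB 1.x such lines are sent, newline-separated, in the body of a POST to the /write endpoint; Grafana or Chronograf then poll InfluxDB with queries.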
22. BP1-3: NoSQL & Custom Web
23. BP1-3: Demo Redis NoSQL & Custom Web
https://opensky-network.org/
24. BP1-4: Kafka Streams Interactive Query & Custom App
Alternatives:
Flink
…
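With Kafka Streams Interactive Queries, the "fast data store" is the state the stream processing application already keeps, and the custom app queries it through an API instead of an external database. The Python sketch below only mimics that pattern; CountingApp and its methods are hypothetical and are not the Kafka Streams API. In a real deployment, the state is partitioned across application instances, so a query for a key has to be routed to the instance that holds it.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, Tuple

class CountingApp:
    """Hypothetical stand-in for a stream processing app with a queryable state store."""

    def __init__(self) -> None:
        # plays the role of the local key-value state store (e.g. counts per hashtag)
        self._store: DefaultDict[str, int] = defaultdict(int)

    def process(self, records: Iterable[Tuple[str, str]]) -> None:
        # stream side: update the materialized state for every incoming (key, value)
        for key, _value in records:
            self._store[key] += 1

    def get(self, key: str) -> int:
        # query side: point lookup, what the custom app's API endpoint would call
        return self._store.get(key, 0)

    def all(self) -> Dict[str, int]:
        # query side: scan over the whole (local) store
        return dict(self._store)
```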
25. BP2: Direct Streaming to the Consumer
[Figure: Data Sources → Event Hub → Stream Data Integration & Stream Analytics → Channel/Protocol → API → Streaming Visualization (Consumer); all data stays in motion]
26. BP2-1: Kafka Connect to Slack / WhatsApp
Alternatives:
Twitter
SMS
…
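Independent of which Kafka Connect sink is used, pushing an event to Slack boils down to an HTTP POST of a small JSON document to an incoming-webhook URL. A minimal sketch using only the standard library; the webhook URL is a hypothetical placeholder and the message template and event fields are assumptions:

```python
import json
import urllib.request

# Hypothetical placeholder; a real incoming-webhook URL is issued by Slack per channel
WEBHOOK_URL = "https://hooks.slack.com/services/T000/B000/XXXX"

def slack_request(event: dict, webhook_url: str = WEBHOOK_URL) -> urllib.request.Request:
    """Turn one stream event into a POST for a Slack incoming webhook."""
    text = f"New tweet by @{event['screenName']}: {event['text']}"
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending would then be urllib.request.urlopen(slack_request(event)), once per event consumed from the topic. This works well when each event already holds what should be displayed, which matches the BP2 trade-offs.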
28. BP2-2: Kafka to Tipboard (Dashboard Solution)
Alternatives:
Dashing
Geckoboard
…
29. BP2-2: Demo Kafka to Tipboard (Dashboard Solution)
http://allegro.tech/tipboard/
30. BP2-3: WebSockets / SSE & Custom Modern Web App
Server-Sent Events (SSE)
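With SSE, the server answers a normal HTTP request with Content-Type: text/event-stream, keeps the connection open and writes one text frame per event, which the browser receives through the EventSource API. Unlike WebSockets, the channel is one-way (server to browser). A sketch of the frame encoding, with a hypothetical event name and payload:

```python
from typing import Optional

def sse_frame(data: str, event: Optional[str] = None,
              event_id: Optional[str] = None) -> str:
    """Encode one message in the text/event-stream (Server-Sent Events) format."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")  # lets the browser resume via Last-Event-ID
    if event is not None:
        lines.append(f"event: {event}")  # named event type for addEventListener
    # every line of the payload becomes its own "data:" field
    for line in data.replace("\r\n", "\n").replace("\r", "\n").split("\n"):
        lines.append(f"data: {line}")
    return "\n".join(lines) + "\n\n"  # a blank line terminates the message
```

For example, sse_frame('{"hashtag": "kafka", "count": 3}', event="hashtag-count", event_id="42") produces a frame that a browser would deliver to a listener registered for "hashtag-count".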
31. BP3: Streaming SQL Result to Consumer
[Figure: Data Sources → Event Hub → Stream Data Integration & Stream Analytics (data in motion) → API → Streaming Visualization (Consumer)]
32. BP3-1: KSQL and Arcadia Data
34. BP3-2: KSQL with REST API to Custom Web App
35. BP3-2: Demo KSQL with REST API
curl -X POST -H 'Content-Type: application/vnd.ksql.v1+json' \
  -i http://analyticsplatform:8088/query --data '{
  "ksql": "SELECT text FROM tweet_raw_s;",
  "streamsProperties": { "ksql.streams.auto.offset.reset": "latest" }
}'
{"row":{"columns":["The latest The Naji Filali Daily! https://t.co/9E6GonrySE Thanks to @Xavier_Porter1 @ClouMedia #ai #bigdata"]},"errorMessage":null,"finalMessage":null}
{"row":{"columns":["RT @Futurist_Invest: This robot can copy your face! Creepy\n\n#SaturdayThoughts #SaturdayMorning #creepy #bots #bot #AI #bigdata #robotics #…"]},"errorMessage":null,"finalMessage":null}
{"row":{"columns":["She’s back telling us all about why datathons are exciting now :) Catch her while you can! @ARUKscientist @S_Bauermeister #bigdata #ARUKConf https://t.co/Br484db5ut"]},"errorMessage":null,"finalMessage":null}
{"row":{"columns":["Blockchain Competitive Innovation Advantage"]},"errorMessage":null,"finalMessage":null}
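A custom web app (BP3-2) has to consume this chunked response row by row. The sketch below assumes the line-delimited format shown above, one JSON object per line with row, errorMessage and finalMessage; later ksqlDB releases changed the response format, so this is tied to that version. Blank lines are skipped defensively.

```python
import json
from typing import Iterable, Iterator, List

def ksql_rows(lines: Iterable[str]) -> Iterator[List]:
    """Yield the column values of each row in a KSQL /query response body."""
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines between rows
            continue
        msg = json.loads(line)
        if msg.get("errorMessage"):
            raise RuntimeError(str(msg["errorMessage"]))
        if msg.get("row") is not None:
            yield msg["row"]["columns"]
        if msg.get("finalMessage"):  # the server signals the end of the query
            return
```

Against a live server, the lines could come straight from the HTTP response, e.g. ksql_rows(raw.decode("utf-8") for raw in urllib.request.urlopen(req)), with each yielded row forwarded to the browser, for instance as an SSE frame.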
36. BP3-3: Spark Streaming & Oracle Stream Analytics
38. Summary
BP1: Fast Store & Polling
• “Classic” pattern
• Not end-to-end “data in motion”: data is at rest before visualization
• Slight delay might not be acceptable for monitoring dashboards
• Can use the full power of the data store(s) => NoSQL
• In-memory stores reduce overhead
BP2: Stream to Consumer
• Minimal latency
• More difficult on the “client side”
• Good if the stream directly holds what should be displayed
• More difficult if the data in the stream needs to be analyzed before visualization
• No historical information available
BP3: Streaming SQL
• Minimal latency
• Power of the SQL query engine available for visualization
• Possibility for “self-service” style visualization
• Some analytics are more difficult on streaming data
• No historical information available