Data integration and processing is a huge challenge in Industrial IoT (IIoT, aka Industry 4.0 or Automation Industry) due to monolithic systems and proprietary protocols. Apache Kafka, its ecosystem (Kafka Connect, KSQL) and Apache PLC4X are a great open source choice to implement this integration end to end in a scalable, reliable and flexible way.
This blog post gives a high-level overview of the challenges and a good, flexible architecture. At the end, I share a video recording and the corresponding slide deck, which provide many more details and insights.
Apache Kafka is the de facto standard for real-time event streaming. It provides:
Open Source (Apache 2.0 License)
Global-scale
Real-time
Persistent Storage
Stream Processing
PLC4X enables vertical integration and lets you write software independent of PLCs using JDBC-like adapters for various protocols such as Siemens S7, Modbus, Allen Bradley, Beckhoff ADS, OPC-UA, Emerson, Profinet, BACnet, Ethernet.
Github example: https://github.com/kaiwaehner/iiot-integration-apache-plc4x-kafka-connect-ksql-opc-ua-modbus-siemens-s7
More details: http://www.kai-waehner.de/blog/2019/09/02/iiot-data-integr…and-apache-plc4x/
Video Recording: https://youtu.be/RWKggid25ds
IIoT / Industry 4.0 with Apache Kafka, Connect, KSQL, Apache PLC4X
1. 1
Flexible and Scalable Integration in
Automation Industry / Industrial IoT
Kai Waehner
Technology Evangelist
contact@kai-waehner.de
LinkedIn
@KaiWaehner
www.confluent.io
www.kai-waehner.de
Kafka-Native End-to-End IIoT Data Integration and Processing
with Kafka Connect, KSQL and Apache PLC4X
2. 2
Agenda
1) Modern IIoT Use Cases around Cloud, Big Data, Machine Learning
2) Automation Industry and its Challenges
3) Architecture for End-to-End Integration from Edge to Data Center / Cloud
4) Apache Kafka as Event Streaming Platform
5) Apache PLC4X for Edge Integration
6) Example: Supply Chain Optimization at Scale in Real Time
4. 4
Business Digitalization Trends are Driving the Need to Process
Events at a whole new Scale, Speed and Efficiency
Mobile Cloud Microservices Internet of Things Machine Learning
The world has changed!
6. 6
Some IIoT use cases
Analytics
• Ingest data into cloud for analytics
• Reduce cost: Leverage open frameworks instead of paying very expensive licenses per machine
• Flexible integration (select data to ingest, flexible changes over time)
• Machine Learning / Data Science
Manufacturing
• Collect data from machines -> Preprocess + monitoring to optimize assembly line and reduce cost
• Aggregate data from different machines / companies -> Leverage (and sell?) insights
• Sell services on top of machines -> Predictive maintenance (remote)
• Scale up (add more sites, add more data)
Production Robots
• Ingest, process and monitor large volumes of data (where the proprietary monolith does not scale)
Smart Factories
• Monitor and manage the whole factory (at scale, in real time, flexible)
• Integration with legacy proprietary protocols and modern cloud-native technologies
8. 8
History of Automation Industry vs. Big Data and Cloud
Christofer Dutz (codecentric)
https://foss-backstage.de/sites/foss-backstage.de/files/2018-07/Revolutionizing%20Industrial%20IoT%20with%20Apache%20PLC4X.pdf
9. 9
Challenges in Automation Industry
IoT != IIoT
• IoT = Connected cars, smart home, … -> Large scale, secure, scalable, open, modern technologies
• IIoT = Slow, insecure, not scalable, proprietary
Legacy / Proprietary IIoT Technologies
• Usually incompatible protocols, typically proprietary
• Usually serial connections (very low latency, nanoseconds) - with TCP / UDP
wrapper around it to integrate with “external world”
• Siemens S7, Modbus, Beckhoff, Profinet, Allen Bradley, etc.
• OPC-UA (requires machine update + license cost)
Product Lifecycles
• Long lifecycle (tens of years)
• Factories cost millions, no simple changes / upgrades
• Still using Windows 7 without Service Packs => Usability and security issues
• Mantra: “Stay with your well known vendor forever”
10. 10
Challenges in Automation Industry
Monoliths
• No scalability
• No extensibility
• No real failover (start your backup machine)
Missing Security Capabilities
• Security in software development == Authentication,
Authorization, Antivirus, SSL, SASL, Kerberos
• Security in automation industry == Safety
• “if you press the red button, the machine stops
immediately”
• Insecure by nature => No Authentication /
Authorization / Encryption
• Mantra: “Our factory building and network is secure,
no access from outside”
• Contradicts “move to cloud and big data analytics”
11. 11
PLC (Programmable Logic Controller)
• Started early 70’s
• Control of manufacturing processes
• Small grey box
• ~100 messages per second, stored to CSV file, Windows Share
• Limited operations: Read (90+%), Write, Subscribe, Call
Functions, List Resources
• High reliability control, ease of programming and process
fault diagnosis
• Hardwire -> softwire
• Has Input / Sensors, Output / Actuators
• Firmware (= operating system)
• Mechanism to load user programs
• Highly fragmented market
• S7 (Siemens), Beckhoff ADS, Modbus (Asia), Ethernet/IP, KNX,
Emerson DeltaV, Profinet, Allen Bradley, etc.
• State of the art in automation industry
12. 12
Example: Siemens S7 Communication
When communicating with S7 devices, there is a whole family of protocols that can be used. In general, you can divide them into Profinet protocols and S7 Comm protocols. The latter are far simpler in structure, but also far less documented. The S7 Comm protocols are generally split up into two flavors: the classic S7 Comm and a newer version unofficially called S7 Comm Plus.
https://plc4x.apache.org/protocols/s7/index.html
13. 13
Trends: ~50% of industrial assets in factories will be connected by 2020
https://iot-analytics.com/5-industrial-connectivity-trends-driving-the-it-ot-convergence
14. 14
Trends: Evolution of Convergence between IT and Industrial Automation
https://iot-analytics.com/5-industrial-connectivity-trends-driving-the-it-ot-convergence
15. 15
How to get from legacy, proprietary to cloud, big data, machine learning?
16. 16
Costly and inflexible legacy Integration between IIoT and other Systems
[Diagram: Siemens (S7) and Schneider Electric (Modbus) machines, each attached to its own monolith and integration middleware, with further integration middleware towards other systems]
19. 19
IIoT Architecture (High Level)
[Diagram: Edge (machines, devices, gateway) -> ? -> Data Center / Cloud (Kafka Connect w/ MQTT connector, streaming platform with Kafka brokers) -> Sensor Analytics (Real Time), Predictive Maintenance (Near Real Time), Machine Learning (Batch)]
How to integrate and process data at scale and reliably?
20. 20
Vendor-Neutral IoT Architectures across Edge, On Premise and Multi-Cloud
On-Premise / Edge: Deploy on bare-metal, VMs, containers or Kubernetes in your datacenter with Confluent Platform and Confluent Operator
Public Cloud: Implement self-managed in the public cloud or adopt a fully managed service with Confluent Cloud
Hybrid Cloud: Build a persistent bridge between datacenter and cloud with Confluent Replicator
[Diagram labels: Self Managed, Fully Managed]
21.
[Architecture diagram: end-to-end integration from edge to data center / cloud]
Legend: Streaming Platform, Other Components
Components: Machine Sensors; PLC; Standard Interface; Proprietary Interface; Standard based Integration; Proprietary based Integration; Real Time Edge Computing; Model Lite; Event Streaming Platform; Real Time Pre-processing; Real Time Processing; Real Time App; Model; Model Server; RPC; Batch Integration; Data Lake; Batch Analytics; Optimization / Analytics
Data flow:
(1) Ingest Data
(2) Preprocess Data
(3) Read Data
(5) Deploy Optimization Model
(6a) Consume machine data
(6b) All Data
(7) Potential Defect
(8a) Stop Machine
(8b) Alert Person (e.g. Mobile App)
(9) Manual user-based analytics and reporting to find insights and improve real time process
23. 23
The beginning of a new Era
https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
The first use case. This is why Kafka was created!
25. 25
● Global-scale
● Real-time
● Persistent Storage
● Stream Processing
[Diagram: Apache Kafka at the center, connecting Edge, Cloud, Datacenter, Data Lake, Databases, IoT, SaaS Apps, Mobile, Microservices and Machine Learning]
Apache Kafka: The De-facto Standard for Real-Time Event Streaming
26. 26
Apache Kafka at Scale at Tech Giants
> 4.5 trillion messages / day > 6 Petabytes / day
“You name it”
* Kafka is not just used by tech giants
** Kafka is not just used for big data
27. 27
Confluent - Business Value per Use Case
Improve Customer Experience (CX)
Increase Revenue (make money)
Business Value
Decrease Costs (save money)
Core Business Platform
Increase Operational Efficiency
Migrate to Cloud
Mitigate Risk (protect money)
Key Drivers
Strategic Objectives (sample)
Fraud Detection
IoT sensor ingestion
Digital replatforming / Mainframe Offload
Connected Car: Navigation & improved in-car experience: Audi
Customer 360
Simplifying Omni-channel Retail at Scale: Target
Faster transactional processing / analysis incl. Machine Learning / AI
Mainframe Offload: RBC
Microservices Architecture
Online Fraud Detection
Online Security (syslog, log aggregation, Splunk replacement)
Middleware replacement
Regulatory
Digital Transformation
Application Modernization: Multiple Examples
Website / Core Operations (Central Nervous System)
The [Silicon Valley] Digital Natives; LinkedIn, Netflix, Uber, Yelp...
Predictive Maintenance: Audi
Streaming Platform in a regulated environment (e.g. Electronic Medical Records): Celmatix
Real-time app updates
Real Time Streaming Platform for Communications and Beyond: Capital One
Developer Velocity - Building Stateful Financial Applications with Kafka Streams: Funding Circle
Detect Fraud & Prevent Fraud in Real Time: PayPal
Kafka as a Service - A Tale of Security and Multi-Tenancy: Apple
Example Use Cases
Example Case Studies (of many)
34. 34
Kafka Streams
● No separate processing cluster required
● Develop on Mac, Linux, Windows
● Deploy to containers, VMs, bare metal, cloud
● Powered by Kafka: elastic, scalable, distributed,
battle-tested
● Perfect for small, medium, large use cases
● Fully integrated with Kafka security
● Exactly-once processing semantics
● Part of Apache Kafka
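// Count page views per user session (a session closes after 5 minutes of inactivity).
// windowedBy(...) is the newer replacement for the older count(SessionWindows, String) overload.
KStream<User, PageViewEvent> pageViews = builder.stream("pageviews-topic");
KTable<Windowed<User>, Long> viewsPerUserSession = pageViews
    .groupByKey()
    .windowedBy(SessionWindows.ofInactivityGapWithNoGrace(Duration.ofMinutes(5)))
    .count(Materialized.as("session-views"));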
https://docs.confluent.io/current/streams/
Write standard Java apps and microservices
to process your data in real-time
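To make the session-window count in the snippet above more concrete, here is a minimal plain-Java sketch of the idea, without any Kafka dependency. It is a simplification under stated assumptions: the class and method names are hypothetical, it handles a single key, expects timestamps already sorted, and ignores out-of-order events and the exact boundary rules Kafka Streams applies.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of session-window counting; not part of Kafka Streams.
final class SessionCountSketch {

    // Counts events per session for ONE key. A new session starts whenever the
    // gap to the previous event is larger than the inactivity gap.
    static List<Long> countPerSession(List<Long> sortedTimestampsMs, Duration inactivityGap) {
        List<Long> counts = new ArrayList<>();
        long gapMs = inactivityGap.toMillis();
        Long previous = null;
        long current = 0;
        for (long ts : sortedTimestampsMs) {
            if (previous != null && ts - previous > gapMs) {
                counts.add(current); // close the finished session
                current = 0;
            }
            current++;
            previous = ts;
        }
        if (current > 0) {
            counts.add(current); // close the last open session
        }
        return counts;
    }

    public static void main(String[] args) {
        // Three page views within a few minutes, then two more after a long pause.
        List<Long> pageViews = List.of(0L, 60_000L, 120_000L, 600_000L, 610_000L);
        System.out.println(countPerSession(pageViews, Duration.ofMinutes(5))); // prints [3, 2]
    }
}
```

Kafka Streams does the same grouping per key, continuously and fault tolerant, and keeps the counts in a state store instead of a local list.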
35. 35
KSQL: Enable Stream Processing using SQL-like Semantics
Leverage Kafka Streams API using simple SQL commands
[Diagram: KSQL server (Engine that runs queries, REST API) on top of the Kafka Cluster, accessed by CLI, Clients and the Confluent Control Center GUI]
Use any programming language
Connect via Control Center UI, CLI, REST or deploy in headless mode
36. 36
Event Transformation with Stream Processing
Kafka Streams: Apache Kafka library to write real-time applications and microservices in Java and Scala
Confluent KSQL: The streaming SQL engine for Apache Kafka
CREATE STREAM fraudulent_payments AS
SELECT * FROM payments
WHERE fraudProbability > 0.8;
You write only SQL. No Java, Python, or other boilerplate to wrap around it!
confluent.io/product/ksql
37. 37
Kafka Connect
● Centralized management and configuration
● Support for hundreds of technologies including
RDBMS, Elasticsearch, HDFS, S3
● Supports CDC ingest of events from RDBMS
● Preserves data schema
● Fault tolerant and automatically load balanced
● Extensible API
● Single Message Transforms
● Part of Apache Kafka
{
"connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
"connection.url": "jdbc:mysql://localhost:3306/demo?user=rmoff&password=foo",
"table.whitelist": "sales,orders,customers"
}
https://docs.confluent.io/current/connect/
Reliable and scalable integration of Kafka with other systems
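A connector configuration like the one above is typically registered through the Kafka Connect REST API by posting a JSON document with a name and a config object to the /connectors endpoint. The following is a minimal sketch that only builds the request and does not send it. The connector name "jdbc-source-demo", the base URL http://localhost:8083 and the helper names are assumptions for illustration, and a real deployment may need additional connector settings.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper that prepares a "create connector" request for Kafka Connect.
final class ConnectorRequestSketch {

    // Quotes a string for JSON; escapes backslash and double quote only (enough for this sketch).
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Renders a flat string map as a JSON object.
    static String toJson(Map<String, String> config) {
        return config.entrySet().stream()
            .map(e -> quote(e.getKey()) + ": " + quote(e.getValue()))
            .collect(Collectors.joining(", ", "{", "}"));
    }

    // Wraps the connector config in the {"name": ..., "config": {...}} envelope.
    static String connectorPayload(String name, Map<String, String> config) {
        return "{\"name\": " + quote(name) + ", \"config\": " + toJson(config) + "}";
    }

    // Builds (but does not send) the POST request to the Connect worker.
    static HttpRequest createRequest(String connectBaseUrl, String payload) {
        return HttpRequest.newBuilder(URI.create(connectBaseUrl + "/connectors"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(payload))
            .build();
    }

    public static void main(String[] args) {
        Map<String, String> config = new LinkedHashMap<>();
        config.put("connector.class", "io.confluent.connect.jdbc.JdbcSourceConnector");
        config.put("connection.url", "jdbc:mysql://localhost:3306/demo?user=rmoff&password=foo");
        config.put("table.whitelist", "sales,orders,customers");

        String payload = connectorPayload("jdbc-source-demo", config); // assumed connector name
        HttpRequest request = createRequest("http://localhost:8083", payload); // assumed worker URL
        System.out.println(request.method() + " " + request.uri());
        System.out.println(payload);
    }
}
```

Sending the request would be one more call on a java.net.http.HttpClient; it is left out here so the sketch runs without a Connect worker.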
38. 38
Connect External Data Sources and Sinks with Connectors
SOURCES SINKS
CDC
Connectors developed and supported by Confluent, partners and the open source community available on
confluent.io/hub
39. 39
IoT Integration with Kafka Connect, MQTT and REST Proxy
Video and Slides:
https://www.confluent.io/kafka-summit-sf18/processing-iot-data-from-end-to-end
40. 40
Native, decoupled Integration between IIoT and other Systems
[Diagram: several Siemens S7 and Modbus machines on one side, Kafka Connect on the other; how to connect the two is still open (?)]
42. 42
Apache PLC4X
• Top Level Apache project
• PLC 4 (for) X (anything)
• Goal: Open up PLC interfaces to outside world
• Vertical integration
• Write software independent of PLC
• JDBC-like Adapters for various protocols
https://plc4x.apache.org/
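The "JDBC-like adapter" idea can be sketched in a few lines: application code programs against one small interface, and a registry picks the protocol driver from the connection URL, much like JDBC selects a database driver. The sketch below is purely illustrative and is NOT the PLC4X API. All names (PlcAdapterSketch, PlcReader, connect), the URL format, the address string and the simulated drivers are assumptions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of a JDBC-like driver registry; not the real PLC4X API.
final class PlcAdapterSketch {

    // The one interface the application sees, regardless of the protocol behind it.
    interface PlcReader {
        String read(String address);
    }

    // Maps a URL scheme to a driver factory. The "drivers" here only simulate a read.
    static final Map<String, Function<String, PlcReader>> DRIVERS = new HashMap<>();
    static {
        DRIVERS.put("s7", host -> address -> "s7@" + host + " read " + address);
        DRIVERS.put("modbus", host -> address -> "modbus@" + host + " read " + address);
    }

    // Resolves "scheme://host" to a reader, failing fast for unknown protocols.
    static PlcReader connect(String url) {
        int sep = url.indexOf("://");
        if (sep < 0) {
            throw new IllegalArgumentException("Expected scheme://host, got: " + url);
        }
        String scheme = url.substring(0, sep);
        String host = url.substring(sep + 3);
        Function<String, PlcReader> factory = DRIVERS.get(scheme);
        if (factory == null) {
            throw new IllegalArgumentException("No driver for protocol: " + scheme);
        }
        return factory.apply(host);
    }

    public static void main(String[] args) {
        // Same application code, two different protocols.
        System.out.println(connect("s7://10.0.0.1").read("sensor-1"));
        System.out.println(connect("modbus://10.0.0.2").read("sensor-1"));
    }
}
```

Swapping the PLC vendor then means changing the connection URL, not the application code, which is the independence from the PLC that the slide describes.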
44. 44
Native, decoupled Integration between IIoT and other Systems
[Diagram: several Siemens S7 and Modbus machines connected through Kafka Connect]
One more thing -> PLC4X vs. OPC-UA
OPC-UA
• Open standard
• All the pros and cons of an open standard (works with different vendors; slow adoption; inflexible, etc.)
• Often poorly implemented
• Requires app server on top of PLC
• Every device has to be retrofitted with the ability to speak a new protocol and use a common client to speak with these devices
• Often overengineering for just reading the data
• Activating OPC-UA support on existing PLCs greatly increases the load on the PLCs
• With licensing cost for every machine
PLC4X
• Open source framework (Apache 2.0 license)
• Provides unified API by implementing drivers for communicating with most industrial controllers in the protocols they natively understand
• No need to modify existing hardware
• No increased load on the PLCs
• No need to pay for licenses to activate OPC-UA support
• Drivers being implemented from the specs or by reverse engineering protocols in order to be fully Apache 2.0 licensed
• PLC4X adapter for OPC-UA available -> Both can be used together!
47.
[Architecture diagram: the end-to-end architecture implemented with the Kafka ecosystem]
Legend: Kafka Ecosystem, Other Components
Sources: Machine Sensors; File, HTTP, MQTT, ROS; PLC (Beckhoff, S7, Modbus, Allen Bradley, OPC-UA)
Interfaces: Standard Interface, Proprietary Interface
Components: PLC4X Connector; Kafka Connect; Kafka Cluster; KSQL; Real Time Kafka Streams Application (Java / Scala); Real Time Edge Computing (C / librdkafka); TensorFlow Lite; Real Time Kafka App; TensorFlow Serving (HTTP / gRPC); TensorFlow I/O; TensorFlow; Spark; Notebooks (Jupyter)
Data flow:
(1) Ingest Data
(2) Preprocess Data
(3) Read Data
(4) Train Model
(5) Deploy Model
(6a) Consume machine data
(6b) All Data
(7) Potential Defect
(8a) Stop Machine
(8b) Alert Person (e.g. Mobile App)
(9) Manual user-based analytics and reporting to find insights and improve real time process
49.
[Diagram: supply chain optimization flow]
• Planners forecast long term schedule
• Production begins
• IoT data from production: inventories, manufacturing machines, yield metrics
• Production forecast
• Forecasted production - plan diffs
• Re-optimize plan based on actuals
• Change orders to supply chain: inventory, manufacturing schedules
• Change operational characteristics: plant 223 needs new Al extruder
• Customer delivery SLAs: actuals vs. plan
Legend: Streaming analytics using Confluent; Batch analytics using other frameworks; Physical operations; UI
(Reference use case implemented with our partner Expero)
50.
[Diagram: the supply chain optimization flow from the previous slide (planners' long term forecast through customer delivery SLAs), annotated with the technologies used]
Technologies: Machine Sensors; Kafka Connect + PLC4X Connector; Kafka Cluster; KSQL; TensorFlow; TensorFlow Serving; Kafka Connect; Notebooks (Jupyter); Spark; Real Time Kafka App
Legend: Streaming analytics using Confluent; Batch analytics using other frameworks; Physical operations; UI
(Reference use case implemented with our partner Expero)
51. 51
Supply Chain Optimization in Real Time at Scale
Slides and Video Recording:
http://www.kai-waehner.de/blog/2019/08/23/apache-kafka-machine-learning-for-real-time-supply-chain-iiot-opcua-modbus/
53. 53
Confluent Platform
The Event Streaming Platform Built by the Original Creators of Apache Kafka®
Operations and Security
Development & Stream Processing
Apache Kafka
Confluent Platform
Support, Services, Training & Partners
Mission-Critical Reliability
Complete Event
Streaming Platform
Freedom of Choice
Datacenter Public Cloud Confluent Cloud
Self-Managed Software Fully Managed Service
54. 56
Confluent Platform – Benefits for IoT Projects
• Based on open source and de facto standards for IoT projects
• Low license / subscription costs for Confluent support / services / training (compared to traditional IoT vendors + their products)
• Spend budget for consulting to realize the project successfully
• Mission critical deployments at large scale in various industries
• Automotive, Manufacturing, Logistics, Oil&Gas, Retail, Telco, …
• Flexible architecture
• Lightweight infrastructure footprint on commodity hardware
• Pick what you need
• Deploy where you want
• Complementary to other frameworks, technologies (e.g. Siemens MindSphere, Cisco Kinetic) and cloud services (e.g. Google Cloud IoT)
• Customize and build for the specific customer use case
• Battle-tested at large scale
• Event Streaming Platform for real time integration and processing (plus integration to batch, file and other communication protocols)
• Security and reliability as core concepts
• Elastic scalability, start small and grow to extreme scale easily
• Partner (open source) technologies for specific integrations (like HiveMQ or PLC4X)
• Integration with any legacy and modern technology
• IoT standards like MQTT or OPC-UA
• Legacy and proprietary IIoT protocols like Modbus, Siemens S7, Beckhoff, Allen Bradley, etc.
• Modern technologies like S3, HDFS, MongoDB, etc.
• Modern applications (business services like Salesforce and IoT solutions like Siemens MindSphere)
55. 57
Confluent and IoT Platform Solutions
[Diagram components: Machine Sensors; File, HTTP, MQTT, ROS; PLC (Beckhoff, S7, Modbus, OPC-UA, “you-name-it”); S7 PLC; PLC4X Connector; Kafka Connect; Kafka Cluster; KSQL; Siemens MindSphere; Azure IoT Hub]
Framework or solution?
Or both as complementary technologies?