6. @jimmydahlqvist
Concept
• A network of physical devices
• Collection of data from connected devices
• Wide range of use cases
• Rapidly growing, more and more devices being connected
• IIoT – Industry 4.0
7. @jimmydahlqvist
Scenario
• Thousands to millions of devices
• Unpredictable traffic and varying load
• High-speed, real-time data
• Asynchronous data processing
9. @jimmydahlqvist
AWS IoT Core
• Managed service to easily connect, register, and manage devices
• Robust security, authentication, encryption, and access control
• Scales to millions of devices
• Supports standard protocols: MQTT and HTTP
• Event-based architecture
10. @jimmydahlqvist
IoT Core and X.509 Certificates
• Used to authenticate and secure connections
• Certificate based mutual authentication
• Unique certificate per device
• Policy-based authorization, based on identities
• Built-in Certificate Authority (CA)
• Supports custom CAs
11. @jimmydahlqvist
MQTT Broker and Topics
• Integrates with several AWS services
• Powerful rules engine
• Topics are used to route messages
• Hierarchical
• sensors/{sensor_id}/temperature (first / second / third topic level)
15. @jimmydahlqvist
IoT Policy
• Define permissions and access
• Attached to Things, Thing Types, or AWS IoT resources
• Supports variables from the Thing and Certificate
37. @jimmydahlqvist
Multi Region
• Allow devices to connect to the closest region
• Design from start
• Just In Time Provisioning
• Copy certificates to all regions
38. @jimmydahlqvist
Storage first
• Reliably capture data
• Managed services
• Powerful
[Architecture diagram: Service A, Service B, Service C, Data Transformation Service, Analytics Service]
55. @jimmydahlqvist
I would say so!
• Several production environments
• Thousands to millions of devices
• Thousands of messages per second
• Vast amounts of data analysed
59. Please fill in the session survey!
https://feedback.aws-communityday.de/#/feedback/483904
Speaker notes
I started working with AWS in 2015…. First ever was serverless and event-driven…..
Story:
A couple of years ago I started working on my first IoT project, and it has become several more over the years.
First one: 100s of devices.
Latest: millions.
IoT systems come in all shapes and forms, and our architecture must be able to meet the demand when it comes to scalability and cost.
The number of IoT systems grows every day, and the IoT space is one of the fastest-growing areas at the moment.
Today we will take a look at IoT and one of several cloud architectures I have built over the years: the challenges we faced and the changes we made.
Today we will look into 4 areas.
Starting off with an introduction to IoT and AWS IoT services.
We'll take a look at a sample architecture and the challenges we faced.
What changes were made when we decided to go event-driven and fully serverless.
And finally I will leave you with some takeaways and thoughts.
Before we start, who am I?
My name is Jimmy Dahlqvist!!
Father of two daughters, afraid of nothing!!
IoT, serverless, and event-driven enthusiast
Serverless since 2016 – Lambda old
AWS Ambassador + Community Builder (cheer ??)
Day job – Head of Sigma – (cheer) colleagues in the audience
IoT…. Internet of things.
IIOT…. Industrial internet of things, also called industry 4.0
What is it? What is a thing? How do we connect them? What is the purpose of them all?
How do we build an efficient cloud architecture for it?
Hopefully I can answer some of those questions today.
A network of physical devices, vehicles, home appliances, and other items embedded with electronics, software, sensors, and connectivity, which enables these objects to connect and exchange data.
IoT enables businesses and organizations to collect data from a vast network of connected devices, analyse that data, and use the insights gained to optimize operations, reduce costs, and improve customer experiences.
IoT technology is rapidly growing, with more and more devices being connected to the internet every day.
IoT has a wide range of use cases, from simple home automation to complex industrial automation systems.
IIOT…. Industrial internet of things, industry 4.0
IIoT enables machines, sensors, and other devices to collect and exchange data, and can be used to improve operational efficiency, reduce downtime, and enable predictive maintenance. IIoT can also help businesses to optimize their supply chain management, reduce waste, and improve product quality.
Throughout this talk, this is the scenario we'll keep in mind.
This is what we’ll try to build our architecture and IoT system for.
With that in mind, now let’s move over and talk about different components of AWS IoT.
There are so many different services in this collection, so our focus will be on AWS IoT Core and its components.
Besides Core there are services like Greengrass, which you run on your device, Device Defender for security and protection of devices, and FleetWise to manage and onboard vast amounts of devices.
A managed cloud service that makes it easy to connect, register, and manage IoT devices at scale. Organizations can easily connect a wide range of devices, including sensors, gateways, and appliances, to the cloud and securely transmit data to and from them.
One of the key features of AWS IoT Core is its robust security capabilities. It provides authentication, encryption, and access control to help ensure that only authorized devices and users can access the system and the data transmitted over it.
AWS IoT Core is highly scalable and can support millions of devices, making it ideal for large-scale IoT deployments.
AWS IoT Core supports standard protocols such as MQTT (Message Queuing Telemetry Transport) and HTTP.
The event-based architecture of AWS IoT Core enables organizations to build complex IoT applications that respond to real-time data and events. This architecture supports the use of rules and actions to automate actions based on incoming data, which can help organizations to respond quickly to changing conditions and improve operational efficiency.
AWS IoT Core uses X.509 certificates to authenticate and secure connections between IoT devices and the cloud.
X.509 certificates are digital certificates that use a standard format to encode public key information, identity information, and digital signatures. X.509 certificates are widely used for authentication and encryption in many security protocols, such as SSL/TLS, IPSec, and S/MIME.
In AWS IoT Core, X.509 certificates are used for mutual authentication, which means that both the device and the cloud need to authenticate each other using a unique certificate.
Each device is assigned a unique X.509 certificate, which is used to authenticate and authorize its connections to the cloud. The device's certificate is signed by a built-in Certificate Authority (CA) in IoT Core, which ensures the authenticity and integrity of the certificate.
IoT Core also supports custom CAs, which enable customers to use their own certificate infrastructure to issue and manage device certificates.
In addition to authentication, IoT Core uses X.509 certificates for authorization, which means that access to resources and services is based on the policies associated with the device's identity.
Policies in IoT Core define the permissions and restrictions for each device, based on its identity and attributes. Policies can be based on the device's certificate, its Thing Type, its attributes, or other contextual information.
X.509 certificates and policies enable a secure and scalable IoT ecosystem, where devices and cloud services can communicate with each other in a trusted and controlled way.
In summary, AWS IoT Core uses X.509 certificates for mutual authentication and authorization of IoT devices. Each device has a unique certificate, which is signed by a built-in or custom CA. Policies define the permissions and restrictions for each device, based on its identity and attributes. X.509 certificates and policies enable a secure and scalable IoT ecosystem.
MQTT (Message Queuing Telemetry Transport) is a lightweight messaging protocol that is widely used in IoT applications
An MQTT broker is a server that acts as a central hub for message exchange between devices. The broker receives messages published by devices and routes them to the appropriate subscribers.
One of the key features of MQTT is its use of topics to route messages between devices.
MQTT topics are hierarchical.
For example, a topic like "sensors/{sensor_id}/temperature" has three levels.
AWS IoT integrates with several AWS services, including Amazon S3, Amazon Kinesis, and Amazon DynamoDB
AWS IoT also includes a powerful rules engine that allows organizations to define rules and actions based on incoming data from IoT devices.
In summary, MQTT is a lightweight messaging protocol that enables efficient and reliable communication between devices and the cloud. The use of topics and a hierarchical structure enables devices to subscribe to specific messages based on their interests. AWS IoT integrates with several AWS services and includes a powerful rules engine that enables organizations to automate workflows and respond quickly to changing conditions.
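A minimal device-side sketch of publishing to such a topic, here using the open-source paho-mqtt client (1.x style constructor) with X.509 mutual TLS; the endpoint, certificate file paths, and sensor id are placeholders, not from the talk.

# Minimal device-side publish sketch (paho-mqtt 1.x style constructor).
# Endpoint, certificate paths and sensor id are placeholders.
import json
import paho.mqtt.client as mqtt

ENDPOINT = "example-ats.iot.eu-west-1.amazonaws.com"  # your IoT Core endpoint
SENSOR_ID = "sensor-001"

client = mqtt.Client(client_id=SENSOR_ID)
# X.509 mutual authentication: device certificate + private key, Amazon root CA.
client.tls_set(ca_certs="AmazonRootCA1.pem",
               certfile="device.pem.crt",
               keyfile="private.pem.key")

client.connect(ENDPOINT, port=8883)
client.loop_start()

# Publish a temperature reading on the hierarchical topic from the slide.
topic = f"sensors/{SENSOR_ID}/temperature"
client.publish(topic, json.dumps({"temperature": 21.4}), qos=1)

client.loop_stop()
client.disconnect()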
What is an IoT Thing?
IoT Things represent physical or virtual devices that can communicate with other devices and cloud services over the internet. Things can be assigned a Thing Type, which defines their attributes and behaviors. Things can publish and subscribe to MQTT topics to send and receive data. IoT Things can also interact with other AWS services to enable storage, analytics, and processing of IoT data.
Before we can use a Thing and receive data from it, it must be registered in IoT Core.
The unique device certificates are provisioned and flashed onto the device during manufacturing. However, we don't want them to be registered directly. Instead we'd like to provision our devices as they connect, just-in-time.
Why would we like to use JIT?
What do we need for JIT?
It requires the use of our own custom CA; we can't use the AWS built-in CA, as those certs are created and fetched from IoT Core.
For this CA we attach what is known as a provisioning template that defines attributes and actions that should happen during the provisioning process.
As an extra security measure it's good practice to use pre-provisioning hooks to validate the cert data and things.
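A rough sketch of what such a pre-provisioning hook Lambda could look like. The serial-number allow list and parameter names are made-up placeholders; the allowProvisioning / parameterOverrides response shape follows the documented hook contract.

# Sketch of a pre-provisioning hook Lambda (illustrative validation only).
# IoT Core invokes the hook with the provisioning request parameters and
# expects a response that tells it whether to continue provisioning.

ALLOWED_SERIALS = {"SN-0001", "SN-0002"}  # placeholder allow list

def handler(event, context):
    # Parameters come from the device's provisioning request via the template.
    params = event.get("parameters", {})
    serial = params.get("SerialNumber", "")

    allow = serial in ALLOWED_SERIALS

    return {
        "allowProvisioning": allow,
        # Optionally override or add template parameters for the provisioning step.
        "parameterOverrides": {"DeviceLocation": "eu-west-1"} if allow else {},
    }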
So a step by step illustration for the provisioning process.
The thing connects to IoT Core. The CA is recognized and the template is used to start the process.
We invoke our Lambda hook that can validate our thing and return OK / NOK back.
We create our policy, register the certificate but don't enable it, and create the IoT thing.
Next the policy is attached to the cert and the cert is attached to the thing.
Finally we enable the cert.
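The same steps sketched as the underlying control-plane calls; with JITP the provisioning template performs the equivalent actions for you. Names, the certificate PEM, and the policy document are placeholders, and register_certificate_without_ca is used here for brevity (with a custom CA you would register the CA and use register_certificate).

# Sketch of the provisioning steps as boto3 calls. Names and PEM are placeholders.
import boto3

iot = boto3.client("iot")

DEVICE_CERT_PEM = "-----BEGIN CERTIFICATE-----\n...\n-----END CERTIFICATE-----"  # placeholder
POLICY_JSON = '{"Version": "2012-10-17", "Statement": [{"Effect": "Allow", "Action": "iot:Connect", "Resource": "*"}]}'  # placeholder, restrict in real use

# 1. Register the device certificate, but keep it INACTIVE for now.
cert = iot.register_certificate_without_ca(
    certificatePem=DEVICE_CERT_PEM, status="INACTIVE"
)

# 2. Create the policy and the IoT thing.
iot.create_policy(policyName="sensor-001-policy", policyDocument=POLICY_JSON)
iot.create_thing(thingName="sensor-001")

# 3. Attach the policy to the certificate and the certificate to the thing.
iot.attach_policy(policyName="sensor-001-policy", target=cert["certificateArn"])
iot.attach_thing_principal(thingName="sensor-001", principal=cert["certificateArn"])

# 4. Finally, enable the certificate.
iot.update_certificate(certificateId=cert["certificateId"], newStatus="ACTIVE")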
An IoT Core policy is a set of rules and permissions that defines how IoT devices and other entities can interact with AWS IoT Core resources and services. Policies in IoT Core are based on identities, which are represented by X.509 certificates
Policies can grant or deny access to MQTT topics
Policies can be attached to things, thing types, principals, or groups.
The policy language supports variables from the Thing and Certificate.
Here is an example policy that would allow a device to publish to a topic that ends with its ThingName.
By using similar conditions we can restrict the topics that a device can subscribe to; this way we can enforce that devices can only subscribe or publish to topics that belong to that device.
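Since the policy itself lives on the slide, here is a reconstruction of what such a policy could look like, using the ${iot:Connection.Thing.ThingName} policy variable; region, account id, and the topic prefix are placeholders.

# Reconstruction of an IoT policy that only lets a device connect as itself and
# publish to a topic ending with its own ThingName. Region/account are placeholders.
import json
import boto3

policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "iot:Connect",
            "Resource": "arn:aws:iot:eu-west-1:123456789012:client/${iot:Connection.Thing.ThingName}",
        },
        {
            "Effect": "Allow",
            "Action": "iot:Publish",
            "Resource": "arn:aws:iot:eu-west-1:123456789012:topic/data/${iot:Connection.Thing.ThingName}",
        },
    ],
}

boto3.client("iot").create_policy(
    policyName="device-publish-own-topic",
    policyDocument=json.dumps(policy_document),
)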
Sometimes devices need to interact with AWS services directly; that could be a video camera sending video to Kinesis Video Streams or a device uploading data to S3.
The device can then use the IAM credentials endpoint and exchange the X.509 certificate for temporary IAM credentials. This is similar to IAM Roles Anywhere, but this functionality has been around for a longer time.
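A rough sketch of that credential exchange, assuming a role alias has already been created and the account's credentials provider endpoint is known; endpoint, role alias, thing name, and file paths are placeholders.

# Sketch: exchange the device's X.509 certificate for temporary AWS credentials
# via the IoT credentials provider. A role alias must already exist.
import requests

CREDENTIALS_ENDPOINT = "example.credentials.iot.eu-west-1.amazonaws.com"  # placeholder
ROLE_ALIAS = "device-s3-upload-role-alias"                                # placeholder

response = requests.get(
    f"https://{CREDENTIALS_ENDPOINT}/role-aliases/{ROLE_ALIAS}/credentials",
    headers={"x-amzn-iot-thingname": "sensor-001"},
    cert=("device.pem.crt", "private.pem.key"),  # mutual TLS with the device cert
    verify="AmazonRootCA1.pem",
)
response.raise_for_status()

creds = response.json()["credentials"]
# creds holds accessKeyId, secretAccessKey, sessionToken and expiration, which
# the device can use to sign requests to e.g. S3 or Kinesis Video Streams.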
When devices send data we use topic based routing.
IoT rules are created and associated with a topic; the rule can then contain logic and conditions, and if these are met the rule and its associated targets are invoked.
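A small sketch of creating such a rule, matching the temperature topics from earlier and forwarding to an SQS queue; queue URL and role ARN are placeholders.

# Sketch: a topic rule that matches the temperature topics and forwards the
# payload to an SQS queue. Queue URL and role ARN are placeholders.
import boto3

iot = boto3.client("iot")

iot.create_topic_rule(
    ruleName="forward_temperature_to_sqs",
    topicRulePayload={
        # Rule SQL: select all messages published on sensors/<id>/temperature.
        "sql": "SELECT *, topic(2) AS sensor_id FROM 'sensors/+/temperature'",
        "awsIotSqlVersion": "2016-03-23",
        "actions": [
            {
                "sqs": {
                    "queueUrl": "https://sqs.eu-west-1.amazonaws.com/123456789012/iot-ingest",
                    "roleArn": "arn:aws:iam::123456789012:role/iot-rule-sqs-role",
                    "useBase64": False,
                }
            }
        ],
    },
)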
Receiving data works about the same, but there are no rules that are invoked.
A device creates a subscription on a topic, and when messages arrive on that topic the message is sent to the device.
With this send/subscribe functionality we can build an MQTT-based API where devices send API actions on one topic and then expect an answer to be published to a specific topic.
This is a very powerful way for devices to interact and perform actions on resources in the cloud.
For example, a device could send an API action saying that it wants to upload data to S3. Instead of using the IAM credentials endpoint and implementing logic in the device, an MQTT API could respond with a pre-signed URL that the device can upload to.
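A sketch of what the cloud side of such an MQTT upload API could look like; the topic layout, bucket name, and the event shape produced by the IoT rule are assumptions for illustration.

# Sketch of the cloud side of an MQTT "upload" API: a Lambda (invoked by an
# IoT rule on the request topic) answers on a response topic with a pre-signed
# S3 URL. Topic layout, bucket name and event shape are assumptions.
import json
import boto3

s3 = boto3.client("s3")
iot_data = boto3.client("iot-data")

BUCKET = "device-uploads-example"  # placeholder

def handler(event, context):
    thing_name = event["thing_name"]          # assumed to be added by the IoT rule
    key = f"uploads/{thing_name}/{event['file_name']}"

    # Pre-signed URL the device can PUT its file to, valid for 15 minutes.
    url = s3.generate_presigned_url(
        "put_object",
        Params={"Bucket": BUCKET, "Key": key},
        ExpiresIn=900,
    )

    # Publish the answer on the device's response topic.
    iot_data.publish(
        topic=f"devices/{thing_name}/upload/response",
        qos=1,
        payload=json.dumps({"upload_url": url, "key": key}),
    )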
So that was the basics…..
Now let's jump into it and start building our cloud architecture.
We will start out with a fairly basic design, from a real-world use case, and we'll iterate over that design to improve it over time, until we finally arrive at our final version.
First version of the cloud architecture looked like this…
Device data -> IoT Core
IoT Core -> Rules -> Storage + Athena + Dynamo
IoT Core -> Rules -> Several SF business logic (thresholds, trends…)
API GW RESTful…
IoT rules as router –> We can only react to messages from the devices.
Each event –> one object in S3 –> Glue/Athena not optimized for that
Data written directly to storage (Storage First is good but…) –> Format dictated by the device –> need to transform
Hard to extend –> Several services did the same thing –> notification –> or needed to implement an API
The first improvement to this is to move to a more event-driven system. Even if IoT Core can be used to build out an event-driven system, I often found it not to be optimal.
The theory I had that was going to improve the architecture was:
Remove IoT Rules as event router
Improve extensibility
Introduce Amazon EventBridge
If you listened to the keynote… just adding the service icon doesn't show the design decisions. And we will talk about the reasons why.
But before we do any changes, just let us look at what an event can be defined as
In the theory, the plan was to use EventBridge.
So why use EventBridge then and not IoT Core? Well, it comes with several good things and some differences.
Main difference is EB-2-EB: services can subscribe and publish, not only devices. Easy to extend.
So with that improvement we removed the topic routing and instead send all messages to an SQS queue.
The queue invokes a Lambda function that sends the event / message to EventBridge.
All services can then listen for the event and react to it.
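A minimal sketch of that glue Lambda, assuming a standard SQS event source mapping and a custom bus named iot-events; the source and detail-type values are illustrative choices.

# Sketch of the SQS-triggered Lambda that forwards device messages to a custom
# EventBridge bus. Bus name, source and detail-type are illustrative choices.
import boto3

events = boto3.client("events")

def handler(event, context):
    entries = []
    for record in event["Records"]:          # standard SQS event source shape
        entries.append({
            "EventBusName": "iot-events",
            "Source": "iot.ingest",
            "DetailType": "device.data.received",
            "Detail": record["body"],        # raw device payload as a JSON string
        })

    # put_events accepts up to 10 entries per call; the SQS batch size is
    # assumed to be configured accordingly in this sketch.
    if entries:
        events.put_events(Entries=entries)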
BUT…..
Still not solved the S3 part… We needed a way for the raw, untouched data to be stored for later use.
So instead of SQS, Kinesis was used as the service between IoT and EventBridge.
That way Firehose can be set as a consumer on Kinesis as well and send data in batches to S3.
Lambda function to transport data – poison pill
Promote Anahit re:Invent
COM308 | Serverless data streaming: Amazon Kinesis Data Streams and AWS Lambda
We then expand on that simplified architecture to include more services for different purposes.
We have expanded on the analytics service; not only does it store data in S3, it also sends data to an OpenSearch cluster that can be used for easy visualization.
QuickSight could be used with Athena to create advanced business dashboards
I have also introduced one new key service….. CLICK!
This data transformation service.
Why? Because this makes it easy to use different data formats from the devices, while all internal services use a common internal format.
We can also augment the data and add more information in the event that goes to EventBridge.
That means that this service picks up all data events from IoT, transforms them to an internal format, and reposts them onto EventBridge, and all internal services use that.
Of course the service is using Step Functions, and in it as many SDK actions as possible.
Why write code if you don't need to, right?
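As an illustration, a fragment of what one SDK integration in such a state machine could look like (Amazon States Language written as a Python dict); the table name and input fields are placeholders.

# Fragment of a state machine definition (ASL as a Python dict) using a direct
# SDK integration instead of a Lambda function. Table name and fields are placeholders.
state_machine_definition = {
    "StartAt": "StoreReading",
    "States": {
        "StoreReading": {
            "Type": "Task",
            # Direct SDK integration: no Lambda needed to write to DynamoDB.
            "Resource": "arn:aws:states:::aws-sdk:dynamodb:putItem",
            "Parameters": {
                "TableName": "device-readings",
                "Item": {
                    "pk": {"S.$": "$.detail.sensor_id"},
                    "reading": {"N.$": "States.Format('{}', $.detail.temperature)"},
                },
            },
            "End": True,
        }
    },
}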
In this small example we….. And if you have seen my BBQ talk before you might recognize this
After a couple of iterations this is the architecture I'm using…. It has a couple of advantages.
Reasons you build a serverless and event-driven architecture:
Loosely coupled services
Scale and fail independently
Cost effective – pay for what you use
Extensibility – easy and fast to extend
HA – built in
That has been a lot of information…
Now I'd like to leave you with some takeaways and thoughts.
Latency based record
Detect new certificate
Store in DynamoDB
Stream new records
EventBridge -> Step Functions -> Copy
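A rough sketch of the final copy step in that chain, assuming the certificates are registered without a CA; with a custom CA you would instead register the CA in every region and use register_certificate.

# Sketch of the "copy" step: read the certificate in the home region and
# register it in another region so the device can connect there too.
import boto3

def copy_certificate(certificate_id: str, source_region: str, target_region: str):
    source = boto3.client("iot", region_name=source_region)
    target = boto3.client("iot", region_name=target_region)

    pem = source.describe_certificate(certificateId=certificate_id)[
        "certificateDescription"
    ]["certificatePem"]

    return target.register_certificate_without_ca(
        certificatePem=pem, status="ACTIVE"
    )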
Let's take a closer look at an architecture pattern that I tend to use all the time.
Actually, I even used it in my very first architecture in 2016. That time we integrated API Gateway directly with SQS.
And that of course is:
CLICK! -> Animate
Create a reliable way to capture data – prevent data loss
Use managed services
Very powerful when incoming data doesn’t require instant transformation
MQTT and HTTP messaging pricing
Up to 1 billion messages: $1.00 (per million messages)
Rules Engine pricing
Rules initiated: $0.15 per million rules initiated
Actions applied: $0.15 per million actions applied
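A rough illustration using those list prices (ignoring the free tier and higher-volume tiers): 100 million messages in a month cost about 100 x $1.00 = $100 for messaging; if each message triggers one rule with one action, the Rules Engine adds 100 x $0.15 + 100 x $0.15 = $30, so roughly $130 in total.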
EventBridge Choreographs
Four bounded contexts represented by each service (orchestrated)
Step Functions
The service has unique business logic that needs to be implemented and happen in a certain order when an event arrives.
Don't use Lambda to transport data; it should be used to transform data.
Before, there were no other options, unfortunately. But now we have EventBridge Pipes……
So the next change in this will be to remove the Lambda function and move over to EventBridge Pipes.
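A sketch of what that could look like: a pipe reading from the Kinesis stream and putting events directly on the custom bus. All ARNs, names, and parameters below are placeholders.

# Sketch: replace the "transport" Lambda with an EventBridge Pipe that reads
# from the Kinesis stream and puts events on the custom bus.
import boto3

pipes = boto3.client("pipes")

pipes.create_pipe(
    Name="iot-kinesis-to-eventbridge",
    RoleArn="arn:aws:iam::123456789012:role/pipe-role",
    Source="arn:aws:kinesis:eu-west-1:123456789012:stream/iot-ingest",
    SourceParameters={
        "KinesisStreamParameters": {"StartingPosition": "LATEST", "BatchSize": 10}
    },
    Target="arn:aws:events:eu-west-1:123456789012:event-bus/iot-events",
    TargetParameters={
        "EventBridgeEventBusParameters": {
            "Source": "iot.ingest",
            "DetailType": "device.data.received",
        }
    },
)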
Be kind to services around you.
In an event-driven and serverless system you can scale out massively. If you call downstream services, even 3rd party, try not to overwhelm them. If they fail, don't retry over and over again.
Use exponential backoff and patterns like circuit breaker.
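A small generic sketch of exponential backoff with full jitter; the downstream call is a placeholder, and a circuit breaker would additionally stop calling for a while after repeated failures.

# Generic retry helper: exponential backoff with full jitter. The downstream
# call and its exceptions are placeholders.
import random
import time

def call_with_backoff(call, max_attempts=5, base_delay=0.2, max_delay=10.0):
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            # Full jitter: sleep a random time up to the exponential cap.
            delay = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(random.uniform(0, delay))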
Some thoughts on Step Functions.
I use SDK integrations as much as possible…
As I said, why write code.
But choose carefully
Be aware of the different workflow types, how they behave, and how you pay for them
Selecting the wrong one can become expensive
And also…. Remember that!
Step Functions Express Workflows have an "at least once" invocation model, which means that it's possible that your workflow gets invoked twice.
In dev and test it's unlikely that it will happen; even in my small projects I never saw it. But if you run it at a large enough scale it will happen.
So make sure your workload is idempotent and can handle it.
We learned this the hard way… with data being written twice.
In the normal small case you will not see this, but when you run millions of events it will happen.
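One common way to make the workload idempotent is a conditional write keyed on a unique message id, sketched here with DynamoDB; table and attribute names are placeholders.

# Sketch: make a "write once" step idempotent with a DynamoDB conditional put
# keyed on a unique message id. Table and attribute names are placeholders.
import boto3
from botocore.exceptions import ClientError

table = boto3.resource("dynamodb").Table("device-readings")

def store_once(message_id: str, payload: dict) -> bool:
    try:
        table.put_item(
            Item={"pk": message_id, **payload},
            ConditionExpression="attribute_not_exists(pk)",
        )
        return True          # first time we saw this message
    except ClientError as err:
        if err.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return False     # duplicate invocation, data already written
        raise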
When building on EventBridge I would recommend that you create subscriptions.
By that I mean that you create one event rule for one target. Even if EventBridge supports up to 5 targets per rule, I still say this should be a 1-to-1 mapping.
Why?
If you create one rule with multiple targets you will create a coupling on the event filter. And if you hit the 5-target limit, what are you supposed to do then? Create a second rule with the same filter and start adding new targets?
What if you need to update the filter? You will impact several targets, which might not be what you wanted to do in the first place, leaving you to start breaking things apart again.
So instead we create subscriptions, where we create the coupling on the event itself, and we set one target for one rule.
It's easy to add more targets: just create a second rule and add the target to it. And the filter in every rule can change without affecting any other target.
However… this can of course lead to several rules having the same filter, and that could create problems of its own.
But in my opinion it's still better to create subscriptions and deal with a rule explosion.
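A sketch of the subscription style: each consumer gets its own rule with its own filter and exactly one target. Bus name, patterns, and target ARNs are placeholders (and targets like Lambda or SQS still need the usual permissions, not shown).

# Sketch of "subscriptions": one rule per target on a custom bus, so each
# consumer owns its own filter.
import json
import boto3

events = boto3.client("events")
BUS = "iot-events"  # placeholder custom bus

def subscribe(rule_name: str, pattern: dict, target_arn: str) -> None:
    events.put_rule(
        Name=rule_name,
        EventBusName=BUS,
        EventPattern=json.dumps(pattern),
    )
    events.put_targets(
        Rule=rule_name,
        EventBusName=BUS,
        Targets=[{"Id": "target-1", "Arn": target_arn}],
    )

# Each service creates its own subscription; filters can evolve independently.
subscribe("analytics-device-data",
          {"detail-type": ["device.data.received"]},
          "arn:aws:lambda:eu-west-1:123456789012:function:analytics-ingest")
subscribe("alerting-high-temperature",
          {"detail-type": ["device.data.transformed"],
           "detail": {"temperature": [{"numeric": [">", 80]}]}},
          "arn:aws:sqs:eu-west-1:123456789012:alerting-queue")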
There is a default EventBridge bus already existing in the region that AWS services post events to.
Should you use that as your bus in your application? My recommendation is NO. Leave that bus to be used by AWS services and create your own custom buses for your application.
The reason is that it will then become easier to extend and add buses later. I have seen the default bus being used, and the mess it was when later moving to a custom bus.
So I can't repeat it enough… Leave the default bus alone! Create custom buses! It's like using the default VPC… we don't do that either!
Talk about 2 types of patterns, centralized and decentralized (single / multi bus).
Single and Multi Account
All the patterns allow us to decouple the publisher from the subscriber. The service that publishes doesn't really need to know who is listening at the other end.
Look at centralized advantages and disadvantages
Advantages
Allows you to manage all routing, security, and policies in one place (single deployment)
All routing centralized, concentrating all communication to a single event bus
Enables central management of resources
Allows you to easily integrate applications with few changes
Disadvantages
As the number of integrations grows, so does the complexity and resource utilization. Can become a single point of failure.
All routing is centralized….
Prevents autonomy
Single point of failure
In a decentralized approach routing is spread across multiple event buses and the publisher often becomes the logical owner of that bus.
The service owns the mechanism to distribute the events.
Even if more buses mean more work from an operational standpoint, it enables autonomy and doesn't become a single point of failure.
On the other hand, designing distributed systems and managing all resources can become a huge challenge if not done properly from the start. Applying this as an afterthought is almost impossible.
So the time to get started might be longer, and integration of new services and applications requires more change and takes more time.
Characteristics……
Not to forget!!
It powers the most amazing system in the whole world!!