Presented at ServerlessConf NYC 2016.
iRobot is transitioning the cloud infrastructure for our IoT system to AWS with the goal of using zero EC2 instances. I'll cover our general architecture (AWS IoT, API Gateway, Lambda, etc.), our CloudFormation+Lambda deployment strategy, and the hardest patterns to make serverless on AWS.
2. Transition to the cloud:
“Treat servers like cattle, not pets”
(traces back to Bill Baker at Microsoft)
Transition to serverless:
Treat servers like roaches
5. • Consumer robotics
– Roomba (vacuuming)
– Braava (hard floor care)
– Other (Looj, Verra)
– Create
• Global
– Over 100 countries
– Only 40% North America
– Antarctica?
• High volume
– 2.4 million robots last year
• Me
– Cloud Robotics Research Scientist
– R&D, cloud architecture, IoT/smart home
– Background: UAVs, surgical robots, physics, theater, …
iRobot + me
7. • IoT/Smart Home
– From the consumer’s perspective, the
cloud is sometimes hidden in
(consumer) IoT
• This is not intuitive
• Smart Home is a better term
– The consumer may never interact with
an IoT device from outside their home
– The cloud may enable functionality that
the consumer only uses through direct
physical interaction
– This is especially true of robots
• Enterprise
– Global
– Scalable
– Secure
– Auditable
iRobot’s needs
8. • Cloud infrastructure for our customer-
facing system is undifferentiated heavy
lifting for us
– On the other hand, big data may be
a key part of our business, so cloud
infrastructure in that area is more
relevant for us
• This is where serverless architecture
comes in
– Hurray!
– Development limited mostly to
business logic
– Accept inefficiencies in system
design due to available service
functionality in exchange for vastly
reduced ops complexity
iRobot’s needs
11. • Users, apps, robots
– Users to apps: one to many
– Users to robots: many to many
– “Accountless apps”
• Local connections
• Triangle of trust
• Two entry points: AWS IoT and API
Gateway
System architecture
12. • Robots
– No AWS credentials
– Certificates --> can only authenticate with
AWS IoT
• Not even API Gateway custom auth :-(
• BYO cert (manufacturing logistics)
• Use presigned URLs for e.g. S3 get/put
– OTA firmware update
• Apps
– Cognito identity --> AWS credentials
– “Accountless” functionality (UX driven)
– Uses the triangle of trust
• Admin console
– ADFS sign in
– Served through separate API Gateway
• Protip: single-page web app, files
served through API Gateway using S3 service
proxying, API calls using relative paths
--> client always in sync with backend
Cloud architecture for IoT
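Since robots hold no AWS credentials, a backend function has to sign the S3 request on the robot's behalf and hand back a time-limited URL. The following is a minimal sketch of Signature Version 4 query-string presigning using only the Python standard library. The bucket, key, and credential values are placeholders, the function name is hypothetical, and in practice an AWS SDK's presign call would normally be used instead; this only illustrates what such a URL contains.

```python
import datetime
import hashlib
import hmac
from urllib.parse import quote


def _hmac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def presign_s3_url(method, bucket, key, region, access_key, secret_key,
                   session_token=None, expires=900, now=None):
    """Return a SigV4 query-string-signed URL for a single S3 object."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    datestamp = now.strftime("%Y%m%d")
    host = f"{bucket}.s3.{region}.amazonaws.com"
    scope = f"{datestamp}/{region}/s3/aws4_request"
    path = "/" + quote(key, safe="/")

    params = {
        "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
        "X-Amz-Credential": f"{access_key}/{scope}",
        "X-Amz-Date": amz_date,
        "X-Amz-Expires": str(expires),
        "X-Amz-SignedHeaders": "host",
    }
    if session_token:  # temporary credentials, e.g. inside Lambda
        params["X-Amz-Security-Token"] = session_token
    query = "&".join(f"{quote(k, safe='')}={quote(v, safe='')}"
                     for k, v in sorted(params.items()))

    # Canonical request: only the host header is signed, payload is unsigned.
    canonical_request = "\n".join(
        [method, path, query, f"host:{host}", "", "host", "UNSIGNED-PAYLOAD"])
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256", amz_date, scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest()])

    # Derive the signing key: date -> region -> service -> "aws4_request".
    k = _hmac(("AWS4" + secret_key).encode("utf-8"), datestamp)
    for part in (region, "s3", "aws4_request"):
        k = _hmac(k, part)
    signature = hmac.new(k, string_to_sign.encode("utf-8"),
                         hashlib.sha256).hexdigest()
    return f"https://{host}{path}?{query}&X-Amz-Signature={signature}"
```

A robot asking for a firmware image over MQTT could then be sent the result of `presign_s3_url("GET", ...)`, and would fetch the object with a plain HTTPS request before the URL expires.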
13. • Computation: Lambda and IoT Rules
• Lots of SQS queues
• Storage: DynamoDB, IoT Shadows
• Security: Rube Goldberg WAF for API Gateway
• The *: Elasticsearch and RDS
Cloud architecture
14. • Enterprise needs
– Scale
• No problem!*
• Lambda limits are the most
worrying, CloudWatch Events
limits are the most annoying
– Mostly because of SQS
– Global
• Actually the biggest downside
to serverless
– Regional availability
– Vendor lock-in
– Security
• WAF
• CloudWatch
• 3rd party tools
– Auditability
• CloudTrail
16. • Serverless IT and Ops
– Infrastructure as code
– Build artifacts
– Inspectability
– Deploy from dev machine or test
server
– Deploy from working dir or git
commit
– Auditability
• Security
• Billing
17. • CloudFormation is great for deployment.
Slower is ok for us.
– Use CloudFormation custom resources
to deploy and manage arbitrary
resources
• E.g., API Gateway + WAF
– Give CloudFormation some syntactic
sugar
• Still need to deploy and manage custom
resource Lambdas
• Still need to deploy artifacts into S3
– Lambda source code
– CloudFormation templates
– Etc.
Serverless ops
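A CloudFormation custom resource is backed by a Lambda function that receives a Create, Update, or Delete request and must PUT a JSON result to the presigned `ResponseURL` in the event. The sketch below shows that request/response shape with the standard library only. It is not iRobot's cfnlambda library, and `provision` is a hypothetical stand-in for whatever would actually create the resource (for example an API Gateway + WAF pairing).

```python
import json
import urllib.request


def build_response(event, status, physical_id, data=None, reason=""):
    """Assemble the body CloudFormation expects back from a custom resource."""
    return {
        "Status": status,  # "SUCCESS" or "FAILED"
        "Reason": reason,  # required when Status is FAILED
        "PhysicalResourceId": physical_id,
        "StackId": event["StackId"],
        "RequestId": event["RequestId"],
        "LogicalResourceId": event["LogicalResourceId"],
        "Data": data or {},
    }


def send_response(event, body):
    """PUT the result to the presigned URL supplied in the request."""
    request = urllib.request.Request(
        event["ResponseURL"],
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": ""},  # the presigned URL expects no content type
    )
    urllib.request.urlopen(request)


def provision(properties):
    """Hypothetical placeholder: create or update the real resource here."""
    return {"Name": properties.get("Name", "")}


def handler(event, context, send=send_response):
    # Reuse the existing physical ID on Update/Delete; invent one on Create.
    physical_id = event.get("PhysicalResourceId",
                            "custom-" + event["LogicalResourceId"])
    try:
        data = {}
        if event["RequestType"] in ("Create", "Update"):
            data = provision(event.get("ResourceProperties", {}))
        # A Delete request would tear the resource down here.
        send(event, build_response(event, "SUCCESS", physical_id, data))
    except Exception as exc:  # always answer, or the stack hangs until timeout
        send(event, build_response(event, "FAILED", physical_id,
                                   reason=str(exc)))
```

The `send` parameter exists only so the handler can be exercised without a network call; Lambda itself invokes `handler(event, context)`.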
18. • Our deployment tool is named “cloudr”
• “clowder” is the collective noun for cats
• Builds Lambda source code
• Deploys artifacts to S3
• Creates/compiles CloudFormation
templates, injects S3 locations from
previous step
• Deploys and manages custom resource
Lambdas using hash of source as alias
– Uses our cfnlambda library
cloudr
Source
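Using a hash of the source as the Lambda alias means an unchanged custom resource function keeps its alias, while any code change produces a new one. A minimal sketch of deriving such a name follows. The `src-` prefix, the 16-character truncation, and the function name are assumptions for illustration, not cloudr's actual scheme.

```python
import hashlib
from pathlib import Path


def source_alias(source_dir, prefix="src-", length=16):
    """Derive a stable alias name from the contents of a source directory."""
    root = Path(source_dir)
    digest = hashlib.sha256()
    # Sort so the hash does not depend on filesystem listing order.
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest.update(path.relative_to(root).as_posix().encode("utf-8"))
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    # The prefix keeps the name from being all digits, which Lambda
    # rejects for aliases.
    return prefix + digest.hexdigest()[:length]
```

A deploy step could compare this value against the aliases already present on the function and skip the upload when it finds a match.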
19. • Creates an application consisting of a set of microservices
– One stack per microservice
• CloudFormation template defined by user, with
syntactic sugar
• DynamoDB table for service discovery added
automagically, name injected into Lambda
functions
• Required and provided resources defined by the
user
– One stack for the application as a whole
• A custom resource for each microservice stack
• Cross-service policies created based on the
declared dependencies
• Service discovery tables populated from info
contained in this stack
cloudr
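To make the application-level stack concrete, here is a small sketch of how declared dependencies could be turned into cross-service IAM policies and service discovery rows. The declaration schema (`provides`, `requires`, `arn`, `actions`) and all function names are hypothetical; the talk does not show cloudr's actual format.

```python
def provided_resources(services):
    """Index every provided resource by name, remembering its owner."""
    index = {}
    for owner, spec in services.items():
        for name, resource in spec.get("provides", {}).items():
            index[name] = dict(resource, owner=owner)
    return index


def cross_service_policy(services, consumer):
    """IAM policy document granting a consumer access to what it requires."""
    index = provided_resources(services)
    statements = []
    for name in services[consumer].get("requires", []):
        if name not in index:
            raise KeyError(f"{consumer} requires undeclared resource {name!r}")
        statements.append({
            "Effect": "Allow",
            "Action": index[name]["actions"],
            "Resource": index[name]["arn"],
        })
    return {"Version": "2012-10-17", "Statement": statements}


def discovery_rows(services, consumer):
    """Rows to write into the consumer's service discovery table."""
    index = provided_resources(services)
    return [
        {"name": name, "arn": index[name]["arn"], "owner": index[name]["owner"]}
        for name in services[consumer].get("requires", [])
    ]
```

With a declaration such as `{"registration": {"provides": {"RobotTable": {...}}}, "commands": {"requires": ["RobotTable"]}}`, the `commands` service would get one policy statement and one discovery row for `RobotTable`, and its Lambda functions would look the table's ARN up at runtime instead of hard-coding it.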
20. • How do we actually roll out updates?
• This is the biggest area where
serverless offerings are lacking
• With IaaS and lower-level PaaS, you
get lots of control
– Canary deployments
– Roll out behind the load balancer,
or set up a new load balancer
with a whole separate fleet
• What can we do serverless on AWS
today?
Serverless deployment
21. • Rolling out a deployment “behind the
load balancer” is impossible
• Canary deployments are impossible if
we update in place
• So how do we host multiple versions
simultaneously?
– For API Gateway, multiple
versions can coexist as separate
stages or separate APIs
– For IoT, no such luck
• One MQTT server per
account (in a region)
• Certificates can only exist in
one account (in a region)
• (╯°□°) ┻━┻
Serverless deployment
22. • Solution for IoT: topic prefix
• All rules for an instance of an
application listen on prefixed topics
• What about $aws/ topics?
– Robot sets prefix in the shadow
– Rules on shadow switch on that
field
Serverless deployment
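The prefix scheme can be sketched as a pair of helpers that generate the AWS IoT rule SQL for one application instance. Ordinary topics get the prefix prepended, while reserved shadow topics cannot be renamed, so the rule filters on a field the robot writes into its shadow. The field name `topicPrefix`, the choice of the shadow `update/documents` topic, and the function names are assumptions for illustration.

```python
def _check_prefix(prefix):
    # A prefix is a single topic level: no separators, wildcards, or "$".
    if not prefix or any(c in prefix for c in "/+#$' "):
        raise ValueError(f"invalid topic prefix: {prefix!r}")
    return prefix


def prefixed_topic(prefix, *levels):
    """Topic a robot or app publishes to for a given application instance."""
    return "/".join((_check_prefix(prefix),) + levels)


def rule_sql_for_topic(prefix, topic_filter):
    """Rule SQL that only sees traffic for this instance's prefix."""
    return f"SELECT * FROM '{_check_prefix(prefix)}/{topic_filter}'"


def rule_sql_for_shadow(prefix, field="topicPrefix"):
    """Rule SQL on the shared shadow topic, switched on the reported prefix."""
    return (
        "SELECT * FROM '$aws/things/+/shadow/update/documents' "
        f"WHERE current.state.reported.{field} = '{_check_prefix(prefix)}'"
    )
```

Every instance's shadow rule is evaluated for every shadow update in the account, but only the instance whose prefix matches the robot's reported value acts on it.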
23. • How do you switch clients over?
• A separate global API Gateway for
service discovery
– Well-known URL using custom
domain names
• Client service discovery returns three
items:
– API Gateway base URL
– MQTT host
– MQTT topic prefix
Serverless deployment
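On the client side, the three discovery items are enough to build every later request. The following sketch assumes the discovery endpoint returns a JSON object; the key names (`apiBaseUrl`, `mqttHost`, `mqttTopicPrefix`) and the class and function names are hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoints:
    api_base_url: str
    mqtt_host: str
    mqtt_topic_prefix: str

    def api_url(self, path):
        """Absolute URL for an API call against the discovered instance."""
        return self.api_base_url.rstrip("/") + "/" + path.lstrip("/")

    def topic(self, *levels):
        """MQTT topic under the discovered instance's prefix."""
        return "/".join((self.mqtt_topic_prefix,) + levels)


def parse_discovery(body):
    """Parse the JSON body returned by the well-known discovery URL."""
    doc = json.loads(body)
    return Endpoints(
        api_base_url=doc["apiBaseUrl"],
        mqtt_host=doc["mqttHost"],
        mqtt_topic_prefix=doc["mqttTopicPrefix"],
    )
```

Switching a client to a new deployment then means changing only what the discovery endpoint returns; the client re-reads it and rebuilds its URLs and topics.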
24. • When an app wants to communicate
with a robot, how do we make sure it
talks to the same instance the robot is
talking to?
– Separate service discovery for
robots and apps
– Robot service discovery: where
should I be?
• Robot updates “where am I”
in the cloud
– App service discovery: where is
this robot?
– Quadrilateral of trust
• A third service discovery for app’s non-
robot-related calls
Serverless deployment
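The split between robot and app discovery can be sketched as two lookups over different records: where a robot has been told to go, and where it last said it was. This in-memory version is only an illustration of that logic; the class and method names are hypothetical, and a real system would back it with a shared store.

```python
class RobotDiscovery:
    """Tracks assigned vs. reported application instance per robot."""

    def __init__(self, default_instance):
        self._default = default_instance
        self._assigned = {}   # operator's decision: where should the robot be?
        self._reported = {}   # robot's own report: where am I?

    def assign(self, robot_id, instance):
        """Schedule a robot to move to another instance (e.g. a canary)."""
        self._assigned[robot_id] = instance

    def where_should_i_be(self, robot_id):
        """Robot service discovery."""
        return self._assigned.get(robot_id, self._default)

    def report_where_i_am(self, robot_id, instance):
        """Called once the robot has connected to an instance."""
        self._reported[robot_id] = instance

    def where_is_robot(self, robot_id):
        """App service discovery: follow the robot, not the assignment."""
        return self._reported.get(robot_id)
```

Answering apps from the reported location rather than the assignment keeps an app on the old instance until the robot has actually moved, so the two sides do not end up talking to different deployments mid-migration.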
26. • The awesome
– Zero unmanaged EC2 instances
– Zero Elastic Beanstalk
applications
• The good
– Lambda service in isolation
• Scaling
• Development
• Testing
– API Gateway features
– AWS IoT
• BYO Certificates
• Rules Engine computation
• Pricing!
Conclusion
27. • The bad
– Deployment gets complicated
– We could get a lot of mileage out of
MQTT “retain”
– IoT fleet operations
– WAF for API Gateway
– VPC support
• The ugly
– Lambda SQS integration
– IoT instances/certificate
limitations
Conclusion
28. • IoT is complicated
• Serverless is the way
– Development
– Deployment
– Operations
• iRobot’s solution: cloudr
– (Hopefully) will be open source
Conclusion