4. Motivation
Deploying R in production is not easy, but it is a
good problem to have.
There are diverse approaches to creating deployment
packages for AWS Lambda and exposing them through an API,
but which one is the best fit for R? This presentation compares
the currently available approaches and presents a custom solution.
Overview
5. Motivation
● Shiny Dashboard for data visualisation and model
validation
● In the backend, the Shiny App was also:
○ reading the data
○ performing data cleaning
○ computing features
○ building prediction models
● The client also needed access to the calculation
process alone, to use it outside of (and independently
from) the Shiny App
Our use case
6. Motivation
● Rewriting the code in another programming
language is not efficient
● Certain algorithms may not be available in
other programming languages
● Even when certain algorithms are available
in other programming languages, they may not
produce the same results as in R
Why not rewrite the needed code?
7. The answer to our problem
● Modularize the application
● Programming language independent
● Common concept: programmers know how to
work with APIs
● Common "data language": JSON (but not the
only one)
APIs
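To make the common "data language" concrete: a calculation request and its response could look like the following. These field names are purely illustrative, not taken from the original app.

```json
{
  "request_id": "a1b2c3",
  "action": "compute_features",
  "params": { "start_date": "2020-01-01", "end_date": "2020-01-31" }
}
```

```json
{
  "request_id": "a1b2c3",
  "status": "ok",
  "result": { "feature_count": 42 }
}
```

Any client language can produce and parse such payloads, which is what makes the API approach language independent.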
8. Of R and APIs
● Not directly, as R is not a web server
● We need a wrapper / server that:
○ receives requests and hands them to R
○ passes any response from R to the client
Candidate approaches
● Web server: plumber and OpenCPU
● Serverless: running R on AWS Lambda -> wrap R
functions inside FaaS
Can we serve API requests from R?
9. Wrapping up the requirements
Getting ready to roll up our sleeves
The App task engine should:
● get a request id
● read data from DB
● process data for 5-20 seconds
● return the results
Our challenge: How to use this code in
production?
● needs to be triggered
● uncertain and irregular future demand
● R code still in development
● clicking through an interface to re-deploy
the code every time we needed it was not
an option
10. Amazon Web Services
In brief
AWS is a collection of pay-as-you-go cloud services.
The services we needed:
● IAM (Identity and Access Management) - for security
● VPC - a logically isolated virtual network
● EC2 - a virtual machine
● API Gateway
● S3 - storage
● Lambda - serverless application service
11. Amazon Web Services
Lambda: Function as a Service
AWS Lambda is a serverless compute service that
launches and executes code when it is explicitly triggered
by an event (API), and stays up ONLY as long as the code
runs.
● Helps you build apps while minimizing infrastructure
maintenance
● Keep the focus on what’s important: data
engineering/analysis, not DevOps
● Pay only for what you use: supports ad-hoc requests
● Horizontally scalable
Image source:
https://dwdraju.medium.com/python-function-on-aws-lambda-with-api-gateway-endpoint-288eae7617cb
12. R on Lambda
How difficult can it be?
As it turns out, running R directly in AWS Lambda is not
that intuitive. In December 2018, Amazon introduced
custom runtimes for AWS Lambda, which allow
us to use almost any programming language,
including R.
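A custom runtime is driven by a `bootstrap` executable that polls the Lambda Runtime API in a loop. A minimal sketch for R might look like the following; the `handler.R` script name is an assumption, and the guard at the bottom lets the sketch run cleanly outside of Lambda.

```shell
#!/bin/bash
# Sketch of a custom-runtime 'bootstrap' for R. 'handler.R' is a
# hypothetical R script that reads the event and prints a JSON response.
set -eu

handle_events() {
  while true; do
    # 1. Fetch the next invocation event from the Lambda Runtime API.
    HEADERS=$(mktemp)
    EVENT=$(curl -sS -D "$HEADERS" \
      "http://${AWS_LAMBDA_RUNTIME_API}/2018-06-01/runtime/invocation/next")
    # Extract the request id from the response headers.
    REQUEST_ID=$(grep -i 'Lambda-Runtime-Aws-Request-Id' "$HEADERS" \
      | tr -d '[:space:]' | cut -d: -f2)
    # 2. Hand the event to R and capture the response it prints.
    RESPONSE=$(Rscript handler.R "$EVENT")
    # 3. Post the response back for this request id.
    curl -sS -X POST -d "$RESPONSE" \
      "http://${AWS_LAMBDA_RUNTIME_API}/2018-06-01/runtime/invocation/${REQUEST_ID}/response"
  done
}

if [ -n "${AWS_LAMBDA_RUNTIME_API:-}" ]; then
  handle_events
else
  # AWS_LAMBDA_RUNTIME_API is only set inside the Lambda environment.
  echo "not running inside AWS Lambda"
fi
```

The deployment package then bundles this `bootstrap`, the R binaries from the custom runtime, and the handler script.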
13. R on Lambda
Solutions
Our new approach:
● Use a Base R custom runtime provided by Bakdata
● Copy the additional R packages we need inside the deployment
package provided to Lambda
Other approaches:
● Previously, we used the Python package rpy2 to run R code from Python
● R package lambda.r: triggers a Lambda function from R
14. AWS-Lambda-R
The architecture
It looks simple, but you don’t want to click through all of that
every time you re-deploy the code... especially
when you have multiple releases
16. Automation
aws-lambda-r scripts
We needed a fast way to launch the entire infrastructure.
Our approach: a series of shell scripts that launch the entire AWS
infrastructure needed to run R on AWS Lambda.
Top view:
Deploy R function on AWS Lambda
Configure access to AWS Lambda
17. Automation
aws-lambda-r implementation details
The scripts:
1. use your settings to create a VPC, an S3 bucket, and
authorization policies
2. install and compile R packages
3. create the zip file to load in AWS Lambda and save it to
S3
4. create Lambda function and deploy the zip file
5. configure AWS API Gateway to allow accessing the
code over the web
The scripts use the AWS CLI through (Git)Bash => available on
all platforms
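Steps 3 and 4 above can be sketched with the AWS CLI roughly as follows. The bucket, function, and role names are hypothetical, and the `run` helper prints the calls instead of executing them when `DRY_RUN=1` (the default here), so nothing is created by accident.

```shell
#!/bin/bash
# Sketch of the deployment steps. All resource names are hypothetical.
set -eu

DRY_RUN=${DRY_RUN:-1}
# With DRY_RUN=1, print the command instead of executing it.
run() { if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi; }

FUNCTION_NAME=r-task-engine   # hypothetical function name
BUCKET=my-deploy-bucket       # hypothetical S3 bucket
ZIP=deploy.zip

# 3. Upload the deployment package (bootstrap + R runtime + packages) to S3.
run aws s3 cp "$ZIP" "s3://$BUCKET/$ZIP"

# 4. Create the Lambda function using the custom runtime ('provided.al2').
run aws lambda create-function \
  --function-name "$FUNCTION_NAME" \
  --runtime provided.al2 \
  --handler handler.R \
  --code "S3Bucket=$BUCKET,S3Key=$ZIP" \
  --role arn:aws:iam::123456789012:role/lambda-exec   # hypothetical role

# 5. Configuring API Gateway involves several more calls
#    (create-rest-api, put-method, put-integration, create-deployment)
#    and is omitted from this sketch.
```

Setting `DRY_RUN=0` would execute the commands for real, assuming the AWS CLI is installed and configured with credentials.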
21. Pros
At the end of the setup, you will have the entire infrastructure to
run R on AWS Lambda, without worrying about EC2 instances or
scalability issues.
● use AWS Command Line Interface - no need for clicks anymore
● pay-as-you-go
● fast deployment after each release
● easy to adapt to automatically deploy code written in Python or
JavaScript (if needed in the future)
22. Limitations
● Lambda function memory allocation: 128 MB to 10,240 MB
● Function timeout: 15 minutes
● Maximum deployment package size: 250 MB unzipped
(50 MB zipped for direct upload)
○ this is the most important limitation, as it prevents
using large R packages
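Because the unzipped package size is the binding constraint, it helps to check the size of the staging directory before zipping. A minimal sketch, assuming a hypothetical `build/` directory that holds the bootstrap, R runtime, and installed packages:

```shell
#!/bin/bash
# Warn if the staged deployment package would exceed Lambda's
# 250 MB unzipped limit. 'build/' is a hypothetical staging directory.
set -eu

LIMIT_MB=250
mkdir -p build
SIZE_MB=$(du -sm build | cut -f1)   # total size of build/ in megabytes

if [ "$SIZE_MB" -gt "$LIMIT_MB" ]; then
  echo "too large: ${SIZE_MB} MB > ${LIMIT_MB} MB - trim R packages"
else
  echo "ok: ${SIZE_MB} MB within the ${LIMIT_MB} MB limit"
fi
```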
23. Finally
Where Data Science stops and Data
Engineering begins...
● Each project has unique requirements and constraints
● AWS was great for our needs, Lambda too, especially
since it became more flexible through custom
execution environments and layers
● Scripts still run in production, making the client happy
● It is worth automating something in order to be able
to focus on what’s more important