Presentation to the NASA Cloud Community of Interest on how we leveraged AWS Lambda in GovCloud to do high-volume OCR of design documents to improve astronaut safety.
Deploying Serverless Cloud Optical Character Recognition in Support of NASA Astronaut Safety
1. Deploying Serverless Cloud Optical Character Recognition in Support of NASA Astronaut Safety
Chris Shenton
CTO, V! Studios
NASA Cloud COI
2017-11-02
2. Talk Overview
● The Problem
● The Challenge
● Architectures: server, cloud, serverless
● Lambda: FaaS, Events, Benefits, Limitations
● EVA OCR Architecture
● Security, FedRAMP, ATO
● Serverless Framework
● Gotchas!
● Happy Customer
● Future Challenges and Opportunities
3. Problem: Life-Threatening Spacesuit Failure
On July 16, 2013, water filled the helmet of Italian astronaut Luca Parmitano, creating a life-threatening scenario that forced NASA to abort his spacewalk.
4. The Challenge
● Designs on paper or scanned without OCR ability
● Current reporting processes and procedures cannot be changed
● About 60 Discrepancy Reports (20 pages each) and 190 Task Performance Sheet reports (500 pages each) per month: roughly 100,000 pages/month
● Started OCR in 2015, stopped due to server load
● Overwhelmed the EVA Data Integration pipeline
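The monthly page volume can be checked with a few lines of arithmetic. This is a minimal sketch that assumes the page counts on the slide are per report (the slide does not say so explicitly); the names `REPORTS` and `pages_per_month` are illustrative only.

```python
# Monthly report counts and page lengths, as listed on the slide.
# Assumption: the page counts are per report, not per month.
REPORTS = {
    "Discrepancy Report": (60, 20),        # (reports/month, pages/report)
    "Task Performance Sheet": (190, 500),
}


def pages_per_month(reports):
    """Sum count * pages over every report type."""
    return sum(count * pages for count, pages in reports.values())


total = pages_per_month(REPORTS)
print(f"{total:,} pages/month")
```

Under that assumption the total lands a little under the rounded 100,000 pages/month figure quoted on the slide.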
5. Architecture evolution: server to cloud to serverless
● Datacenter: no scaling
● Cloud servers: scaling
● Cloud Containers: scaling
● Serverless: fast, painless scaling
7. Architecture [1b]: Datacenter, no scaling
[Diagram: many PDF docs → one Server running the OCR process → TXT doc]
Load overwhelms OCR server
8. Architecture [2]: Cloud with scaling
[Diagram: PDF docs → SQS Queue → Autoscaling group of Servers, each running OCR → TXT docs]
Pro:
● Scaling handles load spikes
Con:
● Complicated to set up
● Scale out takes a few minutes per server
● Still have to manage OS, security
9. Architecture [3]: Cloud Containers with scaling
[Diagram: PDF docs → SQS Queue → two Container Servers, each hosting several OCR Containers → TXT docs]
Pro:
● Scaling handles load spikes
● Can deploy immutable instances
Con:
● Have to manage scaling
● Have to manage placement, orchestration
10. Architecture [4a]: Serverless Cloud with built-in scaling
[Diagram: PDF docs → automatically scaled Lambdas, each running OCR → TXT docs]
Pro:
● Scaling is automatic, nearly instant
● No patching, open ports
Con:
● Some limits on size, lifetime
11. Architecture [4b]: Serverless Cloud with built-in scaling
[Diagram: PDF docs → automatically scaled "split doc" Lambdas → PDF pages → automatically scaled OCR Lambdas → TXT docs]
With instant, automatic scaling, we can split PDF docs into PDF pages
and OCR each page to text in parallel, with minimal extra effort.
Exploiting parallelism gives us our results much faster at no extra cost.
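The split-then-OCR fan-out described above can be sketched locally. This is not the EVA OCR code: a thread pool stands in for Lambda's concurrency, and `split_doc`, `ocr_page`, and `ocr_doc` are hypothetical names with a placeholder in place of a real OCR engine.

```python
from concurrent.futures import ThreadPoolExecutor


def split_doc(doc_pages):
    """Stand-in for the 'split doc' Lambda: number each page of a document."""
    return list(enumerate(doc_pages, start=1))


def ocr_page(numbered_page):
    """Stand-in for the OCR Lambda; a real one would run an OCR engine."""
    number, page = numbered_page
    return number, page.upper()  # placeholder for text recognition


def ocr_doc(doc_pages, max_workers=8):
    """OCR every page in parallel, then reassemble the text in page order."""
    pages = split_doc(doc_pages)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(ocr_page, pages))
    # Sort by page number: in a real fan-out, pages may finish in any order.
    return "\n".join(text for _, text in sorted(results))


print(ocr_doc(["page one", "page two", "page three"]))
```

The point of the sketch is the shape of the work: each page is independent, so total latency approaches that of the slowest single page rather than the sum of all pages.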
13. AWS Lambda is a “Function as a Service” (FaaS)
Function as a service (FaaS) is a category of cloud
computing services that provides a platform
allowing customers to develop, run, and manage
application functionalities without the complexity
of building and maintaining the infrastructure
typically associated with developing and launching
an app. Building an application following this
model is one way of achieving a “serverless”
architecture, and is typically used when building
microservices applications.
Wikipedia
FaaS Products
● AWS Lambda
● Google Cloud Functions
● Microsoft Azure Functions
● IBM OpenWhisk
14. Event-Driven Computing: trigger Functions based on events
Event sources:
● S3: ObjectCreated
● DynamoDB: Row Changed
● API Gateway: GET, PUT, POST, DELETE
Lambda Function processes event
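As an illustration of a function triggered by an S3 ObjectCreated event, here is a minimal Python handler sketch. It relies on the standard S3 notification layout (`Records[].s3.bucket.name` and `Records[].s3.object.key`, with URL-encoded keys); the handler body and the sample event are illustrative, not the EVA OCR code.

```python
from urllib.parse import unquote_plus


def handler(event, context):
    """Collect the (bucket, key) pairs from an S3 ObjectCreated notification."""
    created = []
    for record in event.get("Records", []):
        # eventName looks like "ObjectCreated:Put"; skip anything else.
        if not record.get("eventName", "").startswith("ObjectCreated"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in the notification.
        key = unquote_plus(record["s3"]["object"]["key"])
        created.append((bucket, key))
    return created


# Hypothetical sample event for local experimentation.
sample_event = {
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "eva-ocr-dev"},
                "object": {"key": "doc_pdf/my+doc.pdf"},
            },
        }
    ]
}
print(handler(sample_event, None))
```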
15. Event-Driven Computing: Lambdas can trigger Lambdas
Event sources:
● S3: ObjectCreated
● DynamoDB: Row Changed
● API Gateway: GET, PUT, POST, DELETE
● Lambda: invoke Sync or Async
Lambda Function processes event
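One way a Lambda can invoke another, synchronously or asynchronously, is the Lambda `Invoke` API. The sketch below assumes a client object that behaves like `boto3.client("lambda")` is passed in; the wrapper `invoke_lambda`, the function name `ocr-page`, and the payload fields are hypothetical.

```python
import json


def invoke_lambda(client, function_name, payload, wait=False):
    """Invoke another Lambda through a boto3-style Lambda client.

    wait=True  -> InvocationType "RequestResponse" (synchronous)
    wait=False -> InvocationType "Event" (asynchronous, fire and forget)
    """
    return client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse" if wait else "Event",
        Payload=json.dumps(payload).encode("utf-8"),
    )


class FakeLambdaClient:
    """Records calls so the sketch can be run without AWS access."""

    def __init__(self):
        self.calls = []

    def invoke(self, **kwargs):
        self.calls.append(kwargs)
        return {"StatusCode": 200 if kwargs["InvocationType"] == "RequestResponse" else 202}


fake = FakeLambdaClient()
print(invoke_lambda(fake, "ocr-page", {"docid": "mydocid", "page": 42}))
```

With real credentials, the fake would be replaced by `boto3.client("lambda")`; asynchronous invocation returns as soon as the event is queued, which is what lets one function fan out to many.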
16. Event-Driven Computing: allows interesting architectures
[Diagram: event-driven upload flow between an Application, API Gateway, Lambdas, S3, DynamoDB and a Search Engine]
● Application request: {method: GET, url: /newUpload, data: none}
● GET /newUpload → Lambda: return new S3 location → {uploadUrl: s3://bucket/newKey}
● Application upload: PUT /newKey ...data…
● S3 ObjectCreated {bucket: b, key: newKey} → Lambda: store info to DynamoDB
● DynamoDB Row Changed → Lambda: send metadata to application via HTTP
18. Lambda Benefits
Example application: process 1000 2-second requests/day
● Server: $16.84/month (AWS t2.small, 24x7)
● Lambda: $1.50/month
No Servers to Manage: no patching, open ports or logins
Subsecond Metering: no idle capacity
Continuous Scaling: high availability
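The cost comparison for the example application can be approximated with a short calculation. The slide does not state how its figures were derived, so every rate below is an assumption: on-demand list prices of roughly that period, no free tier, a 732-hour month, and a 1.5 GB function.

```python
# All rates are assumptions (approximate list prices, free tier ignored);
# the slide does not say how its two figures were computed.
T2_SMALL_PER_HOUR = 0.023            # USD, assumed on-demand rate
HOURS_PER_MONTH = 732                # assumed average month
LAMBDA_PER_GB_SECOND = 0.00001667    # USD, assumed compute rate
LAMBDA_PER_REQUEST = 0.20 / 1_000_000
MEMORY_GB = 1.5                      # assumed function memory size
DAYS_PER_MONTH = 30


def server_cost():
    """A server billed 24x7 regardless of load."""
    return T2_SMALL_PER_HOUR * HOURS_PER_MONTH


def lambda_cost(requests_per_day, seconds_per_request):
    """Lambda billed only for the seconds actually used, plus a per-request fee."""
    requests = requests_per_day * DAYS_PER_MONTH
    gb_seconds = requests * seconds_per_request * MEMORY_GB
    return gb_seconds * LAMBDA_PER_GB_SECOND + requests * LAMBDA_PER_REQUEST


print(f"server: ${server_cost():.2f}/month")
print(f"lambda: ${lambda_cost(1000, 2):.2f}/month")
```

Under these assumptions the two results come out close to the slide's figures; a smaller memory size or the free tier would lower the Lambda number further.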
22. EVA OCR Architecture: Big Wins
● Architecture designed for lowest operational cost possible:
○ no database cost
○ S3 files removed after 24 hours: minimal data charges, better security
● Architectural patterns we used:
○ directory prefixes for progress tracking instead of database
○ propagating metadata in S3 objects instead of database
● Lambda autoscaling, fast scaling, pay only for active use
● Serverless Framework simplified deployment
● but see the Gotchas in a few slides...
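The "directory prefixes for progress tracking" pattern can be sketched as a comparison between the keys under two stage prefixes. The key layout `<stage>/<docid>/<page>.<ext>` and the helper names are assumptions for illustration; the slides give only the stage prefixes, not the full key scheme.

```python
def pages_in(keys, stage, doc_id):
    """Return the page names found under '<stage>/<doc_id>/', minus extensions."""
    prefix = f"{stage}/{doc_id}/"
    return {k[len(prefix):].rsplit(".", 1)[0] for k in keys if k.startswith(prefix)}


def ocr_complete(keys, doc_id):
    """A doc is done when every split page has a matching text page."""
    split = pages_in(keys, "page_pdf", doc_id)
    done = pages_in(keys, "page_txt", doc_id)
    return bool(split) and split <= done


# Hypothetical bucket listing.
keys = [
    "page_pdf/mydocid/0001.pdf",
    "page_pdf/mydocid/0002.pdf",
    "page_txt/mydocid/0001.txt",
]
print(ocr_complete(keys, "mydocid"))
```

In a deployed function the key list would come from listing the bucket by prefix, which is what makes the bucket itself serve as the progress record with no database involved.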
23. EVA OCR Architecture: securely connect with cloud policies
EDI App
● IAM Role: eva-app-role
● Policies: ocreva-s3-write-doc_pdf, other EDI policies...
● EVA code uploads PDF to /doc_pdf/
IAM Policy: ocreva-s3-write-doc_pdf
● allow write arn:aws:s3:::eva-ocr-dev/doc_pdf/
EVA OCR S3 bucket: eva-ocr-dev
● /doc_pdf/
● /page_pdf/
● /page_txt/
● /doc_txt/
● Lambdas read/write pdf and txt in various folders
OCREVA Lambda Functions
● IAM Role: eva-ocr-dev-us-east-1-lambda
● Security Group: sg-001: ocreva-lambda-output
● HTTP POST {docid: ‘mydocid’, page: 42, text: ‘ocr text…’}
EVA Search API on 3x EC2
● HTTPS API on port 5333
● Security Group: sg-002: eva-search
○ allow from sg-001, ...
○ to port 5333
No LaunchPad servers were harmed in the making of this service
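A policy like ocreva-s3-write-doc_pdf might look roughly like the document built below. The slide says only "allow write" on the doc_pdf prefix, so the specific action (`s3:PutObject`) and the trailing `*` on the resource are assumptions; `s3_write_policy` is a hypothetical helper.

```python
import json


def s3_write_policy(bucket, prefix):
    """Build an IAM policy document allowing object writes under one prefix."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:PutObject"],  # assumed meaning of "allow write"
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            }
        ],
    }


policy = s3_write_policy("eva-ocr-dev", "doc_pdf/")
print(json.dumps(policy, indent=2))
```

Scoping the application's role to a single prefix means it can drop documents in for processing but cannot read or alter the intermediate and output folders.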
24. EVA OCR Security Controls
Even though Lambda is currently undergoing FedRAMP certification, WESTPrime security provided ATO based on the following controls:
● IAM policies, roles and Security Groups restrict access
● Separate VPC for Lambdas
● No VPC network egress for Lambdas
● Security Group allows output of final Lambda to EDI Search API
● Encrypted data in transit and at rest
● Static Code Analysis
[Diagram: EVA Data Integration systems, EVA OCR S3 Storage, and EVA OCR Autoscaling Lambdas. The Lambda VPC (private IP space, /16 = 65,536 IPs) reaches the EDI VPC (NASA IP space, limited IPs) through an SG Allow to port 5333, and reaches S3 through an S3 VPC Endpoint.]
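The size of a VPC address block can be checked with Python's standard `ipaddress` module. The CIDR ranges below are hypothetical examples; the slides do not give the actual ranges used.

```python
import ipaddress


def addresses_in(cidr):
    """Total number of addresses in a CIDR block."""
    return ipaddress.ip_network(cidr).num_addresses


# Hypothetical ranges: a large private block for Lambdas, a small shared one.
print(addresses_in("10.0.0.0/16"))
print(addresses_in("10.1.2.0/24"))
```

A /16 leaves 16 host bits, so it holds 2^16 addresses, compared with 2^8 for a /24; the usable count is somewhat lower once subnets and reserved addresses are taken out.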
25. Serverless Framework: from the horse’s mouth
Serverless is your toolkit for
deploying and operating
serverless architectures.
Focus on your application,
not your infrastructure.
serverless.com
npm install serverless -g
serverless create --template hello-world
serverless deploy
curl http://xyz.amazonaws.com/hello-world
27. Gotchas!
● Will get duplicate events if Lambda exits unsuccessfully
○ this is a good thing
● May get duplicate events
○ detect and possibly ignore them (idempotent)
● Timeouts if job takes longer than 300 seconds
○ may have to chain Lambdas
● Overloading destinations is likely due to scale
○ detect, back-off
○ may require handling like Timeouts
● S3 eventual consistency
○ use UUIDs in S3 keys so every write creates a new object, which is read-after-write consistent
● Fast scaling can exhaust limited IP addresses in a VPC
○ use separate VPC for Lambda with large private IP space, e.g., /16 with 65,536 IPs
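Two of the gotchas above, duplicate events and overloaded destinations, lend themselves to small sketches. Both functions are hypothetical illustrations: `process_once` uses an in-memory set as a stand-in for a durable marker (for example, checking whether the output object already exists), and `backoff_delays` shows one common back-off scheme, exponential with full jitter, which the slides do not specify.

```python
import random


def process_once(event_id, seen, work):
    """Run `work` only the first time `event_id` is seen; return True if it ran."""
    if event_id in seen:
        return False  # duplicate event: safe to ignore
    work()
    seen.add(event_id)  # mark only after success so failures can be retried
    return True


def backoff_delays(retries, base=0.5, cap=30.0):
    """Delays in seconds: exponential growth, capped, with full jitter."""
    return [random.uniform(0, min(cap, base * 2 ** n)) for n in range(retries)]


seen = set()
runs = []
for event_id in ["page-1", "page-2", "page-1"]:  # "page-1" is delivered twice
    process_once(event_id, seen, lambda: runs.append(event_id))
print(runs)
print(backoff_delays(5))
```

Marking an event as done only after the work succeeds keeps the retry-on-failure behavior useful while making repeated deliveries harmless.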
28. Happy Customer
“The work you’ve accomplished
is a big step proving out this
new technology for NASA”
Cuong Q Nguyen, JSC/NASA EVA Office
29. Future Challenges and Opportunities
Cuong has told us he needs to track assembly, subassembly and part hierarchies. Can we extract structured text?
He also needs to identify inspector and approval “stamps”. This is not OCR but a hard image-processing problem.
32. Questions?
Reach out to us!
chris.shenton@nasa.gov
chris@v-studios.com
@shentonfreude
Code available in WESTPrime’s GovCloud repo:
https://stash-gc.nasa.gov/projects/EVA/repos/ocreva