This document provides an overview and agenda for a presentation on batch processing solutions on AWS. It discusses batch computing challenges and needs, why the cloud is suitable for batch workloads, and options for running batch jobs on AWS including AWS Batch and Amazon ECS. It provides details on how AWS Batch and ECS work, examples of using them for batch processing, and best practices like leveraging spot instances. The presentation demonstrates how companies can build massively scalable systems on AWS for batch-oriented workloads like processing maps at scale.
2. Agenda
• Batch processing – overview and challenges
• Why run batch workloads in the cloud
• Overview of AWS batch solutions
• Deep dive into AWS Batch and Amazon ECS
• Best practices review
3. What is batch computing?
Run jobs asynchronously and automatically across one or more computers.
Jobs may have dependencies, making the sequencing and scheduling of multiple jobs complex and challenging.
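As a minimal sketch of why dependencies complicate sequencing, the example below orders a small job graph so that every job runs after the jobs it depends on. The job names are hypothetical and not taken from the presentation; the ordering uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9+).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical job graph: each job maps to the set of jobs it depends on.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "validate": {"extract"},
    "load": {"transform", "validate"},
}


def execution_order(deps):
    """Return one valid run order in which every job follows its dependencies."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

"transform" and "validate" have no dependency on each other, so a scheduler is free to run them in parallel once "extract" has finished.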
4. Challenges of Running Batch Workloads
• Typically resource intensive
• Time constraint for completion
• Potential impact to concurrent batch jobs
• Scaling infrastructure resources
• Ensuring effective resource utilization and cost savings
• Fragile and unreliable
5. What Batch Workloads Need
• Reliability
• Easy Development
• Easy Deployment
• High Efficiency
• Low Ops Load
• Cost Effective
6. Why the cloud makes sense for batch workloads
• Reliable
• Scalable
• Pay as you go
• Infrastructure as code
7. Why containers make sense for batch workloads
• Simple to model
• Polyglot
• Image is the version
• Do one thing well
• You build it, you run it
• Black box
9. Introducing AWS Batch
• Fully managed batch primitives
• Focus on your applications (shell scripts,
Linux executables, Docker images) and
their resource requirements
• We take care of the rest!
10. Typical AWS Batch Job Architecture
[Diagram. Labels:]
• Input files
• S3 events trigger a Lambda function that submits an AWS Batch job
• Job definition: application image, IAM role for the AWS Batch job, job resource requirements and other parameters
• Queue of runnable jobs
• AWS Batch Scheduler
• AWS Batch compute environments
• AWS Batch execution
• AWS Batch job output
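The trigger step in this architecture (an S3 event invoking a Lambda function that submits an AWS Batch job) could look roughly like the sketch below. The queue name, job definition name, and environment variable names are hypothetical placeholders; the only AWS call used is boto3's `submit_job` on the Batch client.

```python
import urllib.parse

# Hypothetical names; replace with your own job queue and job definition.
JOB_QUEUE = "example-job-queue"
JOB_DEFINITION = "example-job-definition"


def build_submit_job_request(record):
    """Map one S3 event record to keyword arguments for Batch submit_job."""
    bucket = record["s3"]["bucket"]["name"]
    # Object keys in S3 event notifications are URL-encoded.
    key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
    # Job names allow letters, numbers, hyphens, and underscores only.
    safe = "".join(c if c.isascii() and c.isalnum() else "-" for c in key)
    return {
        "jobName": "process-" + safe[:100],
        "jobQueue": JOB_QUEUE,
        "jobDefinition": JOB_DEFINITION,
        "containerOverrides": {
            "environment": [
                {"name": "INPUT_BUCKET", "value": bucket},
                {"name": "INPUT_KEY", "value": key},
            ]
        },
    }


def handler(event, context):
    """Lambda entry point: submit one AWS Batch job per uploaded object."""
    import boto3  # imported here so the helper above can be tested offline

    batch = boto3.client("batch")
    job_ids = []
    for record in event["Records"]:
        response = batch.submit_job(**build_submit_job_request(record))
        job_ids.append(response["jobId"])
    return job_ids
```

Passing the object location through environment variables is one design choice among several; the job's container then reads `INPUT_BUCKET` and `INPUT_KEY` to find its input.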
11. Introducing Amazon ECS
Amazon EC2 Container Service (ECS) is a highly scalable, high performance container management service that supports Docker containers and allows you to easily run applications on a managed cluster of Amazon EC2 instances.
15. Designed for Use with Other AWS Services
Elastic Load Balancing
Amazon Elastic Block Store
Amazon Virtual Private Cloud
AWS Identity and Access Management
AWS CloudTrail
Spot Fleet
16. Security
Containers run on your own EC2 instances in a VPC, with all of its security features, providing a high level of isolation.
17. How ECS works
[Diagram. Labels:]
• Internet
• Load balancers
• EC2 instances, each running an ECS agent and tasks; each task holds a container
• Amazon ECS: agent communication service, API, cluster management engine, key/value store
19. Basic batch workflow with ECS
[Diagram. Labels:]
• File put into S3 bucket
• Amazon Simple Queue Service
• Queue load is communicated to ECS
• Amazon ECS provisions compute clusters and schedules tasks based on demand
• Batch worker task polls SQS for new jobs
• Containerized batch worker processes file
• Output to S3 bucket
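The worker side of this workflow (a task that polls SQS, processes the file, and removes the message) might be sketched as follows. The message format, a JSON body with `bucket` and `key` fields, is an assumption made for illustration, as is the `process_file` callback; `receive_message` and `delete_message` are the standard boto3 SQS client calls.

```python
import json


def process_one_batch(sqs, queue_url, process_file):
    """Poll the queue once, process each message, and delete it on success.

    `sqs` is an SQS client such as boto3.client("sqs"). Returns the number
    of messages handled.
    """
    response = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,  # SQS returns at most 10 messages per call
        WaitTimeSeconds=20,      # long polling to avoid busy-waiting
    )
    handled = 0
    for message in response.get("Messages", []):
        # Assumed body format: {"bucket": "...", "key": "..."}
        job = json.loads(message["Body"])
        process_file(job["bucket"], job["key"])
        # Delete only after processing succeeds, so a failed job becomes
        # visible again once the visibility timeout expires.
        sqs.delete_message(
            QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"]
        )
        handled += 1
    return handled
```

Inside the container, a worker would typically create the client once and call `process_one_batch` in a loop until the queue is drained or the task is stopped.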
20. Trigger Batch Processing with Lambda
[Diagram. Labels:]
• Amazon S3 bucket (source), S3 bucket object
• AWS Lambda
• ecs:RunTask
• Amazon ECS: Auto Scaling group with container instances across two Availability Zones, running Task A
• Amazon S3 bucket (target)
• Amazon CloudWatch, AWS CloudTrail
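A hedged sketch of the Lambda function in this pattern is shown below: it reacts to an S3 event and calls `ecs:RunTask` once per object. The cluster, task definition, and container names are hypothetical, and the environment variable names are an illustrative convention rather than anything ECS requires.

```python
import urllib.parse

# Hypothetical identifiers; substitute your own cluster, task definition,
# and the container name declared in that task definition.
CLUSTER = "example-batch-cluster"
TASK_DEFINITION = "example-batch-worker"
CONTAINER_NAME = "worker"


def build_run_task_request(bucket, key):
    """Build keyword arguments for the ECS run_task call."""
    return {
        "cluster": CLUSTER,
        "taskDefinition": TASK_DEFINITION,
        "count": 1,
        "overrides": {
            "containerOverrides": [
                {
                    "name": CONTAINER_NAME,
                    "environment": [
                        {"name": "INPUT_BUCKET", "value": bucket},
                        {"name": "INPUT_KEY", "value": key},
                    ],
                }
            ]
        },
    }


def handler(event, context):
    """Lambda entry point: start one ECS task per object in the S3 event."""
    import boto3  # imported here so the helper above can be tested offline

    ecs = boto3.client("ecs")
    task_arns = []
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        response = ecs.run_task(**build_run_task_request(bucket, key))
        task_arns.extend(task["taskArn"] for task in response["tasks"])
    return task_arns
```

The Lambda execution role needs permission for `ecs:RunTask` on the task definition for this call to succeed.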
21. Fleet of workers with ECS and SQS
[Diagram. Labels:]
• Amazon S3, DynamoDB, Amazon Kinesis
• AWS Lambda
• ecs:RunTask
• SQS queue
• Amazon ECS: Auto Scaling group with container instances across two Availability Zones, running Task A
• Amazon CloudWatch, AWS CloudTrail
22. Long-running Batch Jobs
• Utilize Spot Instances
• EC2 Spot Blocks for Defined-Duration Workloads
• ECS event stream for CloudWatch Events
• Service Scaling and Monitoring
[Diagram. Labels: Amazon ECS; Auto Scaling group with container instances across two Availability Zones, running Task A, Task B, and Task C; Amazon CloudWatch; AWS CloudTrail]
23. Get the Best Value for EC2 Capacity – Spot Instances
• Since Spot instances typically cost 50-90% less than
On-Demand, you can increase your compute capacity by
2-10x within the same budget
• Or you could save 50-90% on your existing workload
• Either way, you should try it!
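The 2-10x figure follows directly from the discount: paying a fraction (1 - d) of the On-Demand price buys 1 / (1 - d) times the capacity for the same budget. A small sketch of that arithmetic, using hypothetical function names:

```python
def capacity_multiplier(discount_percent):
    """Capacity obtainable for the same budget at a given Spot discount."""
    return 100 / (100 - discount_percent)


def spend_for_same_workload(on_demand_cost, discount_percent):
    """Cost of the existing workload at a given Spot discount."""
    return on_demand_cost * (100 - discount_percent) / 100


print(capacity_multiplier(50))  # 2.0
print(capacity_multiplier(90))  # 10.0
print(spend_for_same_workload(1000, 90))  # 100.0
```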
24. Best Practices
• Store state, inputs, and outputs in S3 or another
datastore
• Minimize dependencies between task definitions (should
be independent of each other)
• Use Spot Instances and Spot fleets for long-running
batch jobs
• Monitor cluster state with ECS APIs
• Share pools of resources
• Auto Scaling, VPC, IAM, scheduled Reserved Instances
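For the "monitor cluster state with ECS APIs" practice, one possible starting point is the `describe_clusters` call, which reports instance and task counts per cluster. The sketch below takes the ECS client as a parameter; the function name and the shape of the returned summary are illustrative choices.

```python
def cluster_summary(ecs, cluster_name):
    """Return headline counts for one ECS cluster.

    `ecs` is an ECS client such as boto3.client("ecs").
    """
    response = ecs.describe_clusters(clusters=[cluster_name])
    if not response["clusters"]:
        raise LookupError(f"cluster not found: {cluster_name}")
    cluster = response["clusters"][0]
    return {
        "status": cluster["status"],
        "container_instances": cluster["registeredContainerInstancesCount"],
        "running_tasks": cluster["runningTasksCount"],
        "pending_tasks": cluster["pendingTasksCount"],
    }
```

A persistently high pending task count relative to running tasks is one signal that the cluster lacks capacity for the current batch load.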
35. Time and Event-Based Task Scheduling
• Schedule on fixed time intervals (e.g., a number of minutes, hours,
or days)
• Or use cron expressions
• Set Amazon ECS as a CloudWatch Events target
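As a rough sketch of these three points, the code below builds a fixed-interval `rate(...)` expression and registers an ECS task as the target of a CloudWatch Events rule. All names and ARNs passed in are the caller's own; the helper function names are hypothetical, while `put_rule` and `put_targets` are the boto3 CloudWatch Events client calls.

```python
def rate_expression(value, unit):
    """Build a fixed-interval schedule expression, e.g. rate(5 minutes).

    `unit` is "minute", "hour", or "day"; the unit is singular when the
    value is 1 and plural otherwise.
    """
    if unit not in ("minute", "hour", "day"):
        raise ValueError("unit must be 'minute', 'hour', or 'day'")
    if value < 1:
        raise ValueError("value must be a positive integer")
    return f"rate({value} {unit if value == 1 else unit + 's'})"


# A cron expression is the alternative form: every day at 12:00 UTC.
DAILY_AT_NOON_UTC = "cron(0 12 * * ? *)"


def schedule_ecs_task(events, rule_name, schedule, cluster_arn,
                      task_definition_arn, role_arn):
    """Create a scheduled rule and point it at an ECS task.

    `events` is a CloudWatch Events client such as boto3.client("events").
    """
    events.put_rule(
        Name=rule_name, ScheduleExpression=schedule, State="ENABLED"
    )
    events.put_targets(
        Rule=rule_name,
        Targets=[
            {
                "Id": rule_name + "-target",
                "Arn": cluster_arn,   # the target ARN is the ECS cluster
                "RoleArn": role_arn,  # role allowed to run the task
                "EcsParameters": {
                    "TaskDefinitionArn": task_definition_arn,
                    "TaskCount": 1,
                },
            }
        ],
    )
```

For example, `schedule_ecs_task(events, "nightly-maps", DAILY_AT_NOON_UTC, ...)` would run one task per day, while `rate_expression(15, "minute")` gives a 15-minute interval.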
37. Summary
• Cloud and containers are a great way to run batch
workloads
• Two options on AWS: Batch and ECS
• Why AWS Batch: managed batch processing environment
• Why ECS: DIY batch processing, with very flexible time- and event-based task scheduling