"Each year, the technical complexity of making the next great Walt Disney Animation Studios film increases. Animation and Visual FX studios continue to push the bounds of what is possible in computer graphics. This complexity drives rapid technological growth in both computational resources and storage to the point that it exceeds what we can physically provide with our on-premise compute cluster. As a result, we have started to adopt a hybrid approach with the cloud.
This session addresses the hurdles that animation and VFX studios face and focuses on automation of 'disposable' components (specifically infrastructure, licensing, fleet management, data and dependency management in a large-scale batch workload). We apply these general cloud techniques and utilities to an animation/VFX workload and push the limits with a very large scale cloud renderfarm deployment.
The team from Walt Disney Animation Studios walks through how they use cloud technologies to maximize render capacity. Learn how to leverage high-performance storage (like Amazon EFS), Amazon EC2 networking and the latest EC2 Spot features to provide a fully functional renderfarm at production-quality scale."
2. Who is using AWS for rendering?
1. Visual Effects and Animation
2. Marketing
3. Theme Parks
4. Manufacturing
5. Gaming
6. Life Sciences
7. Engineering and Architecture
7. The challenge of making a film
On-premises capacity
Rendering in the cloud
8. The challenge of making a film
On-premises capacity
Rendering in the cloud
Cloud provides you the capability to scale fast and get the outputs faster
Initial project on-boarding
artwork
9. A tale of two customers

                            | A boutique studio                 | Walt Disney Animation Studios
On-Premises Hardware        | No or very little investment      | A significant investment
Licenses                    | Limited                           | Unlimited
Project Structure           | Project-based, from other studios | Internal customers/projects
Budget Constraints          | Time and resources                | Time and resources
Compute Needs               | Large scale                       | Very large scale
Infrastructure Efficiencies | No or very little                 | On-premises infrastructure optimized for the rendering workload
Cloud Model                 | Mostly all-in                     | Mostly hybrid
Security                    | Mandated by customers             | Required due to high-value assets
10. They both ask us the same thing…
The ability to spin up thousands of cores on-demand
…without any upfront investment
…and leveraging the most up-to-date configurations
A project-based “disposable” infrastructure
…with flexible, by-the-hour utility licensing
11. They both tell us the same thing…
≤ $0.01 per core-hour
Access to thousands of cores whenever needed
No upfront investments in infrastructure
Easier collaboration
Ecosystem of software providers
Access to large-memory configs to do 6K/10K renders
Project-based “disposable” infrastructure
12. …when the rubber meets the road!
Shared FS everywhere / Latency / Large datasets / Lots of instances
{Data/Content}
14. Rendering in the Cloud - State of the Union
Scale at a very low price
EC2 Spot
15. Leveraging Spot successfully today requires some effort
Build stateless, distributed, scalable applications
Choose which instance types fit your workload best
Ingest price-feed data for AZs and regions
Make runtime decisions on which Spot pools to launch in, based on price and volatility
Manage interruptions
Monitor and manage market prices across AZs and instance types
Manage the capacity footprint in the fleet
…and all of this while you don’t know where the capacity is
Serve your customers
16. Spot Fleet
Instead of writing all that code to manage Spot Instances, simply specify:
• Target Capacity – The number of EC2 instances that you want in your fleet.
• Maximum Bid Price – The maximum bid price that you are willing to pay.
• Launch Specifications – The number and types of instances, AMI ID, VPC, subnets or AZs, etc.
• IAM Fleet Role – The name of an IAM role that allows Amazon EC2 to terminate instances on your behalf.
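As a sketch of what such a request looks like, here is a minimal Spot Fleet configuration built with plain Python dicts. The AMI ID, role ARN, and subnet are hypothetical placeholders; in practice you would pass the result to boto3 as `boto3.client("ec2").request_spot_fleet(SpotFleetRequestConfig=config)`.

```python
def build_fleet_config(target_capacity, max_bid, ami_id, fleet_role_arn, subnet_id):
    """Assemble a Spot Fleet request: target capacity, maximum bid price,
    launch specifications, and the IAM fleet role."""
    instance_types = ["r3.2xlarge", "r3.4xlarge", "r3.8xlarge"]
    return {
        "TargetCapacity": target_capacity,    # capacity units you want in the fleet
        "SpotPrice": str(max_bid),            # maximum bid price you will pay
        "IamFleetRole": fleet_role_arn,       # lets EC2 terminate instances on your behalf
        "AllocationStrategy": "diversified",  # spread capacity across Spot pools
        "LaunchSpecifications": [
            {"ImageId": ami_id, "InstanceType": t, "SubnetId": subnet_id}
            for t in instance_types
        ],
    }

# Hypothetical identifiers, for illustration only.
config = build_fleet_config(
    target_capacity=20,
    max_bid=0.50,
    ami_id="ami-12345678",
    fleet_role_arn="arn:aws:iam::123456789012:role/my-fleet-role",
    subnet_id="subnet-abcdef12",
)
```

Listing several instance types in the launch specifications is what lets the fleet diversify across Spot pools, which the next slide's weighting example builds on.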
17. Spot Fleet Example – Instance Weighting
Say your workload needs at least 60 GB of memory, and you want capacity to complete 20 units of work
Choices:
• r3.2xlarge (61.0 GB, 8 vCPUs) = 1 unit of 20
• r3.4xlarge (122.0 GB, 16 vCPUs) = 2 units of 20
• r3.8xlarge (244.0 GB, 32 vCPUs) = 4 units of 20
One option is to bid on all of these instance types.
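The weighting arithmetic above can be sketched in a few lines; the fleet mix below is one hypothetical fill that reaches the 20-unit target, not a prediction of what Spot would actually launch.

```python
# Weights from the slide: units of work each instance type supplies,
# based on how many 60 GB workloads its memory can hold.
WEIGHTS = {"r3.2xlarge": 1, "r3.4xlarge": 2, "r3.8xlarge": 4}

def fulfilled_capacity(fleet):
    """Sum the weighted capacity of the instances a fleet has launched."""
    return sum(WEIGHTS[itype] * count for itype, count in fleet.items())

# One of many mixes that satisfies a target capacity of 20 units:
mix = {"r3.4xlarge": 6, "r3.8xlarge": 2}   # 6*2 + 2*4 = 20 units
```

Because any mix that sums to the target is acceptable, Spot Fleet can chase the cheapest pools at any moment instead of being locked to one instance type.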
18. AWS cloud scale is “large”
• 10s/100s/1,000s/10,000s of cores on demand in the cloud
• A “large” (Walt Disney Animation Studios) renderfarm: 55,000 cores
• In this demo: ~40,000 vCPUs on the EC2 Spot market
Rendering in the Cloud - State of the Union
Scale at a very low price
19. Rendering in the Cloud - State of the Union
Licensing at Cloud Scale
• BYOL
• SaaS
• AWS Marketplace
• Elastic licensing models
Thinkbox Deadline Usage-Based Licensing
• Render nodes pull metered licenses from a cloud-based license server
• Usage is tracked per minute
• Bulk minutes will be available via Thinkbox’s online store
• The store will eventually host third-party licensing (Nuke, V-Ray, etc.)
Autodesk Maya
20. Rendering in the Cloud - State of the Union
Hydrating the Cloud Renderfarm
Amazon S3 as the source of truth for your content/data
• Via AWS Marketplace/SaaS tools (Aspera, Signiant, FileCatalyst, ExpeDat)
• Amazon S3 multipart upload
Direct to shared file systems
• Amazon EFS throughput scales linearly with storage
• Lustre can hydrate from an S3 bucket
• Avere can front Amazon S3 or an on-premises NAS
+ AWS Direct Connect
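Multipart upload splits a large object into fixed-size parts that upload in parallel, which is what makes it effective for hydrating big scene data into S3. A minimal sketch of the part arithmetic, assuming a 64 MB part size (boto3 automates this for you via `TransferConfig` in `boto3.s3.transfer`, where the threshold and chunk size are configurable):

```python
def multipart_ranges(total_size, part_size):
    """Split an object into (start, end_exclusive) byte ranges, one per
    upload part; every part except possibly the last is part_size bytes."""
    ranges = []
    start = 0
    while start < total_size:
        end = min(start + part_size, total_size)
        ranges.append((start, end))
        start = end
    return ranges

MB = 1024 * 1024
# A 150 MB asset with 64 MB parts -> 3 parts: 64 MB, 64 MB, 22 MB
parts = multipart_ranges(150 * MB, 64 * MB)
```

Each range can be uploaded by an independent worker, so aggregate throughput scales with concurrency rather than with a single TCP stream.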
21. Rendering in the Cloud - State of the Union
Shared File System Everywhere (some ideas)
[Diagram: on-prem storage and an Avere FXT connect over AWS Direct Connect to Amazon S3; a storage cache (Lustre on EC2, Avere on EC2, or EFS) provides the shared storage; hydrate workers run on EC2 Spot]
22. Rendering in the Cloud - State of the Union
NFS/CIFS (Content/Data Share) Everywhere (some ideas)
Elastic File System
• Designed to support petabyte-scale file systems
• Throughput scales linearly with storage
• Same latency spec across each AZ
• Thousands of concurrent NFS connections
• Works great for large I/O sizes
• Pay only for what you use, not what you provision
• Managed, with multi-copy durability
23. Rendering in the Cloud - State of the Union
Move the Graphic Artist to the Cloud …
• NVIDIA GPU-based EC2 instances
• Teradici PCoIP
• Frame, Otoy
• Windows and Linux (VNC+VirtualGL)
24. Rendering in the Cloud - State of the Union
Managing your “disposable” infrastructure
Launch a CloudFormation stack with all the infrastructure resources for a specific project
Automatically scale the stack as appropriate
[Diagram: AMI + CloudFormation template → launch stack → terminate]
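A minimal sketch of that per-project stack lifecycle, assuming a hypothetical template URL and a hypothetical `FarmSize` template parameter. The commented-out calls are the standard boto3 CloudFormation client methods (`create_stack`, `delete_stack`), kept out of the runnable path since they need AWS credentials:

```python
def stack_request(project, template_url, farm_size):
    """Build the create_stack arguments for one project's renderfarm.
    One stack per project keeps the infrastructure disposable: delete
    the stack and the project's resources go with it."""
    return {
        "StackName": f"renderfarm-{project}",
        "TemplateURL": template_url,
        "Parameters": [
            {"ParameterKey": "FarmSize", "ParameterValue": str(farm_size)},
        ],
    }

# Hypothetical project and template location.
req = stack_request("demo-short", "https://example.com/renderfarm.yaml", 500)

# With credentials configured, the lifecycle is two calls:
#   cfn = boto3.client("cloudformation")
#   cfn.create_stack(**req)                       # project starts
#   cfn.delete_stack(StackName=req["StackName"])  # project wraps
```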
25. Rendering in the Cloud - State of the Union
The Crown Jewels
• AWS alignment with the latest MPAA cloud-based application guidelines for content security – August 2015
• VPC private endpoint for Amazon S3 – enables a true private workflow capability
• Encryption and key-management capabilities
• Amazon Glacier vault for high-value media/originals
26. Rendering in the Cloud - A Sample Architecture (All-in-Cloud Pipeline)
[Diagram: on-prem storage connects over AWS Direct Connect to Amazon S3; a storage cache (Avere on EC2) and EFS provide the shared storage; a scalable renderfarm and hydrate workers run on EC2 Spot; a pipeline and license manager runs on EC2; the 3D modeler works from a modeling dumb client through remote app visualization (AppStream or Teradici on a G2 instance)]
27. Rendering in the Cloud - A Sample Architecture (A Hybrid Pipeline)
Cloud renderfarm as an extension of the on-prem renderfarm
[Diagram: on-prem renderfarm, storage, and Avere FXT, with a pipeline and license manager that also manages the cloud renderfarm; AWS Direct Connect links to Amazon S3; a storage cache (Avere on EC2) and EFS feed a scalable renderfarm and hydrate workers on EC2 Spot]
29. Disney Animation Renderfarm
[Diagram: artists in San Francisco, Los Angeles, and Burbank; the WDAS data center and two remote data centers each run a renderfarm with an Avere FXT cluster, one site also holding the storage; the sites are linked by redundant 10 Gb connections]
30. Disney Animation’s Environment
• 90% Red Hat Enterprise Linux 6, 8% Mac OS X
• 1Gb/s Ethernet to clients, 10Gb/s to most servers
• Clients are bursty, not generally bandwidth constrained
• Major Applications:
• Hyperion (GI Renderer)
• Maya
• Houdini
• Nuke
• Coda (Scheduler)
31. Disney Animation’s Environment
• NFS v3 Everywhere
• 5-7 petabytes
• 500 TB working-set
• 100 TB/week of data churn
• Global namespace
• Lots of metadata operations
• Serve everything out of RAM/SSD
• Renderfarm Footprint
• 55,000 core renderfarm
• 1.1 million render hours per day
• 200,000-400,000 tasks per day
• Typical render
• 8-16 threads, 64 GB
• 3-5 hours per task
32. Disney Animation Renderfarm
[Diagram: the on-prem environment from slide 29 (renderfarms with Avere FXT clusters across the WDAS and remote data centers, redundant 10 Gb links, artists in San Francisco, Los Angeles, and Burbank) extended to a virtual private cloud in Oregon running an Avere vFXT, EFS, and Spot Instances, over a 10 Gb primary / 1 Gb backup link]
33. Mostly Automated Deployment
• Pre-built EBS-backed AMI
• Heavily customized RHEL
• Python/Boto3
• Pass in how many resources you need and the minimum instance size
• Calculates resource weights
• Needs to calculate pricing
• User data
• RAIDs ephemeral disks, if available, for scratch space
• Integrates with the on-premises environment (DNS, asset inventory, Puppet)
• Creates EC2 tags
• Runs Puppet to pick up changes since AMI build time
• Joins the render queue and asks for work
• Scale-up/down is still a manual process
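The "RAIDs ephemeral disks" step can be sketched as below: the function emits the shell commands a user-data script would run, so the logic is testable without an instance. The device names, RAID level, and mount point are illustrative assumptions, not Disney's actual user-data.

```python
def scratch_raid_commands(disks):
    """Return the shell commands to turn an instance's ephemeral disks
    into RAID-0 scratch space mounted at /scratch (device names vary by
    instance type; on an instance you might discover them with a glob
    such as /dev/xvd[b-e])."""
    if not disks:
        return []          # no ephemeral disks: skip scratch setup
    if len(disks) == 1:
        dev = disks[0]     # single disk: no RAID needed
        cmds = []
    else:
        dev = "/dev/md0"
        cmds = [
            f"mdadm --create --run {dev} --level=0 "
            f"--raid-devices={len(disks)} " + " ".join(disks)
        ]
    cmds += [f"mkfs.ext4 -q {dev}", "mkdir -p /scratch", f"mount {dev} /scratch"]
    return cmds

cmds = scratch_raid_commands(["/dev/xvdb", "/dev/xvdc"])
```

RAID-0 trades redundancy for bandwidth, which suits render scratch space: the data is regenerable, and a lost Spot instance loses its scratch anyway.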
42. Rendering in the Cloud vs. On-Premises
[Chart: render time in seconds (0-30,000; lower is better) per frame (frames 1-90), comparing EC2/EFS against on-prem]
43. Lessons Learned
• Use as many different instance types as you can, especially older generations
• Think about ways to modify your workload
• Use every Availability Zone
• Check your limits, especially your Amazon EBS limit and VPC setup (address space)
• Resource-oriented bidding
• Diversified allocation
• Benchmark your workload and set pricing accordingly
• Set only realistic prices that you are willing to pay
• Don’t be afraid to ask AWS for help or for pre-planning your run
44. Conclusion
• Cloud rendering on AWS - State of the Union: it is getting stronger…
• Rendering forecast: partly cloudy with a chance of all-in the cloud…
• Future research
• Storage hydration: distribute across many clients to saturate EFS throughput
• Storage for processing: read freely and batch the writes (for shared-FS performance)
• Latency is a killer: keep workflows atomic within a single AZ/region; use caching appliances