Video Transcoding on Hadoop
1. Video Transcoding on Hadoop
Presented by Shital Mehta and Kishore Angani | June 3, 2014
2014 Hadoop Summit, San Jose, California
2. Outline
Video Transcoding at Yahoo
Current Architecture: (Hadoop 0.23.x)
New Requirements
Generic YARN (master / worker)
4. Video Transcoding
4 Yahoo Confidential & Proprietary
Convert source videos to standard output formats
› input support
• > 10 container formats
• > 40 video codecs
• > 60 audio codecs
› output support (at various resolutions and bitrates)
• mp4/h264/AAC
• webm/vp8/vorbis
[Figure: input containers (AVI, MP4, MOV, 3GP, FLV, WebM, …) converted to MP4 and WebM outputs]
5. Related Jobs
Post-transcode enrichments
› watermarking
› previews
› thumbnails
› visual seek
Machine learning
6. Extremely Compute- and I/O-Intensive
SLA is measured in multiples of source video length
FFmpeg takes between 0.5x and 5x the video duration
› depending on hardware / resources available
› tool configuration, etc.
Computation requirements are dependent on:
› source and destination parameters
Job parallelism
› some jobs can work on fragmented videos
› many require the whole video file for optimal results
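Because the SLA is measured in multiples of the source duration, the 0.5x to 5x FFmpeg range above converts directly into wall-clock bounds. The sketch below is illustrative only: the class and method names are hypothetical, and the SLA multiple is a parameter, not a value stated in this deck.

```java
// Hypothetical helper: turns the 0.5x-5x FFmpeg range quoted on the slide into
// wall-clock bounds and checks an elapsed time against an SLA expressed as a
// multiple of the source video's duration.
final class TranscodeEstimate {
    // Bounds quoted on the slide, as multiples of the video duration.
    static final double MIN_FACTOR = 0.5;
    static final double MAX_FACTOR = 5.0;

    private TranscodeEstimate() {}

    /** Best-case transcode time in seconds for a source of the given duration. */
    static double bestCaseSeconds(double durationSec) {
        return MIN_FACTOR * durationSec;
    }

    /** Worst-case transcode time in seconds for a source of the given duration. */
    static double worstCaseSeconds(double durationSec) {
        return MAX_FACTOR * durationSec;
    }

    /** True if the elapsed time stays within slaMultiple x the source duration. */
    static boolean meetsSla(double elapsedSec, double durationSec, double slaMultiple) {
        return elapsedSec <= slaMultiple * durationSec;
    }
}
```

For a 2-minute source this gives 60 to 600 seconds of transcoding, which is why the hardware and tool configuration matter so much for the SLA.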
8. Job Characteristics
Tens of thousands of input videos / day
Source duration ranges from 10 seconds to 2 hours
Video sizes vary from a few MBs to a few GBs
Variable source / output fan-out
› 5 to 15 output jobs per source video
› hundreds of thousands of processing tasks per day
Job split and planning at ‘t1’
› dependent on source video parameters
Static job plan (DAG) based approaches lead to:
› high resource wastage with reduced concurrency if the DAG is over-provisioned
› high resource contention with SLA misses if the DAG plan is too strict
SLA and predictability are very important
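Planning at t1 means the fan-out is decided from the source video itself rather than from a fixed DAG. A minimal sketch, assuming a hypothetical rendition ladder (the heights below are not from the deck) combined with the two output formats listed earlier; a source only gets renditions that do not upscale it:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of job planning at t1: the output jobs are derived from
// the source video's parameters instead of a static, pre-provisioned DAG.
final class JobPlanner {
    // Output formats named on the slides.
    static final String[] FORMATS = {"mp4/h264/AAC", "webm/vp8/vorbis"};
    // Hypothetical rendition ladder (output heights in pixels), sorted ascending.
    static final int[] LADDER = {240, 360, 480, 720, 1080};

    private JobPlanner() {}

    /** One output job per format for every ladder rung that does not upscale the source. */
    static List<String> plan(int sourceHeight) {
        List<String> jobs = new ArrayList<>();
        for (int height : LADDER) {
            if (height > sourceHeight) {
                break; // ladder is ascending, so all remaining rungs would upscale
            }
            for (String format : FORMATS) {
                jobs.add(format + "@" + height + "p");
            }
        }
        return jobs;
    }
}
```

With this ladder a 720p source yields 8 output jobs and a 1080p source yields 10, so the plan (and the resources it asks for) varies per video.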
10. Cascaded Map-Reduce Jobs
[Figure: an OOZIE workflow of cascaded MR jobs: an (M) Download + Split / Generation job, several parallel (M) Transcode jobs, and an (R) Cleanup, Notify job, alongside HDFS, a Video Store and API endpoints]
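The shape of the cascade (one split step fanning out to parallel transcode steps, then a single cleanup/notify step once all of them finish) can be sketched in-process. This is not Oozie or MapReduce code: `split`, `transcode` and `cleanupAndNotify` are hypothetical placeholders standing in for the three job types in the diagram.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

// In-process stand-in for the cascaded workflow. The stage bodies are
// hypothetical placeholders for the (M) Download + Split, (M) Transcode and
// (R) Cleanup, Notify jobs.
final class CascadeSketch {
    private CascadeSketch() {}

    // Hypothetical split step: one transcode task per requested output format.
    static List<String> split(String source) {
        return List.of(source + "#mp4", source + "#webm");
    }

    // Hypothetical transcode step; a real job would typically run FFmpeg here.
    static String transcode(String task) {
        return task + ":done";
    }

    // Hypothetical final step: runs only after every transcode task has finished.
    static String cleanupAndNotify(List<String> outputs) {
        return outputs.size() + " outputs ready";
    }

    static String run(String source) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            // Fan out: transcode tasks run in parallel, like the sibling MR jobs.
            List<CompletableFuture<String>> futures = split(source).stream()
                    .map(task -> CompletableFuture.supplyAsync(() -> transcode(task), pool))
                    .collect(Collectors.toList());
            // Fan in: join() blocks until each task completes.
            List<String> outputs = futures.stream()
                    .map(CompletableFuture::join)
                    .collect(Collectors.toList());
            return cleanupAndNotify(outputs);
        } finally {
            pool.shutdown();
        }
    }
}
```

In the Hadoop version each of these steps is a separate MR job launched by OOZIE, so each one also pays job scheduling and container start-up costs that this in-process sketch does not model.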
11. Why Hadoop 1/2
Extremely reliable as a framework
Good Resource Management
› custom container asks based on source video parameters
› multiple MR jobs with 2 GB to 6 GB containers spawned on demand
› minimal resource wastage (job plan decided by the parent MR job)
Distributed File System (HDFS)
› used to share video files between various transcode jobs
Elasticity
› scaling achieved by increasing queue capacity
Fault Tolerance
› OOZIE provides job-level fault tolerance
› MR framework provides task-level fault tolerance
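A "custom container ask based on source video parameters" can be as simple as a function from source properties to a memory size, clamped to the 2 GB to 6 GB range quoted above (reading "2G to 6G" as container memory). The thresholds and increments below are assumptions for illustration, not the rule used in production; in an MR job the result could be supplied as the value of `mapreduce.map.memory.mb`.

```java
// Hypothetical heuristic for a per-job container memory ask. Only the
// 2 GB-6 GB bounds come from the slide; the scaling rules are assumptions.
final class ContainerSizer {
    static final int MIN_MB = 2048; // 2 GB, lower end quoted on the slide
    static final int MAX_MB = 6144; // 6 GB, upper end quoted on the slide

    private ContainerSizer() {}

    /** Hypothetical memory ask in MB for one transcode job. */
    static int memoryMb(int sourceHeight, double durationSec) {
        int mb = MIN_MB;
        if (sourceHeight > 720) {
            mb += 2048; // assumed: HD sources need larger decode/encode buffers
        } else if (sourceHeight > 480) {
            mb += 1024;
        }
        if (durationSec > 1800) {
            mb += 2048; // assumed: sources over 30 minutes get extra headroom
        }
        return Math.min(MAX_MB, Math.max(MIN_MB, mb));
    }
}
```

Sizing per job, rather than giving every job the largest container, is what keeps resource wastage low when the parent MR job decides the plan.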
12. Why Hadoop 2/2
Log analysis and reporting
› run as MR jobs alongside transcode jobs in the same queue
All functions well contained within the Hadoop MR ecosystem
Very low maintenance
› over and above Grid maintenance
Lets us focus on the business logic and functions
Excellent SLA for big jobs
14. UGC and the current architecture (shortcomings)
Very high variance in User Generated Content
› duration, size, bitrates, etc.
Users want immediate feedback
› SLA very important here
Large number of short videos (< 30 seconds)
SLA (as a multiple of video length) is very high for short videos
› latency in MR containers’ allocation and preparation
› some latency added by OOZIE scheduling
OOZIE / MR designed for batch jobs
15. The Latency
Total Δt1 ~ 50 seconds to a minute, Δt2 ~ a few seconds
Job split decision point important
› leads to efficient resource utilization
MapReduce framework very good for batch jobs
› but not suitable for near real-time processing
Well-known and documented
Alternative low-latency frameworks available
[Figure: timeline of an OOZIE workflow running MR1, MR2, MR3 and MR4; the job split at t1 performs DAG planning based on the source video / requester, and every MR job incurs both Δt1 and Δt2]
Δt1: job queuing / scheduling, container allocation, container localization
Δt2: container warming (ML models, etc.)
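The per-job overheads explain why short videos suffer most: Δt1 and Δt2 are paid by every MR job on the critical path regardless of video length. A rough model, assuming three sequential stages (split, transcode, cleanup), Δt1 = 50 s from the slide, and a hypothetical Δt2 = 5 s standing in for "a few seconds":

```java
// Hypothetical latency model: each MR stage on the critical path pays Δt1
// (queuing/scheduling, allocation, localization) plus Δt2 (container warming)
// before doing useful work.
final class LatencyModel {
    private LatencyModel() {}

    /** End-to-end latency of `stages` sequential stages plus the actual processing time. */
    static double endToEndSeconds(int stages, double dt1, double dt2, double processingSec) {
        return stages * (dt1 + dt2) + processingSec;
    }

    /** Latency expressed as a multiple of the source duration (the unit the SLA uses). */
    static double slaMultiple(double latencySec, double durationSec) {
        return latencySec / durationSec;
    }
}
```

For a 10-second clip transcoded in 15 s (1.5x), three stages give 3 × (50 + 5) + 15 = 180 s, an 18x multiple, even though the transcode itself is fast. Removing Δt1 and Δt2 from the critical path is exactly what the new requirements below target.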
16. New Requirements and options explored
Need
› near real-time scheduling (Δt1)
› long running re-usable containers (Δt2)
Options explored
› Tez
› Storm / Spark
› Slider
17. Issues with options explored
Most (if not all) frameworks are optimized for captive data flow
› (in our case) only job metadata flows through the framework
› while video blobs are consumed from external subsystems (HDFS / local storage)
› metadata is not a clear indicator of job characteristics
Video vs Text Processing
› cannot process line by line
› no key / value decomposition
› many jobs require the whole video file to be present locally
18. The Comparison Sheet
| Requirement | Current | Tez | Storm / Spark | Slider |
|---|---|---|---|---|
| Elasticity | High | High | High | High |
| Latency | High | Low | Low | Low |
| Resource Efficiency (usage %) | High | Low* | High | High |
| Dynamic DAG | Yes | No | No | No DAG |
| Fault Tolerance | Framework | Framework | Framework | Framework |
| Resource Management | Fine | Fine | Coarse / None | Fine |
| Job / Task Abstraction | Yes | Yes | Yes | No |
| Container Release | Yes | Yes | No | No |
| Container Isolation | Yes | Yes | No | Yes |
| Container Pre-warm | Per Job | Once | Once | Once |

* Containers remain idle as the DAG cannot be changed after the first step
20. Generic YARN Master / Worker
[Figure: a Master receives Jobs and distributes them over RPC to workers w1 … wn (Workers of Type 1…k)]
Extremely simple framework
Master manages a pool of workers
Master reads jobs and distributes to workers over Hadoop RPC
Framework has pluggable master and worker tasks
Pluggable scheduling strategy to manage workers
Heterogeneous worker tasks in same pool
Custom resource allocation per worker type
Worker resources setup once at bootstrap
State management is done by Master using HDFS
Security and token management by framework harness
…
21. Master, Worker Interfaces
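// Job is the framework's job abstraction (its definition is not shown on the slide).
public interface Master {
    // Returns the next job for the named worker.
    Job getJobInput(String workerName);
    // Receives the output of a completed job.
    void setJobOutput(Job jobOutput);
}

public interface Worker {
    // Runs one job and returns its output.
    Job execute(Job jobInput);
}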
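To make the two interfaces concrete, here is a single-process sketch of one pool: a Master holding a job queue and a long-running worker loop that pulls jobs until it has been idle past a TTL. It is a hedged illustration, not the framework itself: the `Job` fields, `InMemoryMaster`, `WorkerLoop` and the `idleTtlMillis` parameter are hypothetical, and a local queue replaces Hadoop RPC and HDFS-backed state.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical Job payload: the slides do not show the real Job type.
final class Job {
    final String id;
    final String payload;

    Job(String id, String payload) {
        this.id = id;
        this.payload = payload;
    }
}

// Interfaces as shown on the slide.
interface Master {
    Job getJobInput(String workerName);
    void setJobOutput(Job jobOutput);
}

interface Worker {
    Job execute(Job jobInput);
}

// In-memory stand-in for the Master. The real framework distributes jobs over
// Hadoop RPC and keeps state in HDFS; a local queue replaces both here.
final class InMemoryMaster implements Master {
    private final BlockingQueue<Job> pending = new LinkedBlockingQueue<>();
    final ConcurrentLinkedQueue<Job> completed = new ConcurrentLinkedQueue<>();

    void submit(Job job) {
        pending.add(job);
    }

    @Override
    public Job getJobInput(String workerName) {
        try {
            // Short poll so an idle worker regains control and can check its TTL.
            return pending.poll(100, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    @Override
    public void setJobOutput(Job jobOutput) {
        completed.add(jobOutput);
    }
}

// Long-running worker: pulls, executes and reports jobs until it has been idle
// longer than idleTtlMillis (a hypothetical TTL parameter).
final class WorkerLoop implements Runnable {
    private final String name;
    private final Master master;
    private final Worker worker;
    private final long idleTtlMillis;

    WorkerLoop(String name, Master master, Worker worker, long idleTtlMillis) {
        this.name = name;
        this.master = master;
        this.worker = worker;
        this.idleTtlMillis = idleTtlMillis;
    }

    @Override
    public void run() {
        long idleSince = System.currentTimeMillis();
        while (!Thread.currentThread().isInterrupted()) {
            Job job = master.getJobInput(name);
            if (job == null) {
                if (System.currentTimeMillis() - idleSince > idleTtlMillis) {
                    return; // idle past TTL: exit so the container can be released
                }
                continue;
            }
            master.setJobOutput(worker.execute(job));
            idleSince = System.currentTimeMillis();
        }
    }
}
```

Because the worker stays alive between jobs, container allocation, localization and warming are paid once rather than per job; handing out the next job is just a queue poll.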
22. New Architecture for Transcoding
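[Figure: new transcoding architecture; components shown: Client, API, Job Queue, a Pool containing a Master and workers of types Worker1 … Workerk (instances w1 … wm, w1 … wn), HDFS holding State Information, and Video Storage]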
23. Characteristics of the New Framework
Long running workers in YARN containers
› configurable TTL and timeouts
Each pool consists of 1 Master and multiple workers
Multiple pools are managed by the client
Multiple clients across clusters
Adaptive container allocation and release
› scheduling strategy (low – high watermark based)
Significant improvements in latency
› job scheduling and distribution in milliseconds
YARN and the Client provide Master fault tolerance
Master takes care of fault tolerance for workers
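The "low – high watermark" scheduling strategy can be read as: grow the pool when the backlog per worker exceeds a high watermark, shrink it when the backlog falls below a low watermark, and hold steady in between. The sketch below assumes backlog per worker as the metric, releases one container at a time to avoid thrashing, and clamps to hypothetical pool bounds; none of these specifics come from the deck.

```java
// Hypothetical low/high watermark scaling strategy for a worker pool. The
// slides name the approach; the metric, thresholds and step sizes are assumptions.
final class WatermarkStrategy {
    private final double lowWatermark;  // pending jobs per worker below which the pool shrinks
    private final double highWatermark; // pending jobs per worker above which the pool grows
    private final int minWorkers;
    private final int maxWorkers;

    WatermarkStrategy(double lowWatermark, double highWatermark, int minWorkers, int maxWorkers) {
        if (lowWatermark < 0 || highWatermark <= lowWatermark) {
            throw new IllegalArgumentException("require 0 <= low < high");
        }
        if (minWorkers < 1 || maxWorkers < minWorkers) {
            throw new IllegalArgumentException("require 1 <= min <= max");
        }
        this.lowWatermark = lowWatermark;
        this.highWatermark = highWatermark;
        this.minWorkers = minWorkers;
        this.maxWorkers = maxWorkers;
    }

    /** Worker count to target for the given queue depth and current pool size. */
    int targetWorkers(int pendingJobs, int currentWorkers) {
        double backlogPerWorker = (double) pendingJobs / Math.max(currentWorkers, 1);
        int target = currentWorkers;
        if (backlogPerWorker > highWatermark) {
            // Allocate enough containers to bring the backlog back to the high watermark.
            target = (int) Math.ceil(pendingJobs / highWatermark);
        } else if (backlogPerWorker < lowWatermark) {
            // Release one container at a time to avoid thrashing.
            target = currentWorkers - 1;
        }
        return Math.max(minWorkers, Math.min(maxWorkers, target));
    }
}
```

With low = 0.5, high = 2.0 and a pool of 1 to 10 workers, 20 pending jobs on 4 workers scales the pool to 10, while 1 pending job on 4 workers releases one container (target 3). The gap between the two watermarks is what keeps the pool from oscillating on every small change in queue depth.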
24. What Next …
Hope to release to the community soon
In principle, similar to Google containers
› with a low-latency job abstraction
YARN (nice to have):
› Multi-dimensional scheduling
› Node Labels