Increase the Value of Video with ML & Media Services - SRV322 - New York AWS Summit

© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Bryan Samis
Solutions Architect
SRV322
Increase the Value of Video Using Machine Learning and AWS Media Services

Agenda
• Brief introduction to services
• Using ML in video workflows
• Content indexing / metadata generation
  • Add searchable metadata to a video archive
  • Log when celebrities appear in new episodes
  • Generate captions for a collection of video assets
• Content retrieval
  • Transcode just the video clip that contains a specific person
• Putting it all together

Services Used
• AWS Elemental MediaConvert
• AWS Elemental MediaLive
• Amazon Rekognition
• Amazon Transcribe
• AWS Lambda
• AWS Step Functions

AWS Elemental MediaConvert
AWS Elemental MediaConvert is a file-based video processing service that enables anyone, with
any size content library, to easily and reliably transcode on-demand content for broadcast and
multiscreen delivery.
• Access to professional grade video features and quality
• No software or hardware infrastructure to manage
• Automatically scales in response to variations in incoming video volume
• Ability to manage capacity and control order in which jobs are processed
• Pay for what you use, billed by the second of content produced

Amazon Rekognition
• Object & scene detection
• Facial analysis
• Face comparison
• Face search
• Celebrity detection
• Image moderation
• Text detection

Amazon Rekognition Video
• Object, scene & activity detection
• Face search
• Facial analysis
• Activity pathing
• Unsafe content detection
• Celebrity detection
• Text in images

Amazon Rekognition
File requirements
• Image recognition
  • JPEG / PNG image
  • Up to 15 MB
• Video recognition
  • MOV / MP4 file with H.264 video
  • Up to 8 GB

Amazon Transcribe
A fully managed and continuously trained automatic speech recognition (ASR) service that takes in audio and automatically generates accurate transcripts.
• Support for audio in many formats and low fidelity
• Amazon S3 integration
• Time stamps and confidence scores
• English and Spanish
• Punctuation

Amazon Transcribe
• Input file types accepted are:
  • FLAC
  • MP3
  • WAV
  • MP4
• Up to 2 hours in duration
• Up to 1 GB in size
• Produces JSON output with full transcription and word timing

Amazon Transcribe – Use Cases
• Call centers
• Subtitles for VOD
• Transcribe meetings
• Broadcast closed captions
• Content indexing
• Compliance

Using Amazon ML Services for Media
• Use services such as Amazon Rekognition and Amazon Transcribe to generate metadata about your content
• Store that metadata and make it searchable
• Retrieve only the portion of the content you want
• Prepare it for timely use
[Diagram, two stages. Content Indexing / Metadata Generation: live and file sources, AWS Elemental Media Services (media processing), Amazon ML services, Amazon DynamoDB (metadata database). Content Retrieval / Action: metadata, AWS Elemental Media Services (media processing), live and file content.]

Content Indexing / Metadata Generation
• File-based content: MediaConvert
• Live content: MediaLive, Kinesis Video Streams
• Amazon Rekognition (Image): JPEG/PNG, up to 15 MB
• Amazon Rekognition (Video): H.264 video in an MP4/MOV file, up to 8 GB
• Transcribe: FLAC/MP3/WAV/MP4, up to 2 hours, up to 1 GB

Content Indexing / Metadata Generation – AWS Elemental MediaConvert and Amazon Rekognition
The Challenge
• A broadcaster wants to add metadata to an existing archive of video content
• Index metadata and video to make it searchable
• Keep costs low
The Solution
• Use AWS Elemental MediaConvert to extract frames from video content
• Use Amazon Rekognition to analyze and create metadata for video content
The Benefit
• Video tagged with detected objects, scenes, and celebrities
• Five-second frame extraction keeps cost low while still providing a searchable index

Use AWS Elemental MediaConvert to extract still frames from a video
1. An AWS Elemental MediaConvert job transcodes the file and extracts JPEG frames to an Amazon S3 bucket.
2. An AWS Lambda function, triggered by the Amazon S3 object-created event, tells Amazon Rekognition to analyze the JPEG file.
3. Amazon Rekognition performs the requested operation on the image (e.g., object detection, celebrity recognition).
4. Amazon Rekognition returns the result to AWS Lambda, which stores tags and confidence scores in Amazon DynamoDB, Amazon Redshift, Amazon Elasticsearch Service, Amazon RDS, or whichever service best suits the use case.
[Diagram components: file source, AWS Elemental MediaConvert (file-based processing), Amazon S3 (storage), AWS Lambda (serverless), Amazon Rekognition (ML / AI), Amazon DynamoDB (database)]

• Add a new file output group to an AWS Elemental MediaConvert job

Add a frame capture (JPEG) output to the job
Framerate determines the number of images that will be extracted from the video per second. A value of 1/5 creates one JPEG every 5 seconds.
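The frame capture setting above can also be expressed in the job settings JSON rather than the console. The following is a minimal sketch of one output entry, assuming the MediaConvert job schema fields FRAME_CAPTURE, FrameCaptureSettings, FramerateNumerator and FramerateDenominator; the function names, the "_frame" name modifier, and the quality and capture limits are illustrative choices, not values from the slides.

```python
def frame_capture_output(seconds_between_frames=5, quality=80, max_captures=10000):
    """Build one MediaConvert output that writes a JPEG every N seconds."""
    return {
        "NameModifier": "_frame",  # hypothetical suffix for the JPEG file names
        "ContainerSettings": {"Container": "RAW"},  # frame capture uses no container
        "VideoDescription": {
            "CodecSettings": {
                "Codec": "FRAME_CAPTURE",
                "FrameCaptureSettings": {
                    # 1/5 means one frame every 5 seconds
                    "FramerateNumerator": 1,
                    "FramerateDenominator": seconds_between_frames,
                    "MaxCaptures": max_captures,  # illustrative upper bound
                    "Quality": quality,           # illustrative JPEG quality
                },
            }
        },
    }


def approx_frame_count(duration_seconds, seconds_between_frames=5):
    """Rough number of JPEGs to expect (the exact count may differ by one)."""
    return duration_seconds // seconds_between_frames


output = frame_capture_output()
print(output["VideoDescription"]["CodecSettings"]["FrameCaptureSettings"])
print(approx_frame_count(60 * 60))  # a 60-minute file at one frame per 5 seconds
```

At one frame every 5 seconds, a 60-minute file yields roughly 720 images to send to Amazon Rekognition, which is what keeps the per-image cost low.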

Amazon Rekognition
AWS Lambda function to invoke Amazon Rekognition on our extracted JPEG to detect celebrities
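The code shown on this slide is not preserved in the text. As a stand-in, here is a minimal sketch of such a Lambda function, assuming the boto3 Rekognition call recognize_celebrities and the standard S3 event layout. The helper name find_celebrities and the 90% confidence cutoff are hypothetical, and storing the result in a database is left out.

```python
import urllib.parse


def find_celebrities(rekognition, bucket, key, min_confidence=90.0):
    """Ask Rekognition which celebrities appear in one JPEG stored in S3."""
    response = rekognition.recognize_celebrities(
        Image={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    # Keep only the name and match confidence of sufficiently confident matches.
    return [
        {"Name": face["Name"], "Confidence": face["MatchConfidence"]}
        for face in response.get("CelebrityFaces", [])
        if face["MatchConfidence"] >= min_confidence
    ]


def lambda_handler(event, context):
    """Entry point for the S3 object-created trigger."""
    import boto3  # imported here so the helper above can be exercised without AWS

    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    # Object keys arrive URL-encoded in S3 event notifications.
    key = urllib.parse.unquote_plus(record["object"]["key"])
    return find_celebrities(boto3.client("rekognition"), bucket, key)
```

Passing the client in as an argument keeps the helper easy to test with a stub; in a real deployment the handler would also write the returned names and confidence scores to the chosen data store.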

Amazon Rekognition
Result from our image

Demo

AWS Elemental MediaConvert and Amazon Rekognition Video
The Challenge
• A content producer wants to log who is in each scene of a new episode of a show
• Raw video files are ~200 GB for 60 minutes
The Solution
• Use AWS Elemental MediaConvert to compress video content (but retain quality)
• Use Amazon Rekognition Video to analyze and create metadata for video content
The Benefit
• Video tagged with detected celebrities, including the timing and position of each celebrity
• Video files reduced to <8 GB for 60 minutes to reduce costs

AWS Elemental MediaConvert and Amazon Rekognition Video
Use AWS Elemental MediaConvert to compress files >8 GB and feed them to Amazon Rekognition
1. An AWS Elemental MediaConvert job transcodes the source file to H.264/MP4 at a bit rate such that the file size is <8 GB.
2. An AWS Lambda function, triggered by the Amazon S3 object-created event, tells Amazon Rekognition to analyze the video file.
3. Amazon Rekognition performs the requested operation on the video (e.g., person tracking, celebrity recognition).
4. Amazon Rekognition returns the result to AWS Lambda, which stores tags and confidence scores in Amazon DynamoDB, Amazon Redshift, Amazon Elasticsearch Service, Amazon RDS, or whichever service best suits the use case.
[Diagram components: file source, AWS Elemental MediaConvert (file-based processing), Amazon S3 (storage), AWS Lambda (serverless), Amazon Rekognition (ML / AI), Amazon DynamoDB (database)]

• Add H.264/MP4 output to AWS Elemental MediaConvert job
Use Container MPEG-4 Container (MP4) and a file extension of mp4.
Set Video Codec to MPEG-4 AVC (H.264).
Select the bit rate so that the output file is smaller than 8 GB. For example, a 60-minute movie at 7 Mbps will be approximately 3.2 GB.
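The bit rate arithmetic behind that estimate is simple enough to check. This is a sketch that counts the video bit rate only and uses decimal gigabytes; audio tracks and container overhead are ignored, so real files will be somewhat larger. The function names are illustrative.

```python
def output_size_gb(bitrate_bps, duration_seconds):
    """Approximate file size in decimal GB for a constant video bit rate."""
    return bitrate_bps * duration_seconds / 8 / 1e9


def max_bitrate_bps(size_limit_gb, duration_seconds):
    """Highest constant bit rate that keeps the file under the size limit."""
    return size_limit_gb * 1e9 * 8 / duration_seconds


one_hour = 60 * 60
print(round(output_size_gb(7_000_000, one_hour), 2))   # size of 60 min at 7 Mbps
print(round(max_bitrate_bps(8, one_hour) / 1e6, 2))    # Mbps ceiling for 8 GB
```

For a 60-minute file this works out to 3.15 GB at 7 Mbps, in line with the approximate figure on the slide, and a ceiling of just under 18 Mbps before the 8 GB limit is reached.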

AWS Lambda function to invoke Amazon Rekognition on our transcoded video to detect labels:
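The original slide code is not preserved here. A minimal sketch of starting the asynchronous job, assuming the boto3 Rekognition call start_label_detection, might look like the following; the helper name, the 80% minimum confidence, and the topic and role ARNs passed in are placeholders.

```python
def start_label_job(rekognition, bucket, key, sns_topic_arn, role_arn,
                    min_confidence=80.0):
    """Start asynchronous label detection on a video in S3; returns the job ID."""
    response = rekognition.start_label_detection(
        Video={"S3Object": {"Bucket": bucket, "Name": key}},
        MinConfidence=min_confidence,
        # Rekognition publishes a completion message to this SNS topic,
        # using the given IAM role to do so.
        NotificationChannel={"SNSTopicArn": sns_topic_arn, "RoleArn": role_arn},
    )
    return response["JobId"]
```

In a Lambda handler, the rekognition argument would be boto3.client("rekognition") and the bucket and key would come from the S3 object-created event.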

Example code to fetch our Amazon Rekognition Video results when an Amazon SNS notification is published:
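The slide's code is not preserved in the text, so the following is only a sketch of the idea. It assumes the SNS message body is JSON containing JobId and Status fields, and that results are paged through the boto3 call get_label_detection with NextToken; the function names are hypothetical.

```python
import json


def job_from_sns_event(event):
    """Pull the Rekognition job ID and status out of an SNS-triggered Lambda event."""
    message = json.loads(event["Records"][0]["Sns"]["Message"])
    return message["JobId"], message["Status"]


def fetch_labels(rekognition, job_id):
    """Collect (timestamp_ms, label, confidence) tuples across all result pages."""
    labels, token = [], None
    while True:
        kwargs = {"JobId": job_id, "SortBy": "TIMESTAMP"}
        if token:
            kwargs["NextToken"] = token
        page = rekognition.get_label_detection(**kwargs)
        for item in page.get("Labels", []):
            label = item["Label"]
            labels.append((item["Timestamp"], label["Name"], label["Confidence"]))
        token = page.get("NextToken")
        if not token:
            return labels
```

A handler would typically call fetch_labels only when the reported status is SUCCEEDED, then write the tuples to the metadata store.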

Result from our video

AWS Elemental MediaConvert and Amazon Transcribe
The Challenge
• An online training provider has thousands of hours of video that need captions
• Video is in a variety of formats
The Solution
• Use AWS Elemental MediaConvert to create an audio-only version of the content
• Use Amazon Transcribe to generate a timestamped transcription
• Convert the Amazon Transcribe output to a captions file
The Benefit
• All formats of video content get captions added to make them more accessible
• Option to run Amazon Transcribe output through Amazon Translate to get multi-language captions

Amazon Transcribe
1. An AWS Elemental MediaConvert job transcodes the source file, creating an audio-only rendition for Amazon Transcribe. AWS Elemental MediaConvert also creates the normal audio/video output.
2. An AWS Lambda function, triggered by the Amazon S3 object-created event, creates a new Transcribe job.
3. Amazon Transcribe outputs a JSON file of detected words and timing.
4. A Lambda function converts the Amazon Transcribe JSON into a subtitle format (such as WebVTT, SRT, or TTML) and delivers it to the Amazon S3 bucket with the content.
[Diagram components: file source, AWS Elemental MediaConvert (file-based processing), Amazon S3 (storage), AWS Lambda (serverless), Amazon Transcribe (ML / AI), Amazon S3 (storage)]

Add audio-only WAV output to the job. Start by adding an additional file output group.

Configure audio-only uncompressed WAV or MP4 output.

Amazon Transcribe
AWS Lambda function to create an Amazon Transcribe job from the audio file created by AWS Elemental MediaConvert.
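The slide's code is not preserved in the text. A minimal sketch, assuming the boto3 Transcribe call start_transcription_job, could look like this. Deriving the job name and media format from the object key is an illustrative convention, and "en-US" is an assumed default language.

```python
import re


def start_caption_job(transcribe, bucket, key, language_code="en-US"):
    """Create a Transcribe job for one audio file in S3; returns the job name."""
    # Job names may only contain letters, digits, '.', '_' and '-'.
    job_name = re.sub(r"[^0-9a-zA-Z._-]", "-", key)
    # Use the file extension (wav, mp3, mp4, flac) as the media format.
    media_format = key.rsplit(".", 1)[-1].lower()
    transcribe.start_transcription_job(
        TranscriptionJobName=job_name,
        LanguageCode=language_code,
        MediaFormat=media_format,
        Media={"MediaFileUri": f"s3://{bucket}/{key}"},
    )
    return job_name
```

Job names must be unique within an account, so a production version would likely add a timestamp or asset ID to the name.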

Amazon Transcribe
Use AWS Step Functions to monitor status of Transcribe job.
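One way to do this is a wait, check, and branch loop. The sketch below pairs a status-check function (assuming the boto3 call get_transcription_job) with an Amazon States Language definition built as a Python dictionary. The state names, the 30-second wait, and the "$.status" field are illustrative, and the Lambda ARN is a placeholder supplied by the caller.

```python
import json


def check_transcribe_status(transcribe, job_name):
    """Return the job status in the shape the Choice state below expects."""
    job = transcribe.get_transcription_job(TranscriptionJobName=job_name)
    status = job["TranscriptionJob"]["TranscriptionJobStatus"]
    return {"jobName": job_name, "status": status}


def polling_state_machine(check_status_lambda_arn, wait_seconds=30):
    """Build a state machine that polls until the job completes or fails."""
    return {
        "StartAt": "Wait",
        "States": {
            "Wait": {"Type": "Wait", "Seconds": wait_seconds, "Next": "CheckStatus"},
            "CheckStatus": {
                "Type": "Task",
                "Resource": check_status_lambda_arn,
                "Next": "IsDone",
            },
            "IsDone": {
                "Type": "Choice",
                "Choices": [
                    {"Variable": "$.status", "StringEquals": "COMPLETED",
                     "Next": "Succeeded"},
                    {"Variable": "$.status", "StringEquals": "FAILED",
                     "Next": "Failed"},
                ],
                "Default": "Wait",  # any other status: wait and check again
            },
            "Succeeded": {"Type": "Succeed"},
            "Failed": {"Type": "Fail", "Error": "TranscribeJobFailed"},
        },
    }


definition = json.dumps(polling_state_machine("arn:aws:lambda:REGION:ACCOUNT:function:check-status"))
```

The resulting JSON string is what would be supplied as the state machine definition; the caption conversion step could be added as another Task after the success branch.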

Amazon Transcribe
Transcribe creates a JSON file with the complete transcription and word-by-word timing.

Amazon Transcribe
The Amazon Transcribe JSON must be converted into a usable closed caption / subtitle format, such as SRT.
• Not a trivial problem. We need to determine sentence boundaries and which words to combine into the same caption.
Example:

Amazon Transcribe
• Some ideas for tackling this problem:
  • Calculate the cadence of the wording, and look for larger-than-average gaps between words. Use these points as our breaks.
  • Use a fixed caption duration of 1–2 seconds and “aggregate” all words that fall within that duration.
• Neither of these methods is perfect. Analyzing audio alone won’t necessarily account for scene changes, gaps in dialog, non-dialog sound elements, etc.
But they can get us close…
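To make the two ideas concrete, here is a simplified sketch that combines them: it starts a new caption when the pause before a word exceeds a fixed threshold (a stand-in for the "larger than average gap" test) or when the caption would run longer than a maximum duration. It assumes the Transcribe JSON layout with results.items, where each item has a type, alternatives, and (for spoken words) start_time and end_time strings. The 0.7-second gap and 2-second duration are illustrative values.

```python
def srt_time(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"


def group_words(items, max_gap=0.7, max_duration=2.0):
    """Group word items into captions, breaking on long pauses or long captions."""
    captions, current = [], None
    for item in items:
        text = item["alternatives"][0]["content"]
        if item["type"] == "punctuation":
            # Punctuation has no timing; attach it to the preceding word.
            if current:
                current["text"] += text
            continue
        start, end = float(item["start_time"]), float(item["end_time"])
        if current and (start - current["end"] > max_gap
                        or end - current["start"] > max_duration):
            captions.append(current)
            current = None
        if current is None:
            current = {"start": start, "end": end, "text": text}
        else:
            current["text"] += " " + text
            current["end"] = end
    if current:
        captions.append(current)
    return captions


def to_srt(transcribe_json, **kwargs):
    """Render the Transcribe output as SRT text."""
    captions = group_words(transcribe_json["results"]["items"], **kwargs)
    blocks = [
        f"{i}\n{srt_time(c['start'])} --> {srt_time(c['end'])}\n{c['text']}\n"
        for i, c in enumerate(captions, 1)
    ]
    return "\n".join(blocks)
```

Replacing the fixed max_gap with a value derived from the average gap in the file would bring this closer to the first idea; either way the output usually still benefits from a manual review pass.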

Content Retrieval
The Challenge
• The content producer would like to create a promo clip of all of the scenes from their episode that contain a particular actor.
• Remember, the source file is 60 minutes long and 200 GB.
The Solution
• Amazon Rekognition Video facial recognition identifies when the star appears in the source video.
• AWS Elemental MediaConvert uses time references to selectively transcode the source video.
The Benefit
• Faster and more cost-effective clip generation, as only the video content identified as featuring the celebrity is transcoded.

Content Retrieval
1. A Lambda function queries the database for the metadata being searched.
2. A Lambda function creates a MediaConvert transcode job, specifying the time(s) from the source to clip.
3. AWS Elemental MediaConvert transcodes clips from the source file, using only the time range(s) specified.
[Diagram components: Amazon DynamoDB (database), AWS Lambda (serverless), AWS Elemental MediaConvert (file-based processing), Amazon S3 (storage), clipped file output]

Content Retrieval
Use AWS Elemental MediaConvert “Input Clipping” feature to clip a file to specific times
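As a rough sketch of how stored metadata could drive input clipping: Amazon Rekognition Video reports timestamps in milliseconds, while MediaConvert input clips are given as HH:MM:SS:FF timecodes. The code below merges nearby appearance timestamps into ranges and builds one job input, assuming the MediaConvert fields InputClippings, StartTimecode, EndTimecode and TimecodeSource. The function names, the 2-second merge gap, and the integer 30 fps frame rate are illustrative assumptions.

```python
def ms_to_timecode(ms, fps=30):
    """Convert milliseconds from the start of the file to HH:MM:SS:FF."""
    total_seconds, remainder_ms = divmod(int(ms), 1000)
    frames = remainder_ms * fps // 1000
    hours, rem = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours:02}:{minutes:02}:{seconds:02}:{frames:02}"


def merge_appearances(timestamps_ms, max_gap_ms=2000):
    """Merge timestamps that are close together into [start, end] ranges."""
    ranges = []
    for t in sorted(timestamps_ms):
        if ranges and t - ranges[-1][1] <= max_gap_ms:
            ranges[-1][1] = t
        else:
            ranges.append([t, t])
    return ranges


def clipped_input(file_uri, timestamps_ms, fps=30):
    """Build a MediaConvert input that only includes the merged time ranges."""
    return {
        "FileInput": file_uri,
        "TimecodeSource": "ZEROBASED",  # count timecodes from 00:00:00:00
        "InputClippings": [
            {"StartTimecode": ms_to_timecode(start, fps),
             "EndTimecode": ms_to_timecode(end, fps)}
            for start, end in merge_appearances(timestamps_ms)
        ],
    }
```

An isolated single timestamp produces a zero-length range here, so a real implementation would probably pad each range by a second or two on either side before creating the job.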

AWS Media Analysis Solution
https://aws.amazon.com/answers/media-entertainment/media-analysis-solution/
• Generate searchable metadata from your media assets using Amazon Rekognition, Amazon Transcribe, Amazon Comprehend, and Amazon Elasticsearch Service
• Deploy in minutes with a single click using AWS CloudFormation
• Interact via API or demo web UI
• Orchestrated with Step Functions, extensible and easily customizable

Bringing it all together
Adding video transcoding to the Media Analysis Solution
AWS Elemental MediaConvert

Amazon Machine Learning Stack
• Application services: Amazon Rekognition, Amazon Rekognition Video, Polly, Transcribe, Translate, Comprehend, Lex
• Platforms: Amazon SageMaker, Amazon Mechanical Turk
• Frameworks & Infrastructure: Keras; Machine Learning AMIs; P3 instances with NVIDIA Tesla V100 GPUs (14x faster than P2), 5,120 Tensor cores, 128 GB of memory, 1 petaflop of compute, NVLink 2.0

Submit Session Feedback
1. Tap the Schedule icon.
2. Select the session you attended.
3. Tap Session Evaluation to submit your feedback.

Thank You
