This document summarizes Chris Adamson's presentation on mastering media with AV Foundation. The presentation covers the fundamentals of dynamic media (including analog vs. digital formats), surveys the iOS media frameworks with a focus on AV Foundation, walks through the key AV Foundation classes for playback, capture, editing, and advanced features, briefly covers HTTP Live Streaming, and includes demos of basic playback and recording functionality using AV Foundation.
Mastering Media with AV Foundation
1. Mastering Media with AV Foundation
Chris Adamson — @invalidname — http://www.subfurther.com/blog
Voices That Matter iPhone Developer Conference — October 17, 2010
2. Road Map
✤ Fundamentals of Dynamic Media
✤ iOS Media Frameworks
✤ Playback
✤ Capture
✤ Editing
✤ Advanced Stuff
4. Analog
✤ Having a measurable value that varies continuously
✤ Contrast with digital or “discrete” signals
5. Audio
✤ Phonograph — Grooves vibrate a needle, which is amplified to a speaker
✤ Telephone — Voice vibrates microphone membrane; vibration is transmitted as voltage and reproduced by vibrating headset speaker
✤ Radio — Audio signal modulated on a carrier wave
6. Film
✤ Light projected through translucent film frames onto screen
✤ Each frame held in place briefly (≅ 1/24 sec)
✤ Eye sees a moving image due to “persistence of vision”
✤ Sound may be out-of-band
7. Television
✤ Chrominance and luminance sent as continuous AM signal
✤ CRT gun sweeps across screen in zig-zag pattern, illuminating phosphors
✤ Sound is FM in adjacent spectrum
8. Digital Media
✤ Represents a continuous signal numerically
✤ Audio — sample the signal at some frequency
✤ Video — digital images as frames
✤ Other kinds of samples — text (captions, subtitles), metadata, web links, executable code, etc.
9. Encoding
✤ How do we turn a continuous signal into numbers?
✤ Audio — Pulse Code Modulation (PCM). Each sample represents the amplitude of the audio signal at a specific time (sketched below).
✤ Compressed audio — Lossless and lossy transformations to and from PCM
✤ Video — Series of images (e.g., M-JPEG), or keyframe image (i-frame) followed by deltas (p-frames and b-frames)
✤ Other media — Text samples are just strings
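To make PCM concrete, here is a minimal C sketch (the function and buffer names are my own invention, not from the talk) that fills a buffer with one second of 16-bit samples of a 440 Hz sine wave at a 44.1 kHz sample rate:

#include <math.h>
#include <stdint.h>
#include <stdlib.h>

#define SAMPLE_RATE 44100

// Each entry is one PCM sample: the signal's amplitude at that instant
int16_t *makeSineWavePCM (double frequency) {
    int16_t *samples = malloc (SAMPLE_RATE * sizeof (int16_t));
    for (int i = 0; i < SAMPLE_RATE; i++) {
        double t = (double) i / SAMPLE_RATE;
        // scale sine's -1.0..1.0 range into the 16-bit sample range
        samples[i] = (int16_t) (sin (2.0 * M_PI * frequency * t) * 32767.0);
    }
    return samples; // caller is responsible for free()ing the buffer
}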
10. Containers
✤ Transport and/or storage of encoded streams
✤ Examples: MP3, AIFF, QuickTime Movie, Core Audio Format, .mp4, MPEG-2 transport stream
✤ Containers may be optimized for streaming, editing, end-user delivery, etc.
13. Metadata
✤ Information related to the audio data other than the signal itself
✤ Song title/album/artist, movie title, TV episode title/series, etc.
✤ Some containers support metadata; otherwise it is provided out-of-band
14. Keep in mind…
✤ Different codecs may go in different containers
✤ A network stream is a container
✤ A media stream and a network stream are two different things
✤ Containers can contain multiple media streams
✤ A stream’s data is not necessarily in the container file
✤ Media samples may be in distinct places, or interleaved
16. iPhone 2 Media Frameworks
Core Audio / OpenAL — Low-level audio streaming
Media Player — Full-screen video player
AV Foundation — Obj-C wrapper for audio playback (2.2 only)
17. iPhone 3 Media Frameworks
Core Audio / OpenAL — Low-level audio streaming
Media Player — iPod library search/playback
AV Foundation — Obj-C wrapper for audio playback, recording
18. iOS 4 Media Frameworks
Core Audio / OpenAL — Low-level audio streaming
Media Player — iPod library search/playback
AV Foundation — Audio / video capture, editing, playback, export…
Core Video — Quartz effects on moving images
Core Media — Objects for representing media times, formats, buffers
22. Size is relative
          AV Foundation   android.media   QT Kit   QuickTime for Java
Classes        56              40           24           576
Methods       460             280          360        >10,000
23. How do media frameworks work?
25. “Boom Box” APIs
✤ Simple API for playback, sometimes recording
✤ Little or no support for editing, mixing, metadata, etc.
✤ Example: HTML 5 <audio> tag
26. “Streaming” APIs
✤ Use “stream of audio” metaphor
✤ Strong support for mixing, effects, other real-time operations
✤ Example: Core Audio and AV Foundation (capture)
27. “Document” APIs
✤ Use “media document” metaphor
✤ Strong support for editing
✤ Mixing may be a special case of editing
✤ Example: QuickTime and AV Foundation (playback and editing)
28. AV Foundation Classes
✤ Capture
✤ Assets and compositions
✤ Playback, editing, and export
✤ Legacy classes
29. AVAsset
✤ A collection of time-based media data
✤ Sound, video, text (closed captions, subtitles, etc.)
✤ Each distinct media type is contained in a track
✤ An asset represents the arrangement of the tracks. The tracks represent the traits of the media’s presentation (volume, pan, affine transforms, opacity, etc.).
✤ Asset ≠ media. Track ≠ media. Media = media.
✤ Also contains metadata (where common to all tracks)
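A minimal inspection sketch, assuming an asset variable like the AVURLAsset created on the next slides (the logging is mine, not from the talk):

NSArray *audioTracks = [asset tracksWithMediaType:AVMediaTypeAudio];
NSArray *videoTracks = [asset tracksWithMediaType:AVMediaTypeVideo];
NSLog (@"%d audio tracks, %d video tracks",
       (int) [audioTracks count], (int) [videoTracks count]);
// metadata common to all tracks
for (AVMetadataItem *item in [asset commonMetadata]) {
    NSLog (@"%@: %@", [item commonKey], [item value]);
}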
30. AVAsset subclasses
✤ AVURLAsset — An asset created from a URL, such as a song or movie file or network document/stream
✤ AVComposition — An asset created from assets in multiple files, used to combine and present media together
✤ Used for editing
31. AVPlayer
✤ Provides the ability to play an asset
✤ play, pause, seekToTime: methods; currentTime, rate properties
✤ Init with URL or with AVPlayerItem
NSURL *url = [NSURL URLWithString:
    @"http://www.subfurther.com/video/running-start-iphone.m4v"];
AVURLAsset *asset = [AVURLAsset URLAssetWithURL:url options:nil];
AVPlayerItem *playerItem = [AVPlayerItem playerItemWithAsset:asset];
player = [[AVPlayer playerWithPlayerItem:playerItem] retain];
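A minimal usage sketch for the player created above (the 30-second seek target is an arbitrary value of mine; CMTime is covered later in the deck):

[player play];                       // rate property becomes 1.0
[player seekToTime:CMTimeMakeWithSeconds (30.0, 600)];
[player pause];                      // rate property becomes 0.0
NSLog (@"stopped at %f seconds",
       CMTimeGetSeconds (player.currentTime));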
32. AVPlayerLayer (or not)
✤ CALayer used to display video from a player
✤ Check that the media has video
NSArray *visualTracks = [asset tracksWithMediaCharacteristic:
    AVMediaCharacteristicVisual];
if ((!visualTracks) || ([visualTracks count] == 0)) {
    playerView.hidden = YES;
    noVideoLabel.hidden = NO;
}
33. AVPlayerLayer (no really)
✤ If you have video, create AVPlayerLayer from AVPlayer.
✤ Set bounds and video “gravity” (bounds-filling behavior)
else {
    playerView.hidden = NO;
    noVideoLabel.hidden = YES;
    AVPlayerLayer *playerLayer =
        [AVPlayerLayer playerLayerWithPlayer:player];
    [playerView.layer addSublayer:playerLayer];
    playerLayer.frame = playerView.layer.bounds;
    playerLayer.videoGravity = AVLayerVideoGravityResizeAspect;
}
36. HTTP Live Streaming
37. HTTP Live Streaming
✤ Audio / Video network streaming standard developed by Apple
✤ Replaces RTP/RTSP
✤ Built-in support in iOS (AV Foundation, Media Player) and Mac OS X 10.6 (QTKit)
✤ Required for apps that stream more than 10 MB over cellular network
38. How HTTP Live Streaming works
✤ Segmenting server splits source media into separate files (usually .m4a for audio-only, .ts for A/V), usually of about 10 seconds each, and creates an .m3u8 playlist file (see the sketch after this list)
✤ Playlist may point to bandwidth-appropriate playlists
✤ Clients download the playlist, fetch the segments, and queue them up
✤ Server updates playlist periodically with latest segments; clients refresh the playlist, fetch and queue new segments
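As a hedged illustration of the format (the file names, bandwidths, and durations here are invented), a variant playlist pointing to bandwidth-appropriate playlists, followed by a media playlist listing segments, look roughly like this:

#EXTM3U
# variant playlist: one entry per bandwidth-appropriate stream
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=240000
low/prog_index.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=640000
high/prog_index.m3u8

#EXTM3U
# media playlist: ~10-second segments, in order; for live streams the
# server keeps appending entries and clients keep re-fetching the playlist
#EXT-X-TARGETDURATION:10
#EXT-X-MEDIA-SEQUENCE:0
#EXTINF:10,
fileSequence0.ts
#EXTINF:10,
fileSequence1.ts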
40. HTTP Live Streaming wins
✤ Works with existing file servers and content delivery networks
✤ Port 80 is never blocked
✤ Adapts to changes in available bandwidth
✤ Can be encrypted
✤ Has been submitted as a proposed IETF standard
✤ http://tools.ietf.org/html/draft-pantos-http-live-streaming-04
41. HTTP Live Streaming fails
✤ Not really “live” when buffer can be a minute long
✤ Can’t watch a game on TV and listen to HLS web radio for the audio
✤ No meaningful adoption outside of the Apple world
✤ This may change before the next Stevenote. It’s an easy protocol to implement.
42. Back to AV Foundation…
43. Media Capture
✤ AV Foundation capture classes for audio / video capture, along with still image capture
✤ Programmatic control of white balance, autofocus, zoom, etc.
✤ Does not exist on the simulator. AV Foundation capture apps can only be compiled for and run on the device.
✤ API design is borrowed from QTKit on the Mac
45. Capture basics
✤ Create an AVCaptureSession to coordinate the capture (see the sketch after this list)
✤ Investigate available AVCaptureDevices
✤ Create AVCaptureDeviceInput and connect it to the session
✤ Optional: set up an AVCaptureVideoPreviewLayer
✤ Optional: connect AVCaptureOutputs
✤ Tell the session to start recording
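Putting the first step into code, a minimal sketch (the sessionPreset choice and the captureSession ivar name are assumptions of mine):

// create the session that coordinates the capture
captureSession = [[AVCaptureSession alloc] init];
captureSession.sessionPreset = AVCaptureSessionPresetMedium;
// inputs, outputs, and the preview layer are added on the next slides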
46. Getting capture device and input
AVCaptureDevice *videoDevice = [AVCaptureDevice
    defaultDeviceWithMediaType:AVMediaTypeVideo];
if (videoDevice) {
    NSLog (@"got videoDevice");
    AVCaptureDeviceInput *videoInput = [AVCaptureDeviceInput
        deviceInputWithDevice:videoDevice error:&setUpError];
    if (videoInput) {
        [captureSession addInput:videoInput];
    }
}
Note 1: You may also want to check for AVMediaTypeMuxed
Note 2: Do not assume devices based on model (c.f. iPad Camera Connection Kit)
47. Creating a video preview layer
AVCaptureVideoPreviewLayer *previewLayer =
    [AVCaptureVideoPreviewLayer layerWithSession:captureSession];
previewLayer.frame = captureView.layer.bounds;
previewLayer.videoGravity = AVLayerVideoGravityResizeAspect;
[captureView.layer addSublayer:previewLayer];
Keep in mind that the iPhone cameras have a portrait orientation
48. Setting an output
captureMovieOutput = [[AVCaptureMovieFileOutput alloc] init];
if (! captureMovieURL) {
    captureMoviePath = [getCaptureMoviePath() retain];
    captureMovieURL = [[NSURL alloc]
        initFileURLWithPath:captureMoviePath];
}
NSLog (@"recording to %@", captureMovieURL);
[captureSession addOutput:captureMovieOutput];
We’ll use the captureMovieURL later…
49. Start capturing
[captureSession startRunning];
recordButton.selected = YES;
if ([[NSFileManager defaultManager]
        fileExistsAtPath:captureMoviePath]) {
    [[NSFileManager defaultManager]
        removeItemAtPath:captureMoviePath error:nil];
}
// note: must have a delegate
[captureMovieOutput
    startRecordingToOutputFileURL:captureMovieURL
                recordingDelegate:self];
51. Demo
VTM_AVRecPlay
52. More fun with capture
✤ Can analyze video data coming off the camera with the AVCaptureVideoDataOutput class
✤ Can provide uncompressed frames to your AVCaptureVideoDataOutputSampleBufferDelegate
✤ The callback provides you with a CMSampleBufferRef (see the sketch below)
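A hedged sketch of the wiring, assuming the captureSession from the earlier slides and a controller class that adopts AVCaptureVideoDataOutputSampleBufferDelegate:

// add a data output that delivers uncompressed frames to a delegate
AVCaptureVideoDataOutput *dataOutput =
    [[AVCaptureVideoDataOutput alloc] init];
dispatch_queue_t frameQueue = dispatch_queue_create ("videoFrames", NULL);
[dataOutput setSampleBufferDelegate:self queue:frameQueue];
[captureSession addOutput:dataOutput];

// delegate callback, called once per captured frame
- (void) captureOutput: (AVCaptureOutput *) captureOutput
 didOutputSampleBuffer: (CMSampleBufferRef) sampleBuffer
        fromConnection: (AVCaptureConnection *) connection
{
    CVImageBufferRef pixelBuffer =
        CMSampleBufferGetImageBuffer (sampleBuffer);
    // analyze the frame's pixels here...
}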
54. Core Media
✤ C-based framework containing structures that represent media samples and media timing
✤ Opaque types: CMBlockBuffer, CMBufferQueue, CMFormatDescription, CMSampleBuffer, CMTime, CMTimeRange
✤ Handful of convenience functions to work with these
✤ Buffer types provide wrappers around possibly-fragmented memory; time types provide timing at arbitrary precision
56. Video Editing? On iPhone? Really?
1999: Power Mac G4 500 AGP          2010: iPhone 4
CPU: 500 MHz G4                     CPU: 800 MHz Apple A4
RAM: 256 MB                         RAM: 512 MB
Storage: 20 GB HDD                  Storage: 16 GB Flash
Comparison specs from everymac.com
57. AVComposition
✤ An AVAsset that gets its tracks from multiple file-based sources
✤ To create a movie, you typically use an AVMutableComposition
composition = [[AVMutableComposition alloc] init];
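A hedged sketch of the next step (the sourceAsset variable and the five-second range are mine, for illustration): inserting part of another asset into the mutable composition.

NSError *error = nil;
CMTimeRange firstFiveSeconds =
    CMTimeRangeMake (kCMTimeZero, CMTimeMakeWithSeconds (5.0, 600));
// copy the first five seconds of sourceAsset into the composition
[composition insertTimeRange:firstFiveSeconds
                     ofAsset:sourceAsset
                      atTime:kCMTimeZero
                       error:&error];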
59. CMTime
✤ CMTime contains a value and a timescale (similar to QuickTime)
✤ Time scale is how the time is measured: “nths of a second”
✤ Time in seconds = value / timescale
✤ Allows for exact timing of any kind of media
✤ Different tracks of an asset can and will have different timescales
✤ Convert with CMTimeConvertScale()
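For instance (real Core Media calls; the specific values are mine):

// 3.5 seconds expressed in 600ths of a second
CMTime time = CMTimeMake (2100, 600);
NSLog (@"seconds: %f", CMTimeGetSeconds (time)); // 3.500000
// the same instant re-expressed in a 44100 Hz audio timescale
CMTime audioTime = CMTimeConvertScale (time, 44100,
    kCMTimeRoundingMethod_Default);
// audioTime.value == 154350, audioTime.timescale == 44100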
61. Export
✤ Create an AVAssetExportSession
✤ Must set outputURL and outputFileType properties
✤ Inspect possible types with supportedFileTypes property (list of AVFileType… strings in docs)
✤ Begin export with exportAsynchronouslyWithCompletionHandler: (a minimal sketch follows this list)
✤ This takes a block, which will be called on completion, failure, cancellation, etc.
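Putting the steps together, a minimal sketch (the preset, exportURL, and file type here are illustrative assumptions):

AVAssetExportSession *exporter = [[AVAssetExportSession alloc]
    initWithAsset:composition
       presetName:AVAssetExportPresetMediumQuality];
exporter.outputURL = exportURL;   // an NSURL you've set up
exporter.outputFileType = AVFileTypeQuickTimeMovie;
[exporter exportAsynchronouslyWithCompletionHandler: ^{
    // runs on completion, failure, or cancellation
    if (exporter.status == AVAssetExportSessionStatusCompleted) {
        NSLog (@"export finished");
    }
}];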
64. Effects 1
✤ AVAudioMix, AVMutableAudioMix: set volumes or audio ramps at specific times (see the sketch after this list)
✤ AVVideoCompositionInstructions: provide a set of layer-based instructions for performing time-based opacity or affine transform ramps
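As a hedged example of the audio side (the track lookup and ramp values are mine): fading the first audio track from full volume to 25% over its first two seconds.

AVAssetTrack *audioTrack = [[composition
    tracksWithMediaType:AVMediaTypeAudio] objectAtIndex:0];
AVMutableAudioMixInputParameters *params =
    [AVMutableAudioMixInputParameters
        audioMixInputParametersWithTrack:audioTrack];
[params setVolumeRampFromStartVolume:1.0
                         toEndVolume:0.25
                           timeRange:CMTimeRangeMake (kCMTimeZero,
                               CMTimeMakeWithSeconds (2.0, 600))];
AVMutableAudioMix *audioMix = [AVMutableAudioMix audioMix];
audioMix.inputParameters = [NSArray arrayWithObject:params];
// hand audioMix to an AVPlayerItem or AVAssetExportSession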
65. Effects 2
✤ AVSynchronizedLayer: CALayer that synchronizes with an AVPlayerItem’s playback timing
✤ Use for overlays, titles, rendered images, Ken Burns effects, etc.
✤ Exporting any of these effects is tricky: have to create, configure, and set AVAudioMix and AVVideoComposition properties on the exporter
66. Sample-level access
✤ AVAssetReader and AVAssetWriter introduced in iOS 4.1
✤ Similar to capture: you add AVAssetReaderOutputs and AVAssetWriterInputs to the readers and writers, respectively
✤ Reads and writes use CMSampleBufferRef pointers
✤ You can re-encode media by setting up the AVAssetWriter with an NSDictionary of output settings
✤ Read through AVAudioSettings.h and AVVideoSettings.h to find the appropriate keys. Be prepared for lots of trial and error. (A minimal read-side sketch follows.)
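A minimal read-side sketch, assuming an asset with at least one audio track (an outputSettings of nil requests samples in their stored format):

NSError *error = nil;
AVAssetReader *reader =
    [AVAssetReader assetReaderWithAsset:asset error:&error];
AVAssetTrack *track =
    [[asset tracksWithMediaType:AVMediaTypeAudio] objectAtIndex:0];
AVAssetReaderTrackOutput *output = [AVAssetReaderTrackOutput
    assetReaderTrackOutputWithTrack:track outputSettings:nil];
[reader addOutput:output];
[reader startReading];
CMSampleBufferRef sampleBuffer;
while ((sampleBuffer = [output copyNextSampleBuffer]) != NULL) {
    // process or inspect the samples here...
    CFRelease (sampleBuffer);
}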
68. Interfaces to other iOS frameworks
✤ MPMediaItem now has an MPMediaItemPropertyAssetURL that allows you to open iPod Library songs as AVURLAssets (see the sketch after this list)
✤ ALAssetsLibrary URLs can also be opened with AV Foundation
✤ Core Media has functions to convert CMSampleBuffers to and from Core Audio AudioBufferLists
✤ Accelerate framework’s vDSP functions may be useful when processing samples from AVCaptureVideoDataOutput or AVAssetReader
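For example, a hedged sketch of the iPod library bridge (mediaItem is assumed to come from an MPMediaPickerController or MPMediaQuery):

// items without an asset URL (e.g., DRM'd tracks) return nil
NSURL *assetURL =
    [mediaItem valueForProperty:MPMediaItemPropertyAssetURL];
if (assetURL) {
    AVURLAsset *songAsset =
        [AVURLAsset URLAssetWithURL:assetURL options:nil];
    NSLog (@"opened %@", songAsset);
}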
69. You can’t help but speculate…
AVAssetExportSession.h:
extern NSString *const AVAssetExportPreset1280x720
__OSX_AVAILABLE_STARTING(__MAC_10_7,__IPHONE_4_0);
CMTime.h:
CM_EXPORT const CFStringRef kCMTimeScaleKey
__OSX_AVAILABLE_STARTING(__MAC_10_7,__IPHONE_4_0);
70. Q&A Time
✤ Cleaned-up code will be available on my blog:
✤ http://www.subfurther.com/blog
✤ invalidname [at] gmail.com
✤ @invalidname
✤ Check out the Rough Cut of my Core Audio book too.
✤ Get the WWDC 2010 sessions (there are 3 on AV Foundation!)