Analog Digital Video
Yoss Cohen
Analog Digital Video Course
Analog Digital Video
1.
Analog Digital Video
By: Yossi Cohen / DSP-IP Copyright © 2008 LOGTEL
2.
Course Content
• Introduction to Video
• Basic Concepts & Formats
• Introduction to Multimedia coding
• Lossy Compression
• Basic Video CODEC
• Standardization Landscape
• Components
• File Formats: AVI, MPEG4 FF, MKV
• Codecs: H264, VP6, WMV / VC-1, VP8
3.
Course Content
• Delivery methods
• RTP Streaming
• Progressive Download
• HTML5 Video
• HTTP Streaming
4.
Introduction to Video
5.
Agenda
• Basic Video Concepts
• Color Spaces
• Interlacing
• Video Connection (Component, S-Video)
• Image compression
• Introduction to video compression
6.
4.2 Color Models in Images
Color models and spaces are used to store, display, and print images.
RGB Color Model for CRT Displays
We expect to be able to use 8 bits per color channel for color that is accurate enough. In fact, we have to use about 12 bits per channel to avoid an aliasing effect in dark image areas: contour bands that result from gamma correction.
For images produced from computer graphics, we store integers proportional to intensity in the frame buffer, so we should have a gamma correction LUT between the frame buffer and the CRT.
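The gamma correction LUT mentioned above can be sketched as follows. This is a minimal illustration, not the course's own code: the function name `build_gamma_lut`, the display gamma of 2.2, and the 8-bit table size are all assumptions chosen for the example.

```python
def build_gamma_lut(gamma=2.2, levels=256):
    """Map linear frame-buffer codes to gamma-corrected display codes.

    gamma=2.2 is an assumed CRT gamma; real displays vary.
    """
    top = levels - 1
    return [round(top * (code / top) ** (1.0 / gamma)) for code in range(levels)]

lut = build_gamma_lut()

# Neighbouring dark codes land far apart after correction, which is
# where 8-bit contour bands tend to show up.
dark_step = lut[2] - lut[1]
bright_step = lut[255] - lut[254]
print(dark_step, bright_step)
```

The large step between the first few table entries is one way to see why more than 8 bits per channel may be needed in dark areas.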
7.
Color matching
How can we compare colors so that content creators and consumers know what they are seeing? There are many different ways, including the CIE chromaticity diagram.
8.
Video Color Transforms
Largely derived from older analog methods of coding color for TV. Luminance is separated from color information.
YIQ is used to transmit TV signals in North America and Japan. This coding also makes its way into VHS video tape coding in these countries, since video tape technologies also use YIQ.
In Europe, video tape uses the PAL or SECAM codings, which are based on TV that uses a matrix transform called YUV.
Finally, digital video mostly uses a matrix transform called YCbCr that is closely related to YUV.
9.
Color Models in Video
• Largely derived from older analog methods of coding color for TV. Luminance is separated from color information.
• A matrix transform called YIQ is used to transmit TV signals in North America and Japan (NTSC). This coding also makes its way into VHS video tape coding in these countries, since video tape technologies also use YIQ.
• In Europe, video tape uses the PAL or SECAM codings, which are based on TV that uses a matrix transform called YUV.
• Finally, digital video mostly uses a matrix transform called YCbCr that is closely related to YUV.
10.
YUV Separation
11.
YUV Color Model
• YUV codes a luminance signal (for gamma-corrected signals) equal to Y, the "luma".
• Chrominance (U and V) refers to the difference between a color and a reference white at the same luminance.
The transform is:
12.
RGB->YUV Color Transform
[Figure: R, G, B inputs transformed into Y, U, V outputs]
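The transform itself did not survive in the slide text. As a hedged sketch, the commonly quoted analog form uses the Rec. 601 luma weights and scaled color differences; the coefficients and the function name `rgb_to_yuv` below are assumptions of this example rather than values taken from the slide.

```python
def rgb_to_yuv(r, g, b):
    """Convert gamma-corrected R, G, B in [0, 1] to Y, U, V.

    Assumed coefficients: Rec. 601 luma weights, U = 0.492 (B - Y),
    V = 0.877 (R - Y).
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = 0.492 * (b - y)
    v = 0.877 * (r - y)
    return y, u, v

gray = rgb_to_yuv(0.5, 0.5, 0.5)   # chrominance vanishes for gray
red = rgb_to_yuv(1.0, 0.0, 0.0)    # strongly positive V, negative U
print(gray, red)
```

Because U and V are differences from luma, any gray pixel (R = G = B) yields zero chrominance, which is what lets a black-and-white receiver use Y alone.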
13.
YIQ Color Model
YIQ is used in NTSC color TV broadcasting. Again, gray pixels generate a zero (I, Q) chrominance signal.
I and Q are a rotated version of U and V. The transform is:
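The rotation relationship can be sketched as below. The 33 degree angle and the U, V scale factors are the commonly cited NTSC values and are assumptions here, since the slide's own equation is not present in the text; the function names are hypothetical.

```python
import math

def rgb_to_yuv(r, g, b):
    """Assumed analog YUV: Rec. 601 luma with scaled color differences."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    return y, 0.492 * (b - y), 0.877 * (r - y)

def yuv_to_yiq(y, u, v, angle_deg=33.0):
    """Rotate the (U, V) chroma plane by angle_deg to obtain (I, Q)."""
    a = math.radians(angle_deg)
    i = v * math.cos(a) - u * math.sin(a)
    q = v * math.sin(a) + u * math.cos(a)
    return y, i, q

y, u, v = rgb_to_yuv(1.0, 0.0, 0.0)       # pure red
_, i, q = yuv_to_yiq(y, u, v)
_, gray_i, gray_q = yuv_to_yiq(*rgb_to_yuv(0.5, 0.5, 0.5))
print(i, q, gray_i, gray_q)
```

Since the step is a pure rotation, the chroma magnitude is unchanged: I² + Q² equals U² + V².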
14.
YCbCr Color Model
1. The Rec. 601 standard for digital video uses another color space, YCbCr, which is closely related to the YUV transform.
2. The YCbCr transform is used in JPEG image compression and MPEG video compression.
For 8-bit coding:
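The 8-bit equations are missing from the slide text. A minimal sketch of the widely published Rec. 601 studio-range form is given below; the coefficients and the name `rgb_to_ycbcr_8bit` are assumptions of this example, and JPEG (JFIF) files normally use a full-range variant instead.

```python
def rgb_to_ycbcr_8bit(r, g, b):
    """Gamma-corrected R, G, B in [0, 1] to 8-bit studio-range YCbCr.

    Assumed Rec. 601 scaling: Y spans 16..235, Cb and Cr span 16..240
    with 128 meaning "no color".
    """
    y = 16 + 65.481 * r + 128.553 * g + 24.966 * b
    cb = 128 - 37.797 * r - 74.203 * g + 112.000 * b
    cr = 128 + 112.000 * r - 93.786 * g - 18.214 * b
    return round(y), round(cb), round(cr)

white = rgb_to_ycbcr_8bit(1.0, 1.0, 1.0)
black = rgb_to_ycbcr_8bit(0.0, 0.0, 0.0)
print(white, black)
```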
15.
VIDEO CONNECTION TYPES
• Component Video
• Composite Video
• S-Video
16.
Component Video
High-end solution: three separate video signals are used for the R, G, and B planes. Each color channel is sent as a separate video signal.
(a) Most computer systems use Component Video, with separate signals for R, G, and B.
(b) Provides the best color reproduction, since there is no "crosstalk" between the three channels.
(c) Component video requires more bandwidth than composite or S-Video, and good synchronization of the three components.
17.
Composite Video
• Color ("chrominance") and intensity ("luminance") signals are mixed into a single carrier wave.
a) Chrominance is a composition of two color components (I and Q, or U and V).
b) In NTSC TV, for example, I and Q are combined into a chroma signal, and a color subcarrier is then employed to put the chroma signal at the high-frequency end of the signal shared with the luminance signal.
c) The chrominance and luminance components can be separated at the receiver end, and then the two color components can be further recovered.
18.
Composite Video
d) When connecting to TVs or VCRs, Composite Video uses only one wire, and the video color signals are mixed, not sent separately. The audio and sync signals are additions to this one signal.
Since color and intensity are wrapped into the same signal, some interference between the luminance and chrominance signals is inevitable.
19.
S-Video
Uses two wires: one for luminance and another for a composite chrominance signal.
This gives less crosstalk between the color information and the gray-scale information.
In fact, humans are able to differentiate spatial resolution in grayscale images with a much higher acuity than for the color part of color images.
As a result, it makes sense to send less color detail, since we can only see fairly large blobs of color.
20.
VIDEO SCANNING
• Interlacing
• De-Interlacing
21.
Analog Video Scanning Process
An analog signal f(t) samples a time-varying image. So-called "progressive" scanning traces through a complete picture (a frame) row-wise for each time interval.
In TV, and in some monitors and multimedia standards as well, another system, called "interlaced" scanning, is used:
a) The odd-numbered lines are traced first, and then the even-numbered lines are traced. This results in "odd" and "even" fields; two fields make up one frame.
b) In fact, the odd lines (starting from 1) end up at the middle of a line at the end of the odd field, and the even scan starts at a half-way point.
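The split of a frame into odd and even fields described above can be sketched on a digitized frame as follows. The function names `split_fields` and `weave` are hypothetical, and the half-line offset of the analog scan is not modeled.

```python
def split_fields(frame):
    """Split a frame (a list of scan lines) into its odd and even fields.

    Lines are numbered from 1 as in the text, so the odd field holds
    list indices 0, 2, 4, ...
    """
    return frame[0::2], frame[1::2]

def weave(odd_field, even_field):
    """Interleave two fields back into a full frame."""
    frame = []
    for odd_line, even_line in zip(odd_field, even_field):
        frame.append(odd_line)
        frame.append(even_line)
    if len(odd_field) > len(even_field):   # odd number of lines
        frame.append(odd_field[-1])
    return frame

frame = ["line1", "line2", "line3", "line4", "line5"]
odd, even = split_fields(frame)
rebuilt = weave(odd, even)
print(odd, even, rebuilt)
```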
22.
[Figure: scan-line diagram. Q to R: horizontal trace; V to P: vertical trace.]
23.
Interlacing effects
• Because of interlacing, the odd and even lines are displaced in time from each other. This is generally not noticeable except when very fast action is taking place on screen, when blurring may occur.
• For example, in the video in Fig. 5.2, the moving helicopter is blurred more than the still background.
24.
Interlaced and De-interlaced Images
25.
de-Interlace
Since it is sometimes necessary to change the frame rate, resize, or even produce stills from an interlaced source video, various schemes are used to "de-interlace" it.
a) The simplest de-interlacing method consists of discarding one field and duplicating the scan lines of the other field. The information in one field is lost completely using this simple technique.
b) Other, more complicated methods that retain information from both fields are also possible.
Analog video uses a small voltage offset from zero to indicate "black", and another value, such as zero, to indicate the start of a line. For example, we could use a "blacker-than-black" zero signal to indicate the beginning of a line.
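Method (a) can be sketched as follows, assuming the frame is a list of scan lines. The function name and the `keep` parameter are hypothetical.

```python
def deinterlace_line_double(frame, keep="odd"):
    """Discard one field and duplicate each scan line of the other.

    keep="odd" keeps lines 1, 3, 5, ... (1-based); keep="even" keeps
    lines 2, 4, 6, ...
    """
    start = 0 if keep == "odd" else 1
    output = []
    for line in frame[start::2]:
        output.append(list(line))
        output.append(list(line))   # duplicated scan line
    return output[:len(frame)]

frame = [[1, 1], [2, 2], [3, 3], [4, 4]]
from_odd = deinterlace_line_double(frame, keep="odd")
from_even = deinterlace_line_double(frame, keep="even")
print(from_odd, from_even)
```

The output has the original height, but every second line is a copy, so the vertical detail carried by the discarded field is gone.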
26.
NTSC Video
The NTSC (National Television System Committee) TV standard is mostly used in North America and Japan. It uses the familiar 4:3 aspect ratio (i.e., the ratio of picture width to height) and 525 scan lines per frame at 30 frames per second (fps), or more precisely 29.97 fps.
a) NTSC follows the interlaced scanning system, and each frame is divided into two fields, with 262.5 lines/field.
b) Thus the horizontal sweep frequency is 525 × 29.97 ≈ 15,734 lines/sec, so that each line is swept out in about 63.6 µs.
c) Since the horizontal retrace takes 10.9 µs, this leaves 52.7 µs for the active line signal, during which image data is displayed (see Fig. 5.3).
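The line-timing figures above follow from simple arithmetic, reproduced here as a quick check. The variable names are only illustrative.

```python
LINES_PER_FRAME = 525
FRAME_RATE_FPS = 29.97
HORIZONTAL_RETRACE_US = 10.9

lines_per_field = LINES_PER_FRAME / 2                 # two fields per frame
line_rate = LINES_PER_FRAME * FRAME_RATE_FPS          # lines per second
line_time_us = 1e6 / line_rate                        # duration of one line
active_time_us = line_time_us - HORIZONTAL_RETRACE_US # visible part of a line

print(lines_per_field, round(line_rate), round(line_time_us, 1), round(active_time_us, 1))
```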
27.
NTSC
NTSC video is an analog signal with no fixed horizontal resolution. Therefore one must decide how many times to sample the signal for display: each sample corresponds to one pixel output.
A "pixel clock" is used to divide each horizontal line of video into samples. The higher the frequency of the pixel clock, the more samples per line there are.
Different video formats provide different numbers of samples per line, as listed in Table 5.1.
28.
NTSC
29.
NTSC Color Modulation
NTSC uses the YIQ color model, and the technique of quadrature modulation is employed to combine (the spectrally overlapped part of) the I (in-phase) and Q (quadrature) signals into a single chroma signal C:
$$C = I\cos(F_{sc}t) + Q\sin(F_{sc}t) \qquad (5.1)$$
This modulated chroma signal is also known as the color subcarrier, whose magnitude is $\sqrt{I^2 + Q^2}$ and whose phase is $\arctan(Q/I)$. The frequency of C is $F_{sc} \approx 3.58$ MHz.
The NTSC composite signal is a further composition of the luminance signal Y and the chroma signal, as defined below:
$$\text{composite} = Y + C = Y + I\cos(F_{sc}t) + Q\sin(F_{sc}t) \qquad (5.2)$$
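A small numerical sketch of quadrature modulation is given below: it builds C from I and Q and then recovers both by multiplying with each carrier and averaging over one subcarrier period. Treating F_sc as a frequency in Hz (so the carrier angle is 2π·F_sc·t), the sample count, and the function names are assumptions of this example; a real decoder would use low-pass filters rather than an exact one-period average.

```python
import math

F_SC = 3.58e6  # color subcarrier frequency in Hz (approximate)

def chroma(i, q, t):
    """Quadrature-modulated chroma: C = I cos(wt) + Q sin(wt)."""
    w = 2 * math.pi * F_SC * t
    return i * math.cos(w) + q * math.sin(w)

def demodulate(i, q, samples=1000):
    """Recover I and Q by synchronous demodulation over one period."""
    period = 1.0 / F_SC
    i_sum = q_sum = 0.0
    for n in range(samples):
        t = n * period / samples
        w = 2 * math.pi * F_SC * t
        c = chroma(i, q, t)
        i_sum += 2 * c * math.cos(w)   # in-phase carrier picks out I
        q_sum += 2 * c * math.sin(w)   # quadrature carrier picks out Q
    return i_sum / samples, q_sum / samples

i_in, q_in = 0.3, -0.2
magnitude = math.hypot(i_in, q_in)     # sqrt(I^2 + Q^2)
phase = math.atan2(q_in, i_in)         # arctan(Q / I), quadrant-aware
i_out, q_out = demodulate(i_in, q_in)
print(magnitude, phase, i_out, q_out)
```

The two carriers are orthogonal over a full period, which is why I and Q can share one subcarrier and still be separated.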
30.
PAL
PAL (Phase Alternating Line) is a TV standard widely used in Western Europe, China, India, and many other parts of the world.
PAL uses 625 scan lines per frame, at 25 frames/second, with a 4:3 aspect ratio and interlaced fields.
(a) PAL uses the YUV color model. It uses an 8 MHz channel and allocates a bandwidth of 5.5 MHz to Y, and 1.8 MHz each to U and V. The color subcarrier frequency is f_sc ≈ 4.43 MHz.
(b) In order to improve picture quality, chroma signals have alternate signs (e.g., +U and -U) in successive scan lines, hence the name "Phase Alternating Line".
31.
PAL
(c) This facilitates the use of a (line rate) comb filter at the receiver: the signals in consecutive lines are averaged so as to cancel the chroma signals (which always carry opposite signs), separating Y and C and obtaining high-quality Y signals.
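The averaging idea behind the comb filter can be sketched as follows. This is an idealized model that assumes luma and chroma magnitude are identical on the two consecutive lines; real pictures only approximate that, and the function name is hypothetical.

```python
def comb_separate(line_a, line_b):
    """Separate luma and chroma from two consecutive scan lines.

    Assumes line_a = Y + C and line_b = Y - C sample by sample.
    """
    luma = [(a + b) / 2 for a, b in zip(line_a, line_b)]     # chroma cancels
    chroma = [(a - b) / 2 for a, b in zip(line_a, line_b)]   # luma cancels
    return luma, chroma

y = [0.5, 0.6, 0.7]
c = [0.1, -0.2, 0.05]
line_a = [yv + cv for yv, cv in zip(y, c)]
line_b = [yv - cv for yv, cv in zip(y, c)]
luma, chroma = comb_separate(line_a, line_b)
print(luma, chroma)
```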
32.
Video Worlds
Intro to Media Coding: Image and Video, Speech, Audio
33.
Compression
• Compression: representing information with fewer bits than the original information.
• Lossless Compression: the decompressed information is identical to the original information. Examples: LZ and similar techniques.
• Lossy Compression: the decompressed information is not the same as the original information. Examples: MP3, JPEG, etc.
• Lossy compression is often model-based compression.
34.
Compression terms
• Encoder: the module that compresses the information
• Decoder: the module that decompresses the information
• CODEC: (en)COder / DECoder
• Channel: the medium through which the information is passed, for example an ADSL line or a disk
[Figure: Encoder → Channel / Disk → Decoder]
35.
Model Based Compression
[Figure: block diagram. Stages: Pre Processing, Model Based Transform, Quantize / Prioritize, Reorder, Lossless Compression (Entropy Coding); plus a Bit rate control block.]
36.
Human Visual System
The human eye has two basic light receptors:
• Rods: light intensity receptors
• Cones: colored light receptors
37.
The Human Eye
• Rods concentration >> cones concentration
• Green discrimination << red, blue discrimination
• Low frequency > high frequency
38.
Image Coding Model Based transformations
• RGB (3 equally quantized colors) -> YUV (light intensity + two color channels)
• Pixel based domain -> Frequency domain
39.
Speech coding
In speech coding, the vocal tract is used as a model:
40.
Audio / Music Coding
In general audio coding, the ear is used as a model:
• Frequencies -> Frequency bands
• Masking and temporal masking are used
41.
Basic Image and Video coding
• Definitions
• Where to lose information: color & frequency
42.
What is a digital image?
• Audio PCM: one 1-D array of samples
• BMP image: three 2-D arrays of numbers representing Red, Green and Blue values
43.
Image Compression? Why?
• Image size = 720*580
• 3 image layers (RGB) = 720*580*3
• 8 bits per sample: 720*580*3*8 = 10,022,400 bits
Lots of bits for one Lena
44.
IMAGE COMPRESSION
45.
Color based decimation
Our eyes have better resolution and scaling for luminance than for color.
Compress color by using the 4:2:0 method.
46.
Counting the bits
How much can we save by color compression?
• 3 × image size in the RGB 24-bit color representation.
• (1 + 2 × 1/4) × image size in the 4:2:0 YUV representation.
• The compression ratio is 2!
• The actual saving is bigger, due to different Y and UV quantization.
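The sample counting above can be checked with a short sketch. The 2x2 averaging used for the chroma planes is one common way to produce 4:2:0 and is an assumption of this example, as is the function name `subsample_420`; the 720x580 size is the one used earlier in the deck.

```python
def subsample_420(plane):
    """Halve a chroma plane in both directions by averaging 2x2 blocks.

    Even width and height are assumed.
    """
    height, width = len(plane), len(plane[0])
    return [
        [
            (plane[y][x] + plane[y][x + 1] + plane[y + 1][x] + plane[y + 1][x + 1]) / 4
            for x in range(0, width, 2)
        ]
        for y in range(0, height, 2)
    ]

width, height = 720, 580
rgb_samples = 3 * width * height                                   # R, G and B at full size
yuv420_samples = width * height + 2 * (width // 2) * (height // 2) # Y full, U and V quartered
ratio = rgb_samples / yuv420_samples
rgb_bits = rgb_samples * 8

small_chroma = subsample_420([[1, 3], [5, 7]])
print(rgb_bits, yuv420_samples, ratio, small_chroma)
```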
47.
Linear Transform
If the signal is formatted as a vector, a linear transform can be formulated as a matrix-vector product that transforms the signal into a different domain.
Examples:
• K-L Transform
• Discrete Fourier Transform
• Discrete cosine transform
• Discrete wavelet transform
Energy compaction property: the transformed signal vector has a few large coefficients and many small, nearly zero coefficients. These few large coefficients can be encoded efficiently with few bits while retaining the majority of the energy of the original signal.
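Energy compaction can be illustrated with a 1-D DCT applied to a smooth signal. The orthonormal DCT-II scaling and the sample values below are assumptions chosen for this sketch.

```python
import math

def dct_1d(x):
    """Orthonormal 1-D DCT-II of a list of samples."""
    n = len(x)
    coeffs = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        total = sum(x[i] * math.cos((2 * i + 1) * k * math.pi / (2 * n)) for i in range(n))
        coeffs.append(scale * total)
    return coeffs

signal = [100, 102, 104, 107, 110, 112, 115, 117]   # slowly varying, like a smooth image row
coeffs = dct_1d(signal)

signal_energy = sum(v * v for v in signal)
coeff_energy = sum(c * c for c in coeffs)
top2_energy = sum(sorted((c * c for c in coeffs), reverse=True)[:2])
top2_share = top2_energy / coeff_energy
print([round(c, 2) for c in coeffs], top2_share)
```

With this scaling the transform preserves total energy, so the share held by the two largest coefficients shows directly how much of the signal a coder keeps by encoding only those.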
48.
Block-based Image Coding
Block-based image coding scheme:
• Partitions the entire image into 8 by 8 or 16 by 16 (or other size) blocks.
• The coding algorithm is applied to individual blocks independently.
Advantages:
• Parallel processing can be applied to process individual blocks in parallel.
• Redundant information is in close proximity (like a cache).
49.
Transform - DCT
The DCT transforms the data from pixel intensity to frequency intensity. Low frequencies are important; high frequencies are less so.
$$
f(m,n) =
\begin{cases}
\dfrac{1}{4}\displaystyle\sum_{u=0}^{7}\sum_{v=0}^{7} F(u,v)\cos\dfrac{(2m+1)u\pi}{16}\cos\dfrac{(2n+1)v\pi}{16}, & m = n = 0;\\[2ex]
\dfrac{1}{8}\displaystyle\sum_{u=0}^{7}\sum_{v=0}^{7} F(u,v)\cos\dfrac{(2m+1)u\pi}{16}\cos\dfrac{(2n+1)v\pi}{16}, & 0 \le m, n \le 7;\ m + n > 0.
\end{cases}
$$
(You'll get lunch even if you don't remember the IDCT formula above.)
50.
DCT Coefficients Quantization
51.
AC Coefficients
AC coefficients are first weighted with a quantization matrix:
C(i,j)/q(i,j) = Cq(i,j)
Then they are quantized.
Then they are scanned in a zig-zag order into a 1-D sequence, to be subject to AC Huffman encoding.
Zig-zag scan order:
 1  2  6  7 15 16 28 29
 3  5  8 14 17 27 30 43
 4  9 13 18 26 31 42 44
10 12 19 25 32 41 45 54
11 20 24 33 40 46 53 55
21 23 34 39 47 52 56 61
22 35 38 48 51 57 60 62
36 37 49 50 58 59 63 64
Question: given an 8 by 8 array, how do we convert it into a vector according to the zig-zag scan order? What is the algorithm?
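One possible answer to the question above is to walk the anti-diagonals of the block, alternating direction on each one. The sketch below also includes the element-wise division by the quantization matrix; all function names are hypothetical, and rounding to the nearest integer is an assumed quantization rule.

```python
def zigzag_indices(n=8):
    """Return (row, column) pairs of an n x n block in zig-zag order."""
    order = []
    for s in range(2 * n - 1):                       # s = row + column of a diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)                    # even diagonals run bottom-left to top-right
        order.extend((i, s - i) for i in rows)
    return order

def zigzag_scan(block):
    """Flatten a square block into a 1-D list in zig-zag order."""
    return [block[i][j] for i, j in zigzag_indices(len(block))]

def quantize(block, q):
    """Cq(i,j) = C(i,j) / q(i,j), rounded to the nearest integer."""
    return [[round(c / qv) for c, qv in zip(brow, qrow)] for brow, qrow in zip(block, q)]

# Rebuild a table of scan positions to compare with the one on the slide.
scan_position = [[0] * 8 for _ in range(8)]
for position, (i, j) in enumerate(zigzag_indices(8), start=1):
    scan_position[i][j] = position

for row in scan_position:
    print(row)
```

Scanning this way puts the low-frequency coefficients first, so the long run of zeros left by quantization ends up at the tail of the sequence.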
52.
DCT Basis Functions
53.
DCT compression Example
Original Image
54.
DCT 1 coefficient
55.
DCT 6 coefficients
56.
DCT 20 coefficients
57.
JPEG Image Coding Algorithms
[Figure: JPEG Encoding Process. 8x8 block → DCT → Q (with quantization matrix); DC coefficient → DPCM → DC Huffman; AC coefficients → Zig-Zag Scan → AC Huffman (with code books).]
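The DPCM step applied to the DC coefficients in the diagram can be sketched as follows: each block's DC value is coded as its difference from the previous block's DC value. Starting the predictor at 0 and the function names are assumptions of this example.

```python
def dpcm_encode(dc_values):
    """Replace each DC value by its difference from the previous one."""
    previous = 0
    differences = []
    for dc in dc_values:
        differences.append(dc - previous)
        previous = dc
    return differences

def dpcm_decode(differences):
    """Rebuild the DC values with a running sum."""
    previous = 0
    values = []
    for diff in differences:
        previous += diff
        values.append(previous)
    return values

dc_per_block = [52, 55, 54, 60]        # quantized DC terms of neighbouring blocks
encoded = dpcm_encode(dc_per_block)
decoded = dpcm_decode(encoded)
print(encoded, decoded)
```

Neighbouring blocks usually have similar average brightness, so the differences are small numbers that the Huffman stage can code with short codewords.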
58.
Generalization of JPEG Coding
[Figure: JPEG Encoding Process, generalized. Transform (Color, Frequency) → Quantize → Reorder → Entropy Coding.]
59.
Video Coding Basics
60.
Video Coding
Video coding is often implemented as encoding a sequence of images. Motion compensation is used to exploit temporal redundancy between successive frames.
Examples: MPEG-1, MPEG-2, MPEG-4, H.263, H.263+, H.264
Existing video coding standards are based on JPEG image compression as well as motion compensation.
61.
Video Coding Standardization Scope
Only restrictions on the bitstream, syntax, and decoder are standardized:
• Permits the optimization of encoding
• Permits complexity reduction
• Provides no guarantees on quality
62.
Video Encoding
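[Figure: encoder block diagram. The current frame x(t) minus the predicted frame gives the residue r, which passes through DCT → Q → VLC → Buffer to the bit stream, with buffer control feeding back to Q. A decoding loop (Q⁻¹ → IDCT) reconstructs the residue, adds the prediction, and stores the result in the Frame Buffer, which feeds Motion Estimation & Compensation; motion vectors are also output.]
This is a simplified block diagram where the encoding of intra coded frames is not shown.
• xp(t): predicted frame
• r̂(t): reconstructed residue
• x̂(t): reconstructed current frame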
63.
Video Encoding
[Figure: generalized encoder block diagram. Color Transform → Frequency Transform → Q → Reorder → Entropy, with buffer control. A decoding loop (Q⁻¹ → Tf⁻¹) reconstructs the residue, adds the prediction, and stores the result in the Frame Buffer, which feeds Motion Estimation & Compensation; motion vectors are also output.]
This is a simplified block diagram where the encoding of intra coded frames is not shown.
• xp(t): predicted frame
• r̂(t): reconstructed residue
• x̂(t): reconstructed current frame
64.
Forward Motion Estimation
[Figure: a reference frame and a current frame, each divided into 16 numbered blocks. The current frame is constructed from different parts of the reference frame.]
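Finding which part of the reference frame best matches a block of the current frame is commonly done by block matching. The sketch below uses an exhaustive search with the sum of absolute differences (SAD) as the cost; the block size, search range, cost function, and function names are assumptions of this example and not prescribed by the slides.

```python
def sad(ref, cur, ry, rx, cy, cx, size):
    """Sum of absolute differences between a reference block and a current block."""
    return sum(
        abs(ref[ry + dy][rx + dx] - cur[cy + dy][cx + dx])
        for dy in range(size)
        for dx in range(size)
    )

def best_motion_vector(ref, cur, cy, cx, size=4, search=2):
    """Full search: try every offset within +/- search pixels, keep the lowest SAD."""
    height, width = len(ref), len(ref[0])
    best_vector, best_cost = None, None
    for my in range(-search, search + 1):
        for mx in range(-search, search + 1):
            ry, rx = cy + my, cx + mx
            if ry < 0 or rx < 0 or ry + size > height or rx + size > width:
                continue  # candidate block falls outside the reference frame
            cost = sad(ref, cur, ry, rx, cy, cx, size)
            if best_cost is None or cost < best_cost:
                best_vector, best_cost = (my, mx), cost
    return best_vector, best_cost

# Toy frames: the current frame is the reference shifted right by one pixel.
reference = [[y * 8 + x for x in range(8)] for y in range(8)]
current = [[row[max(x - 1, 0)] for x in range(8)] for row in reference]

vector, cost = best_motion_vector(reference, current, cy=2, cx=2)
print(vector, cost)
```

The vector is expressed as (rows, columns) from the current block to its match in the reference frame, so content that moved right by one pixel is found one column to the left.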
65.
Video sequence: Tennis, frames 0 and 1
[Figure: previous frame and current frame]
66.
Frame Difference
[Figure: frame difference, frames 0 and 1]
67.
What is motion estimation?
[Figure: motion vector field of frame 1]
68.
What is motion compensation?
[Figure: motion compensated frame]
69.
Motion Compensated Frame Difference
[Figure: motion compensated frame difference (frames 0 and 1), shown next to the plain frame difference (frames 0 and 1)]
70.
Video Worlds
Video Structures
71.
Frame Types
Three types of frames:
• Intra (I): the frame is coded as if it were an image
• Predicted (P): predicted from an I or P frame
• Bi-directional (B): forward and backward predicted from a pair of I or P frames
A typical frame arrangement is:
I1 B1 B2 P1 B3 B4 P2 B5 B6 I2
P1 and P2 are both forward-predicted from I1. B1 and B2 are interpolated from I1 and P1; B3 and B4 are interpolated from P1 and P2; B5 and B6 are interpolated from P2 and I2.
Newer coding standards added other frame types: SP, SI, D.
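Because a B frame depends on a later I or P frame, an encoder typically transmits frames in a different order from the one in which they are displayed. The sketch below reorders the arrangement from the text; the function name and the simple "hold B frames until the next anchor" rule are assumptions of this example.

```python
def coding_order(display_order):
    """Reorder frames so each B frame follows both of its reference frames."""
    ordered, pending_b = [], []
    for frame in display_order:
        if frame.startswith("B"):
            pending_b.append(frame)      # wait for the next I or P frame
        else:
            ordered.append(frame)        # anchor frame (I or P) goes first
            ordered.extend(pending_b)    # then the B frames that needed it
            pending_b = []
    ordered.extend(pending_b)            # trailing B frames, if any
    return ordered

display = ["I1", "B1", "B2", "P1", "B3", "B4", "P2", "B5", "B6", "I2"]
transmitted = coding_order(display)
print(transmitted)
```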
72.
Macro-blocks and Blocks
[Figure: a 16x16x3 RGB macroblock converted to a Y block (16x16), a Cb block (8x8), and a Cr block (8x8)]
73.
VIDEO CODING STANDARDS
74.
Chronological evolution of Video Coding Standards
• ITU-T VCEG: H.261 (1990), H.263 (1995/96), H.263+ (1997/98), H.263++ (2000)
• Joint ITU-T and ISO/IEC: MPEG-2 (H.262) (1994/95), H.264 (MPEG-4 Part 10) (2002)
• ISO/IEC MPEG: MPEG-1 (1993), MPEG-4 v1 (1998/99), MPEG-4 v2 (1999/00), MPEG-4 v3 (2001)
[Figure: timeline from 1990 to 2003]
75.
ITU Standards
H.261
• Early standard
• Compressed data rate: n*64 Kbps (it was created for ISDN connections; remember, it's an ITU standard)
• Resolution: QCIF 176x144, CIF 352x288
H.263
• Supports a wider range of bit-rates, from below 64 Kbps and up
• Error recovery and performance improvements over H.261
• Resolution: SQCIF, QCIF, CIF, 4CIF 704x576, 16CIF 1408x1152
www.dsp-ip.com
76.
ITU Standards
H.264
- Improves on H.263
- Arithmetic coding
- Dynamic block size (not only 8x8)
- (Much) better results than MPEG-4 Part 2
- Tradeoff: computational overhead
77.
ITU Standards
ITU standard evolution over the years: H.261, H.262 (MPEG-2), H.263, H.264. What's next?
78.
ISO MPEG Standards
- MPEG-1: CD compression (X1)
- MPEG-2: Television broadcast quality
- MPEG-4: Multimedia and systems standard
- MPEG-7: Meta-data description
- MPEG-21: Standard for the creation, distribution and consumption of multimedia (mainly DRM, IPMP)
79.
Data virtualization in ISO standards
The evolution of ISO standards from pixel description to object description, manipulation and rights:
- Image coding: MPEG-1/2
- Object coding: MPEG-4
- Object descriptors: MPEG-7
- Object rights: MPEG-21
80.
MPEG-1
A standard for storage and retrieval of audio and video (1992)
- Up to 1.5 Mbps
- P-frames (predictive-coded frames): require information from previous I or P frames
- B-frames (bi-directionally predictive-coded frames): require previous and following frames
- D-frames (DC-coded frames): consist of the lowest frequency of an image; used for fast forward and fast reverse modes
81.
MPEG-2
A standard for high-quality video and digital television (1994)
- 2-100 Mbps
- Coding similar to MPEG-1
- Several profiles and levels for different resolutions and qualities
- Enhanced audio (multiple channels)
82.
MPEG-4
Designed for multimedia (v1, Oct. 1998)
- Coding of both natural and synthetic audio-visual data
- Improved efficiency (object based)
- Error robustness
- Many more multimedia features
83.
Why ISO adopted ITU technology
[Figure: comparison of compression formats at CIF, 30 Hz. Quality (Y-PSNR in dB, 25 to 38) versus bit-rate (0 to 3500 kbit/s) for JVT/H.26L, MPEG-4, MPEG-2 and H.263]
84.
MPEG-2 STANDARD
85.
MPEG History
- The Moving Picture Experts Group was founded in January 1988 by Leonardo Chiariglione together with around 15 experts in compression technology
- Creator of numerous standards such as MPEG-1, MPEG-2, MPEG-4, MPEG-7 and MPEG-21
- The group has not limited its scope to "pictures" only: sound was not forgotten (e.g. MPEG-1 Layer 3)
- The industry quickly adopted the MPEG standards (Philips, Samsung, Intel, Sony etc.)
- MPEG has given birth to a number of technologies we now take for granted: DVD and Digital TV (MPEG-2), MP3 (MPEG-1 Layer 3)
86.
MPEG-2
- In 1994, MPEG published ISO/IEC 13818, also known as MPEG-2
- MPEG-2 was the standard adopted by DVD and Digital TV
- MPEG-2 is designed for video compression between 1.5 and 15 Mbps for SD
- MPEG-2 streams come in 2 forms: Program Stream and Transport Stream
87.
The MPEG Standard
88.
MPEG2- Systems
Defines the storage, transport and control of MPEG-2 streams
89.
Model for MPEG-2 Systems
90.
MPEG-2 Program Stream
- Similar to the MPEG-1 Systems multiplex
- Combines one or more Packetized Elementary Streams (PES), which have a common time base, into a single stream
- Designed for use in relatively error-free environments and suitable for applications which may involve software processing
- Program stream packets may be of variable and relatively great length
- Variable length / error free: what's the connection?
91.
MPEG-2 Transport Stream
- Combines one or more Packetized Elementary Streams (PES) with one or more independent time bases into a single stream (sometimes called a multiplex)
- Elementary streams sharing a common time base form a program
- Designed for use in environments where errors are likely, such as storage or transmission in lossy or noisy media
- The transport stream is made of packets with a fixed length of 188 bytes. Why?
- What is the header overhead in a 188-byte packet?
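To make the closing question concrete: every transport stream packet starts with a fixed 4-byte header (sync byte 0x47, flags, a 13-bit PID, and a 4-bit continuity counter), so the minimum overhead is 4 of 188 bytes, roughly 2.1%, and more when an adaptation field is present. The sketch below decodes that fixed header; the function and dictionary key names are illustrative choices, not taken from the slides.

```python
TS_PACKET_SIZE = 188
TS_HEADER_SIZE = 4
SYNC_BYTE = 0x47


def parse_ts_header(packet: bytes) -> dict:
    """Decode the fixed 4-byte header of one transport stream packet."""
    if len(packet) != TS_PACKET_SIZE:
        raise ValueError("a transport stream packet is exactly 188 bytes")
    if packet[0] != SYNC_BYTE:
        raise ValueError("lost sync: first byte is not 0x47")
    return {
        "transport_error": bool(packet[1] & 0x80),
        "payload_unit_start": bool(packet[1] & 0x40),
        "pid": ((packet[1] & 0x1F) << 8) | packet[2],  # 13-bit packet identifier
        "scrambling": (packet[3] >> 6) & 0x03,
        "adaptation_field": (packet[3] >> 4) & 0x03,
        "continuity_counter": packet[3] & 0x0F,
    }


def header_overhead() -> float:
    """Fraction of each packet spent on the fixed header (no adaptation field)."""
    return TS_HEADER_SIZE / TS_PACKET_SIZE
```

The short fixed length and the recurring sync byte are what let a receiver resynchronise quickly after an error, which fits the noisy environments the slide mentions.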
92.
MPEG2 AAC
93.
MPEG2 Audio (AAC)
94.
MPEG-2 Audio
Backwards compatible; defines extensions:
- Multichannel coding: 5-channel audio (L, R, C, LS, RS)
- Multilingual coding: 7 multilingual channels
- Lower sampling frequencies (LSF)
- Optional Low Frequency Enhancement (LFE): bass
95.
Media Delivery Components
- File format / container
- Codec
- Delivery protocols
96.
File Formats
[Figure: file layout. A moov box (movie meta-data) holds the video and audio trak boxes; an mdat box (media data) holds the samples (frames)]
97.
Agenda
- Intro to file formats
- Second generation formats: RIFF (AVI, WAV)
- Third generation containers: MPEG4 FF, MKV
98.
File Format Segmentation
File formats by generation:
- 1st generation: raw / proprietary
- 2nd generation: media muxer
- 3rd generation: object based / XML based
99.
2ND GENERATION FILE FORMATS
100.
2nd Generation File Features
- Multiple media tracks in the same file
- Identification of the codec, usually by FourCC
- Interleaving
101.
2nd Generation File Formats
- RIFF: WAV, AVI
- ASF: WMA, WMV
- MPEG2: MP2PS, MP2TS, VOB
- FLV
102.
AVI FILE FORMAT
103.
AVI Overview
- AVI files use the AVI RIFF format (like WAV)
- Introduced by Microsoft in 1992
- The file is divided into streams (audio, video, subtitles) and blocks ("chunks")
104.
Blocks / Chunks
- A chunk is the logical unit of a RIFF file
- Chunks are identified by four letters (FOUR-CC)
- A RIFF file has two mandatory sub-chunks and one optional sub-chunk
- Mandatory chunks: hdrl (file header), movi (media data)
- Optional chunk: idx1 (index)
Layout (this order is fixed):
RIFF ('AVI '
      LIST ('hdrl'
            'avih'(<Main AVI Header>)
            LIST ('strl' ... )
            . . .
           )
      LIST ('movi' . . . )
      ['idx1'<AVI Index>]
     )
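The chunk layout can be walked with a few lines of code. This is a minimal sketch, assuming the whole file is already in memory as `bytes`; it relies only on the RIFF rules that each chunk is a four-character code followed by a little-endian 32-bit size, that RIFF and LIST chunks carry a four-character type and nested chunks, and that chunks are padded to an even length. The function name is hypothetical.

```python
import struct


def iter_chunks(data: bytes, start: int = 0, end: int = None, depth: int = 0):
    """Yield (depth, fourcc, list_type, payload_offset, size) for each chunk."""
    end = len(data) if end is None else end
    pos = start
    while pos + 8 <= end:
        fourcc = data[pos:pos + 4].decode("ascii")
        (size,) = struct.unpack_from("<I", data, pos + 4)  # little-endian DWORD
        payload = pos + 8
        if fourcc in ("RIFF", "LIST"):
            # Container chunk: 4-byte form/list type, then nested chunks.
            list_type = data[payload:payload + 4].decode("ascii")
            yield depth, fourcc, list_type, payload, size
            yield from iter_chunks(data, payload + 4, payload + size, depth + 1)
        else:
            yield depth, fourcc, None, payload, size
        pos = payload + size + (size & 1)                  # chunks are word-aligned
```

Printing `fourcc` and `list_type` indented by `depth` reproduces the nested layout shown on the slide.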
105.
AVI main header
- RIFF 'AVI ': identifies the file as a RIFF file
- LIST 'hdrl': identifies a chunk containing sub-chunks that define the format of the data
- 'avih': identifies a chunk containing general information about the file. Includes:
  - dwMicrosecPerFrame: time between frames
  - dwMaxBytesPerSec: number of bytes per second the player should handle
  - dwReserved1: reserved
  - dwFlags: contains any flags for the file
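As an illustration, the four fields listed above can be read from the start of the 'avih' payload as consecutive little-endian 32-bit values. This sketch decodes only those four fields, in the order the slide lists them; the function name and the derived `frames_per_second` entry are additions for illustration.

```python
import struct


def parse_avih_prefix(payload: bytes) -> dict:
    """Decode the first four DWORD fields of the 'avih' chunk payload."""
    usec_per_frame, max_bytes_per_sec, reserved1, flags = struct.unpack_from("<4I", payload, 0)
    return {
        "dwMicrosecPerFrame": usec_per_frame,
        "dwMaxBytesPerSec": max_bytes_per_sec,
        "dwReserved1": reserved1,
        "dwFlags": flags,
        # Derived value, not stored in the file:
        "frames_per_second": 1_000_000 / usec_per_frame if usec_per_frame else None,
    }
```

For example, 40000 microseconds between frames corresponds to 25 frames per second.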
106.
Example - headers
[Figure: annotated hex dump of an AVI file. Labelled fields: chunk ID, chunk size, format, AVI file header chunk ID, time between frames, data rate, flags, total number of frames, initial frame, streams, frame width (320), frame height, reserved, stream header, size of padding, junk chunk identifier]
107.
Example – data chunks
[Figure: hex dump showing a video data chunk (stream 00) and an audio data chunk (stream 01)]
108.
AVI Summary
Advantages
- Includes both audio and video
- Indexable
Disadvantages
- Not suited for progressive download
- Very rigid format
- Insufficient support for seeking, metadata and multi-reference frames
109.
3RD GENERATION FILE FORMATS
110.
Why “Fix it”?
2nd generation formats are missing:
- Metadata separate from media: info on angle, language, synchronization, versioning
- Better streaming support: reduce CPU per stream
- Better seeking support
- Better parsing: XML or atom based
111.
Main Attributes
A file format is not just a video / audio multiplexer. It separates:
- Media: audio, video, images, subtitles
- Metadata: indexing, frame length, tags
112.
3rd Generation File Formats
- XML based: Matroska (MKV)
- Object based: MOV, MPEG4 FF, Fragmented MPEG4 FF, 3GPP FF
113.
MPEG4 FILE FORMAT
114.
MP4 File Format
File structuring concepts:
- Separate the media data from descriptive (meta) data
- Support the use of multiple files
- Support for hint tracks: real-time streaming over any protocol
115.
Separate Metadata and Media
Key meta-information is compact:
- The type of media present
- Time-scales
- Timing
- Synchronization points etc.
Enables:
- Random access
- Inspection, composition, editing etc.
- Simplified update
116.
Multiple file support
- Use URLs to "point to" media
  - Distinct from URLs in MPEG-4 Systems
  - URLs use a file-access service, e.g. file://, http://, ftp:// etc.
- Permits assembly of a composition without requiring a data copy
- Referenced files contain only media
- Meta-data is all in the "main" file
117.
Logical File Structure
A presentation ("movie") contains tracks, which contain samples
118.
Physical Structure: File
Succession of objects (atoms, boxes):
- Exactly one meta-data object
- Zero or more media data object(s)
- Free space etc.
119.
Example Layout
[Figure: file layout. A moov box (movie meta-data) holds the video and audio trak boxes; an mdat box (media data) holds the samples (frames)]
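The "succession of objects" can be listed with a short box walker. This is a sketch under the ISO base media file format rules that each box starts with a big-endian 32-bit size and a four-character type, that a size of 1 means a 64-bit size follows, and that a size of 0 means the box extends to the end of the file. The set of container types descended into (`mdia`, `minf`, `stbl` in addition to the `moov` and `trak` boxes named on the slide) and the function name are assumptions for illustration.

```python
import struct

CONTAINER_BOXES = {b"moov", b"trak", b"mdia", b"minf", b"stbl"}


def iter_boxes(data: bytes, start: int = 0, end: int = None, depth: int = 0):
    """Yield (depth, type, offset, size) for each box, descending into containers."""
    end = len(data) if end is None else end
    pos = start
    while pos + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", data, pos)  # big-endian
        header = 8
        if size == 1:                                           # 64-bit size follows
            (size,) = struct.unpack_from(">Q", data, pos + 8)
            header = 16
        elif size == 0:                                         # box runs to the end
            size = end - pos
        if size < header:
            raise ValueError("corrupt box size")
        yield depth, box_type.decode("ascii"), pos, size
        if box_type in CONTAINER_BOXES:
            yield from iter_boxes(data, pos + header, pos + size, depth + 1)
        pos += size
```

Because the sample tables live under `moov`, a player can locate any frame inside `mdat` after reading only the meta-data box.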
120.
Meta-data tables
- Sample timing
- Sample size and position
- Synchronization (random access) points, priority etc.
- Temporal / physical order de-coupled
  - May be aligned for optimization
  - Permits composition, editing, re-use etc. without re-write
- Tables are compacted
121.
Multi-protocol Streaming support
Two kinds of track:
- Media (elementary stream) tracks: a sample is an access unit
- Protocol "hint" tracks: a sample tells the server how to build a protocol transmission unit (packet, protocol data unit etc.)
122.
Track types
Visual—’description’ formats MPEG4 JPEG2000 Audio—’description’ formats MPEG4 compressed tracks ‘Raw’ (DV) audio Other MPEG-4 tracks Hint Tracks (streaming) Copyright © 2008 LOGTEL Yossi Cohen
123.
Track Structure
Sample pointers (time, position) Sample description(s) Track references Dependencies, hint-media links Edit lists Re-use, time-shifting, ‘silent’ intervals etc. Copyright © 2008 LOGTEL Yossi Cohen
124.
Hint Tracks
- May include media (ES) data by reference
  - Only "extra" protocol headers etc. are added to hint tracks, so they stay compact
  - Make SL, RTP headers as needed
- May multiplex data from several tracks
- Packetization / fragmentation / multiplex through hint structures
- Timing is derived from media timing
125.
Hint track structure
[Figure: hint track layout. The moov box (movie meta-data) holds a video trak and a hint trak; the mdat box (sample data) holds the media samples (frames) and the hint samples, each hint sample consisting of a header and a pointer to a frame]
126.
Extensibility
- Other media types
- Non-SC29 sample descriptions (e.g. other video)
- Non-SC29 track types (e.g. laboratory instrument trace)
- Copyright notice (file or track level) etc.
- General object extensions (GUIDs)
127.
Advantages
Compatibility:
- Files can be played by other companies' players: Real Player with the Envivio plug-in, Windows Media Player etc.
- Files can be streamed by other companies' streaming servers: Darwin Streaming Server, QuickTime Streaming Server
128.
Single File, Multiple Data Types
No export process is needed: one file type stores video, audio, events, continuous telemetry data from sensors and JPEG images in one file.
[Figure: one file containing metadata, audio, video, JPEG images, continuous sensor data and events]
129.
Single file playback
All video tracks of a site can be stored in one file. To view many cameras in a synchronized manner, the MPEG-4 file format can hold the views of multiple cameras in one file.
[Figure: one file containing metadata, audio, and video tracks for cameras 1 to N]
130.
Skimming
Skimming – shortening a long movie to its interesting points, much like creating a “promo”. For example skimming a surveillance movie of two hours to 2 minutes where there is movement and people are entering the building. MPEG-4 FF enables the creation of skims within the file through the use of edit-list (part of the standard) without overhead. Copyright © 2008 LOGTEL Yossi Cohen
131.
MKV FILE FORMAT
XML Based File-Format Copyright © 2008 LOGTEL Yossi Cohen
132.
MKV - File Format
- Container file format for videos, audio tracks, pictures and subtitles, all in one file
- Announced in Dec. 2002 by Steve Lhomme
- Based on a binary XML-like format called EBML (Extensible Binary Meta Language)
- Completely open standard (free for personal use)
- Source is licensed under the GNU LGPL
133.
MKV - Specifications
- Can contain chapter entries for video streams
- Allows fast in-file seeking
- Metadata tags are fully supported
- Multiple streams contained in a single file
- Modular: can be expanded for a company's special needs
- Can be streamed over HTTP, FTP, etc.
134.
MKV Support: Software & Hardware
- Players: All Player, BS.Player, DivX Player, GStreamer-based players, VLC media player, xine, Zoom Player, MPlayer, Media Player Classic, ShowTime and many more
- Media centers: Boxee, DivX Connected, Media Portal, PS3 Media Server, Moovida, XBMC etc.
- Blu-ray players: Samsung, LG and Oppo
- Mobile players: Archos 5 Android device, Cowon A3 and O2
135.
MKV - EBML in Detail
- A binary format for representing data in an XML-like format
- Uses specific XML-like tags to define stream properties and data
- MKV conforms to the rules of EBML by defining a set of tags: Segment, Info, Seek, Block, Slices etc.
- Uses 3 lacing mechanisms for shortening small data blocks (usually frames): Xiph, EBML or fixed-size lacing
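Part of what makes EBML compact is its variable-length integer, used for element sizes: the number of leading zero bits in the first byte gives the number of additional bytes, and the first set bit is a length marker rather than data. A minimal decoding sketch, limited to the usual 1 to 8 byte lengths (the function name is hypothetical):

```python
def read_vint(data: bytes, pos: int = 0):
    """Decode one EBML variable-length integer; return (value, bytes_consumed)."""
    first = data[pos]
    if first == 0:
        raise ValueError("lengths above 8 bytes are not handled in this sketch")
    length = 1
    mask = 0x80
    while not first & mask:          # count leading zero bits of the first byte
        length += 1
        mask >>= 1
    if pos + length > len(data):
        raise ValueError("truncated integer")
    value = first & (mask - 1)       # drop the length-marker bit
    for byte in data[pos + 1: pos + length]:
        value = (value << 8) | byte
    return value, length
```

Small sizes therefore cost a single byte, while large clusters can still be described with up to eight.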
136.
MKV – Simple representation
Type                    Description
Header                  Version info, EBML type (matroska in our case).
Meta Seek Information   Optional. Allows fast seeking of other level 1 elements in the file.
Segment Information     File information: title, unique file ID, part number, next file ID.
Track                   Basic information about the track: resolution, sample rate, codec info.
Chapters                Predefined seek points in the media.
Clusters                Video and audio frames for each track.
Cueing Data             Stores cue points for each track. Allows fast in-track seeking.
Attachment              Any other file related to this one (subtitles, album covers, etc.).
Tagging                 Tags that relate to the file and to each track (similar to MP3 ID3 tags).
137.
MKV – Streaming
Matroska supports two types of streaming:
- File access
  - Used for reading a file locally or from a remote web server
  - Prone to reading and seeking errors
  - Causes buffering issues on slow servers
- Live streaming
  - Usually over HTTP or another TCP-based protocol
  - Special streaming structure: no Meta Seek, Cues, Chapters or Attachments are allowed
138.
File Format Summary - Trends
- Metadata is important
  - Simple metadata or XML
  - Separated from media
- Forward compatibility: do not crash when a data entry is not understood
- Progressive download oriented
- Multi-bitrate oriented
- Fragmentation -> lower granularity
  - Self-contained file fragments
  - CDN-ability
139.
Video Codecs
[Figure: file layout. A moov box (movie meta-data) holds the video and audio trak boxes; an mdat box (media data) holds the samples (frames)]
140.
Why Advance ?
MPEG-2 works, so why advance?
- Coding efficiency
- Packetization
- Robustness
- Scalable profiles
The Internet requires:
- Interaction
- Scalable and on-demand delivery
- Fast-forward / fast rewind / random access
- Stream switching
- Multi-bitrate, multiple resolutions / screens
141.
Coding efficiency Motivation
142.
Codec discussion
Internet and video codecs:
- Standard codecs: MPEG-4 Part 2 and H.264
- Non-standard codecs: Sorenson Spark, VP6, WMV9, VC-1, VP8
143.
H.264
144.
H.264 Terminology
The following terms are used interchangeably:
- H.26L
- "JVT codec"
- "AVC", or Advanced Video Coding
Proper terminology going forward:
- MPEG-4 Part 10 (official MPEG term): ISO/IEC 14496-10 AVC
- H.264 (official ITU term)
145.
H264 Standard ideas
- Block size: fixed -> variable
- Slice and block order / scanning -> different orders: zig-zag, Flexible Macroblock Order
- Additional spatial prediction -> intra prediction
- Inter prediction: 1 frame only -> multiple frames (P and B pictures, multiple reference frames)
146.
H264 Standard Ideas
Pixel interpolation Motion vectors In-loop Deblocking filter Improved Entropy coding Copyright © 2008 LOGTEL Yossi Cohen
147.
New Features of H.264 - summarized
- SP, SI: additional picture types
- NAL (Network Abstraction Layer): separation of the coding and transport layers
- CABAC: additional entropy coding mode
- 1/4 and 1/8-pixel motion vector precision
- In-loop de-blocking filter
- B-frame prediction weighting
- 4x4 integer transform
- Multi-mode intra-prediction
- FMO: Flexible Macroblock Ordering
148.
Block diagram
149.
Profiles and Levels
Profiles: Baseline, Main, and Extended (X)
- Baseline: progressive, videoconferencing and wireless
- Main: especially broadcast
- Extended: mobile networks
Note: wireless is not the same as mobile
150.
151.
Baseline Profile
- Baseline profile is the minimum implementation: no CABAC, 1/8 MC, B-frames or SP-slices
- 15 levels: resolution, capability, bit rate, buffer, number of reference frames
  - Built to match popular international production and emission formats
  - From QCIF to D-Cinema
- Progressive (not interlaced)
- I and P slice types
152.
Baseline Profile
- 1/4-sample inter prediction
- Deblocking filter, redundant slices
- VLC-based entropy coding (no CABAC)
- 4:2:0 chroma format
- Flexible Macroblock Ordering (FMO)
- Arbitrary Slice Order (ASO): the decoder processes slices in an arbitrary order as they arrive. It does not have to wait for all slices to be properly arranged before it starts processing them, which reduces the processing delay at the decoder.
153.
Baseline Profile
- FMO (Flexible Macroblock Ordering): macroblocks are coded according to a macroblock allocation map that groups, within a given slice, macroblocks from spatially different locations in the frame. This enhances error resilience.
- Redundant slices: allow the transmission of duplicate slices.
154.
H.264 Profiles & Levels - Main
All features included in the Baseline profile except:
- Arbitrary Slice Order (ASO)
- Flexible Macroblock Order (FMO)
- Redundant slices
Plus:
- Interlace
- B slice types (bi-directional reference)
- CABAC
- Weighted prediction
155.
Main Profile
CABAC
- Good performance (bit rate reduction) by selecting models by context and adapting estimates to local statistics
- Arithmetic coding adds computational complexity: about 10% to 20% of the total decoder execution time at medium bitrate
- Average bit-rate saving over CAVLC: 10-15%
156.
Extended Profile
All Baseline features plus:
- Interlace
- B slice types
- Weighted prediction
157.
Frame structure
Slices:
- A picture is split into 1 or several slices
- Slices are self-contained
- A slice is a sequence of macroblocks (MB)
Macroblocks (MB):
- Basic syntax and processing unit
- Contains 16x16 luma samples and 2 x 8x8 chroma samples
- Macroblocks within a slice depend on each other
- Macroblocks can be further partitioned
158.
Macroblock scanning
159.
Scanning order of residual blocks
- For an Intra 16x16 MB, the block labeled -1, containing the DC coefficients, is transmitted first
- Luma residual blocks 0-15 are transmitted
- Blocks 16 and 17 contain a 2x2 array of chroma DC coefficients
- Chroma residual blocks 18-25 are sent
160.
Variable block size
Slices:
- A picture is split into 1 or several slices
- A slice is a sequence of macroblocks
Macroblock:
- Contains 16x16 luminance samples and two 8x8 chrominance samples
- Macroblocks within a slice depend on each other
- Macroblocks can be further partitioned
[Figure: a picture divided into Slice 0, Slice 1 and Slice 2]
161.
Basic Macroblock Coding Structure
[Figure: encoder block diagram. The input video signal is split into macroblocks of 16x16 pixels. Blocks: coder control, transform / scaling / quantization, decoder (scaling and inverse transform), de-blocking filter, intra-frame prediction, motion compensation, intra/inter selection, motion estimation, entropy coding. Signals: control data, quantized transform coefficients, motion data, output video signal]
162.
Motion Compensation
[Figure: the encoder block diagram with the motion compensation stage highlighted]
Various block sizes and shapes:
- Macroblock types: 16x16, 16x8, 8x16, 8x8
- 8x8 sub-block types: 8x8, 8x4, 4x8, 4x4
163.
Tree structured Motion Compensation
[Figure: the encoder block diagram with the partition tree: macroblock types 16x16, 16x8, 8x16, 8x8 and 8x8 sub-block types 8x8, 8x4, 4x8, 4x4]
Motion vector accuracy: 1/4 sample (6-tap filter)
164.
Variable block size
Besides 16x16, block sizes of 16x8, 8x16, 8x8, 8x4, 4x8 and 4x4 are available:
- Mode 1: one 16x16 block
- Mode 2: two 16x8 blocks
- Mode 3: two 8x16 blocks
- Mode 4: four 8x8 blocks
- Mode 5: eight 8x4 blocks
- Mode 6: eight 4x8 blocks
- Mode 7: sixteen 4x4 blocks
Using seven different block sizes can translate into bit rate savings of more than 15% compared to using only a 16x16 block size.
165.
How to select the partition size?
Choose the partition size that minimizes the total bits spent on the coded residual and the motion vectors
166.
The Trade-off
- A large partition size (e.g. 16x16, 16x8, 8x16) requires a small number of bits to signal the choice of motion vector and the partition type. However, the motion-compensated residual may contain a significant amount of energy in frame areas with high detail.
- A small partition size (e.g. 8x4, 4x4) may give a lower-energy residual after motion compensation, but requires a large number of bits to signal the motion vectors and the choice of partition.
- The choice of partition size therefore has a significant impact on compression performance.
- In general, a large partition size is appropriate for homogeneous areas of the frame and a small partition size may be beneficial for detailed areas.
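One common way to express this trade-off is a single cost that adds the residual energy to a weighted count of side-information bits, and then picks the cheapest mode. The sketch below is only an illustration: the weighting factor `lam`, the dictionary keys and the example numbers are assumptions, not values from the slides or from any particular encoder.

```python
def choose_partition(candidates, lam=4.0):
    """Pick the partition mode with the lowest cost = residual + lam * side bits."""
    def cost(c):
        return c["residual_energy"] + lam * c["side_info_bits"]
    return min(candidates, key=cost)["mode"]


# Hypothetical measurements for a homogeneous block and a detailed block.
flat_area = [
    {"mode": "16x16", "residual_energy": 100, "side_info_bits": 10},
    {"mode": "4x4", "residual_energy": 90, "side_info_bits": 160},
]
detailed_area = [
    {"mode": "16x16", "residual_energy": 5000, "side_info_bits": 10},
    {"mode": "4x4", "residual_energy": 800, "side_info_bits": 160},
]
print(choose_partition(flat_area), choose_partition(detailed_area))
```

With these made-up numbers the flat block keeps the single 16x16 partition, while the detailed block is worth the extra motion vectors of the 4x4 split.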
167.
Interpolation
Quarter-sample luma interpolation takes 2 steps:
1. Half-sample positions are obtained by applying a 6-tap filter with tap values (1, -5, 20, 20, -5, 1): b = round((E - 5F + 20G + 20H - 5I + J) / 32)
2. Quarter-sample positions are obtained by averaging samples at integer and half-sample positions.
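The two steps can be written out directly. This sketch assumes 8-bit samples and implements the rounding as "add 16, then shift right by 5", followed by clipping to the 0 to 255 range (the clipping step is part of the standard but is not shown on the slide); the function names are illustrative.

```python
def clip255(v):
    """Clamp a value to the 8-bit sample range."""
    return max(0, min(255, v))


def half_sample(E, F, G, H, I, J):
    """Half-sample value between G and H using taps (1, -5, 20, 20, -5, 1)."""
    return clip255((E - 5 * F + 20 * G + 20 * H - 5 * I + J + 16) >> 5)


def quarter_sample(a, b):
    """Quarter-sample value: rounded average of two neighbouring samples."""
    return (a + b + 1) >> 1
```

The taps sum to 32, so a flat area is reproduced exactly, while the negative taps sharpen edges and can overshoot, which is why the clipping is needed.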
168.
Chroma Interpolation
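Chroma interpolation is 1/8-sample accurate, since luma motion is 1/4-sample accurate and chroma has half the luma resolution. Fractional chroma sample positions are obtained by bilinear weighting of the four surrounding samples A, B, C, D, with s = 8 and fractional offsets d_x, d_y:
$$v = \frac{(s-d_x)(s-d_y)A + d_x(s-d_y)B + (s-d_x)d_y C + d_x d_y D + s^2/2}{s^2}$$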
169.
Inter prediction modes
- Motion vectors (MVs) of neighboring partitions are often highly correlated, so motion vector differences (MVDs) are encoded instead of MVs
- MVD = MV - MVp, where MVp is the motion vector predicted from the neighboring partitions
- 1/4-pixel accurate motion compensation
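A short sketch of the difference coding, assuming the predictor MVp is the component-wise median of three neighbouring motion vectors. That is the usual H.264 rule for most partitions; the special cases for 16x8 and 8x16 partitions and for unavailable neighbours are left out, and the function names are hypothetical.

```python
def median_mv(a, b, c):
    """Component-wise median of three neighbouring motion vectors."""
    return tuple(sorted(comp)[1] for comp in zip(a, b, c))


def encode_mvd(mv, neighbours):
    """Encoder side: MVD = MV - MVp."""
    mvp = median_mv(*neighbours)
    return (mv[0] - mvp[0], mv[1] - mvp[1])


def decode_mv(mvd, neighbours):
    """Decoder side: MV = MVD + MVp, using the same predictor."""
    mvp = median_mv(*neighbours)
    return (mvd[0] + mvp[0], mvd[1] + mvp[1])
```

Because neighbouring vectors are similar, the differences cluster around zero and cost few bits after entropy coding.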
170.
Multiple Reference Frames
[Figure: the encoder block diagram, highlighting multiple reference frames for motion compensation]
171.
Multiple Reference Frames
172.
Intra prediction modes
4x4 luminance prediction modes:
- 0 (Vertical)
- 1 (Horizontal)
- 2 (DC)
- 3 (Diagonal down/left)
- 4 (Diagonal down/right)
- 5 (Vertical-right)
- 6 (Horizontal-down)
- 7 (Vertical-left)
- 8 (Horizontal-up)
Mode 2 (DC): predict all pixels from (A+B+C+D+I+J+K+L+4)/8 when both the samples above (A-D) and to the left (I-L) are available, or from (A+B+C+D+2)/4 or (I+J+K+L+2)/4 when only one side is available.
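The DC mode is simple enough to write out in full. In this sketch `above` holds samples A to D and `left` holds samples I to L; passing `None` marks a side as unavailable. The fallback value of 128 when neither side is available is the 8-bit mid-grey used by the standard and is not shown on the slide; the function name is illustrative.

```python
def dc_predict_4x4(above=None, left=None):
    """Return the 4x4 DC prediction block for the available neighbours."""
    if above is not None and left is not None:
        dc = (sum(above) + sum(left) + 4) >> 3   # (A+B+C+D+I+J+K+L+4)/8
    elif above is not None:
        dc = (sum(above) + 2) >> 2               # (A+B+C+D+2)/4
    elif left is not None:
        dc = (sum(left) + 2) >> 2                # (I+J+K+L+2)/4
    else:
        dc = 128                                 # no neighbours available
    return [[dc] * 4 for _ in range(4)]
```

The encoder would compare this flat prediction against the eight directional modes and keep whichever leaves the smallest residual.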
173.
Intra prediction modes
4x4 luminance prediction modes Copyright © 2008 LOGTEL Yossi Cohen
174.
Intra prediction modes
Intra 16x16 luminance and 8x8 chrominance prediction modes Copyright © 2008 LOGTEL Yossi Cohen
175.
Inter prediction modes: chrominance pixel interpolation
Quarter chrominance pixels are interpolated by taking weighted averages, by distance, from the new pixel to the four surrounding original pixels A, B, C, D (d_x and d_y are the offsets from A, and s is the number of fractional steps between original pixels):
$$v = \frac{(s-d_x)(s-d_y)A + d_x(s-d_y)B + (s-d_x)d_y C + d_x d_y D + s^2/2}{s^2}$$
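The weighted average above maps directly to integer arithmetic. The sketch assumes s = 8 (1/8-sample chroma accuracy), so the division by s squared becomes a division by 64 and the s squared over 2 term is the rounding offset 32; the function name is illustrative.

```python
def chroma_sample(A, B, C, D, dx, dy, s=8):
    """Bilinear chroma interpolation between samples A (top-left), B (top-right),
    C (bottom-left) and D (bottom-right) at fractional offset (dx, dy)."""
    return ((s - dx) * (s - dy) * A + dx * (s - dy) * B
            + (s - dx) * dy * C + dx * dy * D + s * s // 2) // (s * s)
```

At offset (0, 0) the result is exactly A, and at (4, 0) it is the rounded-down midpoint of A and B after the +32 offset.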
176.
Deblocking filter
Deblocking filter:
- Improves the subjective visual and objective quality of the decoded picture
- Significantly superior to post filtering
- Filtering affects the edges of the 4x4 block structure
- The highly content-adaptive filtering procedure mainly removes blocking artifacts and does not unnecessarily blur the visual content
- Filtering strength depends on inter / intra mode, motion and coded residuals
177.
Deblocking filter
Principle: Copyright © 2008 LOGTEL Yossi Cohen
178.
Deblocking filter
[Figure: highly compressed decoded inter picture. 1) Without filter 2) With H.264/AVC deblocking]
179.
Entropy coding
180.
Entropy coding
Entropy coding methods:
- CABAC (discussed)
- UVLC: H.264 offers a single Universal VLC (UVLC) table for all symbols
- CAVLC (Context-Adaptive Variable Length Coding)
Limitations of variable length codes:
- The probability distribution is static
- Code words must have an integer number of bits (low coding efficiency for highly peaked pdfs)
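A single universal table is possible because the code words follow a regular pattern. The sketch below assumes the universal VLC is the unsigned Exp-Golomb code used in the H.264 syntax: a prefix of N zeros followed by the (N+1)-bit binary form of value + 1. Bit strings are used instead of a packed bitstream to keep the example readable, and the function names are illustrative.

```python
def ue_encode(value: int) -> str:
    """Unsigned Exp-Golomb code word for value >= 0, as a bit string."""
    code = bin(value + 1)[2:]             # binary form of value + 1
    return "0" * (len(code) - 1) + code   # prefix of zeros, then the binary form


def ue_decode(bits: str, pos: int = 0):
    """Decode one code word starting at `pos`; return (value, next_pos)."""
    zeros = 0
    while bits[pos + zeros] == "0":
        zeros += 1
    end = pos + 2 * zeros + 1
    return int(bits[pos + zeros:end], 2) - 1, end
```

Every code word has a whole number of bits (1, 3, 5, ...), which illustrates the limitation listed above: a symbol with probability close to 1 still costs at least one bit.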
181.
CABAC: Technical Overview
[Figure: CABAC pipeline: context modeling, binarization, then the adaptive binary arithmetic coder (probability estimation and coding engine), with a feedback path that updates the probability estimation]
- Context modeling: chooses a model conditioned on past observations
- Binarization: maps non-binary symbols to a binary sequence
- Adaptive binary arithmetic coder: uses the provided model for the actual encoding and updates the model
182.
Complexity of codec design
The codec design has much higher complexity (memory and computation). Rough guess: 2-3x more decoding power than MPEG-4, 4-5x more for encoding.
Problem areas:
- Smaller block sizes for motion compensation (cache access issues)
- Longer filters for motion compensation (more memory access)
- Multi-frame motion compensation (more memory for reference frame storage)
- More segmentations of the macroblock to choose from (more searching in the encoder)
- More methods of predicting intra data (more searching)
- Arithmetic coding (adaptivity, computation on output bits)
183.
Comparison
184.
Summary
New key features are:
- Enhanced motion compensation
- Small blocks for transform coding
- Improved de-blocking filter
- Enhanced entropy coding
Substantial bit-rate savings (up to 50%) relative to other standards for the same quality
- The encoder is about three times as complex as prior ones
- The decoder is about twice as complex as prior ones
185.
Sorenson Spark video Codec
- H.263 variant
- Low footprint (code size): ~100K
- Good performance for 2002
- Quality, Spark vs optimal MPEG (H.263+): 20-30% less efficient
- Spark quality, real time (RT) vs offline: RT has considerably lower quality due to processing power and RT (delay) constraints
186.
Sorenson Spark - 2
Does not support:
- Arithmetic coding
- Advanced prediction
- B-frames
Features:
- De-blocking filter mode
- UMV: Unrestricted Motion Vector mode
- Arbitrary frame dimensions
- Supported by FFmpeg
- D-frames
187.
D-Frames
- D (Disposable) frames
- One-way prediction
- Provides a flexible bit-rate: I-D-P-D-P-D-P
- D-frames are used only when feeding a Flash communication server
188.
On2 TrueMotion VP6 Features
- Compressed I-frames (intra-compression makes use of spatial predictors)
- Unidirectional predicted frames (P-frames)
- Multiple reference P-frames
- 8x8 iDCT-class transform (4x4 in VP7)
- Improved quantization strategy (preserves image details)
- Advanced entropy coding
189.
VP6 Features
Entropy coding: various techniques are used based on complexity and frame size, including:
- VLC
- Context-modeled binary coding (like H.264 CABAC)
Bit rate control: to reach the requested data rate, VP6 adjusts:
- Quantization levels
- Encoded frame dimensions
- Entropy coding
- Dropped frames
190.
VP6 motion prediction
Motion vectors:
- One vector per macroblock (16x16), or 4 vectors, one for each block (8x8)
- Quarter-pel motion compensation support
- Unrestricted motion compensation support
- Two reference frames: the previous frame, or a previously bookmarked frame
191.
VP6 vs H264
VP6 is much simpler than H.264:
- Requires fewer CPU resources for decoding and encoding
- Code size is considerably smaller
Does simpler mean less efficient? No! Techniques used:
- Mix of adaptive sub-pixel motion estimation
- Better prediction of low-order frequency coefficients
- Improved quantization strategy
- De-blocking and de-ringing filters
- Enhanced context-based entropy coding
192.
PSNR graphs are used for comparative analysis of compression quality. Each line represents the encode quality on a given clip at multiple datarates. The highest line represents the codec with the best quality. In this case VP7 clearly is better than x264.
[Figure: 720p High Profile H.264 (x264) vs VP7, Alexander Trailer. Vertical axis: PSNR (quality, higher is better), 42.5 to 47. Horizontal axis: datarate in kilobits per second, 1400 to 4400]
Tips for reading this kind of graph (a PSNR graph):
- Pick any point on the top line, in this case VP7.
- Draw a line straight down from that point to the datarate axis. The crossing point tells you the datarate at that point.
- Draw a line straight across until you intersect the lower line (in this case x264), i.e. keep the quality / PSNR constant.
- Draw a line straight down from that point to the datarate axis. The crossing point tells you the datarate at that point.
What this means: on this clip VP7 at 2750 kbps has the same quality / PSNR as x264 High Profile at 3620 kbps, i.e. you would need about 30% higher datarate to get the same quality out of x264 that you got from VP7.
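For reference, the two quantities read off such a graph can be computed as follows. The sketch assumes 8-bit samples (peak value 255) supplied as flat sequences, and uses the usual definition PSNR = 10 log10(peak^2 / MSE); the function names are illustrative.

```python
import math


def psnr(original, decoded, peak=255):
    """Peak signal-to-noise ratio in dB between two equal-length sample lists."""
    mse = sum((o - d) ** 2 for o, d in zip(original, decoded)) / len(original)
    return math.inf if mse == 0 else 10 * math.log10(peak * peak / mse)


def extra_datarate_percent(rate_better, rate_worse):
    """How much more datarate the weaker codec needs at equal PSNR, in percent."""
    return 100.0 * (rate_worse - rate_better) / rate_better


print(round(extra_datarate_percent(2750, 3620), 1))
```

With the two datarates quoted above (2750 and 3620 kbps) the second function gives roughly 32%, in line with the "about 30%" reading of the graph.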
193.
VP6 vs. H264
There is a difference between the codec technology and a codec implementation. Copyright © 2008 LOGTEL Yossi Cohen
194.
On2 VP7
- Not open source
- Non-standard royalties model
- Better video quality than H.264
- Used by: EVD (China's standard for HD-DVD), Skype Beta (V 2.0), Flash Player
195.
Windows Media
Windows Media is a format used by Microsoft for encoding and distributing audio and video. It has two types of media:
- Windows Media Audio (WMA)
- Windows Media Video (WMV)
Windows Media Video:
- A modified version of MPEG-4
- Codec versions started at version 7, for Windows Media Player 7, and then evolved through versions 8 to 10
196.
Windows Media 9 - VC-1 Format
Microsoft submitted the Version 9 codec to the Society of Motion Picture and Television Engineers (SMPTE) for approval as an international standard. SMPTE reviewed the submission under the draft name "VC-1". This codec is also used to distribute high-definition video on standard DVDs in a format Microsoft has branded as WMV HD. WMV HD content can be played back on computers or compatible DVD players. The trial version of the standard was published by SMPTE in September 2005. WMV9 was approved by SMPTE in April 2006.
197.
GOOGLE VP8
198.
Before we start
VP8's goal is NOT to deliver the best video quality at any given bitrate. VP8 was designed as a mobile video decoder and should be examined in this context: VP8 vs. H.264 Baseline profile.
199.
Google VP8
Last month, at Google I/O (its developer conference), Google released VP8 as open source. VP8 is a lightweight video codec developed by On2. VP8 provides quality that is the same as or higher than H.264 Baseline profile. VP8's memory requirements are lower than those of H.264 Baseline profile. After optimization, VP8 might have better MIPS performance than H.264 Baseline profile.
200.
Genealogy
VP8 is part of a well-known codec family. VP3 was released as open source and became Xiph Theora. VP6 is used in Flash video. VP7 is used in Skype. Motivation: a "no royalties" codec.
[Diagram: the codec family, with nodes VP3, Theora, VP6, VP7 and VP8.]
201.
ADOPTION – WHO USES IT?
Software, Hardware, Platforms & Publishers
202.
Software Adoption
Android, Anystream, Collabora, CoreCodec, Firefox, Adobe Flash, Google Chrome, iLinc, Inlet, Opera, ooVoo, Skype, Sorenson Media, Theora.org, Telestream, Wildform.
203.
Hardware Adoption
AMD, ARM, Broadcom, Digital Rapids, Freescale, Harmonic, Logitech, ViewCast, Imagination Technologies, Marvell, NVIDIA, Qualcomm, Texas Instruments, VeriSilicon, MIPS.
204.
Platforms and Publishers
Brightcove, Encoding.com, HD Cloud, Kaltura, Ooyala, YouTube, Zencoder.
205.
VP8 MAIN FEATURES
206.
Adaptive Loop Filter
The improved loop filter provides better quality and performance in comparison to H.264. Source: On2
207.
Golden Frames
Golden frames enable better decoding of the background, which is used for prediction in later frames. A golden frame could be used as a resync point: it can reference an I frame. A golden frame could also be hidden (not for display). Source: On2
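A minimal sketch of how a decoder might keep a golden frame alongside the most recent frame. The class, field and flag names below are hypothetical, chosen for illustration, and are not taken from the On2 sources; the sketch only models the behaviour listed on the slide (a long-lived reference that later frames can predict from, set on I frames, optionally hidden from display).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Frame:
    index: int
    is_key: bool = False          # key (I) frame
    refresh_golden: bool = False  # hypothetical flag: store this frame as golden
    show: bool = True             # hidden frames are decoded but not displayed


class ReferenceBuffers:
    """Tracks which decoded frames are available for prediction."""

    def __init__(self) -> None:
        self.last: Optional[int] = None    # most recently decoded frame
        self.golden: Optional[int] = None  # long-lived reference (e.g. background)

    def decode(self, frame: Frame) -> Optional[int]:
        """Update the references; return the index to display, or None if hidden."""
        if frame.is_key or frame.refresh_golden:
            self.golden = frame.index
        # Simplification: every decoded frame also becomes the "last" reference.
        self.last = frame.index
        return frame.index if frame.show else None


def displayed(frames: List[Frame]) -> List[int]:
    """Indices of the frames a viewer would actually see."""
    refs = ReferenceBuffers()
    shown = (refs.decode(f) for f in frames)
    return [i for i in shown if i is not None]


clip = [Frame(0, is_key=True), Frame(1),
        Frame(2, refresh_golden=True, show=False), Frame(3)]
print(displayed(clip))  # [0, 1, 3]
```

Frame 2 here is a hidden golden frame: it never reaches the screen, but it stays available as a reference for frame 3 and later frames.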
208.
Decoding efficiency
CABAC is an H.264 feature which improves coding efficiency but consumes many CPU cycles. VP8 has better entropy coding than H.264; this leads to relatively lower CPU consumption under the same conditions. Decoding efficiency is important for smooth operation and long battery life in netbooks and mobile devices. Source: On2
209.
Resolution up-scaling & downscaling
Supported by the decoder. The encoder could decide dynamically (in real-time applications) to lower the resolution in case of a low bit rate and let the decoder scale. This removes the decision from the application. No need for an I frame.
210.
VP8 BASICS
Definitions, bitstream structure, frame structure
211.
Definitions
Frame: same as in H.264. Segment: parallel to a slice in H.264. Macroblocks (MBs) in the same segment use the same settings, such as probabilistic encoder/decoder settings and de-blocking filter settings. Partition: a block of byte-aligned compressed video bits.
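The segment idea can be pictured as a small lookup table: each macroblock carries a segment id, and the settings are stored once per segment rather than once per macroblock. The sketch below is an illustration under assumptions; the type and field names (SegmentSettings, quantizer_index, loop_filter_level) and the sample values are invented here and are not VP8 bitstream syntax.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class SegmentSettings:
    quantizer_index: int    # hypothetical per-segment coding setting
    loop_filter_level: int  # hypothetical per-segment de-blocking strength


@dataclass
class Macroblock:
    row: int
    col: int
    segment_id: int


def settings_for(mb: Macroblock,
                 segments: Dict[int, SegmentSettings]) -> SegmentSettings:
    """All macroblocks with the same segment id share one settings record."""
    return segments[mb.segment_id]


# Made-up example: a static background segment and a detailed foreground one.
segments = {
    0: SegmentSettings(quantizer_index=40, loop_filter_level=20),
    1: SegmentSettings(quantizer_index=24, loop_filter_level=8),
}
mbs = [Macroblock(0, 0, 0), Macroblock(0, 1, 1), Macroblock(1, 0, 0)]
for mb in mbs:
    print((mb.row, mb.col), settings_for(mb, segments))
```

Storing the settings per segment means the bitstream only has to signal a small id for each macroblock instead of repeating the full set of parameters.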