Multimedia

Objectives
- To understand the terminology of multimedia systems.
- To study protocols used in multimedia applications.
- To know the different hardware and software components required to run multimedia.
- To evaluate multimedia services that satisfy user requirements.

Introduction
- The way we use audio and video has evolved as a result of recent technological advances.
- In the past, we listened to audio broadcasts on the radio and watched video shows on the television.
- People now want to use the Internet for audio and video services in addition to text and image communication.
- This chapter focuses on applications that provide audio and video services over the Internet.
- Audio and video services may be divided into three main categories: streaming stored audio/video, streaming live audio/video, and interactive audio/video.

Streaming Stored Audio/Video
- In this approach, the files are compressed and stored on a server.
- A client downloads the files through the Internet.
- This category is also called on-demand audio/video.
- Examples:
  - Stored audio files: songs, symphonies, books on tape, and popular lectures.
  - Stored video files: movies, TV shows, and music video clips.

Streaming Live Audio/Video
- Streaming live audio/video refers to the broadcasting of radio and TV programs through the Internet.
- A user listens to broadcast audio and video through the Internet.
- Example: Internet radio. Some radio stations transmit their programming solely via the Internet, while others broadcast both over the Internet and over the air.

Interactive Audio/Video
- Interactive audio/video refers to the use of the Internet for interactive audio/video applications.
- Examples: Internet telephony and Internet teleconferencing.

Digitizing Audio and Video
- Before audio or video signals can be sent over the Internet, they must first be digitized.
- Digitizing audio
- Digitizing video

Digitizing Audio
- When sound is fed into a microphone, an electrical analog signal is produced that represents the amplitude of the sound as a function of time.
- Such a signal is called an analog audio signal.
- An analog signal, such as audio, can be digitized to produce a digital signal.
- According to the Nyquist theorem, if the highest frequency of the signal is f, we need to sample the signal 2f times per second. A short bit-rate computation follows.
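
As a hedged illustration (assuming the common telephony and CD parameters of 8 kHz × 8 bits per sample for speech, and 44.1 kHz × 16 bits × 2 channels for music), the resulting bit rates can be computed directly:

# Bit rate of digitized audio: samples/s x bits per sample x channels.
def audio_bit_rate(sampling_rate_hz, bits_per_sample, channels=1):
    return sampling_rate_hz * bits_per_sample * channels

# Telephone-quality speech: 8000 x 8 = 64,000 bps (64 kbps).
print(audio_bit_rate(8000, 8))
# CD-quality music: 44100 x 16 x 2 = 1,411,200 bps (about 1.411 Mbps).
print(audio_bit_rate(44100, 16, 2))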

Digitizing Video
- A video is made up of a series of frames. We perceive motion if the frames are shown on the screen quickly enough.
- The reason is that our eyes cannot distinguish the rapidly flashing frames as individual frames.
- There is no single standard for the number of frames per second; 25 frames per second is a common rate (North America's NTSC system actually uses about 30).
- A frame must be refreshed to avoid a condition known as flickering (a perceived change in brightness).
- In the television industry, each frame is therefore repainted twice.
- This implies 50 frames must be delivered per second, or 25 frames if memory is available at the receiving site, with each frame repainted from memory.
- Each frame is subdivided into small grids called picture elements, or pixels.
- On black-and-white TV, each 8-bit pixel represents one of 256 distinct gray levels. On color TV, each pixel is 24 bits, with 8 bits for each primary color (red, green, and blue).
- We can calculate the number of bits sent in 1 s for a specific resolution.
- In the lowest resolution, a color frame is 1024 × 768 pixels. This equates to 2 × 25 × 1024 × 768 × 24 ≈ 944 Mbps, as the short sketch below verifies.
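
A minimal sketch of the slide's arithmetic (the factor of 2 accounts for each frame being repainted twice):

# Raw bit rate of uncompressed 1024 x 768, 24-bit color video at
# 25 frames/s, with every frame repainted twice.
repaint = 2
frames_per_second = 25
width, height = 1024, 768
bits_per_pixel = 24

bps = repaint * frames_per_second * width * height * bits_per_pixel
print(bps)   # 943718400 bits/s, i.e. about 944 Mbps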

Audio and Video Compression
- Compression is required when sending audio or video over the Internet.

Audio Compression
- Audio compression can be used for both speech and music. We need to compress a 64-kbps digitized signal for speech and a 1.411-Mbps signal for music.
- There are two kinds of techniques for audio compression:
  - Predictive encoding
  - Perceptual encoding

Predictive Encoding
- Instead of storing all of the sampled values, predictive encoding encodes only the differences between consecutive samples.
- Speech compression is the most common use for this sort of compression.
- GSM (13 kbps), G.729 (8 kbps), and G.723.1 (6.4 or 5.3 kbps) are some of the standards that use it. A toy sketch of the idea follows.
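
A minimal sketch of the idea behind predictive (differential) encoding, assuming small sample-to-sample changes; real codecs such as GSM or G.729 are far more elaborate:

# Predictive (differential) encoding: keep the first sample, then
# store only the change from each sample to the next.
def encode(samples):
    deltas = [samples[0]]
    for prev, cur in zip(samples, samples[1:]):
        deltas.append(cur - prev)
    return deltas

def decode(deltas):
    samples = [deltas[0]]
    for d in deltas[1:]:
        samples.append(samples[-1] + d)
    return samples

samples = [100, 102, 103, 103, 101]
deltas = encode(samples)            # [100, 2, 1, 0, -2] -- small values
assert decode(deltas) == samples    # lossless round trip

The small difference values need fewer bits than the raw samples.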

Perceptual Encoding: MP3
- The perceptual encoding approach is the most popular compression technique for CD-quality audio.
- This kind of audio requires at least 1.411 Mbps, which cannot be sent over the Internet without compression.
- This method is used by MP3 (MPEG audio layer 3), which is part of the MPEG standard.
- Perceptual audio coding is a compression method based on the limitations of human hearing.
- Perceptual encoding is based on psychoacoustics, the branch of science concerned with how sound is perceived and its physiological effects.
- The idea is that flaws in our auditory system allow some sounds to hide (mask) other sounds. Masking can occur in both frequency and time.
- Frequency masking: a loud sound in one frequency band can partially or completely hide a softer sound in another frequency band.
- Example: we cannot hear what a person sitting beside us says in a room where an orchestra is playing loudly.
- Temporal masking: a loud sound can dull our hearing for a short period even after the sound has ended.
- MP3 compresses audio signals by exploiting frequency and temporal masking. MP3 offers three data rates: 96 kbps, 128 kbps, and 160 kbps.
- The rate is chosen based on the frequency range of the original analog audio.

Video Compression
- Video consists of multiple frames, and each frame is an image.
- Video can therefore be compressed by compressing its images.
- The market is dominated by two standards:
  - Joint Photographic Experts Group (JPEG)
  - Moving Picture Experts Group (MPEG)
- Images are compressed with JPEG.
- Video is compressed with MPEG.

Image Compression: JPEG
- In a grayscale picture, each pixel can be represented by an 8-bit integer (256 levels).
- If the picture is in color, each pixel can be represented by 24 bits (3 × 8 bits), with each group of 8 bits representing red, green, or blue (RGB).
- In JPEG, a grayscale image is divided into 8 × 8-pixel blocks.
- The purpose of dividing the image into blocks is to reduce the number of computations, since the number of mathematical operations for each picture is equal to the square of the number of units.

Figure 2.2: JPEG grayscale
- The whole idea of JPEG is to transform the image into a linear (vector) set of numbers that reveals the redundancies.
- The redundancies (lack of changes) can then be removed using one of the text compression methods.

JPEG Process

Discrete Cosine Transform (DCT)
- In this phase, each block of 64 pixels is transformed using the discrete cosine transform (DCT).
- The transformation changes the 64 values, preserving the relative relationships between pixels while revealing the redundancies.
- We present the transformation outcomes for three different situations:
  - Case 1: uniform gray scale
  - Case 2: two sections
  - Case 3: gradient gray scale

Case 1: Uniform Gray Scale
- In this case, we have a grayscale block with a value of 20 for each pixel.
- When we perform the transformation, we get a nonzero value for the first element (upper left corner); the remaining elements are 0.
- The value of T(0,0) is the average (multiplied by a constant) of the P(x,y) values and is called the dc value (direct current, a term borrowed from electrical engineering).
- The remaining values are called ac values; T(m,n) represents changes in the pixel values. As shown in the figure, the rest of the values are 0s; a numerical sketch follows the figure caption.

Case 1: Uniform Gray Scale
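
A hedged sketch of this step, implementing the standard 8 × 8 DCT-II formula directly; for the uniform block of Case 1, every coefficient except T(0,0) comes out (essentially) zero:

import math

def dct_8x8(P):
    # 2-D DCT-II: table T(m,n) from 8x8 pixel table P(x,y).
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0
    T = [[0.0] * 8 for _ in range(8)]
    for m in range(8):
        for n in range(8):
            s = sum(P[x][y]
                    * math.cos((2 * x + 1) * m * math.pi / 16)
                    * math.cos((2 * y + 1) * n * math.pi / 16)
                    for x in range(8) for y in range(8))
            T[m][n] = 0.25 * c(m) * c(n) * s
    return T

P = [[20] * 8 for _ in range(8)]    # uniform gray block (Case 1)
T = dct_8x8(P)
print(round(T[0][0], 1))            # 160.0 -- the dc value (8 x 20)
print(round(abs(T[0][1]), 6))       # 0.0   -- the ac values vanish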

Case 2: Two Sections
- In the second case, we have a block with two distinct uniform grayscale sections.
- There is a sharp change in the pixel values (from 20 to 50).
- When we perform the transformation, we get a dc value as well as nonzero ac values.
- However, only a few nonzero values cluster around the dc value. As shown in Figure 2.5, most of the values are 0.

Case 2: Two Sections

Case 3: Gradient Gray Scale
- In the third case, we have a block that changes gradually.
- That is, there is no sharp difference between the values of neighboring pixels.
- When we perform the transformation, we get a dc value along with many nonzero ac values, as shown in the figure.

Case 3: Gradient Gray Scale
From all of the above cases we can conclude that:
- The transformation creates table T from table P.
- The dc value is the average value (multiplied by a constant) of the pixels.
- The ac values are the changes.
- A lack of change in neighboring pixels creates 0s.

Quantization
- Quantization reduces the number of bits needed to store an integer value by reducing its precision.
- In the simplest form of quantization, each number is divided by a constant and the fraction is then dropped, keeping only the integer part.
- This further reduces the number of bits required.
- Most implementations use an 8 × 8 quantizing table to specify how to quantize each value.
- The divisor depends on the position of the value in the T table.
- This makes it possible to optimize the number of bits and the number of 0s for each particular application; a small sketch follows.
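
A minimal sketch of this step, assuming an illustrative (not standard) quantizing table; each DCT coefficient is divided by the table entry at its position and the fraction is dropped:

# Quantize an 8x8 table T of DCT values with an 8x8 quantizing table Q.
# The integer division drops the fraction -- the lossy step.
def quantize(T, Q):
    return [[int(T[m][n] / Q[m][n]) for n in range(8)] for m in range(8)]

# Illustrative table: larger divisors toward the bottom right, where
# the high-frequency ac values matter least (not a standard JPEG table).
Q = [[10 + 4 * (m + n) for n in range(8)] for m in range(8)]
Tq = quantize(T, Q)    # T from the DCT sketch above
print(Tq[0][0])        # 16 -- the dc value survives
print(Tq[7][7])        # 0  -- small ac values quantize to 0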
- The quantizing step is the only part of the process that is not reversible.
- Some information is lost and cannot be recovered.
- It is because of this quantization phase that JPEG is called lossy compression.

Compression
- After quantization, the values are read from the table and redundant 0s are removed.
- To cluster the 0s together, the table is read diagonally in a zigzag fashion rather than row by row or column by column.
- The reason is that if the picture changes smoothly, the bottom right corner of the T table is all 0s.
- The figure depicts the process of reading the table; a small sketch of the zigzag order follows it.

Reading the Table
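
A hedged sketch of the zigzag read-out; it emits the 64 positions diagonal by diagonal so that the (mostly zero) bottom-right region is read last:

# Zigzag order for an 8x8 table: walk the anti-diagonals, alternating
# direction, so trailing 0s cluster at the end of the output.
def zigzag(table):
    out = []
    for s in range(15):                  # diagonals where i + j = s
        cells = [(i, s - i) for i in range(8) if 0 <= s - i < 8]
        if s % 2 == 0:
            cells.reverse()              # walk upward on even diagonals
        out.extend(table[i][j] for i, j in cells)
    return out

values = zigzag(Tq)                      # Tq from the quantization sketch
print(values[0], values[-5:])            # dc value first; trailing zeros last

The long run of trailing 0s can then be removed with run-length encoding.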

Video Compression: MPEG
- A motion picture is a rapid sequence of frames, each of which is an image.
- In other words, a frame is a spatial combination of pixels, whereas a video is a temporal combination of frames sent one after another.
- Compressing video therefore means spatially compressing each frame and temporally compressing a set of frames.

Spatial Compression
- Each frame's spatial data is compressed with JPEG. Each frame is an image that can be compressed independently.

Temporal Compression
- Duplicate frames are removed during temporal compression.
- When we watch television, we receive 50 frames per second.
- However, most of the frames in a sequence are nearly identical.
- Example: when someone is speaking, most of the frame stays the same from one frame to the next, except for the region around the lips, which changes. A sketch of this idea follows.
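
A minimal sketch of that intuition, assuming frames are small integer grids; the difference between two nearly identical frames is mostly zeros, which compress very well:

# Temporal redundancy: the frame-to-frame difference of a talking
# head is mostly 0 -- only the "lip" region changes.
frame1 = [[50] * 6 for _ in range(4)]
frame2 = [row[:] for row in frame1]
frame2[3][2], frame2[3][3] = 55, 55    # only two "lip" pixels moved

diff = [[b - a for a, b in zip(r1, r2)] for r1, r2 in zip(frame1, frame2)]
zeros = sum(v == 0 for row in diff for v in row)
print(zeros, "of", 4 * 6, "difference values are zero")   # 22 of 24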
- For temporal compression, the MPEG method divides frames into three types:
  - I-frames
  - P-frames
  - B-frames

I-Frames (Intracoded Frames)
- An I-frame is an independent frame that is not related to any other frame (not to the frame sent before or to the frame sent after).
- I-frames are not constructed from other frames.
- They appear at regular intervals (e.g., every ninth frame is an I-frame).
- An I-frame must appear periodically to handle a sudden change in the scene that the preceding and following frames cannot show.
- A viewer may tune in at any moment while a video is being shown.
- If there were only one I-frame, at the start of the broadcast, viewers who tune in late would never receive a complete picture.

P-Frames (Predicted Frames)
- A P-frame is related to the preceding I-frame or P-frame.
- Each P-frame contains only the changes from the preceding frame.
- The changes cannot span a large interval; if an object is moving quickly, for example, the new changes may not fit in a P-frame. P-frames can be constructed only from preceding I- or P-frames.
- P-frames carry much less information than the other frame types and carry even fewer bits after compression.

B-Frames (Bidirectional Frames)
- A B-frame is related to the preceding and following I-frame or P-frame; each B-frame is relative to both the past and the future.
- Note that a B-frame is never related to another B-frame.
- The figure depicts a sample frame sequence.

MPEG Frames
- The figure below depicts the construction of I-frames, P-frames, and B-frames from a series of seven frames.

MPEG Frame Construction

Streaming Stored Audio/Video
- In this section, we discuss different approaches for downloading streaming stored audio/video files from a Web server.

First Approach: Using a Web Server
- A compressed audio/video file can be downloaded as if it were an ordinary text file.
- To download the file, the client (browser) uses HTTP services and sends a GET message.
- The Web server sends the compressed file to the browser.
- The browser can then play the file with a helper application, referred to as a media player.
- This approach is very simple and requires no streaming.
- It is depicted in Figure 2.10.

Using a Web Server

Drawbacks
This approach has several drawbacks:
- Even after compression, an audio/video file is usually quite large.
- Audio and video files typically occupy many megabytes of storage.
- The file must be downloaded completely before it can be played.
- At today's data rates, the user may have to wait seconds, or even tens of seconds, before the file can be played.

Second Approach: Using a Web Server with a Metafile
- In this approach, the media player connects directly to the Web server to download the audio/video file.
- The Web server stores two files: the audio/video file itself and a metafile that holds information about it.
- The steps in this approach are depicted in the figure.

Using a Web Server with a Metafile
58
1. The HTTP client accesses the Web server by using the
GET message.
2. The information about the metafile comes in the
response.
3. The metafile is passed to the media player.
4. The media player uses the URL in the metafile to access
the audio/video file.
5. The Web server responds.

Third Approach: Using a Media Server
- The problem with the second approach is that both the browser and the media player use HTTP services.
- HTTP is designed to run over TCP.
- That is appropriate for retrieving the metafile, but not the audio/video file itself.
- The reason is that TCP retransmits a lost or damaged segment, which goes against the philosophy of streaming.
- We need to dismiss TCP and its error control and use UDP instead.
- However, HTTP, which accesses the Web server, is itself designed to run over TCP.
- We therefore need a separate server, a media server, to handle the audio and video files.

Using a Media Server
1. The HTTP client accesses the Web server with a GET message.
2. The information about the metafile comes in the response.
3. The metafile is passed to the media player.
4. The media player uses the URL in the metafile to access the media server and download the file.
5. The media server responds.

Fourth Approach: Using a Media Server and RTSP
- The Real-Time Streaming Protocol (RTSP) is a control protocol designed to add more functionality to the streaming process.
- Using RTSP, we can control the playback of audio/video.
- RTSP is an out-of-band control protocol, similar in spirit to FTP's second (control) connection.
- A media server with RTSP is depicted in the figure.

Using a Media Server and RTSP
1. The HTTP client accesses the Web server with a GET message.
2. The information about the metafile comes in the response.
3. The metafile is passed to the media player.
4. The media player sends a SETUP message to create a connection with the media server.
5. The media server responds.
6. The media player sends a PLAY message to start playing (downloading).
7. The audio/video file is downloaded using another protocol that runs over UDP.
8. The connection is closed with a TEARDOWN message.
9. The media server responds. An illustrative exchange follows.
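
As a hedged illustration of steps 4-8 (the URL, CSeq values, session ID, and port numbers are made up), a minimal RTSP exchange might look like this; SETUP, PLAY, and TEARDOWN are the RTSP methods named above:

C -> S:  SETUP rtsp://media.example.com/song.mp3 RTSP/1.0
         CSeq: 1
         Transport: RTP/AVP;unicast;client_port=4588-4589

S -> C:  RTSP/1.0 200 OK
         CSeq: 1
         Session: 12345678

C -> S:  PLAY rtsp://media.example.com/song.mp3 RTSP/1.0
         CSeq: 2
         Session: 12345678

         (the media now flows over UDP on the negotiated ports)

C -> S:  TEARDOWN rtsp://media.example.com/song.mp3 RTSP/1.0
         CSeq: 3
         Session: 12345678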

Streaming Live Audio/Video
- Streaming live audio/video follows the same strategy used to broadcast audio and video by radio and TV stations.
- The only difference is that the station broadcasts over the Internet instead of over the air.
- Streaming stored audio/video and streaming live audio/video are both sensitive to delay, and neither can accept retransmission.
- There is a distinction, however.
- In the first application, the communication is unicast and on-demand.
- In the second, the communication is multicast and live.
- Live streaming is better suited to IP multicast services and protocols such as UDP and RTP.
- In practice, however, live streaming still tends to use TCP and multiple unicast connections rather than multicasting.

Real-Time Interactive Audio/Video
- In real-time interactive audio/video, people communicate with one another in real time.
- Examples: Internet telephony (voice over IP) and video conferencing.

Characteristics
We discuss several characteristics of real-time audio/video communication:
1. Time relationship
2. Timestamp
3. Playback buffer
4. Ordering
5. Multicasting
6. Translation
7. Mixing

Time Relationship
- Real-time data on a packet-switched network requires preserving the time relationship between the packets of a session.
- For example, assume a real-time video server creates live video images and sends them over the Internet.
- The video is digitized and packetized.
- Suppose there are only three packets, and each packet holds 10 s of video.

Time Relationship
- But what if the packets arrive with different delays?
- Assume the first packet arrives at 00:00:01 (1-s delay), the second at 00:00:15 (5-s delay), and the third at 00:00:27 (7-s delay).
- If the receiver starts playing the first packet at 00:00:01, it will finish at 00:00:11.
- The second packet, however, has not yet arrived; it arrives 4 s later.
- As the video is viewed at the remote site, there is a gap between the first and second packets, and between the second and third.
- This phenomenon is called jitter.
- Jitter is introduced in real-time data by the varying delays between packets.
- The situation is depicted in the figure.
- For comparison, consider the same example when every packet experiences the same delay.
- Again, a real-time video server generates and sends live video over the Internet; the video is digitized and packetized, and each of the three packets contains 10 s of video.
- The first packet starts at 00:00:00, the second at 00:00:10, and the third at 00:00:20.
- Assume each packet takes 1 s to reach its destination (an equal delay).
- The first packet can then be played back at 00:00:01, the second at 00:00:11, and the third at 00:00:21.
- Although there is a 1-s gap between what the server sends and what the client sees on the screen, the action is happening in real time.
- The time relationship between the packets is preserved; the 1-s delay is not important.

Jitter

Timestamp
- One solution to jitter is to timestamp the packets and separate the arrival time from the playback time.
- If each packet carries a timestamp showing when it was produced relative to the first (or previous) packet, the receiver can add this offset to the time at which it begins playback.
- In other words, the receiver knows exactly when to play back each packet.
- Returning to the previous example, suppose the first packet has a timestamp of 0, the second a timestamp of 10, and the third a timestamp of 20.
- If the receiver starts playing back the first packet at 00:00:08, the second will be played at 00:00:18 and the third at 00:00:28.
- There are no gaps between the packets. The situation is depicted in the figure, and a short sketch of the arithmetic follows it.

Timestamp
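
A minimal sketch of the slide's arithmetic, with all times held as seconds; the playback time of each packet is simply the start of playback plus its timestamp:

# Playback scheduling with timestamps (all times in seconds).
playback_start = 8                    # receiver starts at 00:00:08
packets = [{"ts": 0}, {"ts": 10}, {"ts": 20}]

for p in packets:
    play_at = playback_start + p["ts"]
    print("play packet with timestamp", p["ts"], "at second", play_at)
# -> seconds 8, 18, 28: back to back, regardless of arrival jitter.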

Playback Buffer
- To separate the arrival time from the playback time, we need a buffer to store the data until it is played back.
- This buffer is known as a playback buffer.
- When a session begins (the first bit of the first packet arrives), the receiver delays playing the data until a certain threshold is reached.
- In the preceding example, the first bit of the first packet arrives at 00:00:01; the threshold is 7 s, so the playback time is 00:00:08.
- The threshold is measured in units of data time.
- Playback does not begin until the buffered data reaches the threshold value.
- Data is stored in the buffer at a variable rate, but it is extracted and played back at a fixed rate.
- The amount of data in the buffer shrinks or grows, but there is no jitter as long as the delay is less than the time needed to play back the threshold amount of data.
- The figure depicts the buffer at various times for our example, and a small simulation follows.

Playback Buffer
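
A hedged simulation of the buffer under the arrival pattern used earlier, assuming media streams in at 1 media-second per wall-clock second with gaps (delays of 1 s, 5 s, and 7 s) and a 7-s threshold:

# Playback-buffer sketch: media arrives with jitter; playback waits
# for 7 s of media, then drains at 1 media-second per second.
arriving = set(range(1, 11)) | set(range(15, 25)) | set(range(27, 37))
threshold, buffered, started = 7, 0, None

for t in range(1, 40):
    if t in arriving:
        buffered += 1                 # variable-rate arrival
    if started is None and buffered >= threshold:
        started = t                   # ~00:00:08 in the slide's counting
    if started is not None and buffered > 0:
        buffered -= 1                 # constant-rate playback

print("playback started at second", started)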

Ordering
- In addition to the time relationship and timestamps, one more feature is needed for real-time traffic.
- Each packet needs a sequence number.
- If a packet is lost, the timestamp alone will not tell the receiver.
- Suppose the timestamps are 0, 10, and 20.
- If the second packet is lost, the receiver receives only two packets, with timestamps 0 and 20.
- The receiver assumes that the packet with timestamp 20 is the second packet, produced 20 s after the first.
- The receiver has no way of knowing that the second packet was lost.
- A sequence number that orders the packets is needed to handle this situation, as the sketch below shows.
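
A minimal sketch, assuming each packet carries both fields; the timestamps alone look plausible, while the sequence numbers expose the gap:

# Loss detection: a gap in sequence numbers reveals the missing packet.
received = [{"seq": 1, "ts": 0}, {"seq": 3, "ts": 20}]   # seq 2 was lost

for prev, cur in zip(received, received[1:]):
    if cur["seq"] != prev["seq"] + 1:
        missing = cur["seq"] - prev["seq"] - 1
        print(missing, "packet(s) lost after sequence", prev["seq"])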

Multicasting
- Multicasting plays a key role in audio and video conferencing.
- Because the traffic can be heavy, the data is distributed using multicasting methods.
- Conferencing requires two-way communication between receivers and senders.

Translation
- A translator is a computer that can change the format of a high-bandwidth video signal into a lower-quality, narrow-bandwidth signal.
- This is needed, for example, when a source produces a high-quality video signal at 5 Mbps for a recipient whose bandwidth is less than 1 Mbps.
- To receive the signal, a translator must decode it and re-encode it at a lower quality that needs less bandwidth.

Mixing
- When more than one source can send data at the same time (as in a video or audio conference), the traffic consists of multiple streams.
- To converge the traffic into a single stream, data from the various sources can be mixed.
- A mixer mathematically combines the signals from different sources to create one single signal.

Support from Transport Layer Protocols
- Several of the features needed by real-time applications are best implemented in the transport layer protocol.
- Let us see which of the existing transport layer protocols are suitable for this type of traffic.
- The two main transport layer protocols are TCP and UDP. TCP is not suitable for interactive traffic.
- TCP supports neither timestamping nor multicasting.
- TCP's error control is also unsuitable for interactive traffic, because retransmission of a lost or corrupted packet is not wanted.
- Retransmission upsets the whole idea of timestamping and playback.
- Today's audio and video signals carry so much redundancy (even with compression) that we can simply ignore a lost packet.
- The listener or viewer at the remote site may not even notice it.
- UDP is better suited to interactive multimedia traffic.
- UDP supports multicasting and has no retransmission strategy.
- On the other hand, UDP provides no timestamping, sequencing, or mixing.
- These features are provided by an additional transport protocol, the Real-time Transport Protocol (RTP).
- So for interactive traffic, UDP is preferred over TCP, but we need the services of RTP on top of it to make up for UDP's shortcomings.

RTP (Real-time Transport Protocol)
- The Real-time Transport Protocol (RTP) is a protocol designed to handle real-time traffic on the Internet.
- RTP has no delivery mechanism of its own (no multicasting, port numbers, and so on).
- It must be used together with UDP; RTP sits between UDP and the application program.
- RTP's main contributions are timestamping, sequencing, and mixing facilities.
- RTP's position in the protocol suite is sketched in the figure.

RTP

RTP Packet Format
- The format is simple and general enough to cover a wide range of real-time applications.
- If an application needs additional information, it adds it to the beginning of its payload.
- The RTP packet header is shown in the figure.

RTP packet header format
- Ver (2 bits): Defines the version number. The current version is 2.
- P (1 bit): If set to 1, indicates the presence of padding at the end of the packet; the value of the last byte in the padding defines the length of the padding. If 0, there is no padding.
- X (1 bit): If set to 1, indicates an extra extension header between the basic header and the data; if 0, there is no extension header.
- Contributor count (4 bits): Gives the number of contributors. There can be a maximum of 15 contributors (0 to 15).
- M (1 bit): Used by the application as a marker, for example to indicate the end of its data.
- Payload type (7 bits): Gives the type of the payload. Several payload types are defined; Table 2.1 lists some of them and their applications.

Table 2.1: Payload types
- Sequence number (16 bits): Numbers the RTP packets. The sequence number of the first packet is chosen at random and is incremented by 1 for each subsequent packet. The receiver uses sequence numbers to detect lost or out-of-order packets.
- Timestamp (32 bits): Indicates the time relationship between packets. The timestamp of the first packet is a random number; the value for each subsequent packet is the preceding timestamp plus the time the first byte of the packet is produced.
- Synchronization source identifier (32 bits): If there is only one source, this field identifies it. If there are several sources, the mixer is the synchronization source and the other sources are contributors. The value of the source identifier is a random number chosen by the source.
- Contributor identifier (32 bits): Each of these 32-bit identifiers (up to 15 of them) identifies a source. When there is more than one source in a session, the mixer is the synchronization source and the remaining sources are the contributors.
- Although RTP is itself a transport layer protocol, the RTP packet is not encapsulated directly in an IP datagram. Instead, RTP is encapsulated in a UDP user datagram and treated like an application program.
- RTP has no well-known port assigned to it.
- The port can be selected on demand, with only one restriction: the port number must be an even number.
- The next (odd) number is used by RTP's companion, the Real-time Transport Control Protocol (RTCP).
- In short, RTP uses a temporary even-numbered UDP port. A header-packing sketch follows.
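
A hedged sketch that packs the 12-byte fixed header described above (version 2, no padding, no extension, no contributors); the payload type, sequence number, timestamp, and SSRC values are made up for illustration:

import struct

def rtp_header(payload_type, seq, timestamp, ssrc, marker=0):
    # Pack the 12-byte fixed RTP header with Ver=2, P=0, X=0, CC=0.
    byte1 = 2 << 6                    # version 2 in the top two bits
    byte2 = (marker << 7) | (payload_type & 0x7F)
    return struct.pack("!BBHII", byte1, byte2,
                       seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc)

hdr = rtp_header(payload_type=14,     # 14 = MPEG audio, a standard type
                 seq=1, timestamp=0, ssrc=0x12345678)
print(len(hdr), hdr.hex())            # 12 800e00010000000012345678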

RTCP (Real-time Transport Control Protocol)
- The Real-time Transport Control Protocol (RTCP) is a companion protocol whose messages control the flow and quality of the data and allow the recipient to send feedback to the source or sources.
- The figure depicts the five types of messages supported by RTCP; the number next to each box is the message type.

RTCP Message Types

Sender Report
- The sender report is sent periodically by the active senders in a conference to report transmission and reception statistics for all RTP packets sent during the interval.
- The sender report includes an absolute timestamp: the number of seconds elapsed since midnight on January 1, 1970.
- The absolute timestamp allows the receiver to synchronize different RTP messages.
- It is particularly important when both audio and video are transmitted.

Receiver Report
- The receiver report is for passive participants, those that do not send RTP packets.
- The report informs the sender and other receivers about the quality of service.

Source Description Message
- The source periodically sends a source description message to give additional information about itself.
- This information can include the name, e-mail address, telephone number, and address of the owner or controller of the source.

Bye Message
- A source sends a bye message to shut down a stream. It lets the source announce that it is leaving the conference.
- Although other sources can detect the absence of a source, this message is a direct announcement.

Application-Specific Message
- The application-specific message is a packet for applications that want to use new, application-defined messages. It allows the definition of new message types.

UDP Port
- RTCP, like RTP, uses a temporary port. RTCP uses an odd-numbered UDP port that follows the (even) port number selected for RTP.

Voice over IP
- Voice over IP, or Internet telephony, is a real-time interactive audio/video application.
- The idea is to use the Internet as a telephone network with some additional capabilities.
- The application allows two parties to communicate over the packet-switched Internet.
- Two protocols have been designed for this purpose: SIP and H.323.
- Both are discussed briefly here.

SIP (Session Initiation Protocol)
- The Session Initiation Protocol (SIP) is an application layer protocol created by the IETF.
- It establishes, manages, and terminates a multimedia session (call).
- It can be used to create two-party, multiparty, or multicast sessions.
- SIP is designed to be independent of the underlying transport layer; it can run over UDP, TCP, or SCTP.

Messages
- SIP, like HTTP, is a text-based protocol.
- SIP uses six messages, shown in the figure.
- Each SIP message contains a header and a body. The header consists of several lines that describe the structure of the message, the caller's capabilities, the media type, and other details.
- The SIP messages are as follows (a sample INVITE appears after this list):
  - INVITE: the caller initializes a session with the INVITE message.
  - ACK: after the callee answers the call, the caller sends an ACK message to confirm.
  - BYE: the BYE message terminates a session.
  - OPTIONS: the OPTIONS message queries a machine about its capabilities.
  - CANCEL: the CANCEL message cancels an initialization process that has already started.
  - REGISTER: the REGISTER message makes a connection when the callee is not available.
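
As a hedged illustration (the addresses, Call-ID, and tag are made up), a minimal SIP INVITE might look like the following; Via, From, To, Call-ID, and CSeq are standard SIP header lines:

INVITE sip:bob@example.com SIP/2.0
Via: SIP/2.0/UDP caller.example.org
From: sip:alice@example.org;tag=1928301774
To: sip:bob@example.com
Call-ID: a84b4c76e66710@caller.example.org
CSeq: 1 INVITE
Content-Type: application/sdp
Content-Length: ...

(body describing the media session)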

Addresses
- SIP is very flexible about addressing. An e-mail address, an IP address, a telephone number, or another type of address can identify the sender and receiver.
- The address must, however, be given in SIP format. Some common formats are shown in the figure.

SIP formats

SIP Session
- A simple SIP session consists of three modules: establishing, communicating, and terminating.
- The figure depicts a simple SIP session.

Establishing a Session
- Establishing a session in SIP requires a three-way handshake. To start communicating, the caller sends an INVITE message over UDP, TCP, or SCTP. If the callee agrees to start the session, she sends a reply message. The caller then sends an ACK message to confirm that the reply was received.

Communicating
- After the session is established, the caller and callee communicate using two temporary ports.

Terminating the Session
- Either party can terminate the session by sending a BYE message.

Tracking the Callee
- SIP has a mechanism (similar to DNS) for finding the IP address of the terminal at which the callee is sitting.
- SIP uses the concept of registration to perform this tracking.
- SIP designates some servers as registrars.
- At any moment, a user is registered with at least one registrar server, which knows the callee's IP address.
- When a caller needs to communicate with the callee, the caller can use the callee's e-mail address in the INVITE message instead of an IP address.
- The message goes through a proxy server.
- The proxy server sends a lookup message to the registrar server that holds the callee's information.
- When the proxy server receives a reply from the registrar server, it inserts the newly found IP address of the callee into the caller's INVITE message.
- This message is then delivered to the callee.
- The procedure is depicted in the figure.

H.323

Architecture
- H.323 is a standard developed by the ITU that allows telephones on the public telephone network to talk to computers (called terminals in H.323) connected to the Internet.
- The general architecture of H.323 is depicted in the figure.

H.323 Architecture
- A gateway is a device that connects the Internet to the telephone network.
- In general, a gateway is a five-layer device that can translate a message from one protocol stack to another.
- The H.323 gateway does exactly that: it transforms a telephone network message into an Internet message.
- The gatekeeper server on the local area network plays the same role as the registrar server in SIP.

Protocols
- H.323 uses a number of protocols to establish and maintain voice (or video) communication.
- These protocols are depicted in the figure.
- H.323 uses G.711 or G.723.1 for compression.
- It uses the H.245 protocol, which allows the parties to negotiate the compression method.
- The Q.931 protocol is used to establish and terminate connections.
- Another protocol, H.225 (RAS: Registration, Admission, and Status), is used for registration with the gatekeeper.
- Let us use a simple example to show how a terminal communicates with a telephone using H.323.
- Figure 2.27 depicts the steps a terminal takes to communicate with a telephone.
1. The terminal sends a broadcast message to the gatekeeper; the gatekeeper responds with its IP address.
2. The terminal and gatekeeper communicate using H.225 to negotiate bandwidth.
3. The terminal, gatekeeper, gateway, and telephone communicate using Q.931 to establish a connection.
4. The terminal, gatekeeper, gateway, and telephone communicate using H.245 to negotiate the compression method.
5. The terminal, gateway, and telephone exchange audio using RTP under the management of RTCP.
6. The terminal, gatekeeper, gateway, and telephone communicate using Q.931 to terminate the communication.
More Related Content

Similar to Multimedia.pdf

VII Compression Introduction
VII Compression IntroductionVII Compression Introduction
VII Compression Introductionsangusajjan
 
image basics and image compression
image basics and image compressionimage basics and image compression
image basics and image compressionmurugan hari
 
J03502050055
J03502050055J03502050055
J03502050055theijes
 
Image compression 14_04_2020 (1)
Image compression 14_04_2020 (1)Image compression 14_04_2020 (1)
Image compression 14_04_2020 (1)Joel P
 
International journal of signal and image processing issues vol 2015 - no 1...
International journal of signal and image processing issues   vol 2015 - no 1...International journal of signal and image processing issues   vol 2015 - no 1...
International journal of signal and image processing issues vol 2015 - no 1...sophiabelthome
 
Chapter 5 - Data Compression
Chapter 5 - Data CompressionChapter 5 - Data Compression
Chapter 5 - Data CompressionPratik Pradhan
 
Optical recording system
Optical recording systemOptical recording system
Optical recording systemTamilarasan N
 
Chapter 3- Media Representation and Formats.ppt
Chapter 3- Media Representation and Formats.pptChapter 3- Media Representation and Formats.ppt
Chapter 3- Media Representation and Formats.pptVasanthiMuniasamy2
 
Compressionbasics
CompressionbasicsCompressionbasics
CompressionbasicsRohini R Iyer
 
Video compressiontechniques&standards lamamahmoud_report#2
Video compressiontechniques&standards lamamahmoud_report#2Video compressiontechniques&standards lamamahmoud_report#2
Video compressiontechniques&standards lamamahmoud_report#2engLamaMahmoud
 
notes_Image Compression_edited.ppt
notes_Image Compression_edited.pptnotes_Image Compression_edited.ppt
notes_Image Compression_edited.pptHarisMasood20
 
Compression presentation 415 (1)
Compression presentation 415 (1)Compression presentation 415 (1)
Compression presentation 415 (1)Godo Dodo
 
MULTECH2 LESSON 5.pdf
MULTECH2 LESSON 5.pdfMULTECH2 LESSON 5.pdf
MULTECH2 LESSON 5.pdfRayCenteno1
 
Chapter 3 - Fundamental Concepts in Video and Digital Audio.ppt
Chapter 3 - Fundamental Concepts in Video and Digital Audio.pptChapter 3 - Fundamental Concepts in Video and Digital Audio.ppt
Chapter 3 - Fundamental Concepts in Video and Digital Audio.pptBinyamBekele3
 
Aiar. unit v. machine vision 1462642546237
Aiar. unit v. machine vision 1462642546237Aiar. unit v. machine vision 1462642546237
Aiar. unit v. machine vision 1462642546237Kunal mane
 

Similar to Multimedia.pdf (20)

B070306010
B070306010B070306010
B070306010
 
Mpeg 2
Mpeg 2Mpeg 2
Mpeg 2
 
Image compression and jpeg
Image compression and jpegImage compression and jpeg
Image compression and jpeg
 
VII Compression Introduction
VII Compression IntroductionVII Compression Introduction
VII Compression Introduction
 
image basics and image compression
image basics and image compressionimage basics and image compression
image basics and image compression
 
J03502050055
J03502050055J03502050055
J03502050055
 
Image compression 14_04_2020 (1)
Image compression 14_04_2020 (1)Image compression 14_04_2020 (1)
Image compression 14_04_2020 (1)
 
International journal of signal and image processing issues vol 2015 - no 1...
International journal of signal and image processing issues   vol 2015 - no 1...International journal of signal and image processing issues   vol 2015 - no 1...
International journal of signal and image processing issues vol 2015 - no 1...
 
Chapter 5 - Data Compression
Chapter 5 - Data CompressionChapter 5 - Data Compression
Chapter 5 - Data Compression
 
Optical recording system
Optical recording systemOptical recording system
Optical recording system
 
Chapter 3- Media Representation and Formats.ppt
Chapter 3- Media Representation and Formats.pptChapter 3- Media Representation and Formats.ppt
Chapter 3- Media Representation and Formats.ppt
 
HDTV
HDTVHDTV
HDTV
 
Compressionbasics
CompressionbasicsCompressionbasics
Compressionbasics
 
Video compressiontechniques&standards lamamahmoud_report#2
Video compressiontechniques&standards lamamahmoud_report#2Video compressiontechniques&standards lamamahmoud_report#2
Video compressiontechniques&standards lamamahmoud_report#2
 
Jpeg and mpeg ppt
Jpeg and mpeg pptJpeg and mpeg ppt
Jpeg and mpeg ppt
 
notes_Image Compression_edited.ppt
notes_Image Compression_edited.pptnotes_Image Compression_edited.ppt
notes_Image Compression_edited.ppt
 
Compression presentation 415 (1)
Compression presentation 415 (1)Compression presentation 415 (1)
Compression presentation 415 (1)
 
MULTECH2 LESSON 5.pdf
MULTECH2 LESSON 5.pdfMULTECH2 LESSON 5.pdf
MULTECH2 LESSON 5.pdf
 
Chapter 3 - Fundamental Concepts in Video and Digital Audio.ppt
Chapter 3 - Fundamental Concepts in Video and Digital Audio.pptChapter 3 - Fundamental Concepts in Video and Digital Audio.ppt
Chapter 3 - Fundamental Concepts in Video and Digital Audio.ppt
 
Aiar. unit v. machine vision 1462642546237
Aiar. unit v. machine vision 1462642546237Aiar. unit v. machine vision 1462642546237
Aiar. unit v. machine vision 1462642546237
 

Recently uploaded

The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
đŸŦ The future of MySQL is Postgres 🐘
đŸŦ  The future of MySQL is Postgres   🐘đŸŦ  The future of MySQL is Postgres   🐘
đŸŦ The future of MySQL is Postgres 🐘RTylerCroy
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 

Recently uploaded (20)

The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
đŸŦ The future of MySQL is Postgres 🐘
đŸŦ  The future of MySQL is Postgres   🐘đŸŦ  The future of MySQL is Postgres   🐘
đŸŦ The future of MySQL is Postgres 🐘
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 

Multimedia.pdf

  • 2. Objectives 2 ī‚¨ To understand the terminologies of multimedia systems. ī‚¨ To study protocols used in multimedia applications. ī‚¨ To know different hardware and software components required to run multimedia. ī‚¨ To evaluate multimedia services that satisfy user requirements.
  • 3. Introduction 3 ī‚¨ The way we utilize audio and video has evolved as a result of recent technological advancements. ī‚¨ In the past, we would listen to an audio broadcast on the radio and watch a video show on the television. ī‚¨ People nowadays desire to utilize the Internet for audio and video services in addition to text and image communications. ī‚¨ This chapter focuses on programs that provide audio and video services via the Internet.
  • 4. 4 ī‚¨ Audio and video services may be divided into three main categories:
  • 5. Streaming Stored Audio/Video 5 ī‚¨ The files are compressed and saved on a server using this approach. ī‚¨ The files are downloaded by a client through the Internet. ī‚¨ It is named On-demand audio/video. ī‚¨ E.g. ī‚¨ Stored audio files: songs, symphonies, books on tape, and popular lectures. ī‚¨ Stored video files: movies, TV shows, and music video clips.
  • 6. Streaming Live Audio/Video 6 ī‚¨ Streaming live audio/video refers to the broadcasting of radio and TV programs through the Internet. ī‚¨ A user listens to broadcast audio and video through the Internet. ī‚¨ E.g. Internet Radio. Some radio stations solely transmit their programming via the Internet, while others broadcast them both over the Internet and over the air.
  • 7. Interactive Audio/Video 7 ī‚¨ Interactive audio/video refers to the use of the Internet for interactive audio/video applications. ī‚¨ E.g. Internet telephony and Internet teleconferencing.
  • 8. Digitizing Audio and Video 8 ī‚¨ Before audio or video signals can be transmitted over the Internet, they must first be digitized. ī‚¨ Digitizing Audio ī‚¨ Digitizing Video
  • 9. Digitizing Audio 9 ī‚¨ When sound is supplied into a microphone, an electrical analog signal is produced that represents the amplitude of the sound as a function of time. ī‚¨ These signals are named analog audio signals.
  • 10. 10 ī‚¨ An analog signal, such as audio, can be digitized to produce a digital signal. ī‚¨ According to the Nyquist theorem, if the highest frequency of the signal is f, we need to sample the signal 21 times per second.
  • 11. Digitizing Video 11 ī‚¨ A video is made up of a series of frames. We receive the sensation of motion if the frames are presented on the screen quickly enough. ī‚¨ The reason for this is that our eyes cannot differentiate between the quickly flashing frames and individual frames.
  • 12. 12 ī‚¨ There is no standard for the number of frames per second; nevertheless, 25 frames per second is popular in North America. ī‚¨ A frame must be refreshed to avoid a situation known as flickering(change in brightness). ī‚¨ Each frame is repainted twice in the television industry. ī‚¨ This implies 50 frames must be delivered, or 25 frames if memory is available at the sender site, with each frame repainted from memory.
  • 13. 13 ī‚¨ Each frame is subdivided into picture elements, or pixels, which are tiny grids. ī‚¨ Each 8-bit pixel on black-and-white television represents one of 256 distinct grey levels. Each pixel on a color TV is 24 bits, with 8 bits for each basic color (red, green, and blue). ī‚¨ We can calculate the number of bits in 1s for a specific resolution. ī‚¨ A color frame with the lowest resolution is 1024 × 768 pixels. This equates to ī‚¨ 2 x 25 x 1024 x768x 24=944 Mbps.
  • 14. Audio and Video Compression 14 ī‚¨ Compression is required when sending audio or video over the Internet.
  • 15. 1. Audio Compression 15 ī‚¨ Speech and music may both benefit from audio compression. We need to compress a 64-kHz digitized signal for speech, and a 1.41 I-MHz signal for music. ī‚¨ There are two kinds of techniques for audio compression: ī‚¤ Predictive Encoding ī‚¤ Perceptual Encoding
  • 16. Predictive Encoding 16 ī‚¨ Instead of storing all of the sampled values, predictive encoding encodes the changes between the samples. ī‚¨ Speech compression is the most common use for this sort of compression. ī‚¨ GSM (13 kbps), G.729 (8 kbps), and G.723.3 are some of the standards that have been established (6.4 or 5.3 kbps).
  • 17. Perceptual Encoding: MP3 17 ī‚¨ The perceptual encoding approach is the most popular compression technique used to generate CD-quality audio. ī‚¨ This kind of audio requires at least 1.411 Mbps, which cannot be sent without compression via the Internet. ī‚¨ This method is used by MP3 (MPEG audio layer 3), which is part of the MPEG standard.
  • 18. 18 ī‚¨ Perceptual audio coding is a type of audio signal compression method that is based on human ear defects. ī‚¨ Perceptual encoding is based on the science of psychology concerned with the perception of sound and its physiological effects. ī‚¨ The concept is based on defects in our auditory system, which allows some sounds to hide other sounds. Masking can occur in both frequency and time.
  • 19. 19 ī‚¨ Frequency masking: A strong sound in one frequency band can partially or completely hide a lower sound in another frequency range, which is known as frequency masking. ī‚¨ E.g. We cannot hear the words of a person who is sitting beside us in a room where an Arkestra in loud sound is playing.
  • 20. 20 ī‚¨ Temporal masking: A loud sound can affect our hearing for a short period after it has ended in temporal masking.
  • 21. 21 ī‚¨ MP3 compresses audio signals by using frequency and temporal masking. MP3 has three different data rates: 96 kbps, 128 kbps, and 160 kbps. ī‚¨ The rate is determined by the frequency range of the original analog audio.
  • 22. Video Compression 22 ī‚¨ Video is comprised of multiple frames, and each of the frames is an image. ī‚¨ Video can be compressed by compressing the images. ī‚¨ The market is dominated by two standards: ī‚¤ Joint Photographic Experts Group (JPEG) and ī‚¤ Moving Picture Experts Group (MPEG). ī‚¨ Images are compressed using the Joint Photographic Experts Group (JPEG). ī‚¨ Video is compressed using the Moving Picture Experts Group (MPEG).
  • 23. Image Compression: JPEG 23 ī‚¨ In the grayscale picture, each pixel can be represented by an 8-bit integer (256 levels). ī‚¨ The picture is in color, each pixel can be represented by 24 bits (3 x 8 bits), with every 8 bits representing red, blue, or green (RBG).
• 24. 24 • In JPEG, a grayscale image is split into 8 × 8-pixel blocks. • The purpose of splitting the image into blocks is to reduce the amount of computation, because the number of mathematical operations grows with the square of the size of the unit being transformed.
  • 25. 25 Figure 2.2 JPEG grayscale
• 26. 26 • JPEG's entire concept is to convert the image into a linear (vector) set of numbers that reveals the redundancies. • The redundancies (lack of changes) can then be eliminated using one of the text compression methods.
• 28. Discrete Cosine Transform (DCT) 28 • In this phase, each block of 64 pixels is transformed using the discrete cosine transform (DCT). • The transformation changes the 64 values while preserving the relative relationships between pixels and revealing the redundancies. • We present the transformation outcomes for three different situations.
• 29. 29 • We present the transformation outcomes for three different situations. Case 1: Uniform Gray Scale. Case 2: Two Sections. Case 3: Gradient Gray Scale.
• 30. Case 1: Uniform Gray Scale 30 • In this case, we have a grayscale block in which every pixel has the value 20. • When we perform the transformation, we get a nonzero value for the first element (upper left corner); the remaining elements have a value of 0. • The value of T(0,0) is the average (multiplied by a constant) of the P(x,y) values and is called the dc value (direct current, a term borrowed from electrical engineering).
• 31. 31 • The remaining values are called ac values; each T(m,n) represents changes in the pixel values. As shown in the figure, the rest of the values are 0s; this can be verified numerically, as sketched below. Case 1: Uniform Gray Scale
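The uniform case is easy to verify. The sketch below applies a 2-D DCT to an 8 × 8 block of 20s using SciPy (the 2-D transform is obtained by applying the 1-D dct along each axis); under these assumptions only the dc coefficient T(0,0) comes out nonzero.

    import numpy as np
    from scipy.fftpack import dct

    block = np.full((8, 8), 20.0)    # Case 1: uniform gray scale

    # 2-D DCT: transform the columns, then the rows.
    T = dct(dct(block, axis=0, norm='ortho'), axis=1, norm='ortho')

    print(T[0, 0])                            # dc value: 160.0 (20 x 8)
    print(np.count_nonzero(np.round(T, 6)))   # 1 -> every ac value is 0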
• 32. Case 2: Two Sections 32 • In the second case, we have a block with two distinct uniform grayscale sections. • There is a sharp change in the pixel values (from 20 to 50). • When we perform the transformation, we get a dc value as well as nonzero ac values. • However, only a few values around the dc value are nonzero; as Figure 2.5 shows, the majority of the values are zero.
  • 33. 33 Case 2: Two Sections
• 34. Case 3: Gradient Gray Scale 34 • In the third case, we have a block that changes gradually. • That is, there is no sharp difference between the values of neighboring pixels. • When we perform the transformation, we get a dc value along with many nonzero ac values, as shown in the figure.
  • 35. 35 Case 3: Gradient Gray Scale
• 36. 36 From all the above cases we can conclude that: • The transformation creates table T from table P. • The dc value is the average value (multiplied by a constant) of the pixels. • The ac values are the changes. • A lack of change in neighboring pixels creates 0s.
• 37. Quantization 37 • Quantization is the process of reducing the number of bits needed to store an integer value by reducing its precision. • In its simplest form, quantization drops the fraction of a number and preserves only the integer part. • Here, each number is first divided by a constant and the fraction is then dropped. • This further reduces the number of bits required.
• 38. 38 • Most implementations use an 8 × 8 quantizing table to specify how to quantize each value. • The divisor depends on the value's position in the T table. • This is done to optimize the number of bits and the number of 0s for each specific application, as the sketch below illustrates.
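Continuing the SciPy example above, a minimal sketch of this step divides each DCT coefficient by the entry at the same position in a quantizing table and rounds the result. The uniform table used here is a placeholder; real JPEG implementations use carefully tuned tables with larger divisors at the high-frequency positions.

    import numpy as np

    Q = np.full((8, 8), 16)    # placeholder quantizing table

    def quantize(T, Q):
        # Rounding discards the fractions: the lossy, irreversible step.
        return np.round(T / Q).astype(int)

    def dequantize(Tq, Q):
        return Tq * Q          # only an approximation of the original T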
• 39. 39 • The quantizing step is the only part of the process that cannot be reversed. • Some information is lost and cannot be recovered. • It is because of this quantization phase that JPEG is called lossy compression.
• 40. Compression 40 • After quantization, the values are read from the table and redundant 0s are eliminated. • To cluster the 0s together, the table is read diagonally in a zigzag fashion rather than row by row or column by column. • The reason is that if the picture changes smoothly, the bottom right corner of the T table is all 0s. • The figure depicts the process of reading the table; a small sketch of the scan follows below.
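The zigzag order itself is easy to express in code. The sketch below walks an 8 × 8 table along its anti-diagonals, alternating direction as in the standard JPEG scan, and then drops the run of trailing 0s; the sample table values are invented for illustration.

    import numpy as np

    def zigzag(table):
        # Cells on the same anti-diagonal share i + j; alternate the
        # direction of travel on successive diagonals.
        n = table.shape[0]
        cells = sorted(((i, j) for i in range(n) for j in range(n)),
                       key=lambda p: (p[0] + p[1],
                                      p[1] if (p[0] + p[1]) % 2 == 0 else p[0]))
        return [table[i, j] for i, j in cells]

    Tq = np.zeros((8, 8), dtype=int)
    Tq[0, 0], Tq[0, 1], Tq[1, 0] = 40, 3, -2   # a smoothly changing block

    values = zigzag(Tq)               # dc value first, high frequencies last
    while values and values[-1] == 0:
        values.pop()                  # drop the trailing run of 0s
    print(values)                     # [40, 3, -2]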
• 42. Video Compression: MPEG 42 • A motion picture is a rapid sequence of frames, each of which is an image. • In other words, a frame is a spatial combination of pixels, whereas a video is a temporal combination of frames sent one after another. • Compressing video therefore means spatially compressing each frame and temporally compressing a set of frames.
• 43. 43 • Spatial Compression • The spatial compression of each frame is done with JPEG. Each frame is an image that can be compressed independently.
• 44. 44 Temporal Compression • In temporal compression, redundant frames are eliminated. • When we watch television, we receive 50 frames per second. • However, most consecutive frames are nearly identical. • E.g. when someone is speaking, most of the frame stays the same from one frame to the next, except for the region around the lips, which changes between frames; a simple frame-differencing sketch follows below.
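The following sketch captures that intuition; it is plain frame differencing, not the actual MPEG motion-compensation algorithm. When two frames are nearly identical, their difference is mostly zeros and therefore compresses very well.

    import numpy as np

    frame1 = np.zeros((8, 8), dtype=int)
    frame2 = frame1.copy()
    frame2[6, 3:5] = 9             # only a small region (the "lips") changed

    diff = frame2 - frame1         # mostly zeros: cheap to encode
    print(np.count_nonzero(diff))  # 2 changed pixels out of 64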
• 45. 45 • For temporal compression, the MPEG method divides frames into three types: ◦ I-frames ◦ P-frames ◦ B-frames
• 46. I-Frames (Intracoded Frames) 46 • An I-frame is an independent frame, not related to any other frame (not to the frame sent before or to the frame sent after). • I-frames are not constructed from other frames. • They appear at regular intervals (e.g., every ninth frame is an I-frame). • An I-frame must appear periodically to handle a sudden change in the picture that the preceding and following frames cannot show.
• 47. 47 • A viewer may tune in at any moment while a video is being shown. • If there were only one I-frame, at the start of the broadcast, viewers who tune in late would never receive a complete picture.
• 48. P-Frames (Predicted Frames) 48 • A P-frame is related to the preceding I-frame or P-frame. • Each P-frame contains only the changes from the preceding frame. • The changes cannot cover everything: e.g., if an object is moving quickly, the new changes may not all fit in a P-frame. P-frames can be constructed only from preceding I- or P-frames. • P-frames carry much less information than the other frame types and even fewer bits after compression.
• 49. B-Frames (Bidirectional Frames) 49 • A B-frame is related to the preceding and following I-frame or P-frame. Each B-frame is relative to the past and the future. Note that a B-frame is never related to another B-frame. • The figure depicts a sample frame sequence.
• 51. 51 • The figure depicts the construction of I-frames, P-frames, and B-frames from a series of seven frames. MPEG Frame Construction
• 52. Streaming Stored Audio/Video 52 • In this section, we discuss different approaches for downloading stored audio/video files for streaming from a web server.
• 53. First Approach: Using a Web Server 53 • A compressed audio/video file can be stored on a web server and treated like an ordinary text file. • To download the file, the client (browser) uses HTTP services and sends a GET message. • The web server sends the compressed file to the browser. • The browser can then play the file using a helper application, referred to as a media player. • This method is very simple and involves no streaming at all. • The method is depicted in Figure 2.10.
  • 54. 54 Using a Web Server
• 55. Drawbacks 55 This method has several drawbacks: • Even after compression, an audio/video file is usually quite large; a video file in particular may occupy many megabytes. • The file must be downloaded completely before it can be played. • With today's data rates, the user may have to wait seconds or even tens of seconds before the file can be played.
• 56. Second Approach: Using a Web Server with Metafile 56 • In this approach, the media player connects directly to the web server to download the audio/video file. • The web server stores both the audio/video file and a metafile containing information about it. • The steps in this approach are depicted in the figure.
  • 57. 57 Using a Web Server with a Metafile
• 58. 58 1. The HTTP client accesses the Web server by using the GET message. 2. The metafile comes in the response. 3. The metafile is passed to the media player. 4. The media player uses the URL in the metafile to access the audio/video file. 5. The Web server responds.
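As a concrete illustration, a metafile can be as small as a single line giving the location of the actual media; the file name and URL below are invented for the example.

    http://media.example.com/lectures/clip.mp3

The browser downloads this tiny file quickly and hands it to the media player, which then fetches the much larger media file itself.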
• 59. Third Approach: Using a Media Server 59 • The issue with the second approach is that both the browser and the media player use HTTP services. • HTTP is designed to run over TCP. • TCP is appropriate for retrieving the metafile, but not the audio/video file itself. • The reason is that TCP retransmits a lost or damaged segment, which goes against the philosophy of streaming.
• 60. 60 • We need to dismiss TCP and its error control and use UDP instead. • However, HTTP, which accesses the Web server, and the Web server itself are designed for TCP. • We therefore need another server, a media server, to handle the audio/video files.
• 62. 62 1. The HTTP client accesses the Web server by using a GET message. 2. The metafile comes in the response. 3. The metafile is forwarded to the media player. 4. The media player uses the URL in the metafile to access the media server and download the file. 5. The media server responds.
• 63. Fourth Approach: Using a Media Server and RTSP 63 • The Real-Time Streaming Protocol (RTSP) is a control protocol designed to add more functionality to the streaming process. • Using RTSP, we can control the playback of audio/video. • RTSP is an out-of-band control protocol, similar in spirit to FTP's second (control) connection. • A media server and RTSP are depicted in the figure.
  • 64. 64 Using a Media Server and RTSP
• 65. 65 1. The HTTP client accesses the Web server by using a GET message. 2. The metafile comes in the response. 3. The metafile is passed to the media player. 4. The media player sends a SETUP message to create a connection with the media server. 5. The media server responds.
• 66. 66 6. The media player sends a PLAY message to start playing (downloading). 7. The audio/video file is downloaded using another protocol that runs over UDP. 8. The connection is broken by means of the TEARDOWN message. 9. The media server responds.
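Because RTSP is text based, this control exchange can be sketched directly. The messages below form a hypothetical minimal session (the host name, path, client ports, and session ID are invented, and real servers add more headers); note the even/odd client port pair reserved for RTP and RTCP.

    SETUP rtsp://media.example.com/clip RTSP/1.0
    CSeq: 1
    Transport: RTP/AVP;unicast;client_port=8000-8001

    PLAY rtsp://media.example.com/clip RTSP/1.0
    CSeq: 2
    Session: 12345678

    TEARDOWN rtsp://media.example.com/clip RTSP/1.0
    CSeq: 3
    Session: 12345678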
• 67. Streaming Live Audio/Video 67 • Streaming live audio/video is similar to the broadcasting of audio and video by radio and TV stations. • The difference is that the station broadcasts over the Internet instead of over the air.
• 68. 68 • Streaming stored audio/video and streaming live audio/video are both sensitive to delay, and neither can accept retransmission. • There is a distinction, however. • In the first application, the communication is unicast and on-demand. • In the second, the communication is multicast and live.
• 69. 69 • Live streaming is better suited to the multicast services of IP and to protocols such as UDP and RTP. • In practice, however, live streaming still mostly uses TCP and multiple unicast connections rather than multicasting.
• 70. Real-Time Interactive Audio/Video 70 • In real-time interactive audio/video, people communicate with one another in real time. • E.g. Internet telephony (voice over IP) and video conferencing.
• 71. Characteristics 71 • We discuss several characteristics of real-time audio/video communication: 1. Time Relationship 2. Timestamp 3. Playback Buffer 4. Ordering 5. Multicasting 6. Translation 7. Mixing
• 72. Time Relationship 72 • Real-time data on a packet-switched network requires the preservation of the time relationship between the packets of a session. • For example, let us assume that a real-time video server creates live video images and sends them over the Internet. • The video is digitized and packetized. • There are only three packets, and each packet holds 10 s of video information.
• 74. 74 • But what if the packets arrive with different delays? • Assume the ◦ first packet arrives at 00:00:01 (1-s delay), ◦ the second at 00:00:15 (5-s delay), ◦ and the third at 00:00:27 (7-s delay). • If the receiver begins to play the first packet at 00:00:01, it will finish at 00:00:11. • The next packet, however, has not yet arrived; it arrives 4 seconds later.
• 75. 75 • As the video is viewed at the remote site, there is a gap between the first and second packets and between the second and third. • This phenomenon is referred to as jitter. • Jitter in real-time data is caused by varying delays between packets. • The situation is depicted in the figure.
• 76. 76 • Recall the example: a real-time video server generates live video images and distributes them over the Internet. • The video is digitized and packetized. • There are only three packets, and each packet contains 10 s of video data. • The first packet starts at 00:00:00, the second at 00:00:10, and the third at 00:00:20.
• 77. 77 • Assume that each packet takes 1 second to reach its destination (equal delay). • The first packet can then be played back at 00:00:01, the second at 00:00:11, and the third at 00:00:21. • Although there is a 1-s difference between what the server sends and what the client sees on the screen, the action is still happening in real time. • The time relationship between the packets is preserved; the 1-s delay is not important.
• 79. Timestamp 79 • To prevent jitter, we can timestamp the packets and separate the arrival time from the playback time. • If each packet carries a timestamp showing the time it was produced relative to the first (or previous) packet, the receiver can add this time to the time at which it begins playback.
• 80. 80 • In other words, the receiver knows when each packet is to be played. • Consider the previous example, where the first packet has a timestamp of 0, the second a timestamp of 10, and the third a timestamp of 20. • The receiver then plays the first packet at 00:00:08, the second at 00:00:18, and the third at 00:00:28. • There are no gaps between the packets. The situation is depicted in the figure.
• 82. Playback Buffer 82 • To be able to separate the arrival time from the playback time, we need a buffer to store the data until they are played back. • The buffer is known as a playback buffer. • When a session begins (the first bit of the first packet arrives), the receiver delays playing the data until a threshold is reached. • In the preceding example, the first bit of the first packet arrives at 00:00:01; the threshold is 7 s, so the playback time is 00:00:08. • The threshold is measured in units of data time. • Replay does not begin until the amount of buffered data reaches the threshold.
• 83. 83 • Data are stored in the buffer at a (possibly variable) arrival rate but extracted and played back at a fixed rate. • The amount of data in the buffer shrinks or expands, but as long as the delay is less than the time needed to play back the threshold amount of data, there is no jitter. • The figure depicts the buffer at different times for our example; a small scheduling sketch follows below.
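A minimal sketch of this timestamp-plus-threshold logic, using the numbers from the example (a 7-s threshold and a first arrival at t = 1 s); the structure and names are illustrative only.

    # Packets as (timestamp, arrival_time) pairs in seconds, from the example.
    packets = [(0, 1), (10, 15), (20, 27)]

    threshold = 7                                 # seconds buffered before playback
    playback_start = packets[0][1] + threshold    # 1 + 7 = 8, i.e. 00:00:08

    for timestamp, arrival in packets:
        play_at = playback_start + timestamp      # 8, 18, 28: no gaps
        assert arrival <= play_at, "packet arrived too late to play"
        print(f"timestamp {timestamp:2d}: play at t = {play_at} s")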
• 85. Ordering 85 • For real-time traffic, one more feature is needed in addition to the time relationship and timestamps. • Each packet needs a sequence number. • A timestamp alone cannot inform the receiver that a packet has been lost. • Suppose the timestamps are 0, 10, and 20. • If the second packet is lost, the receiver receives only two packets, with timestamps 0 and 20.
• 86. 86 • The receiver assumes that the packet with timestamp 20 is the second packet, produced 20 seconds after the first. • The receiver has no way of knowing that the second packet was lost. • A sequence number that orders the packets is needed to handle this situation.
• 87. Multicasting 87 • Multimedia plays a primary role in audio and video conferencing. • Because the traffic can be heavy, the data are distributed using multicasting methods. • Conferencing requires two-way communication between receivers and senders.
• 88. Translation 88 • A translator is a computer that can change the format of a high-bandwidth video signal to a lower-quality, narrow-bandwidth signal. • This is needed, for example, when a source creates a high-quality video signal at 5 Mbps and sends it to a recipient whose bandwidth is less than 1 Mbps. • To receive the signal, a translator is needed to decode it and re-encode it at a lower quality that needs less bandwidth.
• 89. Mixing 89 • If several sources can send data at the same time (as in a video or audio conference), the traffic consists of multiple streams. • To converge the traffic into a single stream, data from the different sources can be mixed. • A mixer mathematically adds signals coming from different sources to create one single signal.
• 90. Support from Transport Layer Protocol 90 • Some of the procedures used in real-time applications are better implemented in the transport layer protocol. • Let us consider which of the existing transport layer protocols is suitable for this type of traffic.
• 91. 91 • TCP and UDP are the two main transport layer protocols. TCP is not suitable for interactive traffic. • It has no provision for timestamping and does not support multicasting. • TCP's error control mechanism also makes it unsuitable for interactive traffic, since retransmission of a lost or corrupted packet is not wanted. • Retransmission upsets the whole idea of timestamping and playback. • Today's audio and video signals contain so much redundancy (even with compression) that we can simply ignore a lost packet. • The listener or viewer at the remote site may not even notice it.
• 92. 92 • UDP is more suitable for interactive multimedia traffic. • UDP supports multicasting and has no retransmission strategy. • However, UDP does not provide timestamping, sequencing, or mixing. • These features are provided by the Real-time Transport Protocol (RTP). • For interactive traffic, UDP is therefore preferable to TCP, • but we need the services of RTP, another transport layer protocol, to make up for UDP's deficiencies.
• 93. RTP (Real-time Transport Protocol) 93 • The Real-time Transport Protocol (RTP) is the protocol designed to handle real-time traffic on the Internet. • RTP does not have a delivery mechanism (multicasting, port numbers, and so on). • It must be used together with UDP. RTP stands between UDP and the application program. • RTP's main contributions are timestamping, sequencing, and mixing facilities. • RTP's position in the protocol suite is sketched in the figure.
• 95. RTP Packet Format 95 • The format is very simple, yet general enough to cover a wide range of real-time applications. • If an application needs additional information, it adds it to the beginning of its payload. • The RTP packet header is shown in the figure.
• 97. 97 • Ver (2 bits): defines the version number. The current version is 2. • P (1 bit): if set to 1, indicates the presence of padding at the end of the packet; the value of the last byte in the padding defines the length of the padding. There is no padding if the P field is 0. • X (1 bit): if set to 1, indicates an extra extension header between the basic header and the data; if set to 0, there is no extra extension header.
• 98. 98 • Contributor Count (4 bits): gives the number of contributors. There can be a maximum of 15 contributors (values 0 to 15). • M (1 bit): used by the application as a marker, for example to indicate the end of its data. • Payload Type (7 bits): gives the type of the payload. Several payload types are defined; Table 2.1 describes some of them and their applications.
• 100. 100 • Sequence Number (16 bits): numbers the RTP packets. The sequence number of the first packet is chosen at random and is incremented by 1 for each subsequent packet. The receiver uses the sequence numbers to detect lost or out-of-order packets. • Timestamp (32 bits): indicates the time relationship between packets. The timestamp of the first packet is a random number; for each subsequent packet, the value is the preceding timestamp plus the time during which the first byte is produced (sampled).
• 101. 101 • Synchronization Source Identifier (32 bits): if there is only one source, this field defines that source. If there are several sources, the mixer is the synchronization source and the other sources are contributors. The value of the source identifier is a random number chosen by the source. • Contributor Identifier (32 bits): each of these 32-bit identifiers (a maximum of 15) defines a contributing source. When there is more than one source in a session, the mixer is the synchronization source and the remaining sources are the contributors.
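To tie the fields together, here is a hedged sketch that packs the fixed 12-byte RTP header (version 2, no padding, no extension, no contributors) with Python's struct module. The payload type, sequence number, timestamp, and SSRC values are arbitrary examples.

    import struct

    version, padding, extension, cc = 2, 0, 0, 0
    marker, payload_type = 0, 0        # e.g. payload type 0 (PCM mu-law audio)
    seq, timestamp, ssrc = 1234, 160, 0x12345678

    byte0 = (version << 6) | (padding << 5) | (extension << 4) | cc
    byte1 = (marker << 7) | payload_type

    # Network byte order: two single bytes, one 16-bit and two 32-bit fields.
    header = struct.pack('!BBHII', byte0, byte1, seq, timestamp, ssrc)
    print(len(header))                 # 12 bytes: the fixed RTP header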
• 102. 102 • Although RTP is itself a transport layer protocol, an RTP packet is not encapsulated directly in an IP datagram. Instead, RTP is encapsulated in a UDP user datagram and treated like an application program. • No well-known port is assigned to RTP. • The port can be selected on demand, with only one restriction: the port number must be even. • The next (odd) number is used by RTP's companion, the Real-time Transport Control Protocol (RTCP). • In short, RTP uses a temporary even-numbered UDP port.
• 103. RTCP (Real-time Transport Control Protocol) 103 • The Real-time Transport Control Protocol (RTCP) provides messages that control the flow and quality of the data and allow the recipient to send feedback to the source or sources. • The figure depicts the five types of messages supported by RTCP; the number next to each box is the message type.
• 105. Sender Report 105 • The sender report is sent periodically by the active senders in a conference to report transmission and reception statistics for all RTP packets sent during the interval. • The sender report includes an absolute timestamp, the number of seconds elapsed since midnight on January 1, 1970. • The absolute timestamp allows the receiver to synchronize different RTP messages. • It is particularly important when both audio and video are transmitted.
• 106. Receiver Report 106 • The receiver report is for passive participants, those that do not send RTP packets. • The report informs the sender and other receivers about the quality of service.
• 107. Source Description Message 107 • The source periodically sends a source description message to give additional information about itself. • This information can include the name, e-mail address, telephone number, and address of the owner or controller of the source.
• 108. Bye Message 108 • A source sends a bye message to shut down a stream. It allows the source to announce that it is leaving the conference. Although other sources can detect the absence of a source, this message is a direct announcement.
• 109. Application-Specific Message 109 • The application-specific message is a packet for an application that wants to use new, application-specific features. It allows the definition of new message types.
• 110. 110 • UDP Port • RTCP, like RTP, uses a temporary port. RTCP uses an odd-numbered UDP port number that follows the port number selected for RTP.
• 111. Voice Over IP 111 • Voice over IP, or Internet telephony, is a real-time interactive audio/video application. • The idea is to use the Internet as a telephone network with some additional capabilities. • The application allows two parties to communicate over a packet-switched Internet. • Two protocols have been designed to handle this type of communication: SIP and H.323. • They are discussed briefly here.
• 112. SIP (Session Initiation Protocol) 112 • The Session Initiation Protocol (SIP) is an application layer protocol designed by the IETF. • It establishes, manages, and terminates a multimedia session (call). • It can be used to create two-party, multiparty, or multicast sessions. • SIP is designed to be independent of the underlying transport layer; it can run on UDP, TCP, or SCTP.
• 113. Messages 113 • SIP, like HTTP, is a text-based protocol. • SIP uses six messages, shown in the figure.
• 114. 114 • Each SIP message has a header and a body. The header consists of several lines that describe the structure of the message, the caller's capability, media type, and other details. • The SIP messages are described as follows: • INVITE: the caller initializes a session with the INVITE message. • ACK: after the callee answers the call, the caller sends an ACK message for confirmation. • BYE: the BYE message terminates a session. • OPTIONS: the OPTIONS message queries a machine about its capabilities. • CANCEL: the CANCEL message cancels an initialization process that has already started. • REGISTER: the REGISTER message makes a connection when the callee is not available.
• 115. Addresses 115 • SIP is a very flexible protocol. In SIP, an e-mail address, an IP address, a telephone number, or another type of address can be used to identify the sender and receiver. • The address must, however, be in SIP format. Some common formats are shown in the figure.
• 117. SIP Session 117 • A simple SIP session comprises three modules: establishing, communicating, and terminating. • The figure depicts a simple SIP session.
• 118. 118 • Establishing a Session • Establishing a session in SIP requires a three-way handshake. The caller sends an INVITE message, using UDP, TCP, or SCTP, to begin the communication. If the callee is willing to start the session, she sends a reply message. To confirm that a reply code has been received, the caller sends an ACK message. A hypothetical INVITE is sketched below.
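The text below shows what such an INVITE might look like. The addresses, branch and tag values, and Call-ID are invented for the example, and a real message would normally carry more headers plus an SDP body describing the media.

    INVITE sip:bob@example.com SIP/2.0
    Via: SIP/2.0/UDP caller.example.org;branch=z9hG4bK776asdhds
    From: Alice <sip:alice@example.org>;tag=1928301774
    To: Bob <sip:bob@example.com>
    Call-ID: a84b4c76e66710
    CSeq: 1 INVITE
    Contact: <sip:alice@caller.example.org>
    Content-Length: 0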
• 119. 119 • Communicating • After the session has been established, the caller and callee can communicate using two temporary ports. • Terminating the Session • The session can be terminated by either party with a BYE message.
• 120. Tracking the Callee 120 • SIP has a mechanism (similar to DNS) that finds the IP address of the terminal at which the callee is sitting. • To perform this tracking, SIP uses the concept of registration. • SIP defines some servers as registrars. • At any moment a user is registered with at least one registrar server; this server knows the IP address of the callee.
• 121. 121 • When a caller needs to communicate with the callee, the caller can use the e-mail address instead of the IP address in the INVITE message. • The message goes to a proxy server. • The proxy server sends a lookup message to the registrar server that has information about the callee. • When the proxy server receives a reply message from the registrar server, it inserts the newly discovered IP address of the callee into the caller's INVITE message. • The message is then sent to the callee. • The procedure is depicted in the figure.
  • 122. 122
• 123. H.323 123 • Architecture • H.323 is a standard designed by the ITU to allow telephones on the public telephone network to talk to computers (called terminals in H.323) connected to the Internet. • The general architecture of H.323 is depicted in the figure.
• 125. 125 • A gateway connects the Internet to the telephone network. • A gateway is a five-layer device that can translate a message from one protocol stack to another. • The gateway here does exactly that: • it transforms a telephone network message into an Internet message. • The gatekeeper server on the local area network plays the role of the registrar server discussed under SIP.
• 126. Protocols 126 • H.323 uses a number of protocols to establish and maintain voice (or video) communication. • These protocols are depicted in the figure.
• 127. 127 • H.323 uses G.711 or G.723.1 for compression. • It uses the H.245 protocol, which allows the parties to negotiate the compression method. • The Q.931 protocol is used to establish and terminate connections. • Another protocol, H.225 (RAS: Registration/Administration/Status), is used for registration with the gatekeeper.
  • 128. 128
• 129. 129 • Let us use a simple example to show the operation of telephone communication using H.323. • Figure 2.27 depicts the steps a terminal takes to communicate with a telephone: 1. The terminal sends a broadcast message to the gatekeeper. The gatekeeper responds with its IP address. 2. The terminal and gatekeeper communicate using H.225 to negotiate bandwidth. 3. The terminal, gatekeeper, gateway, and telephone use Q.931 to set up a connection.
• 130. 130 4. The terminal, gatekeeper, gateway, and telephone use H.245 to negotiate the compression method. 5. The terminal, gateway, and telephone exchange audio using RTP, under the management of RTCP. 6. The terminal, gatekeeper, gateway, and telephone use Q.931 to terminate the communication.
• 131. References 131 1. Behrouz A. Forouzan, Data Communications and Networking, 4th/5th edition, McGraw Hill. 2. Andrew S. Tanenbaum, Computer Networks, 4th/5th edition, Pearson Education. 3. William Stallings, Cryptography and Network Security: Principles and Practice, 7th edition, Pearson Education. 4. William Stallings, Network Security Essentials: Applications and Standards (For VTU), 3rd edition, Pearson Education.