Migrating Visual Communications from H.323 to SIP


White Paper
by Stefan Karapetkov



April 2, 2008




Introduction
The H.323 protocol was developed by the International Telecommunication Union (ITU) - an
international standardization body based in Geneva, Switzerland - with video conferencing in
mind, and most traditional video conferencing systems are based on H.323. However, the
convergence of voice, video, and data into what is often referred to as Unified
Communications (UC) has a dramatic impact on how people use video, and presents a new set of
requirements for solutions in the emerging visual communications market.

In order to meet these new requirements, Polycom is working on a seamless migration from
H.323 to the Session Initiation Protocol (SIP). This process will take a long time, and H.323 and
SIP will coexist in customer networks for years to come.

The content of this paper is based on Polycom’s presentation ‘A New Paradigm for SIP-based
Video Communications’ at the International SIP Conference in Paris (January 29 – February 1,
2008). The paper provides an overview of H.323 and SIP, and compares the two protocols. The
paper also makes references to specific technologies that Polycom is deploying to guarantee
smooth migration of the installed customer base from H.323 to SIP.

Visual Communications Market
Polycom envisions a dramatically different marketplace for video in the years ahead. Social,
economic, and technological trends are aligning to create a unique opportunity for new and
innovative forms of visual communication. This combination of factors will bring video into the
mainstream and make visual communication essential in both our personal and professional lives.
Polycom calls this transformation VC2.

Visual communications today include applications such as telepresence which provides an
immersive experience for users, group video conferencing which is now available with High-
Definition audio and video and provides a new level of user experience, as well as personal video
which brings visual communication to the individual user’s desktop or project space. Figure 1 is
an overview of these applications.




POLYCOM, Inc.                                                                                     1
Figure 1: Enterprise Video Communications Today

While dedicated personal video systems integrate monitor, camera, microphones, speakers, and
codec into one and are optimized for video communication, soft clients rely on the PC video and
voice processing capabilities.

Today, visual communications solutions are widely deployed in education, medical, and
government organizations. Deployments in general enterprises were recently revitalized as a
result of travel restrictions and green policies.

Market Trends
Two major market trends are driving the visual communications market. The first trend is the
shift from reserved to on-demand conferencing. Both audio and video conferencing started as
scheduled events with reserved resources, e.g. ports on the Multipoint Conferencing Unit (MCU)
and bandwidth, e.g. B channels in the ISDN network. Audio conferencing made the transition to
reservation-less, operator-less systems and is now 96% on-demand. Figure 2 summarizes the
trend to on-demand conferencing.




Figure 2: Trend from Reserved to On-demand Conferencing

Video has stayed scheduled for longer, and even today, about 80% of video conferencing is
scheduled. However, there is a clear trend to on-demand video, and strong indicators that future
conferencing will be even richer and more flexible - with presence integration and a growing
number of ways to access the services, e.g. from desktop computers and mobile phones.

Looking at this trend on a higher level, reserved operator-attended services are becoming
presence-enabled customer-initiated services. Note that audio conferencing is running ahead of
video conferencing – with higher desktop/mobile penetration and a higher percentage of
on-demand conferences.

This trend has a huge impact on the choice of communication protocols in visual communication
systems. It requires more scalability because desktop video drives up the number of users.
It also requires that new features such as presence and instant messaging be seamlessly
integrated with audio, video and content.

The second major market trend is from overlay video networks to unified collaboration. Video
systems have been deployed as overlay networks (over the organization’s IP network) for years,
and video has been a stand-alone application, separate from the mainstream IT applications.
Video also required separate management tools and directories, and has in general been poorly
connected to the rest of the IT infrastructure. With the emergence of the Unified Communications concept,
enterprises, service providers and other organizations started morphing their voice, video, and
data communication systems into one. Figure 3 describes the trend towards Unified
Communications.




Figure 3: Trend towards Unified Collaboration

This trend creates an interesting technical challenge. Telephony call control servers have started
the migration from proprietary protocols to standard SIP, and there are already a large number of
standards-based implementations, some of them open source. Even the remaining proprietary IP-
PBX systems on the market provide some level of SIP interoperability and allow third-party
equipment to connect to the IP-PBX, or even control it.

Many Presence and Instant Messaging systems support SIP via the SIP for Instant Messaging and
Presence Leveraging Extensions (SIMPLE) protocol. Other implementations are based on the
eXtensible Messaging and Presence Protocol (XMPP).

Enterprise video today is mostly H.323-based, although video endpoints, video soft clients and
even MCUs support basic SIP connectivity. For example, all Polycom endpoints can run in SIP
mode, while conference servers such as Polycom RMX 2000 and MGC support SIP, H.323,
H.320, etc.

The technical challenge that UC poses is how to connect all of the elements in Figure 3 into one
system that provides the full range of services to users. Based on the current state of the
networking technology, SIP is the most functional common denominator that could interconnect
the different applications within the organization.




H.323 Basics
In order to compare SIP and H.323, we will need a brief description of the H.323 protocol. H.323
is an umbrella signaling protocol, i.e. it refers to a set of other protocols such as H.225 and H.245
which are known as ‘the H.323 family of protocols’. H.323 was originally defined for multimedia
communications and perfectly fits the video conferencing application because it had from the
very beginning mechanisms for audio and video call setup. It also has the so-called capability
exchange procedure (often referred to as CAPS) that is very important for finding communication
parameters acceptable for both communication sides, as well as a master-slave determination
mechanism that is very useful when MCUs are involved in the communication.

H.323 is optimized for machine communication. It uses ASN.1 notation, and the H.323
messages are encoded in binary form using the Packed Encoding Rules (PER). This means that
captured H.323 messages cannot be read without a protocol decoder.

H.323 Elements and Call Flow

H.323 defines H.323 Terminals which can initiate or receive calls and H.323 Gatekeepers which
register H.323 terminals, provide call admission control, and call routing. Gatekeepers can be
very simple or very complex – depending on how many of the optional functions in H.323 they
implement. H.323 also defines Gateways to other networks, e.g. H.320/ISDN. While gateways
are optional in H.323, they play a central role when migration to H.323 (e.g. from H.320/ISDN to
H.323) or from H.323 (e.g. to SIP) is required. Since the topic of this paper is migration from
H.323 to SIP, we will discuss the H.323-SIP gateway in more detail later in this paper. Figure 4
looks at the interaction of the two critical and mandatory elements in the H.323 network:
Terminals and Gatekeeper.




Figure 4: H.323 Basic Call Flow

H.323 describes the call setup procedure, and refers to the H.225 and H.245 protocols for
signaling message formats and some additional functions. The signaling messages are described
in H.225. The H.225 SETUP message includes information about the source, i.e. who is sending
the message (in Figure 4, this is Terminal A) and about the destination (Terminal B). The
Gatekeeper then uses this information to locate the destination (Terminal B).

After receiving the SETUP message, Terminal B stores the information about the request (IP
addresses, port numbers, etc.), and sends back the CONNECT message. The most important
information in the CONNECT message is about the setup of an H.245 control channel, which is
used for three main functions: capability exchange (CAPS), master-slave determination (MS),
and opening logical channels (OLC), i.e. creating media streams for audio, video and content.

H.245 Terminal Capability Exchange is a procedure for exchanging preferred codecs and settings
between the two H.323 terminals. For example, Terminal A may suggest H.264 or H.263 video
and Siren 22 Stereo or Siren 14 Mono audio, and Terminal B may respond that it only
supports H.263 and Siren 14. Once both sides agree on common parameters the ‘conversation’
moves to its next phase - H.245 Master Slave Determination - which is useful for avoiding
conflicts during call control operations. H.245 Master Slave Determination is very important
when an H.323 Terminal connects to an MCU (the MCU is the master), and when one MCU
connects to another MCU through a so-called ‘cascading’ – in this case one of the MCUs has to
be the master.
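
The outcome of the capability exchange in the example above can be illustrated with a short Python sketch. It is a simplified model, not the H.245 message format: the function name, the dictionaries, and the rule "take the offerer's first choice that the other side also supports" are assumptions made for illustration only.

```python
# Simplified, hypothetical model of a capability exchange result.
# Each terminal lists its codecs in preference order; the agreed codec is
# the first one offered by Terminal A that Terminal B also supports.
# This is NOT the H.245 TerminalCapabilitySet encoding.

def negotiate(offered, supported):
    """Return the first codec in `offered` that also appears in `supported`."""
    supported_set = set(supported)
    for codec in offered:
        if codec in supported_set:
            return codec
    return None  # no common codec, so this media type cannot be opened


terminal_a = {
    "video": ["H.264", "H.263"],
    "audio": ["Siren 22 Stereo", "Siren 14 Mono"],
}
terminal_b = {
    "video": ["H.263"],
    "audio": ["Siren 14 Mono"],
}

agreed = {
    media: negotiate(terminal_a[media], terminal_b[media])
    for media in terminal_a
}
print(agreed)
```

With the codec lists from the example, the sketch settles on H.263 video and Siren 14 Mono audio.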

After capabilities have been exchanged and connection master determined, the H.245 Open
Logical Channel Request procedure creates media channels (voice, video, or content/data)
between the communication parties. Note that these channels are always created in pairs, i.e. the
video channel from Terminal A to Terminal B is different and separate from the video channel
from Terminal B to Terminal A. Therefore, communication can be asymmetric: Terminal A can
send high quality video to B, and receive lower quality video from B, and vice versa.

The H.245 control channel is also used to transmit the Flow Control command, which is used by the
receiver to set an upper limit for the transmitter bit rate on any logical channel, and the Fast
Update command, which is used by the receiver to request a fresh full video frame after frames were lost
in the transmission.

Audio streams and video streams are transmitted via the Real-time Transport Protocol (RTP, RFC 3550),
and for each RTP stream there is an associated RTP Control Protocol (RTCP, also RFC
3550) channel which is used to periodically transmit control packets to participants in a
multimedia session. The primary function of RTCP is to provide feedback on the quality of
service being provided by RTP.
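
To make the RTP side more concrete, the following Python sketch packs and parses the 12-byte fixed RTP header defined in RFC 3550. It is a minimal illustration: it assumes version 2 with no padding, no header extension and no CSRC entries, and the payload type, sequence number, timestamp and SSRC values are arbitrary examples.

```python
import struct

# Fixed RTP header layout (RFC 3550): one byte V/P/X/CC, one byte M/PT,
# 16-bit sequence number, 32-bit timestamp, 32-bit SSRC (network byte order).
RTP_HEADER = struct.Struct("!BBHII")


def pack_rtp_header(payload_type, seq, timestamp, ssrc, marker=False):
    """Build a fixed header with V=2, no padding, no extension, no CSRCs."""
    byte0 = 2 << 6
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return RTP_HEADER.pack(byte0, byte1, seq, timestamp, ssrc)


def parse_rtp_header(data):
    """Decode the first 12 bytes of an RTP packet into a dictionary."""
    byte0, byte1, seq, timestamp, ssrc = RTP_HEADER.unpack(data[:12])
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "sequence": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


# Example values are arbitrary; 96 is a payload type from the dynamic range.
header = pack_rtp_header(96, seq=1000, timestamp=90000, ssrc=0x1234ABCD, marker=True)
print(parse_rtp_header(header))
```

The sequence number is what lets a receiver notice gaps in the stream; the loss it observes is the kind of quality information that RTCP reports back to the sender.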

H.323 for Enterprise Video

H.323 has been widely deployed in visual communication equipment. The H.323 Terminal
function is implemented in video endpoints such as Polycom HDX and VSX. The H.323
Gatekeeper function is implemented in products such as Polycom SE 200 and PathNavigator. The
H.323 MCU function is implemented in products such as Polycom RMX 2000 and MGC.

In addition to basic call and DTMF tones, these systems support a range of additional features.
The most important ones are listed in Figure 5.




Figure 5: H.323 Enterprise Video

Multipoint conferencing is very natural in H.323 because every call in H.323 (including point-to-
point calls) is defined as a ‘conference’. It is therefore assumed from the start that parties will be
added to the conference.

H.323 has its own set of security mechanisms. Early implementations used DES and 3DES
encryption, while the latest generation of equipment supports the Advanced Encryption Standard
(AES). H.323 also has a mechanism for traversing firewalls and NATs – it is described in
H.460.17, H.460.18, and H.460.19 standards.

Vendors embraced the H.323 protocol and added functions that are specific to visual
communications. Examples are Dual Video Streams (based on the H.239 protocol), Video
Channel Control (implemented in the H.245 protocol) and Far End Camera Control (FECC, based
on the H.224 and H.281 protocols). We will discuss each of these features later in this paper.

SIP Basics
The Session Initiation Protocol (SIP, RFC 3261) was developed by the Internet Engineering Task
Force (IETF), an organization that sets the technical standards for the Internet. In many ways SIP
is similar to H.323 as it also can be used to set up audio and video calls, and it also refers to a long
list of other standards (called ‘Requests for Comments’ or RFCs in the IETF lingo) that constitute
‘the SIP family of protocols’. For example, SIP refers to the Session Description Protocol (SDP,
RFC 2327) as the format for describing media parameters.

IETF envisioned SIP to be a generic protocol that can set up any kind of session, not just audio and
video, i.e. SIP can be used for instant messaging, data transfer, etc. In addition, SIP was designed
to be similar to the Hyper Text Transfer Protocol (HTTP) which is used for web browsing on the
Internet. The idea was that HTTP developers should be able to easily learn the SIP protocol and
develop Voice over IP and Video over IP applications, the same way they develop web
applications. While this did not exactly happen, SIP became easier to read and understand than
H.323, mainly because it uses readable clear-text messages (in comparison, H.323 uses ASN.1
and binary PER encoding).

Since IETF develops standards for the Internet, it is very concerned about the scalability of
networking protocols. Therefore, SIP was designed to be lightweight and scale well. While a wave
of extensions, mainly for VoIP applications, increased the complexity of the protocol, the core
SIP specification (RFC 3261) and a few closely related specs - such as SDP (RFC 2327) and RTP
(RFC 3550) - are sufficient for a functional SIP implementation.




SIP Elements and Call Flow

The equivalent of the H.323 Terminal in SIP is the SIP User Agent (UA). The name ‘user agent’
leans towards mobile communication and user mobility, i.e. the ability of the user to log on at a
communication device which then becomes the user’s agent. Unlike H.323, SIP splits the
server functions (concentrated in the H.323 Gatekeeper) into several entities: SIP Redirect Server,
SIP Proxy Server, and SIP Registrar. This is also in line with the Internet philosophy that the
server that registers and authenticates you (the Registrar) does not need to be the server that gets
your requests (the Proxy) and does not need to be the server that knows the current location of the
destination (the Redirect Server). Figure 6 shows the basic SIP message exchange necessary to
set up an audio/video call.




Figure 6: SIP Basic Call Flow

The UAs learn the SIP servers’ addresses (a domain name like www.sipregistrar1.com or an IP
address like 192.168.1.2) by configuration/provisioning or dynamically, i.e., by sending a DNS
SRV request asking the Internet ‘What SIP servers are there?’ and receiving a list of servers.
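
The dynamic discovery step can be sketched in Python as follows. The SRV owner names follow the `_service._transport.domain` convention used for SIP (RFC 3263), and a lower SRV priority value is preferred (RFC 2782). The sample records and host names are made up, and the selection rule is a simplification: real resolvers use the weight field for randomized load sharing rather than simply taking the highest weight.

```python
# Sketch of SIP server discovery via DNS SRV. No DNS query is performed;
# `sample_records` stands in for the answer a resolver would return.

def sip_srv_names(domain):
    """SRV owner names a UA could query for a SIP domain."""
    return [
        f"_sips._tcp.{domain}",  # SIP over TLS
        f"_sip._tcp.{domain}",   # SIP over TCP
        f"_sip._udp.{domain}",   # SIP over UDP
    ]


def pick_server(records):
    """Pick the record with the lowest priority value.

    Ties are broken by the highest weight, which is a simplification
    of the weighted random selection described for SRV records.
    """
    best = min(records, key=lambda r: (r["priority"], -r["weight"]))
    return best["target"], best["port"]


# Hypothetical SRV answer for the domain home.com used in the example.
sample_records = [
    {"target": "sip2.home.com", "port": 5060, "priority": 20, "weight": 0},
    {"target": "sip1.home.com", "port": 5060, "priority": 10, "weight": 5},
]

print(sip_srv_names("home.com"))
print(pick_server(sample_records))
```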

Subsequently, UAs register with their home Registrars (registration procedure not shown here),
and get authenticated, i.e., the Registrar queries a user database to verify the user name, user
password, and an additional authentication parameter called ‘SIP Realm’.

While H.323 uses E.164 phone numbers (e.g. +14085551212) or aliases to identify the
destination, SIP uses a Uniform Resource Identifier (URI) in the format user@<domain name>. In
our example, UA A is in the domain home.com and wants to reach ‘userB’ which is currently in a
different domain visited.com. UA A starts the session (call) by sending an INVITE message (the
equivalent of an H.323 SETUP message) for userB@home.com to the local Redirect Server asking
for the current location of ‘userB’. The Redirect Server responds with response code 302 (SIP
response codes are similar and often equivalent to the HTTP status codes) which means that the user has
moved temporarily. The response includes the new domain of the user: visited.com.

UA A then sends a new INVITE to the local Proxy Server (for simplicity Proxy and Registrar are
residing in the same server in Figure 6), and the Proxy server routes the INVITE through the
network to the destination. A handshake procedure including the SIP messages 200 OK and ACK
makes sure both communicating partners and the proxy server know that the session is
successfully set up.
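
Because SIP messages are clear text, the redirect step of this call flow is easy to sketch in Python. The example below is deliberately incomplete: it omits headers that a real request needs (Via, Max-Forwards, Contact) as well as the SDP body, and the user names, tag and Call-ID are invented. Keeping the Call-ID and raising the CSeq on the second INVITE is how we read the redirect handling in RFC 3261.

```python
# Minimal sketch of the redirect step: INVITE -> 302 -> new INVITE.
# Not a usable SIP stack; several mandatory headers are left out.

def build_invite(request_uri, to_uri, from_uri, call_id, cseq):
    lines = [
        f"INVITE sip:{request_uri} SIP/2.0",
        f"To: <sip:{to_uri}>",
        f"From: <sip:{from_uri}>;tag=a1b2c3",   # tag value is arbitrary
        f"Call-ID: {call_id}",
        f"CSeq: {cseq} INVITE",
        "Content-Length: 0",
    ]
    return "\r\n".join(lines) + "\r\n\r\n"


def parse_response(text):
    """Split a SIP response into status code, reason phrase and headers."""
    head = text.split("\r\n\r\n", 1)[0]
    status_line, *header_lines = head.split("\r\n")
    _version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers


def redirect_target(headers):
    """Extract user@domain from a Contact header such as <sip:user@domain>."""
    contact = headers["contact"].strip("<>")
    return contact[4:] if contact.startswith("sip:") else contact


first = build_invite("userB@home.com", "userB@home.com", "userA@home.com", "call-1", 1)

# What the Redirect Server in the example might answer.
redirect = "SIP/2.0 302 Moved Temporarily\r\nContact: <sip:userB@visited.com>\r\n\r\n"
code, reason, headers = parse_response(redirect)

if code == 302:
    # Retry towards the new location; To stays the same, CSeq goes up.
    second = build_invite(redirect_target(headers), "userB@home.com",
                          "userA@home.com", "call-1", 2)
    print(second)
```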

Similar to H.323, the signaling procedure ends with the setup of media streams, e.g. for audio and
video. As in H.323, audio streams and video streams are transmitted via the Real-time Transport Protocol
(RTP, RFC 3550), and for each RTP stream there is an associated RTP Control Protocol
(RTCP, also RFC 3550) channel. The importance of the RTP use in both H.323 and SIP will be
highlighted later in the discussion around SIP-H.323 gateways.

SIP for Enterprise Video

As mentioned above, the H.323 community invested much effort adding new functionality to
H.323 for the purposes of visual communication. SIP on the other hand was embraced by the
Voice over IP community and extended in many ways to support voice communications - both
to replicate some traditional telephony functions and to create new ones. For example, IETF created a
set of mechanisms (STUN, TURN, and ICE) that allow RTP streams to traverse firewalls
and Network Address Translation (NAT) boxes – very common elements in IP networks, and a
huge problem for both Voice over IP and Video over IP. As Figure 7 below shows, SIP is
available today in visual communication equipment (endpoints, MCUs) but the list of features
available in SIP – from a visual communications perspective – is still shorter than in H.323.




Figure 7: SIP Enterprise Video

The major difference between SIP and H.323 is in the area of security and Firewall/NAT
traversal. While H.323 systems deploy AES for media encryption, i.e. all RTP packets carrying
audio and video are encrypted by the sender using AES, SIP refers to the Secure Real-time Transport Protocol
(SRTP, RFC 3711) for encrypting media. While signaling messages in H.323 are transmitted
unencrypted, SIP – maybe because it is a clear text protocol that can be read easily – relies on
Transport Layer Security (TLS, RFC 4346) to encrypt SIP signaling messages.

The other major delta – also related to security - is in the area of Firewall and NAT traversal.
H.323 relies on H.460.17, H.460.18, and H.460.19 standards for Firewall and NAT traversal.
IETF originally developed STUN (Simple Traversal of UDP through NATs), then added the
TURN (Traversal Using Relay NAT) mechanism to increase the firewall traversal success rate,
and finally created the ICE (Interactive Connectivity Establishment) specification that combines
STUN and TURN functions into one. Firewall traversal has long been considered the forte of
IETF and the hope is that through the newly developed traversal mechanisms, SIP-based
communication will be able to flow across enterprise (including healthcare, government, and
education) and service provider networks.

What is SIP Used for Today?
Although video network elements today support SIP, they are rarely deployed in a complete SIP
video solution. The reason is that SIP still cannot match the H.323 functionality and an all-H.323
solution can provide great interoperability and more functionality than an all-SIP solution.


SIP gained ground against proprietary protocols from Avaya, Nortel, Siemens, etc. – mostly
because it allows better interoperability across vendors, i.e. the ability to mix and match
components. But in the H.323 video communications market, interoperability is already strong, and H.323
interoperability events (bakeoffs and cookouts; for some reason culinary terminology was widely
adopted) are as effective as SIP interoperability events such as SIPit.

SIP for Integration with IM and Presence

SIP is however irreplaceable in integrations with IM/Presence systems such as IBM Sametime
and Microsoft LCS and OCS. The idea is that since SIP is used for exchanging Presence
information and for setting up IM sessions (based on the SIMPLE specifications) it makes sense
to integrate video systems via SIP. The reality is however that SIMPLE is not the leading approach
to Presence and IM. Microsoft added proprietary extensions to SIP for MS Office Communicator
and LCS/OCS. Even within IETF, the competing XMPP protocol is gaining momentum, and
seems to have eclipsed SIMPLE for Internet applications. Nevertheless, SIP is today the only
common denominator that allows integration of video into IM and Presence systems. Figure 8 is
an example of such integration.




Figure 8: Integration with IM/Presence

In the diagram, two IM/Presence clients communicate with an IM/Presence server which is
connected through a gateway function - translation software that runs on a standard server. The
SIP protocol is used for the communication among video components: video soft clients
(associated with the IM/Presence clients), video endpoints (as the room system displayed in
Figure 8) and conferencing servers (MCUs). A SIP Registrar/Proxy (marked ‘SIP Server’ here)
handles registration, call setup, and call tear-down.

A video client can be connected to another video client or to a video endpoint such as a room
system. All video clients and endpoints can be part of a multipoint call through the conferencing
server. Note that once video soft clients and video endpoints connect in a multi-party conference
call, additional participants from H.323, H.320 (ISDN), and PSTN (voice only) can also join the
conference.

SIP for Integration with IP-PBXs

Early versions of IP-PBXs supported basic H.323 and allowed registering H.323 clients.
However, as SIP became more important to IP-PBX interoperability, IP-PBXs started supporting
SIP registrations, SIP trunking, etc. H.323 support was dropped or was not updated to the latest
H.323 versions. Since most IP-PBXs in the market support SIP (and do not support H.323), SIP is
irreplaceable in integrations with systems such as Avaya Communication Manager, Nortel MCS 5100, and
Cisco CallManager. Note that since most IP-PBXs are based on proprietary architectures, the SIP
interfaces provide only limited functions, i.e. registration, basic call, and DTMF. Hold is usually
also supported because Hold is a part of the base SIP standard (RFC 3261). With the development
of a new generation of IP communication systems based on SIP soft switches (such as Nortel
MCS 5100), the SIP functionality became richer and included features such as Transfer, Forward,
and Conference. Video endpoints can now support such functions, and mirror the functionality of
desktop phones. These features mainly apply to personal video users and are less attractive to
users of group conferencing systems.

If the IP-PBX does not support SIP, integration is still possible through a CTI server with SIP
plug-ins. While one can argue that using SIP or H.323 for such integrations is equally efficient,
almost all integrations are done via SIP since it is unlikely that H.323 will be supported
natively in IP-PBXs. There is hope that over time the proprietary solutions will migrate to SIP, so
the protocol selection is often based on which protocol looks more future-proof. Figure 9 shows
an example of an integration of video equipment with a SIP-based communication system.




Figure 9: Integration with SIP Communication Server

The SIP Communication Server in Figure 9 acts as SIP Proxy and Registrar for all user agents:
SIP soft clients, SIP phones, video endpoints in SIP mode (HDX 4000 and 9000 in Figure 9), and
the conferencing server that supports multiple protocols simultaneously.

Similar to the integration with IM/Presence systems, the conferencing server (RMX 2000 in this
example) allows H.323, H.320/ISDN, and PSTN (voice-only) participants to join a multiparty
conference. Further benefits of using the conferencing server in such configurations are discussed
in the SIP-H.323 gateway section below.

SIP for Integration with IMS

Integration of video systems (endpoints, application servers, conferencing servers/MCUs) with IP
Multimedia Subsystem (IMS) networks is also based on SIP. IMS uses SIP for communication
among network elements but has defined extensions (most visibly in the form of private
P-headers), so that seamless integration with IMS networks requires a bit more than plain SIP. More
information about Polycom’s involvement in IMS is in the white paper ‘Polycom and IMS’
http://www.polycom.com/common/documents/whitepapers/polycom_ims_1.pdf.




Implementing Visual Communications Features in SIP
In this section, we will look at the implementation approaches for three major video features –
Dual Stream, FECC, and Video Channel Control – in SIP. As discussed in the H.323 section of
this paper, the H.323 community developed these mechanisms, which became very popular
among video users. A migration from H.323 to SIP therefore requires replication of the
functionality in the new environment.

Dual Video Stream

Dual Video Streams allows a ‘presentation’ (sometimes also called ‘content’) audio-video stream
to be created in parallel to the primary ‘live’ audio-video stream. This second stream is used to
share any type of content: slides, spreadsheets, X-rays, video clips, etc. Polycom’s pre-standard
version of this technology is called People+Content. H.239 is heavily based on intellectual
property from Polycom People+Content and became the ITU-T standard that allows
interoperability between different vendors. Figure 10 summarizes the Dual Video Streams
concept.




Figure 10: Dual Video Streams

While the function works well on single-monitor systems, it is especially powerful in multi-
screen setups (video endpoints can support up to 4 monitors). In the example in Figure 10, a
Polycom HDX 4000 personal video system is on a live call with a Polycom HDX 9000 Executive
Collection with two flat screen monitors. The live stream is shown on the right monitor.

The user of the HDX 4000 uses a laptop directly connected to HDX 4000 or running Polycom
content sharing software to activate content sharing to the HDX 9000 Executive Collection. A
‘presentation’ stream is created in parallel to the ‘live’ stream, and the content is displayed on the
left screen of the receiver system.

The benefit of this functionality is that users can share not just slides or spreadsheets but also
moving images: Flash video, movie clips, commercials, etc. The ‘presentation’ channel has
flexible resolution, frame rates, and bit rates. For dynamic images, it can support full
High-Definition video at 30 frames per second, while for static content, such as slides, it can run at
a low rate, for example 3 frames per second, and save bandwidth in the IP network. Another major benefit of
using a video channel for content sharing is that the media is encrypted (by AES in H.323 and by
SRTP in SIP). In addition, once the firewall and NAT traversal works for the ‘live’ stream, it
works for the ‘presentation’ channel as well and there is no need for a separate traversal solution.




The first issue with supporting Dual Video Streams in SIP is describing the content/presentation
stream. As discussed above, the Session Description Protocol (SDP, RFC 2327) is used to
describe media stream parameters. SIP endpoints and conferencing servers have to support RFC
4574, which defines the ‘label’ attribute in SDP, and RFC 4796, which defines the ‘content’
attribute. Now that we can describe the content stream, we have to be able to associate the content
stream with a live stream – this can be done by supporting RFC 3388 ‘Grouping of Media Lines
in the Session Description Protocol’.
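
As an illustration of how the ‘label’ and ‘content’ attributes distinguish the two video streams, the Python sketch below builds two SDP video sections and then locates the presentation stream. It is only a fragment: session-level lines are omitted, the ports, payload types and label values are arbitrary, and the RFC 3388 grouping lines are left out because their exact use is not described here. The content values ‘main’ and ‘slides’ are among those defined in RFC 4796.

```python
# Sketch of SDP media sections for a live and a presentation video stream.
# Only media-level lines are generated; this is not a complete SDP body.

def video_section(port, payload_type, codec, label, content):
    return [
        f"m=video {port} RTP/AVP {payload_type}",
        f"a=rtpmap:{payload_type} {codec}/90000",  # 90 kHz video clock
        f"a=label:{label}",      # RFC 4574: identifies this media stream
        f"a=content:{content}",  # RFC 4796: what the stream carries
    ]


def find_stream(sdp_lines, content):
    """Return the m= line of the first section with the given content value."""
    current = None
    for line in sdp_lines:
        if line.startswith("m="):
            current = line
        elif line == f"a=content:{content}" and current is not None:
            return current
    return None


sdp = (
    video_section(49170, 96, "H264", label="1", content="main")
    + video_section(49172, 97, "H264", label="2", content="slides")
)

print("\r\n".join(sdp))
print(find_stream(sdp, "slides"))
```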

The remaining issue is how to identify who is sending the content and who is receiving it. This is
usually done by tokens (the party that has the token can send content), and token management
protocols make sure that there is only one token in the session, and that anyone can request and
receive the token. RFC 4582 ‘Binary Floor Control Protocol (BFCP)’ defines a token management
mechanism, and can be used for Dual Video Stream implementation in SIP. And since everything
has to be described in SDP, we also need a way to describe the BFCP streams in SDP. This can
be done by supporting RFC 4583 ‘SDP Format for Binary Floor Control Protocol Streams’.
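
The token idea can be modeled in a few lines of Python. The class below is a conceptual sketch only: it does not implement BFCP messages or transport, the class and method names are invented, and the first-come-first-served queue is an assumed policy. The strings ‘Granted’ and ‘Pending’ mirror request status names used by BFCP.

```python
# Conceptual model of single-token (floor) management for content sharing.
# Hypothetical names; not an implementation of the BFCP wire protocol.

class FloorControlServer:
    def __init__(self):
        self.holder = None   # party currently allowed to send content
        self.queue = []      # parties waiting for the token, in order

    def request(self, user):
        """Ask for the token; it is granted only if nobody else holds it."""
        if self.holder is None or self.holder == user:
            self.holder = user
            return "Granted"
        if user not in self.queue:
            self.queue.append(user)
        return "Pending"

    def release(self, user):
        """Give up the token (or a queued request); return the new holder."""
        if user == self.holder:
            self.holder = self.queue.pop(0) if self.queue else None
        elif user in self.queue:
            self.queue.remove(user)
        return self.holder


server = FloorControlServer()
print(server.request("HDX 4000"))   # first requester gets the token
print(server.request("HDX 9000"))   # second requester has to wait
print(server.release("HDX 4000"))   # token passes to the waiting party
```

At any moment at most one party holds the token, which is the property the token management protocol has to guarantee.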

Since it takes 5 specifications (RFCs) to implement the equivalent of H.239 functionality in SIP,
Polycom created a specification that describes how to glue these RFCs together. This
specification is now Internet Draft ‘Role Management and Multiple Stream Functionality in SIP’
(draft-even-xcon-pnc).

Far End Camera Control

FECC is a popular feature in visual communications – if H.323 Terminals A and B are on a
call, the feature allows Terminal A to control the camera of Terminal B: zoom, pan (move the
camera left and right), and tilt (move the camera up and down). The assumption is that Terminal
B has a PTZ (Pan, Tilt, and Zoom) camera, and has the FECC feature enabled. Figure 11 explains
the concept.




Figure 11: Far End Camera Control (FECC)

In a group conferencing setting, the key FECC benefit is that users can adjust the image that they
get from the remote site, focus on a particular person or a group of people, and then move to
another part of the room. In a personal video setting, the feature can be used to adjust the camera if
the remote party is sitting too close to or too far from the camera.

In H.323, FECC is implemented via two ITU standards: H.281 defines the binary data that is
transmitted between Terminal A and B to control the camera while H.224 defines the format of
the frames that carry the binary data.


In SIP, RFC 4573 ‘MIME Type Registration for RTP Payload Format for H.224’ (authored by
Polycom) registers the H.224 media type, and defines the syntax and the semantics of the Session
Description Protocol (SDP) parameters needed to support the far-end camera control protocol using
H.224 in SIP. In effect, RFC 4573 creates a tunnel through the SIP-based network, and allows
video endpoints to exchange H.224/H.281 information exactly as they do in H.323-based
networks.
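
A hedged Python sketch of what the resulting SDP media section could look like is shown below. The `H224/4800` rtpmap entry reflects our reading of RFC 4573 and should be checked against the RFC before use; the port and the dynamic payload type are arbitrary, and the function name is invented.

```python
# Sketch of an SDP media section announcing an H.224 (FECC) channel.
# Port and payload type are example values, not mandated by any standard.

def fecc_section(port, payload_type=100):
    return [
        f"m=application {port} RTP/AVP {payload_type}",
        f"a=rtpmap:{payload_type} H224/4800",
    ]


print("\r\n".join(fecc_section(49180)))
```

The H.224 frames and the H.281 camera commands inside them travel as RTP payload on this channel, which is why the endpoints can reuse their existing FECC logic.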

Video Channel Control

Video channel control is embedded in H.245 and was introduced earlier in this paper. The
protocol allows sending messages such as ‘Flow Control’ from the receiver of live and
presentation streams back to the sender of these streams, telling the sender to modify the bit
rate, usually to reduce the bit rate when the receiver detects high packet loss. By sending a ‘Fast
Update’ message the receiver asks the sender to send a full (intra) video frame, usually
when a video frame is lost in transmission. Figure 12 provides a graphical description of the
functionality.




Figure 12: Video Channel Control

There is still no standard solution for replicating the video channel control functionality in SIP.
Polycom uses the SIP INFO message because it allows easy mapping of the H.245 messages into
SIP. This approach has been embraced by other vendors in the market. However, the IETF favors
an RTCP-based mechanism, and work is under way on the so-called Audio-Visual Profile with
Feedback (AVPF), an extension to RTCP that will allow for video channel control functionality.
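
The paper does not show what the INFO payload looks like. One publicly documented encoding is the XML media control schema of RFC 5168; the sketch below assumes that format and composes the body of a SIP INFO request asking the remote encoder for a full frame, which is the SIP-side counterpart of H.245 Fast Update. The function names are hypothetical, and a vendor implementation may well use a different body.

```python
# Sketch: compose a SIP INFO request body asking the remote encoder for a full
# (intra) frame, the SIP-side counterpart of the H.245 Fast Update command.
# Assumption: the body follows the XML media control schema of RFC 5168.
import xml.etree.ElementTree as ET

CONTENT_TYPE = "application/media_control+xml"


def fast_update_body() -> str:
    """Return the XML body for a picture fast update request."""
    root = ET.Element("media_control")
    primitive = ET.SubElement(root, "vc_primitive")
    to_encoder = ET.SubElement(primitive, "to_encoder")
    ET.SubElement(to_encoder, "picture_fast_update")
    xml = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="utf-8"?>\n' + xml


def info_headers(body: str) -> dict:
    """Return only the body-related headers of the INFO request."""
    return {
        "Content-Type": CONTENT_TYPE,
        "Content-Length": str(len(body.encode("utf-8"))),
    }


if __name__ == "__main__":
    body = fast_update_body()
    for name, value in info_headers(body).items():
        print(f"{name}: {value}")
    print()
    print(body)
```

Because the request travels in the signaling path, a gateway can translate it to and from H.245 without ever handling RTP packets.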

The RTCP-based approach has a substantial impact on the SIP-H.323 gateway function. While
H.245-INFO interworking is simple to implement and only touches the H.323-SIP signaling, RTCP
is always associated with RTP, so using RTCP for video channel control means touching the media
stream. We will discuss that in more detail in the SIP-H.323 interworking section that follows.

SIP-H.323 Interworking
Although we expect SIP deployments to grow rapidly in the future, the installed base of H.323
endpoints and infrastructure is here to stay in the healthcare, government, education, and general
enterprise markets. Interworking between the two protocols becomes an important issue. In
general, there are two ways to bridge the SIP and H.323 networks: through a signaling gateway
and through a conferencing server/MCU. Figure 13 provides a visual representation of the
interworking concept and lists the functions that have to be considered in the SIP-H.323
interworking scenario.




Figure 13: SIP - H.323 Interworking

SIP and H.323 are different protocols with different message formats, but both can be used in
similar ways. Comparing the call flows in Figure 4 and Figure 6 shows many similarities in the
call setup process. Similarities also exist in the call tear-down process (not covered in this
paper) and in the mechanisms to spontaneously exchange information during the call. A signaling
gateway is a piece of software that takes incoming SIP messages, extracts the communication
parameters, creates H.323 messages, and sends them to the H.323 network. It also takes the
incoming H.323 messages, extracts the communication parameters, creates corresponding SIP
messages, and sends them to the SIP network. The gateway therefore looks like a SIP user agent
to the SIP network and like an H.323 terminal to the H.323 network.
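
The translation step can be pictured as a lookup table. The sketch below is a deliberately simplified illustration: the message pairs are a commonly cited correspondence between SIP and H.225 call signaling, not a table taken from this paper, and the names `SIP_TO_H225` and `translate` are hypothetical. A real gateway also has to convert SDP to and from H.245 capabilities, which is omitted here.

```python
# Sketch of the translation table at the heart of a SIP-H.323 signaling gateway.
# Simplified: "200 OK" here means the final response to INVITE only, and the
# SDP <-> H.245 capability translation a real gateway needs is left out.

SIP_TO_H225 = {
    "INVITE": "SETUP",
    "180 Ringing": "ALERTING",
    "200 OK": "CONNECT",
    "BYE": "RELEASE COMPLETE",
}

# Reverse direction, derived from the same table.
H225_TO_SIP = {h225: sip for sip, h225 in SIP_TO_H225.items()}


def translate(message: str, source: str) -> str:
    """Map a message name from one protocol to its counterpart in the other.

    `source` is "sip" or "h323". Messages outside this simplified table raise
    KeyError so the caller can decide how to handle them.
    """
    if source == "sip":
        return SIP_TO_H225[message]
    if source == "h323":
        return H225_TO_SIP[message]
    raise ValueError(f"unknown source protocol: {source!r}")


if __name__ == "__main__":
    for name in ("INVITE", "180 Ringing", "200 OK", "BYE"):
        print(f"{name} -> {translate(name, 'sip')}")
```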

Fortunately, both SIP and H.323 rely on the same protocols (RTP and RTCP) for transmitting media
streams. The signaling gateway can therefore focus on mediating between the H.323 and SIP
signaling and does not need to touch the media. This is very important because media processing
is very resource-intensive. While signaling messages generate traffic on the order of a few
kilobits per second, video media streams can reach megabits per second (HD 720p video starts at
1.2 Mbps).
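
A back-of-the-envelope calculation shows how large the difference is. In the sketch below, the 1.2 Mbps video rate comes from the text, while the signaling rate of 5 kbps (standing in for "a few kilobits per second") and the figure of 100 concurrent calls are assumptions chosen purely for illustration.

```python
# Back-of-the-envelope comparison of signaling and media load on a gateway.
# 1.2 Mbit/s is the HD 720p starting rate quoted in the text; the signaling
# rate and the call count are assumptions chosen for illustration.

SIGNALING_BPS = 5_000        # assumed: "a few kilobits per second"
VIDEO_BPS = 1_200_000        # from the text: HD 720p starts at 1.2 Mbit/s


def gateway_load_bps(calls: int, relay_media: bool) -> int:
    """Total one-way traffic through the gateway for `calls` concurrent calls."""
    per_call = SIGNALING_BPS + (VIDEO_BPS if relay_media else 0)
    return calls * per_call


if __name__ == "__main__":
    calls = 100
    print(gateway_load_bps(calls, relay_media=False))  # signaling only
    print(gateway_load_bps(calls, relay_media=True))   # signaling plus video
    print(VIDEO_BPS // SIGNALING_BPS)                  # video-to-signaling ratio
```

Under these assumptions, 100 signaling-only calls put 0.5 Mbps through the gateway, while relaying one direction of video for the same calls raises the load to 120.5 Mbps, a per-call ratio of 240 to 1 between video and signaling.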

The base RFC relevant to SIP-H.323 signaling interworking is RFC 4123 ‘SIP - H.323
Interworking Requirements’. Because many of the audio and video codecs used in visual
communication are ITU-T standards, it was necessary to define RTP payload formats for each of
them: G.722.1, G.722.1 Annex C, H.261 Video, H.263 Video, and H.264 Video.

There are, however, several issues with the signaling gateway approach. First, media security is
broken because H.323-based video networks use AES encryption while SIP relies on SRTP for
encryption. The two approaches are completely different: the encryption algorithms and the key
exchange procedures are incompatible. The consequence is that deploying a signaling gateway
would cause media encryption to fail, i.e. the audio and video streams would be transmitted
unencrypted.

As mentioned in the video channel control section, another issue is the IETF-backed approach,
which requires the use of RTCP, and RTCP is associated with RTP media. This goes against the
concept of a signaling-only gateway because H.245 messages must somehow be mapped into
RTCP messages. There are currently no implementations where RTCP is independent from an
RTP media stream, so media has to traverse the gateway in order to follow the IETF approach.
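
To make the mapping problem concrete, the sketch below encodes an RTCP Picture Loss Indication (PLI), the AVPF feedback message defined in RFC 4585 that plays a role comparable to H.245 Fast Update. Treating PLI as the counterpart of Fast Update is an assumption made for illustration, as are the function name and the SSRC values. A real endpoint would send the message as part of a compound RTCP packet tied to a specific RTP session, which is exactly why a signaling-only gateway cannot produce it.

```python
# Sketch: encode an RTCP Picture Loss Indication (PLI), the AVPF feedback
# message (RFC 4585) that plays a role comparable to H.245 Fast Update.
# The SSRC values used in the demo are placeholders.
import struct

RTCP_VERSION = 2
PT_PSFB = 206   # RTCP packet type: payload-specific feedback
FMT_PLI = 1     # feedback message type: picture loss indication


def build_pli(sender_ssrc: int, media_ssrc: int) -> bytes:
    """Return a 12-byte RTCP PLI packet."""
    first_byte = (RTCP_VERSION << 6) | FMT_PLI   # V=2, P=0, FMT=1
    length_words = 2                             # length in 32-bit words minus one
    return struct.pack("!BBHII", first_byte, PT_PSFB, length_words,
                       sender_ssrc, media_ssrc)


if __name__ == "__main__":
    packet = build_pli(sender_ssrc=0x11111111, media_ssrc=0x22222222)
    print(packet.hex())
```

The media SSRC in the packet identifies the RTP stream whose picture was lost, so whichever device generates the PLI has to know the identifiers of the media streams it refers to.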

The third issue is that signaling gateways only address the SIP-H.323 interworking; ISDN and
PSTN have different media (e.g. B channels in ISDN), and ISDN/PSTN users cannot use this
gateway to connect to the SIP network.




Because of these limitations, using the conferencing server as a gateway has been seriously
considered as an alternative concept for H.323-SIP interworking. Conferencing servers can
originate and terminate H.323 and SIP calls, and have sufficient processing power to handle the
media. They already support AES, and can easily add support for SRTP encryption. Mechanisms for
video channel control that use RTCP can be accommodated as well, since RTP and RTCP streams go
through the conferencing server. The main disadvantages of this approach are that it creates a
bottleneck, because even point-to-point calls between SIP and H.323 domains have to go through
the conferencing server, and that the additional conferencing server ports needed to support
SIP-H.323 interworking are costly.

The Future of Visual Communications
In the long run, visual communications will migrate from H.323 to SIP, and will seamlessly
integrate with other communications network components: IP-PBXs, IM/Presence servers, etc.
The legacy H.323 equipment will continue to connect to the SIP network through gateways and
conferencing servers. Figure 14 displays the configuration of the future network.




Figure 14: Future Visual Communications

The migration to SIP will allow not only better interoperability with other communication
systems but also increased scalability, better traversal of firewalls and NATs, and better security.

With regard to scalability, servers handling tens of thousands of users and providing voice,
video, IM, presence, and directory services are feasible. Through federation, these servers can
support large networks of personal video systems, group conferencing systems, immersive
telepresence systems, soft clients, and mobile clients.

Firewalls and NATs have always been barriers to IP communication, but current video solutions
are intranet-based and predominantly used for internal company communication, where firewalls
are less of a problem. Future networks will connect companies with their suppliers, customers,
and partners, all of which are separated by multiple firewalls. SIP in combination with ICE will
provide an efficient way of connecting people across networks and making visual
communication ubiquitous, similar to voice communication today.

With the ubiquity of SIP visual communications, security becomes of utmost importance. Once
SRTP is universally adopted and deployed for media security and TLS is supported across
vendors for signaling security, visual communications will become fully protected.




Conclusion
Visual communication is expanding beyond enterprise conference rooms to the user’s desktop.
The trend towards Unified Communications requires integrating video with a variety of SIP-based
systems in enterprises, hospitals, universities, and government organizations.

SIP is a new protocol that can meet the requirements for scalable distributed visual
communications. SIP has already been deployed for visual communication in certain scenarios.
Once the missing functionality is added to SIP, it will become a solid foundation for visual
communication solutions. The transition from H.323 to SIP will be gradual, and interoperability
with the installed H.323 base throughout the process is a key requirement and the main technical
challenge.

Polycom is uniquely positioned to leverage its broad product portfolio, market leadership and
extensive partner network to lead customers through the migration process from H.323 to SIP,
and deliver on the VC2 promise: transform traditional video conferencing into tomorrow’s visual
communications.




Migrating Visual Communications from H.323 to SIP

  • 1. Migrating Visual Communications from H.323 to SIP White Paper by Stefan Karapetkov April 2, 2008 Introduction The H.323 protocol was developed by the International Telecommunication Union (ITU) - an international standardization body based in Geneva, Switzerland - with video conferencing in mind, and most traditional video conferencing systems are based on H.323. However, the convergence of the voice, video, and data into what is often referred to as Unified Communications (UC) has a dramatic impact on how people use video, and presents a new set of requirements to solutions for the emerging visual communications market. In order to meet these new requirements, Polycom is working on a seamless migration from H.323 to the Session Initiation Protocol (SIP). This process will take long time, and H.323 and SIP will coexist in customer networks for years to come. The content of this paper is based on Polycom’s presentation ‘A New Paradigm for SIP-based Video Communications’ at the International SIP Conference in Paris (January 29 – February 1, 2008). The paper provides an overview of H.323 and SIP, and compares the two protocols. The paper also makes references to specific technologies that Polycom is deploying to guarantee smooth migration of the installed customer base from H.323 to SIP. Visual Communications Market Polycom envisions a dramatically different marketplace for video in the years ahead. Social, economic, and technological trends are aligning to create a unique opportunity for new and innovative forms of visual communication. This combination of factors will bring video into the mainstream and make visual communication essential in both our personal and professional lives. Polycom calls this transformation VC2. 
Visual communications today include applications such as telepresence which provides an immersive experience for users, group video conferencing which is now available with High- Definition audio and video and provides a new level of user experience, as well as personal video which brings visual communication to the individual user’s desktop or project space. Figure 1 is an overview of these applications. POLYCOM, Inc. 1
  • 2. Figure 1: Enterprise Video Communications Today While dedicated personal video systems integrate monitor, camera, microphones, speakers, and codec into one and are optimized for video communication, soft clients rely on the PC video and voice processing capabilities. Today, visual communications solutions are widely deployed in education, medical, and government organizations. Deployments in general enterprises were recently revitalized as a result of travel restrictions and green policies. Market Trends Two major market trends are driving the visual communications market. The first trend is the shift from reserved to on-demand conferencing. Both audio and video conferencing started as scheduled events with reserved resources, e.g. ports on the Multipoint Conferencing Unit (MCU) and bandwidth, e.g. B channels in the ISDN network. Audio conferencing made the transition to reservation-less, operator-less systems and is now 96% on-demand. Figure 2 summarizes the trend to on-demand conferencing. Figure 2: Trend from Reserved to On-demand Conferencing Video has stayed scheduled for longer, and even today, about 80% of video conferencing is scheduled. However, there is a clear trend to on-demand video, and strong indicators that future conferencing will be even richer and more flexible - with presence integration and increased number of choices how to access the services, e.g. from desktop computers and mobile phones. Looking at this trend on a higher level, reserved operator-attended services are becoming presence-enabled customer-initiated services. Note that audio conferencing is running ahead of POLYCOM, Inc. 2
  • 3. video conferencing – with higher desktop/mobile penetration and higher percentage of on- demand conferences. This trend has huge impact on the choice of communication protocols in visual communication systems. The trend requires more scalability because desktop video drives up the number of users. It also requires that new features such as presence and instant messaging are seamlessly integrated with audio, video and content. The second major market trend is from overlay video networks to unified collaboration. Video systems have been deployed as overlay networks (over the organization’s IP network) for years, and video has been a stand-alone application, separate from the mainstream IT applications. Video also required separate management tools, directories, and has in general hardly connected to the rest of the IT infrastructure. With the emergence of the Unified Communications concept, enterprises, service providers and other organizations started morphing their voice, video, and data communication systems into one. Figure 3 describes the trend towards Unified Communications. Figure 3: Trend towards Unified Collaboration This trend creates an interesting technical challenge. Telephony call control servers have started the migration from proprietary protocols to standard SIP, and there are already a large number of standards-based implementations, some of them open source. Even the remaining proprietary IP- PBX systems on the market provide some level of SIP interoperability and allow third-party equipment to connect to the IP-PBX, or even control it. Many Presence and Instant Messaging systems support SIP via the SIP for Instant Messaging and Presence Leveraging Extensions (SIMPLE) protocol. Other implementations are based on the eXtensible Messaging and Presence Protocol (XMPP). Enterprise video today is mostly H.323-based, although video endpoints, video soft clients and even MCU’s support basic SIP connectivity. 
For example, all Polycom endpoints can run in SIP mode, while conference servers such as Polycom RMX 2000 and MCG support SIP, H.323, H.320, etc. The technical challenge that UC poses is how to connect all of the elements in Figure 3 into one system that provides the full range of services to users. Based on the current state of the networking technology, SIP is the most functional common denominator that could interconnect the different applications within the organization. POLYCOM, Inc. 3
  • 4. H.323 Basics In order to compare SIP and H.323, we will need a brief description of the H.323 protocol. H.323 is an umbrella signaling protocol, i.e. it refers to a set of other protocol such as H.225 and H.245 which are known as ‘the H.323 family of protocols’. H.323 was originally defined for multimedia communications and perfectly fits the video conferencing application because it had from the very beginning mechanisms for audio and video call setup. It also has the so-called capability exchange procedure (often referred to as CAPS) that is very important for finding communication parameters acceptable for both communication sides, as well as a master-slave determination mechanism that is very useful when MCUs are involved in the communication. H.323 is optimized for machine communication. It uses ASN.1 notation/encoding, and the H.323 messages are encoded using the Basic Encoding Rules (BER). This means that very few people can actually read captured H.323 messages. H.323 Elements and Call Flow H.323 defines H.323 Terminals which can initiate or receive calls and H.323 Gatekeepers which register H.323 terminals, provide call admission control, and call routing. Gatekeepers can be very simple or very complex – depending on how many of the optional functions in H.323 they implement. H.323 also defines Gateways to other networks, e.g. H.320/ISDN. While gateways are optional in H.323, they play a central role when migration to H.323 (e.g. from H.320/ISDN to H.323) or from H.323 (e.g. to SIP) is required. Since the topic of this paper is migration from H.323 to SIP, we will discuss the H.323-SIP gateway in more detail later in this paper. Figure 4 looks at the interaction of the two critical and mandatory elements in the H.323 network: Terminals and Gatekeeper. Figure 4: H.323 Basic Call Flow H.323 describes the call setup procedure, and refers to the H.225 and H.245 protocols for signaling message formats and some additional functions. 
The signaling messages are described in H.225. The H.225 SETUP message includes information about the source, i.e. who is sending the message (in Figure 4, this is Terminal A), and about the destination (Terminal B). The Gatekeeper then uses this information to locate the destination (Terminal B). After receiving the SETUP message, Terminal B stores the information about the request (IP addresses, port numbers, etc.), and sends back the CONNECT message. The most important information in the CONNECT message is about the setup of an H.245 control channel, which is used for three main functions: capability exchange (CAPS), master-slave determination (MS), and opening logical channels (OLC), i.e. creating media streams for audio, video and content.

H.245 Terminal Capability Exchange is a procedure for exchanging preferred codecs and settings between the two H.323 terminals. For example, Terminal A may suggest H.264 or H.263 video and Siren 22 Stereo or Siren 14 Mono audio, and Terminal B may respond that it only supports H.263 and Siren 14. Once both sides agree on common parameters, the ‘conversation’ moves to its next phase, H.245 Master Slave Determination, which is useful for avoiding conflicts during call control operations. H.245 Master Slave Determination is very important when an H.323 Terminal connects to an MCU (the MCU is the master), and when one MCU connects to another MCU through so-called ‘cascading’; in this case one of the MCUs has to be the master.

After capabilities have been exchanged and the connection master has been determined, the H.245 Open Logical Channel Request procedure creates media channels (voice, video, or content/data) between the communicating parties. Note that these channels are always created in pairs, i.e. the video channel from Terminal A to Terminal B is different and separate from the video channel from Terminal B to Terminal A. Therefore, communication can be asymmetric: Terminal A can send high quality video to B and receive lower quality video from B, and vice versa.

The H.245 control channel is also used to transmit the Flow Control command, which is used by the receiver to set an upper limit for the transmitter bit rate on any logical channel, and the Fast Update command, which is used by the receiver to request a fresh full video frame when video frames were lost in transmission.
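The capability exchange and master-slave determination described above can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions, not an H.245 implementation: real H.245 exchanges ASN.1-encoded capability tables and resolves master-slave ties with a random number, and the function names and the numeric terminal type values below are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Optional


def first_common(preferred: List[str], supported: List[str]) -> Optional[str]:
    """Return the first codec in `preferred` that the other side also supports."""
    for codec in preferred:
        if codec in supported:
            return codec
    return None


def negotiate(offer: Dict[str, List[str]],
              answer: Dict[str, List[str]]) -> Dict[str, Optional[str]]:
    """Pick one common codec per media type, honoring the offerer's preference order."""
    return {media: first_common(codecs, answer.get(media, []))
            for media, codecs in offer.items()}


def determine_master(type_a: int, type_b: int) -> Optional[str]:
    """Simplified master-slave determination: the higher terminal type wins.

    Returns None on a tie; real H.245 breaks ties with a random number,
    which this sketch leaves out.
    """
    if type_a == type_b:
        return None
    return "A" if type_a > type_b else "B"


# Capability sets taken from the example in the text.
terminal_a = {"video": ["H.264", "H.263"],
              "audio": ["Siren 22 Stereo", "Siren 14 Mono"]}
terminal_b = {"video": ["H.263"], "audio": ["Siren 14 Mono"]}

agreed = negotiate(terminal_a, terminal_b)
print(agreed)  # {'video': 'H.263', 'audio': 'Siren 14 Mono'}

# Hypothetical type values: only their ordering (MCU above terminal) matters here.
TERMINAL, MCU = 50, 240
print(determine_master(TERMINAL, MCU))  # B
```

In this sketch the MCU (side B) becomes the master because its type value is higher, which matches the rule stated above that the MCU is the master when a terminal connects to it.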
Audio streams and video streams are transmitted via the Real-time Transport Protocol (RTP, RFC 3550), and for each RTP stream there is an associated RTP Control Protocol (RTCP, also RFC 3550) channel which is used to periodically transmit control packets to participants in a multimedia session. The primary function of RTCP is to provide feedback on the quality of service being provided by RTP.

H.323 for Enterprise Video
H.323 has been widely deployed in visual communication equipment. The H.323 Terminal function is implemented in video endpoints such as Polycom HDX and VSX. The H.323 Gatekeeper function is implemented in products such as Polycom SE 200 and PathNavigator. The H.323 MCU function is implemented in products such as Polycom RMX 2000 and MGC. In addition to basic call and DTMF tones, these systems support a range of additional features. The most important ones are listed in Figure 5.
Figure 5: H.323 Enterprise Video

Multipoint conferencing is very natural in H.323 because every call in H.323 (including point-to-point calls) is defined as a ‘conference’. It is therefore assumed from the start that parties will be added to the conference.

H.323 has its own set of security mechanisms. Early implementations used DES and 3DES encryption, while the latest generation of equipment supports the Advanced Encryption Standard (AES). H.323 also has a mechanism for traversing firewalls and NATs, which is described in the H.460.17, H.460.18, and H.460.19 standards.

Vendors embraced the H.323 protocol and added functions that are quite unique to visual communications. Examples are Dual Video Streams (based on the H.239 protocol), Video Channel Control (implemented in the H.245 protocol) and Far End Camera Control (FECC, based on the H.224 and H.281 protocols). We will discuss each of these features later in this paper.

SIP Basics
The Session Initiation Protocol (SIP, RFC 3261) was developed by the Internet Engineering Task Force (IETF), an organization that sets the technical standards for the Internet. In many ways SIP is similar to H.323: it can also be used to set up audio and video calls, and it also refers to a long list of other standards (called ‘Request for Comments’ or RFCs in the IETF lingo) that constitute ‘the SIP family of protocols’. For example, SIP refers to the Session Description Protocol (SDP, RFC 2327) as the format for describing media parameters.

The IETF envisioned SIP as a generic protocol that can set up any kind of session, not just audio and video, i.e. SIP can be used for instant messaging, data transfer, etc. In addition, SIP was designed to be similar to the Hypertext Transfer Protocol (HTTP), which is used for web browsing on the Internet. The idea was that HTTP developers should be able to easily learn the SIP protocol and develop Voice over IP and Video over IP applications the same way they develop web applications.
While this did not exactly happen, SIP became easier to read and understand than H.323, mainly because it uses readable clear-text messages (in comparison, H.323 uses binary-encoded ASN.1).

Since the IETF develops standards for the Internet, it is very concerned about the scalability of networking protocols. Therefore, SIP was designed to be lightweight and to scale well. While a wave of extensions, mainly for VoIP applications, has increased the complexity of the protocol, the core SIP specification (RFC 3261) and a few closely related specifications, such as SDP (RFC 2327) and RTP (RFC 3550), are sufficient for a functional SIP implementation.
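To make the ‘readable clear-text’ point concrete, the following Python sketch assembles a minimal SIP INVITE with an SDP body and parses it back. It is an illustration under assumptions rather than a complete SIP stack: the helper names, addresses, tag, branch, and Call-ID values are made up for the example, and only a handful of the headers a real user agent would send are included.

```python
from typing import Dict, Tuple

CRLF = "\r\n"

# A small SDP offer: one audio stream and one H.263 video stream.
SDP_OFFER = CRLF.join([
    "v=0",
    "o=userA 2890844526 2890844526 IN IP4 192.0.2.10",
    "s=-",
    "c=IN IP4 192.0.2.10",
    "t=0 0",
    "m=audio 49170 RTP/AVP 0",    # static payload type 0 = PCMU audio
    "m=video 51372 RTP/AVP 34",   # static payload type 34 = H.263 video
    "a=rtpmap:34 H263/90000",
]) + CRLF


def build_invite(caller: str, callee: str, host: str, sdp: str) -> str:
    """Assemble a text INVITE; all identifiers below are illustrative values."""
    lines = [
        f"INVITE sip:{callee} SIP/2.0",
        f"Via: SIP/2.0/UDP {host};branch=z9hG4bK776asdhds",
        "Max-Forwards: 70",
        f"To: <sip:{callee}>",
        f"From: <sip:{caller}>;tag=1928301774",
        f"Call-ID: a84b4c76e66710@{host}",
        "CSeq: 1 INVITE",
        f"Contact: <sip:{caller.split('@')[0]}@{host}>",
        "Content-Type: application/sdp",
        f"Content-Length: {len(sdp.encode('utf-8'))}",
    ]
    # A blank line separates the headers from the message body.
    return CRLF.join(lines) + CRLF + CRLF + sdp


def parse_message(raw: str) -> Tuple[str, Dict[str, str], str]:
    """Split a SIP message into start line, header dictionary, and body."""
    head, _, body = raw.partition(CRLF + CRLF)
    start_line, *header_lines = head.split(CRLF)
    headers = dict(line.split(": ", 1) for line in header_lines)
    return start_line, headers, body


invite = build_invite("userA@home.com", "userB@home.com", "192.0.2.10", SDP_OFFER)
start_line, headers, body = parse_message(invite)
print(start_line)       # INVITE sip:userB@home.com SIP/2.0
print(headers["CSeq"])  # 1 INVITE
```

Because the whole message is plain text, a captured INVITE can be inspected with nothing more than a text viewer, which is the readability advantage over H.323 described above.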
SIP Elements and Call Flow
The equivalent of the H.323 Terminal in SIP is the SIP User Agent (UA). The name ‘user agent’ leans towards mobile communication and user mobility, i.e. the ability of the user to log on at a communication device which then becomes the user’s agent.

Unlike H.323, SIP splits the server functions (concentrated in the H.323 Gatekeeper) into several entities: SIP Redirect Server, SIP Proxy Server, and SIP Registrar. This is also in line with the Internet philosophy that the server that registers and authenticates you (the Registrar) need not be the server that gets your requests (the Proxy) and need not be the server that knows the current location of the destination (the Redirect Server).

Figure 6 shows the basic SIP message exchange necessary to set up an audio/video call.

Figure 6: SIP Basic Call Flow

The UAs learn the SIP servers’ addresses (a domain name like www.sipregistrar1.com or an IP address like 192.168.1.2) by configuration/provisioning or dynamically, i.e., by sending a DNS SRV request asking the Internet ‘What SIP servers are there?’ and receiving a list of servers. Subsequently, UAs register with their home Registrars (registration procedure not shown here), and get authenticated, i.e., the Registrar queries a user database to verify the user name, the user password, and an additional authentication parameter called the ‘SIP Realm’.

While H.323 uses E.164 phone numbers (e.g. +14085551212) or aliases to identify the destination, SIP uses a Uniform Resource Identifier (URI) in the format user@<domain name>. In our example, UA A is in the domain home.com and wants to reach ‘userB’, who is currently in a different domain, visited.com. UA A starts the session (call) by sending an INVITE message (the equivalent of an H.323 SETUP message) for userB@home.com to the local Redirect Server, asking for the current location of ‘userB’.
The Redirect Server responds with response code 302 (SIP response codes are similar and often equivalent to the HTTP status codes), which means that the user has moved temporarily. The response includes the new domain of the user: visited.com. UA A then sends a new INVITE to the local Proxy Server (for simplicity, Proxy and Registrar reside in the same server in Figure 6), and the Proxy server routes the INVITE through the network to the destination. A handshake procedure including the SIP messages 200 OK and ACK makes sure both communicating partners and the proxy server know that the session has been set up successfully. Similar to H.323, the signaling procedure ends with the setup of media streams, e.g. for audio and video.

As in H.323, audio streams and video streams are transmitted via the Real-time Transport Protocol (RTP, RFC 3550), and for each RTP stream there is an associated RTP Control Protocol (RTCP, also RFC 3550) channel. The importance of the use of RTP in both H.323 and SIP will be highlighted later in the discussion around SIP-H.323 gateways.

SIP for Enterprise Video
As mentioned above, the H.323 community invested much effort in adding new functionality to H.323 for the purposes of visual communication. SIP, on the other hand, was embraced by the Voice over IP community and extended in many ways to support voice communications, both to replicate some traditional telephony functions and to create new ones. For example, the IETF created a set of mechanisms (STUN, TURN, and ICE) that allow RTP streams to traverse firewalls and Network Address Translation (NAT) boxes, which are very common elements in IP networks and a huge problem for both Voice over IP and Video over IP.

As Figure 7 below shows, SIP is available today in visual communication equipment (endpoints, MCUs) but the list of features available in SIP, from a visual communications perspective, is still shorter than in H.323.

Figure 7: SIP Enterprise Video

The major difference between SIP and H.323 is in the area of security and Firewall/NAT traversal. While H.323 systems deploy AES for media encryption, i.e. all RTP packets carrying audio and video are encrypted by the sender using AES, SIP refers to the Secure Real-time Transport Protocol (SRTP, RFC 3711) for encrypting media. While signaling messages in H.323 are transmitted unencrypted, SIP (maybe because it is a clear-text protocol that can be read easily) relies on Transport Layer Security (TLS, RFC 4346) to encrypt SIP signaling messages.

The other major delta, also related to security, is in the area of Firewall and NAT traversal. H.323 relies on the H.460.17, H.460.18, and H.460.19 standards for Firewall and NAT traversal.
The IETF originally developed STUN (Simple Traversal of UDP through NATs), then added the TURN (Traversal Using Relay NAT) mechanism to increase the firewall traversal success rate, and finally created the ICE (Interactive Connectivity Establishment) specification, which combines STUN and TURN functions into one. Firewall traversal has long been considered the forte of the IETF, and the hope is that through the newly developed traversal mechanisms, SIP-based communication will be able to flow across enterprise (including healthcare, government, and education) and service provider networks.

What is SIP Used for Today?
Although video network elements today support SIP, they are rarely deployed in a complete SIP video solution. The reason is that SIP still cannot match the H.323 functionality, and an all-H.323 solution can provide great interoperability and more functionality than an all-SIP solution.
SIP gained ground from proprietary protocols from Avaya, Nortel, Siemens, etc., mostly because it allows better interoperability across vendors, i.e. the ability to mix and match components. But in the H.323 video communications market, interoperability is great, and H.323 interoperability events (bakeoffs and cookouts; for some reason culinary terminology was widely adopted) are as efficient as SIP interoperability events such as SIPit.

SIP for Integration with IM and Presence
SIP is however irreplaceable in integrations with IM/Presence systems such as IBM Sametime and Microsoft LCS and OCS. The idea is that since SIP is used for exchanging Presence information and for setting up IM sessions (based on the SIMPLE specifications), it makes sense to integrate video systems via SIP.

The reality is however that SIMPLE is not the leading approach to Presence and IM. Microsoft added proprietary extensions to SIP for MS Office Communicator and LCS/OCS. Even within the IETF, the competing XMPP protocol is gaining momentum, and seems to have eclipsed SIMPLE for Internet applications. Nevertheless, SIP is today the only common denominator that allows integration of video into IM and Presence systems. Figure 8 is an example of such integration.

Figure 8: Integration with IM/Presence

In the diagram, two IM/Presence clients communicate with an IM/Presence server which is connected through a gateway function, i.e. translation software that runs on a standard server. The SIP protocol is used for the communication among video components: video soft clients (associated with the IM/Presence clients), video endpoints (such as the room system displayed in Figure 8) and conferencing servers (MCUs). A SIP Registrar/Proxy (marked ‘SIP Server’ here) handles registration, call setup, and call tear-down. A video client can be connected to another video client or to a video endpoint such as a room system.
All video clients and endpoints can be part of a multipoint call through the conferencing server. Note that once video soft clients and video endpoints connect in a multi-party conference call, additional participants from H.323, H.320 (ISDN), and PSTN (voice only) can also join the conference.

SIP for Integration with IP-PBXs
Early versions of IP-PBXs supported basic H.323 and allowed registering H.323 clients. However, as SIP became more important to IP-PBX interoperability, IP-PBXs started supporting SIP registrations, SIP trunking, etc. H.323 support was dropped or was not updated to the latest H.323 versions. Since most IP-PBXs in the market support SIP (and do not support H.323), SIP is irreplaceable in integrations with systems such as Avaya Call Manager, Nortel MCS 5100, and Cisco Call Manager.

Note that since most IP-PBXs are based on proprietary architectures, the SIP interfaces provide only limited functions, i.e. registration, basic call, and DTMF. Hold is usually also supported because Hold is a part of the base SIP standard (RFC 3261).

With the development of a new generation of IP communication systems based on SIP soft switches (such as Nortel MCS 5100), the SIP functionality became richer and included features such as Transfer, Forward, and Conference. Video endpoints can now support such functions, and mirror the functionality of desktop phones. These features mainly apply to personal video users and are less attractive to users of group conferencing systems.

If the IP-PBX does not support SIP, integration is still possible through a CTI server with SIP plug-ins. While one can argue that using SIP or H.323 for such integrations is equally efficient, almost all integrations are done via SIP since it is unlikely that H.323 will be supported natively in IP-PBXs. There is hope that over time the proprietary solutions will migrate to SIP, so the protocol selection is often based on which protocol looks more future-proof.

Figure 9 shows an example of an integration of video equipment with a SIP-based communication system.

Figure 9: Integration with SIP Communication Server

The SIP Communication Server in Figure 9 acts as SIP Proxy and Registrar for all user agents: SIP soft clients, SIP phones, video endpoints in SIP mode (HDX 4000 and 9000 in Figure 9), and the conferencing server that supports multiple protocols simultaneously. Similar to the integration with IM/Presence systems, the conferencing server (RMX 2000 in this example) allows H.323, H.320/ISDN, and PSTN (voice-only) participants to join a multiparty conference.
Further benefits of using the conferencing server in such configurations are discussed in the SIP-H.323 gateway section below.

SIP for Integration with IMS
Integration of video systems (endpoints, application servers, conferencing servers/MCUs) with IP Multimedia Subsystem (IMS) networks is also based on SIP. IMS uses SIP for communication among network elements but has defined extensions (most visibly in the form of private P-headers), so that seamless integration with IMS networks requires a bit more than plain SIP. More information about Polycom’s involvement in IMS is available in the white paper ‘Polycom and IMS’, http://www.polycom.com/common/documents/whitepapers/polycom_ims_1.pdf.
Implementing Visual Communications Features in SIP
In this section, we will look at the implementation approaches for three major video features (Dual Video Streams, FECC, and Video Channel Control) in SIP. As discussed in the H.323 section of this paper, the H.323 community developed these mechanisms, which became very popular among video users. A migration from H.323 to SIP therefore requires replication of the functionality in the new environment.

Dual Video Stream
Dual Video Streams allows a ‘presentation’ (sometimes also called ‘content’) audio-video stream to be created in parallel to the primary ‘live’ audio-video stream. This second stream is used to share any type of content: slides, spreadsheets, X-rays, video clips, etc. Polycom’s pre-standard version of this technology is called People+Content. H.239 is heavily based on intellectual property from Polycom People+Content and became the ITU-T standard that allows interoperability between different vendors. Figure 10 summarizes the Dual Video Streams concept.

Figure 10: Dual Video Streams

While the function works well on single-monitor systems, it is especially powerful in multi-screen setups (video endpoints can support up to 4 monitors). In the example in Figure 10, a Polycom HDX 4000 personal video system is on a live call with a Polycom HDX 9000 Executive Collection with two flat screen monitors. The live stream is shown on the right monitor. The user of the HDX 4000 uses a laptop, either directly connected to the HDX 4000 or running Polycom content sharing software, to activate content sharing to the HDX 9000 Executive Collection. A ‘presentation’ stream is created in parallel to the ‘live’ stream, and the content is displayed on the left screen of the receiver system.

The benefit of this functionality is that users can share not just slides or spreadsheets but also moving images: Flash video, movie clips, commercials, etc. The ‘presentation’ channel has flexible resolution, frame rates, and bit rates.
For dynamic images, it can support full High-Definition video at 30 frames per second, and for static content, such as slides, it can run at, for example, 3 frames per second and save bandwidth in the IP network.

Another major benefit of using a video channel for content sharing is that the media is encrypted (by AES in H.323 and by SRTP in SIP). In addition, once firewall and NAT traversal works for the ‘live’ stream, it works for the ‘presentation’ channel as well, and there is no need for a separate traversal solution.
The first issue with supporting Dual Video Streams in SIP is describing the content/presentation stream. As discussed above, the Session Description Protocol (SDP, RFC 2327) is used to describe media stream parameters. SIP endpoints and conferencing servers have to support RFC 4574, which defines the ‘label’ attribute in SDP, and RFC 4796, which defines the ‘content’ attribute.

Now that we can describe the content stream, we have to be able to associate the content stream with a live stream. This can be done by supporting RFC 3388 ‘Grouping of Media Lines in the Session Description Protocol’.

The remaining issue is how to identify who is sending the content and who is receiving it. This is usually done with tokens (the party that has the token can send content), and token management protocols make sure that there is only one token in the session, and that anyone can request and receive the token. RFC 4582 ‘Binary Floor Control Protocol (BFCP)’ defines a token management mechanism, and can be used for a Dual Video Streams implementation in SIP. And since everything has to be described in SDP, we also need a way to describe the BFCP streams in SDP. This can be done by supporting RFC 4583 ‘SDP Format for Binary Floor Control Protocol Streams’.

Since it takes 5 specifications (RFCs) to implement the equivalent of H.239 functionality in SIP, Polycom created a specification that describes how to glue these RFCs together. This specification is now the Internet Draft ‘Role Management and Multiple Stream Functionality in SIP’ (draft-even-xcon-pnc).

Far End Camera Control
FECC is a popular feature in visual communications. If H.323 Terminals A and B are on a call, the feature allows Terminal A to control the camera of Terminal B: zoom, pan (move the camera left and right), and tilt (move the camera up and down). The assumption is that Terminal B has a PTZ (Pan, Tilt, and Zoom) camera, and has the FECC feature enabled. Figure 11 explains the concept.
Figure 11: Far End Camera Control (FECC)

In a group conferencing setting, the key FECC benefit is that users can adjust the image that they get from the remote site, focus on a particular person or a group of people, and then move to another part of the room. In a personal video setting, the feature can be used to adjust the camera if the remote party is sitting too close to or too far from the camera.

In H.323, FECC is implemented via two ITU standards: H.281 defines the binary data that is transmitted between Terminals A and B to control the camera, while H.224 defines the format of the frames that carry the binary data.
In SIP, RFC 4573 ‘MIME Type Registration for RTP Payload Format for H.224’ (authored by Polycom) registers the H.224 media type, and defines the syntax and the semantics of the Session Description Protocol (SDP) parameters needed to support the far-end camera control protocol using H.224 in SIP. In effect, RFC 4573 creates a tunnel through the SIP-based network, and allows video endpoints to exchange H.224/H.281 information exactly as they do in H.323-based networks.

Video Channel Control
Video channel control is embedded in H.245 and was introduced in the H.323 section earlier in this paper. The protocol allows sending messages such as ‘Flow Control’ from the receiver of live and presentation streams back to the sender of these streams, telling the sender to modify the bit rate, usually to reduce the bit rate when the receiver detects high packet loss. By sending a ‘Fast Update’ message, the receiver asks the sender to send a full (intra) video frame, usually when a video frame has been lost in transmission. Figure 12 provides a graphical description of the functionality.

Figure 12: Video Channel Control

There is still no standard solution for replicating the video channel control functionality in SIP. Polycom uses the SIP INFO message because it allows easy mapping of the H.245 messages into SIP. This approach has been embraced by other vendors in the market. However, the IETF is in favor of an RTCP-based mechanism, and work is under way on the so-called Audio-Visual Profile with Feedback (AVPF), an extension to RTCP that will allow for video channel control functionality.

This approach has a substantial impact on the SIP-H.323 gateway function. While H.245-INFO interworking is simple to implement and only touches the H.323-SIP signaling, RTCP is always associated with RTP, and using RTCP for video channel control means touching the media stream. We will discuss that in more detail in the SIP-H.323 gateway section that follows.
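As an illustration of the INFO-based approach, the Python sketch below builds a SIP INFO request that asks the far-end encoder for a full frame, and checks a received body for that request. The paper does not specify the body format Polycom uses; this sketch assumes the XML media control body with the ‘picture_fast_update’ primitive defined in RFC 5168. The function names and header values are hypothetical, and most headers that a real in-dialog request carries are omitted for brevity.

```python
import xml.etree.ElementTree as ET

CRLF = "\r\n"

# Assumed body format (RFC 5168 media control); not confirmed by the paper.
FAST_UPDATE_BODY = (
    '<?xml version="1.0" encoding="utf-8"?>' + CRLF +
    "<media_control><vc_primitive><to_encoder>"
    "<picture_fast_update/>"
    "</to_encoder></vc_primitive></media_control>"
)


def build_info_fast_update(remote_uri: str, call_id: str, cseq: int) -> str:
    """Build an abbreviated SIP INFO request carrying a Fast Update request."""
    headers = [
        f"INFO sip:{remote_uri} SIP/2.0",
        f"Call-ID: {call_id}",
        f"CSeq: {cseq} INFO",
        "Content-Type: application/media_control+xml",
        f"Content-Length: {len(FAST_UPDATE_BODY.encode('utf-8'))}",
    ]
    return CRLF.join(headers) + CRLF + CRLF + FAST_UPDATE_BODY


def is_fast_update(message: str) -> bool:
    """Return True if the message body asks the encoder for a full frame."""
    _, _, body = message.partition(CRLF + CRLF)
    root = ET.fromstring(body.encode("utf-8"))
    return root.find("./vc_primitive/to_encoder/picture_fast_update") is not None


info = build_info_fast_update("userB@visited.com", "a84b4c76e66710", 2)
print(info.splitlines()[0])  # INFO sip:userB@visited.com SIP/2.0
print(is_fast_update(info))  # True
```

The point of the sketch is that the request travels entirely in the signaling path, so a gateway can map it to and from the H.245 Fast Update command without handling RTP or RTCP packets.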
SIP-H.323 Interworking
Although we expect SIP deployments to grow rapidly in the future, the installed base of H.323 endpoints and infrastructure is here to stay in the healthcare, government, education, and general enterprise markets. Interworking between the two protocols becomes an important issue.

In general, there are two ways to bridge the SIP and H.323 networks: through a signaling gateway and through a conferencing server/MCU. Figure 13 provides a visual representation of the interworking concept and lists the functions that have to be considered in the SIP-H.323 interworking scenario.
Figure 13: SIP - H.323 Interworking

SIP and H.323 are different protocols with different message formats, but they both can be used in similar ways. Comparing the call flows in Figure 4 and Figure 6 shows a lot of similarities in the call setup process. Similarities also exist in the call tear-down process (not covered in this paper) and in the mechanisms to spontaneously exchange information during the call.

A signaling gateway is a piece of software that takes incoming SIP messages, extracts the communication parameters, creates H.323 messages and sends them to the H.323 network. It also takes the incoming H.323 messages, extracts the communication parameters, creates corresponding SIP messages, and sends them to the SIP network. The gateway therefore looks like a SIP user agent to the SIP network and like an H.323 terminal to the H.323 network.

Luckily, both SIP and H.323 rely on the same protocols (RTP and RTCP) for transmitting media streams. The signaling gateway can then focus on mediating between the H.323 and SIP signaling and does not need to touch the media. This is very important as media processing is very resource-intensive. While signaling messages generate traffic on the order of a few kilobits per second, video media streams can be in the megabits per second (HD 720p video starts at 1.2 Mbps).

The base RFC relevant to SIP-H.323 signaling interworking is RFC 4123 ‘SIP - H.323 Interworking Requirements’. Since a lot of the audio and video codecs used in visual communication are ITU-T standards, it was necessary to define RTP payload formats for each of them: G.722.1, G.722.1 Annex C, H.261 Video, H.263 Video, and H.264 Video.

There are however several issues with the signaling gateway approach. First, media security gets broken because H.323-based video networks use AES encryption while SIP refers to SRTP for encryption. These two standards are completely different: the encryption algorithms and the key exchange procedures are incompatible.
The consequence is that deploying a signaling gateway would result in failure of the media encryption, i.e. the audio and video streams will be transmitted unencrypted.

As we mentioned in the video channel control section, another issue is the IETF-backed approach that requires the use of RTCP, which is associated with RTP media. This goes against the concept of a signaling-only gateway because H.245 messages must somehow be mapped into RTCP messages. There are currently no implementations where RTCP is independent of an RTP media stream, so media has to traverse the gateway in order to follow the IETF approach.

The third issue is that signaling gateways only address the SIP-H.323 interworking; ISDN and PSTN have different media (e.g. B channels in ISDN), and ISDN/PSTN users cannot use this gateway to connect to the SIP network.
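The signaling-only gateway idea and its encryption limitation can be summarized in a short Python sketch. It is a toy model under assumptions, not gateway code: the paper only states that INVITE is the equivalent of SETUP, so the other two message pairs in the table are assumed equivalents added for illustration, and all names are hypothetical.

```python
from typing import Optional

# INVITE/SETUP comes from the text; the other two pairs are assumed equivalents.
SIP_TO_H323 = {
    "INVITE": "SETUP",
    "200 OK": "CONNECT",
    "BYE": "RELEASE COMPLETE",
}
H323_TO_SIP = {h323: sip for sip, h323 in SIP_TO_H323.items()}


def translate(message: str, source: str) -> Optional[str]:
    """Map a signaling message name from one protocol to the other.

    `source` is "SIP" or "H.323". Returns None when this toy table has no
    counterpart for the message.
    """
    table = SIP_TO_H323 if source == "SIP" else H323_TO_SIP
    return table.get(message)


def media_protection(sip_scheme: str, h323_scheme: str,
                     element_handles_media: bool) -> str:
    """Describe what happens to media encryption across the two domains.

    An element that only relays signaling never sees the RTP packets, so it
    cannot convert between two different encryption schemes.
    """
    if sip_scheme == h323_scheme:
        return sip_scheme
    return "re-encrypted per leg" if element_handles_media else "unencrypted"


print(translate("INVITE", "SIP"))               # SETUP
print(translate("CONNECT", "H.323"))            # 200 OK
print(media_protection("SRTP", "AES", False))   # unencrypted
print(media_protection("SRTP", "AES", True))    # re-encrypted per leg
```

The last two lines contrast an element that only relays signaling with one that also terminates the media, which is the distinction the first issue above turns on.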
Due to these limitations, using the conferencing server as a gateway has been seriously considered as an alternative concept for H.323-SIP interworking. Conferencing servers can originate and terminate H.323 and SIP calls, and have sufficient processing power to handle the media. They already support AES, and can easily add support for SRTP encryption. Mechanisms for video channel control that use RTCP can be accommodated as well since RTP and RTCP streams go through the conferencing server.

The main disadvantages of this approach are that it creates a bottleneck (even point-to-point calls between SIP and H.323 domains have to go through the conferencing server) and that the additional conferencing server ports needed to support SIP-H.323 interworking come at a high cost.

The Future of Visual Communications
In the long run, visual communications will migrate from H.323 to SIP, and will seamlessly integrate with other communications network components: IP-PBXs, IM/Presence servers, etc. The legacy H.323 equipment will continue to connect to the SIP network through gateways and conferencing servers. Figure 14 displays the configuration of the future network.

Figure 14: Future Visual Communications

The migration to SIP will allow not only better interoperability with other communication systems but also increased scalability, better traversal of firewalls and NATs, and better security.

With regard to scalability, servers handling tens of thousands of users and providing voice, video, IM, presence, and directory services are feasible. Through federation, these servers can support large networks of personal video systems, group conferencing systems, immersive telepresence systems, soft clients, and mobile clients.

Firewalls and NATs have always been barriers to IP communication, but current video solutions are intranet-based and predominantly used for internal company communication, where firewalls are less of a problem.
Future networks will connect companies with their suppliers, customers, and partners, all of which are separated by multiple firewalls. SIP in combination with ICE will provide an efficient way of connecting people across networks, and of making visual communication ubiquitous, similar to voice communication today.

With the ubiquity of SIP visual communications, security becomes of utmost importance. Once SRTP is universally adopted and deployed for media security and TLS is supported across vendors for signaling security, visual communications will become fully protected.
Conclusion
Visual communication is expanding beyond enterprise conference rooms to the user’s desktop. The trend towards Unified Communications requires integrating video with a variety of SIP-based systems in enterprises, hospitals, universities, and government organizations.

SIP is a new protocol that can meet the requirements for scalable distributed visual communications. SIP has already been deployed for visual communication in certain scenarios. Once the missing functionality is added to SIP, it will become a solid foundation for visual communication solutions.

The transition from H.323 to SIP will be gradual, and interoperability with the installed H.323 base throughout the process is a key requirement and the main technical challenge. Polycom is uniquely positioned to leverage its broad product portfolio, market leadership and extensive partner network to lead customers through the migration process from H.323 to SIP, and deliver on the VC2 promise: transform traditional video conferencing into tomorrow’s visual communications.