Signaling issues are heavily discussed elsewhere
My focus here is on media
Why terminate media?
Sometimes you have to for NAT traversal
Traversal Using Relays around NAT (TURN)
If you want to gateway to an existing telephony network, you need to terminate the media so you can change it
And then there are a bunch of applications where you may need to terminate media for your server-side application or just because it works better that way
Some of these other media server use cases include:
Traditional video conferencing multi-point control unit (MCU) for bridging multiple parties
Transcoding from one audio or video codec to another
Interworking WebRTC media with standard VoIP media
Recording a stream or conversation
Analyzing or processing a stream in real time, such as inserting an image or video, performing call analytics, or simply adding DTMF
Any kind of person-to-machine or machine-to-machine interaction that might not involve another person at all, like today’s IVRs and speech recognition systems or the emerging computer vision systems of future applications
Do I really need a media server?
More server infrastructure (servers, bandwidth, DevOps) means more $$ and more latency
Can I do it on the client?
One advantage of today’s fast processors and the web model is that processing can be done in the client or server in many cases.
However, there are important trade-offs.
Let’s take a closer look at when server-side media processing makes sense, starting with multi-party conferencing
In most WebRTC designs, an additional bi-directional stream is added for each party.
Each end-point must fully encode and decode the stream for each party.
This actually works very well if there are only a couple of parties – usually not more than 3 or 4.
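To see why full mesh stops scaling, just count the streams: with n parties, every client sends to and receives from each of the other n - 1, so the total is n × (n - 1) directed streams, and each client’s encode load grows linearly with the party count. A quick sketch (the helper name is mine, not from the talk):

```python
def mesh_streams(n: int) -> int:
    """Total directed media streams in a full-mesh call with n parties.

    Each of the n clients sends one stream to each of the other
    n - 1 clients, so the total is n * (n - 1).
    """
    return n * (n - 1)

# Per-client encode work is n - 1 streams, which is why mesh is
# usually limited to 3 or 4 parties.
for n in (2, 3, 4, 8):
    print(f"{n} parties: {mesh_streams(n)} streams, {n - 1} encodes per client")
```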
In an ideal world you could always update your clients with whatever codec you need
We live in an imperfect world, so unfortunately transcoding will probably always be needed
Transcoding – VP8 to H.264
Mobile – OPUS to AMR-WB
OPUS to EVS
How about OPUS to AAC
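Whether a gateway has to transcode comes down to whether the two sides share a codec. A minimal decision sketch (the function name and codec lists are illustrative, not from any real negotiation API):

```python
def common_codec(offered, supported):
    """Return the first codec both sides share (offerer's preference
    order), or None if the gateway must transcode."""
    common = [c for c in offered if c in supported]
    return common[0] if common else None

# A WebRTC client offering OPUS/VP8 toward a mobile network that
# only speaks AMR-WB or EVS: no common codec, so transcoding is needed.
print(common_codec(["OPUS", "VP8"], ["AMR-WB", "EVS"]))  # → None
```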
Now let’s talk briefly about how to effectively scale your media
The downside of the MCU approach is that it is very processor-intensive on the server, especially when dealing with HD video.
The reason is each stream needs to be individually encoded and decoded.
A more efficient, higher-capacity approach is a technique we call encoder sharing.
If several devices are receiving the same stream, rather than fully encode each one, you can dramatically increase capacity by encoding only once and sharing that stream.
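The capacity gain can be sketched by counting encode operations, under the assumption that every receiver of a shared stream can accept the same encoding (function names are mine):

```python
def mcu_encode_ops(parties: int) -> int:
    # A classic MCU re-encodes a dedicated output stream for every party.
    return parties

def shared_encode_ops(distinct_layouts: int) -> int:
    # With encoder sharing, parties receiving the same composed stream
    # share one encode; only distinct layouts need their own encoder.
    return distinct_layouts

# 20 parties all viewing the same composed layout:
print(mcu_encode_ops(20), "vs", shared_encode_ops(1))  # → 20 vs 1
```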
A newer approach is known as a Selective Forwarding Unit (SFU)
In this architecture, each client sends only one stream to the SFU.
The SFU then redirects the stream to only the end points that want to see it.
The main task for the SFU is managing the encryption and decryption of the streams
No server-side encoding or decoding is required, so the SFU can handle a lot of clients.
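The forwarding behavior described above can be sketched as a toy relay; the class and method names are illustrative, not a real SFU API:

```python
from collections import defaultdict

class SFU:
    """Toy Selective Forwarding Unit: relays each sender's packets to
    the subscribers who want that stream, with no transcoding."""

    def __init__(self):
        self.subscriptions = defaultdict(set)  # sender -> set of receivers

    def subscribe(self, receiver, sender):
        self.subscriptions[sender].add(receiver)

    def on_packet(self, sender, packet):
        # Forward the payload as-is; the server never encodes or
        # decodes media, which is why an SFU scales so well.
        return [(receiver, packet) for receiver in self.subscriptions[sender]]

sfu = SFU()
sfu.subscribe("bob", "alice")
sfu.subscribe("carol", "alice")
out = sfu.on_packet("alice", b"rtp-payload")
```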
An enhancement to this approach is known as simulcast.
Rather than just sending one stream, each client sends 2 or more streams – usually one high bitrate and one low bitrate.
Often only a single high-bitrate stream – i.e. HD video – is sent for the active talker, and the low-bitrate stream is sent for the others.
If a low power or bandwidth limited device is connected then the SFU can forward just the low-bitrate stream.
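The simulcast selection rule just described can be sketched as a small policy function (the thresholds and names are illustrative):

```python
def pick_layer(sender, active_talker, receiver_bandwidth_kbps,
               high_kbps=2500):
    """Choose which simulcast layer the SFU forwards to a receiver.

    Forward the high-bitrate layer only for the active talker, and
    only if the receiver has bandwidth for it; otherwise fall back
    to the low-bitrate layer. Thresholds are assumptions, not spec.
    """
    if sender == active_talker and receiver_bandwidth_kbps >= high_kbps:
        return "high"
    return "low"

# Active talker to a well-connected viewer gets HD; everyone else,
# and any bandwidth-limited viewer, gets the low-bitrate stream.
print(pick_layer("alice", "alice", 5000))  # → high
print(pick_layer("bob", "alice", 5000))    # → low
print(pick_layer("alice", "alice", 400))   # → low
```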
There is one additional approach called Scalable Video Coding or SVC.
Like simulcast, SVC sends multiple streams of varying quality from each client and a centralized SFU does the routing.
Unlike simulcast where independent streams are sent, SVC uses a layering approach in a single stream.
Like simulcast, the mechanisms for signaling the SFU are not standardized, and wide-scale WebRTC-based systems have yet to emerge.
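The key difference from simulcast is that SVC layers travel in one stream, so the SFU thins a stream by discarding enhancement-layer packets rather than switching between independent streams. A minimal sketch (packet representation is mine):

```python
def filter_svc_packets(packets, max_layer):
    """Drop SVC enhancement layers above max_layer.

    Each packet is (layer_id, payload); layer 0 is the base layer,
    higher layers refine quality. A constrained receiver gets only
    the layers it can handle, with no server-side re-encoding.
    """
    return [p for p in packets if p[0] <= max_layer]

stream = [(0, b"base"), (1, b"enh1"), (2, b"enh2"), (0, b"base")]
print(filter_svc_packets(stream, 0))  # → [(0, b'base'), (0, b'base')]
```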
Fine for a few calls
Run into CPU problems with many