Machine Learning techniques that can be used to identify call quality problems based on analysis of WebRTC metrics data. Presented by Varun Singh, callstats.io CEO, at CommCon 2019.
Finding Hidden Call Quality Issues with Machine Learning
1. Finding Hidden Call Quality Issues with Machine Learning
Varun Singh, CEO
09 July 2019
On behalf of Navid, Ali, Lennart
2. Callstats.io is a WebRTC Monitoring Leader
• Founded in 2014 by IETF and W3C authors
• Customers across vertical industries
• Integrated into all major CPaaS platforms
• Over 1B collected WebRTC datapoints each month
• Backed by leading venture capital firms ($3.5M/two rounds)
3. Range of Issues Degrades WebRTC User Experiences
Variable network performance
• Degrades audio quality
Software errors
• Disconnections resulting in dropped calls and call failures
User and equipment faults
• Device or local issues
4. callstats.io provides robust solutions
WebRTC Monitoring
• Network status
• Service metrics
AI-Driven Troubleshooting
• Anomaly detection
• Root cause analysis
• Notifications
Active Network Testing
• Connectivity verification
• Performance metrics
5. Dealing with Big Data
Why do we need ML?
Photo credit: Flickr User Gavin Bell (CC BY-NC-ND 2.0)
https://www.flickr.com/photos/gavinbell/535261899/
6. Challenge: getStats provides a lot of data
Example:
• Go to https://webrtc.github.io/samples/src/content/peerconnection/pc1/
• Start the sample
• Type in the browser console:
pc1.getStats().then(stats=>stats.forEach(stat=>console.log(stat)))
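Most of what that command prints is not needed for a first look at call quality. As a minimal sketch, the function below keeps only round-trip time, jitter, and packet counts from a stats report. It assumes the report contains the standard `candidate-pair` and `inbound-rtp` entries with the `currentRoundTripTime`, `jitter`, `packetsLost`, and `packetsReceived` fields. `summarizeStats` and the mock report are illustrative names and values, not part of the talk.

```javascript
// Reduce a stats report (anything with a Map-like forEach) to a few quality metrics.
function summarizeStats(report) {
  const summary = { rttSeconds: null, inbound: [] };
  report.forEach((stat) => {
    // The nominated ICE candidate pair carries the current round-trip time.
    if (stat.type === "candidate-pair" && stat.nominated &&
        stat.currentRoundTripTime !== undefined) {
      summary.rttSeconds = stat.currentRoundTripTime;
    }
    // Each inbound RTP stream reports jitter and packet counters.
    if (stat.type === "inbound-rtp") {
      summary.inbound.push({
        kind: stat.kind,
        jitterSeconds: stat.jitter,
        packetsLost: stat.packetsLost,
        packetsReceived: stat.packetsReceived,
      });
    }
  });
  return summary;
}

// Mock report with hypothetical values, so the sketch also runs outside a browser.
const mockReport = new Map([
  ["pair-1", { type: "candidate-pair", nominated: true, currentRoundTripTime: 0.12 }],
  ["in-audio", { type: "inbound-rtp", kind: "audio", jitter: 0.004, packetsLost: 3, packetsReceived: 1500 }],
]);
console.log(summarizeStats(mockReport));

// In the browser console on the sample page, the same function can take live data:
// pc1.getStats().then((report) => console.log(summarizeStats(report)));
```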
8. Several data dimensions
Categories: Transport, Endpoint, Platform, Infrastructure
• ISP name / AS number
• Device type
• Topology (P2P / SFU / MCU)
• User network: end-user location, network type
• Operating system
• Server locations
• Media engine: audio and video frame rate and size variations
• Browser / RTC stack
• Server network
• Packetization: round-trip time, jitter, packet loss
• CPU type
• App version
9. Challenge: dealing with lots of data
appVer
buildVer
buildName
osName
osVer
totalMeanAudioRtt
totalMeanVideoRtt
countVideoRtt
countAudioRtt
totalInboundAudioPacketsLost
totalInboundVideoPacketsLost
totalOutboundAudioPacketsLost
totalOutboundVideoPacketsLost
totalInboundAudioPackets
totalInboundVideoPackets
totalOutboundAudioPackets
totalOutboundVideoPackets
totalMeanInboundVideoThroughput
countInboundVideoThroughput
totalMeanInboundAudioThroughput
countInboundAudioThroughput
totalMeanOutboundAudioThroughput
countOutboundAudioThroughput
totalMeanOutboundVideoThroughput
Many Metrics
12. Challenge: dealing with lots of data
Many metrics × measures × n samples × m customers
Measures:
• Average
• Mean
• Percentile
• Skew
• Kurtosis
n samples:
• Sample rate
• Number of participants
• Call duration
m customers:
• Customer segments
13. Why do we need ML?
Reducing costs
Image source: http://xingwu.me/2014/12/10/My-AWS-Account-Got-Compromised/
27. Feature selection: Correlation matrix to determine significance
Feature           meanAudioRtt  meanVideoRtt  Local Latitude  Local Longitude  Server Latitude  Server Longitude
meanAudioRtt              100%           42%            -24%              59%               0%                0%
meanVideoRtt               42%          100%            -15%              29%               0%                0%
Local Latitude            -24%          -15%            100%             -23%               0%                0%
Local Longitude            59%           29%            -23%             100%               0%                0%
Server Latitude             0%            0%              0%               0%             100%             -100%
Server Longitude            0%            0%              0%               0%            -100%              100%
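A matrix like this one can be computed with pairwise Pearson correlation. The sketch below is one possible way to do it, not necessarily how the figures above were produced. The column values are hypothetical, and `pearson` and `correlationMatrix` are illustrative names.

```javascript
// Pearson correlation coefficient between two equally long numeric arrays.
function pearson(x, y) {
  const n = x.length;
  const meanX = x.reduce((s, v) => s + v, 0) / n;
  const meanY = y.reduce((s, v) => s + v, 0) / n;
  let sxy = 0, sxx = 0, syy = 0;
  for (let i = 0; i < n; i++) {
    const dx = x[i] - meanX;
    const dy = y[i] - meanY;
    sxy += dx * dy;
    sxx += dx * dx;
    syy += dy * dy;
  }
  return sxy / Math.sqrt(sxx * syy); // NaN when a column is constant
}

// Build the full matrix for an object of named columns.
function correlationMatrix(columns) {
  const names = Object.keys(columns);
  return names.map((a) => names.map((b) => pearson(columns[a], columns[b])));
}

// Hypothetical per-call values, for illustration only.
const features = {
  meanAudioRtt: [120, 300, 80, 450, 200],
  meanVideoRtt: [130, 310, 95, 470, 190],
  localLongitude: [24.9, 77.2, 13.4, 139.7, -0.1],
};
console.table(correlationMatrix(features));
```

Features that correlate strongly with each other carry overlapping information, which is one reason to drop or combine some of them before clustering.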
28. Round Trip Time (RTT)
Round-trip time is the time it takes for a packet to travel through an IP network,
from a sending endpoint to a receiving endpoint and back, not including the time
to process the packet at its destination. Many factors affect RTT, like propagation
delay, processing delay, queuing delay, and transmission delay.
[Diagram: Sender, Network, Receiver, with timestamps t_start and t_ack and interval t_slr]
$RTT = t_{ack} - t_{start} - t_{slr}$
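As a small worked example of the formula, the sketch below assumes that t_slr is the time the receiver holds the packet before answering, which matches the definition's exclusion of processing time at the destination. The slide does not define the symbol, so this reading is an assumption, and the timestamps are hypothetical.

```javascript
// RTT = t_ack - t_start - t_slr, all values in milliseconds.
function roundTripTime(tStartMs, tAckMs, tSlrMs) {
  return tAckMs - tStartMs - tSlrMs;
}

// Sent at 1000 ms, acknowledged at 1180 ms, receiver held the packet for 30 ms.
console.log(roundTripTime(1000, 1180, 30)); // 150
```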
29. Feature reduction: Principal Component Analysis
http://setosa.io/ev/principal-component-analysis/
• Find feature-combinations that have the most variation
• Creates a new set of dimensions to maximize these variations
• Remove the feature-combinations that don’t show much variation
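The three steps above can be sketched for the first principal component only. This is a minimal illustration using power iteration on the covariance matrix, not the implementation used in the talk. The sample values are hypothetical, `firstPrincipalComponent` is an illustrative name, and features are assumed to share a unit (otherwise they would normally be standardized first).

```javascript
// Returns the first principal component of `rows` (one array of feature values per call).
function firstPrincipalComponent(rows, iterations = 100) {
  const n = rows.length;
  const d = rows[0].length;
  // Center every feature on its mean.
  const means = Array.from({ length: d }, (_, j) => rows.reduce((s, r) => s + r[j], 0) / n);
  const centered = rows.map((r) => r.map((v, j) => v - means[j]));
  // Sample covariance matrix (d x d).
  const cov = Array.from({ length: d }, (_, a) =>
    Array.from({ length: d }, (_, b) => centered.reduce((s, r) => s + r[a] * r[b], 0) / (n - 1))
  );
  const multiply = (v) => cov.map((row) => row.reduce((s, c, j) => s + c * v[j], 0));
  // Power iteration converges to the direction of greatest variance.
  let direction = Array(d).fill(1 / Math.sqrt(d));
  for (let i = 0; i < iterations; i++) {
    const w = multiply(direction);
    const norm = Math.hypot(...w);
    direction = w.map((x) => x / norm);
  }
  const variance = multiply(direction).reduce((s, x, j) => s + x * direction[j], 0);
  const totalVariance = cov.reduce((s, row, j) => s + row[j], 0);
  return {
    direction,                                // the feature combination
    explainedRatio: variance / totalVariance, // share of total variation it captures
    scores: centered.map((r) => r.reduce((s, x, j) => s + x * direction[j], 0)),
  };
}

// Hypothetical [meanAudioRtt, meanVideoRtt] pairs in milliseconds.
const calls = [[120, 130], [300, 310], [80, 95], [450, 470], [200, 190]];
const pc = firstPrincipalComponent(calls);
console.log(pc.direction, pc.explainedRatio);
```

When `explainedRatio` is close to 1, the remaining combinations show little variation and can be dropped. The sketch fails (division by zero) if the data has no variance or if the starting vector happens to be orthogonal to the main direction; a production version would guard against both.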
31. Optimizing the model
[Figure: overfit model vs. fit model]
• Minimize the number of variables needed in your model
• Minimize the number of clusters
• Maintain predictiveness on new datasets
32. Determining the optimal number of clusters with the Bayesian Information Criterion (BIC)
[Plot: BIC score by number of clusters and number of PCA dimensions]
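To make the idea concrete, here is a minimal sketch that scores different cluster counts with BIC. The slides do not say which clustering model was used, so the sketch assumes a one-dimensional Gaussian mixture fitted with EM, started deterministically from quantiles and with a variance floor. It uses the convention BIC = p·ln(n) − 2·ln(L), where lower is better, and hypothetical RTT values. It does not cover the second axis of the plot (number of PCA dimensions).

```javascript
// Fits a 1-D Gaussian mixture with k components by EM and returns its BIC.
function gmmBic(values, k, iterations = 200, varianceFloor = 1) {
  const n = values.length;
  const sorted = [...values].sort((a, b) => a - b);
  const mean = values.reduce((s, v) => s + v, 0) / n;
  const variance = values.reduce((s, v) => s + (v - mean) ** 2, 0) / n;

  // Deterministic start: equal weights, means at evenly spaced quantiles, shared variance.
  const weights = Array(k).fill(1 / k);
  const means = Array.from({ length: k }, (_, j) => sorted[Math.floor(((j + 0.5) * n) / k)]);
  const variances = Array(k).fill(Math.max(variance, varianceFloor));

  // Weighted Gaussian density of sample x under component j.
  const density = (x, j) =>
    (weights[j] * Math.exp(-((x - means[j]) ** 2) / (2 * variances[j]))) /
    Math.sqrt(2 * Math.PI * variances[j]);
  const mixture = (x) => means.reduce((s, _, j) => s + density(x, j), 0) || Number.MIN_VALUE;

  for (let it = 0; it < iterations; it++) {
    // E-step: share of each sample attributed to each component.
    const resp = values.map((x) => {
      const total = mixture(x);
      return means.map((_, j) => density(x, j) / total);
    });
    // M-step: re-estimate weight, mean, and variance of each component.
    for (let j = 0; j < k; j++) {
      const nj = resp.reduce((s, r) => s + r[j], 0);
      weights[j] = nj / n;
      if (nj < 1e-9) continue; // empty component: keep its previous mean and variance
      means[j] = values.reduce((s, x, i) => s + resp[i][j] * x, 0) / nj;
      const spread = values.reduce((s, x, i) => s + resp[i][j] * (x - means[j]) ** 2, 0) / nj;
      variances[j] = Math.max(spread, varianceFloor);
    }
  }

  const logLikelihood = values.reduce((s, x) => s + Math.log(mixture(x)), 0);
  const parameters = 3 * k - 1; // k means + k variances + (k - 1) free weights
  return parameters * Math.log(n) - 2 * logLikelihood;
}

// Hypothetical RTT samples (ms) forming two well-separated groups.
const rtts = [38, 42, 45, 47, 49, 50, 51, 53, 55, 58, 62,
              370, 385, 392, 397, 400, 403, 408, 415, 430];
const scores = [1, 2, 3, 4].map((k) => ({ k, bic: gmmBic(rtts, k) }));
const best = scores.reduce((a, b) => (b.bic < a.bic ? b : a));
console.table(scores);
console.log("lowest BIC at k =", best.k);
```

The penalty term grows with every added cluster, so a larger k only wins when it improves the fit by more than the extra parameters cost. This is how BIC guards against the overfitting described on the previous slide.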
33. Methodology: Iterative in practice
Collect data → Check the data → Clean the data → Dimension reduction → Clustering → Labeling
34. Example of a customer handling millions of calls a day
35. RTT distribution for this service
Band        RTT range (ms)   Proportion of the data
Low         0 < RTT < 150    68%
Medium      150 < RTT < 250  19%
Large       250 < RTT < 700  12%
Very large  RTT > 700        1%
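A distribution like this can be produced by sorting each RTT sample into a band. The sketch below assumes the ranges are in milliseconds and that a value exactly on a boundary (150, 250, 700) belongs to the higher band, since the slide leaves the boundaries open. The sample values are hypothetical, and `rttBand` and `bandShares` are illustrative names.

```javascript
// Upper bounds in milliseconds, taken from the table; the last band is open-ended.
const RTT_BANDS = [
  { name: "Low", upperMs: 150 },
  { name: "Medium", upperMs: 250 },
  { name: "Large", upperMs: 700 },
  { name: "Very large", upperMs: Infinity },
];

// Assumption: boundary values go to the higher band; input is a non-negative number.
function rttBand(rttMs) {
  const band = RTT_BANDS.find((b) => rttMs < b.upperMs) ?? RTT_BANDS[RTT_BANDS.length - 1];
  return band.name;
}

// Percentage of samples falling into each band.
function bandShares(rttsMs) {
  const shares = Object.fromEntries(RTT_BANDS.map((b) => [b.name, 0]));
  for (const rtt of rttsMs) shares[rttBand(rtt)] += 100 / rttsMs.length;
  return shares;
}

// Hypothetical samples in milliseconds.
console.log(bandShares([40, 95, 180, 320, 900]));
```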
36. RTT distribution for this service
[Same distribution table as slide 35, with an "Area for exploration" marked]
40. Cluster 0: 16% of high RTTs
Network conditions are good
Local users in a specific country
Distance can be the main cause for large RTTs
41. Cluster 5: 20% of high RTTs
Bad network conditions
Large jitter, large fraction lost
Users all over the world
Generally large distances
Network congestion and distance can be the main causes of large RTTs
42. Cluster 3: 50% of high RTTs
Good network conditions
Local users in Asia
Large distance can be the main cause of large RTTs
43. Analysis summary
• 32% of calls have high RTTs
• 20% due to congestion
• 2 geographic problem areas
• Check server infrastructure & network
• Add/move servers