Finding Hidden Call Quality Issues with Machine Learning

Varun Singh, CEO
09 July 2019
On behalf of Navid, Ali, Lennart

callstats.io is a Leader in WebRTC Monitoring
• Founded in 2014 by IETF and W3C authors
• Customers across vertical industries
• Integrated into all major CPaaS platforms
• Over 1B collected WebRTC datapoints each month
• Backed by leading venture capital firms ($3.5M over two rounds)

A Range of Issues Degrades WebRTC User Experiences
Variable network performance
• Degrades audio quality
Software errors
• Disconnections resulting in dropped calls and call failures
User and equipment faults
• Device or local issues

callstats.io provides robust solutions
WebRTC Monitoring
• Network status
• Service metrics
AI-Driven Troubleshooting
• Anomaly detection
• Root cause analysis
• Notifications
Active Network Testing
• Connectivity verification
• Performance metrics

Why do we need ML?
Dealing with Big Data
Photo credit: Flickr User Gavin Bell (CC BY-NC-ND 2.0)
https://www.flickr.com/photos/gavinbell/535261899/

Challenge: getStats provides a lot of data
Example:
• Go to https://webrtc.github.io/samples/src/content/peerconnection/pc1/
• Start the sample
• Type in the browser console:
pc1.getStats().then(stats => stats.forEach(stat => console.log(stat)))
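
To get a feel for how much data a single report holds, the output can be grouped by stats type instead of logged object by object. The following is a minimal sketch, not part of the original talk: `summarizeStats` and `fakeReport` are hypothetical names, and the sample values are made up so the code also runs outside a browser.

```javascript
// Sketch: count how many stats objects of each type a report contains,
// and how many fields those objects carry in total.
// `report` is anything with a Map-like forEach, such as an RTCStatsReport.
function summarizeStats(report) {
  const summary = {};
  report.forEach((stat) => {
    const entry = summary[stat.type] || { objects: 0, fields: 0 };
    entry.objects += 1;
    entry.fields += Object.keys(stat).length;
    summary[stat.type] = entry;
  });
  return summary;
}

// Stand-in for a real report (made-up values, for illustration only).
const fakeReport = new Map([
  ["in1", { id: "in1", type: "inbound-rtp", timestamp: 1, packetsLost: 2, jitter: 0.01 }],
  ["out1", { id: "out1", type: "outbound-rtp", timestamp: 1, packetsSent: 100 }],
  ["in2", { id: "in2", type: "inbound-rtp", timestamp: 1, packetsLost: 0, jitter: 0.02 }],
]);
const statsSummary = summarizeStats(fakeReport);
console.log(statsSummary);
```

In the sample page, the same function could be applied to the live connection with `pc1.getStats().then(report => console.table(summarizeStats(report)))`.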

Several data dimensions
Categories: Transport, Endpoint, Platform, Infrastructure
• ISP name / AS number
• Device type
• Topology (P2P / SFU / MCU)
• User network: end-user location, network type
• Operating system
• Server locations
• Media engine: audio and video frame rate and size variations
• Browser / RTC stack
• Server network
• Packetization: round-trip time, jitter, packet loss
• CPU type
• App version

Challenge: dealing with lots of data
appVer
buildVer
buildName
osName
osVer
totalMeanAudioRtt
totalMeanVideoRtt
countVideoRtt
countAudioRtt
totalInboundAudioPacketsLost
totalInboundVideoPacketsLost
totalOutboundAudioPacketsLost
totalOutboundVideoPacketsLost
totalInboundAudioPackets
totalInboundVideoPackets
totalOutboundAudioPackets
totalOutboundVideoPackets
totalMeanInboundVideoThroughput
countInboundVideoThroughput
totalMeanInboundAudioThroughput
countInboundAudioThroughput
totalMeanOutboundAudioThroughput
countOutboundAudioThroughput
totalMeanOutboundVideoThroughput
Many Metrics

Many metrics × measures × n samples × m customers
Metrics (examples):
• appVer, buildVer, buildName, osName, osVer
• totalMeanAudioRtt, totalMeanVideoRtt, countVideoRtt, countAudioRtt
Measures:
• Average
• Mean
• Percentile
• Skew
• Kurtosis
n samples:
• Sample rate
• Number of participants
• Call duration
m customers:
• Customer segments
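
Each metric is typically condensed into a handful of measures per call or per customer. As a rough illustration of the measures listed above, here is a sketch that computes them for one metric; `summarize` and the sample values are hypothetical, and the moments use the population form (dividing by n), which is only one of several common conventions.

```javascript
// Sketch: summary measures for one metric across a set of samples.
function summarize(values) {
  const n = values.length;
  const mean = values.reduce((a, b) => a + b, 0) / n;
  // k-th central moment, population form (divide by n).
  const moment = (k) => values.reduce((a, v) => a + (v - mean) ** k, 0) / n;
  const m2 = moment(2);
  const sorted = [...values].sort((a, b) => a - b);
  // Percentile by linear interpolation between the closest ranks.
  const percentile = (p) => {
    const pos = (p / 100) * (n - 1);
    const lo = Math.floor(pos);
    const hi = Math.ceil(pos);
    return sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo);
  };
  return {
    mean,
    p50: percentile(50),
    p95: percentile(95),
    skew: moment(3) / m2 ** 1.5,   // 0 for a symmetric distribution
    kurtosis: moment(4) / m2 ** 2, // 3 for a normal distribution
  };
}

// Made-up RTT samples in milliseconds, for illustration only.
const measures = summarize([40, 45, 50, 55, 60]);
console.log(measures);
```

Note that the sketch assumes the samples are not all identical; with zero variance the skew and kurtosis are undefined.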

Why do we need ML?
Reducing costs
Image source: http://xingwu.me/2014/12/10/My-AWS-Account-Got-Compromised/

Challenge: figuring out what matters
Chart: customer value vs. operating complexity

Why do we need ML?
Saving users from analytics overload

Typical troubleshooting approach
• Generate hypothesis
• Look for anomalies
• Filter
• Evaluate sub-segment

Traditional troubleshooting challenges
• Generate hypothesis: experience required to generate good hypotheses
• Look for anomalies: not all anomalies are obvious
• Filter: many filters to define
• Evaluate sub-segment: many sub-segments to review

Ideal solution:
the system should tell you where to look

Finding Hidden Call Quality Issues with Machine Learning
Deliver Better User Experiences

Machine Learning approaches
Image source: https://towardsdatascience.com/deep-learning-for-image-classification-why-its-challenging-where-we-ve-been-and-what-s-next-93b56948fcef

Methodology
Collect data → Check the data → Clean the data → Feature reduction → Clustering → Labeling

Clean your data
• Missing values
• Duplicates
• Outliers
• Balancing
• Normalization
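
The cleaning steps above could look roughly like the following sketch. It is not the pipeline used in the talk: the row shape (`callId`, `rtt`), the 1.5 × IQR outlier rule, and min-max normalization are all assumptions chosen for illustration, and balancing is left out.

```javascript
// Linear-interpolated quantile of an ascending-sorted array, q in [0, 1].
function quantile(sorted, q) {
  const pos = q * (sorted.length - 1);
  const lo = Math.floor(pos);
  const hi = Math.ceil(pos);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo);
}

function cleanRows(rows, fields) {
  // Missing values: drop rows where any field is not a finite number.
  let out = rows.filter((r) => fields.every((f) => Number.isFinite(r[f])));

  // Duplicates: keep only the first row seen for each callId.
  const seen = new Set();
  out = out.filter((r) => {
    if (seen.has(r.callId)) return false;
    seen.add(r.callId);
    return true;
  });

  // Outliers: drop rows more than 1.5 * IQR outside the quartiles, per field.
  for (const f of fields) {
    const sorted = out.map((r) => r[f]).sort((a, b) => a - b);
    const q1 = quantile(sorted, 0.25);
    const q3 = quantile(sorted, 0.75);
    const margin = 1.5 * (q3 - q1);
    out = out.filter((r) => r[f] >= q1 - margin && r[f] <= q3 + margin);
  }

  // Normalization: rescale each field to [0, 1] (min-max).
  for (const f of fields) {
    const vals = out.map((r) => r[f]);
    const min = Math.min(...vals);
    const max = Math.max(...vals);
    out = out.map((r) => ({ ...r, [f]: max > min ? (r[f] - min) / (max - min) : 0 }));
  }
  return out;
}

// Made-up rows, for illustration only.
const rawRows = [
  { callId: "a", rtt: 48 },
  { callId: "b", rtt: 49 },
  { callId: "b", rtt: 49 },        // duplicate
  { callId: "c", rtt: 50 },
  { callId: "d", rtt: undefined }, // missing value
  { callId: "e", rtt: 51 },
  { callId: "f", rtt: 52 },
  { callId: "g", rtt: 5000 },      // outlier
];
const cleaned = cleanRows(rawRows, ["rtt"]);
console.log(cleaned);
```

Whether an extreme RTT is an outlier to drop or exactly the issue worth finding depends on the question being asked, so the outlier rule deserves particular care in this domain.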

Dimension reduction: reducing unnecessary complexity
• Feature elimination
• Feature selection
• Feature extraction

Feature selection: correlation matrix to determine significance
                  meanAudioRtt  meanVideoRtt  Local Latitude  Local Longitude  Server Latitude  Server Longitude
meanAudioRtt      100%          42%           -24%            59%              0%               0%
meanVideoRtt      42%           100%          -15%            29%              0%               0%
Local Latitude    -24%          -15%          100%            -23%             0%               0%
Local Longitude   59%           29%           -23%            100%             0%               0%
Server Latitude   0%            0%            0%              0%               100%             -100%
Server Longitude  0%            0%            0%              0%               -100%            100%
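
A matrix like the one above can be computed with the Pearson correlation coefficient. The sketch below assumes Pearson is the measure behind the slide (the source does not say), and the column values are made up. A pair at ±100%, such as the server latitude and longitude here, carries the same information twice, so one of the two features can be dropped.

```javascript
// Pearson correlation between two equal-length numeric arrays.
function pearson(x, y) {
  const n = x.length;
  const mx = x.reduce((a, b) => a + b, 0) / n;
  const my = y.reduce((a, b) => a + b, 0) / n;
  let sxy = 0;
  let sxx = 0;
  let syy = 0;
  for (let i = 0; i < n; i++) {
    sxy += (x[i] - mx) * (y[i] - my);
    sxx += (x[i] - mx) ** 2;
    syy += (y[i] - my) ** 2;
  }
  return sxy / Math.sqrt(sxx * syy);
}

// Correlation matrix for named columns given as { name: number[] }.
function correlationMatrix(columns) {
  const names = Object.keys(columns);
  return names.map((a) => names.map((b) => pearson(columns[a], columns[b])));
}

// Made-up values, for illustration only.
const columns = {
  meanAudioRtt: [100, 120, 140, 160],
  meanVideoRtt: [110, 125, 150, 155],
  serverLatitude: [10, 20, 30, 40],
  serverLongitude: [-10, -20, -30, -40],
};
const corr = correlationMatrix(columns);
console.log(corr.map((row) => row.map((v) => `${Math.round(v * 100)}%`)));
```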

Round Trip Time (RTT)
Round-trip time is the time it takes for a packet to travel through an IP network, from a sending endpoint to a receiving endpoint and back, not including the time to process the packet at its destination. Many factors affect RTT, like propagation delay, processing delay, queuing delay, and transmission delay.
Diagram: Sender, Network, Receiver. The packet leaves the sender at $t_{start}$, spends $t_{slr}$ at the receiver, and the acknowledgement arrives back at the sender at $t_{ack}$.
$RTT = t_{ack} - t_{start} - t_{slr}$
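
As a worked instance of the formula, the sketch below plugs in made-up timestamps. It assumes all three values are in milliseconds and that the third term is the time spent at the receiver, which is how the definition above reads; `roundTripTime` is a hypothetical helper name.

```javascript
// RTT = tAck - tStart - tSlr, with all values in milliseconds.
function roundTripTime(tStart, tAck, tSlr) {
  const rtt = tAck - tStart - tSlr;
  if (rtt < 0) {
    throw new RangeError("inconsistent timestamps: RTT would be negative");
  }
  return rtt;
}

// Sent at 1000 ms, acknowledged at 1180 ms, 30 ms spent at the receiver.
const exampleRtt = roundTripTime(1000, 1180, 30);
console.log(exampleRtt); // 150
```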

Feature reduction: Principal Component Analysis
http://setosa.io/ev/principal-component-analysis/
• Find feature combinations that have the most variation
• Create a new set of dimensions to maximize these variations
• Remove the feature combinations that don’t show much variation
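
To make the idea concrete, the following sketch finds only the first principal component, using power iteration on the covariance matrix. This is a simplification chosen to stay dependency-free, not the method from the talk; a real pipeline would more likely use a linear algebra library and keep several components. The function name and data are hypothetical.

```javascript
// Sketch: first principal component via power iteration on the covariance
// matrix. `rows` is an array of equal-length numeric arrays.
function firstPrincipalComponent(rows, iterations = 100) {
  const n = rows.length;
  const d = rows[0].length;
  // Center each feature on its mean.
  const means = Array.from({ length: d }, (_, j) =>
    rows.reduce((a, r) => a + r[j], 0) / n);
  const centered = rows.map((r) => r.map((v, j) => v - means[j]));
  // Sample covariance matrix (d x d).
  const cov = Array.from({ length: d }, (_, i) =>
    Array.from({ length: d }, (_, j) =>
      centered.reduce((a, r) => a + r[i] * r[j], 0) / (n - 1)));
  // Power iteration: repeatedly apply cov and renormalize.
  let v = Array(d).fill(1 / Math.sqrt(d));
  let eigenvalue = 0;
  for (let it = 0; it < iterations; it++) {
    const w = cov.map((row) => row.reduce((a, c, j) => a + c * v[j], 0));
    eigenvalue = Math.hypot(...w);
    v = w.map((x) => x / eigenvalue);
  }
  // Share of the total variance captured by this component.
  const totalVariance = cov.reduce((a, row, i) => a + row[i], 0);
  return { direction: v, explained: eigenvalue / totalVariance };
}

// Two made-up features where the second is exactly twice the first,
// so a single combined dimension captures all of the variation.
const pca = firstPrincipalComponent([[1, 2], [2, 4], [3, 6], [4, 8]]);
console.log(pca);
```

Here the second original dimension adds nothing, so it can be removed without losing information. The sketch assumes the data has some variance and that the starting vector is not orthogonal to the leading component.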

Clustering with Gaussian Mixture Model (GMM)
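
A Gaussian mixture model describes the data as a weighted sum of Gaussian components and is usually fitted with expectation-maximization (EM). The sketch below is a deliberately small one-dimensional version with a deterministic start; the talk's model works on multiple PCA dimensions, so treat this only as an illustration of the mechanics. All names and values are hypothetical.

```javascript
// Gaussian probability density.
function gaussianPdf(x, mean, variance) {
  return Math.exp(-((x - mean) ** 2) / (2 * variance)) /
    Math.sqrt(2 * Math.PI * variance);
}

// Sketch: fit a k-component 1-D Gaussian mixture with EM.
function fitGmm1d(data, k, iterations = 50) {
  const n = data.length;
  const sorted = [...data].sort((a, b) => a - b);
  // Deterministic start: means spread over the sorted data,
  // shared variance, equal weights.
  const means = Array.from({ length: k }, (_, c) =>
    sorted[Math.floor(((c + 0.5) * n) / k)]);
  const overallMean = data.reduce((a, b) => a + b, 0) / n;
  const overallVar = data.reduce((a, x) => a + (x - overallMean) ** 2, 0) / n;
  const variances = Array(k).fill(overallVar);
  const weights = Array(k).fill(1 / k);
  let logLikelihood = 0;
  for (let it = 0; it < iterations; it++) {
    // E-step: responsibility of each component for each point.
    // The log-likelihood is evaluated with the current parameters.
    logLikelihood = 0;
    const resp = data.map((x) => {
      const p = means.map((m, c) => weights[c] * gaussianPdf(x, m, variances[c]));
      const total = p.reduce((a, b) => a + b, 0);
      logLikelihood += Math.log(total);
      return p.map((v) => v / total);
    });
    // M-step: re-estimate weights, means, and variances.
    for (let c = 0; c < k; c++) {
      const nc = resp.reduce((a, r) => a + r[c], 0);
      weights[c] = nc / n;
      means[c] = resp.reduce((a, r, i) => a + r[c] * data[i], 0) / nc;
      const v = resp.reduce((a, r, i) => a + r[c] * (data[i] - means[c]) ** 2, 0) / nc;
      variances[c] = Math.max(v, 1e-6); // floor to avoid a collapsing component
    }
  }
  return { weights, means, variances, logLikelihood };
}

// Made-up RTTs (ms) with two obvious groups, for illustration only.
const rtts = [48, 50, 52, 49, 51, 298, 300, 302, 299, 301];
const model = fitGmm1d(rtts, 2);
console.log(model.means, model.weights);
```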

Optimizing the model
Figure: overfit model vs. fit model
• Minimize the number of variables needed in your model
• Minimize the number of clusters
• Maintain predictiveness on new datasets

Determining the optimal number of clusters with the Bayesian Information Criterion (BIC)
Plot axes: number of clusters, BIC score, number of PCA dimensions
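
BIC trades goodness of fit against model size: BIC = p · ln(n) − 2 · ln(L), where p is the number of free parameters, n the number of samples, and L the maximized likelihood. Under this convention, lower is better. The sketch below uses hypothetical log-likelihood values rather than results from the talk, and counts parameters for a one-dimensional mixture.

```javascript
// BIC = p * ln(n) - 2 * ln(L); lower is better under this convention.
function bic(logLikelihood, numParams, numSamples) {
  return numParams * Math.log(numSamples) - 2 * logLikelihood;
}

// Hypothetical log-likelihoods for 1-D mixtures fitted to 1000 samples.
const candidates = [
  { clusters: 1, logLikelihood: -6200 },
  { clusters: 2, logLikelihood: -5100 },
  { clusters: 3, logLikelihood: -5095 },
  { clusters: 4, logLikelihood: -5093 },
];

const scored = candidates.map((c) => ({
  ...c,
  // A 1-D mixture with k components has k means, k variances,
  // and k - 1 free weights.
  bic: bic(c.logLikelihood, 3 * c.clusters - 1, 1000),
}));

// Pick the candidate with the lowest BIC.
const best = scored.reduce((a, b) => (b.bic < a.bic ? b : a));
console.log(scored, best.clusters);
```

In this made-up example the third and fourth clusters improve the fit only marginally, so the penalty term outweighs them and two clusters win.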

Methodology: Iterative in practice
Collect data → Check the data → Clean the data → Dimension reduction → Clustering → Labeling

Example of a customer handling millions of calls a day

RTT distribution for this service
Category    Range (ms)        Proportion of the data (%)
Low         0 < RTT < 150     68%
Medium      150 < RTT < 250   19%
Large       250 < RTT < 700   12%
Very large  RTT > 700         1%
Area for exploration
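
A distribution like this can be produced by bucketing per-call RTTs. The sketch below reuses the range boundaries from the table and assumes they are in milliseconds. Because the table uses strict inequalities on both sides, a value exactly on a boundary (for example 150) is not covered; the sketch assigns it to the higher range, which is an assumption. The sample values are made up.

```javascript
// Range boundaries from the table; units assumed to be milliseconds.
const RTT_RANGES = [
  { label: "Low", upper: 150 },
  { label: "Medium", upper: 250 },
  { label: "Large", upper: 700 },
  { label: "Very large", upper: Infinity },
];

// Returns the percentage of calls in each range.
// Assumes every RTT is a finite, non-negative number.
function rttDistribution(rttsMs) {
  const counts = Object.fromEntries(RTT_RANGES.map((r) => [r.label, 0]));
  for (const rtt of rttsMs) {
    // First range whose upper bound is above the value; boundary values
    // fall into the higher range (assumption, see lead-in).
    const range = RTT_RANGES.find((r) => rtt < r.upper);
    counts[range.label] += 1;
  }
  const percentages = {};
  for (const label of Object.keys(counts)) {
    percentages[label] = (100 * counts[label]) / rttsMs.length;
  }
  return percentages;
}

// Made-up per-call RTTs, for illustration only.
const distribution = rttDistribution([20, 90, 140, 180, 300, 900, 100, 120, 60, 200]);
console.log(distribution);
```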

Cluster 0: 16% of high RTTs
• Network conditions are good
• Local users in a specific country
• Distance can be the main cause of large RTTs

• Bad network conditions: large jitter, large fraction of packets lost
• Users all over the world
• Generally large distances
• Network congestion and distance can be the main causes of large RTTs

• Good network conditions
• Local users in Asia
• Large distance can be the main cause of large RTTs

Analysis summary
• 32% of calls have high RTTs (19% medium + 12% large + 1% very large)
• 20% due to congestion
• 2 geographic problem areas
Recommended actions:
• Check server infrastructure & network
• Add/move servers
Real solution:
the system tells you where to look
