Morichetta, A., Casas, P., & Mellia, M. (2019). EXPLAIN-IT: Towards explainable AI for unsupervised network traffic analysis. In Proceedings of the 3rd ACM CoNEXT Workshop on Big DAta, Machine Learning and Artificial Intelligence for Data Communication Networks (pp. 22–28).
1. EXPLAIN-IT: Towards Explainable AI for Unsupervised Network Traffic Analysis
Andrea Morichetta★, Pedro Casas*, Marco Mellia★
Politecnico di Torino★, Austrian Institute of Technology*
3rd ACM CoNEXT Workshop on Big DAta, Machine Learning and Artificial Intelligence for Data Communication Networks
2. The Gap
• Scenario: rising popularity of ML applications for solving specific problems in network traffic analysis.
• Ground truth is systematically missing: it is difficult to obtain, due to structural complexity and big data volumes.
• Labeled datasets are frequently a simplistic representation of real-world phenomena, and often outdated as well.
3. Unsupervised learning to fill the gap
• Unsupervised techniques allow a better understanding of the data, exploring its shape and patterns.
• However, it is difficult to analyze their results.
• Typical solutions:
• manual inspection: becomes a problem when the data are too numerous or too complex
• unsupervised quality metrics: the "why" is missing
• supervised quality metrics: not good if the ground truth is inherently wrong or biased
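To make the middle bullet concrete: an unsupervised quality metric such as the silhouette score can tell us *how well* points cluster, but says nothing about *why*. A minimal sketch on synthetic data (illustrative only, not the paper's dataset):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Toy 2-D data: two well-separated blobs (illustrative only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (50, 2)),
               rng.normal(5, 0.5, (50, 2))])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
score = silhouette_score(X, labels)
print(f"silhouette = {score:.2f}")  # a single number: no hint of *why* points cluster
```

The score is high here, but it gives the analyst no handle on which features drive the grouping, which is exactly the gap EXPLAIN-IT targets.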
4. Knowledge extraction from the clusters
Goal: obtain an interpretable representation of the features' relevance in the clusters
• for understanding the clusters' content
• for a better explanation of the data aggregation
5. Knowledge extraction – a supervised approach
A possible solution: white-box classifiers (e.g., linear regression and decision trees)
+ Also gives us the opportunity to evaluate the cluster attribution/assignment (via classification)
+ Clear and algorithmically grounded
+ Yields an "interpretation" available for the analysis
- It limits the set of applicable techniques
How can we make this approach more general and extend the set of algorithms?
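The white-box idea above can be sketched as: fit a shallow decision tree on the cluster labels and read the splits off directly. A minimal sketch on synthetic "session" features (not the paper's dataset; feature names `f0`–`f2` are placeholders):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in for session features: three well-separated groups.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc, 1.0, (60, 3)) for loc in (0, 4, 8)])
clusters = KMeans(n_clusters=3, n_init=10, random_state=1).fit_predict(X)

# Shallow tree trained to reproduce the cluster assignment.
tree = DecisionTreeClassifier(max_depth=2, random_state=1).fit(X, clusters)
accuracy = tree.score(X, clusters)  # how faithfully the tree mimics the clustering
print(export_text(tree, feature_names=["f0", "f1", "f2"]))
```

The printed rules expose which features and thresholds drive cluster membership, and the accuracy doubles as a check of the cluster attribution; the cost, as the slide notes, is being restricted to intrinsically interpretable models.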
6. Explainable AI – extending the supervised approach
• Explainable AI makes it easier to understand why certain decisions or predictions have been made.
• Achieved by:
• restricting the complexity of the machine learning model (intrinsic)
• or by applying methods that analyze the model after training (post hoc)
• e.g., LIME (Local Interpretable Model-agnostic Explanations)1 can explain the predictions of any classifier or regressor by approximating it locally with an interpretable model.
1Ribeiro, Marco Tulio, Sameer Singh, and Carlos Guestrin. ""Why Should I Trust You?": Explaining the Predictions of Any Classifier." Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2016.
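LIME's local-surrogate idea can be sketched in a few lines: perturb around one instance, weight the perturbations by proximity, and fit a weighted linear model to the black box's outputs. This is a hand-rolled sketch of the idea on synthetic data, not the `lime` library's implementation:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.svm import SVC

# Black box: an SVM on synthetic data where only f0 and f1 matter.
rng = np.random.default_rng(2)
X = rng.normal(0, 1, (200, 4))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
black_box = SVC(probability=True, random_state=2).fit(X, y)

# LIME-style local explanation of one instance x0.
x0 = X[0]
Z = x0 + rng.normal(0, 0.5, (500, 4))             # perturbed neighborhood
probs = black_box.predict_proba(Z)[:, 1]           # black-box outputs only
weights = np.exp(-np.sum((Z - x0) ** 2, axis=1))   # proximity kernel
surrogate = Ridge(alpha=1.0).fit(Z, probs, sample_weight=weights)
print(np.round(surrogate.coef_, 2))                # coefficients concentrate on f0, f1
```

The surrogate's coefficients are the "explanation": locally, they recover that the first feature dominates the prediction, without ever opening the black box.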
8. Use case
• 10654 YouTube video sessions, coming from different sources: smartphone (HTML player and YouTube app) and desktop (HTML player)
• Set of ~500 features:
• at the full video session level (e.g., session downlink throughput)
• as well as at different time resolutions, with time slots of ∆t = [1, 5, 10] seconds
• We focus on the average video quality (AVGQ) metric. We consider video
resolution as follows:
• 0: Low Definition (LD), with AVGQ < 480
• 1: Standard Definition (SD), with 480 ≤ AVGQ < 720
• 2: High Definition (HD), with AVGQ ≥ 720
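The three AVGQ classes above amount to a simple threshold mapping; a minimal helper (illustrative, not the paper's code):

```python
def avgq_label(avgq: float) -> int:
    """Map average video quality (vertical resolution) to a class label:
    0 = LD (AVGQ < 480), 1 = SD (480 <= AVGQ < 720), 2 = HD (AVGQ >= 720)."""
    if avgq < 480:
        return 0
    if avgq < 720:
        return 1
    return 2

print([avgq_label(q) for q in (360, 480, 720, 1080)])  # [0, 1, 2, 2]
```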
9. Clustering phase
• Goal: We want to obtain 3 clusters in output:
a. Low Definition, LD
b. Standard definition, SD
c. High Definition, HD
• Algorithms used:
• Agglomerative (1) clustering with Ward linkage (Ward minimizes the variance of the clusters being merged)
• Agglomerative (2) clustering with Single linkage (single linkage uses the minimum of the distances between all observations of the two sets)
• K-Means
• BIRCH - Balanced Iterative Reducing and Clustering using Hierarchies
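All four setups are available in scikit-learn; a minimal sketch on synthetic feature vectors standing in for the YouTube session features (illustrative only):

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, Birch, KMeans

# Synthetic stand-in for session feature vectors: three groups.
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(loc, 0.7, (40, 5)) for loc in (0, 4, 8)])

models = {
    "agglomerative_ward":   AgglomerativeClustering(n_clusters=3, linkage="ward"),
    "agglomerative_single": AgglomerativeClustering(n_clusters=3, linkage="single"),
    "kmeans":               KMeans(n_clusters=3, n_init=10, random_state=3),
    "birch":                Birch(n_clusters=3),
}
labels = {name: m.fit_predict(X) for name, m in models.items()}
for name, lab in labels.items():
    print(name, np.bincount(lab))  # per-cluster sizes for each algorithm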
11. Clustering results – label distribution
Label distribution after agglomerative Ward clustering
12. Clustering results – feature Inspection
Example of feature inspection in the results of agglomerative Ward clustering (panels: Cluster 0, Cluster 1, Cluster 2)
13. Interpret with a model – using Support Vector Machines
• Hyperplane-based classifier
• The SVM selects the maximum-margin separating hyperplane
• Uses a kernel function to map points into a high-dimensional space
• However, it is a black-box classifier
• Thus, Explainable AI can aid us
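A minimal sketch of this step, on synthetic data standing in for the clustered sessions: an RBF-kernel SVM can reproduce the cluster labels very accurately, yet its decision function is opaque, which is why a post-hoc explainer such as LIME is applied on top.

```python
import numpy as np
from sklearn.svm import SVC

# Synthetic stand-in: three groups with their cluster labels.
rng = np.random.default_rng(4)
X = np.vstack([rng.normal(loc, 0.8, (50, 4)) for loc in (0, 3, 6)])
clusters = np.repeat([0, 1, 2], 50)

# RBF-kernel SVM trained to mimic the cluster assignment.
svm = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, clusters)
train_acc = svm.score(X, clusters)
print(f"train accuracy = {train_acc:.2f}")  # accurate, but a black box
```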
14. Interpret with model – using SVM
Results of SVM applied to Agglomerative (1) clustering with Ward linkage
16. Conclusion and future work
• An interesting approach for improving the interpretation of clustering results by relying on XAI principles
• Is explainable AI an advantage in the YouTube case, where features
are complex?
• Is LIME always good? Look at alternatives, e.g., SHAP
• Is it possible to avoid the classification step?
• Extend it to other scenarios
• Expand the research on different clustering algorithms
• Use different classification techniques
Editor's notes
Why did our model predict a specific label? E.g., is traffic malicious or not?
LIME's intuition is to look more closely at the area around the predicted decision, where the boundaries are simpler.
LIME is based only on the inputs and outputs of the model.
Randomly generate data points, by perturbation, in the neighborhood of our target data point.
What we get is a new dataset in the neighborhood of our target, which we can interpret with a white-box model.
Assign higher weights to the points closer to the target, in order to get these right when predicting with a local linear model.
packet-level video traffic measurements
The only information extracted from the network traffic for each captured packet is its timestamp and its size. From these two values, we then derive a full set of 477 different features:
overall/full session traffic, downlink traffic and uplink traffic,
and sampled empirical distributions of overall session traffic, downlink traffic and uplink traffic.
The metrics extracted from the analyzed network video traffic packets map into relevant Video Quality Metrics.
Six VQMs:
initial delay,
frequency of stallings,
number of stalling events,
number of quality switches,
average video quality (video vertical resolution, e.g., 480p, 720p, 1080p, etc.)
and average video bitrate.