SlideShare une entreprise Scribd logo
1  sur  30
Télécharger pour lire hors ligne
I Know What You Saw Last Minute -
The Chrome Browser Case
Ran Dubin1, Amit Dvir2 , Ofir Pele2,3 , Ofer Hadar1
1. Department of Communication System Engineering, Ben-Gurion University of the Negev, Israel.
2. Center for Cyber Technologies, Department of Computer Science, Ariel University, Israel.
3. Department of Electrical and Electronics Engineering, Ariel University, Israel.
About Me
• Ph.D. candidate at Ben-Gurion University, Israel
• Optimization of HTTP adaptive streaming
• Encrypted network traffic classification problems
• Senior data scientist at Seculert
• Seculert develop an automated breach analytics
platform in the cloud.
• Supervised machine learning for detection of
malicious activity within the enterprise network
Agenda
• Motivation
• The scenario
• Our goal
• How can “I know what you saw”?
• Related works
• Proposed algorithm
• Results
Motivation
• Google encourages network privacy:
• “77 percent of Google online traffic is encrypted"1
• “Google started giving HTTPS pages a ranking boost”
• HTTPS keeps your data anonymous:
• “No one will be able to snoop on the traffic — such as your ISP"2
• Let’s try to break it!
[1] http://gadgets.ndtv.com/internet/news/google-reveals-77-percent-of-its-online-traffic-is-encrypted-814191
[2] http://www.makeuseof.com/tag/can-you-really-be-anonymous-online/
The Scenario
•Passive Sniffing:
• Traffic control and optimization
• Open Source Intelligence Techniques (OSINT) vecto𝑟𝑟3
• Web searches, visited sites ..
•YouTube is the world’s leading social
network video platform
• YouTube is used also large protests and propaganda!
• Protecting user privacy and viewing habits is important!
[3] ISPs Sell Your Data to Advertisers, But FCC has a Plan to Protect Privacy, http://thehackernews.com/2016/03/isp-sells-data-to-advertisers.html
Our Goal
•To show that HTTPS2.0 is not enough in
order to protect your viewing habits
• Contribution:
• Dataset
• Data crawler - based on selenium
• New encrypted traffic feature and classification algorithm
Brief Partial Overview of SSL/TLS
• Step (0): browse to:
https://www.youtube.com/watch?v=_b
P6aVG6L1w
• Step (1): use Service Name Indicator
• Step (5): content and header are fully
encrypted
• HTTPS request (URL) is not visible
in the encrypted traffic
• All HTTP headers are encrypted
How Can “I Know What You Saw”?
1. How are YouTube videos encoded?
2. How is the video downloaded?
3. What is the video download behavior in the network?
4. How to tie everything together for a classification?
Introduction To HTTP Adaptive Streaming (HAS)
HAS Diagram HAS Download Diagram
Multi variable bit-rate streaming
* How YouTube Works: https://www.youtube.com/watch?v=UklDSMG9ffU
YouTube Encrypted Network Traffic
Chrome Automatic Mode Chrome Fixed Mode
YouTube Flow Patterns –
The Web Proxy Perspective
• Mixture of audio/video
in a single flow
• HTTP2 - multiplexed
application layer
protocol
• Multi-Bit-Rate Video
EncodingFirst 10 seconds of downloading only video
First 10 seconds of downloading audio + video
YouTube HTTP Byte Range
Fiddler (Video) Stream Request Vs Byte RangeByteRange[Bps]
Request Index
Related Works
1. Most discuss application type classification and not content classification
2. HTTPS classification was found to achieve low accuracy
3. Wright et al. exploit the VBR codec characteristics of encrypted Voice Over
Internet Protocol (VOIP) for language identification
4. Liu et al. and Saponas et al. presented methods for video title classification of
RTP/UDP and TCP internet traffic (not MBR)
5. Changes in video traffic over the Internet:
• HTTP byte range selection over HTTP
• MBR adaptive streaming
• HTTP version 2
Proposed Machine Learning Solution
1. Traffic Analysis
2. Traffic Features
3. Traffic Preprocessing
4. Machine Learning Algorithms
Feature Extraction
1. Many features:
Number of packets in a session, payload size, information bit rate,
Round-Trip Time (RTT), packet time differences
2. Bit Per Peak (BPP): Sum of bytes in each peak after TCP ACK mechanism
3. Why BPP?
• Represent the traffic On/Off behavior
• Real time classification constraints
• Compact feature representation
• Robust to packet loss and delays
BPP Index Vs Download Copy
BPP Index
#Download
Pre-Processing
• With/without audio removal
• <400 Kbytes BPPs are considered as audio
Proposed algorithms
1. Support Vector Machines (SVM) with
Radial Basis Function (RBF)
• With a BPP feature vector
2. Nearest Neighbor Algorithm – NN
• With a set of BPP features
SVM with Radial Basis Function (RBF) Kernel
• SVM RBF maps data to high
dimensional space. The classifier:
• Ongoing work uses SVM with
intersection similarities as features
BPP Set Feature
• 𝑆𝑆𝑖𝑖𝑖𝑖 is a set of Bit-Per-Peak (BPP) features (no duplicates)
• i - video title index
• j - stream index
• Note that each BPP-set may have different cardinality
NN Algorithm
• Similarity score between two BPP-sets is the cardinality of the intersection set:
sim(𝑆𝑆, 𝑆𝑆′) = |𝑆𝑆 ∩ 𝑆𝑆′|
• At test time, each video stream BPP-set, 𝑆𝑆𝑡𝑡𝑡𝑡𝑡𝑡𝑡𝑡, is classified as the video title 𝑖𝑖
(class) that matches the maximum similarity score to class index. 𝑚𝑚𝑖𝑖 is the
number of streams per title 𝑖𝑖:
Dataset
Train/ test: 30 different titles, each with 100 streams copies (Train- 90, Test -10)
Videos outside of the dataset: 200 additional different video titles (titles not in the
regular dataset used only in testing)
Added delay evaluation: 4 subsets with added delay of 100/300/600/900 ms. (10
titles with 10 different downloads)
Added packet loss evaluation: 4 subsets with added packet loss of 1/3/6/9 % (10
titles with 10 different downloads)
Classification Accuracy
Identification[%]
Training Dataset Size
SVM+RBF
Ours
Confusion Matrices
SVM+RBF (accuracy 72%) Ours (accuracy 98%)
TrueLabel
TrueLabel
Predicted Label Predicted Label
Classification of Unknown Videos:
100% accuracy
Ongoing Results
Accuracy: 93.6%
Predicted Label
TrueLabel
Conclusions
• Created an OSINT vector from YouTube video traffic
• We demonstrated that HTTP2.0 is not protecting your viewing
habits.
• NN algorithm - 98% accuracy
• BPP feature - is robust to high network delays and packet loss
• Ongoing research – 10000 streams of 100 titles, similar results
• Contribution – crawler, dataset and algorithms
Different Network Conditions
SVM+RBF
Ours
SVM+RBF
Ours
Identification[%]
Identification[%]
Additional Packet Loss [%] Additional Delay [ms]

Contenu connexe

En vedette (13)

Fork Lift Cert
Fork Lift CertFork Lift Cert
Fork Lift Cert
 
LAS WEBQUEST
LAS WEBQUESTLAS WEBQUEST
LAS WEBQUEST
 
Vgu vkghu ig
Vgu vkghu igVgu vkghu ig
Vgu vkghu ig
 
Mohammed Fiazuddin - Certificate
Mohammed Fiazuddin - CertificateMohammed Fiazuddin - Certificate
Mohammed Fiazuddin - Certificate
 
Megae Payment
Megae PaymentMegae Payment
Megae Payment
 
LejeA0002
LejeA0002LejeA0002
LejeA0002
 
Didáctica general una página del programa
Didáctica general una página del programaDidáctica general una página del programa
Didáctica general una página del programa
 
Tasks suitable for programming on the web
Tasks suitable for programming on the webTasks suitable for programming on the web
Tasks suitable for programming on the web
 
2016 Auction
2016 Auction2016 Auction
2016 Auction
 
Using flashcards
Using flashcardsUsing flashcards
Using flashcards
 
Moodle.
Moodle.Moodle.
Moodle.
 
WWII 20 Wall Family Papers_Finding Aid
WWII 20 Wall Family Papers_Finding AidWWII 20 Wall Family Papers_Finding Aid
WWII 20 Wall Family Papers_Finding Aid
 
Sitios turistico de norte america
Sitios turistico de norte americaSitios turistico de norte america
Sitios turistico de norte america
 

Dernier

Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdf
ankushspencer015
 
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
dharasingh5698
 
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
dollysharma2066
 
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak HamilCara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Kandungan 087776558899
 
notes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.pptnotes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.ppt
MsecMca
 

Dernier (20)

Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdf
 
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
 
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 
Unit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfUnit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdf
 
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
 
Double rodded leveling 1 pdf activity 01
Double rodded leveling 1 pdf activity 01Double rodded leveling 1 pdf activity 01
Double rodded leveling 1 pdf activity 01
 
data_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfdata_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdf
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
 
Bhosari ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready For ...
Bhosari ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready For ...Bhosari ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready For ...
Bhosari ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready For ...
 
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak HamilCara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
 
University management System project report..pdf
University management System project report..pdfUniversity management System project report..pdf
University management System project report..pdf
 
Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . ppt
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
 
Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - V
 
chapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineeringchapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineering
 
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
 
notes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.pptnotes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.ppt
 

Eu 16-dubin-i-know-what-you-saw-last-minute-the-chrome-browser-case 5

  • 1. I Know What You Saw Last Minute - The Chrome Browser Case Ran Dubin1, Amit Dvir2 , Ofir Pele2,3 , Ofer Hadar1 1. Department of Communication System Engineering, Ben-Gurion University of the Negev, Israel. 2. Center for Cyber Technologies, Department of Computer Science, Ariel University, Israel. 3. Department of Electrical and Electronics Engineering, Ariel University, Israel.
  • 2. About Me • Ph.D. candidate at Ben-Gurion University, Israel • Optimization of HTTP adaptive streaming • Encrypted network traffic classification problems • Senior data scientist at Seculert • Seculert develop an automated breach analytics platform in the cloud. • Supervised machine learning for detection of malicious activity within the enterprise network
  • 3. Agenda • Motivation • The scenario • Our goal • How can “I know what you saw”? • Related works • Proposed algorithm • Results
  • 4. Motivation • Google encourages network privacy: • “77 percent of Google online traffic is encrypted"1 • “Google started giving HTTPS pages a ranking boost” • HTTPS keeps your data anonymous: • “No one will be able to snoop on the traffic — such as your ISP"2 • Let’s try to break it! [1] http://gadgets.ndtv.com/internet/news/google-reveals-77-percent-of-its-online-traffic-is-encrypted-814191 [2] http://www.makeuseof.com/tag/can-you-really-be-anonymous-online/
  • 5. The Scenario •Passive Sniffing: • Traffic control and optimization • Open Source Intelligence Techniques (OSINT) vecto𝑟𝑟3 • Web searches, visited sites .. •YouTube is the world’s leading social network video platform • YouTube is used also large protests and propaganda! • Protecting user privacy and viewing habits is important! [3] ISPs Sell Your Data to Advertisers, But FCC has a Plan to Protect Privacy, http://thehackernews.com/2016/03/isp-sells-data-to-advertisers.html
  • 6. Our Goal •To show that HTTPS2.0 is not enough in order to protect your viewing habits • Contribution: • Dataset • Data crawler - based on selenium • New encrypted traffic feature and classification algorithm
  • 7. Brief Partial Overview of SSL/TLS • Step (0): browse to: https://www.youtube.com/watch?v=_b P6aVG6L1w • Step (1): use Service Name Indicator • Step (5): content and header are fully encrypted • HTTPS request (URL) is not visible in the encrypted traffic • All HTTP headers are encrypted
  • 8. How Can “I Know What You Saw”? 1. How are YouTube videos encoded? 2. How is the video downloaded? 3. What is the video download behavior in the network? 4. How to tie everything together for a classification?
  • 9. Introduction To HTTP Adaptive Streaming (HAS) HAS Diagram HAS Download Diagram Multi variable bit-rate streaming * How YouTube Works: https://www.youtube.com/watch?v=UklDSMG9ffU
  • 10. YouTube Encrypted Network Traffic Chrome Automatic Mode Chrome Fixed Mode
  • 11. YouTube Flow Patterns – The Web Proxy Perspective • Mixture of audio/video in a single flow • HTTP2 - multiplexed application layer protocol • Multi-Bit-Rate Video EncodingFirst 10 seconds of downloading only video First 10 seconds of downloading audio + video
  • 12. YouTube HTTP Byte Range Fiddler (Video) Stream Request Vs Byte RangeByteRange[Bps] Request Index
  • 13. Related Works 1. Most discuss application type classification and not content classification 2. HTTPS classification was found to achieve low accuracy 3. Wright et al. exploit the VBR codec characteristics of encrypted Voice Over Internet Protocol (VOIP) for language identification 4. Liu et al. and Saponas et al. presented methods for video title classification of RTP/UDP and TCP internet traffic (not MBR) 5. Changes in video traffic over the Internet: • HTTP byte range selection over HTTP • MBR adaptive streaming • HTTP version 2
  • 14. Proposed Machine Learning Solution 1. Traffic Analysis 2. Traffic Features 3. Traffic Preprocessing 4. Machine Learning Algorithms
  • 15. Feature Extraction 1. Many features: Number of packets in a session, payload size, information bit rate, Round-Trip Time (RTT), packet time differences 2. Bit Per Peak (BPP): Sum of bytes in each peak after TCP ACK mechanism 3. Why BPP? • Represent the traffic On/Off behavior • Real time classification constraints • Compact feature representation • Robust to packet loss and delays
  • 16. BPP Index Vs Download Copy BPP Index #Download
  • 17. Pre-Processing • With/without audio removal • <400 Kbytes BPPs are considered as audio
  • 18. Proposed algorithms 1. Support Vector Machines (SVM) with Radial Basis Function (RBF) • With a BPP feature vector 2. Nearest Neighbor Algorithm – NN • With a set of BPP features
  • 19. SVM with Radial Basis Function (RBF) Kernel • SVM RBF maps data to high dimensional space. The classifier: • Ongoing work uses SVM with intersection similarities as features
  • 20. BPP Set Feature • 𝑆𝑆𝑖𝑖𝑖𝑖 is a set of Bit-Per-Peak (BPP) features (no duplicates) • i - video title index • j - stream index • Note that each BPP-set may have different cardinality
  • 21. NN Algorithm • Similarity score between two BPP-sets is the cardinality of the intersection set: sim(𝑆𝑆, 𝑆𝑆′) = |𝑆𝑆 ∩ 𝑆𝑆′| • At test time, each video stream BPP-set, 𝑆𝑆𝑡𝑡𝑡𝑡𝑡𝑡𝑡𝑡, is classified as the video title 𝑖𝑖 (class) that matches the maximum similarity score to class index. 𝑚𝑚𝑖𝑖 is the number of streams per title 𝑖𝑖:
  • 22. Dataset Train/ test: 30 different titles, each with 100 streams copies (Train- 90, Test -10) Videos outside of the dataset: 200 additional different video titles (titles not in the regular dataset used only in testing) Added delay evaluation: 4 subsets with added delay of 100/300/600/900 ms. (10 titles with 10 different downloads) Added packet loss evaluation: 4 subsets with added packet loss of 1/3/6/9 % (10 titles with 10 different downloads)
  • 24. Confusion Matrices SVM+RBF (accuracy 72%) Ours (accuracy 98%) TrueLabel TrueLabel Predicted Label Predicted Label
  • 25. Classification of Unknown Videos: 100% accuracy
  • 27. Conclusions • Created an OSINT vector from YouTube video traffic • We demonstrated that HTTP2.0 is not protecting your viewing habits. • NN algorithm - 98% accuracy • BPP feature - is robust to high network delays and packet loss • Ongoing research – 10000 streams of 100 titles, similar results • Contribution – crawler, dataset and algorithms
  • 28.
  • 29.