I Know What You Saw Last Minute -
The Chrome Browser Case
Ran Dubin (1), Amit Dvir (2), Ofir Pele (2,3), Ofer Hadar (1)
1. Department of Communication System Engineering, Ben-Gurion University of the Negev, Israel.
2. Center for Cyber Technologies, Department of Computer Science, Ariel University, Israel.
3. Department of Electrical and Electronics Engineering, Ariel University, Israel.
About Me
• Ph.D. candidate at Ben-Gurion University, Israel
• Optimization of HTTP adaptive streaming
• Encrypted network traffic classification problems
• Senior data scientist at Seculert
• Seculert develops an automated breach analytics platform in the cloud.
• Supervised machine learning for detection of
malicious activity within the enterprise network
Agenda
• Motivation
• The scenario
• Our goal
• How can “I know what you saw”?
• Related works
• Proposed algorithm
• Results
Motivation
• Google encourages network privacy:
• “77 percent of Google online traffic is encrypted” [1]
• “Google started giving HTTPS pages a ranking boost”
• HTTPS keeps your data anonymous:
• “No one will be able to snoop on the traffic — such as your ISP” [2]
• Let’s try to break it!
[1] http://gadgets.ndtv.com/internet/news/google-reveals-77-percent-of-its-online-traffic-is-encrypted-814191
[2] http://www.makeuseof.com/tag/can-you-really-be-anonymous-online/
The Scenario
• Passive sniffing:
• Traffic control and optimization
• Open Source Intelligence Techniques (OSINT) vector [3]
• Web searches, visited sites, ...
• YouTube is the world’s leading social network video platform
• YouTube is also used for large protests and propaganda!
• Protecting user privacy and viewing habits is important!
[3] ISPs Sell Your Data to Advertisers, But FCC has a Plan to Protect Privacy, http://thehackernews.com/2016/03/isp-sells-data-to-advertisers.html
Our Goal
• To show that HTTPS with HTTP/2 is not enough to protect your viewing habits
• Contribution:
• Dataset
• Data crawler - based on selenium
• New encrypted traffic feature and classification algorithm
Brief Partial Overview of SSL/TLS
• Step (0): browse to: https://www.youtube.com/watch?v=_bP6aVG6L1w
• Step (1): use Server Name Indication (SNI)
• Step (5): content and headers are fully encrypted
• The HTTPS request (URL) is not visible in the encrypted traffic
• All HTTP headers are encrypted
How Can “I Know What You Saw”?
1. How are YouTube videos encoded?
2. How is the video downloaded?
3. What is the video download behavior in the network?
4. How to tie everything together for a classification?
Introduction To HTTP Adaptive Streaming (HAS)
[Figures: HAS diagram; HAS download diagram]
Multi-variable-bit-rate streaming
* How YouTube Works: https://www.youtube.com/watch?v=UklDSMG9ffU
YouTube Encrypted Network Traffic
[Figures: Chrome automatic mode; Chrome fixed mode]
YouTube Flow Patterns –
The Web Proxy Perspective
• Mixture of audio/video in a single flow
• HTTP/2: multiplexed application layer protocol
• Multi-Bit-Rate (MBR) video encoding
[Figures: first 10 seconds of downloading video only; first 10 seconds of downloading audio + video]
YouTube HTTP Byte Range
[Figure: Fiddler (video) stream request vs. byte range; x-axis: request index, y-axis: byte range [Bps]]
Related Works
1. Most discuss application type classification and not content classification
2. HTTPS classification was found to achieve low accuracy
3. Wright et al. exploit the VBR codec characteristics of encrypted Voice over Internet Protocol (VoIP) traffic for language identification
4. Liu et al. and Saponas et al. presented methods for video title classification of
RTP/UDP and TCP internet traffic (not MBR)
5. Changes in video traffic over the Internet:
• HTTP byte range selection over HTTP
• MBR adaptive streaming
• HTTP version 2
Proposed Machine Learning Solution
1. Traffic Analysis
2. Traffic Features
3. Traffic Preprocessing
4. Machine Learning Algorithms
Feature Extraction
1. Many features:
Number of packets in a session, payload size, information bit rate,
Round-Trip Time (RTT), packet time differences
2. Bit Per Peak (BPP): Sum of bytes in each peak after TCP ACK mechanism
3. Why BPP?
• Represents the traffic On/Off behavior
• Suits real-time classification constraints
• Compact feature representation
• Robust to packet loss and delays
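The BPP idea can be illustrated with a minimal sketch. It assumes that a "peak" is a burst of downstream packets separated from the next burst by an idle gap, and that retransmitted packets have already been removed. The function name `bytes_per_peak` and the `idle_gap` threshold are hypothetical choices for illustration, not values taken from the slides.

```python
from typing import Iterable, List, Tuple


def bytes_per_peak(packets: Iterable[Tuple[float, int]],
                   idle_gap: float = 0.5) -> List[int]:
    """Sum the payload bytes of each traffic burst ("peak").

    packets:  (timestamp_seconds, payload_bytes) pairs for downstream
              packets, assumed sorted by time and free of retransmissions.
    idle_gap: hypothetical silence threshold (seconds) that closes a peak.
    """
    peaks: List[int] = []
    current = 0
    last_ts = None
    for ts, size in packets:
        # A long enough silence ends the current peak (the "Off" period).
        if last_ts is not None and ts - last_ts > idle_gap and current > 0:
            peaks.append(current)
            current = 0
        current += size
        last_ts = ts
    if current > 0:
        peaks.append(current)
    return peaks


example = [(0.0, 1000), (0.1, 500), (2.0, 700), (2.05, 300), (5.0, 42)]
print(bytes_per_peak(example))
```

Each stream is reduced to a short list of peak sizes, which is what makes the representation compact.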
[Figure: BPP index vs. download copy; axes: BPP index, download number]
Pre-Processing
• With/without audio removal
• BPPs smaller than 400 KB are treated as audio
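The audio-removal step can be sketched as a simple threshold filter. The sketch assumes that "400 Kbytes" means 400,000 bytes (it could equally mean 400 × 1024) and that a peak exactly at the threshold is kept. The names `AUDIO_THRESHOLD_BYTES` and `remove_audio_peaks` are hypothetical.

```python
from typing import Iterable, List

# Assumption: "400 Kbytes" is read as 400,000 bytes.
AUDIO_THRESHOLD_BYTES = 400 * 1000


def remove_audio_peaks(bpps: Iterable[int],
                       threshold: int = AUDIO_THRESHOLD_BYTES) -> List[int]:
    """Drop peaks below the threshold, which are treated as audio."""
    return [b for b in bpps if b >= threshold]


print(remove_audio_peaks([120_000, 850_000, 399_999, 400_000]))
```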
Proposed algorithms
1. Support Vector Machine (SVM) with a Radial Basis Function (RBF) kernel
• With a BPP feature vector
2. Nearest Neighbor (NN) algorithm
• With a set of BPP features
SVM with Radial Basis Function (RBF) Kernel
• An SVM with an RBF kernel maps the data to a high-dimensional space. The classifier:
[Equation: SVM classifier decision function]
• Ongoing work uses an SVM with intersection similarities as features
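For reference, the standard textbook form of the RBF kernel and of a binary kernel-SVM decision function is given below. This is the generic formulation, not necessarily the notation or the multi-class scheme used in the talk; the symbols $\gamma$, $\alpha_k$, $y_k$, $x_k$ and $b$ are assumptions introduced here.

```latex
K(x, x') = \exp\!\left(-\gamma \,\lVert x - x' \rVert^{2}\right),
\qquad
f(x) = \operatorname{sign}\!\left(\sum_{k=1}^{n} \alpha_k \, y_k \, K(x_k, x) + b\right)
```

Here $x_k$ are the training feature vectors (BPP vectors in this setting), $y_k \in \{-1, +1\}$ their labels, $\alpha_k$ the learned coefficients, $b$ the bias, and $\gamma > 0$ the kernel width parameter.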
BPP Set Feature
• $S_{ij}$ is a set of Bit-Per-Peak (BPP) features (no duplicates)
• i - video title index
• j - stream index
• Note that each BPP-set may have different cardinality
NN Algorithm
• The similarity score between two BPP-sets is the cardinality of the intersection set:
$\mathrm{sim}(S, S') = |S \cap S'|$
• At test time, each video stream BPP-set, $S_{test}$, is classified as the video title $i$ (class) whose training BPP-sets $S_{ij}$, $j = 1, \dots, m_i$, give the maximum similarity score, where $m_i$ is the number of streams for title $i$.
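The set-intersection nearest-neighbor rule can be sketched as follows. How the per-stream scores are combined for each title is not fully specified here, so this sketch assumes plain nearest neighbor: the predicted title is the one that owns the single training stream with the largest intersection. The names `similarity` and `classify` and the example titles are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Set


def similarity(a: Set[int], b: Set[int]) -> int:
    """sim(S, S') = |S ∩ S'|: number of BPP values the two sets share."""
    return len(a & b)


def classify(test_bpps: Iterable[int],
             train: Dict[str, List[Set[int]]]) -> Optional[str]:
    """Return the title whose training stream best matches the test stream.

    train maps each video title to the BPP-sets of its training streams.
    Assumption: ties are broken in favor of the first title encountered.
    """
    s_test = set(test_bpps)  # duplicates are dropped, as in a BPP-set
    best_title, best_score = None, -1
    for title, streams in train.items():
        for s in streams:
            score = similarity(s_test, s)
            if score > best_score:
                best_title, best_score = title, score
    return best_title


train = {
    "title_a": [{100, 200, 300}, {100, 250, 300}],
    "title_b": [{400, 500, 600}],
}
print(classify([100, 300, 999], train))
```

Because only exact matches of peak sizes count, a few lost or extra peaks lower the score without changing which title matches best, which is consistent with the robustness claim made for the BPP feature.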
Dataset
Train/test: 30 different titles, each with 100 stream copies (train: 90, test: 10)
Videos outside the dataset: 200 additional video titles (not in the regular dataset, used only in testing)
Added delay evaluation: 4 subsets with added delay of 100/300/600/900 ms (10 titles with 10 different downloads)
Added packet loss evaluation: 4 subsets with added packet loss of 1/3/6/9% (10 titles with 10 different downloads)
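A per-title 90/10 split like the one described above could be produced as in the sketch below. It assumes the test streams are chosen at random within each title, which may not be how the actual split was made. The function name `split_streams` and its parameters are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple


def split_streams(streams_by_title: Dict[str, Sequence],
                  n_test: int = 10,
                  seed: int = 0) -> Tuple[Dict[str, List], Dict[str, List]]:
    """Hold out n_test streams per title for testing; the rest are for training."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train: Dict[str, List] = {}
    test: Dict[str, List] = {}
    for title, streams in streams_by_title.items():
        shuffled = list(streams)
        rng.shuffle(shuffled)
        test[title] = shuffled[:n_test]
        train[title] = shuffled[n_test:]
    return train, test


# 30 titles with 100 stream copies each (stream IDs stand in for BPP-sets).
data = {f"title_{i}": list(range(100)) for i in range(30)}
train, test = split_streams(data)
print(sum(len(v) for v in train.values()), sum(len(v) for v in test.values()))
```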
Classification Accuracy
[Figure: identification [%] vs. training dataset size, for SVM+RBF and ours]
Confusion Matrices
[Figures: confusion matrices (true label vs. predicted label) for SVM+RBF (accuracy 72%) and ours (accuracy 98%)]
Classification of Unknown Videos:
100% accuracy
Ongoing Results
Accuracy: 93.6%
[Figure: confusion matrix, true label vs. predicted label]
Conclusions
• Created an OSINT vector from YouTube video traffic
• We demonstrated that HTTPS with HTTP/2 does not protect your viewing habits
• NN algorithm: 98% accuracy
• The BPP feature is robust to high network delays and packet loss
• Ongoing research: 10,000 streams of 100 titles, with similar results
• Contribution: crawler, dataset and algorithms
Different Network Conditions
[Figures: identification [%] vs. additional packet loss [%], and identification [%] vs. additional delay [ms], for SVM+RBF and ours]
