KENTA YAMAMOTO | TECHNICAL SUPPORT ENGINEER | 2018-03-27
Vulnerability Detection Based on Git History
Agenda
Introduction
Introduction: Research Background
The Trend of Security Incidents
Key facts: why this research is important
In Quantity
Number of CVE reports: 1,020 (2000) → 14,643 (2017) [NVD]
In Quality
• Equifax exposed the data of 143M consumers through a web application vulnerability (2017)
• A breach at Yahoo compromised the account information of 3B users (2013)
The Century of Vulnerability
[Figure: # of vulnerabilities]
As information technology is adopted ever more broadly, the impact of security incidents is becoming more extensive and more critical.
Introduction: To Help Code Reviewers
We know how to deliver software of proper quality: code review!
Best Practice is Well-Known
Review patches before release and fix bugs before deployment. Even so, famous OSS projects still struggle with a shortage of code reviewers.
A Trade-Off of Automation Techniques
Software projects widely adopt a variety of automation approaches, but vulnerability detection techniques face two conflicting requirements:
• (a) High precision. A tool is useless if it outputs billions of false positives.
• (b) Adaptability. No one wants to spend effort solely on security, such as annotating unsafe user inputs.
/* Example of a taint annotation (Splint-style) */
int printf(/*@untainted@*/ char *fmt, ...);
Git is somewhat difficult.
Don't worry, it's not just you!
WHAT’S GIT?
“(Git is) expressly designed to make you feel less intelligent than you thought you were”
– Andrew Morton
The Greatness of Git -
www.linuxfoundation.org
Introduction
What’s Git?
But Git Always Stays With You
Trust me, or try this command in your terminal:
# Count how often you run each command (git is likely near the top)
history | awk '{ print $2 }' | sort | uniq -c | sort -rn | head
Git for Machine Learning
Git provides what machine learning requires: good data.
• Adopted by 69.2% of 30K developers [Stack Overflow]
• Trusted by many prominent OSS projects, such as the Linux kernel, OpenSSL, FFmpeg, PostgreSQL, Chrome V8, and Apache HTTPD.
CVE-IDs and Security Fixes in Git
Git histories hold a sufficient number of reliable security fixes:
• Fix commits reference CVE-IDs in their commit messages,
• or the fixing commits are referenced by the CVE database.
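The first source of labels above can be mined with a simple pattern match over commit messages. The following is a minimal sketch, assuming the messages have already been collected into a dictionary keyed by commit hash (the names `find_cve_ids` and `security_fix_commits` are hypothetical, not part of the original tooling). CVE-IDs have the form `CVE-YYYY-NNNN`, where the sequence part may have more than four digits.

```python
import re

# "CVE-" + 4-digit year + "-" + at least 4 digits (longer sequence numbers are valid).
CVE_PATTERN = re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE)


def find_cve_ids(message: str) -> list[str]:
    """Return the unique CVE-IDs cited in a commit message, in order of first appearance."""
    seen: dict[str, None] = {}
    for match in CVE_PATTERN.finditer(message):
        # Normalize case so "cve-2014-0160" and "CVE-2014-0160" count once.
        seen.setdefault(match.group(0).upper(), None)
    return list(seen)


def security_fix_commits(log: dict[str, str]) -> dict[str, list[str]]:
    """Map commit hash -> CVE-IDs, keeping only commits whose message cites a CVE."""
    result: dict[str, list[str]] = {}
    for commit, message in log.items():
        ids = find_cve_ids(message)
        if ids:
            result[commit] = ids
    return result
```

Matching on messages alone misses fixes that never mention a CVE-ID, which is why the second source (the CVE database pointing back at fixing commits) is also needed.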
A Brief Introduction to Git Features
Agenda
Methodology
A static analysis to detect
suspicious vulnerabilities based
on Git history.
METHODOLOGY - HVD
Methodology: Proposed Approach
Concept
• This research proposes an approach that aims to reduce the false-positive rate compared to VCCFinder [Perl et al.] without sacrificing adaptability.
• The data source is the same as VCCFinder's, but this approach distinguishes added lines from removed lines in the patch features, whereas VCCFinder does not.
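The difference between the two feature sets can be sketched as a bag-of-words over the changed lines of a unified diff. This is an illustration under assumptions, not the HVD implementation: the identifier-only tokenizer and the `+`/`-` prefixing scheme are hypothetical, and the naive prefix check below would misread a removed line whose own text starts with `--` (a proper parser such as unidiff, which the slides list, avoids that).

```python
import re
from collections import Counter

# Hypothetical tokenizer: identifiers and keywords only.
TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def patch_features(diff_text: str, line_type_sensitive: bool) -> Counter:
    """Count tokens on the added/removed lines of a unified diff.

    LT-I: tokens from added and removed lines are pooled together.
    LT-S: each token is prefixed with '+' or '-' according to its line type.
    """
    features: Counter = Counter()
    for line in diff_text.splitlines():
        if line.startswith(("+++", "---")):  # file headers, not content
            continue
        if not line or line[0] not in "+-":  # context lines and hunk headers
            continue
        kind, body = line[0], line[1:]
        for token in TOKEN.findall(body):
            features[kind + token if line_type_sensitive else token] += 1
    return features
```

For a patch that replaces `buf = malloc(len);` with `buf = malloc(len + 1);`, LT-I sees `buf`, `malloc`, and `len` twice each and cannot tell the two lines apart, while LT-S keeps `-malloc` and `+malloc` as separate features.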
Methodology: VCCFinder, a Novel Approach
Concept
Generally, it's hard to apply machine learning to source code because high-level programming languages such as C/C++ are less redundant than natural languages and assembly languages. To address this difficulty, Perl et al.:
• Narrowed the problem down to a quantifiable one. The quality of source code can hardly be quantified, but the presence of a vulnerability can be expressed as 0 or 1.
• Leveraged existing assets: the CVE database and prominent OSS projects.
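Once each commit is labelled 0 or 1, the task becomes ordinary binary classification over commit features. The sketch below uses a plain perceptron purely to stay dependency-free; it is a swapped-in stand-in, not the classifier used by VCCFinder or HVD (with scikit-learn, a linear SVM such as `LinearSVC` is the more conventional choice for sparse token features). The function names are hypothetical.

```python
def linear_score(weights: dict[str, float], bias: float, features: dict[str, int]) -> float:
    """Weighted sum of a commit's token counts plus a bias term."""
    return bias + sum(weights.get(tok, 0.0) * n for tok, n in features.items())


def train_perceptron(samples: list[dict[str, int]], labels: list[int], epochs: int = 10):
    """Learn token weights; label 1 = VCC (vulnerable), 0 = UC (unclassified)."""
    weights: dict[str, float] = {}
    bias = 0.0
    for _ in range(epochs):
        for features, label in zip(samples, labels):
            target = 1 if label == 1 else -1
            # Update only when the commit is misclassified or on the boundary.
            if target * linear_score(weights, bias, features) <= 0:
                for tok, n in features.items():
                    weights[tok] = weights.get(tok, 0.0) + target * n
                bias += target
    return weights, bias


def predict(weights: dict[str, float], bias: float, features: dict[str, int]) -> int:
    """Return 1 if the commit looks like a VCC, else 0."""
    return 1 if linear_score(weights, bias, features) > 0 else 0
```

The point of the 0/1 framing is visible here: no notion of "code quality" is needed, only a label per commit and a feature vector such as the token counts from a diff.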
“I really never wanted to do source control management at all and felt that it was just about the least interesting thing in the computing world”
– Linus Torvalds
10 Years of Git -
www.linuxfoundation.org
Methodology: Overall Architecture
Concept
Methodology: Abbreviations
Terms
• HVD: History-based Vulnerability Detector
• VCC: Vulnerability-Contributing Commit(s), i.e., changes that introduce a vulnerability
• UC: Unclassified Changes
• LT-S: Line-type-sensitive; the HVD approach
• LT-I: Line-type-insensitive; the replication of VCCFinder
Methodology: Exploit vs. Vulnerability
Terms
Potential vulnerability
Vulnerability
Exploit (malicious input)
Agenda
Evaluation
351,452 commits in total
Evaluation: Dataset Provided by Perl et al.
Experiment
• This dataset contains commits labelled as VCC or UC and associated with their CVE-IDs.
• It comprises 714 VCCs out of 350k commits in total from 66 OSS repositories implemented in C/C++.
• The vocabulary contains 170k unique tokens.
• The compressed size is 525 MB (npz).
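A quick size estimate suggests why a compressed sparse representation is needed. Assuming, hypothetically, that the features form a commit-by-token matrix of 64-bit values with one column per unique token (LT-S may roughly double the columns by splitting added and removed tokens), a dense copy would be:

```latex
\underbrace{351{,}452}_{\text{commits}} \times \underbrace{170{,}000}_{\text{tokens}}
  \approx 5.97 \times 10^{10}\ \text{entries},
\qquad
5.97 \times 10^{10} \times 8\ \text{B} \approx 478\ \text{GB}
```

That is far beyond the 64 GB of memory on the evaluation node, whereas most commits touch only a tiny fraction of the vocabulary, so storing only the nonzero counts keeps the dataset in the hundreds of megabytes.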
Evaluation: Implementation in Python
Experiment
To make the experiment reliable, I adopted well-established libraries, including:
• NumPy
• SciPy
• scikit-learn
• unidiff
LT-I: note that reproducibility is limited because the source code of VCCFinder is not publicly available.
Evaluation: Environment Specs
Experiment
The computation was performed on one node of the CX250 cluster (MPC):
• CPU: Intel Xeon E5-2680v2 2.80GHz (10-core) x2
• Memory: 64GB (4GB DDR3-1866 ECC x16)
Evaluation: Precision Improvement
• LT-S improved the AUC (area under the curve) of the precision-recall curve by 18.8% over LT-I.
Precision
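One common way to summarize the area under a precision-recall curve is average precision, AP = Σₙ (Rₙ − Rₙ₋₁) Pₙ, where Pₙ and Rₙ are the precision and recall at the n-th threshold. The slides do not say whether their AUC was computed this way or by trapezoidal integration, so the sketch below is illustrative; it also assumes the 18.8% is a relative improvement. Function names are hypothetical.

```python
def average_precision(labels: list[int], scores: list[float]) -> float:
    """Step-wise area under the precision-recall curve.

    labels: 1 for VCC, 0 for UC; scores: classifier outputs (higher = more suspicious).
    Assumes distinct scores: tied scores are not grouped into a single threshold here.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_pos = 0
    ap = 0.0
    for k, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_pos += 1
            # Recall rises by 1/total_pos here; weight it by precision at rank k.
            ap += (true_pos / k) / total_pos
    return ap


def relative_improvement(new: float, baseline: float) -> float:
    """(new - baseline) / baseline, e.g. comparing LT-S against LT-I."""
    return (new - baseline) / baseline
```

Because VCCs are rare (714 of roughly 350k commits), a precision-recall summary like this is more informative than ROC-based metrics, which can look good even when false positives vastly outnumber true positives.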
Evaluation: Trade-off
• Execution time roughly tripled (about 2.7×): (LT-I, LT-S) = (17m06s, 45m36s)
• Note: the learning phase accounts for the vast majority of the processing time. In practice, the model is trained once, then dumped and reused for subsequent predictions for some time. Parsing an unseen commit and predicting with the shared model then takes only a few seconds. Hence, the execution time of the learning phase should not affect the development process.
Precision
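The "train once, dump, reuse" workflow described above can be sketched with Python's standard `pickle` module, which scikit-learn's documentation also mentions for model persistence. The model here is a hypothetical `(weights, bias)` pair for a linear classifier, not the study's actual model object.

```python
import pickle
from pathlib import Path


def save_model(model, path: Path) -> None:
    """Dump a trained model once so later predictions can reuse it."""
    with path.open("wb") as f:
        pickle.dump(model, f)


def load_model(path: Path):
    """Load a previously dumped model.

    Only unpickle files you trust: loading a pickle can execute arbitrary code.
    """
    with path.open("rb") as f:
        return pickle.load(f)


def score_commit(model, features: dict[str, int]) -> float:
    """Score one unseen commit's token counts with a (weights, bias) linear model."""
    weights, bias = model
    return bias + sum(weights.get(tok, 0.0) * n for tok, n in features.items())
```

Scoring a new commit is just a dictionary lookup per token, which is consistent with the slides' claim that prediction takes seconds even though training takes tens of minutes.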
Evaluation: The Most Contributing Features
Effective Features
To gain deeper insight from the experiment, this study also shows that the variables consisting of words related to computer resources contributed most significantly to the classification model.
For instance:
• (RAM) structors: memory allocation with
complex structures
• (RAM) vmalloc: virtual memory allocation
• (CPU) skbuf_head: a spin-lock of threads
• (network) tso: TCP Segmentation Offload
• (network) if_ether: a flag of Ethernet
availability
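With a linear model, "most contributing" can be read off the learned weights: tokens with the largest positive weights push a commit toward the VCC class. The sketch below ranks features that way and, for LT-S features carrying a `+`/`-` prefix, measures what share of the top features come from added lines. The weights dictionary and function names are hypothetical; the slides do not state exactly how contributions were ranked.

```python
def top_features(weights: dict[str, float], k: int = 5) -> list[tuple[str, float]]:
    """Return the k tokens with the largest weights (strongest evidence for 'VCC')."""
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:k]


def added_share(features: list[tuple[str, float]]) -> float:
    """Fraction of the given LT-S features that come from added lines ('+' prefix)."""
    if not features:
        return 0.0
    return sum(tok.startswith("+") for tok, _ in features) / len(features)
```

Only an LT-S model can answer the "added or removed?" question at all, since LT-I merges both line types into a single token.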
Evaluation: Findings & Insights
Effective Features
Findings:
• The most valuable tokens are those relevant to computer resources such as CPU, memory, and network.
• The figure also shows that most of the top-contributing variables are added tokens.
Insights:
• These findings are not surprising: vulnerabilities are intuitively expected to correlate closely with the side effects of computer resource management and with newly added code.
• However, it is worth verifying that the automatic detection approach agrees with human experiential intuition.
Agenda
Conclusion
Although the features obtainable from Git are limited, this study shows that LT-S improved the AUC of the precision-recall curve by 18.8% compared to LT-I without losing the original advantages:
• (a) Scalability
• (b) Generality
• (c) Explainability
CONCLUSION
KENTA YAMAMOTO | TECHNICAL SUPPORT ENGINEER | @I05
Thank you!
Questions & discussion

Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
Introduction to Decentralized Applications (dApps)
Introduction to Decentralized Applications (dApps)Introduction to Decentralized Applications (dApps)
Introduction to Decentralized Applications (dApps)
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...
 
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStack
 
Engage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyEngage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The Ugly
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software Developers
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 

Detecting vulnerabilities in code through Git history analysis

  • 1. KENTA YAMAMOTO | TECHNICAL SUPPORT ENGINEER | 2018-03-27 Vulnerability Detection Based on Git History
  • 3. Introduction: Research Background. The Trend of Security Incidents. Key facts, and why this research is important. In quantity: the number of CVE reports grew from 1,020 (2000) to 14,643 (2017) [NVD]. In quality: • Equifax exposed 143M consumers’ data due to a web application vulnerability (2017) • Yahoo suffered a breach of 3B users’ account information (2013)
  • 4. The Century of Vulnerability [Chart: number of vulnerabilities] As information technology is adopted more broadly, the impact of security incidents is becoming more extensive and more critical.
  • 5. Introduction: To Help Code Reviewers. We know how to deliver software of proper quality: code review! Best Practice Is Well-Known: review patches before release and fix bugs before deployment. Even famous OSS projects, however, struggle with a lack of code reviewers. A Trade-Off of Automation Techniques: software projects widely adopt a variety of automation approaches, but vulnerability detection techniques face two conflicting goals: • (a) High precision. A tool is useless if it outputs billions of false positives. • (b) Adaptability. No one wants to spend effort solely on security, such as annotating unsafe user inputs. Research Background:
      # Example of taint annotation
      int printf(/*@untainted@*/ char *fmt, ...);
  • 6. Git is somewhat difficult. No worries, it’s not just you! WHAT’S GIT?
  • 7. “(Git is) expressly designed to make you feel less intelligent than you thought you were” – Andrew Morton The Greatness of Git - www.linuxfoundation.org
  • 8. Introduction: What’s Git? But Git Always Stays With You. Trust me, or try this command in your terminal:
      # List how much you rely on Git
      history | awk '{ print $2 }' | sort | uniq -c | sort -rn | head
  • 9. Introduction: What’s Git? Git for Machine Learning. Git provides what machine learning requires, namely good data: • Adopted by 69.2% of 30K developers [Stack Overflow] • Trusted by the most prominent OSS projects, such as the Linux kernel, OpenSSL, FFmpeg, PostgreSQL, Chrome V8, and Apache HTTPD.
  • 10. Introduction: What’s Git? CVE IDs and Security Fixes on Git. Git offers a sufficient number of reliable security fixes: • Fixing commits refer to CVE IDs in their commit messages • Or, fixing commits are referenced by the CVE database
  • 11. Introduction: What’s Git? A Brief Introduction to Git Features
  • 13. A static analysis to detect potential vulnerabilities based on Git history. METHODOLOGY - HVD
  • 14. Methodology: Proposal. Approach Concept • This research proposes an approach that aims to reduce the false-positive rate compared to VCCFinder [Perl et al.] without sacrificing adaptability. • The data source is the same as VCCFinder’s, but this approach distinguishes added lines from removed lines in the patch features, whereas VCCFinder does not.
  • 15. Methodology: VCCFinder, a Novel Approach. Concept: In general, it is hard to apply machine learning to source code because most high-level programming languages, such as C/C++, are less redundant than natural languages and assembly languages. To address this difficulty, Perl et al.: • Narrowed the problem down to a quantifiable one. The quality of source code can hardly be quantified, but whether a change contains a vulnerability can be expressed as 0 or 1. • Leveraged existing assets: the CVE database and prominent OSS projects.
  • 16. “I really never wanted to do source control management at all and felt that it was just about the least interesting thing in the computing world” – Linus Torvalds 10 Years of Git - www.linuxfoundation.org
  • 18. Methodology: Abbreviations. Terms • HVD: History-based Vulnerability Detector • VCC: Vulnerability-Contributing Commit(s). Changes containing a vulnerability • UC: Unclassified Changes • LT-S: Line-type sensitive. The HVD approach • LT-I: Line-type insensitive. The replication of VCCFinder
  • 19. Methodology: Exploit vs. Vulnerability. Terms [Diagram: potential vulnerability, vulnerability, exploit (malicious input)]
  • 22. Evaluation: Dataset Provided by Perl et al. Experiment • This dataset contains commits labelled as VCC or UC and associated with their CVE IDs. • It comprises 714 VCCs out of 350k commits in total, from 66 OSS repositories implemented in C/C++. • It contains 170k unique tokens. • Its compressed size is 525 MB (npz).
  • 23. Evaluation: Implementation in Python. Experiment: To make the experiment reliable, I adopted a variety of libraries, including: • NumPy • SciPy • scikit-learn • unidiff. LT-I: note that its reproducibility is limited, since the source code of VCCFinder is not publicly available.
  • 24. Evaluation: Environment Specs. Experiment: The computation was performed on one node of the CX250 cluster (MPC): • CPU: Intel Xeon E5-2680 v2 2.80GHz (10-core) x2 • Memory: 64GB (4GB DDR3-1866 ECC x16)
  • 25. Evaluation: Precision Improvement • LT-S improved the AUC (area under the curve) of its precision-recall curve by 18.8% over LT-I. [Figure: precision-recall plot]
  • 26. Evaluation: Trade-off • Execution time roughly tripled (about 2.7x): (LT-I, LT-S) = (17m06s, 45m36s). • Note: the learning phase accounts for the vast majority of the processing time. In practice, once the model has been trained, it is saved and reused for subsequent predictions for some time. Parsing a given unknown commit and running a prediction with the shared model then takes only a few seconds. Hence, the execution time of the learning phase should not affect the development process.
  • 27. Evaluation: The Most Contributing Features. Effective Features: To gain deeper insight from the experiment, this study also shows that features consisting of words related to computer resources contributed most significantly to the classification model. For instance: • (RAM) structors: memory allocation with complex structures • (RAM) vmalloc: virtual memory allocation • (CPU) skbuf_head: a spin-lock of threads • (network) tso: TCP Segmentation Offload • (network) if_ether: a flag of Ethernet availability
  • 28. Evaluation: Findings & Insights. Effective Features. Findings: • The most valuable tokens are those relevant to computer resources such as CPU, memory, and network. • The figure also shows that most of the top-contributing features are added tokens. Insights: • These findings are not surprising, since it is intuitive that vulnerabilities correlate closely with the side effects of computer resource management and with added code. • However, it is worth verifying that the automatic detection approach agrees with human experiential intuition.
  • 30. Although the features obtainable via Git are limited, this study shows that LT-S improved the AUC of the precision-recall curve by 18.8% compared to LT-I, without losing the original advantages: • (a) Scalability • (b) Generality • (c) Explainability CONCLUSION
  • 31. KENTA YAMAMOTO | TECHNICAL SUPPORT ENGINEER | @I05 Thank you! Questions & discussion
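Slide 10 relies on commit messages that cite CVE IDs. The following is a minimal Python sketch of that lookup. It assumes the standard CVE-YYYY-NNNN identifier format, where the sequence part has four or more digits. The function names extract_cve_ids and is_security_fix are hypothetical, and the sketch covers only the commit-message direction, not lookups from the CVE database back to commits.

```python
import re

# Assumed format: CVE-YYYY-NNNN, where the sequence part has four or more digits.
CVE_PATTERN = re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE)


def extract_cve_ids(commit_message: str) -> list[str]:
    """Return unique CVE IDs (upper-cased) in order of first appearance."""
    seen: dict[str, None] = {}  # dicts preserve insertion order
    for match in CVE_PATTERN.finditer(commit_message):
        seen.setdefault(match.group(0).upper(), None)
    return list(seen)


def is_security_fix(commit_message: str) -> bool:
    """Hypothetical heuristic: a commit citing any CVE ID counts as a security fix."""
    return bool(extract_cve_ids(commit_message))


if __name__ == "__main__":
    msg = "Fix heap overflow (CVE-2016-2105, cve-2016-2105, CVE-2014-0160)"
    print(extract_cve_ids(msg))  # ['CVE-2016-2105', 'CVE-2014-0160']
```

A plain regular expression keeps the heuristic transparent. In real repositories, a message may mention a CVE without fixing it (for example, a revert), so matches are candidates rather than ground truth.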
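Slide 14’s distinction between LT-S and LT-I can be illustrated by counting tokens on the added and removed lines of a unified diff. The sketch below illustrates the idea and is not the author’s feature extractor. The identifier-only tokenizer and the name diff_token_features are hypothetical. To stay dependency-free, it tracks hunk line counts by hand instead of using the unidiff library mentioned on slide 23. This also keeps a removed line whose content starts with "--" from being mistaken for a file header.

```python
import re
from collections import Counter

# Hunk header: "@@ -start[,count] +start[,count] @@"; an omitted count means 1.
HUNK_RE = re.compile(r"^@@ -\d+(?:,(\d+))? \+\d+(?:,(\d+))? @@")
# Deliberately simple, hypothetical tokenizer: identifier-like words only.
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def diff_token_features(diff_text: str, line_type_sensitive: bool = True) -> Counter:
    """Count tokens on added/removed lines of a unified diff.

    line_type_sensitive=True  -> LT-S style: tokens prefixed with '+' or '-'.
    line_type_sensitive=False -> LT-I style: added and removed tokens merged.
    """
    features: Counter = Counter()
    old_left = new_left = 0  # lines still expected in the current hunk
    for line in diff_text.splitlines():
        if old_left <= 0 and new_left <= 0:
            # Outside a hunk: only a hunk header matters; file headers are skipped.
            m = HUNK_RE.match(line)
            if m:
                old_left = int(m.group(1) or 1)
                new_left = int(m.group(2) or 1)
            continue
        if line.startswith("\\"):  # "\ No newline at end of file"
            continue
        if line.startswith("-"):
            old_left -= 1
            prefix = "-"
        elif line.startswith("+"):
            new_left -= 1
            prefix = "+"
        else:  # context line: present in both the old and new versions
            old_left -= 1
            new_left -= 1
            continue
        for token in TOKEN_RE.findall(line[1:]):
            features[prefix + token if line_type_sensitive else token] += 1
    return features


if __name__ == "__main__":
    patch = "\n".join([
        "--- a/x.c",
        "+++ b/x.c",
        "@@ -1,3 +1,3 @@",
        " int f(void) {",
        "-    char buf[8];",
        "+    char *buf = vmalloc(size);",
        "     return 0;",
    ])
    print(dict(diff_token_features(patch)))
    print(dict(diff_token_features(patch, line_type_sensitive=False)))
```

In the LT-I output, "char" and "buf" collapse into single counts of 2. In the LT-S output, they stay separate as "-char"/"+char" and "-buf"/"+buf", which is the extra signal the line-type-sensitive approach can learn from.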
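Slide 22 gives the scale of the dataset: 350k commits and 170k unique tokens. A dense float64 matrix of that shape would need roughly 350,000 × 170,000 × 8 bytes ≈ 476 GB, so a sparse layout is the natural choice. The slide only says "npz". The sketch below assumes a SciPy CSR matrix (the kind scipy.sparse.save_npz writes) and builds the three CSR arrays from per-commit token counts in pure Python. build_csr is a hypothetical helper, and the tokens in the usage example are illustrative. The resulting arrays could be passed to scipy.sparse.csr_matrix((data, indices, indptr)).

```python
from collections import Counter


def build_csr(docs: list[Counter]) -> tuple[dict[str, int], list[int], list[int], list[int]]:
    """Turn per-commit token counts into CSR arrays.

    Returns (vocab, indptr, indices, data): row i's nonzeros are
    indices[indptr[i]:indptr[i+1]] (column ids) with matching data (counts).
    """
    vocab: dict[str, int] = {}  # token -> column id, assigned on first sight
    indptr, indices, data = [0], [], []
    for doc in docs:
        for token, count in sorted(doc.items()):  # deterministic vocabulary order
            indices.append(vocab.setdefault(token, len(vocab)))
            data.append(count)
        indptr.append(len(indices))  # row boundary
    return vocab, indptr, indices, data


if __name__ == "__main__":
    rows = [Counter({"+vmalloc": 2, "-buf": 1}), Counter({"-buf": 3})]
    vocab, indptr, indices, data = build_csr(rows)
    print(vocab)    # {'+vmalloc': 0, '-buf': 1}
    print(indptr)   # [0, 2, 3]
    print(indices)  # [0, 1, 1]
    print(data)     # [2, 1, 3]
```

Only the nonzero counts are stored, so memory grows with the number of distinct tokens per commit rather than with the full vocabulary size.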
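Slide 25 reports the AUC of the precision-recall curve. With 714 VCCs among roughly 350k commits (about 0.2% positives), a random ranking would achieve an average precision of only about 0.002. This is why precision, rather than accuracy, is the metric to watch. The slide does not say how the area was computed. This sketch assumes the step-wise average-precision estimate, the quantity scikit-learn’s average_precision_score reports apart from its handling of tied scores. Reading "18.8%" as a relative gain is also an assumption. Both function names are hypothetical.

```python
def average_precision(labels: list[int], scores: list[float]) -> float:
    """Step-wise area under the precision-recall curve (average precision).

    labels: 1 = VCC, 0 = unclassified change; scores: higher = more suspicious.
    Tied scores keep their input order (sorted() is stable), a simplification.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("average precision needs at least one positive label")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, ap = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            ap += hits / rank  # precision at the rank where recall increases
    return ap / total_pos


def relative_improvement(new: float, old: float) -> float:
    """Relative gain of `new` over `old`, e.g. 0.188 for an 18.8% improvement."""
    return (new - old) / old


if __name__ == "__main__":
    y = [1, 0, 1, 0]
    s = [0.9, 0.8, 0.7, 0.1]
    print(average_precision(y, s))  # (1/1 + 2/3) / 2, about 0.833
```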
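Slide 26 argues that the slow learning phase runs rarely, because the trained model is saved and reused for later predictions. The following is a minimal sketch of that train-once, predict-many pattern using Python’s pickle module. TokenWeightModel is a hypothetical stand-in for a trained linear classifier, not the model used in the study. For scikit-learn estimators, the library’s model-persistence documentation covers pickle and joblib.

```python
import os
import pickle
import tempfile
from dataclasses import dataclass, field


@dataclass
class TokenWeightModel:
    """Hypothetical stand-in for a trained linear classifier over diff tokens."""
    weights: dict[str, float] = field(default_factory=dict)
    bias: float = 0.0

    def score(self, token_counts: dict[str, int]) -> float:
        # Linear decision function: bias + sum(weight * count); unknown tokens weigh 0.
        return self.bias + sum(self.weights.get(t, 0.0) * c for t, c in token_counts.items())

    def predict(self, token_counts: dict[str, int]) -> int:
        # 1 = flag as a likely VCC, 0 = leave unflagged.
        return int(self.score(token_counts) > 0.0)


def save_model(model: TokenWeightModel, path: str) -> None:
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path: str) -> TokenWeightModel:
    # Only unpickle trusted files: loading a pickle can execute arbitrary code.
    with open(path, "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    model = TokenWeightModel(weights={"+vmalloc": 1.5, "-kfree": 0.5}, bias=-1.0)
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "model.pkl")
        save_model(model, path)    # once, after the slow learning phase
        shared = load_model(path)  # later, for each prediction run
        print(shared.predict({"+vmalloc": 1}))  # 1, since -1.0 + 1.5 > 0
```

Scoring a new commit is then a single pass over its tokens, which is consistent with the slide’s point that per-commit prediction is cheap compared with training.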
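Slides 27 and 28 rank tokens by their contribution to the model and note that most of the top tokens are added tokens. The slides do not say how contribution was measured. Assuming a linear classifier, one common reading is the learned weight of each feature, for example coef_ on a scikit-learn linear model. The sketch also assumes that line-type-sensitive tokens carry a "+" prefix for added lines and a "-" prefix for removed lines. The helpers top_features and added_share are hypothetical, and the weights in the usage example are made up for illustration.

```python
def top_features(weights: dict[str, float], k: int = 5) -> list[tuple[str, float]]:
    """Return the k tokens whose weights push hardest toward the VCC class."""
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:k]


def added_share(features: list[tuple[str, float]]) -> float:
    """Fraction of the given tokens that come from added lines ('+' prefix)."""
    if not features:
        return 0.0
    return sum(token.startswith("+") for token, _ in features) / len(features)


if __name__ == "__main__":
    # Hypothetical weights, for illustration only.
    weights = {"+vmalloc": 2.0, "-kfree": 0.3, "+tso": 1.2, "+if_ether": 0.9, "-return": -0.4}
    top = top_features(weights, k=4)
    print(top)               # [('+vmalloc', 2.0), ('+tso', 1.2), ('+if_ether', 0.9), ('-kfree', 0.3)]
    print(added_share(top))  # 0.75
```

Ranking by signed weight surfaces the tokens that push toward the VCC class. Ranking by absolute weight would also surface strongly "safe" tokens, so the two readings can disagree.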