SlideShare une entreprise Scribd logo
1  sur  55
Télécharger pour lire hors ligne
Design and Implementation of
Security Graph Language (SGL)
Dr. Asankhaya Sharma
Director of Software Engineering
CA Veracode
Motivation
● Software is built using large amounts of third-party code (up to 90%)
2
Motivation
● Software is built using large amounts of third-party code (up to 90%)
For each Java library
depended on, 4 others
are added
For each JS library
depended on,
9 others are added
3
● Unaudited third-party code is a liability
○ Apache Struts (2018)
■ CVE-2018-11776: RCE via URL
■ CVE-2017-5638: RCE via HTTP headers (Equifax breach)
○ Malicious libraries (eslint-scope, crossenv, 2018)
○ Heartbleed (OpenSSL, 2017)
○ GHOST (glibc, 2015)
○ Apache Commons Collections deserialization RCE (2015)
Motivation
4
● Manual auditing is infeasible
○ Hundreds of dependencies
○ Constantly changing
● Automated audits
○ Dependency-level
■ Ensure you’re not using a vulnerable version
○ Source-level
■ Ensure you’re not vulnerable, despite using a vulnerable version
■ Ensure you won’t be vulnerable as things change
● Potential vulnerabilities, anti-patterns
Motivation
5
● Capture the space in some abstract form
● Be able to interrogate it using flexible queries
● Automate and share these queries to rule out classes of issues
What we want
6
SGL
● Graph query
language
● Open source
security domain
○ Libraries,
vulnerabilities,
licenses
○ Methods,
classes
7
● Program analysis
○ Represent code as syntax trees
○ Reify intermediate structures, e.g. call graph, data-flow graph
○ Use transitive closure to derive insights
○ Query dependency graphs, etc.
● Vulnerability description
○ Structured alternative to CVEs
SGL: use cases
8
Related work
● Code analysis + graph databases
○ Yamaguchi, Fabian, et al. "Modeling and discovering vulnerabilities with code property
graphs." Security and Privacy (SP), 2014 IEEE Symposium on. IEEE, 2014.
● Graph query languages
○ Gremlin
○ Cypher
● Vulnerability description languages
○ OVAL
9
● Declarative Gremlin subset
● Compiled to Gremlin
● Transitive closure
● Optimizations
○ Reachability indices
○ Query planning
SGL: implementation
10
● Graph traversals
● Represent frontier as a stream
● Composition of (stateful) functions
● Turing-complete
○ Stateful variables
○ Imperative control flow
○ Branching
Detour: Gremlin
11
Detour: Gremlin
12
Detour: Gremlin
● Lots of imperative transformations
● Traversal direction matters
● Difficult to express imperative algorithms, e.g. Tarjan’s SCCs
○ Pure Gremlin is all about one homogenous stream
○ Lambda steps aren’t supported by all back ends
● Dynamically-typed
○ Use of strings as variables: cannot validate, inconsistent use
○ Everything is a traversal; actual traversals, control flow, etc.
13
Detour: Gremlin
● Security researchers are concerned with the domain, not the details of
traversing graphs
● A DSL should exist at a higher level of abstraction
● SGL:
○ Is declarative
○ Provides useful primitives, e.g. an efficient SCC implementation
○ Is typed
14
“Does this version of Apache Commons Collections
contain a method named readObject?”
SGL: language features
library(coord1: ‘commons-collections’, version: ‘3.2.2’)
has_method method(name: ‘readObject’)
15
SGL: language features
library(coord1: ‘commons-collections’, version: ‘3.2.2’)
has_method method(name: ‘readObject’)
commons-collections
3.2.2
readObject
...
...
...
has_method
depends_on
calls
calls
path
16
method(name: ‘readObject’) method_in_library
library(coord1: ‘commons-collections’, version: ‘3.2.2’)
SGL: equivalence (sort of)
library(coord1: ‘commons-collections’, version: ‘3.2.2’)
has_method method(name: ‘readObject’)
≈
17
“What methods does this version of Apache
Commons Collections contain?”
SGL: results
library(coord1: ‘commons-collections’, version: ‘3.2.2’)
has_method
18
SGL: results
library(coord1: ‘commons-collections’, version: ‘3.2.2’)
has_method
commons-collections
3.2.2
...
...
has_method
depends_on
calls
calls
...
...has_method
...
19
method(name: ‘readObject’)
method(name: ‘readExternal’)
method(name: ‘readResolve’)
SGL: results
library(coord1: ‘commons-collections’, version: ‘3.2.2’)
has_method
20
“What libraries contain the method readObject?”
SGL: projection
library(_) has_method method(name: ‘readObject’)
21
SGL: projection
library(_) has_method method(name: ‘readObject’)
22
readObject
...
...
has_method calls
calls
...
...
has_method
SGL: projection
library(_) where(has_method method(name: ‘readObject’))
readObject
...
...
has_method calls
calls
...
...
has_method
23
SGL: equivalence
library(_) where(has_method method(name: ‘readObject’))
=
method(name: ‘readObject’) method_in_library
24
“What are the direct dependencies of Apache
Commons Collections?”
SGL: transitive closure
library(coord1: ‘commons-collections’, version: ‘3.2.2’)
depends_on
25
SGL: transitive closure
“What are all the dependencies of Apache
Commons Collections?”
library(coord1: ‘commons-collections’, version: ‘3.2.2’)
depends_on*
26
SGL: aggregations
“What are 5 dependencies of Apache Commons
Collections?”
library(coord1: ‘commons-collections’, version: ‘3.2.2’)
depends_on* limit(5)
aggregation
27
“How many dependencies of Apache Commons
Collections are there?”
SGL: aggregations
library(coord1: ‘commons-collections’, version: ‘3.2.2’)
depends_on* count
aggregation
28
let depends_on_method =
depends_on has_method in
spring depends_on_method
SGL: bindings, abstraction
let spring = library(
'java',
'org.springframework',
'spring-webmvc',
'4.3.8.RELEASE'
) in
spring depends_on*
29
Compilation
let spring = library(
'java',
'org.springframework',
'spring-webmvc',
'4.3.8.RELEASE'
) in
spring depends_on*
g.V()
.hasLabel('library')
.has('language', 'java')
.has('group', 'org.springframework')
.has('artifact', 'spring-webmvc')
.has('version', '4.3.8.RELEASE')
.emit().repeat(out('depends_on').dedup())
30
Demo
● Vulnerable methods
● Struts
○ CVE-2018-11776, Apache Struts
○ A malicious URL leads to an RCE via OGNL execution
○ Source: ActionProxy#getMethod
○ Sink: OgnlUtil#compileAndExecute
31
● Homomorphism-based bag semantics…
○ The result of evaluating a query Q against a graph G consists of all possible homomorphisms
from Q to G
○ In other words, bindings of query variables are completely unconstrained, vs limited so two
variables can’t be bound to the same thing*
○ Results are bags, not sets
○ Practically, like Gremlin, SPARQL, relational databases, and unlike Cypher
● … without joins (officially)
* Refer to Angles et. al, 2017. "Foundations of Modern Query Languages for Graph Databases." ACM Comput. Surv. 50, 5,
Article 68 (September 2017)
Semantics
32
Semantics
● Not Turing-complete
○ Programs always terminate
● No side effects
○ Every expression is referentially transparent
● Easier to rewrite and analyze
33
● We consider a type as the product of a label (e.g. library, method) and
associated properties
library(…) :: Library
method(…) :: Method
Type system
34
Type system
library(…) depends_on* :: Library
library(…) has_method :: Method
library(…) count :: Integer
library(…) where(…) :: Library
35
● Reduction to relational algebra
○ (Inner) join: edge traversal
○ Project: where
○ Select: vertex predicates
○ Treat transitive closure as an extensional relation
● Reorder selections
● Index usage
● Join ordering
Optimizations
36
● Reorder selections
○ Gremlin does this (along with other optimizations)
○ Perform more specific selections first
dedup library(coord1: ‘org.springframework’)
→
library(coord1: ‘org.springframework’) dedup
Optimizations
37
● Index usage
○ Gremlin takes advantage of traditional indices, e.g. for locating vertices in a graph
○ We extend this with reachability indices where possible
library(…) depends_on* has_method method(…)
○ Simplest scheme: store the transitive closure in a bit matrix
■ With index: O(n2
) space, O(1) time
■ Without: no space, O(nd
) time + potentially large constant factor
○ More sophisticated indexing schemes exist
Optimizations
38
● Join ordering (i.e. query planning)
○ Given n relations, n! possible orderings
○ Essential problems: query equivalence, cost
Optimizations
library(_) where(has_method method(name: ‘readObject’))
method(name: ‘readObject’) method_in_library
39
● Join ordering
○ Enumerate equivalent queries
■ Convert queries into domain graph
■ Compute all possible orderings
■ Certain orderings are invalid, e.g. not
Optimizations
readObject
has_method...
...
has_method
method1
has_method
library1
40
● Join ordering
○ Query cost
■ Observation: certain orderings are known to be more efficient
■ e.g. many-to-one relations
■ Notion of redundancy: vertices traversed which don’t contribute to result
Optimizations
readObject
has_method
...
...
has_method readObject
...
...
has_method
vs
41
● Join ordering
○ Query cost
■ Redundancy for many-to-one relations
■ For the others, statistics from a large dataset
● Product of cardinalities
Optimizations
Edge Avg
out-deg
Avg
in-deg
depends_on 4.0 4.1
has_file 43.5 1.0
has_method 1508.2 8.9
calls 27.2 30.6
embeds 54.9 22.0
defines 14.4 1.8
has_library_hash 1.0 2.6
has_method_hash 4.9 18.6
has_library 16.4 1.9
has_vulnerable_method 1.8 2.1
has_version_range 2.9 1.2
has_class 217.0 11.1
extends 1.0 1.0
42
● Join ordering benchmarks
Optimizations
let glassfish_class =
class(regex 'org.glassfish.*') in
let read_object =
method(method_name:'readObject') in
let get_path = method(
class_name:'java/io/File',
method_name:'getPath') in
glassfish_class defines
read_object where(calls get_path)
43
● Join ordering benchmarks
Optimizations
Query Redundancy Runtime
Original glassfish_class defines
read_object where(calls get_path)
391.2 105.8s
Reversed get_path called_by read_object
where(defined_by glassfish_class)
55.7 0.6s
44
Dataset
● Public data from Maven Central
● 79M vertices, 582M edges, 76GB
● Call graphs computed with CHA/RTA
● Bytecode hashing
45
● Program analysis
● Vulnerability description
○ Structured alternative to CVEs
SGL: use cases
46
● CVEs
○ Useful canonical identifiers for vulnerabilities
○ Not machine-readable
■ Vulnerable components must be identified manually (and inconsistently)
■ False positives on real-world systems
■ Difficult to deduplicate
Describing vulnerabilities
47
Describing vulnerabilities
● Idea: represent vulnerabilities as SGL queries
○ Structured and can be processed by tools
○ Flexibility, e.g. dynamic updates
○ Trivially check by executing
○ Relate to existing data, libraries and vulns
● Deduplication
○ Relies on query equivalence; difficult for arbitrary queries
○ Idea: define a subset that can be checked for equivalence
48
● Constant queries that can be compared, i.e. a data structure
● Normalized form
○ Bindings
○ Vertex predicates
○ No edge steps
○ Must begin at vulnerability
○ Expand syntactic sugar
○ Sort
Normal forms
vulnerability(cwe: 1)
has_version_range union(
version_range(from: '1.0', to: '1.1'))
union(
has_library union(
library('java', 'web', 'core', '1.0'),
library('java', 'web', 'core', '1.1')),
has_vulnerable_method union(
method('com/example/Controller',
'config', '()')))
49
Reification
● We’d also like to use vulnerabilities in queries
○ “Find all vulnerable libraries”
● Reify vulnerabilities as vertices
● Distinguish by storing normalized query in a property
50
Considerations
● Dynamically-updating vulnerabilities
○ e.g. when a new library version is released
○ Can be convenient, but would require manual review anyway
● Finding similar vulnerabilities
○ Search for vulnerabilities associated with a library
○ Generalize query by removing predicates
51
● Expressiveness
○ Datalog without user-defined rules
■ Computation?
○ Arbitrary “diamond” joins
library(…) ?a depends_on library(…) ?b,
?a has_method method(…) method_in_library ?b
Future work
52
● More domains
○ Dataflow graphs
Future work
53
Try it out
● www.sourceclear.com
● Sign up for a free trial
● Activate an agent
● SRCCLR_ENABLE_SGL=true srcclr scan --url
https://github.com/srcclr/example-java-maven --sgl
54
Thank you!
55
● Questions?
● More info - sgl.org
● Contact
○ Twitter - @asankhaya

Contenu connexe

Tendances

Inside LoLA - Experiences from building a state space tool for place transiti...
Inside LoLA - Experiences from building a state space tool for place transiti...Inside LoLA - Experiences from building a state space tool for place transiti...
Inside LoLA - Experiences from building a state space tool for place transiti...
Universität Rostock
 

Tendances (20)

Streaming Data Flow with Apache Flink @ Paris Flink Meetup 2015
Streaming Data Flow with Apache Flink @ Paris Flink Meetup 2015Streaming Data Flow with Apache Flink @ Paris Flink Meetup 2015
Streaming Data Flow with Apache Flink @ Paris Flink Meetup 2015
 
Processing large-scale graphs with Google(TM) Pregel by MICHAEL HACKSTEIN at...
 Processing large-scale graphs with Google(TM) Pregel by MICHAEL HACKSTEIN at... Processing large-scale graphs with Google(TM) Pregel by MICHAEL HACKSTEIN at...
Processing large-scale graphs with Google(TM) Pregel by MICHAEL HACKSTEIN at...
 
Apache Flink internals
Apache Flink internalsApache Flink internals
Apache Flink internals
 
Building Topology in NS3
Building Topology in NS3Building Topology in NS3
Building Topology in NS3
 
Scaling Up Logging and Metrics
Scaling Up Logging and MetricsScaling Up Logging and Metrics
Scaling Up Logging and Metrics
 
Reactive by example - at Reversim Summit 2015
Reactive by example - at Reversim Summit 2015Reactive by example - at Reversim Summit 2015
Reactive by example - at Reversim Summit 2015
 
Semophores and it's types
Semophores and it's typesSemophores and it's types
Semophores and it's types
 
Apache Flink @ NYC Flink Meetup
Apache Flink @ NYC Flink MeetupApache Flink @ NYC Flink Meetup
Apache Flink @ NYC Flink Meetup
 
Tutorial ns 3-tutorial-slides
Tutorial ns 3-tutorial-slidesTutorial ns 3-tutorial-slides
Tutorial ns 3-tutorial-slides
 
How to Think in RxJava Before Reacting
How to Think in RxJava Before ReactingHow to Think in RxJava Before Reacting
How to Think in RxJava Before Reacting
 
Inside LoLA - Experiences from building a state space tool for place transiti...
Inside LoLA - Experiences from building a state space tool for place transiti...Inside LoLA - Experiences from building a state space tool for place transiti...
Inside LoLA - Experiences from building a state space tool for place transiti...
 
Apache Flink Training Workshop @ HadoopCon2016 - #1 System Overview
Apache Flink Training Workshop @ HadoopCon2016 - #1 System OverviewApache Flink Training Workshop @ HadoopCon2016 - #1 System Overview
Apache Flink Training Workshop @ HadoopCon2016 - #1 System Overview
 
Presentation iswc
Presentation iswcPresentation iswc
Presentation iswc
 
Cryptography and secure systems
Cryptography and secure systemsCryptography and secure systems
Cryptography and secure systems
 
Diagnosing HotSpot JVM Memory Leaks with JFR and JMC
Diagnosing HotSpot JVM Memory Leaks with JFR and JMCDiagnosing HotSpot JVM Memory Leaks with JFR and JMC
Diagnosing HotSpot JVM Memory Leaks with JFR and JMC
 
Apache Flink Training: System Overview
Apache Flink Training: System OverviewApache Flink Training: System Overview
Apache Flink Training: System Overview
 
rspamd-fosdem
rspamd-fosdemrspamd-fosdem
rspamd-fosdem
 
Transformer Mods for Document Length Inputs
Transformer Mods for Document Length InputsTransformer Mods for Document Length Inputs
Transformer Mods for Document Length Inputs
 
Flink Connector Development Tips & Tricks
Flink Connector Development Tips & TricksFlink Connector Development Tips & Tricks
Flink Connector Development Tips & Tricks
 
Local Optimizations in Eclipse QVTc and QVTr using the Micro-Mapping Model of...
Local Optimizations in Eclipse QVTc and QVTr using the Micro-Mapping Model of...Local Optimizations in Eclipse QVTc and QVTr using the Micro-Mapping Model of...
Local Optimizations in Eclipse QVTc and QVTr using the Micro-Mapping Model of...
 

Similaire à Design and Implementation of the Security Graph Language

Similaire à Design and Implementation of the Security Graph Language (20)

Custom Pregel Algorithms in ArangoDB
Custom Pregel Algorithms in ArangoDBCustom Pregel Algorithms in ArangoDB
Custom Pregel Algorithms in ArangoDB
 
Clang: More than just a C/C++ Compiler
Clang: More than just a C/C++ CompilerClang: More than just a C/C++ Compiler
Clang: More than just a C/C++ Compiler
 
Nzitf Velociraptor Workshop
Nzitf Velociraptor WorkshopNzitf Velociraptor Workshop
Nzitf Velociraptor Workshop
 
OQGraph @ SCaLE 11x 2013
OQGraph @ SCaLE 11x 2013OQGraph @ SCaLE 11x 2013
OQGraph @ SCaLE 11x 2013
 
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)Kubernetes @ Squarespace (SRE Portland Meetup October 2017)
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)
 
Hibernate 1x2
Hibernate 1x2Hibernate 1x2
Hibernate 1x2
 
Apache Spark 101 - Demi Ben-Ari
Apache Spark 101 - Demi Ben-AriApache Spark 101 - Demi Ben-Ari
Apache Spark 101 - Demi Ben-Ari
 
Java SE 8 library design
Java SE 8 library designJava SE 8 library design
Java SE 8 library design
 
What`s New in Java 8
What`s New in Java 8What`s New in Java 8
What`s New in Java 8
 
Reactive Applications with Apache Pulsar and Spring Boot
Reactive Applications with Apache Pulsar and Spring BootReactive Applications with Apache Pulsar and Spring Boot
Reactive Applications with Apache Pulsar and Spring Boot
 
Type safe, versioned, and rewindable stream processing with Apache {Avro, K...
Type safe, versioned, and rewindable stream processing  with  Apache {Avro, K...Type safe, versioned, and rewindable stream processing  with  Apache {Avro, K...
Type safe, versioned, and rewindable stream processing with Apache {Avro, K...
 
Java 8
Java 8Java 8
Java 8
 
Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F...
 Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F... Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F...
Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F...
 
Kubernetes @ Squarespace: Kubernetes in the Datacenter
Kubernetes @ Squarespace: Kubernetes in the DatacenterKubernetes @ Squarespace: Kubernetes in the Datacenter
Kubernetes @ Squarespace: Kubernetes in the Datacenter
 
FlinkML: Large Scale Machine Learning with Apache Flink
FlinkML: Large Scale Machine Learning with Apache FlinkFlinkML: Large Scale Machine Learning with Apache Flink
FlinkML: Large Scale Machine Learning with Apache Flink
 
Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...
Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...
Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...
 
SFO15-110: Toolchain Collaboration
SFO15-110: Toolchain CollaborationSFO15-110: Toolchain Collaboration
SFO15-110: Toolchain Collaboration
 
OpenTelemetry For Architects
OpenTelemetry For ArchitectsOpenTelemetry For Architects
OpenTelemetry For Architects
 
Experiences building a distributed shared log on RADOS - Noah Watkins
Experiences building a distributed shared log on RADOS - Noah WatkinsExperiences building a distributed shared log on RADOS - Noah Watkins
Experiences building a distributed shared log on RADOS - Noah Watkins
 
Web Traffic Time Series Forecasting
Web Traffic  Time Series ForecastingWeb Traffic  Time Series Forecasting
Web Traffic Time Series Forecasting
 

Plus de Asankhaya Sharma

Plus de Asankhaya Sharma (13)

9 types of people you find on your team
9 types of people you find on your team9 types of people you find on your team
9 types of people you find on your team
 
Securing Open Source Code in Enterprise
Securing Open Source Code in EnterpriseSecuring Open Source Code in Enterprise
Securing Open Source Code in Enterprise
 
Secure Software Development
Secure Software DevelopmentSecure Software Development
Secure Software Development
 
Verified Subtyping with Traits and Mixins
Verified Subtyping with Traits and MixinsVerified Subtyping with Traits and Mixins
Verified Subtyping with Traits and Mixins
 
Specifying compatible sharing in data structures
Specifying compatible sharing in data structuresSpecifying compatible sharing in data structures
Specifying compatible sharing in data structures
 
Exploiting undefined behaviors for efficient symbolic execution
Exploiting undefined behaviors for efficient symbolic executionExploiting undefined behaviors for efficient symbolic execution
Exploiting undefined behaviors for efficient symbolic execution
 
DIDAR: Database Intrusion Detection with Automated Recovery
DIDAR: Database Intrusion Detection with Automated RecoveryDIDAR: Database Intrusion Detection with Automated Recovery
DIDAR: Database Intrusion Detection with Automated Recovery
 
Developer-focused Software Security
Developer-focused Software SecurityDeveloper-focused Software Security
Developer-focused Software Security
 
Visualizing Symbolic Execution with Bokeh
Visualizing Symbolic Execution with BokehVisualizing Symbolic Execution with Bokeh
Visualizing Symbolic Execution with Bokeh
 
Crafting a Successful Engineering Career
Crafting a Successful Engineering CareerCrafting a Successful Engineering Career
Crafting a Successful Engineering Career
 
Certified Reasoning for Automated Verification
Certified Reasoning for Automated VerificationCertified Reasoning for Automated Verification
Certified Reasoning for Automated Verification
 
Last Days of Academy
Last Days of AcademyLast Days of Academy
Last Days of Academy
 
SayCheese Ad
SayCheese AdSayCheese Ad
SayCheese Ad
 

Dernier

Dernier (20)

Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 

Design and Implementation of the Security Graph Language

  • 1. Design and Implementation of Security Graph Language (SGL) Dr. Asankhaya Sharma Director of Software Engineering CA Veracode
  • 2. Motivation ● Software is built using large amounts of third-party code (up to 90%) 2
  • 3. Motivation ● Software is built using large amounts of third-party code (up to 90%) For each Java library depended on, 4 others are added For each JS library depended on, 9 others are added 3
  • 4. ● Unaudited third-party code is a liability ○ Apache Struts (2018) ■ CVE-2018-11776: RCE via URL ■ CVE-2017-5638: RCE via HTTP headers (Equifax breach) ○ Malicious libraries (eslint-scope, crossenv, 2018) ○ Heartbleed (OpenSSL, 2017) ○ GHOST (glibc, 2015) ○ Apache Commons Collections deserialization RCE (2015) Motivation 4
  • 5. ● Manual auditing is infeasible ○ Hundreds of dependencies ○ Constantly changing ● Automated audits ○ Dependency-level ■ Ensure you’re not using a vulnerable version ○ Source-level ■ Ensure you’re not vulnerable, despite using a vulnerable version ■ Ensure you won’t be vulnerable as things change ● Potential vulnerabilities, anti-patterns Motivation 5
  • 6. ● Capture the space in some abstract form ● Be able to interrogate it using flexible queries ● Automate and share these queries to rule out classes of issues What we want 6
  • 7. SGL ● Graph query language ● Open source security domain ○ Libraries, vulnerabilities, licenses ○ Methods, classes 7
  • 8. ● Program analysis ○ Represent code as syntax trees ○ Reify intermediate structures, e.g. call graph, data-flow graph ○ Use transitive closure to derive insights ○ Query dependency graphs, etc. ● Vulnerability description ○ Structured alternative to CVEs SGL: use cases 8
  • 9. Related work ● Code analysis + graph databases ○ Yamaguchi, Fabian, et al. "Modeling and discovering vulnerabilities with code property graphs." Security and Privacy (SP), 2014 IEEE Symposium on. IEEE, 2014. ● Graph query languages ○ Gremlin ○ Cypher ● Vulnerability description languages ○ OVAL 9
  • 10. ● Declarative Gremlin subset ● Compiled to Gremlin ● Transitive closure ● Optimizations ○ Reachability indices ○ Query planning SGL: implementation 10
  • 11. ● Graph traversals ● Represent frontier as a stream ● Composition of (stateful) functions ● Turing-complete ○ Stateful variables ○ Imperative control flow ○ Branching Detour: Gremlin 11
  • 13. Detour: Gremlin ● Lots of imperative transformations ● Traversal direction matters ● Difficult to express imperative algorithms, e.g. Tarjan’s SCCs ○ Pure Gremlin is all about one homogenous stream ○ Lambda steps aren’t supported by all back ends ● Dynamically-typed ○ Use of strings as variables: cannot validate, inconsistent use ○ Everything is a traversal; actual traversals, control flow, etc. 13
  • 14. Detour: Gremlin ● Security researchers are concerned with the domain, not the details of traversing graphs ● A DSL should exist at a higher level of abstraction ● SGL: ○ Is declarative ○ Provides useful primitives, e.g. an efficient SCC implementation ○ Is typed 14
  • 15. “Does this version of Apache Commons Collections contain a method named readObject?” SGL: language features library(coord1: ‘commons-collections’, version: ‘3.2.2’) has_method method(name: ‘readObject’) 15
  • 16. SGL: language features library(coord1: ‘commons-collections’, version: ‘3.2.2’) has_method method(name: ‘readObject’) commons-collections 3.2.2 readObject ... ... ... has_method depends_on calls calls path 16
  • 17. method(name: ‘readObject’) method_in_library library(coord1: ‘commons-collections’, version: ‘3.2.2’) SGL: equivalence (sort of) library(coord1: ‘commons-collections’, version: ‘3.2.2’) has_method method(name: ‘readObject’) ≈ 17
  • 18. “What methods does this version of Apache Commons Collections contain?” SGL: results library(coord1: ‘commons-collections’, version: ‘3.2.2’) has_method 18
  • 19. SGL: results library(coord1: ‘commons-collections’, version: ‘3.2.2’) has_method commons-collections 3.2.2 ... ... has_method depends_on calls calls ... ...has_method ... 19
  • 20. method(name: ‘readObject’) method(name: ‘readExternal’) method(name: ‘readResolve’) SGL: results library(coord1: ‘commons-collections’, version: ‘3.2.2’) has_method 20
  • 21. “What libraries contain the method readObject?” SGL: projection library(_) has_method method(name: ‘readObject’) 21
  • 22. SGL: projection library(_) has_method method(name: ‘readObject’) 22 readObject ... ... has_method calls calls ... ... has_method
  • 23. SGL: projection library(_) where(has_method method(name: ‘readObject’)) readObject ... ... has_method calls calls ... ... has_method 23
  • 24. SGL: equivalence library(_) where(has_method method(name: ‘readObject’)) = method(name: ‘readObject’) method_in_library 24
  • 25. “What are the direct dependencies of Apache Commons Collections?” SGL: transitive closure library(coord1: ‘commons-collections’, version: ‘3.2.2’) depends_on 25
  • 26. SGL: transitive closure “What are all the dependencies of Apache Commons Collections?” library(coord1: ‘commons-collections’, version: ‘3.2.2’) depends_on* 26
  • 27. SGL: aggregations “What are 5 dependencies of Apache Commons Collections?” library(coord1: ‘commons-collections’, version: ‘3.2.2’) depends_on* limit(5) aggregation 27
  • 28. “How many dependencies of Apache Commons Collections are there?” SGL: aggregations library(coord1: ‘commons-collections’, version: ‘3.2.2’) depends_on* count aggregation 28
  • 29. let depends_on_method = depends_on has_method in spring depends_on_method SGL: bindings, abstraction let spring = library( 'java', 'org.springframework', 'spring-webmvc', '4.3.8.RELEASE' ) in spring depends_on* 29
  • 30. Compilation let spring = library( 'java', 'org.springframework', 'spring-webmvc', '4.3.8.RELEASE' ) in spring depends_on* g.V() .hasLabel('library') .has('language', 'java') .has('group', 'org.springframework') .has('artifact', 'spring-webmvc') .has('version', '4.3.8.RELEASE') .emit().repeat(out('depends_on').dedup()) 30
  • 31. Demo ● Vulnerable methods ● Struts ○ CVE-2018-11776, Apache Struts ○ A malicious URL leads to an RCE via OGNL execution ○ Source: ActionProxy#getMethod ○ Sink: OgnlUtil#compileAndExecute 31
  • 32. ● Homomorphism-based bag semantics… ○ The result of evaluating a query Q against a graph G consists of all possible homomorphisms from Q to G ○ In other words, bindings of query variables are completely unconstrained, vs limited so two variables can’t be bound to the same thing* ○ Results are bags, not sets ○ Practically, like Gremlin, SPARQL, relational databases, and unlike Cypher ● … without joins (officially) * Refer to Angles et. al, 2017. "Foundations of Modern Query Languages for Graph Databases." ACM Comput. Surv. 50, 5, Article 68 (September 2017) Semantics 32
  • 33. Semantics ● Not Turing-complete ○ Programs always terminate ● No side effects ○ Every expression is referentially transparent ● Easier to rewrite and analyze 33
  • 34. ● We consider a type as the product of a label (e.g. library, method) and associated properties library(…) :: Library method(…) :: Method Type system 34
  • 35. Type system library(…) depends_on* :: Library library(…) has_method :: Method library(…) count :: Integer library(…) where(…) :: Library 35
  • 36. ● Reduction to relational algebra ○ (Inner) join: edge traversal ○ Project: where ○ Select: vertex predicates ○ Treat transitive closure as an extensional relation ● Reorder selections ● Index usage ● Join ordering Optimizations 36
  • 37. ● Reorder selections ○ Gremlin does this (along with other optimizations) ○ Perform more specific selections first dedup library(coord1: ‘org.springframework’) → library(coord1: ‘org.springframework’) dedup Optimizations 37
  • 38. ● Index usage ○ Gremlin takes advantage of traditional indices, e.g. for locating vertices in a graph ○ We extend this with reachability indices where possible library(…) depends_on* has_method method(…) ○ Simplest scheme: store the transitive closure in a bit matrix ■ With index: O(n2 ) space, O(1) time ■ Without: no space, O(nd ) time + potentially large constant factor ○ More sophisticated indexing schemes exist Optimizations 38
  • 39. ● Join ordering (i.e. query planning) ○ Given n relations, n! possible orderings ○ Essential problems: query equivalence, cost Optimizations library(_) where(has_method method(name: ‘readObject’)) method(name: ‘readObject’) method_in_library 39
  • 40. ● Join ordering ○ Enumerate equivalent queries ■ Convert queries into domain graph ■ Compute all possible orderings ■ Certain orderings are invalid, e.g. not Optimizations readObject has_method... ... has_method method1 has_method library1 40
  • 41. ● Join ordering ○ Query cost ■ Observation: certain orderings are known to be more efficient ■ e.g. many-to-one relations ■ Notion of redundancy: vertices traversed which don’t contribute to result Optimizations readObject has_method ... ... has_method readObject ... ... has_method vs 41
  • 42. ● Join ordering ○ Query cost ■ Redundancy for many-to-one relations ■ For the others, statistics from a large dataset ● Product of cardinalities Optimizations Edge Avg out-deg Avg in-deg depends_on 4.0 4.1 has_file 43.5 1.0 has_method 1508.2 8.9 calls 27.2 30.6 embeds 54.9 22.0 defines 14.4 1.8 has_library_hash 1.0 2.6 has_method_hash 4.9 18.6 has_library 16.4 1.9 has_vulnerable_method 1.8 2.1 has_version_range 2.9 1.2 has_class 217.0 11.1 extends 1.0 1.0 42
  • 43. ● Join ordering benchmarks Optimizations let glassfish_class = class(regex 'org.glassfish.*') in let read_object = method(method_name:'readObject') in let get_path = method( class_name:'java/io/File', method_name:'getPath') in glassfish_class defines read_object where(calls get_path) 43
  • 44. ● Join ordering benchmarks Optimizations Query Redundancy Runtime Original glassfish_class defines read_object where(calls get_path) 391.2 105.8s Reversed get_path called_by read_object where(defined_by glassfish_class) 55.7 0.6s 44
  • 45. Dataset ● Public data from Maven Central ● 79M vertices, 582M edges, 76GB ● Call graphs computed with CHA/RTA ● Bytecode hashing 45
  • 46. ● Program analysis ● Vulnerability description ○ Structured alternative to CVEs SGL: use cases 46
  • 47. ● CVEs ○ Useful canonical identifiers for vulnerabilities ○ Not machine-readable ■ Vulnerable components must be identified manually (and inconsistently) ■ False positives on real-world systems ■ Difficult to deduplicate Describing vulnerabilities 47
  • 48. Describing vulnerabilities ● Idea: represent vulnerabilities as SGL queries ○ Structured and can be processed by tools ○ Flexibility, e.g. dynamic updates ○ Trivially check by executing ○ Relate to existing data, libraries and vulns ● Deduplication ○ Relies on query equivalence; difficult for arbitrary queries ○ Idea: define a subset that can be checked for equivalence 48
  • 49. ● Constant queries that can be compared, i.e. a data structure ● Normalized form ○ Bindings ○ Vertex predicates ○ No edge steps ○ Must begin at vulnerability ○ Expand syntactic sugar ○ Sort Normal forms vulnerability(cwe: 1) has_version_range union( version_range(from: '1.0', to: '1.1')) union( has_library union( library('java', 'web', 'core', '1.0'), library('java', 'web', 'core', '1.1')), has_vulnerable_method union( method('com/example/Controller', 'config', '()'))) 49
  • 50. Reification ● We’d also like to use vulnerabilities in queries ○ “Find all vulnerable libraries” ● Reify vulnerabilities as vertices ● Distinguish by storing normalized query in a property 50
  • 51. Considerations ● Dynamically-updating vulnerabilities ○ e.g. when a new library version is released ○ Can be convenient, but would require manual review anyway ● Finding similar vulnerabilities ○ Search for vulnerabilities associated with a library ○ Generalize query by removing predicates 51
  • 52. ● Expressiveness ○ Datalog without user-defined rules ■ Computation? ○ Arbitrary “diamond” joins library(…) ?a depends_on library(…) ?b, ?a has_method method(…) method_in_library ?b Future work 52
  • 53. ● More domains ○ Dataflow graphs Future work 53
  • 54. Try it out ● www.sourceclear.com ● Sign up for a free trial ● Activate an agent ● SRCCLR_ENABLE_SGL=true srcclr scan --url https://github.com/srcclr/example-java-maven --sgl 54
  • 55. Thank you! 55 ● Questions? ● More info - sgl.org ● Contact ○ Twitter - @asankhaya