SlideShare a Scribd company logo
1 of 29
Download to read offline
Elegant and Scalable Code Querying with
Code Property Graphs
Fabian Yamaguchi
October 4, 2019, Connected Data London
Fabian Yamaguchi
Chief Scientist, ShiftLeft Inc.
Github: fabsx00
Twitter: @fabsx00
Email: fabs@shiftleft.io
PhD, University of Goettingen
Loves code analysis, vulnerability
discovery, machine learning, and
graphs!
What is Ocular? What is Joern?
- Goal: provide query language to describe patterns in code
- to identify bugs and vulnerabilities
- to help in deeply understanding large programs
- Think of it as an extensible Code Analysis Machine
- Programmable in JVM-based languages (e.g., Java/Scala/Kotlin)
- You can write scripts, language extensions and libraries on top of it
- Joern is Ocular’s free-spirited open-source brother
This talk is about the technology behind these engines
Low level graph representations of programs
- Each graph provides a different perspective on the code
- Can we merge them?
Abstract Syntax Trees
Control
flow graphs
Program dependence graphs
Dominator
tree
Combining graphs with “Property Graphs”
- “A property graph is a directed edge-labeled, attributed multigraph”
- Attributes allow data to be stored in nodes/edges
- Edge labels allow different types of relations to be present in one graph
Code Property Graph (2014) - rudimentary concept
presented in
Specification - Key Design Ideas
- Specification that works over programming languages
- Delay as much of the graph creation to second-stage
- Provide generic representation for core programming language concepts
- Methods/Functions
- Types
- Namespaces
- Instructions
- Call sites
- Encode control flow structures only via a control flow graph
- Model only local program properties and leave global program
representations for later analysis stages
OSS Specification for first-stage code property graph
https://github.com/ShiftLeftSecurity/codepropertygraph
A “container” for code over arbitrary instruction sets
- Define only a common format for
representing code
- Allow arbitrary instruction set (given
by semantics) as a parameter
- Represent all code using only
- call sites and method stubs
- call edges, and control flow edges
- data-flow semantics via data flow edges
Second stage: “linking”
jvm2cpg
csharp2cpg
llvm2cpg
...
go2cpg
Local analysis
(language
dependent)
Analyzer
(non)-interactive Report
cpg.bin.zip
Shared operations and “linking” steps
cpg2scpg
scpg.bin.zip
First level
(unlinked) CPG
Second level
(enhanced and linked) CPG
Base layer of the code property graph
- Production quality version of 2014 code property graph
- Language-independent intermediate representation of control-flow and
data-flow semantics
- Interprocedural, flow-sensitive, context-sensitive, field-sensitive data-flow
tracker available that operates on this representation
- Heuristics and street smarts to terminate in < 10 minutes
This is already pretty powerful...
- Entry names of archives
must be checked for control
characters (e.g., “../”) to
avoid path traversal
- First presented in phrack in ‘91
- Reported again with shiny logo
as “Zip-Slip” by startup in 2018
- https://www.youtube.com/watc
h?v=Ry_yb5Oipq0 for details
Typical vulnerability type associated with Zip files
- Names of archive entries are used unchecked as path
names when opening a file
- Concrete example: return values of ZipEntry.getName are
used as path names in a java.io.File constructor
File f = new File(entry.getName())
Sufficient to model control flow, data flow, and method
invocations correctly.
How does this surface in Java code?
Query
ocular> val src = cpg.method.fullName(“java.util.zip.*getName.*”).methodReturn
val snk = cpg.method.fullName(“.*File.<init>”).parameter.index(1)
snk.reachableBy(src).flows.passesNot(“.*escape|sanitize|contains.*”)
- “XML External Entity Processing” vulnerability
- The attacker controls an XML file processed by the application and
it does NOT disable resolving of external entities
- If this is given, the attacker can provide an XML file that reads and sends
sensitive files to the attacker as part of the parsing process
- OWASP cheat sheet says:
Typical XML-related vulnerability: XXE
Query
ocular> val snk = cpg.method.fullName(“.*XMLStreamReader.*next.*”)
.parameter.index(0)
val src =
cpg.method.fullName(“.*XMLInputFactory.*create.*”).methodReturn
snk.reachableBy(src).flows.l.andNot{
// No calls to setProperty
cpg.method.name(“setProperty”).fullName(“.*XML.*”).callIn.l
}
… but not powerful enough.
Higher level abstractions - Example: Microservice
Program
Java Platform “File”
“Database”
Java Platform
GET /foo/bar?...
{“foo” : “bar”, ...}
“Route”
“Handler”“request”
“Json response”
“query”
<form action=”/foo/bar”>
<input type=”hidden”
name=”csrfprotect”>
...
<input name=”foo”
htmlEscaped=”true”>
</form>
“Input field”
“HTML
attribute”
“Form”
“Information flow”
Humans tend to describe vulnerabilities on much
higher levels of abstraction!
Higher-level abstractions
Handlers/Routes (Vert.x framework)
@Override
public void start(Future<Void> done) {
// Create a router object.
Router router = Router.router(vertx);
router.get("/health").handler(rc -> rc.response().end("OK"));
// This is how one can do RBAC, e.g.: only admin is allowed
router.get("/greeting").handler(ctx ->
ctx.user().isAuthorized("booster-admin", authz -> {
if (authz.succeeded() && authz.result()) {
ctx.next();
} else {
log.error("AuthZ failed!");
ctx.fail(403);
}
}));
…
}
- Lambda expression flows into
first argument of `handler`
- Route (as string) flows into first
argument of `get`.
- Return value of `get` flows into
instance parameter of
`handler` (or the other way
around)
Extraction of handlers/routes
val HANDLER_METHOD_NAME = "io.vertx.ext.web.Route.handler:io.vertx.ext.web.Route(io.vertx.core.Handler)"
val firstParamOfHandler = cpg.method.fullNameExact(HANDLER_METHOD_NAME).parameter.index(1)
val flowsOfMethodRefsIntoHandler = firstParamOfHandler.reachableBy(cpg.methodRef).flows.l
val handlerNameRoutePairs = flowsOfMethodRefsIntoHandler.map { flow =>
val handlerName = flow.source.node.asInstanceOf[nodes.MethodRef].methodInstFullName
val route = routeForFlowIntoHandler(flow)(handlerName, route)
(handlerName, route)
}
def routeForFlowIntoHandler(flow : NewFlow) = {
val getOrPostSource = cpg.method.fullName("io.vertx.*Router.*(get|post):.*").methodReturn
Flow.sink.callsite
.map(x => x.asInstanceOf[nodes.Call]).get
.start.argument(0).reachableBy(getOrPostSource)
.flows.source.l
.flatMap(x => x.callsite.get.asInstanceOf[nodes.Call].start.argument(1).code.l)
.headOption.getOrElse("")
}
Creating DOM Trees from JSP files in a CPG Pass
class JspPass(cpg: Cpg) extends CpgPass(cpg) {
override def run(): Iterator[DiffGraph] = {
Queries.configFiles(cpg, ".jsp").toList.sorted.map {
case (filename, html) =>
val diffGraph = new DiffGraph
val rootCpgNode =
createAndAddTreesRecursively(
Jsoup.parse(html).root(), None, diffGraph)
cpg.configfile.nameExact(filename).headOption.foreach(
diffGraph.addEdgeFromOriginal(_,
rootCpgNode, EdgeTypes.CONTAINS))
diffGraph
}.toIterator
}
def createAndAddTreesRecursively(node: Node,
maybeParent: Option[NewDomNode],
diffGraph: DiffGraph): NewDomNode = {
implicit val dstGraph: DiffGraph = diffGraph
val attributes = node.attributes().asScala.map { attr =>
NewDomAttribute(attr.getKey, attr.getValue) }.toList
// Add node + attribute nodes
val newDomNode = NewDomNode(node.nodeName,
attributes)
diffGraph.addNode(newDomNode)
newDomNode.start.store()
// Add AST edge from parent to node
maybeParent.foreach(
diffGraph.addEdge(_, newDomNode, EdgeTypes.AST)
)
node.childNodes.asScala.foreach(
createAndAddTreesRecursively(_,
Some(newDomNode),
diffGraph))
newDomNode
}
}
Query for XSS detection using DOM + Bytecode
ocular> val src = cpg.form.method(“post”).input
.filterNot(_.attribute.name(“htmlEscaped”))
.variable
val snk = … // some output routine in HTML
snk.reachableBy(src).flows.l
Transition from
HTML DOM into
Java Code!
- Literature deals a lot with FPs due to model limitations, e.g.,
overtainting of collections, conservative call graphs, ...
- In practice, most FPs result from context information, e.g.,
information about the business logic, that you cannot
deduce from the code alone:
- “This is an internal service that only our admin uses”
- “Without first convincing the authentication server, this code would
never be executed”
- “Due to $aliens, this integer is always 5 and thus cannot be negative”
- Ability to model the $aliens part is crucial to reduce false
positives
- We do this mostly via passes that tag the graph
Outside information, business logic, and FP reduction
Code Property Graphs Today
Base layer - low level local program
representations: syntax, control flow,
methods, types.
Vulnerabilities
Multiple domain-specific layers [...]
Call graph, type hierarchy, data
flows, configurations, dependencies
Scaling static analysis
- Summaries
- Scaling static analysis requires “summaries” of program behavior (in order to skip duplicate
calculation of facts, e.g., for library methods)
- Calculating summaries for data flow is common practice
- Upper layers of the CPG generalize the concept of a summary
- Parallelism
- Processors aren’t getting much faster, but you’re getting more and more cores.
- Literature has very little to say about multiple cores, let alone multiple cloud instances
- CPG passes are a design with parallelism in mind
Designed for distributed computing
- Passes can be run in a sequence like the passes of a compiler
- The design also allows to run independent passes in parallel though!
End of presentation
@ShiftLeftInc @fabsx00
https://shiftleft.io
https://joern.io

More Related Content

What's hot

Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in SparkSpark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in SparkBo Yang
 
Powering a Graph Data System with Scylla + JanusGraph
Powering a Graph Data System with Scylla + JanusGraphPowering a Graph Data System with Scylla + JanusGraph
Powering a Graph Data System with Scylla + JanusGraphScyllaDB
 
Spark Summit EU talk by Ted Malaska
Spark Summit EU talk by Ted MalaskaSpark Summit EU talk by Ted Malaska
Spark Summit EU talk by Ted MalaskaSpark Summit
 
OLTP+OLAP=HTAP
 OLTP+OLAP=HTAP OLTP+OLAP=HTAP
OLTP+OLAP=HTAPEDB
 
Oracle database high availability solutions
Oracle database high availability solutionsOracle database high availability solutions
Oracle database high availability solutionsKirill Loifman
 
Spark 2.x Troubleshooting Guide
Spark 2.x Troubleshooting GuideSpark 2.x Troubleshooting Guide
Spark 2.x Troubleshooting GuideIBM
 
Spark shuffle introduction
Spark shuffle introductionSpark shuffle introduction
Spark shuffle introductioncolorant
 
Apache spark-the-definitive-guide-excerpts-r1
Apache spark-the-definitive-guide-excerpts-r1Apache spark-the-definitive-guide-excerpts-r1
Apache spark-the-definitive-guide-excerpts-r1AjayRawat971036
 
Streams and Tables: Two Sides of the Same Coin (BIRTE 2018)
Streams and Tables: Two Sides of the Same Coin (BIRTE 2018)Streams and Tables: Two Sides of the Same Coin (BIRTE 2018)
Streams and Tables: Two Sides of the Same Coin (BIRTE 2018)confluent
 
Spark Summit East 2017: Apache spark and object stores
Spark Summit East 2017: Apache spark and object storesSpark Summit East 2017: Apache spark and object stores
Spark Summit East 2017: Apache spark and object storesSteve Loughran
 
Query Optimizer – MySQL vs. PostgreSQL
Query Optimizer – MySQL vs. PostgreSQLQuery Optimizer – MySQL vs. PostgreSQL
Query Optimizer – MySQL vs. PostgreSQLChristian Antognini
 
Basics of reflection in java
Basics of reflection in javaBasics of reflection in java
Basics of reflection in javakim.mens
 
State transfer With Galera
State transfer With GaleraState transfer With Galera
State transfer With GaleraMydbops
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache SparkRahul Jain
 
Agile Database Development with JSON
Agile Database Development with JSONAgile Database Development with JSON
Agile Database Development with JSONChris Saxon
 
Virtual Flink Forward 2020: A deep dive into Flink SQL - Jark Wu
Virtual Flink Forward 2020: A deep dive into Flink SQL - Jark WuVirtual Flink Forward 2020: A deep dive into Flink SQL - Jark Wu
Virtual Flink Forward 2020: A deep dive into Flink SQL - Jark WuFlink Forward
 

What's hot (20)

Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in SparkSpark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark
 
Joy of scala
Joy of scalaJoy of scala
Joy of scala
 
Powering a Graph Data System with Scylla + JanusGraph
Powering a Graph Data System with Scylla + JanusGraphPowering a Graph Data System with Scylla + JanusGraph
Powering a Graph Data System with Scylla + JanusGraph
 
Spark Summit EU talk by Ted Malaska
Spark Summit EU talk by Ted MalaskaSpark Summit EU talk by Ted Malaska
Spark Summit EU talk by Ted Malaska
 
OLTP+OLAP=HTAP
 OLTP+OLAP=HTAP OLTP+OLAP=HTAP
OLTP+OLAP=HTAP
 
Oracle database high availability solutions
Oracle database high availability solutionsOracle database high availability solutions
Oracle database high availability solutions
 
Introduction to Scala
Introduction to ScalaIntroduction to Scala
Introduction to Scala
 
Spark 2.x Troubleshooting Guide
Spark 2.x Troubleshooting GuideSpark 2.x Troubleshooting Guide
Spark 2.x Troubleshooting Guide
 
Spark shuffle introduction
Spark shuffle introductionSpark shuffle introduction
Spark shuffle introduction
 
Apache spark-the-definitive-guide-excerpts-r1
Apache spark-the-definitive-guide-excerpts-r1Apache spark-the-definitive-guide-excerpts-r1
Apache spark-the-definitive-guide-excerpts-r1
 
Streams and Tables: Two Sides of the Same Coin (BIRTE 2018)
Streams and Tables: Two Sides of the Same Coin (BIRTE 2018)Streams and Tables: Two Sides of the Same Coin (BIRTE 2018)
Streams and Tables: Two Sides of the Same Coin (BIRTE 2018)
 
Spark Summit East 2017: Apache spark and object stores
Spark Summit East 2017: Apache spark and object storesSpark Summit East 2017: Apache spark and object stores
Spark Summit East 2017: Apache spark and object stores
 
Query Optimizer – MySQL vs. PostgreSQL
Query Optimizer – MySQL vs. PostgreSQLQuery Optimizer – MySQL vs. PostgreSQL
Query Optimizer – MySQL vs. PostgreSQL
 
Basics of reflection in java
Basics of reflection in javaBasics of reflection in java
Basics of reflection in java
 
Quick introduction to scala
Quick introduction to scalaQuick introduction to scala
Quick introduction to scala
 
State transfer With Galera
State transfer With GaleraState transfer With Galera
State transfer With Galera
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
 
Agile Database Development with JSON
Agile Database Development with JSONAgile Database Development with JSON
Agile Database Development with JSON
 
Virtual Flink Forward 2020: A deep dive into Flink SQL - Jark Wu
Virtual Flink Forward 2020: A deep dive into Flink SQL - Jark WuVirtual Flink Forward 2020: A deep dive into Flink SQL - Jark Wu
Virtual Flink Forward 2020: A deep dive into Flink SQL - Jark Wu
 
5 Steps to PostgreSQL Performance
5 Steps to PostgreSQL Performance5 Steps to PostgreSQL Performance
5 Steps to PostgreSQL Performance
 

Similar to Elegant and Scalable Code Querying with Code Property Graphs

Alberto Maria Angelo Paro - Isomorphic programming in Scala and WebDevelopmen...
Alberto Maria Angelo Paro - Isomorphic programming in Scala and WebDevelopmen...Alberto Maria Angelo Paro - Isomorphic programming in Scala and WebDevelopmen...
Alberto Maria Angelo Paro - Isomorphic programming in Scala and WebDevelopmen...Codemotion
 
Porting VisualWorks code to Pharo
Porting VisualWorks code to PharoPorting VisualWorks code to Pharo
Porting VisualWorks code to PharoESUG
 
Apache Big Data Europe 2016
Apache Big Data Europe 2016Apache Big Data Europe 2016
Apache Big Data Europe 2016Tim Ellison
 
Lessons Learnt from Running Thousands of On-demand Spark Applications
Lessons Learnt from Running Thousands of On-demand Spark ApplicationsLessons Learnt from Running Thousands of On-demand Spark Applications
Lessons Learnt from Running Thousands of On-demand Spark ApplicationsItai Yaffe
 
JSUG - Filthy Flex by Christoph Pickl
JSUG - Filthy Flex by Christoph PicklJSUG - Filthy Flex by Christoph Pickl
JSUG - Filthy Flex by Christoph PicklChristoph Pickl
 
ClojureScript - Making Front-End development Fun again - John Stevenson - Cod...
ClojureScript - Making Front-End development Fun again - John Stevenson - Cod...ClojureScript - Making Front-End development Fun again - John Stevenson - Cod...
ClojureScript - Making Front-End development Fun again - John Stevenson - Cod...Codemotion
 
A look under the hood at Apache Spark's API and engine evolutions
A look under the hood at Apache Spark's API and engine evolutionsA look under the hood at Apache Spark's API and engine evolutions
A look under the hood at Apache Spark's API and engine evolutionsDatabricks
 
Machine intelligence in HR technology: resume analysis at scale - Adrian Mihai
Machine intelligence in HR technology: resume analysis at scale - Adrian MihaiMachine intelligence in HR technology: resume analysis at scale - Adrian Mihai
Machine intelligence in HR technology: resume analysis at scale - Adrian MihaiSebastian Ruder
 
WebNet Conference 2012 - Designing complex applications using html5 and knock...
WebNet Conference 2012 - Designing complex applications using html5 and knock...WebNet Conference 2012 - Designing complex applications using html5 and knock...
WebNet Conference 2012 - Designing complex applications using html5 and knock...Fabio Franzini
 
Construction Techniques For Domain Specific Languages
Construction Techniques For Domain Specific LanguagesConstruction Techniques For Domain Specific Languages
Construction Techniques For Domain Specific LanguagesThoughtWorks
 
JCConf 2022 - New Features in Java 18 & 19
JCConf 2022 - New Features in Java 18 & 19JCConf 2022 - New Features in Java 18 & 19
JCConf 2022 - New Features in Java 18 & 19Joseph Kuo
 
PHP. Trends, implementations, frameworks and solutions
PHP. Trends, implementations, frameworks and solutionsPHP. Trends, implementations, frameworks and solutions
PHP. Trends, implementations, frameworks and solutionsOleg Zinchenko
 
Thug: a new low-interaction honeyclient
Thug: a new low-interaction honeyclientThug: a new low-interaction honeyclient
Thug: a new low-interaction honeyclientAngelo Dell'Aera
 
Server Side Javascript
Server Side JavascriptServer Side Javascript
Server Side Javascriptrajivmordani
 
A Java Implementer's Guide to Better Apache Spark Performance
A Java Implementer's Guide to Better Apache Spark PerformanceA Java Implementer's Guide to Better Apache Spark Performance
A Java Implementer's Guide to Better Apache Spark PerformanceTim Ellison
 
The magic behind your Lyft ride prices: A case study on machine learning and ...
The magic behind your Lyft ride prices: A case study on machine learning and ...The magic behind your Lyft ride prices: A case study on machine learning and ...
The magic behind your Lyft ride prices: A case study on machine learning and ...Karthik Murugesan
 

Similar to Elegant and Scalable Code Querying with Code Property Graphs (20)

Alberto Maria Angelo Paro - Isomorphic programming in Scala and WebDevelopmen...
Alberto Maria Angelo Paro - Isomorphic programming in Scala and WebDevelopmen...Alberto Maria Angelo Paro - Isomorphic programming in Scala and WebDevelopmen...
Alberto Maria Angelo Paro - Isomorphic programming in Scala and WebDevelopmen...
 
Porting VisualWorks code to Pharo
Porting VisualWorks code to PharoPorting VisualWorks code to Pharo
Porting VisualWorks code to Pharo
 
Apache Big Data Europe 2016
Apache Big Data Europe 2016Apache Big Data Europe 2016
Apache Big Data Europe 2016
 
Lessons Learnt from Running Thousands of On-demand Spark Applications
Lessons Learnt from Running Thousands of On-demand Spark ApplicationsLessons Learnt from Running Thousands of On-demand Spark Applications
Lessons Learnt from Running Thousands of On-demand Spark Applications
 
Linked Process
Linked ProcessLinked Process
Linked Process
 
JSUG - Filthy Flex by Christoph Pickl
JSUG - Filthy Flex by Christoph PicklJSUG - Filthy Flex by Christoph Pickl
JSUG - Filthy Flex by Christoph Pickl
 
ClojureScript - Making Front-End development Fun again - John Stevenson - Cod...
ClojureScript - Making Front-End development Fun again - John Stevenson - Cod...ClojureScript - Making Front-End development Fun again - John Stevenson - Cod...
ClojureScript - Making Front-End development Fun again - John Stevenson - Cod...
 
A look under the hood at Apache Spark's API and engine evolutions
A look under the hood at Apache Spark's API and engine evolutionsA look under the hood at Apache Spark's API and engine evolutions
A look under the hood at Apache Spark's API and engine evolutions
 
Machine intelligence in HR technology: resume analysis at scale - Adrian Mihai
Machine intelligence in HR technology: resume analysis at scale - Adrian MihaiMachine intelligence in HR technology: resume analysis at scale - Adrian Mihai
Machine intelligence in HR technology: resume analysis at scale - Adrian Mihai
 
WebNet Conference 2012 - Designing complex applications using html5 and knock...
WebNet Conference 2012 - Designing complex applications using html5 and knock...WebNet Conference 2012 - Designing complex applications using html5 and knock...
WebNet Conference 2012 - Designing complex applications using html5 and knock...
 
Construction Techniques For Domain Specific Languages
Construction Techniques For Domain Specific LanguagesConstruction Techniques For Domain Specific Languages
Construction Techniques For Domain Specific Languages
 
JCConf 2022 - New Features in Java 18 & 19
JCConf 2022 - New Features in Java 18 & 19JCConf 2022 - New Features in Java 18 & 19
JCConf 2022 - New Features in Java 18 & 19
 
PHP. Trends, implementations, frameworks and solutions
PHP. Trends, implementations, frameworks and solutionsPHP. Trends, implementations, frameworks and solutions
PHP. Trends, implementations, frameworks and solutions
 
Kamailio Updates - VUC 588
Kamailio Updates - VUC 588Kamailio Updates - VUC 588
Kamailio Updates - VUC 588
 
Test2
Test2Test2
Test2
 
Test2
Test2Test2
Test2
 
Thug: a new low-interaction honeyclient
Thug: a new low-interaction honeyclientThug: a new low-interaction honeyclient
Thug: a new low-interaction honeyclient
 
Server Side Javascript
Server Side JavascriptServer Side Javascript
Server Side Javascript
 
A Java Implementer's Guide to Better Apache Spark Performance
A Java Implementer's Guide to Better Apache Spark PerformanceA Java Implementer's Guide to Better Apache Spark Performance
A Java Implementer's Guide to Better Apache Spark Performance
 
The magic behind your Lyft ride prices: A case study on machine learning and ...
The magic behind your Lyft ride prices: A case study on machine learning and ...The magic behind your Lyft ride prices: A case study on machine learning and ...
The magic behind your Lyft ride prices: A case study on machine learning and ...
 

More from Connected Data World

Systems that learn and reason | Frank Van Harmelen
Systems that learn and reason | Frank Van HarmelenSystems that learn and reason | Frank Van Harmelen
Systems that learn and reason | Frank Van HarmelenConnected Data World
 
Graph Abstractions Matter by Ora Lassila
Graph Abstractions Matter by Ora LassilaGraph Abstractions Matter by Ora Lassila
Graph Abstractions Matter by Ora LassilaConnected Data World
 
Κnowledge Architecture: Combining Strategy, Data Science and Information Arch...
Κnowledge Architecture: Combining Strategy, Data Science and Information Arch...Κnowledge Architecture: Combining Strategy, Data Science and Information Arch...
Κnowledge Architecture: Combining Strategy, Data Science and Information Arch...Connected Data World
 
How to get started with Graph Machine Learning
How to get started with Graph Machine LearningHow to get started with Graph Machine Learning
How to get started with Graph Machine LearningConnected Data World
 
The years of the graph: The future of the future is here
The years of the graph: The future of the future is hereThe years of the graph: The future of the future is here
The years of the graph: The future of the future is hereConnected Data World
 
From Taxonomies and Schemas to Knowledge Graphs: Parts 1 & 2
From Taxonomies and Schemas to Knowledge Graphs: Parts 1 & 2From Taxonomies and Schemas to Knowledge Graphs: Parts 1 & 2
From Taxonomies and Schemas to Knowledge Graphs: Parts 1 & 2Connected Data World
 
From Taxonomies and Schemas to Knowledge Graphs: Part 3
From Taxonomies and Schemas to Knowledge Graphs: Part 3From Taxonomies and Schemas to Knowledge Graphs: Part 3
From Taxonomies and Schemas to Knowledge Graphs: Part 3Connected Data World
 
In Search of the Universal Data Model
In Search of the Universal Data ModelIn Search of the Universal Data Model
In Search of the Universal Data ModelConnected Data World
 
Graph in Apache Cassandra. The World’s Most Scalable Graph Database
Graph in Apache Cassandra. The World’s Most Scalable Graph DatabaseGraph in Apache Cassandra. The World’s Most Scalable Graph Database
Graph in Apache Cassandra. The World’s Most Scalable Graph DatabaseConnected Data World
 
Enterprise Data Governance: Leveraging Knowledge Graph & AI in support of a d...
Enterprise Data Governance: Leveraging Knowledge Graph & AI in support of a d...Enterprise Data Governance: Leveraging Knowledge Graph & AI in support of a d...
Enterprise Data Governance: Leveraging Knowledge Graph & AI in support of a d...Connected Data World
 
Powering Question-Driven Problem Solving to Improve the Chances of Finding Ne...
Powering Question-Driven Problem Solving to Improve the Chances of Finding Ne...Powering Question-Driven Problem Solving to Improve the Chances of Finding Ne...
Powering Question-Driven Problem Solving to Improve the Chances of Finding Ne...Connected Data World
 
Semantic similarity for faster Knowledge Graph delivery at scale
Semantic similarity for faster Knowledge Graph delivery at scaleSemantic similarity for faster Knowledge Graph delivery at scale
Semantic similarity for faster Knowledge Graph delivery at scaleConnected Data World
 
Knowledge Graphs and AI to Hyper-Personalise the Fashion Retail Experience at...
Knowledge Graphs and AI to Hyper-Personalise the Fashion Retail Experience at...Knowledge Graphs and AI to Hyper-Personalise the Fashion Retail Experience at...
Knowledge Graphs and AI to Hyper-Personalise the Fashion Retail Experience at...Connected Data World
 
Schema, Google & The Future of the Web
Schema, Google & The Future of the WebSchema, Google & The Future of the Web
Schema, Google & The Future of the WebConnected Data World
 
RAPIDS cuGraph – Accelerating all your Graph needs
RAPIDS cuGraph – Accelerating all your Graph needsRAPIDS cuGraph – Accelerating all your Graph needs
RAPIDS cuGraph – Accelerating all your Graph needsConnected Data World
 
From Knowledge Graphs to AI-powered SEO: Using taxonomies, schemas and knowle...
From Knowledge Graphs to AI-powered SEO: Using taxonomies, schemas and knowle...From Knowledge Graphs to AI-powered SEO: Using taxonomies, schemas and knowle...
From Knowledge Graphs to AI-powered SEO: Using taxonomies, schemas and knowle...Connected Data World
 
Graph for Good: Empowering your NGO
Graph for Good: Empowering your NGOGraph for Good: Empowering your NGO
Graph for Good: Empowering your NGOConnected Data World
 
What are we Talking About, When we Talk About Ontology?
What are we Talking About, When we Talk About Ontology?What are we Talking About, When we Talk About Ontology?
What are we Talking About, When we Talk About Ontology?Connected Data World
 

More from Connected Data World (20)

Systems that learn and reason | Frank Van Harmelen
Systems that learn and reason | Frank Van HarmelenSystems that learn and reason | Frank Van Harmelen
Systems that learn and reason | Frank Van Harmelen
 
Graph Abstractions Matter by Ora Lassila
Graph Abstractions Matter by Ora LassilaGraph Abstractions Matter by Ora Lassila
Graph Abstractions Matter by Ora Lassila
 
Κnowledge Architecture: Combining Strategy, Data Science and Information Arch...
Κnowledge Architecture: Combining Strategy, Data Science and Information Arch...Κnowledge Architecture: Combining Strategy, Data Science and Information Arch...
Κnowledge Architecture: Combining Strategy, Data Science and Information Arch...
 
How to get started with Graph Machine Learning
How to get started with Graph Machine LearningHow to get started with Graph Machine Learning
How to get started with Graph Machine Learning
 
Graphs in sustainable finance
Graphs in sustainable financeGraphs in sustainable finance
Graphs in sustainable finance
 
The years of the graph: The future of the future is here
The years of the graph: The future of the future is hereThe years of the graph: The future of the future is here
The years of the graph: The future of the future is here
 
From Taxonomies and Schemas to Knowledge Graphs: Parts 1 & 2
From Taxonomies and Schemas to Knowledge Graphs: Parts 1 & 2From Taxonomies and Schemas to Knowledge Graphs: Parts 1 & 2
From Taxonomies and Schemas to Knowledge Graphs: Parts 1 & 2
 
From Taxonomies and Schemas to Knowledge Graphs: Part 3
From Taxonomies and Schemas to Knowledge Graphs: Part 3From Taxonomies and Schemas to Knowledge Graphs: Part 3
From Taxonomies and Schemas to Knowledge Graphs: Part 3
 
In Search of the Universal Data Model
In Search of the Universal Data ModelIn Search of the Universal Data Model
In Search of the Universal Data Model
 
Graph in Apache Cassandra. The World’s Most Scalable Graph Database
Graph in Apache Cassandra. The World’s Most Scalable Graph DatabaseGraph in Apache Cassandra. The World’s Most Scalable Graph Database
Graph in Apache Cassandra. The World’s Most Scalable Graph Database
 
Enterprise Data Governance: Leveraging Knowledge Graph & AI in support of a d...
Enterprise Data Governance: Leveraging Knowledge Graph & AI in support of a d...Enterprise Data Governance: Leveraging Knowledge Graph & AI in support of a d...
Enterprise Data Governance: Leveraging Knowledge Graph & AI in support of a d...
 
Graph Realities
Graph RealitiesGraph Realities
Graph Realities
 
Powering Question-Driven Problem Solving to Improve the Chances of Finding Ne...
Powering Question-Driven Problem Solving to Improve the Chances of Finding Ne...Powering Question-Driven Problem Solving to Improve the Chances of Finding Ne...
Powering Question-Driven Problem Solving to Improve the Chances of Finding Ne...
 
Semantic similarity for faster Knowledge Graph delivery at scale
Semantic similarity for faster Knowledge Graph delivery at scaleSemantic similarity for faster Knowledge Graph delivery at scale
Semantic similarity for faster Knowledge Graph delivery at scale
 
Knowledge Graphs and AI to Hyper-Personalise the Fashion Retail Experience at...
Knowledge Graphs and AI to Hyper-Personalise the Fashion Retail Experience at...Knowledge Graphs and AI to Hyper-Personalise the Fashion Retail Experience at...
Knowledge Graphs and AI to Hyper-Personalise the Fashion Retail Experience at...
 
Schema, Google & The Future of the Web
Schema, Google & The Future of the WebSchema, Google & The Future of the Web
Schema, Google & The Future of the Web
 
RAPIDS cuGraph – Accelerating all your Graph needs
RAPIDS cuGraph – Accelerating all your Graph needsRAPIDS cuGraph – Accelerating all your Graph needs
RAPIDS cuGraph – Accelerating all your Graph needs
 
From Knowledge Graphs to AI-powered SEO: Using taxonomies, schemas and knowle...
From Knowledge Graphs to AI-powered SEO: Using taxonomies, schemas and knowle...From Knowledge Graphs to AI-powered SEO: Using taxonomies, schemas and knowle...
From Knowledge Graphs to AI-powered SEO: Using taxonomies, schemas and knowle...
 
Graph for Good: Empowering your NGO
Graph for Good: Empowering your NGOGraph for Good: Empowering your NGO
Graph for Good: Empowering your NGO
 
What are we Talking About, When we Talk About Ontology?
What are we Talking About, When we Talk About Ontology?What are we Talking About, When we Talk About Ontology?
What are we Talking About, When we Talk About Ontology?
 

Recently uploaded

BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxolyaivanovalion
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsJoseMangaJr1
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...amitlee9823
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfadriantubila
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightDelhi Call girls
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...amitlee9823
 

Recently uploaded (20)

BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
 

Elegant and Scalable Code Querying with Code Property Graphs

  • 1. Elegant and Scalable Code Querying with Code Property Graphs Fabian Yamaguchi October 4, 2019, Connected Data London
  • 2. Fabian Yamaguchi Chief Scientist, ShiftLeft Inc. Github: fabsx00 Twitter: @fabsx00 Email: fabs@shiftleft.io PhD, University of Goettingen Loves code analysis, vulnerability discovery, machine learning, and graphs!
  • 3. What is Ocular? What is Joern? - Goal: provide query language to describe patterns in code - to identify bugs and vulnerabilities - to help in deeply understanding large programs - Think of it as an extensible Code Analysis Machine - Programmable in JVM-based languages (e.g., Java/Scala/Kotlin) - You can write scripts, language extensions and libraries on top of it - Joern is Ocular’s free-spirited open-source brother This talk is about the technology behind these engines
  • 4. Low level graph representations of programs - Each graph provides a different perspective on the code - Can we merge them? Abstract Syntax Trees Control flow graphs Program dependence graphs Dominator tree
  • 5. Combining graphs with “Property Graphs” - “A property graph is a directed edge-labeled, attributed multigraph” - Attributes allow data to be stored in nodes/edges - Edge labels allow different types of relations to be present in one graph
  • 6. Code Property Graph (2014) - rudimentary concept presented in
  • 7. Specification - Key Design Ideas - Specification that works over programming languages - Delay as much of the graph creation to second-stage - Provide generic representation for core programming language concepts - Methods/Functions - Types - Namespaces - Instructions - Call sites - Encode control flow structures only via a control flow graph - Model only local program properties and leave global program representations for later analysis stages
  • 8. OSS Specification for first-stage code property graph https://github.com/ShiftLeftSecurity/codepropertygraph
  • 9. A “container” for code over arbitrary instruction sets - Define only a common format for representing code - Allow arbitrary instruction set (given by semantics) as a parameter - Represent all code using only - call sites and method stubs - call edges, and control flow edges - data-flow semantics via data flow edges
  • 10. Second stage: “linking” jvm2cpg csharp2cpg llvm2cpg ... go2cpg Local analysis (language dependent) Analyzer (non)-interactive Report cpg.bin.zip Shared operations and “linking” steps cpg2scpg scpg.bin.zip First level (unlinked) CPG Second level (enhanced and linked) CPG
  • 11. Base layer of the code property graph - Production quality version of 2014 code property graph - Language-independent intermediate representation of control-flow and data-flow semantics - Interprocedural, flow-sensitive, context-sensitive, field-sensitive data-flow tracker available that operates on this representation - Heuristics and street smarts to terminate in < 10 minutes
  • 12. This is already pretty powerful...
  • 13. - Entry names of archives must be checked for control characters (e.g., “../”) to avoid path traversal - First presented in phrack in ‘91 - Reported again with shiny logo as “Zip-Slip” by startup in 2018 - https://www.youtube.com/watc h?v=Ry_yb5Oipq0 for details Typical vulnerability type associated with Zip files
  • 14. - Names of archive entries are used unchecked as path names when opening a file - Concrete example: return values of ZipEntry.getName are used as path names in a java.io.File constructor File f = new File(entry.getName()) Sufficient to model control flow, data flow, and method invocations correctly. How does this surface in Java code?
  • 15. Query ocular> val src = cpg.method.fullName(“java.util.zip.*getName.*”).methodReturn val snk = cpg.method.fullName(“.*File.<init>”).parameter.index(1) snk.reachableBy(src).flows.passesNot(“.*escape|sanitize|contains.*”)
  • 16. - “XML External Entity Processing” vulnerability - The attacker controls an XML file processed by the application and it does NOT disable resolving of external entities - If this is given, the attacker can provide an XML file that reads and sends sensitive files to the attacker as part of the parsing process - OWASP cheat sheet says: Typical XML-related vulnerability: XXE
  • 17. Query ocular> val snk = cpg.method.fullName(“.*XMLStreamReader.*next.*”) .parameter.index(0) val src = cpg.method.fullName(“.*XMLInputFactory.*create.*”).methodReturn snk.reachableBy(src).flows.l.andNot{ // No calls to setProperty cpg.method.name(“setProperty”).fullName(“.*XML.*”).callIn.l }
  • 18. … but not powerful enough.
  • 19. Higher level abstractions - Example: Microservice Program Java Platform “File” “Database” Java Platform GET /foo/bar?... {“foo” : “bar”, ...} “Route” “Handler”“request” “Json response” “query” <form action=”/foo/bar”> <input type=”hidden” name=”csrfprotect”> ... <input name=”foo” htmlEscaped=”true”> </form> “Input field” “HTML attribute” “Form” “Information flow” Humans tend to describe vulnerabilities on much higher levels of abstraction!
  • 21. Handlers/Routes (Vert.x framework) @Override public void start(Future<Void> done) { // Create a router object. Router router = Router.router(vertx); router.get("/health").handler(rc -> rc.response().end("OK")); // This is how one can do RBAC, e.g.: only admin is allowed router.get("/greeting").handler(ctx -> ctx.user().isAuthorized("booster-admin", authz -> { if (authz.succeeded() && authz.result()) { ctx.next(); } else { log.error("AuthZ failed!"); ctx.fail(403); } })); … } - Lambda expression flows into first argument of `handler` - Route (as string) flows into first argument of `get`. - Return value of `get` flows into instance parameter of `handler` (or the other way around)
  • 22. Extraction of handlers/routes val HANDLER_METHOD_NAME = "io.vertx.ext.web.Route.handler:io.vertx.ext.web.Route(io.vertx.core.Handler)" val firstParamOfHandler = cpg.method.fullNameExact(HANDLER_METHOD_NAME).parameter.index(1) val flowsOfMethodRefsIntoHandler = firstParamOfHandler.reachableBy(cpg.methodRef).flows.l val handlerNameRoutePairs = flowsOfMethodRefsIntoHandler.map { flow => val handlerName = flow.source.node.asInstanceOf[nodes.MethodRef].methodInstFullName val route = routeForFlowIntoHandler(flow)(handlerName, route) (handlerName, route) } def routeForFlowIntoHandler(flow : NewFlow) = { val getOrPostSource = cpg.method.fullName("io.vertx.*Router.*(get|post):.*").methodReturn Flow.sink.callsite .map(x => x.asInstanceOf[nodes.Call]).get .start.argument(0).reachableBy(getOrPostSource) .flows.source.l .flatMap(x => x.callsite.get.asInstanceOf[nodes.Call].start.argument(1).code.l) .headOption.getOrElse("") }
  • 23. Creating DOM Trees from JSP files in a CPG Pass class JspPass(cpg: Cpg) extends CpgPass(cpg) { override def run(): Iterator[DiffGraph] = { Queries.configFiles(cpg, ".jsp").toList.sorted.map { case (filename, html) => val diffGraph = new DiffGraph val rootCpgNode = createAndAddTreesRecursively( Jsoup.parse(html).root(), None, diffGraph) cpg.configfile.nameExact(filename).headOption.foreach( diffGraph.addEdgeFromOriginal(_, rootCpgNode, EdgeTypes.CONTAINS)) diffGraph }.toIterator } def createAndAddTreesRecursively(node: Node, maybeParent: Option[NewDomNode], diffGraph: DiffGraph): NewDomNode = { implicit val dstGraph: DiffGraph = diffGraph val attributes = node.attributes().asScala.map { attr => NewDomAttribute(attr.getKey, attr.getValue) }.toList // Add node + attribute nodes val newDomNode = NewDomNode(node.nodeName, attributes) diffGraph.addNode(newDomNode) newDomNode.start.store() // Add AST edge from parent to node maybeParent.foreach( diffGraph.addEdge(_, newDomNode, EdgeTypes.AST) ) node.childNodes.asScala.foreach( createAndAddTreesRecursively(_, Some(newDomNode), diffGraph)) newDomNode } }
  • 24. Query for XSS detection using DOM + Bytecode ocular> val src = cpg.form.method(“post”).input .filterNot(_.attribute.name(“htmlEscaped”)) .variable val snk = … // some output routine in HTML snk.reachableBy(src).flows.l Transition from HTML DOM into Java Code!
  • 25. - Literature deals a lot with FPs due to model limitations, e.g., overtainting of collections, conservative call graphs, ... - In practice, most FPs result from context information, e.g., information about the business logic, that you cannot deduce from the code alone: - “This is an internal service that only our admin uses” - “Without first convincing the authentication server, this code would never be executed” - “Due to $aliens, this integer is always 5 and thus cannot be negative” - Ability to model the $aliens part is crucial to reduce false positives - We do this mostly via passes that tag the graph Outside information, business logic, and FP reduction
  • 26. Code Property Graphs Today Base layer - low level local program representations: syntax, control flow, methods, types. Vulnerabilities Multiple domain-specific layers [...] Call graph, type hierarchy, data flows, configurations, dependencies
  • 27. Scaling static analysis - Summaries - Scaling static analysis requires “summaries” of program behavior (in order to skip duplicate calculation of facts, e.g., for library methods) - Calculating summaries for data flow is common practice - Upper layers of the CPG generalize the concept of a summary - Parallelism - Processors aren’t getting much faster, but you’re getting more and more cores. - Literature has very little to say about multiple cores, let alone multiple cloud instances - CPG passes are a design with parallelism in mind
  • 28. Designed for distributed computing - Passes can be run in a sequence like the passes of a compiler - The design also allows to run independent passes in parallel though!
  • 29. End of presentation @ShiftLeftInc @fabsx00 https://shiftleft.io https://joern.io