Join us for this 20-minute webinar to hear from Nick Johnson, Product Marketing Manager for Graph Data Science, as he explains the fundamentals of Neo4j Graph Data Science and its applications in optimizing supply chain management. Discover how leveraging graph analytics can help you identify bottlenecks, reduce costs, and streamline your supply chain operations more efficiently.
3. 3
Agenda
1. What Is a Graph?
2. The Big Questions
3. Smarter Fraud Detection: Proactively Identify Fraud
4. How Banking Circle Increased Fraud Detection 300%
5. Next Steps and Resources
5. 5
Graphs Show Data Based on Relationships
Node: Represents an entity in the graph
Relationships (edges / links): Connect nodes to each other
Property: Describes a node or relationship: name, age, height, etc.
[Figure: example property graph. Node labels as extracted: ASH, DEL, CAR. Node properties: Name: Mel, Born: May 29, 1970, Twitter: @mel; Name: Ash, Born: Dec 5, 1975; Brand: Volvo, Model: V70. Relationship types: LOVES (three times), LIVES WITH. Relationship property: Since: Jan 10, 2011.]
6. 6
Graphs Show Data Based on Relationships
Movies and People
Nodes: People and movies
Relationships: the role each person plays related to the movies
Property: release date, tagline, title, etc.
9. What’s important?
Who has the most connections?
Who has the highest page rank?
Who is an influencer?
Use the Data You Already Have to Answer:
What’s important?
Prioritization
Listen for words like:
● Best
● Top performing
● Highest converting
● Most challenging
10. What’s unusual?
Where is a community forming?
What are the group dynamics?
What’s unusual about this data?
Use the Data You Already Have to Answer:
What’s unusual?
Anomaly detection
Listen for words around behavior like:
● Unusual
● Anomalous
● Strange
● Odd
● Weird
11. What’s next?
What’s the most common path?
Who is in the same community?
What relationship will form?
Use the Data You Already Have to Answer:
What’s next?
Predictions
Listen for words like:
● Recommend
● Optimize
● Improve
● Likely
13. 13
86% of executives agree their company should invest more in technology to identify, track and measure supply chain risk (Source: PwC)
83% of executives say their supply chain technology investments haven’t fully delivered expected results (Source: PwC)
16. 16
The Traditional Data Science Approach
Problems
● Where are the bottlenecks?
● What’s the fastest route?
● What airports are experiencing the most delays?
What’s important?
What’s unusual?
Traditional methods
● Computationally expensive to find the best route on a vastly growing dataset
● Forces data scientists to use approximations on limited historical data
● Aggregate data across multiple growing and changing resources
17. 17
The Graph Data Science Approach
Traditional methods
● Computationally expensive to find the best route on a vastly growing dataset
● Forces data scientists to use approximations on limited historical data
● Aggregate data across multiple growing and changing resources
Neo4j Graph Data Science
● Automatically identify the most efficient route anywhere with pathfinding algorithms
● Identify bottlenecks and risks with centrality algorithms
● Supply chains naturally form a graph with suppliers, products, and customers
24. 25
Logistics and Supply Chain
Optimize maritime routes.
Results:
● Subsecond maritime route planning
● Reduce global carbon emissions by 60,000 tons
● $12-16M ROI for OrbitMI customers
“Without predictive modeling and graph analytics from Neo4j, we couldn't have a product with this level of value.”
Slavisa Djokic, VP of Engineering, OrbitMI
Forward-looking organizations are adopting graph analytics and graph data science to power business-critical decision making. Gartner predicts that 80% of data and analytics innovations will use graph tech by 2025.
It’s a simple concept with endless enterprise and industry applications. My goal today is to help you imagine what’s possible and give you an example of a use case for understanding when graphs are a better way to solve your most challenging business problems.
So I’ll start today with a brief overview of graphs so we’re all working from the same understanding.
Next I’ll hit on the big questions. These are the three high-level questions that Graph Data Science excels at answering.
Then I’ll dive into how graphs can support the supply chain: what the use case is, and how it can save your organization time and money and even improve customer satisfaction.
After that, I’ll show you an example of how OrbitMI, a maritime software company powered by Neo4j, helps manage global fleets more efficiently, profitably, and sustainably.
Lastly, we’ll conclude with a couple of resources for learning more and continuing your graph data science journey.
So, what is a graph?
At its most fundamental, a graph is simply a different way of structuring data. Instead of rows and columns, like in a traditional relational database table or dataframe, graphs use nodes (nouns) and relationships (verbs) as their primary structure. Properties describe a node or a relationship.
Graphs naturally represent networks of people (customers, employees, partners) or transactions (products or suppliers), to name a few.
Let’s give the movie database IMDB as an example. Here we can show the relationship between people and movies (nodes) based on the role each person plays in a movie.
Those nodes and relationships are described by properties like release date or title.
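To make the node / relationship / property vocabulary concrete, here is a minimal plain-Python sketch of the Mel, Ash, and car example from the slide. It is not how Neo4j stores data. The class names, the "Person" and "Car" labels, and the direction of each relationship are assumptions made for illustration, since the slide text only lists the relationship types.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                                   # kind of entity (assumed labels)
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: Node                                  # node the relationship starts from
    type: str                                    # the verb, e.g. "LOVES"
    end: Node                                    # node the relationship points to
    properties: dict = field(default_factory=dict)


# Nodes and their properties, as listed on the slide.
mel = Node("Person", {"name": "Mel", "born": "May 29, 1970", "twitter": "@mel"})
ash = Node("Person", {"name": "Ash", "born": "Dec 5, 1975"})
car = Node("Car", {"brand": "Volvo", "model": "V70"})

# The wiring below is an assumption; the slide only shows the types
# LOVES (three times) and LIVES WITH, plus one "Since" property.
relationships = [
    Relationship(mel, "LOVES", ash),
    Relationship(ash, "LOVES", mel),
    Relationship(mel, "LIVES WITH", ash),
    Relationship(ash, "LOVES", car, {"since": "Jan 10, 2011"}),
]


def outgoing(node, rel_type):
    """Return the nodes reached from `node` over relationships of `rel_type`."""
    return [r.end for r in relationships if r.start is node and r.type == rel_type]


# Who or what does Ash love? Fall back to the label when a node has no name.
loved_by_ash = [n.properties.get("name", n.label) for n in outgoing(ash, "LOVES")]
print(loved_by_ash)
```

The point of the sketch is that a question such as "what does Ash love?" is answered by following relationships from a node, not by joining tables.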
Next let’s look at the types of questions Graph Data Science can help us answer.
Graph Data Science particularly excels when your business question can be summed up by one of these three questions:
What’s important?
What’s unusual?
What’s next?
Here’s what I mean:
What’s Important? (Prioritization)
There are numerous examples of decision makers trying to determine project urgency and therefore, prioritization. For example:
Marketing: What is the most important piece of content, the most important webpage, the most important call to action?
Product Teams: Where is the most friction?
Support: Which article is the most important?
Finance: Which report is most important for leadership teams?
If you’re hearing words like best, top performing, highest converting, or most challenging, your decision makers are asking you about importance.
“What’s unusual?” really gets at suspicious or strange behavior that is out of the ordinary. Departments across the enterprise might ask their data science counterparts to identify unusual behavior such as:
IT: Where is unusual activity on my network devices?
Finance: Where is unusual activity in my accounting department?
SecOps: Where is unusual activity in my data center?
Compliance: Is there unusual activity in contract language?
Looking ahead and predicting the future is something most of us wish we could do with ease. Recommender systems are perhaps the most applicable example across every area of the business.
Predictive insights using graphs can deliver answers to these questions and more.
Marketing: What email should we send customers next?
Product Teams: What product should we build next?
Retailers: What product should we sell next?
Human Resources: What training should an employee take next?
Finance: How should we price our products next quarter?
Operations: What is the fastest path from point A to point B?
So what does problem solving with graph data science look like in practice? And why is it better at addressing real-life problems than traditional data science methods?
Let’s use the use case of Supply Chain Gridlock and Management as an example.
Supply chain management is an enormous challenge that impacts everyone, but particularly those working in manufacturing, retail, and transportation sectors. And after the fiasco of 2020 and 2021, the supply chain has been at the forefront of every global executive’s mind.
The auditing and consulting giant PwC surveyed more than 300 enterprise business executives and leaders on digital trends in supply chains.
They found that 83% of executives say their supply chain technology investments have not fully delivered expected results.
And (click)
That 86% of executives agree their company should invest more in technology to identify, track, and measure supply chain risk.
It’s clear that the way enterprises are approaching the supply chain today is simply not adequate to best serve global customers.
For example, U.S. companies continue to maintain high prices as supply chain challenges persist from the 2020 pandemic. Ongoing bottlenecks and shortages have resulted in increased costs for businesses, which they have passed on to consumers in the form of higher prices.
(Click)
And let’s not forget about post-pandemic summer travel where we saw a surge in flight cancellations and record-high gas prices, disrupting travel plans for many Americans. Airlines struggled to cope with staff shortages and operational issues, while gas prices soared due to increased demand and limited supply, further affecting road travel.
And that makes sense because there are so many micro- and macroeconomic influences on the supply chain that enterprises need to analyze and weigh as risks. For example:
Labor plays a crucial role in the supply chain as the workforce responsible for the production, transportation, and distribution of goods, with shortages or disruptions potentially causing bottlenecks and delays in the overall process.
Supply and consumer demand influence the supply chain by dictating the flow of goods and services, as businesses must balance the production and distribution of goods to meet customer needs while adapting to fluctuations in demand.
Infrastructure influences the supply chain by providing the physical framework and transportation networks necessary for the efficient movement and distribution of goods, with inadequate or disrupted systems causing delays and inefficiencies.
Energy costs influence the supply chain by impacting the expenses associated with production, transportation, and storage of goods, with higher costs often leading to increased prices for consumers and potential disruptions in the overall process.
Accidents can influence the supply chain by causing unexpected disruptions, delays, or damages to goods, potentially leading to shortages, increased costs, and overall instability in the flow of products from suppliers to consumers.
Natural disasters can influence the supply chain by causing severe disruptions to production, transportation, and distribution networks, leading to delays, shortages, and increased costs as businesses struggle to recover and adapt.
Local and geopolitics can influence the supply chain by affecting trade policies, regulations, and relationships between countries, which may result in disruptions, increased costs, or shifts in the sourcing and distribution of goods.
Holidays and celebrations can influence the supply chain by causing fluctuations in consumer demand, necessitating adjustments in production, inventory management, and distribution to accommodate seasonal trends and peak periods.
So how do data scientists approach supply chain gridlock?
Traditionally data scientists need to aggregate public and private data such as inventory, shipments, orders, and transportation logs that are constantly growing and evolving. Then data scientists mine this aggregated data for patterns and trends that can be turned into descriptive data like average lead times. Once these averages are understood, they can make predictions about when to order.
The problem is that this quickly creates massive datasets that are expensive to run experiments and analysis on, and, because of the changing nature of supply chains, data scientists are forced to make predictions based on approximations from limited historical data.
As you can see these things can get very complex very quickly.
(Click)
So, if we go back to the graphy questions we discussed in the previous section, do any of the questions we’re answering roll up to “What’s important?”, “What’s unusual?”, or “What’s next?”
Some of your questions might be:
Where are the supply chain bottlenecks?
What is the fastest route from point A to point B?
What airports are experiencing the most delays?
(click)
The questions can be summarized by what’s important (e.g. what’s the fastest route) and what’s unusual (e.g. where are the bottlenecks).
So how do you go about using Graph Data Science to address supply chain gridlock?
(click)
Supply chain networks are just that - networks - which are a type of graph. Instead of aggregating data across multiple growing and changing resources in a traditional data structure you can map ports, suppliers, products, and customers as nodes and relationships.
Supply chains are constantly growing and evolving. Graph Data Science’s pathfinding algorithms make it possible to automatically find the fastest, most efficient route, even on data that is constantly changing.
If you’re experiencing gridlock, centrality and pathfinding algorithms make it easy. Let me show you how…
Here you can see a map of airports in a country represented as a graph in Bloom, Neo4j’s low-code visualization tool. The nodes here are airports, while the relationships depict an airplane route between airports. You can see that there are little clusters of airports in certain areas of the map.
To start enriching our supply chain data, we can start by running graph algorithms on the data directly from Bloom. We will start by using centrality algorithms. This family of algorithms can calculate the importance of nodes based on the structure of the graph.
One of the most popular algorithms to understand operational load is degree centrality. Nodes with high operational load have to manage larger inflows and outflows and may be forced to reconcile conflicting schedules and priorities more often. So, nodes with higher operational load tend to require more resources to run effectively.
Here are the results of applying degree centrality to the airport network in our Bloom scene. We can see that five airports really stand out here. They are:
Davisport
Shanefort
Moodytown
Richardberg
Michaelstad
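As a rough illustration of what degree centrality measures, the sketch below counts how many routes touch each airport in plain Python. It does not use the Neo4j Graph Data Science API, and the airport names and routes are hypothetical stand-ins, not the data shown in Bloom.

```python
from collections import Counter

# Hypothetical airport routes; each pair is one route between two airports.
routes = [
    ("Hub", "North"),
    ("Hub", "South"),
    ("Hub", "East"),
    ("Hub", "West"),
    ("North", "East"),
]


def degree_centrality(edges):
    """Count how many routes touch each airport (undirected degree)."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return degree


scores = degree_centrality(routes)
ranking = scores.most_common()  # airports sorted from busiest to quietest
print(ranking)
```

In this toy network the airport called "Hub" has the highest score simply because more routes start or end there, which is the sense of "operational load" used above.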
While degree centrality can tell us about operational load, it only measures the local activity associated with the node, not necessarily the control or influence the node has on the entire supply chain network.
For this we can look at another algorithm called Betweenness centrality. Technically speaking, a node’s Betweenness centrality is calculated by counting how often the node rests on the shortest paths between all the other nodes in a graph. It is generally a good metric for describing how well a node bridges different regions of the graph together.
Nodes with high Betweenness centrality have more control over the flow of material and/or product because they connect many other nodes together that may otherwise be disconnected or connected through much longer, less efficient paths. So, nodes with higher flow control present a higher risk of causing bottlenecks in supply chains if they encounter delays or other issues.
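The counting of shortest paths described above can be sketched in plain Python. The version below uses Brandes' algorithm for an unweighted, undirected graph; it is an illustrative stand-in, not the Neo4j Graph Data Science implementation, and the small network (two hubs joined by a single "Bridge" airport) is hypothetical.

```python
from collections import deque


def betweenness_centrality(adjacency):
    """Brandes' algorithm for an unweighted, undirected graph."""
    score = {node: 0.0 for node in adjacency}
    for source in adjacency:
        # Breadth-first search that counts shortest paths from `source`.
        order = []
        predecessors = {node: [] for node in adjacency}
        path_count = {node: 0 for node in adjacency}
        distance = {node: -1 for node in adjacency}
        path_count[source] = 1
        distance[source] = 0
        queue = deque([source])
        while queue:
            current = queue.popleft()
            order.append(current)
            for neighbor in adjacency[current]:
                if distance[neighbor] < 0:
                    distance[neighbor] = distance[current] + 1
                    queue.append(neighbor)
                if distance[neighbor] == distance[current] + 1:
                    path_count[neighbor] += path_count[current]
                    predecessors[neighbor].append(current)
        # Walk back from the farthest nodes, accumulating dependencies.
        dependency = {node: 0.0 for node in adjacency}
        for node in reversed(order):
            for pred in predecessors[node]:
                share = path_count[pred] / path_count[node]
                dependency[pred] += share * (1 + dependency[node])
            if node != source:
                score[node] += dependency[node]
    # Each undirected pair was counted once from each end.
    return {node: value / 2 for node, value in score.items()}


# Hypothetical network: two hubs with two spokes each, joined by one bridge.
network = {
    "Hub1": ["a", "b", "Bridge"],
    "Hub2": ["c", "d", "Bridge"],
    "Bridge": ["Hub1", "Hub2"],
    "a": ["Hub1"],
    "b": ["Hub1"],
    "c": ["Hub2"],
    "d": ["Hub2"],
}

betweenness = betweenness_centrality(network)
print(betweenness)
```

Here "Bridge" has only two routes, fewer than either hub, yet it sits on every shortest path between the two halves of the network, so its betweenness matches that of the hubs. That is the kind of low-degree, high-control node that degree centrality alone would miss.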
(click)
Now we’ve applied the Betweenness centrality algorithm in Bloom. More often than not, degree centrality and betweenness centrality will be positively correlated; however, the correlation isn’t perfect. We see here that the ranking of top airports is a bit different, with Richardberg now having the highest score and Shanefort a close second. Davisport, the node with the highest degree centrality, is in third place and has less than half the betweenness centrality of second-place Shanefort.
Centrality algorithms can measure the importance of nodes in our supply chain. Another aspect we may want to consider, particularly for distribution and logistics networks, is how flows may naturally cluster into distinct, well-defined regions. This can be driven by geographical proximity, economic features (supply/demand), or other structural factors. This clustering strongly affects flow control and risks at the local and regional levels. For example, nodes within a particular region often depend more heavily on each other. Additionally, some nodes will have a stronger effect on flows within a region, while others will be more instrumental in flows to and from other regions. In a graph, we can use community detection algorithms to find clusters of the supply chain that are densely interconnected.
To analyze whether this regional clustering exists in our supply chain network, and if so, identify and label the nodes within them, we can use the Louvain algorithm.
In this context, Louvain Community Detection finds regional interdependence within the network by identifying groups of nodes which have highly interconnected flows between them. So, nodes within the same community have a stronger interdependence on each other relative to nodes outside the community.
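Louvain works by searching for the grouping of nodes that maximizes a score called modularity, which rewards communities with many internal connections and few external ones. The sketch below does not implement Louvain itself; it only computes modularity for a given grouping, in plain Python, so you can see why a "regional" grouping scores higher than an arbitrary one. The six-airport network and both groupings are hypothetical.

```python
def modularity(edges, community_of):
    """Modularity of a partition of an unweighted, undirected graph."""
    m = len(edges)
    internal = {}    # number of edges with both ends in the community
    degree_sum = {}  # total degree of the nodes in the community
    for a, b in edges:
        ca, cb = community_of[a], community_of[b]
        degree_sum[ca] = degree_sum.get(ca, 0) + 1
        degree_sum[cb] = degree_sum.get(cb, 0) + 1
        if ca == cb:
            internal[ca] = internal.get(ca, 0) + 1
    return sum(
        internal.get(c, 0) / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Hypothetical network: two tightly connected triangles joined by one route.
edges = [
    ("A", "B"), ("B", "C"), ("A", "C"),   # first region
    ("D", "E"), ("E", "F"), ("D", "F"),   # second region
    ("C", "D"),                           # single route between the regions
]

# Grouping that follows the two dense regions.
regional = {"A": 0, "B": 0, "C": 0, "D": 1, "E": 1, "F": 1}
# Arbitrary grouping that cuts across the regions.
mixed = {"A": 0, "B": 0, "D": 0, "C": 1, "E": 1, "F": 1}

regional_score = modularity(edges, regional)
mixed_score = modularity(edges, mixed)
print(regional_score, mixed_score)
```

The regional grouping scores higher than the mixed one, which is the signal a modularity-based method like Louvain follows when it assigns nodes to communities.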
This is an example of the Louvain algorithm run within Bloom. Building on our centrality algorithms above, where nodes are sized based on their importance, we can color nodes based on their community membership. Nodes in each community are assigned different colors, and we can clearly see some structural patterns emerging.
You will see that Louvain found 5 large communities or “regions” within our logistics network:
Masseyhaven region, top left, in orange
Moodytown region, center left, in purple
Davisport region, bottom left, in red
Richardberg region, center right, in blue
Shanefort region, top right, in yellow
If you look closely you can also see that many other airports in the regions are mostly (or exclusively) connected to just one of these central airports, making those airports even more dependent on a single other airport in the region. If you know much about airline routing, you can see that this neatly recapitulates the hub-and-spoke model of air transit.
Once you have this information you can start to use pathfinding algorithms like:
Dijkstra’s Source-Target Shortest Path, a classic graph algorithm for finding the shortest path between two nodes.
Delta-Stepping Single-Source Shortest Path, which uses parallelized computation to find the shortest paths between a single source node and multiple target nodes.
A* Shortest Path, an extension of Dijkstra that uses heuristics to speed up computation. It is particularly well suited to paths between geospatial points.
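For a feel of what the first of these does, here is a minimal plain-Python sketch of Dijkstra's algorithm between one source and one target. It is not the Neo4j Graph Data Science implementation; the port names and route costs are hypothetical, and the sketch assumes a directed graph with non-negative costs.

```python
import heapq


def shortest_path(graph, source, target):
    """Dijkstra's algorithm; returns (total_cost, path), or (inf, []) if unreachable."""
    best = {source: 0}      # cheapest known cost to reach each node
    previous = {}           # node that precedes each node on its cheapest path
    queue = [(0, source)]   # priority queue ordered by cost so far
    while queue:
        cost, node = heapq.heappop(queue)
        if node == target:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry; a cheaper route was already found
        for neighbor, weight in graph.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                previous[neighbor] = node
                heapq.heappush(queue, (new_cost, neighbor))
    if target not in best:
        return float("inf"), []
    # Rebuild the path by walking backwards from the target.
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    return best[target], path[::-1]


# Hypothetical ports and route costs (for example, hours of sailing time).
ports = {
    "A": {"B": 4, "C": 1},
    "B": {"D": 3},
    "C": {"B": 2, "D": 7},
    "D": {},
}

cost, path = shortest_path(ports, "A", "D")
print(cost, path)
```

The direct-looking route A to B to D costs 7, but the algorithm finds the cheaper detour through C at a cost of 6. Swapping the weights for distance, cost, or time changes what "best" means without changing the algorithm.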
I’m not going to go in-depth here because it would take a lot more time to explain all the different ways to find the fastest, most efficient path using graph algorithms, but you get the idea: as with identifying bottlenecks, there are several algorithms and approaches for addressing this problem.
Now that you’ve seen how easy it is to improve pathfinding with Graph Data Science, let’s talk about a real-world example of how customers are saving time and money with a more efficient supply chain.
OrbitMI is a maritime logistics and analytics company, and they use graph data science for route planning and optimization. This is really cool because when you talk about Graph Data Science, or graph algorithms, pathfinding is where it all really started. OrbitMI works in the global container shipping market, a $9 billion/year market. A big problem is that poorly planned routes waste time (ships taking too long a route), waste money (calling at expensive ports), and waste fuel and other resources (half-full ships). There’s a lot that can go wrong. You want your ships delivering cargo as often as possible, you want them full, and you want them taking the cheapest and most direct route.
OrbitMI provides a software-as-a-service platform, and they use Neo4j Graph Data Science’s pathfinding algorithms to power their maritime route decision making. They provide distances and costs, plus internal logic like “avoid canals,” and Neo4j provides them with the best route. Graph Data Science is able to plan these routes in under a second; most of them render in under half a second. Neo4j is powering the SaaS platform that customers use to plan paths for their ships.
Customers using this platform have seen a 60% increase in productivity because of decreased idle time and fuller ships. That’s created a return on investment of 12 to 16 million dollars, and OrbitMI has saved its customers over 60,000 tons of fuel through reduced idle time, more efficient routes, and fuller ships.
Here the nodes are ports and ships. The relationships are the distances along routes between point A and point B, and the algorithms help you identify the best path. And the impact isn’t just on the bottom line; it’s also helping reduce global shipping’s impact on the environment.
Q: How can Graph Data Science be integrated with other tools, platforms, or machine learning models to enhance supply chain capabilities?
A: That’s a really great question. Graph Data Science can fit into data stacks and data pipelines seamlessly with its native connectors to popular tools used for accessing, storing, moving, and sharing data. These tools include Apache Spark and Apache Kafka Connectors, a native BI Connector, a Data Warehouse Connector, Graph topology export, and BigQuery integration. Additionally, Graph Data Science is compatible with all major clouds, with AuraDS Enterprise now available for early access in AWS and Azure.
Q: Why are there so many algorithms within one category?
A: Different algorithms may be more suitable for different types of graphs and different types of problems. Having several gives data scientists a choice based on the characteristics of their graph and the specific requirements of their problem. Some differences are:
Yen's algorithm can find multiple shortest paths between two nodes in a single run, versus running Dijkstra’s multiple times.
The shortest path isn’t always best. In some cases it may be better to find multiple paths and choose the one that is most suitable for your needs.
The algorithms don’t perform the same as the number of nodes and relationships explodes. For example, Dijkstra's is more efficient than Yen's when looking for a single path between two nodes.
Q: How can organizations get started with implementing Graph Data Science for supply chain?
A: Getting started is all about understanding the problem you’re trying to solve, and knowing where your critical data is stored so you can transform it into nodes and relationships. We have an evaluation guide that walks you through step-by-step how to get started with your use case, and when you’re ready we have a whole host of Graph Data Science specialists who are ready to help you build out your proof of concept.
Evaluation Guide: https://neo4j.com/whitepapers/graph-data-science-evaluation-checklist/