SlideShare une entreprise Scribd logo
1  sur  37
Graph-Based
Stream Processing
This is Katarina
● Born at the Croatian coast, in the town of Šibenik
● Master’s in Mathematics and Computer science from
the Faculty of Science in Zagreb
● I work as a Developer Relations Engineer at Memgraph
● I love to travel, cook and eat tasty food (check out my
Instagram)
This is Ivan
● Born and living in Croatia
● Master’s in Computer science from the Faculty of
Electrical Engineering and Computing in Zagreb
● I work as a Developer Relations Engineer at
Memgraph
● I love to travel, cook and watch Netflix too much
The graph data model
When to use a graph database?
What are graphs?
A graph is a network structure that consists
of a set of nodes (vertices) and a set of
relationships (edges) connecting them.
● Nodes - structures that represent
entities
● Relationships - connections between
these entities
● Properties - associated values
(key-value pairs) belonging to either
nodes or relationships
Labeled property graph model
Graph database vs relational database
Cypher query language
Cypher is the most widely adopted, fully-specified, and open query language for property graph
databases. It provides an intuitive way to work with property graphs.
Cypher contains:
- clauses such as MATCH, DELETE, SET, RETURN…
- functions such as round(), cos(), toString()...
- custom procedures written in Python, C/C++ and Rust
Cypher query language
MATCH (u:Customer{customer_id:'customer-one'})-[:BOUGHT]->
(p:Product)<-[:BOUGHT]-(peer:Customer)-[:BOUGHT]->
(reco:Product)
WHERE not (u)-[:BOUGHT]->(reco)
RETURN reco as Recommendation, count(*) as Frequency ORDER
BY Frequency DESC LIMIT 5;
SQL
SELECT product.product_name as Recommendation, count(1) as
Frequency
FROM product, customer_product_mapping, (SELECT
cpm3.product_id, cpm3.customer_id
FROM Customer_product_mapping cpm,
Customer_product_mapping cpm2, Customer_product_mapping
cpm3
WHERE cpm.customer_id = ‘customer-one’
and cpm.product_id = cpm2.product_id
and cpm2.customer_id != ‘customer-one’
and cpm3.customer_id = cpm2.customer_id
and cpm3.product_id not in (select distinct product_id
FROM Customer_product_mapping cpm
WHERE cpm.customer_id = ‘customer-one’)
) recommended_products
WHERE customer_product_mapping.product_id =
product.product_id
and customer_product_mapping.product_id in
recommended_products.product_id
and customer_product_mapping.customer_id =
recommended_products.customer_id
GROUP BY product.product_name
ORDER BY Frequency desc
Graph analytics in action
When to use graph analytics?
Graph analytics
Graph analytics, also called network analysis,
generates insights hidden in the relationships of
the network structure.
Some common algorithms include:
● Clustering & community detection
● Connected components
● PageRank
● Shortest path
● BFS & DFS
Graph analytics in action
Google was built on the
PageRank algorithm measuring
the importance of web pages.
Facebook’s social graph uses
Community Detection to infer unknown
data about their users based on similar
network behavior of other users to
power their ad-targeting engine.
Amazon uses Collaborative Filtering
to deliver high quality real-time
product recommendations.
Pinterest uses Random Walks and
Graph-Machine Learning to deliver
high-quality personalized
recommendations responsible for more
than 80% of all user engagements.
“Graph-Machine Learning
features proved the most
valuable of all other features
when determining the quality and
relevancy of our dish and
restaurant recommendations.”
Social networks
Recommendation engines
Supply chain management
Fraud detection
Graph stream processing
Analyze your data in real-time instead of in batches
Graph stream processing
Graph stream processing
● Stream processing is a type of big data
architecture in which the data is analyzed
in real-time
● It is used whenever you need to extract
additional information from you data as
it’s being consumed
● The processing happens asynchronously -
the data source and the stream processing
work independently without waiting for a
response from one another
Memgraph
Memgraph is a platform for graph computation
on streaming data powered by an in-memory
graph database.
Memgraph is used to:
● Store graph data in memory
● Run graph analytics
● Analyze streaming data
GQLAlchemy
GQLAlchemy is a fully open-source Python
library. It is an Object Graph Mapper (OGM) - a
link between Graph Database objects and Python
objects.
GQLAlchemy includes:
● OGM capabilities
● Query builder
● On-disk storage
● Graph schema validation Jupyter Notebook Demo:
● https://github.com/memgraph/jupyter-memgrap
h-tutorials/tree/main/twitter_network_analysis
The stream processing architecture
What do you need for stream processing?
Stream processing architecture
Demo time!
The Twitter dataset
Analyzing a Twitter network
We are going to find the most christmassy
person on Twitter using the hashtag #christmas
by analyzing retweets of the original tweet:
● Dynamic PageRank algorithm is used to find
the most influential account
● Dynamic community detection algorithm is
used to figure out which accounts belong to
the same community
Graph stream processing with Memgraph
1. Transformation module
Transforming Kafka messages into graph objects:
2. PageRank
Set up everything and run the dynamic PageRank algorithm:
3. Create and start the stream
4. Create trigger
Stream processing in Memgraph
1. Create a transformation module in Python
2. Run the PageRank and community detection algorithms with GQLAlchemy
3. Create and start the stream with GQLAlchemy
4. Create a trigger and module to send PageRank results back to a Kafka topic
Want to play with the Twitter dataset?
Check out the Memgraph Playground:
https://playground.memgraph.com/sandbox/twitter-christmas-retweets
Thank you for your attention!
www.memgraph.io
If you like what we do,
throw us a star! ⭐
https://github.com/memgraph/memgraph
Check out the Twitter network
analysis repository!
https://github.com/memgraph/twitter-network-analysis
www.memgraph.io
Join our Discord server!
memgr.ph/discord
supe_katarina
katarina.supe@memgraph.io
ivan_g_despot
ivan.despot@memgraph.io

Contenu connexe

Tendances

Monitoring_with_Prometheus_Grafana_Tutorial
Monitoring_with_Prometheus_Grafana_TutorialMonitoring_with_Prometheus_Grafana_Tutorial
Monitoring_with_Prometheus_Grafana_Tutorial
Tim Vaillancourt
 
Unify Stream and Batch Processing using Dataflow, a Portable Programmable Mod...
Unify Stream and Batch Processing using Dataflow, a Portable Programmable Mod...Unify Stream and Batch Processing using Dataflow, a Portable Programmable Mod...
Unify Stream and Batch Processing using Dataflow, a Portable Programmable Mod...
DataWorks Summit
 
Presto as a Service - Tips for operation and monitoring
Presto as a Service - Tips for operation and monitoringPresto as a Service - Tips for operation and monitoring
Presto as a Service - Tips for operation and monitoring
Taro L. Saito
 
Streamline Data Governance with Egeria: The Industry's First Open Metadata St...
Streamline Data Governance with Egeria: The Industry's First Open Metadata St...Streamline Data Governance with Egeria: The Industry's First Open Metadata St...
Streamline Data Governance with Egeria: The Industry's First Open Metadata St...
DataWorks Summit
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream Processing
Guido Schmutz
 

Tendances (20)

Producer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache KafkaProducer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache Kafka
 
Stream Processing with Flink and Stream Sharing
Stream Processing with Flink and Stream SharingStream Processing with Flink and Stream Sharing
Stream Processing with Flink and Stream Sharing
 
Monitoring_with_Prometheus_Grafana_Tutorial
Monitoring_with_Prometheus_Grafana_TutorialMonitoring_with_Prometheus_Grafana_Tutorial
Monitoring_with_Prometheus_Grafana_Tutorial
 
From Zero to Hero with Kafka Connect
From Zero to Hero with Kafka ConnectFrom Zero to Hero with Kafka Connect
From Zero to Hero with Kafka Connect
 
Unify Stream and Batch Processing using Dataflow, a Portable Programmable Mod...
Unify Stream and Batch Processing using Dataflow, a Portable Programmable Mod...Unify Stream and Batch Processing using Dataflow, a Portable Programmable Mod...
Unify Stream and Batch Processing using Dataflow, a Portable Programmable Mod...
 
Getting Started with Confluent Schema Registry
Getting Started with Confluent Schema RegistryGetting Started with Confluent Schema Registry
Getting Started with Confluent Schema Registry
 
Kafka Tutorial - introduction to the Kafka streaming platform
Kafka Tutorial - introduction to the Kafka streaming platformKafka Tutorial - introduction to the Kafka streaming platform
Kafka Tutorial - introduction to the Kafka streaming platform
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
 
하이퍼커넥트 데이터 팀이 데이터 증가에 대처해온 기록
하이퍼커넥트 데이터 팀이 데이터 증가에 대처해온 기록하이퍼커넥트 데이터 팀이 데이터 증가에 대처해온 기록
하이퍼커넥트 데이터 팀이 데이터 증가에 대처해온 기록
 
Presto as a Service - Tips for operation and monitoring
Presto as a Service - Tips for operation and monitoringPresto as a Service - Tips for operation and monitoring
Presto as a Service - Tips for operation and monitoring
 
kafka
kafkakafka
kafka
 
Streamline Data Governance with Egeria: The Industry's First Open Metadata St...
Streamline Data Governance with Egeria: The Industry's First Open Metadata St...Streamline Data Governance with Egeria: The Industry's First Open Metadata St...
Streamline Data Governance with Egeria: The Industry's First Open Metadata St...
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Confluent REST Proxy and Schema Registry (Concepts, Architecture, Features)
Confluent REST Proxy and Schema Registry (Concepts, Architecture, Features)Confluent REST Proxy and Schema Registry (Concepts, Architecture, Features)
Confluent REST Proxy and Schema Registry (Concepts, Architecture, Features)
 
Apache Kafka at LinkedIn
Apache Kafka at LinkedInApache Kafka at LinkedIn
Apache Kafka at LinkedIn
 
Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkFlexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache Flink
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream Processing
 
Streaming architecture patterns
Streaming architecture patternsStreaming architecture patterns
Streaming architecture patterns
 
Kafka Tutorial - Introduction to Apache Kafka (Part 1)
Kafka Tutorial - Introduction to Apache Kafka (Part 1)Kafka Tutorial - Introduction to Apache Kafka (Part 1)
Kafka Tutorial - Introduction to Apache Kafka (Part 1)
 
Introduction to Apache Flink
Introduction to Apache FlinkIntroduction to Apache Flink
Introduction to Apache Flink
 

Similaire à [Apache Kafka® Meetup by Confluent] Graph-based stream processing

Multi-Label Graph Analysis and Computations Using GraphX with Qiang Zhu and Q...
Multi-Label Graph Analysis and Computations Using GraphX with Qiang Zhu and Q...Multi-Label Graph Analysis and Computations Using GraphX with Qiang Zhu and Q...
Multi-Label Graph Analysis and Computations Using GraphX with Qiang Zhu and Q...
Databricks
 

Similaire à [Apache Kafka® Meetup by Confluent] Graph-based stream processing (20)

GraphGen: Conducting Graph Analytics over Relational Databases
GraphGen: Conducting Graph Analytics over Relational DatabasesGraphGen: Conducting Graph Analytics over Relational Databases
GraphGen: Conducting Graph Analytics over Relational Databases
 
GraphGen: Conducting Graph Analytics over Relational Databases
GraphGen: Conducting Graph Analytics over Relational DatabasesGraphGen: Conducting Graph Analytics over Relational Databases
GraphGen: Conducting Graph Analytics over Relational Databases
 
Lambda Architecture and open source technology stack for real time big data
Lambda Architecture and open source technology stack for real time big dataLambda Architecture and open source technology stack for real time big data
Lambda Architecture and open source technology stack for real time big data
 
Real time streaming analytics
Real time streaming analyticsReal time streaming analytics
Real time streaming analytics
 
Graph Gurus 15: Introducing TigerGraph 2.4
Graph Gurus 15: Introducing TigerGraph 2.4 Graph Gurus 15: Introducing TigerGraph 2.4
Graph Gurus 15: Introducing TigerGraph 2.4
 
Multi-Label Graph Analysis and Computations Using GraphX with Qiang Zhu and Q...
Multi-Label Graph Analysis and Computations Using GraphX with Qiang Zhu and Q...Multi-Label Graph Analysis and Computations Using GraphX with Qiang Zhu and Q...
Multi-Label Graph Analysis and Computations Using GraphX with Qiang Zhu and Q...
 
Continuous delivery for machine learning
Continuous delivery for machine learningContinuous delivery for machine learning
Continuous delivery for machine learning
 
AI on Greenplum Using
 Apache MADlib and MADlib Flow - Greenplum Summit 2019
AI on Greenplum Using
 Apache MADlib and MADlib Flow - Greenplum Summit 2019AI on Greenplum Using
 Apache MADlib and MADlib Flow - Greenplum Summit 2019
AI on Greenplum Using
 Apache MADlib and MADlib Flow - Greenplum Summit 2019
 
Leveraging Graphs for Better AI
Leveraging Graphs for Better AILeveraging Graphs for Better AI
Leveraging Graphs for Better AI
 
Building Data Apps with Python
Building Data Apps with PythonBuilding Data Apps with Python
Building Data Apps with Python
 
Scaling Analytics with Apache Spark
Scaling Analytics with Apache SparkScaling Analytics with Apache Spark
Scaling Analytics with Apache Spark
 
Nose Dive into Apache Spark ML
Nose Dive into Apache Spark MLNose Dive into Apache Spark ML
Nose Dive into Apache Spark ML
 
Graph processing at scale using spark &amp; graph frames
Graph processing at scale using spark &amp; graph framesGraph processing at scale using spark &amp; graph frames
Graph processing at scale using spark &amp; graph frames
 
Making sense of the Graph Revolution
Making sense of the Graph RevolutionMaking sense of the Graph Revolution
Making sense of the Graph Revolution
 
Odsc 2019 entity_reputation_knowledge_graph
Odsc 2019 entity_reputation_knowledge_graphOdsc 2019 entity_reputation_knowledge_graph
Odsc 2019 entity_reputation_knowledge_graph
 
Neo4j Graph Data Science Training - June 9 & 10 - Slides #6 Graph Algorithms
Neo4j Graph Data Science Training - June 9 & 10 - Slides #6 Graph AlgorithmsNeo4j Graph Data Science Training - June 9 & 10 - Slides #6 Graph Algorithms
Neo4j Graph Data Science Training - June 9 & 10 - Slides #6 Graph Algorithms
 
Mohan C R CV
Mohan C R CVMohan C R CV
Mohan C R CV
 
Python Machine Learning - Getting Started
Python Machine Learning - Getting StartedPython Machine Learning - Getting Started
Python Machine Learning - Getting Started
 
Leveraging Graphs for Better AI
Leveraging Graphs for Better AILeveraging Graphs for Better AI
Leveraging Graphs for Better AI
 
Real timeeventmonitoringsystem(1)
Real timeeventmonitoringsystem(1)Real timeeventmonitoringsystem(1)
Real timeeventmonitoringsystem(1)
 

Plus de confluent

Plus de confluent (20)

Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flink
 
Unlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsUnlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insights
 
Workshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con FlinkWorkshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con Flink
 
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
 
AWS Immersion Day Mapfre - Confluent
AWS Immersion Day Mapfre   -   ConfluentAWS Immersion Day Mapfre   -   Confluent
AWS Immersion Day Mapfre - Confluent
 
Eventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkEventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalk
 
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent CloudQ&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
 
Citi TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep DiveCiti TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep Dive
 
Build real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with ConfluentBuild real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with Confluent
 
Q&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service MeshQ&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service Mesh
 
Citi Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka MicroservicesCiti Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka Microservices
 
Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3
 
Citi Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging ModernizationCiti Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging Modernization
 
Citi Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataCiti Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time data
 
Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2
 
Data In Motion Paris 2023
Data In Motion Paris 2023Data In Motion Paris 2023
Data In Motion Paris 2023
 
Confluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with SynthesisConfluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with Synthesis
 
The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023
 
The Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data StreamsThe Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data Streams
 

Dernier

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Dernier (20)

EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 

[Apache Kafka® Meetup by Confluent] Graph-based stream processing

  • 2. This is Katarina ● Born at the Croatian coast, in the town of Šibenik ● Master’s in Mathematics and Computer science from the Faculty of Science in Zagreb ● I work as a Developer Relations Engineer at Memgraph ● I love to travel, cook and eat tasty food (check out my Instagram)
  • 3. This is Ivan ● Born and living in Croatia ● Master’s in Computer science from the Faculty of Electrical Engineering and Computing in Zagreb ● I work as a Developer Relations Engineer at Memgraph ● I love to travel, cook and watch Netflix too much
  • 4. The graph data model When to use a graph database?
  • 5. What are graphs? A graph is a network structure that consists of a set of nodes (vertices) and a set of relationships (edges) connecting them. ● Nodes - structures that represent entities ● Relationships - connections between these entities ● Properties - associated values (key-value pairs) belonging to either nodes or relationships Labeled property graph model
  • 6. Graph database vs relational database
  • 7. Cypher query language Cypher is the most widely adopted, fully-specified, and open query language for property graph databases. It provides an intuitive way to work with property graphs. Cypher contains: - clauses such as MATCH, DELETE, SET, RETURN… - functions such as round(), cos(), toString()... - custom procedures written in Python, C/C++ and Rust
  • 8. Cypher query language MATCH (u:Customer{customer_id:'customer-one'})-[:BOUGHT]-> (p:Product)<-[:BOUGHT]-(peer:Customer)-[:BOUGHT]-> (reco:Product) WHERE not (u)-[:BOUGHT]->(reco) RETURN reco as Recommendation, count(*) as Frequency ORDER BY Frequency DESC LIMIT 5; SQL SELECT product.product_name as Recommendation, count(1) as Frequency FROM product, customer_product_mapping, (SELECT cpm3.product_id, cpm3.customer_id FROM Customer_product_mapping cpm, Customer_product_mapping cpm2, Customer_product_mapping cpm3 WHERE cpm.customer_id = ‘customer-one’ and cpm.product_id = cpm2.product_id and cpm2.customer_id != ‘customer-one’ and cpm3.customer_id = cpm2.customer_id and cpm3.product_id not in (select distinct product_id FROM Customer_product_mapping cpm WHERE cpm.customer_id = ‘customer-one’) ) recommended_products WHERE customer_product_mapping.product_id = product.product_id and customer_product_mapping.product_id in recommended_products.product_id and customer_product_mapping.customer_id = recommended_products.customer_id GROUP BY product.product_name ORDER BY Frequency desc
  • 9. Graph analytics in action When to use graph analytics?
  • 10. Graph analytics Graph analytics, also called network analysis, generates insights hidden in the relationships of the network structure. Some common algorithms include: ● Clustering & community detection ● Connected components ● PageRank ● Shortest path ● BFS & DFS
  • 11. Graph analytics in action Google was built on the PageRank algorithm measuring the importance of web pages. Facebook’s social graph uses Community Detection to infer unknown data about their users based on similar network behavior of other users to power their ad-targeting engine. Amazon uses Collaborative Filtering to deliver high quality real-time product recommendations. Pinterest uses Random Walks and Graph-Machine Learning to deliver high-quality personalized recommendations responsible for more than 80% of all user engagements. “Graph-Machine Learning features proved the most valuable of all other features when determining the quality and relevancy of our dish and restaurant recommendations.”
  • 13.
  • 17. Graph stream processing Analyze your data in real-time instead of in batches
  • 19. Graph stream processing ● Stream processing is a type of big data architecture in which the data is analyzed in real-time ● It is used whenever you need to extract additional information from you data as it’s being consumed ● The processing happens asynchronously - the data source and the stream processing work independently without waiting for a response from one another
  • 20. Memgraph Memgraph is a platform for graph computation on streaming data powered by an in-memory graph database. Memgraph is used to: ● Store graph data in memory ● Run graph analytics ● Analyze streaming data
  • 21. GQLAlchemy GQLAlchemy is a fully open-source Python library. It is an Object Graph Mapper (OGM) - a link between Graph Database objects and Python objects. GQLAlchemy includes: ● OGM capabilities ● Query builder ● On-disk storage ● Graph schema validation Jupyter Notebook Demo: ● https://github.com/memgraph/jupyter-memgrap h-tutorials/tree/main/twitter_network_analysis
  • 22. The stream processing architecture What do you need for stream processing?
  • 26. Analyzing a Twitter network We are going to find the most christmassy person on Twitter using the hashtag #christmas by analyzing retweets of the original tweet: ● Dynamic PageRank algorithm is used to find the most influential account ● Dynamic community detection algorithm is used to figure out which accounts belong to the same community
  • 27. Graph stream processing with Memgraph
  • 28. 1. Transformation module Transforming Kafka messages into graph objects:
  • 29. 2. PageRank Set up everything and run the dynamic PageRank algorithm:
  • 30. 3. Create and start the stream
  • 32. Stream processing in Memgraph 1. Create a transformation module in Python 2. Run the PageRank and community detection algorithms with GQLAlchemy 3. Create and start the stream with GQLAlchemy 4. Create a trigger and module to send PageRank results back to a Kafka topic
  • 33.
  • 34. Want to play with the Twitter dataset? Check out the Memgraph Playground: https://playground.memgraph.com/sandbox/twitter-christmas-retweets
  • 35. Thank you for your attention!
  • 36. www.memgraph.io If you like what we do, throw us a star! ⭐ https://github.com/memgraph/memgraph Check out the Twitter network analysis repository! https://github.com/memgraph/twitter-network-analysis
  • 37. www.memgraph.io Join our Discord server! memgr.ph/discord supe_katarina katarina.supe@memgraph.io ivan_g_despot ivan.despot@memgraph.io