Horton: Online Query Execution on Large Distributed Graphs. Sameh Elnikety, Microsoft Research
People. Team. Interns: Mohamed Sarwat (UMN), Jiaqing Du (EPFL)
Large Graphs Are Important. Large graphs appear in many application domains. Examples: social networks, knowledge representation, …
Example: Social Network. Scale: LinkedIn, 70 million users; Facebook, 500 million users and 65 billion photos. Queries: find Alice’s friends; find Alice’s photos with friends. Rich graph: types, attributes.
Objective: Management & Querying. Manage and query large graphs online. Today: expensive hardware (limited) or a cluster of machines (ad hoc). Tomorrow: growth area; provide system support. Analogy: a DBMS manages application data.
Key Design Decisions. Interactive queries: graph stored in random access memory. Graph too large for one machine: distributed problem; partitioning slices the graph into pieces; replication provides fault tolerance. Long time to build graph: support updates.
Three Research Problems. (1) How to partition the graph: static and dynamic techniques, e.g., evolving-set partitioning. (2) How to query the graph: data model and query language, execution engine, query optimizations. (3) How to update the graph: multi-version concurrency control, e.g., Generalized Snapshot Isolation.
Concurrency Control. Replicas agree on which update transactions commit and on their commit order.
Example: T1: set balance = $1000; T2: set balance = $2000. Replica 1 sees T1 then T2: balance = $2000. Replica 2 sees T2 then T1: balance = $1000.
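The balance example above can be replayed in a few lines. This is only an illustrative sketch: the function name `apply_in_order` and the tuple encoding of a transaction are assumptions made here, not part of Horton.

```python
def apply_in_order(transactions, initial_balance=0):
    """Apply blind writes ("set balance = X") one after another."""
    balance = initial_balance
    for _name, new_balance in transactions:
        balance = new_balance
    return balance

# Hypothetical encoding of the two update transactions from the slide.
T1 = ("T1", 1000)
T2 = ("T2", 2000)

# Without agreement, each replica applies the updates in the order it sees them.
replica1 = apply_in_order([T1, T2])
replica2 = apply_in_order([T2, T1])
diverged = replica1 != replica2

# With an agreed commit order, both replicas end in the same state.
agreed_order = [T1, T2]
replica1_agreed = apply_in_order(agreed_order)
replica2_agreed = apply_in_order(agreed_order)
```

The two replicas only converge because they apply the same sequence; which of the two orders is chosen matters less than the fact that every replica uses it.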
Replication (diagram): Single Graph Server; Load Balancer; Replica 1, Replica 2, Replica 3
Read Tx (diagram): Load Balancer; Replica 1, Replica 2, Replica 3; transaction T
Update Tx (diagram): Load Balancer; Replica 1, Replica 2, Replica 3; transaction T; ws, ws
Replication Middleware (diagram): Tx A at Replica 1, Tx B at Replica 2, Tx C at Replica 3; Replication MW (global ordering, durability) in the middle; every replica is marked durability OFF and shows the sequence A, B, C.
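One way to picture global ordering is a sequencer that stamps every writeset with a sequence number, while each replica applies writesets strictly in stamp order. The sketch below is an assumption-laden toy, not the Horton middleware: the class names, the dictionary writesets, and the buffering strategy are all hypothetical, and durability is not modeled.

```python
import itertools

class ReplicationMiddleware:
    """Hypothetical sequencer: assigns a global order to writesets."""
    def __init__(self):
        self._next_seq = itertools.count(1)

    def order(self, writeset):
        return next(self._next_seq), writeset

class Replica:
    """Applies writesets in sequence order, buffering early arrivals."""
    def __init__(self):
        self.state = {}
        self.applied = 0
        self._pending = {}

    def deliver(self, seq, writeset):
        self._pending[seq] = writeset
        # Drain every writeset whose predecessors have all been applied.
        while self.applied + 1 in self._pending:
            self.applied += 1
            self.state.update(self._pending.pop(self.applied))

mw = ReplicationMiddleware()
a = mw.order({"x": "A"})
b = mw.order({"x": "B"})
c = mw.order({"y": "C"})

r1, r2 = Replica(), Replica()
for message in (a, b, c):      # arrives in order
    r1.deliver(*message)
for message in (c, b, a):      # arrives reversed
    r2.deliver(*message)
```

Both replicas end with the same state even though the second one received the messages in reverse, because application order is fixed by the sequence numbers rather than by arrival order.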
Partitioning – Key Ideas. Objective: maximize temporal and spatial locality. Static: linear time; evolving-set partitioning; streaming algorithm; during loading. Dynamic: incremental; affinity propagation.
Data Model. Node: ID, type, attributes. Edge: connects two nodes; direction, type, attributes. (Diagram labels: App, Horton.)
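The data model above maps naturally onto two small record types. The sketch below is a guess at a minimal in-memory shape; the field names `source` and `target` and the use of a dictionary for attributes are assumptions, not the Horton schema.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    id: int
    type: str
    attributes: dict = field(default_factory=dict)

@dataclass
class Edge:
    source: int          # direction is encoded as source -> target
    target: int
    type: str
    attributes: dict = field(default_factory=dict)

# Toy instance: a photo node that tags a person node.
alice = Node(id=1, type="person", attributes={"name": "Alice"})
photo = Node(id=2, type="photo")
tags = Edge(source=photo.id, target=alice.id, type="tags")
```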
Query Language. Trade-off: expressiveness vs. efficiency. Regular language reachability: the query is a regular expression, a sequence of node and edge predicates. Example, Alice’s photos: Photo, tags, Alice, i.e., node: type=photo, edge: type=tags, node: type=person, name=Alice. Result: matching paths.
Query Language - Operators. Projection, Alice’s photos: Photo, tags, Alice (node: type=photo, edge: type=tags, node: type=person, name=Alice) becomes SELECT photo FROM photo, tags, Alice. OR: (Photo | video), tags, Alice. Kleene star, Alice’s org chart: Alice, (manages, person)*
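A query in this style can be read as an alternating list of node and edge predicates that a candidate path must satisfy position by position. The sketch below covers only fixed-length sequences and the OR operator; the Kleene star is left out, and the helper names (`node`, `edge`, `matches`, `path_matches`) and the dictionary encoding of path elements are hypothetical.

```python
def node(**conditions):
    return ("node", conditions)

def edge(**conditions):
    return ("edge", conditions)

def matches(element, predicate):
    """True if a path element satisfies one node or edge predicate."""
    kind, conditions = predicate
    if element["kind"] != kind:
        return False
    for key, allowed in conditions.items():
        # A set of values expresses OR, e.g. type in {photo, video}.
        allowed_values = allowed if isinstance(allowed, set) else {allowed}
        if element.get(key) not in allowed_values:
            return False
    return True

def path_matches(path, query):
    return len(path) == len(query) and all(
        matches(element, predicate) for element, predicate in zip(path, query)
    )

# (Photo | video), tags, Alice
query = [
    node(type={"photo", "video"}),
    edge(type="tags"),
    node(type="person", name="Alice"),
]

video_path = [
    {"kind": "node", "type": "video", "id": 7},
    {"kind": "edge", "type": "tags"},
    {"kind": "node", "type": "person", "name": "Alice"},
]
bob_path = [
    {"kind": "node", "type": "photo", "id": 9},
    {"kind": "edge", "type": "tags"},
    {"kind": "node", "type": "person", "name": "Bob"},
]
```

Here `path_matches(video_path, query)` holds because the first predicate accepts either type, while `bob_path` fails on the final name condition.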
Example App: CodeBook - Graph
Example App: CodeBook - Queries
Person, FileOwner>, TFSFile, FileOwner<, Person
Person, DiscussionOwner>, Discussion, DiscussionOwner<, Person
Person, WorkItemOwner>, TFSWorkItem, WorkItemOwner<, Person
Person, Manages<, Person, Manages>, Person
Person, WorkItemOwner>, TFSWorkItem, Mentions>, TFSFile, Mentions>, TFSWorkItem, WorkItemOwner<, Person
Person, WorkItemOwner>, TFSWorkItem, Mentions>, TFSFile, FileOwner<, Person
Person, FileOwner>, TFSFile, Mentions>, TFSWorkItem, Mentions>, TFSFile, FileOwner<, Person
Query Language - Future Extensions. From: path (sequence of predicates). To: sub-graph (graph pattern, sub-graph isomorphism).
Talk Outline: graph query execution engine; graph query optimizer; initial experimental results; demo.
Centralized Query Execution. Query: Alice, Tags, Photo. FSM states: S0, S1, S2, S3. Breadth-first search. Answer paths: Alice, Tags, Photo1; Alice, Tags, Photo8.
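The breadth-first evaluation can be sketched as expanding a frontier of partial paths, one query hop per level, which plays the role of advancing the FSM by one state. Everything below is a toy under assumptions: the graph contents beyond Alice, Photo1 and Photo8 are invented, edges are treated as directed from Alice to the photo, and the linear scan over the edge list is for brevity only.

```python
# Toy graph (assumed data).
nodes = {
    "Alice": {"type": "person"},
    "Bob": {"type": "person"},
    "Photo1": {"type": "photo"},
    "Photo3": {"type": "photo"},
    "Photo8": {"type": "photo"},
}
edges = [
    ("Alice", "Tags", "Photo1"),
    ("Alice", "Tags", "Photo8"),
    ("Bob", "Tags", "Photo3"),
    ("Alice", "FriendOf", "Bob"),
]

def run_query(nodes, edges, query):
    """query = [start_predicate, edge_type, node_predicate, edge_type, ...]"""
    # Level 0: every node accepted by the first predicate starts a path.
    frontier = [[node_id] for node_id in nodes if query[0](node_id, nodes[node_id])]
    # Each later level consumes one (edge type, node predicate) pair.
    for i in range(1, len(query), 2):
        edge_type, accepts = query[i], query[i + 1]
        next_frontier = []
        for path in frontier:
            for src, etype, dst in edges:
                if src == path[-1] and etype == edge_type and accepts(dst, nodes[dst]):
                    next_frontier.append(path + [etype, dst])
        frontier = next_frontier
    return frontier

is_alice = lambda node_id, attrs: node_id == "Alice"
is_photo = lambda node_id, attrs: attrs["type"] == "photo"

answer_paths = run_query(nodes, edges, [is_alice, "Tags", is_photo])
```

On this toy graph the answer contains the two paths named on the slide, Alice, Tags, Photo1 and Alice, Tags, Photo8.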
Distributed Execution Engine
Distributed Query Execution. Query: Alice, Tags, Photo, Tags, Hillary. FSM: S0, Alice, S1, Tags, S2, Photo, S3, Tags, S4, Hillary, S5. Step 1 (Partition 1): Alice. Step 2 (Partition 1): Photo1, Photo8. Step 3 (Partition 2): Hillary.
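The three steps can be imitated in a single process by recording which partition owns each node and counting how often a partial path has to cross a partition boundary. This is a simulation sketch only: the ownership map follows the slide, but the edges from the photos to Hillary, the function name, and the message counter are assumptions, and no real networking is involved.

```python
# Node placement as on the slide; photo-to-Hillary edges are assumed.
owner = {"Alice": 1, "Photo1": 1, "Photo8": 1, "Hillary": 2}
node_type = {"Alice": "person", "Photo1": "photo", "Photo8": "photo", "Hillary": "person"}
out_edges = {
    "Alice": [("Tags", "Photo1"), ("Tags", "Photo8")],
    "Photo1": [("Tags", "Hillary")],
    "Photo8": [("Tags", "Hillary")],
}

def run_distributed(start, hops):
    """hops is a list of (edge_type, node_predicate) pairs after the start node."""
    frontier = [[start]]
    remote_messages = 0
    for edge_type, accepts in hops:
        next_frontier = []
        for path in frontier:
            current = path[-1]
            for etype, dst in out_edges.get(current, []):
                if etype != edge_type:
                    continue
                if owner[dst] != owner[current]:
                    # The partial path is shipped to the partition owning dst,
                    # which then evaluates the node predicate locally.
                    remote_messages += 1
                if accepts(dst):
                    next_frontier.append(path + [etype, dst])
        frontier = next_frontier
    return frontier, remote_messages

is_photo = lambda n: node_type[n] == "photo"
is_hillary = lambda n: n == "Hillary"

paths, messages = run_distributed("Alice", [("Tags", is_photo), ("Tags", is_hillary)])
```

With this placement the first two steps stay inside Partition 1 and only the last hop crosses to Partition 2, which is the kind of locality a good partitioning is meant to preserve.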
Graph Query Optimization
Graph Query Optimization. Input: graph statistics + query plan. Output: optimized query plan (execution sequence). Action: enumerate execution plans, evaluate their costs using graph statistics, and find the plan with minimum cost.
Graph Query Optimization Techniques: predicate ordering; derived edges; fragment reuse.
Predicate Ordering. Find Mike’s photo that is also tagged by at least one of his friends: Mike, Tagged, Photo, Tagged, Person, FriendOf, Mike. Execute left to right: total cost = 14. Execute right to left: total cost = 7. Different predicate orders can result in different execution costs.
How to Decide Predicate Ordering? Enumerate the candidate execution sequences; estimate their costs using graph statistics; find the sequence with minimum cost.
Cost Estimation using Graph Statistics. Left to right: Mike, Tagged, Photo, Tagged, Person, FriendOf, Mike
EstimatedCost = 1 [find Mike]
  + (1 * 2.2) [find Mike, Tagged, Photo]
  + (2.2 * 1.6) [find Mike, Tagged, Photo, Tagged, Person]
  + (2.2 * 1.6 * 1.2) [find Mike, Tagged, Photo, Tagged, Person, FriendOf, Mike]
  ≈ 11
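The estimate is a running product of fan-outs, summed over the steps. A minimal sketch of that arithmetic follows; the function name `cost_terms` is hypothetical, and the three factors are simply the numbers shown on the slide, read here as average fan-outs per hop.

```python
def cost_terms(fanouts):
    """Return the per-step cost terms: 1, f1, f1*f2, f1*f2*f3, ..."""
    terms = [1.0]            # finding the start node costs 1
    frontier = 1.0           # expected number of partial paths so far
    for fanout in fanouts:
        frontier *= fanout
        terms.append(frontier)
    return terms

terms = cost_terms([2.2, 1.6, 1.2])
estimated_cost = sum(terms)
```

The four terms come to 1, 2.2, 3.52 and 4.224, for a total of about 10.944, which the slide rounds to 11.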
Plan Enumeration. Find Mike’s photo that is also tagged by at least one of his friends.
Plan1: Mike, Tagged, Photo, Tagged, Person, FriendOf, Mike
Plan2: Mike, FriendOf, Person, Tagged, Photo, Tagged, Mike
Plan3: (Mike, FriendOf, Person) ⋈ (Person, Tagged, Photo, Tagged, Mike)
Plan4: (Mike, FriendOf, Person, Tagged, Photo) ⋈ (Photo, Tagged, Mike)
. . .
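The four plans listed above follow a simple pattern: run the whole path in one direction, run it in the other, or split it at an interior node and join the two fragments. The sketch below enumerates exactly those shapes; the function name and the plan labels are hypothetical, and a real optimizer would presumably consider more alternatives than this.

```python
def enumerate_plans(query):
    """query alternates nodes and edges: [n0, e0, n1, e1, ..., nk]."""
    plans = [
        ("left_to_right", [query]),
        ("right_to_left", [query[::-1]]),
    ]
    # Split at every interior node; both fragments share that node,
    # so their results can be joined on it.
    for i in range(2, len(query) - 1, 2):
        plans.append((f"join_at_{query[i]}", [query[: i + 1], query[i:]]))
    return plans

query = ["Mike", "Tagged", "Photo", "Tagged", "Person", "FriendOf", "Mike"]
plans = enumerate_plans(query)
```

For this query the function yields four plans: the two single-direction traversals plus one join at Photo and one join at Person, matching Plan1 to Plan4 up to the direction in which each fragment is written.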
Derived Edges. Derived edge: Friends-photo-tag. Mike, Tagged, Photo, Tagged, Person, FriendOf, Mike: cost = 7. Mike, Tagged, Photo, Friends-photo-tag, Mike: cost = 5.
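A derived edge can be thought of as a precomputed composition of two existing hops, so that a query traverses one edge instead of two. The sketch below materializes such edges by joining on the intermediate node; the function name `compose`, the triple encoding of edges, and the toy data are assumptions, and how Horton actually maintains derived edges is not stated on the slide.

```python
from collections import defaultdict

def compose(edges, first_type, second_type, derived_type):
    """Materialize src -derived-> dst for every src -first-> mid -second-> dst."""
    second_hop = defaultdict(list)
    for src, etype, dst in edges:
        if etype == second_type:
            second_hop[src].append(dst)
    derived = set()
    for src, etype, mid in edges:
        if etype == first_type:
            for dst in second_hop.get(mid, []):
                derived.add((src, derived_type, dst))
    return sorted(derived)

# Toy data (assumed): Photo1 is tagged by two of Mike's friends, Photo2 by a stranger.
edges = [
    ("Photo1", "Tagged", "Ann"),
    ("Photo1", "Tagged", "Carl"),
    ("Photo2", "Tagged", "Bob"),
    ("Ann", "FriendOf", "Mike"),
    ("Carl", "FriendOf", "Mike"),
]

derived_edges = compose(edges, "Tagged", "FriendOf", "Friends-photo-tag")
```

Two underlying paths collapse into a single Friends-photo-tag edge from Photo1 to Mike, which is where the saving in traversal cost comes from.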
Query Fragment Reuse
Query1: Mike, Tagged, Photo
Query2: Person, FriendOf, Mike
Query3: Mike, Tagged, Photo, Tagged, Person, FriendOf, Mike
Execute Query3 based on cached results of Query1 and Query2:
[Query1], Tagged, Person, FriendOf, Mike
[Query2], Tagged, Photo, Tagged, Mike
[Query1] ⋈ [Photo, Tagged, Person] ⋈ [Query2]
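The last option, [Query1] ⋈ [Photo, Tagged, Person] ⋈ [Query2], amounts to a join between two cached result sets through the Tagged edges. The sketch below shows that join on toy data; the cache layout, the function name, and all the rows are hypothetical.

```python
# Cached results of earlier queries (assumed rows).
cache = {
    "Query1": [("Mike", "Photo1"), ("Mike", "Photo2")],   # Mike, Tagged, Photo
    "Query2": [("Ann", "Mike"), ("Carl", "Mike")],        # Person, FriendOf, Mike
}
# Photo, Tagged, Person edges that still have to be read from the graph.
tagged = [("Photo1", "Ann"), ("Photo1", "Dan"), ("Photo2", "Bob")]

def query3_from_cache(cache, tagged):
    """Join cached Query1 rows with cached Query2 rows via the Tagged edges."""
    friends = {person for person, _mike in cache["Query2"]}
    results = []
    for mike, photo in cache["Query1"]:
        for tagged_photo, person in tagged:
            if tagged_photo == photo and person in friends:
                results.append((mike, "Tagged", photo, "Tagged", person, "FriendOf", mike))
    return results

query3_results = query3_from_cache(cache, tagged)
```

Only the middle fragment touches the graph; both ends of the path come from the cache.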

GDM 2011 Talk

Summary of Query Optimization: predicate ordering; derived edges; reuse of query fragments.
Experimental Evaluation. Graph: CodeBook application, 1 million nodes, 10 million edges. Machines: 7 machines, Intel Core 2 Duo 2.26 GHz, 4 GB RAM; graph partitioned on 6 machines.