SlideShare une entreprise Scribd logo
1  sur  49
Horton: online query execution on large distributed graphs Sameh Elnikety Microsoft Research
People Team Interns Mohamed Sarwat (UMN), Jiaqing Du (EPFL)
Large Graphs Are Important Large graphs appear in many application domains Examples: social networks, knowledge representation, …
Example : Social Network Scale LinkedIn 70 million users Facebook 500 million users 65 Billion photos Queries Find Alice’s friends Find Alice’s photos with friends Rich graph Types, attributes
Objective: Management & Querying Manage and query large graphs online Today Expensive hardware (limited) Cluster of machines (ad-hoc) Tomorrow  Growth area Analogy: provide system support Analogy: DBMS manages application data
Key Design Decisions Interactive queries Graph stored in random access memory Graph too large for one machine Distributed problem Partitioning: slice graph into pieces Replication: for fault tolerance Long time to build graph Support updates
Three Research Problems How to partition the graph Static and dynamic techniques E.g., Evolving set partitioning How to query the graph Data model & query language Execution engine  Query optimizations How to update the graph Multi-version concurrency control  E.g., Generalized Snapshot Isolation
Concurrency Control ,[object Object]
Replicas agreeOn which update transactions commit On their commit order ,[object Object],T1:  set balance = $1000 T2:  set balance = $2000 Replica1: see T1 then T2  balance = $2000 Replica2: see T2 then T1  balance = $1000
    Replication Single Graph Server Replica 1 Load  Balancer Replica 2 Replica 3
    Read Tx Replica 1 Load  Balancer Replica 2 T Replica 3
    Update Tx Replica 1 Load  Balancer Replica 2 T ws ws Replica 3
Replication Middleware Tx A Replica 1 Tx A A  B  C durability OFF A  B  C Replication MW  (global ordering) Tx B Replica 2 Tx B durability A  B  C A  B  C A  B  C durability OFF A  B  C Replica 3 Tx C Tx C A  B  C A  B  C durability OFF
Partitioning – Key Ideas Objective Maximize temporal and spatial locality Static Linear time Evolving-set partitioning Streaming algorithm During loading Dynamic Incremental Affinity propagation
Three Research Problems How to partition the graph Static and dynamic techniques E.g., Evolving set partitioning How to query the graph Data model & query language Execution engine  Query optimizations How to update the graph Multi-version concurrency control  E.g., Generalized Snapshot Isolation
Data Model Node ID, type, attributes  Edge Connects two nodes Direction, type, attributes  App Horton
Query Language  Trade-off Expressiveness vs. efficiency Regular language reachability Query is a regular expression Sequence of node and edge predicates Example Alice’s photos Photo, tags, Alice Node: type=photo  ,  edge: type=tags  ,  node: type=person, name=Alice Result:matching paths
Query Language - Operators Projection Alice’s photos Photo, tags, Alice Node: type=photo  ,  edge: type=tags  ,  node: type=person, name=Alice SELECT  photoFROM    photo, tags, Alice OR (Photo | video), tags, Alice Kleenestar  Alice org chart Alice, (manages, person)*
Example App: CodeBook - Graph
Example App: CodeBook - Queries Person, FileOwner>, TFSFile,FileOwner<, Person Person, DiscussionOwner>, Discussion, DiscussionOwner<, Person Person, WorkItemOwner>, TFSWorkItem, WorkItemOwner< ,Person Person, Manages<, Person, Manages>, Person Person, WorkItemOwner>, TFSWorkItem, Mentions>, TFSFile, Mentions>, TFSWorkItem, WorkItemOwner<,  Person Person, WorkItemOwner>, TFSWorkItem, Mentions>, TFSFile, FileOwner<, Person Person, FileOwner>, TFSFile, Mentions>, TFSWorkItem, Mentions>, TFSFile,   FileOwner<, Person
Query Language - Future Extensions From: path Sequence of predicates To: sub-graph Graph pattern Sub-graph isomorphism
Talk Outline ,[object Object]
Graph query execution engine
Graph query optimizer
Initial experimental results
Demo,[object Object]
Centralized Query Execution Photo Tags Alice S2 S0 S1 S3 Alice, Tags, Photo Breadth First Search Answer Paths: Alice, Tags, Photo1Alice, Tags, Photo8
Distributed Execution Engine
Distributed Query Execution Alice, Tags, Photo, Tags, Hillary Partition 1 Partition 2
Distributed Query Execution Alice, Tags, Photo, Tags, Hillary FSM Partition 1 Partition 2 S0 Partition 1 Step 1 Alice Alice S1 Tags S2 Step 2 Photo1 Photo8 Photo S3 Tags S4 Step 3 Hillary Hillary Partition 2 S5
Graph Query Optimization
Graph Query Optimization Input Graph statistics + query plan Output Optimized query plan (execution sequence) Action Enumerate execution plans Evaluate their costs using graph statistics Find the plan with minimum cost
Graph Query Optimization Techniques Predicate ordering Derived edges Fragment reuse
Predicate Ordering Find Mike’s photo that is also tagged by at least one of his friends Mike, Tagged, Photo, Tagged, Person, FriendOf, Mike Execute left to right Execute right to left Different predicate orders can result in different execution costs.
Predicate Ordering Find Mike’s photo that is also tagged by at least one of his friends Mike, Tagged, Photo, Tagged, Person, FriendOf, Mike Execute left to right
Predicate Ordering Find Mike’s photo that is also tagged by at least one of his friends Mike, Tagged, Photo, Tagged, Person, FriendOf, Mike Execute left to right
Predicate Ordering Find Mike’s photo that is also tagged by at least one of his friends Mike, Tagged, Photo, Tagged, Person, FriendOf, Mike Execute left to right
Predicate Ordering Find Mike’s photo that is also tagged by at least one of his friends Mike, Tagged, Photo, Tagged, Person, FriendOf, Mike Execute left to right Total cost = 14
Predicate Ordering Find Mike’s photo that is also tagged by at least one of his friends Mike, Tagged, Photo, Tagged, Person, FriendOf, Mike Execute left to right Execute right to left Different predicate orders can result in different execution costs. Total cost = 7 Total cost = 14
How to Decide Predicate Ordering? ,[object Object]
Estimate their costs using graph statistics
Find the sequence with minimum cost,[object Object]
Cost Estimation using Graph Statistics Graph Statistics Left to right Mike, Tagged, Photo, Tagged, Person, FriendOf, Mike EstimatedCost =  1                 	[find Mike]
Cost Estimation using Graph Statistics Graph Statistics Left to right Mike, Tagged, Photo, Tagged, Person, FriendOf, Mike EstimatedCost =  1                 	[find Mike] 	+ (1* 2.2)       	[find Mike, Tagged, Photo]
Cost Estimation using Graph Statistics Graph Statistics Left to right Mike, Tagged, Photo, Tagged, Person, FriendOf, Mike EstimatedCost =  1                 	[find Mike] 	+ (1* 2.2)       	[find Mike, Tagged, Photo]                  	+ (2.2 * 1.6)   	[find Mike, Tagged, Photo, Tagged, Person] 	+ (2.2 * 1.6 * 1.2) 	[find Mike, Tagged, Photo, Tagged, Person, FriendOf, Mike]                  	= 11
Plan Enumeration Find Mike’s photo that is also tagged by at least one of his friends Plan1 Mike, Tagged, Photo, Tagged, Person, FriendOf, Mike Plan2 Mike, FriendOf, Person, Tagged, Photo, Tagged, Mike Plan3 (Mike, FriendOf, Person)   ⋈ 	 (Person, Tagged, Photo, Tagged, Mike) Plan4 (Mike, FriendOf, Person, Tagged, Photo)   ⋈   (Photo, Tagged, Mike) . . . . .
Derived Edges Friends-photo-tag Mike, Tagged, Photo, Tagged, Person, FriendOf, Mike Cost = 7 Mike, Tagged, Photo, Friends-photo-tag, Mike Cost = 5
Query Fragment Reuse Query1 Mike, Tagged, Photo Query2 Person, FriendOf, Mike Query3 Mike, Tagged, Photo, Tagged, Person, FriendOf, Mike
Query Fragment Reuse Query1 Mike, Tagged, Photo Query2 Person, FriendOf, Mike Query3 Mike, Tagged, Photo, Tagged, Person, FriendOf, Mike Execute Query3 based on cached results of Query1 and Query2:   [Query1], Tagged, Person, FriendOf, Mike [Query2], Tagged, Photo, Tagged, Mike  [Query1]⋈[Photo, Tagged, Person] ⋈[Query2]

Contenu connexe

Similaire à GDM 2011 Talk

Palette Power: Enabling Visual Search through Colors
Palette Power: Enabling Visual Search through ColorsPalette Power: Enabling Visual Search through Colors
Palette Power: Enabling Visual Search through ColorsWei Di
 
Deep Dive to Learning to Rank for Graph Search.pptx
Deep Dive to Learning to Rank for Graph Search.pptxDeep Dive to Learning to Rank for Graph Search.pptx
Deep Dive to Learning to Rank for Graph Search.pptxanirudhk46
 
Pollmaps - 2011 Esri UC Presentation
Pollmaps - 2011 Esri UC PresentationPollmaps - 2011 Esri UC Presentation
Pollmaps - 2011 Esri UC PresentationAlex Yule
 
Hw09 Analytics And Reporting
Hw09   Analytics And ReportingHw09   Analytics And Reporting
Hw09 Analytics And ReportingCloudera, Inc.
 
How Graph Databases used in Police Department?
How Graph Databases used in Police Department?How Graph Databases used in Police Department?
How Graph Databases used in Police Department?Samet KILICTAS
 
Learning from Computer Simulation to Tackle Real-World Problems
Learning from Computer Simulation to Tackle Real-World ProblemsLearning from Computer Simulation to Tackle Real-World Problems
Learning from Computer Simulation to Tackle Real-World ProblemsNAVER Engineering
 
(BDT203) From Zero to NoSQL Hero: Amazon DynamoDB Tutorial | AWS re:Invent 2014
(BDT203) From Zero to NoSQL Hero: Amazon DynamoDB Tutorial | AWS re:Invent 2014(BDT203) From Zero to NoSQL Hero: Amazon DynamoDB Tutorial | AWS re:Invent 2014
(BDT203) From Zero to NoSQL Hero: Amazon DynamoDB Tutorial | AWS re:Invent 2014Amazon Web Services
 
Collaborative Filtering 1: User-based CF
Collaborative Filtering 1: User-based CFCollaborative Filtering 1: User-based CF
Collaborative Filtering 1: User-based CFYusuke Yamamoto
 
Introduction to Text Mining and Visualization with Interactive Web Application
Introduction to Text Mining and Visualization with Interactive Web ApplicationIntroduction to Text Mining and Visualization with Interactive Web Application
Introduction to Text Mining and Visualization with Interactive Web ApplicationOlga Scrivner
 
Cloudera Movies Data Science Project On Big Data
Cloudera Movies Data Science Project On Big DataCloudera Movies Data Science Project On Big Data
Cloudera Movies Data Science Project On Big DataAbhishek M Shivalingaiah
 
Football graph - Neo4j and the Premier League
Football graph - Neo4j and the Premier LeagueFootball graph - Neo4j and the Premier League
Football graph - Neo4j and the Premier LeagueMark Needham
 
Image recognition applications and dataset preparation - DevFest Baghdad 2018
Image recognition applications and dataset preparation - DevFest Baghdad 2018Image recognition applications and dataset preparation - DevFest Baghdad 2018
Image recognition applications and dataset preparation - DevFest Baghdad 2018Husam Aamer
 
Tag Maps
Tag MapsTag Maps
Tag Mapsmor
 
MIT CSAIL HCI Seminar
MIT CSAIL HCI SeminarMIT CSAIL HCI Seminar
MIT CSAIL HCI Seminarmor
 
Network Analysis: People and Open Source Communities - LinuxCon Seattle and D...
Network Analysis: People and Open Source Communities - LinuxCon Seattle and D...Network Analysis: People and Open Source Communities - LinuxCon Seattle and D...
Network Analysis: People and Open Source Communities - LinuxCon Seattle and D...Dawn Foster
 
R language tutorial
R language tutorialR language tutorial
R language tutorialDavid Chiu
 
Introduction Machine Learning Syllabus
Introduction Machine Learning SyllabusIntroduction Machine Learning Syllabus
Introduction Machine Learning SyllabusAndres Mendez-Vazquez
 

Similaire à GDM 2011 Talk (20)

Palette Power: Enabling Visual Search through Colors
Palette Power: Enabling Visual Search through ColorsPalette Power: Enabling Visual Search through Colors
Palette Power: Enabling Visual Search through Colors
 
Deep Dive to Learning to Rank for Graph Search.pptx
Deep Dive to Learning to Rank for Graph Search.pptxDeep Dive to Learning to Rank for Graph Search.pptx
Deep Dive to Learning to Rank for Graph Search.pptx
 
Pollmaps - 2011 Esri UC Presentation
Pollmaps - 2011 Esri UC PresentationPollmaps - 2011 Esri UC Presentation
Pollmaps - 2011 Esri UC Presentation
 
Hw09 Analytics And Reporting
Hw09   Analytics And ReportingHw09   Analytics And Reporting
Hw09 Analytics And Reporting
 
How Graph Databases used in Police Department?
How Graph Databases used in Police Department?How Graph Databases used in Police Department?
How Graph Databases used in Police Department?
 
Slide4_Keypoints
Slide4_KeypointsSlide4_Keypoints
Slide4_Keypoints
 
Learning from Computer Simulation to Tackle Real-World Problems
Learning from Computer Simulation to Tackle Real-World ProblemsLearning from Computer Simulation to Tackle Real-World Problems
Learning from Computer Simulation to Tackle Real-World Problems
 
(BDT203) From Zero to NoSQL Hero: Amazon DynamoDB Tutorial | AWS re:Invent 2014
(BDT203) From Zero to NoSQL Hero: Amazon DynamoDB Tutorial | AWS re:Invent 2014(BDT203) From Zero to NoSQL Hero: Amazon DynamoDB Tutorial | AWS re:Invent 2014
(BDT203) From Zero to NoSQL Hero: Amazon DynamoDB Tutorial | AWS re:Invent 2014
 
Collaborative Filtering 1: User-based CF
Collaborative Filtering 1: User-based CFCollaborative Filtering 1: User-based CF
Collaborative Filtering 1: User-based CF
 
Introduction to Text Mining and Visualization with Interactive Web Application
Introduction to Text Mining and Visualization with Interactive Web ApplicationIntroduction to Text Mining and Visualization with Interactive Web Application
Introduction to Text Mining and Visualization with Interactive Web Application
 
BDACA1617s2 - Lecture3
BDACA1617s2 - Lecture3BDACA1617s2 - Lecture3
BDACA1617s2 - Lecture3
 
Cloudera Movies Data Science Project On Big Data
Cloudera Movies Data Science Project On Big DataCloudera Movies Data Science Project On Big Data
Cloudera Movies Data Science Project On Big Data
 
Football graph - Neo4j and the Premier League
Football graph - Neo4j and the Premier LeagueFootball graph - Neo4j and the Premier League
Football graph - Neo4j and the Premier League
 
Image recognition applications and dataset preparation - DevFest Baghdad 2018
Image recognition applications and dataset preparation - DevFest Baghdad 2018Image recognition applications and dataset preparation - DevFest Baghdad 2018
Image recognition applications and dataset preparation - DevFest Baghdad 2018
 
Tag Maps
Tag MapsTag Maps
Tag Maps
 
MIT CSAIL HCI Seminar
MIT CSAIL HCI SeminarMIT CSAIL HCI Seminar
MIT CSAIL HCI Seminar
 
Using Processing
Using ProcessingUsing Processing
Using Processing
 
Network Analysis: People and Open Source Communities - LinuxCon Seattle and D...
Network Analysis: People and Open Source Communities - LinuxCon Seattle and D...Network Analysis: People and Open Source Communities - LinuxCon Seattle and D...
Network Analysis: People and Open Source Communities - LinuxCon Seattle and D...
 
R language tutorial
R language tutorialR language tutorial
R language tutorial
 
Introduction Machine Learning Syllabus
Introduction Machine Learning SyllabusIntroduction Machine Learning Syllabus
Introduction Machine Learning Syllabus
 

Dernier

Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 

Dernier (20)

Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 

GDM 2011 Talk

  • 1. Horton: online query execution on large distributed graphs Sameh Elnikety Microsoft Research
  • 2. People Team Interns Mohamed Sarwat (UMN), Jiaqing Du (EPFL)
  • 3. Large Graphs Are Important Large graphs appear in many application domains Examples: social networks, knowledge representation, …
  • 4. Example : Social Network Scale LinkedIn 70 million users Facebook 500 million users 65 Billion photos Queries Find Alice’s friends Find Alice’s photos with friends Rich graph Types, attributes
  • 5. Objective: Management & Querying Manage and query large graphs online Today Expensive hardware (limited) Cluster of machines (ad-hoc) Tomorrow Growth area Analogy: provide system support Analogy: DBMS manages application data
  • 6. Key Design Decisions Interactive queries Graph stored in random access memory Graph too large for one machine Distributed problem Partitioning: slice graph into pieces Replication: for fault tolerance Long time to build graph Support updates
  • 7. Three Research Problems How to partition the graph Static and dynamic techniques E.g., Evolving set partitioning How to query the graph Data model & query language Execution engine Query optimizations How to update the graph Multi-version concurrency control E.g., Generalized Snapshot Isolation
  • 8.
  • 9.
  • 10. Replication Single Graph Server Replica 1 Load Balancer Replica 2 Replica 3
  • 11. Read Tx Replica 1 Load Balancer Replica 2 T Replica 3
  • 12. Update Tx Replica 1 Load Balancer Replica 2 T ws ws Replica 3
  • 13. Replication Middleware Tx A Replica 1 Tx A A  B  C durability OFF A  B  C Replication MW (global ordering) Tx B Replica 2 Tx B durability A  B  C A  B  C A  B  C durability OFF A  B  C Replica 3 Tx C Tx C A  B  C A  B  C durability OFF
  • 14. Partitioning – Key Ideas Objective Maximize temporal and spatial locality Static Linear time Evolving-set partitioning Streaming algorithm During loading Dynamic Incremental Affinity propagation
  • 15. Three Research Problems How to partition the graph Static and dynamic techniques E.g., Evolving set partitioning How to query the graph Data model & query language Execution engine Query optimizations How to update the graph Multi-version concurrency control E.g., Generalized Snapshot Isolation
  • 16. Data Model Node ID, type, attributes Edge Connects two nodes Direction, type, attributes App Horton
  • 17. Query Language Trade-off Expressiveness vs. efficiency Regular language reachability Query is a regular expression Sequence of node and edge predicates Example Alice’s photos Photo, tags, Alice Node: type=photo , edge: type=tags , node: type=person, name=Alice Result:matching paths
  • 18. Query Language - Operators Projection Alice’s photos Photo, tags, Alice Node: type=photo , edge: type=tags , node: type=person, name=Alice SELECT photoFROM photo, tags, Alice OR (Photo | video), tags, Alice Kleenestar Alice org chart Alice, (manages, person)*
  • 20. Example App: CodeBook - Queries Person, FileOwner>, TFSFile,FileOwner<, Person Person, DiscussionOwner>, Discussion, DiscussionOwner<, Person Person, WorkItemOwner>, TFSWorkItem, WorkItemOwner< ,Person Person, Manages<, Person, Manages>, Person Person, WorkItemOwner>, TFSWorkItem, Mentions>, TFSFile, Mentions>, TFSWorkItem, WorkItemOwner<, Person Person, WorkItemOwner>, TFSWorkItem, Mentions>, TFSFile, FileOwner<, Person Person, FileOwner>, TFSFile, Mentions>, TFSWorkItem, Mentions>, TFSFile,   FileOwner<, Person
  • 21. Query Language - Future Extensions From: path Sequence of predicates To: sub-graph Graph pattern Sub-graph isomorphism
  • 22.
  • 26.
  • 27. Centralized Query Execution Photo Tags Alice S2 S0 S1 S3 Alice, Tags, Photo Breadth First Search Answer Paths: Alice, Tags, Photo1Alice, Tags, Photo8
  • 29. Distributed Query Execution Alice, Tags, Photo, Tags, Hillary Partition 1 Partition 2
  • 30. Distributed Query Execution Alice, Tags, Photo, Tags, Hillary FSM Partition 1 Partition 2 S0 Partition 1 Step 1 Alice Alice S1 Tags S2 Step 2 Photo1 Photo8 Photo S3 Tags S4 Step 3 Hillary Hillary Partition 2 S5
  • 32. Graph Query Optimization Input Graph statistics + query plan Output Optimized query plan (execution sequence) Action Enumerate execution plans Evaluate their costs using graph statistics Find the plan with minimum cost
  • 33. Graph Query Optimization Techniques Predicate ordering Derived edges Fragment reuse
  • 34. Predicate Ordering Find Mike’s photo that is also tagged by at least one of his friends Mike, Tagged, Photo, Tagged, Person, FriendOf, Mike Execute left to right Execute right to left Different predicate orders can result in different execution costs.
  • 35. Predicate Ordering Find Mike’s photo that is also tagged by at least one of his friends Mike, Tagged, Photo, Tagged, Person, FriendOf, Mike Execute left to right
  • 36. Predicate Ordering Find Mike’s photo that is also tagged by at least one of his friends Mike, Tagged, Photo, Tagged, Person, FriendOf, Mike Execute left to right
  • 37. Predicate Ordering Find Mike’s photo that is also tagged by at least one of his friends Mike, Tagged, Photo, Tagged, Person, FriendOf, Mike Execute left to right
  • 38. Predicate Ordering Find Mike’s photo that is also tagged by at least one of his friends Mike, Tagged, Photo, Tagged, Person, FriendOf, Mike Execute left to right Total cost = 14
  • 39. Predicate Ordering Find Mike’s photo that is also tagged by at least one of his friends Mike, Tagged, Photo, Tagged, Person, FriendOf, Mike Execute left to right Execute right to left Different predicate orders can result in different execution costs. Total cost = 7 Total cost = 14
  • 40.
  • 41. Estimate their costs using graph statistics
  • 42.
  • 43. Cost Estimation using Graph Statistics Graph Statistics Left to right Mike, Tagged, Photo, Tagged, Person, FriendOf, Mike EstimatedCost = 1 [find Mike]
  • 44. Cost Estimation using Graph Statistics Graph Statistics Left to right Mike, Tagged, Photo, Tagged, Person, FriendOf, Mike EstimatedCost = 1 [find Mike] + (1* 2.2) [find Mike, Tagged, Photo]
  • 45. Cost Estimation using Graph Statistics Graph Statistics Left to right Mike, Tagged, Photo, Tagged, Person, FriendOf, Mike EstimatedCost = 1 [find Mike] + (1* 2.2) [find Mike, Tagged, Photo] + (2.2 * 1.6) [find Mike, Tagged, Photo, Tagged, Person] + (2.2 * 1.6 * 1.2) [find Mike, Tagged, Photo, Tagged, Person, FriendOf, Mike] = 11
  • 46. Plan Enumeration Find Mike’s photo that is also tagged by at least one of his friends Plan1 Mike, Tagged, Photo, Tagged, Person, FriendOf, Mike Plan2 Mike, FriendOf, Person, Tagged, Photo, Tagged, Mike Plan3 (Mike, FriendOf, Person) ⋈ (Person, Tagged, Photo, Tagged, Mike) Plan4 (Mike, FriendOf, Person, Tagged, Photo) ⋈ (Photo, Tagged, Mike) . . . . .
  • 47. Derived Edges Friends-photo-tag Mike, Tagged, Photo, Tagged, Person, FriendOf, Mike Cost = 7 Mike, Tagged, Photo, Friends-photo-tag, Mike Cost = 5
  • 48. Query Fragment Reuse Query1 Mike, Tagged, Photo Query2 Person, FriendOf, Mike Query3 Mike, Tagged, Photo, Tagged, Person, FriendOf, Mike
  • 49. Query Fragment Reuse Query1 Mike, Tagged, Photo Query2 Person, FriendOf, Mike Query3 Mike, Tagged, Photo, Tagged, Person, FriendOf, Mike Execute Query3 based on cached results of Query1 and Query2: [Query1], Tagged, Person, FriendOf, Mike [Query2], Tagged, Photo, Tagged, Mike [Query1]⋈[Photo, Tagged, Person] ⋈[Query2]
  • 50. Summary of Query Optimization Predicate ordering Derived edges Reuse of query fragments
  • 51. Experimental Evaluation Graph Codebook application 1 Million nodes 10 million edges Machines 7 machines Intel Core 2 Duo 2.26 GHz , 4 GB ram Graph partitioned on 6 machines
  • 53.