SlideShare une entreprise Scribd logo
1  sur  36
Télécharger pour lire hors ligne
The LDBC
Social Network Benchmark
Interactive Workload
http://www.ldbcouncil.org
The LDBC
Social Network Benchmark
Interactive Workload
http://www.ldbcouncil.org
The LDBC
Social Network Benchmark
Interactive Workload
http://www.ldbcouncil.org
The LDBC
Social Network Benchmark
Interactive Workload
http://www.ldbcouncil.org
The LDBC
Social Network Benchmark
Interactive Workload
http://www.ldbcouncil.org
The LDBC
Social Network Benchmark
Interactive Workload
Thomas Neumann, Linnea Passing
Renzo Angles (U. Talca)
Norbert Martinez
David Dominguez
Xavier Sanchez
SNB “Task Force” acknowledgements
http://www.ldbcouncil.org
LDBC Organization (non-profit)
“sponsors”
+ non-profit members (FORTH, STI2) & personal members
+ Task Forces, volunteers developing benchmarks
+ TUC: Technical User Community (6 workshops, 36 graph
and RDF user case studies, 12 vendor presentations)
Social Network Benchmark: schema
Social Network Benchmark: schema
Social Network Benchmark: schema
Social Network Benchmark: schema
Social Network Benchmark: schema
DATAGEN: social network generator
advanced generation of:
• network structure
– Power law distributions, small diameter
Friendship Degree Distribution
• Based on “Anatomy of Facebook” blogpost (2013)
• Diameter increases logarithmically with scale factor
– New:
function has been
made pluggable
DATAGEN: social network generator
advanced generation of:
• network structure
– Power law distributions, small diameter
• property values
– realistic, correlated value distributions
Realistic Correlated Value Distributions
• Person.firstname correlates
with Person.location
– Values taken from DBpedia
• Many other correlations
and dependencies..
– e.g. university depends on location
• In forum discussions, people read DBpedia articles to each other
(= correlation between message text and discussion topic)
– Topic = DBpedia article title
– Text = one sentence of the article
Person.location
=<Germany>
Person.location
=<China>
DATAGEN: social network generator
advanced generation of:
• network structure
– Power law distributions, small diameter
• property values
– realistic, correlated value distributions
– temporal correlations / “flash mobs”
Temporal Effects (Flash Mobs)
• Forum posts generation spikes in time for
certain topics:
DATAGEN: social network generator
advanced generation of:
• network structure
– Power law distributions, small diameter
• property values
– realistic, correlated value distributions
– temporal correlations / “flash mobs”
• correlations between values and structure
– 2 correlation “dimensions”: location & interests
“RMIT alumni stay in touch”
P4
P5
Student
“Anna”
<RMIT>
“Melbourne”
“1990”
P1
<RMIT>
“Laura”
“1990”
<AC/DC>
<AC/DC>
P3
<RMIT>
“1990”
P2
<CWI>
“Netherlands”
“Metalheads rock together”
DATAGEN: Scaling
• Scale Factor (SF) is the size of the CSV input data in GB
• Some Virtuoso SQL stats at SF=30:
DATAGEN: Graph Characteristics
Livejournal LFR3 (synthetic) LDBC DATAGEN
GRADES2014 “How community-like is the structure of synthetically generated
graphs” - Arnau Prat(DAMA-UPC); David Domínguez-Sal (Sparsity Technologies)
Benchmark Workloads
• Interactive: tests throughput running short queries
while consistently handling concurrent updates
– Show all photos posted by my friends that I was tagged in
• Business Intelligence: consists of complex structured
queries for analyzing online behavior
– Influential people the topic of open source development?
• Graph Analytics: tests the functionality and scalability
on most of the data as a single operation
– PageRank, Shortest Path(s), Community Detection
GRADES2015 “Graphalytics: A Big Data Benchmark for Graph-Processing
Platforms ” - Mihai Capota, Tim Hegeman, Alexandru Iosup(TU Delft); Arnau
Prat (UPC), Orri Erling (OpenLink Technologies), Peter Boncz (CWI)
Topic of this SIGMOD paper
Draft queries available on ldbcouncil.org website (deliverable D2.2.4) & github
Workloads by system
System Interactive Business Intelligence Graph Analytics
Graph databases Yes Yes Maybe
Graph programming
frameworks
- Yes Yes
RDF databases Yes Yes -
Relational databases Yes Yes
Maybe, by keeping
state in temporary
tables, and using the
functional features of
PL-SQL
NoSQL Key-value Maybe Maybe -
NoSQL MapReduce - Maybe Yes
Interactive Workload
MapReduce-base data generation
• Generate 3 years of network activity for a certain amount of persons
– 33 months of data  bulk load
– 3 months of data  insert queries
• Scalable (SF1000 in one hour on 10 small compute nodes)
– can also be used without a cluster (pseudo-distributed)
During data generation, we perform Parameter Curation to derive suitable
parameters for the complex-read-only query set
Database Benchmark Design
Desirable properties:
• Relevant.
• Representative.
• Understandable.
• Economical.
• Accepted.
• Scalable.
• Portable.
• Fair.
• Evolvable.
• Public.
Jim Gray (1991) The Benchmark Handbook for Database
and Transaction Processing Systems
Dina Bitton, David J. DeWitt, Carolyn Turbyfill (1993)
Benchmarking Database Systems: A Systematic Approach
Multiple TPCTC papers, e.g.
Karl Huppler (2009) The Art of Building a Good Benchmark
 “Choke Points”
Benchmark Design with Choke Points
Choke-Point = well-chosen difficulty in the workload
• “difficulties in the workloads”
– arise from Data (distribs)+Query+Workload
– there may be different technical solutions to
address the choke point
• or, there may not yet exist optimizations
lot’s of research opportunities!
TPCTC 2013: Erling,Boncz,Neumann www.cwi.nl/~boncz  Publications
“TPC-H Analyzed: Hidden Messages and Lessons Learned from an Influential Benchmark”
SNB Interactive Workload: Complex Read Query Set
Choke-Point: shortest paths
• compute weights over a recursive forum traversal
– on the fly, or
– materialized, but then maintain them under updates
• compute shortest paths using these weights in the
friends graph
SNB Interactive Workload: Complex Read Query Set
Choke-Point: outdegree correlation
• Travel is correlated with location
– People travel more often to nearby countries
• Outdegree after (countryX,countryY) selection varies a lot
– (Australia,NZ): high outdegree (“join hit ratio”)
– (Australia,Belgium): low outdegree  different query plan,
or navigation strategy likely wins
Parameter Curation
• Example: Q3
– Problem: value correlations cause very large variance
– Solution: data mine for stable parameter equivalence classes
TPCTC2014 “Parameter Curation for Benchmark
Queries” Andrey Gubichev (TUM) & Peter Boncz (CWI)
Query Mix & Metric
Query Mix
• Insert queries (~10% of time):
 challenge: execute parallel but respect data dependencies in the graph
• Read-only Complex Queries (~50% of time)
 challenge: generate query parameters with stable query behavior
Parameter Curation to find “equivalence classes” in parameters
• Simple Read-only Queries (~40% of time)
– Retrieve Post / Retrieve Person Profile
Metric
• Acceleration Factor (AF) that can be sustained (+ AF/$ weighted by cost)
– with 99th percentile of query latency within maximal query time
SNB Query Driver
• Dependency-aware parallel query generation
– Problem: friends graph is non-partitionable, but imposes
ordering constraints.
Could cause large checking overhead, impeding driver
parallelism.
– Solution: Window-based checking approach for keeping driver
threads roughly synchronized on a global timestamp.
Is helped by DATAGEN properties that ensure there is a minimal
latency between certain dependencies (e.g. entering the network
and making friends, or posting on a new friend’s forum). This
minimal latency provides synchronization headroom.
Summary
• LDBC
– Graph and RDF benchmark council
– Choke-point driven benchmark design (user+system expert involvement)
• Social Network Benchmark
– Advanced social network generator
• skewed distributions, power laws, value/structure correlations, flash mobs
– 3 workloads: Interactive (focus of this paper), BI, Analytics
• Interactive Query Mix & Metrics
• Parallel Query Driver that respects dependencies efficiently
• Parameter Curation for stable results
7th LDBC Technical User Community meeting
November 9+10 2015, IBM TJ Watson (NJ)
Questions?
http://www.ldbcouncil.org http://github.com/ldbc
Blogs
Specifications
Early Result FDRs
Videos of TUC talks
Developer info
Code, Issue Tracking

Contenu connexe

En vedette

FOSDEM2014 - Social Network Benchmark (SNB) Graph Generator - Peter Boncz
FOSDEM2014 - Social Network Benchmark (SNB) Graph Generator - Peter BonczFOSDEM2014 - Social Network Benchmark (SNB) Graph Generator - Peter Boncz
FOSDEM2014 - Social Network Benchmark (SNB) Graph Generator - Peter BonczIoan Toma
 
Aloha Social Networking Portal - Design Document
Aloha Social Networking Portal - Design DocumentAloha Social Networking Portal - Design Document
Aloha Social Networking Portal - Design DocumentMilind Gokhale
 
Francisco José Contreras Peláez, Defensa del Estado social. Resumen y reseña.
Francisco José Contreras Peláez, Defensa del Estado social. Resumen y reseña.Francisco José Contreras Peláez, Defensa del Estado social. Resumen y reseña.
Francisco José Contreras Peláez, Defensa del Estado social. Resumen y reseña.Marcelo Vásconez Carrasco
 
OEG tools for supporting Ontology Engineering
OEG tools for supporting Ontology EngineeringOEG tools for supporting Ontology Engineering
OEG tools for supporting Ontology Engineeringdgarijo
 
Plan de desarrollo itagui 2008 2011
Plan de desarrollo itagui 2008 2011Plan de desarrollo itagui 2008 2011
Plan de desarrollo itagui 2008 2011edgarmesa
 
Jobcast sales presentation
Jobcast sales presentationJobcast sales presentation
Jobcast sales presentationSam Parker
 
Joseph Blatter, FIFA re: Emirates Sponsorship
Joseph Blatter, FIFA re: Emirates SponsorshipJoseph Blatter, FIFA re: Emirates Sponsorship
Joseph Blatter, FIFA re: Emirates SponsorshipCityHallLabour
 
MOPA Formation Boutique 2014, Bordeaux
MOPA Formation Boutique 2014, BordeauxMOPA Formation Boutique 2014, Bordeaux
MOPA Formation Boutique 2014, BordeauxMONA
 
Apresentação Institucional Mgi Tecnogin
Apresentação Institucional Mgi TecnoginApresentação Institucional Mgi Tecnogin
Apresentação Institucional Mgi TecnoginGabriella Portilho
 
Plannet Esolutions Limited Profile.
Plannet Esolutions Limited Profile.Plannet Esolutions Limited Profile.
Plannet Esolutions Limited Profile.Divyansh Batra
 
Análisis cómic actividad 4
Análisis cómic actividad 4Análisis cómic actividad 4
Análisis cómic actividad 4Jaime Sánchez
 
Jornades de Prevenció d'Accidents "in itinere"
Jornades de Prevenció d'Accidents "in itinere"Jornades de Prevenció d'Accidents "in itinere"
Jornades de Prevenció d'Accidents "in itinere"TMB
 
Organizational Road maps for Institutional Online Learning production and del...
Organizational Road maps for Institutional Online Learning production and del...Organizational Road maps for Institutional Online Learning production and del...
Organizational Road maps for Institutional Online Learning production and del...Dr. Patricio Montesinos
 
Computacion Afectiva, Aplicacion Educativa para TVDI - Sandra Baldassarri
Computacion Afectiva, Aplicacion Educativa para TVDI - Sandra BaldassarriComputacion Afectiva, Aplicacion Educativa para TVDI - Sandra Baldassarri
Computacion Afectiva, Aplicacion Educativa para TVDI - Sandra BaldassarriRed Auti
 

En vedette (20)

FOSDEM2014 - Social Network Benchmark (SNB) Graph Generator - Peter Boncz
FOSDEM2014 - Social Network Benchmark (SNB) Graph Generator - Peter BonczFOSDEM2014 - Social Network Benchmark (SNB) Graph Generator - Peter Boncz
FOSDEM2014 - Social Network Benchmark (SNB) Graph Generator - Peter Boncz
 
Aloha Social Networking Portal - Design Document
Aloha Social Networking Portal - Design DocumentAloha Social Networking Portal - Design Document
Aloha Social Networking Portal - Design Document
 
Francisco José Contreras Peláez, Defensa del Estado social. Resumen y reseña.
Francisco José Contreras Peláez, Defensa del Estado social. Resumen y reseña.Francisco José Contreras Peláez, Defensa del Estado social. Resumen y reseña.
Francisco José Contreras Peláez, Defensa del Estado social. Resumen y reseña.
 
OEG tools for supporting Ontology Engineering
OEG tools for supporting Ontology EngineeringOEG tools for supporting Ontology Engineering
OEG tools for supporting Ontology Engineering
 
Pro plastic 100_rs
Pro plastic 100_rsPro plastic 100_rs
Pro plastic 100_rs
 
Plan de desarrollo itagui 2008 2011
Plan de desarrollo itagui 2008 2011Plan de desarrollo itagui 2008 2011
Plan de desarrollo itagui 2008 2011
 
torres famosas
torres famosas torres famosas
torres famosas
 
Jobcast sales presentation
Jobcast sales presentationJobcast sales presentation
Jobcast sales presentation
 
Joseph Blatter, FIFA re: Emirates Sponsorship
Joseph Blatter, FIFA re: Emirates SponsorshipJoseph Blatter, FIFA re: Emirates Sponsorship
Joseph Blatter, FIFA re: Emirates Sponsorship
 
Seagro informa 02 2015 SEAGRO abre sus puertas en el Corazón de Honduras
Seagro informa 02 2015 SEAGRO abre sus puertas en el Corazón de HondurasSeagro informa 02 2015 SEAGRO abre sus puertas en el Corazón de Honduras
Seagro informa 02 2015 SEAGRO abre sus puertas en el Corazón de Honduras
 
Examenes de eso
Examenes de esoExamenes de eso
Examenes de eso
 
MOPA Formation Boutique 2014, Bordeaux
MOPA Formation Boutique 2014, BordeauxMOPA Formation Boutique 2014, Bordeaux
MOPA Formation Boutique 2014, Bordeaux
 
Apresentação Institucional Mgi Tecnogin
Apresentação Institucional Mgi TecnoginApresentação Institucional Mgi Tecnogin
Apresentação Institucional Mgi Tecnogin
 
Valoracion del liceo jarv
Valoracion del liceo jarvValoracion del liceo jarv
Valoracion del liceo jarv
 
Plannet Esolutions Limited Profile.
Plannet Esolutions Limited Profile.Plannet Esolutions Limited Profile.
Plannet Esolutions Limited Profile.
 
Análisis cómic actividad 4
Análisis cómic actividad 4Análisis cómic actividad 4
Análisis cómic actividad 4
 
Sketchnote o Dibupunte
Sketchnote o DibupunteSketchnote o Dibupunte
Sketchnote o Dibupunte
 
Jornades de Prevenció d'Accidents "in itinere"
Jornades de Prevenció d'Accidents "in itinere"Jornades de Prevenció d'Accidents "in itinere"
Jornades de Prevenció d'Accidents "in itinere"
 
Organizational Road maps for Institutional Online Learning production and del...
Organizational Road maps for Institutional Online Learning production and del...Organizational Road maps for Institutional Online Learning production and del...
Organizational Road maps for Institutional Online Learning production and del...
 
Computacion Afectiva, Aplicacion Educativa para TVDI - Sandra Baldassarri
Computacion Afectiva, Aplicacion Educativa para TVDI - Sandra BaldassarriComputacion Afectiva, Aplicacion Educativa para TVDI - Sandra Baldassarri
Computacion Afectiva, Aplicacion Educativa para TVDI - Sandra Baldassarri
 

Similaire à The LDBC Social Network Benchmark Interactive Workload - SIGMOD 2015

LDBC 8th TUC Meeting: Introduction and status update
LDBC 8th TUC Meeting: Introduction and status updateLDBC 8th TUC Meeting: Introduction and status update
LDBC 8th TUC Meeting: Introduction and status updateLDBC council
 
FOSDEM 2014: Social Network Benchmark (SNB) Graph Generator
FOSDEM 2014:  Social Network Benchmark (SNB) Graph GeneratorFOSDEM 2014:  Social Network Benchmark (SNB) Graph Generator
FOSDEM 2014: Social Network Benchmark (SNB) Graph GeneratorLDBC council
 
A machine learning and data science pipeline for real companies
A machine learning and data science pipeline for real companiesA machine learning and data science pipeline for real companies
A machine learning and data science pipeline for real companiesDataWorks Summit
 
Big learning 1.2
Big learning   1.2Big learning   1.2
Big learning 1.2Mohit Garg
 
Keynote IDEAS 2013 - Peter Boncz
Keynote IDEAS 2013 - Peter BonczKeynote IDEAS 2013 - Peter Boncz
Keynote IDEAS 2013 - Peter BonczLDBC council
 
Keynote IDEAS2013 - Peter Boncz
Keynote IDEAS2013 - Peter BonczKeynote IDEAS2013 - Peter Boncz
Keynote IDEAS2013 - Peter BonczIoan Toma
 
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...confluent
 
Connecting the dots mbse process dec02 2015
Connecting the dots mbse process dec02 2015Connecting the dots mbse process dec02 2015
Connecting the dots mbse process dec02 2015loydbakerjr
 
Government GraphSummit: And Then There Were 15 Standards
Government GraphSummit: And Then There Were 15 StandardsGovernment GraphSummit: And Then There Were 15 Standards
Government GraphSummit: And Then There Were 15 StandardsNeo4j
 
Southwickc lampert lodlam_training
Southwickc lampert lodlam_trainingSouthwickc lampert lodlam_training
Southwickc lampert lodlam_trainingssouthwick
 
The Analytics Frontier of the Hadoop Eco-System
The Analytics Frontier of the Hadoop Eco-SystemThe Analytics Frontier of the Hadoop Eco-System
The Analytics Frontier of the Hadoop Eco-Systeminside-BigData.com
 
Big Data Processing Beyond MapReduce by Dr. Flavio Villanustre
Big Data Processing Beyond MapReduce by Dr. Flavio VillanustreBig Data Processing Beyond MapReduce by Dr. Flavio Villanustre
Big Data Processing Beyond MapReduce by Dr. Flavio VillanustreHPCC Systems
 
C19013010 the tutorial to build shared ai services session 1
C19013010  the tutorial to build shared ai services session 1C19013010  the tutorial to build shared ai services session 1
C19013010 the tutorial to build shared ai services session 1Bill Liu
 
Overview of the TREC 2019 Deep Learning Track
Overview of the TREC 2019 Deep Learning TrackOverview of the TREC 2019 Deep Learning Track
Overview of the TREC 2019 Deep Learning TrackNick Craswell
 
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...Ilkay Altintas, Ph.D.
 
Week 2 - Database System Development Lifecycle-old.pptx
Week 2 - Database System Development Lifecycle-old.pptxWeek 2 - Database System Development Lifecycle-old.pptx
Week 2 - Database System Development Lifecycle-old.pptxNurulIzrin
 
IT6701 Information Management - Unit I
IT6701 Information Management - Unit I  IT6701 Information Management - Unit I
IT6701 Information Management - Unit I pkaviya
 
Large scale computing
Large scale computing Large scale computing
Large scale computing Bhupesh Bansal
 

Similaire à The LDBC Social Network Benchmark Interactive Workload - SIGMOD 2015 (20)

LDBC 8th TUC Meeting: Introduction and status update
LDBC 8th TUC Meeting: Introduction and status updateLDBC 8th TUC Meeting: Introduction and status update
LDBC 8th TUC Meeting: Introduction and status update
 
FOSDEM 2014: Social Network Benchmark (SNB) Graph Generator
FOSDEM 2014:  Social Network Benchmark (SNB) Graph GeneratorFOSDEM 2014:  Social Network Benchmark (SNB) Graph Generator
FOSDEM 2014: Social Network Benchmark (SNB) Graph Generator
 
A machine learning and data science pipeline for real companies
A machine learning and data science pipeline for real companiesA machine learning and data science pipeline for real companies
A machine learning and data science pipeline for real companies
 
Big learning 1.2
Big learning   1.2Big learning   1.2
Big learning 1.2
 
Keynote IDEAS 2013 - Peter Boncz
Keynote IDEAS 2013 - Peter BonczKeynote IDEAS 2013 - Peter Boncz
Keynote IDEAS 2013 - Peter Boncz
 
Keynote IDEAS2013 - Peter Boncz
Keynote IDEAS2013 - Peter BonczKeynote IDEAS2013 - Peter Boncz
Keynote IDEAS2013 - Peter Boncz
 
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
 
Connecting the dots mbse process dec02 2015
Connecting the dots mbse process dec02 2015Connecting the dots mbse process dec02 2015
Connecting the dots mbse process dec02 2015
 
Government GraphSummit: And Then There Were 15 Standards
Government GraphSummit: And Then There Were 15 StandardsGovernment GraphSummit: And Then There Were 15 Standards
Government GraphSummit: And Then There Were 15 Standards
 
Southwickc lampert lodlam_training
Southwickc lampert lodlam_trainingSouthwickc lampert lodlam_training
Southwickc lampert lodlam_training
 
The Analytics Frontier of the Hadoop Eco-System
The Analytics Frontier of the Hadoop Eco-SystemThe Analytics Frontier of the Hadoop Eco-System
The Analytics Frontier of the Hadoop Eco-System
 
Big Data Processing Beyond MapReduce by Dr. Flavio Villanustre
Big Data Processing Beyond MapReduce by Dr. Flavio VillanustreBig Data Processing Beyond MapReduce by Dr. Flavio Villanustre
Big Data Processing Beyond MapReduce by Dr. Flavio Villanustre
 
C19013010 the tutorial to build shared ai services session 1
C19013010  the tutorial to build shared ai services session 1C19013010  the tutorial to build shared ai services session 1
C19013010 the tutorial to build shared ai services session 1
 
Overview of the TREC 2019 Deep Learning Track
Overview of the TREC 2019 Deep Learning TrackOverview of the TREC 2019 Deep Learning Track
Overview of the TREC 2019 Deep Learning Track
 
Dev Ops Training
Dev Ops TrainingDev Ops Training
Dev Ops Training
 
Bertenthal
BertenthalBertenthal
Bertenthal
 
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
 
Week 2 - Database System Development Lifecycle-old.pptx
Week 2 - Database System Development Lifecycle-old.pptxWeek 2 - Database System Development Lifecycle-old.pptx
Week 2 - Database System Development Lifecycle-old.pptx
 
IT6701 Information Management - Unit I
IT6701 Information Management - Unit I  IT6701 Information Management - Unit I
IT6701 Information Management - Unit I
 
Large scale computing
Large scale computing Large scale computing
Large scale computing
 

Plus de Ioan Toma

LDBC 6th TUC Meeting conclusions by Peter Boncz
LDBC 6th TUC Meeting conclusions by Peter BonczLDBC 6th TUC Meeting conclusions by Peter Boncz
LDBC 6th TUC Meeting conclusions by Peter BonczIoan Toma
 
Parallel and incremental materialisation of RDF/DATALOG in RDFOX
Parallel and incremental materialisation of RDF/DATALOG in RDFOXParallel and incremental materialisation of RDF/DATALOG in RDFOX
Parallel and incremental materialisation of RDF/DATALOG in RDFOXIoan Toma
 
MODAClouds Decision Support System for Cloud Service Selection
MODAClouds Decision Support System for Cloud Service SelectionMODAClouds Decision Support System for Cloud Service Selection
MODAClouds Decision Support System for Cloud Service SelectionIoan Toma
 
E-Commerce and Graph-driven Applications: Experiences and Optimizations while...
E-Commerce and Graph-driven Applications: Experiences and Optimizations while...E-Commerce and Graph-driven Applications: Experiences and Optimizations while...
E-Commerce and Graph-driven Applications: Experiences and Optimizations while...Ioan Toma
 
LDBC SNB Benchmark Auditing
LDBC SNB Benchmark AuditingLDBC SNB Benchmark Auditing
LDBC SNB Benchmark AuditingIoan Toma
 
Social Network Benchmark Interactive Workload
Social Network Benchmark Interactive WorkloadSocial Network Benchmark Interactive Workload
Social Network Benchmark Interactive WorkloadIoan Toma
 
MarkLogic Overview and Use Cases
MarkLogic Overview and Use CasesMarkLogic Overview and Use Cases
MarkLogic Overview and Use CasesIoan Toma
 
Towards Temporal Graph Management and Analytics
Towards Temporal Graph Management and AnalyticsTowards Temporal Graph Management and Analytics
Towards Temporal Graph Management and AnalyticsIoan Toma
 
Querying the Wikidata Knowledge Graph
Querying the Wikidata Knowledge GraphQuerying the Wikidata Knowledge Graph
Querying the Wikidata Knowledge GraphIoan Toma
 
SADI: A design-pattern for “native” Linked-Data Semantic Web Services
SADI: A design-pattern for “native” Linked-Data Semantic Web ServicesSADI: A design-pattern for “native” Linked-Data Semantic Web Services
SADI: A design-pattern for “native” Linked-Data Semantic Web ServicesIoan Toma
 
20 billion triples in production
20 billion triples in production20 billion triples in production
20 billion triples in productionIoan Toma
 
Lighthouse: Large-scale graph pattern matching on Giraph
Lighthouse: Large-scale graph pattern matching on GiraphLighthouse: Large-scale graph pattern matching on Giraph
Lighthouse: Large-scale graph pattern matching on GiraphIoan Toma
 
HP Labs: Titan DB on LDBC SNB interactive by Tomer Sagi (HP)
HP Labs: Titan DB on LDBC SNB interactive by Tomer Sagi (HP)HP Labs: Titan DB on LDBC SNB interactive by Tomer Sagi (HP)
HP Labs: Titan DB on LDBC SNB interactive by Tomer Sagi (HP)Ioan Toma
 
SPIMBENCH: A scalable, Schema-Aware Instance Matching Benchmark for the Seman...
SPIMBENCH: A scalable, Schema-Aware Instance Matching Benchmark for the Seman...SPIMBENCH: A scalable, Schema-Aware Instance Matching Benchmark for the Seman...
SPIMBENCH: A scalable, Schema-Aware Instance Matching Benchmark for the Seman...Ioan Toma
 
Ldbc spb 2.0 evolution
Ldbc spb 2.0 evolutionLdbc spb 2.0 evolution
Ldbc spb 2.0 evolutionIoan Toma
 
GRAPH-TA 2013 - RDF and Graph benchmarking - Jose Lluis Larriba Pey
GRAPH-TA 2013 - RDF and Graph benchmarking - Jose Lluis Larriba PeyGRAPH-TA 2013 - RDF and Graph benchmarking - Jose Lluis Larriba Pey
GRAPH-TA 2013 - RDF and Graph benchmarking - Jose Lluis Larriba PeyIoan Toma
 

Plus de Ioan Toma (16)

LDBC 6th TUC Meeting conclusions by Peter Boncz
LDBC 6th TUC Meeting conclusions by Peter BonczLDBC 6th TUC Meeting conclusions by Peter Boncz
LDBC 6th TUC Meeting conclusions by Peter Boncz
 
Parallel and incremental materialisation of RDF/DATALOG in RDFOX
Parallel and incremental materialisation of RDF/DATALOG in RDFOXParallel and incremental materialisation of RDF/DATALOG in RDFOX
Parallel and incremental materialisation of RDF/DATALOG in RDFOX
 
MODAClouds Decision Support System for Cloud Service Selection
MODAClouds Decision Support System for Cloud Service SelectionMODAClouds Decision Support System for Cloud Service Selection
MODAClouds Decision Support System for Cloud Service Selection
 
E-Commerce and Graph-driven Applications: Experiences and Optimizations while...
E-Commerce and Graph-driven Applications: Experiences and Optimizations while...E-Commerce and Graph-driven Applications: Experiences and Optimizations while...
E-Commerce and Graph-driven Applications: Experiences and Optimizations while...
 
LDBC SNB Benchmark Auditing
LDBC SNB Benchmark AuditingLDBC SNB Benchmark Auditing
LDBC SNB Benchmark Auditing
 
Social Network Benchmark Interactive Workload
Social Network Benchmark Interactive WorkloadSocial Network Benchmark Interactive Workload
Social Network Benchmark Interactive Workload
 
MarkLogic Overview and Use Cases
MarkLogic Overview and Use CasesMarkLogic Overview and Use Cases
MarkLogic Overview and Use Cases
 
Towards Temporal Graph Management and Analytics
Towards Temporal Graph Management and AnalyticsTowards Temporal Graph Management and Analytics
Towards Temporal Graph Management and Analytics
 
Querying the Wikidata Knowledge Graph
Querying the Wikidata Knowledge GraphQuerying the Wikidata Knowledge Graph
Querying the Wikidata Knowledge Graph
 
SADI: A design-pattern for “native” Linked-Data Semantic Web Services
SADI: A design-pattern for “native” Linked-Data Semantic Web ServicesSADI: A design-pattern for “native” Linked-Data Semantic Web Services
SADI: A design-pattern for “native” Linked-Data Semantic Web Services
 
20 billion triples in production
20 billion triples in production20 billion triples in production
20 billion triples in production
 
Lighthouse: Large-scale graph pattern matching on Giraph
Lighthouse: Large-scale graph pattern matching on GiraphLighthouse: Large-scale graph pattern matching on Giraph
Lighthouse: Large-scale graph pattern matching on Giraph
 
HP Labs: Titan DB on LDBC SNB interactive by Tomer Sagi (HP)
HP Labs: Titan DB on LDBC SNB interactive by Tomer Sagi (HP)HP Labs: Titan DB on LDBC SNB interactive by Tomer Sagi (HP)
HP Labs: Titan DB on LDBC SNB interactive by Tomer Sagi (HP)
 
SPIMBENCH: A scalable, Schema-Aware Instance Matching Benchmark for the Seman...
SPIMBENCH: A scalable, Schema-Aware Instance Matching Benchmark for the Seman...SPIMBENCH: A scalable, Schema-Aware Instance Matching Benchmark for the Seman...
SPIMBENCH: A scalable, Schema-Aware Instance Matching Benchmark for the Seman...
 
Ldbc spb 2.0 evolution
Ldbc spb 2.0 evolutionLdbc spb 2.0 evolution
Ldbc spb 2.0 evolution
 
GRAPH-TA 2013 - RDF and Graph benchmarking - Jose Lluis Larriba Pey
GRAPH-TA 2013 - RDF and Graph benchmarking - Jose Lluis Larriba PeyGRAPH-TA 2013 - RDF and Graph benchmarking - Jose Lluis Larriba Pey
GRAPH-TA 2013 - RDF and Graph benchmarking - Jose Lluis Larriba Pey
 

Dernier

Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 

Dernier (20)

Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 

The LDBC Social Network Benchmark Interactive Workload - SIGMOD 2015

  • 1. The LDBC Social Network Benchmark Interactive Workload http://www.ldbcouncil.org
  • 2. The LDBC Social Network Benchmark Interactive Workload http://www.ldbcouncil.org
  • 3. The LDBC Social Network Benchmark Interactive Workload http://www.ldbcouncil.org
  • 4. The LDBC Social Network Benchmark Interactive Workload http://www.ldbcouncil.org
  • 5. The LDBC Social Network Benchmark Interactive Workload http://www.ldbcouncil.org
  • 6. The LDBC Social Network Benchmark Interactive Workload Thomas Neumann, Linnea Passing Renzo Angles (U. Talca) Norbert Martinez David Dominguez Xavier Sanchez SNB “Task Force” acknowledgements http://www.ldbcouncil.org
  • 7. LDBC Organization (non-profit) “sponsors” + non-profit members (FORTH, STI2) & personal members + Task Forces, volunteers developing benchmarks + TUC: Technical User Community (6 workshops, 36 graph and RDF user case studies, 12 vendor presentations)
  • 13. DATAGEN: social network generator advanced generation of: • network structure – Power law distributions, small diameter
  • 14. Friendship Degree Distribution • Based on “Anatomy of Facebook” blogpost (2013) • Diameter increases logarithmically with scale factor – New: function has been made pluggable
  • 15. DATAGEN: social network generator advanced generation of: • network structure – Power law distributions, small diameter • property values – realistic, correlated value distributions
  • 16. Realistic Correlated Value Distributions • Person.firstname correlates with Person.location – Values taken from DBpedia • Many other correlations and dependencies.. – e.g. university depends on location • In forum discussions, people read DBpedia articles to each other (= correlation between message text and discussion topic) – Topic = DBpedia article title – Text = one sentence of the article Person.location =<Germany> Person.location =<China>
  • 17. DATAGEN: social network generator advanced generation of: • network structure – Power law distributions, small diameter • property values – realistic, correlated value distributions – temporal correlations / “flash mobs”
  • 18. Temporal Effects (Flash Mobs) • Forum posts generation spikes in time for certain topics:
  • 19. DATAGEN: social network generator advanced generation of: • network structure – Power law distributions, small diameter • property values – realistic, correlated value distributions – temporal correlations / “flash mobs” • correlations between values and structure – 2 correlation “dimensions”: location & interests
  • 20. “RMIT alumni stay in touch” P4 P5 Student “Anna” <RMIT> “Melbourne” “1990” P1 <RMIT> “Laura” “1990” <AC/DC> <AC/DC> P3 <RMIT> “1990” P2 <CWI> “Netherlands” “Metalheads rock together”
  • 21. DATAGEN: Scaling • Scale Factor (SF) is the size of the CSV input data in GB • Some Virtuoso SQL stats at SF=30:
  • 22. DATAGEN: Graph Characteristics Livejournal LFR3 (synthetic) LDBC DATAGEN GRADES2014 “How community-like is the structure of synthetically generated graphs” - Arnau Prat(DAMA-UPC); David Domínguez-Sal (Sparsity Technologies)
  • 23. Benchmark Workloads • Interactive: tests throughput running short queries while consistently handling concurrent updates – Show all photos posted by my friends that I was tagged in • Business Intelligence: consists of complex structured queries for analyzing online behavior – Influential people the topic of open source development? • Graph Analytics: tests the functionality and scalability on most of the data as a single operation – PageRank, Shortest Path(s), Community Detection GRADES2015 “Graphalytics: A Big Data Benchmark for Graph-Processing Platforms ” - Mihai Capota, Tim Hegeman, Alexandru Iosup(TU Delft); Arnau Prat (UPC), Orri Erling (OpenLink Technologies), Peter Boncz (CWI) Topic of this SIGMOD paper Draft queries available on ldbcouncil.org website (deliverable D2.2.4) & github
  • 24. Workloads by system System Interactive Business Intelligence Graph Analytics Graph databases Yes Yes Maybe Graph programming frameworks - Yes Yes RDF databases Yes Yes - Relational databases Yes Yes Maybe, by keeping state in temporary tables, and using the functional features of PL-SQL NoSQL Key-value Maybe Maybe - NoSQL MapReduce - Maybe Yes
  • 25. Interactive Workload MapReduce-base data generation • Generate 3 years of network activity for a certain amount of persons – 33 months of data  bulk load – 3 months of data  insert queries • Scalable (SF1000 in one hour on 10 small compute nodes) – can also be used without a cluster (pseudo-distributed) During data generation, we perform Parameter Curation to derive suitable parameters for the complex-read-only query set
  • 26. Database Benchmark Design Desirable properties: • Relevant. • Representative. • Understandable. • Economical. • Accepted. • Scalable. • Portable. • Fair. • Evolvable. • Public. Jim Gray (1991) The Benchmark Handbook for Database and Transaction Processing Systems Dina Bitton, David J. DeWitt, Carolyn Turbyfill (1993) Benchmarking Database Systems: A Systematic Approach Multiple TPCTC papers, e.g. Karl Huppler (2009) The Art of Building a Good Benchmark  “Choke Points”
  • 27. Benchmark Design with Choke Points Choke-Point = well-chosen difficulty in the workload • “difficulties in the workloads” – arise from Data (distribs)+Query+Workload – there may be different technical solutions to address the choke point • or, there may not yet exist optimizations lot’s of research opportunities! TPCTC 2013: Erling,Boncz,Neumann www.cwi.nl/~boncz  Publications “TPC-H Analyzed: Hidden Messages and Lessons Learned from an Influential Benchmark”
  • 28. SNB Interactive Workload: Complex Read Query Set
  • 29. Choke-Point: shortest paths • compute weights over a recursive forum traversal – on the fly, or – materialized, but then maintain them under updates • compute shortest paths using these weights in the friends graph
  • 30. SNB Interactive Workload: Complex Read Query Set
  • 31. Choke-Point: outdegree correlation • Travel is correlated with location – People travel more often to nearby countries • Outdegree after (countryX,countryY) selection varies a lot – (Australia,NZ): high outdegree (“join hit ratio”) – (Australia,Belgium): low outdegree  different query plan, or navigation strategy likely wins
  • 32. Parameter Curation • Example: Q3 – Problem: value correlations cause very large variance – Solution: data mine for stable parameter equivalence classes TPCTC2014 “Parameter Curation for Benchmark Queries” Andrey Gubichev (TUM) & Peter Boncz (CWI)
  • 33. Query Mix & Metric Query Mix • Insert queries (~10% of time):  challenge: execute parallel but respect data dependencies in the graph • Read-only Complex Queries (~50% of time)  challenge: generate query parameters with stable query behavior Parameter Curation to find “equivalence classes” in parameters • Simple Read-only Queries (~40% of time) – Retrieve Post / Retrieve Person Profile Metric • Acceleration Factor (AF) that can be sustained (+ AF/$ weighted by cost) – with 99th percentile of query latency within maximal query time
  • 34. SNB Query Driver • Dependency-aware parallel query generation – Problem: friends graph is non-partitionable, but imposes ordering constraints. Could cause large checking overhead, impeding driver parallelism. – Solution: Window-based checking approach for keeping driver threads roughly synchronized on a global timestamp. Is helped by DATAGEN properties that ensure there is a minimal latency between certain dependencies (e.g. entering the network and making friends, or posting on a new friend’s forum). This minimal latency provides synchronization headroom.
  • 35. Summary • LDBC – Graph and RDF benchmark council – Choke-point driven benchmark design (user+system expert involvement) • Social Network Benchmark – Advanced social network generator • skewed distributions, power laws, value/structure correlations, flash mobs – 3 workloads: Interactive (focus of this paper), BI, Analytics • Interactive Query Mix & Metrics • Parallel Query Driver that respects dependencies efficiently • Parameter Curation for stable results 7th LDBC Technical User Community meeting November 9+10 2015, IBM TJ Watson (NJ)
  • 36. Questions? http://www.ldbcouncil.org http://github.com/ldbc Blogs Specifications Early Result FDRs Videos of TUC talks Developer info Code, Issue Tracking