SIB . 23.03.2011 . Page 1                         http://lod2.eu




WP2
Storing and Querying
Very Large Knowledge Bases
                             Vienna Update
                             March 2012 – M18

                             Peter Boncz


                                                http://lod2.eu




 Table of Contents

 • WP2 Refresher
 • LOD Cloud Hosted on the Knowledge Store Cluster
    * 50B mark reached, column-store Virtuoso deployed
 • State of the Art LOD Laboratory (“Benchmarking”)
    * LDBC – RDF Store Industry council
    * BSBM at large scale
    * RDF-H + Social Intelligence Benchmark (SIB)
 • Technical work
    * column-store Virtuoso → cluster version
    * recycling query results
 • Next up
    * LOD cloud @ 250B triples
    * Virtuoso: adaptive query optimizer (and more)
    * first MonetDB/SPARQL version (RDF clustering, graph indexing)




 WP2 Organization

 CWI (MonetDB):
 • Peter Boncz (also in VUA group of Frank v Harmelen)
 • Duc Pham Minh (PhD student)
 • Irini Fundulaki (1-year sabbatical from FORTH)

 OpenLink (Virtuoso):
 • Orri Erling
 • Hugh Williams
 • Ivan Mikhailov

 + FU Berlin (BSBM)
 + DERI (BSBM text + LOD cloud + text retrieval/Sindice)
 + ULEI (DBpedia benchmark)


      WP2
      Storing and Querying Very Large Knowledge Bases

Goal: enabling large-scale, feature-rich & enterprise-ready Linked
  Data management solutions

Database Partners in LOD2:
CWI: Leading open source analytics RDBMS
OpenLink: Leading Linked Data deployment platform

Technological Excellence:
Creating and publishing metrics for choosing RDF solutions
Bringing Column Store Technology for Business Intelligence on RDF
Ground-breaking database innovations for RDF stores
   (Dynamic Query optimization, Adaptive Caching of Joins,
   Optimized Graph Processing, Cluster/Cloud scalability)




 Task 2.1: State of the Art, Evaluation & Benchmarking

 LOD cloud cache scalability
 • M0: 20B triples
 • M12: 50B triples
 • M24: 250B triples
 • M36: 1T triples

 D2.4 completed: 50B triples in LOD cache @ DERI
 First deployment of a Virtuoso 7 cluster
 • Currently hosting about 55 billion triples
 • 8 node Virtuoso v7 (column store) Cluster
 • 384GB RAM
 • 2TB Disk Storage
 • 14 bytes per quad, excluding literals

 Next up:
 • hardware provisioning for 250B and 1T triples
  (need 512GB and 2TB of RAM, respectively)
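 A back-of-the-envelope sketch of what the milestone targets imply for quad storage. It assumes that "14B/quads" means roughly 14 bytes per quad excluding literals, and that this density stays constant as the cache grows. Both are assumptions for illustration, not figures taken from the deliverable, and the function name is hypothetical.

```python
# Assumption: ~14 bytes per quad (excluding literals), constant at every scale.
BYTES_PER_QUAD = 14

# Milestone targets from the slide, in triples.
MILESTONES = {"M0": 20e9, "M12": 50e9, "M24": 250e9, "M36": 1e12}


def quad_storage_tb(num_quads: float, bytes_per_quad: int = BYTES_PER_QUAD) -> float:
    """Return the estimated quad storage in terabytes (10**12 bytes)."""
    return num_quads * bytes_per_quad / 1e12


# Print the estimate for each milestone.
for label, triples in MILESTONES.items():
    print(f"{label}: {triples / 1e9:.0f}B triples -> {quad_storage_tb(triples):.2f} TB")
```

 Under these assumptions the 55 billion triples currently hosted come to about 0.77 TB before literals, and the 250B target to about 3.5 TB.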




 Task 2.1: State of the Art, Evaluation & Benchmarking

 Benchmarking

 • creating new benchmarks
      • BSBM-BI (FU Berlin)
      • DBpedia Benchmark (ULEI) – best paper award
      • RDF-H (OGL, CWI)
      • Social Intelligence Benchmark (OGL, CWI)
 • running benchmark evaluations
      • BSBM on a large cluster (Lisa @ SARA)
      • BSBM on a large single server (40 cores, 1TB RAM)
 • creating industry consensus
      • Benchmark Auditing Service
      • LOD Benchmark Council




 BSBM Large Scale Experiments (still ongoing)

 New Aspects:
 • The Business Intelligence Use Case (BI)
 • Benchmark Rules
 • BSBM V3 Results
 • trying cluster versions

 SARA LISA cluster
 • experiments with up to 64 nodes

 VectorWise high-end server
 • 40-core machine with 1TB RAM

 Systems benchmarked at SARA and VectorWise:
 4store 1.1.2      Garlik       http://4store.org/
 BigData r4169     SYSTAP LLC   http://www.systap.com/bigdata.htm
 BigOwlim 3.4.3129 OntoText     http://www.ontotext.com/owlim/
 Jena TDB 0.8.9    openjena.org http://www.openjena.org/TDB/
 Fuseki 0.1.0      openjena.org http://openjena.org/wiki/Fuseki
 Virtuoso 7.0      OpenLink     http://virtuoso.openlinksw.com/




 Social Intelligence Benchmark

 [Figure: diagram with the labels "Synthetic Generated Data", "Linked Open Data",
  "Facebook schema style", "14 dictionaries of real data" and "Realistic scenario simulation"]




 Technical Work: Recycling (D2.4)

 Dynamic caching of intermediate query results
 • SPARQL problem: hard to index workload / expensive backward chaining
 Idea: compute once, re-use many times
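 The "compute once, re-use many times" idea can be sketched as a cache keyed by a signature of the sub-plan that produced an intermediate result. This is a minimal illustration and not the CWI or Virtuoso implementation: the `Recycler` class, the signature tuples and the toy triple list are all hypothetical, and a real recycler would also need eviction and invalidation on updates, which are omitted here.

```python
from typing import Callable, Dict, Hashable, List, Tuple

Row = Tuple[str, ...]


class Recycler:
    """Hypothetical cache of intermediate results, keyed by a sub-plan signature."""

    def __init__(self) -> None:
        self._cache: Dict[Hashable, List[Row]] = {}
        self.hits = 0
        self.misses = 0

    def evaluate(self, signature: Hashable, compute: Callable[[], List[Row]]) -> List[Row]:
        """Return the cached result for `signature`, computing it only on a miss."""
        if signature in self._cache:
            self.hits += 1
            return self._cache[signature]
        self.misses += 1
        result = compute()
        self._cache[signature] = result
        return result


# Toy data set standing in for an RDF store (illustrative only).
TRIPLES = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "type", "Person"),
]


def scan_predicate(predicate: str) -> List[Row]:
    """Return (subject, object) pairs for all triples with the given predicate."""
    return [(s, o) for s, p, o in TRIPLES if p == predicate]


recycler = Recycler()
# Two queries share the same sub-plan: the second call is served from the cache.
first = recycler.evaluate(("scan", "knows"), lambda: scan_predicate("knows"))
second = recycler.evaluate(("scan", "knows"), lambda: scan_predicate("knows"))
```

 The signature here is just a tuple. Matching equivalent sub-plans across differently written queries is the hard part in practice and is not addressed by this sketch.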




 Technical Work: Virtuoso 7

 Major upcoming release V7, due in 2012

 • column store technology:
       • aggressive compression → more data fits in RAM
       • vectored execution → things run faster
 • elastic cluster implementation
       • partitions can migrate across nodes
 • bringing computation to the data
       • arbitrary recursive functions in the cluster
 • geospatial support
       • full OpenGIS support, R-tree backed, EWKT format
 • future enhancements
       • adaptive query optimization (CWI ROX)
       • re-use of intermediates (CWI recycling)
       • using SSDs as cache




 Next 6 months


 Virtuoso: sampled query optimizer
 • query optimization in SPARQL is difficult (no stats)
 • use adaptive run-time query optimization with sampling
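 A rough sketch of the sampling idea: when no statistics are available, estimate how many triples each pattern matches from a random sample and evaluate the most selective pattern first. This is a simplification made for illustration. Adaptive approaches such as ROX interleave sampling with execution, whereas this sketch only picks one order up front. All names and the toy data are hypothetical.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Triple = Tuple[str, str, str]
Pattern = Callable[[Triple], bool]


def estimate_cardinality(triples: Sequence[Triple], pattern: Pattern,
                         sample_size: int, rng: random.Random) -> float:
    """Estimate how many triples match `pattern` from a uniform random sample."""
    if not triples:
        return 0.0
    k = max(1, min(sample_size, len(triples)))
    sample = rng.sample(triples, k)
    matches = sum(1 for t in sample if pattern(t))
    # Scale the sampled match rate up to the full data set.
    return matches / k * len(triples)


def order_patterns(triples: Sequence[Triple], patterns: Dict[str, Pattern],
                   sample_size: int = 100, seed: int = 0) -> List[str]:
    """Return pattern names ordered by estimated cardinality, smallest first."""
    rng = random.Random(seed)
    estimates = {
        name: estimate_cardinality(triples, pattern, sample_size, rng)
        for name, pattern in patterns.items()
    }
    return sorted(estimates, key=estimates.get)


# Toy data: 90 "type" triples and 10 "producer" triples (illustrative only).
DATA: List[Triple] = (
    [(f"s{i}", "type", "Product") for i in range(90)]
    + [(f"s{i}", "producer", "p1") for i in range(10)]
)
PATTERNS: Dict[str, Pattern] = {
    "type": lambda t: t[1] == "type",
    "producer": lambda t: t[1] == "producer",
}

# With the sample as large as the data the estimates are exact.
ORDER = order_patterns(DATA, PATTERNS, sample_size=len(DATA))
```

 With a smaller `sample_size` the estimates become approximate, which is the usual trade-off between sampling cost and plan quality.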

 MonetDB and SPARQL
 • First version in sight (cooperation with FORTH)
 • research tracks
       • RDF clustering on Characteristic Sets
       • correlated join path indexing
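 For readers unfamiliar with the term, a characteristic set is the set of predicates that occur with a given subject. Subjects sharing the same set behave like rows of one implicit table, which is what makes them a natural unit for clustering. The sketch below only shows the grouping step on toy data. It is not the MonetDB design, and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]


def characteristic_sets(triples: Iterable[Triple]) -> Dict[FrozenSet[str], List[str]]:
    """Group subjects by the set of predicates they occur with."""
    predicates_by_subject: Dict[str, Set[str]] = defaultdict(set)
    for subject, predicate, _ in triples:
        predicates_by_subject[subject].add(predicate)

    clusters: Dict[FrozenSet[str], List[str]] = defaultdict(list)
    for subject, predicates in predicates_by_subject.items():
        clusters[frozenset(predicates)].append(subject)
    return dict(clusters)


# Toy data (illustrative only): two person-like subjects and one book-like subject.
EXAMPLE = [
    ("alice", "name", "Alice"),
    ("alice", "knows", "bob"),
    ("bob", "name", "Bob"),
    ("bob", "knows", "alice"),
    ("book1", "title", "RDF Basics"),
]

CLUSTERS = characteristic_sets(EXAMPLE)
```

 Here `alice` and `bob` end up in the same cluster because both occur with exactly the predicates `name` and `knows`, while `book1` forms its own cluster.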

 LOD cache at 250B triples
 • what triples to use?
 • what hardware to use? (need 512GB RAM)




      Contact

      Address

      Centrum Wiskunde & Informatica (CWI)
      Science Park 123
      1098 XG Amsterdam
      The Netherlands

      monetdb.cwi.nl




Thanks for your attention!




 LOD2 Benchmark Auditing Service

 Benchmarking needs of SPARQL engine vendors:
 • vendors want to publish on their own timescale
 • using new or upcoming releases (not yet public)
 • using settings and hardware properly tuned to their solution
 • yet need credibility (is it fair?)

 Tournaments organized by one institution:
 • suffer from bad timing, wrong version, one more bug to fix, etc.
 • do not use the right hardware or settings
 • may become a legal liability once matters become more serious

 LOD2 should reach out to the SPARQL technical community and
 provide independent benchmark auditing services
 • start with BSBM → working on an Auditing Rules Document
 • maybe other benchmarks later

LOD2 Plenary Vienna 2012: WP2 - Storing and Querying Very Large Knowledge Bases

  • 1. SIB . 23.03.2011 . Page 1 http://lod2.eu WP2 Storing and Querying Very Large Knowledge Bases Vienna Update March 2012 – M18 Peter Boncz http://lod2.eu
  • 2. SIB . 23.03.2011 . Page 2 http://lod2.eu Table of Contents • WP2 Refresher • LOD Cloud Hosted on the Knowledge Store Cluster * 50B mark reached, column-store Virtuoso deployed • State of the Art LOD Laboratory (“Benchmarking”) * LDBC – RDF Store Industry council * BSBM at large scale * RDF-H + Social Intelligence Benchmark (SIB) • Technical work * column-store Virtuoso  cluster version * recycling query results • Next up * LOD cloud @250B triples * Virtuoso: adaptive query optimizer (and more) * first MonetDB/SPARQL version (RDF clustering, graph indexing)
  • 3. LOD2 Title . 02.09.2010 . Page 3 http://lod2.eu WP2 Organization CWI (MonetDB): • Peter Boncz (also in VUA group of Frank v Harmelen) • Duc Pham Minh (Phd student) • Irini Fundulaki (1-year sabbatical from FORTH) OpenLink (Virtuoso): • Orri Erling • Hugh Williams • Ivan Mikhailov + FU Berlin (BSBM) + DERI (BSBM text+ LOD cloud + text retrieval/sindice) + ULEI (DBpedia benchmark)
  • 4. SIB . 23.03.2011 . Page 4 http://lod2.eu WP2 Storing and Querying Very Large Knowledge Bases Goal: enabling large-scale, feature-rich & enterprise-ready Linked Data management solutions Database Partners in LOD2: CWI: Leading open source analytics RDBMS OpenLink: Leading Linked data deployment platform Technological Excellence: Creating and publishing metrics for choosing RDF solutions Bringing Column Store Technology for Business Intelligence on RDF Ground-breaking database innovations for RDF stores (Dynamic Query optimization, Adaptive Caching of Joins, Optimized Graph Processing, Cluster/Cloud scalability)
  • 5. LOD2 Title . 02.09.2010 . Page 5 http://lod2.eu Task 2.1: State of the Art, Evaluation & Benchmarking LOD cloud cache scalability • M0: 20B triples • M12: 50B triples • M24: 250B triples • M36: 1T triples D2.4 completed: 50B triples in LOD cache @ DERI First deployment of Virtuoso7 Cluster • Currently hosting about 55 billion triples • 8 node Virtuoso v7 (column store) Cluster • 384GB RAM • 2TB Disk Storage • 14B/quads, excl literals Next up: • hardware provisioning for 250B and 1T triples (need 512GB RAM resp. 2TB RAM somewhere)
  • 6. LOD2 Title . 02.09.2010 . Page 6 http://lod2.eu Task 2.1: State of the Art, Evaluation & Benchmarking Benchmarking • creating new benchmarks • BSBM-BI (FU Berlin) • DBpedia Benchmark (ULEI) – best paper award • RDF-H (OGL,CWI) • Social Intelligence Benchmark (OGL,CWI) • running benchmark evaluations • BSBM on a large cluster cluster (Lisa @ SARA) • BSBM on large single-server (40cores, 1TB RAM) • creating industry consensus • Benchmark Auditing Service • LOD Benchmark Council
  • 7. LOD2 Title . 02.09.2010 . Page 7 http://lod2.eu BSBM Large Scale Experiments (still ongoing..) New Aspects: • The Business Intelligence Use Case (BI) • Benchmark Rules • BSBM V3 Results • trying cluster versions SARA LISA cluster • experiments with up to 64 nodes VectorWise high-end server • 40-core machine with 1TB RAM Benchmarked at SARA and Vectorwise 4store 1.1.2 Garlik http://4store.org/ BigData r4169 SYSTAP LLC http://www.systap.com/bigdata.htm BigOwlim 3.4.3129 OntoText http://www.ontotext.com/owlim/ Jena TDB 0.8.9 openjena.org http://www.openjena.org/TDB/ Fuseki 0.1.0 openjena.org http://openjena.org/wiki/Fuseki Virtuoso 7.0 OpenLink http://virtuoso.openlinksw.com/
  • 8. Social Intelligence Benchmark 14 dictionaries of real data Facebook schema style Realistic scenario simulation Synthetic Generated Data Linked Open Data
  • 9. Technical Work: Recycling (D2.4) Dynamic caching of intermediate query results • SPARQL problem: hard to index workload / expensive backward chaining Idea: compute once, re-use many times
  • 10. Technical Work: Virtuoso 7 Major upcoming release V7, due in 2012 • column store technology: • aggressive compression → more data fits in RAM • vectored execution → things run faster • elastic cluster implementation • partitions can migrate across nodes • bringing computation to the data • arbitrary recursive functions in the cluster • geospatial support • full OpenGIS support, R-tree backed, EWKT format • future enhancements • adaptive query optimization (CWI ROX) • re-use of intermediates (CWI recycling) • using SSDs as cache
  • 11. Next 6 months Virtuoso: sampled query optimizer • query optimization in SPARQL is difficult (no stats) • use adaptive, run-time, query optimization with sampling MonetDB and SPARQL • First version in sight (cooperation with FORTH) • research tracks • RDF clustering on Characteristic Sets • correlated join path indexing LOD cache at 250B triples • what triples to use? • what hardware to use? (need 512GB RAM)
  • 12. Contact Address Centrum Wiskunde & Informatica (CWI) Science Park 123 1098 XG Amsterdam The Netherlands monetdb.cwi.nl Thanks for your attention!
  • 13. LOD2 Benchmark Auditing Service Benchmarking needs of SPARQL engine vendors: • vendors want to publish on their own timescale • using new or upcoming releases (not yet public) • using settings and hardware properly tuned to their solution • yet need credibility (is it fair?) Tournaments organized by one institution have • bad timing, wrong version, one more bug to fix, etc. • not the right hardware or settings • may become a legal liability once matters become more serious LOD2 should reach out to the SPARQL technical community and provide independent benchmark auditing services • start with BSBM → working on Auditing Rules Document • maybe other benchmarks later
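
The recycling idea on slide 9 (compute once, re-use many times) can be sketched as a small cache of intermediate results keyed by a canonical description of the sub-query. The Python below is only an illustration under stated assumptions, not the MonetDB or Virtuoso implementation: the `Recycler` class, the `match` helper, the `friends_of` function, and the toy triples are all hypothetical, and cache eviction and invalidation on updates are deliberately left out.

```python
from typing import Callable, Dict, Hashable, List, Optional, Tuple

Triple = Tuple[str, str, str]


class Recycler:
    """Toy cache for intermediate query results (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self._cache: Dict[Hashable, List[Triple]] = {}
        self.hits = 0
        self.misses = 0

    def evaluate(self, key: Hashable, compute: Callable[[], List[Triple]]) -> List[Triple]:
        # Re-use a stored intermediate result if the same sub-query was seen before.
        if key in self._cache:
            self.hits += 1
            return self._cache[key]
        # Otherwise compute it once and remember it for later queries.
        self.misses += 1
        result = compute()
        self._cache[key] = result
        return result


def match(store: List[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Scan the store for triples matching a pattern; None acts as a wildcard."""
    return [t for t in store
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def friends_of(recycler: Recycler, store: List[Triple], person: str) -> List[Triple]:
    # The key is a canonical form of the triple pattern being evaluated.
    key = ("match", person, "knows", None)
    return recycler.evaluate(key, lambda: match(store, s=person, p="knows"))
```

Calling `friends_of` twice for the same person scans the store only once; the second call is answered from the cache. A real recycler would also have to decide which intermediates are worth keeping and when to drop them.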

Editor's notes

  1. For the aforementioned reasons, we propose an RDF and graph database benchmark, called the Social Intelligence Benchmark (SIB), that exploits the advantages of RDF for graph representation. We aim to test graph database performance on a highly connected graph. Because social networks are a high-profile use case for graph data management, we design our benchmark around social network scenarios. We try to generate data that is as realistic as possible, including correlations, and offer challenging queries over those correlations. In addition, since a very large amount of useful information is available in many linked open datasets, we exploit these resources by linking to them.
  2. Now, I will describe the data specification of SIB. As Facebook is the most popular social network, with more than 800 million active users, we take the Facebook schema style as the baseline for designing SIB. To generate realistic data, we use 14 dictionaries that we built from real data. These dictionaries cover various domains, for example geographical information and personal names. SIB data is designed to simulate realistic scenarios, including the real behavior of users and the characteristics of data distributions in social networks. As mentioned before, our synthetic data is linked with well-known linked open data; here, SIB is linked with DBpedia, one of the largest linked open datasets.
  3. I think most of us know Facebook, and many even have a Facebook account. The logical schema of our benchmark simulates the Facebook schema, in which a user can have many friends, with friendship relations between them. A user can provide profile information such as his name, where he studies, and where he lives. He can also specify his current status, for example being in a relationship with another user. The user can upload many photos, start a discussion by writing posts, and get many comments from his friends.
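
The schema described in these notes (users with profile attributes, friendships, posts, and comments from friends) can be illustrated with a toy generator that emits RDF-like triples. This is a rough sketch under stated assumptions, not the SIB data generator: the two mini-dictionaries, the `sib:` predicate names, and the function `generate` are made up for illustration. The actual benchmark uses 14 dictionaries built from real data and models correlations that this sketch ignores.

```python
import random
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical mini-dictionaries standing in for SIB's dictionaries of real data.
FIRST_NAMES = ["Anna", "Ben", "Chen", "Dana"]
# Locations are DBpedia resource URIs, mimicking the link to linked open data.
LOCATIONS = [
    "http://dbpedia.org/resource/Amsterdam",
    "http://dbpedia.org/resource/Berlin",
    "http://dbpedia.org/resource/Vienna",
]


def generate(num_users: int, friends_per_user: int, seed: int = 42) -> List[Triple]:
    """Emit triples for users, friendships, one post per user, and one comment per post."""
    rng = random.Random(seed)  # seeded, so the output is reproducible
    users = [f"sib:user{i}" for i in range(num_users)]
    triples: List[Triple] = []
    for i, user in enumerate(users):
        # Profile information drawn from the dictionaries.
        triples.append((user, "sib:firstName", rng.choice(FIRST_NAMES)))
        triples.append((user, "sib:livesIn", rng.choice(LOCATIONS)))
        # Friendships, stored as directed edges in this sketch; never to oneself.
        others = users[:i] + users[i + 1:]
        friends = rng.sample(others, min(friends_per_user, len(others)))
        for friend in friends:
            triples.append((user, "sib:knows", friend))
        # Each user starts a discussion with one post.
        post = f"sib:post{i}"
        triples.append((post, "sib:creator", user))
        # One of the user's friends comments on that post.
        if friends:
            comment = f"sib:comment{i}"
            triples.append((comment, "sib:replyOf", post))
            triples.append((comment, "sib:creator", rng.choice(friends)))
    return triples
```

For example, `generate(5, 2)` yields five users with two outgoing friendships each, and every comment is written by a friend of the post's author.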