SlideShare a Scribd company logo
1 of 26
Download to read offline
Binary RDF for Scalable Publishing,
   Exchanging and Consumption
        in the Web of Data

Javier D. Fernández
Supervised by: Miguel A. Martínez-Prieto and Claudio Gutierrez

                                        University of Valladolid (Spain)
                                            University of Chile (Chile)




PhD Symposium
Brief RDF Introduction

(1) Resource Description Framework
     Webs, services, protocols
     Persons, Proteins, geography…


(2) A standard model for data exchange on the Web
    Understandable by computers


(3) W3C Recommendation (2004)

(4) Data model
    (subject, predicate, object)


   PhD Symposium
RDF Example
                                                                                           literal
Subject, Predicate, Object
(U,B) , U        , (U,B,L)
                                                                                   “Pablo Neruda”
                                                                URI
               URI                         URI



                                                                 <http://books/author33>

    <http://books/book21>

                                                                 “Spain in the Heart”




                 _collection                      <http://myblog/lectures>

                               lectures:to_read_list

       Blank

 PhD Symposium
1. Use URIs as names for things
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL)
4. Include links to other URIs, so that they can discover more things.




     Image:PhD Symposium
           Danilo Rizzuti / FreeDigitalPhotos.net
Image:PhD Symposium
      Danilo Rizzuti / FreeDigitalPhotos.net
Scalability problems



                    DBPedia (en)   233 M.triples   ~ 33 GB
                    Uniprot        845     “       ~ 230 GB



   Publish?
   Exchange?
   Process/Consume/Query?



    PhD Symposium
RDF Publication



                                                   dereferenceable URIs

                                     RDF dump
                  sensor
                                                SPARQL Endpoints/
                                                      APIs


 No Recommendations/methodology to publish at large scale
 Related Work: Some metadata for discovery, such as Void, Semantic
  Sitemaps.




  PhD Symposium
RDF Exchanging issues
 RDF/XML, N3, Turtle, JSON.
          Document-centric (verbose)  data-centric view (machine)
 No structure (chunks, universal compression)



 Related Work: Universal compression (gzip, bzip2) and the Efficient XML
  Interchange Format (EXI).




Image:PhD krishnan / FreeDigitalPhotos.net
      renjith Symposium
RDF Processing/Consumption (After Exchanging)
 Costly Post-processing
          Decompression
          Indexing (RDF Store)
          Finally… consume


 Related Work (indexing): Based on Relational Storage (Virtuoso) Multi-indexes
  (RDF3X), Distributed Systems (Map-Reduce) and others (Bit-Mat).




Image:PhD krishnan / FreeDigitalPhotos.net
      renjith Symposium
The scalability problems has
a main impact on Users

         Would you download hundreds of GB...


                                              … if you don’t know exactly what they contain,
                                             that need costly exchange and post-processing,
                                                and require a powerful store to query them ?




Image:PhD krishnan / FreeDigitalPhotos.net
      renjith Symposium
In the following...
1. Proposed approach for scalable publishing, exchanging and consumption
   of large RDF datasets
2. Preliminary results
3. Methodology
4. On-going work and conclusions




   Image:PhD Symposium
         jscreationzs / FreeDigitalPhotos.net
An integrated solution
We call for, and we study in this thesis, a Binary RDF Serialization format:
     Machine oriented (binary)
     Clean publication
               Metadata
               Modular
     Efficient exchange
               Compression
     Basic data operations
               Easy to parse and consume
               Primitive query resolution




    Image:PhD Symposium
          jscreationzs / FreeDigitalPhotos.net
HDT Overview




 PhD Symposium
Dictionary+Triples partition



   1   <http://books/author33>
   2   <http://books/book21>             6
   3   dc:author
   4   dc:title
   5   foaf:name                     1
                                 2
   6   “Pablo Neruda”
   7   “Spain in the Heart”          7




  PhD Symposium
Key concepts: The Dictionary

   Largest component (up to 74%)
     Long URIs, shared prefixes
     Lang, datatype tags in literals
   Efficient IDString operations



We plan to work on a specific organization which
  Optimizes space (regularities)
  Provides efficient performance in operations




         PhD Symposium
Preliminary results in Rich Functional Dictionaries

We propose to adapt techniques for string dictionaries;
  Front-Coding
     Making dictionary partitions




  [*] Compression of RDF Dictionaries. Miguel A. Martínez-Prieto, Javier D. Fernández,
     Rodrigo Cánovas. ACM Symposium on Applied Computing (SAC 2012).

       PhD Symposium
Key concepts: Triples

   Specific compression:
       More efficient compression than just gzip.
   Data indexing for consumption:
       Allows direct patterns resolution without decompression
           (s,p,o), (s,?p,?o) and (s,p,?o)


We plan to work on a specific technique which
  optimizes space
  provides efficient performance in primitive operations




          PhD Symposium
Preliminary results in Triples Encoding

We propose to use Bitmap indexes:




   [*] Compact Representation of Large RDF Data Sets for Publishing and
       Exchange. Javier D. Fernández, Miguel A. Martínez-Prieto, Claudio Gutierrez.
       International Semantic Web Conference(ISWC 2010).

       PhD Symposium
Methodology
 RDF structure in theory and practice.
 Binary RDF Specification.
 Succinct Dictionaries.
 Triples Indexes.
 Practical deployment.




Image:PhD Symposium
      jscreationzs / FreeDigitalPhotos.net
Some Results… HDT Acknowledged as W3C
member submission:
http://www.w3.org/Submission/2011/03/
                                        supported by:




   PhD Symposium
Some Results... HDT for exchanging




 PhD Symposium
Some Results... HDT for consumption
Direct Consumption, without decompression after exchanging
           Example of use: HDT-it (Thanks to Mario Arias, DERI)




Image:PhD Symposium
      jscreationzs / FreeDigitalPhotos.net
On-going promising work: HDT-FoQ




    [*] Exchange and Consumption of Huge RDF Data. Miguel A. Martínez-Prieto,
        Mario Arias, Javier D. Fernández. Extended Semantic Web Conference(ESWC
        2012). To appear
 PhD Symposium
In conclusion
Binary RDF aims to lightweight the Web of Data;
    Logical decomposition: Header, Dictionary, and Triples
    Clean publication
    Compressed RDF format for exchanging
    Machine-friendly, direct consumption
         Rich Functional Dictionary/Triples representations for querying




      PhD Symposium
Still much work on…
 Getting a global understanding of the real structure of RDF networks.
 Applying this knowledge in innovative dictionary and triples indexes.
     full SPARQL at consumption
 Supporting dynamic operations
     inserting, deleting, and updating binary RDF




       PhD Symposium
Thanks!



HDT:        http://www.rdfhdt.org/
Group: http://dataweb.infor.uva.es/
Slides: http://www.slideshare.net/javifer


                                   Javier D. Fernández (jfergar@infor.uva.es)
                   Supervised by: Miguel A. Martínez-Prieto, Claudio Gutierrez

                                                   University of Valladolid (Spain)
                                                       University of Chile (Chile)


  PhD Symposium

More Related Content

What's hot

What's hot (20)

Building a Unified Logging Layer with Fluentd, Elasticsearch and Kibana
Building a Unified Logging Layer with Fluentd, Elasticsearch and KibanaBuilding a Unified Logging Layer with Fluentd, Elasticsearch and Kibana
Building a Unified Logging Layer with Fluentd, Elasticsearch and Kibana
 
Road to NODES - Blazing Fast Ingest with Apache Arrow
Road to NODES - Blazing Fast Ingest with Apache ArrowRoad to NODES - Blazing Fast Ingest with Apache Arrow
Road to NODES - Blazing Fast Ingest with Apache Arrow
 
Grouping and Joining in Lucene/Solr
Grouping and Joining in Lucene/SolrGrouping and Joining in Lucene/Solr
Grouping and Joining in Lucene/Solr
 
Service Workers and APEX
Service Workers and APEXService Workers and APEX
Service Workers and APEX
 
RDBでのツリー表現入門
RDBでのツリー表現入門RDBでのツリー表現入門
RDBでのツリー表現入門
 
Apache Arrow Flight Overview
Apache Arrow Flight OverviewApache Arrow Flight Overview
Apache Arrow Flight Overview
 
Koalas: How Well Does Koalas Work?
Koalas: How Well Does Koalas Work?Koalas: How Well Does Koalas Work?
Koalas: How Well Does Koalas Work?
 
Introduction to gRPC
Introduction to gRPCIntroduction to gRPC
Introduction to gRPC
 
Module: Content Exchange in IPFS
Module: Content Exchange in IPFSModule: Content Exchange in IPFS
Module: Content Exchange in IPFS
 
Named Data Networking Operational Aspects - IoT as a Use-case
Named Data Networking Operational Aspects - IoT as a Use-caseNamed Data Networking Operational Aspects - IoT as a Use-case
Named Data Networking Operational Aspects - IoT as a Use-case
 
RedisConf17- Using Redis at scale @ Twitter
RedisConf17- Using Redis at scale @ TwitterRedisConf17- Using Redis at scale @ Twitter
RedisConf17- Using Redis at scale @ Twitter
 
Debunking some “RDF vs. Property Graph” Alternative Facts
Debunking some “RDF vs. Property Graph” Alternative FactsDebunking some “RDF vs. Property Graph” Alternative Facts
Debunking some “RDF vs. Property Graph” Alternative Facts
 
Scaling into Billions of Nodes and Relationships with Neo4j Graph Data Science
Scaling into Billions of Nodes and Relationships with Neo4j Graph Data ScienceScaling into Billions of Nodes and Relationships with Neo4j Graph Data Science
Scaling into Billions of Nodes and Relationships with Neo4j Graph Data Science
 
Relational to Graph - Import
Relational to Graph - ImportRelational to Graph - Import
Relational to Graph - Import
 
Degrading Performance? You Might be Suffering From the Small Files Syndrome
Degrading Performance? You Might be Suffering From the Small Files SyndromeDegrading Performance? You Might be Suffering From the Small Files Syndrome
Degrading Performance? You Might be Suffering From the Small Files Syndrome
 
Geohashing with Uber’s H3 Geospatial Index
Geohashing with Uber’s H3 Geospatial Index Geohashing with Uber’s H3 Geospatial Index
Geohashing with Uber’s H3 Geospatial Index
 
Neo4j: What's Under the Hood
Neo4j: What's Under the HoodNeo4j: What's Under the Hood
Neo4j: What's Under the Hood
 
Maximize Greenplum For Any Use Cases Decoupling Compute and Storage - Greenpl...
Maximize Greenplum For Any Use Cases Decoupling Compute and Storage - Greenpl...Maximize Greenplum For Any Use Cases Decoupling Compute and Storage - Greenpl...
Maximize Greenplum For Any Use Cases Decoupling Compute and Storage - Greenpl...
 
InfluxDB IOx Tech Talks: The Impossible Dream: Easy-to-Use, Super Fast Softw...
InfluxDB IOx Tech Talks: The Impossible Dream:  Easy-to-Use, Super Fast Softw...InfluxDB IOx Tech Talks: The Impossible Dream:  Easy-to-Use, Super Fast Softw...
InfluxDB IOx Tech Talks: The Impossible Dream: Easy-to-Use, Super Fast Softw...
 
The Hidden Life of Spark Jobs
The Hidden Life of Spark JobsThe Hidden Life of Spark Jobs
The Hidden Life of Spark Jobs
 

Viewers also liked

B. indonesia
B. indonesiaB. indonesia
B. indonesia
Jay
 
МагIарулазул маргьу
МагIарулазул маргьуМагIарулазул маргьу
МагIарулазул маргьу
Şamil Tzva
 
My life as social media manager kbc
My life as social media manager kbcMy life as social media manager kbc
My life as social media manager kbc
Hatti Knuts
 

Viewers also liked (20)

Exel budget
Exel budgetExel budget
Exel budget
 
The pitch[1]
The pitch[1]The pitch[1]
The pitch[1]
 
Ch 33 macroeconomic theory open economy
Ch 33 macroeconomic theory open economyCh 33 macroeconomic theory open economy
Ch 33 macroeconomic theory open economy
 
Proposal salam bgi
Proposal salam bgiProposal salam bgi
Proposal salam bgi
 
TGV Pequim-Xangai
TGV Pequim-XangaiTGV Pequim-Xangai
TGV Pequim-Xangai
 
Magnolia Residences @ New Manila Quezon City
Magnolia Residences @ New Manila Quezon CityMagnolia Residences @ New Manila Quezon City
Magnolia Residences @ New Manila Quezon City
 
A good story
A good storyA good story
A good story
 
B. indonesia
B. indonesiaB. indonesia
B. indonesia
 
МагIарулазул маргьу
МагIарулазул маргьуМагIарулазул маргьу
МагIарулазул маргьу
 
10 Tips & Tricks for Your next crowdsourcing campaign!
10 Tips & Tricks for Your next crowdsourcing campaign!10 Tips & Tricks for Your next crowdsourcing campaign!
10 Tips & Tricks for Your next crowdsourcing campaign!
 
101 lecture 19 earnings and discrimination
101 lecture 19 earnings and discrimination101 lecture 19 earnings and discrimination
101 lecture 19 earnings and discrimination
 
Hen 368 lecture 8 production and costs
Hen 368 lecture 8 production and costsHen 368 lecture 8 production and costs
Hen 368 lecture 8 production and costs
 
London
LondonLondon
London
 
Mollejuo Enter 2013 presentation
Mollejuo Enter 2013 presentationMollejuo Enter 2013 presentation
Mollejuo Enter 2013 presentation
 
The pitch
The pitchThe pitch
The pitch
 
CONCURSO BOLETÍN
CONCURSO BOLETÍNCONCURSO BOLETÍN
CONCURSO BOLETÍN
 
Lecture 9 saving investment and the financial system
Lecture 9 saving investment and the financial systemLecture 9 saving investment and the financial system
Lecture 9 saving investment and the financial system
 
O drama na Síria ...
O drama na Síria ...O drama na Síria ...
O drama na Síria ...
 
My life as social media manager kbc
My life as social media manager kbcMy life as social media manager kbc
My life as social media manager kbc
 
202 lecture 1
202 lecture 1202 lecture 1
202 lecture 1
 

Similar to Binary RDF for Scalable Publishing, Exchanging and Consumption in the Web of Data

The web of interlinked data and knowledge stripped
The web of interlinked data and knowledge strippedThe web of interlinked data and knowledge stripped
The web of interlinked data and knowledge stripped
Sören Auer
 
Hadoop @ Sara & BiG Grid
Hadoop @ Sara & BiG GridHadoop @ Sara & BiG Grid
Hadoop @ Sara & BiG Grid
Evert Lammerts
 
RO-Crate: packaging metadata love notes into FAIR Digital Objects
RO-Crate: packaging metadata love notes into FAIR Digital ObjectsRO-Crate: packaging metadata love notes into FAIR Digital Objects
RO-Crate: packaging metadata love notes into FAIR Digital Objects
Carole Goble
 

Similar to Binary RDF for Scalable Publishing, Exchanging and Consumption in the Web of Data (20)

Scaling the (evolving) web data –at low cost-
Scaling the (evolving) web data –at low cost-Scaling the (evolving) web data –at low cost-
Scaling the (evolving) web data –at low cost-
 
FAIR Data Prototype - Interoperability and FAIRness through a novel combinati...
FAIR Data Prototype - Interoperability and FAIRness through a novel combinati...FAIR Data Prototype - Interoperability and FAIRness through a novel combinati...
FAIR Data Prototype - Interoperability and FAIRness through a novel combinati...
 
‘Facilitating User Engagement by Enriching Library Data using Semantic Techno...
‘Facilitating User Engagement by Enriching Library Data using Semantic Techno...‘Facilitating User Engagement by Enriching Library Data using Semantic Techno...
‘Facilitating User Engagement by Enriching Library Data using Semantic Techno...
 
Timbuctoo 2 EASY
Timbuctoo 2 EASYTimbuctoo 2 EASY
Timbuctoo 2 EASY
 
The web of interlinked data and knowledge stripped
The web of interlinked data and knowledge strippedThe web of interlinked data and knowledge stripped
The web of interlinked data and knowledge stripped
 
Hadoop @ Sara & BiG Grid
Hadoop @ Sara & BiG GridHadoop @ Sara & BiG Grid
Hadoop @ Sara & BiG Grid
 
The Dendro research data management platform: Applying ontologies to long-ter...
The Dendro research data management platform: Applying ontologies to long-ter...The Dendro research data management platform: Applying ontologies to long-ter...
The Dendro research data management platform: Applying ontologies to long-ter...
 
Bio2RDF presentation at Combine 2012
Bio2RDF presentation at Combine 2012Bio2RDF presentation at Combine 2012
Bio2RDF presentation at Combine 2012
 
State of the Semantic Web
State of the Semantic WebState of the Semantic Web
State of the Semantic Web
 
RO-Crate: packaging metadata love notes into FAIR Digital Objects
RO-Crate: packaging metadata love notes into FAIR Digital ObjectsRO-Crate: packaging metadata love notes into FAIR Digital Objects
RO-Crate: packaging metadata love notes into FAIR Digital Objects
 
Research Objects for improved sharing and reproducibility
Research Objects for improved sharing and reproducibilityResearch Objects for improved sharing and reproducibility
Research Objects for improved sharing and reproducibility
 
Exploring Linked Data
Exploring Linked DataExploring Linked Data
Exploring Linked Data
 
(Big) Data (Science) Skills
(Big) Data (Science) Skills(Big) Data (Science) Skills
(Big) Data (Science) Skills
 
IBC FAIR Data Prototype Implementation slideshow
IBC FAIR Data Prototype Implementation   slideshowIBC FAIR Data Prototype Implementation   slideshow
IBC FAIR Data Prototype Implementation slideshow
 
DLF 2015 Presentation, "RDF in the Real World."
DLF 2015 Presentation, "RDF in the Real World." DLF 2015 Presentation, "RDF in the Real World."
DLF 2015 Presentation, "RDF in the Real World."
 
Democratizing Big Semantic Data management
Democratizing Big Semantic Data managementDemocratizing Big Semantic Data management
Democratizing Big Semantic Data management
 
FAIR Workflows and Research Objects get a Workout
FAIR Workflows and Research Objects get a Workout FAIR Workflows and Research Objects get a Workout
FAIR Workflows and Research Objects get a Workout
 
Linked Open Data Visualization
Linked Open Data VisualizationLinked Open Data Visualization
Linked Open Data Visualization
 
2009 0807 Lod Gmod
2009 0807 Lod Gmod2009 0807 Lod Gmod
2009 0807 Lod Gmod
 
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
 

Recently uploaded

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Recently uploaded (20)

Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 

Binary RDF for Scalable Publishing, Exchanging and Consumption in the Web of Data

  • 1. Binary RDF for Scalable Publishing, Exchanging and Consumption in the Web of Data Javier D. Fernández Supervised by: Miguel A. Martínez-Prieto and Claudio Gutierrez University of Valladolid (Spain) University of Chile (Chile) PhD Symposium
  • 2. Brief RDF Introduction (1) Resource Description Framework  Webs, services, protocols  Persons, Proteins, geography… (2) A standard model for data exchange on the Web  Understandable by computers (3) W3C Recommendation (2004) (4) Data model  (subject, predicate, object) PhD Symposium
  • 3. RDF Example literal Subject, Predicate, Object (U,B) , U , (U,B,L) “Pablo Neruda” URI URI URI <http://books/author33> <http://books/book21> “Spain in the Heart” _collection <http://myblog/lectures> lectures:to_read_list Blank PhD Symposium
  • 4. 1. Use URIs as names for things 2. Use HTTP URIs so that people can look up those names. 3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL) 4. Include links to other URIs, so that they can discover more things. Image:PhD Symposium Danilo Rizzuti / FreeDigitalPhotos.net
  • 5. Image:PhD Symposium Danilo Rizzuti / FreeDigitalPhotos.net
  • 6. Scalability problems DBPedia (en) 233 M.triples ~ 33 GB Uniprot 845 “ ~ 230 GB  Publish?  Exchange?  Process/Consume/Query? PhD Symposium
  • 7. RDF Publication dereferenceable URIs RDF dump sensor SPARQL Endpoints/ APIs  No Recommendations/methodology to publish at large scale  Related Work: Some metadata for discovery, such as Void, Semantic Sitemaps. PhD Symposium
  • 8. RDF Exchanging issues  RDF/XML, N3, Turtle, JSON.  Document-centric (verbose)  data-centric view (machine)  No structure (chunks, universal compression)  Related Work: Universal compression (gzip, bzip2) and the Efficient XML Interchange Format (EXI). Image:PhD krishnan / FreeDigitalPhotos.net renjith Symposium
  • 9. RDF Processing/Consumption (After Exchanging)  Costly Post-processing  Decompression  Indexing (RDF Store)  Finally… consume  Related Work (indexing): Based on Relational Storage (Virtuoso) Multi-indexes (RDF3X), Distributed Systems (Map-Reduce) and others (Bit-Mat). Image:PhD krishnan / FreeDigitalPhotos.net renjith Symposium
  • 10. The scalability problems has a main impact on Users Would you download hundreds of GB... … if you don’t know exactly what they contain, that need costly exchange and post-processing, and require a powerful store to query them ? Image:PhD krishnan / FreeDigitalPhotos.net renjith Symposium
  • 11. In the following... 1. Proposed approach for scalable publishing, exchanging and consumption of large RDF datasets 2. Preliminary results 3. Methodology 4. On-going work and conclusions Image:PhD Symposium jscreationzs / FreeDigitalPhotos.net
  • 12. An integrated solution We call for, and we study in this thesis, a Binary RDF Serialization format:  Machine oriented (binary)  Clean publication  Metadata  Modular  Efficient exchange  Compression  Basic data operations  Easy to parse and consume  Primitive query resolution Image:PhD Symposium jscreationzs / FreeDigitalPhotos.net
  • 13. HDT Overview PhD Symposium
  • 14. Dictionary+Triples partition 1 <http://books/author33> 2 <http://books/book21> 6 3 dc:author 4 dc:title 5 foaf:name 1 2 6 “Pablo Neruda” 7 “Spain in the Heart” 7 PhD Symposium
  • 15. Key concepts: The Dictionary  Largest component (up to 74%)  Long URIs, shared prefixes  Lang, datatype tags in literals  Efficient IDString operations We plan to work on a specific organization which  Optimizes space (regularities)  Provides efficient performance in operations PhD Symposium
  • 16. Preliminary results in Rich Functional Dictionaries We propose to adapt techniques for string dictionaries;  Front-Coding  Making dictionary partitions [*] Compression of RDF Dictionaries. Miguel A. Martínez-Prieto, Javier D. Fernández, Rodrigo Cánovas. ACM Symposium on Applied Computing (SAC 2012). PhD Symposium
  • 17. Key concepts: Triples  Specific compression:  More efficient compression than just gzip.  Data indexing for consumption:  Allows direct patterns resolution without decompression (s,p,o), (s,?p,?o) and (s,p,?o) We plan to work on a specific technique which  optimizes space  provides efficient performance in primitive operations PhD Symposium
  • 18. Preliminary results in Triples Encoding We propose to use Bitmap indexes: [*] Compact Representation of Large RDF Data Sets for Publishing and Exchange. Javier D. Fernández, Miguel A. Martínez-Prieto, Claudio Gutierrez. International Semantic Web Conference(ISWC 2010). PhD Symposium
  • 19. Methodology  RDF structure in theory and practice.  Binary RDF Specification.  Succinct Dictionaries.  Triples Indexes.  Practical deployment. Image:PhD Symposium jscreationzs / FreeDigitalPhotos.net
  • 20. Some Results… HDT Acknowledged as W3C member submission: http://www.w3.org/Submission/2011/03/ supported by: PhD Symposium
  • 21. Some Results... HDT for exchanging PhD Symposium
  • 22. Some Results... HDT for consumption Direct Consumption, without decompression after exchanging  Example of use: HDT-it (Thanks to Mario Arias, DERI) Image:PhD Symposium jscreationzs / FreeDigitalPhotos.net
  • 23. On-going promising work: HDT-FoQ [*] Exchange and Consumption of Huge RDF Data. Miguel A. Martínez-Prieto, Mario Arias, Javier D. Fernández. Extended Semantic Web Conference(ESWC 2012). To appear PhD Symposium
  • 24. In conclusion Binary RDF aims to lightweight the Web of Data;  Logical decomposition: Header, Dictionary, and Triples  Clean publication  Compressed RDF format for exchanging  Machine-friendly, direct consumption  Rich Functional Dictionary/Triples representations for querying PhD Symposium
  • 25. Still much work on…  Getting a global understanding of the real structure of RDF networks.  Applying this knowledge in innovative dictionary and triples indexes.  full SPARQL at consumption  Supporting dynamic operations  inserting, deleting, and updating binary RDF PhD Symposium
  • 26. Thanks! HDT: http://www.rdfhdt.org/ Group: http://dataweb.infor.uva.es/ Slides: http://www.slideshare.net/javifer Javier D. Fernández (jfergar@infor.uva.es) Supervised by: Miguel A. Martínez-Prieto, Claudio Gutierrez University of Valladolid (Spain) University of Chile (Chile) PhD Symposium