sitemap4rdf
generate Sitemap files from a SPARQL
              endpoint
          http://www.deri.ie/




     Boris Villazón-Terrazas and Richard Cyganiak (DERI)
    Facultad de Informática, Universidad Politécnica de Madrid
  Campus de Montegancedo s/n, 28660 Boadilla del Monte, Madrid
                     http://www.oeg-upm.net
           Phone: 34.91.3366605, Fax: 34.91.3524819
ToC



•   Publishing Linked Data from a triple store
•   Search engines
•   The Sitemap protocol
•   sitemap4rdf
•   Summary
•   Future work




                              2
Linked Data frontends for triple stores




Source: Pubby website, http://www4.wiwiss.fu-berlin.de/pubby/


                          3
Sindice: the best RDF search engine




     5
Sindice: the best RDF search engine




•   120M+ documents
•   Continuously updated since 2006
•   Search API
•   RDF/XML, Turtle, RDFa, microformats




                       6
Sitemap Protocol

• Used by web crawlers
• Efficiently find all your content & discover
  what has been updated
             http://sitemaps.org/




A sitemap file contains information about one or more URLs on
   your Web site. This information helps search engines crawl
   your website more effectively.


                                 8
Sitemap Protocol: Simple example

<?xml version="1.0" encoding="UTF-8"?>
<urlset
   xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <url>
      <loc>http://yoursite/</loc>
   </url>
   <url>
      <loc>http://yoursite/products/53546</loc>
   </url>
   <url>
      <loc>http://yoursite/products/98421</loc>
   </url>
   <url>
      <loc>http://yoursite/products/41003</loc>
   </url>
</urlset>


                             9
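The simple example above can also be generated programmatically. Below is a minimal sketch in Python using only the standard library's `xml.etree.ElementTree`; the function name `build_sitemap` and the example URLs are illustrative and not part of sitemap4rdf. ElementTree escapes characters such as `&` in the URLs, which the protocol requires.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(urls):
    """Return a sitemap document with one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for url in urls:
        url_el = ET.SubElement(urlset, "url")
        # ElementTree escapes &, < and > in text, as the protocol requires.
        ET.SubElement(url_el, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

The result is a Python string, so write it to disk with an explicit encoding, e.g. `open("sitemap.xml", "w", encoding="utf-8")`.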
Sitemap Protocol: Optional parts




<?xml version="1.0" encoding="UTF-8"?>
<urlset
   xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <url>
      <loc>http://yoursite/</loc>
      <lastmod>2010-06-24</lastmod>
      <changefreq>daily</changefreq>
   </url>
</urlset>




                           10
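A sketch of emitting the optional elements only when a value is available, assuming the caller passes `lastmod` as a Python `date` (written as a date-only W3C Datetime, YYYY-MM-DD). The allowed `<changefreq>` values come from the sitemaps.org protocol; the helper `url_entry` is hypothetical, and the returned `<url>` element is meant to be appended under a `<urlset>` that carries the sitemap namespace.

```python
import datetime
import xml.etree.ElementTree as ET

# Values permitted for <changefreq> by the sitemaps.org protocol.
CHANGEFREQ_VALUES = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}

def url_entry(loc, lastmod=None, changefreq=None):
    """Build one <url> element; <lastmod> and <changefreq> appear only if given."""
    url_el = ET.Element("url")
    ET.SubElement(url_el, "loc").text = loc
    if lastmod is not None:
        # Date-only form (YYYY-MM-DD) is a valid W3C Datetime value.
        ET.SubElement(url_el, "lastmod").text = lastmod.strftime("%Y-%m-%d")
    if changefreq is not None:
        if changefreq not in CHANGEFREQ_VALUES:
            raise ValueError(f"invalid changefreq: {changefreq!r}")
        ET.SubElement(url_el, "changefreq").text = changefreq
    return url_el
```

For the slide's example, `url_entry("http://yoursite/", datetime.date(2010, 6, 24), "daily")` serializes to `<url><loc>http://yoursite/</loc><lastmod>2010-06-24</lastmod><changefreq>daily</changefreq></url>`.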
Sitemap Protocol: Huge sitemaps


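• Optionally gzip-compress your sitemap
• Limit per sitemap file: 50,000 URLs or 10 MB uncompressed
  (sitemaps.org raised the size limit to 50 MB in 2016)
  • if you exceed it, split into multiple sitemap files
  • and add a sitemap index file that lists them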




                         11
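One way to apply the splitting rule, sketched in Python under the assumption that all URLs fit in memory: cut the list into parts of at most 50,000 URLs, gzip each part, and list the parts in a sitemap index. The function `sitemap_files` and the file names `sitemap-N.xml.gz` and `sitemap_index.xml` are made up for illustration, and the sketch checks only the URL count, not the byte-size limit.

```python
import gzip
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000  # per-file URL limit from the Sitemap protocol

def _serialize(root):
    """UTF-8 bytes of an element tree, preceded by an XML declaration."""
    body = ET.tostring(root, encoding="unicode")
    return ('<?xml version="1.0" encoding="UTF-8"?>\n' + body).encode("utf-8")

def sitemap_files(urls, base_url, limit=MAX_URLS_PER_FILE):
    """Return {file name: bytes}: gzipped sitemap parts plus one sitemap index.

    `base_url` is where the files will be published, e.g. "http://yoursite/".
    """
    files = {}
    index = ET.Element("sitemapindex", {"xmlns": SITEMAP_NS})
    for part, start in enumerate(range(0, len(urls), limit), start=1):
        urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
        for url in urls[start:start + limit]:
            ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
        name = f"sitemap-{part}.xml.gz"
        files[name] = gzip.compress(_serialize(urlset))
        # The index points at each part by its absolute URL.
        ET.SubElement(ET.SubElement(index, "sitemap"), "loc").text = base_url + name
    files["sitemap_index.xml"] = _serialize(index)
    return files
```

With this layout, the `Sitemap:` line in robots.txt would point at the index file rather than at the individual parts.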
Sitemap Protocol: Discovery

• Publish the sitemap file

• Add a line to http://yoursite/robots.txt
   •   Web site owners use the /robots.txt file to give instructions about their site
       to web robots; this is called the Robots Exclusion Protocol.




 Sitemap: http://yoursite/sitemap.xml




                                          12
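On the crawler side, the `Sitemap:` line can be read with Python's standard `urllib.robotparser`; its `site_maps()` method requires Python 3.8 or later and returns `None` when no such line exists. A minimal sketch; the sample robots.txt content and the helper name `sitemaps_from_robots` are illustrative.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt body; the Sitemap line is independent of User-agent groups.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

Sitemap: http://yoursite/sitemap.xml
"""

def sitemaps_from_robots(robots_txt):
    """Return the Sitemap URLs declared in a robots.txt body (empty list if none)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.site_maps() or []
```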
sitemap4rdf


• Simple command line tool
• Sends a SPARQL query to list all URIs
• Generates sitemap

sitemap4rdf http://yoursite/sparql http://yoursite/resource/

Example:

sitemap4rdf http://geo.linkeddata.es/sparql http://geo.linkeddata.es/


• Run sitemap4rdf, specifying the SPARQL endpoint
  and the prefix of the URLs to include in the Sitemap


                                         14
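The slides do not show the query sitemap4rdf actually sends, so the following is only a sketch of the idea under stated assumptions: a SPARQL 1.1 SELECT listing distinct subject URIs that start with the given prefix, sent as a standard SPARQL Protocol GET request and read back in the SPARQL JSON results format. All function names are hypothetical, and `STRSTARTS` needs an endpoint that supports SPARQL 1.1.

```python
import json
import urllib.parse
import urllib.request

def uri_query(prefix):
    """SPARQL 1.1 query listing distinct subject IRIs that start with `prefix`."""
    # Escape backslashes and quotes so the prefix is a valid SPARQL string literal.
    literal = prefix.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "SELECT DISTINCT ?s WHERE { ?s ?p ?o . "
        f'FILTER(isIRI(?s) && STRSTARTS(STR(?s), "{literal}")) }}'
    )

def query_url(endpoint, query):
    """URL for a SPARQL Protocol query sent via HTTP GET."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})

def parse_uris(results_json):
    """Pull the ?s values out of a SPARQL JSON results document."""
    data = json.loads(results_json)
    return [row["s"]["value"] for row in data["results"]["bindings"]]

def fetch_uris(endpoint, prefix):
    """Query the endpoint (network access required) and return the matching URIs."""
    request = urllib.request.Request(
        query_url(endpoint, uri_query(prefix)),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request) as response:
        return parse_uris(response.read().decode("utf-8"))
```

For the slide's example, `fetch_uris("http://geo.linkeddata.es/sparql", "http://geo.linkeddata.es/")` would return the URI list to feed into a sitemap. Large endpoints often cap result sizes, so a real tool would likely page through results with ORDER BY, LIMIT and OFFSET.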
Submit the sitemap location - Sindice

• http://sindice.com/main/submit




                           15
Submit the sitemap location - Google

• https://www.google.com/webmasters/tools/




                         16
Summary

• Sitemap protocol informs search engines about
  available pages
   • Supported by Sindice!


• sitemap4rdf generates Sitemap files by listing URIs
  in a SPARQL endpoint
   • Open source, Java
   • http://lab.linkeddata.deri.ie/2010/sitemap4rdf/
   • http://mccarthy.dia.fi.upm.es/sitemap4rdf/
   • http://www.oeg-upm.net/index.php/en/downloads/122-sitemap4rdf




                                 18
Future Work

• Integrate sitemap4rdf with Pubby

• Generate a voiD file automatically from a SPARQL
  endpoint

• Generate an entry in CKAN (registry of open
  knowledge packages) automatically through the CKAN
  API
   • http://ckan.net/package/geolinkeddata


• Interact with prefix.cc (a service for remembering and
  looking up URI prefixes) through its API
   • geoes: <http://geo.linkeddata.es/ontology>

                                20
Future Work

• Support the semantic sitemap extension (once it is
  compatible with Google)
   • http://sw.deri.org/2007/07/sitemapextension/




                                21