sitemap4rdf
generate Sitemap files from a SPARQL endpoint
http://www.deri.ie/

Boris Villazón-Terrazas and Richard Cyganiak (DERI)
Facultad de Informática, Universidad Politécnica de Madrid
Campus de Montegancedo sn, 28660 Boadilla del Monte, Madrid
http://www.oeg-upm.net
Phone: 34.91.3366605, Fax: 34.91.3524819
ToC



•   Publishing Linked Data from a triple store
•   Search engines
•   The Sitemap protocol
•   sitemap4rdf
•   Summary
•   Future work




Linked Data frontends for triple stores




Source: Pubby website, http://www4.wiwiss.fu-berlin.de/pubby/


Sindice: the best RDF search engine








•   120M+ documents
•   Continuously updating since 2006
•   Search API
•   RDF/XML, Turtle, RDFa, microformats




Sitemap Protocol

• Used by web crawlers
• Efficiently find all your content & discover
  what has been updated
             http://sitemaps.org/




A sitemap file contains information regarding one or more URLs on
   your Web site. The information that is stored there helps search
   engines better spider your website.


Sitemap Protocol: Simple example

<?xml version="1.0" encoding="UTF-8"?>
<urlset
   xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <url>
      <loc>http://yoursite/</loc>
   </url>
   <url>
      <loc>http://yoursite/products/53546</loc>
   </url>
   <url>
      <loc>http://yoursite/products/98421</loc>
   </url>
   <url>
      <loc>http://yoursite/products/41003</loc>
   </url>
</urlset>
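
A file like the one above could be produced programmatically. The following is a minimal sketch in Python using only the standard library; the function name `build_urlset` is hypothetical and is not part of sitemap4rdf or of the Sitemap protocol.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the example above.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_urlset(urls):
    """Return a sitemap document (str) with one <url>/<loc> entry per URL."""
    # Register the sitemap namespace as the default one so that the
    # output uses xmlns="..." instead of a generated prefix.
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for u in urls:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        # ElementTree escapes characters such as & in the text for us.
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = u
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(build_urlset([
        "http://yoursite/",
        "http://yoursite/products/53546",
    ]))
```

The output is not pretty-printed, which should not matter to a crawler that parses the XML.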


Sitemap Protocol: Optional parts




<?xml version="1.0" encoding="UTF-8"?>
<urlset
   xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <url>
      <loc>http://yoursite/</loc>
      <lastmod>2010-06-24</lastmod>
      <changefreq>daily</changefreq>
   </url>
</urlset>
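
The optional elements can be attached per URL. The sketch below is illustrative only: `url_entry` is a hypothetical helper, and the set of accepted `changefreq` values is the one listed in the Sitemap protocol.

```python
from datetime import date
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Values the Sitemap protocol allows for <changefreq>.
CHANGEFREQ_VALUES = {
    "always", "hourly", "daily", "weekly", "monthly", "yearly", "never",
}


def url_entry(loc, lastmod=None, changefreq=None):
    """Build one <url> element; lastmod is a datetime.date, both extras optional."""
    if changefreq is not None and changefreq not in CHANGEFREQ_VALUES:
        raise ValueError(f"unsupported changefreq: {changefreq!r}")
    url = ET.Element(f"{{{SITEMAP_NS}}}url")
    ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
    if lastmod is not None:
        # date.isoformat() yields YYYY-MM-DD, as in the example above.
        ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = lastmod.isoformat()
    if changefreq is not None:
        ET.SubElement(url, f"{{{SITEMAP_NS}}}changefreq").text = changefreq
    return url


if __name__ == "__main__":
    entry = url_entry("http://yoursite/", date(2010, 6, 24), "daily")
    print(ET.tostring(entry, encoding="unicode"))
```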




Sitemap Protocol: Huge sitemaps


• Gzip-compress your sitemap
• Limit: 50,000 URLs or 10 MB per sitemap file
  • split into multiple sitemap files
  • add a sitemap index file
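
The splitting step can be sketched as follows. This is an assumption-laden illustration, not how sitemap4rdf does it: the helper names are hypothetical, only the URL-count limit is enforced (the size limit is not checked), and the file URLs in the usage example are made up.

```python
import gzip
from xml.sax.saxutils import escape

MAX_URLS_PER_SITEMAP = 50_000  # URL limit stated above


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into consecutive chunks of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def sitemap_index(sitemap_urls):
    """Return a sitemap index document that points at each sitemap file."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    for u in sitemap_urls:
        lines.append(f"   <sitemap><loc>{escape(u)}</loc></sitemap>")
    lines.append("</sitemapindex>")
    return "\n".join(lines)


def gzip_document(text):
    """Gzip-compress a sitemap document for publishing as e.g. sitemap1.xml.gz."""
    return gzip.compress(text.encode("utf-8"))


if __name__ == "__main__":
    urls = [f"http://yoursite/products/{n}" for n in range(120_000)]
    chunks = chunk_urls(urls)
    # Hypothetical file naming: sitemap1.xml.gz, sitemap2.xml.gz, ...
    names = [f"http://yoursite/sitemap{i + 1}.xml.gz" for i in range(len(chunks))]
    print(len(chunks), "sitemap files")
    print(sitemap_index(names))
```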




Sitemap Protocol: Discovery

• Publish the sitemap file

• Add a line to http://yoursite/robots.txt
   •   Web site owners use the /robots.txt file to give instructions about their site
       to web robots; this is called The Robots Exclusion Protocol.




 Sitemap: http://yoursite/sitemap.xml
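
To illustrate how a crawler might pick up that line, here is a small sketch that extracts `Sitemap:` entries from the text of a robots.txt file. The function name is hypothetical, and real crawlers may handle more edge cases than this.

```python
def sitemap_urls(robots_txt):
    """Return the URLs declared on 'Sitemap:' lines of a robots.txt document."""
    urls = []
    for line in robots_txt.splitlines():
        # Split at the first colon only, so the colon in "http://" survives.
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls


if __name__ == "__main__":
    example = "User-agent: *\nDisallow: /private/\nSitemap: http://yoursite/sitemap.xml\n"
    print(sitemap_urls(example))
```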




sitemap4rdf


• Simple command line tool
• Sends a SPARQL query to list all URIs
• Generates sitemap

sitemap4rdf http://yoursite/sparql http://yoursite/resource/

Example:

sitemap4rdf http://geo.linkeddata.es/sparql http://geo.linkeddata.es/


• Run sitemap4rdf specifying the SPARQL endpoint
  and the prefix of the URLs to include in the Sitemap
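
The slides do not show the query that sitemap4rdf actually sends. As a rough idea of what "a SPARQL query to list all URIs" under a prefix could look like, here is a sketch; the query text, the use of SPARQL 1.1 `STRSTARTS`, and both function names are assumptions made for illustration. The request URL follows the SPARQL protocol convention of passing the query in a `query` parameter.

```python
from urllib.parse import urlencode


def list_uris_query(prefix):
    """Build a SPARQL query selecting subject URIs that start with `prefix`."""
    # Refuse characters that would break out of the string literal.
    if any(c in prefix for c in '"\\\n\r'):
        raise ValueError("prefix contains characters not allowed here")
    return (
        "SELECT DISTINCT ?s WHERE {\n"
        "  ?s ?p ?o .\n"
        f'  FILTER(isIRI(?s) && STRSTARTS(STR(?s), "{prefix}"))\n'
        "}"
    )


def request_url(endpoint, prefix):
    """Return the GET URL that would submit the query to the endpoint."""
    # Assumes the endpoint URL has no query string of its own.
    return endpoint + "?" + urlencode({"query": list_uris_query(prefix)})


if __name__ == "__main__":
    print(list_uris_query("http://geo.linkeddata.es/"))
    print(request_url("http://geo.linkeddata.es/sparql", "http://geo.linkeddata.es/"))
```

On a large store a single query like this may be slow or hit result limits, so a real tool would likely page through results.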


Submit the sitemap location - Sindice

• http://sindice.com/main/submit




Submit the sitemap location - Google

• https://www.google.com/webmasters/tools/




Summary

• Sitemap protocol informs search engines about
  available pages
   • Supported by Sindice!


• sitemap4rdf generates Sitemap files by listing URIs
  in a SPARQL endpoint
   • Open source, Java
   • http://lab.linkeddata.deri.ie/2010/sitemap4rdf/
   • http://mccarthy.dia.fi.upm.es/sitemap4rdf/
   • http://www.oeg-upm.net/index.php/en/downloads/122-sitemap4rdf




Future Work

• Integrate sitemap4rdf with Pubby

• Generate a voiD file automatically from a SPARQL
  endpoint

• Generate an entry in CKAN (registry of open
  knowledge packages) automatically through the CKAN API
   • http://ckan.net/package/geolinkeddata


• Interact with prefix.cc (service for remembering and
  looking up URI prefixes) through its API
   • geoes: <http://geo.linkeddata.es/ontology>

Future Work

• Support the Semantic Sitemap extension (once it is
  compatible with Google)
   • http://sw.deri.org/2007/07/sitemapextension/





Install Stable Diffusion in windows machine
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 

Sitemap4rdf(v2 boris)

  • 1. sitemap4rdf: generate Sitemap files from a SPARQL endpoint. http://www.deri.ie/ Boris Villazón-Terrazas and Richard Cyganiak (DERI). Facultad de Informática, Universidad Politécnica de Madrid, Campus de Montegancedo sn, 28660 Boadilla del Monte, Madrid. http://www.oeg-upm.net Phone: 34.91.3366605, Fax: 34.91.3524819
  • 2. ToC • Publishing Linked Data from a triple store • Search engines • The Sitemap protocol • sitemap4rdf • Summary • Future work
  • 3. Linked Data frontends for triple stores. Source: Pubby website, http://www4.wiwiss.fu-berlin.de/pubby/
  • 4. ToC • Publishing Linked Data from a triple store • Search engines • The Sitemap protocol • sitemap4rdf • Summary • Future work
  • 5. Sindice: the best RDF search engine
  • 6. Sindice: the best RDF search engine • 120M+ documents • Continuously updating since 2006 • Search API • RDF/XML, Turtle, RDFa, microformats
  • 7. ToC • Publishing Linked Data from a triple store • Search engines • The Sitemap protocol • sitemap4rdf • Summary • Future work
  • 8. Sitemap Protocol • Used by web crawlers • Efficiently find all your content & discover what has been updated • http://sitemaps.org/ • A sitemap file contains information regarding one or more URLs on your Web site. The information that is stored there helps search engines better spider your website.
  • 9. Sitemap Protocol: Simple example <?xml version="1.0" encoding="UTF-8"?> <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"> <url> <loc>http://yoursite/</loc> </url> <url> <loc>http://yoursite/products/53546</loc> </url> <url> <loc>http://yoursite/products/98421</loc> </url> <url> <loc>http://yoursite/products/41003</loc> </url> </urlset>
  • 10. Sitemap Protocol: Optional parts <?xml version="1.0" encoding="UTF-8"?> <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"> <url> <loc>http://yoursite/</loc> <lastmod>2010-06-24</lastmod> <changefreq>daily</changefreq> </url> </urlset>
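
The two XML examples above can be produced programmatically. The following is a minimal Python sketch, not the code sitemap4rdf itself uses; the function name `build_urlset` and the dictionary-based input format are assumptions made for illustration. Only `loc` is required for each entry; `lastmod` and `changefreq` are written only when present.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_urlset(entries):
    """Serialize entries into a Sitemap <urlset> document.

    Each entry is a dict with a 'loc' key and, optionally,
    'lastmod' and 'changefreq' keys (hypothetical input format).
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        # Emit the child elements in the order shown on the slides.
        for tag in ("loc", "lastmod", "changefreq"):
            if tag in entry:
                ET.SubElement(url, tag).text = entry[tag]
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(build_urlset([
        {"loc": "http://yoursite/", "lastmod": "2010-06-24", "changefreq": "daily"},
        {"loc": "http://yoursite/products/53546"},
    ]))
```
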
  • 11. Sitemap Protocol: Huge sitemaps • Gzip-compress your sitemap • Limit per sitemap file: 50k URLs or 10MB • Above the limit, split into multiple sitemap files • Add a sitemap index file that lists them
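
The splitting step on slide 11 could look roughly like the Python sketch below. It is an illustration under stated assumptions: the file names (`sitemap1.xml.gz`, `sitemap_index.xml`) and the helper names are hypothetical, and only the 50k-URL limit is enforced, not the size limit. The `sitemapindex` and `sitemap` elements are the ones defined by the Sitemap protocol.

```python
import gzip
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # per-file URL limit from the slide


def _document(root_tag, item_tag, locations):
    """Build <root_tag> containing one <item_tag><loc>...</loc></item_tag> per location."""
    root = ET.Element(root_tag, xmlns=SITEMAP_NS)
    for location in locations:
        item = ET.SubElement(root, item_tag)
        ET.SubElement(item, "loc").text = location
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


def build_sitemaps(urls, base_url, size=MAX_URLS):
    """Return {file name: bytes} for gzip-compressed sitemaps plus an index.

    base_url is where the files will be published (assumed to end with '/').
    """
    files = {}
    sitemap_locations = []
    for number, start in enumerate(range(0, len(urls), size), start=1):
        name = f"sitemap{number}.xml.gz"  # hypothetical naming scheme
        xml = _document("urlset", "url", urls[start:start + size])
        files[name] = gzip.compress(xml.encode("utf-8"))
        sitemap_locations.append(base_url + name)
    index = _document("sitemapindex", "sitemap", sitemap_locations)
    files["sitemap_index.xml"] = index.encode("utf-8")
    return files
```

Passing a small `size` makes the behaviour easy to check by hand: five URLs with `size=2` yield three compressed sitemap files and one index that lists them.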
  • 12. Sitemap Protocol: Discovery • Publish the sitemap file • Add a line to http://yoursite/robots.txt: Sitemap: http://yoursite/sitemap.xml • Web site owners use the /robots.txt file to give instructions about their site to web robots; this is called the Robots Exclusion Protocol.
  • 13. ToC • Publishing Linked Data from a triple store • Search engines • The Sitemap protocol • sitemap4rdf • Summary • Future work
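
As a small illustration of the discovery step on slide 12, the Python sketch below adds the `Sitemap:` line to the text of a robots.txt file and reads such lines back, roughly as a crawler might. Both function names are hypothetical, and the parsing is deliberately simplistic (no handling of comments, for instance).

```python
def add_sitemap_line(robots_txt, sitemap_url):
    """Return robots_txt with a 'Sitemap:' line for sitemap_url appended once."""
    line = f"Sitemap: {sitemap_url}"
    if line in (existing.strip() for existing in robots_txt.splitlines()):
        return robots_txt  # already announced, nothing to do
    if robots_txt and not robots_txt.endswith("\n"):
        robots_txt += "\n"
    return robots_txt + line + "\n"


def find_sitemaps(robots_txt):
    """Return the sitemap URLs announced in robots_txt."""
    found = []
    for line in robots_txt.splitlines():
        # Split at the first colon only, so the URL's own colons survive.
        field, separator, value = line.partition(":")
        if separator and field.strip().lower() == "sitemap":
            found.append(value.strip())
    return found
```
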
  • 14. sitemap4rdf • Simple command line tool • Sends a SPARQL query to list all URIs • Generates sitemap • Usage: sitemap4rdf http://yoursite/sparql http://yoursite/resource/ • Example: sitemap4rdf http://geo.linkeddata.es/sparql http://geo.linkeddata.es/ • Run sitemap4rdf specifying the SPARQL endpoint and the prefix of the URLs to include in the Sitemap
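
The slide does not show the query that sitemap4rdf sends, so the following Python sketch is only one plausible way to list the URIs under a prefix; it is not the tool's actual implementation (which is written in Java). The query text, the use of SPARQL 1.1 `STRSTARTS`, and all function names are assumptions. The request follows the standard SPARQL protocol (a `query` parameter over HTTP GET) and expects the SPARQL JSON results format.

```python
import json
import urllib.parse
import urllib.request


def build_query(prefix):
    """Build a SPARQL query listing distinct subject URIs that start with prefix."""
    # Escape characters that would break the SPARQL string literal.
    literal = prefix.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "SELECT DISTINCT ?s WHERE {\n"
        "  ?s ?p ?o .\n"
        f'  FILTER (isIRI(?s) && STRSTARTS(STR(?s), "{literal}"))\n'
        "}"
    )


def uris_from_results(results):
    """Extract URI values bound to ?s from a SPARQL JSON results document."""
    return [
        binding["s"]["value"]
        for binding in results["results"]["bindings"]
        if binding.get("s", {}).get("type") == "uri"
    ]


def fetch_uris(endpoint, prefix):
    """Query the endpoint over HTTP GET and return the matching URIs."""
    params = urllib.parse.urlencode({"query": build_query(prefix)})
    request = urllib.request.Request(
        endpoint + "?" + params,
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request) as response:
        return uris_from_results(json.load(response))
```

The list returned by `fetch_uris("http://yoursite/sparql", "http://yoursite/resource/")` would then be written out as `<loc>` entries of a sitemap. Large endpoints often cap result sizes, so a real tool would probably need to page through the results with `LIMIT` and `OFFSET`.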
  • 15. Submit the sitemap location - Sindice • http://sindice.com/main/submit
  • 16. Submit the sitemap location - Google • https://www.google.com/webmasters/tools/
  • 17. ToC • Publishing Linked Data from a triple store • Search engines • The Sitemap protocol • sitemap4rdf • Summary • Future work
  • 18. Summary • Sitemap protocol informs search engines about available pages • Supported by Sindice! • sitemap4rdf generates Sitemap files by listing URIs in a SPARQL endpoint • Open source, Java • http://lab.linkeddata.deri.ie/2010/sitemap4rdf/ • http://mccarthy.dia.fi.upm.es/sitemap4rdf/ • http://www.oeg-upm.net/index.php/en/downloads/122-sitemap4rdf
  • 19. ToC • Publishing Linked Data from a triple store • Search engines • The Sitemap protocol • sitemap4rdf • Summary • Future work
  • 20. Future Work • Integrate sitemap4rdf with Pubby • Generate voiD file automatically from a SPARQL endpoint • Generate an entry in CKAN (registry of open knowledge packages) automatically through the CKAN API • http://ckan.net/package/geolinkeddata • Interact with prefix.cc (service for remembering and looking up URI prefixes) through its API • geoes: <http://geo.linkeddata.es/ontology>
  • 21. Future Work • Support the semantic sitemap extension (once it is compatible with Google) • http://sw.deri.org/2007/07/sitemapextension/
  • 22. sitemap4rdf: generate Sitemap files from a SPARQL endpoint. http://www.deri.ie/ Boris Villazón-Terrazas and Richard Cyganiak (DERI). Facultad de Informática, Universidad Politécnica de Madrid, Campus de Montegancedo sn, 28660 Boadilla del Monte, Madrid. http://www.oeg-upm.net Phone: 34.91.3366605, Fax: 34.91.3524819