Linking the Open Data?
         Petko Valtchev
   (Assoc. Prof., Dept. of CS, UQAM)
              ODX’13
          Montreal, April 6th
Why Link The Data
                                                     “I want you to put your data on the Web.”
                                                                      Sir T. Berners-Lee (TED’07)


•Original Web (1990s):
   • network of linked documents
•Web of Data (2000s):
   • network of interlinked data items
•Linked Open Data: Publish data on the Web:
   • max. reuse and inter-connections, min. redundancy, network effect
                     Data becomes really useful when it is shared and combined with other data.
Linking Data?
•   But how should one produce such data?
    1. Global identification: every data item should be identified by a URL.

    2. Reachability via HTTP: accessing the URL should retrieve the data
       item.

    3. Linked structure: outgoing links (typed!) in the data should point to
       additional data with URLs.

                                            http://www.w3.org/DesignIssues/LinkedData.html




•   THE language : Resource Description Framework (RDF)
    1. benefits: links provide context
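
Principles 1 and 2 above can be sketched in a few lines of Python. This is only an illustration: the DBpedia URL is taken as an example identifier, and it is an assumption that the server answers an `Accept` header asking for Turtle. The request is built but not sent, so the sketch runs offline.

```python
from urllib.request import Request

# Assumption: the data item is identified by this URL and the server
# supports content negotiation through the Accept header.
ITEM_URL = "http://dbpedia.org/resource/Montreal"


def build_lookup(url: str, media_type: str = "text/turtle") -> Request:
    """Prepare an HTTP GET that asks for the data item as RDF (Turtle)."""
    return Request(url, headers={"Accept": media_type})


request = build_lookup(ITEM_URL)
print(request.get_method(), request.full_url, request.get_header("Accept"))
# Sending it would be: urllib.request.urlopen(request).read()
```

The same URL serves as both the name of the item and the way to fetch its description, which is the point of the first two principles.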
A Graph?
[Figure: RDF graph with the following edges]
•   pd:tedstr  rdf:type  foaf:Person
•   pd:tedstr  foaf:name  "Ted Strauss"
•   pd:tedstr  foaf:based_near  dbpedia:Montreal
•   dbpedia:Montreal  dpprop:population  3,407,963
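
The graph on this slide can be written down as a set of (subject, predicate, object) triples. The sketch below keeps the prefixed names as plain strings, which is a simplification; the helper name `objects` is made up for the example.

```python
# Each edge of the slide's graph is one (subject, predicate, object) triple.
# Prefixed names stay as plain strings here; an RDF toolkit would expand
# them to full URLs.
triples = {
    ("pd:tedstr", "rdf:type", "foaf:Person"),
    ("pd:tedstr", "foaf:name", "Ted Strauss"),
    ("pd:tedstr", "foaf:based_near", "dbpedia:Montreal"),
    ("dbpedia:Montreal", "dpprop:population", 3407963),
}


def objects(graph, subject, predicate):
    """Return every object reachable from `subject` over `predicate`."""
    return {o for s, p, o in graph if s == subject and p == predicate}


# Follow two links: person -> city -> population.
for city in objects(triples, "pd:tedstr", "foaf:based_near"):
    print(city, objects(triples, city, "dpprop:population"))
```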
A Graph?
[Figure: RDF graph with the following edges]
•   pd:tedstr  rdf:type  foaf:Person
•   pd:tedstr  foaf:name  "Ted Strauss"
•   pd:tedstr  foaf:based_near  dbpedia:Montreal
•   dbpedia:Montreal  dpprop:population  3,407,963
•   dbpedia:Montreal  dbpedia-owl:country  dbpedia:Canada
A Graph? Global?
[Figure: RDF graph with the following edges]
•   pd:tedstr  rdf:type  foaf:Person
•   pd:tedstr  foaf:name  "Ted Strauss"
•   pd:tedstr  foaf:based_near  dbpedia:Montreal
•   dbpedia:Montreal  dpprop:population  3,407,963
•   dbpedia:Montreal  dbpedia-owl:country  dbpedia:Canada
•   pd:tedstr and pd:linguo are linked by foaf:knows
•   pd:linguo  rdf:type  foaf:Person
•   pd:linguo  foaf:name  "Linkun Guo"
•   pd:linguo  foaf:based_near  dbpedia:Beijing
•   dbpedia:Beijing  dpprop:population  20,693,000
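
One reason the graph can become "global" is that datasets reusing the same identifiers merge by plain set union. The sketch below assumes two hypothetical publishers, one describing people and one describing places; the split and the function name are illustrative only.

```python
# Two hypothetical publishers describe different things but reuse the
# same identifiers for the cities.
people = {
    ("pd:tedstr", "foaf:based_near", "dbpedia:Montreal"),
    ("pd:linguo", "foaf:based_near", "dbpedia:Beijing"),
}
places = {
    ("dbpedia:Montreal", "dbpedia-owl:country", "dbpedia:Canada"),
    ("dbpedia:Montreal", "dpprop:population", 3407963),
    ("dbpedia:Beijing", "dpprop:population", 20693000),
}

# Merging needs no schema mapping: shared identifiers connect the graphs.
merged = people | places


def population_near(graph, person):
    """Population of every city the person is based near."""
    cities = {o for s, p, o in graph if s == person and p == "foaf:based_near"}
    return {o for s, p, o in graph if s in cities and p == "dpprop:population"}


print(population_near(merged, "pd:linguo"))
```

Neither dataset alone can answer the question; the merged graph can.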
A Graph? Global? Giant?
[Figure: RDF graph with the following edges]
•   pd:tedstr  rdf:type  foaf:Person
•   pd:tedstr  foaf:name  "Ted Strauss"
•   pd:tedstr  foaf:based_near  dbpedia:Montreal
•   dbpedia:Montreal  dpprop:population  3,407,963
•   dbpedia:Montreal  dbpedia-owl:country  dbpedia:Canada
•   pd:tedstr and pd:linguo are linked by foaf:knows
•   pd:linguo  rdf:type  foaf:Person
•   pd:linguo  foaf:name  "Linkun Guo"
•   pd:linguo  foaf:based_near  dbpedia:Beijing
•   dbpedia:Beijing  dpprop:population  20,693,000
•   dbpedia:Toronto  dbpedia-owl:country  dbpedia:Canada
•   dbpedia:Quebec  dbpedia-owl:country  dbpedia:Canada
How is it Open ?
•   “If you want to start interlinking data then you can only do that if the data is licensed
    in a way that allows such interlinking.”
                                                                               Rufus Pollock


•   But why is Open data on the Web not ‘linked’?
    •   CSV, XML, RDBs
        •   no easy integration

    •   Web 2.0 Mashups?
        •   data sources fixed

•   Linked Open Data (LOD) cloud - global data space
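
As a rough illustration of what moving from a CSV file to linked data involves, the sketch below turns each row into triples whose subject is a URL. The CSV columns, the `http://example.org/people/` base URL and the mapping of columns to FOAF properties are all assumptions made for the example.

```python
import csv
import io

# Hypothetical CSV export; column names and the base URL are assumptions.
CSV_TEXT = "id,name,city\ntedstr,Ted Strauss,Montreal\nlinguo,Linkun Guo,Beijing\n"
BASE = "http://example.org/people/"


def rows_to_triples(text):
    """Turn each CSV row into triples whose subject is a URL."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = BASE + row["id"]
        triples.append((subject, "foaf:name", row["name"]))
        # The city cell becomes a link to a shared identifier, not a string.
        triples.append((subject, "foaf:based_near", "dbpedia:" + row["city"]))
    return triples


for triple in rows_to_triples(CSV_TEXT):
    print(triple)
```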
The LOD cloud family picture
Sept. 2011
What for?
•   Linking Open Drug Data (LODD), since 2008
    •   Publish/interlink publicly available data about drugs

    •   Provide answers to non-trivial questions on the LODD

        •   For physicians
            •   Which are the equivalent drugs for a given condition?

            •   What drugs are currently under clinical trial?

        •   For patients
            •   What alternatives exist to a given drug?

            •   What are the contraindications for a drug?
Supplemental Slides
Main Entry Points into the LOD cloud
•   DBpedia - a large multi-domain dataset extracted from Wikipedia; it contains
    about 3.77M concepts and 400M+ facts, with abstracts in 11 different
    languages.

•   YAGO - precise knowledge base with 1.7M entities and 15M facts derived
    from Wikipedia and WordNet.

•   FOAF (Friend Of A Friend) - describes people, the links between them and
    the things they create and do.

•   GoodRelations - a vocabulary for eCommerce, enabling web sites to publish
    details of their products and services in a machine-readable way.

•   GeoNames - provides RDF descriptions of more than 6.5M geographical
    features worldwide.
Cross-Media Cultural Heritage Management with LOD
•   Simon is a Maths student visiting Montreal. He is fond of reading, cinema, music and history. His friends
    recommended the flourishing Mile End district, where many cafés serve espresso and European pastries.
•   Once settled in a bar, he opens his iPad to see what is exciting about the surroundings. Knowing his
    preferences, the mobile app suggests an excerpt from a novel written by the local "enfant du quartier",
    Mordecai Richler, called "The Apprenticeship of Duddy Kravitz". The excerpt describes the life of the Jewish
    community on two of the area's principal streets, St. Urbain St. and "The Main", in the 1930s.
•   Once finished, Simon feels intrigued and accepts the suggestion to go for a short walk looking for remains
    from that period. While sipping his coffee, Simon checks the author's biography and finds he has written
    another book, "Barney's Version".
•   After he skims a summary, the app suggests the eponymous film directed by Richard J. Lewis. While
    watching a trailer, he notices the youthful red-haired actress playing the first wife of the main character and,
    after querying the app’s knowledge base, he learns that she is Rachelle Lefevre, who was born in Montreal.
•   Before walking out, he checks the availability of a copy of "Barney's Version" and discovers that he can find
    one in the local municipal library.
•   When he is on the go, the system plays "I'm Your Man", a song by Leonard Cohen, another literary celebrity from
    Montreal.
The Semantic Annotations : RDFa
•   RDFa serializes RDF through HTML attributes

     •   similar to microformats

     •   @resource, @property, @href, @typeof, @rel, etc.
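
To make the idea concrete, here is a minimal sketch of reading triples out of RDFa-style attributes with Python's standard `html.parser`. The page fragment is hypothetical, the class name `TinyRDFa` is made up, and the extractor handles only the few attributes shown (with `typeof` as in the RDFa recommendation); it is not a conforming RDFa processor.

```python
from html.parser import HTMLParser

# Hypothetical page fragment annotated with RDFa attributes.
PAGE = """
<div resource="http://example.org/people/tedstr" typeof="foaf:Person">
  <span property="foaf:name">Ted Strauss</span>
  <a rel="foaf:based_near" href="http://dbpedia.org/resource/Montreal">Montreal</a>
</div>
"""


class TinyRDFa(HTMLParser):
    """Collects triples from @resource, @typeof, @property and @rel/@href only."""

    def __init__(self):
        super().__init__()
        self.subject = None   # current subject, set by @resource
        self.pending = None   # property whose literal text is still to come
        self.triples = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if "resource" in a:
            self.subject = a["resource"]
        if "typeof" in a:
            self.triples.append((self.subject, "rdf:type", a["typeof"]))
        if "rel" in a and "href" in a:
            self.triples.append((self.subject, a["rel"], a["href"]))
        if "property" in a:
            self.pending = a["property"]

    def handle_data(self, data):
        # The first non-blank text after @property becomes the literal value.
        if self.pending and data.strip():
            self.triples.append((self.subject, self.pending, data.strip()))
            self.pending = None


parser = TinyRDFa()
parser.feed(PAGE)
for triple in parser.triples:
    print(triple)
```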
Cool applications of semantic annotations

    •   Semantic query answering:
        •   Where do my colleagues live?
            •   Possible answers from their own web pages (via Trudat HP)

                •   dbpedia:Montreal

                •   dbpedia:Laval

                •   dbpedia:Toronto

        •   What are their dietary restrictions?
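
A question such as "Where do my colleagues live?" amounts to matching triple patterns. The sketch below assumes a handful of hypothetical triples harvested from annotated home pages; the `ex:` identifiers and both function names are invented for the illustration.

```python
# Hypothetical triples harvested from colleagues' annotated home pages.
facts = [
    ("ex:me", "foaf:knows", "ex:alice"),
    ("ex:me", "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:based_near", "dbpedia:Montreal"),
    ("ex:bob", "foaf:based_near", "dbpedia:Laval"),
    ("ex:carol", "foaf:based_near", "dbpedia:Toronto"),
]


def match(graph, s=None, p=None, o=None):
    """Yield triples that agree with every non-None position."""
    for triple in graph:
        if all(want is None or want == got for want, got in zip((s, p, o), triple)):
            yield triple


def colleagues_live(graph, me):
    """Places near which the people `me` knows are based."""
    colleagues = {o for _, _, o in match(graph, s=me, p="foaf:knows")}
    return sorted(o for s, _, o in match(graph, p="foaf:based_near") if s in colleagues)


print(colleagues_live(facts, "ex:me"))
```

A query language such as SPARQL expresses the same two patterns declaratively; the loop here only shows the underlying idea.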
Practical take on OD vs LOD
•   OD for social justice in the US (say Atlanta)?
    •   Dataset 1: census data
        •   Focus on particular area with houses distinguished
            •   inhabited by black people vs white people

    •   Dataset 2: water supply data, houses connected to water lines or not
•   By superposing datasets 1 and 2, the analysis uncovered discrimination
    •   ~83 % of the unconnected houses were inhabited by black people!

•   How was it done (a guess)
    •   matching addresses by string comparison :-(

•   LOD format - simpler and more reliable processing:
    •   finding paths in the graph
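
The contrast with string matching can be sketched as follows. When both datasets name a house with the same identifier, relating a household to a water line becomes a path search in the merged graph. All identifiers and property names below are hypothetical, and breadth-first search is just one simple way to find such a path.

```python
from collections import deque

# Hypothetical data: both datasets name the same house with the same identifier.
census = [("ex:house/12", "ex:inhabitedBy", "ex:household/7")]
water = [("ex:house/12", "ex:connectedTo", "ex:waterline/3")]


def find_path(graph, start, goal):
    """Breadth-first search over triples treated as undirected edges.

    Returns the list of nodes from start to goal, or None if unreachable.
    """
    neighbours = {}
    for s, _, o in graph:
        neighbours.setdefault(s, []).append(o)
        neighbours.setdefault(o, []).append(s)
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in neighbours.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


path = find_path(census + water, "ex:household/7", "ex:waterline/3")
print(path)
```

No address strings are compared at any point; the shared house identifier does the joining.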
Data about the Data
•   Reasoning about the dataset:
    •   Metadata:
        •   e.g. Dublin Core vocabulary




•   Notion of provenance
    •   The problem of trust: anybody can publish anything
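
One simple way to act on provenance is to record the source of each statement and keep only statements from sources the consumer trusts. The sketch below is an assumption-laden illustration: the fourth element of each tuple stands in for a provenance annotation, and the sources and the trusted list are invented.

```python
# Hypothetical statements, each tagged with the source that published it.
quads = [
    ("dbpedia:Montreal", "dpprop:population", 3407963, "http://dbpedia.org"),
    ("dbpedia:Montreal", "dpprop:population", 99, "http://example.org/unknown"),
]

# Assumption: the consumer maintains its own list of trusted sources.
TRUSTED = {"http://dbpedia.org"}


def trusted_triples(statements, trusted):
    """Keep only statements whose source is trusted, dropping the source column."""
    return [(s, p, o) for s, p, o, source in statements if source in trusted]


print(trusted_triples(quads, TRUSTED))
```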

Linking the Open Data? by Petko Valtchev

  • 1. Linking the Open Data? Petko Valtchev (Assoc. Prof., Dept. of CS, UQAM) ODX’13 Montreal, April 6th
  • 2. Why Link The Data “I want you to put your data on the Web.” Sir T. Berners-Lee (TED’07) •Original Web (1990s): • network of linked documents •Web of Data (2000s): • network of interlinked data items •Linked Open Data: Publish data on the Web: • max. reuse and inter-connections, min. redundancy, network effect Data is really useful, whenever it is shared and combined with other data.
  • 3. Linking Data? • But how should one produce such data? 1. Global identification: a URL should point to any data item. 2. Reachability via HTTP: accessing the URL should retrieve the data item. 3. Linked structure: outgoing links (typed!) in the data should point to additional data with URLs. http://www.w3.org/DesignIssues/LinkedData.html • THE language : Resource Description Framework (RDF) 1. benefits: links provide context
  • 4. A Graph? [RDF graph diagram] pd:tedstr rdf:type foaf:Person; pd:tedstr foaf:name "Ted Strauss"; pd:tedstr foaf:based_near dbpedia:Montreal; dbpedia:Montreal dpprop:population 3,407,963
  • 5. A Graph? [RDF graph diagram, extending slide 4] adds dbpedia:Montreal dbpedia-owl:country dbpedia:Canada
  • 6. A Graph? Global? [RDF graph diagram, extending slide 5] adds pd:tedstr foaf:knows pd:linguo; pd:linguo rdf:type foaf:Person; pd:linguo foaf:name "Linkun Guo"; pd:linguo foaf:based_near dbpedia:Beijing; dbpedia:Beijing dpprop:population 20,693,000
  • 7. A Graph? Global? Giant? [RDF graph diagram, extending slide 6] adds new nodes dbpedia:Toronto and dbpedia:Quebec, each attached by a dbpedia-owl:country edge
  • 8. How is it Open? • “If you want to start interlinking data then you can only do that if the data is licensed in a way that allows such interlinking.” Rufus Pollock • But why is Open Data on the Web not ‘linked’? • CSV, XML, RDBs • no easy integration • Web 2.0 Mashups? • data sources fixed • Linked Open Data (LOD) cloud - global data space
  • 9. The LOD cloud family picture Sept. 2011
  • 10. What for? • Linking Open Drug Data (LODD), since 2008 • Publish/interlink publicly available data about drugs • Provide answers to non trivial questions on the LODD • For physicians • Which are the equivalent drugs for a given condition? • What drugs are currently under clinical trial? • For patients • What alternatives exist to a given drug? • What are the contraindications for a drug?
  • 11. Supplemental Slides Petko Valtchev (Assoc. Prof., Dept. of CS, UQAM) ODX’13 Montreal, April 6th
  • 12. Main Entry Points into the LOD cloud • DBpedia - a large multi-domain dataset containing data extracted from Wikipedia; it contains about 3.77M concepts and 400+M facts, with abstracts in 11 different languages. • YAGO - precise knowledge base with 1.7M entities and 15M facts derived from Wikipedia and WordNet. • FOAF (Friend Of A Friend) - describes people, the links between them and the things they create and do. • GoodRelations - a vocabulary for eCommerce, enabling web sites to publish details of their products and services in a machine-readable way. • GeoNames - provides RDF descriptions of more than 6.5M geographical features worldwide.
  • 13. Cross-Media Cultural Heritage Management with LOD • Simon is a Maths student visiting Montreal. He is fond of reading, cinema, music and history. His friends recommended the flourishing Mile End district, where many cafés serve espresso and European pastries. • Once settled in a bar, he opens his iPad to see what is exciting about the surroundings. Knowing his preferences, the mobile app suggests an excerpt from a novel written by the local "enfant du quartier", Mordecai Richler, called "The Apprenticeship of Duddy Kravitz". The excerpt describes the life of the Jewish community on two of the area's principal streets, St. Urbain St. and "The Main", in the 1930s. • Once finished, Simon is intrigued and accepts the suggestion to go for a short walk looking for remains from that period. While sipping his coffee, Simon checks the author's biography and finds he has written another book, "Barney's Version". • After he skims a summary, the app suggests the eponymous film directed by Richard J. Lewis. While watching a trailer, he notices the youthful red-haired actress playing the first wife of the main character, and after querying the app’s knowledge base he learns that she is Rachelle Lefevre, who was born in Montreal. • Before walking out, he checks the availability of a copy of "Barney's Version" and discovers that he can find one in the local municipal library. • On the go, the system plays "I'm Your Man", a song by Leonard Cohen, another literary celebrity from Montreal.
  • 14. The Semantic Annotations: RDFa • RDFa serializes RDF through HTML attributes • similar to microformats • @resource, @property, @href, @typeof, @rel, etc.
  • 15. Cool applications of semantic annotations • Semantic query answering: • Where do my colleagues live? • Possible answers from their own web pages (via Trudat HP) • dbpedia:Montreal • dbpedia:Laval • dbpedia:Toronto • What are their dietary restrictions?
  • 16. Practical take on OD vs LOD • OD for social justice in US (say Atlanta)? • Dataset 1: census data • Focus on particular area with houses distinguished • inhabited by black people vs white people • Dataset 2: water supply data, houses connected to water lines or not • By superposing datasets 1 and 2, analysis uncovered a discrimination • ~83 % of the unconnected houses were inhabited by black people!!! • How was it done (a guess) • matching between addresses as strings compared :-( • LOD format - simpler and more reliable processing: • finding paths in the graph
  • 17. Data about the Data • Reasoning about the dataset: • Metadata: • e.g. Dublin Core vocabulary • Notion of provenance • The problem of trust: anybody can publish anything
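
To make the graph of slides 4 to 7 and the colleague question of slide 15 concrete, here is a minimal Python sketch. It is not the author's tooling: prefixed names are kept as plain strings, the edge directions are read off the flattened diagrams (an assumption), `foaf:knows` stands in for "colleague", and the helper names `match` and `where_do_acquaintances_live` are made up for illustration.

```python
# Triples transcribed from slides 4 to 6; edge directions are an assumption
# read off the flattened diagrams. Prefixed names are kept as plain strings.
TRIPLES = {
    ("pd:tedstr", "rdf:type", "foaf:Person"),
    ("pd:tedstr", "foaf:name", "Ted Strauss"),
    ("pd:tedstr", "foaf:based_near", "dbpedia:Montreal"),
    ("pd:tedstr", "foaf:knows", "pd:linguo"),
    ("pd:linguo", "rdf:type", "foaf:Person"),
    ("pd:linguo", "foaf:name", "Linkun Guo"),
    ("pd:linguo", "foaf:based_near", "dbpedia:Beijing"),
    ("dbpedia:Montreal", "dbpedia-owl:country", "dbpedia:Canada"),
}


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def where_do_acquaintances_live(triples, person):
    """Follow foaf:knows, then foaf:based_near, starting from `person`."""
    result = {}
    for _, _, friend in match(triples, s=person, p="foaf:knows"):
        for _, _, place in match(triples, s=friend, p="foaf:based_near"):
            result[friend] = place
    return result


if __name__ == "__main__":
    print(where_do_acquaintances_live(TRIPLES, "pd:tedstr"))
```

A real application would more likely load the data into an RDF store and ask the same question in SPARQL; the two nested loops above only mimic a two-pattern join.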

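The contrast drawn on slide 16 between matching addresses as strings and matching them by identifier can be sketched as follows. Everything in the data is hypothetical: the addresses, the `http://example.org/address/...` URIs and the household labels are invented for illustration and do not come from the study mentioned on the slide.

```python
# Hypothetical records keyed by free-text address: the two sources spell
# the same addresses differently, so an exact string join finds nothing.
census_by_string = {"12 Oak St.": "household A", "40 Elm Avenue": "household B"}
water_by_string = {"12 Oak Street": "connected", "40 Elm Ave": "not connected"}

# The same hypothetical records keyed by a shared URI per address.
census_by_uri = {
    "http://example.org/address/1": "household A",
    "http://example.org/address/2": "household B",
}
water_by_uri = {
    "http://example.org/address/1": "connected",
    "http://example.org/address/2": "not connected",
}


def join(left, right):
    """Pair up the values of the keys present in both dictionaries."""
    return {key: (left[key], right[key]) for key in left if key in right}


if __name__ == "__main__":
    print(join(census_by_string, water_by_string))  # no key matches exactly
    print(join(census_by_uri, water_by_uri))        # every address lines up
```

With string keys the join depends on fuzzy matching or manual cleaning; with shared identifiers it reduces to a key lookup.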
Editor's notes

  1. “The term Linked Data refers to a set of best practices for publishing and connecting structured data on the Web.”
  2. A way of publishing data on the web that: encourages reuse; reduces redundancy; maximises inter-connectedness; enables network effects. There are many ways to introduce Linked Open Data. Links provide a context, and context is important for proper processing.
  3. Global identifier: URI. Access to data: via HTTP. Data model: RDF (a graph).
  4. A graph of resources. Vertices and edges are typed by terms provided in vocabularies: vocabularies are published in an open and distributed fashion, and they can be mixed at will. Moreover, the vocabulary terms are also resources (identified via URIs). As with XML namespaces, shortcuts (prefixes) are used to avoid overloading the code with long URLs. FOAF is a vocabulary (schema) for representing people in the way LinkedIn sees them. DBpedia is an RDF version of Wikipedia: pages are translated into structured data.
  5. A graph of resources. Vertices and edges are typed by terms provided in vocabularies: vocabularies are published in an open and distributed fashion, and they can be mixed at will. Moreover, the vocabulary terms are also resources (identified via URIs). As with XML namespaces, shortcuts (prefixes) are used to avoid overloading the code with long URLs. FOAF is a vocabulary (schema) for representing people in the way LinkedIn sees them. DBpedia is an RDF version of Wikipedia: pages are translated into structured data.
  6. A graph of resources. Vertices and edges are typed by terms provided in vocabularies: vocabularies are published in an open and distributed fashion, and they can be mixed at will. Moreover, the vocabulary terms are also resources (identified via URIs). As with XML namespaces, shortcuts (prefixes) are used to avoid overloading the code with long URLs. FOAF is a vocabulary (schema) for representing people in the way LinkedIn sees them. DBpedia is an RDF version of Wikipedia: pages are translated into structured data.
  7. But haven't we been putting linked data on the web for years? In CSV, relational databases, XML, etc.? Well yes, but these approaches are not so easy to integrate. Web 2.0 mashups work against a fixed set of data sources. Linked Data applications operate on top of an unbounded, global data space.
  8. “The term Linked Data refers to a set of best practices for publishing and connecting structured data on the Web.”
  9. TODO: Microformats schema.org
  10. Let us dream a little bit... Once there are RDFa annotations on a good number of pages, some more or less interesting questions can be answered directly from an RDFa-aware Web browser. Example: now I am Ted again, and I want to know where my colleagues from Trudat live. A much more useful question could be: I want to invite my colleagues for dinner and therefore need to know their dietary restrictions. Instead of phoning them one by one or maintaining a local database of colleagues and friends, I trust their own RDFa-enabled personal web pages.
  11. First of all, there is no particular semantics to provide for your data to be linked to other available data. Think of an example: remember how open data could support social justice in the US? The guy took the census data of an American city (say Atlanta). The focus was on a particular area, and he distinguished between houses inhabited by black people and houses inhabited by white people. He also took the water supply data, i.e., which houses were connected to the water lines. By superposing the datasets, he discovered that ~83 % of the unconnected houses were inhabited by black people!!! This was a proof of discrimination and a judge (district) Well, what he did was match addresses across both datasets: he basically compared strings. This is what it is all about, and you know strings may not always match perfectly :-( In a LOD format, a URI (URL) would be assigned to each individual address, so that there is a unique way of identifying an entity (resource). The processing would have been simpler and more reliable: finding paths in the graph, using a dedicated query language, SPARQL. But the question is: DO the governments WANT us to have that much INSIGHT and at such a low PRICE?
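
The idea of "finding paths in the graph" from note 11 can be illustrated with a small Python sketch. It uses a plain breadth-first search rather than SPARQL, so it is a stand-in for the query language the note mentions, not an example of it. The edges reuse identifiers from the slides (with directions assumed from the diagrams), and the function name `find_path` is hypothetical.

```python
from collections import deque

# Edges taken from the slide diagrams, written as (subject, predicate, object);
# the directions are an assumption read off the flattened figures.
EDGES = [
    ("pd:tedstr", "foaf:based_near", "dbpedia:Montreal"),
    ("dbpedia:Montreal", "dbpedia-owl:country", "dbpedia:Canada"),
    ("pd:tedstr", "foaf:knows", "pd:linguo"),
    ("pd:linguo", "foaf:based_near", "dbpedia:Beijing"),
]


def find_path(edges, start, goal):
    """Breadth-first search along edges, from subject to object.

    Returns the list of edges on a shortest path, or None if the goal
    cannot be reached from the start node.
    """
    outgoing = {}
    for s, p, o in edges:
        outgoing.setdefault(s, []).append((p, o))

    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for p, o in outgoing.get(node, []):
            if o not in seen:
                seen.add(o)
                queue.append((o, path + [(node, p, o)]))
    return None


if __name__ == "__main__":
    for edge in find_path(EDGES, "pd:tedstr", "dbpedia:Canada"):
        print(edge)
```

Because every node is an identifier rather than a free-text label, the search never has to decide whether two differently spelled strings denote the same thing.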