SlideShare une entreprise Scribd logo
1  sur  23
iSOCO


      Provenance and Trust

      José Manuel Gómez Pérez

Fundations of Trust in the Future Internet
        Future Internet Assembly
        Valencia, 15th April 2010



                            W3C Provenance Incubator Group
Why provenance is important


For the Web Architecture
 - "At the toolbar (menu, whatever) associated with a document there is a button
   marked "Oh, yeah?". You press it when you lose that feeling of trust. It says to
   the Web, 'so how do I know I can trust this information?'. The software then
   goes directly or indirectly back to metainformation about the document, which
   suggests a number of reasons.“-- Tim Berners-Lee, W3C Chair, Web Design
   Issues, September 1997
For Linked Data
 - "Provenance is the number one issue we face when publishing government
   data as linked data for data.gov.uk” -- John Sheridan, UK National Archives,
   data.gov.uk, February 2010
For Science
 - "We need a paradigm that makes it simple [...] to perform and publish
   reproducible computational research. [...] A Reproducible Research
   Environment (RRE) [...] provides computational tools together with the ability
   to automatically track the provenance of data, analyses, and results and to
   package them (or pointers to persistent versions of them) for redistribution."

                                                 W3C Provenance Incubator Group
                                                                                      2
Provenance is…




Records of

Sources of information,
including entities and
processes, involved in
producing or delivering an
artifact
History of subsequent
owners (change of custody)




                             W3C Provenance Incubator Group
                                                              3
Provenance is…




Evidence of authenticity, integrity,
and information quality
Certifies products of good process




                                       W3C Provenance Incubator Group
                                                                        4
Provenance is…




Valuable
Hard to collect and verify
Necessary to assign credit
…and blame

i.e. to establish

Trust
                             W3C Provenance Incubator Group
                                                              5
Provenance representation

   Provenance models define
     - Types of provenance elements
     - Relationships between them
   Provenance represented as a graph
     - Nodes: Pieces of provenance information (Artifacts,
         Processes, Agents)
     - Edges: relate provenance elements to each other
   Additionally: roles, temporal, dependency, causal
   dependency




The Open Provenance
    Model (OPM)




                                                     W3C Provenance Incubator Group
                                                                                      6
Provenance-related vocabularies



         General                   Provenance-specific
DC – Dublin Core Metadata            OPM - Open Provenance Model
Terms                                Provenir
FOAF – Friend of a Friend            PRV - Provenance Vocabulary
SIOC – Semantically-Interlinked      PML - Proof Markup Language
Online Communities
                                     DC - Dublin Core
SWP – Semantic Web Publishing
                                     PREMIS
vocabulary
                                     WOT – Web Of Trust schema
VOID – VOcabulary of Interlinked
Datasets                             PAV - SWAN Provenance
                                     Ontology
                                     SWP - Semantic Web Publishing
                                     Vocabulary
                                     CS - Changeset Vocabulary

                                        W3C Provenance Incubator Group
                                                                         7
A snapshot on provenance


           Uses                     Research areas                        Domains

• Meaningful data              • Scientific workflows            • Science
  integration
                               • Databases (where and            • Open Government and
• Establishing trust when        why provenance)                   Linked Data
  information sources are
  diverse and of varying       • Security (Information           • Web search & use
  quality e.g. in the Web        flow and trust)
                                                                 • Cultural heritage
• Providing justification to   • Justification and
  the conclusions of             Argumentation in KRR
  reasoning processes                                            • Licensing and attribution

                               • Information Retrieval           • Manufacturing
• Attribution (who did           (analysis of information
  what)                          sources)

• Enabling comparison
  and reproducibility of
  processes

                                                        W3C Provenance Incubator Group
                                                                                         8
The Linked Data paradigm


How can we exploit
                                  Tim Berners Lee, 2006 (Design Issues)
 all the available
       data?                      1. Use URIs to identify things
                                      -   Anything, not just documents
                                  2. Use HTTP URIs for people to
Data reuse and remix                 lookup such names
Common flexible and usable APIs       -   Globally unique names
Standard vocabularies to              -   Distributed ownership
describe interlinked datasets     3. Provide useful information in RDF
Tools                                upon URI resolution
Realize the Semantic Web vision   4. Include RDF links to other URIs
                                      -   Enable discovery of related
                                          information




                                          W3C Provenance Incubator Group
                                                                           9
Provenance is key in Linked Data




   Data Lineage
     - The origins of data
     - Related artifacts and actors
   Information quality
     - Timeliness
     - (Semantic) consistency between
       datasets
     - Stable and meaningful data links
   Trust
     - Data authenticity
     - Reliability




  W3C Provenance Incubator Group
                                      10
Provenance is key in Open Government


Open Government Initiative (http://www.whitehouse.gov/open)
  - Release quickly, improve later
  - NSF to develop plan (http://www.nsf.gov/open):
“NSF is developing an Open Goverment Plan, which will serve as the roadmap for our plans
to improve transparency, better integrate public participation and collaboration into our core
mission, and become more innovative and efficient.”

Similar initiative in the UK (data.gov.uk)
"Provenance is the number one issue we face when publishing government data as linked
data for data.gov.uk” -- John Sheridan, UK National Archives

Why is provenance so important
  -   Government data comes from very diverse data sources
  -   Varying quality
  -   Different scope
  -   Different assumptions


                                                         W3C Provenance Incubator Group
                                                                                                 11
Large amounts of data and processes




                 In 2006, the size of the digital universe
Source: IDC
                 was estimated in 161 exabytes
                 3 million times the information in all
                 books ever written
                 In 2010, expected to turn 988
                 exabytes
                 …and all this data is potentially
                 published online
                     W3C Provenance Incubator Group
                                                      12
It’s all about producing and consuming information…




     Automated means
         to evaluate
     Information Quality
        and Trust are
           needed!




                           W3C Provenance Incubator Group
                                                            13
Provenance and Trust in the Web: A real-life example


                                                     • Two fake web sites
Reusing web data without the means that allow        • A fake Wikipedia entry
  contrasting its provenance can be harmful,         • Fake California public
       especially in sensitive domains.
                                                     safety phone numbers
                                                     • Fake local TV station


                                                The hoax caused a 1000-word tome
                                                on Frankfurter Allgemeine Zeitung…
                                                and public apologies from DPA

                                                Trust on Wikipedia misled DPA
                                                In a provenance-aware world, DPA
                                                would have had means based on
                                                data provenance to evaluate trust
                                                  - Bluewater did not exist
                                                  - The Berlin Boys do not exist

                                                   W3C Provenance Incubator Group
                                                                                    14
Trust evaluation: Useful provenance questions




Who created that content
(author/attribution)?
Was the content ever manipulated,
if so by what processes/entities?
Who is providing that content
(repository)?
Can any of the answers to these
questions be verified e.g., through
e-signatures?

                                   W3C Provenance Incubator Group
                                                                    15
Immediate need for provenance at W3C




       Trust-driven                               Others
Web of trust                           Reasoning
 - Making trust judgments               - Attribution of assertions from
   based on provenance                    diverse sources
Social trust                           Linked data
 - Attribution, authority,              - Use of conflicting data of
   propagation                            varying degrees of quality
Social web                             Life sciences and e-Science
 - Privacy and use policies of         at large
   sensitive (personal) data            - Reproducibility of scientific
                                          results


                                         W3C Provenance Incubator Group
                                                                           16
W3C Provenance Group: Charter and goals of the incubator group


         Provide state-of-the-art understanding and develop a
        roadmap for development and possible standardization

Articulate requirements for accessing and reasoning about
provenance information
 - Develop use cases
Identify issues in provenance that are direct concern to the
Semantic Web
 - Articulate relationships with other aspects of Web architecture
Report on state-of-the-art work on provenance
Report on a roadmap for provenance in the Semantic Web
 - Identify starting points for provenance representations
 - Identifying elements of a provenance architecture that would benefit
   from standardization

                                            W3C Provenance Incubator Group
                                                                             17
W3C Provenance Group: Products of the group to date


Group formed in September 2009
 - All information is public: http://www.w3.org/2005/Incubator/prov/wiki
Developed a set of key dimensions for provenance (11/09)
 - Grouped into three major categories: content, management, use
Developed use cases for provenance (12/09)
 - More than 30 use cases, most were improved and curated
Developed requirements for provenance that arise from the
use cases (1/10)
 - User requirements: what is the purpose/use of the provenance
   information
 - Technical requirements: derived from the user requirements
Currently developing state-of-the-art report (expected 6/10)


                                            W3C Provenance Incubator Group
                                                                             18
Scientific and technical challenges of provenance




Provenance information needs to be :

  Represented
  Captured and recorded
  Stored and secured, queried, and reasoned about
  Visualized and browsed




                                        W3C Provenance Incubator Group
                                                                         19
Scientific and technical challenges of provenance (II)


Vocabularies for representation of provenance content
 - Need representations of processes (workflows), entities,roles, data collections,
   meta-assertions, etc.
 - The Open Provenance Model (OPM): http://twiki.ipaw.info/bin/view/OPM
Granularity of provenance records
 - How much detail is useful, manageable/scalable in practice?
      • Size of provenance can be orders of magnitude larger than base data
Provenance evaluation for Information Quality and Trust Management
Evolution and updates
 - Shelf timeliness of data
      • Determine when data becomes obsolete based on provenance info
 -   Versioning of data sources
      • Relate updates of data based on provenance info
 - Reproducibility of processes
Provenance-aware visualization, navigation, and resource consumption


                                                    W3C Provenance Incubator Group
                                                                                      20
Scientific and technical challenges of Provenance & Trust


Policies based on provenance information
 - Association-based policies
     • Source is NYT, source cites NYT
     • Source is cited in Wikipedia
 - Bias-based policies
     • Source is an oil company
 - Distrust policies
     • Source is a blog
Policies may be restricted to a context
 - Topic of search, topics of page, tags of page
Trust policies may be shared across users
 - Like bookmarks in del.icio.us



                                               W3C Provenance Incubator Group
                                                                                21
Socioeconomic challenges



How to incentivize provenance take-up in the Web
architecture so that content and service providers move
into a provenance-aware paradigm?
 - Generation of provenance metadata by content providers
 - Consumption of provenance metadata by infrastructure and
   applications like search engines, browsers, etc.

Objective evidence of trust in online data or service allows
 -     Ranking increase in search engines
 -     Attracting internet traffic
 -     Increasing automation in data and service consumption
But, can you can trust such provenance information itself?
     - Non-forgery of provenance metadata
        • Authoritative agencies
                                             W3C Provenance Incubator Group
                                                                              22
José Manuel Gómez-Pérez
R&D Director
T +34913349778
                                                Thanks for
M +34609077103
jmgomez@isoco.com                             your attention!
                                     iSOCO
   Para obtener más información sobre como               puede
 ayudar a su empresa a optimizar sus negocios digitales y aportar
            una solución innovadora, contáctenos en
                       www.            .com
              Barcelona                 Madrid                         Valencia
Tel +34 93 5677200         +34 91 3349797               +34 96 3467143
Edificio Testa A           C/Pedro de Valdivia, 10      Oficina 107
C/ Alcalde Barnils 64-68   28006 Madrid                 C/ Prof. Beltrán Báguena 4,
St. Cugat del Vallès                                    46009 Valencia
08174 Barcelona




                                                     W3C Provenance Incubator Group
                                                                                      23

Contenu connexe

Tendances

Www 2 ggg Athanassios Hatzis
Www 2 ggg Athanassios HatzisWww 2 ggg Athanassios Hatzis
Www 2 ggg Athanassios HatzisIgnite_Athens
 
Semantics empowered Physical-Cyber-Social Systems for EarthCube
Semantics empowered Physical-Cyber-Social Systems for EarthCubeSemantics empowered Physical-Cyber-Social Systems for EarthCube
Semantics empowered Physical-Cyber-Social Systems for EarthCubeAmit Sheth
 
10052012 luc vervenne synergetics van syntax portfolio naar semantische uitwi...
10052012 luc vervenne synergetics van syntax portfolio naar semantische uitwi...10052012 luc vervenne synergetics van syntax portfolio naar semantische uitwi...
10052012 luc vervenne synergetics van syntax portfolio naar semantische uitwi...Stichting ePortfolio Support
 
Federated Search Webinar for SLA (Special Libraries Assoc.)
Federated Search Webinar for SLA (Special Libraries Assoc.)Federated Search Webinar for SLA (Special Libraries Assoc.)
Federated Search Webinar for SLA (Special Libraries Assoc.)Helen Mitchell
 
From WWW to GGG Ignite Athens 2012
From WWW to GGG Ignite Athens 2012From WWW to GGG Ignite Athens 2012
From WWW to GGG Ignite Athens 2012healis
 
Cni research data_oxford_horstmann_jefferies
Cni research data_oxford_horstmann_jefferiesCni research data_oxford_horstmann_jefferies
Cni research data_oxford_horstmann_jefferiesBDLSS
 

Tendances (7)

Www 2 ggg Athanassios Hatzis
Www 2 ggg Athanassios HatzisWww 2 ggg Athanassios Hatzis
Www 2 ggg Athanassios Hatzis
 
Semantics empowered Physical-Cyber-Social Systems for EarthCube
Semantics empowered Physical-Cyber-Social Systems for EarthCubeSemantics empowered Physical-Cyber-Social Systems for EarthCube
Semantics empowered Physical-Cyber-Social Systems for EarthCube
 
10052012 luc vervenne synergetics van syntax portfolio naar semantische uitwi...
10052012 luc vervenne synergetics van syntax portfolio naar semantische uitwi...10052012 luc vervenne synergetics van syntax portfolio naar semantische uitwi...
10052012 luc vervenne synergetics van syntax portfolio naar semantische uitwi...
 
Federated Search Webinar for SLA (Special Libraries Assoc.)
Federated Search Webinar for SLA (Special Libraries Assoc.)Federated Search Webinar for SLA (Special Libraries Assoc.)
Federated Search Webinar for SLA (Special Libraries Assoc.)
 
Metadata Workshop
Metadata WorkshopMetadata Workshop
Metadata Workshop
 
From WWW to GGG Ignite Athens 2012
From WWW to GGG Ignite Athens 2012From WWW to GGG Ignite Athens 2012
From WWW to GGG Ignite Athens 2012
 
Cni research data_oxford_horstmann_jefferies
Cni research data_oxford_horstmann_jefferiesCni research data_oxford_horstmann_jefferies
Cni research data_oxford_horstmann_jefferies
 

En vedette

Happy new year celebration style 1 powerpoint templates
Happy new year celebration style 1 powerpoint templatesHappy new year celebration style 1 powerpoint templates
Happy new year celebration style 1 powerpoint templatesSlideTeam.net
 
Claudio Barna
Claudio BarnaClaudio Barna
Claudio Barnagisalvi
 
Mobile guest service
Mobile guest serviceMobile guest service
Mobile guest serviceLaura ORIOT
 
Happy New Year 2016 Quotes and Wishes
Happy New Year 2016 Quotes and WishesHappy New Year 2016 Quotes and Wishes
Happy New Year 2016 Quotes and WishesSimpy Saini
 
Comunicato pallavolo N°11 del 21 dicembre 2016
Comunicato pallavolo N°11 del 21 dicembre 2016Comunicato pallavolo N°11 del 21 dicembre 2016
Comunicato pallavolo N°11 del 21 dicembre 2016Giuliano Ganassi
 
Next Challenges in Corporate Knowledge Management
Next Challenges in Corporate Knowledge ManagementNext Challenges in Corporate Knowledge Management
Next Challenges in Corporate Knowledge ManagementJose Manuel Gómez-Pérez
 
Consumer ED ILHIE Toolkit for Professionals
Consumer ED  ILHIE Toolkit for ProfessionalsConsumer ED  ILHIE Toolkit for Professionals
Consumer ED ILHIE Toolkit for ProfessionalsWirehead Technology
 
Triangulos rectangulos notables(completo)
Triangulos rectangulos notables(completo)Triangulos rectangulos notables(completo)
Triangulos rectangulos notables(completo)Martin Huamán Pazos
 
It’s ideas time!
It’s ideas time! It’s ideas time!
It’s ideas time! Mimi Lai
 

En vedette (16)

Happy new year celebration style 1 powerpoint templates
Happy new year celebration style 1 powerpoint templatesHappy new year celebration style 1 powerpoint templates
Happy new year celebration style 1 powerpoint templates
 
1 ieee98
1 ieee981 ieee98
1 ieee98
 
Claudio Barna
Claudio BarnaClaudio Barna
Claudio Barna
 
דפוס בארי
דפוס בארידפוס בארי
דפוס בארי
 
Resume
ResumeResume
Resume
 
Activity 4
Activity 4Activity 4
Activity 4
 
Mobile guest service
Mobile guest serviceMobile guest service
Mobile guest service
 
Happy New Year 2016 Quotes and Wishes
Happy New Year 2016 Quotes and WishesHappy New Year 2016 Quotes and Wishes
Happy New Year 2016 Quotes and Wishes
 
Comunicato pallavolo N°11 del 21 dicembre 2016
Comunicato pallavolo N°11 del 21 dicembre 2016Comunicato pallavolo N°11 del 21 dicembre 2016
Comunicato pallavolo N°11 del 21 dicembre 2016
 
Mass and volume
Mass and volumeMass and volume
Mass and volume
 
Next Challenges in Corporate Knowledge Management
Next Challenges in Corporate Knowledge ManagementNext Challenges in Corporate Knowledge Management
Next Challenges in Corporate Knowledge Management
 
Consumer ED ILHIE Toolkit for Professionals
Consumer ED  ILHIE Toolkit for ProfessionalsConsumer ED  ILHIE Toolkit for Professionals
Consumer ED ILHIE Toolkit for Professionals
 
Frank resume 11112015
Frank resume 11112015Frank resume 11112015
Frank resume 11112015
 
Producción Audiovisual Cine: Tema 2
Producción Audiovisual Cine: Tema 2Producción Audiovisual Cine: Tema 2
Producción Audiovisual Cine: Tema 2
 
Triangulos rectangulos notables(completo)
Triangulos rectangulos notables(completo)Triangulos rectangulos notables(completo)
Triangulos rectangulos notables(completo)
 
It’s ideas time!
It’s ideas time! It’s ideas time!
It’s ideas time!
 

Similaire à Provenance and Trust

Prov-O-Viz: Interactive Provenance Visualization
Prov-O-Viz: Interactive Provenance VisualizationProv-O-Viz: Interactive Provenance Visualization
Prov-O-Viz: Interactive Provenance VisualizationRinke Hoekstra
 
20120411 travelalliancemcguinnessfinal
20120411 travelalliancemcguinnessfinal20120411 travelalliancemcguinnessfinal
20120411 travelalliancemcguinnessfinalDeborah McGuinness
 
Curation and Characterization of Web Services
Curation and Characterization of Web ServicesCuration and Characterization of Web Services
Curation and Characterization of Web ServicesJose Enrique Ruiz
 
Linked Open Data_mlanet13
Linked Open Data_mlanet13Linked Open Data_mlanet13
Linked Open Data_mlanet13Kristi Holmes
 
Linked data for Enterprise Data Integration
Linked data for Enterprise Data IntegrationLinked data for Enterprise Data Integration
Linked data for Enterprise Data IntegrationSören Auer
 
The Information Workbench - Linked Data and Semantic Wikis in the Enterprise
The Information Workbench - Linked Data and Semantic Wikis in the EnterpriseThe Information Workbench - Linked Data and Semantic Wikis in the Enterprise
The Information Workbench - Linked Data and Semantic Wikis in the EnterprisePeter Haase
 
On demand access to Big Data through Semantic Technologies
 On demand access to Big Data through Semantic Technologies On demand access to Big Data through Semantic Technologies
On demand access to Big Data through Semantic TechnologiesPeter Haase
 
Provenance state-of-art
Provenance state-of-artProvenance state-of-art
Provenance state-of-artvty
 
Research Shared: researchobject.org
Research Shared: researchobject.orgResearch Shared: researchobject.org
Research Shared: researchobject.orgNorman Morrison
 
Semantic Web Technologies: Changing Bibliographic Descriptions?
Semantic Web Technologies: Changing Bibliographic Descriptions?Semantic Web Technologies: Changing Bibliographic Descriptions?
Semantic Web Technologies: Changing Bibliographic Descriptions?Stuart Weibel
 
Linked data and the future of scientific publishing
Linked data and the future of scientific publishingLinked data and the future of scientific publishing
Linked data and the future of scientific publishingBradley Allen
 
Linked_Open_Data_Rome_Netcamp_13
Linked_Open_Data_Rome_Netcamp_13Linked_Open_Data_Rome_Netcamp_13
Linked_Open_Data_Rome_Netcamp_13Michele Piunti
 
Hello Open World - Semtech 2009
Hello Open World - Semtech 2009Hello Open World - Semtech 2009
Hello Open World - Semtech 2009Alexandre Passant
 
Provenance Management to Enable Data Sharing
Provenance Management to Enable Data SharingProvenance Management to Enable Data Sharing
Provenance Management to Enable Data SharingUniversity of Arizona
 
Linked Data Workshop Stanford University
Linked Data Workshop Stanford University Linked Data Workshop Stanford University
Linked Data Workshop Stanford University Talis Consulting
 
Repository Federation: Towards Data Interoperability
Repository Federation: Towards Data InteroperabilityRepository Federation: Towards Data Interoperability
Repository Federation: Towards Data InteroperabilityRobert H. McDonald
 
Scientific data management from the lab to the web
Scientific data management   from the lab to the webScientific data management   from the lab to the web
Scientific data management from the lab to the webJose Manuel Gómez-Pérez
 

Similaire à Provenance and Trust (20)

Prov-O-Viz: Interactive Provenance Visualization
Prov-O-Viz: Interactive Provenance VisualizationProv-O-Viz: Interactive Provenance Visualization
Prov-O-Viz: Interactive Provenance Visualization
 
20120411 travelalliancemcguinnessfinal
20120411 travelalliancemcguinnessfinal20120411 travelalliancemcguinnessfinal
20120411 travelalliancemcguinnessfinal
 
Curation and Characterization of Web Services
Curation and Characterization of Web ServicesCuration and Characterization of Web Services
Curation and Characterization of Web Services
 
Linked Open Data_mlanet13
Linked Open Data_mlanet13Linked Open Data_mlanet13
Linked Open Data_mlanet13
 
Linked data for Enterprise Data Integration
Linked data for Enterprise Data IntegrationLinked data for Enterprise Data Integration
Linked data for Enterprise Data Integration
 
The Information Workbench - Linked Data and Semantic Wikis in the Enterprise
The Information Workbench - Linked Data and Semantic Wikis in the EnterpriseThe Information Workbench - Linked Data and Semantic Wikis in the Enterprise
The Information Workbench - Linked Data and Semantic Wikis in the Enterprise
 
NISO Webinar: Back From the Endangered List: Using Authority Data to Enhance ...
NISO Webinar: Back From the Endangered List: Using Authority Data to Enhance ...NISO Webinar: Back From the Endangered List: Using Authority Data to Enhance ...
NISO Webinar: Back From the Endangered List: Using Authority Data to Enhance ...
 
On demand access to Big Data through Semantic Technologies
 On demand access to Big Data through Semantic Technologies On demand access to Big Data through Semantic Technologies
On demand access to Big Data through Semantic Technologies
 
Provenance state-of-art
Provenance state-of-artProvenance state-of-art
Provenance state-of-art
 
Research Shared: researchobject.org
Research Shared: researchobject.orgResearch Shared: researchobject.org
Research Shared: researchobject.org
 
Semantic Web Technologies: Changing Bibliographic Descriptions?
Semantic Web Technologies: Changing Bibliographic Descriptions?Semantic Web Technologies: Changing Bibliographic Descriptions?
Semantic Web Technologies: Changing Bibliographic Descriptions?
 
Linked data and the future of scientific publishing
Linked data and the future of scientific publishingLinked data and the future of scientific publishing
Linked data and the future of scientific publishing
 
Linked_Open_Data_Rome_Netcamp_13
Linked_Open_Data_Rome_Netcamp_13Linked_Open_Data_Rome_Netcamp_13
Linked_Open_Data_Rome_Netcamp_13
 
Hello Open World - Semtech 2009
Hello Open World - Semtech 2009Hello Open World - Semtech 2009
Hello Open World - Semtech 2009
 
Role of Semantic Web in Health Informatics
Role of Semantic Web in Health InformaticsRole of Semantic Web in Health Informatics
Role of Semantic Web in Health Informatics
 
Provenance Management to Enable Data Sharing
Provenance Management to Enable Data SharingProvenance Management to Enable Data Sharing
Provenance Management to Enable Data Sharing
 
Linked Data Workshop Stanford University
Linked Data Workshop Stanford University Linked Data Workshop Stanford University
Linked Data Workshop Stanford University
 
ITWS Capstone: Engineering a Semantic Web (Fall 2022)
ITWS Capstone: Engineering a Semantic Web (Fall 2022)ITWS Capstone: Engineering a Semantic Web (Fall 2022)
ITWS Capstone: Engineering a Semantic Web (Fall 2022)
 
Repository Federation: Towards Data Interoperability
Repository Federation: Towards Data InteroperabilityRepository Federation: Towards Data Interoperability
Repository Federation: Towards Data Interoperability
 
Scientific data management from the lab to the web
Scientific data management   from the lab to the webScientific data management   from the lab to the web
Scientific data management from the lab to the web
 

Plus de Jose Manuel Gómez-Pérez

Plus de Jose Manuel Gómez-Pérez (6)

Science religion-dsmeetupv1.0
Science religion-dsmeetupv1.0Science religion-dsmeetupv1.0
Science religion-dsmeetupv1.0
 
Halo Pcs Kcap2007 V2
Halo Pcs Kcap2007 V2Halo Pcs Kcap2007 V2
Halo Pcs Kcap2007 V2
 
Acquisition And Understanding Of Process Knowledgev1 1
Acquisition And Understanding Of Process Knowledgev1 1Acquisition And Understanding Of Process Knowledgev1 1
Acquisition And Understanding Of Process Knowledgev1 1
 
NeOn: Lifecycle Support for Networked Ontologies - Case Studies in the Pharma...
NeOn: Lifecycle Support for Networked Ontologies - Case Studies in the Pharma...NeOn: Lifecycle Support for Networked Ontologies - Case Studies in the Pharma...
NeOn: Lifecycle Support for Networked Ontologies - Case Studies in the Pharma...
 
Provenance: From e-Science to the Web Of Data
Provenance: From e-Science to the Web Of DataProvenance: From e-Science to the Web Of Data
Provenance: From e-Science to the Web Of Data
 
Tecnologías Semánticas en Salud
Tecnologías Semánticas en SaludTecnologías Semánticas en Salud
Tecnologías Semánticas en Salud
 

Dernier

TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024The Digital Insurer
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024The Digital Insurer
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 

Dernier (20)

TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 

Provenance and Trust

  • 1. iSOCO Provenance and Trust José Manuel Gómez Pérez Fundations of Trust in the Future Internet Future Internet Assembly Valencia, 15th April 2010 W3C Provenance Incubator Group
  • 2. Why provenance is important For the Web Architecture - "At the toolbar (menu, whatever) associated with a document there is a button marked "Oh, yeah?". You press it when you lose that feeling of trust. It says to the Web, 'so how do I know I can trust this information?'. The software then goes directly or indirectly back to metainformation about the document, which suggests a number of reasons.“-- Tim Berners-Lee, W3C Chair, Web Design Issues, September 1997 For Linked Data - "Provenance is the number one issue we face when publishing government data as linked data for data.gov.uk” -- John Sheridan, UK National Archives, data.gov.uk, February 2010 For Science - "We need a paradigm that makes it simple [...] to perform and publish reproducible computational research. [...] A Reproducible Research Environment (RRE) [...] provides computational tools together with the ability to automatically track the provenance of data, analyses, and results and to package them (or pointers to persistent versions of them) for redistribution." W3C Provenance Incubator Group 2
  • 3. Provenance is… Records of Sources of information, including entities and processes, involved in producing or delivering an artifact History of subsequent owners (change of custody) W3C Provenance Incubator Group 3
  • 4. Provenance is… Evidence of authenticity, integrity, and information quality Certifies products of good process W3C Provenance Incubator Group 4
  • 5. Provenance is… Valuable Hard to collect and verify Necessary to assign credit …and blame i.e. to establish Trust W3C Provenance Incubator Group 5
  • 6. Provenance representation Provenance models define - Types of provenance elements - Relationships between them Provenance represented as a graph - Nodes: Pieces of provenance information (Artifacts, Processes, Agents) - Edges: relate provenance elements to each other Additionally: roles, temporal, dependency, causal dependency The Open Provenance Model (OPM) W3C Provenance Incubator Group 6
  • 7. Provenance-related vocabularies General Provenance-specific DC – Dublin Core Metadata OPM - Open Provenance Model Terms Provenir FOAF – Friend of a Friend PRV - Provenance Vocabulary SIOC – Semantically-Interlinked PML - Proof Markup Language Online Communities DC - Dublin Core SWP – Semantic Web Publishing PREMIS vocabulary WOT – Web Of Trust schema VOID – VOcabulary of Interlinked Datasets PAV - SWAN Provenance Ontology SWP - Semantic Web Publishing Vocabulary CS - Changeset Vocabulary W3C Provenance Incubator Group 7
  • 8. A snapshot on provenance Uses Research areas Domains • Meaningful data • Scientific workflows • Science integration • Databases (where and • Open Government and • Establishing trust when why provenance) Linked Data information sources are diverse and of varying • Security (Information • Web search & use quality e.g. in the Web flow and trust) • Cultural heritage • Providing justification to • Justification and the conclusions of Argumentation in KRR reasoning processes • Licensing and attribution • Information Retrieval • Manufacturing • Attribution (who did (analysis of information what) sources) • Enabling comparison and reproducibility of processes W3C Provenance Incubator Group 8
  • 9. The Linked Data paradigm How can we exploit Tim Berners Lee, 2006 (Design Issues) all the available data? 1. Use URIs to identify things - Anything, not just documents 2. Use HTTP URIs for people to Data reuse and remix lookup such names Common flexible and usable APIs - Globally unique names Standard vocabularies to - Distributed ownership describe interlinked datasets 3. Provide useful information in RDF Tools upon URI resolution Realize the Semantic Web vision 4. Include RDF links to other URIs - Enable discovery of related information W3C Provenance Incubator Group 9
  • 10. Provenance is key in Linked Data Data Lineage - The origins of data - Related artifacts and actors Information quality - Timeliness - (Semantic) consistency between datasets - Stable and meaningful data links Trust - Data authenticity - Reliability W3C Provenance Incubator Group 10
  • 11. Provenance is key in Open Government Open Government Initiative (http://www.whitehouse.gov/open) - Release quickly, improve later - NSF to develop plan (http://www.nsf.gov/open): “NSF is developing an Open Goverment Plan, which will serve as the roadmap for our plans to improve transparency, better integrate public participation and collaboration into our core mission, and become more innovative and efficient.” Similar initiative in the UK (data.gov.uk) "Provenance is the number one issue we face when publishing government data as linked data for data.gov.uk” -- John Sheridan, UK National Archives Why is provenance so important - Government data comes from very diverse data sources - Varying quality - Different scope - Different assumptions W3C Provenance Incubator Group 11
  • 12. Large amounts of data and processes In 2006, the size of the digital universe Source: IDC was estimated in 161 exabytes 3 million times the information in all books ever written In 2010, expected to turn 988 exabytes …and all this data is potentially published online W3C Provenance Incubator Group 12
  • 13. It’s all about producing and consuming information… Automated means to evaluate Information Quality and Trust are needed! W3C Provenance Incubator Group 13
  • 14. Provenance and Trust in the Web: A real-life example • Two fake web sites Reusing web data without the means that allow • A fake Wikipedia entry contrasting its provenance can be harmful, • Fake California public especially in sensitive domains. safety phone numbers • Fake local TV station The hoax caused a 1000-word tome on Frankfurter Allgemeine Zeitung… and public apologies from DPA Trust on Wikipedia misled DPA In a provenance-aware world, DPA would have had means based on data provenance to evaluate trust - Bluewater did not exist - The Berlin Boys do not exist W3C Provenance Incubator Group 14
  • 15. Trust evaluation: Useful provenance questions Who created that content (author/attribution)? Was the content ever manipulated, if so by what processes/entities? Who is providing that content (repository)? Can any of the answers to these questions be verified e.g., through e-signatures? W3C Provenance Incubator Group 15
  • 16. Immediate need for provenance at W3C Trust-driven Others Web of trust Reasoning - Making trust judgments - Attribution of assertions from based on provenance diverse sources Social trust Linked data - Attribution, authority, - Use of conflicting data of propagation varying degrees of quality Social web Life sciences and e-Science - Privacy and use policies of at large sensitive (personal) data - Reproducibility of scientific results W3C Provenance Incubator Group 16
  • 17. W3C Provenance Group: Charter and goals of the incubator group Provide state-of-the-art understanding and develop a roadmap for development and possible standardization Articulate requirements for accessing and reasoning about provenance information - Develop use cases Identify issues in provenance that are direct concern to the Semantic Web - Articulate relationships with other aspects of Web architecture Report on state-of-the-art work on provenance Report on a roadmap for provenance in the Semantic Web - Identify starting points for provenance representations - Identifying elements of a provenance architecture that would benefit from standardization W3C Provenance Incubator Group 17
  • 18. W3C Provenance Group: Products of the group to date Group formed in September 2009 - All information is public: http://www.w3.org/2005/Incubator/prov/wiki Developed a set of key dimensions for provenance (11/09) - Grouped into three major categories: content, management, use Developed use cases for provenance (12/09) - More than 30 use cases, most were improved and curated Developed requirements for provenance that arise from the use cases (1/10) - User requirements: what is the purpose/use of the provenance information - Technical requirements: derived from the user requirements Currently developing state-of-the-art report (expected 6/10) W3C Provenance Incubator Group 18
  • 19. Scientific and technical challenges of provenance Provenance information needs to be : Represented Captured and recorded Stored and secured, queried, and reasoned about Visualized and browsed W3C Provenance Incubator Group 19
  • 20. Scientific and technical challenges of provenance (II) Vocabularies for representation of provenance content - Need representations of processes (workflows), entities,roles, data collections, meta-assertions, etc. - The Open Provenance Model (OPM): http://twiki.ipaw.info/bin/view/OPM Granularity of provenance records - How much detail is useful, manageable/scalable in practice? • Size of provenance can be orders of magnitude larger than base data Provenance evaluation for Information Quality and Trust Management Evolution and updates - Shelf timeliness of data • Determine when data becomes obsolete based on provenance info - Versioning of data sources • Relate updates of data based on provenance info - Reproducibility of processes Provenance-aware visualization, navigation, and resource consumption W3C Provenance Incubator Group 20
  • 21. Scientific and technical challenges of Provenance & Trust Policies based on provenance information - Association-based policies • Source is NYT, source cites NYT • Source is cited in Wikipedia - Bias-based policies • Source is an oil company - Distrust policies • Source is a blog Policies may be restricted to a context - Topic of search, topics of page, tags of page Trust policies may be shared across users - Like bookmarks in del.icio.us W3C Provenance Incubator Group 21
  • 22. Socioeconomic challenges How to incentivize provenance take-up in the Web architecture so that content and service providers move into a provenance-aware paradigm? - Generation of provenance metadata by content providers - Consumption of provenance metadata by infrastructure and applications like search engines, browsers, etc. Objective evidence of trust in online data or service allows - Ranking increase in search engines - Attracting internet traffic - Increasing automation in data and service consumption But, can you can trust such provenance information itself? - Non-forgery of provenance metadata • Authoritative agencies W3C Provenance Incubator Group 22
  • 23. José Manuel Gómez-Pérez R&D Director T +34913349778 Thanks for M +34609077103 jmgomez@isoco.com your attention! iSOCO Para obtener más información sobre como puede ayudar a su empresa a optimizar sus negocios digitales y aportar una solución innovadora, contáctenos en www. .com Barcelona Madrid Valencia Tel +34 93 5677200 +34 91 3349797 +34 96 3467143 Edificio Testa A C/Pedro de Valdivia, 10 Oficina 107 C/ Alcalde Barnils 64-68 28006 Madrid C/ Prof. Beltrán Báguena 4, St. Cugat del Vallès 46009 Valencia 08174 Barcelona W3C Provenance Incubator Group 23