Implementing Digital Provenance on
the World Wide Web Using Semantic
Web Technology
Gregory Joiner*, Douglas Reid
Raytheon BBN Technologies
{gjoiner,dreid}@bbn.com
June 9th, 2011
First…Some Administrivia!
• Updated slides are located on SlideShare at:
http://slidesha.re/lqCHWd
• Presentation is not “Technical – Intermediate.”
– I wanted to reach the maximum number of users
– There was not enough time to provide both an overview and
technical instruction.
• Feel free to interrupt me anytime with questions!
Goals of this Talk
• Learn what digital provenance is
• Understand why it is important
• Know what is currently being done by whom
• Have a starting point for implementing provenance
in your semantic web applications
• Be passionate about digital provenance!
Agenda
• Part 1: An Introduction to Digital Provenance
– What is Digital Provenance
– National Cyber Leap Year Summit
• Part 2: Digital Provenance Use Cases
– Everyday Web Browsing
– Contradictory, Time-Sensitive Information
– Closed Network Provenance
• Part 3: Where Are We Now?
– W3C Provenance Work
– Review of the Current State-of-the-Art
• Part 4: Digital Provenance Tool Development
– Why SemWeb is Perfect for Digital Provenance
– Open Source and Standards Compliance
– Securing Provenance Metadata
– Additional Design Considerations
PART 1: AN INTRODUCTION TO DIGITAL PROVENANCE
What is Digital Provenance
• Provenance is defined by Webster’s Dictionary as “the
origin or source of something” – mainly pertaining to art
or architectural artifacts
• Digital Provenance is metadata that establishes the
chain-of-custody information needed for users to make
trust decisions about digital data
• Digital Provenance Metadata can describe any type of
electronic data at any granularity level from entire web
sites to single files to even individual assertions within a
webpage or document
What is Digital Provenance
Types of Digital Provenance Metadata include:
• Bibliographical Information – Provides a list of all of the sources
behind a document or assertion
• Chain-of-Custody Information – Provides a history of the different
people and/or systems that have handled the document or assertion
• Proof / Justification Information – Documents the logical steps
followed to make an assertion
• Trust Information – Provides a quantifiable metric to measure and
compare the trustworthiness of one document or assertion to
another.
National Cyber Leap Year Summit
• Convened in 2009 as a response to
the President’s call to secure the
nation’s cyber infrastructure and
charged with identifying the “game-
changing” technologies needed to
secure cyberspace
• Identified Digital Provenance as
one of those technologies because it
enables the identification,
authentication, and reputation of
entities and objects with appropriate
granularity at many layers of the
protocol hierarchy.
PART 2: DIGITAL PROVENANCE USE CASES
Everyday Web Browsing
• Scenario: People often rely on the
Internet for advice on important
subjects, such as health or finance, and
frequently make key decisions based on
web content alone. This is especially
true for mobile users who lack the
bandwidth and display room to
investigate the provenance on their own.
• Solution: By dynamically marking the
trustworthiness of web content, users
can quickly determine what data they
can trust so they can make more
informed decisions.
Contradictory, Time-Sensitive Information
• Scenario: When breaking news
happens, content re-publishers and end
users are often forced to choose
between contradicting information. For
example, after the tragic shooting in
Arizona in January 2011, some
websites claimed Rep. Giffords was
dead while others properly reported that
she was still alive.
• Solution: By providing a standard way
to view and compare the bibliographical
and chain-of-custody information of the
conflicting articles, users can make an
informed decision on which one to trust.
Closed Network Provenance
• Scenario: Even in a closed network,
users frequently have to decide
whether to trust existing content. This is
often the case within the Intelligence
Community and Department of Defense
where certain time-sensitive tasks allow
assumptions to be made that other
tasks cannot. For example, the use of
lethal force against a target requires
more concrete evidence than other,
less irreparable actions.
• Solution: By providing analysts with a
complete list of the assumptions and
justifications behind a given assertion,
they can determine whether or not they
can use that assertion in their analysis.
Additional Use Cases
• License and Contract Compliance
• Public Policy Conformance
• Assigning Credit and Blame to Information
• Many more were identified by the W3C
Provenance Incubator Group and are located at:
http://www.w3.org/2005/Incubator/prov/wiki/Use_Cases
PART 3: WHERE ARE WE NOW?
W3C Provenance Work
• Provenance Interchange Working Group
– Chartered through Oct 2012, based on Incubator Group’s findings
– Formed to “support the widespread publication and use of
provenance information of Web documents, data, and resources”
– Will publish Recommendations to define a language for exchanging
provenance information (PIL) among applications
• Provenance Interchange Language (PIL) design goals
– Be applicable to any resource
– Provide a low barrier to entry to facilitate widespread adoption
– Provide a small, extensible core model
– Draw from existing vocabularies and ontologies
• Deliverables
– Conceptual Model, Formal Model, Formal Semantics, Accessing
and Querying Provenance, XML Serialization, Best Practice Cookbook,
Primer
W3C’s work (cont.)
• Key Recommendations for PIL
– Standard way to represent, at a minimum, three basic entities
1. A handle (URI) to refer to an object
2. A person/entity that the object is attributed to
3. A processing step done by a person/entity to an object
– Mechanism to access provenance-related information addressed
by other standards
• Licensing information of an object
• Digital signature for the object
• Digital signature for the provenance records
– Standard way for sites to make provenance information about
their content available to other parties in a selective manner, and
for others to access that information
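The three basic entities listed above (a URI handle, an attributed person/entity, and a processing step) can be sketched as a small data model. This is only an illustration, not the PIL itself: the class names, field names, and example.org URIs are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Agent:
    """A person or entity that an object can be attributed to (hypothetical model)."""
    uri: str
    name: str


@dataclass(frozen=True)
class ProcessingStep:
    """One action performed by an agent on an object."""
    agent: Agent
    action: str      # e.g. "created", "edited", "republished"
    timestamp: str   # ISO 8601 text, kept as a string for simplicity


@dataclass
class ProvenanceRecord:
    """Provenance for a single object identified by a URI handle."""
    object_uri: str
    attributed_to: Agent
    steps: List[ProcessingStep] = field(default_factory=list)

    def add_step(self, agent: Agent, action: str, timestamp: str) -> None:
        """Append a processing step to the object's history."""
        self.steps.append(ProcessingStep(agent, action, timestamp))

    def chain_of_custody(self) -> List[str]:
        """Return agent names in the order they handled the object."""
        return [step.agent.name for step in self.steps]


# Illustrative data only; none of these URIs come from the talk.
author = Agent("http://example.org/people/alice", "Alice")
editor = Agent("http://example.org/people/bob", "Bob")
record = ProvenanceRecord("http://example.org/articles/42", attributed_to=author)
record.add_step(author, "created", "2011-06-01T09:00:00Z")
record.add_step(editor, "edited", "2011-06-02T14:30:00Z")
```

Calling record.chain_of_custody() on this record lists the handlers in order, which is the kind of history the chain-of-custody use cases rely on.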
Review of the Current State-of-the-Art
Representation
• Existing Provenance Vocabularies/Ontologies
– Dublin Core: “Librarian” vocabulary capturing bibliographical information.
– Provenir Ontology: Upper-level ontology for use in SemWeb applications
– Provenance Vocabulary: Captures data using the Linked Data principles
– Proof Markup Language (PML): “Full-Featured” interlingua that describes
basic provenance meta-data plus justification and trust information.
– Others: Changeset Vocabulary, PREMIS, SWAN Provenance Ontology,
Semantic Web Publishing Vocabulary, and WOT Schema
• Concrete mapping specified between existing ontologies
– The Open Provenance Model (OPM) was chosen as a reference
vocabulary since it is a general and broad model that
encompasses many aspects of provenance
– W3C Incubator Group formally encoded the mappings according to Simple
Knowledge Organization System (SKOS) vocabulary
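As a rough illustration of the bibliographical information that a vocabulary such as Dublin Core captures, the sketch below writes a few N-Triples statements by hand. The article, person, and report URIs are made up for the example, and the triple helper is hypothetical; only the Dublin Core terms namespace and the creator, source, and created terms are real.

```python
# Dublin Core terms namespace (real); every example.org URI below is invented.
DCTERMS = "http://purl.org/dc/terms/"


def triple(subject: str, predicate: str, obj: str, literal: bool = False) -> str:
    """Format one N-Triples statement; obj is treated as a URI unless literal=True."""
    if literal:
        escaped = obj.replace("\\", "\\\\").replace('"', '\\"')
        obj_term = '"' + escaped + '"'
    else:
        obj_term = "<" + obj + ">"
    return "<{}> <{}> {} .".format(subject, predicate, obj_term)


article = "http://example.org/articles/42"
statements = [
    # Who the article is attributed to.
    triple(article, DCTERMS + "creator", "http://example.org/people/alice"),
    # A resource the article was derived from.
    triple(article, DCTERMS + "source", "http://example.org/reports/7"),
    # When the article was created, as a plain literal.
    triple(article, DCTERMS + "created", "2011-06-01", literal=True),
]

print("\n".join(statements))
```

A real application would more likely use an RDF library than string formatting; plain strings are used here only to keep the sketch dependency-free.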
Review of the Current State-of-the-Art
Implementation
• News aggregation scenario
– Content tracking (Memetracker, Spinn3r & BlogTracker, influence studies)
– Explicit provenance (trackbacks / pingbacks, Twitter’s Retweet)
– Licensing (Creative Commons, Google Books Rights Registry)
• Disease outbreak scenario
– Data provenance (human-readable changelogs, database research)
– Workflow provenance (Taverna/Pegasus, Inference Web, ZOOM)
– Justification for policy (ad-hoc user effort)
• Business Contract scenario
– Tracking design (VisTrails)
– Computer-aided Design (Design Rationale editor (DRed), IBIS software)
State-of-the-Art (cont.)
Gaps
• Content
– No mechanism to refer to the identity/derivation of an information object
– No guidance on granularity for description of complex objects
– No common standard for exposing/expressing provenance information
– No standard for versioning and publishing updates
– No standard to characterize suitability of provenance info for proof
• Management
– No standard for linking provenance between sites
– No guidance on combining existing standards to provide provenance
– No guidance for exposing provenance info on the Web
– No proven approaches to manage scale
– No standard way to ensure only essential non-confidential provenance is
released
State-of-the-Art (cont.)
More Gaps
• Use
– No clear understanding of how to relate provenance at different levels of
abstraction
– No general solutions to understand provenance published on the Web
– No standard to enable provenance integration/comparison
– No broadly applicable methodology for making trust judgments based on
provenance when presented with information of varying quality
– No existing mechanism to check compliance with laws, regulations or
contracts
– No means to resolve conflicts in provenance data
PART 4: DIGITAL PROVENANCE TOOL DEVELOPMENT
Why SemWeb is Perfect for Digital Provenance
• Semantic Web Technologies allow data to be shared and
reused in a manner that is more flexible and
integratable than traditional knowledge representations.
• The Web Ontology Language (OWL) allows deeper
context to be encoded in the digital provenance metadata
which enables the capture of more complex information
in a standard, well specified format.
• With the provenance metadata in a machine-readable
format, powerful automated information processing
can be applied, which can provide additional provenance knowledge.
• By semantically tagging the digital provenance metadata,
it can be dynamically linked to supporting (or
contradicting) information to provide a more complete
chain-of-custody picture.
Why Digital Provenance is Perfect for SemWeb
Provenance helps complete the path to the top of the
Semantic Web layer cake and to TBL’s SemWeb nirvana.
Open Source and Standards Compliance
• As explained in the National Cyber Leap Year Summit’s Co-Chairs’
Report, establishing standards early on in the development process
is crucial to achieving rapid, widespread community acceptance that
is required for any digital provenance tool to be successful.
• Therefore, Digital Provenance tools should comply with and even
inform the emerging W3C standards discussed earlier in this
presentation
• Furthermore, since digital provenance tools require an additional
time burden for both content developers and end-users, they should
be available at little to no cost to further encourage acceptance.
Securing Provenance Metadata
• Provenance metadata that is not
signed or secured is susceptible
to tampering and therefore
cannot realistically be trusted.
• Confidentiality and integrity
controls that are consistent with
a wide variety of security models
are crucial to creating a
successful digital provenance
solution.
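To make the tampering point concrete, here is a minimal sketch of an integrity check over a provenance record. It assumes a shared secret key and uses an HMAC from the Python standard library as a stand-in; a real deployment would more likely use public-key digital signatures, and the record fields and key shown are hypothetical.

```python
import hashlib
import hmac
import json


def sign_record(record: dict, key: bytes) -> str:
    """Return a hex HMAC-SHA256 tag over a canonical JSON form of the record."""
    # Sorting keys makes the serialization independent of dict ordering.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_record(record: dict, tag: str, key: bytes) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(sign_record(record, key), tag)


# Illustrative key and record only; not values from the talk.
key = b"demo-key-not-for-production"
record = {
    "object": "http://example.org/articles/42",
    "attributed_to": "http://example.org/people/alice",
    "action": "created",
}
tag = sign_record(record, key)

# Changing any field after signing invalidates the tag.
tampered = dict(record, attributed_to="http://example.org/people/mallory")
```

With these definitions, verify_record(record, tag, key) succeeds while verify_record(tampered, tag, key) fails. This covers integrity only; confidentiality would need separate controls such as encryption or access policies.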
Additional Design Considerations
• It is crucial that any digital provenance tool
supports the creation, processing, and
rendering of digital provenance metadata at
all stages of the content creation
lifecycle.
• Since users will require provenance
information at many different levels of detail,
successful digital provenance tools will be
configurable to allow content creators and
users to create and view the metadata at
any granularity level.
Key Takeaways
• Provenance is key to the future success of the Web and
is the final piece of the Semantic Web puzzle.
• The U.S. government has identified digital provenance
as one of the important “game changing” cyber security
technologies.
• Important W3C work is already underway.
• You can start thinking about and incorporating
provenance in your application right now.
For More Information
• Authors
– Greg Joiner, gjoiner@bbn.com, 703-284-1259
– Douglas Reid, dreid@bbn.com, 703-284-1291
• National Cyber Leap Year Report
– Co-Chairs Report: http://bit.ly/6NO05g
– Participants’ Ideas Report: http://bit.ly/7HmjQ8
• W3C Provenance Interchange Working Group
– www.w3.org/2011/prov
Questions
SemTech West 2011 - Digital Provenance

  • 1. Implementing Digital Provenance on the World Wide Web Using Semantic Web Technology Gregory Joiner*, Douglas Reid Raytheon BBN Technologies {gjoiner,dreid}@bbn.com June 9th, 2011
  • 2. First…Some Administrivia! • Updated slides are located on SlideShare at: http://slidesha.re/lqCHWd • Presentation is not “Technical – Intermediate.” – I wanted to reach the maximum number of users – Was not enough time to provide both an overview and technical instruction. • Feel free to interrupt me anytime with questions! June 9th, 2011 2
  • 3. Goals of this Talk • Learn what digital provenance is • Understand why it is important • Know what is currently being done by whom • Have starting point for implementing provenance in your semantic web applications • Be passionate about digital provenance! June 9th, 2011 3
  • 4. Agenda • Part 1: A Introduction to Digital Provenance – What is Digital Provenance – National Cyber Leap Year Summit • Part 2: Digital Provenance Use Cases – Everyday Web Browsing – Contradictory, Time-Sensitive Information – Closed Network Provenance • Part 3: Where Are We Now? – W3C Provenance Work – Review of the Current State-of-the-Art • Part 4: Digital Provenance Tool Development – Why SemWeb is Perfect for Digital Provenance – Open Source and Standards Compliance – Securing Provenance Metadata – Additional Design Considerations June 9th, 2011 4
  • 5. A INTRODUCTION TO DIGITAL PROVENANCE Part 1: Part 1: A Introduction to Digital Provenance Part 2: Digital Provenance Use Cases Part 3: Where Are We Now? Part 4: Digital Provenance Tool Development June 9th, 2011 5
  • 6. What is Digital Provenance • Provenance is defined by Webster’s Dictionary as “the origin or source of something” – mainly pertaining to art or architectural artifacts • Digital Provenance is metadata that establishes the chain-of-custody information needed for users to make trust decisions about digital data • Digital Provenance Metadata can describe any type of electronic data at any granularity level from entire web sites to single files to even individual assertions within a webpage or document June 9th, 2011 6
  • 7. What is Digital Provenance Types of Digital Provenance Metadata include: • Bibliographical Information – Provides a list of all of the sources behind a document or assertion • Chain-of-Custody Information – Provides a history of the different people and/or systems that have handled the document or assertion • Proof / Justification Information – Documents the logical steps followed to make an assertion • Trust Information – Provides a quantifiable metric to measure and compare the trustworthiness of one document or assertion to another. June 9th, 2011 7
  • 8. National Cyber Leap Year Summit • Convened in 2009 as a response to the President’s call to secure the nation’s cyber infrastructure and charged with identifying the “game- changing” technologies needed to secure cyberspace • Identified Digital Provenance as one of those technologies because it enables the identification, authentication, and reputation of entities and objects with appropriate granularity at many layers of the protocol hierarchy. June 9th, 2011 8
  • 9. DIGITAL PROVENANCE USE CASES Part 2: Part 1: A Introduction to Digital Provenance Part 2: Digital Provenance Use Cases Part 3: Where Are We Now? Part 4: Digital Provenance Tool Development June 9th, 2011 9
  • 10. Everyday Web Browsing • Scenario: People often rely on the Internet for advice on important subjects, such health or finance, and frequently make key decisions based on web content alone. This is especially true for mobile users who lack the bandwidth and display room to investigate the provenance on their own. • Solution: By dynamically marking the trustworthiness of web content, users can quickly determine what data they can trust so they can make more informed decisions. June 9th, 2011 10
  • 11. Contradictory, Time-Sensitive Information • Scenario: When breaking news happens, content re-publishers and end users are often forced to chose between contradicting information. For example, after the tragic shooting in Arizona in January 2011, some websites claimed Rep. Gifford was dead while others properly reported that she was still alive. • Solution: By providing a standard way to view and compare the bibliographical and chain-of-custody information of the conflicting articles, users can make an informed decision on which one to trust. June 9th, 2011 11
  • 12. Closed Network Provenance • Scenario: Even in a closed network, users frequently have to decide whether to trust existing content. This is often the case within the Intelligence Community and Department of Defense, where certain time-sensitive tasks allow assumptions to be made that other tasks cannot. For example, the use of lethal force against a target requires more concrete evidence than other, less irreparable actions. • Solution: Given a complete list of the assumptions and justifications behind a given assertion, analysts can determine whether or not they can use that assertion in their analysis.
  • 13. Additional Use Cases • License and Contract Compliance • Public Policy Conformance • Assigning Credit and Blame to Information • Many more were identified by the W3C Provenance Incubator Group and are located at: http://www.w3.org/2005/Incubator/prov/wiki/Use_Cases
  • 14. Part 3: WHERE ARE WE NOW? • Part 1: An Introduction to Digital Provenance • Part 2: Digital Provenance Use Cases • Part 3: Where Are We Now? • Part 4: Digital Provenance Tool Development
  • 15. W3C Provenance Work • Provenance Interchange Working Group – Chartered through Oct 2012, based on Incubator Group’s findings – Formed to “support the widespread publication and use of provenance information of Web documents, data, and resources” – Will publish Recommendations to define a language (PIL) for exchanging provenance information among applications • Provenance Interchange Language (PIL) design goals – Be applicable to any resource – Provide a low barrier to entry to facilitate widespread adoption – Provide a small, extensible core model – Draw from existing vocabularies and ontologies • Deliverables – Conceptual Model, Formal Model, Formal Semantics, Accessing and Querying Provenance, XML Serialization, Best Practice Cookbook, Primer
  • 16. W3C’s work (cont.) • Key Recommendations for PIL – Standard way to represent, at a minimum, three basic entities 1. A handle (URI) to refer to an object 2. A person/entity that the object is attributed to 3. A processing step done by a person/entity to an object – Mechanism to access provenance-related information addressed by other standards • Licensing information of an object • Digital signature for the object • Digital signature for the provenance records – Standard way for sites to make provenance information about their content available to other parties in a selective manner, and for others to access that information
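The three basic entities listed on this slide (a URI handle, an attributed person/entity, and a processing step) can be pictured as a small data model. The following is only a sketch under stated assumptions: the class names `Agent`, `ProcessingStep`, and `ProvenanceRecord`, the method names, and the example URI are all hypothetical and are not part of PIL or any W3C deliverable.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Agent:
    """Hypothetical stand-in for 'a person/entity' (entity 2 on the slide)."""
    name: str


@dataclass(frozen=True)
class ProcessingStep:
    """Hypothetical stand-in for 'a processing step done by a person/entity
    to an object' (entity 3 on the slide)."""
    description: str
    performed_by: Agent
    applied_to: str  # URI handle of the object the step was applied to


@dataclass
class ProvenanceRecord:
    """Ties a URI handle (entity 1) to its attribution and processing history."""
    object_uri: str
    attributed_to: Agent
    steps: List[ProcessingStep] = field(default_factory=list)

    def add_step(self, description: str, agent: Agent) -> ProcessingStep:
        # Record a processing step against this record's object URI.
        step = ProcessingStep(description, agent, self.object_uri)
        self.steps.append(step)
        return step

    def custody_chain(self) -> List[str]:
        # Names of the agents that handled the object, in order,
        # with consecutive repeats of the same agent collapsed.
        chain = [self.attributed_to.name]
        for step in self.steps:
            if step.performed_by.name != chain[-1]:
                chain.append(step.performed_by.name)
        return chain


# Illustrative usage with made-up names and a made-up URI.
author = Agent("Alice")
editor = Agent("Bob")
record = ProvenanceRecord("http://example.org/articles/42", author)
record.add_step("copy-edited", editor)
record.add_step("republished", editor)
print(record.custody_chain())
```

Keeping the step separate from the agent mirrors the slide's split between who an object is attributed to and what was done to it; a real vocabulary would express the same links as RDF properties rather than Python attributes.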
  • 17. Review of the Current State-of-the-Art: Representation • Existing Provenance Vocabularies/Ontologies – Dublin Core: “Librarian” vocabulary capturing bibliographical information. – Provenir Ontology: Upper-level ontology for use in SemWeb applications – Provenance Vocabulary: Captures data using the Linked Data principles – Proof Markup Language (PML): “Full-Featured” interlingua that describes basic provenance metadata plus justification and trust information. – Others: Changeset Vocabulary, PREMIS, SWAN Provenance Ontology, Semantic Web Publishing Vocabulary, and WOT Schema • Concrete mapping specified between existing ontologies – The Open Provenance Model (OPM) was chosen as a reference vocabulary since it is a general and broad model that encompasses many aspects of provenance – W3C Incubator Group formally encoded the mappings according to the Simple Knowledge Organization System (SKOS) vocabulary
  • 18. Review of the Current State-of-the-Art: Implementation • News aggregation scenario – Content tracking (Memetracker, Spinn3r & BlogTracker, influence studies) – Explicit provenance (trackbacks / pingbacks, Twitter’s Retweet) – Licensing (Creative Commons, Google Books Rights Registry) • Disease outbreak scenario – Data provenance (human-readable changelogs, database research) – Workflow provenance (Taverna/Pegasus, Inference Web, ZOOM) – Justification for policy (ad-hoc user effort) • Business Contract scenario – Tracking design (VisTrails) – Computer-aided Design (Design Rationale editor (DRed), IBIS software)
  • 19. State-of-the-Art (cont.): Gaps • Content – No mechanism to refer to the identity/derivation of an information object – No guidance on granularity for description of complex objects – No common standard for exposing/expressing provenance information – No standard for versioning and publishing updates – No standard to characterize suitability of provenance info for proof • Management – No standard for linking provenance between sites – No guidance on combining existing standards to provide provenance – No guidance for exposing provenance info on the Web – No proven approaches to manage scale – No standard way to ensure only essential non-confidential provenance is released
  • 20. State-of-the-Art (cont.): More Gaps • Use – No clear understanding of how to relate provenance at different levels of abstraction – No general solutions to understand provenance published on the Web – No standard to enable provenance integration/comparison – No broadly applicable methodology for making trust judgments based on provenance when presented with information of varying quality – No existing mechanism to check compliance with laws, regulations or contracts – No means to resolve conflicts in provenance data
  • 21. Part 4: DIGITAL PROVENANCE TOOL DEVELOPMENT • Part 1: An Introduction to Digital Provenance • Part 2: Digital Provenance Use Cases • Part 3: Where Are We Now? • Part 4: Digital Provenance Tool Development
  • 22. Why SemWeb is Perfect for Digital Provenance • Semantic Web technologies allow data to be shared and reused in a manner that is more flexible and easier to integrate than traditional knowledge representations. • The Web Ontology Language (OWL) allows deeper context to be encoded in the digital provenance metadata, which enables the capture of more complex information in a standard, well-specified format. • With the provenance metadata in a machine-readable format, powerful automated information processing can be applied to derive additional provenance knowledge. • By semantically tagging the digital provenance metadata, it can be dynamically linked to supporting (or contradicting) information to provide a more complete chain-of-custody picture.
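As a rough illustration of how machine-readable provenance lets automated processing derive additional knowledge, the sketch below stores provenance as subject/predicate/object triples and infers every indirect source of a resource. It is a plain-Python stand-in, not an RDF store or OWL reasoner; the predicate name `wasDerivedFrom`, the `ex:` identifiers, and the function `all_sources` are assumptions chosen for illustration.

```python
from collections import defaultdict

# Predicate name chosen for illustration only (an assumption, not a standard term here).
DERIVED_FROM = "wasDerivedFrom"

# Made-up provenance statements as (subject, predicate, object) triples.
triples = {
    ("ex:blogPost", DERIVED_FROM, "ex:newsArticle"),
    ("ex:newsArticle", DERIVED_FROM, "ex:pressRelease"),
    ("ex:pressRelease", "attributedTo", "ex:agency"),
}


def all_sources(resource, statements):
    """Return every resource that `resource` is directly or indirectly
    derived from, by following derivation links transitively."""
    direct = defaultdict(set)
    for subject, predicate, obj in statements:
        if predicate == DERIVED_FROM:
            direct[subject].add(obj)

    found = set()
    stack = [resource]
    while stack:
        current = stack.pop()
        for source in direct[current]:
            if source not in found:  # guards against cycles in the data
                found.add(source)
                stack.append(source)
    return found


# The blog post never cites the press release directly, but the link is inferred.
print(sorted(all_sources("ex:blogPost", triples)))
```

A Semantic Web application would more likely declare the derivation property as transitive in an ontology and let a reasoner do this work; the loop above only shows the kind of inference involved.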
  • 23. Why Digital Provenance is Perfect for SemWeb • Provenance helps complete the path to the top of the Semantic Web layer cake and to TBL’s SemWeb nirvana.
  • 24. Open Source and Standards Compliance • As explained in the National Cyber Leap Year Summit’s Co-Chairs’ Report, establishing standards early in the development process is crucial to achieving the rapid, widespread community acceptance that any digital provenance tool needs to be successful. • Therefore, Digital Provenance tools should comply with and even inform the emerging W3C standards discussed earlier in this presentation. • Furthermore, since digital provenance tools impose an additional time burden on both content developers and end-users, they should be available at little to no cost to further encourage acceptance.
  • 25. Securing Provenance Metadata • Provenance metadata that is not signed or secured is susceptible to tampering and therefore cannot realistically be trusted. • Confidentiality and integrity controls that are consistent with a wide variety of security models are crucial to creating a successful digital provenance solution.
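To make the tampering concern concrete, here is a minimal sketch of an integrity check over a provenance record. It is an assumption-laden illustration: it uses a shared-key HMAC from the Python standard library rather than the public-key digital signatures a deployed system would more likely use, it addresses integrity only (not confidentiality), and the helper names `canonical_bytes`, `sign`, and `verify` as well as the sample metadata are hypothetical.

```python
import hashlib
import hmac
import json


def canonical_bytes(metadata: dict) -> bytes:
    # Serialize with sorted keys so the same metadata always yields the same bytes.
    return json.dumps(metadata, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign(metadata: dict, key: bytes) -> str:
    # Compute an HMAC-SHA256 tag over the canonical form of the metadata.
    return hmac.new(key, canonical_bytes(metadata), hashlib.sha256).hexdigest()


def verify(metadata: dict, key: bytes, tag: str) -> bool:
    # Constant-time comparison avoids leaking how much of the tag matched.
    return hmac.compare_digest(sign(metadata, key), tag)


# Illustrative usage with made-up metadata and a made-up key.
key = b"demo-shared-secret"
metadata = {"object": "http://example.org/articles/42", "attributedTo": "Alice"}
tag = sign(metadata, key)

tampered = dict(metadata, attributedTo="Mallory")
print(verify(metadata, key, tag))   # unchanged record
print(verify(tampered, key, tag))   # record altered after signing
```

Canonicalizing before signing matters because two serializations of the same record would otherwise produce different tags; RDF data would need its own canonical form for the same reason.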
  • 26. Additional Design Considerations • It is crucial that any digital provenance tool supports the creation, processing, and rendering of digital provenance metadata at all stages of the content creation lifecycle. • Since users will require provenance information at many different levels of detail, successful digital provenance tools will be configurable to allow content creators and users to create and view the metadata at any granularity level.
  • 27. Key Takeaways • Provenance is key to the future success of the Web and is the final piece of the Semantic Web puzzle. • The U.S. government has identified digital provenance as one of the important “game-changing” cyber security technologies. • Important W3C work is already underway. • You can start thinking about and incorporating provenance in your application right now.
  • 28. For More Information • Authors – Greg Joiner, gjoiner@bbn.com, 703-284-1259 – Douglas Reid, dreid@bbn.com, 703-284-1291 • National Cyber Leap Year Report – Co-Chairs’ Report: http://bit.ly/6NO05g – Participants’ Ideas Report: http://bit.ly/7HmjQ8 • W3C Provenance Interchange Working Group – www.w3.org/2011/prov