SlideShare une entreprise Scribd logo
1  sur  46
Fake Picassos, Tampered History, and Digital Forgery:Protecting the Genealogy of Bits withSecure Provenance Ragib Hasan NSF Computing Innovation Fellow Dept. of Computer Science,  Johns Hopkins University [Joint work with Marianne Winslett (UIUC) and Radu Sion(Stony Brook)] Computer Science Seminar, Johns Hopkins University February 2, 2010
Let’s play a game Real, worth $101.8 million Fake, listed at eBay, worth nothing Can you spot the fake Picasso?
So, how do art buyers authenticate art? Among other things, they look at Provenance records Provenance: from Latin provenire ‘come from’, defined as  “(i) the fact of coming from some particular source or quarter; origin, derivation.   (ii) the history or pedigree of a work of art, manuscript, rare book, etc.; a record of the ultimate derivation and passage of an item through its various owners” (Oxford English Dictionary) In other words,Who owned it, what was done to it, how was it transferred … Widely used in arts, archives, and archeology, called the Fundamental Principle of Archival http://moma.org/collection/provenance/items/644.67.html L'artiste et son modèle (1928), at Museum of Modern Art
Provenance of a book Owner names recorded in sequential order on left side 4
Let’s consider the Digital World Am I getting back untampered data? Was this data created and processed by persons I trust? Unlike data processing in the past, digital data can be rapidly copied, modified, and erased To trust  data we receive from others or retrieve from storage, we need to look into the integrity of both the present state and the past historyof data Data is generated, processed, and transmitted between different systems and principals, and stored in database/storage Our life today has become increasingly dependent on digital data Our Most Valuable Asset is Data 5
Data History is especially important for mobile data Celebrity death hoax Resulting tweet/retweet cycle Hoaxes propagates fast, so before retweeting, do check the original source
The origin of data is important to measure its quality Dr. Evil Dr. Jones Dataset is used by many unsuspecting scientists to produce new data Unreliable experiments create faulty dataset Tainted data only produces more tainted results Reliable Lab Dr. Jones got the data from a lab he trusts, but the original data was tainted, so any data derived from it will also be tainted
Bottom line: History matters … The origins of a data object can affect its quality and usefulness The transformation path that a data object took affects its trustworthiness In other words, data provenance is extremely important to decide data quality and trustworthiness
What exactly is Data Provenance? Definition* Description of the origins of data and the process by which it arrived at the database. [Buneman et al.] Information describing materials and transformations applied to derive the data. [Lanter] Metadata recording the process of experiment workflows, annotations, and notes about experiments.  [Greenwood] Information that helps determine the derivation history of a data product, starting from its original sources. [Simmhan et al.] 9 *Simmhan et al. A Survey of Provenance in E-Science. SIGMOD Record, 2005.
Example provenance systems Simmhan et al., 2005
What was the common theme of all those systems? They were all scientific computing systems And scientists trust people (more or less) Previous research covers provenance collection, annotation, querying, and workflow, but security issues are not handled For provenance in untrusted environments, we need integrity, confidentiality and privacy guarantees So, we need provenance of provenance, i.e. a model for Secure Provenance Data
“History is not history unless it is the truth”.  Abraham Lincoln, 1856
Problem Statement Can we design an efficient system for securing the provenance of documents, and allow auditing with fine-grained preservation of confidentiality?
Previous work on integrity assurances (Logically) centralized repository  (CVS, Subversion, GIT) Changes to files recorded Not applicable to mobile documents File systems with integrity assurances (SUNDR, PASIS, TCFS) Provide local integrity checking Do not apply to data that traverses systems System state entanglement (Baker 02) Entangle one system’s state with another, so others can serve as witness to a system’s state Not applicable to mobile data Secure audit logs / trails (Schneier and Kelsey 99), LogCrypt (Holt 2004), (Peterson et al. 2006) Trusted notary certifies logs, or trusted third party given hash chain seed
Secure provenance means preventing “Undetectable history rewriting” Preventundetectable modification of history, such that,  Adversaries cannot insert fake events, remove genuine events from a document’s provenance No one can deny history of own actions Allow fine grained preservation of privacy and confidentiality of actions Users can choose which auditors can see details of their work Attributes can be selectively disclosed or hidden without harming integrity check 15
Usage and threat model Bob Alice Charlie Marvin PAlice PBob PCharlie PAlice PBob PCharlie PMarvin Audrey Users: Edit documents on their machines Documents: Are edited, transmitted to other users Provenance entry = record of a user’s modifications and related context  Provenance chain = chronologically sorted list of entries; accompanies the document  Auditors: semi-trusted principals ,[object Object]
Only certain auditors can read each entryAdversaries: insiders or outsiders who ,[object Object]
Collude with others to add/remove entries
Claim a chain belongs to another document
Repudiate an entry Ragib Hasan, Radu Sion, and Marianne Winslett, “Introducing Secure Provenance: Problems and Challenges”, ACM StorageSS 2007
Scenario 1: Securing a single document
What realistic guarantees can we provide against a strong adversary? We can’t expect or force the adversary to play by the books Access control is not feasible in a loosely coupled distributed environment So, we can only verify plausibility of a given history
Plausible history is something that could have happened Plausible history: The version history that will result if a document were created and subsequently edited and transferred from user to user, with provenance information correctly and indelibly recorded all along the way. A document can have multiple plausible histories
Original scheme: Ensure Verification of Plausible History Given a document and a provenance chain, we want to verify if this could have happened Forger’s goal: To create a plausible history involving honest users
Analogy: The Forger and the Prada Fake Prada,  street price $50 Forger’s goal is to create a plausible history for a fake Prada bag (i.e. make it appear to come from real Prada distributors via real supply chain) Forger’s goal is NOT to replace a real Prada bag’s plausible history, to show it was made elsewhere. Real Prada, price $3000
Our solution: Overview P1 P2 P3 P4 Pn-1 Pn Provenance Chain … U3 W3 K3 C3 Pub3 Provenance Entry Uid3 Pid3 Host3 IP3 time3 Ui= identity of the principal (lineage) Ci= integrity checksum(s) Ki = confidentiality locks for Wi Wi= Encrypted modification log
Our solution: Confidentiality A single auditor Modification log Encrypted Modification log Issues  ,[object Object]
Only the auditor(s) trusted by the user can see the user’s actions on the documentMultiple auditors Encrypted Modification log Modification log Encrypted Modification log Modification log Encrypted Modification log Optimization: Use broadcast encryption tree to reduce number of required keys
Sidenote: Broadcast Encryption Organize the keys as a tree, with auditors at leaves Each principal knows all the keys from leaf to root A D B C To trust everyone, use the root key To trust only D To trust B and C To trust A and C and D
Our solution: Confidentiality P1 P2 P3 P4 Pn-1 Pn … U3 W3 K3 C3 Pub3 Wi= Eki (wi)|hash(D) Ki = {Eka (ki) } kiis a secret key that authorized auditors can retrieve from the field Ki wi is either the diff or the set of actions taken on the file ,[object Object],[object Object]
Fine grained control over confidentiality Redacted (unclassified)  Document Classified Document Provenance chain has sensitive info Declassify / release Deleting sensitive information will break integrity checks P1 P2 P3 P4 P1 P2 P3 P4 Nonsensitive Information Sensitive Information Original attributes commitment Disclosable provenance entry Nonsensitive Info Sensitive info Commit(sensitive info) Checksum calculation Blinded entry disclosed to third party Nonsensitive Info Commit(sensitive info)
We can summarizeprovenance chains to save space, make audits fast 1:1 chain n:1 chain Each entry has 1 checksum, calculated from 1 previous checksum Each entry has n checksums, each of them calculated from 1 previous checksum We can systematically remove entries from the chain while still being able to prove integrity of chain
Scenario 2: Multiple Documents How to verify exactly what happened?
Co-Provenance provides stronger guarantees Problem: We can only tell if a given provenance chain shows what could have happened, but not exactly what really happened Solution: Entangle chains of related documents Each entry now contains n checksums, each computed using the checksum of the last entry of the provenance chain of a related document
How to (efficiently) add secure provenance to  real file systems?
Implementation Options Secure provenance management can be implemented in Kernel layer Transparent to user apps But less portable File System layer Also transparent to user apps Less portable Application layer Needs (slight) modification of apps Highly portable
Implementation: SPROV library ,[object Object]
Record all write data, and related context information in the provenance entrySPROV is an application layer library in C Can be added to existing applications with almost no change to code Provides the file system APIs from stdio.h To add secure provenance, simply relinkapplications with sprov library instead of stdio.h 33
Implementation: How Sprov works Modified fopen / fwrite / fclose calls File I/O calls trigger provenance chain handling mechanism Calls to fwrite log the modifications Calls to fclose writes a new provenance entry, computes new checksum
Experiment goals Measure run time overhead of secure provenance Run the benchmarks with and without secure provenance Calculate the % overhead Observe the effects of different types of workloads
Experimental settings Crypto settings 1024 bit DSA signatures  128 bit AES encryption SHA-1 for hashes Experiments Postmark benchmark : performance of writes on small files Hybrid workload benchmark: performance with real life file system distribution and workloads 36
Experimental settings Chain storage: Chains stored as XML formatted meta-files Modes ,[object Object]
Config-RD: Provenance chains stored on RAM Disk buffer, and periodically saved to disk,[object Object]
Hybrid workloads: Simulating real file systems     File system distribution:  File size distribution in real file systems follows the log normal distribution [Bolosky and Douceur 99] We created a file system with 20,000 files, using the lognormal parameters mu = 8.46, sigma = 2.4 Median file size = 4KB , mean file size = 80KB In addition, we included a few large (1GB+) files 39
Hybrid workloads: Using Data from real systems  Workload ,[object Object]
RES: A Research lab (2.9% writes) [Roselli 00]

Contenu connexe

Similaire à Fake Picassos, Tampered History, and Digital Forgery: Protecting the Genealogy of Bits with Secure Provenance

Digital evidencepaper
Digital evidencepaperDigital evidencepaper
Digital evidencepaperwiwin_wuland
 
Digital emerging trends in computer engineering Evidences.pptx
Digital emerging trends in computer engineering Evidences.pptxDigital emerging trends in computer engineering Evidences.pptx
Digital emerging trends in computer engineering Evidences.pptxShubhamKadam807802
 
Digital emerging trends in computer engineering Evidences.pptx
Digital emerging trends in computer engineering Evidences.pptxDigital emerging trends in computer engineering Evidences.pptx
Digital emerging trends in computer engineering Evidences.pptxShubhamKadam807802
 
Lecture2 Introduction to Digital Forensics.ppt
Lecture2 Introduction to Digital Forensics.pptLecture2 Introduction to Digital Forensics.ppt
Lecture2 Introduction to Digital Forensics.pptSurajgroupsvideo
 
Scott Edmunds: GigaScience - Big-Data, Data Citation and Future Data Handling
Scott Edmunds: GigaScience - Big-Data, Data Citation and Future Data HandlingScott Edmunds: GigaScience - Big-Data, Data Citation and Future Data Handling
Scott Edmunds: GigaScience - Big-Data, Data Citation and Future Data HandlingGigaScience, BGI Hong Kong
 
Reproducible method and benchmarking publishing for the data (and evidence) d...
Reproducible method and benchmarking publishing for the data (and evidence) d...Reproducible method and benchmarking publishing for the data (and evidence) d...
Reproducible method and benchmarking publishing for the data (and evidence) d...GigaScience, BGI Hong Kong
 
Sound E-Discovery Collection Practices
Sound E-Discovery Collection PracticesSound E-Discovery Collection Practices
Sound E-Discovery Collection PracticesSeth Row
 
Data Communities - reusable data in and outside your organization.
Data Communities - reusable data in and outside your organization.Data Communities - reusable data in and outside your organization.
Data Communities - reusable data in and outside your organization.Paul Groth
 
Omitola w3 c_govtlinkeddata
Omitola w3 c_govtlinkeddataOmitola w3 c_govtlinkeddata
Omitola w3 c_govtlinkeddataTope Omitola
 
Forensics Investigation ReportAccuracy Inter.docx
Forensics Investigation ReportAccuracy Inter.docxForensics Investigation ReportAccuracy Inter.docx
Forensics Investigation ReportAccuracy Inter.docxlaquandabignell
 
Scott Edmunds: Revolutionizing Data Dissemination: GigaScience
Scott Edmunds: Revolutionizing Data Dissemination: GigaScienceScott Edmunds: Revolutionizing Data Dissemination: GigaScience
Scott Edmunds: Revolutionizing Data Dissemination: GigaScienceGigaScience, BGI Hong Kong
 
Interpretation, Context, and Metadata: Examples from Open Context
Interpretation, Context, and Metadata: Examples from Open ContextInterpretation, Context, and Metadata: Examples from Open Context
Interpretation, Context, and Metadata: Examples from Open ContextEric Kansa
 
Preserving Biological Evidence
Preserving Biological EvidencePreserving Biological Evidence
Preserving Biological EvidenceKate Loge
 

Similaire à Fake Picassos, Tampered History, and Digital Forgery: Protecting the Genealogy of Bits with Secure Provenance (20)

Security Basics
Security BasicsSecurity Basics
Security Basics
 
IT forensic
IT forensicIT forensic
IT forensic
 
Digital evidencepaper
Digital evidencepaperDigital evidencepaper
Digital evidencepaper
 
Digital emerging trends in computer engineering Evidences.pptx
Digital emerging trends in computer engineering Evidences.pptxDigital emerging trends in computer engineering Evidences.pptx
Digital emerging trends in computer engineering Evidences.pptx
 
Digital emerging trends in computer engineering Evidences.pptx
Digital emerging trends in computer engineering Evidences.pptxDigital emerging trends in computer engineering Evidences.pptx
Digital emerging trends in computer engineering Evidences.pptx
 
Lecture2 Introduction to Digital Forensics.ppt
Lecture2 Introduction to Digital Forensics.pptLecture2 Introduction to Digital Forensics.ppt
Lecture2 Introduction to Digital Forensics.ppt
 
Scott Edmunds: GigaScience - Big-Data, Data Citation and Future Data Handling
Scott Edmunds: GigaScience - Big-Data, Data Citation and Future Data HandlingScott Edmunds: GigaScience - Big-Data, Data Citation and Future Data Handling
Scott Edmunds: GigaScience - Big-Data, Data Citation and Future Data Handling
 
Reproducible method and benchmarking publishing for the data (and evidence) d...
Reproducible method and benchmarking publishing for the data (and evidence) d...Reproducible method and benchmarking publishing for the data (and evidence) d...
Reproducible method and benchmarking publishing for the data (and evidence) d...
 
Sound E-Discovery Collection Practices
Sound E-Discovery Collection PracticesSound E-Discovery Collection Practices
Sound E-Discovery Collection Practices
 
Data Communities - reusable data in and outside your organization.
Data Communities - reusable data in and outside your organization.Data Communities - reusable data in and outside your organization.
Data Communities - reusable data in and outside your organization.
 
Omitola w3 c_govtlinkeddata
Omitola w3 c_govtlinkeddataOmitola w3 c_govtlinkeddata
Omitola w3 c_govtlinkeddata
 
Forensics Investigation ReportAccuracy Inter.docx
Forensics Investigation ReportAccuracy Inter.docxForensics Investigation ReportAccuracy Inter.docx
Forensics Investigation ReportAccuracy Inter.docx
 
Scott Edmunds: Revolutionizing Data Dissemination: GigaScience
Scott Edmunds: Revolutionizing Data Dissemination: GigaScienceScott Edmunds: Revolutionizing Data Dissemination: GigaScience
Scott Edmunds: Revolutionizing Data Dissemination: GigaScience
 
paper9.pdf
paper9.pdfpaper9.pdf
paper9.pdf
 
sheet2.pdf
sheet2.pdfsheet2.pdf
sheet2.pdf
 
doc2.pdf
doc2.pdfdoc2.pdf
doc2.pdf
 
paper2.pdf
paper2.pdfpaper2.pdf
paper2.pdf
 
lecture1.pdf
lecture1.pdflecture1.pdf
lecture1.pdf
 
Interpretation, Context, and Metadata: Examples from Open Context
Interpretation, Context, and Metadata: Examples from Open ContextInterpretation, Context, and Metadata: Examples from Open Context
Interpretation, Context, and Metadata: Examples from Open Context
 
Preserving Biological Evidence
Preserving Biological EvidencePreserving Biological Evidence
Preserving Biological Evidence
 

Plus de ragibhasan

Dw bobs-shikkhok
Dw bobs-shikkhokDw bobs-shikkhok
Dw bobs-shikkhokragibhasan
 
Security and Privacy in Cloud Computing - a High-level view
Security and Privacy in Cloud Computing - a High-level viewSecurity and Privacy in Cloud Computing - a High-level view
Security and Privacy in Cloud Computing - a High-level viewragibhasan
 
600.412.Lecture02
600.412.Lecture02600.412.Lecture02
600.412.Lecture02ragibhasan
 
600.412.Lecture03
600.412.Lecture03600.412.Lecture03
600.412.Lecture03ragibhasan
 
600.412.Lecture05
600.412.Lecture05600.412.Lecture05
600.412.Lecture05ragibhasan
 
600.412.Lecture07
600.412.Lecture07600.412.Lecture07
600.412.Lecture07ragibhasan
 
600.412.Lecture06
600.412.Lecture06600.412.Lecture06
600.412.Lecture06ragibhasan
 
600.412.Lecture08
600.412.Lecture08600.412.Lecture08
600.412.Lecture08ragibhasan
 
Lecture01: Introduction to Security and Privacy in Cloud Computing
Lecture01: Introduction to Security and Privacy in Cloud ComputingLecture01: Introduction to Security and Privacy in Cloud Computing
Lecture01: Introduction to Security and Privacy in Cloud Computingragibhasan
 

Plus de ragibhasan (9)

Dw bobs-shikkhok
Dw bobs-shikkhokDw bobs-shikkhok
Dw bobs-shikkhok
 
Security and Privacy in Cloud Computing - a High-level view
Security and Privacy in Cloud Computing - a High-level viewSecurity and Privacy in Cloud Computing - a High-level view
Security and Privacy in Cloud Computing - a High-level view
 
600.412.Lecture02
600.412.Lecture02600.412.Lecture02
600.412.Lecture02
 
600.412.Lecture03
600.412.Lecture03600.412.Lecture03
600.412.Lecture03
 
600.412.Lecture05
600.412.Lecture05600.412.Lecture05
600.412.Lecture05
 
600.412.Lecture07
600.412.Lecture07600.412.Lecture07
600.412.Lecture07
 
600.412.Lecture06
600.412.Lecture06600.412.Lecture06
600.412.Lecture06
 
600.412.Lecture08
600.412.Lecture08600.412.Lecture08
600.412.Lecture08
 
Lecture01: Introduction to Security and Privacy in Cloud Computing
Lecture01: Introduction to Security and Privacy in Cloud ComputingLecture01: Introduction to Security and Privacy in Cloud Computing
Lecture01: Introduction to Security and Privacy in Cloud Computing
 

Dernier

Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 

Dernier (20)

Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 

Fake Picassos, Tampered History, and Digital Forgery: Protecting the Genealogy of Bits with Secure Provenance

  • 1. Fake Picassos, Tampered History, and Digital Forgery:Protecting the Genealogy of Bits withSecure Provenance Ragib Hasan NSF Computing Innovation Fellow Dept. of Computer Science, Johns Hopkins University [Joint work with Marianne Winslett (UIUC) and Radu Sion(Stony Brook)] Computer Science Seminar, Johns Hopkins University February 2, 2010
  • 2. Let’s play a game Real, worth $101.8 million Fake, listed at eBay, worth nothing Can you spot the fake Picasso?
  • 3. So, how do art buyers authenticate art? Among other things, they look at Provenance records Provenance: from Latin provenire ‘come from’, defined as “(i) the fact of coming from some particular source or quarter; origin, derivation. (ii) the history or pedigree of a work of art, manuscript, rare book, etc.; a record of the ultimate derivation and passage of an item through its various owners” (Oxford English Dictionary) In other words,Who owned it, what was done to it, how was it transferred … Widely used in arts, archives, and archeology, called the Fundamental Principle of Archival http://moma.org/collection/provenance/items/644.67.html L'artiste et son modèle (1928), at Museum of Modern Art
  • 4. Provenance of a book Owner names recorded in sequential order on left side 4
  • 5. Let’s consider the Digital World Am I getting back untampered data? Was this data created and processed by persons I trust? Unlike data processing in the past, digital data can be rapidly copied, modified, and erased To trust data we receive from others or retrieve from storage, we need to look into the integrity of both the present state and the past historyof data Data is generated, processed, and transmitted between different systems and principals, and stored in database/storage Our life today has become increasingly dependent on digital data Our Most Valuable Asset is Data 5
  • 6. Data History is especially important for mobile data Celebrity death hoax Resulting tweet/retweet cycle Hoaxes propagates fast, so before retweeting, do check the original source
  • 7. The origin of data is important to measure its quality Dr. Evil Dr. Jones Dataset is used by many unsuspecting scientists to produce new data Unreliable experiments create faulty dataset Tainted data only produces more tainted results Reliable Lab Dr. Jones got the data from a lab he trusts, but the original data was tainted, so any data derived from it will also be tainted
  • 8. Bottom line: History matters … The origins of a data object can affect its quality and usefulness The transformation path that a data object took affects its trustworthiness In other words, data provenance is extremely important to decide data quality and trustworthiness
  • 9. What exactly is Data Provenance? Definition* Description of the origins of data and the process by which it arrived at the database. [Buneman et al.] Information describing materials and transformations applied to derive the data. [Lanter] Metadata recording the process of experiment workflows, annotations, and notes about experiments. [Greenwood] Information that helps determine the derivation history of a data product, starting from its original sources. [Simmhan et al.] 9 *Simmhan et al. A Survey of Provenance in E-Science. SIGMOD Record, 2005.
  • 10. Example provenance systems Simmhan et al., 2005
  • 11. What was the common theme of all those systems? They were all scientific computing systems And scientists trust people (more or less) Previous research covers provenance collection, annotation, querying, and workflow, but security issues are not handled For provenance in untrusted environments, we need integrity, confidentiality and privacy guarantees So, we need provenance of provenance, i.e. a model for Secure Provenance Data
  • 12. “History is not history unless it is the truth”. Abraham Lincoln, 1856
  • 13. Problem Statement Can we design an efficient system for securing the provenance of documents, and allow auditing with fine-grained preservation of confidentiality?
  • 14. Previous work on integrity assurances (Logically) centralized repository (CVS, Subversion, GIT) Changes to files recorded Not applicable to mobile documents File systems with integrity assurances (SUNDR, PASIS, TCFS) Provide local integrity checking Do not apply to data that traverses systems System state entanglement (Baker 02) Entangle one system’s state with another, so others can serve as witness to a system’s state Not applicable to mobile data Secure audit logs / trails (Schneier and Kelsey 99), LogCrypt (Holt 2004), (Peterson et al. 2006) Trusted notary certifies logs, or trusted third party given hash chain seed
  • 15. Secure provenance means preventing “Undetectable history rewriting” Preventundetectable modification of history, such that, Adversaries cannot insert fake events, remove genuine events from a document’s provenance No one can deny history of own actions Allow fine grained preservation of privacy and confidentiality of actions Users can choose which auditors can see details of their work Attributes can be selectively disclosed or hidden without harming integrity check 15
  • 16.
  • 17.
  • 18. Collude with others to add/remove entries
  • 19. Claim a chain belongs to another document
  • 20. Repudiate an entry Ragib Hasan, Radu Sion, and Marianne Winslett, “Introducing Secure Provenance: Problems and Challenges”, ACM StorageSS 2007
  • 21. Scenario 1: Securing a single document
  • 22. What realistic guarantees can we provide against a strong adversary? We can’t expect or force the adversary to play by the books Access control is not feasible in a loosely coupled distributed environment So, we can only verify plausibility of a given history
  • 23. Plausible history is something that could have happened Plausible history: The version history that will result if a document were created and subsequently edited and transferred from user to user, with provenance information correctly and indelibly recorded all along the way. A document can have multiple plausible histories
  • 24. Original scheme: Ensure Verification of Plausible History Given a document and a provenance chain, we want to verify if this could have happened Forger’s goal: To create a plausible history involving honest users
  • 25. Analogy: The Forger and the Prada Fake Prada, street price $50 Forger’s goal is to create a plausible history for a fake Prada bag (i.e. make it appear to come from real Prada distributors via real supply chain) Forger’s goal is NOT to replace a real Prada bag’s plausible history, to show it was made elsewhere. Real Prada, price $3000
  • 26. Our solution: Overview P1 P2 P3 P4 Pn-1 Pn Provenance Chain … U3 W3 K3 C3 Pub3 Provenance Entry Uid3 Pid3 Host3 IP3 time3 Ui= identity of the principal (lineage) Ci= integrity checksum(s) Ki = confidentiality locks for Wi Wi= Encrypted modification log
  • 27.
  • 28. Only the auditor(s) trusted by the user can see the user’s actions on the documentMultiple auditors Encrypted Modification log Modification log Encrypted Modification log Modification log Encrypted Modification log Optimization: Use broadcast encryption tree to reduce number of required keys
  • 29. Sidenote: Broadcast Encryption Organize the keys as a tree, with auditors at leaves Each principal knows all the keys from leaf to root A D B C To trust everyone, use the root key To trust only D To trust B and C To trust A and C and D
  • 30.
  • 31. Fine grained control over confidentiality Redacted (unclassified) Document Classified Document Provenance chain has sensitive info Declassify / release Deleting sensitive information will break integrity checks P1 P2 P3 P4 P1 P2 P3 P4 Nonsensitive Information Sensitive Information Original attributes commitment Disclosable provenance entry Nonsensitive Info Sensitive info Commit(sensitive info) Checksum calculation Blinded entry disclosed to third party Nonsensitive Info Commit(sensitive info)
  • 32. We can summarizeprovenance chains to save space, make audits fast 1:1 chain n:1 chain Each entry has 1 checksum, calculated from 1 previous checksum Each entry has n checksums, each of them calculated from 1 previous checksum We can systematically remove entries from the chain while still being able to prove integrity of chain
  • 33. Scenario 2: Multiple Documents How to verify exactly what happened?
  • 34. Co-Provenance provides stronger guarantees Problem: We can only tell if a given provenance chain shows what could have happened, but not exactly what really happened Solution: Entangle chains of related documents Each entry now contains n checksums, each computed using the checksum of the last entry of the provenance chain of a related document
  • 35. How to (efficiently) add secure provenance to real file systems?
  • 36. Implementation Options Secure provenance management can be implemented in Kernel layer Transparent to user apps But less portable File System layer Also transparent to user apps Less portable Application layer Needs (slight) modification of apps Highly portable
  • 37.
  • 38. Record all write data, and related context information in the provenance entrySPROV is an application layer library in C Can be added to existing applications with almost no change to code Provides the file system APIs from stdio.h To add secure provenance, simply relinkapplications with sprov library instead of stdio.h 33
  • 39. Implementation: How Sprov works Modified fopen / fwrite / fclose calls File I/O calls trigger provenance chain handling mechanism Calls to fwrite log the modifications Calls to fclose writes a new provenance entry, computes new checksum
  • 40. Experiment goals Measure run time overhead of secure provenance Run the benchmarks with and without secure provenance Calculate the % overhead Observe the effects of different types of workloads
  • 41. Experimental settings Crypto settings 1024 bit DSA signatures 128 bit AES encryption SHA-1 for hashes Experiments Postmark benchmark : performance of writes on small files Hybrid workload benchmark: performance with real life file system distribution and workloads 36
  • 42.
  • 43.
  • 44. Hybrid workloads: Simulating real file systems File system distribution: File size distribution in real file systems follows the log normal distribution [Bolosky and Douceur 99] We created a file system with 20,000 files, using the lognormal parameters mu = 8.46, sigma = 2.4 Median file size = 4KB , mean file size = 80KB In addition, we included a few large (1GB+) files 39
  • 45.
  • 46. RES: A Research lab (2.9% writes) [Roselli 00]
  • 49. EECS: (82% writes) [Ellard 03]40
  • 50. Typical real life workloads: 1 - 13% overhead EECS EECS CIFS-corp/eng CIFS-corp/eng INS, RES INS, RES Config-Disk Config-RD INS and RES are read-intensive (80%+ reads), so overheads are very low in both cases. CIFS-corp and CIFS-eng have 2:1 ratio of reads and writes, overheads are still low (range from 12% to 2.5%) EECS has very high write load (82%+), so the overhead is higher, but still less than 35% for Config-Disk, and less than 7% for Config-RD
  • 51. Recording both read and writes costs 12-16% for real-life workloads
  • 52. Application-specific history requires different techniques History integrity in databases Developed a solution that leverages WORM storage to provide history integrity for database tuples Incurs only 1%-3% overhead for TPC-C run-time Reduces window of vulnerability 5-fold Reduced audit times 100-fold over the previous state-of-the-art solution History integrity in cloud computing
  • 53. Beyond Provenance Provenance, versions, and snapshots are general cases of the property – remembrance By enabling different objects to recall their past, we can introduce richer functionality, for example: 1.5th factor authentication: Use “What you remember” to strengthen any other authentication factor Flexible security policies: Use knowledge of past of attributes/variables when evaluating security policies Ragib Hasan, Radu Sion, and Marianne Winslett, “Remembrance: The Unbearable Sentience of Being Digital”, CIDR 2009 44
  • 54. Papers Ragib Hasan, Radu Sion, and Marianne Winslett, “Introducing Secure Provenance: Problems and Challenges”, ACM StorageSS 2007 Ragib Hasan, Radu Sion, and Marianne Winslett, “The Case of the Fake Picasso: Preventing History Forgery with Secure Provenance”, USENIX FAST 2009, ACM TOSDec 2009 issue Ragib Hasan, Radu Sion, and Marianne Winslett, “Remembrance: The Unbearable Sentience of Being Digital”, CIDR 2009 45
  • 55. Summary: Secure provenance possible at low cost Yes, We CAN achieve secure provenance with integrity and confidentiality assurances with reasonable overheads For most real-life workloads, overheads are between 1% and 15% only More info at http://tinyurl.com/secprov
  • 56. Usage of Provenance Data Quality – depends on input data and actions taken on data Replication – a recipe for recreating data Attribution – proper attribution of copyright Informational – learn how data was derived Audit Trails –proof of due diligence; investigation of problem sources, system intrusions, proof of prior work in patents 47