SlideShare une entreprise Scribd logo
1  sur  23
Selecting efficient and reliable preservation strategies:
modeling long-term information integrity using large-scale hierarchical
discrete event simulation
Micah Altman
MIT Libraries
Richard Landau
Program on Information Science
1
Prepared for
IDCC 20200
Dublin
< http://tiny.cc/IDCC2020Altman >
ARXIV Preprint:
[1912.07908] Selecting efficient and reliable preservation strategies:
modeling long-term information integrity using large-scale hierarchical
discrete event simulation
Related Work
2
Abstract
This article addresses the problem of formulating efficient and
reliable operational preservation policies that ensure bit-level
information integrity over long periods, and in the presence of a
diverse range of real-world technical, legal, organizational, and
economic threats. We develop a systematic, quantitative prediction
framework that combines formal modeling, discrete-event-based
simulation, hierarchical modeling, and then use empirically
calibrated sensitivity analysis to identify effective strategies.
Specifically, the framework formally defines an objective function for
preservation that maps a set of preservation policies and a risk
profile to a set of preservation costs, and an expected collection
loss distribution. In this framework, a curator’s objective is to select
optimal policies that minimize expected loss subject to budget
constraints. To estimate preservation loss under different policy
conditions optimal policies, we develop a statistical hierarchical risk
model that includes four sources of risk: the storage hardware; the
physical environment; the curating institution; and the global
environment. We then employ a general discrete event-based
simulation framework to evaluate the expected loss and the cost of
employing varying preservation strategies under specific
parameterization of risks.
The framework offers flexibility for the modeling of a wide range of
preservation policies and threats. Since this framework is open
source and easily deployed in a cloud computing environment, it
can be used to produce analysis based on independent estimates
of scenario-specific costs, reliability, and risks.
We present results summarizing hundreds of thousands of
simulations using this framework. This analysis points to a number
of robust and broadly applicable preservation strategies, provides
novel insights into specific preservation tactics, and provides
evidence that challenges received wisdom.
3
Shifting Economics of Digital Information
4
Going digital changes economics of long term access
• Computation is cheap
• Replication is cheap
• Conservation
(of media, hardware)
is expensive
Multi-Level Threat Modeling
5
Major correlated failureShock
● Latent
● Induces immediate server failure
● May raise rate of server failure
Replica failureServer
● Deletected on server or file audit
● Entire replica of collection is lost
Environmental Conditions
Glitche
s
● Latent (Invisible)
● Periodic changes
● Increases sector error rate
Corrupts portion of documentSector
● Detected only on file audit (silent)
● Related to storage quality
HARDWAREINSTITUTION
Core Preservation Actions
● Replication
● Auditing
● Repair
● Transformations
○ Compression
○ Encryption
○ Reformatting
6
Cost Modeling
Cost(C,S)= f(storage(C,S), communications(C,S),
Replicas(S))
Simplifications:
- Each separate replication imposes a fixed cost
- Storage cost is linear in (compressed) collection size
- Communication is linear in collection size; audit frequency
- Other computation costs are negligible
7
Characterizing Preservation as Optimization
Given
● A collection (C), of documents ={D1..DN};
● A budget (B)
● Distribution of threats P()
Choose
A preservation strategy (S) =
{Copies, AuditMethod,
RepairFrequency,
FileTransformation}
8
Optimize
Choose the optimal strategy, S*, to minimize
collection loss, within the budget
minS∗ ∋ S E(Loss(C,S*)) | Cost(C,S*)≤ B
9
Modeling Damage to Files
10
Layer Role Visibility Distribution Lower
Frequency
Higher
Frequency
(lower
severity)
Storage
Hardware
(Sector)
Causes sector
error / single
document loss
Silent. Poisson event Controller
failure
Media
corruption.
Local
environment
(Glitch)
Increases rate of
storage error
Invisible. Poisson event of
some duration
HVAC failure Power spikes
The Best Disk Hardware is Not Enough -- Make Copies
11
5 Copies + Systematic Annual Auditing
is Sufficient Protection from Hardware Errors
12
Without auditing 20 copies are necessary to
prevent loss over a century
With simple annual auditing
5 Copies are Sufficient
13
Modeling Institution-Level Failures
14
Layer Role Visibility Distribution Lower Frequency Higher
Frequency
(lower severity)
Institution
(Server Failure)
Causes loss of a
single copy of a
collection
Silent. Exponential Lifetime Ransomware
Business failure
Curator error.
Billing error
Macro
Environment
(Major Shock)
Increases rate of
server failure
Invisible. Poisson duration Corporate
Mergers
Recession
Immediate loss of
multiple servers
Silent or visible Poisson event Government
Suppression
Regional war
Annual Auditing Does Not Protect Against Server Failure
15
Annual Audits
Significant collection Loss over Long Term
Dividing Audit into Quarterly Segments Controls
Risk
Protection for Major Recessions and Minor Wars
16Weekly Server Auditing Protects Against Triple Simultaneous Server Failures
17
Managing File Encryption with Key Replication
Challenge
● Entire collection is encrypted,
using a set of E encryption keys
● If all keys are destroyed,
collection is lost
18
Modeling Risk
● Given key size, risk of loss
(not corruption) dominates
● Model key failure as ‘server’ failure
● Audit action:
○ Challenge key-holder to prove it can decrypt
Results
➔ Replicate encryption keys (or partial shared secrets) across independent holders
➔ Shock size and frequency are driving factors
➔ Shock-resistant auditing strategy is sufficient
Managing Format Failure with Reader Verification
Challenge
● Documents are encoded
○ K formats in collection
● If format cannot be interpreted
○ Collection is loss
19
Modeling Risk
● Model format failure as ‘server’ failure
○ Each format F is maintained by S servers
○ Each S holds an executable reader that
can read documents in that format
● Audit action:
○ Challenge server to prove it can read file
Results
➔ Migrate formats when number of functioning readers is below replication threshold
➔ Shock size and frequency are driving factors
➔ Shock-resistant auditing strategy is sufficient
Compress or Repair?
Benefits of compression
● Smaller document →
reduce risk from hardware errors
● Smaller collection →
more replicas can be purchased
and audited for fixed cost
Risks of compression
● Increased fragility →
single error destroys document
● Compression format must be
managed for format obsolescence
20
Modeling
● Treat large repairable documents as
collection of R smaller non-repairable
documents
● Estimate using compression ratios for most
common compression formats
Result: Compress
+ 1 Replica
Repairability
Compressability
21
Bottom Line
22
for Memory Institutions
● Replicate
● Don’t fear the cloud
● Diversify across institutions
● Audit regularly and completely
● Audit storage, formats, secrets
● Compress
for Vendors
● Forget 11 nines …
reveal replication strategy
● Collect and share loss rates
● Support auditing primitives
● Disclose institutional
dependencies
23

Contenu connexe

Similaire à Selecting efficient and reliable preservation strategies

ISBG 2015 - Infrastructure Assessment - Analyze, Visualize and Optimize
ISBG 2015 - Infrastructure Assessment - Analyze, Visualize and OptimizeISBG 2015 - Infrastructure Assessment - Analyze, Visualize and Optimize
ISBG 2015 - Infrastructure Assessment - Analyze, Visualize and OptimizeChristoph Adler
 
Block-Level Message-Locked Encryption for Secure Large File De-duplication
Block-Level Message-Locked Encryption for Secure Large File De-duplicationBlock-Level Message-Locked Encryption for Secure Large File De-duplication
Block-Level Message-Locked Encryption for Secure Large File De-duplicationIRJET Journal
 
Agile performance engineering with cloud 2016
Agile performance engineering with cloud   2016Agile performance engineering with cloud   2016
Agile performance engineering with cloud 2016Ken Chan
 
Eniac – Lotus Consolidation 2009
Eniac – Lotus Consolidation   2009Eniac – Lotus Consolidation   2009
Eniac – Lotus Consolidation 2009Edwin Kanis
 
Smarter Backup
Smarter BackupSmarter Backup
Smarter BackupIBM
 
Big data analysis concepts and references by Cloud Security Alliance
Big data analysis concepts and references by Cloud Security AllianceBig data analysis concepts and references by Cloud Security Alliance
Big data analysis concepts and references by Cloud Security AllianceInformation Security Awareness Group
 
Are You Protected From Downtime and Data Loss?
Are You Protected From Downtime and Data Loss? Are You Protected From Downtime and Data Loss?
Are You Protected From Downtime and Data Loss? Lai Yoong Seng
 
AWS Well-Architected Framework (nov 2017)
AWS Well-Architected Framework (nov 2017)AWS Well-Architected Framework (nov 2017)
AWS Well-Architected Framework (nov 2017)Rick Hwang
 
Maximizing Business Continuity Success
Maximizing Business Continuity SuccessMaximizing Business Continuity Success
Maximizing Business Continuity SuccessSymantec
 
Net App D2 D Backupwith Snap Vaultand Ossv Customer Strategic Presentation201...
Net App D2 D Backupwith Snap Vaultand Ossv Customer Strategic Presentation201...Net App D2 D Backupwith Snap Vaultand Ossv Customer Strategic Presentation201...
Net App D2 D Backupwith Snap Vaultand Ossv Customer Strategic Presentation201...Michael Hudak
 
Dp%20 fudamentals%20%28ch1%29
Dp%20 fudamentals%20%28ch1%29Dp%20 fudamentals%20%28ch1%29
Dp%20 fudamentals%20%28ch1%29Navid Abbaspour
 
Disaster Recovery: Understanding Trend, Methodology, Solution, and Standard
Disaster Recovery:  Understanding Trend, Methodology, Solution, and StandardDisaster Recovery:  Understanding Trend, Methodology, Solution, and Standard
Disaster Recovery: Understanding Trend, Methodology, Solution, and StandardPT Datacomm Diangraha
 
Delivering Modern Data Protection for VMware Environments
Delivering Modern Data Protection for VMware EnvironmentsDelivering Modern Data Protection for VMware Environments
Delivering Modern Data Protection for VMware EnvironmentsPaula Koziol
 
Webinar: How Snapshots CAN be Backups
Webinar: How Snapshots CAN be BackupsWebinar: How Snapshots CAN be Backups
Webinar: How Snapshots CAN be BackupsStorage Switzerland
 
Data Virtualization Deployments: How to Manage Very Large Deployments
Data Virtualization Deployments: How to Manage Very Large DeploymentsData Virtualization Deployments: How to Manage Very Large Deployments
Data Virtualization Deployments: How to Manage Very Large DeploymentsDenodo
 
Analysis on Deduplication Techniques for Storage of Data in Cloud
Analysis on Deduplication Techniques for Storage of Data in CloudAnalysis on Deduplication Techniques for Storage of Data in Cloud
Analysis on Deduplication Techniques for Storage of Data in CloudIRJET Journal
 
The Impact of Cloud Computing on Predictive Analytics 7-29-09 v5
The Impact of Cloud Computing on Predictive Analytics 7-29-09 v5The Impact of Cloud Computing on Predictive Analytics 7-29-09 v5
The Impact of Cloud Computing on Predictive Analytics 7-29-09 v5Robert Grossman
 

Similaire à Selecting efficient and reliable preservation strategies (20)

ISBG 2015 - Infrastructure Assessment - Analyze, Visualize and Optimize
ISBG 2015 - Infrastructure Assessment - Analyze, Visualize and OptimizeISBG 2015 - Infrastructure Assessment - Analyze, Visualize and Optimize
ISBG 2015 - Infrastructure Assessment - Analyze, Visualize and Optimize
 
Block-Level Message-Locked Encryption for Secure Large File De-duplication
Block-Level Message-Locked Encryption for Secure Large File De-duplicationBlock-Level Message-Locked Encryption for Secure Large File De-duplication
Block-Level Message-Locked Encryption for Secure Large File De-duplication
 
Agile performance engineering with cloud 2016
Agile performance engineering with cloud   2016Agile performance engineering with cloud   2016
Agile performance engineering with cloud 2016
 
Eniac – Lotus Consolidation 2009
Eniac – Lotus Consolidation   2009Eniac – Lotus Consolidation   2009
Eniac – Lotus Consolidation 2009
 
Smarter Backup
Smarter BackupSmarter Backup
Smarter Backup
 
Cloud Design Patterns
Cloud Design PatternsCloud Design Patterns
Cloud Design Patterns
 
Big data analysis concepts and references by Cloud Security Alliance
Big data analysis concepts and references by Cloud Security AllianceBig data analysis concepts and references by Cloud Security Alliance
Big data analysis concepts and references by Cloud Security Alliance
 
Shikha fdp 62_14july2017
Shikha fdp 62_14july2017Shikha fdp 62_14july2017
Shikha fdp 62_14july2017
 
Are You Protected From Downtime and Data Loss?
Are You Protected From Downtime and Data Loss? Are You Protected From Downtime and Data Loss?
Are You Protected From Downtime and Data Loss?
 
AWS Well-Architected Framework (nov 2017)
AWS Well-Architected Framework (nov 2017)AWS Well-Architected Framework (nov 2017)
AWS Well-Architected Framework (nov 2017)
 
Maximizing Business Continuity Success
Maximizing Business Continuity SuccessMaximizing Business Continuity Success
Maximizing Business Continuity Success
 
Net App D2 D Backupwith Snap Vaultand Ossv Customer Strategic Presentation201...
Net App D2 D Backupwith Snap Vaultand Ossv Customer Strategic Presentation201...Net App D2 D Backupwith Snap Vaultand Ossv Customer Strategic Presentation201...
Net App D2 D Backupwith Snap Vaultand Ossv Customer Strategic Presentation201...
 
Dp%20 fudamentals%20%28ch1%29
Dp%20 fudamentals%20%28ch1%29Dp%20 fudamentals%20%28ch1%29
Dp%20 fudamentals%20%28ch1%29
 
Disaster Recovery: Understanding Trend, Methodology, Solution, and Standard
Disaster Recovery:  Understanding Trend, Methodology, Solution, and StandardDisaster Recovery:  Understanding Trend, Methodology, Solution, and Standard
Disaster Recovery: Understanding Trend, Methodology, Solution, and Standard
 
Delivering Modern Data Protection for VMware Environments
Delivering Modern Data Protection for VMware EnvironmentsDelivering Modern Data Protection for VMware Environments
Delivering Modern Data Protection for VMware Environments
 
Webinar: How Snapshots CAN be Backups
Webinar: How Snapshots CAN be BackupsWebinar: How Snapshots CAN be Backups
Webinar: How Snapshots CAN be Backups
 
Knowledge is Power - Richard May, Raritan
Knowledge is Power - Richard May, RaritanKnowledge is Power - Richard May, Raritan
Knowledge is Power - Richard May, Raritan
 
Data Virtualization Deployments: How to Manage Very Large Deployments
Data Virtualization Deployments: How to Manage Very Large DeploymentsData Virtualization Deployments: How to Manage Very Large Deployments
Data Virtualization Deployments: How to Manage Very Large Deployments
 
Analysis on Deduplication Techniques for Storage of Data in Cloud
Analysis on Deduplication Techniques for Storage of Data in CloudAnalysis on Deduplication Techniques for Storage of Data in Cloud
Analysis on Deduplication Techniques for Storage of Data in Cloud
 
The Impact of Cloud Computing on Predictive Analytics 7-29-09 v5
The Impact of Cloud Computing on Predictive Analytics 7-29-09 v5The Impact of Cloud Computing on Predictive Analytics 7-29-09 v5
The Impact of Cloud Computing on Predictive Analytics 7-29-09 v5
 

Plus de Micah Altman

Well-Being - A Sunset Conversation
Well-Being - A Sunset ConversationWell-Being - A Sunset Conversation
Well-Being - A Sunset ConversationMicah Altman
 
Matching Uses and Protections for Government Data Releases: Presentation at t...
Matching Uses and Protections for Government Data Releases: Presentation at t...Matching Uses and Protections for Government Data Releases: Presentation at t...
Matching Uses and Protections for Government Data Releases: Presentation at t...Micah Altman
 
Privacy Gaps in Mediated Library Services: Presentation at NERCOMP2019
Privacy Gaps in Mediated Library Services: Presentation at NERCOMP2019Privacy Gaps in Mediated Library Services: Presentation at NERCOMP2019
Privacy Gaps in Mediated Library Services: Presentation at NERCOMP2019Micah Altman
 
Well-being A Sunset Conversation
Well-being A Sunset ConversationWell-being A Sunset Conversation
Well-being A Sunset ConversationMicah Altman
 
Can We Fix Peer Review
Can We Fix Peer ReviewCan We Fix Peer Review
Can We Fix Peer ReviewMicah Altman
 
Academy Owned Peer Review
Academy Owned Peer ReviewAcademy Owned Peer Review
Academy Owned Peer ReviewMicah Altman
 
Redistricting in the US -- An Overview
Redistricting in the US -- An OverviewRedistricting in the US -- An Overview
Redistricting in the US -- An OverviewMicah Altman
 
A Future for Electoral Districting
A Future for Electoral DistrictingA Future for Electoral Districting
A Future for Electoral DistrictingMicah Altman
 
A History of the Internet :Scott Bradner’s Program on Information Science Talk
A History of the Internet :Scott Bradner’s Program on Information Science Talk  A History of the Internet :Scott Bradner’s Program on Information Science Talk
A History of the Internet :Scott Bradner’s Program on Information Science Talk Micah Altman
 
SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...
SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...
SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...Micah Altman
 
Labor And Reward In Science: Commentary on Cassidy Sugimoto’s Program on Info...
Labor And Reward In Science: Commentary on Cassidy Sugimoto’s Program on Info...Labor And Reward In Science: Commentary on Cassidy Sugimoto’s Program on Info...
Labor And Reward In Science: Commentary on Cassidy Sugimoto’s Program on Info...Micah Altman
 
Utilizing VR and AR in the Library Space:
Utilizing VR and AR in the Library Space:Utilizing VR and AR in the Library Space:
Utilizing VR and AR in the Library Space:Micah Altman
 
Creative Data Literacy: Bridging the Gap Between Data-Haves and Have-Nots
Creative Data Literacy: Bridging the Gap Between Data-Haves and Have-NotsCreative Data Literacy: Bridging the Gap Between Data-Haves and Have-Nots
Creative Data Literacy: Bridging the Gap Between Data-Haves and Have-NotsMicah Altman
 
SOLARSPELL: THE SOLAR POWERED EDUCATIONAL LEARNING LIBRARY - EXPERIENTIAL LEA...
SOLARSPELL: THE SOLAR POWERED EDUCATIONAL LEARNING LIBRARY - EXPERIENTIAL LEA...SOLARSPELL: THE SOLAR POWERED EDUCATIONAL LEARNING LIBRARY - EXPERIENTIAL LEA...
SOLARSPELL: THE SOLAR POWERED EDUCATIONAL LEARNING LIBRARY - EXPERIENTIAL LEA...Micah Altman
 
Ndsa 2016 opening plenary
Ndsa 2016 opening plenaryNdsa 2016 opening plenary
Ndsa 2016 opening plenaryMicah Altman
 
Making Decisions in a World Awash in Data: We’re going to need a different bo...
Making Decisions in a World Awash in Data: We’re going to need a different bo...Making Decisions in a World Awash in Data: We’re going to need a different bo...
Making Decisions in a World Awash in Data: We’re going to need a different bo...Micah Altman
 
Software Repositories for Research-- An Environmental Scan
Software Repositories for Research-- An Environmental ScanSoftware Repositories for Research-- An Environmental Scan
Software Repositories for Research-- An Environmental ScanMicah Altman
 
The Open Access Network: Rebecca Kennison’s Talk for the MIT Prorgam on Infor...
The Open Access Network: Rebecca Kennison’s Talk for the MIT Prorgam on Infor...The Open Access Network: Rebecca Kennison’s Talk for the MIT Prorgam on Infor...
The Open Access Network: Rebecca Kennison’s Talk for the MIT Prorgam on Infor...Micah Altman
 
Gary Price, MIT Program on Information Science
Gary Price, MIT Program on Information ScienceGary Price, MIT Program on Information Science
Gary Price, MIT Program on Information ScienceMicah Altman
 
Attribution from a Research Library Perspective, on NISO Webinar: How Librari...
Attribution from a Research Library Perspective, on NISO Webinar: How Librari...Attribution from a Research Library Perspective, on NISO Webinar: How Librari...
Attribution from a Research Library Perspective, on NISO Webinar: How Librari...Micah Altman
 

Plus de Micah Altman (20)

Well-Being - A Sunset Conversation
Well-Being - A Sunset ConversationWell-Being - A Sunset Conversation
Well-Being - A Sunset Conversation
 
Matching Uses and Protections for Government Data Releases: Presentation at t...
Matching Uses and Protections for Government Data Releases: Presentation at t...Matching Uses and Protections for Government Data Releases: Presentation at t...
Matching Uses and Protections for Government Data Releases: Presentation at t...
 
Privacy Gaps in Mediated Library Services: Presentation at NERCOMP2019
Privacy Gaps in Mediated Library Services: Presentation at NERCOMP2019Privacy Gaps in Mediated Library Services: Presentation at NERCOMP2019
Privacy Gaps in Mediated Library Services: Presentation at NERCOMP2019
 
Well-being A Sunset Conversation
Well-being A Sunset ConversationWell-being A Sunset Conversation
Well-being A Sunset Conversation
 
Can We Fix Peer Review
Can We Fix Peer ReviewCan We Fix Peer Review
Can We Fix Peer Review
 
Academy Owned Peer Review
Academy Owned Peer ReviewAcademy Owned Peer Review
Academy Owned Peer Review
 
Redistricting in the US -- An Overview
Redistricting in the US -- An OverviewRedistricting in the US -- An Overview
Redistricting in the US -- An Overview
 
A Future for Electoral Districting
A Future for Electoral DistrictingA Future for Electoral Districting
A Future for Electoral Districting
 
A History of the Internet :Scott Bradner’s Program on Information Science Talk
A History of the Internet :Scott Bradner’s Program on Information Science Talk  A History of the Internet :Scott Bradner’s Program on Information Science Talk
A History of the Internet :Scott Bradner’s Program on Information Science Talk
 
SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...
SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...
SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...
 
Labor And Reward In Science: Commentary on Cassidy Sugimoto’s Program on Info...
Labor And Reward In Science: Commentary on Cassidy Sugimoto’s Program on Info...Labor And Reward In Science: Commentary on Cassidy Sugimoto’s Program on Info...
Labor And Reward In Science: Commentary on Cassidy Sugimoto’s Program on Info...
 
Utilizing VR and AR in the Library Space:
Utilizing VR and AR in the Library Space:Utilizing VR and AR in the Library Space:
Utilizing VR and AR in the Library Space:
 
Creative Data Literacy: Bridging the Gap Between Data-Haves and Have-Nots
Creative Data Literacy: Bridging the Gap Between Data-Haves and Have-NotsCreative Data Literacy: Bridging the Gap Between Data-Haves and Have-Nots
Creative Data Literacy: Bridging the Gap Between Data-Haves and Have-Nots
 
SOLARSPELL: THE SOLAR POWERED EDUCATIONAL LEARNING LIBRARY - EXPERIENTIAL LEA...
SOLARSPELL: THE SOLAR POWERED EDUCATIONAL LEARNING LIBRARY - EXPERIENTIAL LEA...SOLARSPELL: THE SOLAR POWERED EDUCATIONAL LEARNING LIBRARY - EXPERIENTIAL LEA...
SOLARSPELL: THE SOLAR POWERED EDUCATIONAL LEARNING LIBRARY - EXPERIENTIAL LEA...
 
Ndsa 2016 opening plenary
Ndsa 2016 opening plenaryNdsa 2016 opening plenary
Ndsa 2016 opening plenary
 
Making Decisions in a World Awash in Data: We’re going to need a different bo...
Making Decisions in a World Awash in Data: We’re going to need a different bo...Making Decisions in a World Awash in Data: We’re going to need a different bo...
Making Decisions in a World Awash in Data: We’re going to need a different bo...
 
Software Repositories for Research-- An Environmental Scan
Software Repositories for Research-- An Environmental ScanSoftware Repositories for Research-- An Environmental Scan
Software Repositories for Research-- An Environmental Scan
 
The Open Access Network: Rebecca Kennison’s Talk for the MIT Prorgam on Infor...
The Open Access Network: Rebecca Kennison’s Talk for the MIT Prorgam on Infor...The Open Access Network: Rebecca Kennison’s Talk for the MIT Prorgam on Infor...
The Open Access Network: Rebecca Kennison’s Talk for the MIT Prorgam on Infor...
 
Gary Price, MIT Program on Information Science
Gary Price, MIT Program on Information ScienceGary Price, MIT Program on Information Science
Gary Price, MIT Program on Information Science
 
Attribution from a Research Library Perspective, on NISO Webinar: How Librari...
Attribution from a Research Library Perspective, on NISO Webinar: How Librari...Attribution from a Research Library Perspective, on NISO Webinar: How Librari...
Attribution from a Research Library Perspective, on NISO Webinar: How Librari...
 

Dernier

Lucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRL
Lucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRLLucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRL
Lucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRLimonikaupta
 
₹5.5k {Cash Payment}New Friends Colony Call Girls In [Delhi NIHARIKA] 🔝|97111...
₹5.5k {Cash Payment}New Friends Colony Call Girls In [Delhi NIHARIKA] 🔝|97111...₹5.5k {Cash Payment}New Friends Colony Call Girls In [Delhi NIHARIKA] 🔝|97111...
₹5.5k {Cash Payment}New Friends Colony Call Girls In [Delhi NIHARIKA] 🔝|97111...Diya Sharma
 
DDoS In Oceania and the Pacific, presented by Dave Phelan at NZNOG 2024
DDoS In Oceania and the Pacific, presented by Dave Phelan at NZNOG 2024DDoS In Oceania and the Pacific, presented by Dave Phelan at NZNOG 2024
DDoS In Oceania and the Pacific, presented by Dave Phelan at NZNOG 2024APNIC
 
WhatsApp 📞 8448380779 ✅Call Girls In Mamura Sector 66 ( Noida)
WhatsApp 📞 8448380779 ✅Call Girls In Mamura Sector 66 ( Noida)WhatsApp 📞 8448380779 ✅Call Girls In Mamura Sector 66 ( Noida)
WhatsApp 📞 8448380779 ✅Call Girls In Mamura Sector 66 ( Noida)Delhi Call girls
 
Enjoy Night⚡Call Girls Dlf City Phase 3 Gurgaon >༒8448380779 Escort Service
Enjoy Night⚡Call Girls Dlf City Phase 3 Gurgaon >༒8448380779 Escort ServiceEnjoy Night⚡Call Girls Dlf City Phase 3 Gurgaon >༒8448380779 Escort Service
Enjoy Night⚡Call Girls Dlf City Phase 3 Gurgaon >༒8448380779 Escort ServiceDelhi Call girls
 
Call Now ☎ 8264348440 !! Call Girls in Sarai Rohilla Escort Service Delhi N.C.R.
Call Now ☎ 8264348440 !! Call Girls in Sarai Rohilla Escort Service Delhi N.C.R.Call Now ☎ 8264348440 !! Call Girls in Sarai Rohilla Escort Service Delhi N.C.R.
Call Now ☎ 8264348440 !! Call Girls in Sarai Rohilla Escort Service Delhi N.C.R.soniya singh
 
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark WebGDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark WebJames Anderson
 
Best VIP Call Girls Noida Sector 75 Call Me: 8448380779
Best VIP Call Girls Noida Sector 75 Call Me: 8448380779Best VIP Call Girls Noida Sector 75 Call Me: 8448380779
Best VIP Call Girls Noida Sector 75 Call Me: 8448380779Delhi Call girls
 
Hire↠Young Call Girls in Tilak nagar (Delhi) ☎️ 9205541914 ☎️ Independent Esc...
Hire↠Young Call Girls in Tilak nagar (Delhi) ☎️ 9205541914 ☎️ Independent Esc...Hire↠Young Call Girls in Tilak nagar (Delhi) ☎️ 9205541914 ☎️ Independent Esc...
Hire↠Young Call Girls in Tilak nagar (Delhi) ☎️ 9205541914 ☎️ Independent Esc...Delhi Call girls
 
✂️ 👅 Independent Andheri Escorts With Room Vashi Call Girls 💃 9004004663
✂️ 👅 Independent Andheri Escorts With Room Vashi Call Girls 💃 9004004663✂️ 👅 Independent Andheri Escorts With Room Vashi Call Girls 💃 9004004663
✂️ 👅 Independent Andheri Escorts With Room Vashi Call Girls 💃 9004004663Call Girls Mumbai
 
Russian Call girl in Ajman +971563133746 Ajman Call girl Service
Russian Call girl in Ajman +971563133746 Ajman Call girl ServiceRussian Call girl in Ajman +971563133746 Ajman Call girl Service
Russian Call girl in Ajman +971563133746 Ajman Call girl Servicegwenoracqe6
 
Call Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service Available
Call Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service AvailableCall Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service Available
Call Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service AvailableSeo
 
AWS Community DAY Albertini-Ellan Cloud Security (1).pptx
AWS Community DAY Albertini-Ellan Cloud Security (1).pptxAWS Community DAY Albertini-Ellan Cloud Security (1).pptx
AWS Community DAY Albertini-Ellan Cloud Security (1).pptxellan12
 
Call Girls Dubai Prolapsed O525547819 Call Girls In Dubai Princes$
Call Girls Dubai Prolapsed O525547819 Call Girls In Dubai Princes$Call Girls Dubai Prolapsed O525547819 Call Girls In Dubai Princes$
Call Girls Dubai Prolapsed O525547819 Call Girls In Dubai Princes$kojalkojal131
 
(+971568250507 ))# Young Call Girls in Ajman By Pakistani Call Girls in ...
(+971568250507  ))#  Young Call Girls  in Ajman  By Pakistani Call Girls  in ...(+971568250507  ))#  Young Call Girls  in Ajman  By Pakistani Call Girls  in ...
(+971568250507 ))# Young Call Girls in Ajman By Pakistani Call Girls in ...Escorts Call Girls
 
Top Rated Pune Call Girls Daund ⟟ 6297143586 ⟟ Call Me For Genuine Sex Servi...
Top Rated  Pune Call Girls Daund ⟟ 6297143586 ⟟ Call Me For Genuine Sex Servi...Top Rated  Pune Call Girls Daund ⟟ 6297143586 ⟟ Call Me For Genuine Sex Servi...
Top Rated Pune Call Girls Daund ⟟ 6297143586 ⟟ Call Me For Genuine Sex Servi...Call Girls in Nagpur High Profile
 
𓀤Call On 7877925207 𓀤 Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...
𓀤Call On 7877925207 𓀤 Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...𓀤Call On 7877925207 𓀤 Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...
𓀤Call On 7877925207 𓀤 Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...Neha Pandey
 
INDIVIDUAL ASSIGNMENT #3 CBG, PRESENTATION.
INDIVIDUAL ASSIGNMENT #3 CBG, PRESENTATION.INDIVIDUAL ASSIGNMENT #3 CBG, PRESENTATION.
INDIVIDUAL ASSIGNMENT #3 CBG, PRESENTATION.CarlotaBedoya1
 

Dernier (20)

Lucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRL
Lucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRLLucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRL
Lucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRL
 
₹5.5k {Cash Payment}New Friends Colony Call Girls In [Delhi NIHARIKA] 🔝|97111...
₹5.5k {Cash Payment}New Friends Colony Call Girls In [Delhi NIHARIKA] 🔝|97111...₹5.5k {Cash Payment}New Friends Colony Call Girls In [Delhi NIHARIKA] 🔝|97111...
₹5.5k {Cash Payment}New Friends Colony Call Girls In [Delhi NIHARIKA] 🔝|97111...
 
DDoS In Oceania and the Pacific, presented by Dave Phelan at NZNOG 2024
DDoS In Oceania and the Pacific, presented by Dave Phelan at NZNOG 2024DDoS In Oceania and the Pacific, presented by Dave Phelan at NZNOG 2024
DDoS In Oceania and the Pacific, presented by Dave Phelan at NZNOG 2024
 
WhatsApp 📞 8448380779 ✅Call Girls In Mamura Sector 66 ( Noida)
WhatsApp 📞 8448380779 ✅Call Girls In Mamura Sector 66 ( Noida)WhatsApp 📞 8448380779 ✅Call Girls In Mamura Sector 66 ( Noida)
WhatsApp 📞 8448380779 ✅Call Girls In Mamura Sector 66 ( Noida)
 
Enjoy Night⚡Call Girls Dlf City Phase 3 Gurgaon >༒8448380779 Escort Service
Enjoy Night⚡Call Girls Dlf City Phase 3 Gurgaon >༒8448380779 Escort ServiceEnjoy Night⚡Call Girls Dlf City Phase 3 Gurgaon >༒8448380779 Escort Service
Enjoy Night⚡Call Girls Dlf City Phase 3 Gurgaon >༒8448380779 Escort Service
 
Call Now ☎ 8264348440 !! Call Girls in Sarai Rohilla Escort Service Delhi N.C.R.
Call Now ☎ 8264348440 !! Call Girls in Sarai Rohilla Escort Service Delhi N.C.R.Call Now ☎ 8264348440 !! Call Girls in Sarai Rohilla Escort Service Delhi N.C.R.
Call Now ☎ 8264348440 !! Call Girls in Sarai Rohilla Escort Service Delhi N.C.R.
 
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark WebGDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
 
Best VIP Call Girls Noida Sector 75 Call Me: 8448380779
Best VIP Call Girls Noida Sector 75 Call Me: 8448380779Best VIP Call Girls Noida Sector 75 Call Me: 8448380779
Best VIP Call Girls Noida Sector 75 Call Me: 8448380779
 
Hire↠Young Call Girls in Tilak nagar (Delhi) ☎️ 9205541914 ☎️ Independent Esc...
Hire↠Young Call Girls in Tilak nagar (Delhi) ☎️ 9205541914 ☎️ Independent Esc...Hire↠Young Call Girls in Tilak nagar (Delhi) ☎️ 9205541914 ☎️ Independent Esc...
Hire↠Young Call Girls in Tilak nagar (Delhi) ☎️ 9205541914 ☎️ Independent Esc...
 
✂️ 👅 Independent Andheri Escorts With Room Vashi Call Girls 💃 9004004663
✂️ 👅 Independent Andheri Escorts With Room Vashi Call Girls 💃 9004004663✂️ 👅 Independent Andheri Escorts With Room Vashi Call Girls 💃 9004004663
✂️ 👅 Independent Andheri Escorts With Room Vashi Call Girls 💃 9004004663
 
VVVIP Call Girls In Connaught Place ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...
VVVIP Call Girls In Connaught Place ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...VVVIP Call Girls In Connaught Place ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...
VVVIP Call Girls In Connaught Place ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...
 
Russian Call girl in Ajman +971563133746 Ajman Call girl Service
Russian Call girl in Ajman +971563133746 Ajman Call girl ServiceRussian Call girl in Ajman +971563133746 Ajman Call girl Service
Russian Call girl in Ajman +971563133746 Ajman Call girl Service
 
(INDIRA) Call Girl Pune Call Now 8250077686 Pune Escorts 24x7
(INDIRA) Call Girl Pune Call Now 8250077686 Pune Escorts 24x7(INDIRA) Call Girl Pune Call Now 8250077686 Pune Escorts 24x7
(INDIRA) Call Girl Pune Call Now 8250077686 Pune Escorts 24x7
 
Call Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service Available
Call Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service AvailableCall Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service Available
Call Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service Available
 
AWS Community DAY Albertini-Ellan Cloud Security (1).pptx
AWS Community DAY Albertini-Ellan Cloud Security (1).pptxAWS Community DAY Albertini-Ellan Cloud Security (1).pptx
AWS Community DAY Albertini-Ellan Cloud Security (1).pptx
 
Call Girls Dubai Prolapsed O525547819 Call Girls In Dubai Princes$
Call Girls Dubai Prolapsed O525547819 Call Girls In Dubai Princes$Call Girls Dubai Prolapsed O525547819 Call Girls In Dubai Princes$
Call Girls Dubai Prolapsed O525547819 Call Girls In Dubai Princes$
 
(+971568250507 ))# Young Call Girls in Ajman By Pakistani Call Girls in ...
(+971568250507  ))#  Young Call Girls  in Ajman  By Pakistani Call Girls  in ...(+971568250507  ))#  Young Call Girls  in Ajman  By Pakistani Call Girls  in ...
(+971568250507 ))# Young Call Girls in Ajman By Pakistani Call Girls in ...
 
Top Rated Pune Call Girls Daund ⟟ 6297143586 ⟟ Call Me For Genuine Sex Servi...
Top Rated  Pune Call Girls Daund ⟟ 6297143586 ⟟ Call Me For Genuine Sex Servi...Top Rated  Pune Call Girls Daund ⟟ 6297143586 ⟟ Call Me For Genuine Sex Servi...
Top Rated Pune Call Girls Daund ⟟ 6297143586 ⟟ Call Me For Genuine Sex Servi...
 
𓀤Call On 7877925207 𓀤 Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...
𓀤Call On 7877925207 𓀤 Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...𓀤Call On 7877925207 𓀤 Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...
𓀤Call On 7877925207 𓀤 Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...
 
INDIVIDUAL ASSIGNMENT #3 CBG, PRESENTATION.
INDIVIDUAL ASSIGNMENT #3 CBG, PRESENTATION.INDIVIDUAL ASSIGNMENT #3 CBG, PRESENTATION.
INDIVIDUAL ASSIGNMENT #3 CBG, PRESENTATION.
 

Selecting efficient and reliable preservation strategies

  • 1. Selecting efficient and reliable preservation strategies: modeling long-term information integrity using large-scale hierarchical discrete event simulation Micah Altman MIT Libraries Richard Landau Program on Information Science 1 Prepared for IDCC 20200 Dublin < http://tiny.cc/IDCC2020Altman >
  • 2. ARXIV Preprint: [1912.07908] Selecting efficient and reliable preservation strategies: modeling long-term information integrity using large-scale hierarchical discrete event simulation Related Work 2
  • 3. Abstract This article addresses the problem of formulating efficient and reliable operational preservation policies that ensure bit-level information integrity over long periods, and in the presence of a diverse range of real-world technical, legal, organizational, and economic threats. We develop a systematic, quantitative prediction framework that combines formal modeling, discrete-event-based simulation, hierarchical modeling, and then use empirically calibrated sensitivity analysis to identify effective strategies. Specifically, the framework formally defines an objective function for preservation that maps a set of preservation policies and a risk profile to a set of preservation costs, and an expected collection loss distribution. In this framework, a curator’s objective is to select optimal policies that minimize expected loss subject to budget constraints. To estimate preservation loss under different policy conditions optimal policies, we develop a statistical hierarchical risk model that includes four sources of risk: the storage hardware; the physical environment; the curating institution; and the global environment. We then employ a general discrete event-based simulation framework to evaluate the expected loss and the cost of employing varying preservation strategies under specific parameterization of risks. The framework offers flexibility for the modeling of a wide range of preservation policies and threats. Since this framework is open source and easily deployed in a cloud computing environment, it can be used to produce analysis based on independent estimates of scenario-specific costs, reliability, and risks. We present results summarizing hundreds of thousands of simulations using this framework. This analysis points to a number of robust and broadly applicable preservation strategies, provides novel insights into specific preservation tactics, and provides evidence that challenges received wisdom. 3
  • 4. Shifting Economics of Digital Information 4 Going digital changes economics of long term access • Computation is cheap • Replication is cheap • Conservation (of media, hardware) is expensive
  • 5. Multi-Level Threat Modeling 5 Major correlated failureShock ● Latent ● Induces immediate server failure ● May raise rate of server failure Replica failureServer ● Deletected on server or file audit ● Entire replica of collection is lost Environmental Conditions Glitche s ● Latent (Invisible) ● Periodic changes ● Increases sector error rate Corrupts portion of documentSector ● Detected only on file audit (silent) ● Related to storage quality HARDWAREINSTITUTION
  • 6. Core Preservation Actions ● Replication ● Auditing ● Repair ● Transformations ○ Compression ○ Encryption ○ Reformatting 6
  • 7. Cost Modeling Cost(C,S)= f(storage(C,S), communications(C,S), Replicas(S)) Simplifications: - Each separate replication imposes a fixed cost - Storage cost is linear in (compressed) collection size - Communication is linear in collection size; audit frequency - Other computation costs are negligible 7
  • 8. Characterizing Preservation as Optimization Given ● A collection (C), of documents ={D1..DN}; ● A budget (B) ● Distribution of threats P() Choose A preservation strategy (S) = {Copies, AuditMethod, RepairFrequency, FileTransformation} 8 Optimize Choose the optimal strategy, S*, to minimize collection loss, within the budget minS∗ ∋ S E(Loss(C,S*)) | Cost(C,S*)≤ B
  • 9. 9
  • 10. Modeling Damage to Files 10 Layer Role Visibility Distribution Lower Frequency Higher Frequency (lower severity) Storage Hardware (Sector) Causes sector error / single document loss Silent. Poisson event Controller failure Media corruption. Local environment (Glitch) Increases rate of storage error Invisible. Poisson event of some duration HVAC failure Power spikes
  • 11. The Best Disk Hardware is Not Enough -- Make Copies 11
  • 12. 5 Copies + Systematic Annual Auditing is Sufficient Protection from Hardware Errors 12 Without auditing 20 copies are necessary to prevent loss over a century With simple annual auditing 5 Copies are Sufficient
  • 13. 13
  • 14. Modeling Institution-Level Failures 14 Layer Role Visibility Distribution Lower Frequency Higher Frequency (lower severity) Institution (Server Failure) Causes loss of a single copy of a collection Silent. Exponential Lifetime Ransomware Business failure Curator error. Billing error Macro Environment (Major Shock) Increases rate of server failure Invisible. Poisson duration Corporate Mergers Recession Immediate loss of multiple servers Silent or visible Poisson event Government Suppression Regional war
  • 15. Annual Auditing Does Not Protect Against Server Failure 15 Annual Audits Significant collection Loss over Long Term Dividing Audit into Quarterly Segments Controls Risk
  • 16. Protection for Major Recessions and Minor Wars 16Weekly Server Auditing Protects Against Triple Simultaneous Server Failures
  • 17. 17
  • 18. Managing File Encryption with Key Replication Challenge ● Entire collection is encrypted, using a set of E encryption keys ● If all keys are destroyed, collection is lost 18 Modeling Risk ● Given key size, risk of loss (not corruption) dominates ● Model key failure as ‘server’ failure ● Audit action: ○ Challenge key-holder to prove it can decrypt Results ➔ Replicate encryption keys (or partial shared secrets) across independent holders ➔ Shock size and frequency are driving factors ➔ Shock-resistant auditing strategy is sufficient
  • 19. Managing Format Failure with Reader Verification Challenge ● Documents are encoded ○ K formats in collection ● If format cannot be interpreted ○ Collection is loss 19 Modeling Risk ● Model format failure as ‘server’ failure ○ Each format F is maintained by S servers ○ Each S holds an executable reader that can read documents in that format ● Audit action: ○ Challenge server to prove it can read file Results ➔ Migrate formats when number of functioning readers is below replication threshold ➔ Shock size and frequency are driving factors ➔ Shock-resistant auditing strategy is sufficient
  • 20. Compress or Repair? Benefits of compression ● Smaller document → reduce risk from hardware errors ● Smaller collection → more replicas can be purchased and audited for fixed cost Risks of compression ● Increased fragility → single error destroys document ● Compression format must be managed for format obsolescence 20 Modeling ● Treat large repairable documents as collection of R smaller non-repairable documents ● Estimate using compression ratios for most common compression formats Result: Compress + 1 Replica Repairability Compressability
  • 21. 21
  • 22. Bottom Line 22 for Memory Institutions ● Replicate ● Don’t fear the cloud ● Diversify across institutions ● Audit regularly and completely ● Audit storage, formats, secrets ● Compress for Vendors ● Forget 11 nines … reveal replication strategy ● Collect and share loss rates ● Support auditing primitives ● Disclose institutional dependencies
  • 23. 23

Notes de l'éditeur

  1. Alex
  2. Alex
  3. Note: Sector error refers to logical sector error. RAID etc is abstracted into sector error. Assume that hardware is maintained and after service life -- hardware left-in-place will be subject to bathtub curve failures. Document size is important, but effects on risk are absolutely proportional. Baseline model assumes 50M document. Failure in larger documents can be readily estimated by proportionally increasing predicted failure rate.
  4. A single
  5. Notes: Sampling only a percentage, or sampling with replacement is not a good strategy Auditing based on use is even worse strategy Can randomize order of auditing -- but must be systematic coverage of collection during interval
  6. Note: Sector error refers to logical sector error. RAID etc is abstracted into sector error. Assume that hardware is maintained and after service life -- hardware left-in-place will be subject to bathtub curve failures. Document size is important, but effects on risk are absolutely proportional. Baseline model assumes 50M document. Failure in larger documents can be readily estimated by proportionally increasing predicted failure rate.
  7. Note: Assumes infinite sector life -- best case! Half life of firms is 3-7 years n most sectors,
  8. Notes: Include threats from targeted attacks from resourced adversary Caveat -- replication alonge cannot prevent attack on integrity of the auditing system -- need a cryptographic protocol Byzantine failues -- a system needs 3N*1 nodes to be safe against subverted N subverted verifiers System of 7 replicas is safe against shock = 3 even with 1 extra node subverted