My time at Archives New Zealand has been my first truly hands-on experience with born-digital collections. Material transferred in 2008, containing files created over an entire decade, has been the focus of my first born-digital ingests with the organisation. The work in the Systems Standards and Strategies team (SSS) at Archives New Zealand has been split into two initial sets of ingests, one set of two followed by another; the idea was to create processes and develop them incrementally. My surprise, after the first two ingests back in late November and December 2014, is that five months into the next two we're still finding challenges - daily! With only the slightest nod to digital preservation and my title as digital preservation analyst, this paper discusses the difficulties of wrestling core information received from agencies, organizational issues, and the tools available to us in this agency. Organizations and records managers have an opportunity to make recommendations to their users that can ensure issues are minimized when we place records into long-term preservation, and over the next few years we'll collect plenty of evidence to see the number of surprises reduced. It is this author's assertion, however, that despite best efforts we're always going to receive badly behaved digital material for reasons not always foreseen, and that, despite concerted efforts at control, any agency receiving born-digital material must be prepared to understand it and to manage it through different mitigation strategies, depending on appetite. This paper will introduce the challenges faced while processing the organization’s first born-digital material, looking at where the issues arose and why, before concluding that we must learn by doing, and that collecting evidence and understanding 'real world' scenarios is our best opportunity to reduce surprises, even if we can’t reduce them to zero.
Time Travelling Analyst: The Things That Only a Time Machine Can Tell Me...
1. Department of Internal Affairs
Time Traveling Analyst: The Things Only
a Time Machine Can Tell Me…
Ross Spencer - @beet_keeper
Archives New Zealand
#ARANZ2015
Tuesday September 7 2015
2. Department of Internal Affairs
Sun image, R24685027, E4, Archway,
Archives New Zealand.
http://www.archway.archives.govt.nz/ViewFullItem.do?
code=24685027&digital=yes
3. Department of Internal Affairs
Background
Two sets of born-digital ingests of Minister's Papers, 'code-named' E1
and E4, then E2 and E3.
First sets selected for simplicity.
Second sets followed numerical sequence and were used as a
learning exercise.
Complexity grew.
First sets enabled creation of CSV ingest mechanism, configuration
of Rosetta, creation of process.
Second sets enabled the proof of that method.
4. Department of Internal Affairs
Approximate collection breakdowns at the
beginning of the process…
• E1~
• 175 Files
• 10 Directories
• 0 Unidentified Objects
• 0 Unidentified Extensions
• 7 Known Formats
• E4~
• 1295 Files
• 6 Directories
• 2 Unidentified Objects
• 1 Unidentified Extensions
• 12 Known Formats
N.B. E4 also contained two
identification false positives.
5. Department of Internal Affairs
Approximate collection breakdowns at the
beginning of the process…
• E2~
• 2519 Files
• 177 Directories
• 5 Unidentified Objects
• 4 Unidentified Extensions
• 22 Known Formats
• 25 Extension Mismatches
• E3~
• 1748 Files
• 144 Directories
• 8 Unidentified Objects
• 5 Unidentified Extensions
• 12 Known Formats
• 37 Extension Mismatches
N.B. Both collections
contained empty folders,
empty files, and multiple-id
formats.
6. Department of Internal Affairs
Let's begin with a story...
E1, the simplest... Enabled us to develop an ingest mechanism for
heterogeneous collections – and it worked!
E4, not that different, slightly larger, about as 'known', but!
An unexpected exception discovered in the relationship between
the preservation system and some of the filenames in the
collection...
9. Department of Internal Affairs
We had filenames with multiple spaces in
them...
E.g. 'A [space] [space] Filename.docx'
An innocuous enough looking problem... Our digital
preservation system couldn't handle them...
Investigate the system...
...
Confirm it's the system...
…
Ask vendor to fix the problem...
…
No fix forthcoming for next release...
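A check for this kind of filename can be run before ingest. The following is a minimal sketch in Python, not the tool used at Archives New Zealand; the function names are hypothetical and it assumes the only pattern of interest is a run of two or more space characters in a file or folder name.

```python
import os
import re

# Two or more consecutive space characters anywhere in a name.
MULTI_SPACE = re.compile(r" {2,}")


def has_multiple_spaces(name):
    """Return True if a file or folder name contains a run of 2+ spaces."""
    return MULTI_SPACE.search(name) is not None


def find_multi_space_paths(root):
    """Walk root and list every path whose own name contains repeated spaces."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if has_multiple_spaces(name):
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)
```

Calling `find_multi_space_paths` on a transfer directory returns the affected paths, so the decision about how to handle them can be made before the preservation system rejects anything.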
10. Department of Internal Affairs
What now...?
Change filenames?
...
Serious change, this is how we received them!
…
Record provenance...
…
Mechanisms in METS metadata schema [EVENT]
…
How to implement?
11. Department of Internal Affairs
We continue...
Configure CSV to handle EVENT fields...
...
Modify CSV generation tool to output blank EVENT fields...
…
Test ingest in system until configuration is perfected
…
Mechanism works so pre-condition filenames...
...
Record R-Numbers* and design provenance note controlled list...
…
Add data to CSV
…
DONE!!!!
*Dependency on listing being fixed in Archway
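The steps on this slide could be sketched roughly as follows. This is an illustration only: the column names (`r_number`, `filename`, `event_type`, `event_note`), the wording of the provenance note, and the example R-numbers are all hypothetical, and the real CSV layout expected by the preservation system will differ. The point is the shape of the mechanism: every row carries EVENT columns, and they stay blank unless the filename was actually changed.

```python
import csv
import io
import re

# Hypothetical column layout; the real ingest CSV will have its own fields.
FIELDS = ["r_number", "filename", "event_type", "event_note"]


def condition_filename(name):
    """Collapse runs of spaces to one space; return (new_name, changed)."""
    new_name = re.sub(r" {2,}", " ", name)
    return new_name, new_name != name


def provenance_rows(items, note="Filename changed: multiple spaces collapsed"):
    """Build one row per (r_number, original_filename) pair.

    EVENT columns are left blank when the filename needed no change.
    """
    rows = []
    for r_number, original in items:
        new_name, changed = condition_filename(original)
        rows.append({
            "r_number": r_number,
            "filename": new_name,
            "event_type": "filename change" if changed else "",
            "event_note": f"{note}; original name: '{original}'" if changed else "",
        })
    return rows


def to_csv(rows):
    """Serialise the rows to CSV text with a header line."""
    out = io.StringIO(newline="")
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

Keeping the original name inside the note means the change stays reversible on paper, which is the reason for recording provenance in the first place.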
13. Department of Internal Affairs
Test in digital preservation system fails...
...
UTF-8 character encoding...
…
How to preserve in Excel?
…
…
Import using special ribbon in Excel...
…
Add notes to sheet...
…
DONE?!
…
Not even now... >.<
Nope...
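One common way to reduce this class of encoding problem, offered here as a sketch rather than as the approach taken in this project, is to write the CSV as UTF-8 with a byte order mark. Recent versions of Excel generally use the mark to detect UTF-8 when a file is opened directly, though this does not stop a later "Save As" from changing the encoding again. The function names are hypothetical.

```python
import csv


def write_csv_for_excel(path, header, rows):
    """Write UTF-8 with a byte order mark so the encoding can be detected."""
    with open(path, "w", encoding="utf-8-sig", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)


def read_csv_utf8(path):
    """Read the file back; 'utf-8-sig' strips the byte order mark if present."""
    with open(path, "r", encoding="utf-8-sig", newline="") as handle:
        return list(csv.reader(handle))
```

Reading the file back after every manual edit and comparing it with the expected values is a cheap way to catch a macron that has silently turned into something else.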
14. Department of Internal Affairs
It can become exhausting...
As a speaker! And for the audience!!! ^_^;
...Time and date based data becomes a problem...
...Asking non-expert users to do the same...
...Even power tools like Open Office suffer issues...
...E4 went in after solving the UTF-8 issues...
...E2 and E3 suffered from issues with time/date information on top
16. Department of Internal Affairs
The work isn't straightforward
● It pushes out time-frames...
● And the problems we're solving aren't what we expected...
● We need to develop with the problem...
17. Department of Internal Affairs
But we have new tools...
Tools to create provenance information in CSV for ingest into the
digital preservation system.
Tools to identify files with this issue up front.
The digital preservation system is fixed, so this specific use-case
for us is unlikely to occur again.
We have gained new experience.
For E2 and E3, we created a mechanism for building an ingest
'mash-up' using a separate provenance spreadsheet.
For our next ingest we have a macro to automate an Excel
import!!!!! ← IN MICROSOFT?!!!!
18. Department of Internal Affairs
We have what seems like an inexhaustible
list...
● [Tools] Ability to handle multi-byte character encodings. Maori macrons,
‘Ā’, in DROID, digital preservation system, spreadsheets, etc.
● [Tools] Unidentified files and false positives - contribute to
● [Tools] Zero-byte files, empty folders
● [Tools] System files
● [Tools] Digital preservation system’s capabilities; dates, delivery,
metadata extraction, etc.
● [Files] Invalid objects
● [Files] Templates, objects with auto-fields
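Some items on this list can be surfaced with very little code. As a minimal sketch (hypothetical function name, Python standard library only), zero-byte files and empty folders can be reported before a transfer goes anywhere near the preservation system:

```python
import os


def find_empty_items(root):
    """Return (zero_byte_files, empty_dirs) found at or beneath root."""
    zero_byte_files, empty_dirs = [], []
    for dirpath, dirnames, filenames in os.walk(root):
        # A directory with no subdirectories and no files is empty.
        if not dirnames and not filenames:
            empty_dirs.append(dirpath)
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getsize(path) == 0:
                zero_byte_files.append(path)
    return sorted(zero_byte_files), sorted(empty_dirs)
```

Note that a folder containing only empty folders is not itself reported here; whether that counts as "empty" is a policy decision rather than a coding one.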
19. Department of Internal Affairs
And we'd never have guessed these up
front...
● What are the next challenges?
● We'd be too conservative, or too O.T.T...
● WE NEED A TIME MACHINE!!!
21. Department of Internal Affairs
We don't need a time machine at all...
● We need evidence!
● We need to practice!
● We need to do!
● Time-frames will be pushed out
● In a world that loves strategy, it's
terribly detail focused.
● Can someone figure it out first?
● Definition of Leadership!
● But you will almost certainly find
new exceptions... as will we.
22. Department of Internal Affairs
Ground process and policy in the real
world…
● We can reduce surprises...
● But we can't reduce them to zero...
● Find the exceptions, create rules, and encode them
in those policies...
● Move one step at a time, with modest increments.
● Flexible endpoints / reasonable / multiple goals...
● Q. HOW DID WE GET THESE FILES??
● A. It doesn't matter, we have to deal with them...
24. Department of Internal Affairs
Writing these documents becomes a much
more advanced thought experiment with a
greater number of inputs from a greater
number of people, and experiences...
25. Department of Internal Affairs
Robustness Principle... (Postel's Law)
e.g. checksums
“Be conservative in what you do; be liberal in what you accept
from others.”
Follow standards... mechanisms should accept non-conforming
input as long as the meaning is clear...
Be prepared to understand material, be prepared to manage it.
A way of doing things... not the only way... WRITE OTHER
SOLUTIONS! RE-WRITE YOUR SOLUTIONS!
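The slide gives checksums only as an example, so the reading below is an assumption: be liberal about the form in which an agency supplies a checksum (upper case, stray whitespace, a leading label), but conservative in what is stored and compared. The sketch uses SHA-256 and hypothetical function names; it is one way of applying the principle, not the only way.

```python
import hashlib
import re


def normalise_sha256(text):
    """Accept a loosely formatted SHA-256 value; return lower-case hex."""
    cleaned = re.sub(r"\s+", "", text).lower()
    if cleaned.startswith("sha256:"):
        cleaned = cleaned[len("sha256:"):]
    # Liberal in form, but the meaning must still be clear: 64 hex digits.
    if not re.fullmatch(r"[0-9a-f]{64}", cleaned):
        raise ValueError(f"not a recognisable SHA-256 checksum: {text!r}")
    return cleaned


def sha256_of(data):
    """Checksum we compute ourselves, always in one canonical form."""
    return hashlib.sha256(data).hexdigest()


def matches(data, supplied):
    """Compare our checksum of data with a supplied, loosely formatted one."""
    return sha256_of(data) == normalise_sha256(supplied)
```

Input whose meaning is not clear still raises an error, which keeps the "liberal" half of the principle from turning into guessing.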
26. Department of Internal Affairs
Other tools for you...
DROID (National Archives UK):
http://www.nationalarchives.gov.uk/information-management/manage-information/policy-proce
Or Siegfried (State Records NSW): https://github.com/richardlehane/siegfried
DROID Analysis Tool: https://github.com/exponential-decay/droid-sqlite-analysis
Other presentations: http://www.slideshare.net/RossSpencer/presentations
Blogs (Open Preservation Foundation):
http://openpreservation.org/knowledge/blogs/
Record Keeping Toolkit (Archives New Zealand):
http://www.records.archives.govt.nz/
28. Department of Internal Affairs
Who do digital preservation analysts
want to drink a beer with?
29. Department of Internal Affairs
Commander Hadfield!
https://twitter.com/cmdr_hadfield
TED:
What I learned from going blind in space
Star Talk:
http://www.startalkradio.net/show/social-media-i
30. Department of Internal Affairs
“It’s almost comical that astronauts are stereotyped as daredevils and
cowboys. As a rule, we’re highly methodical and detail-oriented. Our
passion isn’t for thrills but for the grindstone, and pressing our noses to
it. We have to: we’re responsible for equipment that has cost taxpayers
many millions of dollars, and the best insurance policy we have on our
lives is our own dedication to training. Studying, simulating, practicing
until responses become automatic—astronauts don’t do all this only to
fulfill NASA’s requirements. Training is something we do to reduce the
odds that we’ll die.”
― Chris Hadfield, An Astronaut's Guide to Life on Earth
The Right Stuff