SlideShare une entreprise Scribd logo
1  sur  30
Télécharger pour lire hors ligne
Presentation to
BEST PRACTICES EXCHANGE 2015
Rescuing Early Digital AssetsRescuing Early Digital Assets
andand
Preserving Data Rescue CapabilitiesPreserving Data Rescue Capabilities
chris@mullermedia.com
First, a great pre-digital example of
data rescue and preservation.
12,000 pages of New Amsterdam
records sat in an Albany vault for
over 300 years. Re-discovered in
1973, leading to Russell Shorto's
wonderful book "Island at the
Center of the World".
Fortunately, some visual records have had the
“Luxury of Languishing” (LOL, as we say.  )
“McMoons”
Lunar Orbiter Image Recovery.
set up a lab in a former McD building. It was
an amazing challenge to revive and in some
cases fully create the needed capabilities.
Dennis Wingo said this project has “opened a
door that we call ‘Technoarchaeology’… The
preservation of our digital heritage, both
hardware and software is a growing need …”
See http://www.moonviews.com
Analogue magnetic tapes of a very rare sort, created in the
60’s. Further efforts to recover/re-purpose the data in the
1980’s. Funding lapsed. Equipment sat in Nancy Evan’s
garage until 2007. New project under lead of Dennis Wingo
Why be extra-aggressive regarding
older digital materials?
Because TIME is the Phantom Menace…
A. Time and fragile storage media can cause valuable
records to decay on the shelf. (No LOL.)
B. Time and migration to new computers, software and
media renders older electronic records incompatible and
unusable on the new systems.
C. Time, down-sizing and rapidly changing priorities lead to
undocumented files that are difficult to decipher.
D. Let’s not forget Time’s creepy ally… The same that most
historians now believe destroyed the Great Library and
many other collections: Lack of Attention.
Technology’s VaporVapor TrailTrail
Time & Computer Media (very rough scale)
Techno-
Rocket
Our “Vapor Trail”
of Older Information
Time
A not-so-good kind of Cloud Computing. In “ancient” times
(10-40 years ago), computer storage was expensive and
difficult to use. So most data had actual value. Without MP3s,
videos, tweets, porn and politics, it often yields a richer
harvest than thought. But it’s getting harder to retrieve.
Obstacles to Recovery and Conversion.
It's a bit like peeling back the layers of an onion:
1. Media Compatibility
2. Age, Chemistry, Storage Conditions
3. Recording Method keeping the door open
4. Operating System/Filing System
5. Backup, Exchange, or Archiving Software
6. File Structure
7. File Encoding
Lots and lots of different
onions in that vapor trail…
Media Compatibility
Who would have thought it would be hard to find a PC
with a floppy drive this soon? Let alone 9-track reels.
Dust Gravity Chemistry, Heat, Humidity
Age & Storage Conditions
Media OK for day-to-day backup is often not so for archiving.
9-track reel 800 bpi NRZI ...1600bpi PE ... 3200 PE ... 6250 bpi GCR ...
(even older: 7-track 556 & 800 bpi)
3480/90/9E cart. 18 track = 200mb/400mb 36 track = 800mb/1600mb
DLT 10/20gb … 15/30gb … 20/40gb … 35/70gb 40/80gb … SDLT 220, etc
1/2” Linear Tape (mostly Mainframes, Minis and Large Servers)
0 pp--- -- --- --- ---- --- --- --- -- ----cc parity
1 pp---- - -- - --- - -- - ---- --- -------cc data
2 pp----- -- --- --- --- ---- -- --- -- ---cc data
3 pp--- -- -- --- -- -- -- -- -- ----- ----cc data
4 pp-- --- - --- --------- --- --- -- -- --cc data
5 pp-- ------ -- -- -- --- ---- ---- -- ---cc data
6 pp---- - ---- --- ---- --- --- --- -- ---cc data
7 pp-- ---- --- --- --- --- ---- --- -- ---cc data
8 pp--- --- ---- --- --- ---- -- ---- - ---cc data
preamble checksum
Muller Media Conversions www.mullermedia.com
LTO 100/200gb … 200/400gb … 400/800gb … 800/1600 and so on
Recording Method
QIC serpentine recording 4 track, 9 track, 18 track, etc. from 11mb to 24 gb
(also QIC-80 mini-cartridge, Travan, etc.)
8mm (Exabyte) helical scan recording 2.3gb … 14gb … Mammoth … AIT
4mm DAT helical scan recording DDS, DDS-C, DDS-2,3,4,5… 1 to 72gb
Lighter-Duty Tape Drives (mostly PCs, small Minis, Servers)
Muller Media Cnversions www.mullermedia.com
Recording Method (continued)
The way files are stored and identified-for example:
• Folder/directory organization.
• How long its name can be. What characters are valid.
• File types and their associations.
• How creation date or date of last access are saved.
• How space is allocated for a file.
• Indexed (ISAM) file operations--part of O/S?
Operating/Filing System
Backup software is often not best for archiving or data
exchange. Software intended for exchange of data with
other systems (e.g.- ANSI or IBM Standard, TAR, etc.)
produce the most widely readable tapes, but you can lose
valuable metadata that way.
Next best choice is often the O/S’s standard backup
program, such as Wang VS Backup, VAX VMS Backup,
Windows Backup.
(We are speaking here of older systems. Newer environments often use
software and methods to overcome or at least change these problems.)
Backup, Exchange, or
Archiving Software
Database, word-processing and other complex file types are
generally not organized sequentially, even within each file.
Most databases use some form of index-sequential layout.
Wang and MultiMate word-processing files used an internal
chain to control their organization. (WordStar, WordPerfect
and some others are sequential.) Microsoft Word and many
others applications stored information within a number of
data streams using a technology called OLE Structured
Storage. (Have since moved to XML-based.)
One generally wants to get databases into a flat or
delimited sequential format prior to preservation.
A good sequential storage format for W.P. files is (was) RTF.
File Structure
DATABASES: ASCII, EBCDIC, BCD, PACKED, FLOAT, INTEGER, etc.
WORD PROCESSORS: ASCII as a base. (Except for older IBM). But
that’s just the tip of the iceberg. Not only the codes vary but the way in which
they’re used. One simple example:
An underlined word. It may print like that, but internally it might be:
An underlined word. Extra bit set in u/l’d chars (Wang)
An underlined word. Same code toggles on/off. (WordStar)
An underlined word. Different code starts/ends.
An underlinedW word. Word underscore code (IBM)
An u<_n<_d<_e<_r<_...word. Char/backspace/’score (really old ones)
An underlined word. Attribute pointers from different part of file.
^…………^ (MS Word)
^…………………………...
File Encoding
WASHINGTON-- A slice of America's history has become as
unreadable as Egyptian hieroglyphics before the
discovery of the Rosetta stone. Vast untold volumes of
historic, scientific and business data are in danger of
dissolving into a meaningless jumble of letters, numbers
and computer symbols. Much information from the last
30 years is stranded on computer tape from primitive or
discarded systems-unintelligible or soon to be so…
...ASSOCIATED PRESS JANUARY 3, 1991
www.mullermedia.com
”The Nation’s Records Are a Mess”
Clinton + Moscow = ?
Today, we’d tend to think
of something like this:
But in 1969, a young fellow had visited Moscow, and
now (1991) he was becoming a contender for a
presidential nomination. Political rivals curious. FOIA
request. Old State Department tape.
Some pre-Whitewater fun.
This was fun, but our real exposure to the world of digital preservation
and the great people involved in it was yet to come. (See next slide.)
Problem: un-documented file format (a sadly
common occurrence). But the D.A.’s office
found some folks who enjoyed hacking away.
IAC/IRM honors federal I/T leaders
Fynette Eaton of the National Archives
and Records Administration's Center for
Electronic Records receives a GSA
Technology Excellence Award for
developing the Archival Preservation
System.…
…GOVERNMENT COMPUTER NEWS, JUNE, 1996
Next year, NARA issued an RFP for APS
FDR’s Fireside Chats. What the heck ?
Met great people—including Ken
Thibodeau and Fynette Eaton—and
were thrilled to begin a long, happy
relationship with NARA.
During the due-diligence process, the firm
requests data recovery from some folks in
New York. Obviously, the perp was not a big
fan of digital preservation—and he did not
realize that good old Wang 8 inch floppies are
not so easily erased. (Sorry, no details permitted.)
In the 1980’s at a law firm in Little Rock,
Arkansas, someone did legal work for a
real estate project named “Castle Grande”.
By early 1994, all related records had
mysteriously been “vacuumed” from the
firm’s paper files and computer systems.
Some real Whitewater stuff
In the early 1970’s, a Whitehouse mainframe used a
proprietary database format to store presidential
appointment calendar and notes. Two problems:
(a) 25-year-old tape was in danger of decay, and
(b) data format never been figured out.
Asked to puzzle it out and make a new database and
program for researchers. Luckily, the analyst had
done some work with Vietnam era military records
and noticed similarities in the data structure.
Major Discoveries! Ah, not exactly. Historians
may have noticed interesting items, but for us, at
least we learned that the first visitor to the Nixon
Whitehouse was the great Hank Aaron.
Some Watergate-ish Fun
A famous & beautiful litigator (see
that movie?) decided that oil wells on
school property had been endangering
student health since 1970.
In the school basement for decades
were dozens of Vydec floppy disks with
student records from the late 70’s.
Beverly Hills High School
Anyone old enough to remember
Vydec? A really weird onion. By 2003
ability to read them had almost
vanished. (Not physically compatible with
standard 8” drives—spins in the opposite direction!)
Tribal Mineral Rights Tapes
It was alleged that the U.S.
government had not properly
accounted for mining, drilling
and related payments. A great
deal of the information was on
9,000 3490 tape cartridges at
the Denver Federal Center.
The content of all of those tapes fit on
a single inexpensive external disk drive!
Future backups and conversions will be
thousands of times easier.
IPUMSIPUMS
A source of real enjoyment: the Minnesota Population Center
and IPUMS.org. They collect/analyze/publish population data
from around the world. It becomes apparent that the need for
data-rescue is even greater in the developing world. Tapes and
disks coming in from such places as:
•Bangladesh
•Egypt
•Kenya
•Mali
•Mexico
•Nepal
•Pakistan
•Peru
•Qatar
•Romania
•Santo Domingo
•Sudan
One example - the Peru Census of 1981; picking
up well-packed tapes at the Consulate -
Bangladesh Bureau of Statistics
Data Recovery Center
Very capable people, with other
pressing duties; daily power outages
and other problems making it
impossible to store
legacy tapes in an
optimal way, many
suffering from
decomposition. The initial plan was
to install read-convert system, tape cleaners, train BBS Staff and
recover 1981 Census micro data.
There’s a world-wide need for data rescue
which can lead to many fascinating projects,
if focus and funding are available.
He left many diskettes not compatible with
standard operating systems; char encoding
was morphed for unique display hardware.
Dr. Frank Siebert dedicated much of his life to
the preservation of the Penobscot language.
Many interviews with native-speakers, then
created detailed dictionary in a rare format.
American Philosophical Society and
Maine Folk Life Center undertook a project to
ensure the preservation of this cultural treasure.
Penobscot Dictionary
Working with great organizations like APS and MFLC to preserve the legacy of
someone like the brilliant Dr. Siebert can make data-rescue a real pleasure.
Ready to retire or move on
to another university—those
old tapes are my legacy!
Let’s have another look.
Old Professors’ Stashes (“OPS”)
Or maybe others realize that (for example) the health data,
re-purposed and combined with population and climate
information may well produce remarkable insights. Re-
discovered stashes like this are cropping up all the time!
OK, so we all know that there are digital data collections
that are at risk. But what else is fading away?
Examples of other things in that vapor trail…
1. In some cases, it’s the old professor himself/herself
who created unique data—or a rescue project, and
once retired, special tools and skills may go away too.
3. What we call the “elder geeks” that have
had careers creating many data rescue
software and hardware tools over the years. A
great example is a fellow named Bill King
2. There are still many thousands of older
tape reels on shelves around the world,
but the necessary cleaners and supplies for
them are almost impossible to find.
Documentation is a big deal, whether it’s about equipment,
software, data content or recovery procedures.
National Archives and Records Administration and Library
of Congress. (many, e.g. USC and McMoons drives)
Many special projects within government, universities,
libraries and museums.
NPO’s such as IEDRO (International Environmental Data
Rescue Organization) and many others.
CODATA Data-at-Risk Task Group, e.g. - creation of an
inventory of important scientific data at risk of loss.
RDA Data Rescue Interest Group, just getting under way.
And of course, there are computer museums, etc.
But there is not yet a broad
Initiative to Preserve Data Rescue Capabilities
Lots of Great Data Rescue EffortsLots of Great Data Rescue Efforts
Many of the tools and skills for rescuing older digital assets
apply across the boundaries of science and culture. The
preservation of those would be of support to researchers,
scientists, historians, librarians, and archivists.
IPDRC would collect details of capabilities and projects
across government, academia and commercial entities,
and stay in touch, ensuring that the tools don’t just fade
away when projects end or certain staffers retire.
When certain capabilities appear to be at risk of loss, it
would work to find homes for them.
There are still many details to be determined as to the
best structure, organization and methods.
IPDRC
New group? Hosted by an existing institution?
If new: NPO or “group initiative”? National or global?
What sort of leadership? (Note: to a large extent, it’s a
type of archive.)
What sort of tools for cataloguing capabilities?
How to design a web site to encourage submission of
capability details?
How to find new stewards for CAROLs? (capabilities at risk
of loss). Should IPDRC itself be a steward of last resort?
Many more issues…
Please help us (a) ask the right questions; (b) find good
answers. Hearing your comments on this will be greatly
appreciated.
IPDRC - How should it work?
Data RescueData Rescue
andand
Preserving Related CapabilitiesPreserving Related Capabilities
If it’s worth doing, it’s worth doingIf it’s worth doing, it’s worth doing nownow..
(No Luxury of Languishing)(No Luxury of Languishing)
Thank you very much for your time
chris@mullermedia.com

Contenu connexe

Similaire à Data Rescue and Preserving DR Capabilities

Mateo Valero - Big data: de la investigación científica a la gestión empresarial
Mateo Valero - Big data: de la investigación científica a la gestión empresarialMateo Valero - Big data: de la investigación científica a la gestión empresarial
Mateo Valero - Big data: de la investigación científica a la gestión empresarialFundación Ramón Areces
 
25 History Of The Internet
25 History Of The Internet25 History Of The Internet
25 History Of The InternetImmanuelA
 
What's the Big Deal About Big Data?.pdf
What's the Big Deal About Big Data?.pdfWhat's the Big Deal About Big Data?.pdf
What's the Big Deal About Big Data?.pdfSteven Jong
 
ADED 7330 Introduction
ADED 7330 IntroductionADED 7330 Introduction
ADED 7330 Introductionqueenofrug
 
The Digital Library from Information Superhighway to the Semiotic Web
The Digital Library from Information Superhighway to the Semiotic WebThe Digital Library from Information Superhighway to the Semiotic Web
The Digital Library from Information Superhighway to the Semiotic WebMartin Kalfatovic
 
Describing Everything - Open Web standards and classification
Describing Everything - Open Web standards and classificationDescribing Everything - Open Web standards and classification
Describing Everything - Open Web standards and classificationDan Brickley
 
Design and development of a web-based data visualization software for politic...
Design and development of a web-based data visualization software for politic...Design and development of a web-based data visualization software for politic...
Design and development of a web-based data visualization software for politic...Alexandros Britzolakis
 
History Days 4 5
History Days 4 5History Days 4 5
History Days 4 5guestf7cf98
 
Tek Era - Microfiche and Microfilms
Tek Era - Microfiche and MicrofilmsTek Era - Microfiche and Microfilms
Tek Era - Microfiche and MicrofilmsRimple Sanchla
 
Planning Beyond Digitization: Digital Preservation for Audiovisual Collections
Planning Beyond Digitization: Digital Preservation for Audiovisual Collections Planning Beyond Digitization: Digital Preservation for Audiovisual Collections
Planning Beyond Digitization: Digital Preservation for Audiovisual Collections Kara Van Malssen
 
02-History.ppt
02-History.ppt02-History.ppt
02-History.pptKashi69
 
The evolution of the collections management system
The evolution of the collections management systemThe evolution of the collections management system
The evolution of the collections management systemirowson
 
History Days 4 5
 History Days 4 5 History Days 4 5
History Days 4 5guestf7cf98
 

Similaire à Data Rescue and Preserving DR Capabilities (20)

lecture-1-1487765601.pptx
lecture-1-1487765601.pptxlecture-1-1487765601.pptx
lecture-1-1487765601.pptx
 
E book-the evolution of storage technologies
E book-the evolution of storage technologiesE book-the evolution of storage technologies
E book-the evolution of storage technologies
 
digital Preservation
digital Preservationdigital Preservation
digital Preservation
 
Mateo Valero - Big data: de la investigación científica a la gestión empresarial
Mateo Valero - Big data: de la investigación científica a la gestión empresarialMateo Valero - Big data: de la investigación científica a la gestión empresarial
Mateo Valero - Big data: de la investigación científica a la gestión empresarial
 
Computing through the ages
Computing through the agesComputing through the ages
Computing through the ages
 
25 History Of The Internet
25 History Of The Internet25 History Of The Internet
25 History Of The Internet
 
What's the Big Deal About Big Data?.pdf
What's the Big Deal About Big Data?.pdfWhat's the Big Deal About Big Data?.pdf
What's the Big Deal About Big Data?.pdf
 
ADED 7330 Introduction
ADED 7330 IntroductionADED 7330 Introduction
ADED 7330 Introduction
 
Whither Bibliographic Data? Designing a roadmap to a new bibliographic in...
Whither Bibliographic Data? Designing a roadmap to a new bibliographic in...Whither Bibliographic Data? Designing a roadmap to a new bibliographic in...
Whither Bibliographic Data? Designing a roadmap to a new bibliographic in...
 
The Digital Library from Information Superhighway to the Semiotic Web
The Digital Library from Information Superhighway to the Semiotic WebThe Digital Library from Information Superhighway to the Semiotic Web
The Digital Library from Information Superhighway to the Semiotic Web
 
Describing Everything - Open Web standards and classification
Describing Everything - Open Web standards and classificationDescribing Everything - Open Web standards and classification
Describing Everything - Open Web standards and classification
 
The Four Main Components And History Of Computers
The Four Main Components And History Of ComputersThe Four Main Components And History Of Computers
The Four Main Components And History Of Computers
 
Design and development of a web-based data visualization software for politic...
Design and development of a web-based data visualization software for politic...Design and development of a web-based data visualization software for politic...
Design and development of a web-based data visualization software for politic...
 
History Days 4 5
History Days 4 5History Days 4 5
History Days 4 5
 
Cam unit-1
Cam unit-1Cam unit-1
Cam unit-1
 
Tek Era - Microfiche and Microfilms
Tek Era - Microfiche and MicrofilmsTek Era - Microfiche and Microfilms
Tek Era - Microfiche and Microfilms
 
Planning Beyond Digitization: Digital Preservation for Audiovisual Collections
Planning Beyond Digitization: Digital Preservation for Audiovisual Collections Planning Beyond Digitization: Digital Preservation for Audiovisual Collections
Planning Beyond Digitization: Digital Preservation for Audiovisual Collections
 
02-History.ppt
02-History.ppt02-History.ppt
02-History.ppt
 
The evolution of the collections management system
The evolution of the collections management systemThe evolution of the collections management system
The evolution of the collections management system
 
History Days 4 5
 History Days 4 5 History Days 4 5
History Days 4 5
 

Dernier

Jual obat aborsi Jakarta 085657271886 Cytote pil telat bulan penggugur kandun...
Jual obat aborsi Jakarta 085657271886 Cytote pil telat bulan penggugur kandun...Jual obat aborsi Jakarta 085657271886 Cytote pil telat bulan penggugur kandun...
Jual obat aborsi Jakarta 085657271886 Cytote pil telat bulan penggugur kandun...ZurliaSoop
 
Zone Chairperson Role and Responsibilities New updated.pptx
Zone Chairperson Role and Responsibilities New updated.pptxZone Chairperson Role and Responsibilities New updated.pptx
Zone Chairperson Role and Responsibilities New updated.pptxlionnarsimharajumjf
 
Unlocking Exploration: Self-Motivated Agents Thrive on Memory-Driven Curiosity
Unlocking Exploration: Self-Motivated Agents Thrive on Memory-Driven CuriosityUnlocking Exploration: Self-Motivated Agents Thrive on Memory-Driven Curiosity
Unlocking Exploration: Self-Motivated Agents Thrive on Memory-Driven CuriosityHung Le
 
My Presentation "In Your Hands" by Halle Bailey
My Presentation "In Your Hands" by Halle BaileyMy Presentation "In Your Hands" by Halle Bailey
My Presentation "In Your Hands" by Halle Baileyhlharris
 
Bring back lost lover in USA, Canada ,Uk ,Australia ,London Lost Love Spell C...
Bring back lost lover in USA, Canada ,Uk ,Australia ,London Lost Love Spell C...Bring back lost lover in USA, Canada ,Uk ,Australia ,London Lost Love Spell C...
Bring back lost lover in USA, Canada ,Uk ,Australia ,London Lost Love Spell C...amilabibi1
 
SOLID WASTE MANAGEMENT SYSTEM OF FENI PAURASHAVA, BANGLADESH.pdf
SOLID WASTE MANAGEMENT SYSTEM OF FENI PAURASHAVA, BANGLADESH.pdfSOLID WASTE MANAGEMENT SYSTEM OF FENI PAURASHAVA, BANGLADESH.pdf
SOLID WASTE MANAGEMENT SYSTEM OF FENI PAURASHAVA, BANGLADESH.pdfMahamudul Hasan
 
Dreaming Marissa Sánchez Music Video Treatment
Dreaming Marissa Sánchez Music Video TreatmentDreaming Marissa Sánchez Music Video Treatment
Dreaming Marissa Sánchez Music Video Treatmentnswingard
 
Dreaming Music Video Treatment _ Project & Portfolio III
Dreaming Music Video Treatment _ Project & Portfolio IIIDreaming Music Video Treatment _ Project & Portfolio III
Dreaming Music Video Treatment _ Project & Portfolio IIINhPhngng3
 
Introduction to Artificial intelligence.
Introduction to Artificial intelligence.Introduction to Artificial intelligence.
Introduction to Artificial intelligence.thamaeteboho94
 
AWS Data Engineer Associate (DEA-C01) Exam Dumps 2024.pdf
AWS Data Engineer Associate (DEA-C01) Exam Dumps 2024.pdfAWS Data Engineer Associate (DEA-C01) Exam Dumps 2024.pdf
AWS Data Engineer Associate (DEA-C01) Exam Dumps 2024.pdfSkillCertProExams
 
Proofreading- Basics to Artificial Intelligence Integration - Presentation:Sl...
Proofreading- Basics to Artificial Intelligence Integration - Presentation:Sl...Proofreading- Basics to Artificial Intelligence Integration - Presentation:Sl...
Proofreading- Basics to Artificial Intelligence Integration - Presentation:Sl...David Celestin
 
Digital collaboration with Microsoft 365 as extension of Drupal
Digital collaboration with Microsoft 365 as extension of DrupalDigital collaboration with Microsoft 365 as extension of Drupal
Digital collaboration with Microsoft 365 as extension of DrupalFabian de Rijk
 
Report Writing Webinar Training
Report Writing Webinar TrainingReport Writing Webinar Training
Report Writing Webinar TrainingKylaCullinane
 
Uncommon Grace The Autobiography of Isaac Folorunso
Uncommon Grace The Autobiography of Isaac FolorunsoUncommon Grace The Autobiography of Isaac Folorunso
Uncommon Grace The Autobiography of Isaac FolorunsoKayode Fayemi
 
lONG QUESTION ANSWER PAKISTAN STUDIES10.
lONG QUESTION ANSWER PAKISTAN STUDIES10.lONG QUESTION ANSWER PAKISTAN STUDIES10.
lONG QUESTION ANSWER PAKISTAN STUDIES10.lodhisaajjda
 

Dernier (17)

ICT role in 21st century education and it's challenges.pdf
ICT role in 21st century education and it's challenges.pdfICT role in 21st century education and it's challenges.pdf
ICT role in 21st century education and it's challenges.pdf
 
Jual obat aborsi Jakarta 085657271886 Cytote pil telat bulan penggugur kandun...
Jual obat aborsi Jakarta 085657271886 Cytote pil telat bulan penggugur kandun...Jual obat aborsi Jakarta 085657271886 Cytote pil telat bulan penggugur kandun...
Jual obat aborsi Jakarta 085657271886 Cytote pil telat bulan penggugur kandun...
 
Zone Chairperson Role and Responsibilities New updated.pptx
Zone Chairperson Role and Responsibilities New updated.pptxZone Chairperson Role and Responsibilities New updated.pptx
Zone Chairperson Role and Responsibilities New updated.pptx
 
Unlocking Exploration: Self-Motivated Agents Thrive on Memory-Driven Curiosity
Unlocking Exploration: Self-Motivated Agents Thrive on Memory-Driven CuriosityUnlocking Exploration: Self-Motivated Agents Thrive on Memory-Driven Curiosity
Unlocking Exploration: Self-Motivated Agents Thrive on Memory-Driven Curiosity
 
My Presentation "In Your Hands" by Halle Bailey
My Presentation "In Your Hands" by Halle BaileyMy Presentation "In Your Hands" by Halle Bailey
My Presentation "In Your Hands" by Halle Bailey
 
Bring back lost lover in USA, Canada ,Uk ,Australia ,London Lost Love Spell C...
Bring back lost lover in USA, Canada ,Uk ,Australia ,London Lost Love Spell C...Bring back lost lover in USA, Canada ,Uk ,Australia ,London Lost Love Spell C...
Bring back lost lover in USA, Canada ,Uk ,Australia ,London Lost Love Spell C...
 
SOLID WASTE MANAGEMENT SYSTEM OF FENI PAURASHAVA, BANGLADESH.pdf
SOLID WASTE MANAGEMENT SYSTEM OF FENI PAURASHAVA, BANGLADESH.pdfSOLID WASTE MANAGEMENT SYSTEM OF FENI PAURASHAVA, BANGLADESH.pdf
SOLID WASTE MANAGEMENT SYSTEM OF FENI PAURASHAVA, BANGLADESH.pdf
 
Dreaming Marissa Sánchez Music Video Treatment
Dreaming Marissa Sánchez Music Video TreatmentDreaming Marissa Sánchez Music Video Treatment
Dreaming Marissa Sánchez Music Video Treatment
 
Dreaming Music Video Treatment _ Project & Portfolio III
Dreaming Music Video Treatment _ Project & Portfolio IIIDreaming Music Video Treatment _ Project & Portfolio III
Dreaming Music Video Treatment _ Project & Portfolio III
 
Introduction to Artificial intelligence.
Introduction to Artificial intelligence.Introduction to Artificial intelligence.
Introduction to Artificial intelligence.
 
AWS Data Engineer Associate (DEA-C01) Exam Dumps 2024.pdf
AWS Data Engineer Associate (DEA-C01) Exam Dumps 2024.pdfAWS Data Engineer Associate (DEA-C01) Exam Dumps 2024.pdf
AWS Data Engineer Associate (DEA-C01) Exam Dumps 2024.pdf
 
Proofreading- Basics to Artificial Intelligence Integration - Presentation:Sl...
Proofreading- Basics to Artificial Intelligence Integration - Presentation:Sl...Proofreading- Basics to Artificial Intelligence Integration - Presentation:Sl...
Proofreading- Basics to Artificial Intelligence Integration - Presentation:Sl...
 
Digital collaboration with Microsoft 365 as extension of Drupal
Digital collaboration with Microsoft 365 as extension of DrupalDigital collaboration with Microsoft 365 as extension of Drupal
Digital collaboration with Microsoft 365 as extension of Drupal
 
Report Writing Webinar Training
Report Writing Webinar TrainingReport Writing Webinar Training
Report Writing Webinar Training
 
in kuwait௹+918133066128....) @abortion pills for sale in Kuwait City
in kuwait௹+918133066128....) @abortion pills for sale in Kuwait Cityin kuwait௹+918133066128....) @abortion pills for sale in Kuwait City
in kuwait௹+918133066128....) @abortion pills for sale in Kuwait City
 
Uncommon Grace The Autobiography of Isaac Folorunso
Uncommon Grace The Autobiography of Isaac FolorunsoUncommon Grace The Autobiography of Isaac Folorunso
Uncommon Grace The Autobiography of Isaac Folorunso
 
lONG QUESTION ANSWER PAKISTAN STUDIES10.
lONG QUESTION ANSWER PAKISTAN STUDIES10.lONG QUESTION ANSWER PAKISTAN STUDIES10.
lONG QUESTION ANSWER PAKISTAN STUDIES10.
 

Data Rescue and Preserving DR Capabilities

  • 1. Presentation to BEST PRACTICES EXCHANGE 2015 Rescuing Early Digital AssetsRescuing Early Digital Assets andand Preserving Data Rescue CapabilitiesPreserving Data Rescue Capabilities chris@mullermedia.com
  • 2. First, a great pre-digital example of data rescue and preservation. 12,000 pages of New Amsterdam records sat in an Albany vault for over 300 years. Re-discovered in 1973, leading to Russell Shorto's wonderful book "Island at the Center of the World". Fortunately, some visual records have had the “Luxury of Languishing” (LOL, as we say.  )
  • 3. “McMoons” Lunar Orbiter Image Recovery. set up a lab in a former McD building. It was an amazing challenge to revive and in some cases fully create the needed capabilities. Dennis Wingo said this project has “opened a door that we call ‘Technoarchaeology’… The preservation of our digital heritage, both hardware and software is a growing need …” See http://www.moonviews.com Analogue magnetic tapes of a very rare sort, created in the 60’s. Further efforts to recover/re-purpose the data in the 1980’s. Funding lapsed. Equipment sat in Nancy Evan’s garage until 2007. New project under lead of Dennis Wingo
  • 4. Why be extra-aggressive regarding older digital materials? Because TIME is the Phantom Menace… A. Time and fragile storage media can cause valuable records to decay on the shelf. (No LOL.) B. Time and migration to new computers, software and media renders older electronic records incompatible and unusable on the new systems. C. Time, down-sizing and rapidly changing priorities lead to undocumented files that are difficult to decipher. D. Let’s not forget Time’s creepy ally… The same that most historians now believe destroyed the Great Library and many other collections: Lack of Attention.
  • 5. Technology’s VaporVapor TrailTrail Time & Computer Media (very rough scale) Techno- Rocket Our “Vapor Trail” of Older Information Time A not-so-good kind of Cloud Computing. In “ancient” times (10-40 years ago), computer storage was expensive and difficult to use. So most data had actual value. Without MP3s, videos, tweets, porn and politics, it often yields a richer harvest than thought. But it’s getting harder to retrieve.
  • 6. Obstacles to Recovery and Conversion. It's a bit like peeling back the layers of an onion: 1. Media Compatibility 2. Age, Chemistry, Storage Conditions 3. Recording Method keeping the door open 4. Operating System/Filing System 5. Backup, Exchange, or Archiving Software 6. File Structure 7. File Encoding Lots and lots of different onions in that vapor trail…
  • 7. Media Compatibility Who would have thought it would be hard to find a PC with a floppy drive this soon? Let alone 9-track reels.
  • 8. Dust Gravity Chemistry, Heat, Humidity Age & Storage Conditions Media OK for day-to-day backup is often not so for archiving.
  • 9. 9-track reel 800 bpi NRZI ...1600bpi PE ... 3200 PE ... 6250 bpi GCR ... (even older: 7-track 556 & 800 bpi) 3480/90/9E cart. 18 track = 200mb/400mb 36 track = 800mb/1600mb DLT 10/20gb … 15/30gb … 20/40gb … 35/70gb 40/80gb … SDLT 220, etc 1/2” Linear Tape (mostly Mainframes, Minis and Large Servers) 0 pp--- -- --- --- ---- --- --- --- -- ----cc parity 1 pp---- - -- - --- - -- - ---- --- -------cc data 2 pp----- -- --- --- --- ---- -- --- -- ---cc data 3 pp--- -- -- --- -- -- -- -- -- ----- ----cc data 4 pp-- --- - --- --------- --- --- -- -- --cc data 5 pp-- ------ -- -- -- --- ---- ---- -- ---cc data 6 pp---- - ---- --- ---- --- --- --- -- ---cc data 7 pp-- ---- --- --- --- --- ---- --- -- ---cc data 8 pp--- --- ---- --- --- ---- -- ---- - ---cc data preamble checksum Muller Media Conversions www.mullermedia.com LTO 100/200gb … 200/400gb … 400/800gb … 800/1600 and so on Recording Method
  • 10. QIC serpentine recording 4 track, 9 track, 18 track, etc. from 11mb to 24 gb (also QIC-80 mini-cartridge, Travan, etc.) 8mm (Exabyte) helical scan recording 2.3gb … 14gb … Mammoth … AIT 4mm DAT helical scan recording DDS, DDS-C, DDS-2,3,4,5… 1 to 72gb Lighter-Duty Tape Drives (mostly PCs, small Minis, Servers) Muller Media Cnversions www.mullermedia.com Recording Method (continued)
  • 11. The way files are stored and identified-for example: • Folder/directory organization. • How long its name can be. What characters are valid. • File types and their associations. • How creation date or date of last access are saved. • How space is allocated for a file. • Indexed (ISAM) file operations--part of O/S? Operating/Filing System
  • 12. Backup software is often not best for archiving or data exchange. Software intended for exchange of data with other systems (e.g.- ANSI or IBM Standard, TAR, etc.) produce the most widely readable tapes, but you can lose valuable metadata that way. Next best choice is often the O/S’s standard backup program, such as Wang VS Backup, VAX VMS Backup, Windows Backup. (We are speaking here of older systems. Newer environments often use software and methods to overcome or at least change these problems.) Backup, Exchange, or Archiving Software
  • 13. Database, word-processing and other complex file types are generally not organized sequentially, even within each file. Most databases use some form of index-sequential layout. Wang and MultiMate word-processing files used an internal chain to control their organization. (WordStar, WordPerfect and some others are sequential.) Microsoft Word and many others applications stored information within a number of data streams using a technology called OLE Structured Storage. (Have since moved to XML-based.) One generally wants to get databases into a flat or delimited sequential format prior to preservation. A good sequential storage format for W.P. files is (was) RTF. File Structure
  • 14. DATABASES: ASCII, EBCDIC, BCD, PACKED, FLOAT, INTEGER, etc. WORD PROCESSORS: ASCII as a base. (Except for older IBM). But that’s just the tip of the iceberg. Not only the codes vary but the way in which they’re used. One simple example: An underlined word. It may print like that, but internally it might be: An underlined word. Extra bit set in u/l’d chars (Wang) An underlined word. Same code toggles on/off. (WordStar) An underlined word. Different code starts/ends. An underlinedW word. Word underscore code (IBM) An u<_n<_d<_e<_r<_...word. Char/backspace/’score (really old ones) An underlined word. Attribute pointers from different part of file. ^…………^ (MS Word) ^…………………………... File Encoding
  • 15. WASHINGTON-- A slice of America's history has become as unreadable as Egyptian hieroglyphics before the discovery of the Rosetta stone. Vast untold volumes of historic, scientific and business data are in danger of dissolving into a meaningless jumble of letters, numbers and computer symbols. Much information from the last 30 years is stranded on computer tape from primitive or discarded systems-unintelligible or soon to be so… ...ASSOCIATED PRESS JANUARY 3, 1991 www.mullermedia.com ”The Nation’s Records Are a Mess”
  • 16. Clinton + Moscow = ? Today, we’d tend to think of something like this: But in 1969, a young fellow had visited Moscow, and now (1991) he was becoming a contender for a presidential nomination. Political rivals curious. FOIA request. Old State Department tape. Some pre-Whitewater fun. This was fun, but our real exposure to the world of digital preservation and the great people involved in it was yet to come. (See next slide.) Problem: un-documented file format (a sadly common occurrence). But the D.A.’s office found some folks who enjoyed hacking away.
  • 17. IAC/IRM honors federal I/T leaders Fynette Eaton of the National Archives and Records Administration's Center for Electronic Records receives a GSA Technology Excellence Award for developing the Archival Preservation System.… …GOVERNMENT COMPUTER NEWS, JUNE, 1996 Next year, NARA issued an RFP for APS FDR’s Fireside Chats. What the heck ? Met great people—including Ken Thibodeau and Fynette Eaton—and were thrilled to begin a long, happy relationship with NARA.
  • 18. During the due-diligence process, the firm requests data recovery from some folks in New York. Obviously, the perp was not a big fan of digital preservation—and he did not realize that good old Wang 8 inch floppies are not so easily erased. (Sorry, no details permitted.) In the 1980’s at a law firm in Little Rock, Arkansas, someone did legal work for a real estate project named “Castle Grande”. By early 1994, all related records had mysteriously been “vacuumed” from the firm’s paper files and computer systems. Some real Whitewater stuff
  • 19. In the early 1970’s, a Whitehouse mainframe used a proprietary database format to store presidential appointment calendar and notes. Two problems: (a) 25-year-old tape was in danger of decay, and (b) data format never been figured out. Asked to puzzle it out and make a new database and program for researchers. Luckily, the analyst had done some work with Vietnam era military records and noticed similarities in the data structure. Major Discoveries! Ah, not exactly. Historians may have noticed interesting items, but for us, at least we learned that the first visitor to the Nixon Whitehouse was the great Hank Aaron. Some Watergate-ish Fun
  • 20. A famous & beautiful litigator (see that movie?) decided that oil wells on school property had been endangering student health since 1970. In the school basement for decades were dozens of Vydec floppy disks with student records from the late 70’s. Beverly Hills High School Anyone old enough to remember Vydec? A really weird onion. By 2003 ability to read them had almost vanished. (Not physically compatible with standard 8” drives—spins in the opposite direction!)
  • 21. Tribal Mineral Rights Tapes It was alleged that the U.S. government had not properly accounted for mining, drilling and related payments. A great deal of the information was on 9,000 3490 tape cartridges at the Denver Federal Center. The content of all of those tapes fit on a single inexpensive external disk drive! Future backups and conversions will be thousands of times easier.
  • 22. IPUMSIPUMS A source of real enjoyment: the Minnesota Population Center and IPUMS.org. They collect/analyze/publish population data from around the world. It becomes apparent that the need for data-rescue is even greater in the developing world. Tapes and disks coming in from such places as: •Bangladesh •Egypt •Kenya •Mali •Mexico •Nepal •Pakistan •Peru •Qatar •Romania •Santo Domingo •Sudan One example - the Peru Census of 1981; picking up well-packed tapes at the Consulate -
  • 23. Bangladesh Bureau of Statistics Data Recovery Center Very capable people, with other pressing duties; daily power outages and other problems making it impossible to store legacy tapes in an optimal way, many suffering from decomposition. The initial plan was to install read-convert system, tape cleaners, train BBS Staff and recover 1981 Census micro data. There’s a world-wide need for data rescue which can lead to many fascinating projects, if focus and funding are available.
  • 24. He left many diskettes not compatible with standard operating systems; char encoding was morphed for unique display hardware. Dr. Frank Siebert dedicated much of his life to the preservation of the Penobscot language. Many interviews with native-speakers, then created detailed dictionary in a rare format. American Philosophical Society and Maine Folk Life Center undertook a project to ensure the preservation of this cultural treasure. Penobscot Dictionary Working with great organizations like APS and MFLC to preserve the legacy of someone like the brilliant Dr. Siebert can make data-rescue a real pleasure.
  • 25. Ready to retire or move on to another university—those old tapes are my legacy! Let’s have another look. Old Professors’ Stashes (“OPS”) Or maybe others realize that (for example) the health data, re-purposed and combined with population and climate information may well produce remarkable insights. Re- discovered stashes like this are cropping up all the time! OK, so we all know that there are digital data collections that are at risk. But what else is fading away?
  • 26. Examples of other things in that vapor trail… 1. In some cases, it’s the old professor himself/herself who created unique data—or a rescue project, and once retired, special tools and skills may go away too. 3. What we call the “elder geeks” that have had careers creating many data rescue software and hardware tools over the years. A great example is a fellow named Bill King 2. There are still many thousands of older tape reels on shelves around the world, but the necessary cleaners and supplies for them are almost impossible to find. Documentation is a big deal, whether it’s about equipment, software, data content or recovery procedures.
  • 27. National Archives and Records Administration and Library of Congress. (many, e.g. USC and McMoons drives) Many special projects within government, universities, libraries and museums. NPO’s such as IEDRO (International Environmental Data Rescue Organization) and many others. CODATA Data-at-Risk Task Group, e.g. - creation of an inventory of important scientific data at risk of loss. RDA Data Rescue Interest Group, just getting under way. And of course, there are computer museums, etc. But there is not yet a broad Initiative to Preserve Data Rescue Capabilities Lots of Great Data Rescue EffortsLots of Great Data Rescue Efforts
  • 28. Many of the tools and skills for rescuing older digital assets apply across the boundaries of science and culture. The preservation of those would be of support to researchers, scientists, historians, librarians, and archivists. IPDRC would collect details of capabilities and projects across government, academia and commercial entities, and stay in touch, ensuring that the tools don’t just fade away when projects end or certain staffers retire. When certain capabilities appear to be at risk of loss, it would work to find homes for them. There are still many details to be determined as to the best structure, organization and methods. IPDRC
  • 29. New group? Hosted by an existing institution? If new: NPO or “group initiative”? National or global? What sort of leadership? (Note: to a large extent, it’s a type of archive.) What sort of tools for cataloguing capabilities? How to design a web site to encourage submission of capability details? How to find new stewards for CAROLs? (capabilities at risk of loss). Should IPDRC itself be a steward of last resort? Many more issues… Please help us (a) ask the right questions; (b) find good answers. Hearing your comments on this will be greatly appreciated. IPDRC - How should it work?
  • 30. Data RescueData Rescue andand Preserving Related CapabilitiesPreserving Related Capabilities If it’s worth doing, it’s worth doingIf it’s worth doing, it’s worth doing nownow.. (No Luxury of Languishing)(No Luxury of Languishing) Thank you very much for your time chris@mullermedia.com