Slides from the "Planning a Successful Digital Project" start-to-finish session presented at the Wisconsin Library Association annual conference, Green Bay, October 25, 2013. Presenters: Sarah Grimm, Electronic Records Archivist, Wisconsin Historical Society and Emily Pfotenhauer, Recollection Wisconsin Program Manager, WiLS.
Planning a Successful Digital Project
1. Supported by WHRAB
DESIGNING A SUCCESSFUL
DIGITAL PROJECT
WISCONSIN LIBRARY ASSOCIATION CONFERENCE
OCTOBER 25, 2013
Sarah Grimm, Electronic Records Archivist, Wisconsin Historical Society
Emily Pfotenhauer, Recollection Wisconsin Program Manager, WiLS
3. WHAT DO YOU MEAN, DIGITIZE?
• Selecting materials
• Reformatting materials
(scanning or photographing)
• Adding metadata
(descriptive information)
• Making available online
• Storing and maintaining
digital files and data (digital
preservation)
Wisconsin Historical Society
4. DIGITAL PRESERVATION
The Library of Congress started
the Digital Preservation
Outreach and Education (DPOE)
program in order to foster
national outreach and
education to encourage
individuals and organizations
to actively preserve their
digital content.
http://www.digitalpreservation.gov/education/
Waterford Public Library/University of
Wisconsin Digital Collections
5. DIGITAL PRESERVATION
Digital preservation combines policies, strategies and
actions to ensure access to reformatted and born
digital content regardless of the challenges of media
failure and technological change. The goal of digital
preservation is the accurate rendering of
authenticated content over time.
Working group on Defining Digital Preservation, ALA Annual Conference, 6/24/2007
6. WHAT IS DIGITAL CONTENT?
• Digital content is any content that is published or
distributed in a digital form, including text, data, sound
recordings, photographs and images, motion pictures,
and software.
• Digital materials created from analogue sources
• Born-digital content
• Digital materials you currently have or create – or expect
to have – that you want to preserve.
7. DEFINING A DIGITAL COLLECTION
• A good digital collection…
• Is publicly accessible
• Is searchable - Includes keywords and other descriptive
information (metadata) so users can find what they’re looking for
• Uses software that is sustainable (will be around for a long time)
and interoperable (can be migrated or shared)
• Remains true to the original materials
• Respects intellectual property rights
• A digital collection is not…
• An inventory
• An online exhibit/gallery/slideshow
8. WELL-MANAGED COLLECTIONS
• Characteristics of well-managed digital content:
• Basic information about each collection
• Minimal metadata for objects
• Common file formats
• Controlled and known storage of content
• Multiple copies in at least 2 locations
9. BEFORE YOU EVEN START…..
• Don’t scan a mess! Take the
time to assess and organize
your originals first.
• A digital project can be an
ideal time to evaluate
collection conditions and
rehouse materials as needed.
• Resources for collections care
and organization:
• Wisconsin Historical Society
Field Services staff
• Wisconsin Archives Mentoring
Service
• National Park Service
Conserve-O-Grams
Richland County History Room
12. DEFINING GOALS
• Connect to your
community
• Reach new audiences
• Improve access to
“invisible” materials
• Protect fragile or
heavily used materials
• Learn more about
your collections
• Contribute to our
collective knowledge
South Wood County Historical Museum
13. POTENTIAL AUDIENCES
• Local residents
• Students and teachers
• Genealogists
• Specialists (e.g. Civil War
re-enactors, railroad
buffs)
• Academic researchers
• Curious Wisconsinites
• Everyone!
College of Menominee Nation
15. DEVELOPING SELECTION CRITERIA
When developing a selection policy, consider…
• Your organization’s mission statement and collecting policies
• Appeal and interest (is this of value to researchers? To other
audiences?)
• Uniqueness of materials (is this the only source or does it also
exist elsewhere? Avoid duplication)
• Focusing on a specific subject, theme or creator
• Manageability – tackle a project of appropriate size and scope
16. SETTING PRIORITIES
Ask yourself which materials are…
• most significant to your
organization?
• most extensive?
• most requested/used?
• easiest?
• oldest?
• newest?
• at risk?
Neville Public Museum of Brown County
17. SELECTION – YES OR NO?
• This item is rare or unique to our collection.
• This item is frequently requested by our patrons/visitors.
• This item or very similar items are not found anywhere else on the Internet.
• There is enough accurate information available about the item to add
useful context for our audience (for example, we know or can find out
names of people, locations, dates).
• We have the appropriate equipment to create an accurate, high-quality
digital copy of this item (for example, item is not too large to fit on
scanner), or funding to outsource if needed.
• This item is in stable condition and will not be damaged by scanning or
other handling.
• This item is in the public domain or we have secured permission from the
rights holder to make it available online.
19. CONSIDERING COPYRIGHT
• Disclaimer: We are not
lawyers.
• Owning a physical item does
not necessarily mean you
hold the copyright to that
item.
• Public domain = no longer
under copyright. In the US
in 2013 that means the item
was:
• Published before 1923 –OR–
• Unpublished; creator died
before 1943 –OR–
• Unpublished; unknown
creator; made before 1893
UW-Milwaukee Libraries
20. CONSIDERING COPYRIGHT
• Works under copyright,
copyright holder is known:
• Contact copyright holder IN
WRITING to request
permission to make available
online.
• Works presumed to be
under copyright; copyright
holder is unknown or
cannot be located:
• Due diligence has been made
to identify and locate
copyright holder.
• Be prepared to remove item
from digital collection if
challenged.
Three Lakes Historical Society
21. SAMPLE COPYRIGHT STATEMENTS
• For an item presumed to be in the public domain: This item is in
the public domain. There are no known restrictions on the use of this
digital resource. Contact [your institution] to purchase a high-resolution version of this image.
• For an item under copyright; copyright holder has granted
permission to put online: This image has been made available with
permission of the copyright holder and has been provided here for
educational purposes only. Commercial use is prohibited without
permission. Contact [your institution] for information regarding
permissions and reproductions.
• For an item in which copyright status is undetermined: This
material may be protected by copyright law. The user is responsible
for all issues of copyright. Contact [your institution] for information
regarding permissions and reproductions.
23. POTENTIAL PROJECT COSTS
• Scanner
• Outsourcing imaging to a
commercial vendor
• Digital camera and related
equipment
• Internet access
• Storage for digital files
• Software for online access
• Archival storage supplies
• Be sure to budget for TIME
and SPACE
Merrill Historical Society
24. FUNDING
• Grants
• LSTA Digitization of Local
Resources grants (Dep’t of
Public Instruction)
• Local corporations or
foundations
• Wisconsin Humanities Council
• In-kind contributions
• Tech support
• Equipment use
• Biggest expense is TIME
• Paid staff time
• “Free” volunteer time
• Students/interns
Ripon College
25. DISCUSSION
• What’s one digitization
project you’re currently
working on or thinking
about?
• What are your goals and
audience for this project?
• How did you/will you
determine selection
criteria?
• How will you fund the
project?
Eager Free Public Library/University of
Wisconsin Digital Collections
27. DIGITAL IMAGING
• Goals of imaging:
• Create a digital
representation that’s
faithful to the original item
• Create the highest quality
image you can with
available resources
• Anticipate multiple uses
(online, print publication,
exhibit, etc.)
• Scan once—don’t expect to
return to re-digitize
UW-Madison Archives
28. CHOOSING A SCANNER
• Some features to look for:
• Transparency unit
--for scanning slides and negatives
• Size of scanning bed
• Image editing software
--many new scanners come with Photoshop Elements
• Compatible with your computer’s operating system
• Is your computer fast enough to process large image files?
29. SCANNING PHOTOGRAPHS
• Scan all photographs in 24-bit
color, even if image is black
and white
• Scanning resolution (ppi)
depends on size of original
item
• Longest side of item longer
than 7” = 300ppi
• Shorter than 7” = 600ppi
• 35mm slides or other small
items = 1200ppi
• Save two copies of each scan:
• Master file: TIFF (20-40MB) for
archiving and printing
• Access copy: JPEG (1-5MB) for
editing, online viewing, email,
social media
UW-La Crosse
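The resolution rule on the photograph slide can be written down as a small lookup. This is only a sketch: the function names are made up for illustration, the slide does not say what to do with an item whose longest side is exactly 7", so this version treats it like the shorter items, and the size estimate assumes an uncompressed 24-bit image (3 bytes per pixel).

```python
def recommended_ppi(longest_side_inches: float, small_item: bool = False) -> int:
    """Return the scanning resolution suggested on the slide.

    small_item=True stands for 35mm slides or other small items.
    """
    if small_item:
        return 1200
    if longest_side_inches > 7:
        return 300
    # Assumption: an item that is exactly 7" is treated like the shorter items.
    return 600


def uncompressed_bytes(width_inches: float, height_inches: float,
                       small_item: bool = False) -> int:
    """Estimate the size of an uncompressed 24-bit master scan in bytes."""
    ppi = recommended_ppi(max(width_inches, height_inches), small_item)
    width_px = round(width_inches * ppi)
    height_px = round(height_inches * ppi)
    return width_px * height_px * 3  # 24-bit color = 3 bytes per pixel


if __name__ == "__main__":
    size = uncompressed_bytes(8, 10)
    print(f'8x10" print at {recommended_ppi(10)} ppi: about {size / 1_000_000:.1f} MB')
```

An 8x10" print scanned this way lands inside the 20-40MB range the slide gives for TIFF master files, which is a useful sanity check when budgeting storage.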
30. SCANNING DOCUMENTS
• Handwritten texts
• Scan in 24-bit color to
retain character of
original
• 300-400ppi is generally
sufficient
• If feasible, create a
transcription
• Use care when unfolding
papers or handling tightly
bound volumes
Wisconsin Historical Society
31. SCANNING DOCUMENTS
• Printed texts
• Scan in 8-bit grayscale or 1-bit black and white
• 300ppi is generally
sufficient
• Use OCR (Optical Character
Recognition) software to
make the text computer-searchable
• May be provided with your
scanner software
• ABBYY FineReader
• Adobe Acrobat
• OCR is never 100% accurate,
but that’s ok
L. E. Phillips Memorial Library, Eau Claire
32. WORKING WITH PRINTED TEXT? OCR!
• OCR = Optical Character Recognition
• Software that makes printed text computer-readable and fully
searchable
• Very valuable when scanning books, yearbooks, city
directories, newspaper clippings, etc.
• A couple of options…
• ABBYY FineReader ($100-$170)
• Adobe Acrobat ($45 through techsoup.org)
33. WHEN NOT TO SCAN IT YOURSELF
• Look to a vendor for scanning…
• Oversized materials
--maps, blueprints, etc.
• Fragile books or scrapbooks
--bindings can be damaged by laying flat to scan
• Anything with flaking, cracked or otherwise fragile surface
• Microfilm
--newspapers
• Potential vendors
• Northern Micrographics, La Crosse
• A/E Graphics, Milwaukee
• Wisconsin Historical Society (for microfilm)
35. METADATA: WHAT IS IT?
• Information about stuff
• Technical metadata = information
about the digital file (size, type,
etc.)
• Descriptive metadata =
information about the content of
the item (what are we looking
at?)
• Helps users find what they’re
looking for
• Organized, standardized,
consistent, searchable
Grant County Historical Society
36. SAMPLE METADATA
Field Name
Sample Data
Title
DiVall barber shop, Middleton, 1925
Subjects
Barbers; Barbershops
Type
Still image
Format
image/tiff
Rights statement
This material may be protected by copyright law. The
user is responsible for all issues of copyright.
File name
2006_01_12.tif
Submitter
Middleton Area Historical Society
Date digitized
2013-04-05
Middleton Area Historical Society
37. SAMPLE METADATA
Field Name
Sample Data
Creator
Bartle, F. C.
Date Created
1925-09-12 OR 1920-1930
Materials
Photographs
Description
Ralph DiVall (left) and Edwin T. Baltes (right) shave
two men seated in barber chairs. According to a
family history on file at the Society, DiVall operated
this barber shop from the 1920s until his retirement
on July 1, 1966.
Location
Middleton, Dane County, Wisconsin
Collection
DiVall Family Collection
Identifier
2006.01.12
Middleton Area Historical Society
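One way to keep metadata "organized, standardized, consistent" is to hold each record in a fixed structure and check the fields that have a defined format. The sketch below reuses the sample values from the two slides above. The dictionary layout and the function name are illustrative assumptions, not a required schema, and the check only covers the two Date Created forms shown in the sample (a full date or a year range).

```python
import re
from datetime import date

# Sample record assembled from the two SAMPLE METADATA slides.
SAMPLE_RECORD = {
    "Title": "DiVall barber shop, Middleton, 1925",
    "Creator": "Bartle, F. C.",
    "Date Created": "1925-09-12",
    "Subjects": "Barbers; Barbershops",
    "Type": "Still image",
    "Format": "image/tiff",
    "Location": "Middleton, Dane County, Wisconsin",
    "Collection": "DiVall Family Collection",
    "Identifier": "2006.01.12",
    "File name": "2006_01_12.tif",
    "Date digitized": "2013-04-05",
}

YEAR_RANGE = re.compile(r"^(\d{4})-(\d{4})$")


def is_valid_date_created(value: str) -> bool:
    """Accept a full date (1925-09-12) or a year range (1920-1930)."""
    match = YEAR_RANGE.fullmatch(value)
    if match:
        # A range must not run backwards.
        return int(match.group(1)) <= int(match.group(2))
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    for candidate in ("1925-09-12", "1920-1930", "Sept. 1925"):
        print(candidate, is_valid_date_created(candidate))
```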
39. EXISTING TITLES
If the photograph contains a title or caption, transcribe it exactly.
Birds-eye-view, No. 4,
1908, Barneveld, Wis.
40. WHAT MAKES A GOOD TITLE?
If the photo does not already have a title, you’ll need to create
one.
A useful title is…
• Descriptive and specific
• Brief
• Follows specific formatting rules
• Capitalize first word and proper names (people, places, institutions)
• Don’t start with “A” or “The”
• Period not needed at the end
41. BASIC FORMULA FOR CREATING TITLES
SUBJECT, LOCATION, DATE
Person, object, building, etc.
City OR township OR county
Year or date range
Only include an element IF KNOWN
42. PEOPLE & PORTRAITS
• Identify…Who? Where?
When?
• Women
• Children
• Babies
• Carriages/strollers
• Stores/shops
• Boardwalk
• Marathon County
• 1890-1899
43. Women and children with babies in carriages,
Manitowoc County, 1890-1899
(SUBJECT, LOCATION, DATE)
44. BUILDINGS AND CITYSCAPES
• Identify the name of the street or view
• Identify the location (City OR Township OR County)
• Identify the date (Year? Date range?)
45. 100 block of South Main Street,
Fort Atkinson, 1940-1949
(SUBJECT, LOCATION, DATE)
46. EXPANDED FORMULA FOR CREATING TITLES
SUBJECT, ACTIVITY, LOCATION, DATE
Person, object, building, etc.
Action or event
City OR township OR county
Year or date range
Only include an element IF KNOWN
47. ACTIVITIES AND EVENTS
Identify…Who? What are
they doing? Where and
when?
• Circus elephant
• Trainer
• Woman on swing
• Evansville
• 1940-1949
48. Trainer with circus elephant holding woman on swing,
Evansville, 1940-1949
(SUBJECT, ACTIVITY, LOCATION, DATE)
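Both title formulas can be sketched as one helper that leaves out any element that is not known. The function name and argument names are hypothetical. Following the elephant example above, this version assumes the activity is run together with the subject as one phrase, and that the remaining elements are separated by commas.

```python
from typing import Optional


def make_title(subject: str,
               location: Optional[str] = None,
               date: Optional[str] = None,
               activity: Optional[str] = None) -> str:
    """Build a title as SUBJECT [ACTIVITY], LOCATION, DATE.

    Elements that are None or blank are skipped ("only include an element IF KNOWN").
    """
    lead = subject.strip()
    if activity and activity.strip():
        # Assumption: subject and activity read as a single phrase.
        lead = f"{lead} {activity.strip()}"
    parts = [lead, location, date]
    return ", ".join(part.strip() for part in parts if part and part.strip())


if __name__ == "__main__":
    print(make_title("100 block of South Main Street",
                     location="Fort Atkinson", date="1940-1949"))
    print(make_title("Trainer with circus elephant",
                     activity="holding woman on swing",
                     location="Evansville", date="1940-1949"))
    print(make_title("Women and children with babies in carriages",
                     date="1890-1899"))
```

The helper does not enforce the capitalization rules or remove a leading "A" or "The"; those are still a judgment call for whoever writes the title.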
49. ASSIGNING SUBJECT HEADINGS
• Subject headings are terms or
phrases assigned to an item to
facilitate searching and
browsing a collection.
• Consistent use of subject
headings helps link related
content in your collection and
across disparate collections.
50. CONTROLLED VOCABULARIES
• A controlled vocabulary is a
standardized, pre-determined
list of subject headings.
• Some examples of controlled
vocabularies:
• Library of Congress Thesaurus
for Graphic Materials
• Library of Congress Subject
Headings
• Getty Art and Architecture
Thesaurus
• Nomenclature 3.0
New Berlin Historical Society
51. TIPS FOR ASSIGNING SUBJECT HEADINGS
• Consider the following elements to help select terms:
• WHO? People - age, gender, occupation, ethnicity
• WHERE? Building or other setting
• WHAT? Activities or events
• Always copy terms exactly from the controlled vocabulary.
• Think of your own “tags,” then search the controlled
vocabulary list for correct terms.
• How did others do it? Look at similar photos for
examples/ideas.
• Aim for 1-5 terms.
• There is no one right answer!
56. EXERCISE - ASSIGNING TITLES AND SUBJECTS
Work in small groups to assign a title and subjects
to a historic photograph.
Remember the basic title formulas:
• SUBJECT, LOCATION, DATE
• SUBJECT, ACTIVITY, LOCATION, DATE
Select terms from the short list extracted from the Library of
Congress Thesaurus for Graphic Materials. The full version of
this controlled vocabulary is available online:
http://www.loc.gov/rr/print/tgm1/
• choose a maximum of 5 terms
57. FILE NAMING AND ORGANIZATION
Sixty Years of Quality Canning by the Lakeside Packing Company, ca. 1947.
Manitowoc Public Library/ University of Wisconsin Digital Collections
58. WHY IS THIS IMPORTANT?
• To create organizational standards
• To help you find it again
• To prevent accidental overwriting
• To eliminate (minimize) duplication of files
Train Wreck
Image ID: WHi-2011
59. FILE NAMING
• Keep folder / document titles
short and descriptive
• Use only lower case letters,
numbers, and dashes or
underscores
• Don’t use spaces or
punctuation
• Don’t use special characters in
your file/folder titles
(^”<>|? / : @’* &.)
(Just because you CAN doesn’t
mean you SHOULD…..)
Typing at Dickinson Secretarial School
Image ID: WHi-19562
60. FILE NAMING
• Date your documents consistently
• yyyymmdd_brieftitle.xxx
• Use leading zeroes for consecutive numbering. For example, a
multi-page letter could have file names mac001.tif,
mac002.tif, mac003.tif, etc.
• Tie your file names to existing catalog numbers if possible
61. EXAMPLES
• Photograph with accession # 2011.32.1 = 201132001.tif –OR–
2011_32_001.tif
• Series of images by photographer John Smith = smith001.tif,
smith002.tif, smith003.tif
• Not so good: Glassplate16039 Auto repair in basement 025.tif
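The naming rules above are mechanical enough to script. The sketch below is one possible reading of them, not a standard: all function names are made up, three-digit padding follows the examples on the slides, and the "safe name" check simply allows lower case letters, numbers, dashes and underscores, followed by a single file extension.

```python
import re
from datetime import date

SAFE_NAME = re.compile(r"^[a-z0-9_-]+\.[a-z0-9]+$")


def is_safe_name(file_name: str) -> bool:
    """True if the name uses only lower case letters, numbers, dashes, underscores."""
    return SAFE_NAME.fullmatch(file_name) is not None


def accession_file_name(accession: str, extension: str = "tif", width: int = 3) -> str:
    """Turn an accession number such as 2011.32.1 into 2011_32_001.tif."""
    *prefix, item = accession.split(".")
    return "_".join(prefix + [item.zfill(width)]) + "." + extension


def sequence_file_name(stem: str, number: int, extension: str = "tif", width: int = 3) -> str:
    """Number consecutive files with leading zeroes, e.g. mac001.tif."""
    return f"{stem}{number:0{width}d}.{extension}"


def dated_file_name(day: date, brief_title: str, extension: str) -> str:
    """Build yyyymmdd_brieftitle.xxx, dropping spaces and punctuation from the title."""
    slug = re.sub(r"[^a-z0-9]+", "", brief_title.lower())
    return f"{day:%Y%m%d}_{slug}.{extension}"


if __name__ == "__main__":
    print(accession_file_name("2011.32.1"))
    print([sequence_file_name("smith", n) for n in (1, 2, 3)])
    print(dated_file_name(date(2013, 10, 25), "WLA slides", "pdf"))
    print(is_safe_name("Glassplate16039 Auto repair in basement 025.tif"))
```

Leading zeroes matter because most file browsers sort names as text, so mac10.tif would otherwise sort before mac2.tif.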
62. RESOURCES
• State Library of North Carolina –
• Web
http://www.archive.org/details/WhyFileNamingIsImportant
http://www.archive.org/details/HowToChangeAFileName
http://www.archive.org/details/WhatNotToDoWhenNamingFiles
http://www.archive.org/details/WhatToDoWhenNamingFiles
• YouTube
http://digitalpreservation.ncdcr.gov/tutorials.html
63. FILE ORGANIZATION AND MANAGEMENT
• Centralize your files
• Minimize your layers
• Leave breadcrumbs
(AKA “READ ME”)
• Determine what you
don’t know
IH General Office Mail Room
Image ID: WHi-12016
64. WHAT NOT TO KEEP?
• Backups/copies/drafts
• Supplementary files that
provide no additional
long-term value
• Corrupted files
• Same item – different
file formats
• Items that don’t fit your
organization’s purpose
Boy on Curb near Trash Pile
Image ID: WHi-57208
69. CHECKSUMS
• Checksums (AKA “Hash Sums”) are created by programs
running an algorithm against the contents of a file.
(there are many free utilities that will perform this function for you)
• The resulting checksum is a short
sequence of letters and/or numbers
that uniquely identifies that file.
(think “electronic fingerprint”)
Unix cksum utility
70. WHY IS THIS A GOOD THING?
• Checksums help maintain the INTEGRITY of your
collections because they will tell you when things change
over time.
• If two files are exactly the same, the checksums of those
files will also be exactly the same (generally speaking)
• If a file becomes corrupted, degraded or is changed in
some way, the next time you run the utility on it, the
checksum will change
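For anyone curious what a checksum utility does under the covers, here is a minimal sketch using Python's standard hashlib module. It defaults to MD5 to match the MD5Summer tool used later in these slides; "sha256" can be passed instead. The function name and the chunk size are arbitrary choices for illustration.

```python
import hashlib


def file_checksum(path: str, algorithm: str = "md5", chunk_size: int = 1024 * 1024) -> str:
    """Return the hex checksum of a file, reading it in chunks to limit memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read until an empty bytes object signals the end of the file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    import sys

    for name in sys.argv[1:]:
        print(file_checksum(name), name)
```

The "generally speaking" caveat applies here too: it is possible in principle for two different files to share a checksum, but for spotting accidental corruption in a collection a matching value is a strong sign that nothing has changed.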
77. VERIFY HASH VALUES
• Copy files to another
directory
(think “backup”)
• Open MD5Summer
• Select the files in
the new location
• Click “Verify Sums”
81. THINGS TO REMEMBER
Things that will NOT affect checksums
• Moving items from one place to another
• Changing the file name
Run on the master files
when a collection is
completed
Set up a schedule to run
“verify checks” periodically
St. Mary of the Lake Parish School First Day
Image ID: WHi-98433
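The copy-then-verify routine and the periodic "verify checks" can be sketched as a pair of functions: one records a checksum for every file in a folder, the other re-checks a folder against that record. This is an illustration of the idea, not a description of how MD5Summer works internally. The function names are hypothetical, and the record is keyed by each file's relative path, so a renamed file would be reported as missing even though its checksum is unchanged.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def md5_of(path: Path) -> str:
    """Return the MD5 checksum of one file."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(directory: Path) -> Dict[str, str]:
    """Map each file's path (relative to the directory) to its checksum."""
    return {
        str(path.relative_to(directory)): md5_of(path)
        for path in sorted(directory.rglob("*"))
        if path.is_file()
    }


def verify_manifest(directory: Path, manifest: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means every file still matches."""
    problems = []
    for relative_name, expected in manifest.items():
        candidate = directory / relative_name
        if not candidate.is_file():
            problems.append(f"missing: {relative_name}")
        elif md5_of(candidate) != expected:
            problems.append(f"changed: {relative_name}")
    return problems
```

A typical use would be to build the record on the master files when a collection is completed, then run the verify step against each backup copy right after copying and again on whatever schedule you set.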
83. KEY DECISION POINTS
• How are you going to organize it?
• What are you going to store it on?
• Where are you going to store it?
• How many copies do you
need?
Post Office
Image ID: WHi-9135
84. FACTORS TO CONSIDER
• Immediate Costs
• Quantity (size and number of files)
• Number of copies
• Media (life span, availability, $$)
• Other resources
• Expertise (skills required to manage)
• Services (local vs. hosted)
• Partners (achieving geographic distribution)
• Institutional constraints
85. HOW MANY AND WHERE?
• Multiple
• Minimum: two (2) copies in two locations
• Optimum: six (6) copies
• Geographically distributed
• Don’t keep your copies onsite if possible
86. LOCAL STORAGE OPTIONS
• Local network
• RAID device
• External hard drive
• Archival quality (gold) CDs
or DVDs
Take into account potential
future storage needs.
Villa Terrace Decorative Arts Museum
87. CLOUD STORAGE OPTIONS
Commercial options:
• Google Drive
• Up to 5GB free (approx. 140 high-resolution TIFF files)
• 25GB = $2.50/month
• Amazon Simple Storage Service (S3)
• $.095 per GB/month
Institutional options:
• DuraCloud
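A rough storage budget can be worked out from the figures on this slide. The sketch below is back-of-envelope arithmetic only. The default file size of about 36MB comes from the "5GB is approx. 140 TIFF files" estimate, the price is the 2013 Amazon S3 figure quoted above and should be replaced with a current quote, and charging the same rate for every copy is an assumption made to keep the example simple. The function names are hypothetical.

```python
def collection_size_gb(file_count: int, average_file_mb: float = 36.0) -> float:
    """Estimate total size in GB (1 GB taken as 1024 MB)."""
    return file_count * average_file_mb / 1024


def monthly_cost(size_gb: float, copies: int = 2,
                 price_per_gb_month: float = 0.095) -> float:
    """Estimate the monthly bill for storing every copy at the same per-GB rate."""
    return size_gb * copies * price_per_gb_month


if __name__ == "__main__":
    size = collection_size_gb(2000)  # e.g. 2,000 master TIFFs
    print(f"{size:.1f} GB per copy, about ${monthly_cost(size):.2f} per month for 2 copies")
```

Running the numbers before scanning starts also helps with the earlier advice to take potential future storage needs into account.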
88. THE (MOSTLY) GOOD…..
Responsibilities and costs are transferred to the cloud provider
• Installation / replacement / upgrades of hardware and
software
• Backup and recovery of data are part of the package
• No local physical presence (valuable space)
• No local environmental requirements (power or cooling costs)
89. THE (POTENTIALLY) BAD
There are potential disadvantages however…..
• Can records be managed correctly throughout their entire
lifecycle?
• Can it support Open Records requests?
• Security concerns
• Do you know where your data is?
• Accessibility – more “points of failure” when the data is
remote
• Costs for accessing data can be high
90. RESOURCES
State of Wisconsin Public Records Board has created two
documents which can be found at:
http://publicrecordsboard.wi.gov/docs_all.asp?locid=165
• Public Records Board Guidance on the Use of Contractors
for Records Management Services
• Use of Contractors for Records Management Services
(Both docs are in the Reference Materials section)
93. WHY ARE YOU PROVIDING
ACCESS TO CONTENT?
• User demand
• Institutional visibility
• Legal mandates or grant
requirements
• Generate revenue
• Contribute to our collective
knowledge
South Wood County Historical Museum
94. WHAT MAKES A GOOD
ONLINE COLLECTION?
• Publicly accessible.
• Searchable - Includes keywords and other descriptive
information (metadata) so users can find what they’re
looking for.
• Organized and consistent.
• Based on existing international/national/statewide
standards and best practices.
• Uses software that is sustainable (will be around for a
long time) and interoperable (can be migrated or
shared).
• Respects intellectual property rights.
• OAI-PMH compliant (to share content on statewide level)
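For context on the last point, OAI-PMH lets a statewide service collect metadata from a collection by sending ordinary web requests. The sketch below only builds such a request URL. ListRecords and the oai_dc (Dublin Core) metadata prefix are part of the OAI-PMH protocol, but the base URL and set name shown are placeholders, since each platform publishes its own.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(base_url: str, set_spec: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request for Dublin Core records."""
    params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    if set_spec:
        # Optional: limit the harvest to one collection ("set").
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # example.org is a placeholder, not a real repository address.
    print(list_records_url("https://example.org/oai", set_spec="photos"))
```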
95. SOME OAI-COMPLIANT ACCESS PLATFORMS
• CONTENTdm
• Your own instance
• Hosted by Milwaukee Public
Library through Recollection
Wisconsin
• ResCarta Web
• Free and open source
• Host it yourself or through vendor
• Omeka
• Free and open source
• Host it yourself or through
Omeka.net
• Other?
Beloit College
96. CONTENTDM
• Hosted by Milwaukee Public Library through Recollection
Wisconsin
• Produced and distributed by OCLC
• Costs (through Recollection Wisconsin):
• $200 one-time setup fee
• Annual hosting fees starting at $75
101. RESCARTA WEB
• Free and open source
• Host it yourself; or hosting available through Northern
Micrographics (fee-based)
• ResCarta Foundation – based in La Crosse
105. OMEKA
• Free and open source
• Host it yourself; or subscribe to hosted version, omeka.net
• Developed by the Center for History and New Media, George
Mason University
110. POTENTIAL AUDIENCES
• Local residents
• Students and teachers
• Genealogists
• Specialists (e.g. Civil War
re-enactors, railroad
buffs)
• Academic researchers
• Curious Wisconsinites
• Everyone!
College of Menominee Nation
111. STAKEHOLDERS AND PARTNERS
• Board
• Staff and/or volunteers
• Local experts
• Community members
• Chamber of Commerce
• Local government
• Students
• Other organizations in
your community/
county/region
• Who else?
McMillan Memorial Library, Wisconsin Rapids
112. ENCOURAGING USE OF YOUR COLLECTIONS
• Organizations are moving
away from “if you build it,
they will come” approach
– Google is not enough
• Participatory archives
concept—shared
authority, community
engagement
• Bring your content to
your audience—find them
where they already are
Milwaukee Public Library
113. MARKETING IDEAS
• Add introduction/background
information on your own
website
• http://www.newberlinhistoricalsociety.org
• Highlight an item of the
day/week/month
• https://www.facebook.com/lacrosse.history
• Host an opening event
• Whitefish Bay Public Library
• College of Menominee Nation
• Host a slide show or exhibition
• South Wood County Historical
Museum
• Mineral Point Historical Society
Rock County Historical Society
114. MARKETING IDEAS
• Send someone with a laptop to popular local
spots/events to demonstrate digital collections:
• Ask, “Where do people go first to look for this kind of
information?” and then, market there
• Upload a few digitized images to Flickr with descriptions that
point back to your related digital and physical collections.
• Contribute to relevant pages on Wikipedia and include references
pointing to specific digital materials.
• Request that the Chamber of Commerce and other
relevant local organizations link to the new digital
collections from their websites.
• Send a press release to local media
115. EVALUATING IMPACT
Understanding current users…
Online survey instrument
Web analytics
Email subscriber lists
Visitor forms
Understanding future users…
Special interest groups (AASLH, SAA, etc.)
Listservs
Workshops and conference sessions
116. WRAPPING UP – FINAL THOUGHTS
Commencement, 1978
UW-Madison Archives
118. TIMELINE
• Set final date for project
completion
• Establish goalposts –
break project into smaller
steps/phases/goals
• Set timeframe for
meeting each goal
• Regularly revisit project
progress and modify
schedule as needed
• Always budget extra time
IH General Office Mail Room
Image ID: WHi-12016
119. TIMELINE
• Timeline will vary greatly
depending on…
• Project scope
• Types of materials
• Staff experience
• Available resources
• One model:
• 1/3 reformatting
• 1/3 metadata
• 1/3 management, quality
control, etc.
• Source: Steven Puglia, “The
Costs of Digital Imaging
Projects,” RLG DigiNews v. 3,
no. 5 (1999)
WHi-4352
120. TIPS FROM OTHER DIGITIZERS
• If I could do it all over
again, I would:
• Tackle a smaller group of
materials at first
• Make sure two people
started the project at the
same time so we could help
each other
• Start with a clearer plan
• Take the time to sort and
research the physical
collection before digitizing
• Have firm deadlines to help
me stay on track
Langlade County Historical Society
121. NEXT STEPS/TO DO LIST
• Review collections and set priorities for digitization.
• Consider developing a written selection policy.
• Determine the copyright status of any materials you
plan to share online and secure permissions from
copyright holders if materials are not in public
domain.
• Acquire scanning equipment or make other plans for
conversion.
• Familiarize yourself with good, useful metadata by
looking at other online collections.
122. NEXT STEPS/TO DO LIST
• Develop a file naming convention document.
• Develop a storage management policy
• E.g., number of copies, locations
• Monitor copies of content for errors/changes
• Evaluate technology to determine your preferred
access platform
• Develop a marketing plan
• Determine how you will evaluate the success of
your marketing plan
123. THANK YOU!
• Sarah Grimm, Wisconsin
Historical Society
sarah.grimm@wisconsinhistory.org
608-261-1008
• Emily Pfotenhauer, WiLS
emily@wils.org
608-616-9756
• Slides and handouts
available at
http://recollectionwisconsin.org/wla2013
South Wood County Historical Museum
Editor's notes
We are… Sarah Grimm, Electronic Records Archivist, Wisconsin Historical Society; Emily Pfotenhauer, Recollection Wisconsin Program Manager, WiLS. You are… What organization do you represent? What digital projects are you currently working on or thinking about?
As part of that, they developed 6 modules regarding different aspects of managing e-records and have trained several groups of people to bring those modules to groups dedicated to working with e-records.
We are looking to digital preservation for an answer because we realize that being in digital form is not the same as being digitally preserved. Digital preservation is active management of digital content over the long term, with access as its ultimate goal. With books or documents, we can read an item, put it on the shelf and continue to open and read it for decades with proper handling. However, once something is digitized, we can’t expect to set it aside and then open it in 10 years, much less 50, without active management. We must find ways to ensure that the digital item is accessible. In order to determine how we are going to preserve something, we must first have an understanding of what we have. We must IDENTIFY it.
Once you have your selection criteria, it may not be possible to review/select everything at once, so how might you sequence the process? Again, the answer will be different for each organization. Think about what’s:
• Most significant to your organization?
• Most extensive? (and therefore a more coherent body of material to manage)
• Most requested/used?
• Easiest to tackle (e.g. most familiar, most ready for ingest – a quick win for your digital preservation process; very helpful when you are having to prove the value of your efforts to a reluctant administration)
• Oldest (possible historical importance)
• Newest (possible immediate interest)
• Mandated (via local policies, legislation, etc.)
• At risk? If it were no longer available, what digital files would be the hardest to replace? Some formats become obsolete a lot faster than other formats. PDFs are viable for a really long time – video files, however, get old very quickly.
If you answered “no” to any of these questions, the item may not be a good candidate for digitization.
Copyright demo
As you are going through the selection process, you will need to establish how you are going to name and organize your files. You will find things in many places, named in many different ways depending on who worked on the item. Psychologically, digital items are much easier for people to save: 100 items on your hard drive don’t take up as much visual space as 100 items in your office. A file that is 1 KB looks pretty much like one that is 1 MB or 1 GB. There also tend to be more copies of digital items: everyone keeps a draft, or it gets attached to an email and sent to 10 people, or it gets filed in two places. Everybody keeps their own items; project documentation is rarely one person managing the group’s information anymore. It’s multiplied by the number of people working on the project. As a result, EVERYTHING IS SAVED “just in case,” and it’s often saved more than once.
Standards – Need a baseline so that everyone knows how to name items as well as how NOT to name them, or where and how items will be stored.
Short and Descriptive – My record is a file name with 167 characters. While really descriptive, it was too hard to work with. Couldn’t read the entire title in a file list and couldn’t copy it since it was buried in several layers of folders. We tend to name things in ways that make sense to us at the time, but this is not handy for long term preservation. You need to name things in a way that will make sense 20 years from now. Has anyone inherited files from previous employees or projects – do they make any sense? “My stuff” “Important” “To Read”
Searching is really difficult if you have to search through multiple layers. Many types of documents will be easier to find if you can come up with a consistent date naming convention.
This slide contains links to both the web version and the YouTube version of 4 videos created by the State Library of North Carolina about file naming procedures. They total about 10 minutes and provide some great tips.
Co-locate – It’s OK to move things around if it makes sense to do so.
Layers – If you have several layers to hunt through, it can be really hard to find anything. Shallow is better. Searching is really difficult if you have to search through multiple layers.
Breadcrumbs – OK to leave “sticky notes” (AKA “READ ME” files) in folders. They can give a brief description of contents, retention schedule, any naming conventions.
Don’t know – unknown file formats, files on old media (floppies), password-protected files.
File backups – e.g., speeches had multiple drafts: final + copies in several different font sizes.
Supplementary files – a folder of images that were used in a PowerPoint.
Files you can’t open – corrupted.
Formats – you may receive Word and PDF; you may not want to keep both.
As you are creating your inventory, you are likely to discover a lot of really simple places you can clean up the files you are reviewing.
Co-locate – It’s OK to move things around if it makes sense to do so.
Bury – If you have several layers to hunt through, it can be really hard to find anything. Shallow is better.
Once you’ve decided how you want to handle file naming issues and have made file management decisions, document it. It doesn’t have to be long. You can distribute it in your organization: post it on an intranet, place it in a procedures manual.
WHY?
• You will not be the only keeper of the information. (You weren’t here to ask)
• It will help others who may be helping you with the inventory.
• You can hand it out to organizations/departments you receive information from: “In order to better manage our files, we will accept these file types and formats, they will be named this way. Do not give us password protected documents.”
You don’t have to organize and fix everything, but you do need to give other people the tools to help you.
We’ve learned that it is essential to remove duplicates first. Once you start using other tools and changing things, the duplicate finder applications are no longer as accurate. These are all FREE. We have used all three of these, and they all work a little bit differently under the covers, so the results vary a bit.
Auslogics – Auslogics has a number of products that are for sale, but this one falls under their “freebie” category. We found it really helpful with documents and wanted to try it with the images.
Similar Images – It creates a database, so subsequent runs go faster, but the first run, while it is creating the database, can be really slow. This application works with lots of file formats.
Visipics – This application only works with a handful of file formats, but it hits the main ones and does it really well. It will detect two different resolution files of the same picture as a duplicate (we had a number of photos that were corrected with Photoshop and this picks those up), or the same picture saved in different formats, or duplicates where only minor cosmetic changes have taken place.
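For readers who want to see the simplest form of duplicate detection, here is a minimal Python sketch. It is not how any of the three applications above work internally (that is not described in these notes); it only finds exact duplicates, meaning files with the same size and the same MD5 checksum. It will not catch resized or retouched images the way Visipics does. The function names `file_md5` and `find_exact_duplicates` are made up for this illustration.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_md5(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root):
    """Return groups of files under root that share a size and an MD5 checksum."""
    # First pass: group by file size, which is cheap to read.
    by_size = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file():
            by_size[path.stat().st_size].append(path)

    # Second pass: only hash files whose size matches another file.
    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have an exact duplicate
        for path in paths:
            by_hash[file_md5(path)].append(path)

    return [sorted(group) for group in by_hash.values() if len(group) > 1]
```

Running `find_exact_duplicates` on a copy of the folder you are inventorying gives a list of groups to review by hand; the sketch never deletes anything.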
The resulting MD5 file can be opened in any text editor.
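To show what such an MD5 file can look like, here is a rough Python sketch that writes one plain-text line per file (checksum, two spaces, relative path) and later re-checks the files against it. The tool used in the session is not named in these notes, so this is only an illustration; the function names and the default file name `manifest.md5` are assumptions.

```python
import hashlib
from pathlib import Path


def write_md5_manifest(root, manifest_name="manifest.md5"):
    """Write a plain-text list of MD5 checksums for every file under root."""
    root = Path(root)
    manifest = root / manifest_name
    lines = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and path != manifest:
            # read_bytes loads the whole file; fine for a sketch, slow for huge files
            digest = hashlib.md5(path.read_bytes()).hexdigest()
            lines.append(f"{digest}  {path.relative_to(root).as_posix()}")
    manifest.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return manifest


def verify_md5_manifest(manifest):
    """Return the names of files that are missing or no longer match."""
    manifest = Path(manifest)
    changed = []
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # skip blank lines
        expected, name = line.split("  ", 1)
        target = manifest.parent / name
        if not target.is_file():
            changed.append(name)
        elif hashlib.md5(target.read_bytes()).hexdigest() != expected:
            changed.append(name)
    return changed
```

An empty list from `verify_md5_manifest` means every listed file still matches its recorded checksum.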
WHAT are you going to store it on? WHERE are you going to store it? HOW MANY COPIES are you going to make?
WHERE are you going to store it? What are your options? Decisions can be determined by a number of things.
Size – The options you consider will vary depending on how much you have to store.
Media – CDs last on average 5 years; gold CDs last longer. If you’ve burned it, as little as 2 years, depending among other things on the quality of the CD to begin with. Magnetic tape could last 30 years, but it’s very sensitive to heat, magnetic fields and dust. Is the company producing the hardware you are using to run the storage media still around?
Cloud – What’s it going to cost to rent space? Sometimes it costs more when you pull it out than when you put it in.
You also need to determine where you don’t want to store it (USB drives, old media) and migrate it off those devices accordingly.
Three copies is a happy medium if you are able to manage it.
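One possible way to make several copies and confirm that each one matches the original is sketched below in Python. The function name `make_verified_copies` and the destination folders are hypothetical; in practice the destinations would be separate devices or locations, not folders on the same drive.

```python
import filecmp
import shutil
from pathlib import Path


def make_verified_copies(source, destinations):
    """Copy source into each destination folder and check every copy byte for byte."""
    source = Path(source)
    copies = []
    for folder in destinations:
        folder = Path(folder)
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / source.name
        shutil.copy2(source, target)  # copy2 also keeps file timestamps
        # shallow=False forces a comparison of the actual file contents
        if not filecmp.cmp(source, target, shallow=False):
            raise IOError(f"Copy does not match original: {target}")
        copies.append(target)
    return copies
```

Whether the original counts as one of the three copies is a local policy decision; the sketch simply makes one verified copy per destination it is given.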
RAID = Redundant Array of Independent Disks = multiple hard drives in one package
[costs] The costs of buying/maintaining/upgrading hardware and of technical staff are transferred to the service provider.
[lifecycle] Does it have archive capabilities? Can it maintain “restricted access” on appropriate records? Does “delete” actually MEAN “delete”? Can the contractor delete or purge electronic records in accordance with approved retention schedules?
[security] How many people have access to your records via the network? How many people have access to the servers your records live on? Does your contractor work with subcontractors (whom you don’t know)?
[where] Does the data reside in this state or country?
[accessibility] Network availability across large distances can be a problem: service outages, power outages, severed cables, unspecified network outages.
[costs] Sometimes the storage fees for holding the data are fairly low, but the charges are different (and usually higher) each time you access your data.
Some ideas: Think about providing an online survey instrument at various strategic points within your access environment. Implement some free web analytics: find out where people are linking into your site, what they are requesting, and how much time they spend on what materials. If you have email subscribers, solicit some feedback from your regular users about their visits to your digital collections and how they might be finding them useful, as well as ways of improving upon your existing levels of service. If you are only providing access on-site, make sure you add some lines on your visitor forms that account for their use of your digital collections.
So, you may get very good at serving your digital collections up to your current stakeholders through these different monitoring measures. But what about new users, users who could really benefit from exposure to your materials but may require it in a different form or through different means, like tablets or cell phones? Maybe near-future users will want to run all sorts of sophisticated services over the top of your materials along with materials from all sorts of other institutions. How might you be able to track trends or use cases from other similar institutions to find out where access needs are heading?