The seminar addresses the main issues in managing large image collections: how to organize them, preserve them over time and make them effectively usable. Besides covering key aspects such as file formats, metadata, cataloging and backup, the seminar gives a comparative overview of the main software platforms available today, both proprietary and open source.
1. Archiving and Cataloging Digital Photographs
Maurizio Agelli, CRS4
{ agelli@crs4.it }
September 20th 2012, 5.30pm
Aula Magna Facoltà di Architettura - Via Corte d'Appello - Cagliari
2. Point de vue du Gras, Nicéphore Niépce, 1826 (from Wikimedia Commons)
4. The first photograph was taken less than 200 years ago ...
How many photos have ever been taken?
5. Number of photos ever shot (up to 2011): ~3.5 x 10^12
[ source: Jonathan Good, 2011 - 1000memories.com ]
500 to 800 billion taken in 2011
[ source: Observatoire des Professions de l'Image ]
6. Presentation Outline
1) Archiving as part of the photographic workflow
2) Describing photographs: metadata
3) Organizing images in catalogs
4) Ensuring long-term storage: backup and migration
5) An overview of image archiving tools
6) A Digital Asset Management platform developed at CRS4
8. Photo Archive
A collection of images kept in secure, long-term storage. [ dpBestflow.org ]
Photo by Seeweb - CC BY-SA 2.0
Photo by M.Agelli - CC BY-SA 2.0
9. Building a digital photo archive involves many decisions ...
○ What to archive?
○ File formats
○ Metadata
○ File naming
○ Catalog organization
○ Folder structure
○ Backup policies
○ Archiving platform
○ Migration policies
... which strongly depend on the photographic workflow
10. A general workflow
No single workflow suits all photographers and all clients [UPDIG]
Workflow decisions are determined by production volume, turnaround, image quality requirements, regulations, costs, etc.
Capture Ingestion Working Publishing
Archive
11. A general workflow, in more detail
Capture (camera)
○ All camera-related stuff
Ingestion (computer): focus on volume and speed
○ Image transfer
○ File renaming
○ Add bulk metadata
○ Batch editing
○ Format conversion
Working (computer): focus on quality
○ Image editing
○ Metadata editing
○ Create derivative work
Publishing (computer)
○ Export images
○ Print images
○ Publish to web
Archive (Digital Asset Management Platform)
○ Store, search, organize, ...
12. File formats / 1
RAW
○ Many RAW formats (>200).
○ Proprietary, undocumented.
○ Encodes values from camera sensor, before demosaicing (12-16 bit/pixel, 1 color/pixel).
○ Lossless. May be compressed.
DNG (DIGITAL NEGATIVE)
○ Open standard, created by Adobe.
○ Targeted to replace RAW, but still limited adoption by the industry.
TIFF
○ Open standard.
○ 8, 16, 32 bit RGB
○ Lossless, big file size!
○ Possible PSD replacement (supports layers).
Where the formats come from:
○ Camera sensor: RAW (DNG)
○ Camera, after in-camera processing: JPEG (TIFF)
○ Film scanner: TIFF, JPEG (DNG)
13. File formats / 2
JPEG
○ Open standard
○ Compressed, lossy
○ 8 bit RGB: suitable for displaying, not good for editing
JPEG 2000
○ Better compression than JPEG (wavelet transform vs. cosine transform)
○ 8, 16 bit RGB
○ Lossless / lossy
○ Many extra features: regions of interest, progressive decoding, multi-resolution decoding.
Example: 6 Mpixel image (Nikon D40)
Format | Encoding | File size
TIFF | 48 bit / pixel, uncompressed | ~35 MB
NEF | 12 bit / pixel, compressed | ~5.3 MB
DNG | 12 bit / pixel, compressed | ~5 MB
JPEG | 90% quality | ~0.6 MB
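The uncompressed figure in the 6 Mpixel example can be checked with simple arithmetic. The sketch below is only an illustration: the 3008 x 2000 pixel count is an assumption about the camera output, and headers and embedded previews are ignored.

```python
# Rough uncompressed size of a ~6 megapixel image at two bit depths.
# ASSUMPTION: 3008 x 2000 pixels; file headers and previews are ignored.
WIDTH, HEIGHT = 3008, 2000


def uncompressed_size_mb(bits_per_pixel: int, width: int = WIDTH, height: int = HEIGHT) -> float:
    """Return the raw pixel payload in megabytes (10**6 bytes)."""
    total_bits = width * height * bits_per_pixel
    return total_bits / 8 / 1_000_000


tiff_48bit_mb = uncompressed_size_mb(48)  # 3 channels x 16 bit
raw_12bit_mb = uncompressed_size_mb(12)   # one 12-bit sample per photosite

print(f"48 bit/pixel, uncompressed: about {tiff_48bit_mb:.1f} MB")
print(f"12 bit/pixel, before compression: about {raw_12bit_mb:.1f} MB")
```

The 48 bit/pixel result is of the same order as the TIFF size quoted in the example; the 12 bit/pixel result is the payload before the lossless compression that RAW and DNG files may apply.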
14. File formats and image editing
Parametric Image Editing (e.g. Lightroom, Aperture)
○ Image data are not modified.
○ Source file is preserved. Editing is saved as a list of rules which are applied at rendering time.
○ Path: CAMERA -> RAW -> PARAMETRIC EDITING (RAW or DNG) -> EXPORT -> JPG
Raster Image Editing (e.g. Photoshop, Picture Window Pro)
○ Image pixels are modified.
○ A new file containing the edited image shall be saved in order to preserve the original.
○ Path: TIFF or DNG -> RASTER EDITING (TIFF or DNG) -> EXPORT -> JPG
15. File formats decision tree
CAMERA SCANNER
CAPTURE
JPG RAW DNG TIFF
INGESTION
JPG JPG TIFF RAW DNG JPG DNG JPG DNG TIFF
WORKING
JPG JPG TIFF RAW TIFF DNG TIFF JPG DNG TIFF JPG DNG TIFF
PUBLISHING
JPG JPG JPG JPG JPG JPG JPG JPG JPG JPG JPG JPG JPG
Note: unusual decision paths have been omitted
16. Which files to archive?
Capture Ingestion Working Publishing
ORIGINAL FILES, MASTER FILES, DERIVATIVE FILES
Archive
18. The importance of metadata
"An image is worth 1000 words", but ...
... there are questions which only words can answer:
○ When was it shot?
○ ... and where?
○ Who are those people?
○ Who took this photograph?
○ Can I use it freely?
Photo by Maurizio Agelli - CC BY-SA 2.0
20. A more precise definition
METADATA
"Structured encoded data that describe
characteristics of information-bearing
entities to aid in the identification,
discovery, assessment, and management
of the described entities"
[ source: American Library Association ]
21. Image metadata is nothing new ...
Photo by anyjazz65 [ CC BY-NC 2.0 ] - http://www.flickr.com/photos/49024304@N00/
22. Where can digital image metadata be written?
○ inside the image file (metadata and image data in the same file)
○ in a sidecar file (image data in one file, metadata in a companion file)
○ in a database
○ in an online registry
○ in the file name, e.g. d40-20120920-DSC_0153-edited.jpg
  (d40 = camera, 20120920 = date, DSC_0153 = id, edited = derived)
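A file name that follows the camera-date-id-derived scheme can be split back into its parts. This is a minimal sketch: the regular expression, the `ParsedName` type and the `parse_name` helper are illustrative assumptions, not part of any standard.

```python
import re
from datetime import date, datetime
from typing import NamedTuple, Optional


class ParsedName(NamedTuple):
    camera: str
    shot_on: date
    image_id: str
    derived: Optional[str]  # None when the file is not a derived version
    extension: str


# ASSUMPTION: fields are separated by "-" and the date is written as YYYYMMDD.
PATTERN = re.compile(
    r"^(?P<camera>[^-]+)-(?P<date>\d{8})-(?P<id>[^-.]+)"
    r"(?:-(?P<derived>[^.]+))?\.(?P<ext>\w+)$"
)


def parse_name(filename: str) -> ParsedName:
    """Split a file name into camera, date, id, derived marker and extension."""
    match = PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename!r}")
    shot_on = datetime.strptime(match["date"], "%Y%m%d").date()
    return ParsedName(match["camera"], shot_on, match["id"], match["derived"], match["ext"])


parsed = parse_name("d40-20120920-DSC_0153-edited.jpg")
print(parsed)
```

File names carry only a few fields, so this complements embedded metadata rather than replacing it.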
24. IPTC IIM and EXIF
IPTC IIM (Information Interchange Model)
○ Created in 1991 by the International Press Telecommunications Council
○ Adobe defined the mechanism for embedding IPTC IIM metadata in image files (1994)
○ Driven by NEWS INDUSTRY
○ Focused on high-level properties (description, geo location, ...)
○ Cannot be extended
EXIF (Exchangeable Image File Format)
○ Created in 1995 by Japan Electronic Industries Development Association
○ Driven by CAMERA MANUFACTURERS
○ Focused on low-level properties (camera settings, geo coordinates, date/time, ...)
○ Cannot be extended
File layout: EXIF, IPTC IIM, Image Data
25. XMP
Extensible Metadata Platform
Open standard, created by Adobe
○ defines a data model and a serialization model (RDF/XML)
○ also covers video, audio, text
○ structured as a set of schemas
○ can be extended with new metadata schemas
○ multi-lingual qualifiers
○ can be serialized and stored in most file formats (not in RAW!)
○ it is widely supported by the industry ...
File layout:
○ Legacy Metadata: EXIF, IPTC IIM
○ XMP: Dublin Core, XMP Basic, Rights, Media Mng, Photoshop, Camera RAW, EXIF, IPTC Core, IPTC Extens.
○ Image Data
27. A quick look inside XMP
>200 properties + all EXIF and IPTC properties
TITLE (dc:title)
DESCRIPTION (dc:description)
DESCRIPTION WRITER (photoshop:CaptionWriter)
RATING (xmp:Rating)
KEYWORDS (dc:subject)
GEO COORDINATES (exif:GPSLatitude, exif:GPSLongitude)
LOCATION (photoshop:Country, photoshop:State, photoshop:City,..)
AUTHOR (dc:creator, exif:Artist)
RIGHTS (dc:rights, xmpRights:UsageTerms)
.....
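Since XMP is serialized as RDF/XML, a few of the properties listed above can be read with a plain XML parser. The sketch below uses a hand-written sample packet (its title, keywords and rating are invented for illustration) and a hypothetical `read_basic_properties` helper; real packets are usually embedded in the image file or stored in a sidecar and may use other RDF forms.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used by the XMP packet wrapper, RDF, Dublin Core and XMP Basic.
NS = {
    "x": "adobe:ns:meta/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "xmp": "http://ns.adobe.com/xap/1.0/",
}

# ASSUMPTION: invented sample data, written by hand for this example.
SAMPLE_PACKET = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about=""
      xmlns:dc="http://purl.org/dc/elements/1.1/"
      xmlns:xmp="http://ns.adobe.com/xap/1.0/">
   <xmp:Rating>4</xmp:Rating>
   <dc:title>
    <rdf:Alt><rdf:li xml:lang="x-default">Harbour at dusk</rdf:li></rdf:Alt>
   </dc:title>
   <dc:subject>
    <rdf:Bag><rdf:li>harbour</rdf:li><rdf:li>Cagliari</rdf:li></rdf:Bag>
   </dc:subject>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>"""


def read_basic_properties(packet: str) -> dict:
    """Extract dc:title, dc:subject and xmp:Rating from an XMP packet string."""
    root = ET.fromstring(packet)
    description = root.find("rdf:RDF/rdf:Description", NS)
    if description is None:
        return {"title": None, "keywords": [], "rating": None}
    title = description.find("dc:title/rdf:Alt/rdf:li", NS)
    rating = description.find("xmp:Rating", NS)
    keywords = description.findall("dc:subject/rdf:Bag/rdf:li", NS)
    return {
        "title": title.text if title is not None else None,
        "keywords": [item.text for item in keywords],
        "rating": int(rating.text) if rating is not None else None,
    }


props = read_basic_properties(SAMPLE_PACKET)
print(props)
```

The title is stored as a language alternative (rdf:Alt), which is where the multi-lingual qualifiers come in, while keywords are an unordered bag (rdf:Bag).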
28. A quick look inside XMP
Date/Time Metadata
○ The original painting (~1507): Iptc4xmpExt:AODateCreated
○ An ancient postcard (1925): photoshop:DateCreated
○ The digital representation of the postcard (2008): xmp:CreateDate
○ The archived image (metadata last edited in 2012): xmp:MetadataDate
29. Extending XMP
Creative Commons
CC provides a legal and technical infrastructure to help people share knowledge and creativity.
CC defines a set of properties that allow authors to specify under which conditions their content can be distributed and used.
CC recommends XMP for embedding CC properties inside resources.
Photo by Creative Commons - CC BY 3.0
30. Extending XMP
PLUS
Picture Licensing Universal System
Non-profit organization whose mission is to simplify
and facilitate the communication and management of
image rights.
PLUS Registry
○ unique ids for creators, rights holders, images, ...
○ access to rights information and other metadata
PLUS License Data Format (LDF)
○ metadata schema for embedding image license
○ 88 properties
○ dedicated XMP PLUS namespace
31. Extending XMP
PRISM
Publishing Requirements for Industry Standard Metadata
Defined by IDEAlliance, a global community of content
and media creators.
PRISM Metadata for Images provides information about:
○ objects pictured (manufacturer, model, description, ...)
○ slideshows (sequences of images)
○ shooting info (viewpoint, season, visual technique, ...)
PRISM Advertising Metadata provides information about
the usage of the image in an advertising campaign
PRISM defines dedicated XMP namespaces: pmi and pam
32. Extending XMP
Area Tagging
Metadata Working Group
○ XMP-MP Schema for face tags
○ adopted by Picasa
Microsoft has created a new XMP schema for tagging
people
33. Handling Social Tagging
A research issue
140 billion photos in
Facebook (up to 2011)
[ source: Jonathan Good, 2011 - 1000memories.com ]
35. catalog
catalog (noun)
a list of the contents of a library or a group of libraries, arranged according to any of various systems
[ Dictionary.com ]
catalog (v.tr.)
1. to make an itemized list of
2. to classify (a book or publication, for example) according to a categorical system
[ Dictionary.com ]
Picture by Henry Trotter, 2005 - Source: Wikimedia Commons
36. Photo Cataloging Software
Prime goals of Photo Cataloging Software:
○ provide secure, long-term storage
○ find the images when you need them
○ interoperate with other tools of the same ecosystem (now and in the future)
An ecosystem is made up of many parts that must not only coexist
but also work with each other to survive. When all the elements
work in concert, the system can thrive.
(Peter Krogh, The DAM Book)
Photo Cataloging Software falls into the broad domain of Digital Asset Management.
Let's try grabbing some definitions ...
37. Digital Asset Management
a term open to many definitions ...
a way of keeping an overview of your digital files and make sure
they don't get lost or altered unintentionally
[J.Jacobsen, T.Schlenker, L.Edwards, Implementing a DAM System, Elsevier]
the protocol for downloading, renaming, backing up, rating, grouping,
archiving, optimizing, maintaining, thinning, and exporting files
[P.Krogh, The DAM Book, O'Reilly]
a complete toolbox to the author, publisher, and the end users of
the media to efficiently utilize the assets
[D.Austerberry, Digital Asset Management 2nd edition, Focal Press]
... and whose scope goes beyond the domain of photography:
○ Creative Industries
○ Digital Libraries
○ Publishing
○ Enterprise Content Management
38. Core functionalities
of photo catalog / DAM software
(the two terms are used interchangeably here)
○ Import images
○ Harvest metadata
○ Manage metadata in a database ( + index for search)
○ Synchronize metadata
○ Export images
○ Organize photos with hierarchical keywords
○ Manage original, master and derivative files as different renditions of the same item
Extra functionalities such as file rename, raw converter,
editor, publishing tools may be provided too.
39. Harvesting and synchronizing metadata
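Image file (in Image Storage): EXIF, IPTC IIM, XMP, Image Data
○ import: metadata (EXIF, IPTC IIM, ...) are harvested from the image file into the Database
○ the User Interface works on the metadata held in the Database
○ metadata are kept synchronized between the Database and the image file
○ export: images leave the Image Storage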
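The harvesting step can be sketched with an in-memory SQLite database. Everything here is a simplified assumption made for illustration: the two-table schema, the `harvest` and `find_by_keyword` helpers, and the sample paths and metadata are not taken from any particular cataloging product, and writing metadata back to the files (synchronization) is left out.

```python
import sqlite3


def create_catalog(conn: sqlite3.Connection) -> None:
    """Create a minimal catalog: one row per image, one row per keyword."""
    conn.executescript(
        """
        CREATE TABLE image (
            id INTEGER PRIMARY KEY,
            path TEXT UNIQUE NOT NULL,
            title TEXT,
            rating INTEGER
        );
        CREATE TABLE keyword (
            image_id INTEGER NOT NULL REFERENCES image(id),
            word TEXT NOT NULL
        );
        CREATE INDEX keyword_word ON keyword(word);  -- index used for search
        """
    )


def harvest(conn: sqlite3.Connection, path: str, metadata: dict) -> int:
    """Store metadata already read from an image file; return the new row id."""
    cursor = conn.execute(
        "INSERT INTO image (path, title, rating) VALUES (?, ?, ?)",
        (path, metadata.get("title"), metadata.get("rating")),
    )
    image_id = cursor.lastrowid
    conn.executemany(
        "INSERT INTO keyword (image_id, word) VALUES (?, ?)",
        [(image_id, word) for word in metadata.get("keywords", [])],
    )
    return image_id


def find_by_keyword(conn: sqlite3.Connection, word: str) -> list:
    """Return the paths of all images tagged with the given keyword."""
    rows = conn.execute(
        "SELECT image.path FROM image "
        "JOIN keyword ON keyword.image_id = image.id "
        "WHERE keyword.word = ? ORDER BY image.path",
        (word,),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
create_catalog(conn)
# ASSUMPTION: invented sample files and metadata.
harvest(conn, "2012/d40-20120920-DSC_0153.nef",
        {"title": "Harbour at dusk", "rating": 4, "keywords": ["harbour", "Cagliari"]})
harvest(conn, "2012/d40-20120921-DSC_0160.nef", {"keywords": ["Cagliari"]})
matches = find_by_keyword(conn, "Cagliari")
print(matches)
```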
40. Hierarchical keywords
Photo by Isabelle Palatin CC BY-SA 2.0
○ typically mapped to dc:subject
○ no semantic rules for describing the hierarchy,
special characters are used, e.g.:
Organizations|Industry|ACME
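Because the hierarchy is encoded only by a separator character, an application has to split the keyword itself. A minimal sketch, assuming "|" as the separator (other tools may use a different character) and hypothetical helper names:

```python
from typing import List


def split_keyword(keyword: str, separator: str = "|") -> List[str]:
    """Split a flattened hierarchical keyword into its levels."""
    return [part.strip() for part in keyword.split(separator) if part.strip()]


def ancestors(keyword: str, separator: str = "|") -> List[str]:
    """Return every level of the hierarchy, from the root down to the keyword itself."""
    parts = split_keyword(keyword, separator)
    return [separator.join(parts[:depth]) for depth in range(1, len(parts) + 1)]


levels = ancestors("Organizations|Industry|ACME")
print(levels)
```

Indexing every level lets a search for "Organizations" also find images tagged with the deeper "ACME" keyword.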
41. Renditions / Version sets
Different files related to the same image shall, under certain circumstances, be managed as a single item.
Covered by XMP-MM (Media Management).
Cataloging applications provide different solutions (e.g. stacking, version sets).
import -> Image Storage -> export
1 item, N renditions:
○ ORIGINAL
○ MASTER (edited)
○ DERIVATIVES ...
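The "1 item, N renditions" idea can be pictured as a small data structure. This is only an illustrative sketch with hypothetical names (`CatalogItem`, `ROLES`) and invented file paths; it does not reproduce how any specific application or the XMP Media Management schema stores the relationship.

```python
from dataclasses import dataclass, field
from typing import Dict, List

ROLES = ("original", "master", "derivative")


@dataclass
class CatalogItem:
    """One catalog entry grouping all files that depict the same image."""
    item_id: str
    renditions: Dict[str, List[str]] = field(
        default_factory=lambda: {role: [] for role in ROLES}
    )

    def add(self, role: str, path: str) -> None:
        """Attach a file to the item under one of the known roles."""
        if role not in ROLES:
            raise ValueError(f"unknown rendition role: {role!r}")
        self.renditions[role].append(path)

    def all_files(self) -> List[str]:
        """List every file of the item, originals first."""
        return [path for role in ROLES for path in self.renditions[role]]


item = CatalogItem("DSC_0153")
item.add("original", "originals/d40-20120920-DSC_0153.nef")
item.add("master", "masters/d40-20120920-DSC_0153-edited.tif")
item.add("derivative", "web/d40-20120920-DSC_0153-edited.jpg")
print(item.all_files())
```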
43. There are many causes of data loss
○ lightning
○ transfer errors
○ disk / hardware failure
○ theft
○ floods
○ loss
○ viruses
○ fire
○ human errors
Photo by Lucina M - CC BY-NC 2.0
44. Which files to back up
Catalog (DB)
Original Files
Working Files
Derivative Files
Master Files
45. A possible backup strategy for a single-user workflow
1 PRIMARY STORAGE
2 ON-LINE BACKUP (e.g. NAS), updated with rsync (*)
3 OFF-LINE BACKUP / OFF-SITE BACKUP: copy to optical storage (ORIGINALS, MASTERS, DERIVATIVES); storage media are swapped at every backup
4 additional copy on a remote NAS
5 CLOUD BACKUP: additional copy on a CLOUD service (Amazon S3, Elephant Drive, Symform, ...)
(*) deleting files on the receiving side shall be disabled for ORIGINALS, MASTERS and DERIVATIVES
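The rsync footnote can be expressed as a small command builder. This is a sketch under assumptions: the file-class names, the `rsync_command` helper and the paths are hypothetical, and only the standard rsync options `--archive` and `--delete` are used. rsync does not delete files on the receiving side unless `--delete` is given, so the protected classes simply omit it.

```python
from typing import List

# ASSUMPTION: class names chosen for this example.
PROTECTED = {"originals", "masters", "derivatives"}


def rsync_command(file_class: str, source: str, destination: str) -> List[str]:
    """Build an rsync invocation; mirror deletions only for unprotected classes."""
    command = ["rsync", "--archive"]
    if file_class not in PROTECTED:
        # Working files and the catalog are mirrored exactly, deletions included.
        command.append("--delete")
    return command + [source, destination]


# ASSUMPTION: hypothetical local and NAS paths.
originals_cmd = rsync_command("originals", "/photos/originals/", "nas:/backup/originals/")
working_cmd = rsync_command("working", "/photos/working/", "nas:/backup/working/")
print(" ".join(originals_cmd))
print(" ".join(working_cmd))
```

The resulting list can be passed to `subprocess.run(command, check=True)` on a machine where rsync is installed.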
46. Migration
Currently there are no permanent solutions for storing
digital content. No media lasts forever, and file formats
become obsolete. Migration must be considered as a
necessary part of every storage strategy.
[ dpBestflow.org ]
○ file formats can become obsolete (just think what is happening to
Kodak Photo CD ...)
○ storage evolves (higher capacity, higher speed, ...)
○ solution:
  ○ monitoring the storage process
  ○ conversion to newer and safer formats (e.g. DNG)
  ○ periodic replacement of storage devices
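One possible way to monitor stored files, offered here as an assumption rather than something the slides prescribe, is to keep a checksum manifest and compare it against the archive from time to time. The helper names below are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> Dict[str, str]:
    """Map every file under root (relative POSIX path) to its digest."""
    return {
        path.relative_to(root).as_posix(): file_digest(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(root: Path, manifest: Dict[str, str]) -> List[str]:
    """List manifest entries that are now missing or have different content."""
    current = build_manifest(root)
    return sorted(name for name, digest in manifest.items() if current.get(name) != digest)
```

A manifest built right after ingestion and re-checked after each migration or media replacement would flag files that were altered or lost along the way.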
47. -5- An overview of image archiving tools and services
49. Image management applications
Examples
○ INGESTION TOOL: ImageIngester Pro
○ RASTER IMAGE EDITOR: Photoshop, Picture Window Pro
○ SPECIAL PURPOSE EDITOR: Photomatix
○ RAW PROCESSOR: Adobe Camera Raw
○ PARAMETRIC IMAGE EDITOR: Lightroom, Aperture, Bibble Pro
○ CULLING APPLICATION / Image Browser: Bridge, Fast Picture Viewer
○ DAM (Photo Catalog): IDImager
○ PUBLISHING TOOLS
○ DEDICATED PRINTING SOFTWARE: Qimage, Quad Tone RIP
○ SCANNER SOFTWARE: Vuescan, Silverfast
50. A few photo cataloging applications
Product | Notes | Platforms | Cost (EUR)
Adobe Lightroom 4 | includes Adobe Camera RAW, many export features | WIN / MAC | 130
Photo Supreme (formerly known as IDImager) | very powerful catalog explorer, multiuser DB | WIN / MAC | 80
Phase One Media Pro (formerly known as Expression Media, formerly as iView) | | WIN / MAC | ~85
Apple Aperture 3 | | MAC | 63
Corel AfterShot Pro (formerly known as Bibble Pro) | | WIN / MAC | ~50
Digikam Software Collection 3 | RAW processing based on dcraw, rendition support from version 2 | Linux | free
Picasa 3.9 | | WIN / MAC | free
PicaJet | basic editing, multiuser DB | WIN | ~50
Common features:
○ parametric editor, with possibility to use an external editor
○ XMP support (with some issues when exporting/importing keyword hierarchies)
○ some kind of rendition support
○ trial period (typically 30 days)
55. References
1. Jonathan Good - How many photos have ever been taken? - September 15, 2011 - http://blog.1000memories.com/94-number-of-photos-ever-taken-digital-and-analog-in-shoebox
2. Observatoire des Professions de l'Image - Les chiffres officiels 2010 du marché de la photo et de l'image en France et dans le Monde - http://www.sipec.org/pdf/OPI2011.pdf
3. UPDIG Photographers Guidelines v4.0 - Universal Photographic Imaging Guidelines - http://www.updig.org/pdfs/updig_photographers_guidelines_v40.pdf
4. dpBestflow.org Best Practices - http://dpbestflow.org/links/32
5. Maurizio Agelli, Maria Laura Clemente, Mauro Del Rio, Daniela Ghironi, Orlando Murru and Fabrizio Solinas, CRS4 - NotreDAM, a multi-user, web based Digital Asset Management platform - TPDL 2011 Conference on Theory and Practice of Digital Libraries, Berlin - http://notredam.org/wp-content/uploads/2012/02/TPDL2011-notredam-demo.pdf
6. MS Windows Dev center - People tagging Overview - http://msdn.microsoft.com/en-us/library/windows/desktop/ee719905(v=vs.85).aspx#_people_tagging
56. Metadata Standards
○ Exchangeable image file format for digital still cameras: Exif
Version 2.3 http://www.cipa.jp/english/hyoujunka/kikaku/pdf/DC-008-2010_E.pdf
○ IPTC Information Interchange Model (IIM), IIM Schema for
XMP, Specification Version 1.0, Document Revision 1, 2008
http://www.iptc.org/std/IIM/4.1/specification/IPTC-IIM-Schema4XMP-1.0-spec_1.pdf
○ XMP Specification http://www.adobe.com/devnet/xmp.html
○ Part 1: Data Model, Serialization and Core Properties
○ Part 2: Additional Properties
○ Part 3: Storage in Files
○ PLUS Technical Specification http://ns.useplus.org/go.ashx
○ PRISM 2.0 Specifications http://www.prismstandard.org/specifications/
57. Further reading
○ Peter Krogh - The DAM Book, Digital Asset Management
for Photographers, 2nd edition - O'Reilly
○ Patti Russotti, Richard Anderson - Digital Photography
Best Practices and Workflow - Focal Press
○ Metadata Working Group - Guidelines for Handling Image
Metadata - http://www.metadataworkinggroup.org/specs/