Pairtrees for object storage
John Kunze and Stephen Abrams, California Digital Library (CDL)

Summary
Pairtree is the thinnest smear we can add to our very well-understood filesystems and their universal tools (the universal “API”) to create a very well-understood, platform-independent object storage substrate.
Pairtree is not a complete repository system, but it is complete for object storage, and it makes it easier to build systems and to share objects between institutions.

The deadly embrace
• Digital repositories tend to require a surrender of storage transparency that creates an unhealthy system dependency
• Internally, objects are often broken up in ways that can make them difficult to piece together in case of trouble

Fig. 1. Object storage should not need a fearful entanglement with software. Since objects have to be parked in a filesystem before a repository software upgrade, what if we left them in there and built our repositories around them?

A pairtree maps ids to paths, two characters at a time
A pairtree is a filesystem hierarchy that uses an identifier string to derive an object directory (or folder) location.
• The derivation takes successive pairs of characters and creates a succession of directories, called a pairpath:
      ab2def3 ⇒ ab/2d/ef/3/
• A pairpath ends at the directory containing an object’s files; most systems do a variation of this (is variation needed?)
• Reverse the mapping to find all ids/objects in a pairtree; pairpath termination rules permit variable-length ids

Why pairs of characters?
Taking two chars at a time balances path depth and fanout (number of possible entries in any directory):
• Example: ab2def3 ⇒ ab/2d/ef/3/
• Each pair, letters+digits, has 36 × 36 (1,296) possibilities
Compared with taking one char at a time:
• Only 36 possibilities, but path depth grows rapidly
• Example: ab2def3 ⇒ a/b/2/d/e/f/3/
At the other extreme, taking seven characters at a time:
• Short paths, but 78 billion (36⁷) possible items
• Example: ab2def3 ⇒ ab2def3/

Objects in a pairtree
A pairtree is especially useful if, for each contained object, all of the object’s parts, and nothing but its parts, are enclosed in the object’s directory.
Import such a pairtree and, knowing nothing about the objects’ structure and semantics, you can reliably
• Enumerate all objects and their identifiers
• Produce any object by requested id
• Maintain and back it up with ordinary OS tools
• Rebuild the collection in case of database corruption simply by walking the pairtree

Walking a pairtree requires knowing the path termination rules:
• A pairpath terminates when you reach a file or reach a directory name with 1 char or more than 2 chars

    ab/
     --- cd/
          |--- foo/
          |      | README.txt
          |      | thumbnail.gif
          |      |--- master_images/
          |      |       |      ...
          |      |
          |      --- gh/
          |
          --- e/
               --- bar/
                      | metadata
                      | 54321.wav
                      | index.html

Fig. 2. Example pairtree containing two objects: abcd and abcde. The first object is enclosed in directory foo/, the second in bar/. While foo/ does not subsume e/, which is at the same level, it does, by enclosure, subsume the gh/ underneath it.

Pre-converting problematic characters
Some identifier characters are inconvenient or illegal in filenames and must be hex-encoded (e.g., *→^2a):
      id:    what-the-*@?#!
          → what-the-^2a@^3f#!
          ⇒ wh/at/-t/he/-^/2a/@^/3f/#!/
But to keep paths short, 3 common chars are converted to 3 rare chars (at the cost of some complexity): /→=  :→+  .→,
      id:    ark:/13030/xt12t3
          → ark+=13030=xt12t3
          ⇒ ar/k+/=1/30/30/=x/t1/2t/3/

Sample software implementation
http://search.cpan.org/~jak/Pairtree-0.2/lib/File/Pairtree.pm
A Perl module that implements two mappings: id2ppath() takes an id into a pairpath, and ppath2id() performs the inverse mapping.

Pairtree credits and details
Pairtree specification:
    www.ietf.org/internet-drafts/draft-kunze-pairtree-01.txt
    www.cdlib.org/inside/diglib/pairtree/pairtreespec.html
Authors from CDL and University of Michigan (UM):
    Martin Haye, Erik Hetzner, John Kunze, Mark Reyes, and Cory Snavely; many thanks to Stephen Abrams, Sebastien Korner, Brian Tingle, et al.
Pairtree origins include
• Prototype: UCSF tobacco control documents and CDL digitized books
• Early production: digitized books for UM and HathiTrust

For further information
Please contact jak@ucop.edu or stephen.abrams@ucop.edu
For information on CDL’s Preservation Program, see
http://www.cdlib.org/programs/digital_preservation.html
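The id2ppath() and ppath2id() mappings, together with the character pre-conversion shown under "Pre-converting problematic characters", can be sketched in Python as below. This is not the File::Pairtree code. It is a minimal sketch that assumes the two-step conversion shown on the poster (hex-encode first, then /→= :→+ .→,). The exact set of visible characters to hex-encode (HEX_ENCODED) is an assumption modeled on the pairtree specification and should be checked against it. The helper names clean_id() and unclean_id() are hypothetical.

```python
"""Hypothetical sketch of pairtree id <-> pairpath mapping (not the File::Pairtree API)."""

# Assumption: visible ASCII characters that are hex-encoded as ^hh, modeled on
# the pairtree specification; consult the spec for the authoritative list.
HEX_ENCODED = set('"*+,<=>?\\^|')

# Single-character conversions applied to the remaining characters: / -> =, : -> +, . -> ,
SUBSTITUTE = {"/": "=", ":": "+", ".": ","}
UNSUBSTITUTE = {v: k for k, v in SUBSTITUTE.items()}


def clean_id(identifier: str) -> str:
    """Pre-convert an identifier into a filesystem-friendly string."""
    out = []
    for byte in identifier.encode("utf-8"):
        ch = chr(byte)
        if byte < 0x21 or byte > 0x7E or ch in HEX_ENCODED:
            # Space, control, non-ASCII bytes and listed characters: e.g. '*' -> '^2a'
            out.append(f"^{byte:02x}")
        else:
            out.append(SUBSTITUTE.get(ch, ch))
    return "".join(out)


def unclean_id(cleaned: str) -> str:
    """Invert clean_id(): decode ^hh escapes and undo the single-char conversions."""
    raw = bytearray()
    i = 0
    while i < len(cleaned):
        if cleaned[i] == "^":
            raw.append(int(cleaned[i + 1:i + 3], 16))
            i += 3
        else:
            raw.extend(UNSUBSTITUTE.get(cleaned[i], cleaned[i]).encode("ascii"))
            i += 1
    return raw.decode("utf-8")


def id2ppath(identifier: str) -> str:
    """Map an identifier to its pairpath, two characters at a time."""
    cleaned = clean_id(identifier)
    pairs = [cleaned[i:i + 2] for i in range(0, len(cleaned), 2)]
    return "".join(pair + "/" for pair in pairs)


def ppath2id(ppath: str) -> str:
    """Inverse mapping: join the pairpath components, then undo the cleaning."""
    return unclean_id("".join(part for part in ppath.split("/") if part))
```

For example, id2ppath("ark:/13030/xt12t3") returns "ar/k+/=1/30/30/=x/t1/2t/3/", and ppath2id() on that string returns the original id. Hex-encoding before the single-character conversions matters: any "=", "+" or "," already present in an id is escaped first, so the conversions can be reversed unambiguously.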

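Enumerating all objects by walking a pairtree with the termination rules from "Objects in a pairtree" might look like the following Python sketch. It assumes `root` is the top of the pairpath hierarchy. It returns identifiers in their pre-converted form, so the inverse character conversion (what ppath2id() in the Perl module performs) still has to be applied. It also assumes that files found at the pairpath level are not objects. The functions walk_pairtree() and build_fig2_example() are hypothetical names; the latter just recreates the layout of Fig. 2.

```python
import os
import tempfile
from typing import Iterator, Tuple


def walk_pairtree(root: str, prefix: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (identifier, object_directory) pairs found under a pairtree root.

    Identifiers are returned in their cleaned (pre-converted) form.
    """
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name)
        if not os.path.isdir(path):
            # A file ends the pairpath; this sketch does not treat it as an object.
            continue
        if len(name) == 2:
            # Two-character directory: the pairpath continues.
            yield from walk_pairtree(path, prefix + name)
        elif len(name) == 1:
            # One-character directory: final component of an odd-length id.
            # The object is enclosed in the directories beneath it.
            for sub in sorted(os.listdir(path)):
                sub_path = os.path.join(path, sub)
                if os.path.isdir(sub_path):
                    yield prefix + name, sub_path
        elif prefix:
            # Longer name: this directory encloses the object for `prefix`;
            # do not descend, so gh/ inside foo/ is never read as a pairpath.
            yield prefix, path


def build_fig2_example(root: str) -> None:
    """Recreate the Fig. 2 layout (objects abcd and abcde) under root."""
    dirs = ["ab/cd/foo/master_images", "ab/cd/foo/gh", "ab/cd/e/bar"]
    files = ["ab/cd/foo/README.txt", "ab/cd/foo/thumbnail.gif",
             "ab/cd/e/bar/metadata", "ab/cd/e/bar/54321.wav",
             "ab/cd/e/bar/index.html"]
    for rel in dirs:
        os.makedirs(os.path.join(root, *rel.split("/")), exist_ok=True)
    for rel in files:
        open(os.path.join(root, *rel.split("/")), "w").close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        build_fig2_example(tmp)
        for ident, obj_dir in walk_pairtree(tmp):
            print(ident, os.path.relpath(obj_dir, tmp))
```

On the Fig. 2 layout this reports abcde in ab/cd/e/bar and abcd in ab/cd/foo. Because the walk stops at foo/, the gh/ directory inside it is treated as part of object abcd rather than as the start of a longer pairpath.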