Repository Development at LC
Daniel Chudnov - 2009-10-01 - dchud at loc gov
Access 2009 - Charlottetown, PEI
who we are
what we do
what’s next
who we are
30ish people
    dev, QA, PM, ops
from libs, uni, industry, etc.
OSI: Office of Strategic Initiatives
“...capture the digital artifact, register and/or deposit it for the Copyright Office, pass it along to those who decide whether to include it in the Library, and allow it to be incorporated digitally in the collection, with the optimum flow-through of information for registration, cataloging, indexing, and preservation.”
(search for “LC21”)
or, to be precise:
“capture the digital artifact”
“register and/or deposit it for the Copyright Office”
“pass it along to those who decide whether to include it in the Library”
“allow it to be incorporated digitally in the collection”
“with the optimum flow-through of information for registration, cataloging, indexing, and preservation”
(search for “LC21”)
what we do
“capture the
digital artifact”
at scale
world scale
  then
web scale
wdl.org
partners
 all over
the world
content from
  all over
 the world
users
 all over
the world
wdl.org/ru/
wdl.org/zh/
wdl.org/ar/
launched
April 2009
lots of press
9,026 req/s
1.25 Gbit/s
on day one
no crash
just bugs
  (yay!)
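Those two launch-day numbers together imply an average response size. A quick check, assuming both figures were measured over the same interval and that 1 Gbit means 10^9 bits (neither is stated on the slide):

```python
# Average response size implied by the launch-day figures.
# Assumptions: both numbers cover the same interval; 1 Gbit = 10**9 bits.
REQUESTS_PER_SECOND = 9_026
BITS_PER_SECOND = 1.25e9

bytes_per_second = BITS_PER_SECOND / 8  # 156,250,000 bytes/s
avg_bytes_per_request = bytes_per_second / REQUESTS_PER_SECOND

print(f"{avg_bytes_per_request:,.0f} bytes per request")  # 17,311 bytes per request
```

Roughly 17 KB per response on average is consistent with serving mostly cached pages and small images rather than full-size masters.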
that was
new for LC
how?
solaris
apache
 nginx
 mysql
  solr
django
jquery
clean URIs

static pages
global edge caching
what we do
capture the artifact

   pass it along

cataloging, indexing
chroniclingamerica.loc.gov
139,582 title records
1,442,462 pages
freely available now
download whole issues - tell friends - mash it up
100+ TB
16 of 50+ states/terr.
 and growing quickly
how?
solaris
apache
 mysql
 solr
django
clean URIs

page caching
capture the artifact

   pass it along

cataloging, indexing, preservation
preservation
  storage
 “movage”
capture the artifact
BagIt

packing slip
  for data
data in a Bag

.
|--   bag-info.txt
|--   bagit.txt                        <- identifies a bag
|--   data                             <- where the data starts
|     |-- batch.xml
|     |-- batch_1.xml
|     |-- batch_ne_dewitt_rework
|     |    |-- 00206538016_batch.xml
|     |    |-- 00206538028_batch.xml
|     |    `-- sn99021999
|     `-- sn99021999
|          |-- 00206538016
|          |   |-- 0000.jp2
|          |   |-- 0000.pdf
|          |   |-- 0000.tif
|          |   |-- 0000.xml
|          |   |-- 0001.jp2
|          |   |-- 0001.pdf
|          |   |-- 0001.tif
|          |   |-- 0001.xml
|          |   ...
|--   manifest-md5.txt                 <- packing slip
`--   tagmanifest-md5.txt              <- packing slip
71607ad119be88c842268a76f0b6b9e9   data/sn99021999/00206538107/1884091301/0621.pdf
c602d2ac07508059ce5f5597e239b97f   data/sn99021999/00206538120/1885100601/0831.xml
a59795bd1584532d5cbc0b1d82f75cf8   data/sn99021999/00206538016/1880061401/0593.pdf
3c64fac7e2d49671e0d93908ae42a779   data/sn99021999/00206539616/1888101801/0905.xml
03158a560baa7479b3805d2b45ee02cd   data/sn99021999/00206538028/1880111501/0405.tif
fa56ea18580e1446939ed62709e5b2db   data/sn99021999/00206538077/1883061901/1145.pdf
bf4fb83ff8305e8256970a3466c1a12d   data/sn99021999/00206538120/1885061501/0043.pdf
8f3649fc812de74b9d9443ee90a8ac9c   data/sn99021999/00206538120/1885111101/1109.tif
e0b83a7f9ca228271fdaecf6348e1cec   data/sn99021999/00206538120/1885101201/0871.xml
1c2f84e12792c123ba0aabedd0c0bbad   data/sn99021999/00206538107/1884071401/0197.xml
080e557fe9f68037605e5b80df4bc4ac   data/sn99021999/0020653820A/1888050701/0543.tif
532efe32c156459d9d9589caf618f502   data/sn99021999/00206538120/1885071401/0250.tif
ce607af59a96f2656d9448f38ffda072   data/sn99021999/0020653820A/1888052801/0731.pdf
60b626d8fd40aca1b425e86a004bb055   data/sn99021999/00206539628/1888111801/0088.xml
a467cd62350334c7aa83cf1e9056c1c6   data/sn99021999/00206539616/1888091701/0629.jp2
1a434f7a4d843a2c8ffe8d0824fafc3f   data/sn99021999/00206538028/1880120801/0482.jp2
22996d89b4a3334256afaddcaa0238d8   data/sn99021999/00206538016/1874102001/0259.jp2
36f550da273ad4c592fee1761c98322a   data/sn99021999/00206538016/1880052201/0518.jp2
7f7ccec3f2afae896338498372fd476e   data/sn99021999/00206539616/1888080101/0200.pdf
c247a5d74d0e7f857c534d935661adbe   data/sn99021999/00206538107/1884072601/0286.jp2
4d497a18a154adcc8636239378ab340b   data/sn99021999/00206539628/1889021101/0868.pdf
2e8ca2558b54b5c49b2f20a355a60895   data/sn99021999/00206538065/1882092001/0136.xml
fb71493048e5010100f18012f5060d42   data/sn99021999/00206538028/1880123001/0569.xml
40b100432890b055a5defbfbea815d57   data/sn99021999/00206538107/1884090901/0590.xml
46f6d61480dadc1c988b0baa4de8b6c4   data/sn99021999/00206539628/1888122801/0463.pdf
1cb8af0648e8c9df395b63226fe7371f   data/sn99021999/00206538016/1874101501/0244.pdf
9257834023c683b02f354888b2740b8f   data/sn99021999/00206539616/1888102301/0956.xml
0d52b3b2b1c5459b7e8d500a8566b0bf   data/sn99021999/00206538120/1885080801/0425.tif
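A manifest like the one above is just one "checksum, whitespace, path under the bag" line per payload file. Below is a minimal sketch of building such a bag in Python; `make_bag` and `write_manifest` are hypothetical helpers (not the BagIt Library's API), the `BagIt-Version` string is an assumption, and `bag-info.txt` is left out for brevity.

```python
import hashlib
import shutil
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 digest of a file, read in 1 MiB chunks so large images never load whole."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(bag_dir: Path, name: str, files: list[Path]) -> None:
    """Write one '<md5>  <path relative to the bag>' line per file (hypothetical helper)."""
    lines = [f"{md5_of(f)}  {f.relative_to(bag_dir).as_posix()}\n" for f in files]
    (bag_dir / name).write_text("".join(lines), encoding="utf-8")


def make_bag(src_dir: Path, bag_dir: Path) -> None:
    """Hypothetical helper: copy src_dir into bag_dir/data and add minimal tag files."""
    data_dir = bag_dir / "data"
    shutil.copytree(src_dir, data_dir)  # raises if data/ already exists

    # bagit.txt marks the directory as a bag; the version string is an assumption.
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8"
    )

    # manifest-md5.txt: the packing slip for every payload file under data/.
    payload = sorted(p for p in data_dir.rglob("*") if p.is_file())
    write_manifest(bag_dir, "manifest-md5.txt", payload)

    # tagmanifest-md5.txt: checksums for the tag files themselves, written last.
    write_manifest(
        bag_dir, "tagmanifest-md5.txt", [bag_dir / "bagit.txt", bag_dir / "manifest-md5.txt"]
    )
```

Copying the source tree whole keeps its subdirectories (like `sn99021999/00206538016/...`) intact under `data/`, so the manifest paths mirror the layout shown above.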
defines two things:
1. what i think i’m sending you
2. whether you received it
just like
      a
packing slip
works across
   space
works across
  systems
works across
   orgs
works across
   time
easy to make
md5deep
BIL (BagIt Library)
bvar@sun9 /ingest/bvar/test $ bag create --dest new_bag test_data/*
12:08:47,044 [main] INFO CommandLineBagDriver : Performing operation: create
2.301112941466272:2.3
12:08:47,141 [main] INFO ManifestImpl : Creating manifest for manifest-md5.txt
12:09:09,493 [main] INFO ManifestImpl : Creating manifest for tagmanifest-md5.txt
12:09:09,511 [main] INFO AbstractBagImpl : Writing bag
12:09:41,507 [main] INFO CommandLineBagDriver : Operation completed.
12:09:41,508 [main] INFO CommandLineBagDriver : Returning 0
bvar@sun9 /ingest/bvar/push/test_bag $ bag isvalid .
11:55:45,582 [main] INFO CommandLineBagDriver : Performing operation: isvalid
11:55:46,378 [main] INFO ManifestImpl : Creating manifest for manifest-md5.txt
11:55:46,458 [main] INFO ManifestImpl : Creating manifest for tagmanifest-md5.txt
11:55:46,540 [main] INFO AbstractBagImpl : Completion check: Result is true.
11:56:21,273 [main] INFO AbstractBagImpl : Validity check: Result is true.
11:56:21,273 [main] INFO CommandLineBagDriver : Result is true.
11:56:21,274 [main] INFO CommandLineBagDriver : Returning 0
bvar@sun9 /ingest/bvar/push/test_bag $
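The log above reports two separate results: a completion check and a validity check. A minimal sketch of that distinction, assuming "complete" means the manifest and `data/` list exactly the same files and "valid" means complete plus every MD5 still matches; this is an illustration, not the BagIt Library's implementation.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def read_manifest(bag_dir: Path, name: str = "manifest-md5.txt") -> dict[str, str]:
    """Map each listed path to its expected checksum."""
    entries = {}
    for line in (bag_dir / name).read_text(encoding="utf-8").splitlines():
        if line.strip():
            checksum, rel_path = line.split(maxsplit=1)
            entries[rel_path.strip()] = checksum.lower()
    return entries


def is_complete(bag_dir: Path) -> bool:
    """Completion: tag files exist and manifest and data/ list exactly the same files."""
    if not (bag_dir / "bagit.txt").is_file() or not (bag_dir / "manifest-md5.txt").is_file():
        return False
    listed = set(read_manifest(bag_dir))
    present = {
        p.relative_to(bag_dir).as_posix()
        for p in (bag_dir / "data").rglob("*")
        if p.is_file()
    }
    return listed == present


def is_valid(bag_dir: Path) -> bool:
    """Validity: the bag is complete and every payload checksum still matches."""
    if not is_complete(bag_dir):
        return False
    return all(
        md5_of(bag_dir / rel) == expected for rel, expected in read_manifest(bag_dir).items()
    )
```

Keeping the cheap completeness check separate from the expensive checksum pass is why a log like the one above can report the first result almost immediately and the second only after reading every byte.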
Bagger
free/open source releases from LC
sf.net/projects/loc-xferutils/
get yours today - tell friends - start trading bags
that was
new for LC
pass it along
transfer
inventory
workflow
transfer UI - inventory - workflow
how?
apache
spring/mvc
 hibernate
   mysql
and other
automation
 strategies
lots of
   work
still to do
lots of
integration
 still to do
register/deposit
       for
   Copyright
not my area,
    but
we hope to support
    eDeposit
 with these tools
“Deposit Demand”

     June 2009
  Federal Register
Proposed Rulemaking
stay tuned
        or
ask my colleagues :)

    (ask me whom to ask)
but, not my area
“allow it to be... incorporated digitally in the collection”
how?
traditional approach:

  catalog records
    exhibit sites
cost of
integrating everything
        is high
cost of
updating everything
      is high
cost of
consistent web strategies
         is low
for example
Linked Data
use URIs as names for things
       use HTTP URIs
 provide useful information
 include links to other URIs
 http://www.w3.org/DesignIssues/LinkedData.html
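"Use HTTP URIs" and "provide useful information" usually means one URI can answer with HTML for people and RDF or JSON for programs. One common way to pick a representation is HTTP content negotiation on the `Accept` header; the sketch below assumes the suffixes from the id.loc.gov alternates shown next, and whether id.loc.gov negotiated this way in 2009 is not stated here. `negotiate` and `REPRESENTATIONS` are hypothetical names, and wildcards such as `application/*` are not handled.

```python
from typing import Optional

# Hypothetical mapping from media type to URI suffix ("" = the HTML page itself),
# modelled on the alternates advertised on the id.loc.gov page.
REPRESENTATIONS = {
    "text/html": "",
    "application/rdf+xml": ".rdf",
    "text/plain": ".nt",
    "application/json": ".json",
}


def parse_accept(header: str) -> list[tuple[str, float]]:
    """Split an Accept header into (media type, q) pairs, highest q first."""
    prefs = []
    for part in header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        media_type, q = pieces[0].lower(), 1.0
        for param in pieces[1:]:
            if param.lower().startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        if media_type:
            prefs.append((media_type, q))
    # sorted() is stable, so equal q-values keep the client's order
    return sorted(prefs, key=lambda pref: pref[1], reverse=True)


def negotiate(resource_path: str, accept: str) -> Optional[str]:
    """Return the URI of the best available representation, or None if nothing fits."""
    for media_type, q in parse_accept(accept):
        if q <= 0:
            continue
        if media_type in REPRESENTATIONS:
            return resource_path + REPRESENTATIONS[media_type]
        if media_type == "*/*":
            return resource_path + REPRESENTATIONS["text/html"]
    return None
```

A server could then redirect (for example with 303 See Other) to the chosen URI, so every format still has its own clean, linkable address.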
id.loc.gov
LCSH
on the web
    free
clean URIs
follow
 your
 nose

formats
view source
<link rel="alternate"
  type="application/rdf+xml"
  href="/authorities/sh00009460.rdf" />
<link rel="alternate"
  type="text/plain"
  href="/authorities/sh00009460.nt" />
<link rel="alternate"
  type="application/json"
  href="/authorities/sh00009460.json" />
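Those `<link rel="alternate">` elements are what lets a client "follow its nose" from the HTML page to machine-readable formats. A minimal sketch using only the Python standard library; `find_alternates` is a hypothetical helper, and the page URL used in the example is an assumption based on the `rdf:about` URIs in the RDF below.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AlternateLinkParser(HTMLParser):
    """Collect (type, href) pairs from <link rel="alternate"> elements."""

    def __init__(self) -> None:
        super().__init__()
        self.alternates: list[tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; self-closing <link ... />
        # tags also arrive here via the default handle_startendtag.
        if tag != "link":
            return
        attributes = dict(attrs)
        rels = (attributes.get("rel") or "").lower().split()
        href = attributes.get("href")
        if "alternate" in rels and href:
            self.alternates.append((attributes.get("type") or "", href))


def find_alternates(html: str, page_url: str) -> dict[str, str]:
    """Return {media type: absolute URL} for each advertised alternate format."""
    parser = AlternateLinkParser()
    parser.feed(html)
    parser.close()
    return {media_type: urljoin(page_url, href) for media_type, href in parser.alternates}


LINKS = """<link rel="alternate"
  type="application/rdf+xml"
  href="/authorities/sh00009460.rdf" />
<link rel="alternate"
  type="text/plain"
  href="/authorities/sh00009460.nt" />
<link rel="alternate"
  type="application/json"
  href="/authorities/sh00009460.json" />"""

# The page URL is an assumption for this example.
formats = find_alternates(LINKS, "http://id.loc.gov/authorities/sh00009460")
```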
<rdf:RDF>
 <rdf:Description rdf:about="http://id.loc.gov/authorities/sh00009460#concept">
  <dcterms:modified rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2000-11-27T10:39:57-04:00</dcterms:modified>
  <skos:prefLabel xml:lang="en">National parks and reserves--Prince Edward Island</skos:prefLabel>
  <owl:sameAs rdf:resource="info:lc/authorities/sh00009460"/>
  <rdf:type rdf:resource="http://www.w3.org/2004/02/skos/core#Concept"/>
  <skos:inScheme rdf:resource="http://id.loc.gov/authorities#conceptScheme"/>
  <skos:inScheme rdf:resource="http://id.loc.gov/authorities#topicalTerms"/>
  <dcterms:created rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2000-10-17T00:00:00-04:00</dcterms:created>
  <skos:narrower rdf:resource="http://id.loc.gov/authorities/sh2002010534#concept"/>
  <skos:narrower rdf:resource="http://id.loc.gov/authorities/sh2008004743#concept"/>
  <skos:narrower rdf:resource="http://id.loc.gov/authorities/sh2003002637#concept"/>
  <skos:narrower rdf:resource="http://id.loc.gov/authorities/sh00009458#concept"/>
 </rdf:Description>
 <rdf:Description rdf:about="http://id.loc.gov/authorities/sh2002010534#concept">
  <skos:prefLabel xml:lang="en">Prince Edward Island National Park (P.E.I.)</skos:prefLabel>
 </rdf:Description>
</rdf:RDF>
                          explicit concepts, schema, meaning
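Because the concepts and their relationships are explicit, a program can read them without any LC-specific API. A minimal sketch with `xml.etree.ElementTree`; the slide's excerpt omits its namespace declarations, so the sample below adds the standard W3C RDF and SKOS namespace URIs, and `summarize_concepts` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}
RDF_ABOUT = f"{{{NS['rdf']}}}about"
RDF_RESOURCE = f"{{{NS['rdf']}}}resource"


def summarize_concepts(rdf_xml: str) -> dict[str, dict]:
    """Map each described concept URI to its preferred label and narrower concepts."""
    root = ET.fromstring(rdf_xml)
    concepts: dict[str, dict] = {}
    for desc in root.findall("rdf:Description", NS):
        entry = concepts.setdefault(desc.get(RDF_ABOUT), {"prefLabel": None, "narrower": []})
        label = desc.find("skos:prefLabel", NS)
        if label is not None and label.text:
            entry["prefLabel"] = " ".join(label.text.split())  # collapse line breaks
        entry["narrower"] += [n.get(RDF_RESOURCE) for n in desc.findall("skos:narrower", NS)]
    return concepts


# Trimmed copy of the slide's RDF/XML, with the namespace declarations it omits.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
 <rdf:Description rdf:about="http://id.loc.gov/authorities/sh00009460#concept">
  <skos:prefLabel xml:lang="en">National parks and reserves--Prince Edward Island</skos:prefLabel>
  <skos:narrower rdf:resource="http://id.loc.gov/authorities/sh2002010534#concept"/>
  <skos:narrower rdf:resource="http://id.loc.gov/authorities/sh00009458#concept"/>
 </rdf:Description>
 <rdf:Description rdf:about="http://id.loc.gov/authorities/sh2002010534#concept">
  <skos:prefLabel xml:lang="en">Prince Edward Island National Park (P.E.I.)
  </skos:prefLabel>
 </rdf:Description>
</rdf:RDF>"""

concepts = summarize_concepts(SAMPLE)
```

Each `narrower` URI is itself a concept with its own page on id.loc.gov, so a client can keep dereferencing them: follow your nose.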
a web of data...
...with precise meaning
at this URI
   is this
concept
 with this
meaning
a standard way
  to refer to
   a heading
freely available now
download the whole thing - tell friends - amaze enemies
that was
new for LC
another example
<link rel="resourcemap"
  type="application/rdf+xml" href="/lccn/sn83030214/1905-01-15/ed-1/seq-25.rdf" />
<link rel="alternate"
  type="image/jp2" href="/lccn/sn83030214/1905-01-15/ed-1/seq-25.jp2" />
<link rel="alternate"
  type="application/pdf" href="/lccn/sn83030214/1905-01-15/ed-1/seq-25.pdf" />
<link rel="alternate"
  type="application/xml" href="/lccn/sn83030214/1905-01-15/ed-1/seq-25/ocr.xml" />
<link rel="alternate"
  type="text/plain" href="/lccn/sn83030214/1905-01-15/ed-1/seq-25/ocr.txt" />
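These clean URIs follow a visible pattern: title LCCN, issue date, edition, page sequence, then a format suffix. A small sketch that rebuilds the links above from those four parts; `page_path`, `page_links` and the suffix table are hypothetical, inferred only from the five links shown.

```python
from datetime import date

# Format suffixes as they appear in the <link> elements above.
PAGE_FORMATS = {
    "application/rdf+xml": ".rdf",
    "image/jp2": ".jp2",
    "application/pdf": ".pdf",
    "application/xml": "/ocr.xml",
    "text/plain": "/ocr.txt",
}


def page_path(lccn: str, issued: date, edition: int, sequence: int) -> str:
    """Hypothetical helper: the clean URI path for one newspaper page."""
    return f"/lccn/{lccn}/{issued.isoformat()}/ed-{edition}/seq-{sequence}"


def page_links(lccn: str, issued: date, edition: int, sequence: int) -> dict[str, str]:
    """Map each media type to the URI path of that representation of the page."""
    base = page_path(lccn, issued, edition, sequence)
    return {media_type: base + suffix for media_type, suffix in PAGE_FORMATS.items()}


links = page_links("sn83030214", date(1905, 1, 15), 1, 25)
print(links["image/jp2"])  # /lccn/sn83030214/1905-01-15/ed-1/seq-25.jp2
```

Taking a `date` rather than a string keeps the issue-date segment in the zero-padded `YYYY-MM-DD` form the URIs use.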
<rdf:Description rdf:about="/lccn/sn83030214/1905-01-15/ed-1/seq-25#page">
    <ore:isDescribedBy rdf:resource="/lccn/sn83030214/1905-01-15/ed-1/seq-25.rdf"/>
    <foaf:depiction rdf:resource="/lccn/sn83030214/1905-01-15/ed-1/seq-25/thumbnail.jpg"/>
    <ore:aggregates rdf:resource="/lccn/sn83030214/1905-01-15/ed-1/seq-25.jp2"/>
    <ore:aggregates rdf:resource="/lccn/sn83030214/1905-01-15/ed-1/seq-25/ocr.txt"/>
    <ore:aggregates rdf:resource="/lccn/sn83030214/1905-01-15/ed-1/seq-25.pdf"/>
    <ore:aggregates rdf:resource="/lccn/sn83030214/1905-01-15/ed-1/seq-25/ocr.xml"/>
    <ore:aggregates rdf:resource="/lccn/sn83030214/1905-01-15/ed-1/seq-25/thumbnail.jpg"/>
    <rdf:type rdf:resource="http://chroniclingamerica.loc.gov/terms#Page"/>
    <ore:isAggregatedBy rdf:resource="/lccn/sn83030214/1905-01-15/ed-1#issue"/>
    <dcterms:issued rdf:datatype="http://www.w3.org/2001/XMLSchema#date">1905-01-15</dcterms:issued>
    <ndnp:sequence rdf:datatype="http://www.w3.org/2001/XMLSchema#long">25</ndnp:sequence>
    <dcterms:title>New-York tribune. - 1905-01-15 - 25</dcterms:title>
</rdf:Description>
OAI-ORE
aggregation
this is a
  page
it has these
files in these formats
it is this
sequence number
it is
part of this issue
it has this
issue date
it has this
    title
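Each of those statements can be read straight out of the page's ORE description. A minimal sketch with `xml.etree.ElementTree`, using the standard RDF, OAI-ORE and DCMI Terms namespace URIs (the slide's excerpt omits its declarations); the namespace URI behind the `ndnp:` prefix isn't shown, so this sketch skips the sequence number. `describe_page` is a hypothetical helper, and it leaves the relative URIs unresolved, whereas a full RDF parser would resolve them against the document's base URI.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "ore": "http://www.openarchives.org/ore/terms/",
    "dcterms": "http://purl.org/dc/terms/",
}
RDF_ABOUT = f"{{{NS['rdf']}}}about"
RDF_RESOURCE = f"{{{NS['rdf']}}}resource"


def describe_page(rdf_xml: str) -> dict:
    """Pull 'this is a page / these files / this issue / date / title' out of the RDF."""
    root = ET.fromstring(rdf_xml)
    page = root.find("rdf:Description", NS)
    issue = page.find("ore:isAggregatedBy", NS)
    return {
        "page": page.get(RDF_ABOUT),
        "files": [agg.get(RDF_RESOURCE) for agg in page.findall("ore:aggregates", NS)],
        "issue": issue.get(RDF_RESOURCE) if issue is not None else None,
        "issued": page.findtext("dcterms:issued", namespaces=NS),
        "title": page.findtext("dcterms:title", namespaces=NS),
    }


# Trimmed copy of the page description, with namespace declarations added.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:ore="http://www.openarchives.org/ore/terms/"
  xmlns:dcterms="http://purl.org/dc/terms/">
 <rdf:Description rdf:about="/lccn/sn83030214/1905-01-15/ed-1/seq-25#page">
  <ore:aggregates rdf:resource="/lccn/sn83030214/1905-01-15/ed-1/seq-25.jp2"/>
  <ore:aggregates rdf:resource="/lccn/sn83030214/1905-01-15/ed-1/seq-25/ocr.txt"/>
  <ore:aggregates rdf:resource="/lccn/sn83030214/1905-01-15/ed-1/seq-25.pdf"/>
  <ore:isAggregatedBy rdf:resource="/lccn/sn83030214/1905-01-15/ed-1#issue"/>
  <dcterms:issued rdf:datatype="http://www.w3.org/2001/XMLSchema#date">1905-01-15</dcterms:issued>
  <dcterms:title>New-York tribune. - 1905-01-15 - 25</dcterms:title>
 </rdf:Description>
</rdf:RDF>"""

page = describe_page(SAMPLE)
```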
all explicit concepts
all exposed
 in the app
on the web
that was
new for LC
the web
is the API
there’s an API doc...
...it’s just a
bunch of links
“...make resources available and useful...”
from the mission of the Library
“allow it to be... incorporated digitally in the collection”
from the LC21 report
“...sustain and preserve a universal collection...”
from the mission of the Library
each app
consistent
  about
 meaning
follow your nose
        to
concept definitions
in our apps
and in yours
distributed
conceptual
integration
the web is a
universal collection
this is a way to
incorporate digitally
our digital artifacts
   on our web
your digital artifacts
   in your web
our digital artifacts
   in your web
your digital artifacts
    in our web
available
   &
 useful
  &c.
summary
content that scales
  on the way in
apps that scale
on the way out
movage
movage
movage
transfer
  inventory
  workflow

all in active development
the BagIt spec
try it - it works
free/open source
software releases
free data
you can use
web of data
available and useful
view source:

           wdl.org
 chroniclingamerica.loc.gov
          id.loc.gov
sf.net/projects/loc-xferutils/

   dchud at loc gov - @dchud

Contenu connexe

Similaire à Repository Development at LC - Access 2009

Git 101 tutorial presentation
Git 101 tutorial presentationGit 101 tutorial presentation
Git 101 tutorial presentation
Terry Wang
 

Similaire à Repository Development at LC - Access 2009 (7)

Leinfelder Earth Grid Jam2008
Leinfelder Earth Grid Jam2008Leinfelder Earth Grid Jam2008
Leinfelder Earth Grid Jam2008
 
eFileCabinet Manual Version 4.0
eFileCabinet Manual Version 4.0eFileCabinet Manual Version 4.0
eFileCabinet Manual Version 4.0
 
Git 101 tutorial presentation
Git 101 tutorial presentationGit 101 tutorial presentation
Git 101 tutorial presentation
 
Datele in biblioteca noi servicii / Bibliotheken als Datenzentren: ein Einbli...
Datele in biblioteca noi servicii / Bibliotheken als Datenzentren: ein Einbli...Datele in biblioteca noi servicii / Bibliotheken als Datenzentren: ein Einbli...
Datele in biblioteca noi servicii / Bibliotheken als Datenzentren: ein Einbli...
 
Icinga Camp Antwerp - Current State of Icinga
Icinga Camp Antwerp - Current State of IcingaIcinga Camp Antwerp - Current State of Icinga
Icinga Camp Antwerp - Current State of Icinga
 
iMarine Products and Services delivery
iMarine Products and Services deliveryiMarine Products and Services delivery
iMarine Products and Services delivery
 
EUGM 2013 - Anh Kiet Tran Minh (CNRS): French Academic Compound Library: the ...
EUGM 2013 - Anh Kiet Tran Minh (CNRS): French Academic Compound Library: the ...EUGM 2013 - Anh Kiet Tran Minh (CNRS): French Academic Compound Library: the ...
EUGM 2013 - Anh Kiet Tran Minh (CNRS): French Academic Compound Library: the ...
 

Plus de Dan Chudnov

Capturing the Ephemeral: Collecting Social Media with Social Feed Manager
Capturing the Ephemeral: Collecting Social Media with Social Feed ManagerCapturing the Ephemeral: Collecting Social Media with Social Feed Manager
Capturing the Ephemeral: Collecting Social Media with Social Feed Manager
Dan Chudnov
 
think locally, code globally - dchud's code4lib japan 2013 talk
think locally, code globally - dchud's code4lib japan 2013 talkthink locally, code globally - dchud's code4lib japan 2013 talk
think locally, code globally - dchud's code4lib japan 2013 talk
Dan Chudnov
 

Plus de Dan Chudnov (15)

Overview of Adaptive Blocking for DDL Research Lab
Overview of Adaptive Blocking for DDL Research LabOverview of Adaptive Blocking for DDL Research Lab
Overview of Adaptive Blocking for DDL Research Lab
 
stuff i'm learning in data school
stuff i'm learning in data schoolstuff i'm learning in data school
stuff i'm learning in data school
 
Capturing the Ephemeral: Collecting Social Media with Social Feed Manager
Capturing the Ephemeral: Collecting Social Media with Social Feed ManagerCapturing the Ephemeral: Collecting Social Media with Social Feed Manager
Capturing the Ephemeral: Collecting Social Media with Social Feed Manager
 
think locally, code globally - dchud's code4lib japan 2013 talk
think locally, code globally - dchud's code4lib japan 2013 talkthink locally, code globally - dchud's code4lib japan 2013 talk
think locally, code globally - dchud's code4lib japan 2013 talk
 
what i want from linked data
what i want from linked datawhat i want from linked data
what i want from linked data
 
collecting twitter data w/social feed manager
collecting twitter data w/social feed managercollecting twitter data w/social feed manager
collecting twitter data w/social feed manager
 
web archiving tools and technologies
web archiving tools and technologiesweb archiving tools and technologies
web archiving tools and technologies
 
20121018 Access "social feed manager"
20121018 Access "social feed manager"20121018 Access "social feed manager"
20121018 Access "social feed manager"
 
WWIC - Library Linked Data as a Customer Service Medium
WWIC - Library Linked Data as a Customer Service MediumWWIC - Library Linked Data as a Customer Service Medium
WWIC - Library Linked Data as a Customer Service Medium
 
introduction to Django in five slides
introduction to Django in five slides introduction to Django in five slides
introduction to Django in five slides
 
Linking Library Data on the Web
Linking Library Data on the WebLinking Library Data on the Web
Linking Library Data on the Web
 
Hacker 102 - regexes w/Javascript, Python
Hacker 102 - regexes w/Javascript, PythonHacker 102 - regexes w/Javascript, Python
Hacker 102 - regexes w/Javascript, Python
 
Hacker102 - RegExes w/JavaScript and Python
Hacker102 - RegExes w/JavaScript and PythonHacker102 - RegExes w/JavaScript and Python
Hacker102 - RegExes w/JavaScript and Python
 
Hacker 101/102 - Introduction to Programming w/Processing
Hacker 101/102 - Introduction to Programming w/ProcessingHacker 101/102 - Introduction to Programming w/Processing
Hacker 101/102 - Introduction to Programming w/Processing
 
TCDL 2009 keynote: Better living through linking
TCDL 2009 keynote: Better living through linkingTCDL 2009 keynote: Better living through linking
TCDL 2009 keynote: Better living through linking
 

Dernier

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Dernier (20)

Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
JohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptx
 

Repository Development at LC - Access 2009

  • 1. Repository Development at LC Daniel Chudnov - 2009-10-01 - dchud at loc gov Access 2009 - Charlottetown, PEI
  • 2. who we are what we do what’s next
  • 4. 30ish people dev, QA, PM, ops from libs, uni, industry, etc.
  • 5. OSI Office of Strategic Initiatives
  • 6. “...capture the digital artifact, register and/or deposit it for the Copyright Office, pass it along to those who decide whether to include it in the Library, and allow it to be incorporated digitally in the collection, with the optimum flow-through of information for registration, cataloging, indexing, and preservation.” (search for “LC21”)
  • 7. or, to be precise
  • 8. capture the “ digital artifact register and/or deposit it for the Copyright Office, pass , it along to those who decide whether to include it in the Library, and allow it to be incorporated digitally in the collection, with the optimum flow-through of information for registration, cataloging, indexing, and preservation.” (search for “LC21”)
  • 9. “capture the digital artifact, register and/or deposit it for the Copyright Office , pass it along to those who decide whether to include it in the Library, and allow it to be incorporated digitally in the collection, with the optimum flow-through of information for registration, cataloging, indexing, and preservation.” (search for “LC21”)
  • 10. “capture the digital artifact, register and/or deposit it for the Copyright Office, pass it along to those who decide whether to include it in the Library, and allow it to be incorporated digitally in the collection, with the optimum flow-through of information for registration, cataloging, indexing, and preservation.” (search for “LC21”)
  • 11. “capture the digital artifact, register and/or deposit it for the Copyright Office, pass it along to those who decide whether to include it in the Library, and allow it to be incorporated digitally in the collection with the optimum flow-through of information for , registration, cataloging, indexing, and preservation.” (search for “LC21”)
  • 12. “capture the digital artifact, register and/or deposit it for the Copyright Office, pass it along to those who decide whether to include it in the Library, and allow it to be incorporated digitally in the collection, with the optimum flow-through of information for registration, cataloging, indexing, and preservation .” (search for “LC21”)
  • 16. world scale then web scale
  • 19. content from all over the world
  • 29. how?
  • 30. solaris apache nginx mysql solr django jquery
  • 34. capture the artifact pass it along cataloging, indexing
  • 38. freely available now download whole issues - tell friends - mash it up
  • 39. 100+ TB 16 of 50+ states/terr. and growing quickly
  • 40. how?
  • 43. capture the artifact pass it along cataloging, indexing, preservation
  • 44. preservation storage “movage”
  • 46. BagIt packing slip for data
  • 47-50. data in a Bag
    .
    |-- bag-info.txt
    |-- bagit.txt              <- identifies a bag
    |-- data                   <- where the data starts
    |   |-- batch.xml
    |   |-- batch_1.xml
    |   |-- batch_ne_dewitt_rework
    |   |   |-- 00206538016_batch.xml
    |   |   |-- 00206538028_batch.xml
    |   |   `-- sn99021999
    |   `-- sn99021999
    |       |-- 00206538016
    |       |   |-- 0000.jp2
    |       |   |-- 0000.pdf
    |       |   |-- 0000.tif
    |       |   |-- 0000.xml
    |       |   |-- 0001.jp2
    |       |   |-- 0001.pdf
    |       |   |-- 0001.tif
    |       |   |-- 0001.xml
    |       ...
    |-- manifest-md5.txt       <- packing slip
    `-- tagmanifest-md5.txt    <- packing slip
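The bag layout above can be produced by a short script. What follows is a minimal Python sketch, not LC's own tooling: `make_bag`, `write_manifest`, and `md5_of` are hypothetical helpers, and the `BagIt-Version: 1.0` declaration is an assumption (the spec drafts of 2009 used lower version numbers, so match whatever your validator expects). It copies the payload under `data/`, writes `bagit.txt` and `bag-info.txt`, lists every payload file with its MD5 in `manifest-md5.txt`, and lists the tag files in `tagmanifest-md5.txt`.

```python
import hashlib
import os
import shutil
import tempfile


def md5_of(path):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def write_manifest(bag_dir, manifest_name, rel_paths):
    """Write one '<md5> <relative path>' line per file, sorted by path."""
    with open(os.path.join(bag_dir, manifest_name), "w", encoding="utf-8", newline="\n") as out:
        for rel in sorted(rel_paths):
            out.write(f"{md5_of(os.path.join(bag_dir, rel))} {rel}\n")


def make_bag(src_dir, bag_dir, info=None):
    """Copy src_dir into bag_dir/data and write the tag files (hypothetical helper)."""
    data_dir = os.path.join(bag_dir, "data")
    shutil.copytree(src_dir, data_dir)  # the payload lives under data/
    # bagit.txt identifies a bag; the version declared here is an assumption
    with open(os.path.join(bag_dir, "bagit.txt"), "w", encoding="utf-8", newline="\n") as f:
        f.write("BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n")
    # bag-info.txt holds optional "Label: value" metadata
    with open(os.path.join(bag_dir, "bag-info.txt"), "w", encoding="utf-8", newline="\n") as f:
        for key, value in (info or {}).items():
            f.write(f"{key}: {value}\n")
    # manifest-md5.txt is the packing slip for every payload file
    payload = []
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            rel = os.path.relpath(os.path.join(root, name), bag_dir)
            payload.append(rel.replace(os.sep, "/"))
    write_manifest(bag_dir, "manifest-md5.txt", payload)
    # tagmanifest-md5.txt covers the tag files themselves
    write_manifest(bag_dir, "tagmanifest-md5.txt",
                   ["bagit.txt", "bag-info.txt", "manifest-md5.txt"])
    return sorted(payload)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "test_data")
        os.makedirs(os.path.join(src, "sn99021999"))
        with open(os.path.join(src, "batch.xml"), "wb") as f:
            f.write(b"<batch/>\n")
        print(make_bag(src, os.path.join(tmp, "new_bag")))
```

Calling `make_bag("test_data", "new_bag")` is loosely analogous to the `bag create --dest new_bag test_data/*` command shown later in the talk, though the real tool's behavior and options may differ.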
  • 51.
    71607ad119be88c842268a76f0b6b9e9 data/sn99021999/00206538107/1884091301/0621.pdf
    c602d2ac07508059ce5f5597e239b97f data/sn99021999/00206538120/1885100601/0831.xml
    a59795bd1584532d5cbc0b1d82f75cf8 data/sn99021999/00206538016/1880061401/0593.pdf
    3c64fac7e2d49671e0d93908ae42a779 data/sn99021999/00206539616/1888101801/0905.xml
    03158a560baa7479b3805d2b45ee02cd data/sn99021999/00206538028/1880111501/0405.tif
    fa56ea18580e1446939ed62709e5b2db data/sn99021999/00206538077/1883061901/1145.pdf
    bf4fb83ff8305e8256970a3466c1a12d data/sn99021999/00206538120/1885061501/0043.pdf
    8f3649fc812de74b9d9443ee90a8ac9c data/sn99021999/00206538120/1885111101/1109.tif
    e0b83a7f9ca228271fdaecf6348e1cec data/sn99021999/00206538120/1885101201/0871.xml
    1c2f84e12792c123ba0aabedd0c0bbad data/sn99021999/00206538107/1884071401/0197.xml
    080e557fe9f68037605e5b80df4bc4ac data/sn99021999/0020653820A/1888050701/0543.tif
    532efe32c156459d9d9589caf618f502 data/sn99021999/00206538120/1885071401/0250.tif
    ce607af59a96f2656d9448f38ffda072 data/sn99021999/0020653820A/1888052801/0731.pdf
    60b626d8fd40aca1b425e86a004bb055 data/sn99021999/00206539628/1888111801/0088.xml
    a467cd62350334c7aa83cf1e9056c1c6 data/sn99021999/00206539616/1888091701/0629.jp2
    1a434f7a4d843a2c8ffe8d0824fafc3f data/sn99021999/00206538028/1880120801/0482.jp2
    22996d89b4a3334256afaddcaa0238d8 data/sn99021999/00206538016/1874102001/0259.jp2
    36f550da273ad4c592fee1761c98322a data/sn99021999/00206538016/1880052201/0518.jp2
    7f7ccec3f2afae896338498372fd476e data/sn99021999/00206539616/1888080101/0200.pdf
    c247a5d74d0e7f857c534d935661adbe data/sn99021999/00206538107/1884072601/0286.jp2
    4d497a18a154adcc8636239378ab340b data/sn99021999/00206539628/1889021101/0868.pdf
    2e8ca2558b54b5c49b2f20a355a60895 data/sn99021999/00206538065/1882092001/0136.xml
    fb71493048e5010100f18012f5060d42 data/sn99021999/00206538028/1880123001/0569.xml
    40b100432890b055a5defbfbea815d57 data/sn99021999/00206538107/1884090901/0590.xml
    46f6d61480dadc1c988b0baa4de8b6c4 data/sn99021999/00206539628/1888122801/0463.pdf
    1cb8af0648e8c9df395b63226fe7371f data/sn99021999/00206538016/1874101501/0244.pdf
    9257834023c683b02f354888b2740b8f data/sn99021999/00206539616/1888102301/0956.xml
    0d52b3b2b1c5459b7e8d500a8566b0bf data/sn99021999/00206538120/1885080801/0425.tif
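Each manifest line pairs a checksum with a path relative to the bag root, so the packing slip is easy to read programmatically. A hedged Python sketch, assuming one MD5 digest and one `data/...` path per line (tag manifests, which list files outside `data/`, would need a looser pattern); `parse_manifest` is a hypothetical helper, and the two sample lines are copied from the slide.

```python
import re

# One manifest line: a 32-hex-digit MD5 digest, whitespace, then a payload path.
MANIFEST_LINE = re.compile(r"^([0-9a-fA-F]{32})\s+(data/.+)$")


def parse_manifest(text):
    """Map each payload path to its expected (lowercased) MD5 digest.

    Raises ValueError for a malformed line or a path that is listed twice.
    """
    entries = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        m = MANIFEST_LINE.match(line)
        if m is None:
            raise ValueError(f"line {lineno}: expected '<md5> data/<path>': {line!r}")
        digest, path = m.group(1).lower(), m.group(2)
        if path in entries:
            raise ValueError(f"line {lineno}: duplicate entry for {path}")
        entries[path] = digest
    return entries


SAMPLE = """\
71607ad119be88c842268a76f0b6b9e9 data/sn99021999/00206538107/1884091301/0621.pdf
c602d2ac07508059ce5f5597e239b97f data/sn99021999/00206538120/1885100601/0831.xml
"""

if __name__ == "__main__":
    for path, digest in parse_manifest(SAMPLE).items():
        print(digest, path)
```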
  • 53. what i think i’m sending you
  • 55. just like a packing slip
  • 56. works across space
  • 57. works across systems
  • 58. works across orgs
  • 59. works across time
  • 63.
    bvar@sun9 /ingest/bvar/test $ bag create --dest new_bag test_data/*
    12:08:47,044 [main] INFO CommandLineBagDriver : Performing operation: create
    2.301112941466272:2.3
    12:08:47,141 [main] INFO ManifestImpl : Creating manifest for manifest-md5.txt
    12:09:09,493 [main] INFO ManifestImpl : Creating manifest for tagmanifest-md5.txt
    12:09:09,511 [main] INFO AbstractBagImpl : Writing bag
    12:09:41,507 [main] INFO CommandLineBagDriver : Operation completed.
    12:09:41,508 [main] INFO CommandLineBagDriver : Returning 0
    bvar@sun9 /ingest/bvar/push/test_bag $ bag isvalid .
    11:55:45,582 [main] INFO CommandLineBagDriver : Performing operation: isvalid
    11:55:46,378 [main] INFO ManifestImpl : Creating manifest for manifest-md5.txt
    11:55:46,458 [main] INFO ManifestImpl : Creating manifest for tagmanifest-md5.txt
    11:55:46,540 [main] INFO AbstractBagImpl : Completion check: Result is true.
    11:56:21,273 [main] INFO AbstractBagImpl : Validity check: Result is true.
    11:56:21,273 [main] INFO CommandLineBagDriver : Result is true.
    11:56:21,274 [main] INFO CommandLineBagDriver : Returning 0
    bvar@sun9 /ingest/bvar/push/test_bag $
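The log reports two separate results: a completion check and a validity check. Roughly, a bag is complete when every file listed in the manifest is present and nothing unlisted sits under `data/`, and valid when it is also complete and every checksum matches. A minimal Python sketch of those two checks, assuming only `manifest-md5.txt` is consulted (the tag manifest is not checked here); `check_bag` is a hypothetical helper, not the API of LC's tool, and the BagIt spec's full rules contain more detail.

```python
import hashlib
import os


def _md5(path):
    """Hex MD5 digest of a file, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def _read_manifest(path):
    """Parse '<digest> <path>' lines into {path: digest}."""
    entries = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                digest, rel = line.rstrip("\r\n").split(None, 1)
                entries[rel] = digest.lower()
    return entries


def check_bag(bag_dir):
    """Return (complete, valid, problems) for a bag using manifest-md5.txt."""
    for tag_file in ("bagit.txt", "manifest-md5.txt"):
        if not os.path.isfile(os.path.join(bag_dir, tag_file)):
            return False, False, [f"missing {tag_file}"]
    manifest = _read_manifest(os.path.join(bag_dir, "manifest-md5.txt"))
    on_disk = set()
    for root, _dirs, files in os.walk(os.path.join(bag_dir, "data")):
        for name in files:
            rel = os.path.relpath(os.path.join(root, name), bag_dir)
            on_disk.add(rel.replace(os.sep, "/"))
    # Completion check: listed files are present and nothing unlisted is in data/.
    missing = sorted(set(manifest) - on_disk)
    extra = sorted(on_disk - set(manifest))
    problems = [f"missing: {p}" for p in missing]
    problems += [f"not in manifest: {p}" for p in extra]
    complete = not missing and not extra
    # Validity check: the bag is complete and every present file's digest matches.
    mismatched = [p for p in sorted(set(manifest) & on_disk)
                  if _md5(os.path.join(bag_dir, p)) != manifest[p]]
    problems += [f"checksum mismatch: {p}" for p in mismatched]
    return complete, complete and not mismatched, problems
```

Keeping the two results separate mirrors the log: a bag can be complete yet invalid when a file was altered after the manifest was written.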
  • 65. free/open source releases from LC
  • 66. sf.net/projects/loc-xferutils/ - get yours today - tell friends - start trading bags
  • 70. transfer UI - inventory - workflow
  • 71. how?
  • 74. lots of work still to do
  • 76. register/deposit for Copyright
  • 78. we hope to support eDeposit with these tools
  • 79. “Deposit Demand”: June 2009 Federal Register, Proposed Rulemaking
  • 80. stay tuned or ask my colleagues :) (ask me whom to ask)
  • 81. but, not my area
  • 82. “allow it to be... incorporated digitally in the collection”
  • 84. how?
  • 85. traditional approach: catalog records, exhibit sites
  • 88. cost of consistent web strategies is low
  • 91. use URIs as names for things; use HTTP URIs; provide useful information; include links to other URIs (http://www.w3.org/DesignIssues/LinkedData.html)
  • 95. clean URIs; follow your nose; formats
  • 97.
    <link rel="alternate" type="application/rdf+xml" href="/authorities/sh00009460.rdf" />
    <link rel="alternate" type="text/plain" href="/authorities/sh00009460.nt" />
    <link rel="alternate" type="application/json" href="/authorities/sh00009460.json" />
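These `<link rel="alternate">` elements let a client follow its nose from the HTML page to the RDF/XML, N-Triples (`.nt`), and JSON versions. A minimal Python sketch using only the standard library's `html.parser` and `urllib.parse.urljoin`; the page URL `http://id.loc.gov/authorities/sh00009460` is an assumption, since the slide shows only relative `href` values, and `alternate_formats` is a hypothetical helper.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AlternateLinkParser(HTMLParser):
    """Collect <link rel="alternate" type=... href=...> elements from a page."""

    def __init__(self):
        super().__init__()
        self.links = []  # (media type, href) pairs, in document order

    def handle_starttag(self, tag, attrs):
        # HTMLParser routes self-closing "<link ... />" here too, by default.
        if tag != "link":
            return
        a = dict(attrs)
        rels = (a.get("rel") or "").lower().split()
        if "alternate" in rels and a.get("type") and a.get("href"):
            self.links.append((a["type"], a["href"]))


def alternate_formats(html_text, page_url):
    """Return {media type: absolute URL} for each alternate representation."""
    parser = AlternateLinkParser()
    parser.feed(html_text)
    parser.close()
    return {mime: urljoin(page_url, href) for mime, href in parser.links}


PAGE = """<head>
<link rel="alternate" type="application/rdf+xml" href="/authorities/sh00009460.rdf" />
<link rel="alternate" type="text/plain" href="/authorities/sh00009460.nt" />
<link rel="alternate" type="application/json" href="/authorities/sh00009460.json" />
</head>"""

if __name__ == "__main__":
    for mime, url in alternate_formats(PAGE, "http://id.loc.gov/authorities/sh00009460").items():
        print(mime, url)
```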
  • 98-99.
    <rdf:RDF>
      <rdf:Description rdf:about="http://id.loc.gov/authorities/sh00009460#concept">
        <dcterms:modified rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2000-11-27T10:39:57-04:00</dcterms:modified>
        <skos:prefLabel xml:lang="en">National parks and reserves--Prince Edward Island</skos:prefLabel>
        <owl:sameAs rdf:resource="info:lc/authorities/sh00009460"/>
        <rdf:type rdf:resource="http://www.w3.org/2004/02/skos/core#Concept"/>
        <skos:inScheme rdf:resource="http://id.loc.gov/authorities#conceptScheme"/>
        <skos:inScheme rdf:resource="http://id.loc.gov/authorities#topicalTerms"/>
        <dcterms:created rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2000-10-17T00:00:00-04:00</dcterms:created>
        <skos:narrower rdf:resource="http://id.loc.gov/authorities/sh2002010534#concept"/>
        <skos:narrower rdf:resource="http://id.loc.gov/authorities/sh2008004743#concept"/>
        <skos:narrower rdf:resource="http://id.loc.gov/authorities/sh2003002637#concept"/>
        <skos:narrower rdf:resource="http://id.loc.gov/authorities/sh00009458#concept"/>
      </rdf:Description>
      <rdf:Description rdf:about="http://id.loc.gov/authorities/sh2002010534#concept">
        <skos:prefLabel xml:lang="en">Prince Edward Island National Park (P.E.I.)</skos:prefLabel>
      </rdf:Description>
    explicit concepts, schema, meaning
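A client can read the concept, its preferred label, and its narrower concepts straight out of this RDF/XML. The Python sketch below uses `xml.etree.ElementTree` and only understands `rdf:Description` blocks shaped like the slide's; a general RDF parser would be the safer choice for real data. The slide's excerpt omits its namespace declarations, so the sample adds the standard RDF and SKOS namespace URIs and trims the record; `summarize_concepts` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs; the slide's excerpt leaves out its xmlns declarations.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}
RDF_ABOUT = "{%s}about" % NS["rdf"]
RDF_RESOURCE = "{%s}resource" % NS["rdf"]


def summarize_concepts(rdf_xml):
    """Return {concept URI: {"label": str or None, "narrower": [URI, ...]}}."""
    root = ET.fromstring(rdf_xml)
    concepts = {}
    for desc in root.findall("rdf:Description", NS):
        entry = concepts.setdefault(desc.get(RDF_ABOUT), {"label": None, "narrower": []})
        label = desc.find("skos:prefLabel", NS)
        if label is not None:
            entry["label"] = (label.text or "").strip()
        entry["narrower"] += [n.get(RDF_RESOURCE) for n in desc.findall("skos:narrower", NS)]
    return concepts


SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <rdf:Description rdf:about="http://id.loc.gov/authorities/sh00009460#concept">
    <skos:prefLabel xml:lang="en">National parks and reserves--Prince Edward Island</skos:prefLabel>
    <skos:narrower rdf:resource="http://id.loc.gov/authorities/sh2002010534#concept"/>
    <skos:narrower rdf:resource="http://id.loc.gov/authorities/sh00009458#concept"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://id.loc.gov/authorities/sh2002010534#concept">
    <skos:prefLabel xml:lang="en">Prince Edward Island National Park (P.E.I.)</skos:prefLabel>
  </rdf:Description>
</rdf:RDF>"""

if __name__ == "__main__":
    concepts = summarize_concepts(SAMPLE)
    for uri, c in concepts.items():
        # follow narrower links to labels described in the same document, else show the URI
        narrower = [concepts.get(n, {}).get("label") or n for n in c["narrower"]]
        print(c["label"], "->", narrower)
```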
  • 100. a web of data...
  • 102. at this URI is this concept with this meaning
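The concept URIs in the RDF above end in `#concept`. Because an HTTP client never sends the fragment to the server, dereferencing the concept URI fetches the document that describes it, while the full hash URI still names the concept itself. A small Python sketch of that split using `urllib.parse.urldefrag`; the extension-per-format mapping comes from the `<link rel="alternate">` elements on the earlier id.loc.gov slide, treating it as a general rule is an assumption, and `describe_concept_uri` is a hypothetical helper.

```python
from urllib.parse import urldefrag

# Media types and extensions as they appear on the id.loc.gov slide's
# <link rel="alternate"> elements; applying them to any heading is an assumption.
FORMAT_EXTENSIONS = {
    "application/rdf+xml": ".rdf",
    "text/plain": ".nt",
    "application/json": ".json",
}


def describe_concept_uri(concept_uri):
    """Split a hash URI into the concept's name and the document describing it."""
    document, fragment = urldefrag(concept_uri)
    return {
        "concept": concept_uri,  # names the thing itself
        "document": document,    # what an HTTP GET actually retrieves
        "fragment": fragment,
        "formats": {mime: document + ext for mime, ext in FORMAT_EXTENSIONS.items()},
    }


if __name__ == "__main__":
    info = describe_concept_uri("http://id.loc.gov/authorities/sh00009460#concept")
    print(info["document"])
    print(info["formats"]["application/json"])
```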
  • 103. a standard way to refer to a heading
  • 104. freely available now: download the whole thing - tell friends - amaze enemies
  • 108.
    <link rel="resourcemap" type="application/rdf+xml" href="/lccn/sn83030214/1905-01-15/ed-1/seq-25.rdf" />
    <link rel="alternate" type="image/jp2" href="/lccn/sn83030214/1905-01-15/ed-1/seq-25.jp2" />
    <link rel="alternate" type="application/pdf" href="/lccn/sn83030214/1905-01-15/ed-1/seq-25.pdf" />
    <link rel="alternate" type="application/xml" href="/lccn/sn83030214/1905-01-15/ed-1/seq-25/ocr.xml" />
    <link rel="alternate" type="text/plain" href="/lccn/sn83030214/1905-01-15/ed-1/seq-25/ocr.txt" />
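The `href` values follow a regular pattern: LCCN, issue date, edition, and sequence number, then a suffix per format. A Python sketch that builds and parses paths in that pattern; the pattern is inferred from this one page (`sn83030214`, 1905-01-15, edition 1, sequence 25), so `page_path`, `page_links`, and `parse_page_path` are hypothetical helpers, not a documented Chronicling America API.

```python
import re
from datetime import date

# Suffixes taken from the <link> elements on the slide; treating them as a
# fixed, complete set is an assumption.
PAGE_REPRESENTATIONS = {
    "application/rdf+xml": ".rdf",  # rel="resourcemap"
    "image/jp2": ".jp2",
    "application/pdf": ".pdf",
    "application/xml": "/ocr.xml",
    "text/plain": "/ocr.txt",
}

PAGE_PATH = re.compile(
    r"^/lccn/(?P<lccn>[^/]+)/(?P<issued>\d{4}-\d{2}-\d{2})"
    r"/ed-(?P<edition>\d+)/seq-(?P<sequence>\d+)"
)


def page_path(lccn, issued, edition, sequence):
    """Build the base path for one newspaper page (hypothetical helper)."""
    return f"/lccn/{lccn}/{issued.isoformat()}/ed-{edition}/seq-{sequence}"


def page_links(lccn, issued, edition, sequence):
    """Map each media type to the path of that representation of the page."""
    base = page_path(lccn, issued, edition, sequence)
    return {mime: base + suffix for mime, suffix in PAGE_REPRESENTATIONS.items()}


def parse_page_path(path):
    """Recover (lccn, issue date, edition, sequence) from a page path, or None."""
    m = PAGE_PATH.match(path)
    if m is None:
        return None
    return (m["lccn"], date.fromisoformat(m["issued"]),
            int(m["edition"]), int(m["sequence"]))


if __name__ == "__main__":
    for mime, path in page_links("sn83030214", date(1905, 1, 15), 1, 25).items():
        print(mime, path)
```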
  • 109.
    <rdf:Description rdf:about="/lccn/sn83030214/1905-01-15/ed-1/seq-25#page">
      <ore:isDescribedBy rdf:resource="/lccn/sn83030214/1905-01-15/ed-1/seq-25.rdf"/>
      <foaf:depiction rdf:resource="/lccn/sn83030214/1905-01-15/ed-1/seq-25/thumbnail.jpg"/>
      <ore:aggregates rdf:resource="/lccn/sn83030214/1905-01-15/ed-1/seq-25.jp2"/>
      <ore:aggregates rdf:resource="/lccn/sn83030214/1905-01-15/ed-1/seq-25/ocr.txt"/>
      <ore:aggregates rdf:resource="/lccn/sn83030214/1905-01-15/ed-1/seq-25.pdf"/>
      <ore:aggregates rdf:resource="/lccn/sn83030214/1905-01-15/ed-1/seq-25/ocr.xml"/>
      <ore:aggregates rdf:resource="/lccn/sn83030214/1905-01-15/ed-1/seq-25/thumbnail.jpg"/>
      <rdf:type rdf:resource="http://chroniclingamerica.loc.gov/terms#Page"/>
      <ore:isAggregatedBy rdf:resource="/lccn/sn83030214/1905-01-15/ed-1#issue"/>
      <dcterms:issued rdf:datatype="http://www.w3.org/2001/XMLSchema#date">1905-01-15</dcterms:issued>
      <ndnp:sequence rdf:datatype="http://www.w3.org/2001/XMLSchema#long">25</ndnp:sequence>
      <dcterms:title>New-York tribune. - 1905-01-15 - 25</dcterms:title>
    </rdf:Description>
  • 111. this is a page
  • 112. it has these files in these formats
  • 114. it is part of this issue
  • 116. it has this title
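The four statements above (this is a page, these are its files, this is its issue, this is its title) map onto `rdf:type`, `ore:aggregates`, `ore:isAggregatedBy`, and `dcterms:title` in the page's resource map. A minimal Python sketch that pulls them out with `xml.etree.ElementTree`; the sample is a trimmed copy of the slide's description with the standard RDF, ORE, and Dublin Core terms namespace declarations added (the slide omits them), and `describe_page` is a hypothetical helper.

```python
import posixpath
import xml.etree.ElementTree as ET

# Standard namespace URIs; the slide's excerpt leaves out its xmlns declarations.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "ore": "http://www.openarchives.org/ore/terms/",
    "dcterms": "http://purl.org/dc/terms/",
}
RDF_ABOUT = "{%s}about" % NS["rdf"]
RDF_RESOURCE = "{%s}resource" % NS["rdf"]


def describe_page(rdf_xml):
    """Extract the page URI, its types, files, file formats, issue, and title."""
    root = ET.fromstring(rdf_xml)
    desc = root.find("rdf:Description", NS)
    files = [a.get(RDF_RESOURCE) for a in desc.findall("ore:aggregates", NS)]
    issue = desc.find("ore:isAggregatedBy", NS)
    return {
        "uri": desc.get(RDF_ABOUT),
        "types": [t.get(RDF_RESOURCE) for t in desc.findall("rdf:type", NS)],
        "files": files,
        "formats": sorted({posixpath.splitext(f)[1] for f in files}),
        "issue": issue.get(RDF_RESOURCE) if issue is not None else None,
        "title": desc.findtext("dcterms:title", namespaces=NS),
    }


SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ore="http://www.openarchives.org/ore/terms/"
         xmlns:dcterms="http://purl.org/dc/terms/">
  <rdf:Description rdf:about="/lccn/sn83030214/1905-01-15/ed-1/seq-25#page">
    <ore:aggregates rdf:resource="/lccn/sn83030214/1905-01-15/ed-1/seq-25.jp2"/>
    <ore:aggregates rdf:resource="/lccn/sn83030214/1905-01-15/ed-1/seq-25/ocr.txt"/>
    <ore:aggregates rdf:resource="/lccn/sn83030214/1905-01-15/ed-1/seq-25.pdf"/>
    <rdf:type rdf:resource="http://chroniclingamerica.loc.gov/terms#Page"/>
    <ore:isAggregatedBy rdf:resource="/lccn/sn83030214/1905-01-15/ed-1#issue"/>
    <dcterms:title>New-York tribune. - 1905-01-15 - 25</dcterms:title>
  </rdf:Description>
</rdf:RDF>"""

if __name__ == "__main__":
    for key, value in describe_page(SAMPLE).items():
        print(key, value)
```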
  • 118. all exposed in the app on the web
  • 122. there’s an API doc...
  • 124. “...make resources available and useful ...” from the mission of the Library
  • 125. “allow it to be... incorporated digitally in the collection” from the LC21 report
  • 126. “...sustain and preserve a universal collection ...” from the mission of the Library
  • 127. each app consistent about meaning
  • 128. follow your nose to concept definitions
  • 129. in our apps and in yours
  • 131. the web is a universal collection
  • 132. this is a way to incorporate digitally
  • 133. our digital artifacts on our web
  • 134. your digital artifacts in your web
  • 135. our digital artifacts in your web
  • 136. your digital artifacts in our web
  • 137. available & useful &c.
  • 139. content that scales on the way in
  • 140. apps that scale on the way out
  • 142. transfer, inventory, workflow: all in active development
  • 143. the BagIt spec: try it - it works
  • 146. web of data: available and useful
  • 147. view source: wdl.org, chroniclingamerica.loc.gov, id.loc.gov, sf.net/projects/loc-xferutils/ - dchud at loc gov - @dchud