The Biodiversity Heritage Library: Liberating the World’s Biodiversity Literature

Thomas Garnett, EOL Fellows, March 2010
BHL – Why?

       The cited half-life of publications in taxonomy
       is longer than in any other scientific discipline.
         - Macro-economic case for open access, Tom Moritz

       Current taxonomic literature often relies on
       texts and specimens more than 100 years old.

      Levinus Vincent
      Elenchus tabularum, pinacothecarum, 1719
BHL – Why?

  The Taxonomic
    Impediment

“The taxonomic
impediment is a term
that describes the gaps
of knowledge in our
taxonomic system”
                                       - Darwin Declaration, 1998

Georges Louis Leclerc, comte de Buffon
Histoire naturelle : générale et particulière (Oiseaux), 1799-1808


BHL Members: US/UK
•   Academy of Natural Sciences (Philadelphia, PA)
•   American Museum of Natural History (New York, NY)
•   California Academy of Sciences (San Francisco, CA)
•   The Field Museum (Chicago, IL)
•   Harvard University Botany Libraries (Cambridge, MA)
•   Harvard University, Ernst Mayr Library of the Museum of
    Comparative Zoology (Cambridge, MA)
•   Marine Biological Laboratory / Woods Hole Oceanographic
    Institution (Woods Hole, MA)
•   Missouri Botanical Garden (St. Louis, MO)
•   Natural History Museum (London, UK)
•   The New York Botanical Garden (New York, NY)
•   Royal Botanic Gardens, Kew (Richmond, UK)
•   Smithsonian Institution Libraries (Washington, DC)
BHL Members: BHL-Europe
•   Museum für Naturkunde - Leibniz-Institut für Evolutions- und
    Biodiversitätsforschung an der Humboldt-Universität zu Berlin
•   Natural History Museum, UK
•   Narodni muzeum NMP CZ
•   Angewandte Informationstechnik Forschungsgesellschaft mbH
•   Freie Universität Berlin FUBBGBM
•   Georg-August-Universität Göttingen Stiftung Öffentlichen Rechts
•   Naturhistorisches Museum Wien
•   Hungarian Natural History Museum
•   Museum and Institute of Zoology, Polish Academy of Sciences
•   University of Copenhagen
•   Stichting Nationaal Natuurhistorisch Museum, Naturalis
•   National Botanic Garden of Belgium
•   Royal Museum for Central Africa
•   Royal Belgian Institute of Natural Sciences
•   Bibliothèque nationale de France
•   Museum national d’histoire naturelle
•   Consejo Superior de Investigaciones Cientificas
•   Università degli Studi di Firenze
•   Royal Botanic Garden, Edinburgh
•   Species 2000
•   John Wiley & Sons limited
•   Helsingin yliopisto UH-Viikki
BHL Members: BHL-China
• Chinese Academy of Science – Institute of
  Botany
• Chinese Academy of Science – Institute of
  Zoology
• Chinese Academy of Science – Institute of
  Microbiology
• Chinese Academy of Science – Institute of
  Oceanography
BHL is a Focused Program
•   Though BHL is composed of libraries, it
    has been a domain-specific program, not just a
    digital library project. It arose from and is
    responsive to the biodiversity community,
    composed of the disciplines of taxonomy,
    systematics, evolutionary biology, ecology,
    conservation, and wildlife management. These
    are the primary audience.
Topical terms derived from LCSH
(Diagram sorting terms into Core, Supporting, and Outliers; the group of each
term is not recoverable from the extracted text.)

Biomechanics; Biochemistry; Biomagnetism; Bioelectronics; Zoos; Radioecology;
Bioacoustics; Petrology; Agricultural ecology; Sedimentation; Paleontology;
Biogeomorphology; Orogeny; Geophysics; Microscopy; Bioclimatology; Forestry;
Restoration ecology; Geochemistry; History of Natural sciences;
Scientific drawing & illustration; Taxidermy; Stratigraphy; Soil science;
Vivariums, terrariums, aquariums; Animal biochemistry; Aquaculture;
Geomicrobiology; Natural History – Terminology, Abbrv.; Animal culture;
Medical botany / zoology; Cyanobacteria; Geomorphology; Immunology;
Specimen catalogs; Natural History – Dictionaries & Encyclopedias; Toponymy;
Ecophysiology; Wild animal trade; Physical geography;
Collection & preservation; Natural History – Biographies; Virology;
Environmental Policy; Mineralogy; Continental drift;
Natural History – Directories; Socio-cultural Anthropology; Plate tectonics;
Oceanography; Economic botany; Environmental Management; Plant Culture;
Microbial ecology; Geobiology; Ethnology;
History of discoveries, Exploration & travel; Seismology; Biophysics;
Bioluminescence; Hydrology; Plant lore; Phenology; Atlases & Gazetteers;
Cytology; Wildlife conservation; Genetics; Melioration;
Coral Islands, Reefs & Atolls; Physical Anthropology; Fluid dynamics;
Crops and climate; Prehistoric archaeology; Agricultural meteorology
Core Literature
Botany; Plant conservation; Phytogeography; Plant anatomy; Plant physiology;
Plant ecology; Spermatophyta, Phanerogams; Cryptogams; Biological diversity;
Evolution; Phylogenetic relationships; Evolutionary genetics;
Scientific voyages and expeditions; Pre-Linnaean works; Linnaean works;
Biodiversity conservation; Conservation biology; Ecosystem management;
Endangered species & ecosystems; Extinction; Classification, Nomenclature;
Biogeography; Zoology/Botany--Morphology; Zoology/Botany--Anatomy;
Zoology/Botany--Embryology; Zoology/Botany--Reproduction;
Zoology/Botany--Geographical distribution;
Classification, systematics and taxonomy; Zoology; Invertebrates; Chordates;
Vertebrates; Animal Behavior
Stats: Now Online

• 70,630 volumes
• 26.4 million pages




                       Oldest book: Schöffer’s Herbarius, 1484.
What is the plan?
Digitize the core literature of biodiversity. Full works, not
bits & pieces.
Open Access: all content can be repurposed, reused,
reformatted.
Congruent: must fit into a dynamic knowledge ecology.
Scan public domain biodiversity literature.
Negotiate rights to digitize copyrighted materials.
Ingest content digitized by others.
Provide interfaces & APIs for repository.
     GUIs
     Services for data mining & citation resolution
BHL Digital Preservation
•   Committed to long-term storage, curation,
    and preservation of digital text assets for
    the world-wide biodiversity community
•   BHL is a steward for this literature.
•   To keep this content available and open for
    the future requires careful organizational
    planning.
•   Preservation is both a technical and
    political/social process.
BHL Relationship with Non-Profit Journal
              Publishers

Opt in Copyright Model: The BHL works with professional societies and
   associations to integrate their publications into the BHL in a way that
   serves the societies’ missions and goals
BHL indexes the articles using Taxonomic Intelligence, thereby vastly
   increasing their usability.
Publishers’ content is embedded in the emerging knowledge ecology
   that is sweeping biology in this century.
73 Permission Agreements to date. More under negotiation.
Integration with gray literature in later phases of project.
Scanning = human work
Scan & Store: Internet Archive


                          Storage in Petaboxes




Scanning on Scribes
Referrers: 1 Jan 08 – 31 Jan 10




Name Finding via TaxonFinder
Workflow, in order:
• Image from scanner
• Converted to text via OCR
• Extract names via TaxonFinder
• Submit names to NameBank
• SOAP response




         Name Finding in action
   with Taxonomic Intelligence…
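
The name-finding step can be illustrated with a deliberately naive sketch. It is not TaxonFinder's algorithm, which these slides do not describe. The regular expression, the function names, and the small reference set standing in for NameBank are all assumptions made for illustration.

```python
import re

# Hypothetical, simplified pattern: a capitalized word (genus) followed by a
# lowercase word (species epithet). Real name finders use far richer rules.
BINOMIAL = re.compile(r"\b([A-Z][a-z]{2,})\s+([a-z]{3,})\b")


def find_candidate_names(ocr_text):
    """Return candidate 'Genus species' strings found in OCR text."""
    return [f"{genus} {species}" for genus, species in BINOMIAL.findall(ocr_text)]


def verify(candidates, known_names):
    """Split candidates into those found in a reference set and the rest.

    `known_names` is a stand-in for a lookup against a name registry.
    """
    verified = [c for c in candidates if c in known_names]
    unverified = [c for c in candidates if c not in known_names]
    return verified, unverified


if __name__ == "__main__":
    text = "The sperm whale, Physeter catodon, was described early."
    candidates = find_candidate_names(text)
    print(verify(candidates, {"Physeter catodon"}))
```

On the sample sentence the pattern also picks up "The sperm", a false positive. That is why a verification step against a reference list of names matters.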
OCR error rate for names only

Of the 3,003 names, 1,056 (35.16%) were incorrectly transcribed by OCR.

Top OCR errors
   Rank   Error            Rank   Error
   1      Insert Space     8      n->v
   2      Omit Space       9      l->i
   3      e->c             10     r->i
   4      u->I             11     u->ii
   5      u->n             12     h->l
   6      i->l             13     h->ii
   7      c->e             14     e->o
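
A short sketch of how the figures on this slide fit together. It recomputes the error rate from the two counts, then shows what the listed character substitutions do to a name. It assumes each arrow reads "printed character -> OCR reading". The function names are hypothetical, and generating variants is only an illustration, not something the slides say BHL does.

```python
# Character substitutions copied from the "Top OCR errors" list.
# Assumed direction: (printed character, what OCR produced).
# "u->I" is kept exactly as it appears on the slide.
CONFUSIONS = [
    ("e", "c"), ("u", "I"), ("u", "n"), ("i", "l"), ("c", "e"), ("n", "v"),
    ("l", "i"), ("r", "i"), ("u", "ii"), ("h", "l"), ("h", "ii"), ("e", "o"),
]


def error_rate(misread, total):
    """Fraction of names that OCR transcribed incorrectly."""
    return misread / total


def ocr_variants(name):
    """Return every string produced by applying one listed substitution once."""
    variants = set()
    for printed, misread in CONFUSIONS:
        for i, ch in enumerate(name):
            if ch == printed:
                variants.add(name[:i] + misread + name[i + 1:])
    return variants


if __name__ == "__main__":
    print(f"{error_rate(1056, 3003):.2%}")  # matches the 35.16% on the slide
    print(sorted(ocr_variants("catodon")))
```

A single substitution already turns "catodon" into "eatodon" or "catodov", which exact-match name lookup would miss.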
Considerations

• Improving OCR software is out of scope
  – Google’s Tesseract is the only viable open
    source option
  – Flurry of activity in 2006-2007, quiet since
• Rekeying is expensive given the size of
  the corpus
  – Will not scale
Name finding statistics

• 27.7 million pages scanned
• 70.4 million name strings found
• 56.2 million names verified with a
  NameBankID
• 1.4 million unique names with a
  NameBankID
• 3.3 million unique names *without* a
  NameBankID
  – This is where the interesting data live!!!
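
To put the counts above in proportion, the following sketch derives two ratios from them. The variable names are hypothetical and the inputs are the rounded figures from the slide, so the results are approximate.

```python
# Counts from the slide, in millions (rounded as presented).
NAME_STRINGS_FOUND = 70.4
NAMES_VERIFIED = 56.2
UNIQUE_WITH_ID = 1.4
UNIQUE_WITHOUT_ID = 3.3


def share(part, whole):
    """Return part / whole as a fraction."""
    return part / whole


if __name__ == "__main__":
    # Share of found name strings that resolved to a NameBankID.
    print(f"verified: {share(NAMES_VERIFIED, NAME_STRINGS_FOUND):.1%}")
    # Share of unique names that have no NameBankID.
    unique_total = UNIQUE_WITH_ID + UNIQUE_WITHOUT_ID
    print(f"unique without ID: {share(UNIQUE_WITHOUT_ID, unique_total):.1%}")
```

Roughly four in five name strings verify, yet about 70% of the unique names have no NameBankID, which is the pool the slide calls the interesting data.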
http://www.biodiversitylibrary.org/name/Physeter_catodon
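
The URL above suggests that a name page address is the scientific name with spaces replaced by underscores. This pattern is inferred from that single example and is an assumption, as is the helper name below.

```python
NAME_PAGE_BASE = "http://www.biodiversitylibrary.org/name/"


def name_page_url(scientific_name):
    """Build a name page URL, assuming spaces map to underscores."""
    return NAME_PAGE_BASE + "_".join(scientific_name.split())


if __name__ == "__main__":
    print(name_page_url("Physeter catodon"))
```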
PDF Generation Stats
Mandate for new development

• display / manage articles

• meet community demands for
  bibliography / citation management

• build from more open source tools
Development goals re: citations

• Create a repository for community-vetted
  taxonomic bibliographies.
• Ability to ingest, display, download, and
  index articles so that the BHL can operate
  as an article repository.
• Build from existing community of work
  around Drupal / Biblio.
  – In use by collaborators
http://www.citebank.org
http://citebank.org/search
http://citebank.org/node/47423
Services
•    OpenURL
    – Facilitate links to citations: protologues, articles, references
       • Documentation: http://www.biodiversitylibrary.org/openurlhelp.aspx
•    Names Service
    – Return all occurrences of a name throughout the BHL digitized
       corpus
       • Documentation: http://bit.ly/2e6sg9
    – Access to 51 million name strings using TaxonFinder
       • 1.4 million unique names
    – Working out a strategy for obscure species
    – Algorithm improvements to detect nomenclatural & taxonomic
       acts
•    New API
Services: OpenURL




http://www.biodiversitylibrary.org/openurl?pid=title:3934&volume=14&issue=&spage=301&date=1879




http://www.tropicos.org/Name/1200408
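
A minimal sketch of assembling an OpenURL query like the one on this slide. The parameter names (`pid`, `volume`, `issue`, `spage`, `date`) are taken from the example URL. The function name and its argument names are hypothetical, and the full parameter set is described in the OpenURL documentation linked on the Services slide.

```python
from urllib.parse import urlencode

BHL_OPENURL = "http://www.biodiversitylibrary.org/openurl"


def build_openurl(title_id, volume, start_page, year, issue=""):
    """Build a BHL OpenURL query from citation parts."""
    params = {
        "pid": f"title:{title_id}",
        "volume": volume,
        "issue": issue,  # left empty when the citation has no issue number
        "spage": start_page,
        "date": year,
    }
    # safe=":" keeps the colon in "title:3934" unescaped, as on the slide.
    return f"{BHL_OPENURL}?{urlencode(params, safe=':')}"


if __name__ == "__main__":
    print(build_openurl(3934, 14, 301, 1879))
```

Called with the values from the slide, this reproduces the example URL, including the empty `issue=` parameter.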
Services: OpenURL Disambiguation

• Looking for:

• BHL returns:
Services: OpenURL Results
EOL Interfaces
• Taxonomic name finding enhancements
  – Nomenclatural acts in web services
  – Other algorithms / verification
• WoRMS data
• Improvement
  – Ranking results
  – Visualization
• LifeDesks
  – Bibliography sharing
  – Resolve to articles
Thank You Tom
We welcome your input and advice.
Tom Garnett
Biodiversity Heritage Library Program Director
garnettt@si.edu
202-633-2238

