3. I want to be able to access, mine and analyse a significant body of
published and digitized taxonomic knowledge at any time, from anywhere.
I want to build the Catalogue of Life by machine.
I hope taxonomic communication arrives in the 21st century.
Vision and hope
4. 1. The demand
Before antbase.org, Harvard’s Museum of Comparative Zoology could claim to be the
only location with a complete set of ant systematics publications from 1758 - present.
Through antbase.org’s digital library, access to this body of literature is worldwide,
and it is actively used (>10,000 visits in one month only).
2004
6. Build and establish a TreatmentBank, such as Plazi, as basis for
content mining of and linking to the taxonomic literature
3. The core corpus of taxonomic knowledge: Treatments
7. 4. Make use of the semantic linked WWW
Avoid all the wasteful current publishing practices!
• Publish structured data
• Publish open access
• Make taxonomic literature first class literature by minting
DOIs and making digital copies accessible
• Add links to names, treatments, articles, DNA sequences,
digital objects
• Help by building your own public corpus of citable data
Pensoft journals (e.g. Biodiversity Data Journal, Zookeys,
Phytokeys) are the gold standard.
19. Text mining tools: Visualization of treatment content
Summary of the content of 37 Zootaxa spider publications and 8
Biodiversity Data Journal publications (Miller et al., 2015).
20. Pseudomyrmex ants and Vachellia ant-acacias
are a classic example of mutualism in biology.
Species shown in the network figure: allenii, melanoceras, ruddiae, chiapensis,
collinsii, cookii, cornigera, globulifera, hindsii, janzenii, mayana,
sphaerocephala, boopis, flavicornis, hesperius, ita, janzeni, kuenckeli,
mixtecus, nigrocinctus, nigropilosus, opaciceps, particeps, peperi, reconditus,
satanicus, simulans, spinicola, subtilissimus, veneficus, ferrugineus, gentlei,
gracilis
Transbiotic link network
Associated species linked through
references in taxonomic treatments
Acacia-ant species: Pseudomyrmex gracilis
Treatment: redescription
Associated ant-acacia: Acacia gentlei
Ants Plants
Photocredits: Alex Wild
Treatment
Treatments linked
through citations
Text mining tools: Visualization of treatment content
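The link network on this slide, where species are connected through references inside treatments, can be illustrated with a few lines of Python. This is a minimal sketch and not Plazi's implementation: the record layout (`taxon`, `kind`, `associated`) and the function name `build_links` are assumptions made for illustration, and the single record is the one named on the slide.

```python
from collections import defaultdict

# Hypothetical, hand-written treatment record; the field names are assumptions.
treatments = [
    {
        "taxon": "Pseudomyrmex gracilis",
        "kind": "redescription",
        "associated": ["Acacia gentlei"],
    },
]


def build_links(records):
    """Return an undirected adjacency map that links each treated taxon
    to the species its treatment mentions as associates."""
    links = defaultdict(set)
    for rec in records:
        for other in rec["associated"]:
            links[rec["taxon"]].add(other)
            links[other].add(rec["taxon"])
    return dict(links)


links = build_links(treatments)
```

With more treatment records, the same map could feed a graph drawing tool to produce an ant-plant network like the one pictured.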
21. What does this mean?
The Linking Open Data cloud diagram
Linked Open Data Cloud
22. The demand: scientists and citizen scientists
Before antbase.org, Harvard’s Museum of Comparative Zoology could claim to be the
only location with a complete set of ant systematics publications from 1758 - present.
Through antbase.org’s digital library, access to this body of literature is worldwide,
and it is actively used (>10,000 visits in one month only).
Online catalogue
Open access
Online library
28. The bristlemouths are a rapacious family of deep-sea fishes that include
the wildly successful genus Cyclothone.
In contrast, ichthyologists put the likely figure for bristlemouths at
hundreds of trillions, and perhaps quadrillions, or thousands of trillions.
29. The bristlemouths are a rapacious family of deep-sea fishes that include
the wildly successful genus Cyclothone.
34. Get a copy of the Cyclothone paper
Our contribution for a better understanding of biodiversity
35. Access to ant taxonomic publications through antbase.org / Smithsonian Institution, currently including the entire
body of non-copyrighted publications since 1758 (>4,000 publications or 85,000 pages). Source: Agosti 2005.
Access
36. • Limited access (copyright)
• Limited discoverability of content
• Research results cannot be cited
• Data mining does not work
Issues of access
37. Provide an open access, linked corpus of taxonomic literature
A solution
57. Treatment: a well-defined part of an article that
defines the particular usage of a scientific name
by an authority at a given time (one or more pages in a
publication).
Treatment
The special case of taxonomic literature: the cited elements are
treatments, not articles
Formica obsoleta Linnaeus, 1758: 580
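A treatment citation such as the one above packs name, authority, year and page into one string. The following Python is a minimal sketch of how such a string could be split into those parts. The pattern `<Genus> <epithet> <Authority>, <year>: <page>` is an assumption inferred from this single example, and `NameCitation` and `parse_citation` are hypothetical names; real citations vary far more than this handles.

```python
import re
from typing import NamedTuple


class NameCitation(NamedTuple):
    name: str       # scientific name (binomial)
    authority: str  # author of the name usage
    year: int       # year of publication
    page: str       # page within the publication


# Assumed layout: "<Genus> <epithet> <Authority>, <year>: <page>"
_PATTERN = re.compile(
    r"^(?P<name>[A-Z][a-z]+ [a-z]+) (?P<authority>.+), "
    r"(?P<year>\d{4}): (?P<page>\S+)$"
)


def parse_citation(text: str) -> NameCitation:
    """Split a citation string into its parts; raise ValueError if the
    string does not follow the assumed layout."""
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised citation: {text!r}")
    return NameCitation(
        match["name"], match["authority"], int(match["year"]), match["page"]
    )


citation = parse_citation("Formica obsoleta Linnaeus, 1758: 580")
```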
61. Treatment
Citing of treatments or linking of treatments to treatments
By minting persistent HTTP URIs for treatments, treatments
can be cited like a bibliographic reference
http://treatment.plazi.org/id/A9FFD1FC-4629-FFB4-968F-AD38386521BA
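To illustrate how a persistent treatment URI can be handled like a reference, here is a minimal Python sketch that checks a URI against the form shown above and extracts its identifier. It makes no network request. The function name `treatment_id` is hypothetical, and treating the identifier as UUID-shaped is an assumption based only on the one example URI on this slide.

```python
import uuid
from urllib.parse import urlparse

# Host and path prefix taken from the example URI on the slide.
TREATMENT_HOST = "treatment.plazi.org"
ID_PREFIX = "/id/"


def treatment_id(uri: str) -> str:
    """Return the identifier of a treatment URI of the form
    http://treatment.plazi.org/id/<ID>; raise ValueError otherwise."""
    parts = urlparse(uri)
    if parts.netloc != TREATMENT_HOST or not parts.path.startswith(ID_PREFIX):
        raise ValueError(f"not a treatment URI: {uri!r}")
    ident = parts.path[len(ID_PREFIX):]
    # Assumption: identifiers are UUID-shaped (32 hex digits);
    # uuid.UUID raises ValueError if they are not.
    uuid.UUID(ident)
    return ident


example_uri = (
    "http://treatment.plazi.org/id/A9FFD1FC-4629-FFB4-968F-AD38386521BA"
)
ident = treatment_id(example_uri)
```

A reference manager or text mining pipeline could use a check like this to tell treatment links apart from ordinary article links before storing them.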
62. Status quo
• 50,000+ treatments live, growing daily
• RDF in beta version
• GoldenGate Imagine (PDF and text mining tool) in beta version
• Data provider for NCBI, Wikidata, GBIF, EOL, antweb
• Biodiversity Literature Repository functional
64. Next steps
Planned collaboration with ContentMine to extract treatments on a
daily basis
http://www.slideshare.net/petermurrayrust/?
BioDiv
65. Next steps
• Collaborate with ContentMine to extract 50 treatments/day
• 1 million treatments live
• RDF version accessible
• GoldenGate Imagine (text mining tool)
• Data provider for NCBI, GBIF, EOL, antweb
• Biodiversity Literature Repository with 100,000 bibliographic
references and digital copies (PDF, images, etc.)
67. Next steps
Avoid all this waste (our next generation will have to clean up)!
Publish structured data
Publish open access
Publish in journals with DOI
Add links to names, treatments, articles, DNA sequences, digital
objects
Help build your own corpus of citable data
Pensoft journals (e.g. Biodiversity Data Journal, Zookeys,
Phytokeys) are the gold standard.