Botanists and annotations: use cases and their relevance for the larger scientific community
1. Botanists and Annotations:
use cases and their relevance
for the larger scientific community
William Ulate
Trish Rose-Sandler
Center for Biodiversity Informatics
Missouri Botanical Garden
Jun. 2018
2. Where do we come from?
Why are we here?
I Annotate Conference, Berlin (2016)
The uptake of web annotation could be sufficiently
moved forward by tackling three key issues:
1) interoperability
2) domain use cases
3) user centered design
3. Darwin Virtual Library (2011)
Charles Darwin’s Library is a digital edition and virtual
reconstruction of the surviving books owned by Charles
Darwin.
In 1908, Charles Darwin’s son, Francis, transferred what he
called the ‘Darwin Library’ to the Botany School at
Cambridge University.
‘The chief interest of the Darwin books lies in the pencil notes
scribbled on their pages, or written on scraps of paper and
pinned to the last page.’ – Francis Darwin
Darwin read to gather evidence, to explore and define the
research possibilities of his evolutionary ideas, and to gauge
reactions to his own publications.
This digital reconstruction of the Darwin Library delivers the
ability to retrace and reduplicate Darwin’s reading of a
wealth of materials.
https://www.biodiversitylibrary.org/collection/darwinlibrary
7. Mining Biodiversity
• Transform BHL into a next-generation social
digital library
• A multi-disciplinary approach
– Text Mining
– Machine learning
– History of Science
– Environmental History & Studies
– Library and Information Science
– Social Media
This project was made possible in part by the Institute of Museum and Library Services [LG-00-14-04-0032-14].
http://miningbiodiversity.com
9. What’s wrong with
keyword-based search: Polysemy
• Ambiguity!
• Boxwood
– historic place in Alabama?
– North American term for plants in the Buxaceae family?
• California bay
– hardwood tree?
– location?
10. What’s wrong with
keyword-based search: Synonymy
Campanula portenschlagiana Schult.
• Campanula affinis Rchb. ex Nyman
• Campanula muralis Port. ex A. DC.
18. Text Annotation Use Cases
Annotator Use Case: I am a contributing
participant, adding or curating annotations in
the Biodiversity Digital Library.
Searcher Use Case: I am a user of the
Biodiversity Digital Library, searching for content
that is indexed by annotations.
Admin Use Case
19. Annotator Use Case
• Add an annotation by selecting text
• Conveniently select an appropriate annotation (autocomplete, dropdown menu)
• “Cross out” an annotation (e.g., a homonym) and toggle showing it.
• Modify which text is selected and/or change the annotation term associated with
my own or a pre-existing annotation.
• Confirm or agree with an existing annotation.
• Show measure of certainty on an annotation, either a count of how many people
agree, or just “Confirmed” versus “Still in need of review”
• Easily browse existing annotations in a document (using the tab or next button)
• Browse annotations filtered by their status (confirmed, crossed out, review)
• Find documents by annotation status.
• Find documents that interest me (combine the solution above with search or
filter by other document metadata: keyword, title, author, etc.)
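The confirm, cross-out and status-filter items above can be pictured with a small data structure. This is only a sketch under stated assumptions: the class name Annotation, its fields, and the helper filter_by_status are hypothetical and are not taken from the BHL portal or from any annotation tool.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """Hypothetical record for one text annotation (illustration only)."""
    start: int                      # offset of the first selected character
    end: int                        # offset just past the last selected character
    term: str                       # annotation term chosen by the annotator
    confirmed_by: set = field(default_factory=set)  # users who agree
    crossed_out: bool = False       # True when the annotation was rejected

    def confirm(self, user):
        """Record that a user agrees with this annotation."""
        self.confirmed_by.add(user)

    @property
    def status(self):
        """One of 'crossed out', 'confirmed' or 'review'."""
        if self.crossed_out:
            return "crossed out"
        return "confirmed" if self.confirmed_by else "review"


def filter_by_status(annotations, status):
    """Return the annotations whose status equals the requested one."""
    return [a for a in annotations if a.status == status]
```

The count of agreeing users is available as len(annotation.confirmed_by), which covers the "how many people agree" variant of the certainty measure.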
20. Searcher Use Case
• Discover annotation terms to search by (autocomplete, drop down menu,
browsable tree of terms)
• Navigate to locations in documents from my search (search results show
truncated text found and a link to the location of the annotated text)
• Download search results (several columns: annotation term; the chunk of text
containing the annotation; URL to the location of the annotated text)
• Search for documents containing combinations of terms
• Search for combinations of terms in proximity to each other in the text.
• Search for facts based on semantic combinations or relative positions of terms
(e.g., “Leptinotarsa” “feed on” ?)
• Retrieve search results for associated terms. Asking for water bodies should
return rivers, bays, lakes, seas, etc. Asking for butterflies should return all
the Lepidoptera species.
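The proximity item in the list above (terms near each other in the text) could be approximated as follows. This is a minimal sketch, assuming single-word terms and a simple word-count window; the function names tokens and near, and the default window of 5 words, are illustrative choices rather than anything described in the slides.

```python
import re


def tokens(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def near(text, term_a, term_b, window=5):
    """True if term_a and term_b occur within `window` words of each other."""
    words = tokens(text)
    pos_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    return any(abs(i - j) <= window for i in pos_a for j in pos_b)
```

A real index would precompute token positions per page instead of re-tokenizing on every query.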
23. Search by facets
Opisthoproctus soleatus reported between 1840 and 1950
filtered by Habitat, Morphology and
Reproduction annotations.
• Taxonomy (73)
• Geography (18)
• Habitat (61)
• Traits (57)
- Morphology (20)
- Feeding (35)
- Reproduction (10)
• Publication (73)
- Journal (21)
- Author (63)
- Collection (10)
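Facet counts like the ones above can be derived by tallying annotation values per facet across the matching records. The sketch below assumes each record is a dictionary mapping a facet name to a list of annotation values; that layout and the function names facet_counts and filter_by_facets are hypothetical.

```python
from collections import Counter


def facet_counts(records, facet):
    """Count how often each value of one facet occurs across records."""
    return Counter(value for r in records for value in r.get(facet, []))


def filter_by_facets(records, required):
    """Keep only records that carry at least one annotation for every required facet."""
    return [r for r in records if all(r.get(f) for f in required)]
```

Filtering by Habitat, Morphology and Reproduction then amounts to filter_by_facets(records, ["Habitat", "Morphology", "Reproduction"]).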
24. Automatically generated questions
Opisthoproctus soleatus reported between 1840 and 1950
filtered by Habitat, Morphology and
Reproduction annotations.
Respondent feedback:
• This is very relevant to their work (50%)
• They can see how it could be useful, but not currently (50%)
• There is no strong sentiment on whether this functionality is definitely useful
Ask a question
- Which species taxa are related to Opisthoproctus soleatus?
- In which geographical locations can I find Opisthoproctus soleatus?
- What other species are co-located with Opisthoproctus soleatus?
- In which environments does Opisthoproctus soleatus live?
- What other species are in the same habitat as Opisthoproctus soleatus?
- What are the characteristics of Opisthoproctus soleatus?
- What other species share the same characteristics of Opisthoproctus soleatus?
25. Searching by subject-verb-object
Leptinotarsa feeds on ? reported between 1840 and 1950
…they can see how the graph-based
visualization of results can be useful
but not for their current purposes …
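A subject-verb-object query such as "Leptinotarsa feeds on ?" can be read as pattern matching over extracted triples, with the unknown slot left open. The sketch below assumes such triples already exist, each with a publication year; the Triple type, the query function and its parameters are hypothetical names, not the project's implementation.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """Hypothetical extracted fact with the year of the source publication."""
    subject: str
    verb: str
    obj: str
    year: int


def query(triples, subject=None, verb=None, obj=None, year_from=None, year_to=None):
    """Return triples matching every slot that is not None (None is the '?' slot)."""
    results = []
    for t in triples:
        if subject is not None and t.subject != subject:
            continue
        if verb is not None and t.verb != verb:
            continue
        if obj is not None and t.obj != obj:
            continue
        if year_from is not None and t.year < year_from:
            continue
        if year_to is not None and t.year > year_to:
            continue
        results.append(t)
    return results
```

The query on this slide would correspond to query(triples, subject="Leptinotarsa", verb="feeds on", year_from=1840, year_to=1950), and the obj field of each result fills in the question mark.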
26. Searching for directly associated concepts
I’m looking for Taxa/Geographic locations/Habitats/Traits
directly associated with Eltanin reported between 1840 and 1950.
this is very relevant to what they are doing (50%)
it might be useful but not for their current purposes (40%)
there is no strong indication of whether this
feature is definitely wanted by our respondents
27. Searching for indirectly associated concepts
I’m looking for Taxa indirectly associated with tarsier
via Geographic locations reported between 1840 and 1950.
they can see its benefits,
but not for what they are currently doing (50%)
it will definitely be useful (26%)
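An indirect association ("Taxa associated with tarsier via Geographic locations") can be read as a two-step walk over a graph of associated concepts. The sketch below assumes concepts are (name, type) pairs joined by undirected association edges; this representation and the function name indirectly_associated are assumptions made for illustration.

```python
from collections import defaultdict


def indirectly_associated(edges, start, via_type, target_type):
    """Names of target_type concepts reachable from start through one via_type concept."""
    neighbours = defaultdict(set)
    for a, b in edges:              # treat every association as undirected
        neighbours[a].add(b)
        neighbours[b].add(a)
    found = set()
    for mid in neighbours[start]:
        if mid[1] != via_type:      # only walk through the requested concept type
            continue
        for end in neighbours[mid]:
            if end[1] == target_type and end != start:
                found.add(end[0])
    return found
```

Restricting the edges to annotations from documents published between 1840 and 1950 before calling the function would apply the date filter shown on the slide.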
28. Use Cases
1. Finding the original description (taxonomic research).
2. Finding host plants, for example (ecological research).
3. Finding illustrations and plates.
4. Finding taxon name usage instances (taxonomic
treatment, nomenclatural act).
5. Capturing spelling variants (orthographic variants).
6. Marking errors on versions of OCR/transcribed text.
7. Exposing semantic metadata (as a SPARQL endpoint).
8. Being able to access search functionalities through APIs.
9. Allowing users to highlight in text (keywords).
10. Allowing users to annotate concepts if incorrectly
recognized or missed.
29. Application to Query Expansion
• an interface for searching documents using a
species name as a query
• query is automatically expanded by retrieving
synonyms/semantically related names from
the term inventory
• documents mentioning all of the names in the
expanded query are returned
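The query expansion steps above can be sketched as a dictionary lookup followed by a scan of the documents. This is an illustration only: the dictionary TERM_INVENTORY, the functions expand_query and search, and the require_all flag are hypothetical, and the single inventory entry reuses the Campanula names from the synonymy slide. By default the sketch returns documents mentioning any name in the expanded query, which is an assumed reading of the last bullet; require_all=True gives the literal reading.

```python
# Hypothetical term inventory: accepted name -> synonyms / related names.
TERM_INVENTORY = {
    "Campanula portenschlagiana": ["Campanula muralis", "Campanula affinis"],
}


def expand_query(name, inventory):
    """Return the query name followed by its names from the inventory."""
    return [name] + list(inventory.get(name, []))


def search(documents, name, inventory, require_all=False):
    """Return ids of documents mentioning the expanded names.

    documents maps a document id to its text. With require_all=False a
    document matches if it mentions any expanded name (assumption).
    """
    names = [n.lower() for n in expand_query(name, inventory)]
    combine = all if require_all else any
    return [doc_id for doc_id, text in documents.items()
            if combine(n in text.lower() for n in names)]
```

A search for "Campanula portenschlagiana" would then also return documents that only use the name "Campanula muralis", which a plain keyword search would miss.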
30. Term Inventory
• compilation of species names (flowering
plants, mammals, birds)
• acts as a thesaurus, as each name is linked to
its synonyms as well as other semantically
related names
• “semantic relatedness”: defined in terms
of a contextual similarity measure, computed
over the entire Digital Library corpus
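The slides do not say which contextual similarity measure was used. As one common possibility, and purely as an assumption, two names can be compared by the cosine similarity of the words that surround them in the corpus. The functions context_vector and cosine below are hypothetical and handle single-word terms only.

```python
import math
from collections import Counter


def context_vector(term, sentences, window=2):
    """Count the words appearing within `window` positions of each occurrence of term."""
    vec = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        for i, w in enumerate(words):
            if w == term:
                lo = max(0, i - window)
                vec.update(words[lo:i] + words[i + 1:i + 1 + window])
    return vec


def cosine(u, v):
    """Cosine similarity between two sparse count vectors; 0.0 if either is empty."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0
```

Names whose context vectors score above a chosen threshold could be linked as "semantically related" in the inventory; the threshold would need tuning on the real corpus.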
35. Real Use Cases
May 25, 2016:
On the etymology of the word "elephant" and
the origins of the word "tamarind", the "Indian
date"
Sketches of the natural history of Ceylon : with narratives and anecdotes
illustrative of the habits and instincts of the mammalia, birds, reptiles,
fishes, insects, etc. including a monograph of the elephant
https://twitter.com/WUlate/status/734805482536198144
36. Disqus
• Annotation functionality was made available as a trial within the
portal from December of 2015 through June 2016 as part of the
IMLS-funded Mining Biodiversity project.
• A social commenting tool that allowed users to add comments to
individual pages in a book and follow users and discussions about
those books.
• The following tasks were carried out:
1. Created Requirements document to outline the commenting tool
needs and how Disqus achieved them.
2. Coordinated with Disqus development staff to determine how best
to implement Disqus to meet those needs.
3. Tool built and implemented in Portal.
4. Extensive testing of the feature before launching the tool.
5. Developed User Tutorials and Outreach Content to announce the
feature to the public and provide training for its use.
37.
38. Disqus
• In 6 months, 188 individual annotations were received
and stored in Disqus repository.
• The tool was discontinued within BHL because it was
proprietary and would not have served well as a
long-term, scalable solution: customizations to the
tool were limited, and annotations were stored on
Disqus servers rather than BHL servers.
• The trial demonstrated a desire from users to actively
engage in the annotation process within a digital
library interface.
• Citizen scientists and librarians were among the most
active profiles in generating annotations.
39.
40. Why Botanists?
• The International Plant Names Index (IPNI) is a database of the
names and associated basic bibliographical details of seed plants,
ferns and lycophytes.
• Its goal is to eliminate the need for repeated reference to primary
sources for basic bibliographic information about plant names.
• The data are freely available and are gradually being standardized
and checked.
• IPNI is a dynamic resource, depending on direct contributions by all
members of the botanical community.
http://www.ipni.org
41. Why Botanists?
Botanico-Periodicum-Huntianum (1968)
Worldwide bibliography of periodicals
• 12,000 titles (45 languages)
• title abbreviations
• cross-referenced to other published
abbreviations and complete titles
• details of volumation and duration
• and other basic bibliographic data.
BPH-2 (2004)
Periodicals with Botanical Content
Second edition of B-P-H
Alphabetical title list (1665 – 2002)
33,000 titles from around the world
Agriculture, Agronomy, Bacteriology,
Biology, Biotechnology,
Botanical Bibliography and History,
Conservation, Ecology,
Environmental Science, Floriculture,
Forestry, Fruit growing,
Genetics and Plant breeding,
Geography, Horticulture,
Hydrobiology and Limnology,
Immunology and Toxicology,
Medical Mycology,
Microbiology and Microscopy,
Molecular biology, Palaeontology,
Pharmacology and Pharmacognosy,
Plant pathology and Vegetable crops, etc.
B-P-H/Supplementum (1991)
• 25,000 title entries arranged by title
• key to entries in both volumes.
• Citation abbreviations for all titles
• improved cross-referencing
• expanded thesaurus of title words
and their abbreviation equivalents
• included periodicals dealing with
biotechnology, molecular biology,
environmental studies and
conservation.
42. Landscape Review
New Media Consortium
• Horizon Report Library Edition
Few examples of adoption within Libraries
Except for:
• Australia’s Trove and
• Europeana Sounds Project
Lack of Available tools? No
44. Purpose
Analyze Web annotation needs of the
botanical community and develop a
prototype of how those needs may
be met within a digital library
platform
45. Results from this project will be useful to the
following audiences:
• Librarians looking to improve their virtual library by
enabling users to add value to their content.
• Botanists who want to enhance the corpus of their
digital library collection by augmenting knowledge
through the annotations provided.
• Developers who want to choose a tool to enable
annotations in their online solutions, particularly within
digital library platforms.
46. Deliverables:
• Needs Analysis Report with prioritized list of annotation
needs for users of a botanical virtual library.
• Feasibility Study with the evaluation of four open source
existing annotation tools based on their potential to
address the needs identified in the Analysis Report
• Proof of concept prototype installed within a virtual
library to demonstrate the functional capacity of one of
the evaluated tools
• Outcomes Assessment with next step recommendations
to propose a full-scale project adopting an annotation
tool as part of a virtual library.
47. Needs analysis report
Using a case research approach:
• Interview 10 users of a botanical virtual
library from 5 separate institutions
• Answers will be analyzed and classified
by user type, purpose and function
48. Feasibility study
Four existing annotation tools will be thoroughly
evaluated against the needs analysis in order to
develop a feasibility study for how they could satisfy
botanists’ needs
digilib
49. Proof of concept prototype
RERUM will be integrated within a digital
library platform as proof-of-concept
50. Outcomes assessment and next steps
• Identify requisites, best practices, and
further developments for research project
• Identify appropriate partners
51. Interested in joining us?
Contact:
Trish Rose-Sandler trish.rose-sandler@mobot.org
William Ulate william.ulate@mobot.org