iPlant's Taxonomic Name Resolution
               Service

            Naim Matasci
    BIO5 / The iPlant Collaborative

           tnrs.iplantc.org
What is iPlant?
Empowering a New Plant Biology
http://www.ncbi.nlm.nih.gov/Genbank/genbankstats.html
TMU* Growth of Biological Collections (1600 – 2012)
[Figure: number of specimens (y-axis, 0 to 600,000,000) by year (x-axis, 1600 to 2020)]
*TMU: Totally Made Up
If you can't find it, it doesn't exist
Data Reuse

• What's the correlation between leaf
  morphology and leaf economy (R. Walls)?
• Evolution of pit domatia (M. Donoghue)
iPlant Data Store

• Based on iRODS
  – Metadata driven
  – Storing, Sharing and Distributing
• Redundant (mirrors at TACC and UoA)
• Really, really, really big (6 PB + 40 PB LTS)
• Really, really, really fast
iPlant Data Store Performance
                                    UC Berkeley to iDS
                               100GB: 29m15s
                            1 GB / 17.5 seconds
      Source                 Destination              Copy Method               Time (seconds)
      CD                     Desktop PC               cp                        320
      Berkeley Server        Desktop PC               scp                       150
      External Drive         Desktop PC               cp                        36
      USB 2.0 Flash          Desktop PC               cp                        30
      iDS                    Desktop PC               iget                      18
      Desktop PC             Desktop PC               cp                        15

Desktop PC (UA): Mac with 7.2K Internal Hard Drive
External Drive: USB 2.0: 5.4k Hard Drive
Flash Drive: USB 2.0 Patriot XT

    https://pods.iplantcollaborative.org/wiki/display/start/How+fast+is+the+iPlant+Data+Store
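
The per-gigabyte figure quoted above follows directly from the 100 GB transfer time. A minimal sketch of that arithmetic is below; the function names are illustrative, and the MB/s conversion assumes decimal units (1 GB = 1000 MB), which the slide does not specify.

```python
# Figures quoted on the slide: 100 GB from UC Berkeley to the iDS in 29m15s.
TRANSFER_GB = 100
TRANSFER_SECONDS = 29 * 60 + 15  # 29m15s expressed in seconds


def seconds_per_gb(total_gb: float, total_seconds: float) -> float:
    """Average time needed to move one gigabyte."""
    return total_seconds / total_gb


def throughput_mb_per_s(total_gb: float, total_seconds: float,
                        mb_per_gb: int = 1000) -> float:
    """Average throughput in MB/s (assumption: decimal units, 1 GB = 1000 MB)."""
    return total_gb * mb_per_gb / total_seconds


per_gb = seconds_per_gb(TRANSFER_GB, TRANSFER_SECONDS)
rate = throughput_mb_per_s(TRANSFER_GB, TRANSFER_SECONDS)

print(f"{per_gb:.2f} s per GB")
print(f"{rate:.1f} MB/s")
```

The result is consistent with the "1 GB / 17.5 seconds" figure on the slide and with the 18 seconds listed for `iget` in the table.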
PhytoBisque features
• Rich internet application (completely web based)
• Draws upon features from popular large-scale photo
  sharing sites and high-resolution aerial imagery (Google
  Maps)
• Ability to import and export over 100 image formats,
  plus movies
• Ability to import extremely large image sets using the iPlant
  Data Store
• Can display a 20K x 20K image in a standard web browser
• Manage data sets with tags, metadata management
• Utilizes distributed computing (connected to iPlant
  execute environment)
Taxonomic uncertainty

1. Non-existent names
  •   Misspellings
  •   Contamination
      •   Annotations
      •   Morphospecies
      •   Digitization issues (frame shifts, character
          encoding)
  •   Lexical variants (digitization conventions)
2. Synonymy
  •   Nomenclatural synonyms
  •   Taxonomic synonyms / concepts
3. Misidentifications, incomplete identifications
Non-existent names: Herbarium specimens

Total specimens:                                1.1 million
Unique species names:                           53,052
Published names (legitimate & illegitimate):    44,532
Misspelled names:                               9,371 (18%)
Specimens with misspelled names:                101,237 (9%)

*New World plant specimens, 34 herbaria, simple match against IPNI and TROPICOS, excluding authors
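
A "simple match, excluding authors" can be sketched as follows. This is only an illustration of the idea, not the procedure used to produce the counts above: the reference list and specimen names are made-up sample data, the helper names are hypothetical, and reducing a name to its first two words ignores infraspecific ranks and hybrids.

```python
def canonical_binomial(name: str) -> str:
    """Keep genus and specific epithet only, dropping any trailing author string."""
    parts = name.split()
    if len(parts) < 2:
        return name.strip().capitalize()
    return f"{parts[0].capitalize()} {parts[1].lower()}"


def split_by_match(names, reference):
    """Separate names into those found in the reference list and those not found."""
    ref = {canonical_binomial(n) for n in reference}
    matched, unmatched = [], []
    for n in names:
        (matched if canonical_binomial(n) in ref else unmatched).append(n)
    return matched, unmatched


# Hypothetical sample data standing in for a published-name index.
reference = ["Quercus alba L.", "Zea mays L."]
specimens = ["Quercus alba", "Zea mays Linnaeus", "Quercus abla L.", "Zea  mays"]

matched, unmatched = split_by_match(specimens, reference)
share_unmatched = 100 * len(unmatched) / len(specimens)
print(unmatched, f"{share_unmatched:.0f}% unmatched")
```

An exact match of this kind flags every unmatched name as a candidate misspelling, which is why the unmatched names still need fuzzy matching or manual review afterwards.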
Taxonomic Name Resolution Service

• Computer-assisted standardization of plant
  names
• Corrects spelling errors and alternative
  spellings to a standard list of names
• Converts out-of-date names to currently
  accepted names
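
The two steps listed above (spelling correction against a standard list, then conversion of out-of-date names) can be sketched with the Python standard library. This is a toy illustration, not the TNRS's actual matching algorithm, which these slides do not describe: the name list, the synonym table, the `resolve` function and the 0.8 similarity cutoff are all assumptions made for the example.

```python
from difflib import get_close_matches

# Hypothetical standard list: accepted names plus synonyms pointing at them.
ACCEPTED = {"Quercus alba", "Zea mays", "Solanum lycopersicum"}
SYNONYMS = {"Lycopersicon esculentum": "Solanum lycopersicum"}


def resolve(name: str, cutoff: float = 0.8):
    """Return the accepted name for `name`, or None if nothing is close enough."""
    known = sorted(ACCEPTED | set(SYNONYMS))
    if name in known:
        matched = name
    else:
        # Step 1: correct the spelling against the standard list.
        candidates = get_close_matches(name, known, n=1, cutoff=cutoff)
        if not candidates:
            return None
        matched = candidates[0]
    # Step 2: convert an out-of-date name to the currently accepted one.
    return SYNONYMS.get(matched, matched)


for submitted in ["Quercus abla", "Lycopersicon esculentun", "Xyz"]:
    print(submitted, "->", resolve(submitted))
```

Keeping the two steps separate means a misspelled synonym is first matched to the synonym as published and only then mapped to the accepted name, so both corrections can be reported to the user.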
Future

• More sources
  – Standard source import with DwC support
• Better performance
• TNRastic API
• Integration with Global Names components
• Web: http://tnrs.iplantc.org/
• Code: https://github.com/iPlantCollaborativeOpenSource/TNRS
• API (provisional): http://goo.gl/XnUiH
• TNRastic API: http://goo.gl/Z7Fkc
Brad Boyle
Brian Enquist
Juan Antonio Raygoza Garay
Nicole Hopkins
Zhenyuan Lu
Martha Narro
Shannon Oliver
William Piel
Jill Yarmchuk

Bob Magill (Missouri Botanical Garden)
Chris Freeland (Missouri Botanical Garden)
Chuck Miller (Missouri Botanical Garden)
Peter Jorgensen (Missouri Botanical Garden)
Amy Zanne (University of Missouri, St. Louis)
Peter Stevens (Missouri Botanical Garden)
Jay Paige (Missouri Botanical Garden)
Bob Peet (University of North Carolina at Chapel Hill)

Paul Morris (Harvard University)
Alan Paton (Kew Royal Botanic Gardens and their International Plant Names Index)
Tony Rees (Commonwealth Scientific and Industrial Research Organisation)
Michael Giddens (www.silverbiology.com)
Dmitry Mozzherin (Global Biodiversity Information Facility)
David Remsen (Global Biodiversity Information Facility)
David Patterson (Encyclopedia of Life)
Cam Webb (Harvard University)

Missouri Botanical Garden (Tropicos)

Funding provided by the National Science Foundation Plant Cyberinfrastructure Program (grant #DBI-0735191).

iPlant TNRS for digital collections - iDigBio Workshop

  • 1. iPlant's Taxonomic Name Resolution Service Naim Matasci BIO5 / The iPlant Collaborative tnrs.iplantc.org
  • 5. Empowering a New Plant Biology
  • 7. TMU* Growth of Biological Collections (1600 – 2012) [Chart: number of specimens (0 to 600,000,000) by year (1600 to 2020)] *TMU: Totally Made Up
  • 8. If you can't find it, it doesn't exist
  • 10. Data Reuse
      • What's the correlation between leaf morphology and leaf economy (R. Walls)?
      • Evolution of pit domatia (M. Donoghue)
  • 11. iPlant Data Store
      • Based on iRODS
          – Metadata driven
          – Storing, Sharing and Distributing
      • Redundant (mirrors at TACC and UoA)
      • Really, really, really big (6 PB + 40 PB LTS)
      • Really, really, really fast
  • 12. iPlant Data Store Performance
      UC Berkeley to iDS: 100 GB in 29m15s (1 GB / 17.5 seconds)

      Source            Destination   Copy Method   Time (seconds)
      CD                Desktop PC    cp            320
      Berkeley Server   Desktop PC    scp           150
      External Drive    Desktop PC    cp            36
      USB 2.0 Flash     Desktop PC    cp            30
      iDS               Desktop PC    iget          18
      Desktop PC        Desktop PC    cp            15

      Desktop PC (UA): Mac with 7.2K Internal Hard Drive
      External Drive: USB 2.0, 5.4k Hard Drive
      Flash Drive: USB 2.0 Patriot XT
      https://pods.iplantcollaborative.org/wiki/display/start/How+fast+is+the+iPlant+Data+Store
  • 13. PhytoBisque features
      • Rich internet application (completely web based)
      • Draws upon features from popular large-scale photo sharing sites and high-resolution aerial imagery (Google Maps)
      • Ability to import and export more than 100 image formats, plus movies
      • Ability to import extremely large image sets using the iPlant Data Store
      • Can display a 20K x 20K image in a standard web browser
      • Manage data sets with tags and metadata management
      • Utilizes distributed computing (connected to the iPlant execute environment)
  • 14. Taxonomic uncertainty
      1. Non-existent names
          • Misspellings
          • Contamination
          • Annotations
          • Morphospecies
          • Digitization issues (frame shifts, character encoding)
          • Lexical variants (digitization conventions)
      2. Synonymy
          • Nomenclatural synonyms
          • Taxonomic synonyms / concepts
      3. Misidentifications, incomplete identifications
  • 15. Non-existent names: Herbarium specimens
      Total specimens: 1.1 million
      Unique species names: 53,052
      Published names (legitimate & illegitimate): 44,532
      Misspelled names: 9,371 (18%)
      Specimens with misspelled names: 101,237 (9%)
      *New World plant specimens, 34 herbaria, simple match against IPNI and TROPICOS, excluding authors
  • 16. Taxonomic Name Resolution Service
      • Computer-assisted standardization of plant names
      • Corrects spelling errors and alternative spellings to a standard list of names
      • Converts out-of-date names to currently accepted names
  • 22. Future
      • More sources
          – Standard source import with DwC support
      • Better performance
      • TNRastic API
      • Integration with Global Names components
  • 23.
      • Web: http://tnrs.iplantc.org/
      • Code: https://github.com/iPlantCollaborativeOpenSource/TNRS
      • API (provisional): http://goo.gl/XnUiH
      • TNRastic API: http://goo.gl/Z7Fkc
  • 24.
      Brad Boyle
      Brian Enquist
      Juan Antonio Raygoza Garay
      Nicole Hopkins
      Zhenyuan Lu
      Martha Narro
      Shannon Oliver
      William Piel
      Jill Yarmchuk
      Bob Magill (Missouri Botanical Garden)
      Chris Freeland (Missouri Botanical Garden)
      Chuck Miller (Missouri Botanical Garden)
      Peter Jorgensen (Missouri Botanical Garden)
      Amy Zanne (University of Missouri, St. Louis)
      Peter Stevens (Missouri Botanical Garden)
      Jay Paige (Missouri Botanical Garden)
      Bob Peet (University of North Carolina at Chapel Hill)
      Paul Morris (Harvard University)
      Alan Paton (Kew Royal Botanic Gardens and their International Plant Names Index)
      Tony Rees (Commonwealth Scientific and Industrial Research Organisation)
      Michael Giddens (www.silverbiology.com)
      Dmitry Mozzherin (Global Biodiversity Information Facility)
      David Remsen (Global Biodiversity Information Facility)
      David Patterson (Encyclopedia of Life)
      Cam Webb (Harvard University)
      Missouri Botanical Garden (Tropicos)
      Funding provided by the National Science Foundation Plant Cyberinfrastructure Program (grant #DBI-0735191).
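
The two steps described on slide 16 (correcting misspellings against a standard list, then converting out-of-date names to accepted ones) can be illustrated with a small sketch. This is not the TNRS implementation or its API: the fuzzy matching here uses Python's standard difflib as a stand-in, and the name list, the synonym table, the `resolve_name` function and the 0.8 cutoff are all hypothetical choices made for the example.

```python
from difflib import get_close_matches

# Hypothetical reference data for illustration only; a real service would
# draw on curated sources such as Tropicos or IPNI.
ACCEPTED_NAMES = ["Quercus alba", "Quercus rubra", "Acer saccharum"]
SYNONYMS = {"Quercus borealis": "Quercus rubra"}  # out-of-date name -> accepted name


def resolve_name(raw, cutoff=0.8):
    """Return (matched_name, accepted_name), or (None, None) if nothing is close enough."""
    # Normalize whitespace and capitalization ("  quercus  ALBA " -> "Quercus alba").
    name = " ".join(raw.split()).capitalize()
    known = ACCEPTED_NAMES + list(SYNONYMS)
    if name not in known:
        # Step 1: correct the spelling against the list of known names.
        candidates = get_close_matches(name, known, n=1, cutoff=cutoff)
        if not candidates:
            return None, None
        name = candidates[0]
    # Step 2: map a synonym to its accepted name; accepted names map to themselves.
    return name, SYNONYMS.get(name, name)


if __name__ == "__main__":
    for submitted in ["Quercus allba", "quercus BOREALIS", "Zea mays"]:
        print(submitted, "->", resolve_name(submitted))
```

Returning both the matched and the accepted name keeps the correction visible, so a curator can review what was changed instead of having the submitted name silently overwritten.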

Editor's notes

  1. Bringing a culture of computing to the Plant Sciences.