PhyloTastic: Names-Based
Phyloinformatic Data Integration
Rutger Vos
Re-use of phylogenetic knowledge
Currently, most phylogenetic
knowledge is not easily re-used
due to a lack of:
• archiving;
• awareness of best practices;
• community-wide standards for
formatting data, naming entities,
and annotating data.
Most attempts at data re-use
seem to end in disappointment.
Nevertheless, we find many
positive examples of data re-use,
particularly those that involve
customized species trees
generated by grafting to, and
pruning from, a much larger
tree.
Phylomatic: automated re-use of
phylogenetic knowledge
• In a recent survey of practices of
re-use of phylogenetic knowledge,
Phylomatic was the most
frequently used method for
obtaining trees, e.g. in studies of
phylogenetic community structure.
• Phylomatic takes a set of input taxa
and extracts them from a
reference phylogeny by pruning
and grafting.
• The reference phylogeny is usually
APG-III
• Taxon names are matched
exactly or grafted on.
• Branch lengths are either retained
or modeled (bladj)
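
The pruning step can be illustrated with a minimal sketch. This is not Phylomatic's code: the `Node` class, the toy reference tree, and the taxon names are all assumptions made for illustration, and grafting and branch lengths are left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    """A tree node; tips carry a taxon name, internal nodes may be unnamed."""
    name: str = ""
    children: List["Node"] = field(default_factory=list)


def prune(node: Node, keep: Set[str]) -> Optional[Node]:
    """Return a copy of the tree restricted to tips in `keep`, or None if empty."""
    if not node.children:
        return Node(node.name) if node.name in keep else None
    kept = [c for c in (prune(child, keep) for child in node.children) if c is not None]
    if not kept:
        return None
    if len(kept) == 1:
        return kept[0]  # collapse a node left with a single child
    return Node(node.name, kept)


def to_newick(node: Node) -> str:
    """Serialize topology only (no branch lengths, no trailing semicolon)."""
    if not node.children:
        return node.name
    return "(" + ",".join(to_newick(c) for c in node.children) + ")" + node.name


# Hypothetical four-taxon reference tree.
reference = Node("", [
    Node("", [Node("Quercus_robur"), Node("Fagus_sylvatica")]),
    Node("", [Node("Rosa_canina"), Node("Malus_domestica")]),
])
subtree = prune(reference, {"Quercus_robur", "Rosa_canina", "Malus_domestica"})
print(to_newick(subtree) + ";")  # (Quercus_robur,(Rosa_canina,Malus_domestica));
```

Collapsing single-child nodes keeps the result strictly bifurcating where the reference was; a real implementation would also sum the collapsed branch lengths.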
Phylotastic: generalizing and
modularizing Phylomatic-ish functionality

Phylotastic was conceived by NESCent's Hackathons,
Interoperability, Phylogenies (HIP) working group and was initiated
by several dozen participants at a NESCent hackathon on June 4-8, 2012. A second hackathon took place at iPlant's headquarters in
Tucson, Arizona on January 28 through February 1, 2013.
Phylotastic: a design
pattern for phylogenetic
data re-use
1. Input list of names
2. Controller queries TNRS with list of names
3. TNRS provides token with redirect to results
4. Controller gets TNRS results
5. Controller queries Treestore for trees with TNRS taxa
6. Controller POSTs Treestore matches, GETs subtree back
7. Treestore (or proxy) performs pruning and grafting
8. Annotated subtree is returned
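
The eight steps above can be sketched as a controller that is handed its collaborators as plain functions. Every function name here is hypothetical, and the `fake_*` stand-ins replace the real TNRS and Treestore services so the flow runs offline; this shows the pattern, not the Phylotastic implementation.

```python
from typing import Callable, Dict, List, Optional


def run_pipeline(
    names: List[str],
    submit: Callable[[List[str]], str],
    retrieve: Callable[[str], List[str]],
    find_trees: Callable[[List[str]], List[str]],
    get_subtree: Callable[[str, List[str]], str],
) -> Optional[str]:
    """Walk the design pattern from a list of names to a pruned subtree."""
    token = submit(names)              # steps 2-3: query TNRS, receive a token
    accepted = retrieve(token)         # step 4: fetch the resolved names
    tree_ids = find_trees(accepted)    # step 5: ask the Treestore for matching trees
    if not tree_ids:
        return None
    # steps 6-8: send the matches, get the pruned and grafted subtree back
    return get_subtree(tree_ids[0], accepted)


# In-memory stand-ins for the remote services (assumptions, for illustration only).
_jobs: Dict[str, List[str]] = {}


def fake_submit(names: List[str]) -> str:
    token = f"job{len(_jobs) + 1}"
    _jobs[token] = names
    return token


def fake_retrieve(token: str) -> List[str]:
    # Pretend name resolution: normalize case and whitespace.
    return [n.strip().capitalize().replace(" ", "_") for n in _jobs[token]]


def fake_find_trees(taxa: List[str]) -> List[str]:
    return ["tree1"]


def fake_get_subtree(tree_id: str, taxa: List[str]) -> str:
    return "(" + ",".join(sorted(taxa)) + ");"


newick = run_pipeline(["homo sapiens", " PAN troglodytes"],
                      fake_submit, fake_retrieve, fake_find_trees, fake_get_subtree)
print(newick)  # (Homo_sapiens,Pan_troglodytes);
```

Passing the services in as callables mirrors the modular intent of the design: a different TNRS or Treestore can be swapped in without touching the controller.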
Cross-pollinations and spin-offs
TaxoSaurus: the PhyloTastic TNRS
• A simple, asynchronous, RESTful API that
communicates in JSON.
• Modular design: multiple taxonomies can be
ingested and queried
• Built around the iPlant TNRS service
• Available at taxosaurus.org
TaxoSaurus: the PhyloTastic TNRS
/submit - POST or GET a list of scientific names to
the service and retrieve a JSON token to access
results.
Parameters:
• query: newline separated list of scientific names.
OR
• file: a text file containing newline separated
scientific names.
• source (optional): a comma separated list of
taxonomic source ids (see /sources/list).
• code (optional): the abbreviation for one of the
nomenclature codes (ICZN/ICN/ICNB).
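
A minimal sketch of building a /submit request with the Python standard library. The `http` scheme, the form-encoded POST body, and the helper name `build_submit_request` are assumptions; only the path and the parameter names come from the slide. The request is constructed but not sent.

```python
from typing import List, Optional
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://taxosaurus.org"  # assumption: scheme is not stated on the slide


def build_submit_request(
    names: List[str],
    source: Optional[List[str]] = None,
    code: Optional[str] = None,
) -> Request:
    """Build (but do not send) a POST to /submit with a newline-separated name list."""
    params = {"query": "\n".join(names)}
    if source:
        params["source"] = ",".join(source)  # comma-separated source ids
    if code:
        params["code"] = code                # e.g. ICZN, ICN or ICNB
    body = urlencode(params).encode("utf-8")  # assumption: form-encoded body
    return Request(BASE_URL + "/submit", data=body, method="POST")


req = build_submit_request(["Homo sapiens", "Pan troglodytes"], code="ICZN")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Sending it would be a matter of `urllib.request.urlopen(req)` and decoding the JSON reply, whose exact fields are not given here.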
TaxoSaurus: the PhyloTastic TNRS
/retrieve/<token> - GET the result of a TNRastic query.
• Parameters: none
• Returns: a JSON object containing the accepted
names
/sources/list - GET a ranked list of available sources
• Parameters: none
• Returns: a JSON object containing the list of source
IDs
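
Because the service is asynchronous, a client would typically poll /retrieve/<token> until the result is available. The sketch below assumes the `http` scheme and, more importantly, a hypothetical readiness check (a `names` key in the reply); the real response layout is not shown on these slides. The demonstration uses a stub instead of the network.

```python
import json
import time
from typing import Callable
from urllib.request import urlopen

BASE_URL = "http://taxosaurus.org"  # assumption: scheme is not stated on the slide


def fetch_json(url: str) -> dict:
    """GET a URL and decode the JSON body (performs a real network call)."""
    with urlopen(url) as response:
        return json.load(response)


def poll_results(
    token: str,
    fetch: Callable[[str], dict] = fetch_json,
    is_ready: Callable[[dict], bool] = lambda doc: "names" in doc,  # hypothetical key
    attempts: int = 10,
    delay: float = 1.0,
) -> dict:
    """Poll /retrieve/<token> until is_ready() accepts the reply or attempts run out."""
    url = f"{BASE_URL}/retrieve/{token}"
    for _ in range(attempts):
        doc = fetch(url)
        if is_ready(doc):
            return doc
        time.sleep(delay)
    raise TimeoutError(f"no result for token {token!r} after {attempts} attempts")


# Offline demonstration: the stub answers "pending" once, then returns a result.
_replies = iter([{"status": "pending"}, {"names": ["Homo sapiens"]}])
result = poll_results("abc123", fetch=lambda url: next(_replies), delay=0)
print(result)
```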
TaxoSaurus: the PhyloTastic TNRS
/sources/<source_id> - GET the details about a
particular source, or all sources if no ID specified
• Parameters: <source_id> or none

• Returns: a JSON object containing the source
details
/delete/<token> - GET, POST, or DELETE. Cancels
a running job
• Parameters: <token>, the hash of the job to cancel
• Returns: a JSON object indicating success or an
error
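
The two endpoints above can be addressed as follows. This is a sketch under assumptions: the `http` scheme, the trailing-slash form for "all sources", and the example identifiers `NCBI` and `abc123` are illustrative, not taken from the service. Nothing is sent over the network.

```python
from typing import Optional
from urllib.parse import quote
from urllib.request import Request

BASE_URL = "http://taxosaurus.org"  # assumption: scheme is not stated on the slide


def source_url(source_id: Optional[str] = None) -> str:
    """URL for one source's details, or for all sources when no id is given."""
    if source_id:
        return f"{BASE_URL}/sources/{quote(source_id, safe='')}"
    return f"{BASE_URL}/sources/"  # assumption: bare path lists every source


def build_delete_request(token: str) -> Request:
    """Build (but do not send) a DELETE request that cancels a running job."""
    return Request(f"{BASE_URL}/delete/{quote(token, safe='')}", method="DELETE")


print(source_url("NCBI"))  # "NCBI" is a hypothetical source id
cancel = build_delete_request("abc123")
print(cancel.get_method(), cancel.full_url)
```

Since the slide lists GET and POST as alternatives for /delete, a client behind a proxy that blocks DELETE could fall back to one of those.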