Phyloinformatics and the Semantic Web Rutger Vos
Outline What is phyloinformatics and why should you care? How we got here and where we are now How the semantic web can help Projects that apply the semantic web to phyloinformatics Examples of linked data Where to next
What is Phyloinformatics? Phylogenetics: “The systematic study of organism relationships based on evolutionary similarities and differences.” Informatics: “The sciences concerned with gathering, manipulating, storing, retrieving, and classifying recorded information.”
Why should you care? Firstly,  “Nothing in evolution makes sense except in the light of phylogeny” Surely, “gathering, manipulating, storing, retrieving and classifying” such information is worthwhile? But if that doesn’t convince you…
As a consumer of phylogenetic data The “New Biology” is coming: “Major advances will take place via integration and synthesis, rather than decomposition and reduction” (Committee on a New Biology for the 21st Century, 2009) Presumably, this will involve retrieving and classifying.
As a consumer of phylogenetic data Or maybe for you phylogeny is simply a nuisance: Functional prediction Comparative analysis Ortholog finding Etc. But it would still be nice to have that out of the way painlessly…
As a producer of phylogenetic data Many journals require proper storage of data described in a manuscript. Funding agencies require dissemination and sharing of research results.
The Past Everything was closed: Idiosyncratic, private data  “pay-walls” Closed source software No accessible publishing medium
The Present Science is opening up: Open data Open access publishing Open source software Publishing is now accessible to everyone, online
Our current nightmare Documents,  documents everywhere
The current web makes sense to us
But not to a machine
What was informatics again? “The sciences concerned with gathering, manipulating, storing, retrieving, and classifying recorded information.”
This is too hard O. R. P. Bininda-Emonds, M. Cardillo, K. E. Jones, R. D. E. MacPhee, R. M. D. Beck, R. Grenyer, S. A. Price, R. A. Vos, J. L. Gittleman and A. Purvis, 2007. The delayed rise of present-day mammals. Nature 446: 507-512.
Let’s delegate that
Instead of linked documents
A web of linked concepts
Concepts connected by statements
Concepts are defined in ontologies “An ontology is a formal representation of the knowledge by a set of concepts within a domain and the relationships between those concepts. It is used to reason about the properties of that domain, and may be used to describe the domain.”
Expressing concepts in data syntax
Concepts are linked Linked by statements called “triples” A triple is a statement of the form: subject, predicate, object. Any part of a triple may have to be uniquely identifiable. For this we use URLs.
An applied example
Triple 1
	Subject: <http://example.org/data/tree1>
	Predicate: <http://example.org/terms/hasLikelihood>
	Object: 2342.323
	i.e. -lnL(tree1) = 2342.323
Triple 2
	Subject: <http://example.org/data/tree2>
	Predicate: <http://example.org/terms/hasLikelihood>
	Object: 2341.184
	i.e. -lnL(tree2) = 2341.184
What’s the better tree? The ontology defines what a likelihood is and how to compare negative log likelihoods. Hence, automated reasoning can conclude that tree2 is the better tree.
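As a minimal sketch of the reasoning step, the two triples from the applied example can be held as plain Python tuples and compared. The `Triple` type and the `better_tree` function are hypothetical names introduced for illustration. The rule "a smaller negative log likelihood is better" is hard-coded here, whereas in the talk it is the ontology that supplies that knowledge to a reasoner.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    """One statement: subject, predicate, object (hypothetical helper type)."""
    subject: str
    predicate: str
    obj: float


# Predicate URL taken from the applied example
HAS_LIKELIHOOD = "http://example.org/terms/hasLikelihood"

triples = [
    Triple("http://example.org/data/tree1", HAS_LIKELIHOOD, 2342.323),
    Triple("http://example.org/data/tree2", HAS_LIKELIHOOD, 2341.184),
]


def better_tree(statements: List[Triple]) -> str:
    """Return the subject whose -lnL value is smallest.

    Assumption: the comparison rule is written by hand here; a real
    reasoner would derive it from the ontology's definition of likelihood.
    """
    scored = [t for t in statements if t.predicate == HAS_LIKELIHOOD]
    return min(scored, key=lambda t: t.obj).subject


print(better_tree(triples))  # prints http://example.org/data/tree2
```

Because 2341.184 is smaller than 2342.323, the sketch selects tree2, matching the conclusion on the slide.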
URLs for phylogenetics PhyloWS doesn’t just provide an anchor to identify phylogenetic data, it also enables searching and retrieval.
The EvoInfo “stack”
TreeBASE
External links Study Taxon variant Taxon
A simple example TreeBASE maps  to uBio using skos:closeMatch... …and uBio to ToL  using gla:mapping
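The two-step mapping above can be sketched as following one predicate after another through a small set of triples. All identifiers in the data (`treebase:taxon/1`, `ubio:namebank/1`, `tol:node/1`) are made-up placeholders, and `objects` and `treebase_to_tol` are hypothetical helper names. Only the predicate names `skos:closeMatch` and `gla:mapping` come from the slide.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

SKOS_CLOSE_MATCH = "skos:closeMatch"
GLA_MAPPING = "gla:mapping"

# Placeholder identifiers, not real TreeBASE, uBio, or ToL records
triples: List[Triple] = [
    ("treebase:taxon/1", SKOS_CLOSE_MATCH, "ubio:namebank/1"),
    ("ubio:namebank/1", GLA_MAPPING, "tol:node/1"),
]


def objects(statements: List[Triple], subject: str, predicate: str) -> List[str]:
    """Return every object stated for the given subject and predicate."""
    return [o for s, p, o in statements if s == subject and p == predicate]


def treebase_to_tol(statements: List[Triple], taxon: str) -> List[str]:
    """Follow closeMatch to uBio, then mapping to ToL, collecting the results."""
    results: List[str] = []
    for ubio_id in objects(statements, taxon, SKOS_CLOSE_MATCH):
        results.extend(objects(statements, ubio_id, GLA_MAPPING))
    return results


print(treebase_to_tol(triples, "treebase:taxon/1"))  # prints ['tol:node/1']
```

A taxon with no `skos:closeMatch` statement simply yields an empty list, so a missing link does not raise an error.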
Another Example, UniProt sequences Standard tools can rewrite these linkout URLs  Result is a corresponding list of UniProt records TreeBASE stores NCBI taxonomy identifiers
Another Example, Geocoding TreeBASE uses DarwinCore for lat/lon annotations
Many online data repositories
Challenges Fragile: many services offline in Japan Data gets bigger and bigger Many concepts not yet in ontologies Many data still “locked in” in publications
The Future
The cloud Software will be run on a number of “virtual” platforms (Amazon, Google apps, Yahoo) Data will be stored in the cloud (Big Table, FreeBase)
Interpreting locked in knowledge Text and images meant for humans are being processed by machines. Examples: Taxon name mining (BHL) Gene name and function mining Tree figure processing Automated annotation
Summary Phyloinformatics is moving from closed to open to linked data Concepts and syntax are increasingly formalized and machine readable Automated queries across integrated resources will enable synthetic research Still lots to do to deploy these technologies and unlock legacy data
Acknowledgements Thank you for your attention! Also, many thanks to: 	The Pagel lab at UoR 	The EvoInfo group 	Val Tannen 	Wayne Maddison 	William Piel 	Hilmar Lapp 	Arlin Stoltzfus

Contenu connexe

Tendances

Knowledge Sharing - aCCCeso
Knowledge Sharing - aCCCesoKnowledge Sharing - aCCCeso
Knowledge Sharing - aCCCesoKaitlin Thaney
 
Mozilla Science Lab 101
Mozilla Science Lab 101Mozilla Science Lab 101
Mozilla Science Lab 101Kaitlin Thaney
 
The Seven Deadly Sins of Bioinformatics
The Seven Deadly Sins of BioinformaticsThe Seven Deadly Sins of Bioinformatics
The Seven Deadly Sins of BioinformaticsDuncan Hull
 
Big data from small data: A deep survey of the neuroscience landscape data via
Big data from small data:  A deep survey of the neuroscience landscape data viaBig data from small data:  A deep survey of the neuroscience landscape data via
Big data from small data: A deep survey of the neuroscience landscape data viaNeuroscience Information Framework
 
Data Sharing: Social and Normative - ISWC
Data Sharing: Social and Normative - ISWCData Sharing: Social and Normative - ISWC
Data Sharing: Social and Normative - ISWCKaitlin Thaney
 
How do we know what we don't know?  Exploring the data and knowledge space th...
How do we know what we don't know?  Exploring the data and knowledge space th...How do we know what we don't know?  Exploring the data and knowledge space th...
How do we know what we don't know?  Exploring the data and knowledge space th...Maryann Martone
 
2015 balti-and-bioinformatics
2015 balti-and-bioinformatics2015 balti-and-bioinformatics
2015 balti-and-bioinformaticsc.titus.brown
 
Towards Incidental Collaboratories; Research Data Services
Towards Incidental Collaboratories; Research Data ServicesTowards Incidental Collaboratories; Research Data Services
Towards Incidental Collaboratories; Research Data ServicesAnita de Waard
 
Making the web work for science - eResearch nz
Making the web work for science - eResearch nzMaking the web work for science - eResearch nz
Making the web work for science - eResearch nzKaitlin Thaney
 
Week 8 DRP sem 2 09
Week 8 DRP sem 2 09Week 8 DRP sem 2 09
Week 8 DRP sem 2 09guest992d811
 
E Research Chapter 1
E Research Chapter 1E Research Chapter 1
E Research Chapter 1guest2426e1d
 
Principle Violations: Revisiting the Dublin Core 1:1 Principle
Principle Violations:  Revisiting the Dublin Core 1:1 PrinciplePrinciple Violations:  Revisiting the Dublin Core 1:1 Principle
Principle Violations: Revisiting the Dublin Core 1:1 PrincipleRichard Urban
 
Linked Data at the Open University: From Technical Challenges to Organization...
Linked Data at the Open University: From Technical Challenges to Organization...Linked Data at the Open University: From Technical Challenges to Organization...
Linked Data at the Open University: From Technical Challenges to Organization...Mathieu d'Aquin
 
Hypertext2007 Carole Goble Keynote - "The Return of the Prodigal Web"
Hypertext2007 Carole Goble Keynote - "The Return of the Prodigal Web"Hypertext2007 Carole Goble Keynote - "The Return of the Prodigal Web"
Hypertext2007 Carole Goble Keynote - "The Return of the Prodigal Web"hypertext2007
 
The Future of Open Science
The Future of Open ScienceThe Future of Open Science
The Future of Open SciencePhilip Bourne
 
Building the FAIR Research Commons: A Data Driven Society of Scientists
Building the FAIR Research Commons: A Data Driven Society of ScientistsBuilding the FAIR Research Commons: A Data Driven Society of Scientists
Building the FAIR Research Commons: A Data Driven Society of ScientistsCarole Goble
 
Exposing Humanities Data for Reuse and Linking - RED, linked data and the sem...
Exposing Humanities Data for Reuse and Linking - RED, linked data and the sem...Exposing Humanities Data for Reuse and Linking - RED, linked data and the sem...
Exposing Humanities Data for Reuse and Linking - RED, linked data and the sem...Mathieu d'Aquin
 
Knowledge Isles in an Open Archipelago. The Open Archipelago Project
Knowledge Isles in an Open Archipelago. The Open Archipelago ProjectKnowledge Isles in an Open Archipelago. The Open Archipelago Project
Knowledge Isles in an Open Archipelago. The Open Archipelago ProjectUgo Eccli
 

Tendances (20)

Knowledge Sharing - aCCCeso
Knowledge Sharing - aCCCesoKnowledge Sharing - aCCCeso
Knowledge Sharing - aCCCeso
 
Mozilla Science Lab 101
Mozilla Science Lab 101Mozilla Science Lab 101
Mozilla Science Lab 101
 
The Seven Deadly Sins of Bioinformatics
The Seven Deadly Sins of BioinformaticsThe Seven Deadly Sins of Bioinformatics
The Seven Deadly Sins of Bioinformatics
 
Big data from small data: A deep survey of the neuroscience landscape data via
Big data from small data:  A deep survey of the neuroscience landscape data viaBig data from small data:  A deep survey of the neuroscience landscape data via
Big data from small data: A deep survey of the neuroscience landscape data via
 
Data Sharing: Social and Normative - ISWC
Data Sharing: Social and Normative - ISWCData Sharing: Social and Normative - ISWC
Data Sharing: Social and Normative - ISWC
 
How do we know what we don't know?  Exploring the data and knowledge space th...
How do we know what we don't know?  Exploring the data and knowledge space th...How do we know what we don't know?  Exploring the data and knowledge space th...
How do we know what we don't know?  Exploring the data and knowledge space th...
 
Navigating the Neuroscience Data Landscape
Navigating the Neuroscience Data LandscapeNavigating the Neuroscience Data Landscape
Navigating the Neuroscience Data Landscape
 
2015 balti-and-bioinformatics
2015 balti-and-bioinformatics2015 balti-and-bioinformatics
2015 balti-and-bioinformatics
 
Towards Incidental Collaboratories; Research Data Services
Towards Incidental Collaboratories; Research Data ServicesTowards Incidental Collaboratories; Research Data Services
Towards Incidental Collaboratories; Research Data Services
 
Making the web work for science - eResearch nz
Making the web work for science - eResearch nzMaking the web work for science - eResearch nz
Making the web work for science - eResearch nz
 
Week 8 DRP sem 2 09
Week 8 DRP sem 2 09Week 8 DRP sem 2 09
Week 8 DRP sem 2 09
 
E Research Chapter 1
E Research Chapter 1E Research Chapter 1
E Research Chapter 1
 
Principle Violations: Revisiting the Dublin Core 1:1 Principle
Principle Violations:  Revisiting the Dublin Core 1:1 PrinciplePrinciple Violations:  Revisiting the Dublin Core 1:1 Principle
Principle Violations: Revisiting the Dublin Core 1:1 Principle
 
Data Landscapes - Addiction
Data Landscapes - AddictionData Landscapes - Addiction
Data Landscapes - Addiction
 
Linked Data at the Open University: From Technical Challenges to Organization...
Linked Data at the Open University: From Technical Challenges to Organization...Linked Data at the Open University: From Technical Challenges to Organization...
Linked Data at the Open University: From Technical Challenges to Organization...
 
Hypertext2007 Carole Goble Keynote - "The Return of the Prodigal Web"
Hypertext2007 Carole Goble Keynote - "The Return of the Prodigal Web"Hypertext2007 Carole Goble Keynote - "The Return of the Prodigal Web"
Hypertext2007 Carole Goble Keynote - "The Return of the Prodigal Web"
 
The Future of Open Science
The Future of Open ScienceThe Future of Open Science
The Future of Open Science
 
Building the FAIR Research Commons: A Data Driven Society of Scientists
Building the FAIR Research Commons: A Data Driven Society of ScientistsBuilding the FAIR Research Commons: A Data Driven Society of Scientists
Building the FAIR Research Commons: A Data Driven Society of Scientists
 
Exposing Humanities Data for Reuse and Linking - RED, linked data and the sem...
Exposing Humanities Data for Reuse and Linking - RED, linked data and the sem...Exposing Humanities Data for Reuse and Linking - RED, linked data and the sem...
Exposing Humanities Data for Reuse and Linking - RED, linked data and the sem...
 
Knowledge Isles in an Open Archipelago. The Open Archipelago Project
Knowledge Isles in an Open Archipelago. The Open Archipelago ProjectKnowledge Isles in an Open Archipelago. The Open Archipelago Project
Knowledge Isles in an Open Archipelago. The Open Archipelago Project
 

En vedette

IPR Training Program
IPR Training ProgramIPR Training Program
IPR Training ProgramIleague
 
Retail Saa S 2011 1
Retail Saa S 2011 1Retail Saa S 2011 1
Retail Saa S 2011 1tgeyskens
 
Computer กับกระบวนการทางธุรกิจ
Computer กับกระบวนการทางธุรกิจComputer กับกระบวนการทางธุรกิจ
Computer กับกระบวนการทางธุรกิจthanapat yeekhaday
 
Modeling the biosphere: the natural historian's perspective
Modeling the biosphere: the natural historian's perspectiveModeling the biosphere: the natural historian's perspective
Modeling the biosphere: the natural historian's perspectiveRutger Vos
 
Application of Computer in Government
Application of Computer in GovernmentApplication of Computer in Government
Application of Computer in Governmentthanapat yeekhaday
 

En vedette (7)

IPR Training Program
IPR Training ProgramIPR Training Program
IPR Training Program
 
Retail Saa S 2011 1
Retail Saa S 2011 1Retail Saa S 2011 1
Retail Saa S 2011 1
 
Introduction to Computer
Introduction to ComputerIntroduction to Computer
Introduction to Computer
 
Computer กับกระบวนการทางธุรกิจ
Computer กับกระบวนการทางธุรกิจComputer กับกระบวนการทางธุรกิจ
Computer กับกระบวนการทางธุรกิจ
 
Modeling the biosphere: the natural historian's perspective
Modeling the biosphere: the natural historian's perspectiveModeling the biosphere: the natural historian's perspective
Modeling the biosphere: the natural historian's perspective
 
Application of Computer in Government
Application of Computer in GovernmentApplication of Computer in Government
Application of Computer in Government
 
Biomechatronics
BiomechatronicsBiomechatronics
Biomechatronics
 

Similaire à Phyloinformatics and the Semantic Web: Unlocking Evolutionary Data

The real world of ontologies and phenotype representation: perspectives from...
The real world of ontologies and phenotype representation:  perspectives from...The real world of ontologies and phenotype representation:  perspectives from...
The real world of ontologies and phenotype representation: perspectives from...Maryann Martone
 
How Bio Ontologies Enable Open Science
How Bio Ontologies Enable Open ScienceHow Bio Ontologies Enable Open Science
How Bio Ontologies Enable Open Sciencedrnigam
 
Biodiversity Informatics: An Interdisciplinary Challenge
Biodiversity Informatics: An Interdisciplinary ChallengeBiodiversity Informatics: An Interdisciplinary Challenge
Biodiversity Informatics: An Interdisciplinary ChallengeBryan Heidorn
 
Semantics for Bioinformatics: What, Why and How of Search, Integration and An...
Semantics for Bioinformatics: What, Why and How of Search, Integration and An...Semantics for Bioinformatics: What, Why and How of Search, Integration and An...
Semantics for Bioinformatics: What, Why and How of Search, Integration and An...Amit Sheth
 
The Future of Research (Science and Technology)
The Future of Research (Science and Technology)The Future of Research (Science and Technology)
The Future of Research (Science and Technology)Duncan Hull
 
Presentation to the J. Craig Venter Institute, Dec. 2014
Presentation to the J. Craig Venter Institute, Dec. 2014Presentation to the J. Craig Venter Institute, Dec. 2014
Presentation to the J. Craig Venter Institute, Dec. 2014Mark Wilkinson
 
Data analysis & integration challenges in genomics
Data analysis & integration challenges in genomicsData analysis & integration challenges in genomics
Data analysis & integration challenges in genomicsmikaelhuss
 
Ontology Based Information Extraction for Disease Intelligence
Ontology Based Information Extraction for Disease Intelligence Ontology Based Information Extraction for Disease Intelligence
Ontology Based Information Extraction for Disease Intelligence IJORCS
 
Metadata in the age of data curation and linked data
Metadata in the age of data curation and linked dataMetadata in the age of data curation and linked data
Metadata in the age of data curation and linked dataRyan Johnson
 
Accomplishments And Challenges In Bioinformatics
Accomplishments And Challenges In BioinformaticsAccomplishments And Challenges In Bioinformatics
Accomplishments And Challenges In BioinformaticsDereck Downing
 
The seven-deadly-sins-of-bioinformatics3960
The seven-deadly-sins-of-bioinformatics3960The seven-deadly-sins-of-bioinformatics3960
The seven-deadly-sins-of-bioinformatics3960mare34
 
Searching for patterns in crowdsourced information
Searching for patterns in crowdsourced informationSearching for patterns in crowdsourced information
Searching for patterns in crowdsourced informationSilvia Puglisi
 
Applying machine learning techniques to big data in the scholarly domain
Applying machine learning techniques to big data in the scholarly domainApplying machine learning techniques to big data in the scholarly domain
Applying machine learning techniques to big data in the scholarly domainAngelo Salatino
 
Looking for Commonsense in the Semantic Web
Looking for Commonsense in the Semantic WebLooking for Commonsense in the Semantic Web
Looking for Commonsense in the Semantic WebValentina Presutti
 
Using Taxonomies to Create People Directories and Author Networks
Using Taxonomies to Create People Directories and Author Networks Using Taxonomies to Create People Directories and Author Networks
Using Taxonomies to Create People Directories and Author Networks Access Innovations, Inc.
 
Emerging challenges in data-intensive genomics
Emerging challenges in data-intensive genomicsEmerging challenges in data-intensive genomics
Emerging challenges in data-intensive genomicsmikaelhuss
 

Similaire à Phyloinformatics and the Semantic Web: Unlocking Evolutionary Data (20)

Presentationonline
PresentationonlinePresentationonline
Presentationonline
 
The real world of ontologies and phenotype representation: perspectives from...
The real world of ontologies and phenotype representation:  perspectives from...The real world of ontologies and phenotype representation:  perspectives from...
The real world of ontologies and phenotype representation: perspectives from...
 
How Bio Ontologies Enable Open Science
How Bio Ontologies Enable Open ScienceHow Bio Ontologies Enable Open Science
How Bio Ontologies Enable Open Science
 
Biodiversity Informatics: An Interdisciplinary Challenge
Biodiversity Informatics: An Interdisciplinary ChallengeBiodiversity Informatics: An Interdisciplinary Challenge
Biodiversity Informatics: An Interdisciplinary Challenge
 
Semantics for Bioinformatics: What, Why and How of Search, Integration and An...
Semantics for Bioinformatics: What, Why and How of Search, Integration and An...Semantics for Bioinformatics: What, Why and How of Search, Integration and An...
Semantics for Bioinformatics: What, Why and How of Search, Integration and An...
 
A01-Openness in knowledge-based systems
A01-Openness in knowledge-based systemsA01-Openness in knowledge-based systems
A01-Openness in knowledge-based systems
 
The Future of Research (Science and Technology)
The Future of Research (Science and Technology)The Future of Research (Science and Technology)
The Future of Research (Science and Technology)
 
Presentation to the J. Craig Venter Institute, Dec. 2014
Presentation to the J. Craig Venter Institute, Dec. 2014Presentation to the J. Craig Venter Institute, Dec. 2014
Presentation to the J. Craig Venter Institute, Dec. 2014
 
The Uniform Resource Layer
The Uniform Resource LayerThe Uniform Resource Layer
The Uniform Resource Layer
 
Data analysis & integration challenges in genomics
Data analysis & integration challenges in genomicsData analysis & integration challenges in genomics
Data analysis & integration challenges in genomics
 
Ontology Based Information Extraction for Disease Intelligence
Ontology Based Information Extraction for Disease Intelligence Ontology Based Information Extraction for Disease Intelligence
Ontology Based Information Extraction for Disease Intelligence
 
Metadata in the age of data curation and linked data
Metadata in the age of data curation and linked dataMetadata in the age of data curation and linked data
Metadata in the age of data curation and linked data
 
Accomplishments And Challenges In Bioinformatics
Accomplishments And Challenges In BioinformaticsAccomplishments And Challenges In Bioinformatics
Accomplishments And Challenges In Bioinformatics
 
The seven-deadly-sins-of-bioinformatics3960
The seven-deadly-sins-of-bioinformatics3960The seven-deadly-sins-of-bioinformatics3960
The seven-deadly-sins-of-bioinformatics3960
 
Searching for patterns in crowdsourced information
Searching for patterns in crowdsourced informationSearching for patterns in crowdsourced information
Searching for patterns in crowdsourced information
 
Applying machine learning techniques to big data in the scholarly domain
Applying machine learning techniques to big data in the scholarly domainApplying machine learning techniques to big data in the scholarly domain
Applying machine learning techniques to big data in the scholarly domain
 
Looking for Commonsense in the Semantic Web
Looking for Commonsense in the Semantic WebLooking for Commonsense in the Semantic Web
Looking for Commonsense in the Semantic Web
 
020610
020610020610
020610
 
Using Taxonomies to Create People Directories and Author Networks
Using Taxonomies to Create People Directories and Author Networks Using Taxonomies to Create People Directories and Author Networks
Using Taxonomies to Create People Directories and Author Networks
 
Emerging challenges in data-intensive genomics
Emerging challenges in data-intensive genomicsEmerging challenges in data-intensive genomics
Emerging challenges in data-intensive genomics
 

Plus de Rutger Vos

Anna Karenina on hooves - what makes an animal fit for domestication?
Anna Karenina on hooves - what makes an animal fit for domestication?Anna Karenina on hooves - what makes an animal fit for domestication?
Anna Karenina on hooves - what makes an animal fit for domestication?Rutger Vos
 
10 Misverstanden Over Evolutie
10 Misverstanden Over Evolutie10 Misverstanden Over Evolutie
10 Misverstanden Over EvolutieRutger Vos
 
Crash Course Biodiversiteit
Crash Course BiodiversiteitCrash Course Biodiversiteit
Crash Course BiodiversiteitRutger Vos
 
Natural history research as a replicable data science
Natural history research as a replicable data scienceNatural history research as a replicable data science
Natural history research as a replicable data scienceRutger Vos
 
Species delimitation - species limits and character evolution
Species delimitation - species limits and character evolutionSpecies delimitation - species limits and character evolution
Species delimitation - species limits and character evolutionRutger Vos
 
Onderzoek bio-informatica Naturalis. Raad voor Cultuur 2017.
Onderzoek bio-informatica Naturalis. Raad voor Cultuur 2017.Onderzoek bio-informatica Naturalis. Raad voor Cultuur 2017.
Onderzoek bio-informatica Naturalis. Raad voor Cultuur 2017.Rutger Vos
 
Robot eye for the butterfly
Robot eye for the butterflyRobot eye for the butterfly
Robot eye for the butterflyRutger Vos
 
Taxonomic classification of digitized specimens using machine learning
Taxonomic classification of digitized specimens using machine learningTaxonomic classification of digitized specimens using machine learning
Taxonomic classification of digitized specimens using machine learningRutger Vos
 
Self-Updating Platform for the Estimation of Rates of Speciation, Migration A...
Self-Updating Platform for the Estimation of Rates of Speciation, Migration A...Self-Updating Platform for the Estimation of Rates of Speciation, Migration A...
Self-Updating Platform for the Estimation of Rates of Speciation, Migration A...Rutger Vos
 
Assembling the Tree of Life from public DNA sequence data
Assembling the Tree of Life from public DNA sequence dataAssembling the Tree of Life from public DNA sequence data
Assembling the Tree of Life from public DNA sequence dataRutger Vos
 
Hoe leer je een robot soorten te herkennen?
Hoe leer je een robot soorten te herkennen?Hoe leer je een robot soorten te herkennen?
Hoe leer je een robot soorten te herkennen?Rutger Vos
 
Kunnen we een tomaat van 400 jaar oud proeven
Kunnen we een tomaat van 400 jaar oud proevenKunnen we een tomaat van 400 jaar oud proeven
Kunnen we een tomaat van 400 jaar oud proevenRutger Vos
 
PhyloTastic: names-based phyloinformatic data integration
PhyloTastic: names-based phyloinformatic data integrationPhyloTastic: names-based phyloinformatic data integration
PhyloTastic: names-based phyloinformatic data integrationRutger Vos
 
SUPERSMART pipeline intro
SUPERSMART pipeline introSUPERSMART pipeline intro
SUPERSMART pipeline introRutger Vos
 
Reconstructing paleoenvironments using metagenomics
Reconstructing paleoenvironments using metagenomicsReconstructing paleoenvironments using metagenomics
Reconstructing paleoenvironments using metagenomicsRutger Vos
 
Synthesising disparate data resources to obtain composite estimates of geophy...
Synthesising disparate data resources to obtain composite estimates of geophy...Synthesising disparate data resources to obtain composite estimates of geophy...
Synthesising disparate data resources to obtain composite estimates of geophy...Rutger Vos
 
The Galaxy bioinformatics workflow environment
The Galaxy bioinformatics workflow environmentThe Galaxy bioinformatics workflow environment
The Galaxy bioinformatics workflow environmentRutger Vos
 
Retrieving useful information from connected specimen- and data collections
Retrieving useful information from connected specimen- and data collectionsRetrieving useful information from connected specimen- and data collections
Retrieving useful information from connected specimen- and data collectionsRutger Vos
 
NeXML - phylogenetic data as XML
NeXML - phylogenetic data as XMLNeXML - phylogenetic data as XML
NeXML - phylogenetic data as XMLRutger Vos
 
Vos at NCB Naturalis
Vos at NCB NaturalisVos at NCB Naturalis
Vos at NCB NaturalisRutger Vos
 

Plus de Rutger Vos (20)

Anna Karenina on hooves - what makes an animal fit for domestication?
Anna Karenina on hooves - what makes an animal fit for domestication?Anna Karenina on hooves - what makes an animal fit for domestication?
Anna Karenina on hooves - what makes an animal fit for domestication?
 
10 Misverstanden Over Evolutie
10 Misverstanden Over Evolutie10 Misverstanden Over Evolutie
10 Misverstanden Over Evolutie
 
Crash Course Biodiversiteit
Crash Course BiodiversiteitCrash Course Biodiversiteit
Crash Course Biodiversiteit
 
Natural history research as a replicable data science
Natural history research as a replicable data scienceNatural history research as a replicable data science
Natural history research as a replicable data science
 
Species delimitation - species limits and character evolution
Species delimitation - species limits and character evolutionSpecies delimitation - species limits and character evolution
Species delimitation - species limits and character evolution
 
Onderzoek bio-informatica Naturalis. Raad voor Cultuur 2017.
Onderzoek bio-informatica Naturalis. Raad voor Cultuur 2017.Onderzoek bio-informatica Naturalis. Raad voor Cultuur 2017.
Onderzoek bio-informatica Naturalis. Raad voor Cultuur 2017.
 
Robot eye for the butterfly
Robot eye for the butterflyRobot eye for the butterfly
Robot eye for the butterfly
 
Taxonomic classification of digitized specimens using machine learning
Taxonomic classification of digitized specimens using machine learningTaxonomic classification of digitized specimens using machine learning
Taxonomic classification of digitized specimens using machine learning
 
Self-Updating Platform for the Estimation of Rates of Speciation, Migration A...
Self-Updating Platform for the Estimation of Rates of Speciation, Migration A...Self-Updating Platform for the Estimation of Rates of Speciation, Migration A...
Self-Updating Platform for the Estimation of Rates of Speciation, Migration A...
 
Assembling the Tree of Life from public DNA sequence data
Assembling the Tree of Life from public DNA sequence dataAssembling the Tree of Life from public DNA sequence data
Assembling the Tree of Life from public DNA sequence data
 
Hoe leer je een robot soorten te herkennen?
Hoe leer je een robot soorten te herkennen?Hoe leer je een robot soorten te herkennen?
Hoe leer je een robot soorten te herkennen?
 
Kunnen we een tomaat van 400 jaar oud proeven
Kunnen we een tomaat van 400 jaar oud proevenKunnen we een tomaat van 400 jaar oud proeven
Kunnen we een tomaat van 400 jaar oud proeven
 
PhyloTastic: names-based phyloinformatic data integration
PhyloTastic: names-based phyloinformatic data integrationPhyloTastic: names-based phyloinformatic data integration
PhyloTastic: names-based phyloinformatic data integration
 
SUPERSMART pipeline intro
SUPERSMART pipeline introSUPERSMART pipeline intro
SUPERSMART pipeline intro
 
Reconstructing paleoenvironments using metagenomics
Reconstructing paleoenvironments using metagenomicsReconstructing paleoenvironments using metagenomics
Reconstructing paleoenvironments using metagenomics
 
Synthesising disparate data resources to obtain composite estimates of geophy...
Synthesising disparate data resources to obtain composite estimates of geophy...Synthesising disparate data resources to obtain composite estimates of geophy...
Synthesising disparate data resources to obtain composite estimates of geophy...
 
The Galaxy bioinformatics workflow environment
The Galaxy bioinformatics workflow environmentThe Galaxy bioinformatics workflow environment
The Galaxy bioinformatics workflow environment
 
Retrieving useful information from connected specimen- and data collections
Retrieving useful information from connected specimen- and data collectionsRetrieving useful information from connected specimen- and data collections
Retrieving useful information from connected specimen- and data collections
 
NeXML - phylogenetic data as XML
NeXML - phylogenetic data as XMLNeXML - phylogenetic data as XML
NeXML - phylogenetic data as XML
 
Vos at NCB Naturalis
Vos at NCB NaturalisVos at NCB Naturalis
Vos at NCB Naturalis
 

Phyloinformatics and the Semantic Web: Unlocking Evolutionary Data

  • 1. Phyloinformatics and the Semantic Web Rutger Vos
  • 2. Outline What is phyloinformatics and why should you care? How we got here and where we are now How the semantic web can help Projects that apply the semantic web to phyloinformatics Examples of linked data Where to next
  • 3. What is Phyloinformatics? Phylogenetics: “The systematic study of organism relationships based on evolutionary similarities and differences.” Informatics: “The sciences concerned with gathering, manipulating, storing, retrieving, and classifying recorded information.”
  • 4. Why should you care? Firstly, “Nothing in evolution makes sense except in the light of phylogeny” Surely, “gathering, manipulating, storing, retrieving and classifying” such information is worthwhile? But if that doesn’t convince you…
  • 5. As a consumer of phylogenetic data The “New Biology” is coming: “Major advances will take place via integration and synthesis, rather than decomposition and reduction” (Committee on a New Biology for the 21st Century, 2009) Presumably, this will involve retrieving and classifying.
  • 6. As a consumer of phylogenetic data Or maybe for you phylogeny is simply a nuisance: Functional prediction Comparative analysis Ortholog finding Etc. But it would still be nice to have that out of the way painlessly…
  • 7. As a producer of phylogenetic data Many journals require proper storage of data described in a manuscript. Funding agencies require dissemination and sharing of research results.
  • 8. The Past Everything was closed: Idiosyncratic, private data “pay-walls” Closed source software No accessible publishing medium
  • 9. The Present Science is opening up: Open data Open access publishing Open source software Publishing is now accessible to everyone, online
  • 10. Our current nightmare Documents, documents everywhere
  • 11. The current web makes sense to us
  • 12. But not to a machine
  • 13. What was informatics again? “The sciences concerned with gathering, manipulating, storing, retrieving, and classifying recorded information.”
  • 14.
  • 15. This is too hard O. R. P. Bininda-Emonds, M. Cardillo, K. E. Jones, R. D. E. MacPhee, R. M. D. Beck, R. Grenyer, S. A. Price, R. A. Vos, J. L. Gittleman and A. Purvis, 2007. The delayed rise of present-day mammals. Nature 446: 507-512.
  • 17. Instead of linked documents
  • 18. A web of linked concepts
  • 19. Concepts connected by statements
  • 20. Concepts are defined in ontologies “An ontology is a formal representation of the knowledge by a set of concepts within a domain and the relationships between those concepts. It is used to reason about the properties of that domain, and may be used to describe the domain.”
  • 21. Expressing concepts in data syntax
  • 22. Concepts are linked Linked by statements called “triples” A triple is a statement: subject, predicate, object. Any part of a triple may have to be uniquely identifiable. For this we use URLs.
  • 23. An applied example Triple 1 Subject: <http://example.org/data/tree1> Predicate: <http://example.org/terms/hasLikelihood> Object: 2342.323 i.e. -lnL(tree1) = 2342.323 Triple 2 Subject: <http://example.org/data/tree2> Predicate: <http://example.org/terms/hasLikelihood> Object: 2341.184 i.e. -lnL(tree2) = 2341.184
  • 24. What’s the better tree? The ontology defines what a likelihood is and how to compare negative log likelihoods. Hence, automated reasoning can conclude that tree2 is the better tree.
  • 25. URLs for phylogenetics PhyloWS doesn’t just provide an anchor to identify phylogenetic data, it also enables searching and retrieval.
  • 28. External links Study Taxon variant Taxon
  • 29. A simple example TreeBASE maps to uBio using skos:closeMatch... …and uBio to ToL using gla:mapping
  • 30. Another Example, UniProt sequences Standard tools can rewrite these linkout URLs Result is a corresponding list of UniProt records TreeBASE stores NCBI taxonomy identifiers
  • 31. Another Example, Geocoding TreeBASE uses DarwinCore for lat/lon annotations
  • 32. Many online data repositories
  • 33. Challenges Fragile: many services offline in Japan Data gets bigger and bigger Many concepts not yet in ontologies Many data still “locked in” in publications
  • 35. The cloud Software will be run on a number of “virtual” platforms (Amazon, Google apps, Yahoo) Data will be stored in the cloud (Big Table, FreeBase)
  • 36. Interpreting locked in knowledge Text and images meant for humans are being processed by machines. Examples: Taxon name mining (BHL) Gene name and function mining Tree figure processing Automated annotation
  • 37. Summary Phyloinformatics is moving from closed to open to linked data Concepts and syntax are increasingly formalized and machine readable Automated queries across integrated resources will enable synthetic research Still lots to do to deploy these technologies and unlock legacy data
  • 38. Acknowledgements Thank you for your attention! Also, many thanks to: the Pagel lab at UoR, the EvoInfo group, Val Tannen, Wayne Maddison, William Piel, Hilmar Lapp, Arlin Stoltzfus
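
The two likelihood triples from slide 23, and the comparison described on slide 24, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Triple` type and the `best_tree` function are hypothetical names introduced here, and the rule "a smaller negative log likelihood is better" is hard-coded, whereas the slides describe it as something a reasoner would derive from the ontology.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement: subject, predicate, object (hypothetical helper type)."""
    subject: str
    predicate: str
    obj: float


# Predicate URL taken from the slide's example.org namespace.
HAS_LIKELIHOOD = "http://example.org/terms/hasLikelihood"

# The two triples from slide 23; objects are -lnL values.
triples = [
    Triple("http://example.org/data/tree1", HAS_LIKELIHOOD, 2342.323),
    Triple("http://example.org/data/tree2", HAS_LIKELIHOOD, 2341.184),
]


def best_tree(statements):
    """Return the subject with the smallest negative log likelihood.

    Assumption: the comparison rule is written out by hand here; it is
    not inferred from an ontology as the slides envisage.
    """
    scored = [t for t in statements if t.predicate == HAS_LIKELIHOOD]
    return min(scored, key=lambda t: t.obj).subject


print(best_tree(triples))  # prints http://example.org/data/tree2
```

Because every subject and predicate is a URL, statements from different sources can be pooled into one list and compared in the same way.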

Editor's notes

  1. Thank for the invitation - Thank audience for showing up given the other lecture - Introduce self - Talk title
  2. Mention figure on the right
  3. Mention Dobzhansky
  4. Here’s an example that uses the Yahoo! Pipes tool to turn the list of NCBI taxon identifiers that TreeBASE stores for a given study into a list of all UniProt sequence records for those taxa.
  5. This example shows that with a minimal amount of JavaScript coding a Google map can be added to a web page (first code block), and the taxa for a given study can be mapped onto it using the DarwinCore coordinate annotations that TreeBASE stores.
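
The identifier rewriting described in note 4 can be approximated without Yahoo! Pipes. The sketch below turns a list of NCBI taxon identifiers into one query URL per taxon. The endpoint `http://example.org/uniprot/search`, the `taxon` parameter name, and the `linkout_urls` function are all placeholders invented for illustration; they are not the real UniProt or TreeBASE addresses, and the identifiers are arbitrary example values.

```python
from urllib.parse import urlencode

# Hypothetical endpoint and parameter name, NOT the real UniProt service.
BASE_URL = "http://example.org/uniprot/search"


def linkout_urls(taxon_ids):
    """Build one query URL per NCBI taxon identifier."""
    return [f"{BASE_URL}?{urlencode({'taxon': tid})}" for tid in taxon_ids]


# Arbitrary example identifiers standing in for those stored with a study.
urls = linkout_urls([9606, 10090])
for url in urls:
    print(url)
```

Fetching each URL and merging the responses would give the combined list of sequence records that the note describes.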