Integrating data with phylogenies, at scale
Nico Cellinese
University of Florida
&
Hilmar Lapp
Duke University
WHAT’S IN A NAME?
What’s in a name?
Chaos!
•  Names and concepts do not reconcile easily
•  Names are text strings
•  Context is lacking or subjective
•  Meaning is not computable
Linnean names point to concepts

Antoine Laurent de Jussieu, Genera Plantarum, 1789
I don’t understand any of those concepts, whether in Latin or English, but I can still link each of them to its name: one object to one object.
…and	200+	
…and	400+
Idiosyncratic Russian dolls syndrome
From a human perspective, we lose track of concepts. It is hard to reconcile all of them. We need help! Can we compute them?
•  We can unclutter concepts, and thereby nomenclature
•  How do we navigate the Tree of Life while repurposing Linnean names that are tied to traditional concepts?
Dark	taxa!
How	do	we	integrate	data	with	this	tree?
Tree-thinking
Common descent → evolution at the center of taxonomy
[Figure: tree with tips A, B, C and D, with its branches and synapomorphies labeled]
Clades = taxa
Discovery
Communication: How??
[Figure: density plot; x axis: Diversification rate, y axis: Density]
Tree-thinking	
[Figure: tree labeled with taxon names of various ranks: Berberidopsidaceae, Opiliones, Zingiberaceae, Hamamelidaceae, Sarcolaenaceae, Lingulidae, Hymenoptera, Mammalia, Apocynaceae, Galliformes, Rubiaceae, Anarthriaceae, Lineidae, Crocodylidae, Stylosiphonia, Andrenidae, Cracidae, Gavialis, Globba, Micrella, Rhodoleia, Phalangiidae, Tachyglossa, Lyginia, Mediusella, Chamaeclitandra]
These names were not generated in an evolution-based framework
(groups defined by character similarity vs. by common descent)
Both the Encyclopedia of Life (EOL) and the Open Tree of Life suggest that Campanuloideae is a misspelling of Campaniloidea (marine gastropods!).
GBIF does not currently have Campanuloideae in its backbone taxonomy.
Are you kidding me?
These are the Campanuloideae!
Wang et al. 2014
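Why string similarity alone is a poor guide: a minimal sketch, assuming a hypothetical fuzzy matcher that accepts any backbone name within two edits of the query. This is not how EOL or the Open Tree of Life actually match names (their algorithms are not described here), and the `backbone` list and `naive_match` helper are made up for illustration. It only shows that `Campanuloideae` and `Campaniloidea` are two edits apart, close enough for a naive matcher to treat one as a misspelling of the other.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def naive_match(query, backbone, max_edits=2):
    """Hypothetical matcher: every backbone name within max_edits of the query."""
    return [name for name in backbone if levenshtein(query, name) <= max_edits]


# Hypothetical backbone that lacks Campanuloideae but contains the gastropod name.
backbone = ["Campaniloidea", "Campanulaceae", "Lobelioideae"]

print(levenshtein("Campanuloideae", "Campaniloidea"))  # 2
print(naive_match("Campanuloideae", backbone))         # ['Campaniloidea']
```

The strings are near neighbors, yet the names point to a plant clade and a group of sea snails: text distance says nothing about which concept a name refers to.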
Life as a street map
How to navigate life as a machine
Mapping data to phylogenetic knowledge space
Street signs serve people, not machines
•  How do we build a reliable GPS for phylogenies?
•  How do we reproducibly find the right nodes?
FEED
Textual Definition –
The hyoglossus is a muscle that attaches to the hyoid and tongue and is innervated by Cranial Nerve XII.
Computable Definition –
('attached to' some 'hyoid bone')
and ('attached to' some tongue)
and ('innervated by' some 'hypoglossal nerve')
and spatially disjoint with 'intrinsic tongue muscle'
Druzinsky et al. (2015): Logic definitions of mammalian feeding muscles by means of necessary and sufficient conditions true for all mammals
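To make the difference between the two definitions concrete, here is a minimal Python sketch that treats the computable definition as a set of jointly necessary and sufficient (relation, filler) conditions and checks annotated structures against it. It is a closed-world simplification over hypothetical muscle annotations: an OWL reasoner works under open-world semantics and with far richer logic, so this only illustrates the idea that meaning becomes checkable once it is stated as conditions rather than prose.

```python
from dataclasses import dataclass, field

# A restriction pairs a relation with its filler, e.g. ("attached to", "tongue").
Restriction = tuple[str, str]


@dataclass(frozen=True)
class Definition:
    """Conditions that are jointly necessary and sufficient for class membership."""
    name: str
    restrictions: frozenset[Restriction]


@dataclass
class Structure:
    """An annotated anatomical structure; its free-text label plays no role in matching."""
    label: str
    facts: set[Restriction] = field(default_factory=set)


def satisfies(structure: Structure, definition: Definition) -> bool:
    # Closed-world simplification: the structure belongs to the class iff
    # every required restriction is among its asserted facts.
    return definition.restrictions <= structure.facts


HYOGLOSSUS = Definition(
    "hyoglossus",
    frozenset({
        ("attached to", "hyoid bone"),
        ("attached to", "tongue"),
        ("innervated by", "hypoglossal nerve"),
        ("spatially disjoint with", "intrinsic tongue muscle"),
    }),
)

# Hypothetical annotations from three datasets that use different local names.
muscle_a = Structure("hypothetical muscle, dataset 1", set(HYOGLOSSUS.restrictions))
muscle_b = Structure(
    "hypothetical muscle, dataset 2",
    set(HYOGLOSSUS.restrictions) | {("part of", "head")},  # extra facts do not block a match
)
muscle_c = Structure(
    "hypothetical muscle, dataset 3",
    {("attached to", "tongue"), ("innervated by", "hypoglossal nerve")},
)

for m in (muscle_a, muscle_b, muscle_c):
    print(m.label, "->", satisfies(m, HYOGLOSSUS))
```

Membership depends only on the stated conditions, never on the label, which is what lets two datasets with different local names be reconciled.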
Nomenclature ≠ Semantics
Phyloreference = Logical definition of a clade, using the one property common to all of life: common descent
Phyloreferences
Statements formally expressing the patterns we discover (analogous to map coordinates)
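[Figure: three trees, each with tips A, B and C, illustrating the node-based, branch-based and apomorphy-based definitions; character X is marked on the apomorphy-based tree]
Node-based: The clade originating with the last common ancestor of B and C.
Branch-based: The clade originating with the first ancestor of B that is not an ancestor of A.
Apomorphy-based: The clade originating with the first ancestor of C to evolve X.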
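The three definition types can be sketched as small tree-walking functions. This is a minimal Python illustration on a hypothetical toy tree, (A,(D,(E,(B,C)))), with a hypothetical character X assumed to arise on the branch leading to the (E,(B,C)) node and never reverse; the internal node names `n1`, `n2`, `n3` and all function names are made up for the example. With that setup, the three definitions pick out three different clades.

```python
# Hypothetical toy tree (A,(D,(E,(B,C)))), stored as child -> parent.
# n1, n2 and n3 are made-up names for the internal nodes.
PARENT = {
    "A": "root", "n1": "root",
    "D": "n1", "n2": "n1",
    "E": "n2", "n3": "n2",
    "B": "n3", "C": "n3",
}
CHILDREN: dict[str, list[str]] = {}
for child, parent in PARENT.items():
    CHILDREN.setdefault(parent, []).append(child)


def lineage(node):
    """The node itself followed by its ancestors, ending at the root."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def tips(node):
    """Terminal taxa descended from (or equal to) a node."""
    kids = CHILDREN.get(node, [])
    return {node} if not kids else set().union(*(tips(k) for k in kids))


def node_based(x, y):
    """Clade originating with the last common ancestor of x and y."""
    ancestors_of_y = set(lineage(y))
    lca = next(n for n in lineage(x) if n in ancestors_of_y)
    return tips(lca)


def branch_based(include, exclude):
    """Clade originating with the first ancestor of `include` that is not
    an ancestor of `exclude`."""
    ancestors_of_exclude = set(lineage(exclude))
    origin = None
    for n in lineage(include):
        if n in ancestors_of_exclude:
            break
        origin = n  # keep climbing while still off the excluded lineage
    return tips(origin) if origin is not None else set()


def apomorphy_based(tip, has_trait):
    """Clade originating with the first ancestor of `tip` to evolve the trait.
    Assumes the tip has the trait and that the trait never reverses."""
    origin = tip
    while origin in PARENT and PARENT[origin] in has_trait:
        origin = PARENT[origin]
    return tips(origin)


# Hypothetical character X: arises on the branch leading to n2.
X = {"n2", "E", "n3", "B", "C"}

print(sorted(node_based("B", "C")))     # ['B', 'C']
print(sorted(branch_based("B", "A")))   # ['B', 'C', 'D', 'E']
print(sorted(apomorphy_based("C", X)))  # ['B', 'C', 'E']
```

Each function takes taxa (and, for the apomorphy case, character states) as input rather than node identifiers, so the same definitions can be evaluated on any tree that contains those taxa.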
Phyloreferences yield a coordinate system for the Tree of Life
•  Any node, branch, or subtree is referenceable
•  References are unambiguous
•  References are computable
•  References are portable
•  The system adapts to new and changing knowledge
Many needed technologies already exist
•  OWL ontologies designed for
–  Phylogenetic knowledge: CDAO
–  Phenotypic knowledge: Uberon, PATO, …
•  Efficient and expressive reasoners: FaCT++, HermiT, Racer, ELK
[Figure: phylogeny with tips Campanula_rotundifolia, Pseudonemacladus_oppositifolius, Lobelia_cardinalis, Campanula_latifolia, Cyphocarpus_rigescens, Wahlenbergia_linifolia, Nemacladus_ramosissmus, Lobelia_coronopifolia, Cyphia_elata, Pentaphragma, Crysanthemum, Sphenoclea, Platycodon_grandiflorus and Cyphia_bulbosa; internal nodes numbered 1 to 10; the clades Campanula, Lobelia and Cyphia are labeled]
Class: Campanulaceae_1889_to_1980
EquivalentTo:
    cdao:has_Descendant value taxon:Campanula_latifolia
    and phyloref:excludes_lineage value taxon:Crysanthemum
Class: Campanulaceae_1980
EquivalentTo:
    cdao:has_Descendant value taxon:Campanula_latifolia
    and phyloref:excludes_lineage value taxon:Lobelia
Class: Campanulaceae_after_1995
EquivalentTo:
    cdao:has_Descendant value taxon:Campanula_latifolia
    and phyloref:excludes_lineage value taxon:Sphenoclea
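The three class expressions above differ only in the excluded lineage. One plausible reading, sketched here, is branch-based: the most inclusive clade that contains Campanula_latifolia without containing the excluded taxon. The Python sketch runs all three definitions on a hypothetical, simplified topology (it is not the tree shown on the slides; tip names merely reuse the slide's identifiers), and the `resolve` helper is made up for illustration. One specifier, three exclusions, three different circumscriptions.

```python
# Hypothetical, simplified topology for illustration only; NOT the slides' tree.
TREE = (
    "Crysanthemum",
    (
        ("Sphenoclea", "Pentaphragma"),
        (
            ("Lobelia", "Cyphia"),
            ("Platycodon", ("Wahlenbergia", ("Campanula_latifolia", "Campanula_rotundifolia"))),
        ),
    ),
)


def parent_map(tree):
    """Map every node (tip name or subtree tuple) to its parent subtree.
    Assumes tip names are unique, so no two subtrees are identical."""
    parents, stack = {}, [tree]
    while stack:
        node = stack.pop()
        if isinstance(node, tuple):
            for child in node:
                parents[child] = node
                stack.append(child)
    return parents


def lineage(node, parents):
    """The node followed by all of its ancestors up to the root."""
    path = [node]
    while path[-1] in parents:
        path.append(parents[path[-1]])
    return path


def tips(node):
    return {node} if isinstance(node, str) else set().union(*(tips(c) for c in node))


def resolve(tree, has_descendant, excludes_lineage):
    """Most inclusive clade containing `has_descendant` but not
    `excludes_lineage` (a branch-based reading of the expressions above)."""
    parents = parent_map(tree)
    blocked = set(lineage(excludes_lineage, parents))
    origin = None
    for node in lineage(has_descendant, parents):
        if node in blocked:
            break
        origin = node
    return tips(origin) if origin is not None else set()


DEFINITIONS = {
    "Campanulaceae_1889_to_1980": "Crysanthemum",
    "Campanulaceae_1980": "Lobelia",
    "Campanulaceae_after_1995": "Sphenoclea",
}
for name, excluded in DEFINITIONS.items():
    print(name, sorted(resolve(TREE, "Campanula_latifolia", excluded)))
```

Because each definition names taxa rather than node identifiers, the same `resolve` call can be rerun unchanged on any other tree that contains the specifier taxa.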
Phyloreferences as ontological expressions
Phyloreference expressions can be:
•  Easily generated by anyone
•  Applied to any tree
•  Named and registered
–  To promote reuse and consistency
–  To improve usability and accessibility
Class: Campanulaceae
Annotations:
    rdfs:label "Campanulaceae_after_1995"
    dc:description "the clade that includes Campanula latifolia but not Sphenoclea"
EquivalentTo:
    cdao:has_Descendant value taxon:Campanula_latifolia
    and phyloref:excludes_lineage value taxon:Sphenoclea
vs.
Class: AGF4-SHRU-3560
EquivalentTo:
    cdao:has_Descendant value taxon:Campanula_latifolia
    and phyloref:excludes_lineage value taxon:Sphenoclea
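One benefit of registering expressions, sketched under explicit assumptions: if a registry (hypothetical, as are the `normalize` and `register` helpers) keys each entry on a normalized form of its specifiers, then a human-readable name like `Campanulaceae` and an opaque identifier like `AGF4-SHRU-3560` are recognized as sharing one definition. This compares specifier sets syntactically; it is not the logical-equivalence check an OWL reasoner would perform.

```python
from collections import defaultdict


def normalize(has_descendant, excludes_lineage):
    """Hypothetical canonical key: order-independent sets of specifier taxa."""
    return (frozenset(has_descendant), frozenset(excludes_lineage))


# Normalized definition -> every class identifier registered with it.
registry = defaultdict(list)


def register(class_id, has_descendant, excludes_lineage):
    key = normalize(has_descendant, excludes_lineage)
    registry[key].append(class_id)
    return registry[key]  # all identifiers that share this definition


register("Campanulaceae", ["taxon:Campanula_latifolia"], ["taxon:Sphenoclea"])
same = register("AGF4-SHRU-3560", ["taxon:Campanula_latifolia"], ["taxon:Sphenoclea"])
other = register("Campanulaceae_1980", ["taxon:Campanula_latifolia"], ["taxon:Lobelia"])

print(same)   # ['Campanulaceae', 'AGF4-SHRU-3560']
print(other)  # ['Campanulaceae_1980']
```

Registering a new expression thus reveals immediately whether an equivalent one already exists, which is what enables reuse instead of duplication.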
Challenges
•  An OWL-based data model that meets the requirements of phylogenetic taxonomy, reasoning expressivity, and scalability
•  Conventions for data transformation, and the consequences of different choices
•  Least common ancestor reasoning for OWL data
•  Lack of a canonical specimen identifier system
•  Specifier mapping ontologies
Tree of Life, ontologized: a universal coordinate system
•  The Tree of Life is itself an aggregation and integration of our phylogenetic knowledge.
•  Phyloreferencing is addressing into a knowledge universe.
•  Ontologies, reasoning, and other knowledge representation (KR) techniques are powerful tools for this.
Acknowledgements
•  National Science Foundation (DBI-1458484)
•  Ken and Linda McGurn
•  Phenoscape
•  EvoIO
