SlideShare a Scribd company logo
1 of 11
Big Data that might benefit from
ontology technology, but why this
            usually fails

             Barry Smith
    National Center for Ontological
               Research

                                      1
The strategy of annotation
Databases describe data using multiple heterogeneous
labels
If we can annotate (tag) these labels using terms from
common controlled vocabularies, then a virtual arms-
length integration can be achieved, providing
• immediate benefits for search and retrieval
• a starting point for the creation of net-centric
   reference data
• potential longer term benefits for reasoning
with no need to modify existing systems, code or data

             See Ceusters et al. Proceedings of DILS 2004.
             http://ontology.buffalo.edu/bio/LinkSuite.pdf
                                                             2
String searches yield partial results, rest
  on manual effort and on familiarity
     with existing database contents
Ontologies facilitate grouping of annotations

          brain               20
            hindbrain         15
               rhombomere     10


      Query ‘brain’ without ontology 20
      Query ‘brain’ with ontology    45
Examples of where this method works
• Reference Genome Annotation Project
  http://www.geneontology.org/GO.refgenome.shtml
• Human resources data in large organizations
  http://www.youtube.com/watch?v=OzW3Gc_yA9A
• Military intelligence data
  Salmen et al. in http://ceur-ws.org/Vol-808/
Other potential areas of application:
• Crime                      • Public health
• Insurance                  • Finance

                                                   4
But normally the method does not work

Semantic technology (OWL, …) seeks to break
  down data silos
Unfortunately it is now so easy to create
  ontologies that myriad incompatible ontologies
  are being created in ad hoc ways leading to the
  creation of new, semantic silos
The Semantic Web framework as currently
  conceived and governed by the W3C (modeled
  on html) yields minimal standardization
The more semantic technology is
  successful, they more we fail to achieve
  our goals
                                                    5
Reasons for this effect
• Just as it’s easier to build a new database, so it’s
  easier to build a new ontology for each new
  project
• You will not get paid for reusing existing
  ontologies (Let a million ontologies bloom)
• There are no ‘good’ ontologies, anyway (just
  arbitrary choices of terms and relations …)
• Information technology (hardware) changes
  constantly, not worth the effort of getting things
  right
                                                         6
How to do it right?
• how create an incremental, evolutionary
  process, where what is good survives, and what is
  bad fails
• create a scenario in which people will find it
  profitable to reuse ontologies, terminologies and
  coding systems which have been tried and tested
• silo effects will be avoided and results of
  investment in Semantic Technology will cumulate
  effectively
                                                  7
Uses of ‘ontology’ in PubMed abstracts




                                         8
By far the most successful: GO (Gene Ontology)




                                                 9
GO provides a controlled vocabulary of terms for
  use in annotating (tagging) biological data

• multi-species, multi-disciplinary, open source
• built and maintained by domain experts
• contributing to the cumulativity of scientific results
  obtained by distinct research communities
• natural language and logical definitions for all
  terms to support consistent human application and
  computational exploitation
• rigorous governance process
• feedback loop connects users to editors
                                                      10
How to do it right
• ontologies should mimic the methodology used
  by the GO (following the principles of the OBO
  Foundry: http://obofoundry.org)
• ontologies in the same field should be
  developed in coordinated fashion to ensure
  that there is exactly one ontology for each
  subdomain
• ontologies should be developed incrementally
  in a way that builds on successful user testing at
  every stage
                                                  11

More Related Content

What's hot

Modern tools for sharing and synthesizing neuroimaging results
Modern tools for sharing and synthesizing neuroimaging resultsModern tools for sharing and synthesizing neuroimaging results
Modern tools for sharing and synthesizing neuroimaging resultsKrzysztof Gorgolewski
 
Reproducibility and replicability: a practical approach
Reproducibility and replicability: a practical approachReproducibility and replicability: a practical approach
Reproducibility and replicability: a practical approachKrzysztof Gorgolewski
 
Explorations in bioinformatics
Explorations in bioinformaticsExplorations in bioinformatics
Explorations in bioinformaticsDouglas Joubert
 
Reproducible research: theory
Reproducible research: theoryReproducible research: theory
Reproducible research: theoryC. Tobin Magle
 
Towards open and reproducible neuroscience in the age of big data
Towards open and reproducible neuroscience in the age of big dataTowards open and reproducible neuroscience in the age of big data
Towards open and reproducible neuroscience in the age of big dataKrzysztof Gorgolewski
 
Pharos: A Torch to Use in Your Journey in the Dark Genome
Pharos: A Torch to Use in Your Journey in the Dark GenomePharos: A Torch to Use in Your Journey in the Dark Genome
Pharos: A Torch to Use in Your Journey in the Dark GenomeRajarshi Guha
 
Case Study Life Sciences Data: Central for Integrative Systems Biology and Bi...
Case Study Life Sciences Data: Central for Integrative Systems Biology and Bi...Case Study Life Sciences Data: Central for Integrative Systems Biology and Bi...
Case Study Life Sciences Data: Central for Integrative Systems Biology and Bi...sesrdm
 
Share and Reuse: how data sharing can take your research to the next level
Share and Reuse: how data sharing can take your research to the next levelShare and Reuse: how data sharing can take your research to the next level
Share and Reuse: how data sharing can take your research to the next levelKrzysztof Gorgolewski
 
Open Source Pharma: From philosophy to real time experience
Open Source Pharma: From philosophy to real time experienceOpen Source Pharma: From philosophy to real time experience
Open Source Pharma: From philosophy to real time experienceOpen Source Pharma
 
Andrew Treloar, overview of ACEAS Data Workflow, ACEAS Grand 2014
Andrew Treloar, overview of ACEAS Data Workflow, ACEAS Grand 2014Andrew Treloar, overview of ACEAS Data Workflow, ACEAS Grand 2014
Andrew Treloar, overview of ACEAS Data Workflow, ACEAS Grand 2014aceas13tern
 

What's hot (11)

Modern tools for sharing and synthesizing neuroimaging results
Modern tools for sharing and synthesizing neuroimaging resultsModern tools for sharing and synthesizing neuroimaging results
Modern tools for sharing and synthesizing neuroimaging results
 
Reproducibility and replicability: a practical approach
Reproducibility and replicability: a practical approachReproducibility and replicability: a practical approach
Reproducibility and replicability: a practical approach
 
Explorations in bioinformatics
Explorations in bioinformaticsExplorations in bioinformatics
Explorations in bioinformatics
 
Reproducible research: theory
Reproducible research: theoryReproducible research: theory
Reproducible research: theory
 
Towards open and reproducible neuroscience in the age of big data
Towards open and reproducible neuroscience in the age of big dataTowards open and reproducible neuroscience in the age of big data
Towards open and reproducible neuroscience in the age of big data
 
Pharos: A Torch to Use in Your Journey in the Dark Genome
Pharos: A Torch to Use in Your Journey in the Dark GenomePharos: A Torch to Use in Your Journey in the Dark Genome
Pharos: A Torch to Use in Your Journey in the Dark Genome
 
Case Study Life Sciences Data: Central for Integrative Systems Biology and Bi...
Case Study Life Sciences Data: Central for Integrative Systems Biology and Bi...Case Study Life Sciences Data: Central for Integrative Systems Biology and Bi...
Case Study Life Sciences Data: Central for Integrative Systems Biology and Bi...
 
Share and Reuse: how data sharing can take your research to the next level
Share and Reuse: how data sharing can take your research to the next levelShare and Reuse: how data sharing can take your research to the next level
Share and Reuse: how data sharing can take your research to the next level
 
Open Source Pharma: From philosophy to real time experience
Open Source Pharma: From philosophy to real time experienceOpen Source Pharma: From philosophy to real time experience
Open Source Pharma: From philosophy to real time experience
 
Andrew Treloar, overview of ACEAS Data Workflow, ACEAS Grand 2014
Andrew Treloar, overview of ACEAS Data Workflow, ACEAS Grand 2014Andrew Treloar, overview of ACEAS Data Workflow, ACEAS Grand 2014
Andrew Treloar, overview of ACEAS Data Workflow, ACEAS Grand 2014
 
Log Analysis to Understand Medical Professionals' Image Searching Behaviour
Log Analysis to Understand Medical Professionals' Image Searching BehaviourLog Analysis to Understand Medical Professionals' Image Searching Behaviour
Log Analysis to Understand Medical Professionals' Image Searching Behaviour
 

Viewers also liked

Ontology Engineering for Big Data
Ontology Engineering for Big DataOntology Engineering for Big Data
Ontology Engineering for Big DataKouji Kozaki
 
AUTOMATIC CONVERSION OF RELATIONAL DATABASES INTO ONTOLOGIES: A COMPARATIVE A...
AUTOMATIC CONVERSION OF RELATIONAL DATABASES INTO ONTOLOGIES: A COMPARATIVE A...AUTOMATIC CONVERSION OF RELATIONAL DATABASES INTO ONTOLOGIES: A COMPARATIVE A...
AUTOMATIC CONVERSION OF RELATIONAL DATABASES INTO ONTOLOGIES: A COMPARATIVE A...IJwest
 
ontology meets big data: immutability
ontology meets big data:immutabilityontology meets big data:immutability
ontology meets big data: immutabilityChris Partridge
 
Linking Big Data to Rich Process Descriptions
Linking Big Data to Rich Process DescriptionsLinking Big Data to Rich Process Descriptions
Linking Big Data to Rich Process DescriptionsChristoph Lange
 
Horizontal integration of warfighter intelligence data
Horizontal integration of warfighter intelligence dataHorizontal integration of warfighter intelligence data
Horizontal integration of warfighter intelligence dataBarry Smith
 
Tutorial what is_an_ontology_ncbo_march_2012
Tutorial what is_an_ontology_ncbo_march_2012Tutorial what is_an_ontology_ncbo_march_2012
Tutorial what is_an_ontology_ncbo_march_2012Barry Smith
 
Imagenes De Amistad
Imagenes De Amistad Imagenes De Amistad
Imagenes De Amistad ErikithaPte
 
Towards Joint Doctrine for Military Informatics
Towards Joint Doctrine for Military InformaticsTowards Joint Doctrine for Military Informatics
Towards Joint Doctrine for Military InformaticsBarry Smith
 
Ontologies for big data
Ontologies for big dataOntologies for big data
Ontologies for big dataYu Lin
 
Towards an Ontology of Philosophy
Towards an Ontology of PhilosophyTowards an Ontology of Philosophy
Towards an Ontology of PhilosophyBarry Smith
 
The Role of Ontology in the Era of Big Military Data
The Role of Ontology in the Era of Big Military DataThe Role of Ontology in the Era of Big Military Data
The Role of Ontology in the Era of Big Military DataBarry Smith
 
IAO-Intel: An Ontology of Information Artifacts in the Intelligence Domain
IAO-Intel: An Ontology of Information Artifacts in the Intelligence DomainIAO-Intel: An Ontology of Information Artifacts in the Intelligence Domain
IAO-Intel: An Ontology of Information Artifacts in the Intelligence DomainBarry Smith
 
Horizontal integration of warfighter intelligence data
Horizontal integration of warfighter intelligence dataHorizontal integration of warfighter intelligence data
Horizontal integration of warfighter intelligence dataBarry Smith
 
Semantics for Big Data Integration and Analysis
Semantics for Big Data Integration and AnalysisSemantics for Big Data Integration and Analysis
Semantics for Big Data Integration and AnalysisCraig Knoblock
 
Ontology of Poker
Ontology of PokerOntology of Poker
Ontology of PokerBarry Smith
 
Ontology Tutorial: Semantic Technology for Intelligence, Defense and Security
Ontology Tutorial: Semantic Technology for Intelligence, Defense and SecurityOntology Tutorial: Semantic Technology for Intelligence, Defense and Security
Ontology Tutorial: Semantic Technology for Intelligence, Defense and SecurityBarry Smith
 

Viewers also liked (17)

Ontology Engineering for Big Data
Ontology Engineering for Big DataOntology Engineering for Big Data
Ontology Engineering for Big Data
 
AUTOMATIC CONVERSION OF RELATIONAL DATABASES INTO ONTOLOGIES: A COMPARATIVE A...
AUTOMATIC CONVERSION OF RELATIONAL DATABASES INTO ONTOLOGIES: A COMPARATIVE A...AUTOMATIC CONVERSION OF RELATIONAL DATABASES INTO ONTOLOGIES: A COMPARATIVE A...
AUTOMATIC CONVERSION OF RELATIONAL DATABASES INTO ONTOLOGIES: A COMPARATIVE A...
 
ontology meets big data: immutability
ontology meets big data:immutabilityontology meets big data:immutability
ontology meets big data: immutability
 
Linking Big Data to Rich Process Descriptions
Linking Big Data to Rich Process DescriptionsLinking Big Data to Rich Process Descriptions
Linking Big Data to Rich Process Descriptions
 
Ontology based metadata schema for digital library projects in China
Ontology based metadata schema for digital library projects in ChinaOntology based metadata schema for digital library projects in China
Ontology based metadata schema for digital library projects in China
 
Horizontal integration of warfighter intelligence data
Horizontal integration of warfighter intelligence dataHorizontal integration of warfighter intelligence data
Horizontal integration of warfighter intelligence data
 
Tutorial what is_an_ontology_ncbo_march_2012
Tutorial what is_an_ontology_ncbo_march_2012Tutorial what is_an_ontology_ncbo_march_2012
Tutorial what is_an_ontology_ncbo_march_2012
 
Imagenes De Amistad
Imagenes De Amistad Imagenes De Amistad
Imagenes De Amistad
 
Towards Joint Doctrine for Military Informatics
Towards Joint Doctrine for Military InformaticsTowards Joint Doctrine for Military Informatics
Towards Joint Doctrine for Military Informatics
 
Ontologies for big data
Ontologies for big dataOntologies for big data
Ontologies for big data
 
Towards an Ontology of Philosophy
Towards an Ontology of PhilosophyTowards an Ontology of Philosophy
Towards an Ontology of Philosophy
 
The Role of Ontology in the Era of Big Military Data
The Role of Ontology in the Era of Big Military DataThe Role of Ontology in the Era of Big Military Data
The Role of Ontology in the Era of Big Military Data
 
IAO-Intel: An Ontology of Information Artifacts in the Intelligence Domain
IAO-Intel: An Ontology of Information Artifacts in the Intelligence DomainIAO-Intel: An Ontology of Information Artifacts in the Intelligence Domain
IAO-Intel: An Ontology of Information Artifacts in the Intelligence Domain
 
Horizontal integration of warfighter intelligence data
Horizontal integration of warfighter intelligence dataHorizontal integration of warfighter intelligence data
Horizontal integration of warfighter intelligence data
 
Semantics for Big Data Integration and Analysis
Semantics for Big Data Integration and AnalysisSemantics for Big Data Integration and Analysis
Semantics for Big Data Integration and Analysis
 
Ontology of Poker
Ontology of PokerOntology of Poker
Ontology of Poker
 
Ontology Tutorial: Semantic Technology for Intelligence, Defense and Security
Ontology Tutorial: Semantic Technology for Intelligence, Defense and SecurityOntology Tutorial: Semantic Technology for Intelligence, Defense and Security
Ontology Tutorial: Semantic Technology for Intelligence, Defense and Security
 

Similar to Big data ontology_summit_feb2012

Ontological realism as a strategy for integrating ontologies
Ontological realism as a strategy for integrating ontologiesOntological realism as a strategy for integrating ontologies
Ontological realism as a strategy for integrating ontologiesBarry Smith
 
Semantic Web & Web 3.0 empowering real world outcomes in biomedical research ...
Semantic Web & Web 3.0 empowering real world outcomes in biomedical research ...Semantic Web & Web 3.0 empowering real world outcomes in biomedical research ...
Semantic Web & Web 3.0 empowering real world outcomes in biomedical research ...Amit Sheth
 
Sci Know Mine 2013: What can we learn from topic modeling on 350M academic do...
Sci Know Mine 2013: What can we learn from topic modeling on 350M academic do...Sci Know Mine 2013: What can we learn from topic modeling on 350M academic do...
Sci Know Mine 2013: What can we learn from topic modeling on 350M academic do...William Gunn
 
Open interoperability standards, tools and services at EMBL-EBI
Open interoperability standards, tools and services at EMBL-EBIOpen interoperability standards, tools and services at EMBL-EBI
Open interoperability standards, tools and services at EMBL-EBIPistoia Alliance
 
OOR--Open-Ontology-Repository--jun2010
OOR--Open-Ontology-Repository--jun2010OOR--Open-Ontology-Repository--jun2010
OOR--Open-Ontology-Repository--jun2010Peter Yim
 
Web Apollo Tutorial for Medfly Research Community
Web Apollo Tutorial for Medfly Research CommunityWeb Apollo Tutorial for Medfly Research Community
Web Apollo Tutorial for Medfly Research CommunityMonica Munoz-Torres
 
Open hpi semweb-06-part2
Open hpi semweb-06-part2Open hpi semweb-06-part2
Open hpi semweb-06-part2Nadine Ludwig
 
Semantics as a service at EMBL-EBI
Semantics as a service at EMBL-EBISemantics as a service at EMBL-EBI
Semantics as a service at EMBL-EBISimon Jupp
 
Tragedy of the Data Commons (ODSC-East, 2021)
Tragedy of the Data Commons (ODSC-East, 2021)Tragedy of the Data Commons (ODSC-East, 2021)
Tragedy of the Data Commons (ODSC-East, 2021)James Hendler
 
Semantics for Bioinformatics: What, Why and How of Search, Integration and An...
Semantics for Bioinformatics: What, Why and How of Search, Integration and An...Semantics for Bioinformatics: What, Why and How of Search, Integration and An...
Semantics for Bioinformatics: What, Why and How of Search, Integration and An...Amit Sheth
 
BioAssay Express: Creating and exploiting assay metadata
BioAssay Express: Creating and exploiting assay metadataBioAssay Express: Creating and exploiting assay metadata
BioAssay Express: Creating and exploiting assay metadataPhilip Cheung
 
Big Data Standards - Workshop, ExpBio, Boston, 2015
Big Data Standards - Workshop, ExpBio, Boston, 2015Big Data Standards - Workshop, ExpBio, Boston, 2015
Big Data Standards - Workshop, ExpBio, Boston, 2015Susanna-Assunta Sansone
 
Ontology for the Financial Services Industry
Ontology for the Financial Services IndustryOntology for the Financial Services Industry
Ontology for the Financial Services IndustryBarry Smith
 
Experiences in building an ontology driven image database for ...
Experiences in building an ontology driven image database for ...Experiences in building an ontology driven image database for ...
Experiences in building an ontology driven image database for ...Carla Lima
 
Keystone Summer School 2015: Mauro Dragoni, Ontologies For Information Retrieval
Keystone Summer School 2015: Mauro Dragoni, Ontologies For Information RetrievalKeystone Summer School 2015: Mauro Dragoni, Ontologies For Information Retrieval
Keystone Summer School 2015: Mauro Dragoni, Ontologies For Information RetrievalMauro Dragoni
 
How to create a taxonomy for management buy-in
How to create a taxonomy for management buy-inHow to create a taxonomy for management buy-in
How to create a taxonomy for management buy-inMary Chitty
 
Being Reproducible: SSBSS Summer School 2017
Being Reproducible: SSBSS Summer School 2017Being Reproducible: SSBSS Summer School 2017
Being Reproducible: SSBSS Summer School 2017Carole Goble
 
CINECA webinar slides: Making cohort data FAIR
CINECA webinar slides: Making cohort data FAIRCINECA webinar slides: Making cohort data FAIR
CINECA webinar slides: Making cohort data FAIRCINECAProject
 

Similar to Big data ontology_summit_feb2012 (20)

Ontological realism as a strategy for integrating ontologies
Ontological realism as a strategy for integrating ontologiesOntological realism as a strategy for integrating ontologies
Ontological realism as a strategy for integrating ontologies
 
Semantic Web & Web 3.0 empowering real world outcomes in biomedical research ...
Semantic Web & Web 3.0 empowering real world outcomes in biomedical research ...Semantic Web & Web 3.0 empowering real world outcomes in biomedical research ...
Semantic Web & Web 3.0 empowering real world outcomes in biomedical research ...
 
Sci Know Mine 2013: What can we learn from topic modeling on 350M academic do...
Sci Know Mine 2013: What can we learn from topic modeling on 350M academic do...Sci Know Mine 2013: What can we learn from topic modeling on 350M academic do...
Sci Know Mine 2013: What can we learn from topic modeling on 350M academic do...
 
Open interoperability standards, tools and services at EMBL-EBI
Open interoperability standards, tools and services at EMBL-EBIOpen interoperability standards, tools and services at EMBL-EBI
Open interoperability standards, tools and services at EMBL-EBI
 
OOR--Open-Ontology-Repository--jun2010
OOR--Open-Ontology-Repository--jun2010OOR--Open-Ontology-Repository--jun2010
OOR--Open-Ontology-Repository--jun2010
 
Web Apollo Tutorial for Medfly Research Community
Web Apollo Tutorial for Medfly Research CommunityWeb Apollo Tutorial for Medfly Research Community
Web Apollo Tutorial for Medfly Research Community
 
SciBite
SciBiteSciBite
SciBite
 
Open hpi semweb-06-part2
Open hpi semweb-06-part2Open hpi semweb-06-part2
Open hpi semweb-06-part2
 
Semantics as a service at EMBL-EBI
Semantics as a service at EMBL-EBISemantics as a service at EMBL-EBI
Semantics as a service at EMBL-EBI
 
BioPortal: ontologies and integrated data resources at the click of a mouse
BioPortal: ontologies and integrated data resourcesat the click of a mouseBioPortal: ontologies and integrated data resourcesat the click of a mouse
BioPortal: ontologies and integrated data resources at the click of a mouse
 
Tragedy of the Data Commons (ODSC-East, 2021)
Tragedy of the Data Commons (ODSC-East, 2021)Tragedy of the Data Commons (ODSC-East, 2021)
Tragedy of the Data Commons (ODSC-East, 2021)
 
Semantics for Bioinformatics: What, Why and How of Search, Integration and An...
Semantics for Bioinformatics: What, Why and How of Search, Integration and An...Semantics for Bioinformatics: What, Why and How of Search, Integration and An...
Semantics for Bioinformatics: What, Why and How of Search, Integration and An...
 
BioAssay Express: Creating and exploiting assay metadata
BioAssay Express: Creating and exploiting assay metadataBioAssay Express: Creating and exploiting assay metadata
BioAssay Express: Creating and exploiting assay metadata
 
Big Data Standards - Workshop, ExpBio, Boston, 2015
Big Data Standards - Workshop, ExpBio, Boston, 2015Big Data Standards - Workshop, ExpBio, Boston, 2015
Big Data Standards - Workshop, ExpBio, Boston, 2015
 
Ontology for the Financial Services Industry
Ontology for the Financial Services IndustryOntology for the Financial Services Industry
Ontology for the Financial Services Industry
 
Experiences in building an ontology driven image database for ...
Experiences in building an ontology driven image database for ...Experiences in building an ontology driven image database for ...
Experiences in building an ontology driven image database for ...
 
Keystone Summer School 2015: Mauro Dragoni, Ontologies For Information Retrieval
Keystone Summer School 2015: Mauro Dragoni, Ontologies For Information RetrievalKeystone Summer School 2015: Mauro Dragoni, Ontologies For Information Retrieval
Keystone Summer School 2015: Mauro Dragoni, Ontologies For Information Retrieval
 
How to create a taxonomy for management buy-in
How to create a taxonomy for management buy-inHow to create a taxonomy for management buy-in
How to create a taxonomy for management buy-in
 
Being Reproducible: SSBSS Summer School 2017
Being Reproducible: SSBSS Summer School 2017Being Reproducible: SSBSS Summer School 2017
Being Reproducible: SSBSS Summer School 2017
 
CINECA webinar slides: Making cohort data FAIR
CINECA webinar slides: Making cohort data FAIRCINECA webinar slides: Making cohort data FAIR
CINECA webinar slides: Making cohort data FAIR
 

More from Barry Smith

An application of Basic Formal Ontology to the Ontology of Services and Commo...
An application of Basic Formal Ontology to the Ontology of Services and Commo...An application of Basic Formal Ontology to the Ontology of Services and Commo...
An application of Basic Formal Ontology to the Ontology of Services and Commo...Barry Smith
 
Ways of Worldmarking: The Ontology of the Eruv
Ways of Worldmarking: The Ontology of the EruvWays of Worldmarking: The Ontology of the Eruv
Ways of Worldmarking: The Ontology of the EruvBarry Smith
 
The Division of Deontic Labor
The Division of Deontic LaborThe Division of Deontic Labor
The Division of Deontic LaborBarry Smith
 
Ontology of Aging (August 2014)
Ontology of Aging (August 2014)Ontology of Aging (August 2014)
Ontology of Aging (August 2014)Barry Smith
 
The Fifth Cycle of Philosophy
The Fifth Cycle of PhilosophyThe Fifth Cycle of Philosophy
The Fifth Cycle of PhilosophyBarry Smith
 
Clinical trial data wants to be free: Lessons from the ImmPort Immunology Dat...
Clinical trial data wants to be free: Lessons from the ImmPort Immunology Dat...Clinical trial data wants to be free: Lessons from the ImmPort Immunology Dat...
Clinical trial data wants to be free: Lessons from the ImmPort Immunology Dat...Barry Smith
 
Enhancing the Quality of ImmPort Data
Enhancing the Quality of ImmPort DataEnhancing the Quality of ImmPort Data
Enhancing the Quality of ImmPort DataBarry Smith
 
The Philosophome: An Exercise in the Ontology of the Humanities
The Philosophome: An Exercise in the Ontology of the HumanitiesThe Philosophome: An Exercise in the Ontology of the Humanities
The Philosophome: An Exercise in the Ontology of the HumanitiesBarry Smith
 
Science of Emerging Social Media
Science of Emerging Social MediaScience of Emerging Social Media
Science of Emerging Social MediaBarry Smith
 
Ethics, Informatics and Obamacare
Ethics, Informatics and ObamacareEthics, Informatics and Obamacare
Ethics, Informatics and ObamacareBarry Smith
 
e‐Human Beings: The contribution of internet ranking systems to the developme...
e‐Human Beings: The contribution of internet ranking systems to the developme...e‐Human Beings: The contribution of internet ranking systems to the developme...
e‐Human Beings: The contribution of internet ranking systems to the developme...Barry Smith
 
Ontology of aging and death
Ontology of aging and deathOntology of aging and death
Ontology of aging and deathBarry Smith
 
Ontology in-buffalo-2013
Ontology in-buffalo-2013Ontology in-buffalo-2013
Ontology in-buffalo-2013Barry Smith
 
ImmPort strategies to enhance discoverability of clinical trial data
ImmPort strategies to enhance discoverability of clinical trial dataImmPort strategies to enhance discoverability of clinical trial data
ImmPort strategies to enhance discoverability of clinical trial dataBarry Smith
 
Ontology of Documents (2005)
Ontology of Documents (2005)Ontology of Documents (2005)
Ontology of Documents (2005)Barry Smith
 
Ontology and the National Cancer Institute Thesaurus (2005)
Ontology and the National Cancer Institute Thesaurus (2005)Ontology and the National Cancer Institute Thesaurus (2005)
Ontology and the National Cancer Institute Thesaurus (2005)Barry Smith
 
Introduction to the Logic of Definitions
Introduction to the Logic of DefinitionsIntroduction to the Logic of Definitions
Introduction to the Logic of DefinitionsBarry Smith
 
Ontology in Buffalo -- Big Data 2013
Ontology in Buffalo -- Big Data 2013Ontology in Buffalo -- Big Data 2013
Ontology in Buffalo -- Big Data 2013Barry Smith
 
How to Do Things With Documents
How to Do Things With DocumentsHow to Do Things With Documents
How to Do Things With DocumentsBarry Smith
 

More from Barry Smith (20)

An application of Basic Formal Ontology to the Ontology of Services and Commo...
An application of Basic Formal Ontology to the Ontology of Services and Commo...An application of Basic Formal Ontology to the Ontology of Services and Commo...
An application of Basic Formal Ontology to the Ontology of Services and Commo...
 
Ways of Worldmarking: The Ontology of the Eruv
Ways of Worldmarking: The Ontology of the EruvWays of Worldmarking: The Ontology of the Eruv
Ways of Worldmarking: The Ontology of the Eruv
 
The Division of Deontic Labor
The Division of Deontic LaborThe Division of Deontic Labor
The Division of Deontic Labor
 
Ontology of Aging (August 2014)
Ontology of Aging (August 2014)Ontology of Aging (August 2014)
Ontology of Aging (August 2014)
 
Meaningful Use
Meaningful UseMeaningful Use
Meaningful Use
 
The Fifth Cycle of Philosophy
The Fifth Cycle of PhilosophyThe Fifth Cycle of Philosophy
The Fifth Cycle of Philosophy
 
Clinical trial data wants to be free: Lessons from the ImmPort Immunology Dat...
Clinical trial data wants to be free: Lessons from the ImmPort Immunology Dat...Clinical trial data wants to be free: Lessons from the ImmPort Immunology Dat...
Clinical trial data wants to be free: Lessons from the ImmPort Immunology Dat...
 
Enhancing the Quality of ImmPort Data
Enhancing the Quality of ImmPort DataEnhancing the Quality of ImmPort Data
Enhancing the Quality of ImmPort Data
 
The Philosophome: An Exercise in the Ontology of the Humanities
The Philosophome: An Exercise in the Ontology of the HumanitiesThe Philosophome: An Exercise in the Ontology of the Humanities
The Philosophome: An Exercise in the Ontology of the Humanities
 
Science of Emerging Social Media
Science of Emerging Social MediaScience of Emerging Social Media
Science of Emerging Social Media
 
Ethics, Informatics and Obamacare
Ethics, Informatics and ObamacareEthics, Informatics and Obamacare
Ethics, Informatics and Obamacare
 
e‐Human Beings: The contribution of internet ranking systems to the developme...
e‐Human Beings: The contribution of internet ranking systems to the developme...e‐Human Beings: The contribution of internet ranking systems to the developme...
e‐Human Beings: The contribution of internet ranking systems to the developme...
 
Ontology of aging and death
Ontology of aging and deathOntology of aging and death
Ontology of aging and death
 
Ontology in-buffalo-2013
Ontology in-buffalo-2013Ontology in-buffalo-2013
Ontology in-buffalo-2013
 
ImmPort strategies to enhance discoverability of clinical trial data
ImmPort strategies to enhance discoverability of clinical trial dataImmPort strategies to enhance discoverability of clinical trial data
ImmPort strategies to enhance discoverability of clinical trial data
 
Ontology of Documents (2005)
Ontology of Documents (2005)Ontology of Documents (2005)
Ontology of Documents (2005)
 
Ontology and the National Cancer Institute Thesaurus (2005)
Ontology and the National Cancer Institute Thesaurus (2005)Ontology and the National Cancer Institute Thesaurus (2005)
Ontology and the National Cancer Institute Thesaurus (2005)
 
Introduction to the Logic of Definitions
Introduction to the Logic of DefinitionsIntroduction to the Logic of Definitions
Introduction to the Logic of Definitions
 
Ontology in Buffalo -- Big Data 2013
Ontology in Buffalo -- Big Data 2013Ontology in Buffalo -- Big Data 2013
Ontology in Buffalo -- Big Data 2013
 
How to Do Things With Documents
How to Do Things With DocumentsHow to Do Things With Documents
How to Do Things With Documents
 

Big data ontology_summit_feb2012

  • 1. Big Data that might benefit from ontology technology, but why this usually fails Barry Smith National Center for Ontological Research 1
  • 2. The strategy of annotation Databases describe data using multiple heterogeneous labels If we can annotate (tag) these labels using terms from common controlled vocabularies, then a virtual arms- length integration can be achieved, providing • immediate benefits for search and retrieval • a starting point for the creation of net-centric reference data • potential longer term benefits for reasoning with no need to modify existing systems, code or data See Ceusters et al. Proceedings of DILS 2004. http://ontology.buffalo.edu/bio/LinkSuite.pdf 2
  • 3. String searches yield partial results, rest on manual effort and on familiarity with existing database contents Ontologies facilitate grouping of annotations brain 20 hindbrain 15 rhombomere 10 Query ‘brain’ without ontology 20 Query ‘brain’ with ontology 45
  • 4. Examples of where this method works • Reference Genome Annotation Project http://www.geneontology.org/GO.refgenome.shtml • Human resources data in large organizations http://www.youtube.com/watch?v=OzW3Gc_yA9A • Military intelligence data Salmen et al. in http://ceur-ws.org/Vol-808/ Other potential areas of application: • Crime • Public health • Insurance • Finance 4
  • 5. But normally the method does not work Semantic technology (OWL, …) seeks to break down data silos Unfortunately it is now so easy to create ontologies that myriad incompatible ontologies are being created in ad hoc ways leading to the creation of new, semantic silos The Semantic Web framework as currently conceived and governed by the W3C (modeled on html) yields minimal standardization The more semantic technology is successful, they more we fail to achieve our goals 5
  • 6. Reasons for this effect • Just as it’s easier to build a new database, so it’s easier to build a new ontology for each new project • You will not get paid for reusing existing ontologies (Let a million ontologies bloom) • There are no ‘good’ ontologies, anyway (just arbitrary choices of terms and relations …) • Information technology (hardware) changes constantly, not worth the effort of getting things right 6
  • 7. How to do it right? • how create an incremental, evolutionary process, where what is good survives, and what is bad fails • create a scenario in which people will find it profitable to reuse ontologies, terminologies and coding systems which have been tried and tested • silo effects will be avoided and results of investment in Semantic Technology will cumulate effectively 7
  • 8. Uses of ‘ontology’ in PubMed abstracts 8
  • 9. By far the most successful: GO (Gene Ontology) 9
  • 10. GO provides a controlled vocabulary of terms for use in annotating (tagging) biological data • multi-species, multi-disciplinary, open source • built and maintained by domain experts • contributing to the cumulativity of scientific results obtained by distinct research communities • natural language and logical definitions for all terms to support consistent human application and computational exploitation • rigorous governance process • feedback loop connects users to editors 10
  • 11. How to do it right • ontologies should mimic the methodology used by the GO (following the principles of the OBO Foundry: http://obofoundry.org) • ontologies in the same field should be developed in coordinated fashion to ensure that there is exactly one ontology for each subdomain • ontologies should be developed incrementally in a way that builds on successful user testing at every stage 11