
Text-mining and ontologies - new approaches to knowledge discovery of microbial diversity


Presentation of text-mining technology and data for microbial diversity research at the 4th International Conference on Microbial Diversity, Bari, 2017.

Published in: Science


1. 4th International Microbial Diversity Conference, Bari, Nov. 2017
   Text-mining and ontologies: new approaches to knowledge discovery of microbial diversity
   Claire Nédellec, Bibliome, MaIAGE
2. Microbial diversity: information sources
   Where do micro-organisms live? This critical information is collected and stored in many public databases.
   A huge amount of isolation-site information on micro-organisms exists:
   • Data sources: organism collections, sequence databases, ...
   • Documents: scientific papers, reports (7 million PubMed references on micro-organism habitats [Deléger et al., 2016])
   It is often available to automatic pipelines (online access, programming interfaces), but it is under-exploited because it is expressed in unstructured free text:
   • 24,150 "isolated from" entries in BacDive (DSMZ)
   • 18,000 "isolation" entries in ATCC
   • 25,000 "isolation site" entries for bacteria and archaea in the Genomes OnLine Database (GOLD)
   [Charts: number of articles about "bacteria" in PubMed; number of complete genome sequences at JGI]
3. From free text to knowledge
   The isolation site is always given in free text. A unified representation of habitat descriptions is a major challenge for data access and curation. It would:
   ⇒ facilitate information access through reference keywords
   ⇒ enable interoperability among databases
   ⇒ enrich databases with published scientific knowledge
   GenBank example:
   Species                   TaxID    Isolation site
   Acetobacter lovaniensis   104100   fermented dairy products
   Acetobacter lovaniensis   104100   fermented rice flour
   Acetobacter lovaniensis   104100   vinegar
   Acetobacter lovaniensis   104100   water kefir fermented food
   Needs:
   (1) A classification of habitats relevant to microorganism studies: the OntoBiotope ontology
   (2) An information extraction method for mapping free-text entities to the classes: the Alvis text-mining suite
4. Alvis pipeline and Florilège database (Copyright Inra)
   Mapping various terms to a habitat classification. Term variation: 10,000 habitats of Listeria monocytogenes in PubMed.
   Example:
   • PubMed documents (PMID): 21549046, 21247298, 16204502, 15992268, 2116711, 2116712, 15992260, 1348242, 11530195, 23042180, 23208291, 10458115, 11456331, 21669068, 17954748, 8867607, 23433372, 26325149, 8977904, 23880504, 8227616, 16156701, 15553633, 20494189, 24715203, 21441322, 19114514, 2125110, 19254151, 22980010
   • Taxon: Listeria monocytogenes
   • Habitat (reference class): dairy farm
   • Habitat terms found in the documents: Dairy farm, dairy farm environments, dairy farms, dairy farm environmental samples, environment of dairy farms, potential dairy farm, Dairy farm environmental samples, single dairy farm, Irish dairy farms, high-prevalence dairy farm, dairy farm environment, dairy farms of different size, local dairy farm, second Northwest dairy farm, dairy cattle farms, selected dairy farms, dairy farm, Dairy farms
5. A classification with a hierarchical structure
   Higher-level habitat classes are needed for ecology and evolution studies (10,000 habitats of Listeria monocytogenes in PubMed).
   The Alvis IR semantic search engine maps scientific paper extracts to habitat classes, e.g.:
   • "Listeria monocytogenes contamination in Chinese beef processing plants."
   • "Listeria monocytogenes isolated from artisanal Portuguese cheese-making dairy."
   • "the presence of L. monocytogenes in samples collected from crab processing plant"
   • "L. monocytogenes persisting in a cold-smoked fish processing plant."
   • "two L. monocytogenes cheese dairy isolates"
6. OntoBiotope ontology
   A large ontology dedicated to microorganism biotopes: 2,329 habitat classes, 492 synonyms, 13 levels.
   • Structure of the habitat classification: organized by microbiology research domains; reuses existing habitat classifications (ATCC, GOLD, FedEx2); groups habitats with similar physico-chemical properties.
   • Ontology scope: extensive study of habitat terminology in text (databases and papers), e.g. "paper mill sludge" / "anaerobic sludge of paper mill waste water"; collaborations with microbiologists in focused projects (phytobiome, food microbiome).
   • Evaluation: text-mining benchmarks (Bacteria Biotope task of the BioNLP Shared Tasks) and use in applications (e.g. food positive flora).
7. Habitats in the OntoBiotope ontology
   Distributed since 2012: http://agroportal.lirmm.fr/ontologies/ONTOBIOTOPE
   [Chart: number of classes per top-level class, largest first] food (801), part of living organism (480), natural environment habitat (369), living organism (352), artificial environment (281), habitat with respect to chemico-physical property (120), agricultural habitat (55), medical environment (43), bacteria associated habitat (21), aquaculture habitat (19), experimental medium (14).
   Examples of subtrees: 49 classes in the gastrointestinal tract subtree, 35 in the waste subtree, 51 in the soil subtree.
8. 51 classes in the soil subtree. Contributions welcome.
9. Information extraction from text and mapping to the habitat classes
   Article text: "Bifidobacterium longum subsp. longum is found in newborn infant as a normal component of gut flora."
   Information extraction turns the text of articles into a formal representation of the information, linked to the ontology (simplified view):
   • Bacteria: Bifidobacterium longum subsp. longum [taxid: 1679]
   • hosted by: newborn infant [baby]
   • lives_in: gut [intestine]
10. Information extraction and classification: process
    Example: "... virulence of aquatic pathogen Vibrio anguillarum towards sea bass larvae ..."
    Artificial intelligence methods (machine learning and natural language processing), implemented in more than a hundred components of the Alvis text-mining pipeline:
    (1) Entity recognition: identification (text boundaries) and broad type assignment
    (2) Entity classification: assignment of an OntoBiotope class
    (3) Relationship prediction: linking microorganism mentions to their habitats in the text
    Result: the microbial species Vibrio anguillarum (TaxID5560) lives in the habitats "aquatic" [aquatic environment] and "sea bass (Dicentrarchus labrax) larvae" [marine farm fish].
    Ratkovic et al., BMC Bioinformatics, 2012; Nédellec et al., Handbook on Ontologies, 2009
11. Bibliographic sources → semantic resources (ontologies) → information extraction → full-text data and metadata → services
    http://bibliome.jouy.inra.fr/demo/ontobiotope/alvisir2/webapi/search
    Ba & Bossy, LREC 2016
12. Extract of OntoBiotope: milk product subtree [figure]
13. OntoBiotope pipeline applied to PubMed
    Evaluation on the BioNLP-ST data (international competition on bacteria information extraction):
    Task                           Recall   Precision
    Entity detection               65%      81%
    Detection and classification   50%      62%
    Relation (lives in)            70%      51.4%
    Output on PubMed: 2.3 million documents, 18.5 million habitats, 8.4 million taxa, 7.2 million relations.
    Nédellec et al., BMC Bioinformatics, 2015; Ratkovic et al., BMC Bioinformatics, 2012
14. From research lab to infrastructure: a European Open Science perspective
    Deployment on OpenMinTeD, the European text-mining infrastructure, offers the scientific communities:
    • fully open access in a unified framework
    • reproducibility and flexibility
    • full-text paper collection, database aggregation and standardisation
    Przybyła et al., Database, 2016
15. Data integration: online services (http://genome.jouy.inra.fr/Florilege/)
    • Treemap visualization for biodiversity analytics
    • Semantic relational search through all PubMed references
16. On-going projects: examples of application
    • Food positive flora (Florilège; MD poster S2-23) [INRA - CNIEL]: characterization of biodiversity, phenotypes, uses and molecules produced/degraded, for food innovation (nutrient production, biopreservation). 1 million phenotypes; 1.1 million taxon-phenotype relationships.
    • Tracing the origin (FoodMicrobiome Transfert) [INRA Food WG]: cheese ingredients and cheese processing bring unexpected strains; text-mining helps formulate plausible hypotheses about their source.
    • Likelihood of organism identification in metagenomics and consistency with previous results (Visa TM project) [INRA, AgroPortal, Inist]: Has this microorganism already been identified in this place? Has one of the same family? In a similar place? In a similar ecosystem?
17. Conclusion
    • Millions of microorganism habitat descriptions, increasing exponentially: invaluable information for fundamental research and applications, yet largely underused because mostly expressed in free text.
    • The OntoBiotope ontology and information extraction from text provide a formal representation of microorganism biotopes.
    • This opens up new research opportunities, not only for data curation and indexing in information systems, but also for analysis combined with experimental data in integrative and predictive biology. A prime example is metagenomics and biodiversity in OpenMinTeD.
18. Acknowledgements and funding
    Mouhamadou Ba, Baptiste Bohuon, Robert Bossy, Philippe Bessières, Estelle Chaix, Louise Deléger, Sandra Dérozier, Arnaud Ferré, Wiktoria Golik, Julien Jourde, Valentin Loux, Frédéric Papazian, Jean- Zorana Ratkovic, Dialekti Valsamou
    MEM: Méta-omiques des Ecosystèmes Microbiens (Meta-omics of Microbial Ecosystems)
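Slide 4 shows many surface variants ("Irish dairy farms", "environment of dairy farms", ...) mapped to one reference class. The sketch below illustrates the simplest form of that normalization: lowercasing, naive plural stripping and phrase containment. The label list is hypothetical and this is not the Alvis method; it also shows why such rules fall short ("dairy cattle farms" is missed), which is where ontology synonyms and learned models come in.

```python
import re
from typing import Optional

# Hypothetical reference labels for illustration; real OntoBiotope classes carry
# identifiers, synonyms and a hierarchy, none of which are modelled here.
REFERENCE_LABELS = ["dairy farm", "fish processing plant"]


def normalize(term: str) -> str:
    """Lowercase, keep alphabetic tokens only, and naively strip a plural 's'."""
    words = re.findall(r"[a-z]+", term.lower())
    return " ".join(w[:-1] if len(w) > 3 and w.endswith("s") else w for w in words)


def map_to_class(term: str) -> Optional[str]:
    """Return the first reference label whose normalized form occurs as a phrase in the term."""
    norm = normalize(term)
    for label in REFERENCE_LABELS:
        if re.search(rf"\b{re.escape(normalize(label))}\b", norm):
            return label
    return None


variants = ["Dairy farm", "dairy farm environments", "environment of dairy farms",
            "Irish dairy farms", "high-prevalence dairy farm", "dairy farms of different size"]
for v in variants:
    print(f"{v!r} -> {map_to_class(v)}")
print(map_to_class("dairy cattle farms"))  # None: "dairy" and "farm" are not contiguous
```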
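Slide 5 argues that higher habitat classes are needed so that fine-grained mentions can be analysed at a coarser level. A minimal sketch of that roll-up, assuming a single parent per class: each mention is counted under its own class and every ancestor. The hierarchy below is hypothetical; only the leaf labels echo the slide's extracts.

```python
from collections import Counter

# Hypothetical single-parent is-a fragment (child -> parent). Labels echo the slide's
# extracts; the hierarchy itself is illustrative, not OntoBiotope's actual structure.
PARENT = {
    "cheese dairy": "dairy",
    "dairy": "food processing factory",
    "beef processing plant": "meat processing plant",
    "meat processing plant": "food processing factory",
    "crab processing plant": "seafood processing plant",
    "fish processing plant": "seafood processing plant",
    "seafood processing plant": "food processing factory",
    "food processing factory": "industrial environment",
}


def ancestors(cls):
    """Yield the class itself, then each ancestor up to the root."""
    while cls is not None:
        yield cls
        cls = PARENT.get(cls)


def roll_up(mentions):
    """Count each mention under its own class and under every ancestor class."""
    counts = Counter()
    for cls in mentions:
        counts.update(ancestors(cls))
    return counts


mentions = ["beef processing plant", "cheese dairy", "crab processing plant",
            "fish processing plant", "cheese dairy"]
counts = roll_up(mentions)
print(counts["food processing factory"], counts["seafood processing plant"])
```

All five mentions meet at "food processing factory", while only the crab and fish plants share "seafood processing plant"; a multi-parent ontology would need a set of ancestors per class to avoid double counting.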
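Slides 6 and 7 summarise the ontology by its number of levels and the number of classes under each top-level class. Both figures can be derived from parent links; the sketch below does so on a hypothetical miniature ontology, assuming a single parent per class. Whether the slide's counts include the top-level class itself is not stated, so this version excludes it.

```python
from collections import defaultdict

# Hypothetical miniature ontology (child -> parent; None marks a top-level class).
PARENT = {
    "food": None,
    "dairy product": "food",
    "milk": "dairy product",
    "cheese": "dairy product",
    "soft cheese": "cheese",
    "meat": "food",
    "natural environment habitat": None,
    "soil": "natural environment habitat",
    "agricultural soil": "soil",
}


def root_and_depth(cls):
    """Walk up the parent links; return (top-level class, depth), top-level depth being 1."""
    depth = 1
    while PARENT[cls] is not None:
        cls = PARENT[cls]
        depth += 1
    return cls, depth


def subtree_sizes():
    """Count the descendants (excluding the top-level class itself) under each top-level class."""
    sizes = defaultdict(int)
    for cls in PARENT:
        root, _ = root_and_depth(cls)
        if root != cls:
            sizes[root] += 1
    return dict(sizes)


print(subtree_sizes())
print(max(root_and_depth(c)[1] for c in PARENT))
```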
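Slide 9 describes the target of extraction: a formal representation in which each mention keeps its text but is anchored to a taxonomy identifier or ontology class. One possible data structure, using the slide's own example, is sketched below; the class and field names are hypothetical and not taken from the Alvis software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """A span of article text plus the reference it was normalized to."""
    text: str        # surface form as written in the article
    normalized: str  # taxonomy identifier or ontology class label


@dataclass(frozen=True)
class Relation:
    """A typed link between a bacterium mention and a habitat or host mention."""
    bacterium: Mention
    relation: str    # e.g. "lives_in" or "hosted_by"
    target: Mention


def to_triples(relations):
    """Drop the surface forms and keep the normalized (subject, relation, object) triples."""
    return [(r.bacterium.normalized, r.relation, r.target.normalized) for r in relations]


bacterium = Mention("Bifidobacterium longum subsp. longum", "taxid:1679")
extracted = [
    Relation(bacterium, "hosted_by", Mention("newborn infant", "baby")),
    Relation(bacterium, "lives_in", Mention("gut", "intestine")),
]
print(to_triples(extracted))
```

Keeping both the surface text and the normalized reference lets a curator check each mapping against the source sentence while downstream tools work only on the triples.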
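Slide 10 lists three steps: entity recognition, entity classification and relationship prediction. The Alvis pipeline implements them with machine learning and NLP components; the sketch below swaps in the simplest possible stand-ins (exact dictionary lookup and same-sentence co-occurrence) only to make the data flow between the steps concrete. The lexicons are hypothetical, and the taxon entry is a placeholder rather than an NCBI identifier.

```python
import re

# Hypothetical toy lexicons standing in for trained models and the full ontology.
TAXA = {"Vibrio anguillarum": "taxon:Vibrio anguillarum"}  # placeholder, not an NCBI ID
HABITATS = {"aquatic": "aquatic environment", "sea bass larvae": "marine farm fish"}


def recognize(sentence):
    """Step 1 (entity recognition): find (start, end, surface, type) spans by exact lookup."""
    spans = []
    for lexicon, etype in ((TAXA, "Microorganism"), (HABITATS, "Habitat")):
        for term in lexicon:
            for m in re.finditer(rf"\b{re.escape(term)}\b", sentence):
                spans.append((m.start(), m.end(), term, etype))
    return sorted(spans)


def classify(span):
    """Step 2 (entity classification): attach the normalized class from the lexicon."""
    _, _, term, etype = span
    return TAXA[term] if etype == "Microorganism" else HABITATS[term]


def predict_relations(sentence):
    """Step 3 (relation prediction), naive baseline: pair every microorganism
    with every habitat in the same sentence."""
    spans = recognize(sentence)
    microbes = [s for s in spans if s[3] == "Microorganism"]
    habitats = [s for s in spans if s[3] == "Habitat"]
    return [(m[2], "lives_in", h[2], classify(h)) for m in microbes for h in habitats]


text = "... virulence of aquatic pathogen Vibrio anguillarum towards sea bass larvae ..."
for rel in predict_relations(text):
    print(rel)
```

Co-occurrence gives high recall but links every pair in a sentence, which is why trained relation classifiers are used in practice.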
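Slide 13 reports recall and precision separately. A common single-number summary is the balanced F-score, the harmonic mean F1 = 2PR / (P + R); it is not a figure from the talk. The sketch computes it from the slide's numbers, assuming the relation row's "70" and "51,4" are percentages like the other rows.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (balanced F-score)."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs as reported on the slide for the BioNLP-ST evaluation.
RESULTS = {
    "entity detection": (0.81, 0.65),
    "detection and classification": (0.62, 0.50),
    "relation (lives_in)": (0.514, 0.70),
}

for task, (p, r) in RESULTS.items():
    print(f"{task}: F1 = {f1(p, r):.3f}")
```

This prints F1 values of 0.721, 0.554 and 0.593: the harmonic mean sits closer to the weaker of the two figures, so the low recall of classification and the low precision of relation prediction dominate.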
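Slide 16 frames a plausibility check for metagenomic identifications as a series of questions: this organism in this place, then the same family, then a similar place. A minimal sketch of that search, assuming hypothetical single-parent taxon and habitat hierarchies and a list of literature-derived (taxon, habitat) observations: it returns the fewest generalization steps at which a known observation supports the query.

```python
# Hypothetical toy data: taxonomy and habitat hierarchies (child -> parent) and
# observations of the kind text-mining extracts from the literature.
TAXON_PARENT = {
    "Listeria monocytogenes": "Listeria",
    "Listeria innocua": "Listeria",
    "Listeria": "Listeriaceae",
}
HABITAT_PARENT = {
    "cheese dairy": "dairy",
    "dairy": "food processing factory",
    "fish processing plant": "food processing factory",
}
KNOWN = [
    ("Listeria innocua", "fish processing plant"),
    ("Listeria monocytogenes", "cheese dairy"),
]


def lineage(node, parent):
    """Return [node, parent, grandparent, ...] up to the root."""
    chain = []
    while node is not None:
        chain.append(node)
        node = parent.get(node)
    return chain


def support(taxon, habitat):
    """Return (steps, taxon_level, habitat_level) for the closest generalization shared
    with a known observation, or None if no observation matches at any level."""
    best = None
    for known_taxon, known_habitat in KNOWN:
        taxon_anc = set(lineage(known_taxon, TAXON_PARENT))
        habitat_anc = set(lineage(known_habitat, HABITAT_PARENT))
        for ti, t in enumerate(lineage(taxon, TAXON_PARENT)):
            if t not in taxon_anc:
                continue
            for hi, h in enumerate(lineage(habitat, HABITAT_PARENT)):
                if h in habitat_anc and (best is None or ti + hi < best[0]):
                    best = (ti + hi, t, h)
    return best


print(support("Listeria monocytogenes", "cheese dairy"))
print(support("Listeria monocytogenes", "fish processing plant"))
print(support("Listeria monocytogenes", "soil"))
```

A distance of 0 means the exact pair was already reported; larger distances mean support only at the genus, family or broader habitat level, and None means no support at all in the toy data.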
