Literature Services Resource Description Framework
Jee-Hyub Kim
Literature Services, EMBL-EBI
21 May 2015
Presented as part of EBI industrial workshop

  1. Literature Services Resource Description Framework
     Jee-Hyub Kim, Literature Services, EMBL-EBI. 21 May 2015.
  2. Contents
     1 Europe PMC and Linking Literature
     2 Publishing Text-Mined Data on RDF
     3 Text-Mining RDF Service
     4 Discussion
  3. Europe PMC
     • Europe PMC is a literature database [1].
     • Abstracts: 30 million PubMed, Agricola and patent records, updated daily
     • Full-text articles: over 3 million, of which over 900,000 are free to read and reuse, updated daily
     • Powerful and easy search
       • Search all article content through one simple search interface, supported by deep search options for advanced users.
  4. Linking Literature
     • Europe PMC links literature to several kinds of resources.
       • External Links: to any resource (e.g., database, Wikipedia, press release)
       • Citations: to literature
       • BioEntities (produced by the Europe PMC text-mining pipeline)
         • Biological entities: to concepts
         • Accession numbers: to data
     • Example: http://europepmc.org/abstract/MED/21926972
  5. Europe PMC Text-Mining Pipeline
     • A pipeline of dictionary-based and machine-learning-based named entity taggers [3].
     • 6 semantic types
       • Genes/proteins
       • Chemicals
       • Organisms
       • GO terms
       • Disease terms
       • EFO terms
     • 20 accession number types [2]:
       • ENA, RefSNP, PDB, UniProt, OMIM, PFam, ArrayExpress, RefSeq, Data DOI, Ensembl, InterPro
       • NCT, Bioproject, Biosample, Eudract, EMDB, PXD, GO, EGA, TreeFam
     • Programmatic access available.
  6. Publishing Text-Mined Data
     • Beyond the BioEntities tab
     • Goals
       • More connectivity
       • More context for each link
       • Links to share
     • Challenge: dealing with nearly a billion automatically generated annotations
     • Approach: use the Web Annotation Data Model.
  7. Web Annotation Data Model
     • Built on top of RDF
     • Annotations as resources
     • Provides a standard description mechanism for sharing annotations between systems
     • Intended for general-purpose use
       • Not only for text mining
       • For example, YouTube video comments (by people), image annotation, etc.
     • W3C Working Draft: http://www.w3.org/TR/2014/WD-annotation-model-20141211/
  8. Core Annotation Framework
     • Typically an Annotation has a single Body, which is the comment or other descriptive resource, and a single Target that the Body is somehow "about".
     • The Body provides the information which is annotating the Target.
     • This "aboutness" may be further clarified or extended to notions such as classifying or identifying.
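The Body/Target relationship can be sketched as a small JSON-LD-style structure. This is only an illustration: `oa:hasBody` and `oa:hasTarget` are properties of the Open Annotation vocabulary, but the `make_annotation` helper and the `example.org` URIs are hypothetical and are not taken from the Europe PMC service.

```python
import json

# Namespace of the Open Annotation vocabulary used by the data model.
OA_NAMESPACE = "http://www.w3.org/ns/oa#"


def make_annotation(annotation_id, body_uri, target_uri):
    """Return a minimal annotation: one Body that is 'about' one Target."""
    return {
        "@context": {"oa": OA_NAMESPACE},
        "@id": annotation_id,
        "@type": "oa:Annotation",
        "oa:hasBody": {"@id": body_uri},      # the descriptive resource
        "oa:hasTarget": {"@id": target_uri},  # the thing being described
    }


annotation = make_annotation(
    "http://example.org/annotations/1",            # hypothetical annotation URI
    "http://example.org/concepts/some-gene",       # hypothetical concept URI
    "http://europepmc.org/abstract/MED/21926972",  # article used as an example in the slides
)
print(json.dumps(annotation, indent=2))
```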
  9. One Scenario: Text Comment On Web Page
     • A textual comment on a selection of text within a web page
     • How to select a text fragment?
       • Text Position Selector: oa:start, oa:end
       • Text Quote Selector: oa:exact, oa:prefix, oa:suffix
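The two selectors can be illustrated on a plain string. This is a minimal sketch, assuming character offsets where the start is inclusive and the end is exclusive; the function names and the sample sentence are hypothetical.

```python
def select_by_position(text, start, end):
    """Text Position Selector: return the characters from start up to (not including) end."""
    return text[start:end]


def select_by_quote(text, exact, prefix="", suffix=""):
    """Text Quote Selector: locate `exact` where it is preceded by `prefix`
    and followed by `suffix`. Returns (start, end) offsets, or None if absent."""
    hit = text.find(prefix + exact + suffix)
    if hit == -1:
        return None
    start = hit + len(prefix)
    return start, start + len(exact)


text = "The TEM-1 gene and the TEM-1 protein."

# A position selector is compact but breaks if the text changes.
first_mention = select_by_position(text, 4, 9)

# A quote selector uses surrounding context to pick out the second mention.
span = select_by_quote(text, "TEM-1", prefix="the ", suffix=" protein")
```

The quote selector is more robust to small edits elsewhere in the document, while the position selector is cheaper to resolve; storing both lets a consumer cross-check them.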
  10. Text Quote Selector
  11. A Model for Annotation
  12. Service Description
      • Running on the EBI RDF Platform
      • Stores 1,563,241,810 triples text-mined from 400,746 Open Access articles in Europe PubMed Central.
      • Provides
        • for each article, all the annotations linking to ontologies/databases
        • with contexts:
          • sentences
          • section information
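To get a feel for scale, the figures above give an average number of triples per article, which can be extrapolated to the roughly 3 million full-text articles in Europe PMC. The sketch below assumes the triple count grows linearly with the number of articles, which is a simplification and not something the slides claim.

```python
TRIPLES = 1_563_241_810         # triples currently stored
ARTICLES = 400_746              # Open Access articles they were mined from
FULL_TEXT_ARTICLES = 3_000_000  # approximate size of the full-text collection

# Average density, then a linear extrapolation (an assumption).
triples_per_article = TRIPLES / ARTICLES
projected_triples = triples_per_article * FULL_TEXT_ARTICLES

print(f"{triples_per_article:,.0f} triples per article on average")
print(f"{projected_triples / 1e9:.1f} billion triples projected")
```

Under that assumption the store would grow from about 1.6 billion triples to something on the order of 12 billion.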
  13. Use Case for Database Curation
      • Given a database identifier, provides sentence-level information for database curation.
        1 Show all the articles where the PDB accession number 3NSS is mentioned.
        2 Show all the annotations in PMC3382907, each with its label.
        3 Show all the articles where inflammatory bowel disease (C0021390) is mentioned.
      • http://wwwdev.ebi.ac.uk/rdf/services/textmining/sparql
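The first query could look roughly like the sketch below, which only builds the query text and a GET URL and does not contact the endpoint. The graph shape (an annotation whose body is the database record and whose target has an `oa:hasSource` article) is an assumption based on the Open Annotation vocabulary, not the service's documented schema, and the body URI for 3NSS is a hypothetical placeholder.

```python
from urllib.parse import urlencode

ENDPOINT = "http://wwwdev.ebi.ac.uk/rdf/services/textmining/sparql"


def articles_mentioning(body_uri):
    """Build a SPARQL query listing the source articles of every annotation
    whose body is `body_uri`. The triple patterns are an assumed schema."""
    return f"""PREFIX oa: <http://www.w3.org/ns/oa#>
SELECT DISTINCT ?article WHERE {{
  ?annotation a oa:Annotation ;
              oa:hasBody <{body_uri}> ;
              oa:hasTarget ?target .
  ?target oa:hasSource ?article .
}}"""


def request_url(query):
    """Encode the query as a SPARQL Protocol GET request (the `query` parameter)."""
    return ENDPOINT + "?" + urlencode({"query": query})


PDB_3NSS = "http://example.org/pdb/3NSS"  # hypothetical URI for PDB entry 3NSS
query = articles_mentioning(PDB_3NSS)
url = request_url(query)
print(query)
```

The other two use cases would follow the same pattern with a different fixed value: the article for query 2, and the disease concept for query 3.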
  14. Discussion
      • Can we deal with the large number of triples from 3 million full-text articles?
      • A better URI scheme, e.g., http://europepmc.org/articles/PMC4298172/methods/genes/TEM-1/23
      • Interoperability with other formats used in the text-mining community
        • e.g., BioC, UIMA
      • Questions?
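One way to read the proposed URI scheme is as a fixed sequence of path segments. The sketch below parses the example URI under that reading; the field names are hypothetical, and interpreting the trailing number as an index of the mention is an assumption, since the slide does not say what it denotes.

```python
from typing import NamedTuple
from urllib.parse import urlsplit


class AnnotationUri(NamedTuple):
    article: str        # e.g. a PMC identifier
    section: str        # e.g. "methods"
    semantic_type: str  # e.g. "genes"
    term: str           # the annotated term
    index: int          # assumed meaning: which mention of the term


def parse_annotation_uri(uri):
    """Split a URI of the form /articles/<id>/<section>/<type>/<term>/<n>."""
    parts = urlsplit(uri).path.strip("/").split("/")
    if len(parts) != 6 or parts[0] != "articles":
        raise ValueError(f"unexpected annotation URI layout: {uri}")
    _, article, section, semantic_type, term, index = parts
    return AnnotationUri(article, section, semantic_type, term, int(index))


parsed = parse_annotation_uri(
    "http://europepmc.org/articles/PMC4298172/methods/genes/TEM-1/23"
)
print(parsed)
```

A scheme like this makes the context (article, section, entity type) readable from the identifier itself, at the cost of needing escaping rules for terms that contain a slash.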
  15. References
      [1] The Europe PMC Consortium. Europe PMC: a full-text literature database for the life sciences and platform for innovation. Nucleic Acids Research, 2014.
      [2] Senay Kafkas, Jee-Hyub Kim, and Johanna R. McEntyre. Database citation in full text biomedical articles. PLoS ONE, 8(5):e63184, May 2013.
      [3] Dietrich Rebholz-Schuhmann, Miguel Arregui, Sylvain Gaudan, Harald Kirsch, and Antonio J. Yepes. Text processing through web services: calling Whatizit. Bioinformatics, pages btm557+, November 2007.
