Provenance-Assisted Roadmap for Life Sciences Linked Open Data


Linked Life Sciences, Provenance, Linked Open Data, Query Engine, SPARQL Query, Query Federation, a-posteriori Integration


A Provenance-Assisted Roadmap for
Life Sciences Linked Open Data Cloud
Ali Hasnain et al.
Insight Centre for Data Analytics
National University of Ireland, Galway

Agenda
• Motivation
• Linked Life Sciences Roadmap
• Cataloguing and Linking
• Extending Catalogue – Metadata & Provenance
• Query Engine
• Results

Motivation
• Biomedical data is heterogeneous and spread across
multiple sources (SPARQL endpoints).
• Navigation is a challenge.
• The sources contain trillions of triples and are represented
with insufficient vocabulary reuse.
• Biologists sometimes want more information about the
data, including its source, creator, publisher, and
statistics on its size (Metadata & Provenance).

How to deal with heterogeneous data?
DrugBank
DailyMed
ChEBI
KEGG
Reactome
SIDER
BioPAX
Medicare

We want to query the content, not the source
Proteins
Molecules
Genes
Diseases

A Linked Life Sciences Roadmap
Concepts: Proteins, Molecules, Genes, Diseases
Classes: :Protein, :Molecule, :Gene, :Disease
Datasets: UniProt, PDB, Pfam, PROSITE, ProDom, UniRef, UniParc, DailyMed, DrugBank, ChEMBL, PubChem, KEGG, Gene Ontology, GeneID, Affymetrix, HomoloGene, MGI, Diseasome, SIDER

Two Possible Solutions
• To assemble queries over multiple graphs at
multiple endpoints, either:
• vocabularies and ontologies are reused, or
• translation maps between different terminologies are
created (“a posteriori integration”)
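
The translation-map option can be sketched in a few lines of Python. This is a minimal illustration, not the catalogue used in the talk: the `example.org` class URIs, the `TRANSLATION_MAP` table, and both helper functions are hypothetical names introduced here. The idea is that source-specific classes are mapped to a shared concept after the fact, and a query for the shared concept is expanded to cover every mapped class.

```python
"""Sketch of "a posteriori integration" through a translation map."""

# Hypothetical source-specific class URIs mapped to shared roadmap concepts.
# None of these URIs come from the real datasets.
TRANSLATION_MAP = {
    "http://example.org/drugbank/Drug": ":Molecule",
    "http://example.org/kegg/Compound": ":Molecule",
    "http://example.org/uniprot/Protein": ":Protein",
    "http://example.org/diseasome/Disease": ":Disease",
}


def sources_for(concept: str) -> list[str]:
    """Return every source-specific class mapped to the shared concept."""
    return sorted(uri for uri, shared in TRANSLATION_MAP.items() if shared == concept)


def expand_query(concept: str, limit: int = 10) -> str:
    """Build one SPARQL query covering all source classes for a concept."""
    classes = sources_for(concept)
    if not classes:
        raise KeyError(f"no source class is mapped to {concept}")
    values = " ".join(f"<{uri}>" for uri in classes)
    return (
        "SELECT ?entity ?class WHERE {\n"
        f"  VALUES ?class {{ {values} }}\n"
        "  ?entity a ?class .\n"
        f"}} LIMIT {limit}"
    )


if __name__ == "__main__":
    print(expand_query(":Molecule"))
```

With this design the user asks for `:Molecule` and never names a source; adding a dataset only means adding rows to the map.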

Describing Datasets: an Extract from the Catalogue

Extending Catalogue – Metadata & Provenance
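
As a rough sketch of what an extended catalogue entry might hold, the record below carries the fields named in the Motivation slide: source, creator, publisher, and a size statistic. The class name, field names, and all values are assumptions made for illustration; the slide itself does not show the actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogueEntry:
    """One catalogued dataset with metadata and provenance (hypothetical schema)."""

    name: str
    endpoint: str      # SPARQL endpoint serving the dataset
    source: str        # where the data came from
    creator: str
    publisher: str
    triple_count: int  # size statistic


def total_triples(entries: list[CatalogueEntry]) -> int:
    """Sum the size statistic over all catalogued datasets."""
    return sum(entry.triple_count for entry in entries)


def by_publisher(entries: list[CatalogueEntry]) -> dict[str, list[str]]:
    """Group dataset names by publisher, a typical provenance lookup."""
    grouped: dict[str, list[str]] = {}
    for entry in entries:
        grouped.setdefault(entry.publisher, []).append(entry.name)
    return grouped


# Placeholder values for illustration only; none come from the real catalogue.
CATALOGUE = [
    CatalogueEntry("dataset-a", "http://example.org/a/sparql",
                   "http://example.org/a/dump", "Creator A", "Publisher X", 1_000),
    CatalogueEntry("dataset-b", "http://example.org/b/sparql",
                   "http://example.org/b/dump", "Creator B", "Publisher X", 2_500),
    CatalogueEntry("dataset-c", "http://example.org/c/sparql",
                   "http://example.org/c/dump", "Creator C", "Publisher Y", 400),
]

if __name__ == "__main__":
    print(total_triples(CATALOGUE))
    print(by_publisher(CATALOGUE))
```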

Query Engine
http://srvgal86.deri.ie:8000/graph/Granatum

SPARQL Endpoints returning results per query

Runtimes taken by different queries
(Max, Min, Average, Median)
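
The four statistics named above can be computed with Python's standard library. This is only a sketch of the arithmetic; the function name and the runtime values are invented for illustration and are not the measurements reported in the talk.

```python
import statistics


def summarise_runtimes(runtimes: list[float]) -> dict[str, float]:
    """Return max, min, average, and median of a query's runtimes."""
    if not runtimes:
        raise ValueError("at least one runtime is required")
    return {
        "max": max(runtimes),
        "min": min(runtimes),
        "average": statistics.mean(runtimes),
        "median": statistics.median(runtimes),
    }


if __name__ == "__main__":
    # Made-up runtimes in seconds for repeated runs of a single query.
    print(summarise_runtimes([1.0, 2.0, 3.0, 10.0]))
```

Reporting the median next to the average matters here because one slow endpoint can pull the average up while the typical run stays fast.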

1. A Provenance-Assisted Roadmap for Life Sciences Linked Open Data Cloud
2. Agenda
3. Motivation
4. How to deal with heterogeneous data?
5. We want to query the content, not the source
6. A Linked Life Sciences Roadmap
7. Two Possible Solutions
8. A-priori vs. a-posteriori Integration
9. Cataloguing and Linking
10. Describing Datasets: an Extract from the Catalogue
11. Extending Catalogue – Metadata & Provenance
14. Query Engine
15. Visual & Graphical View
16. SPARQL Endpoints returning results per query
17. Runtimes taken by different queries (Max, Min, Average, Median)

Editor's notes

M: Part of the challenge lies in the fact that, even though multiple datasets talk about the same concepts, they don’t use the same terminology. The URIs differ, and so do the labels. -> In Granatum, we enable drug discovery by addressing this problem in linked open data.
M: The way linked data is organized still forces us to look up data by its location, not its content! But those who turn to linked data don’t want to query “PDB”; they want to learn more about proteins, genes, etc. -> Our first task is to catalogue the concepts that are relevant in these various datasets. Providing common access to data is the first pillar of the bridge that crosses the valley of death.
M: When data is catalogued, we can discover new links by cross-referencing with existing datasets. -> Once we identify these concepts, how do we actually query them together?
A distribution represents a specific available form of a dataset. Each dataset might be available in different forms; these forms might represent different formats of the dataset or different endpoints. Examples of distributions include a downloadable CSV file, an API, or an RSS feed.
