From Advanced Queries to Algorithms to
Advanced ML: 3 Pharmaceutical Graph Use Cases
Dr. Alexander Jarasch
• 5 partners + assoc. partners


• 450 researchers


• bundles basic research and
clinical trials expertise


• => variety of data


=> unstructured


=> heterogeneous


=> not connected


=> unFAIR
DZD Data and Knowledge Management team
Dr. Alexander Jarasch
Justus Täger
Tim Bleimehl
Angela Dedie
Yaroslav Zdravomyslov
The Challenge


Connecting data (silos) -> get new insights
Easy question -> Difficult to answer
Why graph? -> why not relational
• biomedical data / healthcare data is highly connected


• => variety of data


=> unstructured


=> heterogeneous


=> not connected


=> unFAIR


• easy to model


• extremely flexible and easily adaptable ("re-shaping the graph") vs. a static SQL model


• scalable (billions of nodes and relationships on a single machine)


• easy to query (cyclic dependencies)


• GraphDataScience library + graph embeddings
Biological question:


Are human genes from GWAS T2D enzymes acting on metabolites which in turn are
regulated in pig diabetes model?




The actual question (from a data-point-of-view):




Is there a connection between A and R?


Easy scientific question
[Figure: example graph with nodes A, B, C, D, E, F, G, K, Q, R, S, U, W and Z]
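From a data point of view, "is there a connection between A and R?" is a reachability question. A minimal sketch in Python, assuming a hypothetical edge list (the slide does not state which nodes are linked), could answer it with a breadth-first search:

```python
from collections import deque

# Hypothetical undirected edges; the actual links in the slide's figure are not given.
EDGES = [("A", "B"), ("B", "C"), ("C", "E"), ("E", "K"), ("K", "R"), ("D", "F")]


def build_adjacency(edges):
    """Build an undirected adjacency map from (source, target) pairs."""
    adjacency = {}
    for source, target in edges:
        adjacency.setdefault(source, set()).add(target)
        adjacency.setdefault(target, set()).add(source)
    return adjacency


def shortest_path(adjacency, start, goal):
    """Return one shortest path from start to goal as a list, or None if unconnected."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in sorted(adjacency.get(node, ())):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None


adjacency = build_adjacency(EDGES)
path_a_to_r = shortest_path(adjacency, "A", "R")
```

In a graph database the same question is a path query, so no join logic has to be written by hand.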


Back to the question
Are human genes from GWAS T2D enzymes acting on metabolites which in turn are regulated in pig diabetes model?
Genomics
Human diabetic data
Genes
SNPs
Proteins
Enzymes
Pathways
Metabolites
Metabolomics
Pre diabetic pig
Metabolites
List of SNPs
List of Genes of
(species 1)
List of Proteins of
(species 1)
List of loci
List of Enzymes of
(species 1)
List of Pathways of
(species 1)
List of Metabolites
of (species 1)
List of Metabolites
of (species 2)
graph
[Figure: connected disease areas: Alzheimer's, cancer, cardiovascular diseases, diabetes, lung diseases, infectious diseases]
new hypotheses
Diseases are connected
“Patient“
64kg, 178cm, male
“drug“
Metformin
“Study“
T2D
“statistics“
“Gene“
AAGCTTCACATGG
“Metabolite“
C6H12O6
insulin resistance
cell
inactive
mice
prediabetic pig
microscope


image
complications
DZDconnect - a Neo4j Knowledge Graph
Natural Language Processing


Ontologies


Inferring knowledge
DZDconnect: Concept
DZD in-house data
Natural Language Processing


Inferring knowledge
Knowledge Graph
DZDconnect:


data integration + ML
Gene RNA Protein
CODES CODES
CODES*
• Python


• Py2Neo, GraphIO


• Docker Pipeline for orchestration (open-source by DZD)


• https://healthecco.org/healthecco/how-to-create-a-heterogeneous-neo4j-data-loading-pipeline-framework-fast/


• Based on integrated data => annotate / enrich


• text matching + Natural Language Processing


• „shortcuts“ for queries (reduce #hops)


• inferring knowledge
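The "shortcut" idea can be sketched as composing two existing hops into one direct, inferred relationship. The following Python sketch assumes hypothetical gene, transcript and protein identifiers; the slides do not show how DZDconnect materialises its shortcuts.

```python
# Hypothetical two-hop data: gene -> transcript -> protein, both via CODES.
GENE_CODES_TRANSCRIPT = {
    "GENE_1": ["TRANSCRIPT_1A", "TRANSCRIPT_1B"],
    "GENE_2": ["TRANSCRIPT_2A"],
}
TRANSCRIPT_CODES_PROTEIN = {
    "TRANSCRIPT_1A": ["PROTEIN_1"],
    "TRANSCRIPT_1B": ["PROTEIN_1"],
    "TRANSCRIPT_2A": ["PROTEIN_2"],
}


def infer_shortcuts(first_hop, second_hop):
    """Compose two hops into direct (gene, protein) pairs, without duplicates."""
    shortcuts = set()
    for gene, transcripts in first_hop.items():
        for transcript in transcripts:
            for protein in second_hop.get(transcript, []):
                shortcuts.add((gene, protein))
    return sorted(shortcuts)


shortcut_edges = infer_shortcuts(GENE_CODES_TRANSCRIPT, TRANSCRIPT_CODES_PROTEIN)
```

A query that only needs the gene-to-protein link can then traverse one inferred relationship instead of two.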
DZDconnect:


data model <-> human readable = easy to query
Use case 1


Handle mapping identifiers of molecular entities
Knowledge Graph
Query "friends of a friend" on a gene level


Example: diabetes-relevant gene 'TCF7L2'
match path=(g:Gene{sid:'TCF7L2'})-[:MAPS|SYNONYM*0..2]-(g1:Gene) return path
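The `*0..2` part of the pattern means "follow between zero and two MAPS or SYNONYM relationships, in either direction". As a rough illustration of that bound, the Python sketch below collects every node reachable within a hop limit. The identifiers are hypothetical placeholders, and unlike the Cypher query it returns nodes rather than paths and does not check the Gene label.

```python
# Hypothetical identifier mappings; real MAPS/SYNONYM data lives in the graph.
RELATIONSHIPS = [
    ("TCF7L2", "MAPS", "ENSG_EXAMPLE_1"),
    ("ENSG_EXAMPLE_1", "SYNONYM", "ALIAS_EXAMPLE_1"),
    ("ALIAS_EXAMPLE_1", "MAPS", "TOO_FAR_EXAMPLE"),
    ("TCF7L2", "CODES", "TRANSCRIPT_EXAMPLE"),
]


def nodes_within_hops(relationships, start, allowed_types, max_hops):
    """Collect nodes reachable from start over the allowed relationship types,
    ignoring direction, using at most max_hops hops (0 hops yields start itself)."""
    neighbours = {}
    for source, rel_type, target in relationships:
        if rel_type in allowed_types:
            neighbours.setdefault(source, set()).add(target)
            neighbours.setdefault(target, set()).add(source)
    reached = {start}
    frontier = {start}
    for _ in range(max_hops):
        frontier = {n for node in frontier for n in neighbours.get(node, ())} - reached
        reached |= frontier
    return reached


related = nodes_within_hops(RELATIONSHIPS, "TCF7L2", {"MAPS", "SYNONYM"}, 2)
```

The node three hops away and the node linked only by CODES are both left out, which mirrors the effect of the type filter and the upper bound in the query.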
Use case 2


Find information that is NOW connected
Knowledge Graph
Query for SNPs (mutations) associated to diabetes


Output: relevant protein and its function (ontology terms)
match (tr:Trait)
where tr.name contains 'diabetes mellitus'
with tr as disease
match path=(disease)<-[:ASSOCIATED_WITH_TRAIT]-(asso:Association)<-[:SNP_HAS_ASSOCIATION]-(snp:SNP)-
[:SNP_HAS_GENE]-(gene:Gene)-[:MAPS]-(g1:Gene)-[x:CODES]->(transcript:Transcript)-[:CODES]->
(prot:Protein)-[:ASSOCIATION]->(term:Term)--(o:Ontology)
return path
Use case 3


Transform text into knowledge


Annotate and enrich text information
Natural Language
Processing


Ontologies


Knowledge Graph
Angiotensin-converting enzyme 2 (ACE2) as a SARS-CoV-2 receptor:
molecular mechanisms and potential therapeutic target. SARS-CoV-2
has been sequenced [3]. A phylogenetic analysis [3,  4] found a bat
origin for the SARS-CoV-2. There is a diversity of possible intermediate
hosts for SARS-CoV-2, including pangolins, but not mice and rats [5].


There are many similarities of SARS-CoV-2 with the original SARS-CoV.
Using computer modeling, Xu et al. [6] found that the spike proteins of
SARS-CoV-2 and SARS-CoV have almost identical 3-D structures in the
receptor-binding domain that maintains van der Waals forces. SARS-CoV
spike protein has a strong binding affinity to human ACE2, based on
biochemical interaction studies and crystal structure analysis [7]. SARS-
CoV-2 and SARS-CoV spike proteins share 76.5% identity in amino acid
sequences
1 of 30m scientific abstracts
NLP: transform text into knowledge


Re-integrate Named Entities into the graph
Angiotensin-converting enzyme 2 GENE_OR_GENOME ( ACE2
GENE_OR_GENOME ) as a SARS-CoV-2 CORONAVIRUS receptor:
molecular mechanisms and potential therapeutic target. SARS-CoV-2
CORONAVIRUS has been sequenced [3 CARDINAL]. A phylogenetic
analysis [3 CARDINAL,  4 CARDINAL] found a bat WILDLIFE origin for
the SARS-CoV-2 CORONAVIRUS. There is a diversity of possible
intermediate hosts for SARS-CoV-2 CORONAVIRUS, including pangolins
WILDLIFE, but not mice EUKARYOTE and rats EUKARYOTE [5
CARDINAL].


There are many similarities of SARS-CoV-2 CORONAVIRUS with the
original SARS-CoV CORONAVIRUS. Using computer modeling, Xu et al.
[6 CARDINAL] found that the spike proteins GENE_OR_GENOME of
SARS-CoV-2 CORONAVIRUS and SARS-CoV CORONAVIRUS have almost
identical 3-D structures in the receptor-binding domain that maintains
van der Waals forces PHYSICAL_SCIENCE. SARS-CoV CORONAVIRUS
spike protein has a strong binding affinity to human ACE2
GENE_OR_GENOME, based on biochemical interaction studies and
crystal structure analysis [7 CARDINAL]. SARS-CoV-2 CORONAVIRUS and
SARS-CoV spike proteins GENE_OR_GENOME share 76.5% identity in
amino acid sequences
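Re-integrating named entities into the graph amounts to turning each recognised mention into a relationship between the abstract and the entity. A minimal Python sketch follows, using a few of the entity labels shown above; the abstract identifier and the MENTIONS relationship name are assumptions, not taken from the slides.

```python
# Entity mentions taken from the annotated abstract above, as (text, label) pairs.
MENTIONS = [
    ("ACE2", "GENE_OR_GENOME"),
    ("SARS-CoV-2", "CORONAVIRUS"),
    ("SARS-CoV-2", "CORONAVIRUS"),
    ("bat", "WILDLIFE"),
    ("pangolins", "WILDLIFE"),
    ("ACE2", "GENE_OR_GENOME"),
]


def mentions_to_edges(abstract_id, mentions, keep_labels):
    """Turn entity mentions into unique (abstract, MENTIONS, entity, label) edges."""
    edges = []
    seen = set()
    for text, label in mentions:
        if label in keep_labels and (text, label) not in seen:
            seen.add((text, label))
            edges.append((abstract_id, "MENTIONS", text, label))
    return edges


# "abstract-0001" is a hypothetical identifier for the abstract shown above.
gene_edges = mentions_to_edges("abstract-0001", MENTIONS, {"GENE_OR_GENOME"})
```

Repeated mentions collapse into one edge per abstract, so a frequently named gene gains edges from many abstracts rather than many edges from one.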
Use case 4


Using graph algorithms to infer new insights
Natural Language
Processing


Ontologies


Knowledge Graph
GDS - page rank - find the most relevant gene


finding ACE2, the receptor that the SARS-CoV-2 virus uses to enter the cell
• 140,000 abstracts from COVID-19-related publications


• NER of gene names


• PageRank identified 'ACE2' as the most relevant gene
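To show what PageRank computes, here is a textbook power-iteration sketch in Python on a toy graph. The gene links are invented for illustration and are not the 140,000-abstract dataset, and the GDS library's own implementation and score scaling may differ from this simplified version.

```python
# Toy directed graph (hypothetical): an edge points from one gene name to another
# gene name it is linked to. Every node has at least one outgoing link.
LINKS = {
    "TMPRSS2": ["ACE2"],
    "IL6": ["ACE2", "TNF"],
    "TNF": ["ACE2"],
    "ACE2": ["TMPRSS2"],
}


def page_rank(links, damping=0.85, iterations=50):
    """Plain power-iteration PageRank; scores sum to 1 across all nodes."""
    nodes = sorted(links)
    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / len(nodes) for node in nodes}
        for source, targets in links.items():
            # Each node passes its damped score on in equal shares.
            share = damping * rank[source] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


ranks = page_rank(LINKS)
top_gene = max(ranks, key=ranks.get)
```

In this toy graph the node with the most incoming links ends up with the highest score, which is the same intuition behind ranking genes by how often other entities point to them.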
Use case 5


Using node embeddings to subphenotype diabetic patients
DZDconnect


connect raw data of diabetic patients with cancer
Clinical data from 404 diabetic patients
DZDconnect


connect lipidomics fingerprint
Lipidomics
Lipidomics experiment with 116 specific lipids
DZDconnect


connect transcriptomics fingerprint
Transcriptomics experiment with 58,345 specific transcripts (RNAs)
Transform patients


Fast random projections (fastRP)
CALL gds.fastRP.write(
  'patients',
  {
    embeddingDimension: 50,
    writeProperty: 'fastrp-embedding'
  }
)
YIELD nodePropertiesWritten
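The idea behind fast random projection can be sketched in a few lines of Python: every node starts with a sparse random vector, vectors are repeatedly averaged over neighbours, and the normalised results of each round are summed with weights. This is a simplified illustration under stated assumptions (toy graph, 4 dimensions, equal iteration weights), not the GDS implementation.

```python
import math
import random


def fast_random_projection(adjacency, dimension=4, iteration_weights=(1.0, 1.0), seed=42):
    """Simplified FastRP-style embedding over an undirected adjacency map."""
    rng = random.Random(seed)
    scale = math.sqrt(3.0)
    # Sparse random start vector per node: +/-sqrt(3) with probability 1/6 each, else 0.
    current = {
        node: [rng.choice((scale, 0.0, 0.0, 0.0, 0.0, -scale)) for _ in range(dimension)]
        for node in sorted(adjacency)
    }
    embedding = {node: [0.0] * dimension for node in adjacency}
    for weight in iteration_weights:
        # Propagate: each node takes the mean of its neighbours' vectors.
        current = {
            node: [
                sum(current[n][i] for n in neighbours) / max(len(neighbours), 1)
                for i in range(dimension)
            ]
            for node, neighbours in adjacency.items()
        }
        # Add the L2-normalised vectors of this round, scaled by its weight.
        for node, vector in current.items():
            norm = math.sqrt(sum(v * v for v in vector)) or 1.0
            for i, value in enumerate(vector):
                embedding[node][i] += weight * value / norm
    return embedding


# Hypothetical toy graph linking patients to measured lipids.
PATIENT_GRAPH = {
    "patient 01": ["lipid A", "lipid B"],
    "patient 02": ["lipid A", "lipid B"],
    "patient 03": ["lipid C"],
    "lipid A": ["patient 01", "patient 02"],
    "lipid B": ["patient 01", "patient 02"],
    "lipid C": ["patient 03"],
}
embeddings = fast_random_projection(PATIENT_GRAPH)
```

Patients with the same neighbourhood receive the same vector, which is what makes the embeddings usable as input for a similarity search.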
k-nearest neighbour clustering with k=5


representing the 5 diabetes subtypes
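A k-nearest-neighbour step links each patient to the patients whose embeddings are most similar. The Python sketch below uses cosine similarity on hypothetical 2-dimensional vectors and k=1, because the toy data has only four patients; the similarity metric used in the talk is not stated, so cosine is an assumption.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def k_nearest(embeddings, k):
    """For each node, list its k most similar other nodes, best first."""
    result = {}
    for node, vector in embeddings.items():
        scored = [
            (cosine_similarity(vector, other_vector), other)
            for other, other_vector in embeddings.items()
            if other != node
        ]
        # Highest similarity first; ties broken by name for reproducibility.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        result[node] = [other for _, other in scored[:k]]
    return result


# Hypothetical 2-dimensional embeddings, for illustration only.
TOY_EMBEDDINGS = {
    "patient 01": [1.0, 0.1],
    "patient 02": [0.9, 0.2],
    "patient 03": [0.1, 1.0],
    "patient 04": [0.2, 0.9],
}
neighbours = k_nearest(TOY_EMBEDDINGS, k=1)
```

The resulting neighbour lists form a similarity graph in which groups of mutually similar patients can be read off as candidate subtypes.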
[Figure: patients 01 to 05, regrouped by graph algorithms]
subphenotyping of diabetic patients
DZDconnect


connect patient data with knowledge graph
Transcript
Gene
Synonyms
Abstract
PubMed


Article
Keyword


MeSH-term


Ontology term
Covid19 example
Use case 6


Graph + NLP + sentiment analysis + GDS
Remdesivir
Hydroxy-chloroquine
drugs
press texts with sentiment (positive/neutral/negative)
Meltwater's sentiment analysis to analyze press releases on clinical trials
Take home message
• Knowledge graph


• as single point of truth


• connect in-house data


• scalability


• infer new insights


• Use cases:


• simple and advanced (Cypher) queries


• Graph Data Science library (page rank, kNN)


• Node embeddings for complex data


• NLP
Thanks to

Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 

Dernier (20)

GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 

From Advanced Queries to Algorithms and Graph-Based ML: Tackling Diabetes with Knowledge Graphs

  • 1. From Advanced Queries to Algorithms to Advanced ML: 3 Pharmaceutical Graph Use Cases Dr. Alexander Jarasch
  • 2. • 5 partners + assoc. partners • 450 researchers • bundles basic research and clinical trials expertise • => variety of data 
 => unstructured 
 => heterogeneous 
 => not connected 
 => unFAIR
  • 3. DZD Data and Knowledge Management team Dr. Alexander Jarasch Justus Täger Tim Bleimehl Angela Dedie Yaroslav Zdravomyslov
  • 4. The Challenge Connecting data (silos) -> get new insights Easy question -> Difficult to answer
  • 5. Why graph? -> why not relational • biomedical data / healthcare data is highly connected • => variety of data 
 => unstructured 
 => heterogeneous 
 => not connected 
 => unFAIR • easy to model • extremely flexible / easily adaptable ("re-shaping the graph") vs. static SQL model • scalable (billions of nodes and relationships on a single machine) • easy to query (cyclic dependencies) • Graph Data Science library + graph embeddings
  • 6. Biological question: Are human genes from GWAS T2D enzymes acting on metabolites which in turn are regulated in a pig diabetes model? 
 The actual question (from a data-point-of-view): 
 
 Is there a connection between A and R? Easy scientific question
  • 7. A B C E D F G K Q R S W Z U 
 The actual question (from a data-point-of-view): 
 
 Is there a connection between A and R?
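
From a data point of view, "is there a connection between A and R?" is a path-existence query. A minimal sketch in plain Python using breadth-first search follows. The node letters come from the slide, but the edge list is purely hypothetical, since the slide text does not preserve which nodes are linked.

```python
from collections import deque


def find_path(edges, start, goal):
    """Return one shortest path from start to goal in an undirected graph, or None."""
    # Build an adjacency map from the edge list.
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    if start not in adjacency or goal not in adjacency:
        return None

    # Breadth-first search, remembering each node's predecessor.
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in sorted(adjacency[node]):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


# Hypothetical edges between the node letters shown on the slide.
EDGES = [
    ("A", "B"), ("B", "C"), ("C", "K"), ("K", "R"),
    ("A", "E"), ("E", "D"), ("S", "W"), ("Z", "U"),
]

print(find_path(EDGES, "A", "R"))
print(find_path(EDGES, "A", "W"))
```

In a graph database the same question is a single path pattern; the sketch only illustrates why the question is easy once the data is connected.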
  • 8. Back to the question Are human genes from GWAS T2D enzymes acting on metabolites which in turn are regulated in a pig diabetes model? Genomics Human diabetic data Genes SNPs Proteins Enzymes Pathways Metabolites Metabolomics Pre diabetic pig Metabolites List of SNPs List of Genes of (species 1) List of Proteins of (species 1) List of loci List of Enzymes of (species 1) List of Pathways of (species 1) List of Metabolites of (species 1) List of Metabolites of (species 2) graph
  • 10. “Patient“ 64kg, 178cm, male “drug“ Metformin “Study“ T2D “statistics“ “Gene“ AAGCTTCACATGG “Metabolite“ C6H12O6 insulin resistance cell inactive mice prediabetic pig microscope 
 image complications DZDconnect - a Neo4j Knowledge Graph
  • 11. Natural Language Processing 
 Ontologies Inferring knowledge DZDconnect: Concept DZD in-house data Natural Language Processing Inferring knowledge Knowledge Graph
  • 12. DZDconnect: data integration + ML Gene RNA Protein CODES CODES CODES* • Python • Py2Neo, GraphIO • Docker Pipeline for orchestration (open-source by DZD) • https://healthecco.org/healthecco/how-to-create-a-heterogeneous-neo4j-data-loading-pipeline-framework-fast/ • Based on integrated data => annotate / enrich • text matching + Natural Language Processing • "shortcuts" for queries (reduce #hops) • inferring knowledge
  • 13. DZDconnect: data model <-> human readable = easy to query
  • 14. DZDconnect: data model <-> human readable = easy to query
  • 15. Use case 1 Handle mapping identifiers of molecular entities Knowledge Graph
  • 16. Query "friends of a friend" at the gene level 
 Example: diabetes-relevant gene 'TCF7L2'
 MATCH path=(g:Gene {sid: 'TCF7L2'})-[:MAPS|SYNONYM*0..2]-(g1:Gene) RETURN path
  • 17. Use case 2 Find information that is NOW connected Knowledge Graph
  • 18. Query for SNPs (mutations) associated with diabetes 
 Output: relevant protein and its function (ontology terms)
 MATCH (tr:Trait) WHERE tr.name CONTAINS 'diabetes mellitus'
 WITH tr AS disease
 MATCH path=(disease)<-[:ASSOCIATED_WITH_TRAIT]-(asso:Association)<-[:SNP_HAS_ASSOCIATION]-(snp:SNP)-[:SNP_HAS_GENE]-(gene:Gene)-[:MAPS]-(g1:Gene)-[x:CODES]->(transcript:Transcript)-[:CODES]->(prot:Protein)-[:ASSOCIATION]->(term:Term)--(o:Ontology)
 RETURN path
  • 19. Use case 3 Transform text into knowledge Annotate and enrich text information Natural Language Processing 
 Ontologies Knowledge Graph
  • 20. Angiotensin-converting enzyme 2 (ACE2) as a SARS-CoV-2 receptor: molecular mechanisms and potential therapeutic target. SARS-CoV-2 has been sequenced [3]. A phylogenetic analysis [3, 4] found a bat origin for the SARS-CoV-2. There is a diversity of possible intermediate hosts for SARS-CoV-2, including pangolins, but not mice and rats [5]. There are many similarities of SARS-CoV-2 with the original SARS-CoV. Using computer modeling, Xu et al. [6] found that the spike proteins of SARS-CoV-2 and SARS-CoV have almost identical 3-D structures in the receptor-binding domain that maintains van der Waals forces. SARS-CoV spike protein has a strong binding affinity to human ACE2, based on biochemical interaction studies and crystal structure analysis [7]. SARS-CoV-2 and SARS-CoV spike proteins share 76.5% identity in amino acid sequences. (1 of 30 million scientific abstracts)
  • 21. NLP: transform text into knowledge 
 Re-integrate Named Entities into the graph Angiotensin-converting enzyme 2 GENE_OR_GENOME ( ACE2 GENE_OR_GENOME ) as a SARS-CoV-2 CORONAVIRUS receptor: molecular mechanisms and potential therapeutic target. SARS-CoV-2 CORONAVIRUS has been sequenced [3 CARDINAL]. A phylogenetic analysis [3 CARDINAL,  4 CARDINAL] found a bat WILDLIFE origin for the SARS-CoV-2 CORONAVIRUS. There is a diversity of possible intermediate hosts for SARS-CoV-2 CORONAVIRUS, including pangolins WILDLIFE, but not mice EUKARYOTE and rats EUKARYOTE [5 CARDINAL]. There are many similarities of SARS-CoV-2 CORONAVIRUS with the original SARS-CoV CORONAVIRUS. Using computer modeling, Xu et al. [6 CARDINAL] found that the spike proteins GENE_OR_GENOME of SARS-CoV-2 CORONAVIRUS and SARS-CoV CORONAVIRUS have almost identical 3-D structures in the receptor-binding domain that maintains van der Waals forces PHYSICAL_SCIENCE. SARS-CoV CORONAVIRUS spike protein has a strong binding affinity to human ACE2 GENE_OR_GENOME, based on biochemical interaction studies and crystal structure analysis [7 CARDINAL]. SARS-CoV-2 CORONAVIRUS and SARS-CoV spike proteins GENE_OR_GENOME share 76.5% identity in amino acid sequences
  • 22. Use case 4 Using graph algorithms to infer new insights Natural Language Processing 
 Ontologies Knowledge Graph
  • 23. GDS - PageRank - find the most relevant gene 
 Finding ACE2 - the receptor the SARS-CoV-2 virus uses to enter the cell • 140,000 abstracts from Covid-19-related publications • NER of gene names • PageRank identified 'ACE2' as the most relevant gene
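
To illustrate how PageRank can surface a frequently mentioned gene, here is a minimal sketch of the textbook power-iteration formulation in plain Python. It is not the GDS implementation, and the abstract-to-gene edges are invented for illustration; the slides do not say which relationships the algorithm was actually run on.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration over a directed edge list."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for source, target in edges:
        out_links[source].append(target)

    count = len(nodes)
    rank = {node: 1.0 / count for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread evenly.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        base = (1.0 - damping) / count + damping * dangling / count
        new_rank = {node: base for node in nodes}
        for source in nodes:
            targets = out_links[source]
            for target in targets:
                new_rank[target] += damping * rank[source] / len(targets)
        rank = new_rank
    return rank


# Hypothetical "abstract MENTIONS gene" edges (not taken from the slides).
MENTIONS = [
    ("abstract 1", "ACE2"),
    ("abstract 2", "ACE2"),
    ("abstract 3", "ACE2"),
    ("abstract 3", "TMPRSS2"),
    ("abstract 4", "TMPRSS2"),
]

ranks = pagerank(MENTIONS)
top_gene = max(ranks, key=ranks.get)
print(top_gene)
```

On this toy input the gene mentioned by the most abstracts ends up with the highest score, which mirrors the idea behind the slide.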
  • 24. Use case 5 Using node embeddings to subphenotype diabetic patients Natural
  • 25. DZDconnect connect raw data of diabetic patients with cancer Clinical data from 404 diabetic patients
  • 27. DZDconnect connect transcriptomics fingerprint Transcriptomics experiment with 58’345 specific Transcripts (RNAs)
  • 28. Transform patients Fast random projections (fastRP)
 CALL gds.fastRP.write('patients', {embeddingDimension: 50, writeProperty: 'fastrp-embedding'}) YIELD nodePropertiesWritten
 Lipido
  • 29. k-nearest neighbour clustering with k=5 representing the 5 diabetes subtypes. Graph algorithms: subphenotyping of diabetic patients (figure labels: patient 01, patient 02, patient 03, patient 04, patient 05)
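
Once each patient has an embedding vector, finding similar patients reduces to a nearest-neighbour search. The sketch below shows that step in plain Python with cosine similarity. The three-dimensional vectors are made up (the slides use 50-dimensional fastRP embeddings), and this is not the GDS kNN procedure.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def k_nearest(embeddings, k):
    """Map each item to its k most similar other items, most similar first."""
    result = {}
    for name, vector in embeddings.items():
        scored = [
            (cosine_similarity(vector, other), other_name)
            for other_name, other in embeddings.items()
            if other_name != name
        ]
        # Highest similarity first; ties broken by name for determinism.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        result[name] = [other_name for _, other_name in scored[:k]]
    return result


# Hypothetical embeddings; real fastRP vectors would have 50 dimensions.
EMBEDDINGS = {
    "patient 01": [0.9, 0.1, 0.0],
    "patient 02": [0.8, 0.2, 0.1],
    "patient 03": [0.1, 0.9, 0.2],
    "patient 04": [0.0, 0.8, 0.3],
    "patient 05": [0.2, 0.1, 0.9],
}

neighbours = k_nearest(EMBEDDINGS, k=1)
print(neighbours)
```

With only five toy patients the example uses k=1; the resulting neighbour relationships are what a subsequent grouping step would operate on.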
  • 30. DZDconnect connect patient data with knowledge graph Transcript Gene Synonyms Abstract PubMed 
 Article Keyword 
 MeSH-term Ontology term
  • 31. Covid19 example Use case 6 Graph + NLP + sentiment analysis + GDS
  • 32. Remdesivir Hydroxychloroquine drugs press texts with sentiment (positive/neutral/negative) Meltwater's sentiment analysis to analyze press releases on clinical trials
  • 33. Take home message • Knowledge graph • as single point of truth • connect in-house data • scalability • infer new insights 
 • Use cases: • simple and advanced (Cypher) queries • Graph Data Science library (PageRank, kNN) • Node embeddings for complex data • NLP