SlideShare a Scribd company logo
1 of 26
Download to read offline
SPONSORED BY THE NATIONAL CANCER INSTITUTE
And then there were 15
standards
Using Neo4j to harmonize data in cancer
research
Todd Pihl, Ph.D.
Mark Jensen, Ph.D.
https://xkcd.com/927
Biological data is naturally a graph
Graph management by subject matter experts
Node
s
Edge
s
Propert
y
Defs
Props referenced here … and defined
here
Entity names are the
keys
Nodes at the
ends,
with direction
Other attributes
specified
Constrain the
data values
to defined
types
Model Description Files
https://github.com/CBIIT/bento-mdf
Bento Framework
Installing a Bento Data Sharing Platform on a Cloud Platform
LOCAL
MACHINE
GITHUB
CLOUD
PLATFORM
Clone files
from
GitHub
Frontend
Backend
Neo4J
-Add test meta data to DB
-Edit UI config files
-View updates in real-time
-Save updated files in bento-frontend
-Push to Git Hub
bento-frontend
bento-backend
bento-data-model
bento-frontend
bento-backend
bento-data-model
Pull updated files
from GitHub
Load data from a
secure S3
bucket
Frontend
Backend
Neo4J
Data Sharing Platform
AWS Environment
Cancer Research Data Commons (CRDC)
Cancer Data Aggregator
Aggregate by patient, sample, study, disease, tissue, etc.
Clinical Proteomics Imaging
Genomics Immuno-
oncology
Animal
Models
Cancer
Biomarkers
Cancer
Research
Data Commons
0100111
0
0100001
1
0100100
1
Data Standards Services
Cancer Data Aggregator (CDA)
• CDA Mission: Provide a single location to query across all CRDC data repositories
• API, Python library
• Currently contains data from Genomics, Proteomics and Imaging Data Commons
• Remaining CRDC data repositories in progress
• Released for CRDC production use on June 28th
• Documentation: https://cda.readthedocs.io/en/latest/
• The Examples page has many Python use cases
• CDA Github: https://github.com/CancerDataAggregator
• Swagger: https://cda.datacommons.cancer.gov/api/swagger-ui.html
• For the first time, CDA allows us to easily look across CRDC at how data are presented to
users.
Houston, we have a problem
Example: Species
Are these fields really the same?
12
Models are for data, not vice versa.
13
Models are for data, not vice versa.
CRDC is a federation of going concerns
• Each CRDC node has its own data systems, business processes, stakeholders,
and users
• Each has its own purpose-built data model that enables data ingestion, query, and distribution.
• Each has large, ongoing inflows and outflows of data today.
• So – A top-down, prescriptive approach to standardization is not feasible.
(Believe us; we know.)
• Standardization emphasizing carrots instead of sticks:
• Access to the CDA is a benefit for any node wanting to extend the reach of its data.
• Approach data standardization as a practical mapping goal: “If you can place your model in the
context of the CDA’s data maps, the CDA can query and serve your data”
• Approach standardization as an iterative process: “Start with a high priority set of metadata, and
expand mapping over time.”
Graphs as a common language for expressing data models
Property Graph Relational Data OWL/RDF
Node Table rows Class
Property Table columns/cells Datatype Property
Relationship Foreign keys/Linking tables Object Property
Representing custom data models as graphs can provide:
• a unified context for managing data and semantics, and
• a framework for integrating data with minimal impact on repository operations.
Creating graph versions of many kinds of data models is possible, since many
popular modeling approaches find natural expression in the Property Graph:
Model Description Format (MDF) - simple, iterative model
recording and schematizing
MDF is a compact, human-readable—and computable—format for defining a
property graph:
• Define Nodes
• Node Properties
• Define Relationships
• Relationship Properties
• Relationship Attributes
• Define and Describe Properties
• Property Attributes, including
• Allowable value types or sets
https://github.com/CBIIT/bento-md
f
In the Bento framework:
• Data SMEs directly update MDF (in GitHub) to make model updates
• Backend data loader and frontend user interfaces are configured directly by MDF
MDF is simple and standardized
17
Philip Musk 12:06
And let me tell you, with data needs driving many
of ICDC's requirements as they are, and have
been thus far, being able to both write the
requirements, and make the required model
changes ahead of engineers doing their thing, is
really powerful. I don't have to explain what
model changes we need to make to someone else
- I can get the model changes done myself, and
explain what we need the engineers and the UI to
do with those changes.
SMEs
Engineering
• Practical principles towards a practical goal led us to practical tools, enabling
• Rapid prototypes and production tier commons
• Integrated Canine Data Commons
• Clinical Trial Data Commons
• Rapid prototypes for data modeling and model visualization
• Cancer Data Service
• Children’s Cancer Data Initiative
• New practical problem: management of multiple dynamic data models over
independent projects
• Creating new models: component reuse?
• Managing acceptable value sets for many Properties in models
• Understanding interrelationships between models for mapping and interoperability
Metamodel Database – the models as data
18
Both data and model as property graphs
Data
Model
("Schema")
Label:
Person
Label:
Person
Label:
Group
Metamodel Schema
20
Defines:
• Models
• Nodes, Relationships, Properties
• Origins, Terms, and Value Sets
• Concepts and Predicates
Schema is represented in MDF
https://github.com/CBIIT/bento-meta/blob/master/metamodel.yaml
Two models in an Metamodel DB (MDB)
21
ICDC CTDC
• In the simple context of Properties, Nodes, and Relationships, we have a
functional repository for multiple graph models
• Python packages move MDF into an MDB, create MDF from models in an MDB
• Docker containers easily run a local MDB, or can provide an instantiated, loaded MDB
• Based directly on Neo4j Community server
images
• Simple Terminology Server (STS) with MDB
as backend
• Enables both GUI and API access to the
models
• Model browsing and fulltext search across
all entities
• STS is also intended to be easy to
distribute and set up
MDB as a model repository and reference
The MDB schema also defines entities for relating models to one another
and to external authorities:
• Concepts & Predicates (“semantics”)
• Origins, Terms, & Value Sets (“terminology”)
Patterns for connecting these to model entities
create separable “layers” that can be added
or modified without disrupting the repository
function.
MDB as a cross-model tool
23
24
• Dynamic
• Like data and data models
• Pragmatic
• Not a repository of ultimate truth
• Tool to help us provide value to NCI today
• Friendly
• Communicates to humans and computers
• Simple, but well-defined
• Not necessarily exhaustive or “complete”
• Distributable
• Not necessarily “central”
• A platform for “mutual understanding” of data
MDB Philosophy: keys to its utility
25
https://cbiit.github.io/bento-meta/mdb-principles.html
• Mark Benson, PhD
• Phil Musk, PhD
• Ming Ying, MS
• Anjan Purkayastha, PhD
• Ye Wu, PhD
• Pat Dunn, PhD
• Nelson Moore, MS
• John Otridge, PhD
Acknowledgements
26

More Related Content

Similar to Government GraphSummit: And Then There Were 15 Standards

Canadian Experts Discuss Modern Data Stacks and Cloud Computing for 5 Years o...
Canadian Experts Discuss Modern Data Stacks and Cloud Computing for 5 Years o...Canadian Experts Discuss Modern Data Stacks and Cloud Computing for 5 Years o...
Canadian Experts Discuss Modern Data Stacks and Cloud Computing for 5 Years o...Daniel Zivkovic
 
Mobile Offline First for inclusive data that spans the data divide
Mobile Offline First for inclusive data that spans the data divideMobile Offline First for inclusive data that spans the data divide
Mobile Offline First for inclusive data that spans the data divideRob Worthington
 
SPS Vancouver 2018 - What is CDM and CDS
SPS Vancouver 2018 - What is CDM and CDSSPS Vancouver 2018 - What is CDM and CDS
SPS Vancouver 2018 - What is CDM and CDSNicolas Georgeault
 
Team Data Science Process Presentation (TDSP), Aug 29, 2017
Team Data Science Process Presentation (TDSP), Aug 29, 2017Team Data Science Process Presentation (TDSP), Aug 29, 2017
Team Data Science Process Presentation (TDSP), Aug 29, 2017Debraj GuhaThakurta
 
Data Engineer, Patterns & Architecture The future: Deep-dive into Microservic...
Data Engineer, Patterns & Architecture The future: Deep-dive into Microservic...Data Engineer, Patterns & Architecture The future: Deep-dive into Microservic...
Data Engineer, Patterns & Architecture The future: Deep-dive into Microservic...Igor De Souza
 
Simplifying Your Cloud Architecture with a Logical Data Fabric (APAC)
Simplifying Your Cloud Architecture with a Logical Data Fabric (APAC)Simplifying Your Cloud Architecture with a Logical Data Fabric (APAC)
Simplifying Your Cloud Architecture with a Logical Data Fabric (APAC)Denodo
 
Traditional data word
Traditional data wordTraditional data word
Traditional data wordorcoxsm
 
Data Virtualization: Introduction and Business Value (UK)
Data Virtualization: Introduction and Business Value (UK)Data Virtualization: Introduction and Business Value (UK)
Data Virtualization: Introduction and Business Value (UK)Denodo
 
Mainframe Modernization with Precisely and Microsoft Azure
Mainframe Modernization with Precisely and Microsoft AzureMainframe Modernization with Precisely and Microsoft Azure
Mainframe Modernization with Precisely and Microsoft AzurePrecisely
 
Webinar: “ditch Oracle NOW”: Best Practices for Migrating to MongoDB
 Webinar: “ditch Oracle NOW”: Best Practices for Migrating to MongoDB Webinar: “ditch Oracle NOW”: Best Practices for Migrating to MongoDB
Webinar: “ditch Oracle NOW”: Best Practices for Migrating to MongoDBMongoDB
 
Unlock Your Data for ML & AI using Data Virtualization
Unlock Your Data for ML & AI using Data VirtualizationUnlock Your Data for ML & AI using Data Virtualization
Unlock Your Data for ML & AI using Data VirtualizationDenodo
 
Data Lake Acceleration vs. Data Virtualization - What’s the difference?
Data Lake Acceleration vs. Data Virtualization - What’s the difference?Data Lake Acceleration vs. Data Virtualization - What’s the difference?
Data Lake Acceleration vs. Data Virtualization - What’s the difference?Denodo
 
Steps towards business intelligence
Steps towards business intelligenceSteps towards business intelligence
Steps towards business intelligenceAhsan Kabir
 
Myth Busters VII: I’m building a data mesh, so I don’t need data virtualization
Myth Busters VII: I’m building a data mesh, so I don’t need data virtualizationMyth Busters VII: I’m building a data mesh, so I don’t need data virtualization
Myth Busters VII: I’m building a data mesh, so I don’t need data virtualizationDenodo
 
A Gen3 Perspective of Disparate Data
A Gen3 Perspective of Disparate DataA Gen3 Perspective of Disparate Data
A Gen3 Perspective of Disparate DataRobert Grossman
 
Data and Application Modernization in the Age of the Cloud
Data and Application Modernization in the Age of the CloudData and Application Modernization in the Age of the Cloud
Data and Application Modernization in the Age of the Cloudredmondpulver
 
Bridging the Last Mile: Getting Data to the People Who Need It (APAC)
Bridging the Last Mile: Getting Data to the People Who Need It (APAC)Bridging the Last Mile: Getting Data to the People Who Need It (APAC)
Bridging the Last Mile: Getting Data to the People Who Need It (APAC)Denodo
 
Building Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft AzureBuilding Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft AzureDmitry Anoshin
 
Dbms and it infrastructure
Dbms and  it infrastructureDbms and  it infrastructure
Dbms and it infrastructureprojectandppt
 

Similar to Government GraphSummit: And Then There Were 15 Standards (20)

Canadian Experts Discuss Modern Data Stacks and Cloud Computing for 5 Years o...
Canadian Experts Discuss Modern Data Stacks and Cloud Computing for 5 Years o...Canadian Experts Discuss Modern Data Stacks and Cloud Computing for 5 Years o...
Canadian Experts Discuss Modern Data Stacks and Cloud Computing for 5 Years o...
 
Mobile Offline First for inclusive data that spans the data divide
Mobile Offline First for inclusive data that spans the data divideMobile Offline First for inclusive data that spans the data divide
Mobile Offline First for inclusive data that spans the data divide
 
SPS Vancouver 2018 - What is CDM and CDS
SPS Vancouver 2018 - What is CDM and CDSSPS Vancouver 2018 - What is CDM and CDS
SPS Vancouver 2018 - What is CDM and CDS
 
Team Data Science Process Presentation (TDSP), Aug 29, 2017
Team Data Science Process Presentation (TDSP), Aug 29, 2017Team Data Science Process Presentation (TDSP), Aug 29, 2017
Team Data Science Process Presentation (TDSP), Aug 29, 2017
 
Data Engineer, Patterns & Architecture The future: Deep-dive into Microservic...
Data Engineer, Patterns & Architecture The future: Deep-dive into Microservic...Data Engineer, Patterns & Architecture The future: Deep-dive into Microservic...
Data Engineer, Patterns & Architecture The future: Deep-dive into Microservic...
 
Simplifying Your Cloud Architecture with a Logical Data Fabric (APAC)
Simplifying Your Cloud Architecture with a Logical Data Fabric (APAC)Simplifying Your Cloud Architecture with a Logical Data Fabric (APAC)
Simplifying Your Cloud Architecture with a Logical Data Fabric (APAC)
 
Traditional data word
Traditional data wordTraditional data word
Traditional data word
 
Data Virtualization: Introduction and Business Value (UK)
Data Virtualization: Introduction and Business Value (UK)Data Virtualization: Introduction and Business Value (UK)
Data Virtualization: Introduction and Business Value (UK)
 
Mainframe Modernization with Precisely and Microsoft Azure
Mainframe Modernization with Precisely and Microsoft AzureMainframe Modernization with Precisely and Microsoft Azure
Mainframe Modernization with Precisely and Microsoft Azure
 
Webinar: “ditch Oracle NOW”: Best Practices for Migrating to MongoDB
 Webinar: “ditch Oracle NOW”: Best Practices for Migrating to MongoDB Webinar: “ditch Oracle NOW”: Best Practices for Migrating to MongoDB
Webinar: “ditch Oracle NOW”: Best Practices for Migrating to MongoDB
 
Unlock Your Data for ML & AI using Data Virtualization
Unlock Your Data for ML & AI using Data VirtualizationUnlock Your Data for ML & AI using Data Virtualization
Unlock Your Data for ML & AI using Data Virtualization
 
Data Lake Acceleration vs. Data Virtualization - What’s the difference?
Data Lake Acceleration vs. Data Virtualization - What’s the difference?Data Lake Acceleration vs. Data Virtualization - What’s the difference?
Data Lake Acceleration vs. Data Virtualization - What’s the difference?
 
Steps towards business intelligence
Steps towards business intelligenceSteps towards business intelligence
Steps towards business intelligence
 
An Introduction to CCDH
An Introduction to CCDHAn Introduction to CCDH
An Introduction to CCDH
 
Myth Busters VII: I’m building a data mesh, so I don’t need data virtualization
Myth Busters VII: I’m building a data mesh, so I don’t need data virtualizationMyth Busters VII: I’m building a data mesh, so I don’t need data virtualization
Myth Busters VII: I’m building a data mesh, so I don’t need data virtualization
 
A Gen3 Perspective of Disparate Data
A Gen3 Perspective of Disparate DataA Gen3 Perspective of Disparate Data
A Gen3 Perspective of Disparate Data
 
Data and Application Modernization in the Age of the Cloud
Data and Application Modernization in the Age of the CloudData and Application Modernization in the Age of the Cloud
Data and Application Modernization in the Age of the Cloud
 
Bridging the Last Mile: Getting Data to the People Who Need It (APAC)
Bridging the Last Mile: Getting Data to the People Who Need It (APAC)Bridging the Last Mile: Getting Data to the People Who Need It (APAC)
Bridging the Last Mile: Getting Data to the People Who Need It (APAC)
 
Building Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft AzureBuilding Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft Azure
 
Dbms and it infrastructure
Dbms and  it infrastructureDbms and  it infrastructure
Dbms and it infrastructure
 

More from Neo4j

Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
QIAGEN: Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
QIAGEN: Biomedical Knowledge Graphs for Data Scientists and BioinformaticiansQIAGEN: Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
QIAGEN: Biomedical Knowledge Graphs for Data Scientists and BioinformaticiansNeo4j
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityNeo4j
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Neo4j
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
ISDEFE - GraphSummit Madrid - ARETA: Aviation Real-Time Emissions Token Accre...
ISDEFE - GraphSummit Madrid - ARETA: Aviation Real-Time Emissions Token Accre...ISDEFE - GraphSummit Madrid - ARETA: Aviation Real-Time Emissions Token Accre...
ISDEFE - GraphSummit Madrid - ARETA: Aviation Real-Time Emissions Token Accre...Neo4j
 
BBVA - GraphSummit Madrid - Caso de éxito en BBVA: Optimizando con grafos
BBVA - GraphSummit Madrid - Caso de éxito en BBVA: Optimizando con grafosBBVA - GraphSummit Madrid - Caso de éxito en BBVA: Optimizando con grafos
BBVA - GraphSummit Madrid - Caso de éxito en BBVA: Optimizando con grafosNeo4j
 
Graph Everywhere - Josep Taruella - Por qué Graph Data Science en tus modelos...
Graph Everywhere - Josep Taruella - Por qué Graph Data Science en tus modelos...Graph Everywhere - Josep Taruella - Por qué Graph Data Science en tus modelos...
Graph Everywhere - Josep Taruella - Por qué Graph Data Science en tus modelos...Neo4j
 
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4jGraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4jNeo4j
 
Neo4j_Exploring the Impact of Graph Technology on Financial Services.pdf
Neo4j_Exploring the Impact of Graph Technology on Financial Services.pdfNeo4j_Exploring the Impact of Graph Technology on Financial Services.pdf
Neo4j_Exploring the Impact of Graph Technology on Financial Services.pdfNeo4j
 
Rabobank_Exploring the Impact of Graph Technology on Financial Services.pdf
Rabobank_Exploring the Impact of Graph Technology on Financial Services.pdfRabobank_Exploring the Impact of Graph Technology on Financial Services.pdf
Rabobank_Exploring the Impact of Graph Technology on Financial Services.pdfNeo4j
 
Webinar - IA generativa e grafi Neo4j: RAG time!
Webinar - IA generativa e grafi Neo4j: RAG time!Webinar - IA generativa e grafi Neo4j: RAG time!
Webinar - IA generativa e grafi Neo4j: RAG time!Neo4j
 
IA Generativa y Grafos de Neo4j: RAG time
IA Generativa y Grafos de Neo4j: RAG timeIA Generativa y Grafos de Neo4j: RAG time
IA Generativa y Grafos de Neo4j: RAG timeNeo4j
 
Neo4j: Data Engineering for RAG (retrieval augmented generation)
Neo4j: Data Engineering for RAG (retrieval augmented generation)Neo4j: Data Engineering for RAG (retrieval augmented generation)
Neo4j: Data Engineering for RAG (retrieval augmented generation)Neo4j
 
Neo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdf
Neo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdfNeo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdf
Neo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdfNeo4j
 
Enabling GenAI Breakthroughs with Knowledge Graphs
Enabling GenAI Breakthroughs with Knowledge GraphsEnabling GenAI Breakthroughs with Knowledge Graphs
Enabling GenAI Breakthroughs with Knowledge GraphsNeo4j
 
Neo4j_Anurag Tandon_Product Vision and Roadmap.Benelux.pptx.pdf
Neo4j_Anurag Tandon_Product Vision and Roadmap.Benelux.pptx.pdfNeo4j_Anurag Tandon_Product Vision and Roadmap.Benelux.pptx.pdf
Neo4j_Anurag Tandon_Product Vision and Roadmap.Benelux.pptx.pdfNeo4j
 
Neo4j Jesus Barrasa The Art of the Possible with Graph
Neo4j Jesus Barrasa The Art of the Possible with GraphNeo4j Jesus Barrasa The Art of the Possible with Graph
Neo4j Jesus Barrasa The Art of the Possible with GraphNeo4j
 

More from Neo4j (20)

Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
QIAGEN: Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
QIAGEN: Biomedical Knowledge Graphs for Data Scientists and BioinformaticiansQIAGEN: Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
QIAGEN: Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
ISDEFE - GraphSummit Madrid - ARETA: Aviation Real-Time Emissions Token Accre...
ISDEFE - GraphSummit Madrid - ARETA: Aviation Real-Time Emissions Token Accre...ISDEFE - GraphSummit Madrid - ARETA: Aviation Real-Time Emissions Token Accre...
ISDEFE - GraphSummit Madrid - ARETA: Aviation Real-Time Emissions Token Accre...
 
BBVA - GraphSummit Madrid - Caso de éxito en BBVA: Optimizando con grafos
BBVA - GraphSummit Madrid - Caso de éxito en BBVA: Optimizando con grafosBBVA - GraphSummit Madrid - Caso de éxito en BBVA: Optimizando con grafos
BBVA - GraphSummit Madrid - Caso de éxito en BBVA: Optimizando con grafos
 
Graph Everywhere - Josep Taruella - Por qué Graph Data Science en tus modelos...
Graph Everywhere - Josep Taruella - Por qué Graph Data Science en tus modelos...Graph Everywhere - Josep Taruella - Por qué Graph Data Science en tus modelos...
Graph Everywhere - Josep Taruella - Por qué Graph Data Science en tus modelos...
 
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4jGraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
 
Neo4j_Exploring the Impact of Graph Technology on Financial Services.pdf
Neo4j_Exploring the Impact of Graph Technology on Financial Services.pdfNeo4j_Exploring the Impact of Graph Technology on Financial Services.pdf
Neo4j_Exploring the Impact of Graph Technology on Financial Services.pdf
 
Rabobank_Exploring the Impact of Graph Technology on Financial Services.pdf
Rabobank_Exploring the Impact of Graph Technology on Financial Services.pdfRabobank_Exploring the Impact of Graph Technology on Financial Services.pdf
Rabobank_Exploring the Impact of Graph Technology on Financial Services.pdf
 
Webinar - IA generativa e grafi Neo4j: RAG time!
Webinar - IA generativa e grafi Neo4j: RAG time!Webinar - IA generativa e grafi Neo4j: RAG time!
Webinar - IA generativa e grafi Neo4j: RAG time!
 
IA Generativa y Grafos de Neo4j: RAG time
IA Generativa y Grafos de Neo4j: RAG timeIA Generativa y Grafos de Neo4j: RAG time
IA Generativa y Grafos de Neo4j: RAG time
 
Neo4j: Data Engineering for RAG (retrieval augmented generation)
Neo4j: Data Engineering for RAG (retrieval augmented generation)Neo4j: Data Engineering for RAG (retrieval augmented generation)
Neo4j: Data Engineering for RAG (retrieval augmented generation)
 
Neo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdf
Neo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdfNeo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdf
Neo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdf
 
Enabling GenAI Breakthroughs with Knowledge Graphs
Enabling GenAI Breakthroughs with Knowledge GraphsEnabling GenAI Breakthroughs with Knowledge Graphs
Enabling GenAI Breakthroughs with Knowledge Graphs
 
Neo4j_Anurag Tandon_Product Vision and Roadmap.Benelux.pptx.pdf
Neo4j_Anurag Tandon_Product Vision and Roadmap.Benelux.pptx.pdfNeo4j_Anurag Tandon_Product Vision and Roadmap.Benelux.pptx.pdf
Neo4j_Anurag Tandon_Product Vision and Roadmap.Benelux.pptx.pdf
 
Neo4j Jesus Barrasa The Art of the Possible with Graph
Neo4j Jesus Barrasa The Art of the Possible with GraphNeo4j Jesus Barrasa The Art of the Possible with Graph
Neo4j Jesus Barrasa The Art of the Possible with Graph
 

Recently uploaded

Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesBoston Institute of Analytics
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024The Digital Insurer
 

Recently uploaded (20)

Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024
 

Government GraphSummit: And Then There Were 15 Standards

  • 1. SPONSORED BY THE NATIONAL CANCER INSTITUTE And then there were 15 standards Using Neo4j to harmonize data in cancer research Todd Pihl, Ph.D. Mark Jensen, Ph.D.
  • 3. Biological data is naturally a graph
  • 4. Graph management by subject matter experts Node s Edge s Propert y Defs Props referenced here … and defined here Entity names are the keys Nodes at the ends, with direction Other attributes specified Constrain the data values to defined types Model Description Files https://github.com/CBIIT/bento-mdf
  • 6. Installing a Bento Data Sharing Platform on a Cloud Platform LOCAL MACHINE GITHUB CLOUD PLATFORM Clone files from GitHub Frontend Backend Neo4J -Add test meta data to DB -Edit UI config files -View updates in real-time -Save updated files in bento-frontend -Push to Git Hub bento-frontend bento-backend bento-data-model bento-frontend bento-backend bento-data-model Pull updated files from GitHub Load data from a secure S3 bucket Frontend Backend Neo4J Data Sharing Platform AWS Environment
  • 7. Cancer Research Data Commons (CRDC) Cancer Data Aggregator Aggregate by patient, sample, study, disease, tissue, etc. Clinical Proteomics Imaging Genomics Immuno- oncology Animal Models Cancer Biomarkers Cancer Research Data Commons 0100111 0 0100001 1 0100100 1 Data Standards Services
  • 8. Cancer Data Aggregator (CDA) • CDA Mission: Provide a single location to query across all CRDC data repositories • API, Python library • Currently contains data from Genomics, Proteomics and Imaging Data Commons • Remaining CRDC data repositories in progress • Released for CRDC production use on June 28th • Documentation: https://cda.readthedocs.io/en/latest/ • The Examples page has many Python use cases • CDA Github: https://github.com/CancerDataAggregator • Swagger: https://cda.datacommons.cancer.gov/api/swagger-ui.html • For the first time, CDA allows us to easily look across CRDC at how data are presented to users.
  • 9. Houston, we have a problem
  • 11. Are these fields really the same?
  • 12. 12 Models are for data, not vice versa.
  • 13. 13 Models are for data, not vice versa.
  • 14. CRDC is a federation of going concerns • Each CRDC node has its own data systems, business processes, stakeholders, and users • Each has its own purpose-built data model that enables data ingestion, query, and distribution. • Each has large, ongoing inflows and outflows of data today. • So – A top-down, prescriptive approach to standardization is not feasible. (Believe us; we know.) • Standardization emphasizing carrots instead of sticks: • Access to the CDA is a benefit for any node wanting to extend the reach of its data. • Approach data standardization as a practical mapping goal: “If you can place your model in the context of the CDA’s data maps, the CDA can query and serve your data” • Approach standardization as an iterative process: “Start with a high priority set of metadata, and expand mapping over time.”
  • 15. Graphs as a common language for expressing data models Property Graph Relational Data OWL/RDF Node Table rows Class Property Table columns/cells Datatype Property Relationship Foreign keys/Linking tables Object Property Representing custom data models as graphs can provide: • a unified context for managing data and semantics, and • a framework for integrating data with minimal impact on repository operations. Creating graph versions of many kinds of data models is possible, since many popular modeling approaches find natural expression in the Property Graph:
  • 16. Model Description Format (MDF) - simple, iterative model recording and schematizing MDF is a compact, human-readable—and computable—format for defining a property graph: • Define Nodes • Node Properties • Define Relationships • Relationship Properties • Relationship Attributes • Define and Describe Properties • Property Attributes, including • Allowable value types or sets https://github.com/CBIIT/bento-md f
  • 17. In the Bento framework: • Data SMEs directly update MDF (in GitHub) to make model updates • Backend data loader and frontend user interfaces are configured directly by MDF MDF is simple and standardized 17 Philip Musk 12:06 And let me tell you, with data needs driving many of ICDC's requirements as they are, and have been thus far, being able to both write the requirements, and make the required model changes ahead of engineers doing their thing, is really powerful. I don't have to explain what model changes we need to make to someone else - I can get the model changes done myself, and explain what we need the engineers and the UI to do with those changes. SMEs Engineering
  • 18. • Practical principles towards a practical goal led us to practical tools, enabling • Rapid prototypes and production tier commons • Integrated Canine Data Commons • Clinical Trial Data Commons • Rapid prototypes for data modeling and model visualization • Cancer Data Service • Children’s Cancer Data Initiative • New practical problem: management of multiple dynamic data models over independent projects • Creating new models: component reuse? • Managing acceptable value sets for many Properties in models • Understanding interrelationships between models for mapping and interoperability Metamodel Database – the models as data 18
  • 19. Both data and model as property graphs Data Model ("Schema") Label: Person Label: Person Label: Group
  • 20. Metamodel Schema 20 Defines: • Models • Nodes, Relationships, Properties • Origins, Terms, and Value Sets • Concepts and Predicates Schema is represented in MDF https://github.com/CBIIT/bento-meta/blob/master/metamodel.yaml
  • 21. Two models in an Metamodel DB (MDB) 21 ICDC CTDC
  • 22. • In the simple context of Properties, Nodes, and Relationships, we have a functional repository for multiple graph models • Python packages move MDF into an MDB, create MDF from models in an MDB • Docker containers easily run a local MDB, or can provide an instantiated, loaded MDB • Based directly on Neo4j Community server images • Simple Terminology Server (STS) with MDB as backend • Enables both GUI and API access to the models • Model browsing and fulltext search across all entities • STS is also intended to be easy to distribute and set up MDB as a model repository and reference
  • 23. The MDB schema also defines entities for relating models to one another and to external authorities: • Concepts & Predicates (“semantics”) • Origins, Terms, & Value Sets (“terminology”) Patterns for connecting these to model entities create separable “layers” that can be added or modified without disrupting the repository function. MDB as a cross-model tool 23
  • 24. 24
  • 25. • Dynamic • Like data and data models • Pragmatic • Not a repository of ultimate truth • Tool to help us provide value to NCI today • Friendly • Communicates to humans and computers • Simple, but well-defined • Not necessarily exhaustive or “complete” • Distributable • Not necessarily “central” • A platform for “mutual understanding” of data MDB Philosophy: keys to its utility 25 https://cbiit.github.io/bento-meta/mdb-principles.html
  • 26. • Mark Benson, PhD • Phil Musk, PhD • Ming Ying, MS • Anjan Purkayastha, PhD • Ye Wu, PhD • Pat Dunn, PhD • Nelson Moore, MS • John Otridge, PhD Acknowledgements 26