© 2021, Amazon Web Services, Inc. or its Affiliates.
Nolan Nichols
Maze Therapeutics:
genetic insight to
new medicines
why do some people get sick and
others don’t, even when they have
the same disease-causing gene?
genetic modifiers are naturally occurring and can be identified,
CRISPRi enables mapping of genetic interactions at scale
in 2016, the Resilience Project published that
they had identified individuals who should have
serious childhood diseases, but didn’t, describing
potential genetic modifiers
Chen et al. Nat Biotechnology 2016
Dr. Jonathan Weissman and team observed
that some gene-gene interactions have a
‘buffering’ or protective effect on disease-
causing mutations
Chen et al. Nat Biotechnology 2016
Horlbeck et al. Cell 2018
based on genetic insights, genetic modifier targets can be developed
into transformative therapies for patients
protective variants can…
• be discovered from, or validated by, functional genomics data
• be targeted to develop new therapeutics
• be identified from human genetic data that naturally protect some people from disease
COMPASS guides us along the path from genetic insight to new
medicines
human genetics: mine biobanks across the world to identify genetic variation and prioritize novel targets that impact human disease
data science: seamlessly integrate diverse proprietary and external data sets and incorporate new computational methods, including machine learning, for analyses
functional genomics: define the biological mechanisms linking genes to disease and suggest therapeutic strategies for a broad range of unmet needs
platform addresses key challenges
• identify relevant genomic associations: ability to discover novel gene-disease relationships that are pharmacologically relevant at scale
• determine the mechanistic basis: tools enable us to establish the basis for the association between a particular gene, cellular pathology and disease state of interest
• drug difficult genetic targets: ability to drug targets with or without structural biology information and direct therapy to the right location
data challenges to our target identification workflow
• Data sources come in heterogeneous formats, which blocks scientists from integrating datasets
• Datasets can contain hundreds of thousands of samples that take analysts weeks to process
• Many “artisanal” analyses become untrustworthy over time as data drift
• Reports and datasets cannot be found quickly and are not in an analysis-ready format
• Analysts don’t have a process for sharing results and interactive visualizations
https://www.anaconda.com/state-of-data-science-2020
which genes are
differentially expressed
in this experiment?
overview of an analyst workflow – providing a computational
environment
[Diagram labels: Analyst; Maze Command Line Interface; Provision; Analysis Environment; AWS Batch]
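A minimal sketch of what a provisioning command might assemble behind the scenes, assuming the command line interface hands work to AWS Batch as a job. The slides do not describe the Maze CLI internals, so the function name, queue name, and job definition name here are all hypothetical; only the field names follow the shape of the AWS Batch SubmitJob request.

```python
import re


def build_batch_job_request(analyst, command, vcpus=4, memory_mib=16384,
                            job_queue="analysis-queue",
                            job_definition="analysis-env"):
    """Assemble keyword arguments in the shape AWS Batch SubmitJob expects.

    The queue and job definition names are hypothetical placeholders.
    """
    # Batch job names allow letters, numbers, hyphens and underscores,
    # up to 128 characters, so other characters are replaced with "-".
    job_name = re.sub(r"[^A-Za-z0-9_-]", "-", f"{analyst}-analysis")[:128]
    return {
        "jobName": job_name,
        "jobQueue": job_queue,
        "jobDefinition": job_definition,
        "containerOverrides": {
            "command": list(command),
            "resourceRequirements": [
                {"type": "VCPU", "value": str(vcpus)},
                {"type": "MEMORY", "value": str(memory_mib)},
            ],
        },
    }


request = build_batch_job_request("jane.doe", ["jupyter", "lab", "--no-browser"])
print(request["jobName"])
```

With credentials configured, a dictionary like this could be passed to `boto3.client("batch").submit_job(**request)`; the sketch stops short of that call so it runs without an AWS account.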
https://github.com/aws-samples/biotech-blueprint-multi-account
AWS Biotech Blueprint
A collaboration with AWS Healthcare and Life Sciences and Biotech Industry
• Enabled Maze to go from a concept to a deployed architecture in hours
• A multi-account architecture provided features to support a growing AWS footprint
  ◦ Additional accounts improve security posture
  ◦ SSO with role-based access
  ◦ Transit Gateway simplifies network configuration and maintenance
https://medium.com/slalom-technology/next-generation-networking-with-aws-transit-gateway-and-shared-vpcs-9d971d868c65
[Diagram panels: “Single Account with Multiple VPCs” and “Multiple Accounts with Single VPC per Account”; accounts shown: Original, Data Science, Informatics, Comp Chem, N…]
AWS Transit Gateway simplifies network configuration and maintenance
overview of an analyst workflow – providing analysis ready data
[Diagram labels: Data Sources (Open Data, Internal Data, Vendor Data, Other Shared Data); Register; Analysis Ready Data (Maze Data Buckets, Athena); Data API; BioBank Analysis; Maze Command Line Interface; Provision; Analysis Environment; AWS Batch; Analyst]
data sources: data lake as code
https://github.com/aws-samples/data-lake-as-code
• A framework to enroll data sources as registered assets in a data catalog
• Optimized data formats (e.g., Parquet) reduce data size and increase performance
• Once registered, data can be directly queried through Athena or using BI tools
• Examples provided for how to enroll data from the Registry of Open Data on AWS
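The enroll-then-query pattern in the bullets above can be sketched locally. This is an illustration only: it does not use the data-lake-as-code framework or Athena, and the in-memory `sqlite3` database is a stand-in chosen so the example runs without an AWS account. The `enroll` helper, the `catalog` dictionary, the table names, and every row of data are hypothetical.

```python
import sqlite3

catalog = {}  # toy data catalog: table name -> source and column list


def enroll(conn, name, columns, rows, source):
    """Create the table, load its rows, and record it in the toy catalog."""
    conn.execute(f"CREATE TABLE {name} ({', '.join(columns)})")
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(f"INSERT INTO {name} VALUES ({placeholders})", rows)
    catalog[name] = {"source": source, "columns": list(columns)}


conn = sqlite3.connect(":memory:")
enroll(conn, "gene_expression", ["gene", "tissue", "tpm"],
       [("GENE_A", "liver", 12.5), ("GENE_A", "kidney", 3.0),
        ("GENE_B", "liver", 0.4)],
       source="hypothetical open dataset")
enroll(conn, "gene_disease", ["gene", "disease", "score"],
       [("GENE_A", "disease_x", 0.9), ("GENE_B", "disease_y", 0.2)],
       source="hypothetical open dataset")

# Once both sources are registered, a single SQL join answers a question
# that would otherwise need calls to two separate public APIs.
query = """
SELECT d.gene, d.disease, e.tissue, e.tpm
FROM gene_disease AS d
JOIN gene_expression AS e ON e.gene = d.gene
WHERE d.score >= 0.5
ORDER BY e.tpm DESC
"""
results = conn.execute(query).fetchall()
for row in results:
    print(row)
```

The point of the sketch is the workflow rather than the engine: sources are registered once with their schema, and afterwards analysts only write SQL against catalog names.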
analysis ready data: life science data lake as code
• The Registry of Open Data on AWS (RODA) contains 237 datasets, 73 of which are tagged as “life science”
• Enabled Maze to import GTEx, Open Targets, BindingDB, and ChEMBL using Data Lake as Code in about an hour
• Provides access to 150M records about biological and chemical entities as well as their properties and associations (genes, diseases, compounds)
• Questions that took hours or days to answer using public APIs now take seconds or minutes using Athena
• Challenges remain for finding the right data elements when there are over 18k unique columns from 242 tables
[Chart: Table Count and Record Count by source]
Sources: GTEx, Open Targets, BindingDB, ChEMBL
Record Count: 73,635,380; 17,198,174; 9,354,592; 49,005,575
Table Count: 107; 8; 50; 77
https://registry.opendata.aws/
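As a quick arithmetic check, the per-source figures from the chart on this slide can be summed and compared with the totals quoted in the bullets (150M records, 242 tables). The values are copied from the chart in the order they appear; the variable names are only for illustration.

```python
# Per-source counts as they appear in the slide's chart.
record_counts = [73_635_380, 17_198_174, 9_354_592, 49_005_575]
table_counts = [107, 8, 50, 77]

total_records = sum(record_counts)
total_tables = sum(table_counts)

print(f"{total_records:,} records across {total_tables} tables")
# → 149,193,721 records across 242 tables
```

The record total rounds to the 150M quoted on the slide, and the table total matches the 242 tables exactly.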
overview of an analyst workflow – sharing results with collaborators
[Diagram labels: Data Sources (Open Data, Internal Data, Vendor Data, Other Shared Data); Register; Analysis Ready Data (Maze Data Buckets, Athena); Data API; BioBank Analysis; Maze Command Line Interface; Provision; Analysis Environment; AWS Batch; Analyst; Publish; Self-Service Analytics (Published Analyses, Data API, Results Portal, Semantic Data Catalog); Drug Target Dashboards; Decision Support]
Publishing analysis results
• ontology terms define result types and relationships
• provide canonical labels and definitions
• designed using the Protégé editor and versioned in Git
• analysts initialize a templated project directory and environment
• a dataset description is generated using ontology-driven tooling
• a validated dataset description is published to a central data portal
• metadata is added to a search index
• tabular files are accessed via a data service API
[Diagram labels: target; constraint; violation; dataset description]
• dataset descriptions are modeled as a data graph
• the Shapes Constraint Language (SHACL) is used to validate the graph
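The target, constraint, and violation idea from this slide can be illustrated with a toy validator. This is not SHACL and not the tooling described in the talk: it is a pure-Python sketch that mimics two SHACL notions, a target class and a minimum-count property constraint, over a set of triples. All identifiers (`ds:1`, `ex:Dataset`, `ex:target`, and so on) are hypothetical.

```python
# Data graph as a set of (subject, predicate, object) triples.
graph = {
    ("ds:1", "rdf:type", "ex:Dataset"),
    ("ds:1", "ex:title", "Differential expression results"),
    ("ds:1", "ex:target", "gene:GENE_A"),
    ("ds:2", "rdf:type", "ex:Dataset"),
    ("ds:2", "ex:title", "Untargeted screen"),
}

# Toy shape: every node of the target class needs at least
# min_count values for each listed property.
dataset_shape = {
    "target_class": "ex:Dataset",
    "properties": {"ex:title": 1, "ex:target": 1},
}


def validate(graph, shape):
    """Return a list of (node, property, message) violations."""
    focus_nodes = sorted(
        s for s, p, o in graph
        if p == "rdf:type" and o == shape["target_class"]
    )
    violations = []
    for node in focus_nodes:
        for prop, min_count in shape["properties"].items():
            count = sum(1 for s, p, _ in graph if s == node and p == prop)
            if count < min_count:
                violations.append(
                    (node, prop, f"expected >= {min_count}, found {count}")
                )
    return violations


violations = validate(graph, dataset_shape)
for violation in violations:
    print(violation)
```

Here `ds:2` is reported because it has no `ex:target` value, which is the kind of gap a validation step would catch before a dataset description is published. A real deployment would express the shape in SHACL and run it through a SHACL engine rather than hand-written checks.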
Summary
• We were able to rapidly deploy our network and data architecture in a matter of days, not months
• Reference architectures provided a foundation for building out a solution tailored to our goals
• Barriers to weaving open data with proprietary data and analyses were reduced but remain a challenge
• A key gap to fill is ensuring that data have embedded semantics and links to entities with associations relevant to drug discovery
launched in 2019 with $190M+ investment
based in South San Francisco with ~80 employees
founded on the concept of genetic modifiers
investors
translating genetic modifying insights into new therapeutics
Q&A
Nolan Nichols

AWS HCLS Virtual Symposium 2021_Maze-Nichols.pptx
