AWS HCLS Virtual Symposium 2021_Maze-Nichols.pptx
- 1. © 2021, Amazon Web Services, Inc. or its Affiliates.
Nolan Nichols
Maze Therapeutics: genetic insight to new medicines
- 2.
why do some people get sick and others don’t, even when they have the same disease-causing gene?
- 3.
genetic modifiers are naturally occurring and can be identified; CRISPRi enables mapping of genetic interactions at scale
in 2016, the Resilience Project reported that it had identified individuals who should have had serious childhood diseases but didn’t, pointing to potential genetic modifiers
Chen et al. Nat Biotechnology 2016
Dr. Jonathan Weissman and team observed that some gene-gene interactions have a ‘buffering’ or protective effect on disease-causing mutations
Horlbeck et al. Cell 2018
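The slide does not say how a "buffering" interaction is scored. One common textbook convention, assumed here and not necessarily the one used by Horlbeck et al., is the multiplicative fitness model: an interaction score is the observed fitness of the double perturbation minus the product of the single-perturbation fitnesses, with positive scores meaning the pair is milder than expected (buffering). The fitness values and the threshold below are hypothetical.

```python
def interaction_score(w_a: float, w_b: float, w_ab: float) -> float:
    """Return epsilon = observed double-perturbation fitness minus expected.

    Fitness is relative to wild type (1.0). Under the multiplicative model
    the expected double-perturbation fitness is w_a * w_b.
    """
    expected = w_a * w_b
    return w_ab - expected


def classify(epsilon: float, tol: float = 0.05) -> str:
    """Label an interaction; tol is an arbitrary, illustrative cutoff."""
    if epsilon > tol:
        return "buffering"      # pair is milder than expected
    if epsilon < -tol:
        return "aggravating"    # pair is more severe than expected
    return "neutral"


if __name__ == "__main__":
    # Hypothetical: a disease-causing perturbation (a) drops fitness to 0.4,
    # a modifier (b) alone is nearly neutral (0.95), and together they reach 0.8.
    eps = interaction_score(0.4, 0.95, 0.8)
    print(f"epsilon = {eps:.2f} -> {classify(eps)}")
```

Real screens also need replicate-aware statistics and a null model for "neutral"; the fixed tolerance here only keeps the sketch short.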
- 4.
based on genetic insights, genetic modifier targets can be developed into transformative therapies for patients
protective variants can…
• be discovered from, or validated by, functional genomics data
• be targeted to develop new therapeutics
• be identified from human genetic data as variants that naturally protect some people from disease
- 5.
COMPASS guides us along the path from genetic insight to new medicines
• human genetics: mine biobanks across the world to identify genetic variation and prioritize novel targets that impact human disease
• data science: seamlessly integrate diverse proprietary and external datasets and incorporate new computational methods, including machine learning, for analyses
• functional genomics: define the biological mechanisms linking genes to disease and suggest therapeutic strategies for a broad range of unmet needs
platform addresses key challenges
• identify relevant genomic associations: ability to discover novel gene-disease relationships that are pharmacologically relevant at scale
• determine the mechanistic basis: tools enable us to establish the basis for the association between a particular gene, cellular pathology, and disease state of interest
• drug difficult genetic targets: ability to drug targets with or without structural biology information and direct therapy to the right location
- 6.
data challenges in our target identification workflow
• Data sources come in heterogeneous formats, which blocks scientists from integrating datasets
• Datasets can contain hundreds of thousands of samples that take analysts weeks to process
• Many “artisanal” analyses become untrustworthy over time as data drift
• Reports and datasets cannot be found quickly and are not in an analysis-ready format
• Analysts don’t have a process for sharing results and interactive visualizations
https://www.anaconda.com/state-of-data-science-2020
which genes are differentially expressed in this experiment?
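One way to keep an "artisanal" analysis from silently going stale as its inputs drift is to record a content fingerprint of every input when the analysis runs and re-check it later. The sketch below assumes inputs are local files and that a JSON manifest sits next to the analysis; both are illustrative choices, not a description of Maze's tooling.

```python
import hashlib
import json
from pathlib import Path


def fingerprint(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_inputs(inputs: list[Path], manifest: Path) -> None:
    """Write a manifest mapping each input file to its current fingerprint."""
    manifest.write_text(json.dumps({str(p): fingerprint(p) for p in inputs}, indent=2))


def drifted_inputs(manifest: Path) -> list[str]:
    """Return inputs whose contents changed, or that vanished, since the manifest was written."""
    recorded = json.loads(manifest.read_text())
    changed = []
    for name, expected in recorded.items():
        path = Path(name)
        if not path.exists() or fingerprint(path) != expected:
            changed.append(name)
    return changed
```

A report could then refuse to render, or be flagged as stale, whenever `drifted_inputs` returns a non-empty list.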
- 7.
overview of an analyst workflow – providing a computational environment
[Diagram: Analyst → Maze Command Line Interface → Provision → Analysis Environment (AWS Batch)]
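The diagram shows an analyst provisioning an analysis environment on AWS Batch through a command line interface. A minimal sketch of such a command, assuming a hypothetical `maze-env provision` subcommand and hypothetical job queue and job definition names, could build the arguments for boto3's Batch `submit_job` call and submit them:

```python
import argparse
import re


def build_job_request(user: str, vcpus: int, memory_mib: int,
                      job_queue: str, job_definition: str) -> dict:
    """Build keyword arguments for boto3's Batch ``submit_job`` call."""
    # Batch job names allow letters, numbers, hyphens, and underscores only.
    safe_user = re.sub(r"[^A-Za-z0-9_-]", "-", user)
    return {
        "jobName": f"analysis-env-{safe_user}",
        "jobQueue": job_queue,
        "jobDefinition": job_definition,
        "containerOverrides": {
            "resourceRequirements": [
                {"type": "VCPU", "value": str(vcpus)},
                {"type": "MEMORY", "value": str(memory_mib)},
            ],
            "environment": [{"name": "ANALYST", "value": user}],
        },
    }


def main(argv=None) -> None:
    """Entry point for a hypothetical ``maze-env provision`` command."""
    parser = argparse.ArgumentParser(prog="maze-env")
    sub = parser.add_subparsers(dest="command", required=True)
    prov = sub.add_parser("provision", help="start an analysis environment job")
    prov.add_argument("--user", required=True)
    prov.add_argument("--vcpus", type=int, default=4)
    prov.add_argument("--memory-mib", type=int, default=16384)
    prov.add_argument("--job-queue", default="analysis-queue")     # hypothetical name
    prov.add_argument("--job-definition", default="analysis-env")  # hypothetical name
    args = parser.parse_args(argv)

    request = build_job_request(args.user, args.vcpus, args.memory_mib,
                                args.job_queue, args.job_definition)
    import boto3  # imported here so build_job_request can be used without AWS access
    response = boto3.client("batch").submit_job(**request)
    print("submitted", response["jobName"], response["jobId"])
```

`main` would typically be registered as a console-script entry point; keeping the request builder separate from the AWS call makes it testable offline.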
- 8.
https://github.com/aws-samples/biotech-blueprint-multi-account
AWS Biotech Blueprint
A collaboration with AWS Healthcare and Life Sciences and Biotech Industry
• Enabled Maze to go from a concept to a deployed architecture in hours
• A multi-account architecture provided features to support a growing AWS footprint
• Additional accounts improve security posture
• SSO with role-based access
• Transit Gateway simplifies network configuration and maintenance
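With SSO (now AWS IAM Identity Center), analysts usually obtain role credentials through configured SSO profiles. A tool that needs to move between accounts programmatically can instead use the underlying STS `AssumeRole` pattern, sketched below as an assumption rather than Maze's actual setup; the account ID and role name are hypothetical.

```python
def role_arn(account_id: str, role_name: str) -> str:
    """Build the ARN of an IAM role in a given account."""
    if not (len(account_id) == 12 and account_id.isdigit()):
        raise ValueError(f"not a 12-digit AWS account id: {account_id!r}")
    return f"arn:aws:iam::{account_id}:role/{role_name}"


def session_credentials(sts_client, account_id: str, role_name: str, session_name: str) -> dict:
    """Assume a role in another account and return temporary credentials.

    ``sts_client`` is expected to be ``boto3.client("sts")``; the returned
    keys match the keyword arguments accepted by ``boto3.Session``.
    """
    response = sts_client.assume_role(
        RoleArn=role_arn(account_id, role_name),
        RoleSessionName=session_name,
    )
    creds = response["Credentials"]
    return {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
    }
```

For example, `boto3.Session(**session_credentials(boto3.client("sts"), "123456789012", "DataScienceAnalyst", "jdoe"))` would yield a session scoped to that role in the target account.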
- 9.
https://medium.com/slalom-technology/next-generation-networking-with-aws-transit-gateway-and-shared-vpcs-9d971d868c65
[Diagram: Single Account with Multiple VPCs → Multiple Accounts with Single VPC per Account (Original Account, Data Science Account, Informatics Account, Comp Chem Account, N… Account)]
AWS Transit Gateway simplifies network configuration and maintenance
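A quick count shows why a hub-and-spoke transit gateway simplifies things as accounts are added. VPC peering is not transitive, so connecting every VPC to every other one needs one peering connection per pair, while a transit gateway needs one attachment per VPC. The sketch ignores route-table configuration, which both approaches still require.

```python
def full_mesh_peering_connections(n_vpcs: int) -> int:
    """Peering is non-transitive, so full connectivity needs one link per VPC pair."""
    return n_vpcs * (n_vpcs - 1) // 2


def transit_gateway_attachments(n_vpcs: int) -> int:
    """Hub-and-spoke: each VPC attaches once to the shared transit gateway."""
    return n_vpcs


if __name__ == "__main__":
    # 4 matches the named accounts in the diagram; 10 and 20 are hypothetical.
    for n in (4, 10, 20):
        print(n, full_mesh_peering_connections(n), transit_gateway_attachments(n))
```

With the four named accounts this is 6 peering connections versus 4 attachments; at 20 accounts it becomes 190 versus 20.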
- 10.
overview of an analyst workflow – providing analysis ready data
[Diagram: Data Sources (Open Data, Internal Data, Vendor Data, Other Shared Data) → Register → Analysis Ready Data (Maze Data Buckets, Athena); Analyst → Maze Command Line Interface → Provision → Analysis Environment (AWS Batch, BioBank Analysis), with data accessed through a Data API]
- 11.
data sources: data lake as code
https://github.com/aws-samples/data-lake-as-code
• A framework to enroll data sources as registered assets in a data catalog
• Optimized data formats (e.g., Parquet) reduce data size and increase query performance
• Once registered, data can be queried directly through Athena or with BI tools
• Examples show how to enroll data from the Registry of Open Data on AWS
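Once a dataset is registered, querying it through Athena follows a start, poll, fetch pattern. The sketch below uses the boto3 Athena client methods `start_query_execution`, `get_query_execution`, and `get_query_results`; the client is passed in so the logic can be exercised without AWS. Database names, SQL, and the S3 output location are placeholders you would supply.

```python
import time


def run_athena_query(client, sql: str, database: str, output_location: str,
                     poll_seconds: float = 1.0) -> list[list[str]]:
    """Run a query through the Athena API and return rows as lists of strings.

    ``client`` is expected to be ``boto3.client("athena")``. Only the first
    page of results is read; larger results need ``NextToken`` pagination.
    For SELECT queries the first returned row holds the column headers.
    """
    qid = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]

    # Poll until the query reaches a terminal state.
    while True:
        status = client.get_query_execution(QueryExecutionId=qid)["QueryExecution"]["Status"]
        state = status["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)
    if state != "SUCCEEDED":
        raise RuntimeError(f"query {qid} ended in {state}: {status.get('StateChangeReason', '')}")

    rows = client.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]
    # NULL cells come back without a "VarCharValue" key.
    return [[cell.get("VarCharValue", "") for cell in row["Data"]] for row in rows]
```

A call might look like `run_athena_query(boto3.client("athena"), "SELECT * FROM some_table LIMIT 10", "some_database", "s3://your-bucket/athena-results/")`, where every name is hypothetical.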
- 12.
analysis ready data: life science data lake as code
[Chart: Table Count and Record Count per dataset]
GTEx: 107 tables, 73,635,380 records
Open Targets: 8 tables, 17,198,174 records
BindingDB: 50 tables, 9,354,592 records
ChEMBL: 77 tables, 49,005,575 records
• The Registry of Open Data on AWS (RODA) contains 237 datasets, 73 of which are tagged as “life science”
• Enabled Maze to import GTEx, Open Targets, BindingDB, and ChEMBL using Data Lake as Code in about an hour
• Provides access to 150M records about biological and chemical entities as well as their properties and associations (genes, diseases, compounds)
• Questions that took hours or days to answer using public APIs now take seconds or minutes using Athena
• Challenges remain in finding the right data elements when there are over 18k unique columns across 242 tables
https://registry.opendata.aws/
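Finding the right data element among 18k+ columns is partly a search problem. A minimal sketch, assuming the (table, column) list has already been pulled from the catalog, for example from Athena's `information_schema.columns`, splits camelCase and snake_case column names into word tokens and matches keywords against whole tokens. The schema, table, and column names used are hypothetical.

```python
import re

# Hypothetical catalog query; adapt the schema name to your own catalog.
CATALOG_SQL = """
SELECT table_name, column_name
FROM information_schema.columns
WHERE table_schema = 'opentargets'
"""


def tokenize(identifier: str) -> set[str]:
    """Split 'targetGeneId' or 'target_gene_id' into lowercase word tokens."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", identifier)
    return {tok.lower() for tok in re.split(r"[^A-Za-z0-9]+", spaced) if tok}


def find_columns(catalog: list[tuple[str, str]], *keywords: str) -> list[tuple[str, str]]:
    """Return (table, column) pairs whose column name contains every keyword as a whole token."""
    wanted = {k.lower() for k in keywords}
    return [(table, column) for table, column in catalog if wanted <= tokenize(column)]
```

Whole-token matching keeps a search for "gene" from returning `geneticDistance`; a fuller solution, as the summary slide suggests, would attach ontology terms to columns rather than rely on names alone.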
- 13.
overview of an analyst workflow – sharing results with collaborators
[Diagram: Data Sources (Open Data, Internal Data, Vendor Data, Other Shared Data) → Register → Analysis Ready Data (Maze Data Buckets, Athena); Analyst → Maze Command Line Interface → Provision → Analysis Environment (AWS Batch, BioBank Analysis, Data API) → Publish → Self-Service Analytics (Published Analyses, Data API, Results Portal, Semantic Data Catalog, Drug Target Dashboards, Decision Support)]
- 14.
Publishing analysis results
• ontology terms define result types and relationships
• provide canonical labels and definitions
• designed using the Protégé editor and versioned in git
• analysts initialize a templated project directory and environment
• a dataset description is generated using ontology-driven tooling
• a validated dataset description is published to a central data portal
• metadata is added to a search index
• tabular files are accessed via a data service API
[Figure: dataset description graph with “target”, “constraint”, and “violation” labels]
• dataset descriptions are modeled as a data graph
• the Shapes Constraint Language (SHACL) is used to validate the graph
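To show how "target", "constraint", and "violation" relate, here is a toy that mimics only SHACL's `sh:targetClass` and `sh:minCount` over a list of triples. It is not a SHACL engine: real validation would run a SHACL processor (for example, pySHACL) over RDF. The vocabulary strings (`dcat:Dataset`, `dct:title`, `dct:license`) are illustrative choices, not Maze's ontology.

```python
from dataclasses import dataclass

RDF_TYPE = "rdf:type"


@dataclass
class PropertyShape:
    path: str           # predicate the constraint applies to (like sh:path)
    min_count: int = 1  # like sh:minCount


@dataclass
class NodeShape:
    target_class: str               # like sh:targetClass
    properties: list[PropertyShape]


def validate(triples, shape: NodeShape):
    """Return (focus node, path, message) for each violated minCount constraint."""
    # Targets: every subject typed with the shape's target class.
    targets = {s for s, p, o in triples if p == RDF_TYPE and o == shape.target_class}
    violations = []
    for node in sorted(targets):
        for prop in shape.properties:
            count = sum(1 for s, p, _ in triples if s == node and p == prop.path)
            if count < prop.min_count:
                violations.append((node, prop.path, f"expected at least {prop.min_count}, found {count}"))
    return violations
```

An empty result corresponds to a dataset description that could be published; any violation would block publication until the description is fixed.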
- 15.
Summary
• We were able to rapidly deploy our network and data architecture in a matter of days, not months
• Reference architectures provided a foundation for building out a solution tailored to our goals
• Barriers to weaving open data together with proprietary data and analyses were reduced, but this remains a challenge
• A key gap to fill is ensuring that data have embedded semantics and links to entities with associations relevant to drug discovery
- 16.
launched in 2019 with $190M+ investment
based in South San Francisco with ~80 employees
founded on the concept of genetic modifiers
investors
translating genetic modifier insights into new therapeutics
- 17.
Q&A
Nolan Nichols