Accelerating research with an open source,
declarative framework for deep learning experiments
Agenda
1. Overview
2. Origin
3. Pain points
4. Framework solution
5. Genes & drugs
6. Big picture
Links
■ Slides: bit.ly/aiqc-bosc
■ github.com/AIQC/aiqc
■ docs.aiqc.io
■ layne@aiqc.io
Overview
-1-
Gain
Insight
Track
Experiments
Prepare
Data
Train models
Register datasets Rank features
User interface
Zero setup SQLite database
Evaluate models
Preprocess data Decode results
Tune parameters
Stratify samples Make predictions
Unifies the PyData Ecosystem
with the Deep Learning Ecosystem
Origin
-2-
Cohort Analysis Experience
GORdb
Case Study: Harvard Med - Proteomics of Alzheimer’s
“The Tau protein is a biomarker of Alzheimer’s Disease (AD). It acts like a cast that holds a neuron together. Its degradation spreads from the stem of the brain to other regions. No one knows why; there is no diagnosis process and no drug to stop it.”
“We aggregated healthy and diseased Tau samples from 5 institutes to study AD progression. Using mass spectrometry, the sites within each sample have been scanned for post-translational modifications (PTMs). Which PTMs at which sites are driving the disease?”
Ranks the type and location of the
post-translational modifications
(PTMs) that drive Alzheimer’s.

It’s largely phosphorylation &
ubiquitination sites in the middle of
the peptide. This insight can be used
to design treatments that help prevent
the degradation of the Tau protein.

Feature Importance
[Chart: importance of each site & type of protein modification]
Most important modifications:
Pho=P02662:115
Pho=Q14195:622
Gly=P37837:277
Ace=P04406:215
Pho=P10636:282
Pho=Q16555:485
Pho=P29966:101
Pho=P10636:231
Gly=P0CG48:63
Pho=P10636:217
Pho=P10636:181
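The feature labels above appear to follow a `<modification>=<protein accession>:<site>` pattern. That reading is an assumption based on the labels themselves; the slides do not define the format. Under that assumption, a minimal sketch for splitting the labels and tallying them by modification type could look like this (`Modification` and `parse_feature` are illustrative names, not part of AIQC):

```python
from collections import Counter
from typing import NamedTuple


class Modification(NamedTuple):
    kind: str      # abbreviated modification type, e.g. "Pho"
    protein: str   # protein accession, e.g. "P10636"
    site: int      # position of the modified site


def parse_feature(label: str) -> Modification:
    """Split a label such as 'Pho=P10636:282' into its three parts."""
    kind, location = label.split("=", 1)
    protein, site = location.split(":", 1)
    return Modification(kind, protein, int(site))


# Labels copied from the slide.
labels = [
    "Pho=P02662:115", "Pho=Q14195:622", "Gly=P37837:277", "Ace=P04406:215",
    "Pho=P10636:282", "Pho=Q16555:485", "Pho=P29966:101", "Pho=P10636:231",
    "Gly=P0CG48:63", "Pho=P10636:217", "Pho=P10636:181",
]
parsed = [parse_feature(label) for label in labels]
by_kind = Counter(m.kind for m in parsed)
by_protein = Counter(m.protein for m in parsed)
```

Grouping by `kind` or `protein` gives a quick view of which modification types and proteins dominate the listed features.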
Pain
Points
-3-
Galton’s regression of pea size inheritance
Limitations of Association Studies (GWAS)
■ Not multi-modal 📷
■ Not multi-label (subtypes, phases) 🐁
■ Not longitudinal ⏱
■ Not unified model (“Many hypotheses”) 🍂
■ No predictive algorithm (although PRS possible) 🔮
■ Not designed for parallelization 🔀
Neural Networks are Flexible
📷 ⏱
🧮
Information
Turing Award-Winning
Architectures +
Automated Differentiation
Information
🔠 🧮
🔢
Versus the latest task-specific statistical tools
(e.g. nth fine-mapping tool)
Binary
■ Survival
■ Malignancy
Multi-Label
■ Subtyping
■ Progression
Regression
■ Expression
■ Toxicity
Forecast
■ Remission
■ Age of Onset
What is it? How much of it?
Deep Learning Answers Deeper Questions
🔨 Workflows vary based on data and analysis type
❄ Each team member manually patches together their own glue code
🪤 Pitfalls to Prevent with Quality Control (QC)
■ Data Leakage 🚰
■ Model Overfitting 🐍
■ Evaluation Bias
■ Pipeline Not Reusable ❄
■ Data Drift 🌊
■ Model Rot 🍄
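Data leakage, the first pitfall listed, often enters through preprocessing: if scaling statistics are computed over all rows, information from the evaluation rows reaches the training step. The following is a minimal sketch of the difference, using a hypothetical one-column scaler written for illustration (it is not AIQC's encoder API):

```python
from statistics import mean, pstdev


class StandardScaler1D:
    """Tiny stand-in for a standard scaler over one numeric column."""

    def fit(self, values):
        self.mean_ = mean(values)
        self.std_ = pstdev(values) or 1.0  # guard against zero variance
        return self

    def transform(self, values):
        return [(v - self.mean_) / self.std_ for v in values]


train = [1.0, 2.0, 3.0, 4.0]
test = [10.0, 12.0]

# Leaky: statistics are computed over train AND test rows.
leaky = StandardScaler1D().fit(train + test)

# Safe: statistics come from the training rows only, then get reused.
safe = StandardScaler1D().fit(train)
test_scaled = safe.transform(test)
```

The safe scaler's mean stays at the training mean, while the leaky one is pulled toward the test values.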
🎪 Data Juggling Demands Systematic Approach
■ Encoding multiple stratified splits & cross-validation folds
■ Sliding time series windows
■ Multiple array dimensions (sklearn designed for 2D)
■ Training & evaluating many models with many hyperparameters
■ Multiple preprocesses, each with multiple column filters
■ Pre/post-processing during inference 6 months later
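As an illustration of the sliding time series windows mentioned in this list, here is a minimal, framework-independent sketch. The function names `sliding_windows` and `make_forecast_pairs` are hypothetical and chosen for this example only:

```python
def sliding_windows(series, size, step=1):
    """Return overlapping windows of `size` items, advancing by `step`."""
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    return [series[i:i + size] for i in range(0, len(series) - size + 1, step)]


def make_forecast_pairs(series, size, horizon=1):
    """Pair each window with the value `horizon` steps after it ends."""
    pairs = []
    for start in range(len(series) - size - horizon + 1):
        window = series[start:start + size]
        target = series[start + size + horizon - 1]
        pairs.append((window, target))
    return pairs


readings = [10, 11, 13, 12, 15, 16]
windows = sliding_windows(readings, size=3, step=2)
pairs = make_forecast_pairs(readings, size=3)
```

Each window becomes one sample with an extra time dimension, which is why the resulting arrays no longer fit a purely 2D tabular layout.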
Skillset Trifecta
Bioinformatics
Data Science Software Engineering
Solution
-4-
Building Blocks for Machine Learning
Goodbye X_train, y_test
[Diagram of linked objects: Dataset, Feature, Label, Splitset, Encoder, Algorithm, Params, Queue, Job, Predictor, Prediction]
Declarative API
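To make the declarative idea concrete, the sketch below shows how a description of hyperparameter choices might be expanded into a queue of individual training jobs. It is an illustration only: `Job` and `expand_queue` are hypothetical stand-ins written for this example and do not reproduce AIQC's actual classes or signatures.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class Job:
    """One training run: a single hyperparameter combination on one fold."""
    params: dict
    fold: int


def expand_queue(param_grid, fold_count=1):
    """Expand a grid of hyperparameter choices into a flat list of jobs."""
    names = sorted(param_grid)
    jobs = []
    for values in product(*(param_grid[name] for name in names)):
        combo = dict(zip(names, values))
        for fold in range(fold_count):
            jobs.append(Job(params=combo, fold=fold))
    return jobs


# Hypothetical hyperparameter names, chosen for illustration.
grid = {"learning_rate": [0.01, 0.001], "neuron_count": [16, 32, 64]}
queue = expand_queue(grid, fold_count=3)
```

The user states what to try (the grid and the number of folds); the bookkeeping of which combination ran on which fold is derived rather than hand-written.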
3 Main
API Classes
Example of API Subclasses
Tutorials & Use Cases
Compare
Genes
& Drugs
-5-
Example: Tumor Classification based on Gene Expression Profiles in TCGA
■ Cohort of 800 participants with
expression profiles of 20,532 genes.
■ Predict the type of tumor observed:
BRCA, KIRC, LUAD, or PRAD.
■ Rank the genes.
[notebook, data]
Cross-validation is
just `fold_count=n`
Train
Validation
Test
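For readers unfamiliar with what a fold count implies, here is a minimal sketch of stratified fold assignment in plain Python. It is not AIQC's implementation; the helper names and the toy class sizes are assumptions made for this example, and a held-out test split would be set aside before this step.

```python
from collections import defaultdict
import random


def stratified_folds(labels, fold_count, seed=0):
    """Assign each sample index to a fold, keeping class ratios similar."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(fold_count)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets its share.
        for position, index in enumerate(indices):
            folds[position % fold_count].append(index)
    return folds


def train_validation_pairs(folds):
    """Yield (train, validation) index lists, one pair per fold."""
    for k, validation in enumerate(folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, validation


# Toy labels using the four tumor types from the example above.
labels = ["BRCA"] * 6 + ["KIRC"] * 6 + ["LUAD"] * 3 + ["PRAD"] * 3
folds = stratified_folds(labels, fold_count=3)
splits = list(train_validation_pairs(folds))
```

Each fold serves once as the validation set while the remaining folds form the training set.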
Dataset.Image
Detecting brain tumors
from MRI scans
[notebook]
Dataset.Sequence
Detecting epileptic seizures
from EEG time series
[notebook]
Other Biomedical Examples
Example: Compound Classification based on High Throughput Screening
■ Screened 60K compounds for 200
structural characteristics.
■ Predict whether the compound is
effective (active vs inactive).
Imbalanced: only 0.6% active.
■ Rank the structural characteristics.
■ Simulate new compounds by
tweaking those characteristics. [notebook, data]
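The 0.6% active rate has practical consequences that a short calculation makes visible. The sketch below treats "60K" as exactly 60,000 compounds, which is an assumption, and uses the common "balanced" class-weight heuristic (total samples divided by the number of classes times the class size), the same formula scikit-learn uses for `class_weight="balanced"`. The slides do not say how the imbalance was actually handled.

```python
total = 60_000                     # assumed exact value of "60K"
active = round(total * 0.006)      # 0.6% of compounds are active
inactive = total - active

# A model that always answers "inactive" still looks accurate.
baseline_accuracy = inactive / total

# "Balanced" class weights: total / (class_count * samples_in_class).
class_count = 2
weights = {
    "active": total / (class_count * active),
    "inactive": total / (class_count * inactive),
}
```

With roughly 360 actives among 60,000 compounds, plain accuracy is a weak metric, and each active example would need to count about 83 times as much as under uniform weighting to balance the classes.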
Big
Picture
-6-
Partner with Cloud Platforms to Bring ML to Genomics
+
Process Omics & Design Cohort
Analyze Cohort
Big Pharma is Partnering with Startups to Gain AI Capabilities
PRESCIENT
DESIGN
Presents barrier (ML hurdle) for early-stage labs/biotechs
AIQC is the Seed Around which Labs/Biotechs can Develop ML Capabilities
Problem: Competing for ML talent
Problem: Budgeting for ML talent
Problem: Bioinformaticians aren’t ML experts
Problem: Expensive to build in-house ML platform
Long-Term Solution: As the biotech company scales, adopt AIQC platform & depend less on professional services
Problem: How to adopt ML to accelerate research?
Near-Term Solution: AIQC tool + AIQC services
Links
■ Slides: bit.ly/aiqc-bosc
■ github.com/aiqc/aiqc
■ docs.aiqc.io
■ layne@aiqc.io

AIQC - ISCB 2022.pdf
