Presented at GlobusWorld 2022 by Michael Reich from the UCSD School of Medicine. The talk describes how Globus services are integrated with a leading genomics analysis platform.
2. Research Objectives
• New statistical and machine learning
techniques for the genomic analysis of
cancer and other diseases
• Software applications for the analysis and
visualization of genomics data
• Web-based environments for accessible,
reproducible research
www.mesirovlab.org
Mesirov Lab, UCSD
3. GenePattern: a platform for reproducible bioinformatics research
● First released in 2004
● ~84,000 registered users as of May 2022
● Public server runs 1000 – 5000 analyses per week on Amazon cloud
● Hundreds of analyses available:
• Machine learning
• Clustering
• Classification
• Dimension reduction
• Gene expression analysis
• Single-cell RNA-seq analysis
• Cancer genomics
• Gene Set Enrichment Analysis (GSEA)
• Proteomics
• Flow Cytometry
• Network analysis
• Data import and formatting utilities, etc.
● BSD-style open source license
Public server: cloud.genepattern.org
Web site: genepattern.org
4. GenePattern wraps software tools in an accessible visible format
Standard “command-line” method for running analysis:
> java -Djava.awt.headless=true -Dwin=cluster.exe -Dmac=clusterMac -Dlinux=clusterLinux -Dlinux64=clusterLinux64 -cp hcl.jar/legacy-gp-modules.jar/ant.jar org.genepattern.modules.hcl.RunCluster -f input.filename log.transform row.center row.normalize column.center column.normalize -u output.base.name -e column.distance.measure -g row.distance.measure -m clustering.method
Corresponding GenePattern visual representation
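To make the wrapping idea concrete, here is a minimal sketch of how values entered in a form could be substituted into a command-line template. It is an illustration only, not GenePattern's actual implementation: the `<param.name>` placeholder syntax, the `build_command` helper, the shortened template, and the sample values are all assumptions made for this example.

```python
import re
import shlex


def build_command(template, values):
    """Replace each <param.name> placeholder with its shell-quoted value."""
    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"missing value for parameter: {name}")
        return shlex.quote(str(values[name]))

    return re.sub(r"<([A-Za-z0-9_.]+)>", substitute, template)


# Hypothetical, shortened template modeled on the command shown on the slide.
template = (
    "java -cp hcl.jar org.genepattern.modules.hcl.RunCluster "
    "-f <input.filename> -u <output.base.name> -m <clustering.method>"
)

# Hypothetical values a user might type into the form fields.
form_values = {
    "input.filename": "example.gct",
    "output.base.name": "example_out",
    "clustering.method": "complete",
}

command = build_command(template, form_values)
print(command)
```

The user only ever sees the named fields; the flags, classpath, and argument order stay inside the template. Quoting each value with `shlex.quote` keeps a file name containing spaces from being split into several arguments.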
5. GenePattern Ecosystem Components
Analysis Engine
Record/replay analyses
Monitor job status
Versioning of methods
Web service access
Access for all
levels of user
Pipeline Environment
Easy creation of multi-step
analytical workflows that can
be shared, versioned, and
published in support of
reproducible research
Notebook Environment
Programming Interface
Web UI
Module Repository
Hundreds of analysis and visualization tools: GISTIC, HiSAT2, GSEA, NMF, MuTect, Seurat, Picard, FastQC, DESeq2, …
GenePattern API
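The pipeline and record/replay ideas on this slide can be sketched as an ordered list of steps, each pinned to a module version, with a log that allows the same analysis to be rerun. This is a toy model under stated assumptions, not GenePattern's pipeline format or API: `Step`, `run_pipeline`, the module names, and the stand-in functions are all hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Step:
    module: str   # hypothetical module name
    version: str  # pinned version, so a rerun uses the same method
    params: Dict[str, object] = field(default_factory=dict)


def run_pipeline(steps, registry, data):
    """Run steps in order; each step receives the previous step's output."""
    log = []
    for step in steps:
        func = registry[(step.module, step.version)]
        data = func(data, **step.params)
        log.append((step.module, step.version, dict(step.params)))
    return data, log


# Stand-in "modules": plain functions keyed by (name, version).
def log_transform(values, pseudocount=0):
    return [math.log2(v + pseudocount) for v in values]


def center(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]


registry = {
    ("LogTransform", "1"): log_transform,
    ("Center", "1"): center,
}

pipeline = [
    Step("LogTransform", "1", {"pseudocount": 0}),
    Step("Center", "1"),
]

result, log = run_pipeline(pipeline, registry, [1, 2, 4])
print(result)  # [-1.0, 0.0, 1.0]

# Replay: rebuild the steps from the recorded log and run them again.
replayed, _ = run_pipeline([Step(m, v, p) for m, v, p in log], registry, [1, 2, 4])
print(replayed == result)  # True
```

Because every log entry records the module name, version, and parameters, the log alone is enough to reproduce the run on the same input.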
6. The GenePattern Notebook Environment
• Integrates GenePattern with Jupyter
Notebook
• Access hundreds of GenePattern genomic
analyses from within a notebook without the
need for code
Reich et al., Cell Systems, 2017
7. Access to genomics analysis tools within a notebook
Set input
parameters
Upload datasets
8. Access to multiple compute resources from a notebook
SDSC Expanse HPC
Local desktop or cluster
Amazon Cloud
cloud.genepattern.org
genepattern.ucsd.edu
9. Additional capabilities for non-coding scientists
User-friendly UI to Python functions
Rich-text Editor
Abakir et al. Nature Genetics 2020
10. GenePattern Notebook Workspace
• Create, execute, and share
GenePattern notebooks
• Zero-install usage of
GenePattern Notebook
• Python libraries installed and
you can install your own
• Make notebooks public for
dissemination to the analysis
community
• Runs in Amazon cloud utilizing
scalable compute
• Over 15,000 notebooks
created by 4200+ users
notebook.genepattern.org
11. Single-Cell RNA-seq data
Angerer et al., Curr Opinion in Sys Bio (2017)
10k cells
1.1m cells
4m cells
Han et al., Nature (2020)
702k cells
18. Acknowledgements
PI
Jill P. Mesirov
Anthony Castanza
Owen Chapman
Lukas Chavez
David Eby
Barbara Hill
Edwin Huang
Forrest Kim
Ted Liefeld
Michael Reich
Jim Robinson
Thorin Tabor
Pablo Tamayo
Helga Thorvaldsdóttir
Douglass Turner
Alex Wenzel
Research Collaborators
U of Chicago
Brigitte Raumann
Nickolaus Saint
Lee Liming
Steve Turoscy
Blankenberg Lab
Cleveland Clinic
Fernando Perez
UC Berkeley