On March 14th, from 11:00am to 12:00pm, Dr. Lee Cooper delivered a virtual presentation via Adobe Connect highlighting his recent publication, “Integrated morphologic analysis for the identification and characterization of disease subtypes.” Dr. Cooper is a postdoctoral researcher in the Center for Comprehensive Informatics at Emory University. He received a Ph.D. in Electrical Engineering from Ohio State University in 2009, where he worked to develop computational methods for image-based phenotyping in mouse models of breast cancer. Dr. Cooper joined Emory in 2009, where he works under the guidance of Joel Saltz to develop methods for analyzing and integrating genomic and imaging datasets to discover associations among pathology, genetics, and patient outcomes. While at Emory, Dr. Cooper has co-authored several methodological and scientific papers describing work performed at the Emory In Silico Brain Tumor Research Center.
Dr. Lee Cooper: Integrated Morphologic Analysis for Identification and Characterization of Disease Subtypes
1. Integrated Morphologic Analysis for Identification and
Characterization of Disease Subtypes
Lee Cooper
Center for Comprehensive Informatics,
Emory University
2. Agenda
• Background
• Pipeline for integrated morphologic analysis
• Results and validation
• Software Infrastructure
• Future Work and Conclusions
• Acknowledgements
4. NCI caBIG® In Silico Brain Tumor Research Center
• Emory University, Atlanta, GA
• Joel Saltz, MD PhD (Director)
• Daniel Brat, MD PhD (Science PI)
• Jefferson Hospital, Philadelphia, PA
• Henry Ford Hospital, Detroit, MI
• Stanford University, Stanford, CA
5. Application domain: glioblastoma
• Most common primary brain tumor in adults
• Median survival 50 weeks
• ISBTRC Goals:
• To leverage rich datasets to understand the mechanisms of glioma progression through In Silico analysis
• To manage, explore and share semantically complex data among researchers
7. The Cancer Genome Atlas (TCGA)
• Characterize 500 tumors for each of a variety of cancers
• Clinical records
• Genomics: gene, miRNA expression, copy number, sequence, DNA methylation
• Imaging: pathology and radiology
[Figure: histology, radiology, genomic, and clinical/pathology data combined for integrated analysis]
8. Slide scanning and image analysis
• High throughput slide scanning systems
• Digitize entire slides at 200X / 400X magnification
• 250 slides / day
• Algorithms to segment and describe cells and structures
9. Glioblastoma morphology
• Themes: morphology, subtypes, rich datasets
• Are there natural clusters of GBM morphology?
• Are there links to patient outcome and molecular characteristics?
10. Methodology
Cooper LA, Kong J, Gutman DA, Wang F, Gao J, Appin C, Cholleti S, Pan T, Sharma A,
Scarpace L, Mikkelsen T, Kurc T, Moreno CS, Brat DJ, Saltz JH, “Integrated morphologic
analysis for the identification and characterization of disease subtypes,”
Journal of the American Medical Informatics Association, 2012;19:317-323.
17. Clustering identifies three morphological groups
• Analyzed 200 million nuclei from 162 TCGA GBMs (462 slides)
• Named for functions of associated genes: Cell Cycle (CC), Chromatin Modification (CM), Protein Biosynthesis (PB)
• Prognostically significant (logrank p=4.5e-4)
[Figure: feature indices by cluster (CC, CM, PB)]
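The slide reports a log-rank p-value for the morphology clusters. As a minimal sketch of how such a test works, the following compares two groups only (the three-group case needs a covariance matrix and is left out). The function name and the toy survival times are illustrative assumptions, not data or code from the study.

```python
import math

def logrank_two_groups(times_a, events_a, times_b, events_b):
    """Two-group log-rank test.

    times_*  : follow-up times
    events_* : 1 if the event (death) was observed, 0 if censored
    Returns (chi-square statistic with 1 degree of freedom, p-value).
    """
    samples = [(t, e, 0) for t, e in zip(times_a, events_a)]
    samples += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in samples if e})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        at_risk = [s for s in samples if s[0] >= t]
        n = len(at_risk)                                  # total at risk
        n_a = sum(1 for s in at_risk if s[2] == 0)        # group A at risk
        d = sum(1 for s in at_risk if s[0] == t and s[1]) # events at time t
        d_a = sum(1 for s in at_risk
                  if s[0] == t and s[1] and s[2] == 0)    # group A events
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            # Hypergeometric variance of the group A event count.
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0, 1.0
    stat = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Toy example (made-up months of survival, all events observed).
stat, p = logrank_two_groups([1, 2, 3, 4, 5], [1] * 5,
                             [10, 11, 12, 13, 14], [1] * 5)
print(f"chi2={stat:.2f}, p={p:.4f}")
```

A small p-value indicates that the survival curves of the two groups differ more than chance alone would explain.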
18. Gene Expression Class Associations
• Cox proportional hazards
• Verhaak expression class¹ not significant, p=0.58
• Morphology clustering p=5.0e-3
[Figure: subtype percentage (%) by cluster (CC, CM, PB) for the Classical, Mesenchymal, Neural, and Proneural expression classes]
¹ Verhaak RG, Hoadley KA, Purdom E, et al; Cancer Genome Atlas Research Network. Integrated genomic analysis identifies clinically relevant subtypes of glioblastoma characterized by abnormalities in PDGFRA, IDH1, EGFR, and NF1. Cancer Cell 2010;17:98-110.
19. Clustering Validation
• Separate set of 84 GBMs from Henry Ford Hospital
• ClusterRepro: CC p=7.2e-3, CM p=1.3e-2
[Figure: feature indices by cluster (CC, Mixed, CM) for the validation set, and survival versus months for the CC, Mixed, and CM groups]
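The validation step asks whether clusters found in one dataset reappear in another. The sketch below illustrates the general idea behind in-group proportion style checks: assign each validation sample to the nearest training centroid, then measure how often a sample's nearest neighbor carries the same label. It is a simplified illustration, not the ClusterRepro package; Euclidean distance, the function names, and the toy points are all assumptions, and the permutation step that yields p-values is omitted.

```python
import math

def nearest_centroid_labels(samples, centroids):
    """Label each sample with the index of its closest centroid."""
    return [
        min(range(len(centroids)), key=lambda c: math.dist(s, centroids[c]))
        for s in samples
    ]

def in_group_proportion(samples, labels, group):
    """Fraction of samples in `group` whose nearest neighbor is also in `group`."""
    members = [i for i, lab in enumerate(labels) if lab == group]
    if not members:
        return float("nan")
    hits = 0
    for i in members:
        nearest = min(
            (j for j in range(len(samples)) if j != i),
            key=lambda j: math.dist(samples[i], samples[j]),
        )
        hits += labels[nearest] == group
    return hits / len(members)

# Toy validation set with two well-separated groups (made-up feature vectors).
validation = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
centroids = [(0.0, 0.5), (5.0, 5.5)]  # hypothetical centroids from a training set
labels = nearest_centroid_labels(validation, centroids)
for g in range(len(centroids)):
    print(g, in_group_proportion(validation, labels, g))
```

A proportion near 1 suggests the group is compact in the validation data; values near the group's share of the dataset suggest it is not reproduced.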
22. From Gene Lists to Biology
• Nuclear lumen localization most highly enriched in cluster-associated genes (CC p=2.8e-36, CM p=2.17e-19, PB p=1.08e-15)
• Other enriched GO terms: DNA repair, M phase, cell cycle, protein biosynthesis, chromatin modification
• Differences in activation of cancer-related pathways including
ATM and TP53 DNA damage checkpoints, NFκB pathway, Wnt
signaling and PTEN/AKT pathways
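Enrichment p-values like those above are commonly computed with a hypergeometric (one-sided Fisher) test. The slides do not state which test was used, so the following is only a sketch under that assumption; the function name and the gene counts in the example are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(population, annotated, selected, overlap):
    """P(X >= overlap) for a hypergeometric draw.

    population : number of genes in the background set
    annotated  : background genes carrying the GO term
    selected   : genes in the cluster-associated list
    overlap    : selected genes that carry the GO term
    """
    upper = min(annotated, selected)
    total = comb(population, selected)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total

# Hypothetical counts: 20,000 background genes, 2,000 with the term,
# 300 genes in the list, 80 of which carry the term.
print(hypergeom_enrichment_p(20_000, 2_000, 300, 80))
```

With 300 genes drawn at random, about 30 would be expected to carry the term, so an overlap of 80 yields a very small tail probability.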
23. Software Infrastructure
Wang F, Kong J, Cooper L, Pan T, Kurc T, Chen W, Sharma A, Niedermayr C, Oh T-W, Brat
D, Farris A, Foran D, Saltz J, “A Data Model and Database for High-resolution Pathology
Analytical Image Informatics,” Journal of Pathology Informatics, Vol. 2, Issue 1, pp. 32-40,
2011.
Teodoro G, Kurc T, Pan T, Cooper L, Kong J, Widener P, Saltz J, “Accelerating Large Scale
Image Analyses on Parallel CPU-GPU Equipped Systems”, Accepted for presentation at the
International Parallel and Distributed Processing Symposium, China, 2012. Also available
as Emory University, Center for Comprehensive Informatics, Technical Report: CCI-TR-
2011-4, 2011.
24. How to scale to 14,000 images?
• TCGA contains 20 cancer types
• 14K images – 4 Terabytes
• How to analyze larger datasets? HPC Pipeline
• How to organize results? PAIS Database
• How to interact with the data? CDSA Portal
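A quick back-of-envelope calculation shows the average image size implied by the figures above. It assumes a terabyte means 10**12 bytes, which the slide does not specify.

```python
# Figures taken from the slide: 14K images, 4 terabytes.
# Assumption: decimal units (1 TB = 10**12 bytes, 1 MB = 10**6 bytes).
TOTAL_BYTES = 4 * 10**12
IMAGE_COUNT = 14_000

def mean_image_size_mb(total_bytes, image_count):
    """Average size per image in megabytes."""
    return total_bytes / image_count / 10**6

print(round(mean_image_size_mb(TOTAL_BYTES, IMAGE_COUNT)))  # prints 286
```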
26. PAIS (Pathology Analytical Imaging Standards)
PAIS Logical Model:
• 62 UML classes
• markups, annotations, imageReferences, provenance
• Semantically enabled
PAIS Data Representation:
• XML (compressed) or HDF5
PAIS Databases:
• loading, managing, querying, and sharing data
• RDBMS + SDBMS + parallel DBMS
Fusheng Wang
27. Microscopy Image Database
• Image analysis: segmentation, feature extraction
• PAIS model and PAIS data management: modeling and management of markup and annotation for querying and sharing through parallel RDBMS + spatial DBMS
• HDFS data staging and MapReduce based queries: on the fly data processing for algorithm validation, algorithm sensitivity studies, or discovery of preliminary results
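To make the MapReduce query style concrete, here is a small in-memory sketch that computes per-image nucleus counts and mean areas from segmentation records. It runs in a single process and only mimics the map, shuffle, and reduce stages; the record fields (`image_id`, `area`) and function names are hypothetical and are not taken from the PAIS schema.

```python
from collections import defaultdict

def map_phase(records):
    """Emit (image_id, area) for each segmented nucleus record."""
    for rec in records:
        yield rec["image_id"], rec["area"]

def shuffle(pairs):
    """Group emitted values by key, as the framework would between stages."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Summarize each image: nucleus count and mean nucleus area."""
    return {
        key: {"count": len(values), "mean_area": sum(values) / len(values)}
        for key, values in grouped.items()
    }

# Made-up records standing in for segmentation output.
records = [
    {"image_id": "slide_A", "area": 40.0},
    {"image_id": "slide_A", "area": 60.0},
    {"image_id": "slide_B", "area": 30.0},
]
summary = reduce_phase(shuffle(map_phase(records)))
print(summary)
```

In a real deployment the map and reduce functions would run in parallel over data staged in HDFS; the structure of the computation stays the same.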
35. Conclusions
• Pathology imagery contains important cues
• Pipeline for analyzing whole slide imagery
• Tooling to handle large datasets
• Other TCGA diseases (14000 Images!)
• Developing richer descriptions of image content
• Resources:
• Emory Websites: bmi.emory.edu, cci.emory.edu
• Cancer Digital Slide Archive: cancer.digitalslidearchive.net
• TCGA Symposium Talk: http://cancergenome.nih.gov/newsevents/multimedialibrary/videos/morphologicalcooper
• JAMIA Paper: http://jamia.bmj.com/content/19/2/317.abstract
36. In Silico Brain Tumor Research Center Team
• Emory University: Joel Saltz (Director), Daniel Brat (Science PI), Carlos Moreno (Bioinformatics Lead), Lee Cooper, David Gutman, Jun Kong, Fusheng Wang, Chad Holder, Christina Appin, Candace Chisolm, Erwin van Meir, Tahsin Kurc, Sharath Cholleti, Tony Pan, Ashish Sharma
• Henry Ford Hospital: Tom Mikkelsen, Lisa Scarpace
• Thomas Jefferson University: Adam Flanders (Radiology Lead)
• Stanford University: Daniel Rubin
37. Related Papers and Acknowledgements
• Cooper LA, Kong J, Gutman DA, Wang F, Gao J, Appin C, Cholleti S, Pan T,
Sharma A, Scarpace L, Mikkelsen T, Kurc T, Moreno CS, Brat DJ, Saltz JH,
“Integrated morphologic analysis for the identification and characterization of
disease subtypes”, Journal of the American Medical Informatics Association, in
press, 2012. Pre-print Available: http://jamia.bmj.com/content/19/2/317.long
• Wang F, Kong J, Cooper L, Pan T, Kurc T, Chen W, Sharma A, Niedermayr C,
Oh T-W, Brat D, Farris A, Foran D, Saltz J, “A Data Model and Database for
High-resolution Pathology Analytical Image Informatics,” Journal of Pathology
Informatics, Vol. 2, Issue 1, pp. 32-40, 2011.
• Teodoro G, Kurc T, Pan T, Cooper L, Kong J, Widener P, Saltz J, “Accelerating
Large Scale Image Analyses on Parallel CPU-GPU Equipped Systems”,
Accepted for presentation at the International Parallel and Distributed
Processing Symposium, China, 2012. Also available as Emory University,
Center for Comprehensive Informatics, Technical Report: CCI-TR-2011-4,
2011.
This work is supported in part by NCI HHSN261200800001E, NHLBI R24HL085343, NLM
R01LM011119-01 and R01LM009239, NIH RC4MD005964, NIH NIBIB BISTI P20EB000591,
and CTSA PHS Grant UL1RR025008.