Integrated Morphologic Analysis for Identification and
            Characterization of Disease Subtypes
                           Lee Cooper
              Center for Comprehensive Informatics,
                         Emory University




1
Agenda

    •   Background
    •   Pipeline for integrated morphologic analysis
    •   Results and validation
    •   Software Infrastructure
    •   Future Work and Conclusions
    •   Acknowledgements




2
Background




3
NCI caBIG® In Silico Brain Tumor Research Center

    Emory University, Atlanta, GA
      Joel Saltz, MD PhD (Director)
      Daniel Brat, MD PhD (Science PI)

    Jefferson Hospital, Philadelphia, PA
    Henry Ford Hospital, Detroit, MI
    Stanford University, Stanford, CA




4
Application domain: glioblastoma


    •   Most common primary brain
        tumor in adults

    •   Median survival 50 weeks

    •   ISBTRC Goals:
         • To leverage rich datasets to understand the mechanisms of glioma
           progression through In Silico analysis
         • To manage, explore and share semantically complex data among
           researchers




5
Glioblastoma Histology




    [Figure: histology panels labeled Necrosis and Angiogenesis]




6
The Cancer Genome Atlas (TCGA)
    •   Characterize 500 tumors for each of a variety of cancers
    •   Clinical records
    •   Genomics: gene, miRNA expression, copy number, sequence,
        DNA methylation
    •   Imaging: pathology and radiology




    [Figure: histology, radiology, genomic, and clinical/pathology data feeding into Integrated Analysis]
7
Slide scanning and image analysis




     •   High throughput slide scanning systems
     •   Digitize entire slides at 200X / 400X magnification
     •   250 slides / day
     •   Algorithms to segment and describe cells and structures




8
Glioblastoma morphology




    •   Themes: morphology, subtypes, rich datasets

    •   Are there natural clusters of GBM morphology?
    •   Are there links to patient outcome and molecular characteristics?




9
Methodology




     Cooper LA, Kong J, Gutman DA, Wang F, Gao J, Appin C, Cholleti S, Pan T, Sharma A,
     Scarpace L, Mikkelsen T, Kurc T, Moreno CS, Brat DJ, Saltz JH, “Integrated morphologic
     analysis for the identification and characterization of disease subtypes,”
     Journal of the American Medical Informatics Association, 2012 19:317-323

10
Computational Pathology and Correlative Analysis




11
Morphology engine




12
Clustering engine


                     Patient Morphology Profiles
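
The slide does not say how per-nucleus measurements are combined into a patient morphology profile. As a minimal sketch, assuming each profile is simply the mean of every nuclear feature over all of a patient's nuclei (the function name, the averaging rule, and the toy data below are all hypothetical):

```python
from collections import defaultdict
from statistics import mean


def patient_profiles(nuclei):
    """Average each nuclear feature over all nuclei of a patient.

    `nuclei` is an iterable of (patient_id, feature_vector) pairs.
    Returns {patient_id: [mean of feature 0, mean of feature 1, ...]}.
    Averaging is an assumption; the slides do not state the aggregation rule.
    """
    grouped = defaultdict(list)
    for patient_id, features in nuclei:
        grouped[patient_id].append(features)
    return {
        pid: [mean(column) for column in zip(*vectors)]
        for pid, vectors in grouped.items()
    }


# Hypothetical toy data: (patient, [area, intensity]) for each nucleus.
nuclei = [
    ("P1", [100.0, 0.2]),
    ("P1", [120.0, 0.4]),
    ("P2", [60.0, 0.8]),
]
profiles = patient_profiles(nuclei)
```

Profiles of this shape (one fixed-length vector per patient) are what a clustering step would take as input.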




13
Correlative engine

                          Patient Cluster Labels




14
Genome wide analysis




                            GISTIC



15
Results




16
Clustering identifies three morphological groups
      • Analyzed 200 million nuclei from 162 TCGA GBMs (462 slides)
      • Named for functions of associated genes:
        Cell Cycle (CC), Chromatin Modification (CM),
        Protein Biosynthesis (PB)
      • Prognostically significant (logrank p=4.5e-4)
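
For readers unfamiliar with the log-rank test behind the p-value above, the following is a minimal sketch. It is simplified to two groups, whereas the slide compares three clusters, and it is not the authors' code. The function name and toy data are hypothetical.

```python
import math


def logrank_two_groups(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value).

    times_*  : follow-up time for each patient
    events_* : 1 if death was observed, 0 if the patient was censored
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e == 1})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        groups_at_risk = [g for (u, _, g) in data if u >= t]
        n = len(groups_at_risk)                      # patients still at risk
        n_a = groups_at_risk.count(0)                # of which in group A
        d = sum(1 for (u, e, _) in data if u == t and e == 1)
        d_a = sum(1 for (u, e, g) in data if u == t and e == 1 and g == 0)
        observed_a += d_a
        expected_a += d * n_a / n                    # expected deaths in A
        if n > 1:                                    # hypergeometric variance
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0, 1.0
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical toy data: group A dies early, group B dies late.
chi2, p = logrank_two_groups([1, 2, 3, 4], [1, 1, 1, 1],
                             [10, 11, 12, 13], [1, 1, 1, 1])
```

Extending this to three clusters requires a covariance matrix over groups and two degrees of freedom; a statistics package is the safer choice for that case.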



      [Figure: clustered heatmap; vertical axis: Feature Indices; columns grouped by cluster: CC, CM, PB]




17
Gene Expression Class Associations
     •   Cox proportional hazards
          • Verhaak expression class [1]: not significant (p=0.58)
          • Morphology clustering: p=5.0e-3

     [Figure: bar chart of Subtype Percentage (%) by Cluster (CC, CM, PB); legend: Classical, Mesenchymal, Neural, Proneural]

     [1] Verhaak RG, Hoadley KA, Purdom E, et al; Cancer Genome Atlas Research Network. Integrated genomic
         analysis identifies clinically relevant subtypes of glioblastoma characterized by abnormalities in PDGFRA, IDH1,
         EGFR, and NF1. Cancer Cell 2010;17:98-110.


18
Clustering Validation

     •   Separate set of 84 GBMs from Henry Ford Hospital
     •   ClusterRepro: CC p=7.2e-3, CM p=1.3e-2


     [Figure: left, heatmap of Feature Indices for the CC, Mixed, and CM groups; right, survival curves (Survival vs. Months, 0 to 100) for CC, Mixed, and CM]


19
Representative nuclei




     [Figure: representative nuclei in three panels: large, hyperchromatic nuclei | small light nuclei, eosinophilic cytoplasm | intermediate]
20
Associations




21
From Gene Lists to Biology

     •   Nuclear lumen localization most highly enriched in cluster
         associated genes
         (CC p=2.8e-36, CM p=2.17e-19, PB p=1.08e-15)

     •   Other enriched GO terms: DNA repair, M phase, cell cycle,
         protein biosynthesis, chromatin modification

     •   Differences in activation of cancer-related pathways including
         ATM and TP53 DNA damage checkpoints, NFκB pathway, Wnt
         signaling and PTEN/AKT pathways
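
The slides do not state which test produced the GO enrichment p-values. A common choice for over-representation analysis is the one-sided hypergeometric test, sketched below as an assumption rather than as the authors' method. The function name and the numbers in the usage line are hypothetical.

```python
from math import comb


def enrichment_p_value(population, annotated, selected, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    population : genes in the background set
    annotated  : background genes that carry the GO term
    selected   : genes in the cluster-associated list
    overlap    : selected genes that carry the GO term
    """
    upper = min(annotated, selected)
    total = comb(population, selected)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical example: 5 of 10 background genes carry the term,
# and all 5 selected genes carry it.
p_all_hit = enrichment_p_value(10, 5, 5, 5)
```

In practice the p-values for many GO terms would also need a multiple-testing correction, which this sketch omits.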




22
Software Infrastructure


     Wang F, Kong J, Cooper L, Pan T, Kurc T, Chen W, Sharma A, Niedermayr C, Oh T-W, Brat
     D, Farris A, Foran D, Saltz J, “A Data Model and Database for High-resolution Pathology
     Analytical Image Informatics,” Journal of Pathology Informatics, Vol. 2, Issue 1, pp. 32-40,
     2011.

     Teodoro G, Kurc T, Pan T, Cooper L, Kong J, Widener P, Saltz J, “Accelerating Large Scale
     Image Analyses on Parallel CPU-GPU Equipped Systems”, Accepted for presentation at the
     International Parallel and Distributed Processing Symposium, China, 2012. Also available
     as Emory University, Center for Comprehensive Informatics, Technical Report:
     CCI-TR-2011-4, 2011.


23
How to scale to 14,000 images?

     • TCGA contains 20 cancer types
        • 14K images – 4 Terabytes


     • How to analyze larger datasets? HPC Pipeline
     • How to organize results?         PAIS Database
     • How to interact with the data?   CDSA Portal
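
To give a feel for the scale quoted above, a back-of-envelope calculation of the average stored size per image follows. It assumes decimal units (1 TB = 10**12 bytes) and that the 4 TB figure covers exactly the 14K images; both are assumptions.

```python
# Figures quoted on the slide; decimal units are an assumption.
total_images = 14_000
total_bytes = 4 * 10**12

mean_bytes_per_image = total_bytes / total_images
mean_megabytes_per_image = mean_bytes_per_image / 10**6

print(f"about {mean_megabytes_per_image:.0f} MB per image on average")
```

At a few hundred megabytes per image, whole-slide images are generally processed in tiles rather than loaded whole, which is one motivation for a parallel pipeline.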




24
HPC Segmentation and Feature Extraction Pipeline




                                  Tony Pan and George Teodoro




25
PAIS (Pathology Analytical Imaging Standards)
                                    PAIS Logical Model:
                                    •   62 UML classes
                                    •   markups, annotations,
                                        imageReferences,
                                        provenance
                                    •   Semantic enabled


                                    PAIS Data Representation:
                                    •   XML (compressed) or HDF5


                                    PAIS Databases:
                                    •   loading, managing, querying,
                                        and sharing data
                                    •   RDBMS + SDBMS + parallel
                                        DBMS
       Fusheng Wang
26
Microscopy Image Database




      Image analysis: Segmentation, Feature extraction

      PAIS model, PAIS data management:
          Modeling and management of markup and annotation for querying
          and sharing through parallel RDBMS + spatial DBMS

      HDFS data staging, MapReduce based queries:
          On-the-fly data processing for algorithm validation/algorithm
          sensitivity studies, or discovery of preliminary results

27
Cancer Digital Slide Archive




28
cancer.digitalslidearchive.net




29
cancer.digitalslidearchive.net




30
cancer.digitalslidearchive.net




31
Future Work and Conclusions




32
Radiology Imaging Correlative Study




33
Studying Protein Expression Patterns
     Using Quantum Dot Immunohistochemistry

                                              Cytoplasm

                                              Nucleus




34
Conclusions
     •   Pathology imagery contains important cues
     •   Pipeline for analyzing whole slide imagery
     •   Tooling to handle large datasets
          • Other TCGA diseases (14000 Images!)
     •   Developing richer descriptions of image content


     •   Resources:
          •   Emory Websites: bmi.emory.edu cci.emory.edu
          •   Cancer Digital Slide Archive: cancer.digitalslidearchive.net
          •   TCGA Symposium Talk:
            http://cancergenome.nih.gov/newsevents/multimedialibrary/videos/morphologicalcooper
          •   JAMIA Paper: http://jamia.bmj.com/content/19/2/317.abstract




35
In Silico Brain Tumor Research Center Team
     •   Emory University
          • Joel Saltz (Director)
          • Daniel Brat (Science PI)
          • Carlos Moreno (Bioinformatics Lead)
          • Lee Cooper
          • David Gutman
          • Jun Kong
          • Fusheng Wang
          • Chad Holder
          • Christina Appin
          • Candace Chisolm
          • Erwin van Meir
          • Tahsin Kurc
          • Sharath Cholleti
          • Tony Pan
          • Ashish Sharma
     •   Henry Ford Hospital
          • Tom Mikkelsen
          • Lisa Scarpace
     •   Thomas Jefferson University
          • Adam Flanders (Radiology Lead)
     •   Stanford University
          • Daniel Rubin


36
Related Papers and Acknowledgements
     •   Cooper LA, Kong J, Gutman DA, Wang F, Gao J, Appin C, Cholleti S, Pan T,
         Sharma A, Scarpace L, Mikkelsen T, Kurc T, Moreno CS, Brat DJ, Saltz JH,
         “Integrated morphologic analysis for the identification and characterization of
         disease subtypes,” Journal of the American Medical Informatics Association,
         2012 19:317-323. Available: http://jamia.bmj.com/content/19/2/317.long
     •   Wang F, Kong J, Cooper L, Pan T, Kurc T, Chen W, Sharma A, Niedermayr C,
         Oh T-W, Brat D, Farris A, Foran D, Saltz J, “A Data Model and Database for
         High-resolution Pathology Analytical Image Informatics,” Journal of Pathology
         Informatics, Vol. 2, Issue 1, pp. 32-40, 2011.
     •   Teodoro G, Kurc T, Pan T, Cooper L, Kong J, Widener P, Saltz J, “Accelerating
         Large Scale Image Analyses on Parallel CPU-GPU Equipped Systems”,
         Accepted for presentation at the International Parallel and Distributed
         Processing Symposium, China, 2012. Also available as Emory University,
         Center for Comprehensive Informatics, Technical Report: CCI-TR-2011-4,
         2011.
     This work is supported in part by NCI HHSN261200800001E, NHLBI R24HL085343, NLM
     R01LM011119-01 and R01LM009239, NIH RC4MD005964, NIH NIBIB BISTI P20EB000591,
     and CTSA PHS Grant UL1RR025008.

37
