2. Leverage exascale data and computer resources to squeeze the most out of image, sensor or simulation data Run lots of different algorithms to derive same features Run lots of algorithms to derive complementary features Data models and data management infrastructure to manage data products, feature sets and results from classification and machine learning algorithms Squeezing Information from Spatial Datasets
3. Outline Integrative biomedical informatics analysis –feature sets obtained from Pathology and Radiology studies Techniques, tools and methodologies for derivation, management and analysis of feature sets
4. INTEGrative Biomedical Informatics Analysis Reproducible anatomic/functional characterization at gross level (Radiology) and fine level (Pathology) Integration of anatomic/functional characterization with multiple types of “omic” information Create categories of jointly classified data to describe pathophysiology, predict prognosis, response to treatment
5. In Silico Center for Brain Tumor Research Specific Aims: Influence of necrosis/ hypoxia on gene expression and genetic classification. 2. Molecular correlates of high resolution nuclear morphometry. Gene expression profiles that predict glioma progression. 4. Molecular correlates of MRI enhancement patterns.
6.
7. Exploit synergies between all initiatives to improve ability to forecast survival & response.Pathologic Features
8. Example: Pathology and Gene Expression Joint Predictors of Recurrence/Survival Lee Cooper Carlos Moreno
9. Feature Characterization in Pathology and Radiology Role – In silico Brain Tumor Research Algorithms Scaling Requirements
10. In Silico Center for Brain Tumor Research Key Data Sets REMBRANDT: Gene expression and genomics data set of all glioma subtypes The Cancer Genome Atlas (TCGA): Rich “omics” set of GBM, digitized Pathology and Radiology Pathology and Radiology Images from Henry Ford Hospital, Emory, Thomas Jefferson U, MD Anderson and others
27. Clustergram of selected features used in consensus clustering Nuclear Features Used to Classify GBMs
28. Nuclear Features Used to Classify GBMs Consensus clustering of morphological signatures Study includes 200 million nuclei taken from 480 slides corresponding to 167 distinct patients.
33. Multiscale Systems Biology Employ multi-resolution methods to characterize necrosis, angiogenesis and correlate these with “omics” Rim-enhancement Vascular Changes Rapid progression No enhancement Normal Vessels Stable lesion ?
34. Correlation of Necrosis, Angiogenesis and “omics” GBMs display variable and regionally heterogeneous degrees of necrosis (asterisk) and angiogenesis These factors may impact gene expression profiles
35.
36. Extent of both necrosis and angiogenesis calculated as a percentage of total tissue areaCarro MS, et al. Nature 263: 318-25, 2010
37. Feature Sets in Radiology(Adam Flanders, TJU; Dan Rubin, Stanford, Lori Dodd, NCI) Require standardized validated feature sets to describe de novo disease. Fundamental obstacle to new imaging criteria as treatment biomarkers is lack of standard terminology: To define a comprehensive set of imaging features of cancer For reporting imaging results To provide a more quantitative, reproducible basis for assessing baseline disease and treatment response
38. Defining Rich Set of Qualitative and Quantitative Image Biomarkers Community-driven ontology development project; collaboration with ASNR Imaging features (5 categories) Locationof lesion Morphology of lesion margin(definition, thickness, enhancement, diffusion) Morphology of lesion substance (enhancement, PS characteristics, focality/multicentricity, necrosis, cysts, midline invasion, cortical involvement, T1/FLAIR ratio) Alterations in vicinity of lesion (edema, edema crossing midline, hemorrhage, pial invasion, ependymal invasion, satellites, deep WM invasion, calvarial remodeling) Resection features (extent of nCE tissue, CE tissue, resected components)
39. Imaging Predictors of survival and molecular profiles in the TCGA Glioblastoma Data set The TCGA glioma working group 1Emory University Hospital, Atlanta, GA 2National Cancer Institute, Bethesda, MD. 3Thomas Jefferson University Hospital, Philadelphia, PA. 4Henry Ford University Hospital, Detroit, Michigan. 5National Institute of Health, Bethesda, MD. 6Boston University School of Medicine, Boston, MA. 7SAIC-Frederick, Inc., Frederick, MD. 8University of Virginia, Charlottesville, VA. 9Northwestern University Chicago, IL
40. Assumed Dependence Between Features F1: Tumor Location F3: Eloq. Brain F5: Prop. Enhance F6: Prop. nCET F7: Prop. Necrosis F9: Distribution F10: T1/FLAIR F11: En. Marg.Thick. F13: Def. Non.Marg. F14: Prop. Edema F16: Hemorrhage F18: Pial Invasion F19: Ependymal F20: Cort. Involve. F21: Deep WM Inv. F22: nCET Cross. Mid. F24: Satellites F18 F20 F9 F3 F16 F7 F22 F21 F1 F6 F11 F5 F14 F19 F10 F13 F24 NOTE: each feature omitted from this graph is independent of every other feature. Slide thanks to Eric Huang, NIH
41. Estimation Problem Size Reduction Can ignore seven of the features size of contingency table reduced from 2.64 × 1012 cells to 1.34 × 1010 cells. Collapsibility reduces size of contingency table even further: Any binary feature Fj connected to only one feature on graph (i.e. given the feature Fj is connected to, Fj is independent of all other features) can also be ignored Eliminates need to deal with Hemorrhage (F16), Pial Invasion (F18), Cortical Involvement (F20), and Satellites (F24). Reduces size of contingency table to 1.68 × 109 cells. Additional analogous considerations can be used to reduce size of contingency table by more than two additional orders of magnitude Slide thanks to Eric Huang, NIH
42. Correlative Imaging Results Minimal enhancing tumor (≤5%) strongly associated with Proneural classification (p=0.0006). >5% proportion of necrosis and the presence of microvascular hyperplasia in pathology slides (p=0.008). Greater maximum tumor dimension (T2 signal) associated with present/abundant microvascular hyperplasia (p=0.001). < 5% Enhancement
43. Correlative Imaging Results TP53 mutant tumors had a smaller mean tumor sizes (p=0.002) on T2-weighted or FLAIR images. EGFR mutant tumors were significantly larger than TP53 mutant tumors (p=0.0005). High level EGFR amplification was associated with >5% enhancement and >5% proportion of necrosis (p < 0.01). > 5% Necrosis
44. Squeezing Information from Spatial Datasets Leverage exascale data and computer resources to squeeze the most out of image, sensor or simulation data Run lots of different algorithms to derive same features Run lots of algorithms to derive complementary features Data models and data management infrastructure to manage data products, feature sets and results from classification and machine learning algorithms
45. Pipeline for Whole Slide Feature Characterization 1010 pixels for each whole slide image 10 whole slide images per patient 108 image features per whole slide image 10,000 brain tumor patients 1015 pixels 1013 features Hundreds of algorithms Annotations and markups from dozens of humans
46.
47. Represented by a complex data model capturing multi-faceted information including markups, annotations, algorithm provenance, specimen, etc.
48.
49. Represented by a complex data model capturing multi-faceted information including markups, annotations, algorithm provenance, specimen, etc.
50. Support for complex relationships and spatial query: multi-level granularities, relationships between markups and annotations, spatial and nested relationships
51.
52. Pathology Imaging GIS Image analysis PAIS model PAIS data management Modeling and management of markup and annotation for querying and sharing through parallel RDBMS + spatial DBMS Segmentation HDFS data staging MapReduce based queries On the fly data processing for algorithm validation/algorithm sensitivity studies, or discovery of preliminary results Feature extraction
53. Generation and Analysis of Imaging Features In-transit data processing using filter/stream systems Semantic Workflows Hierarchical pipeline design with coarse and fine grained components Adaptivity and Quality of Service
56. Slides’ Preparation 8x 40x 64990 x 59412 pixels in full resolution Original Size: 10.8 Gb; Compressed Sized: ≈ 833Mb
57. Computerized Classification System for Grading Neuroblastoma Background? Yes Image Tile Label Initialization I = L No Create Image I(L) Training Tiles Segmentation I = I -1 Down-sampling Feature Construction Segmentation Yes No I > 1? Feature Extraction Feature Construction Feature Extraction Classification Classifier Training Within Confidence Region ? No Yes TRAINING TESTING Background Identification Image Decomposition (Multi-resolution levels) Image Segmentation (EMLDA) Feature Construction (2nd order statistics, Tonal Features) Feature Extraction (LDA) + Classification (Bayesian) Multi-resolution Layer Controller (Confidence Region)
58. Segmentation A typical segmentation result of an image from undifferentiated class with components segmented by this method is shown. (a) Original image; (b) Partitioned image shown in color; (c)Nuclei; (d)Cytoplasm; (e)Neuropil; (f)Background component.
59.
60. Search for the most appropriate implementation of both components and workflows
81. Thanks to: In silico center team: Dan Brat (Science PI), Tahsin Kurc, Ashish Sharma, Tony Pan, David Gutman, Jun Kong, Sharath Cholleti, Carlos Moreno, Chad Holder, Erwin Van Meir, Daniel Rubin, Tom Mikkelsen, Adam Flanders, Joel Saltz (Director) caGrid Knowledge Center: Joel Saltz, Mike Caliguiri, Steve Langella co-Directors; Tahsin Kurc, Himanshu Rathod Emory leads caBIG In vivo imaging team: Eliot Siegel, Paul Mulhern, Adam Flanders, David Channon, Daniel Rubin, Fred Prior, Larry Tarbox and many others In vivo imaging Emory team: Tony Pan, Ashish Sharma, Joel Saltz Emory ATC Supplement team: Tim Fox, Ashish Sharma, Tony Pan, Edi Schreibmann, Paul Pantalone Digital Pathology R01: Foran and Saltz; Jun Kong, Sharath Cholleti, Fusheng Wang, Tony Pan, Tahsin Kurc, Ashish Sharma, David Gutman (Emory), Wenjin Chen, Vicky Chu, Jun Hu, Lin Yang, David J. Foran (Rutgers) NIH/in silico TCGA Imaging Group: Scott Hwang, Bob Clifford, Erich Huang, DimaHammoud, ManalJilwan, PrashantRaghavan, Max Wintermark, David Gutman, Carlos Moreno, Lee Cooper, John Freymann, Justin Kirby, Arun Krishnan, Seena Dehkharghani, Carl Jaffe ACTSI Biomedical Informatics Program: Marc Overcash, Tim Morris, Tahsin Kurc, Alexander Quarshie, Circe Tsui, Adam Davis, Sharon Mason, Andrew Post, Alfredo Tirado-Ramos NSF Scientific Workflow Collaboration: Vijay Kumar, Yolanda Gil, Mary Hall, EwaDeelman, Tahsin Kurc, P. Sadayappan, Gaurang Mehta, Karan Vahi