tranSMART Community Meeting 5-7 Nov 13 - Session 5: Advancing tranSMART Analytical Capabilities with Knowledge Content
Sirimon Ocharoen, Thomson Reuters
To analyze data in tranSMART effectively, a biological, knowledge-based approach is needed. Through a case study, we will demonstrate how systems biology content can be integrated into tranSMART to enable functional analysis and biological interpretation. We will also share our experience and user feedback from various projects.
2. 4 EXAMPLES OF tranSMART USE CASES
• Use case 1: Leveraging public datasets
• Use case 2: Finding information on variant and mutation
• Use case 3: Biological interpretation
• Use case 4: Implementing classification model
• Feedback from tranSMART users
8. MICROARRAY REPOSITORY: PROCESSING PROCEDURE
A. Search for Datasets in public databases & Data loading
B. Quality Control (QC) testing of Raw Assays (filtering out of unsuitable defective Assays)
C. GCRMA Processing of QC-approved Assays
D. Assays Annotation:
   i. Assignment of experimental meta-data values to the Assays
   ii. Assignment of experimental Assays Groups and their Comparisons
E. Statistical analysis of defined Comparisons:
   i. Differential expression testing
   ii. Calculation of Fold Changes
   iii. Functional Descriptors calculation
[Workflow diagram with panels (A) to (E)]
Optional: cutoffs
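Step E can be illustrated with a minimal sketch. It assumes expression values are already on the log2 scale (as GCRMA output typically is), uses Welch's t statistic as a stand-in for whatever differential expression test the repository actually applies, and uses hypothetical cutoff values and sample data. None of the function names come from tranSMART or the repository itself.

```python
from math import sqrt
from statistics import mean, variance


def log2_fold_change(group_a, group_b):
    """Difference of mean log2 expression (group_a minus group_b)."""
    return mean(group_a) - mean(group_b)


def welch_t(group_a, group_b):
    """Welch's t statistic for two groups with possibly unequal variances."""
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se


def passes_cutoffs(lfc, t_stat, min_abs_lfc=1.0, min_abs_t=2.0):
    """Optional cutoffs; the thresholds here are illustrative assumptions."""
    return abs(lfc) >= min_abs_lfc and abs(t_stat) >= min_abs_t


# Hypothetical log2 expression values for one probe in one comparison.
treated = [9.0, 9.2, 8.8]
control = [7.0, 7.1, 6.9]

lfc = log2_fold_change(treated, control)
t_stat = welch_t(treated, control)
keep = passes_cutoffs(lfc, t_stat)
```

A real pipeline would also convert the statistic to a p-value and correct for multiple testing across probes; that is omitted here to keep the sketch dependency-free.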
9. MICROARRAY REPOSITORY: QC PROCEDURE
• Datasets undergo rigorous quality control during processing
• An assay is removed from the dataset if it is identified as an outlier by the majority of QC metrics
• Users are able to see which tests the datasets passed/failed
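The majority rule above can be sketched as follows. This is only an illustration: the metric names and assay IDs are hypothetical, and "majority" is assumed to mean strictly more than half of the metrics.

```python
def flag_outliers(qc_results):
    """Return assay IDs flagged as outliers by a strict majority of QC metrics.

    qc_results maps assay ID -> {metric name: True if that metric flags the assay}.
    """
    removed = []
    for assay_id, flags in qc_results.items():
        n_flagged = sum(1 for is_outlier in flags.values() if is_outlier)
        if n_flagged * 2 > len(flags):  # strictly more than half of the metrics
            removed.append(assay_id)
    return removed


# Hypothetical per-assay QC outcomes (metric names are assumptions).
qc_results = {
    "assay_01": {"rle": False, "nuse": False, "rna_degradation": True},
    "assay_02": {"rle": True, "nuse": True, "rna_degradation": False},
    "assay_03": {"rle": True, "nuse": True, "rna_degradation": True},
}

removed = flag_outliers(qc_results)
```

Keeping the per-metric flags in the input, rather than only the final verdict, is what would let users see which tests each assay passed or failed.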
12. How does 17p13 deletion correlate with thalidomide response in chronic lymphocytic leukemia (CLL) patients?
WBC Reduction at Day 7
No aberration vs. 17p13 deletion
13. What other diseases or drugs is 17p13 deletion associated with?
16. GENE VARIANT API
Significance of genotype-phenotype relationships across the translational pipeline
IDENTIFY ACTIONABLE GENE VARIANTS
DISEASE RECORD
A. Establish variant significance
B. Characterize the variant
C. Assess the utility of the variant:
RESPONSE RECORD
VARIANT DISEASE
• Understanding Disease Mechanism
• Treatment & Response
Disease Profiling
Diagnosis
Prognosis
Screening, Risk
VARIANT DRUG
DISEASE
Predicting Efficacy / Toxicity
Monitoring Efficacy / Toxicity
Selection for Therapy
Resistance
MANUALLY CURATED CONTENT FROM A RANGE OF SOURCES
DISCOVERY
HTP Studies
Candidate Studies
VALIDATION
Preclinical in vitro & animal studies
Clinical studies in patient segments
APPLICATION
FDA approvals
Clinical guidelines
17. GENE VARIANT API: PROCEDURE
SOURCES
• Conference abstracts, patents, peer-reviewed journal articles, clinical trial registries, clinical guidelines, and authority approval documents (e.g. FDA)
SOURCE SELECTION VS. REJECTION
• Retrospective selection or prospective screening for frontfile. Items are screened by a text-mining tool to identify and remove items that have no relevance to the Gene Variant API.
• All articles not removed are sent to manual selection by trained annotators who follow the policy.
SELECTION CRITERIA BY THE ANALYST
• A clear study design and valuable results are required by the analyst. The item must satisfy the requirements of evidence-based medicine in order to be taken into consideration.
• Statistics and/or a statement by the author on the variant's effect on health are required. If both components are absent, the item is rejected.
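The final rejection rule can be written down as a small predicate. This is a sketch only: the `CandidateItem` record and its field names are hypothetical and are not part of the Gene Variant API.

```python
from dataclasses import dataclass


@dataclass
class CandidateItem:
    """Hypothetical record for a screened source item."""
    title: str
    has_statistics: bool
    has_author_statement: bool


def is_rejected(item):
    """Reject only when both statistics and an author statement are absent."""
    return not item.has_statistics and not item.has_author_statement


# Illustrative items; titles and flags are made up.
items = [
    CandidateItem("Study with statistics only", True, False),
    CandidateItem("Study with author statement only", False, True),
    CandidateItem("Study with neither", False, False),
]

accepted = [item.title for item in items if not is_rejected(item)]
```

Items that carry either component stay in consideration; the other criteria (study design, evidence-based medicine requirements) would be checked separately by the analyst.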
26. IMPLEMENTING MODEL IN tranSMART
Associated clinical phenotype
Sample
Molecular Subtype
Model
Biomarkers
Stratification rule
Mechanism
Drug target
Pathways
Standard tranSMART
Additional functionality in tranSMART
MetaCore
Cytoscape
Subnetworks
Applicable for both One Mind and Orion projects
27. SYSTEMS BIOLOGY TOOLS
Network/Pathways-based Approach
METABASE
The most comprehensive data available
• Drug Targets
• Drug Repositioning
• Biomarker Identification
• Biological Mechanism Reconstruction
SYSTEMS BIOLOGY TOOL LIBRARY
State of the art methods
• Drug Combinations
• Prognostic Biomarkers
OMICs data + other data types including clinical response
Statistical Approach
• Predictive Biomarkers
28. EXAMPLES OF NETWORK APPROACHES
Subnetworks (Chuang et al., 2012)
Systems Biology (Zhang et al., 2011)
Probabilistic Inference (Su et al., 2009)
Pathway Activity (Lee et al., 2007)
Pathway Based (Kim et al., 2012)
RRFE (Johannes et al., 2010)
30. STATISTICAL ANALYSIS QUESTIONS
• How to control the Type I error rate?
– Testing set vs. validation set
• How to perform longitudinal analysis?
– Regression models
• How to identify covariates? Which variable has the highest correlation with the outcome?
– Multivariate analysis
• Other analysis methods/workflows
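As one simple starting point for the question of which variable correlates most strongly with the outcome, the sketch below ranks variables by absolute Pearson correlation. It is a univariate screen, not the multivariate analysis the slide points to, and the variable names and values are made up for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def rank_by_correlation(variables, outcome):
    """Return (name, r) pairs sorted by decreasing absolute correlation."""
    scores = {name: pearson(values, outcome) for name, values in variables.items()}
    return sorted(scores.items(), key=lambda pair: abs(pair[1]), reverse=True)


# Hypothetical clinical variables and outcome for five subjects.
outcome = [1, 2, 3, 4, 5]
variables = {
    "age": [30, 41, 29, 52, 45],
    "dose": [10, 20, 30, 40, 50],
    "marker": [5, 4, 4, 2, 1],
}

ranking = rank_by_correlation(variables, outcome)
```

Sorting on the absolute value keeps strongly negative associations near the top, which a plain sort on r would push to the bottom.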
31. SCIENTIFIC QUESTIONS
• How to set up a QC framework around uploaded data? A community-developed QC standard?
• How to do cross-study analysis (more easily)?
• How to do cross-species analysis?
• How can the community report these (and bugs)?
32. FEATURE WISH LIST
• Multiple improvements to R advanced workflows
• Sending results (gene list, patient subsets) from an
advanced workflow back to summary statistics
• Saving a workflow (history/output)
• Using gene expression data to create subsets
• Viewing specific subject records
• Adding data types (e.g. date, longitudinal measurement)
• Improving exported tables
and many more…
Editor's Notes
The average practicing physician doesn't know what to do with this (genomic) information. Genetic tests are as effective as, but not more effective than, a person's family and personal medical history in assessing the risk for disease (PGx Reporter, 17/7/2011).