The integration of computational and statistical approaches with visualization tools is becoming crucial as biomedical data sets rapidly grow in size. Finding efficient solutions that address the interplay between data management, algorithmic analysis, and visual analysis tools is challenging. I will discuss some of these challenges and demonstrate how we are addressing them in our Refinery Platform project (http://www.refinery-platform.org).
Approaches for the Integration of Visual and Computational Analysis of Biomedical Data
HARVARD MEDICAL SCHOOL
DEPARTMENT OF BIOMEDICAL INFORMATICS
NILS GEHLENBORG
@nils_gehlenborg
http://gehlenborglab.org
6. SINGLE OR FEW DATA SETS
Test hypotheses without generating new data.
Use published data as supporting evidence for findings based on
your own data sets.
MANY DATA SETS
Conduct meta-analyses, e.g., to characterize expression patterns in
human tissues or to link diseases.
7. M. Lukk et al., Nature Biotechnology, 28(4):322–324 (2010)
8. S. Suthram et al., PLoS Computational Biology, 6(2) (2010)
9. [Slide 6 repeated] COMMON BEHAVIOR OF RESEARCH PARASITES!
10. N. Gehlenborg et al., manuscript in preparation
[Figure: Data Repository · Visualization Tools · Analysis Pipelines]
13. ANALYSIS PIPELINES
[Figure: Data Repository · Visualization Tools · Analysis Pipelines; Galaxy components: Toolshed, Workflow Editor, Tools, REST API]
14. ANALYSIS PIPELINES
[Figure: Data Repository · Visualization Tools · Analysis Pipelines; Galaxy components: Toolshed, Workflow Editor, Tools, REST API; Workflow Inputs, Workflow Outputs]
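The analysis-pipeline slides show workflow inputs going to Galaxy and workflow outputs coming back through Galaxy's REST API. As a minimal sketch, assuming Galaxy's `POST /api/workflows/{workflow_id}/invocations` endpoint with the `history`, `inputs_by`, and `inputs` payload fields and the `x-api-key` header (check these against the API documentation of the Galaxy release in use), the function below builds a request that runs a workflow on data sets already in a Galaxy history. The function name `build_invocation_request`, the server URL, and all IDs are hypothetical; this is not Refinery's actual implementation.

```python
from __future__ import annotations

import json
import urllib.request


def build_invocation_request(galaxy_url: str, api_key: str, workflow_id: str,
                             history_id: str,
                             inputs: dict[int, str]) -> urllib.request.Request:
    """Build, but do not send, a request asking Galaxy to run a workflow.

    `inputs` maps a workflow input step index to the ID of a dataset that
    already sits in the target Galaxy history (an "hda").
    """
    payload = {
        # Run the workflow in an existing history, referenced by its ID.
        "history": f"hist_id={history_id}",
        # Interpret the keys of "inputs" as workflow step indices.
        "inputs_by": "step_index",
        "inputs": {str(step): {"src": "hda", "id": dataset_id}
                   for step, dataset_id in inputs.items()},
    }
    url = f"{galaxy_url.rstrip('/')}/api/workflows/{workflow_id}/invocations"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


# Hypothetical server, key, and IDs; nothing is sent here.
request = build_invocation_request("https://galaxy.example.org", "MY_API_KEY",
                                   "wf123", "hist456", {0: "ds789"})
```

Passing `request` to `urllib.request.urlopen` would submit it; the JSON response describes the new invocation, which can then be tracked until its workflow outputs are ready. Building the request separately from sending it keeps the payload easy to inspect and test.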
15. [Figure: Data Repository · Visualization Tools · Analysis Pipelines]
http://www.refinery-platform.org
18. [Figure: Text-Based Search over Data Sets (A1–A4) with their Metadata and Data Files; Free Text, Annotation Mapping, Keywords; Ontologies (Root, Terminal terms, subclass of)]
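Slide 18 contrasts plain text-based search over free-text metadata with search over metadata that has been mapped to ontology terms. The sketch below assumes a toy ontology, a hypothetical synonym table, and naive substring matching in place of a real annotation pipeline. It indexes each data set under its mapped terms and all of their ancestors, so a query for a general term also finds data sets annotated with its subclasses. It illustrates the idea only and is not Refinery's or SATORI's implementation.

```python
from collections import defaultdict

# Hypothetical toy ontology: term -> its direct parents ("subclass of").
SUBCLASS_OF = {
    "hepatocyte": {"epithelial cell"},
    "epithelial cell": {"cell"},
    "T cell": {"lymphocyte"},
    "lymphocyte": {"cell"},
    "cell": set(),  # root
}

# Hypothetical synonym table: lower-case phrase -> ontology term.
SYNONYMS = {
    "hepatocyte": "hepatocyte",
    "liver cell": "hepatocyte",
    "t cell": "T cell",
    "t-cell": "T cell",
}


def map_free_text(text):
    """Annotation mapping: ontology terms whose synonyms occur in the text."""
    lowered = text.lower()
    return {term for phrase, term in SYNONYMS.items() if phrase in lowered}


def ancestors(term):
    """All terms reachable from `term` by following subclass-of edges."""
    found, stack = set(), [term]
    while stack:
        for parent in SUBCLASS_OF.get(stack.pop(), ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def build_indexes(datasets):
    """Return (term index, keyword index), each mapping a key to data set IDs."""
    term_index, keyword_index = defaultdict(set), defaultdict(set)
    for ds_id, description in datasets.items():
        # Text-based search: index every whitespace-separated word as is.
        for word in description.lower().split():
            keyword_index[word].add(ds_id)
        # Ontology-guided search: index mapped terms and all their ancestors.
        for term in map_free_text(description):
            for t in {term} | ancestors(term):
                term_index[t].add(ds_id)
    return term_index, keyword_index


# Hypothetical data set descriptions.
DATASETS = {
    "A1": "RNA-seq of liver cell lines",
    "A2": "ChIP-seq in T-cell lymphoma",
    "A3": "Expression profiling of hepatocyte cultures",
    "A4": "Whole-genome sequencing of tumor samples",
}
term_index, keyword_index = build_indexes(DATASETS)
```

With these inputs, a keyword search for "hepatocyte" finds only A3, while the term index also returns A1, whose description says "liver cell"; a query for the root term "cell" returns A1, A2, and A3.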
26. [Figure: Text-Based Search and Semantic Visual Exploration (SATORI) over Data Sets, Metadata, Data Files, and Ontologies]
27. SATORI: A System for Ontology-Guided Visual Exploration of Biomedical Data Repositories
http://satori.refinery-platform.org
32. Need 1: find data sets that match certain experimental characteristics.
Need 2: find data sets that are similar (or dissimilar) to given data sets.
Need 3: get an overview of the distribution of the experimental characteristics across a collection of data sets.
Need 4: get an overview of the annotation term hierarchy and term usage.
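Once every data set carries a set of annotation terms, Needs 1 to 3 reduce to simple set operations. The sketch below assumes exactly that; the annotations are made up, and Jaccard similarity is only one possible measure for Need 2, since the slides do not say which measure SATORI uses. The function names are hypothetical.

```python
from collections import Counter

# Hypothetical annotations: data set ID -> set of annotation terms.
ANNOTATIONS = {
    "A1": {"hepatocyte", "RNA-seq", "Homo sapiens"},
    "A2": {"T cell", "ChIP-seq", "Homo sapiens"},
    "A3": {"hepatocyte", "microarray", "Mus musculus"},
    "A4": {"hepatocyte", "RNA-seq", "Mus musculus"},
}


def matching(required_terms):
    """Need 1: data sets annotated with every required term."""
    return {ds for ds, terms in ANNOTATIONS.items() if required_terms <= terms}


def jaccard(a, b):
    """Shared terms divided by all terms; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_similar(query, k=2):
    """Need 2: the k data sets with the highest Jaccard similarity to `query`."""
    others = (ds for ds in ANNOTATIONS if ds != query)
    # Sort by descending similarity, breaking ties by data set ID.
    ranked = sorted(
        others,
        key=lambda ds: (-jaccard(ANNOTATIONS[query], ANNOTATIONS[ds]), ds),
    )
    return ranked[:k]


def term_usage():
    """Need 3: how many data sets use each term across the collection."""
    return Counter(term for terms in ANNOTATIONS.values() for term in terms)
```

For example, `most_similar("A1")` ranks A4 first (two of four distinct terms shared, similarity 0.5); sorting in the opposite order would answer the "dissimilar" half of Need 2.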
40. Acknowledgements
HARVARD MEDICAL SCHOOL Peter J Park & all members of the Computational Genomics Lab; Fritz Lekschas, Jennifer K Marx, Scott Ouellette, Anton Xue, Psalm Haseley
JOHANNES KEPLER UNIVERSITY LINZ Stefan Luger, Holger Stitz, Marc Streit
HARVARD SCHOOL OF PUBLIC HEALTH Ilya Sytchev, Shannan Ho Sui
UNIVERSITY OF SHEFFIELD David R Jones, Winston Hide
Funding
NIH/NHGRI R00 HG007583, Harvard Stem Cell Institute
Web
http://satori.refinery-platform.org · http://refinery-platform.org
41. We are hiring postdocs & developers!
HARVARD MEDICAL SCHOOL
DEPARTMENT OF BIOMEDICAL INFORMATICS
See http://gehlenborglab.org or http://dbmi.med.harvard.edu for details.
Data visualization, analysis, and management for:
• genomic structural variants
• dynamics of the 3D genome
• cancer subtypes in patient cohorts
• exploration tools for data repositories
• provenance graphs
42. [Figure: Legend: Term, Terminal term, To be deleted, To be duplicated (duplicates marked with a prime, e.g. C', F'); nodes labeled with Term size and Cumulative size; panels: 1. Global, 2. Tree Map, 3. Node-Link Diagram]
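The last slide distinguishes a term's own size from its cumulative size and marks terms that must be duplicated (C', F') when the term hierarchy is drawn as a tree map or node-link tree. One plausible reading, assumed here rather than taken from the slides, is that term size counts data sets annotated directly with a term, cumulative size counts distinct data sets annotated with the term or any descendant, and a term with several parents is copied once per parent. The hierarchy, annotations, and function names below are hypothetical, and SATORI's exact counting rules may differ.

```python
# Hypothetical term hierarchy: term -> child terms. "C" has two parents (B and D).
CHILDREN = {"A": ["B", "D"], "B": ["C"], "D": ["C", "F"], "C": [], "F": []}

# Hypothetical direct annotations: term -> IDs of data sets annotated with it.
ANNOTATED = {
    "A": set(),
    "B": {"ds1"},
    "C": {"ds2", "ds3"},
    "D": {"ds4"},
    "F": {"ds3", "ds5"},
}


def term_size(term):
    """Number of data sets annotated directly with `term`."""
    return len(ANNOTATED.get(term, set()))


def cumulative_datasets(term):
    """Data sets annotated with `term` or any of its descendants."""
    result = set(ANNOTATED.get(term, set()))
    for child in CHILDREN.get(term, []):
        result |= cumulative_datasets(child)
    return result


def cumulative_size(term):
    """Distinct data sets under `term`; the set union avoids double counting."""
    return len(cumulative_datasets(term))


def unfold(root):
    """Turn the hierarchy into a tree of (label, children) tuples.

    A term reached through a second parent is copied, and every copy after
    the first is marked with a prime (C -> C').
    """
    seen = set()

    def visit(term):
        label = term + "'" if term in seen else term
        seen.add(term)
        return (label, [visit(child) for child in CHILDREN.get(term, [])])

    return visit(root)
```

Here `cumulative_size("A")` is 5, whereas adding up the term sizes of A's descendants along every path gives 8, because C is counted once under B and once under D, and ds3 is annotated with both C and F. `unfold("A")` places C under B and a copy C' under D, which is the kind of duplication a tree map or node-link tree needs.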