The document discusses WikiBioPath, a company that provides integrative data analysis of bio-molecular data through statistical analysis and interpretation of results in the context of biological knowledge. It offers expertise in transcriptomics, proteomics and genomics data from samples to help identify biomarkers, optimize drug compounds, and support personalized medicine. Key services discussed include pre-processing and quality control of experimental data, statistical analysis at the gene and gene set level, identifying population structures, and integrating results with knowledge from literature and databases.
2. EISBM 2012
WikiBioPath
new perspectives in biological data analysis
Laurent Buffat, MD, PhD
Simon de Bernard, X, PhD
Wednesday 13 June 2012
3. Core expertise
Analysis & high-level interpretation of bio-molecular data:
• Transcriptomics
• Proteomics
• Genomics
Pharmaco-genomics
Functional Genomics
Samples → Data generation → Bio-statistical analysis → Interpretation / Reports
For research & industry in the Life Sciences
WikiBioPath: Integrative data analysis
4. Core expertise : application fields
Analysis, interpretation & data mining of bio-molecular data
• Molecule repositioning - new market applications
• Biomarker discovery
• Pharmaco-genomics, Functional Genomics
• Pathway studies, Toxicity studies
• Patient class identification / support to personalized medicine
• Drug compound optimization
5. Challenge
Experimental Data → Bio-statistical analysis → Interpretation → Discovery
?
Data Warehouses, Scientific literature (Private / Public)
6. Knowledge integration
Data generation → Bio-statistical analysis → Interpretation → Discovery
Knowledge integration (WikiBioPath)
Data Warehouses, Scientific literature (Private / Public)
7. Expression Experimental Data Analysis
Experimental data
• Microarray (Illumina, RNG/MRC, Agilent, Affymetrix, etc.)
• High Throughput Sequencing
• SNPs
• Copy Number
• Genotyping
8. Expression Experimental Data Analysis
• PreProcessing & Quality Control
  • Resolving hybridization issues
  • Background corrections
  • Probeset summarization
  • RNA degradation correction
  • Identification of annotation errors
  • Correction of systematic bias
  • etc.
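As an illustration of two of the steps listed above (background correction and probeset summarization), here is a minimal Python sketch. It is not the WikiBioPath pipeline: the constant background estimate, the log2 transform, the median summarization, and all probe and probeset names are assumptions chosen for the example.

```python
import math
from statistics import median


def background_correct(raw, background, floor=1.0):
    """Subtract a per-probe background estimate; clamp at `floor` so log2 stays defined."""
    return {probe: max(value - background[probe], floor) for probe, value in raw.items()}


def log2_transform(intensities):
    """Move intensities to the log2 scale commonly used for expression data."""
    return {probe: math.log2(value) for probe, value in intensities.items()}


def summarize_probesets(log_values, probe_to_set):
    """Collapse probes into probesets by taking the median log-intensity."""
    grouped = {}
    for probe, value in log_values.items():
        grouped.setdefault(probe_to_set[probe], []).append(value)
    return {probeset: median(values) for probeset, values in grouped.items()}


# Hypothetical toy data: three probes mapped to two probesets.
raw = {"p1": 130.0, "p2": 258.0, "p3": 66.0}
background = {"p1": 2.0, "p2": 2.0, "p3": 2.0}
probe_to_set = {"p1": "geneA", "p2": "geneA", "p3": "geneB"}

corrected = background_correct(raw, background)
summary = summarize_probesets(log2_transform(corrected), probe_to_set)
print(summary)  # {'geneA': 7.5, 'geneB': 6.0}
```

Real microarray pipelines usually rely on more robust estimators for each step; the sketch only shows how the steps chain together.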
9. Expression Experimental Data Analysis
Statistical Analysis
• Gene level:
  • expression level modeling & visualization (linear models, contrasts, multiple testing correction)
  • statistical expression level comparison
  • etc.
[Figure: gene-by-sample expression display. Rows: genes (KRT75, RNASE7, UPP1, LCN2, ..., CASP14, SERPINB4). Columns: samples labeled PBS A, PBS B, None B, T1 A, T1 B, T2 A, T2 B, T3 A, T3 B.]
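The slide names multiple testing correction without saying which method is used. One common choice is the Benjamini-Hochberg procedure, sketched below in Python; picking this procedure is an assumption made for illustration, and the p-values are made up.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-gene p-values from a gene-level comparison.
gene_pvalues = [0.01, 0.04, 0.03, 0.005]
gene_qvalues = benjamini_hochberg(gene_pvalues)
print(gene_qvalues)
```

Genes whose adjusted value falls below a chosen false discovery rate (for example 0.05) would then be reported as differentially expressed.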
10. Expression Experimental Data Analysis
Statistical Analysis, Knowledge integration
• Geneset level:
• geneset based analyses / correlation identification
• parametric analysis of geneset enrichment
p-value  Geneset
< 1e−04 Broad:NAGASHIMA_EGF_SIGNALING
< 1e−04 Broad:JIANG_HYPOXIA_NORMAL
< 1e−04 Broad:MENSE_HYPOXIA
< 1e−04 Broad:ZHOU_INFLAMMATORY_RESPONSE_LIVE
< 1e−04 Broad:ZHOU_INFLAMMATORY_RESPONSE_LPS
< 1e−04 ...
0.00022 Broad:LEONARD_HYPOXIA
0.00025 CTD:Carbaryl
0.00032 CTD:Amiloride
0.00038 CTD:Kainic Acid
0.00050 positive regulation of response to stimulus (GO:0048584)
0.00082 CTD:Buthionine Sulfoximine
0.00085 Broad:BIOCARTA_TOLL_PATHWAY
0.00085 serine family amino acid metabolic process (GO:0009069)
0.00088 Broad:GOLDRATH_IMMUNE_MEMORY
0.00097 Broad:SCIAN_INVERSED_TARGETS_OF_TP53_AND_TP73
0.00112 neutral amino acid transport (GO:0015804)
0.00133 CTD:allylpyrocatechol
0.00160 CTD:temozolomide
0.00178 CTD:2,4−dinitrotoluene
0.00183 L2L:hypoxia_reg
0.00198 Broad:GARGALOVIC_RESPONSE_TO_OXIDIZED_PHOSPHOLIPIDS_RED
0.00252 CTD:Paraquat
0.00258 One carbon pool by folate (KEGG:00670)
0.00265 L2L:calres_mouse
0.00267 Broad:SABATES_COLORECTAL_ADENOMA_SIZE
0.00275 Broad:BREDEMEYER_RAG_SIGNALING_VIA_ATM_NOT_VIA_NFKB
0.00300 CTD:butylbenzyl phthalate
0.00300 CTD:Cocaine
0.00300 Broad:ST_GAQ_PATHWAY
0.00308 Broad:SCHLOSSER_MYC_TARGETS_AND_SERUM_RESPONSE
0.00323 CTD:Diclofenac
0.00340 secretion (GO:0046903)
0.00352 positive regulation of endothelial cell migration (GO:0010595)
0.00358 ...
0.00385 Broad:GENTILE_UV_RESPONSE_CLUSTER_D6
0.00390 nucleoside catabolic process (GO:0009164)
0.00395 L2L:hdaci_colon_cursul
0.00405 CTD:2−nitrotoluene
0.00405 CTD:Platelet Activating Factor
0.00417 CTD:Allopurinol
0.00455 CTD:2,4−diaminotoluene
0.00477 CTD:Pam(3)CSK(4) peptide
0.00480 secretion by cell (GO:0032940)
0.00505 protein amino acid O−linked glycosylation (GO:0006493)
0.00513 CTD:bis(maltolato)oxovanadium(IV)
0.00522 CTD:Apigenin
0.00565 L2L:stress_genotoxic_specific
0.00568 CTD:4−(4−fluorophenyl)−2−(4−hydroxyphenyl)−5−(4−pyridyl)imidazole
0.00568 Broad:ST_GA13_PATHWAY
[Figure: geneset-by-sample display. Columns: samples labeled PBS A, PBS B, None B, T1 A, T1 B, T2 A, T2 B, T3 A, T3 B.]
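Parametric analysis of geneset enrichment is commonly computed as a Z-score that compares the mean score of a geneset with the mean over all genes, scaled by the geneset size. The Python sketch below follows that general formulation; it is an assumption that the analysis on this slide uses exactly this variant, and the gene names and scores are hypothetical.

```python
import math
from statistics import mean, pstdev


def page_z_score(scores, geneset):
    """Z-score and two-sided normal p-value for one geneset.

    scores:  dict mapping gene -> score (for example a log fold change)
    geneset: iterable of gene names; genes without a score are ignored
    """
    members = [scores[g] for g in geneset if g in scores]
    if not members:
        raise ValueError("geneset has no scored genes")
    mu = mean(scores.values())       # mean over all genes
    sigma = pstdev(scores.values())  # standard deviation over all genes
    z = (mean(members) - mu) * math.sqrt(len(members)) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value under N(0, 1)
    return z, p


# Hypothetical scores for six genes and one two-gene geneset.
scores = {"g1": 2.0, "g2": 2.0, "g3": 0.0, "g4": 0.0, "g5": -2.0, "g6": -2.0}
z, p = page_z_score(scores, ["g1", "g2"])
print(round(z, 3), round(p, 3))
```

Running this over every geneset in a collection yields a ranked p-value list of the kind shown on the slide.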
11. Expression Experimental Data Analysis
Population structuring
• Structuring genes (hidden state models)
[Figure labels: Distance from closest class; Discriminative power; Known / New classes]
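The "distance from closest class" idea can be illustrated with a nearest-centroid rule: a sample far from every known class centroid is a candidate member of a new class. The Python sketch below is a simplified stand-in, not the hidden state models mentioned on the slide; the class labels, profiles, and threshold are all hypothetical.

```python
import math


def centroid(profiles):
    """Component-wise mean of a list of equal-length expression profiles."""
    n = len(profiles)
    return [sum(column) / n for column in zip(*profiles)]


def closest_class(sample, centroids):
    """Return (label, distance) of the nearest class centroid (Euclidean)."""
    distances = {label: math.dist(sample, c) for label, c in centroids.items()}
    label = min(distances, key=distances.get)
    return label, distances[label]


# Hypothetical known classes, each with two samples measured on two genes.
known = {
    "A": [[0.0, 0.0], [2.0, 0.0]],
    "B": [[10.0, 10.0], [10.0, 12.0]],
}
centroids = {label: centroid(profiles) for label, profiles in known.items()}

threshold = 5.0  # assumed cut-off on the distance to the closest class
for sample in ([1.0, 1.0], [5.0, 20.0]):
    label, distance = closest_class(sample, centroids)
    status = "known class " + label if distance <= threshold else "candidate new class"
    print(sample, round(distance, 2), status)
```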
12. Expression Experimental Data Analysis
Population structuring
• Structuring genesets, knowledge integration
  • geneset based analyses / correlation identification
  • parametric analysis of geneset enrichment
[Figure labels: P-values; Concepts; Genes; GeneSets; Samples]
13. Interpretation of Data Analysis Results
Why interpretation:
• Data analysis is a crucial step
• but results must be placed in the context of biological knowledge to become actionable
Interpretation challenges: Integrative Data Analysis
• Gathering heterogeneous Knowledge and Data resources
• Presenting Knowledge & Data in a manageable way
14. Interpretation of Data Analysis Results
WikiBioPath (in short):
Web app with dedicated online tools for
• storing/managing Knowledge & Data (K&D)
• enriching/curating K&D
• querying/exploring/visualizing/mining K&D
Allows heterogeneous K&D integration into the interpretation process
Will be available as Free & Open Source
We are currently looking for collaborations / beta testers
15. Web Portals/Apps, Applications and Wikis
Web Portals/Apps, Applications (public/external + private/internal)
• External AND/OR private Web Portals/Apps, Applications (NCBI, Kegg, ...)
• WBP’s linking & “bridging” functionalities
Wikis (public/external + private/internal)
• External & WikiBioPath’s dedicated specialized Wikis (Genes, Proteins, Diseases, ...)
• WBP’s specialized functionalities for knowledge storing/querying/managing/...
WikiBioPath Core
• Web based interface & online Tools for Data & Knowledge storing, managing, querying, visualizing, mining and more...
Textual Data (Unstructured data repositories) (public/external + private/internal)
• Text Mining based Search Engine & Explorer
• Bibliographical Databases with low structured textual data (Medline, own text collection ...)
Data Warehouses (public/external + private/internal)
• WBP’s specialized functionalities for data storing/querying/managing/...
• Experimental Data & Experimental Data Analyses Data Warehouses (Micro Array Exp. Data, Micro Array Analyses Results ...)