Data Integration: what I haven’t yet achieved
Neil Saunders

MATHEMATICS, INFORMATICS AND STATISTICS
www.csiro.au
My main project

Ludwig colorectal cancer study

Data integration 2 of 21
Multiple “omics” platforms

• exon expression
• methylation
• copy number

We want to “integrate” these data

but what does that mean?

Integration can mean “portals”

Integration can mean “visualization”

Integration can mean “correlation”
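
One way to read "correlation" here is a gene-by-gene comparison of two platforms across the same samples. The sketch below is only an illustration of that idea, not the method of any tool named in this talk; the function names, gene names and values are all hypothetical, and it assumes both platforms list their samples in the same order.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences; None if undefined."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    # A gene with no variation on either platform has no defined correlation.
    return num / den if den > 0 else None


def per_gene_correlation(copy_number, expression):
    """Correlate two platforms gene by gene.

    Both arguments map gene name -> list of values, one per sample,
    with samples in the same order. Only shared genes are compared.
    """
    shared = sorted(set(copy_number) & set(expression))
    return {g: pearson(copy_number[g], expression[g]) for g in shared}


# Hypothetical toy values, not data from the study.
copy_number = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 2.0, 2.0, 2.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
}
expression = {
    "geneA": [2.0, 4.0, 6.0, 8.0],
    "geneB": [1.0, 5.0, 3.0, 2.0],
    "geneC": [1.0, 2.0, 3.0, 4.0],
    "geneD": [0.5, 0.7, 0.9, 1.1],  # measured on one platform only
}
result = per_gene_correlation(copy_number, expression)
```

Genes measured on only one platform drop out, and a gene with constant values on either platform returns None rather than a misleading number.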

What do we think integration means?

A + B + C

More information when combined than when separate
What’s already “out there”? PubMed
[Figure: PubMed search for "data integration"; articles per 100 000 by year, 2002 to 2010]
What’s already “out there”? CiteULike

http://www.citeulike.org/user/neils/tag/integration

Buzz-word compliant

Quote from integIRTy paper

“These methods can be roughly grouped into four categories: stepwise, regression-based, correlation-based and latent variable models”

integIRTy: a method to identify genes altered in cancer by accounting for multiple mechanisms of regulation using item response theory
Bioinformatics, Vol. 28, No. 22 (15 November 2012), pp. 2861-2869

Regression: SIM

Integrated analysis of DNA copy number and gene expression microarray data using gene sets
BMC Bioinformatics 2009, 10:203

Correlation: DR-Integrator

[Figure: per-sample plot with axes labelled Chr (1 to 22) and Correlation (0 to 1)]

Latent variable: iCluster

(file under impractical)

Basics that are never explained 1/2

Integration across groups or description of samples?

Basics that are never explained 2/2

Genes x Samples
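
A step that is rarely spelled out is getting every platform into the same genes x samples layout before anything else happens. The sketch below is one hypothetical way to do that: it keeps only the genes and samples that every platform shares and returns the rows and columns in one fixed order. The platform names, gene names, sample IDs and values are invented for illustration.

```python
def align_platforms(platforms):
    """Restrict every platform to the genes and samples they all share.

    `platforms` maps platform name -> {gene: {sample: value}}.
    Returns (genes, samples, aligned), where aligned[name] is a list of
    rows (one per gene, in `genes` order) with columns in `samples` order.
    """
    if not platforms:
        raise ValueError("need at least one platform")
    tables = list(platforms.values())
    # Genes present on every platform.
    genes = sorted(set.intersection(*(set(t) for t in tables)))
    # Samples measured for every shared gene on every platform.
    sample_sets = [set(t[g]) for t in tables for g in genes]
    samples = sorted(set.intersection(*sample_sets)) if sample_sets else []
    aligned = {
        name: [[table[g][s] for s in samples] for g in genes]
        for name, table in platforms.items()
    }
    return genes, samples, aligned


# Hypothetical toy input, not data from the study.
platforms = {
    "expression": {
        "geneA": {"001": 5.1, "002": 4.8, "003": 6.0},
        "geneB": {"001": 2.2, "002": 2.9, "003": 3.1},
    },
    "methylation": {
        "geneA": {"001": 0.10, "002": 0.80},
        "geneB": {"001": 0.40, "002": 0.50},
        "geneC": {"001": 0.90, "002": 0.70},
    },
    "copy_number": {
        "geneA": {"002": 2.0, "001": 3.0, "003": 2.0},
        "geneB": {"002": 1.0, "001": 2.0, "003": 2.0},
    },
}
genes, samples, aligned = align_platforms(platforms)
```

Taking the strict intersection is the simplest choice and discards data; keeping the union and marking missing values would be an alternative with different trade-offs.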

Conclusions 1/3

We’re not the first people doing this...
...but it’s becoming a “hot topic”

Conclusions 2/3

Room for improvement in software, much of which is:

• Poorly written
• Poorly documented
• Difficult to implement

Conclusions 3/3

Too much for one individual!

CSIRO Mathematics, Informatics and Statistics
Neil Saunders
t +61 2 9325 3144
e Neil.Saunders@csiro.au
w Mathematics, Informatics and Statistics web
