Data Management for Predictive Tools
Paul Fearn, MBA
NLM Informatics Research Fellow, Biomedical and Health Informatics
University of Washington | Fred Hutchinson Cancer Research Center, Seattle, Washington
PROSTATE CANCER: PREDICTIVE MODELS FOR DECISION MAKING
April 7th – 9th, 2011 - MSKCC - New York, NY
Data Management Requirements
- Need to assemble large datasets for predictive modeling
  - Pooling data across sites, systems and countries
  - Linking data across clinical, specimen and lab repositories
- Quality assurance (for reproducibility of results)
  - Tradeoffs between accuracy and reproducibility of data points
  - Transparency of data processing
  - Complete and up-to-date datasets
- Ease to access, sort, filter and export data
  - Statistical analysis in Stata, R, SPSS, SAS, Excel
  - SQL queries and reports
- Sustainability
  - Secondary (N-ary) use of clinical and research data
  - Cumulative cost of data entry
  - Cumulative cost of staff training and turnover
  - Cumulative risks and opportunity costs of staff entrenchment
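As a minimal sketch of two of the requirements above (linking clinical, specimen and lab records, then exporting the result for statistical analysis), the following uses an in-memory SQLite database. The table names, column names and sample values are all hypothetical and are not taken from CAISIS or any other system named in this deck.

```python
import csv
import io
import sqlite3

# Hypothetical tables, columns and values, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patients  (patient_id INTEGER PRIMARY KEY, site TEXT);
    CREATE TABLE specimens (specimen_id INTEGER PRIMARY KEY,
                            patient_id INTEGER, gleason_sum INTEGER);
    CREATE TABLE labs      (lab_id INTEGER PRIMARY KEY,
                            patient_id INTEGER, psa_ng_ml REAL);
""")
conn.executemany("INSERT INTO patients VALUES (?, ?)",
                 [(1, "Site A"), (2, "Site B")])
conn.executemany("INSERT INTO specimens VALUES (?, ?, ?)",
                 [(10, 1, 7), (11, 2, 6)])
conn.executemany("INSERT INTO labs VALUES (?, ?, ?)",
                 [(100, 1, 5.2), (101, 2, 3.1)])


def linked_rows(connection):
    """Join the three repositories on the shared patient identifier."""
    query = """
        SELECT p.patient_id, p.site, s.gleason_sum, l.psa_ng_ml
        FROM patients p
        JOIN specimens s ON s.patient_id = p.patient_id
        JOIN labs l      ON l.patient_id = p.patient_id
        ORDER BY p.patient_id
    """
    return connection.execute(query).fetchall()


def to_csv(rows, header):
    """Serialize rows as CSV text, a format most statistics packages import."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


rows = linked_rows(conn)
csv_text = to_csv(rows, ["patient_id", "site", "gleason_sum", "psa_ng_ml"])
print(csv_text)
```

Keeping the join in a single SQL query, rather than in hand-edited spreadsheets, is one way to make the data processing step transparent and repeatable.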
The Growth Problem Lu Z. PubMed and Beyond. Database 2011;2011:baq036  21245076[pmid]
The Growth Problem http://www.ncbi.nlm.nih.gov/genbank/genbankstats.html
The Growth Problem http://www.ncbi.nlm.nih.gov/books/NBK44423/
The Breaking Point 1000 cases
The Growth Problem
Microsoft Access databases
  1999 ProstateDB 1.0
  2000 PRDB / Prostabase
ColdFusion & SQL Server web-based database
  2002 Valhalla 1.0 – 1.1 Prostate
  2003 Valhalla 1.2 (7,994 patients) Billing/EMR compliant populated clinic forms
ASP.NET & SQL Server web-based database
  2004 CAISIS 2.0 – 2.1 (26,470 patients) Integrated bladder, kidney, testis
  2005 CAISIS 3.0 – 3.1 (44,000 patients) Prostatectomy eForm, protocol manager, tumor maps
  2006 CAISIS 3.5 – (55,000 patients) GU and Urology Prostate Follow-up eForms
  2007 CAISIS 4.0 – (80,000 patients) Metadata, dynamic forms, new diseases and eForms
  2008 CAISIS 4.1 – (98,000 patients) Email eForms, advanced find, specimen tracking
  2009 CAISIS 4.5 – (120,000+ patients) Project tracking, patient education, virtual fields, reporting module
  2010 CAISIS 5.0x
The Curation Problem
- Increasing volume of data
- More data points for annotation
  - Clinical / patient
  - Genomic / biological
  - Public health / environment
- Parallel curation issues in modern clinical and biological research databases (Krallinger 2008*)
- Development of NLP system to support clinical research operations (Savova 2010**)
*18834499[pmid], **20819853[pmid]
On the Other Hand…
- Long tail of research efforts
- Small heterogeneous labs and projects
- Subsets of data
- Specialized requirements
- Innovative approaches
Spectrum of Approaches
- One dataset per project (i.e. study based systems)
- Registry databases (i.e. one treatment or disease)
- Data warehouse or data repository
  - Common schema (data model)
- “Amalgamation” of heterogeneous datasets
  - Common security and access
  - Common syntax (data format)
  - Defined links between records
  - Indexed for searching and retrieval
- Federation / grid of semantically integrated data
  - Common vocabulary / terminology
  - Formal models (caBIG)
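The "amalgamation" approach above (heterogeneous datasets brought to a common syntax and indexed for retrieval) could be sketched roughly as follows. The site names, field names and the `CommonRecord` schema are assumptions made up for this example, not part of any system described in the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommonRecord:
    """Hypothetical shared record format; field names are illustrative."""
    source: str
    patient_key: str
    psa_ng_ml: float


# Two hypothetical site exports that use different field names and types.
site_a = [{"mrn": "A-001", "psa": "5.2"}]
site_b = [{"PatientID": "B-17", "PSA_value": 3.1}]

# Per-source mapping from the common field names to each site's local names.
FIELD_MAPS = {
    "site_a": {"patient_key": "mrn", "psa_ng_ml": "psa"},
    "site_b": {"patient_key": "PatientID", "psa_ng_ml": "PSA_value"},
}


def to_common(source, row):
    """Translate one local row into the common record format."""
    mapping = FIELD_MAPS[source]
    return CommonRecord(
        source=source,
        patient_key=str(row[mapping["patient_key"]]),
        psa_ng_ml=float(row[mapping["psa_ng_ml"]]),
    )


def amalgamate(datasets):
    """Convert every row and build a lookup index keyed by (source, patient)."""
    records = [
        to_common(source, row)
        for source, rows in datasets.items()
        for row in rows
    ]
    index = {(r.source, r.patient_key): r for r in records}
    return records, index


records, index = amalgamate({"site_a": site_a, "site_b": site_b})
print(index[("site_a", "A-001")])
```

Each source keeps its own structure here; only the mapping table has to change when a new site is added, which is what separates this approach from a warehouse with one enforced schema.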
Loosely Linking Data http://www.ncbi.nlm.nih.gov/sites/gquery
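One way to picture loose linking is sending the same search term to several independent databases and leaving each to return its own results. The sketch below only builds request URLs for the NCBI E-utilities ESearch endpoint and sends nothing over the network. Using E-utilities here is an assumption made for illustration (the slide itself shows only the gquery page), and the search term and database list are arbitrary examples.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities search endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(db, term):
    """Build (but do not send) an ESearch request URL for one database."""
    params = {"db": db, "term": term, "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


def cross_database_urls(term, databases):
    """Return one search URL per database for the same term."""
    return {db: esearch_url(db, term) for db in databases}


# Example term and databases chosen for illustration only.
urls = cross_database_urls("prostate cancer nomogram",
                           ["pubmed", "gene", "nuccore"])
for db, url in urls.items():
    print(db, url)
```

The databases share only a query interface and identifiers such as PMIDs; no common schema or vocabulary is required.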
Tightly Integrating Data
- Vocabulary / Terminology
  - NCI Thesaurus (NCIt)
  - NLM UMLS
- Standard data models
  - caBIG / caDSR
  - HL7/FDA/NCI
  - CDISC / BRIDG
- Web services*
- Common syntax / format
*Stein. Creating a bioinformatics nation. Nature (2002) vol. 417 (6885) pp. 119-20 12000935[pmid]
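A common vocabulary, as listed above, amounts to mapping each site's local terms onto shared concept identifiers so that records can be compared across systems. The sketch below illustrates the idea with a tiny hand-written lookup. The `EX:` identifiers are placeholders invented for this example and are not real NCIt or UMLS codes; a real deployment would look terms up in the terminology service itself.

```python
# Placeholder concept identifiers, NOT real NCIt or UMLS codes.
LOCAL_TO_CONCEPT = {
    "radical prostatectomy": "EX:0001",
    "rp": "EX:0001",
    "prostatectomy, radical": "EX:0001",
    "brachytherapy": "EX:0002",
}


def normalize(term):
    """Lowercase the term and collapse runs of whitespace."""
    return " ".join(term.lower().split())


def map_term(term):
    """Return the shared concept identifier, or None if the term is unmapped."""
    return LOCAL_TO_CONCEPT.get(normalize(term))


def group_by_concept(terms):
    """Group local terms by concept and collect the ones with no mapping."""
    grouped = {}
    unmapped = []
    for term in terms:
        code = map_term(term)
        if code is None:
            unmapped.append(term)
        else:
            grouped.setdefault(code, []).append(term)
    return grouped, unmapped


grouped, unmapped = group_by_concept(
    ["RP", "Radical  Prostatectomy", "Brachytherapy", "cryotherapy"]
)
print(grouped)
print(unmapped)
```

Returning the unmapped terms separately keeps the curation workload visible instead of silently dropping records that do not fit the vocabulary.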
The CAISIS System
Appendix: 394 people at 60 sites visited from Aug 2008 to Jun 2009 (map legend: Driving, Flying)
Costly curation and support of research databases
Widespread and large scale implementation of EMRs

Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 

Dernier (20)

Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 

NY Prostate Cancer Conference - P.A. Fearn - Session 1: Data management for predictive tools

  • 1. Data Management for Predictive Tools Paul Fearn, MBA NLM Informatics Research Fellow Biomedical and Health Informatics University of Washington | Fred Hutchinson Cancer Research Center Seattle, Washington PROSTATE CANCER: PREDICTIVE MODELS FOR DECISION MAKING April 7th – 9th, 2011 - MSKCC - New York, NY
  • 2. Data Management Requirements
      • Need to assemble large datasets for predictive modeling
          • Pooling data across sites, systems and countries
          • Linking data across clinical, specimen and lab repositories
      • Quality assurance (for reproducibility of results)
          • Tradeoffs between accuracy and reproducibility of data points
          • Transparency of data processing
          • Complete and up-to-date datasets
      • Ease to access, sort, filter and export data
          • Statistical analysis in Stata, R, SPSS, SAS, Excel
          • SQL queries and reports
      • Sustainability
          • Secondary (N-ary) use of clinical and research data
          • Cumulative cost of data entry
          • Cumulative cost of staff training and turnover
          • Cumulative risks and opportunity costs of staff entrenchment
  • 3. The Growth Problem Lu Z. PubMed and Beyond. Database 2011;2011:baq036 21245076[pmid]
  • 4. The Growth Problem http://www.ncbi.nlm.nih.gov/genbank/genbankstats.html
  • 5. The Growth Problem http://www.ncbi.nlm.nih.gov/books/NBK44423/
  • 6. The Breaking Point 1000 cases
  • 7. The Growth Problem
      • Microsoft Access databases
          • 1999: ProstateDB 1.0
          • 2000: PRDB / Prostabase
      • ColdFusion & SQL Server web-based database
          • 2002: Valhalla 1.0 – 1.1. Prostate
          • 2003: Valhalla 1.2 (7,994 patients). Billing/EMR compliant populated clinic forms
      • ASP.NET & SQL Server web-based database
          • 2004: CAISIS 2.0 – 2.1 (26,470 patients). Integrated bladder, kidney, testis
          • 2005: CAISIS 3.0 – 3.1 (44,000 patients). Prostatectomy eForm, protocol manager, tumor maps
          • 2006: CAISIS 3.5 (55,000 patients). GU and Urology Prostate Follow-up eForms
          • 2007: CAISIS 4.0 (80,000 patients). Metadata, dynamic forms, new diseases and eForms
          • 2008: CAISIS 4.1 (98,000 patients). Email eForms, advanced find, specimen tracking
          • 2009: CAISIS 4.5 (120,000+ patients). Project tracking, patient education, virtual fields, reporting module
          • 2010: CAISIS 5.0x
  • 8. The Curation Problem
      • Increasing volume of data
      • More data points for annotation
          • Clinical / patient
          • Genomic / biological
          • Public health / environment
      • Parallel curation issues in modern clinical and biological research databases (Krallinger 2008*)
      • Development of NLP system to support clinical research operations (Savova 2010**)
      • *18834499[pmid], **20819853[pmid]
  • 9. On the Other Hand…
      • Long tail of research efforts
      • Small heterogeneous labs and projects
      • Subsets of data
      • Specialized requirements
      • Innovative approaches
  • 10. Spectrum of Approaches
      • One dataset per project (i.e. study based systems)
      • Registry databases (i.e. one treatment or disease)
      • Data warehouse or data repository
          • Common schema (data model)
      • “Amalgamation” of heterogeneous datasets
          • Common security and access
          • Common syntax (data format)
          • Defined links between records
          • Indexed for searching and retrieval
      • Federation / grid of semantically integrated data
          • Common vocabulary / terminology
          • Formal models (caBIG)
  • 11. Loosely Linking Data http://www.ncbi.nlm.nih.gov/sites/gquery
  • 12. Tightly Integrating Data
      • Vocabulary / Terminology
          • NCI Thesaurus (NCIt)
          • NLM UMLS
      • Standard data models
          • caBIG / caDSR
          • HL7/FDA/NCI
          • CDISC / BRIDG
      • Web services*
      • Common syntax / format
      • *Stein. Creating a bioinformatics nation. Nature (2002) vol. 417 (6885) pp. 119-20. 12000935[pmid]
  • 14. Appendix: 394 people at 60 sites visited from Aug 2008 to Jun 2009 (map legend: Driving, Flying)
  • 16. Costly curation and support of research databases
  • 17. Widespread and large scale implementation of EMRs
  • 18. Development of data warehouses and repositories
  • 20. Difficulties accessing and retrieving research data
  • 21. Skewed distribution of data systems
  • 22. Prevalence of Microsoft Access and Excel solutions
  • 23. Shifts to less expensive and more open source platforms
  • 24. REDCap, CAISIS, caTissue, Python and Bioconductor. Appendix: Site Visit Findings
  • 25. Appendix: Clinical Systems
      • Surgical Reports
      • Radiation Therapy Reports
      • Pathology Reports
      • Laboratory Reports
      • Radiology Reports
      • Review of Systems and Patient Reported Outcomes
      • Electronic Medical / Health Records
      • Registration / demographics
      • Clinical trials eligibility and recruitment
      • Scheduling and operations
  • 26. Appendix: Engaging Patients in Data Management
      • Pre-first visit questionnaires
      • Web-based survey systems (e.g. REDCap)
      • Patient reported outcomes
      • Longitudinal follow-up process
      • Tablets, iPads and mobile applications

Editor's notes

  1. I hope you will give a broad overview of the key features of the database that would allow the development of optimal predictive models, and demonstrate how Caisis works to collect clinical and research data and how it has proved so valuable to the development of predictive models.
  2. Constraints on data entry increase reproducibility, but may decrease accuracy. Conducive to quantitative research and hypothesis testing. Open fields / coding may increase accuracy, but decrease reproducibility. Conducive to qualitative research and discovery.
  3. Krallinger et al. Linking genes to literature: text mining, information extraction, and retrieval applications for biology. Genome Biol (2008) vol. 9 Suppl 2 pp. S8. Savova et al. Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications. J Am Med Inform Assoc (2010) vol. 17 (5) pp. 507-13.
  4. Caisis is a data repository. One data model to rule them all
  5. How much time and effort does it take to pool databases and spreadsheets for predictive modeling? Stein. Creating a bioinformatics nation. Nature (2002) vol. 417 (6885) pp. 119-20. 12000935[pmid]. If there is a need for large aggregated datasets from heterogeneous sources to support predictive modeling, we need to plan for this model. Building for one site and rolling out to other sites successfully is rare.
  6. Most people proclaimed that they did not want to “reinvent the wheel”, but proceeded to do so. Disconnect between beliefs and actions. Harris et al. Research electronic data capture (REDCap)--a metadata-driven methodology and workflow process for providing translational research informatics support. J Biomed Inform (2009) vol. 42 (2) pp. 377-81.