Data re-use in the CALIBER programme
Anoop Shah (a.shah@ucl.ac.uk)
Clinical Epidemiology Group, University College London

14th November 2013
1 The CALIBER programme

2 Why make research data re-usable?

3 The CALIBER approach

4 Summary
The CALIBER programme
UCL & LSHTM collaboration
[Diagram: general practice, the MINAP registry, death registrations and Hospital Episode Statistics linked into the CALIBER linked research database]

Funded by NIHR and Wellcome Trust
CALIBER data
Defining continuous variables
Clinical (e.g. blood pressure) and laboratory (e.g. white cell count)
• Recorded in CPRD (primary care)
• Identified by ‘entity code’ and medcode (more granular)
• Lab data now electronically transferred
• Problems:
  • Missing units
  • Erroneous values
  • Inconsistent recording
  • Missing data
Medcodes associated with a test result
Example: neutrophil counts (a type of white blood cell), which may be recorded as an absolute count or as a percentage

Medcode   Percent   Term
18        89.6      Neutrophil count
17622     9.9       Percentage neutrophils
23114     0.3       Granulocyte count
23115     0.1       Percentage granulocytes
13777     0.1       Neutrophil count NOS
Distribution of values for different units
Most common units
Analysis issues
• Extraction algorithm
  • Remove biologically implausible extreme values
    • In a huge dataset with no restriction on possible values, there will be some errors
  • Standardise units
• Decide how to analyse
  • Timing e.g. relative to index date
  • Repeat measures
  • Transformation, splines, categories etc.
  • Missing data (e.g. multiple imputation)
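
The extraction steps above (standardise units, then drop implausible values) might be sketched as follows. This is a minimal illustration, not the CALIBER extraction algorithm: the unit labels, the conversion table, the plausibility limits and the function name are all assumptions made for the example.

```python
# Hypothetical conversion factors to 10^9 cells/L. The unit labels are
# illustrative and are not the unit coding used in CPRD.
TO_GIGA_PER_LITRE = {
    "10^9/L": 1.0,
    "/mm3": 0.001,  # 1 cell per cubic millimetre = 10^6 cells/L
}

# Illustrative plausibility limits in 10^9/L (assumed, not clinically validated).
PLAUSIBLE_MIN = 0.0
PLAUSIBLE_MAX = 100.0


def standardise_neutrophils(value, unit, white_cell_count=None):
    """Return a neutrophil count in 10^9/L, or None if the record is unusable."""
    if value is None or unit is None:
        return None  # missing value or missing units
    if unit == "%":
        # A percentage can only be converted with a total white cell count
        # (in 10^9/L) from the same test.
        if white_cell_count is None or not 0 <= value <= 100:
            return None
        result = value / 100.0 * white_cell_count
    elif unit in TO_GIGA_PER_LITRE:
        result = value * TO_GIGA_PER_LITRE[unit]
    else:
        return None  # unrecognised unit
    if not PLAUSIBLE_MIN <= result <= PLAUSIBLE_MAX:
        return None  # biologically implausible extreme value
    return result
```

Returning None rather than guessing keeps unusable records visible as missing data, so they can be handled explicitly at the analysis stage (for example by multiple imputation).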
Observation time in GP practice
• Observation time: when registered at GP practice
• Practice ‘up to standard date’: date after which we expect that data are recorded
• If nothing recorded while registered at GP:
  • Patient may be abroad
  • Patient may be genuinely healthy
• Excluding observation time with no records risks bias
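
One way to express the observation window described above is to start follow-up at the later of the patient's registration date and the practice's up-to-standard date. The sketch below assumes this rule; the function and parameter names (including the optional `registration_end`) are hypothetical and are not CPRD field names.

```python
from datetime import date
from typing import Optional


def observation_start(registration_date: date, up_to_standard_date: date) -> date:
    """Follow-up starts once the patient is registered AND the practice is up to standard."""
    return max(registration_date, up_to_standard_date)


def in_observation(
    event_date: date,
    registration_date: date,
    up_to_standard_date: date,
    registration_end: Optional[date] = None,
) -> bool:
    """True if event_date falls within the patient's observation time.

    The window depends only on registration and practice dates. It is not
    shortened when the patient has no records, because excluding record-free
    time risks bias.
    """
    if event_date < observation_start(registration_date, up_to_standard_date):
        return False
    return registration_end is None or event_date <= registration_end
```

For example, a patient registered in 1995 at a practice that became up to standard in mid-1998 would only contribute observation time from mid-1998 onwards.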
Defining a diagnosis, e.g. atrial fibrillation
Defining a diagnosis

• Cross-map against different datasets
• Individual data sources may miss cases, so consider using linked datasets
  • Important for accurate measures of incidence
  • May be less important for associations between disease and risk factor, as long as the risk factor does not influence recording
Non-fatal myocardial infarction: all sources miss cases
[Figure: Venn diagram of cases recorded in the MINAP disease registry, primary care (CPRD) and Hospital Episode Statistics; region labels 8%, 6%, 18%, 7%, 20%, 10%]
Motivations for re-using data

• Time taken to prepare data and define variables
  • Cost
• Different definitions used by different groups
  • Lack of transparency and reproducibility
Possible approaches

• Ad hoc sharing of codelists and algorithms within a group
• Publish codelists and algorithms with papers
• The CALIBER approach
  • Repository of codelists and algorithms
  • Web portal for researcher access
CALIBER ‘LEGO’ data access model
1001, 2000-01-01, 23,1,NULL,I48
1001, 1994-08-11,1234,1,3,7L1H300
1001, 1993-01-01, 253,1,1,793Mz00
1231, 2012-03-03, 23,1,123,K65
1121, 2013-05-04, 7,1,3,5,14AN.00
1121, 2011-05-21, 81,1,9, G573100
1511, 1993-01-11, 91,1,6,9hF1.00
1511, 199-03-11, 91,1,6, G573100
9913, 2012-05-21, 81,1,9, G573100
67222, 1994-11-01,1234,1,3,7L1H300
67222, 1995-12-21,1234,1,3,7L1H300
67222, 1991-03-03,1234,1,3,7L1H310
682444, 1993-01-01, 253,1,1,793Mz00

1001, 2000-01-01, af_gprd=1
1231, 2012-03-03, af_hes=3
1121, 2013-05-04, af_procs_gprd=1
1511, 1993-01-11, heart_valve_gprd=2
9913, 2012-05-21, af_hes=1
67222, 1994-08-11, af_hes=1
682444, 1993-01-01, heart_valve_hes=2

af=1, af_diag_date=2001-12-01
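
The three blocks above appear to show raw coded records, then source-specific flags, then a single combined research variable. A minimal sketch of that layering is given below. The record layout, the codelist contents, the variable names and the rule of taking the earliest date are assumptions for illustration, not the CALIBER implementation.

```python
from datetime import date

# Hypothetical codelists: (source, code) -> source-specific variable name.
CODELISTS = {
    ("hes", "I48"): "af_hes",                 # ICD-10 I48: atrial fibrillation and flutter
    ("gprd", "EXAMPLE_READ_AF"): "af_gprd",   # placeholder, not a real Read code
}

# Source-specific variables that count towards the combined diagnosis (assumed).
AF_DIAGNOSIS_VARIABLES = {"af_hes", "af_gprd"}


def flag_records(records):
    """Stage 2: map raw (patient, date, source, code) rows to named flags."""
    flags = []
    for patient_id, event_date, source, code in records:
        variable = CODELISTS.get((source, code))
        if variable is not None:  # codes outside every codelist are ignored
            flags.append((patient_id, event_date, variable))
    return flags


def combine_af(flags):
    """Stage 3: one entry per patient with af=1 and the earliest flagged date."""
    first_date = {}
    for patient_id, event_date, variable in flags:
        if variable not in AF_DIAGNOSIS_VARIABLES:
            continue
        if patient_id not in first_date or event_date < first_date[patient_id]:
            first_date[patient_id] = event_date
    return {
        pid: {"af": 1, "af_diag_date": d} for pid, d in first_date.items()
    }


# Made-up example records, not taken from any dataset.
raw = [
    (1001, date(2000, 1, 1), "hes", "I48"),
    (1001, date(1998, 5, 2), "gprd", "EXAMPLE_READ_AF"),
    (1231, date(2012, 3, 3), "hes", "K65"),
]
phenotype = combine_af(flag_records(raw))
```

Keeping the codelists as data separate from the functions means a corrected or updated codelist can be swapped in without changing the code, which fits the idea of a shared, versioned repository.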
CALIBER phenotypes (research variables)
• Consistent definitions for multiple studies (over 300 variables curated)
• Read, ICD-9, ICD-10, OPCS codelists
• Web portal to view variable definitions, and registered users can view codelists (https://www.caliberresearch.org/portal)
• Future: able to download scripts (e.g. Stata, R, SQL)
CALIBER data portal
Open data
CALIBER data portal

• Encourage researchers to define variables in a way that will be of use to others
• Final validated versions of codelists and variables
• Review by clinician and researcher
CALIBER analysis software

• R packages for managing codelists and data preparation (http://caliberanalysis.r-forge.r-project.org/)
• Lookup tables and data dictionaries
• Functions to simplify / automate common steps in data preparation
CALIBER expects researchers to contribute to the resource
[Diagram: researcher pathway through CALIBER]
User types: investigators, non-investigators, non-experienced, experienced, research coordinator, industry
Stages: website form, approvals, data, analysis, publication, impacts
Other labels: website content; project feasibility and prioritization; unified data access form; LEGO data access model; contribute phenotyping algorithms, linkages; contribute to knowledge base; open access; advancement of knowledge; translation; legislation, policy, guidelines; economic benefit, industry
Difficulties encountered

• Setting up the data portal takes time and needs dedicated staff
• Researchers need to think outside their own project
• Variables are updated / corrected; need to store different versions
Summary

• When analysing routine data, think about how the data were collected, and cross-check different sources of information
• Data sharing and re-use can bring benefits but need time and resources to manage

