Preparing your
taxonomy to
be ready for data
scientists & machine
readability: A case study
and work in progress
Mary Chitty,
Library Director &
Taxonomist, MSLS
Cambridge Healthtech,
Needham MA
mchitty@healthtech.com
SLA Annual Conference, Cleveland, Ohio, Tuesday, June 18, 2019,
Taxonomy-Ontology Conversions: Case Studies
1992
2000
2006-14
2016 2018-19
Historical Taxonomy Process
Taxonomies & Ontologies glossary & taxonomy: http://www.genomicglossaries.com/content/ontologies.asp
Company founded.
Taxonomy created by CEO
with a few hundred terms.
Major products:
conferences on emerging
technologies. Focus on
preclinical drug discovery.
Acquired companies dealing
with bioinformatics, clinical
trials, energy and batteries.
Still integrating their
databases.
Met people from
OntoForce, Belgian
semantic search
engine company.
Began informal
collaboration.
Acquired companies in artificial
intelligence and Internet of
Things. Still determining how to
integrate databases. Several
data scientists hired. Signed
formal contract with OntoForce
to use Disqover search engine.
https://www.ontoforce.com/
Taxonomy stands at
1,600+ terms now.
Conferences and other
products in preclinical and
clinical biotech and
pharma, clinical trials,
energy, AI and Internet of
Things and more.
Published Genomic
Glossaries & Taxonomies
www.genomicglossaries.com
2019
Ongoing challenges
Legacy data with inconsistencies, redundancies and ambiguities.
Integrating company acquisitions’ data into in-house database.
Still cleaning up, disambiguating and documenting in-house data and database.
Scaling up difficulties often underestimated. A major pain point for us right now.
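One small part of this cleanup can be sketched in code: flagging labels that are probably the same term written in different ways. The sketch below is only an illustration under stated assumptions. The function names and the sample labels are hypothetical, not taken from the in-house database, and real disambiguation would still need human review.

```python
import re
from collections import defaultdict


def normalize(label: str) -> str:
    """Lowercase the label and replace runs of punctuation/whitespace with one space."""
    cleaned = re.sub(r"[^a-z0-9]+", " ", label.lower())
    return cleaned.strip()


def find_redundant_terms(terms):
    """Group labels by normalized form; return only groups with more than one label."""
    groups = defaultdict(list)
    for term in terms:
        groups[normalize(term)].append(term)
    return {key: labels for key, labels in groups.items() if len(labels) > 1}


# Hypothetical legacy labels, for illustration only.
legacy_terms = [
    "Clinical Trials",
    "clinical-trials",
    "Internet of Things",
    "internet of things",
    "Batteries",
]

candidates = find_redundant_terms(legacy_terms)
for key, labels in candidates.items():
    print(key, "->", labels)
```

This only catches spelling and punctuation variants. True ambiguities (one label with two meanings) do not show up this way and need a curator.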
FAIR Data
Both the European Commission and NIH have allocated considerable resources to making data FAIRer.
https://www.go-fair.org/fair-principles/
Findable
• First step in
(re)using data is
to find them.
Metadata and
data should be
easy to find for
both humans
and computers.
… an essential
component of
the FAIRification
process.
Accessible
• Once the user
finds the
required data,
she/he needs to
know how they
can be
accessed.
Interoperable
• Data usually
need to be
integrated with
other data …
need to
interoperate with
applications or
workflows.
Reusable
• Ultimate goal of
FAIR is to
optimise the
reuse of data…
metadata and
data should be
well-described
so that they can
be replicated
and/or combined
in different
settings.
Taxonomies and ontologies are critical for interoperability
and reproducibility, particularly in the life sciences.
Life sciences data relatively
sparse, with many attributes
“highly dimensional”, leading
to complexity and sometimes
chaos. Data on longitudinal
health outcomes limited by
HIPAA & other privacy
regulations, but crucial for
validation.
Increasing attention
being paid to data
stewardship and data
curation. Support still
a tough sell.
Reproducibility crisis?
More than 70% of
researchers have tried and
failed to reproduce
experiments.
More than half have failed
to reproduce their own
experiments.
Nature 2016 survey of researchers.
https://www.nature.com/news/1-500-scientists-lift-the-lid-on-reproducibility-1.19970
Life science ontologies and taxonomies
So many to choose from!
BioPortal https://bioportal.bioontology.org/
repository of biomedical ontologies has almost
800 ontologies, and mapping from ontologies
to I2B2 http://i2b2.bioontology.org/
Interdisciplinary work holds great
promise – and needs mapping of
terms between disciplines.
Pistoia Alliance Ontologies Mapping
https://www.pistoiaalliance.org/projects/current-projects/ontologies-mapping/
Data mapping also known as “data
wrangling” or “data munging”. Many
people trying to automate. Still
works in progress.
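As a rough illustration of what the simplest kind of term mapping looks like, here is a minimal sketch that matches in-house taxonomy labels against the labels and synonyms of an external vocabulary. Everything in it is an assumption made for the example: the identifiers (`EXT:0001`), the synonym lists, and the function names are hypothetical and do not come from BioPortal, the Pistoia Alliance project, or Disqover.

```python
def build_index(external_terms):
    """Map each lowercased label or synonym to its external identifier."""
    index = {}
    for ext_id, labels in external_terms.items():
        for label in labels:
            index[label.strip().lower()] = ext_id
    return index


def map_terms(in_house_terms, external_terms):
    """Return (mapped, unmapped): exact case-insensitive matches and the leftovers."""
    index = build_index(external_terms)
    mapped, unmapped = {}, []
    for term in in_house_terms:
        ext_id = index.get(term.strip().lower())
        if ext_id is None:
            unmapped.append(term)  # left for a human curator to review
        else:
            mapped[term] = ext_id
    return mapped, unmapped


# Hypothetical external vocabulary: identifier -> preferred label plus synonyms.
external = {
    "EXT:0001": ["bioinformatics", "computational biology"],
    "EXT:0002": ["clinical trial", "clinical study"],
}
in_house = ["Bioinformatics", "Clinical study", "Batteries"]

mapped, unmapped = map_terms(in_house, external)
print(mapped)
print(unmapped)
```

Exact matching like this covers only the easy cases. The unmapped remainder is where most of the manual wrangling effort goes.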
ROI Return On Investment & Cost Benefit
Cost of not having FAIR research data, PwC EU Services, 2018, European Union Publications.
https://publications.europa.eu/en/publication-detail/-/publication/d375368c-1a0a-11e9-8d04-01aa75ed71a1
Stakeholders may
balk at investing in
taxonomies or
ontologies. Software,
other IT & technology
considerations only
part of the issues.
Educating decision
makers is an
ongoing process,
even with CXOs who
value taxonomies
and ontologies.
Estimated cost
of not having FAIR
research data:
a minimum of 10.2
billion euros per
year.
Key insights
“…[T]here is a lot of work that needs doing
to prepare the data sets for these
technologies … there is a disproportionate
amount being invested in the technologies
as opposed to investing in ‘data-readiness’ … It's just not a slam dunk to
mash up a lot of data and think it will work.”
Life Science Leader 2019 March 1, “AI In Life Sciences: Seeing past the Hype” Francois Nicolas and comment by
Christy Wilson https://www.lifescienceleader.com/doc/ai-in-life-sciences-seeing-past-the-hype-0001
“The AI solution may help accelerate some tasks, but
human expertise may be required for the broad
scope of what is needed. Currently AI in healthcare is
in the second stage of the Gartner Hype Cycle: ‘the
peak of inflated expectation.’ However, if we don’t
allow it to catch up to the hype, it may fall back into
what Gartner calls the ‘trough of disillusionment.’”
Key takeaways
Don’t try to “boil the
ocean”. Prototype early and
often. Think modular.
• Pareto Principle 80/20
80% of effects come from
20% of effort.
Don’t try for 100%.
• Identify what your
stakeholders value.
Aim for quick wins.
Understand existing
workflows.
• Seek out allies and shared
buy-in for justification and
sustainability.
• Bundle stakeholders’ key
wants and items you know
they will eventually need.
Communicating ROI on
taxonomies, ontologies and
metadata is still challenging.
• Expectations and change
management are crucial
skills to cultivate.
• Report metrics, both quantitative
and qualitative.
• Recognize that some challenges are
not yet resolved by anyone.
Acknowledgments
Many people have participated in this ongoing project. I’m grateful for their work, insights and
encouragement.
Cambridge Innovation
Institute CII
& Cambridge Healthtech
• Phillips Kuhl, President
• Tonya Urquizo,
Knowledge Information
Services Analyst and IT
Liaison
• Sanaye Bartlett, Data
Analyst & Project Manager
• Kaushik Chaudhuri,
Director of Product
Marketing
CII Disqover Team
• Kaitlyn Barago,
Associate Conference
Producer
• Nancy Clarke, Data Scientist
• Mike Croft,
Software Architect
• Ben Lakin,
Director New Initiatives
• Jaime Parlee, Director
Marketing Analytics
• Craig Wohlers, Manager
Knowledge Foundation
OntoForce
• Hans Constandt, CEO &
Founder
• Filip Pattyn, Scientific Lead
• Carla Suijkerbuijk, Business
Development North America
• Niels Vanneste,
Customer Data Scientist
• Berenice Wulbrecht, Data
Science Director, Systems
Biology
Fruitful Conversations and
emails
• Ingrid Akerblom, IEA
Diversified Consulting
• Juliane Schneider, Lead
Data Curator, eagle-I,
Harvard Catalyst
• Jane Lomax,
Head Ontologist, SciBite
• Terence Russell,
Chief Technologist, IRODS
Consortium
• John Wilbanks,
Chief Commons Officer,
Sage Bionetworks

Editor's notes

  1. Key motivations for taxonomy changes were company acquisitions in new disciplines, and new data science hires.
  2. No easy answers. Issues around integrating internal and external ontologies. Starting to look into issues around ambiguity. Progress often seems to be three steps forward, one or two steps back.
  3. A colleague commented: “As science becomes ever more interdisciplinary, it is a huge challenge to map data on different granular levels but semantically link them across different languages, standards, and cultures.”
  4. An ontology colleague notes: “Institutions either underestimate the resources needed to do this work, or they are daunted by the entire prospect and researchers have to find repositories/help outside the institution to store and curate their data, if they bother to do so. Honestly, very little data will ever be reused.”
  5. Some resources for locating life science ontologies and mappings. BioPortal had 773 ontologies as of May 2019. Also relevant: graph-based ontologies, and open vs. proprietary ontologies. My in-house taxonomy tends to be narrow and deep. Some external taxonomies tend to be broad and shallow.
  6. PwC publication estimates time lost per year at 4.5 billion Euros; cost of storage at 5.3 billion Euros [only data from academic research, private sector data not available]; license cost at 360 million Euros [private sector data not available]. Interdisciplinary and potential economic growth impacts cannot be estimated reliably.
  7. People don’t always know what they want or will eventually need, and can have difficulty articulating their desires. Important to have an understanding of the challenges of the people whose problems you are trying to solve. If you ask them to change their workflow drastically, change will never happen. Don’t be too hard on yourself. Some of these are issues everyone else is still trying to figure out.
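
As a quick arithmetic check on note 6, the three components quoted there can be added up and compared with the 10.2 billion Euro minimum cited on the ROI slide. This is a sketch using only the figures in the note. The variable names are illustrative, and the assumption that the small remainder comes from cost items not itemized in the note is mine, not stated in the slides.

```python
# Components quoted in note 6, in billions of Euros per year.
components = {
    "time lost": 4.5,
    "storage": 5.3,
    "license cost": 0.36,  # 360 million Euros
}

headline_minimum = 10.2  # figure cited on the ROI slide

total = sum(components.values())
remainder = headline_minimum - total  # assumed: items not itemized in the note

print(f"Sum of listed components: {total:.2f} billion Euros")
print(f"Not itemized in the note: {remainder:.2f} billion Euros")
```

The listed components account for nearly all of the headline figure, with time lost and storage together making up the bulk of it.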