SlideShare a Scribd company logo
1 of 1
Download to read offline
Introduction
Next-generation sequencing technologies have led to a rapid
production of high-throughput sequence data characterizing
adaptive immune-receptor repertoires (AIRRs). As part of the
AIRR community (http://airr-community.org) data standards
working group, we have developed an initial set of metadata
recommendations for publishing AIRR sequencing studies.
These recommendations will be implemented in several
public repositories, including the NCBI sequence read archive
(SRA). Submissions to SRA typically use a flat-file template
and include only a minimal amount of term validation. In
order to ease the metadata authoring and to implement the
ontological terms validation of repertoire sequence data, we
are developing an interactive template through CEDAR
workbench that will allow for ontological validation, and
subsequent deposition in SRA. CEDAR workbench also allows
the user to populate the template with metadata for data
submission to various data repositories. The incorporation of
template-element level ontology mapping not only facilitates
validation of data submission, but also enables intelligent
queries within and across repositories.
High-quality Metadata and Challenges
High-quality metadata are seen as crucial to facilitate
knowledge discovery. The biomedical community has a
strong history of tackling metadata challenge by driving the
development of metadata templates. These templates focus
on addressing the reproducibility challenge by providing
detailed checklists of the metadata needed to describe
particular types of experimental data sources. The key goal is
to provide sufficient metadata to enable the source studies
to be reproduced. While individual metadata templates can
provide a standard format for a particular data source, they
rarely share common structure or semantics. There is also a
disconnect between the high-level checklist-based template
definitions developed by scientific communities and the
submission formats required by metadata repositories.
Moreover, different repositories provide their locally defined
templates for describing metadata. These templates lack the
use of common data elements and standard vocabularies.
This creates a barrier for sharing and using metadata to
enable knowledge discovery. We use CEDAR workbench to
create common templates for entering metadata. To
enhance machine readability, we use CEDAR’s capability to
link individual data elements and their values to ontology
concepts
AIRR Data Submission to SRA Leveraging
CEDAR Workbench
CEDAR is supported by grant U54 AI117925 awarded by the National Institute of Allergy and Infectious Diseases through funds provided by the trans-NIH Big Data toKnowledge (BD2K) initiative (www.bd2k.nih.gov).
Syed Ahmad Chan Bukhari1, Martin J. O'Connor2, John Graybeal2, Mark A. Musen2, Kei-Hoi Cheung3, Steven H. Kleinstein1
Leveraging the CEDAR Workbench for Ontology-linked Submission of Adaptive Immune
Receptor Repertoire Data to the Sequence Read Archive (SRA)
1Department of Pathology, Yale School of Medicine, New Haven, CT , 2Center for Expanded Data Annotation and Retrieval, Stanford Center for Biomedical
Informatics Research, Stanford University and 3Department of Emergency Medicine, Yale School of Medicine, New Haven, CT
Figure 1. Metadata Life Cycle in CEDAR Workbench
The CEDAR Workbench
The Center for Expanded Data Annotation and Retrieval is studying the creation of comprehensive and expressive metadata for biomedical
datasets to facilitate data discovery, data interpretation, and data reuse. CEDAR takes advantage of emerging community-based standard
templates for describing different kinds of biomedical datasets. CEDAR workbench investigates the use of computational techniques to help
investigators to assemble templates and to fill in their values. We are creating a repository of metadata from which we plan to identify
metadata patterns that will drive predictive data entry when filling in metadata templates. The metadata repository not only will capture
annotations specified when experimental datasets are initially created, but also will incorporate links to the published literature, including
secondary analyses and possible refinements or retractions of experimental interpretations. By working initially with the Human Immunology
Project Consortium and the developers of the ImmPort data repository, we are developing and evaluating an end-to-end solution to the
problems of metadata authoring and management that will generalize to other data-management environments.
CEDAR
CENTER FOR EXPANDED DATA
ANNOTATION AND RETRIEVAL
CEDAR
R EXPANDED DATA
ON AND RETRIEVAL
Minimal Standards WG Recommendations for The AIRR Sequencing Data
As high throughput experiments become more prevalent
in the field of Immunology and elsewhere, there is an
increased need for collective organization of data and
standardized methods of data reporting. No current
standards exist for adaptive immune receptor repertoire
sequencing data. Data and metadata formats need to be
harmonized so that data from different experiments can
be mined. Once recovered, the mined data need to have
sufficient descriptive metadata in order to be useful. To
fulfill these unmet needs, we propose a set of minimal
standards that we recommend journals adopt and that
could form the requirements for submission to a public
data repository:
1. The experimental study design including sample data
relationships (e.g., which raw data file(s) relate to
which sample, which samples are technical, which are
biological replicates).
2. The essential sample annotation including
experimental factors and their values (e.g., the set of
markers used to sort the cell population being
studied).
3. Sufficient	annotation	of	the	amplicon being	sequenced that	would	allow	the	
raw	data	to	be	transformed	into	the	processed	sequences	(e.g.,	barcodes,	
primers,	unique	molecular	identifiers).
4. The	raw	data for	each	sequencing	run	(e.g.,	FASTQ	files)
5. The	essential	laboratory	and	data	processing	protocols (e.g.,	software	tools	
with	version	numbers,	quality	thresholds,	 primer	match	cutoffs,	etc.)	that	
have	been	used	to	obtain	the	final	processed	data.	
6. The	final	processed	antigen	receptor	sequences for	the	set	of	samples	in	the	
experiment	(e.g.,	the	set	of	sequences	used	for	V(D)J	assignment),	 along	
with	the	V(D)J	assignments	for	each	sequence.
Figure 2. Overview of the six high-level principles and associated data elements that
comprise the AIRR standard draft agreed to at the second annualAIRR Community
meeting in 2016.
Figure 4. CEDAR Workbench to SRA Conversion Workflow
c
a b
Figure 3(a). ARR Minimal Standard Data Elements, 3(b) Ontology Controlled
Template Authoring Through CEDAR Workbench and 3(c) AIRR Data
Submission Template
CEDAR JSON-LD to SRA XML Converter Demo
References
1- Musen, Mark A., et al. "The center for expanded data annotation and retrieval." Journal of the American Medical
Informatics Association 22.6 (2015): 1148-1152.
2- Leinonen, Rasko, Hideaki Sugawara, and Martin Shumway. "The sequence read archive." Nucleic acids research (2010): gkq1019.
Acknowledgement: We acknowledge Dr. Ben Busby from NCBI for his valuable suggestions during this research
work.

More Related Content

What's hot

Finding and Reusing Biomedical Datasets using CEDAR Metadata Repository and T...
Finding and Reusing Biomedical Datasets using CEDAR Metadata Repository and T...Finding and Reusing Biomedical Datasets using CEDAR Metadata Repository and T...
Finding and Reusing Biomedical Datasets using CEDAR Metadata Repository and T...Syed Ahmad Chan Bukhari, PhD
 
A semantic framework for biomedical image discovery
A semantic framework for biomedical image discoveryA semantic framework for biomedical image discovery
A semantic framework for biomedical image discoverySyed Ahmad Chan Bukhari, PhD
 
Semantic enrichment and similarity approximation for biomedical sequence images
Semantic enrichment and similarity approximation for biomedical sequence imagesSemantic enrichment and similarity approximation for biomedical sequence images
Semantic enrichment and similarity approximation for biomedical sequence imagesSyed Ahmad Chan Bukhari, PhD
 
Being FAIR: Enabling Reproducible Data Science
Being FAIR: Enabling Reproducible Data ScienceBeing FAIR: Enabling Reproducible Data Science
Being FAIR: Enabling Reproducible Data ScienceCarole Goble
 
W3C HCLS Dataset Description Guidelines
W3C HCLS Dataset Description GuidelinesW3C HCLS Dataset Description Guidelines
W3C HCLS Dataset Description GuidelinesMichel Dumontier
 
Link Analysis of Life Sciences Linked Data
Link Analysis of Life Sciences Linked DataLink Analysis of Life Sciences Linked Data
Link Analysis of Life Sciences Linked DataMichel Dumontier
 
Using publicly available resources to build a comprehensive knowledgebase of ...
Using publicly available resources to build a comprehensive knowledgebase of ...Using publicly available resources to build a comprehensive knowledgebase of ...
Using publicly available resources to build a comprehensive knowledgebase of ...Valery Tkachenko
 
Nucl. Acids Res.-2014-Howe-nar-gku1244
Nucl. Acids Res.-2014-Howe-nar-gku1244Nucl. Acids Res.-2014-Howe-nar-gku1244
Nucl. Acids Res.-2014-Howe-nar-gku1244Yasel Cruz
 
ReVeaLD: A user-driven domain-specific interactive search platform for biomed...
ReVeaLD: A user-driven domain-specific interactive search platform for biomed...ReVeaLD: A user-driven domain-specific interactive search platform for biomed...
ReVeaLD: A user-driven domain-specific interactive search platform for biomed...Maulik Kamdar
 
Drug Repurposing using Deep Learning on Knowledge Graphs
Drug Repurposing using Deep Learning on Knowledge GraphsDrug Repurposing using Deep Learning on Knowledge Graphs
Drug Repurposing using Deep Learning on Knowledge GraphsDatabricks
 
What is Reproducibility? The R* brouhaha (and how Research Objects can help)
What is Reproducibility? The R* brouhaha (and how Research Objects can help)What is Reproducibility? The R* brouhaha (and how Research Objects can help)
What is Reproducibility? The R* brouhaha (and how Research Objects can help)Carole Goble
 
Being FAIR: FAIR data and model management SSBSS 2017 Summer School
Being FAIR:  FAIR data and model management SSBSS 2017 Summer SchoolBeing FAIR:  FAIR data and model management SSBSS 2017 Summer School
Being FAIR: FAIR data and model management SSBSS 2017 Summer SchoolCarole Goble
 
Data reuse and scholarly reward: understanding practice and building infrastr...
Data reuse and scholarly reward: understanding practice and building infrastr...Data reuse and scholarly reward: understanding practice and building infrastr...
Data reuse and scholarly reward: understanding practice and building infrastr...Todd Vision
 
Publishing data and code openly
Publishing data and code openlyPublishing data and code openly
Publishing data and code openlyFAIRDOM
 

What's hot (20)

Finding and Reusing Biomedical Datasets using CEDAR Metadata Repository and T...
Finding and Reusing Biomedical Datasets using CEDAR Metadata Repository and T...Finding and Reusing Biomedical Datasets using CEDAR Metadata Repository and T...
Finding and Reusing Biomedical Datasets using CEDAR Metadata Repository and T...
 
Canadian health census to lod
Canadian health census to lodCanadian health census to lod
Canadian health census to lod
 
A semantic framework for biomedical image discovery
A semantic framework for biomedical image discoveryA semantic framework for biomedical image discovery
A semantic framework for biomedical image discovery
 
Semantic enrichment and similarity approximation for biomedical sequence images
Semantic enrichment and similarity approximation for biomedical sequence imagesSemantic enrichment and similarity approximation for biomedical sequence images
Semantic enrichment and similarity approximation for biomedical sequence images
 
Being FAIR: Enabling Reproducible Data Science
Being FAIR: Enabling Reproducible Data ScienceBeing FAIR: Enabling Reproducible Data Science
Being FAIR: Enabling Reproducible Data Science
 
2016 bmdid-mappings
2016 bmdid-mappings2016 bmdid-mappings
2016 bmdid-mappings
 
W3C HCLS Dataset Description Guidelines
W3C HCLS Dataset Description GuidelinesW3C HCLS Dataset Description Guidelines
W3C HCLS Dataset Description Guidelines
 
Embracing Semantic Technology for Better Metadata Authoring in Biomedicine (S...
Embracing Semantic Technology for Better Metadata Authoring in Biomedicine (S...Embracing Semantic Technology for Better Metadata Authoring in Biomedicine (S...
Embracing Semantic Technology for Better Metadata Authoring in Biomedicine (S...
 
Metadata in the BioSample Online Repository are Impaired by Numerous Anomalie...
Metadata in the BioSample Online Repository are Impaired by Numerous Anomalie...Metadata in the BioSample Online Repository are Impaired by Numerous Anomalie...
Metadata in the BioSample Online Repository are Impaired by Numerous Anomalie...
 
Link Analysis of Life Sciences Linked Data
Link Analysis of Life Sciences Linked DataLink Analysis of Life Sciences Linked Data
Link Analysis of Life Sciences Linked Data
 
Using publicly available resources to build a comprehensive knowledgebase of ...
Using publicly available resources to build a comprehensive knowledgebase of ...Using publicly available resources to build a comprehensive knowledgebase of ...
Using publicly available resources to build a comprehensive knowledgebase of ...
 
Nucl. Acids Res.-2014-Howe-nar-gku1244
Nucl. Acids Res.-2014-Howe-nar-gku1244Nucl. Acids Res.-2014-Howe-nar-gku1244
Nucl. Acids Res.-2014-Howe-nar-gku1244
 
ReVeaLD: A user-driven domain-specific interactive search platform for biomed...
ReVeaLD: A user-driven domain-specific interactive search platform for biomed...ReVeaLD: A user-driven domain-specific interactive search platform for biomed...
ReVeaLD: A user-driven domain-specific interactive search platform for biomed...
 
Drug Repurposing using Deep Learning on Knowledge Graphs
Drug Repurposing using Deep Learning on Knowledge GraphsDrug Repurposing using Deep Learning on Knowledge Graphs
Drug Repurposing using Deep Learning on Knowledge Graphs
 
What is Reproducibility? The R* brouhaha (and how Research Objects can help)
What is Reproducibility? The R* brouhaha (and how Research Objects can help)What is Reproducibility? The R* brouhaha (and how Research Objects can help)
What is Reproducibility? The R* brouhaha (and how Research Objects can help)
 
Being FAIR: FAIR data and model management SSBSS 2017 Summer School
Being FAIR:  FAIR data and model management SSBSS 2017 Summer SchoolBeing FAIR:  FAIR data and model management SSBSS 2017 Summer School
Being FAIR: FAIR data and model management SSBSS 2017 Summer School
 
Data reuse and scholarly reward: understanding practice and building infrastr...
Data reuse and scholarly reward: understanding practice and building infrastr...Data reuse and scholarly reward: understanding practice and building infrastr...
Data reuse and scholarly reward: understanding practice and building infrastr...
 
Publishing data and code openly
Publishing data and code openlyPublishing data and code openly
Publishing data and code openly
 
DCC Keynote 2007
DCC Keynote 2007DCC Keynote 2007
DCC Keynote 2007
 
Phenoflow 2021
Phenoflow 2021Phenoflow 2021
Phenoflow 2021
 

Similar to Leveraging CEDAR Workbench for Ontology-linked Submission of AIRR Data

eTRIKS Data Harmonization Service Platform
eTRIKS Data Harmonization Service PlatformeTRIKS Data Harmonization Service Platform
eTRIKS Data Harmonization Service Platformibemam
 
Next-Generation Search Engines for Information Retrieval
Next-Generation Search Engines for Information RetrievalNext-Generation Search Engines for Information Retrieval
Next-Generation Search Engines for Information RetrievalWaqas Tariq
 
The swings and roundabouts of a decade of fun and games with Research Objects
The swings and roundabouts of a decade of fun and games with Research Objects The swings and roundabouts of a decade of fun and games with Research Objects
The swings and roundabouts of a decade of fun and games with Research Objects Carole Goble
 
PNNL April 2011 ogce
PNNL April 2011 ogcePNNL April 2011 ogce
PNNL April 2011 ogcemarpierc
 
Integrated research data management in the Structural Sciences
Integrated research data management in the Structural SciencesIntegrated research data management in the Structural Sciences
Integrated research data management in the Structural SciencesManjulaPatel
 
2014 genome informatics Linked Data
2014 genome informatics Linked Data2014 genome informatics Linked Data
2014 genome informatics Linked DataENCODE-DCC
 
Paper presentations: UK e-science AHM meeting, 2005
Paper presentations: UK e-science AHM meeting, 2005Paper presentations: UK e-science AHM meeting, 2005
Paper presentations: UK e-science AHM meeting, 2005Paolo Missier
 
Microarrays Databases.pptx
Microarrays Databases.pptxMicroarrays Databases.pptx
Microarrays Databases.pptxMuzzamilahmed14
 
Rescuing Data from Decaying and Moribund Clinical Information Systems
Rescuing Data from Decaying and Moribund Clinical Information SystemsRescuing Data from Decaying and Moribund Clinical Information Systems
Rescuing Data from Decaying and Moribund Clinical Information SystemsHealth Informatics New Zealand
 
FAIR Data Knowledge Graphs
FAIR Data Knowledge GraphsFAIR Data Knowledge Graphs
FAIR Data Knowledge GraphsTom Plasterer
 
A Generic Scientific Data Model and Ontology for Representation of Chemical Data
A Generic Scientific Data Model and Ontology for Representation of Chemical DataA Generic Scientific Data Model and Ontology for Representation of Chemical Data
A Generic Scientific Data Model and Ontology for Representation of Chemical DataStuart Chalk
 
FAIR Data Knowledge Graphs–from Theory to Practice
FAIR Data Knowledge Graphs–from Theory to PracticeFAIR Data Knowledge Graphs–from Theory to Practice
FAIR Data Knowledge Graphs–from Theory to PracticeTom Plasterer
 
FAIR sequencing data repository based on iRODS
FAIR sequencing data repository based on iRODSFAIR sequencing data repository based on iRODS
FAIR sequencing data repository based on iRODSFelipe Gutierrez
 
Standardization of the HIPC Data Templates: The Story So Far
Standardization of the HIPC Data Templates: The Story So FarStandardization of the HIPC Data Templates: The Story So Far
Standardization of the HIPC Data Templates: The Story So FarAhmad C. Bukhari
 
Stephen Friend CRUK-MD Anderson Cancer Workshop 2012-02-28
Stephen Friend CRUK-MD Anderson Cancer Workshop 2012-02-28Stephen Friend CRUK-MD Anderson Cancer Workshop 2012-02-28
Stephen Friend CRUK-MD Anderson Cancer Workshop 2012-02-28Sage Base
 
SooryaKiran Bioinformatics
SooryaKiran BioinformaticsSooryaKiran Bioinformatics
SooryaKiran Bioinformaticscontactsoorya
 

Similar to Leveraging CEDAR Workbench for Ontology-linked Submission of AIRR Data (20)

eTRIKS Data Harmonization Service Platform
eTRIKS Data Harmonization Service PlatformeTRIKS Data Harmonization Service Platform
eTRIKS Data Harmonization Service Platform
 
Next-Generation Search Engines for Information Retrieval
Next-Generation Search Engines for Information RetrievalNext-Generation Search Engines for Information Retrieval
Next-Generation Search Engines for Information Retrieval
 
The swings and roundabouts of a decade of fun and games with Research Objects
The swings and roundabouts of a decade of fun and games with Research Objects The swings and roundabouts of a decade of fun and games with Research Objects
The swings and roundabouts of a decade of fun and games with Research Objects
 
The CEDAR Workbench: An Ontology-Assisted Environment for Authoring Metadata ...
The CEDAR Workbench: An Ontology-Assisted Environment for Authoring Metadata ...The CEDAR Workbench: An Ontology-Assisted Environment for Authoring Metadata ...
The CEDAR Workbench: An Ontology-Assisted Environment for Authoring Metadata ...
 
PNNL April 2011 ogce
PNNL April 2011 ogcePNNL April 2011 ogce
PNNL April 2011 ogce
 
Integrated research data management in the Structural Sciences
Integrated research data management in the Structural SciencesIntegrated research data management in the Structural Sciences
Integrated research data management in the Structural Sciences
 
2014 genome informatics Linked Data
2014 genome informatics Linked Data2014 genome informatics Linked Data
2014 genome informatics Linked Data
 
Paper presentations: UK e-science AHM meeting, 2005
Paper presentations: UK e-science AHM meeting, 2005Paper presentations: UK e-science AHM meeting, 2005
Paper presentations: UK e-science AHM meeting, 2005
 
Microarrays Databases.pptx
Microarrays Databases.pptxMicroarrays Databases.pptx
Microarrays Databases.pptx
 
Rescuing Data from Decaying and Moribund Clinical Information Systems
Rescuing Data from Decaying and Moribund Clinical Information SystemsRescuing Data from Decaying and Moribund Clinical Information Systems
Rescuing Data from Decaying and Moribund Clinical Information Systems
 
FAIR Data Knowledge Graphs
FAIR Data Knowledge GraphsFAIR Data Knowledge Graphs
FAIR Data Knowledge Graphs
 
A Generic Scientific Data Model and Ontology for Representation of Chemical Data
A Generic Scientific Data Model and Ontology for Representation of Chemical DataA Generic Scientific Data Model and Ontology for Representation of Chemical Data
A Generic Scientific Data Model and Ontology for Representation of Chemical Data
 
FAIR Data Knowledge Graphs–from Theory to Practice
FAIR Data Knowledge Graphs–from Theory to PracticeFAIR Data Knowledge Graphs–from Theory to Practice
FAIR Data Knowledge Graphs–from Theory to Practice
 
CEDAR Technologies for AIRR Submissions
CEDAR Technologies for AIRR SubmissionsCEDAR Technologies for AIRR Submissions
CEDAR Technologies for AIRR Submissions
 
FAIR sequencing data repository based on iRODS
FAIR sequencing data repository based on iRODSFAIR sequencing data repository based on iRODS
FAIR sequencing data repository based on iRODS
 
Standardization of the HIPC Data Templates
Standardization of the HIPC Data TemplatesStandardization of the HIPC Data Templates
Standardization of the HIPC Data Templates
 
Standardization of the HIPC Data Templates: The Story So Far
Standardization of the HIPC Data Templates: The Story So FarStandardization of the HIPC Data Templates: The Story So Far
Standardization of the HIPC Data Templates: The Story So Far
 
Stephen Friend CRUK-MD Anderson Cancer Workshop 2012-02-28
Stephen Friend CRUK-MD Anderson Cancer Workshop 2012-02-28Stephen Friend CRUK-MD Anderson Cancer Workshop 2012-02-28
Stephen Friend CRUK-MD Anderson Cancer Workshop 2012-02-28
 
Data mining weka
Data mining wekaData mining weka
Data mining weka
 
SooryaKiran Bioinformatics
SooryaKiran BioinformaticsSooryaKiran Bioinformatics
SooryaKiran Bioinformatics
 

More from Syed Ahmad Chan Bukhari, PhD

CEDAR: Web-Based Tools for Accelerating the Creation of Standardized Metadata
CEDAR: Web-Based Tools for Accelerating the Creation of Standardized MetadataCEDAR: Web-Based Tools for Accelerating the Creation of Standardized Metadata
CEDAR: Web-Based Tools for Accelerating the Creation of Standardized MetadataSyed Ahmad Chan Bukhari, PhD
 
CAIRR: A pipeline to submit AIRR data to the NCBI through the CEDAR Workbench
CAIRR: A pipeline to submit AIRR data to the NCBI through the CEDAR WorkbenchCAIRR: A pipeline to submit AIRR data to the NCBI through the CEDAR Workbench
CAIRR: A pipeline to submit AIRR data to the NCBI through the CEDAR WorkbenchSyed Ahmad Chan Bukhari, PhD
 
BioNLP-SADI: A Suite of interoperable BioNLP Semantic Web Services based on S...
BioNLP-SADI: A Suite of interoperable BioNLP Semantic Web Services based on S...BioNLP-SADI: A Suite of interoperable BioNLP Semantic Web Services based on S...
BioNLP-SADI: A Suite of interoperable BioNLP Semantic Web Services based on S...Syed Ahmad Chan Bukhari, PhD
 
AN Intelligent Realtime multiple vessel collision risk assessment system
AN Intelligent Realtime multiple vessel collision risk assessment system AN Intelligent Realtime multiple vessel collision risk assessment system
AN Intelligent Realtime multiple vessel collision risk assessment system Syed Ahmad Chan Bukhari, PhD
 

More from Syed Ahmad Chan Bukhari, PhD (6)

CEDAR: Web-Based Tools for Accelerating the Creation of Standardized Metadata
CEDAR: Web-Based Tools for Accelerating the Creation of Standardized MetadataCEDAR: Web-Based Tools for Accelerating the Creation of Standardized Metadata
CEDAR: Web-Based Tools for Accelerating the Creation of Standardized Metadata
 
CAIRR: A pipeline to submit AIRR data to the NCBI through the CEDAR Workbench
CAIRR: A pipeline to submit AIRR data to the NCBI through the CEDAR WorkbenchCAIRR: A pipeline to submit AIRR data to the NCBI through the CEDAR Workbench
CAIRR: A pipeline to submit AIRR data to the NCBI through the CEDAR Workbench
 
BioNLP-SADI: A Suite of interoperable BioNLP Semantic Web Services based on S...
BioNLP-SADI: A Suite of interoperable BioNLP Semantic Web Services based on S...BioNLP-SADI: A Suite of interoperable BioNLP Semantic Web Services based on S...
BioNLP-SADI: A Suite of interoperable BioNLP Semantic Web Services based on S...
 
Type 2 fuzzy ontology ahmadchan
Type 2 fuzzy ontology ahmadchanType 2 fuzzy ontology ahmadchan
Type 2 fuzzy ontology ahmadchan
 
AN Intelligent Realtime multiple vessel collision risk assessment system
AN Intelligent Realtime multiple vessel collision risk assessment system AN Intelligent Realtime multiple vessel collision risk assessment system
AN Intelligent Realtime multiple vessel collision risk assessment system
 
Type-2 Fuzzy Ontology
Type-2 Fuzzy OntologyType-2 Fuzzy Ontology
Type-2 Fuzzy Ontology
 

Recently uploaded

AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfphamnguyenenglishnb
 
Karra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxKarra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxAshokKarra1
 
4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptxmary850239
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17Celine George
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designMIPLM
 
ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4MiaBumagat1
 
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxBarangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxCarlos105
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management systemChristalin Nelson
 
ACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfSpandanaRallapalli
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...JhezDiaz1
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Jisc
 
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYKayeClaireEstoconing
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management SystemChristalin Nelson
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatYousafMalik24
 
Judging the Relevance and worth of ideas part 2.pptx
Judging the Relevance  and worth of ideas part 2.pptxJudging the Relevance  and worth of ideas part 2.pptx
Judging the Relevance and worth of ideas part 2.pptxSherlyMaeNeri
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Celine George
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxthorishapillay1
 

Recently uploaded (20)

FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptxFINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
 
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
 
Karra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxKarra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptx
 
4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-design
 
ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4
 
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxBarangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
 
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptxYOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management system
 
ACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdf
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...
 
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management System
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice great
 
Judging the Relevance and worth of ideas part 2.pptx
Judging the Relevance  and worth of ideas part 2.pptxJudging the Relevance  and worth of ideas part 2.pptx
Judging the Relevance and worth of ideas part 2.pptx
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 

Leveraging CEDAR Workbench for Ontology-linked Submission of AIRR Data

  • 1. Introduction Next-generation sequencing technologies have led to a rapid production of high-throughput sequence data characterizing adaptive immune-receptor repertoires (AIRRs). As part of the AIRR community (http://airr-community.org) data standards working group, we have developed an initial set of metadata recommendations for publishing AIRR sequencing studies. These recommendations will be implemented in several public repositories, including the NCBI sequence read archive (SRA). Submissions to SRA typically use a flat-file template and include only a minimal amount of term validation. In order to ease the metadata authoring and to implement the ontological terms validation of repertoire sequence data, we are developing an interactive template through CEDAR workbench that will allow for ontological validation, and subsequent deposition in SRA. CEDAR workbench also allows the user to populate the template with metadata for data submission to various data repositories. The incorporation of template-element level ontology mapping not only facilitates validation of data submission, but also enables intelligent queries within and across repositories. High-quality Metadata and Challenges High-quality metadata are seen as crucial to facilitate knowledge discovery. The biomedical community has a strong history of tackling metadata challenge by driving the development of metadata templates. These templates focus on addressing the reproducibility challenge by providing detailed checklists of the metadata needed to describe particular types of experimental data sources. The key goal is to provide sufficient metadata to enable the source studies to be reproduced. While individual metadata templates can provide a standard format for a particular data source, they rarely share common structure or semantics. There is also a disconnect between the high-level checklist-based template definitions developed by scientific communities and the submission formats required by metadata repositories. Moreover, different repositories provide their locally defined templates for describing metadata. These templates lack the use of common data elements and standard vocabularies. This creates a barrier for sharing and using metadata to enable knowledge discovery. We use CEDAR workbench to create common templates for entering metadata. To enhance machine readability, we use CEDAR’s capability to link individual data elements and their values to ontology concepts AIRR Data Submission to SRA Leveraging CEDAR Workbench CEDAR is supported by grant U54 AI117925 awarded by the National Institute of Allergy and Infectious Diseases through funds provided by the trans-NIH Big Data toKnowledge (BD2K) initiative (www.bd2k.nih.gov). Syed Ahmad Chan Bukhari1, Martin J. O'Connor2, John Graybeal2, Mark A. Musen2, Kei-Hoi Cheung3, Steven H. Kleinstein1 Leveraging the CEDAR Workbench for Ontology-linked Submission of Adaptive Immune Receptor Repertoire Data to the Sequence Read Archive (SRA) 1Department of Pathology, Yale School of Medicine, New Haven, CT , 2Center for Expanded Data Annotation and Retrieval, Stanford Center for Biomedical Informatics Research, Stanford University and 3Department of Emergency Medicine, Yale School of Medicine, New Haven, CT Figure 1. Metadata Life Cycle in CEDAR Workbench The CEDAR Workbench The Center for Expanded Data Annotation and Retrieval is studying the creation of comprehensive and expressive metadata for biomedical datasets to facilitate data discovery, data interpretation, and data reuse. CEDAR takes advantage of emerging community-based standard templates for describing different kinds of biomedical datasets. CEDAR workbench investigates the use of computational techniques to help investigators to assemble templates and to fill in their values. We are creating a repository of metadata from which we plan to identify metadata patterns that will drive predictive data entry when filling in metadata templates. The metadata repository not only will capture annotations specified when experimental datasets are initially created, but also will incorporate links to the published literature, including secondary analyses and possible refinements or retractions of experimental interpretations. By working initially with the Human Immunology Project Consortium and the developers of the ImmPort data repository, we are developing and evaluating an end-to-end solution to the problems of metadata authoring and management that will generalize to other data-management environments. CEDAR CENTER FOR EXPANDED DATA ANNOTATION AND RETRIEVAL CEDAR R EXPANDED DATA ON AND RETRIEVAL Minimal Standards WG Recommendations for The AIRR Sequencing Data As high throughput experiments become more prevalent in the field of Immunology and elsewhere, there is an increased need for collective organization of data and standardized methods of data reporting. No current standards exist for adaptive immune receptor repertoire sequencing data. Data and metadata formats need to be harmonized so that data from different experiments can be mined. Once recovered, the mined data need to have sufficient descriptive metadata in order to be useful. To fulfill these unmet needs, we propose a set of minimal standards that we recommend journals adopt and that could form the requirements for submission to a public data repository: 1. The experimental study design including sample data relationships (e.g., which raw data file(s) relate to which sample, which samples are technical, which are biological replicates). 2. The essential sample annotation including experimental factors and their values (e.g., the set of markers used to sort the cell population being studied). 3. Sufficient annotation of the amplicon being sequenced that would allow the raw data to be transformed into the processed sequences (e.g., barcodes, primers, unique molecular identifiers). 4. The raw data for each sequencing run (e.g., FASTQ files) 5. The essential laboratory and data processing protocols (e.g., software tools with version numbers, quality thresholds, primer match cutoffs, etc.) that have been used to obtain the final processed data. 6. The final processed antigen receptor sequences for the set of samples in the experiment (e.g., the set of sequences used for V(D)J assignment), along with the V(D)J assignments for each sequence. Figure 2. Overview of the six high-level principles and associated data elements that comprise the AIRR standard draft agreed to at the second annualAIRR Community meeting in 2016. Figure 4. CEDAR Workbench to SRA Conversion Workflow c a b Figure 3(a). ARR Minimal Standard Data Elements, 3(b) Ontology Controlled Template Authoring Through CEDAR Workbench and 3(c) AIRR Data Submission Template CEDAR JSON-LD to SRA XML Converter Demo References 1- Musen, Mark A., et al. "The center for expanded data annotation and retrieval." Journal of the American Medical Informatics Association 22.6 (2015): 1148-1152. 2- Leinonen, Rasko, Hideaki Sugawara, and Martin Shumway. "The sequence read archive." Nucleic acids research (2010): gkq1019. Acknowledgement: We acknowledge Dr. Ben Busby from NCBI for his valuable suggestions during this research work.