Workgroup 4 Meeting Report
Group leader: Deanna M. Church
Co-leader: Melissa Landrum
Meeting date: Jan 27 - Jan 28, 2014
Location: Stanford University

Executive Summary
Workgroup 4 is tasked with defining how users in the community interact with and
use the GIAB data. During the course of the two-day meeting we focused on aspects
of the user interface that need to be addressed. These topics include defining the
target audience, understanding how these various user groups will interface with the
tools and integrating with visualization tools, such as the GeT-RM browser. The goal
of this workgroup is to produce a specification document by the end of February
2014.

Detailed description
Defining the target audience
It is anticipated that a wide variety of users will want to interact with this data. A
prioritized list of users was proposed:
1. Regulators (FDA)
2. Accreditors (CLIA/CAP)
3. Clinical Labs
4. Platform Developers

Tool Development
There were five aspects of tool development that were discussed and need to be
addressed in the specification document.
1. Software development and licensing
Francisco de la Vega presented very nice software for comparing VCF files that
was developed by Real Time Genomics. While the software is freely available, it
is not open source; this led to a discussion of source code availability and
licensing. The workgroup unanimously agreed that software should be open
source. There was less clarity on the licensing requirements, but Nils Homer
volunteered to research license types and make recommendations in this
area.
2. Software interface
There was also unanimous agreement that software used to compare user
variant calls to GIAB datasets would need to be accessible via a web interface
and an API.
3. Inputs and outputs
The input and output formats need to be well defined in the specification
document. It is likely we will need some translation tools to help support the
web interface though, as many users of this interface may have difficulty
producing well-formatted VCF files. NCBI is building a suite of tools to handle
this problem.
4. Development cycles
We are likely better off getting tools out to the community sooner rather than
later so we can collect feedback early. This means we may need
to be prepared to throw away early versions of software if they don’t fully
meet our needs (which will be better defined as we get feedback from the
user community).
5. User feedback
It is critical to give users a mechanism for providing feedback on the
utility of the tool.
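
As an illustration of the input problem raised under "Inputs and outputs", the following is a minimal sketch of a pre-submission check on a single VCF data line. It is not part of any GIAB or NCBI tool; the function name check_vcf_record and the specific checks are assumptions made for this example, and it covers only a small part of what the VCF format requires (eight tab-separated fixed columns, an integer POS, a REF made of A/C/G/T/N).

```python
# Hypothetical helper for illustration only; not an existing GIAB or NCBI tool.
VCF_FIXED_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def check_vcf_record(line):
    """Return a list of problems found in one tab-delimited VCF data line."""
    problems = []
    fields = line.rstrip("\n").split("\t")
    if len(fields) < len(VCF_FIXED_COLUMNS):
        # Space-delimited files are a common mistake; VCF requires tabs.
        problems.append(
            f"expected at least {len(VCF_FIXED_COLUMNS)} tab-separated columns, "
            f"found {len(fields)}"
        )
        return problems
    chrom, pos, _id, ref, alt = fields[:5]
    if not chrom:
        problems.append("CHROM is empty")
    if not pos.isdigit():
        problems.append(f"POS must be a non-negative integer, got {pos!r}")
    if not ref or set(ref.upper()) - set("ACGTN"):
        problems.append(f"REF must contain only A, C, G, T or N, got {ref!r}")
    if not alt:
        problems.append("ALT is empty (use '.' when there is no alternate allele)")
    return problems


good = "chr1\t100\t.\tA\tG\t50\tPASS\t."
spaces = "chr1 100 . A G 50 PASS ."
print(check_vcf_record(good))    # no problems reported
print(check_vcf_record(spaces))  # reports the column-count problem
```

A web interface could run a check of this kind before accepting an upload and return the list of problems to the user, which also gives the translation tools a concrete target to aim for.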

Data Analysis
Much of the discussion focused on data analysis. For some aspects of analysis, there
was strong agreement:
• Users need to be able to provide a BED file of the regions analyzed so that
they are not overly penalized with false negative calls in regions of the
genome they did not analyze.
• Analysis needs to be performed at various levels depending on the users'
needs. For example, some users will only want to score variant calling, others
will want to score genotype calls and others may want to score phasing.
• It is likely we will need to support more than one ‘Truth set’, though a
reasonable default will need to be chosen.
   o It is critical to provide a mechanism that allows users to give feedback
   concerning problems or errors with the ‘Truth set’.
• We need to have crisp definitions of comparison terms, so that as different
developers begin developing software we can all communicate using the
same terms.
• We will need to support different analyses for different variant types.
   o We will need to support all variant types defined in the truth set. This
   means no SV/CNVs in phase 1.
   o We will not likely have the same level of support for complex variants
   as we do for substitution variants.
• We need clear definitions for sensitivity and specificity calculations.
• We need to provide users with concise summaries, but we also need to
provide very detailed analysis files.
• Ideally the software will produce files suitable for import into the GeT-RM
browser to facilitate manual review of the data.
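
The BED restriction and the need for agreed metric definitions can be made concrete with a small sketch. This is one possible reading, not a definition adopted by the workgroup: variants are compared by exact match on (chromosome, position, REF, ALT), sensitivity is taken as TP / (TP + FN), and precision, TP / (TP + FP), is shown in place of specificity because specificity needs a count of true negatives, which the report does not yet define. The function names and the toy data are assumptions for illustration.

```python
def parse_bed(lines):
    """Parse BED lines into {chrom: [(start, end), ...]}; BED is 0-based, half-open."""
    regions = {}
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        regions.setdefault(chrom, []).append((int(start), int(end)))
    return regions


def in_regions(regions, chrom, pos):
    """True if the 1-based position pos lies inside any BED interval for chrom."""
    zero_based = pos - 1
    # Linear scan keeps the sketch short; a real tool would use an interval index.
    return any(start <= zero_based < end for start, end in regions.get(chrom, []))


def score(truth, calls, regions):
    """Count TP/FN/FP using only variants inside the regions the user analyzed."""
    truth_in = {v for v in truth if in_regions(regions, v[0], v[1])}
    calls_in = {v for v in calls if in_regions(regions, v[0], v[1])}
    tp = len(truth_in & calls_in)
    fn = len(truth_in - calls_in)
    fp = len(calls_in - truth_in)
    return {
        "tp": tp,
        "fn": fn,
        "fp": fp,
        "sensitivity": tp / (tp + fn) if tp + fn else None,
        "precision": tp / (tp + fp) if tp + fp else None,
    }


# Toy data: variants are (chrom, 1-based pos, ref, alt) tuples.
truth = {("chr1", 101, "A", "G"), ("chr1", 250, "C", "T"), ("chr2", 50, "G", "A")}
calls = {("chr1", 101, "A", "G"), ("chr1", 180, "T", "C")}
regions = parse_bed(["chr1\t100\t200"])

result = score(truth, calls, regions)
print(result)
```

In this toy case the user only analyzed chr1:101-200, so the truth variants at chr1:250 and chr2:50 are not counted as false negatives; without the BED file they would be.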
There was a great deal of discussion about the best way to deal with complex
variants. While it is clear that there is no standard approach to complex
variant comparison, and that it is a very difficult problem, there was no strong
consensus about how important it is for this to be handled robustly in phase 1 of
implementing this software. This will need to be addressed more fully in the
requirements document.
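
One reason complex variant comparison is difficult is that the same sequence change can be written as different sets of VCF records. The sketch below illustrates this with a made-up reference and two representations of the same change, and compares them by applying each to the reference and checking the resulting sequences. This is only one possible approach, shown for illustration; the workgroup did not select a method, and the function name and data are assumptions.

```python
def apply_variants(reference, variants):
    """Apply non-overlapping (pos, ref, alt) variants (1-based pos) to a reference string."""
    pieces = []
    cursor = 0  # 0-based index of the next reference base not yet copied
    for pos, ref, alt in sorted(variants):
        start = pos - 1
        if start < cursor:
            raise ValueError("overlapping variants are not handled in this sketch")
        if reference[start:start + len(ref)] != ref:
            raise ValueError(f"REF allele {ref!r} does not match reference at {pos}")
        pieces.append(reference[cursor:start])  # unchanged bases before the variant
        pieces.append(alt)                      # the alternate allele
        cursor = start + len(ref)
    pieces.append(reference[cursor:])
    return "".join(pieces)


reference = "GATTACAGATTACA"  # made-up sequence

# Two ways of writing the same change at positions 4-6 (TAC -> CAT).
one_block = [(4, "TAC", "CAT")]
two_snvs = [(4, "T", "C"), (6, "C", "T")]

same_records = set(one_block) == set(two_snvs)
same_haplotype = apply_variants(reference, one_block) == apply_variants(reference, two_snvs)
print(same_records, same_haplotype)
```

A record-by-record comparison would score these two call sets as disagreeing, while a sequence-level comparison treats them as the same. Whichever behavior is chosen, it needs to be stated in the comparison definitions.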
