Cancer researchers routinely use High Throughput Sequencing (HTS), but its uptake into the clinical environment has been slow, mainly because of sequencing costs and the complexity of data analysis. The former issue is being addressed by rapid improvements in HTS technology, but data complexity and the problems associated with data analysis remain a serious impediment to the wider use of genomics in clinical pathology.
The Peter MacCallum Cancer Centre is the largest cancer hospital in Australia and has implemented HTS as a routine assay for tumour and germline samples. To resolve the clinical sequencing analysis bottleneck, we have developed PathOS, a web application that streamlines the clinical curation of HTS-generated cancer variants. Integrating every step of the molecular pathology pipeline, from sequencer through to clinical report, provides a ‘curation workbench’ that structures the workflows needed to manage HTS data. PathOS is now the internal curation system for the hospital’s Molecular Pathology laboratory.
The lab performs routine sequencing of blood and tumour samples from both hospital in-patients and external clients using Illumina HiSeq and MiSeq instruments. Raw sequence data are automatically processed by an in-house bioinformatics pipeline to identify variants. Raw variants are then loaded into PathOS, where they are normalised, filtered and matched against previously curated variants and external databases. Stringent filtering, tailored to the assay method, is applied to raw variants: the scientist who creates an HTS assay describes its filters in a flexible domain-specific language (DSL). Sequencing QC data for runs, samples and variants are stored and displayed as interactive charts. Integration of the IGV browser through a web server lets the user directly visualise the reads giving rise to a variant.
PathOS was designed with accreditation for clinical testing in mind: a full audit trail of the source data and external evidence used to classify variants allows the pedigree of any clinical data to be ascertained. After curation, variants can be exported to a file to support external curation systems such as LOVD. PathOS generates clinical reports as PDF or Word documents from MS-Word templates, giving the reporting scientist maximum flexibility over presentation. Reclassification of a previously reported variant causes PathOS to flag the reports that need to be amended.
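The filtering step can be pictured with a minimal sketch. The actual PathOS filter DSL is not described here, so the one-rule-per-line syntax (`field op number`), the field names `depth` and `vaf`, the thresholds, and the helpers `parse_filter` and `apply_filter` are all hypothetical; the point is only that a scientist writes an assay's filters as text and the system turns them into checks applied to every raw variant.

```python
import operator
import re
from dataclasses import dataclass

# Hypothetical rule syntax: one "<field> <op> <number>" rule per line, "#" starts a comment.
OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
    "==": operator.eq,
    "!=": operator.ne,
}
RULE_RE = re.compile(r"^\s*(\w+)\s*(>=|<=|==|!=|>|<)\s*([-+]?\d+(?:\.\d+)?)\s*$")


@dataclass(frozen=True)
class Rule:
    field: str
    op: str
    value: float

    def passes(self, variant: dict) -> bool:
        # A variant lacking the field fails the rule instead of raising.
        if self.field not in variant:
            return False
        return OPS[self.op](variant[self.field], self.value)


def parse_filter(text: str) -> list:
    """Turn the filter text for an assay into a list of Rule objects."""
    rules = []
    for line in text.splitlines():
        line = line.split("#", 1)[0]
        if not line.strip():
            continue
        match = RULE_RE.match(line)
        if match is None:
            raise ValueError(f"cannot parse filter rule: {line!r}")
        field_name, op, value = match.groups()
        rules.append(Rule(field_name, op, float(value)))
    return rules


def apply_filter(rules: list, variants: list) -> list:
    """Keep only the variants that pass every rule."""
    return [v for v in variants if all(rule.passes(v) for rule in rules)]


# Hypothetical thresholds for one assay.
assay_filter = parse_filter("""
depth >= 100   # minimum read depth
vaf >= 0.02    # minimum variant allele frequency
""")
variants = [
    {"id": "v1", "depth": 850, "vaf": 0.12},
    {"id": "v2", "depth": 40, "vaf": 0.30},
    {"id": "v3", "depth": 600, "vaf": 0.01},
]
kept = apply_filter(assay_filter, variants)
print([v["id"] for v in kept])  # ['v1']
```

Keeping the rules as plain text means each assay's filter can be versioned and audited alongside the assay itself, which suits an accredited setting.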
To validate PathOS, we used the Cancer 2015 study, a large-scale, prospective, longitudinal, multi-site cohort study of cancer in the Victorian population. As at January 2014, 769 patients had been sampled, generating 141,657 unfiltered variants. PathOS has provided the pipeline framework, repository and curation for these data. Of these variants, 1,618 were automatically flagged as inferred deleterious mutations.
4. What we do
• Peter MacCallum Cancer Centre
– Molecular Pathology Department
• Provide pathology services to the hospital and external labs
• Blood and tumour tissue samples
• Targeted genetic sequencing using amplicon panels
• Between 4 and 50 cancer-specific genes
• Looking for needles in haystacks
• Very sensitive assays
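Finding the needle comes down to how small a variant allele frequency (VAF) can be trusted at a given read depth. The sketch below assumes a simple model (depth = reference reads + variant reads, ignoring any third allele) and uses hypothetical thresholds and a hypothetical helper `is_reportable`; it is not the Peter Mac pipeline's calling logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlleleCounts:
    ref_reads: int  # reads supporting the reference allele
    alt_reads: int  # reads supporting the variant allele

    @property
    def depth(self) -> int:
        # Simplification: reads carrying any third allele are ignored.
        return self.ref_reads + self.alt_reads

    @property
    def vaf(self) -> float:
        """Variant allele frequency: fraction of reads at the site carrying the variant."""
        return self.alt_reads / self.depth if self.depth else 0.0


def is_reportable(counts: AlleleCounts, min_depth: int, min_vaf: float) -> bool:
    """Hypothetical rule: a low-VAF call is trusted only when the site is deeply covered."""
    return counts.depth >= min_depth and counts.vaf >= min_vaf


# One variant read in every fifty: a needle in the haystack.
site = AlleleCounts(ref_reads=1960, alt_reads=40)
print(site.depth, site.vaf)                               # 2000 0.02
print(is_reportable(site, min_depth=1000, min_vaf=0.01))  # True
```

The same 2% signal at a depth of 50 would rest on a single read, which is why amplicon panels are sequenced deeply.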
[Slide graphic: a block of raw DNA sequence (A/C/G/T) illustrating the search for a few variants in a large amount of sequence]
22 May 2014 HVP5 Path-OS 4
5. Peter Mac Curation Scope
• Automate the processing from sequencer to draft report
• Automate curation evidence collection
• Sanitise data from external sources
• Automated reporting
• Best practice software engineering
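Sanitising external data largely means bringing every record into one canonical form before it is matched against curated variants. A minimal sketch, assuming a hypothetical canonical form (bare chromosome names with `MT` for the mitochondrial genome, integer positions, upper-case alleles); `normalise_chrom` and `normalise_record` are illustrative names, not PathOS functions.

```python
import re

# Hypothetical canonical form: bare names "1".."22", "X", "Y", and "MT" for mitochondria.
_CHROM_RE = re.compile(r"^(?:chr)?([1-9]|1[0-9]|2[0-2]|X|Y|M|MT)$", re.IGNORECASE)


def normalise_chrom(raw: str) -> str:
    """Map 'chr7', 'Chr7' or '7' to '7', and 'chrM' to 'MT'."""
    match = _CHROM_RE.match(raw.strip())
    if match is None:
        raise ValueError(f"unrecognised chromosome name: {raw!r}")
    name = match.group(1).upper()
    return "MT" if name == "M" else name


def normalise_record(raw: dict) -> dict:
    """Return a cleaned copy of one externally sourced variant record."""
    gene = (raw.get("gene") or "").strip()
    return {
        "chrom": normalise_chrom(raw["chrom"]),
        "pos": int(str(raw["pos"]).replace(",", "")),  # tolerate thousands separators
        "ref": raw["ref"].strip().upper(),
        "alt": raw["alt"].strip().upper(),
        # Gene symbols keep their case (some HGNC symbols, e.g. "orf" genes, are mixed case).
        "gene": gene or None,
    }


cleaned = normalise_record(
    {"chrom": "chr7", "pos": "140,453,136", "ref": "a", "alt": "t ", "gene": " BRAF"}
)
print(cleaned)
# {'chrom': '7', 'pos': 140453136, 'ref': 'A', 'alt': 'T', 'gene': 'BRAF'}
```

Rejecting unrecognised names with an error, rather than passing them through, keeps bad source data out of the curated repository.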
7. Path-OS Overview
• Patient sample and Genologics wet lab LIMS
• Sequencers
• Pipeline, with pipeline configuration
– Pipeline data repository: FASTQ, BAM, VCF, VEP
– Pipeline validation QC: read QC, synthetic reads, known samples
• Loader (ETL) and PipeCleaner
– ETL configuration and filtering configuration
• External variant DBs: COSMIC, Ensembl, Annovar, UCSC, ClinVar, etc.
– Periodic DB download and integration
• PathOS web server
– Sequencing QC
– Clinical reporting
– Export of curated variants to global repositories
• Users: molecular scientists, clinicians, researchers
• Hospital records
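The Loader's first job is reading the pipeline's VCF output. This sketch parses the eight fixed columns of a VCF data line (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO) as laid out in the VCF specification; it ignores FORMAT and per-sample columns and is not the PathOS loader itself. `VcfRecord`, `parse_vcf_line` and `load_vcf` are illustrative names, and the sample line is made up.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Iterator, List, Optional, Union


@dataclass
class VcfRecord:
    chrom: str
    pos: int
    id: str
    ref: str
    alts: List[str]
    qual: Optional[float]
    filters: List[str]  # empty when FILTER is PASS or missing
    info: Dict[str, Union[str, bool]] = field(default_factory=dict)


def parse_info(text: str) -> Dict[str, Union[str, bool]]:
    """Split INFO 'KEY=VALUE;FLAG' pairs; flags without '=' become True."""
    if text == ".":
        return {}
    info: Dict[str, Union[str, bool]] = {}
    for item in text.split(";"):
        key, sep, value = item.partition("=")
        info[key] = value if sep else True
    return info


def parse_vcf_line(line: str) -> VcfRecord:
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 8:
        raise ValueError("a VCF data line needs at least 8 tab-separated columns")
    chrom, pos, vid, ref, alt, qual, flt, info = cols[:8]
    return VcfRecord(
        chrom=chrom,
        pos=int(pos),
        id=vid,
        ref=ref,
        alts=alt.split(","),  # multi-allelic sites list several ALT alleles
        qual=None if qual == "." else float(qual),
        filters=[] if flt in (".", "PASS") else flt.split(";"),
        info=parse_info(info),
    )


def load_vcf(lines: Iterable[str]) -> Iterator[VcfRecord]:
    """Yield one record per data line, skipping '##' meta lines and the '#CHROM' header."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        yield parse_vcf_line(line)


sample = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "7\t140453136\t.\tA\tT\t50\tPASS\tDP=812;SOMATIC\n",
]
rec = next(load_vcf(sample))
print(rec.pos, rec.alts, rec.info)  # 140453136 ['T'] {'DP': '812', 'SOMATIC': True}
```

A generator keeps memory flat for large runs; normalisation and filtering can then be chained onto the records one at a time.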
8. Run QC
• This run in the context of past runs of the same panel
• Per-sample read yield, highlighting below-average samples
• Amplicon performance: read distribution
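The three QC views on this slide reduce to simple statistics. A sketch under stated assumptions: "below average" is taken to mean below this run's mean sample yield, the run is placed in context with a z-score against past runs of the same panel, and amplicon uniformity uses a hypothetical rule (share of amplicons with at least 20% of the mean amplicon read count). All function names and example numbers are illustrative.

```python
from statistics import mean, stdev


def below_average_samples(yields: dict) -> list:
    """Samples whose read yield falls below this run's mean sample yield."""
    avg = mean(yields.values())
    return sorted(sample for sample, reads in yields.items() if reads < avg)


def run_z_score(run_total: int, past_totals: list) -> float:
    """Standard deviations between this run's total yield and past runs of the same panel."""
    if len(past_totals) < 2:
        raise ValueError("need at least two past runs for a standard deviation")
    sd = stdev(past_totals)
    return (run_total - mean(past_totals)) / sd if sd else 0.0


def amplicon_uniformity(amplicon_reads: dict, fraction: float = 0.2) -> float:
    """Hypothetical metric: share of amplicons with >= fraction x the mean amplicon read count."""
    threshold = fraction * mean(amplicon_reads.values())
    covered = sum(1 for reads in amplicon_reads.values() if reads >= threshold)
    return covered / len(amplicon_reads)


yields = {"S1": 410_000, "S2": 395_000, "S3": 120_000, "S4": 430_000}
print(below_average_samples(yields))                                         # ['S3']
print(round(run_z_score(1_355_000, [1_300_000, 1_400_000, 1_350_000]), 2))  # 0.1
print(amplicon_uniformity({"A1": 1000, "A2": 900, "A3": 100, "A4": 1100}))  # 0.75
```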
9. Classification Page
• Automatically generated classification
• Justification free-text field
• Check boxes for variant evidence
• Evidence type tool tip
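Saving a classification from this page has two side effects described in the abstract: the change enters the audit trail, and reports that carried the old class are flagged for amendment. A minimal in-memory sketch; `CurationStore`, `Report` and `reclassify` are hypothetical names, and a real system would persist all of this in its database.

```python
from dataclasses import dataclass, field


@dataclass
class Report:
    report_id: str
    # Variant key -> class as it appeared in the signed-out report, e.g. "C3".
    reported_classes: dict
    needs_amendment: bool = False


@dataclass
class CurationStore:
    current_class: dict = field(default_factory=dict)
    reports: list = field(default_factory=list)
    # Append-only audit log: (variant, old class, new class, justification).
    audit: list = field(default_factory=list)


def reclassify(store: CurationStore, variant: str, new_class: str, justification: str) -> list:
    """Record a new class and flag every report that now disagrees with it."""
    old_class = store.current_class.get(variant)
    store.current_class[variant] = new_class
    store.audit.append((variant, old_class, new_class, justification))
    flagged = []
    for report in store.reports:
        reported = report.reported_classes.get(variant)
        if reported is not None and reported != new_class:
            report.needs_amendment = True
            flagged.append(report.report_id)
    return flagged


store = CurationStore(
    current_class={"v1": "C3", "v2": "C4"},
    reports=[Report("R1", {"v1": "C3"}), Report("R2", {"v2": "C4"})],
)
print(reclassify(store, "v1", "C4", "new functional evidence"))  # ['R1']
print(reclassify(store, "v2", "C4", "re-reviewed, no change"))   # []
```

Comparing against the class stored in each report, rather than the previous current class, means a report is flagged only when what the clinician actually received is now out of date.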
10. Classifying variants for the clinic
5 Level Classification
• C5: Pathogenic
• C4: Likely pathogenic
• C3: Unknown pathogenicity
• C2: Unlikely pathogenic
• C1: Not pathogenic
Criteria
• Pathogenic evidence: stand alone, strong, supporting
• Benign evidence: stand alone, strong, supporting
[Slide graphic: rules combining pathogenic and benign evidence of each strength into a class, with a final rule for all other combinations]
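The automatically generated classification turns ticked evidence into one of the five classes. The slide's actual combination rules did not survive in this text, so the rule table below is entirely hypothetical, as is the assumption that "all other combinations" default to C3; the sketch only shows the shape of an ordered, first-match rule table. `Evidence`, `RULES` and `classify` are illustrative names.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Evidence:
    """Counts of ticked evidence criteria at each strength."""
    path_standalone: int = 0
    path_strong: int = 0
    path_supporting: int = 0
    benign_standalone: int = 0
    benign_strong: int = 0
    benign_supporting: int = 0

    def has_pathogenic(self) -> bool:
        return self.path_standalone + self.path_strong + self.path_supporting > 0

    def has_benign(self) -> bool:
        return self.benign_standalone + self.benign_strong + self.benign_supporting > 0


Rule = Tuple[str, Callable[[Evidence], bool]]

# HYPOTHETICAL rule table, checked in order; the first matching rule wins.
RULES: List[Rule] = [
    ("C3", lambda e: e.has_pathogenic() and e.has_benign()),  # conflicting evidence
    ("C5", lambda e: e.path_standalone >= 1 or e.path_strong >= 2),
    ("C4", lambda e: e.path_strong >= 1 and e.path_supporting >= 1),
    ("C1", lambda e: e.benign_standalone >= 1 or e.benign_strong >= 2),
    ("C2", lambda e: e.benign_strong >= 1 and e.benign_supporting >= 1),
]
DEFAULT_CLASS = "C3"  # assumption: "all other combinations" -> unknown pathogenicity


def classify(evidence: Evidence, rules: List[Rule] = RULES) -> str:
    for label, matches in rules:
        if matches(evidence):
            return label
    return DEFAULT_CLASS


print(classify(Evidence(path_strong=2)))                     # C5
print(classify(Evidence(path_strong=1, path_supporting=2)))  # C4
print(classify(Evidence(path_supporting=1)))                 # C3
```

Keeping the rules as data rather than nested conditionals makes the mapping easy to review against a written classification scheme, and the default class makes the catch-all explicit.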
11. Software Components
• Language: Groovy – Java on steroids, a powerful JVM language
• Web framework: Grails – rich, high-productivity Groovy-based framework
• Code repository: GitLab – self-hosted, GitHub-like private repository
• Database: MySQL – widely adopted relational database with good performance
• User interactivity: JavaScript plugins – leverage the best available JS libraries, e.g. jQuery, Google Charts
• Object persistence: Hibernate – widely used Java library for mapping POJOs to a relational database
• Searching: Lucene – full-featured text search engine
• IoC layer: Spring – widely used Java framework for inversion of control
• IDE: IntelliJ – comprehensive development environment for Java and other languages
• Build management: Gradle – Groovy-based DSL that leverages Ant and convention over configuration (CoC)
• DB migration management: LiquiBase – DSL-based data migration tool for schema versioning
• Issue management: Jira – best-of-breed issue tracker
• LIMS: GenoLogics – user-friendly LIMS for NGS
• Aligner: Primal – Peter Mac in-house aligner, tuned for amplicons
• Variant caller: VarScan 2 – suitable for somatic and germline calling (for now)
• Annotation: Ensembl, Annovar – rich set of annotations for multiple transcripts
13. Somatic Panel
[Chart: somatic panel variants by gene type (oncogenes, tumour suppressors) and consequence type (other, missense, frame shift, splice site, stop gained)]
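A chart like this is a cross-tabulation of variants by gene type and consequence type. The sketch maps Sequence Ontology consequence terms, as reported by VEP, onto the chart's categories; the gene-type lookup is a hypothetical, deliberately short list for illustration, and `chart_counts` is an illustrative name.

```python
from collections import Counter

# Sequence Ontology consequence terms (as reported by VEP) grouped into the chart's categories.
CONSEQUENCE_GROUP = {
    "missense_variant": "Missense",
    "frameshift_variant": "Frame shift",
    "splice_acceptor_variant": "Splice site",
    "splice_donor_variant": "Splice site",
    "stop_gained": "Stop gained",
}

# Hypothetical, deliberately short gene-type lookup for illustration.
GENE_TYPE = {
    "KRAS": "Oncogenes",
    "BRAF": "Oncogenes",
    "TP53": "Tumour suppressors",
    "PTEN": "Tumour suppressors",
}


def chart_counts(variants: list) -> Counter:
    """Count variants per (gene type, consequence group) pair."""
    counts: Counter = Counter()
    for v in variants:
        gene_type = GENE_TYPE.get(v["gene"], "Unlisted")
        group = CONSEQUENCE_GROUP.get(v["consequence"], "Other")
        counts[(gene_type, group)] += 1
    return counts


variants = [
    {"gene": "KRAS", "consequence": "missense_variant"},
    {"gene": "KRAS", "consequence": "missense_variant"},
    {"gene": "TP53", "consequence": "stop_gained"},
    {"gene": "TP53", "consequence": "splice_donor_variant"},
    {"gene": "PTEN", "consequence": "synonymous_variant"},
]
for (gene_type, group), n in sorted(chart_counts(variants).items()):
    print(gene_type, group, n)
```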