The number of genetic variant annotations has grown explosively with recent technological advances. However, their fragmentation across many data silos makes them frustrating and inefficient to use. We created a platform, called MyVariant.info (http://myvariant.info), to aggregate variant-specific annotations from community resources and provide high-performance programmatic access. Annotations from each resource are first converted into JSON-based objects whose id field is the variant's canonical name in HGVS nomenclature (genomic DNA based). This scheme allows all annotations relevant to a unique variant to be merged into a single annotation object. A high-performance, scalable query engine indexes the merged annotation objects and provides programmatic access for developers. MyVariant.info decouples two fundamental steps in the management of variant annotations: the creation and maintenance of centralized web services (which requires deep software-engineering expertise), and the structuring of biological annotations (which requires broad community effort). Annotation providers from the community can supply data parsers that convert their raw data into JSON-compatible objects; the only requirement is that each object uses a valid HGVS name as its id field. These data then become queryable through the query engine we built. Data providers don't have to build their own query infrastructure, and the research community doesn't have to learn another query interface to access new annotations.
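A minimal sketch of the id-based merge described above, written in R to match the client example later in this deck. It assumes each source has already been parsed into a named list keyed by HGVS id; `merge_by_id`, the source names, and the annotation values are hypothetical placeholders, not MyVariant.info's actual build code.

```r
# Hypothetical sketch: merge per-source annotations into one object per variant.
# `sources` is a named list: source name -> list of records keyed by HGVS id.
# Each merged object carries the shared `_id` plus one field per source.
merge_by_id <- function(sources) {
  merged <- list()
  for (src in names(sources)) {
    for (id in names(sources[[src]])) {
      if (is.null(merged[[id]])) merged[[id]] <- list(`_id` = id)
      merged[[id]][[src]] <- sources[[src]][[id]]
    }
  }
  merged
}

# Toy input: placeholder values, not real annotations.
sources <- list(
  dbsnp = list("chr1:g.35367G>A" = list(rsid = "rs_placeholder")),
  cadd  = list("chr1:g.35367G>A" = list(phred = 1.5),
               "chr12:g.111351981C>T" = list(phred = 20))
)
merged <- merge_by_id(sources)
str(merged[["chr1:g.35367G>A"]])
```

Because every source keys its records by the same canonical id, a variant present in only one source simply ends up with one source field, and adding a new source never requires changing the others.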
MyVariant.info: Community-Aggregated Variant Annotation as a Service (NGS2016, Barcelona)
1. Jiwen (Kevin) Xin, Cyrus Afrasiabi, Sean D. Mooney, Andrew I. Su, Chunlei Wu
kevinxin@scripps.edu
The Scripps Research Institute
La Jolla, CA, USA
NGS 2016
04/05/2016
MyVariant.info
Community-aggregated Variant Annotations As a Service
5. MyVariant.info for the end users:
http://MyVariant.info
(currently v1 API, two endpoints)
http://MyVariant.info/v1/query?q=<query>
any query term(s) → matching variant hits
http://MyVariant.info/v1/variant/<variantid>
HGVS id(s) → matching variant object(s)
Both endpoints support batch mode via POST.
Simple API. No sign-up. No API key.
Try our live API and documentation.
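The two endpoints can be called from any HTTP client. The base-R sketch below only builds the request strings: a GET URL for /v1/variant/<variantid> (percent-encoding the `>` in an HGVS id) and a form-encoded body for a batch POST. The form field name `ids` reflects our reading of the v1 documentation and should be treated as an assumption; the helper names are hypothetical.

```r
# Hypothetical helpers; they only build strings and do not contact the server.
mv_base <- "http://myvariant.info/v1"

# GET URL for a single variant, with an optional vector of fields to return.
# reserved = FALSE keeps ':' but encodes '>' as %3E.
mv_variant_url <- function(id, fields = NULL) {
  url <- paste0(mv_base, "/variant/", utils::URLencode(id, reserved = FALSE))
  if (length(fields) > 0) url <- paste0(url, "?fields=", paste(fields, collapse = ","))
  url
}

# Form-encoded body for a batch POST to /v1/variant (assumed field name: ids).
# reserved = TRUE also encodes ':', ',' and '>' inside the value.
mv_batch_body <- function(ids) {
  paste0("ids=", utils::URLencode(paste(ids, collapse = ","), reserved = TRUE))
}

mv_variant_url("chr12:g.111351981C>T", c("clinvar.rsid", "dbsnp.rsid"))
mv_batch_body(c("chr1:g.35367G>A", "chr12:g.111351981C>T"))
```

If your HTTP client form-encodes the body itself (for example httr's `POST(..., encode = "form")`), pass it the plain comma-joined ids instead of the pre-encoded string.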
8. Making flexible queries
• All variants with dbNSFP annotation:
http://myvariant.info/v1/query?q=_exists_:dbnsfp
• All non-synonymous variants on gene "BTK":
http://myvariant.info/v1/query?q=dbnsfp.genename:BTK
• All variants within a genomic range:
http://myvariant.info/v1/query?q=chr1:69000-70000
• Query Wellderly variants together with other annotation sources:
http://myvariant.info/v1/query?q=_exists_:wellderly AND cadd.polyphen.cat:possibly_damaging&fields=wellderly,cadd.polyphen
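When these queries are issued from code rather than typed into a browser, the query term needs percent-encoding (the colons and the spaces around AND). A minimal base-R sketch that rebuilds two of the examples above; `mv_query_url` is a hypothetical helper, not part of the official clients.

```r
# Hypothetical helper: build a /v1/query URL with an optional fields list.
mv_query_url <- function(q, fields = NULL) {
  # reserved = TRUE also encodes ':' and spaces inside the query term
  url <- paste0("http://myvariant.info/v1/query?q=", utils::URLencode(q, reserved = TRUE))
  if (length(fields) > 0) url <- paste0(url, "&fields=", paste(fields, collapse = ","))
  url
}

mv_query_url("chr1:69000-70000")
mv_query_url("_exists_:wellderly AND cadd.polyphen.cat:possibly_damaging",
             fields = c("wellderly", "cadd.polyphen"))
```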
9. Many more ways of querying, across resources
Full-text queries
Wildcard queries
Range queries
Boolean queries
Regex queries
Field exists/missing queries
Faceting
Paging
Sorting
Batch queries
JSONP and CORS support
…
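Paging reduces to arithmetic on offsets. The sketch assumes the query endpoint's `size` (page length) and `from` (zero-based offset) parameters and a `total` hit count read from an earlier response; the helper names and the total of 2350 hits are hypothetical.

```r
# Zero-based offsets for walking through `total` hits, `size` at a time.
page_offsets <- function(total, size) {
  stopifnot(total >= 0, size > 0)
  if (total == 0) return(integer(0))
  seq(0, total - 1, by = size)
}

# One query URL per page (hypothetical helper; assumes `size`/`from` parameters).
paged_urls <- function(q, total, size) {
  sprintf("http://myvariant.info/v1/query?q=%s&size=%d&from=%d",
          utils::URLencode(q, reserved = TRUE), as.integer(size),
          as.integer(page_offsets(total, size)))
}

paged_urls("dbnsfp.genename:BTK", total = 2350, size = 1000)
```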
11. MyVariant.info official Python/R Clients
myvariant Python client hosted on PyPI
(initial release in Aug 2015)
myvariant R client hosted on Bioconductor
(initial release in Oct 2015)
12. Use case 1
An easy resource to retrieve well-structured variant annotations
14. Use case 2: An example workflow for variant prioritization
input variants
output variants
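# vars: list of data frames of MyVariant.info annotations, one per input set
# Filter 1: keep coding or splice-site consequences, as classified by CADD
filter1 <- lapply(vars, function(i) subset(i,
    cadd.consequence %in% c("NON_SYNONYMOUS", "STOP_GAINED", "STOP_LOST",
                            "CANONICAL_SPLICE", "SPLICE_SITE")))
# Filter 2: keep variants rare in ExAC (allele frequency < 1%);
# subset() drops rows whose condition is NA, e.g. variants absent from ExAC
filter2 <- lapply(filter1, function(i) subset(i, exac.af < 0.01))
# Filter 3: keep variants rare in 1000 Genomes phase 1 (dbNSFP field);
# all() yields one TRUE/FALSE per variant even for multi-valued entries,
# and entries with no value are dropped
filter3 <- lapply(filter2, function(i)
    subset(i, sapply(dbnsfp.1000gp1.af, function(j) length(j) > 0 && all(j < 0.01))))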
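A self-contained version of the same three filters, run on a small hypothetical data frame instead of live MyVariant.info results. The column names mirror the slide; the variant labels and frequencies are made up for illustration, and `prioritize` is a hypothetical helper.

```r
# Hypothetical toy input: one data frame of annotated variants per sample.
vars <- list(
  sample1 = data.frame(
    query = c("var_a", "var_b", "var_c", "var_d"),
    cadd.consequence = c("NON_SYNONYMOUS", "SYNONYMOUS", "STOP_GAINED", "SPLICE_SITE"),
    exac.af = c(0.001, 0.002, 0.2, NA),
    dbnsfp.1000gp1.af = c(0.002, 0.001, 0.3, 0.001),
    stringsAsFactors = FALSE
  )
)

# Consequence classes kept by the slide's first filter.
damaging <- c("NON_SYNONYMOUS", "STOP_GAINED", "STOP_LOST",
              "CANONICAL_SPLICE", "SPLICE_SITE")

# All three filters in one pass. As with subset(), a missing (NA) frequency
# drops the variant; the explicit !is.na() guards make that choice visible.
prioritize <- function(df, max_af = 0.01) {
  keep <- df$cadd.consequence %in% damaging &
    !is.na(df$exac.af) & df$exac.af < max_af &
    !is.na(df$dbnsfp.1000gp1.af) & df$dbnsfp.1000gp1.af < max_af
  df[keep, , drop = FALSE]
}

out <- lapply(vars, prioritize)
out$sample1$query
```

Dropping variants with no ExAC frequency is a design choice inherited from subset(). If a missing frequency should instead count as rare, replace `!is.na(x) & x < max_af` with `is.na(x) | x < max_af`.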
15. Use case 3
For curators/data providers:
A platform for
integrating with other resources (saving repetitive effort)
distributing your valuable data (under your own source field)
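For a data provider, the contribution boils down to a parser that turns raw rows into objects whose `_id` is a genomic HGVS name, with the provider's data under its own source field. A minimal sketch, assuming a tab-delimited input with hypothetical columns chrom, pos, ref, alt and score, and handling single-nucleotide substitutions only (indels need other HGVS forms); `parse_source` and the source name are placeholders.

```r
# Hypothetical parser: raw tab-delimited rows -> list of id-keyed objects.
# Only single-nucleotide substitutions are handled (chr<N>:g.<pos><ref>><alt>).
parse_source <- function(tsv_text, src_name) {
  # colClasses = "character" stops bases such as "T" being read as logical TRUE
  df <- utils::read.delim(text = tsv_text, colClasses = "character")
  stopifnot(all(nchar(df$ref) == 1), all(nchar(df$alt) == 1))
  ids <- sprintf("chr%s:g.%s%s>%s", sub("^chr", "", df$chrom), df$pos, df$ref, df$alt)
  lapply(seq_len(nrow(df)), function(i) {
    rec <- list(`_id` = ids[i])
    rec[[src_name]] <- list(score = as.numeric(df$score[i]))
    rec
  })
}

toy_tsv <- "chrom\tpos\tref\talt\tscore\n1\t35367\tG\tA\t0.5\nchr12\t111351981\tC\tT\t0.9\n"
recs <- parse_source(toy_tsv, "mysource")
str(recs[[1]])
```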
16. Use case 4
For variant curation itself:
Identify discrepancies
Serve as the basis of a community-engaged curation process
17. Linked data
URI (Uniform Resource Identifier):
Provides a unique identifier for anything, or any concept, on the web
Connective: connecting data, concepts, applications and, ultimately, people.
URL (Uniform Resource Locator):
Provides a unique identifier for webpages
Text files, images, music, videos
Interactive: Twitter, Facebook, blogs
18. Why Linked Data?
Providing a unique identifier for a concept
Gene name, e.g. CDK2, is stored as:
genename (database1)
gene_name (database2)
{'gene': {'name': …}} (database3)
URI: http://identifiers.org/hgnc.symbol
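To make the point concrete, the sketch below takes the three field layouts from the slide and maps each to the same identifiers.org URI. Appending the symbol to the hgnc.symbol prefix as a path segment is our assumption about the URI pattern, and `gene_symbol` and `to_uri` are hypothetical helpers.

```r
# The same gene symbol, stored under three different (hypothetical) layouts.
records <- list(
  database1 = list(genename = "CDK2"),
  database2 = list(gene_name = "CDK2"),
  database3 = list(gene = list(name = "CDK2"))
)

# Look up the symbol wherever a given layout keeps it.
# [[ ]] is used instead of $ to avoid R's partial name matching on lists.
gene_symbol <- function(rec) {
  if (!is.null(rec[["genename"]])) return(rec[["genename"]])
  if (!is.null(rec[["gene_name"]])) return(rec[["gene_name"]])
  if (!is.null(rec[["gene"]][["name"]])) return(rec[["gene"]][["name"]])
  NA_character_
}

# Assumed URI pattern: identifiers.org namespace prefix + "/" + symbol.
to_uri <- function(symbol) paste0("http://identifiers.org/hgnc.symbol/", symbol)

uris <- vapply(records, function(r) to_uri(gene_symbol(r)), character(1))
unique(uris)
```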
19. Data Discrepancy: Example
http://myvariant.info/v1/variant/chr12:g.111351981C>T?fields=clinvar.rsid,dbsnp.rsid,evs.rsid
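The query above returns one rsid per source, which makes disagreements easy to check programmatically. A minimal sketch, assuming the response has already been parsed into a nested R list (for example with jsonlite::fromJSON); the object below is hand-written with placeholder rsIDs, not the real response for this variant, and `rsid_discrepancy` is a hypothetical helper.

```r
# Collect the rsid reported by each source; NA when a source has no value.
rsid_discrepancy <- function(obj, sources = c("clinvar", "dbsnp", "evs")) {
  ids <- vapply(sources, function(s) {
    v <- obj[[s]][["rsid"]]
    if (is.null(v)) NA_character_ else paste(v, collapse = ",")
  }, character(1))
  present <- ids[!is.na(ids)]
  # Consistent when all sources that report an rsid report the same one.
  list(rsids = ids, consistent = length(unique(present)) <= 1)
}

# Placeholder object standing in for the parsed response.
resp <- list(
  `_id` = "chr12:g.111351981C>T",
  clinvar = list(rsid = "rs_A"),
  dbsnp = list(rsid = "rs_A"),
  evs = list(rsid = "rs_B")
)
res <- rsid_discrepancy(resp)
res$rsids
res$consistent
```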
21. Acknowledgement
Funding and Support
U54GM114833
U01HG008473
Washington U:
Ben Ainscough
Obi Griffith
TSRI:
Chunlei Wu
Andrew Su
Jiwen Xin
Cyrus Afrasiabi
Ginger Tsueng
Adam Mark
Greg Stupp
Tim Putman
STSI:
Eric Topol
Ali Torkamani
Galina Erikson
U. Washington:
Sean Mooney
Moritz Juchler
Nikhil Gopal
OICR:
Robin Haw
UC Berkeley:
Chris Mungall
UCSD:
Trish Whetzel
MyVariant.info
A high-performance query engine for aggregated variant annotations.
Multiple Variant Annotation Resources
Massively parallel sequencing has become an important tool for identifying medically significant variants in both research and the clinic. Accurate variation and genotype-phenotype databases are critical to our ability to make sense of the vast amount of information that parallel sequencing generates.
There are multiple variant databases available. I have listed some of the major ones here, including ClinVar, dbSNP, and dbNSFP. However, these databases are scattered and maintained by different groups, and they vary greatly in their ease of use, their use of standard mutation nomenclature, and the comprehensiveness of their variant cataloging.
Ultimately, we need a comprehensive reference database of medically important variants that is easily cross-referenced to exome and whole-genome sequencing data, allowing researchers to query multiple sources for each variant. Creating such a database requires mining all available data for each variant, and the ability to contribute new data and annotations over time. Such a database must also be searchable using standardized nomenclature to allow for computerized annotation.
Organize locally, parse, download
Up-to-date
Performance
Comprehensiveness
Fast retrieval via HGVS id, rsid, or gene
Field filtering via the fields parameter
Batch support
Rich query syntax
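The fields parameter can be thought of as a projection over dotted paths into the nested JSON object. The sketch below emulates that idea locally on an R list; it is an illustration, not the service's implementation, it returns flattened dotted keys rather than nested JSON, and `get_path`, `project_fields`, and the toy values are hypothetical.

```r
# Follow a dotted path such as "cadd.polyphen.cat" into a nested list.
get_path <- function(obj, path) {
  for (key in strsplit(path, ".", fixed = TRUE)[[1]]) {
    if (!is.list(obj) || is.null(obj[[key]])) return(NULL)
    obj <- obj[[key]]
  }
  obj
}

# Keep only the requested fields (flattened dotted keys; missing paths skipped).
project_fields <- function(obj, fields) {
  out <- list()
  for (f in fields) {
    v <- get_path(obj, f)
    if (!is.null(v)) out[[f]] <- v
  }
  out
}

# Toy annotation object with placeholder values.
variant <- list(
  `_id` = "chr1:g.35367G>A",
  cadd = list(polyphen = list(cat = "benign", val = 0.1), phred = 2),
  wellderly = list(note = "placeholder")
)
str(project_fields(variant, c("wellderly", "cadd.polyphen", "dbnsfp")))
```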
Some query examples
Add link to MyVariant Python client tutorial
Some use cases
- As a data source, saving parsing effort
- Direct queries in an application/pipeline
- A high-performance, scalable web service API for data providers
Benefit: saves the effort of repeatedly integrating other annotations, so you can focus on your unique content.
Saves the effort of building your own infrastructure for high-performance web services; because we are just a web API, you can use it to build your own GUI query interface.
We also value your hard work, the source of your data is properly credited.