One man's *1 is another man's *13? Trouble with nomenclatures in personalized medicine
1. Trouble with nomenclatures in personalized medicine
Asst.-Prof. Mag. Dr. Matthias Samwald
CeMSIIS, Medical University of Vienna
SUMMER SCHOOL: GENOMIC MEDICINE – Bridging research and the clinic, May 6 2016,
Portoroz, Slovenia
Funded by Austrian Science Fund (FWF): [P 25608-N15]
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 668353 (KB and MS).
7. We simulated the accuracy of various targeted, low-cost assays suitable for pre-emptive testing compared to next-gen sequencing
Venn diagram displaying the numbers and overlaps of polymorphisms covered by constrained
views derived from four pharmacogenomic assays. DMET: derived from the Affymetrix
DMET™ Plus assay, VERA: Illumina VeraCode® ADME Core Panel, TAQM: TaqMan® OpenArray®
PGx Panel, FLOR: University of Florida and Stanford Custom Array.
11. We simulated the accuracy of various targeted, low-cost assays suitable for pre-emptive testing compared to next-gen sequencing
Fraction of tested genes resulting in aberrations in haplotype calling with restricted assay
compared to next-gen sequencing. Based on full genome sequences of 2504 persons. Manuscript
currently under review at ‘Pharmacogenomics’.
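The comparison described in the caption can be illustrated with a toy sketch. This is not the authors' simulation: the haplotype definitions, position names, samples, and the exact-match calling rule below are all hypothetical, and the sketch counts samples for a single gene rather than genes. It only shows how hiding positions that an assay does not cover can turn a *13 call into a *1 call.

```python
def call_haplotype(observed, definitions, covered=None):
    """Return the first definition that exactly matches the observed variants.

    observed:    {position: allele} non-reference alleles on one chromosome
    definitions: {name: {position: allele}}; the reference haplotype is {}
    covered:     positions the assay can read (None = full sequencing)
    """
    def visible(variants):
        # Drop every position the assay cannot see.
        if covered is None:
            return dict(variants)
        return {p: a for p, a in variants.items() if p in covered}

    seen = visible(observed)
    for name, variants in definitions.items():
        if visible(variants) == seen:
            return name
    return "unknown"


def aberration_fraction(samples, definitions, covered):
    """Fraction of samples whose call changes when only `covered` is assayed."""
    changed = sum(
        call_haplotype(s, definitions) != call_haplotype(s, definitions, covered)
        for s in samples
    )
    return changed / len(samples)


# Hypothetical toy definitions and samples (not real star-allele tables).
DEFINITIONS = {"*1": {}, "*2": {"pos1": "T"}, "*13": {"pos2": "A"}}
SAMPLES = [{}, {"pos1": "T"}, {"pos2": "A"}, {"pos2": "A"}]

print(aberration_fraction(SAMPLES, DEFINITIONS, covered={"pos1"}))  # 0.5
```

With an assay that only reads `pos1`, the two samples carrying the `pos2` variant are indistinguishable from the reference, so half of the toy calls differ from the full-sequence calls.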
18. From the lab: experimental mnemonic nomenclature
• Idea: Experiment with human-friendly nomenclature
o No human committee
o Less cryptic alphanumeric descriptors
19. From the lab: experimental mnemonic nomenclature
• Synthetic pseudo-words can encode a lot of information
• CVCVCV pattern examples (C = consonant, V = vowel):
o binoru
o nivudi
o pekuvo
o jutoxu
o hacifi
o dejula
• CVCVCV tuple (Y counted as a vowel) can denote: 20 * 6 * 20 * 6 * 20 * 6 = 1 728 000 variants
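The capacity figure can be checked with a short sketch. The slide gives only the counts (20 consonants, 6 vowels with Y as a vowel), so the concrete alphabet below is an assumption, as are the function names.

```python
from itertools import product

# Assumption: the 20 non-vowel letters and 6 vowels (Y counted as a vowel),
# matching the 20 * 6 factors on the slide. The exact alphabet is not given.
CONSONANTS = "bcdfghjklmnpqrstvwxz"
VOWELS = "aeiouy"


def pseudo_words(syllables):
    """Yield every pseudo-word built from `syllables` consonant-vowel pairs."""
    pairs = [c + v for c in CONSONANTS for v in VOWELS]
    for combo in product(pairs, repeat=syllables):
        yield "".join(combo)


def capacity(syllables):
    """Number of distinct pseudo-words with the given number of CV pairs."""
    return (len(CONSONANTS) * len(VOWELS)) ** syllables


print(capacity(3))  # 1728000
```

Each additional consonant-vowel pair multiplies the number of encodable variants by 120, which is why short strings already cover a large variant space.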
20. Algorithm (no human curation / committee)
• Take a large dataset containing variant data (1000 Genomes, 100,000 Genomes, 1M genomes…) as reference
• Create a list of genome loci and the variants observed there (some loci might have more than 2 possible variants)
• For each gene:
  o For each locus:
    ▪ Sort observed variants by frequency
    ▪ Define the most frequently observed variant as ‘wild type’
    ▪ Remove the wild-type variants from the table used for constructing the mnemonics (they are considered to be the default)
  o Sort loci by the frequency of the most frequent non-wild-type variant of each locus
  o Assign mnemonics to each variant systematically, starting with shorter mnemonic strings (i.e., 2-character tuples)
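The steps above can be sketched for a single gene as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the input layout, the function names, the descending sort order, the restart of the code sequence per gene, and the rule that CVC codes follow once the CV tuples are used up are all hypothetical choices.

```python
from itertools import product

CONSONANTS = "bcdfghjklmnpqrstvwxz"  # assumed alphabet (20 consonants)
VOWELS = "aeiouy"                    # assumed alphabet (Y as vowel)


def mnemonic_codes():
    """Yield codes shortest first: CV tuples, then CVC (order is an assumption)."""
    for c, v in product(CONSONANTS, VOWELS):
        yield c + v
    for c, v, c2 in product(CONSONANTS, VOWELS, CONSONANTS):
        yield c + v + c2


def assign_mnemonics(gene_loci):
    """Map each non-wild-type variant of one gene to a mnemonic code.

    gene_loci: {locus: {allele: frequency}}
    returns:   {(locus, allele): code}
    """
    non_wild = {}
    for locus, freqs in gene_loci.items():
        ranked = sorted(freqs.items(), key=lambda kv: kv[1], reverse=True)
        non_wild[locus] = ranked[1:]  # ranked[0] is the 'wild type' default

    # Loci ordered by their most frequent non-wild-type variant, so that
    # common variants receive the shortest codes (assumed to be the intent).
    loci = sorted(
        (l for l in non_wild if non_wild[l]),
        key=lambda l: non_wild[l][0][1],
        reverse=True,
    )
    codes = mnemonic_codes()
    return {(l, allele): next(codes) for l in loci for allele, _ in non_wild[l]}


# Hypothetical toy frequencies, not taken from any real dataset.
toy_gene = {
    "locus_A": {"G": 0.90, "A": 0.10},
    "locus_B": {"C": 0.60, "T": 0.30, "G": 0.10},
}
print(assign_mnemonics(toy_gene))
```

Because the assignment depends only on the reference dataset and a fixed ordering rule, anyone running the same procedure on the same data would arrive at the same codes, which is the point of avoiding a human committee.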
23. Example mnemonic code sequences
VKORC1: cy-do-du | be-do-du
CYP2D6: nai / nai-pek
CYP2D6: nai / be-wi / nai-pek (copy number variation)
TPMT: be-fu-fy | ba-bi-fi-tek
Mnemonic code + reference to the variants/regions covered by the assay = can be automatically decompressed to the full sequence / genotype result
Sets of co-occurring SNP variants could automatically be assigned an identifier of their own and be combined with individual SNP variant identifiers
Currently creating a humble proof-of-concept based on 1000 Genomes data
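The decompression idea can be sketched as a lookup. The code table, locus names, and the reading of "-" as a separator between variant codes and "|" as a separator between the two haplotypes are assumptions made for illustration. The "/" notation used in the copy-number examples is not handled here.

```python
def decode(mnemonic, code_table, wild_type):
    """Expand a mnemonic such as 'ba-bi | be' into one genotype per haplotype.

    code_table: {code: (locus, allele)} for non-wild-type variants
    wild_type:  {locus: allele} defaults for every locus the assay covers
    returns:    list of {locus: allele}, one dict per '|'-separated part
    """
    haplotypes = []
    for part in mnemonic.split("|"):
        alleles = dict(wild_type)  # every covered locus starts as wild type
        for code in (c.strip() for c in part.split("-")):
            if not code:
                continue  # an empty part means "all wild type"
            locus, allele = code_table[code]
            alleles[locus] = allele
        haplotypes.append(alleles)
    return haplotypes


# Hypothetical code table and assay coverage, for illustration only.
CODE_TABLE = {"ba": ("locus_B", "T"), "be": ("locus_B", "G"), "bi": ("locus_A", "A")}
WILD_TYPE = {"locus_A": "G", "locus_B": "C"}

print(decode("ba-bi | be", CODE_TABLE, WILD_TYPE))
```

Only deviations from the default are spelled out in the mnemonic. Everything else is filled in from the list of loci the assay covers, which is why the code must always travel together with a reference to that coverage.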
24. Local team (Medical University of Vienna)
Asst.-Prof. Mag. Dr. Matthias Samwald (PI)
Dr. Kathrin Blagec
Mag. Sebastian Hofer
Hong Xu, BSc
Wolfgang Kuch
Web
http://samwald.info/
http://safety-code.org/
http://upgx.eu
Thanks!
25. Further info
• Reference: Matthias Samwald, Kathrin Blagec, Sebastian Hofer and Robert R. Freimuth. “Analysing the potential for incorrect haplotype calls with different pharmacogenomic assays in different populations: a simulation based on 1000 Genomes data.” Pharmacogenomics, September 30, 2015. doi:10.2217/pgs.15.108
• Code availability: The curated resources and the IPython notebooks are available at https://gitlab.com/medication-safety/ms-ipython