Douglas Cork
Steven Lembark
HIV-1, W-curves, & Shoe Leather
●   Existing genetics tools fail on HIV-1.
    ●   They make assumptions based on “normal” DNA 
        that fail on HIV, cancer, or plants.
    ●   Correlation tools look at evolution, not state.
●   We are working on tools for clinical analysis.
    ●   The W-curve abstracts DNA into geometry.
    ●   The TSP clusters genes rather than trying to 
        impute inheritance.
Sequences Inform Treatment
●   Treating HIV requires sequencing it to choose 
    appropriate drugs:
    ●   HIV-1 evolves drug resistance in months.
    ●   Multiple strains in a single patient are common, 
        whether from multiple sources or from evolution.
    ●   Crossover recombination is relatively common due to 
        cross-infected cells.
Problem: HIV is Hard to Analyze
●   HIV is a non-correcting retrovirus.
●   It evolves 10,000 times faster than humans or 
    influenza: one new strain per patient per day.
●   Genomes for wild types range from 8349 to 
    9829 bases, making localized comparisons 
    difficult.
●   The single FDA-approved algorithm directing 
    treatment from sequence handles only type B; 
    the U.S. Army has 15%+ non-B infections.
The Current Tools
●   BLAST, FASTA, and ClustalW perform alignment.
    ●   Table-driven analysis of base transitions.
    ●   Score the entire sequence with a single value.
●   Graphical tools are designed to display 
    inheritance rather than state.
    ●   Output is difficult to read in a clinical setting.
Phenogram of Drug-Resistant and Random Samples
●   Tries to show ancestry, 
    not state.
●   Not very good for visually 
    identifying which 
    patients are drug 
    resistant.
Trees are not particularly
helpful either.
ClustalW of gp120
[Figure: ClustalW alignment of HIVHXB2CG against AY736838-gp120_]
●   Difficult to compare sequences visually.
●   Not useful for large numbers of sequences.
●   Gaps make analysis difficult.
New Tools
●   Clinical vs. evolutionary.
●   Avoid assumptions that break current tools.
●   Suitable for a repeatable process in clinics or 
    data mining in research.
●   We are using:
    ●   W-curve for analysis.
    ●   TSP for clustering.
    ●   R for data management & display.
W-curve
●   Geometric abstraction of DNA.
●   Generated by a simple state machine.
●   Geometry allows alignment at a finer scale 
    than character strings.
●   Avoids assumptions about transition 
    probabilities by taking the figure as-is.
W-curve Generator is a State Machine
●   C, A, T, G are assigned to the corners of a square.
●   Each successive point moves halfway to the next 
    base's corner.
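The state machine above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the slides only say that C, A, T, G sit on the corners of a square, so the particular corner layout in `CORNERS`, the starting point at the origin, and the function name `w_curve` are all assumptions made for the example.

```python
# Corner layout is an assumption; the slides do not say which base
# goes on which corner of the square.
CORNERS = {
    "C": (1.0, 1.0),
    "A": (-1.0, 1.0),
    "T": (-1.0, -1.0),
    "G": (1.0, -1.0),
}


def w_curve(sequence, start=(0.0, 0.0)):
    """Return one (x, y, z) point per base in the sequence."""
    x, y = start
    points = []
    for z, base in enumerate(sequence.upper(), start=1):
        corner_x, corner_y = CORNERS[base]
        # Move halfway from the current point to the base's corner.
        x = (x + corner_x) / 2.0
        y = (y + corner_y) / 2.0
        # Z advances one step per base.
        points.append((x, y, float(z)))
    return points


print(w_curve("CG"))
```

Any character outside C, A, T, G raises a `KeyError` here; real data with ambiguity codes would need a policy for those, which the slides do not describe.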
W-curve for “CG”
●   Curve shown 
    in blue.
●   Halfway to C 
    then G in 
    X-Y, single 
    steps in Z.
●   Cylindrical storage 
    simplifies 
    comparison.
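For readers unfamiliar with cylindrical storage, the conversion from X-Y-Z points looks roughly like this. It is a sketch only: the function name `to_cylindrical` and the sample points are made up for illustration, and the slides do not say exactly how the authors store or compare the radius and angle.

```python
import math


def to_cylindrical(points):
    """Convert (x, y, z) points to (r, theta, z), with theta in radians."""
    return [(math.hypot(x, y), math.atan2(y, x), z) for x, y, z in points]


# Two illustrative points; the values are arbitrary sample data.
sample = [(0.5, 0.5, 1.0), (0.75, -0.25, 2.0)]
for r, theta, z in to_cylindrical(sample):
    print(f"z={z:.0f}  r={r:.3f}  theta={math.degrees(theta):.1f} deg")
```

Since Z already advances one step per base, each point is then described by a radius and an angle at a known position along the sequence.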
W-curve of Wild HIV-1 POL Gene
W-curve of Wild HIV-1 POL
W-curves of Wild & Drug Resistant Pol
Detail of Wild & Drug Resistant Pol
Distance Metric
●   Bases are arranged in a 
    square to minimize 
    effects of SNPs.
●   Synonymous SNPs 
    are usually in the 
    same quadrant.
●   Points within the same 
    quadrant have a small 
    difference; opposite 
    quadrants get larger ones.
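The quadrant behaviour can be illustrated with a plain Euclidean distance in the X-Y plane. The slides do not give the actual formula, so treat this as a stand-in under that assumption; `point_distance` and `curve_distance` are hypothetical names.

```python
import math


def point_distance(p, q):
    """Euclidean distance in the X-Y plane between two curve points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def curve_distance(curve_a, curve_b):
    """Sum of point distances over the positions both curves share."""
    return sum(point_distance(p, q) for p, q in zip(curve_a, curve_b))


same_quadrant = point_distance((0.5, 0.5), (0.75, 0.25))
opposite_quadrant = point_distance((0.5, 0.5), (-0.5, -0.5))
print(same_quadrant < opposite_quadrant)
```

With this stand-in, two points in the same quadrant can never be further apart than the quadrant's diagonal, while points in opposite quadrants can be up to the full diagonal of the square apart.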
Comparison Produces “Chunks”
●   Comparison yields a list of chunks.
●   Curves are aligned within the chunk.
●   Summing chunks gives a single value for two curves.
●   Analyzing them in detail allows mining local 
    similarities and variations.
●   Grouping allows examination of 
    crossover-recombination events.
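One way to picture the chunk list is as a sequence of small records, each holding where the aligned region starts on each curve, how long it is, and its distance. The `Chunk` fields and both helper functions below are hypothetical; the slides do not describe the real data layout.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    start_a: int      # offset of the chunk on the first curve (assumed field)
    start_b: int      # offset of the chunk on the second curve (assumed field)
    length: int       # number of bases covered (assumed field)
    distance: float   # accumulated difference within the chunk (assumed field)


def total_distance(chunks):
    """Collapse a comparison into the single value for two curves."""
    return sum(chunk.distance for chunk in chunks)


def divergent_chunks(chunks, per_base_cutoff):
    """Keep chunks whose average difference per base exceeds the cutoff."""
    return [c for c in chunks if c.distance / c.length > per_base_cutoff]


chunks = [Chunk(0, 0, 100, 2.0), Chunk(100, 103, 50, 10.0)]
print(total_distance(chunks))
print(divergent_chunks(chunks, per_base_cutoff=0.1))
```

Keeping the list around, rather than only the sum, is what makes the local analysis possible: the second helper picks out regions that differ strongly even when the overall score looks unremarkable.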
Clustering: Traveling Salesman Problem
●   The TSP is simple to describe, hard to solve:
    ●   Starting and finishing in the same city.
    ●   Visit a list of cities once each.
    ●   Minimize the distance (cost).
●   Optimal solutions will cluster the nearby cities.
●   The problem was always in defining the 
    clusters.
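To make the problem statement concrete, here is a brute-force sketch that tries every ordering. It is only usable for a handful of cities and is not the solver used in this work; the function names and the toy distance matrix are invented for illustration.

```python
from itertools import permutations


def tour_length(tour, dist):
    """Length of a closed tour: every leg plus the return to the start."""
    n = len(tour)
    return sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))


def brute_force_tsp(dist):
    """Try every ordering that starts at city 0; feasible only for tiny inputs."""
    n = len(dist)
    candidates = ((0,) + rest for rest in permutations(range(1, n)))
    best = min(candidates, key=lambda tour: tour_length(tour, dist))
    return list(best), tour_length(best, dist)


# Toy data: cities 0 and 1 are close, 2 and 3 are close, the pairs are far apart.
dist = [
    [0, 1, 10, 10],
    [1, 0, 10, 10],
    [10, 10, 0, 1],
    [10, 10, 1, 0],
]
print(brute_force_tsp(dist))
```

In the best tour the two close pairs stay adjacent, which is the clustering effect described above. What the tour alone does not say is where one cluster ends and the next begins.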
Take a Walk and Cluster Your Genes
●   Climer & Zhang, 2004.
●   Method for detecting N clusters:
    ●   Add N dummy cities to the distance map.
    ●   Each one has the same, small distance to all other 
        cities (we use 2-20).
    ●   Dummy cities end up in the inter-cluster gaps.
●   The process is trivial to implement: just add that 
    many rows and columns to the original 
    comparison matrix.
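The matrix padding and the later cut at the dummy cities might look like this. It is a sketch under assumptions: the function names are made up, dummy cities are given the same distance to each other as to real cities, and the tour passed to `split_at_dummies` would come from whatever TSP solver is in use.

```python
def add_dummy_cities(dist, n_clusters, dummy_distance):
    """Return a new matrix with n_clusters dummy rows and columns appended."""
    n_real = len(dist)
    size = n_real + n_clusters
    # Start with every entry at the dummy distance, then copy the real block in.
    padded = [[dummy_distance] * size for _ in range(size)]
    for i in range(n_real):
        for j in range(n_real):
            padded[i][j] = dist[i][j]
    for i in range(size):
        padded[i][i] = 0
    return padded


def split_at_dummies(tour, n_real):
    """Cut a closed tour into clusters at the dummy cities (index >= n_real)."""
    tour = list(tour)
    first = next((k for k, city in enumerate(tour) if city >= n_real), None)
    if first is None:
        return [tour]
    # Rotate so the tour starts on a dummy; no cluster then wraps around the end.
    rotated = tour[first:] + tour[:first]
    clusters, current = [], []
    for city in rotated:
        if city >= n_real:
            if current:
                clusters.append(current)
            current = []
        else:
            current.append(city)
    if current:
        clusters.append(current)
    return clusters


dist = [[0, 1, 10, 10], [1, 0, 10, 10], [10, 10, 0, 1], [10, 10, 1, 0]]
padded = add_dummy_cities(dist, n_clusters=2, dummy_distance=2)
print(len(padded), split_at_dummies([0, 1, 4, 2, 3, 5], n_real=4))
```

Because a dummy is cheap to reach from anywhere, a good tour tends to spend its dummies on the most expensive legs, which are the jumps between clusters.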
Displaying the Tour
●   Mapping the tour onto a circle gives a good 
    view of the distances.
●   Coloring simplifies inspection.
    ●   Black dots for dummy cities.
    ●   Single type at the top (e.g. wild type).
    ●   Color successive data points using the “rainbow” 
        sequence with a large number of colors.
    ●   Sequences more alike get more similar colors.
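A rough sketch of the layout and coloring, using only the Python standard library. The function names, the clockwise direction, and the choice to stop the hue at 0.8 are assumptions for illustration; the slides use R for display and do not give these details.

```python
import colorsys
import math


def circle_layout(tour, dist):
    """Place tour stops on a unit circle, spaced by distance along the tour."""
    n = len(tour)
    legs = [dist[tour[i]][tour[(i + 1) % n]] for i in range(n)]
    total = sum(legs)  # assumed non-zero
    positions, travelled = [], 0.0
    for leg in legs:
        # The first stop sits at the top of the circle; later stops go clockwise.
        theta = math.pi / 2 - 2 * math.pi * travelled / total
        positions.append((math.cos(theta), math.sin(theta)))
        travelled += leg
    return positions


def rainbow_colors(tour, n_real):
    """Black for dummy cities, rainbow hues in tour order for real ones."""
    n_colored = sum(1 for city in tour if city < n_real)
    colors, k = [], 0
    for city in tour:
        if city >= n_real:
            colors.append((0.0, 0.0, 0.0))
        else:
            # Stop short of hue 1.0 so the last color does not wrap back to red.
            hue = 0.8 * k / max(n_colored - 1, 1)
            colors.append(colorsys.hsv_to_rgb(hue, 1.0, 1.0))
            k += 1
    return colors


dist = [[0 if i == j else 1 for j in range(6)] for i in range(6)]
tour = [0, 1, 4, 2, 3, 5]  # indices 4 and 5 play the role of dummy cities
print(circle_layout(tour, dist)[0], rainbow_colors(tour, n_real=4)[0])
```

To put a particular sequence such as the wild type at the top, rotate the tour so that it comes first before calling these functions. Neighbors on the tour receive neighboring hues, which is why similar sequences end up with similar colors.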
Example with 8 D-R, 100 Samples
Multiple uses for color sequence.
●   Track individual over time.
    ●   Progression through colors shows history.
    ●   Clustering highlights progression towards drug 
        resistance.
●   Track sample population.
    ●   Recycling the colors from one initial tour helps show 
        changes in successive graphs.
    ●   Simplifies tracking progression in anonymous 
        populations found in HIV treatment centers.
Visualizing W-curves
●   We use a WebGL-based package, “WebCurve”.
●   Developed at IIT as a web-friendly solution for 
    examining 3D geometry.
●   Gracefully handles displaying 100+ sequences 
    at 10K bases each on a notebook computer.
●   Available from GitHub; the archive includes a web 
    server and code to generate files for display.
Summary
●   W-curve and TSP allow us to cluster genes.
●   They provide more useful output in a clinical 
    setting.
●   Color coding the TSP results allows tracking 
    changes in a population or the progression of an 
    individual over time.

Clustering Genes: W-curve + TSP

  • 2. HIV-1, W-curves, & Shoe Leather ● Existing genetics tools fail on HIV-1 ● They make assumptions based on “normal” DNA that fail on HIV (or cancer, or plants). ● Correlation tools look at evolution, not state. ● We are working on tools for clinical analysis. ● The W-curve abstracts DNA into geometry. ● The TSP clusters genes rather than trying to impute inheritance.
  • 3. Sequences Inform Treatment ● Treating HIV requires sequencing it to choose appropriate drugs: ● HIV-1 evolves drug resistance in months. ● Multiple strains in a single patient are common, whether from multiple sources or from evolution. ● Crossover recombination is relatively common due to cross-infected cells.
  • 4. Problem: HIV is Hard to Analyze ● HIV is a non-correcting retrovirus. ● Evolves 10,000 times faster than humans or influenza: one new strain per patient per day. ● Genomes for wild types range from 8349 to 9829 bases, making localized comparisons difficult. ● The single FDA-approved algorithm directing treatment from sequence handles only type B; the U.S. Army has 15%+ non-B infections.
  • 5. The Current Tools ● BLAST, FASTA, ClustalW perform alignment. ● Table-driven analysis of base transitions. ● Score the entire sequence with a single value. ● Graphical tools are designed to display inheritance rather than state. ● Output is difficult to read in a clinical setting.
  • 6. Phenogram of Drug-Resistant and Random Samples ● Tries to show ancestry, not state. ● Not very good for visual identification of which patients are drug resistant.
  • 8. ClustalW of gp120 [Figure: ClustalW alignment of gp120, HIVHXB2CG vs. AY736838-gp120] ● Difficult to compare sequences visually. ● Not useful for large numbers of sequences. ● Gaps make analysis difficult.
  • 9. New Tools ● Clinical vs. evolutionary. ● Avoid assumptions that break current tools. ● Suitable for a repeatable process in clinics or data mining in research. ● We are using: ● W-curve for analysis. ● TSP for clustering. ● R for data management & display.
  • 10. W-curve ● Geometric abstraction of DNA. ● Generated by a simple state machine. ● Geometry allows alignment at a finer scale than character strings. ● Avoids assumptions about transition probabilities by taking the figure as-is.
  • 11. W-Curve Generator is a State Machine ● C, A, T, G are assigned to corners of a square. ● Successive points move halfway to the next base's corner.
  • 12. W-curve for “CG” ● Curve shown in blue. ● Halfway to C then G in X-Y, single steps in Z. ● Cylindrical storage simplifies comparison.
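
The state machine described on slides 11 and 12 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the slides say only that C, A, T, G sit on the corners of a square, so the corner layout below, the starting point at the center, and the function names `w_curve` and `to_cylindrical` are all assumptions made for the example.

```python
import math

# Assumed corner layout: the slides do not say which base goes on which
# corner, so this particular assignment is a guess for illustration.
CORNERS = {
    "A": (-1.0, 1.0),
    "C": (1.0, 1.0),
    "G": (1.0, -1.0),
    "T": (-1.0, -1.0),
}


def w_curve(sequence):
    """Return one (x, y, z) point per base in the sequence."""
    x, y = 0.0, 0.0  # assumed start: center of the square
    points = []
    for z, base in enumerate(sequence.upper(), start=1):
        cx, cy = CORNERS[base]
        # Move halfway from the current point toward the base's corner.
        x = (x + cx) / 2.0
        y = (y + cy) / 2.0
        # Z advances by one step per base.
        points.append((x, y, float(z)))
    return points


def to_cylindrical(points):
    """Convert (x, y, z) points to (r, theta, z) for storage."""
    return [(math.hypot(x, y), math.atan2(y, x), z) for x, y, z in points]


curve = w_curve("CG")
print(curve)                  # [(0.5, 0.5, 1.0), (0.75, -0.25, 2.0)]
print(to_cylindrical(curve))
```

Gap characters or ambiguity codes are not handled here; any symbol outside A, C, G, T raises a `KeyError`.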
  • 16. Distance Metric ● Bases are arranged in a square to minimize effects of SNPs. ● Synonymous SNPs are usually in the same quadrant. ● Points within the same quadrant have a small difference; points in opposite quadrants have a larger one.
  • 17. Comparison Produces “Chunks” ● Comparison yields a list of chunks. ● Curves are aligned within the chunk. ● Summing chunks gives a single value for two curves. ● Analyzing them in detail allows mining local similarities and variations. ● Grouping allows examination of crossover-recombination events.
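
A rough Python sketch of the idea on slides 16 and 17: measure the distance between corresponding curve points, collect the results per chunk, and sum the chunks into one score. The slides do not give the actual metric or the chunk alignment procedure, so several things here are assumptions: plain Euclidean distance in the X-Y plane, fixed-size chunks over curves that are already aligned, and the names `point_distance`, `compare_chunks`, and `total_distance`.

```python
import math


def point_distance(p, q):
    """Euclidean distance between two curve points in the X-Y plane."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def compare_chunks(curve_a, curve_b, chunk_size=3):
    """Return (start, length, distance) for each fixed-size chunk.

    Assumes the two curves are already aligned point for point; a real
    comparison would have to locate and align the chunks first.
    """
    n = min(len(curve_a), len(curve_b))
    chunks = []
    for start in range(0, n, chunk_size):
        end = min(start + chunk_size, n)
        dist = sum(
            point_distance(curve_a[i], curve_b[i]) for i in range(start, end)
        )
        chunks.append((start, end - start, dist))
    return chunks


def total_distance(chunks):
    """Sum the per-chunk distances into a single value for the pair."""
    return sum(dist for _, _, dist in chunks)


# Hand-made X-Y points for two short curves (illustrative values only).
a = [(0.5, 0.5), (0.75, -0.25)]
b = [(0.5, 0.5), (0.75, 0.75)]
chunks = compare_chunks(a, b, chunk_size=1)
print(chunks)
print(total_distance(chunks))
```

Keeping the per-chunk list, rather than only the total, is what would allow local similarities and variations to be inspected later.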
  • 18. Clustering: Traveling Salesman Problem ● The TSP is simple to describe, hard to solve: ● Start and finish in the same city. ● Visit each city on a list once. ● Minimize the distance (cost). ● Optimal solutions will cluster the nearby cities. ● The problem was always in defining the clusters.
  • 19. Take a Walk and Cluster Your Genes ● Climer & Zhang, 2004. ● Method for detecting N clusters: ● Add N dummy cities to the distance map. ● Each one has the same, small distance to all other cities (we use 2-20). ● Dummy cities end up in the inter-cluster gaps. ● The process is trivial to implement: just add that many rows and columns to the original comparison matrix.
  • 20. Displaying the Tour ● Mapping the tour onto a circle gives a good view of the distances. ● Coloring simplifies inspection. ● Black dots for dummy cities. ● Single type at the top (e.g. wild type). ● Color successive data points using the “rainbow” sequence with a large number of colors. ● Sequences that are more alike get more similar colors.
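
The dummy-city step on slide 19 can be illustrated with a small Python sketch: pad the distance matrix with extra rows and columns, solve the TSP, then cut the tour wherever a dummy city appears. The brute-force solver below is only a stand-in that works for a handful of cities; a real data set needs a proper TSP solver. The function names, the toy distance matrix, and the dummy distance of 2.0 are assumptions chosen for the example.

```python
import itertools


def add_dummy_cities(dist, n_dummies, dummy_dist):
    """Return a copy of dist with n_dummies extra rows and columns.

    Every dummy city is dummy_dist away from every other city.
    """
    size = len(dist) + n_dummies
    out = [[dummy_dist] * size for _ in range(size)]
    for i, row in enumerate(dist):
        for j, d in enumerate(row):
            out[i][j] = d
    for i in range(size):
        out[i][i] = 0.0
    return out


def brute_force_tour(dist):
    """Exact TSP by enumeration; only practical for very small inputs."""
    n = len(dist)
    best_tour, best_cost = None, float("inf")
    for perm in itertools.permutations(range(1, n)):
        tour = (0,) + perm
        cost = sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))
        if cost < best_cost:
            best_tour, best_cost = tour, cost
    return list(best_tour)


def split_clusters(tour, n_real):
    """Cut the circular tour at dummy cities (index >= n_real)."""
    first = next(i for i, c in enumerate(tour) if c >= n_real)
    rotated = tour[first + 1:] + tour[:first + 1]  # now ends with a dummy
    clusters, current = [], []
    for city in rotated:
        if city >= n_real:
            if current:
                clusters.append(current)
            current = []
        else:
            current.append(city)
    return clusters


# Toy data: cities 0 and 1 are close, 2 and 3 are close, the pairs are far apart.
dist = [
    [0, 1, 10, 10],
    [1, 0, 10, 10],
    [10, 10, 0, 1],
    [10, 10, 1, 0],
]
padded = add_dummy_cities(dist, n_dummies=2, dummy_dist=2.0)
tour = brute_force_tour(padded)
clusters = split_clusters(tour, n_real=len(dist))
print(clusters)
```

In this toy case the two dummy cities land on the long edges between the pairs, so cutting at them recovers the groups {0, 1} and {2, 3}.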
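
One way to compute the layout from slide 20, sketched in Python (the slides use R for display, so this is an illustration rather than the authors' code). Each city is placed on a unit circle at an angle proportional to the distance travelled so far, the first city sits at the top, dummy cities are black, and real cities take evenly spaced hues in tour order. The clockwise direction, the hue spacing, and the names `tour_layout`, `rainbow`, and `tour_colors` are assumptions.

```python
import colorsys
import math


def tour_layout(tour, dist):
    """Return (city, x, y) on the unit circle for a closed tour.

    The first city is at the top; the angle grows clockwise in
    proportion to the distance travelled along the tour.
    """
    n = len(tour)
    steps = [dist[tour[i]][tour[(i + 1) % n]] for i in range(n)]
    total = sum(steps)
    layout, travelled = [], 0.0
    for i, city in enumerate(tour):
        angle = math.pi / 2 - 2 * math.pi * travelled / total
        layout.append((city, math.cos(angle), math.sin(angle)))
        travelled += steps[i]
    return layout


def rainbow(n):
    """Return n RGB tuples with evenly spaced hues."""
    return [colorsys.hsv_to_rgb(i / n, 1.0, 1.0) for i in range(n)]


def tour_colors(tour, n_real):
    """Map each city to a color: black for dummies, rainbow otherwise."""
    palette = rainbow(sum(1 for c in tour if c < n_real))
    colors, k = {}, 0
    for city in tour:
        if city >= n_real:
            colors[city] = (0.0, 0.0, 0.0)
        else:
            colors[city] = palette[k]
            k += 1
    return colors


# Toy tour: three real cities plus one dummy (index 3), all one unit apart.
dist = [[0 if i == j else 1 for j in range(4)] for i in range(4)]
tour = [0, 1, 2, 3]
layout = tour_layout(tour, dist)
colors = tour_colors(tour, n_real=3)
print(layout)
print(colors)
```

Because hues are assigned in tour order, neighbors on the tour receive neighboring colors, which is what makes similar sequences look similar on the plot.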
  • 21.
  • 22.
  • 24. Multiple uses for color sequence. ● Track individual over time. ● Progression through colors shows history. ● Clustering highlights progression towards drug resistance. ● Track sample population. ● Recycling the colors from one initial tour helps show changes in successive graphs. ● Simplifies tracking progression in anonymous populations found in HIV treatment centers.
  • 25. Visualizing W-curves ● We use a WebGL-based package “WebCurve”. ● Developed at IIT as a web-friendly solution for examining 3D geometry. ● Gracefully handles displaying 100+ sequences at 10K bases each on a notebook computer. ● Available from GitHub; the archive includes a web server and code to generate files for display.
  • 26. Summary ● W-curve and TSP allow us to cluster genes. ● Provides a more useful output in a clinical setting. ● Color coding the TSP results allows tracking changes in a population or the progression of an individual over time.