2. HIV-1, W-curves, & Shoe Leather
● Existing genetics tools fail on HIV-1.
● They make assumptions based on “normal” DNA
that fail on HIV, cancer, or plants.
● Correlation tools look at evolution, not state.
● We are working on tools for clinical analysis.
● The W-curve abstracts DNA into geometry.
● The TSP clusters genes rather than trying to
impute inheritance.
3. Sequences Inform Treatment
● Treating HIV requires sequencing it to choose
appropriate drugs:
● HIV-1 evolves drug resistance in months.
● Multiple strains in a single patient are common,
either from multiple sources or from evolution.
● Crossover recombination is relatively common due to
cross-infected cells.
4. Problem: HIV is Hard to Analyze
● HIV is a non-correcting retrovirus.
● Evolves 10,000 times faster than humans or
influenza: one new strain per patient per day.
● Genomes for wild types range from 8349 to
9829 bases, making localized comparisons
difficult.
● The single FDA-approved algorithm directing
treatment from sequence handles only type B;
the U.S. Army has 15%+ non-B infections.
5. The Current Tools
● BLAST, FASTA, ClustalW perform alignment.
● Table-driven analysis of base transitions.
● Score the entire sequence with a single value.
● Graphical tools are designed to display
inheritance rather than state.
● Output is difficult to read in a clinical setting.
9. New Tools
● Clinical vs. evolutionary.
● Avoid assumptions that break current tools.
● Suitable for a repeatable process in clinics or
data mining in research.
● We are using:
● W-curve for analysis.
● TSP for clustering.
● R for data management & display.
10. W-curve
● Geometric abstraction of DNA.
● Generated by a simple state machine.
● Geometry allows alignment at a finer scale
than character strings.
● Avoids assumptions about transition
probabilities by taking the figure as-is.
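A minimal sketch of such a state machine, assuming the usual W-curve construction in which each base pulls the current point halfway toward that base's corner of a square while the base index supplies the third axis. The corner layout in `CORNERS`, the starting point at the origin, and the handling of unknown bases are assumptions for illustration; the slides do not specify them.

```python
# Hypothetical corner layout: the slides only say the bases sit on a square.
CORNERS = {
    "A": (-1.0, 1.0),
    "C": (1.0, 1.0),
    "G": (1.0, -1.0),
    "T": (-1.0, -1.0),
}


def wcurve(sequence):
    """Return a list of (x, y, z) points, one per base in the sequence."""
    x, y = 0.0, 0.0  # assumed starting state: the center of the square
    points = []
    for z, base in enumerate(sequence.upper()):
        corner = CORNERS.get(base)
        if corner is not None:
            # State transition: move halfway toward the base's corner.
            x = (x + corner[0]) / 2
            y = (y + corner[1]) / 2
        # Unknown bases (e.g. "N") leave the state unchanged (an assumption).
        points.append((x, y, z))
    return points


if __name__ == "__main__":
    for point in wcurve("ACGT"):
        print(point)
```

Because the state is only the current (x, y) position, the curve is built in a single pass and needs no table of transition probabilities.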
16. Distance Metric
● Bases are arranged in a square to minimize the
effects of SNPs.
● Synonymous SNPs are usually in the same
quadrant.
● Points within the same quadrant have a small
difference; opposite quadrants get larger ones.
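The quadrant behavior can be illustrated with a plain Euclidean distance in the plane of the square. This is a sketch under the assumption that two curves are compared point by point at matching positions; the function names are hypothetical and the actual metric may differ.

```python
import math


def point_distance(p, q):
    """Euclidean distance between two points, using only their x and y."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def curve_distance(curve_a, curve_b):
    """Sum of pointwise distances over the overlapping length of two curves."""
    return sum(point_distance(p, q) for p, q in zip(curve_a, curve_b))


if __name__ == "__main__":
    same_quadrant = point_distance((0.5, 0.5), (0.6, 0.4))
    opposite_quadrant = point_distance((0.5, 0.5), (-0.5, -0.5))
    print(same_quadrant, opposite_quadrant)
```

Two points in the same quadrant can never be farther apart than the quadrant's diagonal, while points in opposite quadrants can be separated by up to the diagonal of the whole square.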
17. Comparison Produces “Chunks”
● Comparison yields a list of chunks.
● Curves are aligned within the chunk.
● Summing chunks gives a single value for two curves.
● Analyzing them in detail allows mining local
similarities and variations.
● Grouping allows examination of crossover
recombination events.
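One way to picture the chunk list is as a sequence of small records that can either be summed or inspected individually. The `Chunk` fields and helper names below are hypothetical; the slides do not describe the actual data structure.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """Hypothetical record for one aligned region of two curves."""
    offset_a: int    # start of the region on the first curve
    offset_b: int    # start of the region on the second curve
    length: int      # number of bases covered by the region
    distance: float  # accumulated difference within the region


def total_distance(chunks):
    """Collapse a comparison into a single value for the two curves."""
    return sum(chunk.distance for chunk in chunks)


def most_divergent(chunks, count=1):
    """Return the chunks with the largest difference per base."""
    ranked = sorted(chunks, key=lambda c: c.distance / c.length, reverse=True)
    return ranked[:count]


if __name__ == "__main__":
    chunks = [Chunk(0, 0, 100, 2.0), Chunk(100, 103, 50, 10.0)]
    print(total_distance(chunks))
    print(most_divergent(chunks))
```

Keeping the per-chunk values around is what makes local analysis possible; the single summed value is what a distance matrix for clustering would hold.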
18. Clustering: Traveling Salesman Problem
● The TSP is simple to describe, hard to solve:
● Starting and finishing in the same city.
● Visit a list of cities once each.
● Minimize the distance (cost).
● Optimal solutions will cluster the nearby cities.
● The problem was always in defining the
clusters.
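The problem statement can be sketched directly in code. This is a brute-force illustration that is only practical for a handful of cities; it is not a suggestion of how real tours are solved, and the function names are assumptions.

```python
from itertools import permutations


def tour_length(dist, tour):
    """Cost of a closed tour: each city once, then back to the start."""
    return sum(
        dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour))
    )


def brute_force_tour(dist):
    """Try every ordering that starts at city 0 and keep the cheapest."""
    cities = range(1, len(dist))
    candidates = ((0,) + rest for rest in permutations(cities))
    return list(min(candidates, key=lambda tour: tour_length(dist, tour)))


if __name__ == "__main__":
    # Four cities on a line at positions 0, 1, 10 and 11: two obvious pairs.
    positions = [0, 1, 10, 11]
    dist = [[abs(a - b) for b in positions] for a in positions]
    tour = brute_force_tour(dist)
    print(tour, tour_length(dist, tour))
```

In the example, the cheapest tours keep cities 0 and 1 adjacent and cities 2 and 3 adjacent, which is the clustering effect noted above. Nothing in the tour itself, however, marks where one cluster ends and the next begins.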
19. Take a Walk and Cluster Your Genes
● Climer & Zhang, 2004.
● Method for detecting N clusters:
● Add N dummy cities to the distance map.
● Each one has the same, small distance to all other
cities (we use 220).
● Dummy cities end up in the intercluster gaps.
● The process is trivial to implement: just add that
many rows and columns to the original
comparison matrix.
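A sketch of the matrix extension and of reading clusters back out of a finished tour. The dummy distance is left as a parameter rather than a fixed value, and the dummy-to-dummy distance is an assumption (here it defaults to the same value); a large value there would discourage two dummies from sitting next to each other. Function names are hypothetical.

```python
def add_dummy_cities(dist, n_clusters, dummy_distance, dummy_to_dummy=None):
    """Return a copy of dist with n_clusters extra rows and columns."""
    if dummy_to_dummy is None:
        dummy_to_dummy = dummy_distance  # assumption, see lead-in
    n_real = len(dist)
    size = n_real + n_clusters
    extended = [[dummy_distance] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            if i == j:
                extended[i][j] = 0
            elif i < n_real and j < n_real:
                extended[i][j] = dist[i][j]  # original comparison values
            elif i >= n_real and j >= n_real:
                extended[i][j] = dummy_to_dummy
    return extended


def split_clusters(tour, n_real):
    """Cut a closed tour at the dummy cities (indices >= n_real)."""
    dummy_positions = [i for i, city in enumerate(tour) if city >= n_real]
    if not dummy_positions:
        return [list(tour)]
    start = dummy_positions[0]
    rotated = list(tour[start:]) + list(tour[:start])  # begin at a dummy
    clusters, current = [], []
    for city in rotated:
        if city >= n_real:
            if current:
                clusters.append(current)
            current = []
        else:
            current.append(city)
    if current:
        clusters.append(current)
    return clusters


if __name__ == "__main__":
    print(add_dummy_cities([[0, 5], [5, 0]], 1, 0.1))
    print(split_clusters([0, 1, 4, 2, 3, 5], n_real=4))
```

Rotating the tour to begin at a dummy city matters because the tour is a cycle: a cluster may wrap around the end of the list.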
20. Displaying the Tour
● Mapping the tour onto a circle gives a good
view of the distances.
● Coloring simplifies inspection.
● Black dots for dummy cities.
● Single type at the top (e.g. wild type).
● Color successive data points using the “rainbow”
sequence with a large number of colors.
● Sequences more alike get more similar colors.
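The display rules above can be sketched as follows, assuming angles proportional to the distance travelled along the tour and a hue sweep for the rainbow. The function names and the choice of `colorsys` are assumptions; the slides name R for display, and this block produces only coordinates and colors, not a plot.

```python
import colorsys
import math


def rotate_to(tour, city):
    """Rotate a closed tour so the chosen city (e.g. wild type) comes first."""
    start = tour.index(city)
    return tour[start:] + tour[:start]


def circle_layout(dist, tour):
    """Place cities on a unit circle; the first city sits at the top."""
    steps = [dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour))]
    total = sum(steps)
    points, travelled = [], 0.0
    for step in steps:
        angle = 2 * math.pi * travelled / total if total else 0.0
        points.append((math.sin(angle), math.cos(angle)))  # clockwise from top
        travelled += step
    return points


def rainbow_colors(tour, n_real):
    """Map each city to an RGB triple: rainbow for data, black for dummies."""
    real = [city for city in tour if city < n_real]
    colors = {}
    for rank, city in enumerate(real):
        colors[city] = colorsys.hsv_to_rgb(rank / len(real), 1.0, 1.0)
    for city in tour:
        if city >= n_real:
            colors[city] = (0.0, 0.0, 0.0)
    return colors


if __name__ == "__main__":
    dist = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
    tour = rotate_to([1, 2, 0], 0)
    print(circle_layout(dist, tour))
    print(rainbow_colors(tour, n_real=2))
```

Since colors are assigned in tour order, neighbors on the tour receive neighboring hues, which is why similar sequences end up with similar colors.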
24. Multiple Uses for Color Sequence
● Track individual over time.
● Progression through colors shows history.
● Clustering highlights progression towards drug
resistance.
● Track sample population.
● Recycling the colors from one initial tour helps show
changes in successive graphs.
● Simplifies tracking progression in anonymous
populations found in HIV treatment centers.
25. Visualizing W-curves
● We use a WebGL-based package, “WebCurve”.
● Developed at IIT as a web-friendly solution for
examining 3D geometry.
● Gracefully handles displaying 100+ sequences
at 10K bases each on a notebook computer.
● Available from GitHub; the archive includes a web
server and code to generate files for display.
26. Summary
● W-curve and TSP allow us to cluster genes.
● Provides a more useful output in a clinical
setting.
● Color coding the TSP results allows tracking
changes in a population or the progression of an
individual over time.