Exploring Genomes Using DNA Walks in A-T G-C Space
1. Genome exploration in A-T G-C space: introducing Icarus, a DNA walking program. Jonathan Blakes, MSc Biotechnology and Computation, Department of Biosciences, Faculty of Science, Technology and Medical Studies
4. Hypothesis: Can DNA sequences be plotted in such a way that long sequences can be easily interpreted by humans without a priori knowledge? “It seems that the simplest method of visualizing some properties of genomes is to send a virtual walker for a genomic walk, ask "it" to talk about what it has seen and note its observations. If our walker doesn't move with a Brownian-like motion, it is possible to extract from its walk a lot of information.” Stanislaw Cebrat, the principal Polish proponent of DNA walks. Assigning a cardinal direction (north, south, east or west) to each of the four nucleotide bases (A, T, G, C) and taking a step in the assigned direction as each base of the sequence is read produces a ‘walk’ of the sequence in which repetitive DNA elements appear as repetitive 2-dimensional ‘structures’.
5. DNA walks are plots of DNA or RNA sequences in which each of the four nucleotide bases is assigned a direction and a step length. The sequence is read one nucleotide at a time, and for each nucleotide the virtual walker takes a step in the designated direction, creating a 'walk' of the sequence that reveals structure in its nucleotide composition. [Figure: DNA walking, from the Comparative Genometrics website, Université de Lausanne]
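A minimal sketch of the walk just described, in Python. It assumes unit steps and a hypothetical compass assignment for A-T G-C space (A = north, T = south, G = east, C = west); the name of the space presumably means only that A and T share one axis and G and C the other. The helpers `walk` and `unit_displacements` are illustrative names, not Icarus's code. The example also shows why repetitive DNA appears as repeated structure: each copy of a tandem repeat moves the walker by the same net vector, so the same shape is drawn again and again.

```python
# Hypothetical compass assignment for "A-T G-C space": A and T share the
# vertical axis, G and C share the horizontal axis, every step has length 1.
STEPS = {"A": (0, 1), "T": (0, -1), "G": (1, 0), "C": (-1, 0)}


def walk(sequence):
    """Return the list of (x, y) positions visited, starting at the origin."""
    x, y = 0, 0
    path = [(x, y)]
    for base in sequence.upper():
        dx, dy = STEPS[base]  # raises KeyError for symbols other than A, C, G, T
        x, y = x + dx, y + dy
        path.append((x, y))
    return path


def unit_displacements(sequence, unit_length):
    """Net displacement produced by each consecutive block of unit_length bases."""
    path = walk(sequence)
    return [
        (path[i + unit_length][0] - path[i][0], path[i + unit_length][1] - path[i][1])
        for i in range(0, len(sequence) - unit_length + 1, unit_length)
    ]


# A 7-fold tandem repeat of a made-up 5-base unit: every copy shifts the
# walker by the same vector, so the same shape is drawn seven times in a row.
repeat = "GATTC" * 7
print(unit_displacements(repeat, 5))  # [(0, -1), (0, -1), ... seven times]
```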
7. Mapping. There are 24 possible assignments of the four bases to the cardinal vectors: 4 rotations of each of the 3 mappings above, plus 4 rotations of each of their reflections about the x or y axis. Which 3 of those 24 are chosen as the ‘unique’ mappings is a matter of parsimony.
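The count of 24 can be checked directly. The sketch below (Python; `opposite_pairs`, `rotate`, `reflect` and `orbit` are illustrative helpers, not taken from Icarus) places the four bases on the compass points in every possible order, groups the assignments by which bases face each other across the origin, and confirms that each group is exactly the 4 rotations and 4 reflections of a single mapping. Grouping by opposite pairs is one way to see why there are 3 ‘unique’ mappings; it is not necessarily how the program chooses them.

```python
from itertools import permutations

BASES = "ATGC"
# Compass points in clockwise order: N, E, S, W.
# assignment[i] is the base placed on compass point i; points i and i + 2 are opposite.


def opposite_pairs(assignment):
    """Which bases face each other; unchanged by any rotation or reflection."""
    return frozenset(frozenset((assignment[i], assignment[i + 2])) for i in range(2))


def rotate(a):
    """Turn the compass 90 degrees clockwise: the base on N moves to E, and so on."""
    return (a[3], a[0], a[1], a[2])


def reflect(a):
    """Mirror about the N-S axis: E and W swap, N and S stay put."""
    return (a[0], a[3], a[2], a[1])


def orbit(a):
    """All assignments reachable from a by rotations and reflections."""
    images = set()
    for start in (a, reflect(a)):
        for _ in range(4):
            images.add(start)
            start = rotate(start)
    return images


all_assignments = list(permutations(BASES))
classes = {}
for assignment in all_assignments:
    classes.setdefault(opposite_pairs(assignment), []).append(assignment)

print(len(all_assignments))                      # 24
print(sorted(len(m) for m in classes.values()))  # [8, 8, 8]
```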
12. A-T G-C is consistently smallest. Of the three mappings, the A-T G-C walk consistently occupies the smallest area. Smaller pictures can pack more information into less space and are therefore more amenable to publication; hence ‘Genome Exploration in A-T G-C space’.
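One way to make ‘smallest’ concrete is to measure the bounding box each mapping's walk occupies. The sketch below assumes unit steps and one hypothetical compass layout per mapping class; the labels "A-G C-T" and "A-C G-T" for the other two classes are likewise assumptions. A plausible reason for A-T G-C being smallest: its x coordinate is the running count of G minus C and its y coordinate the running count of A minus T, and within a single strand A ≈ T and G ≈ C (Chargaff's second parity rule), so the walk tends to drift least. The test sequence is contrived and does not reproduce the slide's comparisons.

```python
# Hypothetical representatives of the three mapping classes: unit steps,
# and the two bases named together in each pair share one axis.
MAPPINGS = {
    "A-T G-C": {"A": (0, 1), "T": (0, -1), "G": (1, 0), "C": (-1, 0)},
    "A-G C-T": {"A": (0, 1), "G": (0, -1), "C": (1, 0), "T": (-1, 0)},
    "A-C G-T": {"A": (0, 1), "C": (0, -1), "G": (1, 0), "T": (-1, 0)},
}


def bounding_box_area(sequence, steps):
    """Number of grid points in the smallest axis-aligned box enclosing the walk."""
    x = y = 0
    min_x = max_x = min_y = max_y = 0
    for base in sequence.upper():
        dx, dy = steps[base]
        x, y = x + dx, y + dy
        min_x, max_x = min(min_x, x), max(max_x, x)
        min_y, max_y = min(min_y, y), max(max_y, y)
    return (max_x - min_x + 1) * (max_y - min_y + 1)


def compare_mappings(sequence):
    """Bounding-box area of the walk under each mapping, keyed by mapping name."""
    return {name: bounding_box_area(sequence, steps) for name, steps in MAPPINGS.items()}


# Contrived input: alternating A/T, then alternating G/C.
print(compare_mappings("ATATATATGCGCGCGC"))
# {'A-T G-C': 4, 'A-G C-T': 25, 'A-C G-T': 25}
```

Counting grid points rather than multiplying raw width by height keeps the measure non-zero for walks that run along a single line.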
13. Duplications, exons, introns. A 7-fold contiguous duplication in the male Y chromosome: members of the TSPY (Testis-specific Y-encoded proteins) family, identified by Skaletsky et al. [1] using a combination of a whole-chromosome dot plot with a 2-kb window and a custom Perl script running BLAST alignments of all 5-kb sequence segments, in 2-kb steps, across the entire MSY (male-specific region of the Y chromosome). In contrast, I stumbled upon this purely by accident. [1] Skaletsky et al. Nature 2003, 423.
19. Does summing distances from 3 mappings eliminate bias and produce a better phylogeny? NO. A better distance measure is needed.
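The slide does not specify which distance was summed. Purely as a hypothetical illustration of what ‘summing distances from 3 mappings’ means operationally, the sketch below compares two equal-length, pre-aligned sequences by the mean Euclidean distance between corresponding points of their walks, then adds that distance up over one representative of each mapping class. The mappings, the measure and all function names are assumptions; the slide's conclusion is that this kind of summing did not remove bias, so this is not offered as the better measure it calls for.

```python
import math
from itertools import combinations

# Hypothetical representatives of the three mapping classes (unit steps).
MAPPINGS = {
    "A-T G-C": {"A": (0, 1), "T": (0, -1), "G": (1, 0), "C": (-1, 0)},
    "A-G C-T": {"A": (0, 1), "G": (0, -1), "C": (1, 0), "T": (-1, 0)},
    "A-C G-T": {"A": (0, 1), "C": (0, -1), "G": (1, 0), "T": (-1, 0)},
}


def walk(sequence, steps):
    """Positions visited by the walker, starting at the origin."""
    x = y = 0
    path = [(0, 0)]
    for base in sequence.upper():
        dx, dy = steps[base]
        x, y = x + dx, y + dy
        path.append((x, y))
    return path


def walk_distance(seq_a, seq_b, steps):
    """Mean Euclidean distance between corresponding points of two walks.

    Hypothetical measure: assumes the sequences are aligned and of equal
    length, so step i of one walk is compared with step i of the other.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    path_a, path_b = walk(seq_a, steps), walk(seq_b, steps)
    return sum(math.dist(p, q) for p, q in zip(path_a, path_b)) / len(path_a)


def summed_distance(seq_a, seq_b):
    """Sum of the per-mapping distances over all three mapping classes."""
    return sum(walk_distance(seq_a, seq_b, steps) for steps in MAPPINGS.values())


def distance_matrix(named_sequences):
    """Pairwise summed distances, e.g. as input to a tree-building method."""
    return {
        (a, b): summed_distance(named_sequences[a], named_sequences[b])
        for a, b in combinations(sorted(named_sequences), 2)
    }


print(distance_matrix({"s1": "ACGTACGT", "s2": "ACGTACGA", "s3": "TTGTACGA"}))
```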
21. Acknowledgements. I would like to thank Dr. Gary Robinson, Dr. Colin Johnson, Dr. Anthony Baines and everyone I have met during the Biotechnology and Computation MSc.