The document describes algorithms for sequence alignment, including the Needleman-Wunsch algorithm for global alignment and the Smith-Waterman algorithm for local alignment. It provides examples of filling in a scoring matrix according to the algorithms' recurrence relations and tracing back to find the optimal alignment between two sequences.
2. Where we left off
• Output from Sequencing:
AACTGACCTA…
CCGTTGGCAT…
TTTGCGGTCA…
…
…
3. Where we left off
• I have hundreds of millions of short sequences
• Most applications need to figure out where they
came from
• Given a reference genome sequence and millions of
short sequences, how do I figure out where each of
the short sequences came from?
4. What is an algorithm?
• A process or set of rules to be followed in
calculations or other problem-solving operations,
esp. by a computer
• X = 2 * n
• Make a burrito
• Unwrap burrito
• Place on plate in microwave
• Turn microwave on for 2 minutes
5. What makes a good algorithmic
solution?
• Speed
• Memory
• Optimality of answer
6. The alignment problem
• Input
• Two sequences s and t of lengths n and m
• Output
• An alignment between the two sequences with gaps
inserted appropriately
• Objective Function
• A scoring function that weights particular character-to-
character alignments
7. How fast do I have to eat?
• Input
• # of sandwiches
• Output
• Sandwiches / minute
• Objective function
• Minimize the number of sandwiches I have to eat per
minute such that I finish all sandwiches in an hour
8. Scoring Function
• Which alignment of AATAC and ATATC is better?
• AATAC AATA-C
• ATATC -ATATC
• How did you decide?
• Example scoring function:
• +1 for matches
• -1 for gaps and mismatches
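The example scoring function can be applied column by column to the two candidate alignments on this slide. The sketch below is one way to do that; the function name `score_alignment` and its parameters are illustrative assumptions, not part of the original slides.

```python
def score_alignment(top, bottom, match=1, penalty=-1):
    """Score two equal-length gapped strings, one column at a time."""
    if len(top) != len(bottom):
        raise ValueError("aligned strings must have the same length")
    total = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            total += penalty      # gap column
        elif a == b:
            total += match        # matching characters
        else:
            total += penalty      # mismatching characters
    return total


# The two candidate alignments from the slide.
ungapped = score_alignment("AATAC", "ATATC")
gapped = score_alignment("AATA-C", "-ATATC")
print(ungapped, gapped)  # prints: -1 2
```

Under this scoring function the gapped alignment wins: four matches and two gaps give 2, while the ungapped alignment has two matches and three mismatches, giving -1.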
11. Rules
• Fill in top row and left-most column according to
scoring function
• Start in the upper left-most corner of unfilled squares
• Move left to right filling in the result of the scoring
function
• Break ties arbitrarily
• Trace back from bottom right corner to upper left
corner
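The rules above can be sketched in code as follows. This is a minimal global-alignment (Needleman-Wunsch) sketch, assuming the example scoring function of +1 for matches and -1 for gaps and mismatches. The function name, parameter names, and the tie-breaking order (diagonal, then up, then left) are assumptions made for illustration.

```python
def needleman_wunsch(s, t, match=1, mismatch=-1, gap=-1):
    """Return (score, aligned_s, aligned_t) for a global alignment."""
    n, m = len(s), len(t)
    # score[i][j] is the best score for aligning s[:i] with t[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]

    # Fill in the top row and left-most column: only gaps are possible.
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    # Fill the remaining cells left to right, top to bottom.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align s[i-1] with t[j-1]
                score[i - 1][j] + gap,      # gap in t
                score[i][j - 1] + gap,      # gap in s
            )

    # Trace back from the bottom right corner to the upper left corner.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and s[i - 1] == t[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            top.append(s[i - 1]); bottom.append(t[j - 1])
            i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(s[i - 1]); bottom.append("-")
            i -= 1
        else:
            top.append("-"); bottom.append(t[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


best_score, aligned_s, aligned_t = needleman_wunsch("AATAC", "ATATC")
print(best_score)
print(aligned_s)
print(aligned_t)
```

For AATAC and ATATC the optimal score is 2. Several alignments can reach that score, so which one is printed depends on how ties are broken.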
45. Rules
• Fill in top row and left-most column according to
scoring function
• Start in the upper left-most corner of unfilled squares
• Move left to right filling in the result of the scoring
function
• Break ties arbitrarily
• Trace back from the max element in the matrix to the
first STOP
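A minimal local-alignment (Smith-Waterman) sketch of these rules follows. It assumes that STOP on the slide corresponds to a cell whose score is 0, that the scoring function is again +1 for matches and -1 for gaps and mismatches, and that ties for the maximum are broken by taking the first one found. The function name and the example sequences are hypothetical.

```python
def smith_waterman(s, t, match=1, mismatch=-1, gap=-1):
    """Return (score, aligned_s, aligned_t) for a best local alignment."""
    n, m = len(s), len(t)
    # Top row and left-most column stay 0: a local alignment can start anywhere.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            score[i][j] = max(
                0,                          # STOP: start a fresh alignment
                score[i - 1][j - 1] + sub,  # align s[i-1] with t[j-1]
                score[i - 1][j] + gap,      # gap in t
                score[i][j - 1] + gap,      # gap in s
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the max element until the first cell with score 0.
    top, bottom = [], []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if s[i - 1] == t[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            top.append(s[i - 1]); bottom.append(t[j - 1])
            i -= 1; j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            top.append(s[i - 1]); bottom.append("-")
            i -= 1
        else:
            top.append("-"); bottom.append(t[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


print(smith_waterman("TTACCTGG", "CACCTA"))
```

With these illustrative sequences, the best local alignment is the shared substring ACCT, with a score of 4. Unlike the global case, the flanking characters that do not match are simply left out.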
79. How to evaluate statistical
significance?
• Everyone pick a number between 1 and 10 (keep it
to yourself!)
80. The problem with databases
• Query is: ACCT
• Is a match significant?
• Database A:
• ACCT
• CAGG
• AAAA
• Database B:
• ACCT
• ACCT
• ACCT
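One possible reading of this slide is that the same exact match means different things depending on what else is in the database. The sketch below is a crude illustration, not a formal significance test: it only counts the fraction of database entries that equal the query. The function name `hit_fraction` is an assumption made for this example.

```python
def hit_fraction(query, database):
    """Fraction of database entries that match the query exactly."""
    hits = sum(1 for entry in database if entry == query)
    return hits / len(database)


database_a = ["ACCT", "CAGG", "AAAA"]
database_b = ["ACCT", "ACCT", "ACCT"]

print(hit_fraction("ACCT", database_a))  # one of three entries matches
print(hit_fraction("ACCT", database_b))  # every entry matches
```

In Database B every entry matches the query, so a hit tells us very little. In Database A only one entry in three matches, so a hit is more informative.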
81. Alignment Projects
• Research BWA
• Research Bowtie
• Research MAQ
• Code a program in the language of your choice that
performs Needleman-Wunsch or Smith-Waterman