4. Homologous refers to the conclusion, drawn from the data, that two genes or sequences have descended from a common ancestor. Homologous sequences are of two types. Orthologous: homologous sequences in different species that arose from a common ancestral gene during speciation. Paralogous: homologous sequences within a single species that arose by gene duplication.
5. What is Alignment? An explicit mapping between two or more sequences: one sequence is placed over another in such a fashion as to get maximum similarity. Two kinds: sequence alignment and structural alignment.
22. Extreme Value Distribution. Probability density function for the extreme value distribution resulting from parameter values $\mu = 0$ and $\lambda = 1$ [$y = 1 - \exp(-e^{-x})$], where $\mu$ is the characteristic value and $\lambda$ is the decay constant. General form: $y = 1 - \exp(-e^{-\lambda(x - \mu)})$.
23. Extreme Value Distribution (EVD). An optimal alignment of two sequences is selected out of many suboptimal alignments, and a database search is likewise about selecting the best alignment(s). This fits well with the EVD, which has a right tail that falls off more slowly than the left tail. Compared to using the normal distribution, when using the EVD an alignment has to score further away from the expected mean value to become a significant hit. [Figure: real data vs. EVD approximation]
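The claim about the slower right tail can be checked numerically. The sketch below is only an illustration: it assumes the standard EVD ($\mu = 0$, $\lambda = 1$) and compares its upper tail with that of a normal distribution having the same mean and standard deviation. The function names are hypothetical and not taken from the slides.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def evd_tail(x, mu=0.0, lam=1.0):
    """P(S >= x) under the EVD: 1 - exp(-e^(-lam * (x - mu)))."""
    # -expm1(-y) equals 1 - exp(-y) but stays accurate when y is tiny.
    return -math.expm1(-math.exp(-lam * (x - mu)))


def normal_tail(x, mean, sd):
    """P(S >= x) under a normal distribution with the given mean and sd."""
    return 0.5 * math.erfc((x - mean) / (sd * math.sqrt(2.0)))


def evd_mean_sd(mu=0.0, lam=1.0):
    """Mean and standard deviation of the EVD with parameters mu and lam."""
    return mu + EULER_GAMMA / lam, math.pi / (lam * math.sqrt(6.0))


mean, sd = evd_mean_sd()
for k in (2, 3, 4):
    x = mean + k * sd
    print(f"{k} sd above the mean: "
          f"EVD tail = {evd_tail(x):.2e}, "
          f"normal tail = {normal_tail(x, mean, sd):.2e}")
```

At each of these distances above the mean the EVD tail probability is larger than the normal one, so a score must lie further out before it reaches the same significance level.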
24. Extreme Value Distribution. Following the EVD, the probability that a score S is at least as large as a given value x, which is the basis of the E-value, can be calculated as $P(S \ge x) = 1 - \exp(-e^{-\lambda(x - \mu)})$, where $\mu = (\ln Kmn)/\lambda$, m and n are the lengths of the sequences being compared, and K is a constant that can be estimated from the background amino acid distribution and the scoring matrix (see Altschul and Gish, 1996, for a collection of values for $\lambda$ and K over a set of widely used scoring matrices).
25. Extreme Value Distribution. Using the equation for $\mu$ (preceding slide), the probability for the raw alignment score S becomes $P(S \ge x) = 1 - \exp(-Kmn\,e^{-\lambda x})$. In practice, the probability $P(S \ge x)$ is estimated using the approximation $1 - \exp(-e^{-x}) \approx e^{-x}$, which is valid for large values of x. This leads to a simplification of the equation for $P(S \ge x)$: $P(S \ge x) \approx e^{-\lambda(x - \mu)} = Kmn\,e^{-\lambda x}$. The lower the probability (E value) for a given threshold value x, the more significant the score S.
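As a worked illustration of these formulas, the sketch below computes both the exact probability $1 - \exp(-Kmn\,e^{-\lambda x})$ and its approximation $Kmn\,e^{-\lambda x}$. The parameter values are assumptions chosen only for the example: $\lambda = 0.3176$ and $K = 0.134$ are values commonly quoted for ungapped BLOSUM62 scoring, and the lengths m and n are made up. The function names are hypothetical.

```python
import math


def evd_mu(K, m, n, lam):
    """Characteristic value mu = ln(K*m*n) / lambda."""
    return math.log(K * m * n) / lam


def e_value(score, m, n, lam, K):
    """Approximate probability (E value): K*m*n * exp(-lambda * score)."""
    return K * m * n * math.exp(-lam * score)


def p_value(score, m, n, lam, K):
    """Exact EVD tail probability: 1 - exp(-K*m*n * exp(-lambda * score))."""
    return -math.expm1(-e_value(score, m, n, lam, K))


# Illustrative parameters only (assumed, not taken from the slides).
LAM, K = 0.3176, 0.134
m, n = 250, 1_000_000  # hypothetical query and database lengths

for s in (40, 60, 80, 100):
    print(f"S = {s:3d}: "
          f"P = {p_value(s, m, n, LAM, K):.3e}, "
          f"Kmn e^(-lambda S) = {e_value(s, m, n, LAM, K):.3e}")
```

For low scores the two quantities differ noticeably; as the score grows they converge, which is the regime in which the simplification is used.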
28. Construction of the Lookup Table

Pos no.      1  2  3  4  5  6  7
Sequence 1   F  L  W  R  T  W  S
Sequence 2   S  W  K  T  W  T

Residue   Position in Seq 1   Position in Seq 2   Offset (p1 - p2)
F         1                   -                   -
L         2                   -                   -
W         3, 6                2, 5                1 (3,2), 1 (6,5), 4 (6,2), -2 (3,5)
R         4                   -                   -
T         5                   4, 6                1 (5,4), -1 (5,6)
S         7                   1                   6 (7,1)
K         -                   3                   -
29. Calculation of Offset Frequency

Offset   Frequency
 1       3
 4       1
-1       1
-2       1
 6       1

Final Local Alignment

Pos no.      1  2  3  4  5  6  7
Sequence 1   F  L  W  R  T  W  S
Sequence 2   -  S  W  K  T  W  T
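The lookup-table and offset-counting steps can be sketched in a few lines. This is a minimal illustration of the procedure on the two example sequences, not a full search implementation; the function names are hypothetical, and positions are 1-based to match the slides.

```python
from collections import Counter, defaultdict


def positions_by_residue(seq):
    """Lookup table: residue -> list of 1-based positions where it occurs."""
    table = defaultdict(list)
    for pos, residue in enumerate(seq, start=1):
        table[residue].append(pos)
    return table


def offset_frequencies(seq1, seq2):
    """Count offsets p1 - p2 over every pair of matching residues."""
    pos1 = positions_by_residue(seq1)
    pos2 = positions_by_residue(seq2)
    counts = Counter()
    for residue, p1_list in pos1.items():
        for p1 in p1_list:
            for p2 in pos2.get(residue, []):
                counts[p1 - p2] += 1
    return counts


def align_by_offset(seq1, seq2, offset):
    """Shift one sequence with leading gaps so that the given offset lines up."""
    if offset >= 0:
        return seq1, "-" * offset + seq2
    return "-" * (-offset) + seq1, seq2


seq1, seq2 = "FLWRTWS", "SWKTWT"
freqs = offset_frequencies(seq1, seq2)
best_offset, best_count = freqs.most_common(1)[0]
top, bottom = align_by_offset(seq1, seq2, best_offset)

print(dict(freqs))
print(top)
print(bottom)
```

For these sequences the most frequent offset is 1, occurring three times, so Sequence 2 is shifted right by one position and `-SWKTWT` is placed under `FLWRTWS`, as in the final local alignment.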