Protein structure alignment
beyond spatial proximity
3DSIG 2012
Jul 14, Long Beach, California
Sheng WANG
Toyota Technological Institute at Chicago
Related works on Pairwise Structure Alignment
1  Almost all the structure alignment tools
2  TMalign, fr-TMalign
3  DALI, MUSTANG
4  MAMMOTH, Vorolign, YAKUSA
5  FATCAT, CE, MATT, FlexProt
Note: for all proteins we align, only the C-alpha atoms are considered.
Our contribution
Design a scoring function
• local sub-structure similarity
• evolutionary and functional information
• angular similarity for hydrogen bonding
Employ a fast and efficient search algorithm
• start from highly similar local sub-structure pairs (SFPs)
• recruit new SFPs that satisfy spatial constraints
• finally, refine the alignment within a bounded area
Scoring Function
Local similarity: BLOSUM and CLESUM terms. Global similarity: v(i,j) and d(i,j) terms.
CLESUM is the local structure substitution matrix;
BLOSUM is the amino acid substitution matrix;
v(i,j) measures the angular similarity using three vectors;
d(i,j) measures the spatial proximity of two aligned residues.
Note: both v(i,j) and d(i,j) are calculated after rigid-body superposition.
Score(i,j) = ( max(0, BLOSUM(i,j)) + CLESUM(i,j) ) * v(i,j) * d(i,j)
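A minimal sketch of how the per-residue score above combines its four terms. The slides do not give the functional forms of v(i,j) and d(i,j), so the function below simply takes them as precomputed inputs; the function name and the toy numbers are illustrative assumptions, not values from the real matrices.

```python
def deepalign_pair_score(blosum_ij, clesum_ij, v_ij, d_ij):
    """Score(i,j) = (max(0, BLOSUM(i,j)) + CLESUM(i,j)) * v(i,j) * d(i,j).

    v_ij and d_ij are assumed to be computed elsewhere, after
    rigid-body superposition, as the slide notes.
    """
    # Local part: a negative BLOSUM entry is clipped to 0, CLESUM is added as-is.
    local = max(0, blosum_ij) + clesum_ij
    # Global part: angular similarity and spatial proximity act as multipliers.
    return local * v_ij * d_ij


# Toy values only (hypothetical, not from BLOSUM or CLESUM).
favourable = deepalign_pair_score(blosum_ij=4, clesum_ij=30, v_ij=0.9, d_ij=0.8)
clipped = deepalign_pair_score(blosum_ij=-3, clesum_ij=30, v_ij=0.9, d_ij=0.8)
print(favourable, clipped)
```

With these toy inputs the negative BLOSUM entry contributes nothing, so the second score depends only on CLESUM and the two multipliers.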
[Figure, panels (A) and (B): angles θ, θ', τ over residues i-2, i-1, i, i+1; regions labeled alpha, beta, coil]
Example CLE strings:
RRFEDECCGAIHHHHHHHHHHHHHHHOMICQEECBLDFQNBFEEEEFEQNNGCP
LDDEEEDEEENOGCEDEEEEEEPKKOGFEDPLDEQBGCCR
The transformation from 3D structure to 1D CLE strings
S Wang, WM Zheng, “CLePAPS: Fast Pair Alignment of Protein
Structures Based on Conformational Letters.” JBCB, 2008
CLESUM : Conformational LEtter SUbstitution Matrix
M_ij = 20 * log2( P_ij / (P_i * P_j) )
Note: CLESUM is constructed using FSSP representatives.
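The CLESUM formula above is a scaled log-odds ratio. The sketch below shows one way such a matrix could be computed from aligned-pair counts. The count layout (a symmetric dict of dicts), the two-letter alphabet, and the numbers are all hypothetical; the real CLESUM is built from FSSP representatives, and every count here is assumed to be positive.

```python
import math


def log_odds_matrix(pair_counts, scale=20):
    """Return M[a][b] = scale * log2(P_ab / (P_a * P_b)).

    pair_counts[a][b] is an assumed symmetric table of how often
    letters a and b were observed aligned; all entries must be > 0.
    """
    letters = sorted(pair_counts)
    total = sum(pair_counts[a][b] for a in letters for b in letters)
    # Joint probabilities P_ab.
    p_joint = {a: {b: pair_counts[a][b] / total for b in letters} for a in letters}
    # Marginals P_a, obtained by summing the joint over the partner letter.
    p_marg = {a: sum(p_joint[a].values()) for a in letters}
    return {
        a: {b: scale * math.log2(p_joint[a][b] / (p_marg[a] * p_marg[b])) for b in letters}
        for a in letters
    }


# Toy counts for two conformational letters (illustrative only).
toy_counts = {"H": {"H": 40, "E": 10}, "E": {"H": 10, "E": 40}}
toy_matrix = log_odds_matrix(toy_counts)
print(toy_matrix["H"]["H"], toy_matrix["H"]["E"])
```

Pairs seen more often than chance get a positive entry and pairs seen less often get a negative one, which is what lets the matrix act as a substitution score.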
[Figure: CLESUM matrix with "typical helix" and "typical sheet" marked; annotation "evolutionary + geometric"]
Same CLESUM, different BLOSUM
        (A) correct    (B) incorrect
CLE ->  HHHHHHH        HHHHHHH
AMI ->  EGHILLI        GHILLIQ
AMI ->  DGHVLLV        DGHVLLV
CLE ->  HHHHHHH        HHHHHHH
Why Max and Add?
max(0, CLESUM(i,j) + BLOSUM(i,j))
[Table: sign of BLOSUM (+/-) against sign of CLESUM (+/-); cells marked √, o, ×]
Note: log(C_ij / (C_i C_j)) + log(B_ij / (B_i B_j)) = log(C_ij B_ij / (C_i C_j B_i B_j))
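The note above says that adding the two log-odds scores is the same as taking the log of the product of the two odds ratios. A small numerical check of that identity follows, together with the effect of max(0, ...) on the four sign combinations. All probabilities and scores are made-up values chosen for illustration.

```python
import math


def log_odds(p_joint, p_a, p_b):
    """Log-odds of observing a pair versus chance, in bits."""
    return math.log2(p_joint / (p_a * p_b))


# Hypothetical probabilities for one aligned residue pair.
c_ij, c_i, c_j = 0.02, 0.1, 0.1     # conformational letters
b_ij, b_i, b_j = 0.004, 0.05, 0.06  # amino acids

lhs = log_odds(c_ij, c_i, c_j) + log_odds(b_ij, b_i, b_j)
rhs = math.log2((c_ij * b_ij) / (c_i * c_j * b_i * b_j))
assert math.isclose(lhs, rhs)  # sum of logs equals log of the product


def combined(clesum, blosum):
    """max(0, CLESUM + BLOSUM): add first, then clip negatives to zero."""
    return max(0, clesum + blosum)


# One toy score per sign combination (CLESUM sign, BLOSUM sign).
for clesum, blosum in [(30, 4), (30, -4), (-30, 4), (-30, -4)]:
    print(clesum, blosum, combined(clesum, blosum))
```

Because the sum is taken before clipping, a mildly negative term can still be outweighed by a strongly positive one, while a pair that is unfavourable overall contributes zero.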
Why use angular similarity?
[Figure: (A) incorrect alignment, smaller RMSD; (B) correct alignment, larger RMSD]
The three vectors used in the vect-score v(i,j).
The deviation of these three vectors is used to measure angular similarity.
Search Algorithm
[Figure labels: DeepAlign-score, SFP_long, SFP_short]
Note:
[1] TopK > TopJ > M
[2] SFP stands for Similar Fragment Pair, scored using ∑ max(0, CLESUM(i,j) + BLOSUM(i,j))
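The SFP score in note [2] can be sketched as a sum over an ungapped fragment pair, followed by a sort to get the ranked list. This is an illustration under stated assumptions, not the DeepAlign code: the fragment length, the tuple-keyed toy matrices, and the tiny sequences are hypothetical, and all window pairs are enumerated exhaustively for simplicity.

```python
def sfp_score(cle_a, ami_a, cle_b, ami_b, start_a, start_b, length, clesum, blosum):
    """Sum of max(0, CLESUM + BLOSUM) over an ungapped fragment pair."""
    total = 0
    for k in range(length):
        i, j = start_a + k, start_b + k
        total += max(0, clesum[cle_a[i], cle_b[j]] + blosum[ami_a[i], ami_b[j]])
    return total


def ranked_sfps(cle_a, ami_a, cle_b, ami_b, length, clesum, blosum):
    """Score every window pair and return (score, start_a, start_b), best first."""
    sfps = []
    for sa in range(len(cle_a) - length + 1):
        for sb in range(len(cle_b) - length + 1):
            score = sfp_score(cle_a, ami_a, cle_b, ami_b, sa, sb, length, clesum, blosum)
            sfps.append((score, sa, sb))
    sfps.sort(key=lambda s: -s[0])  # stable sort: ties keep enumeration order
    return sfps


# Toy substitution tables (hypothetical values, not the real matrices).
toy_clesum = {(a, b): (10 if a == b else -10) for a in "HE" for b in "HE"}
toy_blosum = {(a, b): (4 if a == b else -2) for a in "AG" for b in "AG"}

sfp_list = ranked_sfps("HHEE", "AAGG", "EEHH", "GGAA", 2, toy_clesum, toy_blosum)
print(sfp_list[:2])
```

Running the same routine with a longer and a shorter fragment length would give two sorted lists, in the spirit of SFP_long and SFP_short.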
From TopK coarse-grained to TopJ fine-grained initial alignment
Sort both SFP lists
[Figure: SFP_long list ordered by score rank]
Example: TopK = 5; TopJ = 1
Top2 SFP: # of consistent SFPs = 4; Top1 SFP: # of consistent SFPs = 1
Top2 SFP is globally supported by three other SFPs,
while Top1 SFP is supported only by itself.
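The selection described above can be sketched as: take the TopK SFPs by score, count for each one how many of the TopK are consistent with it, and keep the TopJ with the most support. The slides do not define the consistency test, so it is passed in as a function; the toy predicate below (similar diagonal offset) is only a stand-in for a real spatial check after superposition, and all names and numbers are hypothetical.

```python
def select_topj(sfps, top_k, top_j, is_consistent):
    """Pick the top_j best-supported SFPs among the top_k by score.

    sfps: list of (score, start_a, start_b).
    is_consistent(anchor, other): assumed user-supplied test; an SFP is
    expected to be consistent with itself.
    """
    candidates = sorted(sfps, key=lambda s: s[0], reverse=True)[:top_k]
    support = []
    for anchor in candidates:
        count = sum(1 for other in candidates if is_consistent(anchor, other))
        support.append((count, anchor))
    # Stable sort: among equally supported SFPs the higher score stays first.
    support.sort(key=lambda t: t[0], reverse=True)
    return support[:top_j]


def same_offset(a, b, tolerance=1):
    """Toy stand-in for spatial consistency: similar start_b - start_a."""
    return abs((a[2] - a[1]) - (b[2] - b[1])) <= tolerance


# Five toy SFPs: the best-scoring one is an outlier, the rest agree.
toy_sfps = [(50, 0, 40), (48, 10, 12), (45, 20, 22), (44, 30, 31), (40, 5, 7)]
chosen = select_topj(toy_sfps, top_k=5, top_j=1, is_consistent=same_offset)
print(chosen)  # [(4, (48, 10, 12))]
```

The toy data mirrors the slide example: the top-scoring SFP is supported only by itself, while the second one is supported by three others and is therefore selected.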
Refine each fine-grained initial alignment by three iterations
[Figure: First Update, Second Update, Third Update, then Output Alignment; labels d1, d2, d3]
d1 > d2 > d3
Final refinement on DeepAlign-score only in bounded area
[Figure labels: Final refinement; SFP_short score rank (high -> low)]
(1) refined fine-grained alignment
(2) bounded area upon the alignment
(3) dynamic programming to find a path with maximal DeepAlign-score within the bounded area
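Steps (1) to (3) can be sketched as a dynamic program in which a residue pair may be matched only if it lies near the current alignment. This is a simplified illustration, not the DeepAlign implementation: the band definition (within `band` rows and columns of some aligned pair), the zero gap penalty, and the toy score matrix are all assumptions. For clarity it fills the whole table; a real implementation would visit only the bounded cells.

```python
def bounded_dp(score, initial_pairs, band):
    """Maximise the summed score over monotone matches inside a bounded area."""
    n, m = len(score), len(score[0])
    # (2) bounded area: cells within `band` of any pair in the initial alignment.
    allowed = set()
    for pi, pj in initial_pairs:
        for i in range(max(0, pi - band), min(n, pi + band + 1)):
            for j in range(max(0, pj - band), min(m, pj + band + 1)):
                allowed.add((i, j))
    # (3) dynamic programming; skipping a residue is free (assumed gap penalty 0).
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            value = max(best[i - 1][j], best[i][j - 1])
            if (i - 1, j - 1) in allowed:
                value = max(value, best[i - 1][j - 1] + score[i - 1][j - 1])
            best[i][j] = value
    # Traceback to recover the matched pairs.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        if (i - 1, j - 1) in allowed and best[i][j] == best[i - 1][j - 1] + score[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif best[i][j] == best[i - 1][j]:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return best[n][m], pairs


# Toy per-pair scores; cell (0, 3) is a high-scoring outlier far from the diagonal.
toy_score = [
    [5, 1, 0, 30],
    [1, 5, 1, 0],
    [0, 1, 5, 1],
    [0, 0, 1, 5],
]
diagonal = [(0, 0), (1, 1), (2, 2), (3, 3)]  # (1) refined alignment
bounded_total, bounded_pairs = bounded_dp(toy_score, diagonal, band=1)
unbounded_total, unbounded_pairs = bounded_dp(toy_score, diagonal, band=3)
print(bounded_total, bounded_pairs)
print(unbounded_total, unbounded_pairs)
```

In this toy case the bound keeps the path on the diagonal, whereas an unbounded search would jump to the isolated outlier cell and discard the rest of the alignment.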
Result on manually-curated data
• CDD (Conserved Domain Database): contains 3591 conserved domain structure alignments.
• MALIDUP: contains 241 alignments of homologous domains that originated from internal duplication.
• MALISAM: contains 130 alignments of structurally analogous motifs in proteins.
Result on discrimination data
• We use SABmark to test the ability to identify distant homologs (superfamily level) and structural analogs (fold level) among negative data (pairs with no structural similarity).
[Figure: two panels, super-family and fold; DeepAlign labeled in each]
One example
Superimposition of domain d1pqsa_ and d1poh__ from MALISAM. (A) TMalign, (B) DeepAlign optimizing TM-score and (C) DeepAlign.
TMscore labels in the figure: 0.288, 0.514, 0.473
Thank you !!
Please find the executable program of DeepAlign at:
http://ttic.uchicago.edu/~jinbo/DeepAlign/DeepAlign_exe_V1.00.tar.gz

How AI, OpenAI, and ChatGPT impact business and software.
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 

Protein structure alignment beyond spatial proximity 3 dsig_2012

  • 1. Protein structure alignment beyond spatial proximity 3DSIG 2012 Jul 14, Long Beach, California Sheng WANG Toyota Technological Institute at Chicago
  • 2. Related works on Pairwise Structure Alignment 1 2 Almost all the structure alignment tools TMalign, fr-TMalign 3 DALI, MUSTANG 4 MAMMOTH, Vorolign, YAKUSA 5 FATCAT, CE, MATT, FlexProt Note: for every protein we align, only the C-alpha atoms are considered
  • 3. Our contribution Design a scoring function • local sub-structure similarity • evolutionary and functional information • angular similarity for hydrogen bonding Employ a fast and efficient search algorithm • start from highly similar local sub-structure pairs (SFPs) • recruit new SFPs that satisfy spatial constraints • finally refine the alignment within a bound
  • 4. Scoring Function local similarity global similarity CLESUM is the local structure substitution matrix; BLOSUM is the amino acid substitution matrix; v(i,j) measures the angular similarity using three vectors; d(i,j) measures the spatial proximity of two aligned residues. Note: both v(i,j) and d(i,j) are calculated after rigid-body superposition. Score(i,j)=( max(0,BLOSUM(i,j) )+CLESUM(i,j) )*v(i,j)*d(i,j)
  • 5. θ θ’ τ i-2 i-1 i i+1 (A ) (B) RRFEDECCGAIHHHHHHHHHHHHHHHOMICQEECBLDFQNBFEEEEFEQNNGCP LDDEEEDEEENOGCEDEEEEEEPKKOGFEDPLDEQBGCCR The transformation from 3D structure to 1D CLE strings alpha beta coil S Wang, WM Zheng, “CLePAPS: Fast Pair Alignment of Protein Structures Based on Conformational Letters.” JBCB, 2008
  • 6. CLESUM : Conformational LEtter SUbstitution Matrix Mij = 20* log 2 (Pij/PiPj) Note: CLESUM is constructed using FSSP representatives. typical helix typical sheet evolutionary + geometric
  • 8. Why Max and Add ? max(0,CLESUM(i,j)+BLOSUM(i,j) ) BLOSUM CLESUM + - + - √ o ×o Note: log (Cij/ CiCj) + log (Bij/ BiBj) = log(CijBij / CiCj BiBj)
  • 9. (A) (B) incorrect correct smaller RMSD larger RMSD Why use angular similarity ?
  • 10. The three vectors used in the vect-score v(i,j). Using the deviation of three vectors for angular similarity
  • 11. DeepAlign-score SFP_long SFP_short Search Algorithm [2] SFP stands for Similar Fragment Pair, using ∑max(0,CLESUM(i,j)+BLOSUM(i,j) ) Note: [1] TopK > TopJ > M Sort both SFP lists
  • 12. SFP_long score rank 5 2 4 1 Example: TopK = 5; TopJ = 1 # of consistent SFPs = 4 # of consistent SFPs = 1 From TopK coarse-grained to TopJ fine-grained initial alignment Top2 SFP is globally supported by three other SFPs, while Top1 SFP is supported only by itself. 3
  • 13. Third Update d1 d2 d3 d1 > d2 > d3 Output Alignment First Update Second Update Refine each fine-grained initial alignment by three iterations Final refinement SFP_short score rank (high -> low)
  • 14. Final refinement on DeepAlign-score only in bounded area (1) refined fine-grained alignment (2) bounded area upon the alignment (3) dynamic programming to find a path with maximal DeepAlign-score within bounded area
  • 15. • CDD (Conserved Domain Database): contains 3591 conserved domain structure alignments. • MALIDUP: contains 241 alignments for homologous domains originated from internal duplication. • MALISAM: contains 130 alignments for structurally analogous motifs in proteins. Result on manually-curated data
  • 16. Result on discrimination data • We use SABmark to test the ability of identifying distant homologs (super-family) and structural analogs (fold) among those negative data (with no structural similarity) DeepAlign DeepAlign super-family fold
  • 17. One example Superimposition of domain d1pqsa_ and d1poh__ from MALISAM. (A) TMalign, (B) DeepAlign optimizing TM-score and (C) DeepAlign. TMscore 0.288 TMscore 0.514 TMscore 0.473
  • 18. Thank you !! Please find the executable program of DeepAlign at: http://ttic.uchicago.edu/~jinbo/DeepAlign/DeepAlign_exe_V1.00.tar.gz
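
The SFP ranking score on slides 8 and 11, ∑max(0,CLESUM(i,j)+BLOSUM(i,j)), can be illustrated with a minimal Python sketch. The two lookup functions below are toy stand-ins invented for this example, not the real CLESUM or BLOSUM matrices, and the names `sfp_score`, `toy_clesum` and `toy_blosum` are hypothetical. Note that slide 4 as printed places the max around BLOSUM only; this sketch follows the form given on slides 8 and 11. It reuses the two helix fragments from the "Same CLESUM, different BLOSUM" slide.

```python
def sfp_score(cle_a, cle_b, seq_a, seq_b, clesum, blosum):
    """Sum of max(0, CLESUM + BLOSUM) over the positions of one fragment pair."""
    assert len(cle_a) == len(cle_b) == len(seq_a) == len(seq_b)
    total = 0
    for ca, cb, aa, ab in zip(cle_a, cle_b, seq_a, seq_b):
        # Unrelated residue pairs (negative sum) contribute nothing.
        total += max(0, clesum(ca, cb) + blosum(aa, ab))
    return total


def toy_clesum(a, b):
    # Toy stand-in (assumption): identical conformational letters score +2.
    return 2 if a == b else -2


def toy_blosum(a, b):
    # Toy stand-in (assumption): identity +4, similar residues +2, else -3.
    if a == b:
        return 4
    similar_groups = [set("ILV"), set("DE")]
    if any(a in g and b in g for g in similar_groups):
        return 2
    return -3


# Both fragment pairs are all-helix ("H") in CLE terms, so CLESUM alone
# cannot tell the two alignments apart; the amino acid term can.
correct = sfp_score("HHHHHHH", "HHHHHHH", "EGHILLI", "DGHVLLV",
                    toy_clesum, toy_blosum)
shifted = sfp_score("HHHHHHH", "HHHHHHH", "GHILLIQ", "DGHVLLV",
                    toy_clesum, toy_blosum)
print(correct, shifted)
```

With these toy values the correctly registered fragment pair scores higher than the one shifted by one position, which is the behaviour the slide argues for. Real scores would depend on the actual matrices.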

Editor's notes

  1. Many pairwise structure alignment tools are currently available. RMSD and the number of aligned residue pairs (Ne) can be regarded as the universal measures shared by all of them. This kind of scoring has a drawback, however: if just a few aligned pairs have a very large local distance, the overall RMSD becomes large even though all the other aligned pairs have very small distances. In practice, each method is unique in its scoring function and in the search algorithm that aims to maximize that score. TMalign, for example, puts di in the denominator, which addresses the RMSD drawback by reducing the contribution of outlier pairs while emphasizing well-aligned pairs. Still, the currently popular scoring functions mostly consider geometric distance and neglect other important measures, such as the evolutionary or functional relationship between the aligned residue pairs.
  2. Our contribution in this work is twofold. First, we designed a new scoring function that considers three things: one, the similarity of the two proteins' local sub-structures; two, the evolutionary and functional information shared by the two proteins; three, hydrogen-bonding similarity, described by a vector-based score. Second, we employ a search algorithm that maximizes this scoring function while keeping the running time short. It starts from highly similar local sub-structure pairs, using the local part of the score, and recruits new SFPs that satisfy the constraints of the global part of the score. Finally, a dynamic programming refinement is applied with the whole scoring function, but only within a bound, which reduces this step to O(n) time complexity.
  3. It is very challenging to design a scoring function that captures all the criteria used by human experts, who align protein structures using not only geometric information but also evolutionary and functional information. Here we design a score for a pair of corresponding residues i and j from the two proteins. It is composed of two parts. The first part describes local similarity: it is a MAX function over 0 and two substitution matrices, BLOSUM and CLESUM. BLOSUM, as we all know, describes the similarity between two amino acids, while CLESUM is a substitution matrix that describes the similarity between two local structures. The second part describes global similarity, where "global" means that the score is calculated after rigid-body superposition. The term d(i,j) can be derived from a conventional measure of the spatial proximity of two aligned residues, such as RMSD or TM-score, while v(i,j) measures the angular similarity and is designed for comparing hydrogen bonding. In the following slides, we'll discuss CLESUM and v(i,j) separately.
  4. In our previous work, we clustered 17 local protein structure motifs, each built from 4 consecutive CA atoms. These motifs are called CLEs (conformational letters), and each CLE is actually a distribution over three angles. Among the 17 CLEs, 4 are alpha-helix-like, 4 are beta-sheet-like, and 9 are coil states. Given a protein, it is straightforward to transform its 3D structure into a 1D CLE string: slide a window of four consecutive CA atoms along the chain and, at each position, check which CLE is most similar to the query. It is worth noting that, among the 9 coil-state motifs, A appears mostly at the beginning of a helix and O at the end of a helix, while L appears at the beginning of a sheet and G at the end of a sheet. M is mostly 3-10-helix-like. These 5 states can therefore be regarded as conserved coil states. On the other hand, Q is a state that diverges a lot and appears in disordered or flexible loop regions. If two proteins are evolutionarily or functionally related, their corresponding residues should also be similar in their CLE motifs, especially in the conserved coil regions, such as CLE A, O, L and G. So, just as BLOSUM is the substitution matrix for amino acids, a substitution matrix that measures the relationship between two CLEs is also required.
  5. CLESUM is such a matrix: a similarity measure for CLEs. It is constructed from pairwise alignments of representative structures from the FSSP database. CLESUM is indexed in the same way as BLOSUM. As we can see from this matrix, typical helix and typical sheet states do not get the higher scores, although their local geometric distance should be closer. Instead, the evolutionarily and functionally related regions, such as the two termini of helices and sheets (e.g., A, O), and the flexible regions (e.g., Q) score higher. This makes CLESUM a proper measure of similarity between two CLE states, one that goes beyond spatial proximity into evolutionary and functional relationships.
  6. Consider this alignment of two helices. If we only consider the local structure pattern, every residue is marked as "H". So if the alignment is shifted by one position, the two helices can still be aligned well under the CLESUM score. However, this may not be correct once we consider amino acid information. Measured by BLOSUM, the alignment shifted by one position scores much worse than the correct one. This case shows that, to define a good scoring function for the evolutionary similarity of a local fragment pair, both CLESUM and BLOSUM should be considered.
  7. Both CLESUM and BLOSUM measure the evolutionary similarity between two residues from two proteins. The more positive the value, the more similar the pair; the more negative, the less similar. First question: why do we use the max function? Because we only consider residue pairs that are evolutionarily conserved or functionally related, and neglect all the unrelated ones. Second question: why do we add the two matrices instead of multiplying them? If we assume that the CLESUM pair and the BLOSUM pair are independent, then, given their original log-odds form, their sum is again a log-odds score. This also explains why, if BLOSUM and CLESUM are on the same scale, there should be no weight on either matrix. We may therefore conclude that the score max(0,CLESUM(i,j)+BLOSUM(i,j) ) can be used to rank the importance of SFPs between two proteins, and that it considers not only their pure geometric similarity but also their evolutionary and functional relationship.
  8. We use the angular similarity v(i,j) because, when aligning two superimposed beta-sheets, for example, minimizing only the geometric distance can produce the incorrect alignment. A human expert, however, would choose the correct alignment shown in (B), even though it has a larger RMSD value. This is because two beta-sheets that ought to be aligned should run in the same direction.
  9. The vect-score v(i,j) is designed to solve this problem. Consider the three vectors shown here, i -> i-1, i -> i+1 and i -> i_cb. If i and j point in the same direction, the vect-score defined here has a large value, while in the crossed case, which is incorrect, the score is very negative.
  10. So far, we have a good scoring function for aligning two proteins. The next requirement is to optimize this scoring function effectively without sacrificing running speed. Our strategy is to create two lists of Similar Fragment Pairs (SFPs): one with long fragments and a high similarity cutoff, the other with short fragments and a low similarity cutoff. Both SFP lists are sorted by score. We take only the TopK entries of SFP_long, which are used to construct the coarse-grained initial alignments. Then, based on a consistency-degree check, we select the TopJ coarse-grained initial alignments and turn each into a fine-grained initial alignment by enlarging its correspondence set with SFP_short entries that pass the consistency check. After the final refinement, we return M solutions sorted by their DeepAlign-score. Because we add SFPs to the correspondence set in order of their local similarity score while also maintaining their spatial consistency, this strategy ensures that we obtain an alignment with a sufficiently high DeepAlign-score. We also found that using small values of TopK and TopJ does not affect the result much.
  11. Suppose we set TopK=5 and TopJ=1, for example. Each coarse-grained initial alignment is obtained simply by superimposing the two structures according to the given SFP. Then, for each such superimposition, we check the degree of consistency of all the other SFPs. The example here shows that, although the Top1 SFP has a higher local similarity score than the Top2 SFP, its consistency degree is lower than that of Top2. So if only one fine-grained initial alignment is chosen, the alignment based on the Top1 SFP is omitted.
  12. Given one fine-grained initial alignment, we gradually add SFP_short entries to the correspondence set according to their score rank. In the earlier updates, the superimposition is determined by only a small set of correspondences, so we use a higher spatial consistency cutoff in order to add the highly similar SFPs. In the later updates, many SFPs have already been added, so we lower the distance cutoff. This procedure guarantees that the total DeepAlign-score increases continually over the iterations.
  13. The final refinement step on the DeepAlign-score is conducted by running dynamic programming on the L1*L2 matrix, where each entry is the DeepAlign-score(i,j) given the refined fine-grained superimposition. This procedure is actually very similar to the final step of CE, ProSup and TMalign. One improvement in our method, however, is the bounded area around the initial alignment: since the alignment from the previous step is already accurate enough, we only need to consider a small boundary around it. This reduces the time complexity from O(n^2) to about O(n), which makes the algorithm fast compared with current methods.
  14. Here is the result of DeepAlign on the three manually curated datasets. We use human-curated data because it is very hard to judge which of two protein structure alignments is better by a single criterion such as the commonly used RMSD, or by some combination of RMSD and alignment length. It is also unfair to compare them by an algorithm-specific score such as TM-score or DALI-score, since the corresponding algorithm maximizes that score while the others do not. One good way to compare different methods is to take a human-curated alignment benchmark as the gold standard. Here we used three such datasets: CDD from NCBI, which contains 3591 alignments; MALIDUP from Nick Grishin's group, which contains 241 alignments; and MALISAM, also from Nick Grishin's group, which contains 130 alignments. The difficulty of these three datasets ranges from the SCOP family level, to the superfamily level, and finally to the fold level. The results show that, at these structural levels, DeepAlign agrees most closely with the human-curated alignments, compared with the other popular methods.
  15. Here is another fair comparison method, which first appeared in the MATT paper in 2008. In particular, we can use SABmark to test the ability to discriminate positive data, which are within the same SCOP level, from negative data, which have no structural similarity at all. We can use the ROC curve and the AUC value to judge the performance of a method. The results show that, at both the superfamily and the fold level, DeepAlign reaches a higher AUC value than the other methods.
  16. Finally, I'll finish this talk with one example. The three superimpositions here come from three different methods applied to the same pair of proteins. In (A) we use TMalign, in (B) we use DeepAlign but optimize TM-score, and in (C) we run ordinary DeepAlign. We see that TMalign fails completely on this case and returns a TM-score of only 0.288. DeepAlign can generate an alignment with the highest TM-score, 0.514, but if we look at the details of that alignment, we find that the beta-sheet and alpha-helix regions are not aligned well. Finally, if we run DeepAlign optimizing the DeepAlign-score, the errors in these regions are fixed, even though the final TM-score is not as high as the previous one.
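
The bounded refinement described in note 13 can be sketched as a banded dynamic program. This is a minimal Python illustration under stated assumptions, not DeepAlign's implementation: gaps are free, the bounded area is a fixed-width band around the initial alignment, and the names `banded_dp`, `center` and `band` are hypothetical. The matrix `score` stands in for the per-pair DeepAlign-score(i,j).

```python
def banded_dp(score, center, band):
    """Best chain of aligned pairs (i, j) restricted to |j - center[i]| <= band.

    score  : L1 x L2 list of per-pair scores
    center : center[i] is the column paired with row i in the initial alignment
    band   : half-width of the bounded area (hypothetical parameter)
    Returns (best score, number of DP cells evaluated).
    """
    n, m = len(score), len(score[0])
    best = {}  # only in-band cells are stored, so the work is O(n * band)

    def get(i, j):
        # Cells outside the bounded area count as an empty alignment.
        return best.get((i, j), 0)

    for i in range(1, n + 1):
        c = center[i - 1] + 1  # 1-based DP column of the band centre
        for j in range(max(1, c - band), min(m, c + band) + 1):
            best[(i, j)] = max(
                get(i - 1, j - 1) + score[i - 1][j - 1],  # align i with j
                get(i - 1, j),                            # skip row i
                get(i, j - 1),                            # skip column j
            )
    return max(best.values()), len(best)


# Toy 4 x 4 score matrix: a clean diagonal plus one high-scoring pair far
# from the initial alignment.
score = [[1, 0, 0, 10],
         [0, 1, 0, 0],
         [0, 0, 1, 0],
         [0, 0, 0, 1]]
narrow = banded_dp(score, center=[0, 1, 2, 3], band=1)
full = banded_dp(score, center=[0, 1, 2, 3], band=3)
print(narrow, full)
```

In this toy case the narrow band evaluates 10 cells and keeps the diagonal alignment, while the full matrix evaluates all 16 cells and is drawn to the distant pair. The band trades a complete search for speed, which is only safe when the initial alignment is already close to the final one, as the note assumes.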