Protein Threading Using Context-Specific Alignment Potential
Sheng Wang
http://raptorx.uchicago.edu
Toyota Technological Institute at Chicago,
Joint work with Jianzhu Ma, Feng Zhao and Jinbo Xu
ISMB 2013
Jul 22, ICC Berlin, Germany
Outline
• Where we are @ template-based modeling
• What’s our work
• What’s the problem
• What’s our solution
• Welcome to our server
Template-based Modeling (or, Threading)
• Observation
– ~50,000 non-redundant structures in PDB
– ~ 1,200 unique structure folds (SCOP)
• Methodology
– Use known structures to predict a new one
Query sequence: DDVYILDQAEEG
Template sequences from the database, each aligned to the query:
DE-FIVD-PDEH
SPCKR---ADEG
E--IFVDQADDS
NMCVFGQWERTY
Template-based Modeling Procedures
• Easy: similar sequences → similar structures
– Sequence-based methods, e.g., BLAST, FASTA
– Works only for close homologs (>70% sequence identity)
• Medium: similar profiles → similar structures
– A protein profile is a matrix that represents a multiple sequence alignment of similar proteins
– Profile-based methods, e.g., PSI-BLAST, HMMER, HHpred
– Works for relatively remote homologs (>40% sequence identity)
• Challenge: dissimilar profiles → similar structures
– Add structural or context-specific information to sequence/profile-based methods
– Threading methods, e.g., MUSTER, RAPTOR, CS-BLAST
– Works for distant homologs (<40% sequence identity)
Our Work
• CNFpred: transforms the template-sequence alignment problem into a machine learning problem in order to calculate the alignment’s probability.
• DeepAlign: prepares high-quality structural alignments as training data.
• CNF model: a combined machine learning model that incorporates a Conditional Random Field (CRF) and a Neural Network (NN).
Protein Alignment Model
Template  L P L S - E
Sequence  S A - L R Q
State     M M Is M It M
• Match state (M)
• Insertion at sequence (Is)
• Insertion at template (It)
The structural alignments generated by DeepAlign are used as training data.
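The state labelling on this slide can be sketched in Python. This is only an illustration, not the authors' code: the function name `alignment_states` is hypothetical, and it assumes the row order read from the slide (template row first, sequence row second), so that a gap in the sequence row is labelled Is and a gap in the template row is labelled It.

```python
from itertools import product

STATES = ("M", "Is", "It")


def alignment_states(template_row: str, sequence_row: str) -> list[str]:
    """Label each column of a gapped pairwise alignment with M, Is or It.

    Assumption (read off the slide's example): a gap in the sequence row
    is labelled Is, a gap in the template row is labelled It.
    """
    if len(template_row) != len(sequence_row):
        raise ValueError("both rows must have the same number of columns")
    states = []
    for t, s in zip(template_row, sequence_row):
        if t == "-" and s == "-":
            raise ValueError("a column cannot be a gap in both rows")
        if s == "-":
            states.append("Is")
        elif t == "-":
            states.append("It")
        else:
            states.append("M")
    return states


# The example alignment from the slide.
states = alignment_states("LPLS-E", "SA-LRQ")
print(" ".join(states))  # M M Is M It M

# Every ordered pair of states is a possible adjacent transition: 3 x 3 = 9.
all_transitions = list(product(STATES, repeat=2))
print(len(all_transitions))  # 9
```

The nine ordered state pairs are the adjacent transitions that a linear-chain model over M, Is and It has to score.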
DeepAlign for Structure Alignment
• evolutionary information
• local sub-structure similarity
• angular similarity for hydrogen bonding
BLOSUM is the local amino acid substitution matrix;
CLESUM is the local sub-structure substitution matrix;
v(i,j) measures the angular similarity for hydrogen bonding;
d(i,j) measures the spatial proximity of two aligned residues.
Score(i,j) = ( max(0, BLOSUM(i,j)) + CLESUM(i,j) ) * v(i,j) * d(i,j)
The slide annotates this score with the labels "local similarity" and "global similarity".
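As a small worked example of the scoring formula, the sketch below evaluates Score(i,j) for one residue pair. The function name `deepalign_score` and all numeric inputs are hypothetical; real values would come from the BLOSUM and CLESUM matrices and from the two structures being compared.

```python
def deepalign_score(blosum: float, clesum: float, v: float, d: float) -> float:
    """Score(i,j) = (max(0, BLOSUM(i,j)) + CLESUM(i,j)) * v(i,j) * d(i,j)."""
    return (max(0.0, blosum) + clesum) * v * d


# Hypothetical inputs for one residue pair (i, j).
favourable = deepalign_score(blosum=4, clesum=30, v=0.9, d=0.8)

# A negative BLOSUM entry is clipped to zero, so only CLESUM contributes.
clipped = deepalign_score(blosum=-3, clesum=30, v=0.9, d=0.8)

print(favourable, clipped)
```

Because v(i,j) and d(i,j) multiply the whole bracket, a pair with poor hydrogen-bonding agreement or poor spatial proximity scores low regardless of how similar the residues are.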
CNF-based Alignment Model
Given an alignment $A = \{a_1, a_2, \ldots, a_L\}$, $a_i \in \{M, I_t, I_s\}$, between sequence S and template T, define a conditional probability
$$P(A \mid S, T) = \exp\Big(\sum_i E(a_{i-1}, a_i, S, T)\Big) \Big/ Z(S, T)$$
where
• E: a neural network estimating the log-likelihood of a state transition; it is context-specific because it depends on S and T
• Z(S,T): normalization factor
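To make the conditional probability concrete, the following sketch computes P(A|S,T) for tiny inputs by enumerating every alignment and summing the normalization factor Z(S,T) by brute force. It is a toy under stated assumptions: `toy_energy` is a hypothetical hand-written stand-in for the neural network E, and Is/It are taken to mean a gap in the sequence row and in the template row respectively. Enumeration is only feasible for very short sequences; a dynamic program would normally replace it.

```python
import math

STATES = ("M", "Is", "It")
# Residues consumed per state as (template, sequence). Assumption: Is is a gap
# in the sequence row and It is a gap in the template row.
STEP = {"M": (1, 1), "Is": (1, 0), "It": (0, 1)}


def toy_energy(prev, cur, i, j, S, T):
    """Hypothetical stand-in for the neural network E(a_{i-1}, a_i, S, T)."""
    if cur == "M":
        return 2.0 if T[i] == S[j] else -1.0  # identical residues score higher
    return -2.0 if prev == "M" else -0.5  # a gap right after a match costs more


def enumerate_alignments(S, T):
    """Yield every state path that consumes all of T and all of S."""
    def extend(i, j, path):
        if i == len(T) and j == len(S):
            yield tuple(path)
            return
        for state in STATES:
            di, dj = STEP[state]
            if i + di <= len(T) and j + dj <= len(S):
                yield from extend(i + di, j + dj, path + [state])
    yield from extend(0, 0, [])


def path_score(path, S, T):
    """Sum of E over the path; the first state has no predecessor (prev=None)."""
    i = j = 0
    total, prev = 0.0, None
    for state in path:
        total += toy_energy(prev, state, i, j, S, T)
        di, dj = STEP[state]
        i, j, prev = i + di, j + dj, state
    return total


def alignment_probabilities(S, T):
    """P(A|S,T) = exp(sum_i E) / Z(S,T), with Z obtained by enumeration."""
    scores = {A: path_score(A, S, T) for A in enumerate_alignments(S, T)}
    Z = sum(math.exp(score) for score in scores.values())
    return {A: math.exp(score) / Z for A, score in scores.items()}


probs = alignment_probabilities("AL", "AL")
best = max(probs, key=probs.get)
print(best, round(probs[best], 3))
```

For two identical two-residue inputs the all-match path receives the highest probability, and the probabilities over all paths sum to one, which is exactly what dividing by Z(S,T) guarantees.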
Comprehensive Features
Sequence S  MTYKLILN--GKTKGETTTEAVDAATAEKVFQYANDNGVDGEWTYTE
Template T  MTYKLILNSTVRTKSDTVTDAVP---ADKICSFAQQLPWEREWSF--
• EAA: how similar two residues are
• Esp, Epp: how similar the query’s sequence and profile are to the template’s profile
• Ess3, Ess8: how similar the template’s secondary structure is to the query’s predicted secondary structure (3-class and 8-class)
• Esa: how similar the query’s solvent accessibility is to the template’s solvent accessibility
• Ediso: for disordered regions, no structure information is used
The total scoring function is a non-linear combination of these features:
E( a_{i-1}, a_i, EAA, Esp, Epp, Ediso, Ess3, Ess8, Esa )
What’s the problem?
• Only the alignment probability is modeled, rather than a log-odds potential relative to a background.
• Only local information is incorporated; global information is missing.
Our solution
Propose a protein alignment potential
• With an elaborately designed reference state.
• Can be generalized to sequence-sequence, sequence-structure and structure-structure alignment.
Incorporate both local and global terms
• For the local term, the CNFpred potential is applied.
• For the global term, the EPAD potential is employed.
Protein alignment potential
Given two amino acids a and b, their mutation potential is defined as follows.
$$u(a, b) = \log \frac{P(a, b)}{P_{ref}(a, b)} = \log \frac{P(a, b)}{P(a)\,P(b)}$$
Similarly, given an alignment A between sequence S and template T, we define the potential of A as follows.
$$u(A \mid S, T) = \log \frac{P(A \mid S, T)}{P_{ref}(A)} = \log \frac{P(A \mid S, T)}{\sqrt[N]{\prod_{i=1}^{N} P(A \mid x, y)}}$$
The product runs over N pairs (x, y), where x and y are two random proteins with the same lengths as S and T, respectively.
Assumption: the alignment that maximizes the potential is the optimal one.
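The mutation potential is a log-odds score, and a small numeric example may help. The sketch below estimates u(a,b) = log( P(a,b) / (P(a) P(b)) ) from a handful of aligned residue pairs. The function name `mutation_potential` and the toy counts are hypothetical, and for simplicity the two marginals are taken separately from the first and second position of each ordered pair.

```python
import math
from collections import Counter


def mutation_potential(aligned_pairs):
    """Estimate u(a,b) = log( P(a,b) / (P(a) * P(b)) ) from ordered pairs.

    P(a) is the frequency of a in the first position and P(b) the frequency
    of b in the second position, so P(a) * P(b) is the probability of seeing
    the pair if the two positions were independent (the reference state).
    """
    n = len(aligned_pairs)
    pair_counts = Counter(aligned_pairs)
    first_counts = Counter(a for a, _ in aligned_pairs)
    second_counts = Counter(b for _, b in aligned_pairs)
    potential = {}
    for (a, b), count in pair_counts.items():
        p_ab = count / n
        p_ref = (first_counts[a] / n) * (second_counts[b] / n)
        potential[(a, b)] = math.log(p_ab / p_ref)
    return potential


# Hypothetical toy data: L and D are mostly conserved, occasionally swapped.
pairs = [("L", "L")] * 4 + [("D", "D")] * 4 + [("L", "D"), ("D", "L")]
u = mutation_potential(pairs)
for pair in sorted(u):
    print(pair, round(u[pair], 3))
```

Pairs observed more often than the reference state predicts get a positive potential, and pairs observed less often get a negative one. The alignment potential on this slide applies the same idea to whole alignments.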
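Protein alignment potential
The alignment probability given sequence S and template T can be modeled as follows,
$$P(A \mid S, T) = \exp\big(F(A \mid S, T) + G(A \mid S, T)\big) \big/ Z(S, T)$$
where F(A|S,T) is the local term, G(A|S,T) is the global term, and Z(S,T) is the partition function, which sums the unnormalized weights over all alignments:
$$Z(S, T) = \sum_{A} \exp\big(F(A \mid S, T) + G(A \mid S, T)\big)$$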
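Protein alignment potential
$$
\begin{aligned}
u(A \mid S, T) &= \log \frac{P(A \mid S, T)}{\sqrt[N]{\prod_{i=1}^{N} P(A \mid x, y)}} \\
&= \log \frac{\exp\big(F(A \mid S, T) + G(A \mid S, T)\big) / Z(S, T)}{\sqrt[N]{\prod_{i=1}^{N} \exp\big(F(A \mid x, y) + G(A \mid x, y)\big) / Z(x, y)}} \\
&= F(A \mid S, T) - \mathrm{EXP}_{x,y}\, F(A \mid x, y) + G(A \mid S, T) - \mathrm{EXP}_{x,y}\, G(A \mid x, y) + c(S, T)
\end{aligned}
$$
• EXP_{x,y} F(A|x,y) and EXP_{x,y} G(A|x,y): expected scores, which can be calculated in advance by sampling.
• c(S,T): collects the partition-function terms and is independent of any specific alignment.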
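Model the local potential
From CNFpred, we use a context-specific linear-chain model:
$$F(A \mid S, T) = \sum_i E(a_{i-1}, a_i, S, T)$$
The local potential is defined as
$$U_{local}(A \mid S, T) = F(A \mid S, T) - \mathrm{EXP}_{x,y}\, F(A \mid x, y)$$
The expectation term can be calculated by uniformly sampling a few thousand protein pairs, so the local potential is
$$U_{local}(A \mid S, T) = \sum_i \big( E(a_{i-1}, a_i, S, T) - \overline{E}(a_{i-1}, a_i) \big)$$
where $\overline{E}(a_{i-1}, a_i)$ is the expected transition score over the sampled pairs.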
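The sampling step can be sketched as follows. This is a rough illustration under loud assumptions: `toy_energy` is a hypothetical stand-in for the CNFpred network, uniformly random amino acid strings stand in for the sampled protein pairs, and Is/It are taken to mean a gap in the sequence row and in the template row respectively.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Residues consumed per state as (template, sequence); an assumption about
# the slide's Is/It convention.
STEP = {"M": (1, 1), "Is": (1, 0), "It": (0, 1)}


def toy_energy(prev, cur, t_res, s_res):
    """Hypothetical stand-in for E(a_{i-1}, a_i, S, T)."""
    if cur == "M":
        return 2.0 if t_res == s_res else -1.0
    return -2.0 if prev == "M" else -0.5  # gap scores ignore the residues


def local_score(path, S, T):
    """F(A|S,T): sum of transition scores along the state path A."""
    i = j = 0
    total, prev = 0.0, None
    for state in path:
        t_res = T[i] if state == "M" else None
        s_res = S[j] if state == "M" else None
        total += toy_energy(prev, state, t_res, s_res)
        di, dj = STEP[state]
        i, j, prev = i + di, j + dj, state
    return total


def random_protein(length, rng):
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))


def local_potential(path, S, T, n_samples=2000, seed=0):
    """U_local(A|S,T) = F(A|S,T) - EXP_{x,y} F(A|x,y), estimated by sampling."""
    rng = random.Random(seed)
    expected = sum(
        local_score(path, random_protein(len(S), rng), random_protein(len(T), rng))
        for _ in range(n_samples)
    ) / n_samples
    return local_score(path, S, T) - expected


# Eight matches between identical sequences: far above the random expectation.
u_match = local_potential(("M",) * 8, "DDVYILDQ", "DDVYILDQ")
# A gaps-only path: its score does not depend on the residues, so it cancels.
u_gap = local_potential(("Is", "It"), "C", "A")
print(round(u_match, 2), u_gap)
```

In this toy, any part of the score that would be obtained for random proteins as well is subtracted away, so only the context-specific signal contributes to the potential.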
What’s the difference between maximizing on probability and maximizing on potential?
• Maximize on probability: the sequence-template alignment is long but less informative, with many false positives. Good for building models.
• Maximize on potential: the sequence-template alignment is short but relevant and highly significant. Good for ranking templates.
Model the global potential
From EPAD, we use a context-specific distance-dependent model:
$$G(A \mid S, T) = \sum_{i,j} \log P(d^{T}_{ij} \mid s_i, s_j)$$
The global potential is defined as
$$U_{global}(A \mid S, T) = G(A \mid S, T) - \mathrm{EXP}_{x,y}\, G(A \mid x, y)$$
The expectation term can be calculated by uniformly sampling a few thousand residue pairs from templates, so the global potential is
$$U_{global}(A \mid S, T) = \sum_{i,j} \big( \log P(d^{T}_{ij} \mid s_i, s_j) - \log P(d^{T}_{ij}) \big)$$
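A toy version of the global potential is sketched below. The two-bin distance model, the probability values and the names `conditional` and `global_potential` are all hypothetical; EPAD's actual distributions are context-specific and much finer-grained. Each input triple holds the template distance bin for an aligned residue pair together with the two sequence residues.

```python
import math

# Hypothetical background distribution P(d) over two distance bins.
BACKGROUND = {"near": 0.3, "far": 0.7}
HYDROPHOBIC = set("AILMFVW")


def conditional(d, s_i, s_j):
    """Toy P(d | s_i, s_j): hydrophobic pairs are more often close together."""
    if s_i in HYDROPHOBIC and s_j in HYDROPHOBIC:
        return {"near": 0.6, "far": 0.4}[d]
    return BACKGROUND[d]


def global_potential(pairs):
    """Sum over aligned pairs of log P(d|s_i,s_j) - log P(d)."""
    return sum(
        math.log(conditional(d, s_i, s_j)) - math.log(BACKGROUND[d])
        for d, s_i, s_j in pairs
    )


# Hydrophobic residue pairs placed on template positions that are close.
good = global_potential([("near", "L", "V"), ("near", "I", "F")])
# The same residue pairs placed on template positions that are far apart.
bad = global_potential([("far", "L", "V"), ("far", "I", "F")])
# A pair whose toy conditional equals the background contributes nothing.
neutral = global_potential([("near", "D", "K")])
print(round(good, 3), round(bad, 3), neutral)
```

An alignment is rewarded when the template distances are more likely for the aligned sequence residues than for an arbitrary pair, and penalized when they are less likely.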
What’s global information given an alignment?
$$G(A \mid S, T) = \sum_{i,j} \log P(d^{T}_{ij} \mid s_i, s_j)$$
[Figure: residues s_i and s_j of Sequence S are aligned to positions i and j of Template T, which are separated by the distance d^T_ij]
If the alignment is good, the distance between a pair of sequence residues should match well with that of their aligned template residue pair.
Result on 1000*6000
[Figure: CNFpred (local+global potential) compared to HHpred and to CNFpred (local potential)]
Welcome to our server
http://raptorx.uchicago.edu/
[Figure panels: Binding, Contact]
Thank you
Jinbo Xu
Feng Zhao
Jianzhu Ma
National Institutes of Health (R01GM0897532)
National Science Foundation (DBI-0960390)
NSF CAREER award CCF-1149811
Alfred P. Sloan Research Fellowship


Editor's notes

  1. Currently, template-based modeling is the mainstream approach in protein structure prediction. It rests on the observation that although PDB contains around 50,000 non-redundant structures, SCOP lists only about 1,200 unique structure folds. Most importantly, few new unique folds have appeared since 2010, which implies that the number of naturally occurring protein folds is limited. This gives the fundamental assumption that known structures can be used to predict the structure of an unknown query sequence. More formally, template-based modeling is defined as follows: given a query protein's one-dimensional amino acid sequence and a template database of known three-dimensional structures, we align the query to each template to find the best match and build the query model on that template.
  2. Here we move to the first part: how to define the labels for protein alignment data. In detail, we convert an alignment path into a series of consecutive labels drawn from three states: M, Is and It. There are therefore nine adjacent state transitions in total. Once the labels are defined, we can apply DeepAlign to structurally similar proteins to generate the training data.