DNABind: A hybrid algorithm for structure-based prediction of
DNA-binding residues by combining machine learning- and
template-based approaches. Proteins. 2013 Jun 5.

20131019
Biophysics Young Researchers, Kansai Branch: Journal Club
Topics
Prediction of protein-DNA binding residues
Statistics of network
Machine learning
Result: DNABind, a hybrid method of machine learning and template-based approaches, showed excellent performance in predicting DNA-binding residues.
[Figure: predictions by the template-based approach, machine learning, and DNABind for EcoRV (1RVE:A) and CprK (3E6C:C), showing true positive residues. Legend: query protein, template protein, TP, FN.]
DNABind improves classification.
Aim

Protein-DNA interactions are important for cell biology.
Determining them experimentally is time-consuming and costly.

Computational approaches are desirable.
Computational approaches
Data bank (PDB)
Characteristics of binding residues
Solvent-exposed
Higher electrostatic potential
More conserved
Hotspots as clusters of conserved residues

Structural properties (DNA-binding residue vs surface)
Packing density
Surface curvature
B-factor
Residue fluctuation
Hydrogen bond donor
http://www.rcsb.org/pdb/home/home.do
Computational algorithms
Feature-based
Extract effective features

Template-based
Align template and retrieve the best match

Template!!
Features used in machine learning
Structure-based
PSSM (position specific scoring matrix)
Evolutionary conservation
Solvent accessibility
Local geometry (depth and protrusion index)
Topological features
degree, closeness, betweenness, clustering coefficient

Relative position (distance to centroid)
Statistical potential (Boltzmann distribution)

Sequence-based (more difficult than structure)
Amino acid identity
Residue physicochemical properties
polarity, secondary structure, molecular volume, codon diversity, electrostatic charge

Predicted structure (no 3D structure needed!!)
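The "relative position (distance to centroid)" feature listed above can be sketched as follows. This is a minimal illustration, assuming one representative coordinate per residue (for example the C-alpha atom); the function name and the toy coordinates are hypothetical and not taken from the paper.

```python
import math

def distances_to_centroid(coords):
    """Return each point's Euclidean distance to the centroid of all points."""
    n = len(coords)
    # Centroid = per-axis mean of all residue coordinates.
    centroid = tuple(sum(c[i] for c in coords) / n for i in range(3))
    return [math.dist(c, centroid) for c in coords]

# Hypothetical coordinates (in angstroms) for a four-residue toy structure.
toy_coords = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (4.0, 4.0, 0.0), (0.0, 4.0, 0.0)]
toy_distances = distances_to_centroid(toy_coords)
print(toy_distances)
```

Requires Python 3.8 or later for `math.dist`. A smaller value means the residue sits closer to the geometric center of the protein.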
Features used in machine learning
Structure-based
PSSM
Relative solvent accessibility
Depth and protrusion index
Topological features
Distance to centroid
Statistical potentials

Sequence-based
PSSM
Predicted structures
Amino acid indices
Statistical potentials

Build the machine learning model (SVM)
Template-based approach
Used in image recognition, etc…
e.g., recognizing faces in a camera image.
Template!!
Match!!

Template-based prediction
Template-based
Structural alignment and statistical potential
Binding residues are predicted only if the target protein is
judged to be a DNA-binding protein.

312 templates were selected.
Network

Degree is a commonly used measure to reflect the local
connectivity of a node.
Closeness is a global centrality metric used to determine
how critical a residue is in a residue interaction network.
Betweenness of residue i is defined to be the sum of the
fraction of shortest paths between all pairs of residues
that pass through residue i.
Motif, hub, and community
are also important…

Clustering coefficient (transitivity) quantifies how close
a residue's neighbors are to being a clique, i.e., the probability
that two adjacent vertices of a vertex are themselves connected.
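The four topological features defined above can be computed on a small undirected graph as sketched below. This is an illustration only, written with the standard library; the toy graph, the function names, and the unnormalized betweenness are assumptions, and the paper's exact residue-network construction and normalization are not reproduced here.

```python
from collections import deque

def degree(adj, v):
    """Number of direct neighbors of v."""
    return len(adj[v])

def closeness(adj, v):
    """(Reachable nodes) / (sum of shortest-path lengths from v), via BFS."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total > 0 else 0.0

def clustering_coefficient(adj, v):
    """Fraction of neighbor pairs of v that are themselves connected."""
    nbrs = list(adj[v])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if nbrs[j] in adj[nbrs[i]])
    return 2.0 * links / (k * (k - 1))

def betweenness(adj):
    """Brandes' algorithm for an unweighted, undirected graph (unnormalized)."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        order = []                       # nodes in BFS visiting order
        preds = {v: [] for v in adj}     # predecessors on shortest paths
        sigma = {v: 0 for v in adj}      # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
                if dist[w] == dist[u] + 1:
                    sigma[w] += sigma[u]
                    preds[w].append(u)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):        # accumulate from the farthest nodes
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: b / 2.0 for v, b in score.items()}

# Hypothetical toy network: triangle A-B-C with a tail C-D.
toy = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
toy_betweenness = betweenness(toy)
print(degree(toy, "C"), closeness(toy, "C"),
      clustering_coefficient(toy, "C"), toy_betweenness["C"])
```

In the toy graph, node C is the only route to D, so it is the only node with non-zero betweenness.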
Network example: human protein interactome
Scale-free
Small-world
Cluster
Power law (Pareto distribution)

Bioinformatics. 2012 Jan 1;28(1):84-90.
Machine learning
Example: spam
4601 samples, 57 parameters.
Classification: spam or non-spam
Machine learning
Support vector machine (SVM)
Decision tree
RandomForest
Logistic regression
LASSO (Elastic net and Ridge)
Neural networks (Deep learning)
Evolutionary algorithm
Gaussian processes
k nearest neighbor
Clustering
Bayesian networks
Association rule learning
Inductive logic programming (ILP)
Support vector machine (SVM)
Finds a hyperplane that separates the groups.
Kernel method: turns a non-linear problem into a linear one
Easy to run.
Computationally expensive.
Tuning is very difficult.
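The "non-linear to linear" idea can be illustrated with an explicit feature map, as sketched below. This is not a trained SVM: the weights are chosen by hand, and the XOR-like data, the map (x, y) -> (x, y, x*y), and the function names are all assumptions made for the example. A real kernel SVM obtains the same effect implicitly through inner products.

```python
# XOR-like toy data: label +1 when both coordinates share a sign, else -1.
# No straight line in the (x, y) plane separates the two classes.
points = [(1.0, 1.0), (-1.0, -1.0), (1.0, -1.0), (-1.0, 1.0)]
labels = [1, 1, -1, -1]

def feature_map(p):
    """Map (x, y) to (x, y, x*y); the product term is the illustrative choice."""
    x, y = p
    return (x, y, x * y)

def linear_decision(z, w=(0.0, 0.0, 1.0), b=0.0):
    """Return the side of the hyperplane w . z + b = 0 on which z lies."""
    score = sum(wi * zi for wi, zi in zip(w, z)) + b
    return 1 if score >= 0 else -1

# In the mapped space a single hyperplane (third coordinate = 0) separates the classes.
predictions = [linear_decision(feature_map(p)) for p in points]
print(predictions == labels)
```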
Decision tree
Builds a single tree of if-then splits.
Easy to understand graphically.
Performance is not so good.
RandomForest
Builds many decision trees.
More accurate.
Somewhat time-consuming.
Logistic regression
Widely used by medical researchers…
Easy to use but tuning is very difficult.
(to tell the truth…)
LASSO, Elastic net, and Ridge regression
Least Absolute Shrinkage and Selection Operator

LASSO
Elastic Net
Ridge
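The three methods differ only in the penalty added to the least-squares loss. The block below is a sketch in assumed notation (response y, design matrix X, coefficients β, penalty strength λ ≥ 0, mixing weight α in [0, 1]), none of which appears on the slide. The elastic net is shown in one common parametrization; others scale the terms differently.

```latex
\begin{aligned}
\text{LASSO:} \quad
  & \hat{\beta} = \arg\min_{\beta} \; \lVert y - X\beta \rVert_2^2
    + \lambda \lVert \beta \rVert_1 \\
\text{Elastic net:} \quad
  & \hat{\beta} = \arg\min_{\beta} \; \lVert y - X\beta \rVert_2^2
    + \lambda \left( \alpha \lVert \beta \rVert_1
    + (1 - \alpha) \lVert \beta \rVert_2^2 \right) \\
\text{Ridge:} \quad
  & \hat{\beta} = \arg\min_{\beta} \; \lVert y - X\beta \rVert_2^2
    + \lambda \lVert \beta \rVert_2^2
\end{aligned}
```

Setting α = 1 in the elastic net recovers LASSO and α = 0 recovers Ridge. The L1 term can shrink coefficients exactly to zero, which is why LASSO also performs variable selection.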
Neural networks
Artificial model of the mammalian brain (perceptron).
Multiple hidden layers.
Deep learning is a hot topic!!
(hard to understand…)

http://opencv.jp/opencv-1.0.0/document/opencvref_ml_nn.html
n-fold cross validation
To evaluate how the results of a statistical analysis will
generalize to an independent data set.
Train data

Test 1
Leave-one-out CV
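The splitting scheme can be sketched as below: the samples are divided into n folds, and each fold serves once as test data while the rest is used for training. This is a minimal sketch without shuffling; the function name is hypothetical. Leave-one-out CV is the special case where n equals the number of samples.

```python
def k_fold_splits(n_samples, n_folds):
    """Yield (train_indices, test_indices) for each fold, without shuffling."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

# 5-fold CV on 10 samples, and leave-one-out CV on 4 samples.
splits = list(k_fold_splits(10, 5))
loo = list(k_fold_splits(4, 4))
for train, test in splits:
    print("train:", train, "test:", test)
```

In practice the sample order is usually shuffled (or stratified by class) before splitting; that step is omitted here.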
Performance

            SVM    Tree   RandomForest  LASSO  Elastic net  Ridge  Logistic  nnet
Recall      0.917  0.872  0.927         0.894  0.892        0.852  0.893     0.930
Precision   0.948  0.914  0.954         0.932  0.926        0.926  0.930     0.935
F           0.932  0.893  0.940         0.913  0.911        0.887  0.911     0.932
MCC         0.890  0.826  0.902         0.858  0.856        0.821  0.856     0.888
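The reported measures can be computed from a confusion matrix as sketched below, assuming that the last row is the Matthews correlation coefficient. The function names and the example counts are hypothetical; the slides do not give the underlying counts.

```python
import math

def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

def classification_metrics(tp, fp, fn, tn):
    """Recall, precision, F, and Matthews correlation coefficient from counts."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"recall": recall, "precision": precision,
            "f": f_measure(precision, recall), "mcc": mcc}

# Hypothetical counts, chosen only to make the arithmetic easy to follow.
example = classification_metrics(tp=90, fp=10, fn=10, tn=90)
print(example)

# Consistency check against the SVM column (recall 0.917, precision 0.948).
svm_f = f_measure(0.948, 0.917)
print(round(svm_f, 3))
```

The harmonic mean of the SVM recall and precision reproduces the F value of 0.932 reported for SVM.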
Combine two approaches
Statistical features of structure
A: Binding residues are highly solvent accessible.
B, C: Binding residues have low depth and high protrusion.
D-G: Little difference in the network features.
H: Binding residues are closer to the centroid.
Performance

A higher TM-score is required for good prediction.

TM-score measures the similarity between the tertiary structures of two proteins.
A score below 0.2 indicates a random relation; a score above 0.5 indicates highly related structures.
Proteins. 2004 Dec 1;57(4):702-10.
Nucleic Acids Res. 2005 Apr 22;33(7):2302-9.
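A minimal sketch of the TM-score formula, following the definition in the 2004 Proteins paper cited above: each aligned residue pair contributes 1 / (1 + (d / d0)^2), normalized by the target length, with d0 = 1.24 * (L - 15)^(1/3) - 1.8. The function names are hypothetical, and the sketch evaluates one fixed superposition only; the actual TM-score is the maximum over superpositions.

```python
def tm_d0(target_length):
    """Length-dependent distance scale d0 in angstroms (needs target_length > 15)."""
    return 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8

def tm_score_fixed(distances, target_length):
    """TM-score term for one fixed superposition.

    distances: distance in angstroms for each aligned residue pair.
    target_length: number of residues in the target protein.
    """
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length

# Hypothetical case: a 100-residue target with 50 perfectly superposed pairs.
half_aligned = tm_score_fixed([0.0] * 50, 100)
print(round(tm_d0(100), 3), half_aligned)
```

Because unaligned residues contribute nothing while the sum is still divided by the full target length, partial alignments are penalized.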
Performance
Comparison among ML, TL, and DNABind.

Comparison between DNABind and other software.

Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 

20131019 Biophysics Young Researchers Journal Club

  • 1. DNABind: A hybrid algorithm for structure-based prediction of DNA-binding residues by combining machine learning- and template-based approaches. Proteins. 2013 Jun 5. 20131019 Biophysics Young Researchers, Kansai Branch, Journal Club
  • 2. Topics Prediction of protein-DNA binding residues Statistics of network Machine learning
  • 3.
  • 4. Result: DNABind, a hybrid of machine learning and template-based approaches, showed excellent performance in predicting DNA-binding residues. Figure panels: Template, Machine learning, DNABind. Examples: EcoRV (1RVE:A), CprK (3E6C:C). True positive residues. DNABind improves classification. Legend: Query protein, Template protein, TP, FN
  • 5. Aim Protein-DNA interactions are important for cell biology. Determining them experimentally is time-consuming and costly. Computational approaches are desirable.
  • 6. Computational approaches Data bank (PDB) Characteristics of binding residues: solvent exposed, higher electrostatic potential, more conserved, hotspots as clusters of conserved residues. Structural properties (DNA-binding residue vs surface): packing density, surface curvature, B-factor, residue fluctuation, hydrogen bond donors. http://www.rcsb.org/pdb/home/home.do
  • 7. Computational algorithms Feature-based Extract effective features Template-based Align template and retrieve the best match Template!!
  • 10. Features used in machine learning Structure-based: PSSM (position-specific scoring matrix), evolutionary conservation, solvent accessibility, local geometry (depth and protrusion index), topological features (degree, closeness, betweenness, clustering coefficient), relative position (distance to centroid), statistical potential (Boltzmann distribution). Sequence-based (more difficult than structure-based): amino acid identity, residue physicochemical properties (polarity, secondary structure, molecular volume, codon diversity, electrostatic charge), predicted structure (does not need a 3D structure!!)
  • 11. Features used in machine learning Structure-based: PSSM, relative solvent accessibility, depth and protrusion index, topological features, distance to centroid, statistical potentials. Sequence-based: PSSM, predicted structures, amino acid indices, statistical potentials. Build the machine learning model (SVM)
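The "distance to centroid" feature listed above can be illustrated with a minimal sketch. It assumes one representative 3D point per residue (for example the C-alpha atom); the coordinates and the function names `centroid` and `distances_to_centroid` are hypothetical and are not taken from the DNABind paper.

```python
import math

def centroid(coords):
    """Arithmetic mean of a list of (x, y, z) points."""
    n = len(coords)
    return tuple(sum(p[i] for p in coords) / n for i in range(3))

def distances_to_centroid(coords):
    """Euclidean distance of every point to the centroid of all points."""
    c = centroid(coords)
    return [math.dist(p, c) for p in coords]

# Hypothetical C-alpha coordinates (in angstroms) for a five-residue toy chain.
ca_coords = [
    (0.0, 0.0, 0.0),
    (2.0, 0.0, 0.0),
    (0.0, 2.0, 0.0),
    (2.0, 2.0, 0.0),
    (1.0, 1.0, 3.0),
]

center = centroid(ca_coords)
dists = distances_to_centroid(ca_coords)
for index, d in enumerate(dists):
    print(f"residue {index}: {d:.2f} A from centroid")
```

Each value becomes one entry of the per-residue feature vector; whether and how the value is normalised by protein size is left open here.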
  • 12. Template-based approach Also used in image recognition, etc… Example: face recognition in a camera. Template!!
  • 13. Template-based approach Also used in image recognition, etc… Example: face recognition in a camera. Match!! Template!!
  • 14. Template-based prediction Template-based: structural alignment and statistical potential. Binding residue prediction is performed only if the target protein is judged to be a DNA-binding protein. 312 templates were selected.
  • 15. Network Degree is a commonly used measure of the local connectivity of a node. Closeness is a global centrality metric used to determine how critical a residue is in a residue interaction network. Betweenness of residue i is the sum, over all pairs of residues, of the fraction of shortest paths between the pair that pass through residue i. Clustering coefficient (transitivity) quantifies how close a node's neighbors are to being a clique, i.e. the probability that two neighbors of a vertex are themselves connected. Motif, hub, and community are also important…
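The four network measures defined on this slide can be sketched in plain Python on a toy residue interaction network. The graph, the node names, and the function names are hypothetical; closeness uses the common (n − 1) / (sum of shortest-path distances) convention and betweenness is left unnormalised, which may differ from the exact definitions used in the paper.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path lengths (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def degree(adj, node):
    return len(adj[node])

def closeness(adj, node):
    dist = bfs_distances(adj, node)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total > 0 else 0.0

def clustering(adj, node):
    """Fraction of neighbor pairs that are themselves connected."""
    nbrs = list(adj[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if nbrs[j] in adj[nbrs[i]])
    return 2.0 * links / (k * (k - 1))

def betweenness(adj):
    """Brandes' algorithm for an unweighted, undirected graph (unnormalised)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack, dist = [], {s: 0}
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}   # number of shortest paths from s
        sigma[s] = 1
        queue = deque([s])
        while queue:
            u = queue.popleft()
            stack.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        delta = {v: 0.0 for v in adj}
        while stack:                  # back-propagate path dependencies
            w = stack.pop()
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2.0 for v, b in bc.items()}  # each pair was counted twice

# Toy contact network: a triangle A-B-C with a tail C-D-E.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

bc = betweenness(adj)
print(degree(adj, "C"), closeness(adj, "C"), clustering(adj, "C"), bc["C"])
```

In this toy graph, residue C bridges the triangle and the tail, so it has the highest betweenness even though only one of its three neighbor pairs is connected.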
  • 16. Network sample; human protein interactome Scale-free Small-world Cluster Power law (Pareto distribution) Bioinformatics. 2012 Jan 1;28(1):84-90.
  • 17. Machine learning Example; spam 4601 samples, 57 parameters. Classification; spam or nonspam
  • 18. Machine learning Support vector machine (SVM), decision tree, random forest, logistic regression, LASSO (elastic net and ridge), neural networks (deep learning), evolutionary algorithms, Gaussian processes, k-nearest neighbors, clustering, Bayesian networks, association rule learning, inductive logic programming (ILP)
  • 19. Support vector machine (SVM) Finds a hyperplane that separates the groups. Kernel method: turns a non-linear problem into a linear one. Easy to run. Computationally expensive. Tuning is very difficult.
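The "non-linear to linear" idea can be illustrated with a small sketch. It uses an explicit feature map rather than a true kernel function, and the hyperplane weights are hand-picked for illustration, not learned by an SVM solver; all names are hypothetical.

```python
def feature_map(x):
    """Lift a 2D point into 3D by adding the product term x1 * x2."""
    x1, x2 = x
    return (x1, x2, x1 * x2)

def decision(w, b, z):
    """Signed score of point z relative to the hyperplane w . z + b = 0."""
    return sum(wi * zi for wi, zi in zip(w, z)) + b

def predict(w, b, x):
    return 1 if decision(w, b, feature_map(x)) > 0 else -1

# XOR-like data: no straight line in 2D separates the two classes.
points = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [-1, 1, 1, -1]

# Hand-picked hyperplane in the lifted 3D space (an assumption, not a fit).
w, b = (1.0, 1.0, -2.0), -0.5

predictions = [predict(w, b, x) for x in points]
print(predictions)
```

A kernel SVM obtains the same effect implicitly: it only needs inner products in the lifted space, so the feature map never has to be written out.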
  • 20. Decision tree Make many trees. Easy to understand graphically. Performance is not so good.
  • 21. Random forest Builds many decision trees. More accurate. Somewhat time-consuming.
  • 22. Logistic regression Many medical researchers use… Easy to use but tuning is very difficult. (to tell the truth…)
  • 23. LASSO, Elastic net, and Ridge regression LASSO: Least Absolute Shrinkage and Selection Operator. Figure panels: LASSO, Elastic Net, Ridge
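The three penalties can be written as one objective. The parametrisation below, with mixing weight α and overall strength λ, is one common convention and is an assumption here; the slide itself does not give a formula.

```latex
\hat{\beta} = \arg\min_{\beta}\;
  \frac{1}{2n}\,\lVert y - X\beta \rVert_2^2
  + \lambda \left[ \alpha \lVert \beta \rVert_1
  + \frac{1-\alpha}{2}\,\lVert \beta \rVert_2^2 \right]
```

With α = 1 this is LASSO, which can shrink coefficients exactly to zero; with α = 0 it is ridge regression, which shrinks coefficients without removing them; values in between give the elastic net.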
  • 24. Neural networks Artificial model of the mammalian brain (perceptron). Multiple hidden layers. Deep learning is a hot topic!! (hard to understand…) http://opencv.jp/opencv-1.0.0/document/opencvref_ml_nn.html
  • 25. n-fold cross validation To evaluate how the results of a statistical analysis will generalize to an independent data set.
  • 31. n-fold cross validation To evaluate how the results of a statistical analysis will generalize to an independent data set. Train data, Test 1. Leave-one-out CV
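The train/test partitioning behind n-fold cross validation can be sketched as follows. The function name `k_fold_splits` is hypothetical, and the folds are taken in index order without shuffling, which a real analysis would usually add.

```python
def k_fold_splits(n_samples, k):
    """Return k (train_indices, test_indices) pairs covering all samples once."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    # Spread the remainder so that fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    indices = list(range(n_samples))
    splits, start = [], 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        splits.append((train, test))
        start += size
    return splits

# 5-fold cross validation on 10 samples: each fold holds out 2 samples.
splits = k_fold_splits(10, 5)
for fold, (train, test) in enumerate(splits, start=1):
    print(f"fold {fold}: train={train} test={test}")

# Leave-one-out CV is the special case k == n_samples.
loo = k_fold_splits(4, 4)
```

The model is trained on each `train` set and scored on the matching `test` set, and the k scores are averaged.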
  • 34. Statistical features of structure A: Binding residues are highly solvent accessible. B, C: Binding residues have low depth and high protrusion. D-G: Little difference in the network features. H: Binding residues are closer to the centroid.
  • 36. Performance A higher TM-score is required for good prediction. TM-score is a measure of the similarity between two protein structures. A score < 0.2 indicates a random relationship and a score > 0.5 indicates highly related structures. Proteins. 2004 Dec 1;57(4):702-10. Nucleic Acids Res. 2005 Apr 22;33(7):2302-9.
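The TM-score formula from the cited 2004 paper can be sketched as below. This evaluates the score for one given superposition only; the real TM-score is the maximum over superpositions, and that search is not implemented here. The function names are hypothetical, and short chains are simply rejected in this sketch.

```python
def tm_d0(l_target):
    """Length-dependent distance scale d0 (in angstroms)."""
    if l_target < 19:
        # Below this length the formula gives a non-positive d0.
        raise ValueError("this sketch only supports l_target >= 19")
    return 1.24 * (l_target - 15) ** (1.0 / 3.0) - 1.8

def tm_score_for_alignment(distances, l_target):
    """Score for a fixed superposition.

    distances: distance in angstroms between each aligned residue pair.
    l_target:  length of the target protein, which may exceed the number
               of aligned pairs.
    """
    d0 = tm_d0(l_target)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / l_target

# A perfect, full-length match scores 1.0.
perfect = tm_score_for_alignment([0.0] * 100, 100)
# Aligning only half of the target, even perfectly, caps the score at 0.5.
half = tm_score_for_alignment([0.0] * 50, 100)
print(perfect, half)
```

Because every aligned pair contributes at most 1 and the sum is divided by the target length, the score stays between 0 and 1 regardless of protein size.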
  • 37. Performance Comparison among ML, TL, and DNABind. Comparison between DNABind and other software.
  • 38. Result: DNABind, a hybrid of machine learning and template-based approaches, showed excellent performance in predicting DNA-binding residues. Figure panels: Template, Machine learning, DNABind. Examples: EcoRV (1RVE:A), CprK (3E6C:C). True positive residues. DNABind improves classification. Legend: Query protein, Template protein, TP, FN