International Journal of Research in Computer Science
eISSN 2249-8265 Volume 2 Issue 4 (2012) pp. 1-5
© White Globe Publications
www.ijorcs.org


           A NOVEL STATISTICAL METHOD FOR
        THERMOSTABLE PROTEINS DISCRIMINATION
                    Elham Nikookar1, Kambiz Badie2, Mehdi Sadeghi3
1 Department of Algorithm and Calculations, Faculty of Engineering, University of Tehran, Tehran, Iran
  Email: e.nikookar@ut.ac.ir
2 Research Institute for ICT, Tehran, Iran
  Email: k_badie@itrc.ac.ir
3 National Institute of Genetic Engineering and Biotechnology, Tehran, Iran
  Email: sadeghi@nigeb.ac.ir

Abstract: In this study, we used features that can be extracted from protein sequences to discriminate mesophilic, thermophilic and hyper-thermophilic proteins. Amino acid frequency, dipeptide frequency and physical-chemical features are used. The effect of these features on the proposed discrimination algorithm was evaluated both separately and in combination. The proposed algorithm is based on statistical methods. The results of applying the algorithm to a dataset containing 239 mesophilic proteins, 69 thermophilic proteins and 59 hyper-thermophilic proteins show the effect of each group of features on the evaluation measures.

Keywords: protein, mesophile, thermophile, discrimination, thermostability, amino acid frequency, feature extraction.

I. INTRODUCTION

Proteins are one of the four major classes of biological macromolecules. They are linear chains of typically ~50-1200 amino acids. Protein functions include catalyzing metabolic reactions, chemical signal transduction, and forming the physical skeleton of some cellular components. Most types of protein molecule fold into a well-defined shape under physiological conditions, and this shape is uniquely determined by the amino acid sequence of the protein. The shape of the molecule facilitates its biochemical function [1]. There are twenty amino acid types that occur in living things [2, 3]. Amino acids are sometimes called residues when covalently bonded together to form a protein molecule.

Organisms that thrive at very high temperatures have been actively studied since the discovery of Thermus aquaticus in the hot springs of Yellowstone in the 1960s [4]. Since then, thermostable proteins, because of their overall inherent stability, have become part of a number of commercial applications. A good understanding of protein thermostability would help us to better understand the folding mechanism and the function of proteins [5]. Understanding the principles that govern protein thermostability has become of great interest in basic research as well as in industrial applications. Several investigations [6, 7] were carried out, and researchers found that the amino acid composition of a protein was correlated with its thermostability.

Initial research in the field of protein thermal stability dates back three decades [2]. The reason is that thermal instability of proteins at high temperature is a barrier against the functional development of proteins. Haney [8] summarized the net change in amino acid composition between mesophilic and thermophilic proteins. The thermophilic proteins were characteristically reduced in Ser, Asn, Gln, Thr and Met, and were increased in Ile, Arg, Glu, Lys and Pro. A comparison of residue contents in hyper-thermophilic and mesophilic proteins, on the basis of the genome sequences of eight mesophilic and seven hyper-thermophilic organisms, showed that hyper-thermophilic proteins contain more charged residues than mesophilic proteins [9].

However, it was difficult to determine the influence of dipeptide composition on protein thermostability [10] from the diverse collection of studies. Proteins that have similar amino acid composition can differ in dipeptide composition, and amino acids function with the help of other residues nearby in sequence or space. Accordingly, the function of a specific amino acid is influenced by its neighboring amino acids in sequence or space, and the dipeptide reflects the influence in sequence. Hence, the dipeptide composition may be correlated with protein thermostability. For the past two decades, methods based on dipeptide composition have been used for predicting protein structural class [11, 12]. Structurally, proteins take different three-dimensional shapes, and the shape determines the biological and chemical role of the protein [1]. The shape of a protein has a one-to-one relationship with its amino acid sequence.




If we find the factors that affect the thermal stability of proteins, more stable proteins can be obtained by rational design. Previous investigations [13] show that the thermal stability of a protein depends on its amino acid composition. In previous studies, classification was done only for the thermophile and mesophile classes [5]. This division limits the methods to certain applications and specific datasets. In this paper, in addition to amino acid frequencies, we used other features that can improve protein discrimination. Also, proteins are divided into mesophile, thermophile and hyper-thermophile classes. This makes it possible to generalize the proposed method to more general datasets and applications.

II. DATASET

Scientists divide organisms, and with them their proteins, into classes based on thermal stability: thermophilic proteins, which have an affinity for elevated temperatures and come from organisms that grow at temperatures between 45 and 80 degrees Celsius, and hyper-thermophilic proteins, whose optimum growth temperature is over 80 degrees Celsius. Mesophilic proteins function at 15-45 degrees Celsius. Hyper-mesophilic proteins have an optimum growth temperature below 15 degrees Celsius [13].

In this paper, we used the dataset of [14], which contains 58 hyper-thermophilic proteins with 118 mesophilic proteins homologous to them, and 69 thermophilic proteins with 121 mesophilic proteins homologous to them. All the proteins were downloaded from the PDB (Protein Data Bank) repository [15]. Each sample in the dataset contains an amino acid sequence of length 50-1200, from which we extracted the elements of the feature vector [13].
Table 1: Chemical and physical properties

ID   Property                   AA
 1   aromatic                   HFWY
 2   UV absorbance              FWY
 3   single aromatic ring       FY
 4   heteroaromatic             HW
 5   aliphatic                  GAILVP
 6   branched                   ILTV
 7   branched beta-carbon       ITV
 8   flexible                   G
 9   inflexible                 P
10   alpha imino                P
11   hydroxyl                   STY
12   hydroxyl straight chain    ST
13   phenol                     Y
14   sulfur                     CM
15   sulfhydryl                 C
16   amide                      NQ
17   carboxyl                   DE
18   carbonyl                   NDEQ
19   imidazole                  H
20   guanidino                  R
21   amino                      RK
22   systematical alpha-C       G
23   alkyl                      AILV
24   achiral                    G
25   2 chiral centers           IT
26   ionizable                  RDCEHKY
27   charged                    RDEHK
28   acidic                     DE
29   basic                      RKH
30   strong basic               RK
31   weak hydrophilic           STWY
32   very hydrophilic           RNDEQHK
33   polar/hydrophilic          RNDEQHKSTWY
34   hydrophobic                ACILMFPWYV
35   very hydrophobic           ACILMFV
36   weak hydrophobic           PWY
37   H-bonding                  RNDCEQHKSTWY
38   H-acceptor                 NDEQHSTY
39   H-donor                    RNCQHKSTWY
40   tiny                       GA
41   very small                 GASC
42   medium small               VTNDP
43   small                      GASCVTNDP
44   large                      KRFYW
45   long                       KREQ
46   very long                  KR
47   medium-long                EQ
48   short                      GASCT
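
Each row of Table 1 pairs a property with the one-letter codes of the amino acids that have it. As a rough illustration, the feature for one property can be read as the fraction of residues in a sequence that belong to that group. The sketch below makes two assumptions that the paper does not spell out: the count is normalized by the sequence length, and only five of the 48 groups are listed. The names `PROPERTY_GROUPS` and `property_frequencies` are hypothetical.

```python
# A few property groups copied from Table 1 (property name -> one-letter codes).
# Only a subset is shown; the remaining rows would follow the same pattern.
PROPERTY_GROUPS = {
    "aromatic": "HFWY",
    "aliphatic": "GAILVP",
    "charged": "RDEHK",
    "acidic": "DE",
    "tiny": "GA",
}


def property_frequencies(sequence, groups=PROPERTY_GROUPS):
    """Return, for each property, the fraction of residues in its group.

    Dividing by the sequence length is an assumption made for this sketch.
    """
    sequence = sequence.upper()
    n = len(sequence)
    if n == 0:
        raise ValueError("sequence must not be empty")
    return {
        name: sum(1 for residue in sequence if residue in members) / n
        for name, members in groups.items()
    }


# Toy sequence, invented for illustration.
features = property_frequencies("MKDEGA")
print(features)
```

With all 48 groups of Table 1 in the dictionary, the same function would yield the 48 physical-chemical entries of the feature vector.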


We have addressed the discrimination problem: given the sequences of a mesophilic protein and a thermophilic or hyper-thermophilic counterpart, the objective is to determine which is which. This was done by assembling a large set of thermophilic protein chains from the PDB with their corresponding mesophilic analogs, and another large non-redundant set of hyper-thermophilic PDB protein chains along with their mesophilic analogs. They will be referred to as the pair sets: pairs thermophile and pairs hyper-thermophile. We have computed several sequence-based numerical indices, based on quantities that other authors have reported to be associated with thermostability. We tested their ability to successfully discriminate between thermophile/mesophile pairs [2].

III. FEATURE VECTOR CREATION

The feature vector of this study is extracted from the protein sequence and consists of 468 entries. Twenty features represent the frequencies of the 20 amino acids and are calculated according to equation (1):

$$AA_i = \frac{N_i}{n}, \quad i = 1, 2, \ldots, 20 \tag{1}$$

where $N_i$ is the number of occurrences of the $i$th amino acid and $n$ is the length of the protein sequence.

Since there are twenty different amino acids, there is a maximum of 400 different dipeptide compositions that can be found in protein sequences [16], and therefore 400 features are assigned to dipeptides. Each feature of this type is calculated according to equation (2):

$$Dep_i = \frac{M_i}{n - 1}, \quad i = 1, 2, \ldots, 400 \tag{2}$$

where $M_i$ is the number of occurrences of the $i$th dipeptide composition.

Forty-eight features represent chemical-physical attributes of amino acids. For a particular protein, each attribute is obtained by calculating the frequency in the protein of the amino acids relevant to that attribute. Table (1) shows the chemical-physical features used in this study [17].

IV. METHOD

In our proposed method, k-fold cross validation has been used to create the training and testing datasets [18]. In each iteration of k-fold cross validation, training samples are divided into thermophile, hyper-thermophile and mesophile classes based on their class label.

For each feature $f$, the average of the values of that feature over all samples of each class is calculated separately; i.e. we calculate $\overline{\mathrm{comp}}_{f,H}$ for the hyper-thermophile class, $\overline{\mathrm{comp}}_{f,T}$ for the thermophile class and $\overline{\mathrm{comp}}_{f,M}$ for the mesophile class. This procedure is repeated for all features. Therefore, we have a 3×468 matrix whose $(i, j)$th entry is the average of the $j$th feature over all the samples of the $i$th class.

To test the proposed model, at the end of each iteration and for each test sample, the feature vector of the sample is created ($\mathrm{comp}_f$). Then, the difference between the value of each feature of the test sample and the corresponding average of that feature is calculated for each of the three classes. After that, the sum of absolute differences over all the features is computed according to equation (3) for the mesophile class, equation (4) for the thermophile class and equation (5) for the hyper-thermophile class [19].

$$\sigma_M = \sum_{f} \left| \mathrm{comp}_f - \overline{\mathrm{comp}}_{f,M} \right| \tag{3}$$

$$\sigma_T = \sum_{f} \left| \mathrm{comp}_f - \overline{\mathrm{comp}}_{f,T} \right| \tag{4}$$

$$\sigma_H = \sum_{f} \left| \mathrm{comp}_f - \overline{\mathrm{comp}}_{f,H} \right| \tag{5}$$

According to the previous equations, the protein is mesophilic if

$$\sigma_M < \sigma_H \ \text{and} \ \sigma_M < \sigma_T \tag{6}$$

and is thermophilic if

$$\sigma_T < \sigma_H \ \text{and} \ \sigma_T < \sigma_M \tag{7}$$

otherwise, we have a hyper-thermophilic protein.

V. EXPERIMENTAL RESULTS

The final performance of the proposed method is measured by three evaluation measures: sensitivity, specificity and accuracy. These indices are calculated using equations (8) to (10) [20].

$$\mathrm{sensitivity} = \frac{TP}{TP + FN} \tag{8}$$

$$\mathrm{specificity} = \frac{TN}{TN + FP} \tag{9}$$

$$\mathrm{accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{10}$$

As we investigated three classes in this study, sensitivity and specificity are calculated separately for each class. For example, for the thermophile class, TP is the number of thermophile samples that the proposed system correctly identifies as thermophilic proteins. FP is the number of mesophile or hyper-thermophile samples that the system wrongly identifies as thermophilic proteins. TN is the number of mesophile or hyper-thermophile samples that the system correctly identifies as mesophilic or hyper-thermophilic proteins. FN is the number of thermophile samples that the system wrongly identifies as mesophilic or hyper-thermophilic proteins. TP, FP, TN and FN are defined similarly for the mesophile and hyper-thermophile classes. The other measure, accuracy, reflects the share of samples whose class the system identifies correctly.

We applied our proposed method to the dataset with four different feature vectors. In the first mode, the feature vector only contained amino acid frequencies.
In the second mode, dipeptide composition frequencies were used; in the third mode, the physical-chemical features of Table (1); and in the fourth mode, all the previous features were used to build the feature vector of the samples. We applied this division to study the effect of the different groups of features on discrimination power and on the evaluation measures. As mentioned earlier, k-fold cross validation was used to create the training and test datasets, and the results shown in the tables are averages over the iterations of the method.

Because some applications work only with the mesophile and thermophile classes, the performance of the system was also evaluated in two-class mode; the results are shown in Tables (3) to (5).

For a more comprehensive review, we have defined a new performance measure called "SESPAVGij", which is the arithmetic mean of all sensitivities and specificities of classes i and j for each feature set. For example, when we study the results of applying the two-class mode of the algorithm to the thermophilic and mesophilic classes, SESPAVGMT is the arithmetic mean of SEM, SET, SPM and SPT. These values have been added to Tables (2) to (5) as the last column.
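
The procedure of Sections III and IV (frequency features, class-wise feature averages, then the class with the smallest sum of absolute differences) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the class labels "M", "T" and "H", and the toy sequences are invented here, and only the 20 amino acid frequencies of equation (1) are used in the example run.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def amino_acid_frequencies(seq):
    """Equation (1): occurrences of each amino acid divided by the length n."""
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def dipeptide_frequencies(seq):
    """Equation (2): occurrences of each adjacent pair divided by n - 1."""
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # ignore pairs containing non-standard letters
            counts[pair] += 1
    return [counts[d] / (len(seq) - 1) for d in DIPEPTIDES]


def class_means(training_set, featurize):
    """Average every feature over the training samples of each class."""
    vectors_by_class = {}
    for seq, label in training_set:
        vectors_by_class.setdefault(label, []).append(featurize(seq))
    return {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in vectors_by_class.items()
    }


def classify(seq, means, featurize):
    """Apply rules (6) and (7) to the sums of absolute differences (3)-(5)."""
    vector = featurize(seq)
    sigma = {
        label: sum(abs(x - m) for x, m in zip(vector, mean))
        for label, mean in means.items()
    }
    if sigma["M"] < sigma["H"] and sigma["M"] < sigma["T"]:
        return "M"
    if sigma["T"] < sigma["H"] and sigma["T"] < sigma["M"]:
        return "T"
    return "H"  # "otherwise" branch of the rule


# Toy training data, invented for illustration only.
training_set = [
    ("GSTGSTGS", "M"), ("GGSSTTGA", "M"),
    ("RLPRLPRL", "T"), ("RRLLPPRA", "T"),
    ("EIKEIKEI", "H"), ("EEIIKKEA", "H"),
]
means = class_means(training_set, amino_acid_frequencies)
predictions = [
    classify(s, means, amino_acid_frequencies)
    for s in ("GSTTSGGS", "LPRLPRRL", "KEIKEIEK")
]
print(predictions)
```

Passing `dipeptide_frequencies`, or a function that concatenates several feature groups, as `featurize` would correspond to the other feature modes; in a k-fold setting `class_means` would be recomputed on the training part of every fold.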
                     Table 2: Results of the proposed method on Three-class mode
                             Accuracy   SEM      SET      SEH      SPM      SPT      SPH      SESPAVGHMT
20 Amino Acids               63.57%     61.69%   54.18%   82.26%   52.87%   88.82%   95.98%   72.63%
48 Chemical-Physical         57.62%     59.27%   38.44%   73.88%   50.94%   85.02%   93.94%   66.92%
400 Dipeptide Composition    59.74%     55.92%   50.58%   84.77%   50.09%   88.19%   96.39%   70.99%
468 All Features             59.05%     59.00%   41.21%   80.37%   51.97%   85.94%   95.32%   68.97%

                     Table 3: Results of Two-class mode on hyper-thermophilic and mesophilic classes
                                      Accuracy        SEM          SEH          SPM         SPH          SESPAVGHM
           20 Amino Acids              80.83%       78.42%       90.36%       51.19%      97.27%           79.31%
        48 Chemical-Physical          84.00%        83.42%       85.96%       56.16%      96.03%           80.39%
      400 Dipeptide Composition        58.95%       50.02%       94.89%       31.83%      97.33%           68.52%
           468 All Features            73.68%       68.96%       93.28%       42.90%      97.63%           75.69%

                         Table 4: Results of Two-class mode on thermophilic and mesophilic classes
                                      Accuracy        SEM         SET          SPM          SPT          SESPAVGMT
          20 Amino Acids              73.86%        76.37%      64.78%       44.55%       88.21%           68.48%
        48 Chemical-Physical           67.21%       68.75%      62.16%       35.83%       86.54%           63.32%
      400 Dipeptide Composition        65.72%       64.09%      71.45%       36.81%       88.47%           65.21%
          468 All Features             72.28%       75.01%      63.25%       42.56%       87.58%           67.10%

                     Table 5: Results of Two-class mode on hyper-thermophilic and thermophilic classes
                                      Accuracy        SET         SEH          SPT          SPH          SESPAVGHT
           20 Amino Acids             83.44%        82.66%      84.03%       80.36%       86.30%           83.34%
         48 Chemical-Physical          66.94%       65.58%      67.85%       62.36%       70.70%           66.62%
       400 Dipeptide Composition       77.86%       67.03%      92.27%       70.79%       89.79%           79.97%
           468 All Features            75.90%       71.02%      81.26%       70.95%       81.78%           76.25%
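
The per-class measures of equations (8) and (9) and the SESPAVG column can be sketched as follows. The confusion-matrix layout, the function names and the toy counts are assumptions made for illustration; accuracy is taken here as the share of correctly classified samples, following the verbal description in Section V. The final lines recompute the last column of Table 4 from the reported sensitivities and specificities.

```python
def per_class_measures(confusion, labels):
    """confusion[true][predicted] holds counts; returns {label: (sensitivity, specificity)}."""
    total = sum(confusion[t][p] for t in labels for p in labels)
    measures = {}
    for c in labels:
        tp = confusion[c][c]
        fn = sum(confusion[c][p] for p in labels if p != c)
        fp = sum(confusion[t][c] for t in labels if t != c)
        tn = total - tp - fn - fp
        measures[c] = (tp / (tp + fn), tn / (tn + fp))
    return measures


def overall_accuracy(confusion, labels):
    """Share of samples whose predicted class equals the true class."""
    total = sum(confusion[t][p] for t in labels for p in labels)
    return sum(confusion[c][c] for c in labels) / total


def sespavg(values):
    """Arithmetic mean of the given sensitivities and specificities."""
    return sum(values) / len(values)


# Toy confusion matrix (rows: true class, columns: predicted class), invented here.
labels = ["M", "T", "H"]
confusion = {
    "M": {"M": 8, "T": 1, "H": 1},
    "T": {"M": 2, "T": 6, "H": 2},
    "H": {"M": 0, "T": 1, "H": 9},
}
measures = per_class_measures(confusion, labels)
accuracy = overall_accuracy(confusion, labels)

# Rows of Table 4: (SEM, SET, SPM, SPT) and the reported SESPAVGMT, in percent.
table4 = {
    "20 Amino Acids": ((76.37, 64.78, 44.55, 88.21), 68.48),
    "48 Chemical-Physical": ((68.75, 62.16, 35.83, 86.54), 63.32),
    "400 Dipeptide Composition": ((64.09, 71.45, 36.81, 88.47), 65.21),
    "468 All Features": ((75.01, 63.25, 42.56, 87.58), 67.10),
}
recomputed = {name: sespavg(values) for name, (values, _) in table4.items()}
for name, (_, reported) in table4.items():
    print(f"{name}: recomputed {recomputed[name]:.4f}, reported {reported:.2f}")
```

The recomputed means agree with the reported SESPAVGMT values to within rounding of the second decimal place.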

VI. CONCLUSION

Studying the frequency of amino acids in hyper-thermophilic, thermophilic and mesophilic proteins shows that Gly, Ser and Thr are more frequent in mesophilic proteins. Arg, Leu and Pro are more frequent in thermophilic proteins, and Glu, Ile and Lys are more frequent in hyper-thermophilic proteins.

The other result of this study is that, among the 400 dipeptide compositions, AA, LL, LA, AL, QA, QL, AQ, LT, TL and EQ are more frequent in mesophilic proteins; IE, EE, EK, KE, VE, EI, KI, KK and VK are more frequent in thermophilic proteins; and RI, AK, LA, YI, YE, RK and SK are more frequent in hyper-thermophilic proteins. In addition, amino acids with the charged and acidic attributes are more frequent in hyper-thermophilic proteins.

According to the SESPAVG measures and the results of Table (2), which relates to the three-class mode, Accuracy and SESPAVGHMT are maximized if we use the 20 amino acid frequency features to discriminate samples. In two-class mode, in Table (3), accuracy and SESPAVGHM are maximized if we use the physical-chemical features, and in Tables (4) and (5), accuracy, SESPAVGMT and SESPAVGHT are maximized if we use the 20 amino acid frequency features. It can therefore be concluded that the 20 amino acid frequency features are better than the other features for discriminating thermophile class samples from mesophile class samples, and hyper-thermophile class samples from thermophile class samples. Also, the physical-chemical features are better than the other features for discriminating hyper-thermophile class samples from mesophile class samples.

Because the 20 amino acid frequency and 48 physical-chemical feature vectors suggested by these results have fewer entries than the 400 dipeptide composition and 468 all-feature vectors, the time required to identify the class of a new protein is also lower with the former vectors than with the latter.

The dataset used in this research includes the three-dimensional structure of the proteins in addition to the conventional amino acid sequence. This advantage makes it possible to use features extracted from the 3D structure in future research.

VII. REFERENCES

[1] C-I Branden, J. Tooze, "Introduction to protein structure", 2nd edition, New York: Garland Pub., 1999.
[2] Todd T, Iosif V, "Discrimination of thermophilic and mesophilic protein", Taylor and Vaisman BMC Structural Biology, vol. 10, 2010.
[3] Xingyu W, Shouliang C, Mingde G, "General Biology", Version 2, Higher Education Press, Beijing, 2005.
[4] Brock T, Freeze H, "Thermus aquaticus gen. n. and sp. n., a Nonsporulating Extreme Thermophile", J Bacteriol, vol. 98, pp. 289-297, 1969.
[5] Jingru Xu, Yuehui Chen, "Discrimination of Protein Thermostability Based on a New Integrated Neural Network", vol. 1, pp. 107-112, Springer, 2011.
[6] Kumar S, Nussinov R, "How do thermophilic proteins deal with heat?", Cell Mol Life Sci, vol. 58, pp. 1216-1233, 2001.
[7] Thompson MJ, Eisenberg D, "Transproteomic evidence of a loop-deletion mechanism for enhancing protein thermostability", J Mol Biol, vol. 290, pp. 595-604, 1999.
[8] Haney PJ, Jonathan HB, Berald LB, "Thermal adaptation analyzed by comparison of protein sequences from mesophilic and extremely thermophilic Methanococcus species", PNAS, vol. 96, pp. 3578-3583, 1999.
[9] Vieille C, Zeikus GJ, "Hyperthermophilic enzymes: sources, uses, and molecular mechanisms for thermostability", Microbiol Mol Biol Rev, vol. 65, pp. 1-43, 2001.
[10] Ding YR, Cai YJ, Zhang GX, "The influence of dipeptide composition on protein thermostability", FEBS Lett, vol. 569, pp. 284-288, 2004.
[11] Kumarevel TS, Gromiha MM, Ponnuswamy MN, "Structural class prediction: an application of residue distribution along the sequence", Biophys Chemist, vol. 88, pp. 81-101, 2000.
[12] Gromiha MM, Shandar A, Makiko S, "Application of residue distribution along the sequence for discriminating outer membrane proteins", Comput Biol Chem, vol. 29, pp. 135-142, 2005.
[13] Seung P, Young J, "Protein Thermostability: Structure-Based Difference of Amino acid between Thermophilic and Mesophilic Proteins", Elsevier Journal of Biotech, vol. 111, pp. 269-277, 2004.
[14] Michael G, Xavier S, "Discrimination and Classification of Mesophilic and Thermophilic Protein using Machine Learning Algorithms", Wiley InterScience, pp. 1274-1279, 2007.
[15] http://www.pdb.org. Available at June 24, 2012.
[16] Zhang G, Fang B, "Study on the Discrimination of Thermophilic and Mesophilic Proteins Based on Dipeptide Composition", Chinese journal of biotechnology, vol. 22, pp. 293-298, 2006.
[17] Eshkin A, Ghafuri H, "Prediction of relative solvent accessibility by support vector regression and best-first method", Excli Journal, vol. 9, pp. 29-38, 2010.
[18] Platzer A, Percpo P, "Characterization of protein-interaction networks in tumors", BMC Bioinformatics, vol. 8, 2007.
[19] Szilagyi A, Zavodszky P, "Structural differences between mesophilic, moderately thermophilic and extremely thermophilic protein subunits: results of comprehensive survey", Elsevier, vol. 8, pp. 493-504, 2000.
[20] Chen Yu, Han Kyungsook, "BSFINDER: Finding Binding Sites of HCV Proteins Using a Support Vector Machine", Protein and Peptide Letters, vol. 16, pp. 373-382(10), 2009.

 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 

Novel statistical method for discriminating mesophilic, thermophilic and hyper-thermophilic proteins

International Journal of Research in Computer Science
eISSN 2249-8265 Volume 2 Issue 4 (2012) pp. 1-5
© White Globe Publications
www.ijorcs.org

A NOVEL STATISTICAL METHOD FOR THERMOSTABLE PROTEINS DISCRIMINATION

Elham Nikookar1, Kambiz Badie2, Mehdi Sadeghi3
1 Department of Algorithm and Calculations, Faculty of Engineering, University of Tehran, Tehran, Iran. Email: e.nikookar@ut.ac.ir
2 Research Institute for ICT, Tehran, Iran. Email: k_badie@itrc.ac.ir
3 National Institute of Genetic Engineering and Biotechnology, Tehran, Iran. Email: sadeghi@nigeb.ac.ir

Abstract: In this study, we used features that can be extracted from protein sequences to discriminate mesophilic, thermophilic and hyper-thermophilic proteins. Amino acid frequency, dipeptide frequency and physical-chemical features are used. The effect of these features on the proposed discrimination algorithm was evaluated both separately and in combination. The proposed algorithm is based on statistical methods. The results of applying the algorithm to a dataset containing 239 mesophilic proteins, 69 thermophilic proteins and 59 hyper-thermophilic proteins show the effect of each group of features on the evaluation measures.

Keywords: protein mesophile thermophile, discrimination, thermostability, amino acid frequency, feature extraction.

I. INTRODUCTION

Proteins are one of the four major classes of biological macromolecules. They are linear chains of typically ~50-1200 amino acids. Protein functions include catalyzing metabolic reactions, chemical signal transduction, and forming the physical skeleton of some cellular components. Most types of protein molecule fold into a well-defined shape under physiological conditions, and this shape is uniquely determined by the amino acid sequence of the protein. The shape of the molecule facilitates its biochemical function [1]. Twenty amino acid types occur in living things [2, 3]. Amino acids are sometimes called residues when covalently bonded together to form a protein molecule.

Organisms that thrive at very high temperatures have been actively studied since the discovery of Thermus aquaticus in the hot springs of Yellowstone in the 1960s [4]. Since then, thermostable proteins, because of their overall inherent stability, have become part of a number of commercial applications. A good understanding of protein thermostability would help us to understand better the folding mechanism and the function of proteins [5]. Understanding the principles that govern protein thermostability has become of great interest in basic research as well as in industrial applications. Several investigations [6, 7] were carried out, and researchers found that the amino acid composition of a protein is correlated with its thermostability.

Initial research in the field of protein thermal stability dates back three decades [2]. The motivation is that the thermal instability of proteins at high temperature is a barrier to the functional development of proteins. Haney [8] summarized the net change in amino acid composition between mesophilic and thermophilic proteins. The thermophilic proteins were characteristically reduced in Ser, Asn, Gln, Thr and Met, and increased in Ile, Arg, Glu, Lys and Pro. A comparison of residue contents in hyper-thermophilic and mesophilic proteins, on the basis of the genome sequences of eight mesophilic and seven hyper-thermophilic organisms, showed that hyper-thermophilic proteins contain more charged residues than mesophilic proteins [9]. However, it was difficult to determine the influence of dipeptide composition on protein thermostability [10] from the diverse collection of studies. Proteins with similar amino acid composition can differ in dipeptide composition, and amino acids function with the help of other residues nearby in sequence or space. Accordingly, the function of a specific amino acid is influenced by its neighboring amino acids in sequence or space, and the dipeptide reflects the influence in sequence. Hence, the dipeptide composition may be correlated with protein thermostability.

For the past two decades, methods based on dipeptide composition have been used for predicting protein structural class [11, 12]. Structurally, proteins have different three-dimensional shapes, and the shape determines the biological and chemical role of the protein [1]. The shape of a protein has a one-to-one relationship with its amino acid sequence.
If we find the factors that affect the thermal stability of proteins, more stable proteins can be obtained through rational design. Previous investigations [13] show that the thermal stability of a protein depends on its amino acid composition. In previous studies, classification is done only between the thermophile and mesophile classes [5]. This division limits the methods to some applications and specific datasets. In this paper, in addition to amino acid frequencies, we used other features that can improve protein discrimination. Also, proteins are divided into mesophile, thermophile and hyper-thermophile classes. This makes it possible to generalize the proposed method to more general datasets and applications.

II. DATASET

Scientists divide organisms into classes based on their thermal stability: thermophilic proteins, which have an affinity for elevated temperatures and grow at temperatures between 45 and 80 degrees Celsius, and hyper-thermophilic proteins, whose optimum growth temperature is over 80 degrees Celsius. Mesophilic proteins function at 15-45 degrees Celsius. Hyper-mesophilic proteins have an optimum growth temperature below 15 degrees Celsius [13].

In this paper, we used the dataset of [14], which contains 58 hyper-thermophilic proteins with 118 mesophilic proteins homologous to them, and 69 thermophilic proteins with 121 mesophilic proteins homologous to them. All the proteins were downloaded from the PDB (Protein Data Bank) repository [15]. Each sample of the dataset contains an amino acid sequence of length 50-1200, from which we extracted the elements of the feature vector [13].

Table 1: Chemical and physical properties

ID  Property                 AA      ID  Property              AA       ID  Property           AA
1   aromatic                 HFWY    17  carboxyl              DE       33  Polar/hydrophilic  RNDEQHKSTWY
2   UV absorbance            FWY     18  carbonyl              NDEQ     34  hydrophobic        ACILMFPWYV
3   Single aromatic ring     FY      19  imidazole             H        35  Very hydrophobic   ACILMFV
4   heteroaromatic           HW      20  guanidino             R        36  Weak hydrophobic   PWY
5   aliphatic                GAILVP  21  amino                 RK       37  H-bonding          RNDCEQHKSTWY
6   branched                 ILTV    22  Systematical alpha-C  G        38  H-acceptor         NDEQHSTY
7   Branched beta-carbon     ITV     23  alkyl                 AILV     39  H-donor            RNCQHKSTWY
8   flexible                 G       24  achiral               G        40  tiny               GA
9   inflexible               P       25  2 chiral centers      IT       41  very small         GASC
10  alpha imino              P       26  ionizable             RDCEHKY  42  medium small       VTNDP
11  hydroxyl                 STY     27  charged               RDEHK    43  small              GASCVTNDP
12  hydroxyl straight chain  ST      28  acidic                DE       44  large              KRFYW
13  phenol                   Y       29  basic                 RKH      45  long               KREQ
14  sulfur                   CM      30  strong basic          RK       46  very long          KR
15  sulfhydryl               C       31  weak hydrophilic      STWY     47  medium-long        EQ
16  amide                    NQ      32  very hydrophilic      RNDEQHK  48  short              GASCT

We have addressed the discrimination problem where, given the sequences of a mesophilic protein and a thermophilic or hyper-thermophilic counterpart, the objective is to determine which is which. This was done by assembling a large set of thermophilic protein chains from the PDB with their corresponding mesophilic analogs, and another large non-redundant set of hyper-thermophilic PDB protein chains with their mesophilic analogs. They will be referred to as the pair sets: pairs thermophile and pairs hyper-thermophile. We have computed several sequence-based numerical indices, based on the quantities that
other authors have reported to be associated with thermostability. We tested their ability to discriminate successfully between thermophile/mesophile pairs [2].

III. FEATURE VECTOR CREATION

The feature vector of this study is extracted from the protein sequence and consists of 468 entries. Twenty features represent the frequencies of the 20 amino acids and are calculated according to equation (1):

$$ AA_i = \frac{N_i}{n}, \quad i = 1, 2, \ldots, 20 \qquad (1) $$

where $N_i$ is the number of occurrences of the $i$th amino acid and $n$ is the length of the protein sequence.

Since there are twenty different amino acids, at most 400 different dipeptides can be found in protein sequences [16], and therefore 400 features are assigned to dipeptides. Each feature of this type is calculated according to equation (2):

$$ Dep_i = \frac{M_i}{n - 1}, \quad i = 1, 2, \ldots, 400 \qquad (2) $$

where $M_i$ is the number of occurrences of the $i$th dipeptide.

Forty-eight features represent chemical-physical attributes of amino acids. For a particular protein, each attribute is obtained by calculating the frequency of the amino acids relevant to that attribute in the protein. Table (1) shows the chemical-physical features used in this study [17].

IV. METHOD

In our proposed method, k-fold cross validation is used to create the training and testing datasets [18]. In each iteration of k-fold cross validation, the training samples are divided into thermophile, hyper-thermophile and mesophile classes based on their class label.

For each feature $f$, the average value of that feature over all samples of each class is calculated separately; i.e. we calculate $\overline{comp}_{f,H}$ for the hyper-thermophile class, $\overline{comp}_{f,T}$ for the thermophile class and $\overline{comp}_{f,M}$ for the mesophile class. This procedure is repeated for all features. Therefore, we have a matrix of dimension 3×468 whose $(i,j)$th entry is the average of the $j$th feature over all the samples of the $i$th class.

To test the proposed model, at the end of each iteration and for each test sample, the feature vector of the sample ($comp_f$) is created. Then, the difference between the value of each feature of the test sample and the corresponding class average of that feature is calculated for the three classes. After that, the sum of absolute differences over all the features is computed using equation (3) for the mesophile class, equation (4) for the thermophile class and equation (5) for the hyper-thermophile class [19]:

$$ \sigma_M = \sum_f \left| comp_f - \overline{comp}_{f,M} \right| \qquad (3) $$

$$ \sigma_T = \sum_f \left| comp_f - \overline{comp}_{f,T} \right| \qquad (4) $$

$$ \sigma_H = \sum_f \left| comp_f - \overline{comp}_{f,H} \right| \qquad (5) $$

According to the previous equations, the protein is mesophilic if

$$ \sigma_M < \sigma_H \ \text{and} \ \sigma_M < \sigma_T \qquad (6) $$

and is thermophilic if

$$ \sigma_T < \sigma_H \ \text{and} \ \sigma_T < \sigma_M \qquad (7) $$

Otherwise, we have a hyper-thermophilic protein.

V. EXPERIMENTAL RESULTS

The final performance of the proposed method is measured by three evaluation measures: sensitivity, specificity and accuracy. These indices are calculated using equations (8) to (10) [20]:

$$ sensitivity = \frac{TP}{TP + FN} \qquad (8) $$

$$ specificity = \frac{TN}{TN + FP} \qquad (9) $$

$$ accuracy = \frac{TP + TN}{TP + TN + FP + FN} \qquad (10) $$

As we investigated three classes in this study, sensitivity and specificity are calculated separately for each class. For example, for the thermophile class, TP is the number of thermophile samples that the proposed system correctly identifies as thermophilic proteins. FP is the number of mesophile or hyper-thermophile samples that the system wrongly identifies as thermophilic proteins. TN is the number of mesophile or hyper-thermophile samples that the system correctly identifies as mesophilic or hyper-thermophilic proteins. FN is the number of thermophile samples that the system wrongly identifies as mesophilic or hyper-thermophilic proteins. TP, FP, TN and FN are defined similarly for the mesophile and hyper-thermophile classes. The other measure, accuracy, is the proportion of samples whose class the system identifies correctly.

We applied our proposed method to the dataset with four different feature vectors. In the first mode, the feature vector contained only amino acid frequencies.
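The procedure of Sections III and IV can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy sequences and the one-vs-rest reading of TN are assumptions made for illustration, and only two of the 48 property groups of Table (1) are included for brevity, so the vector here has 422 entries rather than 468.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 pairs
# Two example property groups from Table (1): IDs 27 and 40 (illustrative subset).
PROPERTY_GROUPS = {"charged": "RDEHK", "tiny": "GA"}


def feature_vector(seq):
    """Amino acid (eq. 1), dipeptide (eq. 2) and property-group frequencies."""
    n = len(seq)
    aa = [seq.count(a) / n for a in AMINO_ACIDS]
    pairs = [seq[i:i + 2] for i in range(n - 1)]  # overlapping dipeptides
    dep = [pairs.count(d) / (n - 1) for d in DIPEPTIDES]
    prop = [sum(seq.count(a) for a in group) / n
            for group in PROPERTY_GROUPS.values()]
    return aa + dep + prop


def class_means(training):
    """training maps a class label ('M', 'T', 'H') to a list of sequences."""
    means = {}
    for label, seqs in training.items():
        vectors = [feature_vector(s) for s in seqs]
        means[label] = [sum(col) / len(vectors) for col in zip(*vectors)]
    return means


def classify(seq, means):
    """Apply the sums of absolute differences (3)-(5) and rules (6)-(7)."""
    comp = feature_vector(seq)
    sigma = {label: sum(abs(c - m) for c, m in zip(comp, mean))
             for label, mean in means.items()}
    if sigma["M"] < sigma["H"] and sigma["M"] < sigma["T"]:
        return "M"
    if sigma["T"] < sigma["H"] and sigma["T"] < sigma["M"]:
        return "T"
    return "H"  # "otherwise" branch, which also absorbs ties


def sensitivity_specificity(true_labels, predicted, cls):
    """Per-class equations (8) and (9), treating all other classes as negative."""
    pairs = list(zip(true_labels, predicted))
    tp = sum(t == cls and p == cls for t, p in pairs)
    fn = sum(t == cls and p != cls for t, p in pairs)
    tn = sum(t != cls and p != cls for t, p in pairs)
    fp = sum(t != cls and p == cls for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


# Toy training data (made-up sequences, not taken from the dataset of [14]).
training = {
    "M": ["GSTGSTGGSS", "GGSTTSGSTG"],
    "T": ["RLPRLPPLRR", "LLRRPPLRPL"],
    "H": ["EIKEIKKEIE", "KKEEIIKEIK"],
}
means = class_means(training)
print(classify("GSTGGSTSGT", means))  # prints M
```

In a k-fold setting, `class_means` would be recomputed on the training folds of each iteration and `classify` applied to the held-out samples.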
In the second mode, it contained only dipeptide composition frequencies; in the third mode, the physical-chemical features of table (1); and in the fourth mode, all the previous features were used to build the feature vector of the samples. We applied this division to study the effect of the different groups of features on discrimination power and on the evaluation measures. As mentioned earlier, k-fold cross validation was used to create the training and test datasets, and the results shown in the tables are averages over the iterations of the method.

Because some applications work only with the mesophile and thermophile classes, the performance of the system in two-class mode was also evaluated; the results are shown in tables (3) to (5).

For a more comprehensive review, we have defined a new performance measure called "SESPAVG_ij", which is the arithmetic mean of all sensitivities and specificities of classes i and j for each feature set. That is, when we study the results of applying the two-class mode of the algorithm to the thermophilic and mesophilic classes, SESPAVG_MT is the arithmetic mean of SE_M, SE_T, SP_M and SP_T. These values have been added to tables (2) to (5) as the last column.

Table 2: Results of the proposed method in three-class mode

Feature set                 Accuracy  SE_M     SE_T     SE_H     SP_M     SP_T     SP_H     SESPAVG_HMT
20 Amino Acids              63.57%    61.69%   54.18%   82.26%   52.87%   88.82%   95.98%   72.63%
48 Chemical-Physical        57.62%    59.27%   38.44%   73.88%   50.94%   85.02%   93.94%   66.92%
400 Dipeptide Composition   59.74%    55.92%   50.58%   84.77%   50.09%   88.19%   96.39%   70.99%
468 All Features            59.05%    59.00%   41.21%   80.37%   51.97%   85.94%   95.32%   68.97%

Table 3: Results of two-class mode on hyper-thermophilic and mesophilic classes

Feature set                 Accuracy  SE_M     SE_H     SP_M     SP_H     SESPAVG_HM
20 Amino Acids              80.83%    78.42%   90.36%   51.19%   97.27%   79.31%
48 Chemical-Physical        84.00%    83.42%   85.96%   56.16%   96.03%   80.39%
400 Dipeptide Composition   58.95%    50.02%   94.89%   31.83%   97.33%   68.52%
468 All Features            73.68%    68.96%   93.28%   42.90%   97.63%   75.69%

Table 4: Results of two-class mode on thermophilic and mesophilic classes

Feature set                 Accuracy  SE_M     SE_T     SP_M     SP_T     SESPAVG_MT
20 Amino Acids              73.86%    76.37%   64.78%   44.55%   88.21%   68.48%
48 Chemical-Physical        67.21%    68.75%   62.16%   35.83%   86.54%   63.32%
400 Dipeptide Composition   65.72%    64.09%   71.45%   36.81%   88.47%   65.21%
468 All Features            72.28%    75.01%   63.25%   42.56%   87.58%   67.10%

Table 5: Results of two-class mode on hyper-thermophilic and thermophilic classes

Feature set                 Accuracy  SE_T     SE_H     SP_T     SP_H     SESPAVG_HT
20 Amino Acids              83.44%    82.66%   84.03%   80.36%   86.30%   83.34%
48 Chemical-Physical        66.94%    65.58%   67.85%   62.36%   70.70%   66.62%
400 Dipeptide Composition   77.86%    67.03%   92.27%   70.79%   89.79%   79.97%
468 All Features            75.90%    71.02%   81.26%   70.95%   81.78%   76.25%

VI. CONCLUSION

Studying the frequency of amino acids in hyper-thermophilic, thermophilic and mesophilic proteins shows that Gly, Ser and Thr are more frequent in mesophilic proteins; Arg, Leu and Pro are more frequent in thermophilic proteins; and Glu, Ile and Lys are more frequent in hyper-thermophilic proteins.

The other result of this study is that, among the 400 dipeptides, AA, LL, LA, AL, QA, QL, AQ, LT, TL and EQ are more frequent in mesophilic proteins; IE, EE, EK, KE, VE, EI, KI, KK and VK are more frequent in thermophilic proteins; and RI, AK, LA, YI, YE, RK and SK are more frequent in hyper-thermophilic proteins. In addition, amino acids with the charged and acidic attributes are more frequent in hyper-thermophilic proteins.

Regarding the SESPAVG measures, the results of table (2), which relates to three-class mode, show that Accuracy and SESPAVG_HMT are maximized if we use the 20 amino acid frequency features to discriminate samples. In two-class mode, accuracy and SESPAVG_HM in table (3) are maximized if we use the physical-chemical features, and in tables (4) and (5), accuracy, SESPAVG_MT and SESPAVG_HT are maximized if we use the 20 amino acid frequency features.
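The SESPAVG values can be recomputed from the sensitivity and specificity columns of the tables. The following short Python sketch, whose function name `sespavg` is an assumption introduced here, checks the "20 Amino Acids" rows of tables (4) and (2):

```python
def sespavg(sensitivities, specificities):
    """Arithmetic mean of all given sensitivities and specificities."""
    values = list(sensitivities) + list(specificities)
    return sum(values) / len(values)


# Table 4, "20 Amino Acids": SE_M, SE_T and SP_M, SP_T (in percent).
mt = sespavg([76.37, 64.78], [44.55, 88.21])
print(round(mt, 2))  # prints 68.48

# Table 2, "20 Amino Acids": SE_M, SE_T, SE_H and SP_M, SP_T, SP_H (in percent).
hmt = sespavg([61.69, 54.18, 82.26], [52.87, 88.82, 95.98])
print(round(hmt, 2))  # prints 72.63
```

Both results agree with the last column of the corresponding tables.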
It can therefore be concluded that the 20 amino acid frequency features are better than the other features for discriminating thermophile class samples from mesophile class samples, and hyper-thermophile class samples from thermophile class samples. Also, the physical-chemical features are better than the other features for discriminating hyper-thermophile class samples from mesophile class samples.

Because the 20 amino acid frequency and physical-chemical feature vectors, which the results favor, have fewer entries than the 400 dipeptide composition and 468 all-feature vectors, the time required to identify the class of a new protein is also lower with the former vectors than with the latter.

The dataset used in this research includes the three-dimensional structure of the proteins in addition to the conventional amino acid sequence. This advantage makes it possible to use features extracted from the 3D structure in future research.

VII. REFERENCES

[1] C-I Branden, J. Tooze, "Introduction to protein structure", 2nd edition, New York: Garland Pub., 1999.
[2] Todd T, Iosif V, "Discrimination of thermophilic and mesophilic protein", Taylor and Vaisman BMC Structural Biology, vol. 10, 2010.
[3] Xingyu W, Shouliang C, Mingde G, "General Biology", Version 2, Higher Education Press, Beijing, 2005.
[4] Brock T, Freeze H, "Thermus aquaticus gen. n. and sp. n., a Nonsporulating Extreme Thermophile", J Bacteriol, vol. 98, pp. 289-297, 1969.
[5] Jingru Xu, Yuehui Chen, "Discrimination of Protein Thermostability Based on a New Integrated Neural Network", vol. 1, pp. 107-112, Springer, 2011.
[6] Kumar S, Nussinov R, "How do thermophilic proteins deal with heat?", Cell Mol Life Sci, vol. 58, pp. 1216-1233, 2001.
[7] Thompson MJ, Eisenberg D, "Transproteomic evidence of a loop-deletion mechanism for enhancing protein thermostability", J Mol Biol, vol. 290, pp. 595-604, 1999.
[8] Haney PJ, Jonathan HB, Berald LB, "Thermal adaptation analyzed by comparison of protein sequences from mesophilic and extremely thermophilic Methanococcus species", PNAS, vol. 96, pp. 3578-3583, 1999.
[9] Vielle C, Zeikus GJ, "Hyperthermophilic enzymes: sources, uses, and molecular mechanisms for thermostability", Microbiol Mol Biol Rev, vol. 65, pp. 1-43, 2001.
[10] Ding YR, Cai YJ, Zhang GX, "The influence of dipeptide composition on protein thermostability", FEBS Lett, vol. 569, pp. 284-288, 2004.
[11] Kumarevel TS, Gromiha MM, Ponnuswamy MN, "Structural class prediction: an application of residue distribution along the sequence", Biophys Chemist, vol. 88, pp. 81-101, 2000.
[12] Gromiha MM, Shandar A, Makiko S, "Application of residue distribution along the sequence for discriminating outer membrane proteins", Comput Biol Chem, vol. 29, pp. 135-142, 2005.
[13] Seung P, Young J, "Protein Thermostability: Structure-Based Difference of Amino acid between Thermophilic and Mesophilic Proteins", Elsevier Journal of Biotech, vol. 111, pp. 269-277, 2004.
[14] Michael G, Xavier S, "Discrimination and Classification of Mesophilic and Thermophilic Protein using Machine Learning Algorithms", Wiley InterScience, pp. 1274-1279, 2007.
[15] http://www.pdb.org. Accessed June 24, 2012.
[16] Zahng G, Fang B, "Study on the Discrimination of Thermophilic and Mesophilic Proteins Based on Dipeptide Composition", Chinese Journal of Biotechnology, vol. 22, pp. 293-298, 2006.
[17] Eshkin A, Ghafuri H, "Prediction of relative solvent accessibility by support vector regression and best-first method", Excli Journal, vol. 9, pp. 29-38, 2010.
[18] Platzer A, Percpo P, "Characterization of protein-interaction networks in tumors", BMC Bioinformatics, vol. 8, 2007.
[19] Szilagyi A, Zavodszky P, "Structural differences between mesophilic, moderately thermophilic and extremely thermophilic protein subunits: results of comprehensive survey", Elsevier, vol. 8, pp. 493-504, 2000.
[20] Chen Yu, Han Kyungsook, "BSFINDER: Finding Binding Sites of HCV Proteins Using a Support Vector Machine", Protein and Peptide Letters, vol. 16, pp. 373-382, 2009.