Computer Security: A Machine Learning Approach
We analyze two learning algorithms, NBTree and VFI, for the task of detecting intrusions.
SANDEEP V. SABNANI AND ANDREAS FUCHSBERGER




Produced by the Information Security Group at Royal Holloway, University of London in conjunction with TechTarget. Copyright © 2008 TechTarget. All rights reserved.
Royal Holloway series      Machine learning approaches



                                              ABSTRACT
Information security is one of the key areas today, as securing computers, especially against novel attacks, becomes a daunting task. Intrusion detection is a method by which unauthorised access to one’s assets is detected. In this paper, we present an application of the field of machine learning to computer security, particularly to intrusion detection. We analyse two learning algorithms (NBTree and VFI) for the task of detecting intrusions and compare their relative performances. We then comment on the suitability of the NBTree algorithm over VFI for the intrusion detection task based on its high accuracy and high recall. We finally state the usefulness of machine learning to the field of computer security.

Sandeep V. Sabnani
Information Security Group
Royal Holloway, University of London
Egham, Surrey, TW20 0EX, United Kingdom
s.sabnani@rhul.ac.uk

Andreas Fuchsberger
Information Security Group
Royal Holloway, University of London
Egham, Surrey, TW20 0EX, United Kingdom
a.fuchsberger@rhul.ac.uk

This article was prepared by students and staff involved with the award-winning M.Sc. in Information Security offered by the Information Security Group at Royal Holloway, University of London. The student was judged to have produced an outstanding M.Sc. thesis on a business-related topic. The full thesis is available as a technical report on the Royal Holloway website http://www.ma.rhul.ac.uk/tech.

For more information about the Information Security Group at Royal Holloway or on the M.Sc. in Information Security, please visit http://www.isg.rhul.ac.uk.

1. INTRODUCTION
Computer security has become a challenging task these days with the rapid growth of the internet and the increasing complexity of communication protocols. New and complicated attack methods are being constantly developed by attackers, thereby compromising the confidentiality, integrity and/or availability of one’s data. CERT has reported 8064 new vulnerabilities in the year 2006 and the number has been increasing significantly over the past few years [1].

There have been quite a few approaches which prevent and/or detect known attacks. Novel or unknown attacks, on the other hand, are more difficult to detect and have received considerable attention in the recent past.

Another important problem in the field of computer security has been that of insider threats. Many of the insider threats may be unintentional; nevertheless, it has become essential to ensure that insider behaviour is in sync with the security policy of the organisation.

These issues can prove to be quite expensive for organisations to handle. The task of detecting intrusions involves the formulation of efficient rules which require a high level of domain expertise and analysis of large amounts of data, which might make the process slow and unreliable with human experts. For checking compliance with a security policy, an administrator may have to be extremely cautious as normal user behaviour may change over time.

Machine learning is a field related to artificial intelligence which deals with constructing computer programs that automatically improve with experience [12]. The learning experience is provided in the form of data and actual learning is achieved with the help of algorithms. The two main tasks that are addressed by machine learning are the ability to learn more about the given data and to make predictions about new data based on learning outcomes from the learning experience [9], both of which are difficult and time-consuming for human analysts. Machine learning is thus well suited to problems that depend on rare, expensive and unreliable human experts.

This paper presents the intrusion detection problem and a machine learning based solution to it.

2. MACHINE LEARNING
2.1 Basic Concepts
Learning can be described in many ways including acquisition of new knowledge, enhancement of existing knowledge, representation of knowledge, organisation of knowledge and discovery of facts through experiments [11]. When such learning is performed with the help of computer programs, it is referred to as machine learning.

Every computer action can be modeled as a function with sets of inputs and outputs. A learning task may be considered as the estimation of this function by observing the sets of inputs and outputs. (We use the term estimation as the exact function may not be determinate.) The function estimating process usually consists of a search in the hypothesis space (i.e. the space of all such possible functions that might represent the input and output sets under consideration).

The authors in [14] formally describe the function approximation process. Consider a set of input instances $X = (x_1, x_2, x_3, \ldots, x_n)$. Let f be a function which is to be guessed by the learner. Let h be the learner’s hypothesis about f.

Also, we assume a priori that both f and h belong to a class of functions H. The function





f maps the input instances in X as,

$$X \xrightarrow{\,h \in H\,} h(X)$$

A machine learning task may thus be defined as a search in this space H. This search results in approximating the relevant h, based on the training instances (i.e. the set X).

The approximation is then checked against a set of test instances which are then used to indicate the correctness of h. The search requires algorithms which are efficient and which best fit the training data [12].

2.2 Inputs and Outputs
The inputs and outputs to a machine learning task may be of different kinds. Generally, they are in the form of numeric or nominal attributes. For instance, an attribute like temperature, if used as a numeric attribute, may have values like 25 °C, 28 °C, etc. On the other hand, if it is used as a nominal attribute, it may take values from a fixed set (like high, medium, low). In many cases, the output may also be a boolean value (like yes and no).

2.3 Production of Knowledge
The way in which knowledge is learned is an important issue for machine learning. The learning element may be trained in different ways [3]. For classification problems like intrusion detection, knowledge may be learned in a supervised, unsupervised or semi-supervised manner. In this paper, we use supervised learning, in which the learner is provided with training examples with the associated classes or values for the attribute to be predicted.

[Figure 1: three steps in sequence. Choosing a learning algorithm; training the algorithm using training data; evaluating the algorithm by running it on test data.]
Figure 1. Life Cycle of a Machine Learning Task

2.4 Defining a Machine Learning Task
In general, a machine learning task can be defined formally in terms of three elements, viz. the learning experience E, the tasks T and the performance element P.

TM Mitchell in Machine Learning [12]





defines a learning task more precisely as follows:

“A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.”

2.5 Life Cycle of a Machine Learning Task
Figure 1 shows the life cycle of a machine learning task. Depending on the nature of the knowledge to be learned, different types of algorithms may be chosen at different times. The types of inputs and outputs are also instrumental in choosing an algorithm.

Once the algorithm is selected, the next step is to train the algorithm by providing it with a set of training instances. The training instances are used to build a model that represents the target concept to be learned (i.e. the hypothesis). This model is then evaluated using the set of test instances.

In the case where a large amount of data is available, the general approach is to construct two independent sets, one for training and the other for testing. On the other hand, if a limited amount of data is available, it becomes difficult to create separate sets for training and testing. In such cases, some data might be held over for testing, and the remainder used for training. This is called the holdout procedure [20]. However, the data in this case might be distributed in an uneven way in the training and test sets and might not represent the output classes in the correct proportions. Stratification [4] and cross-validation [5] can be used to circumvent this problem.

2.6 Benefits of Machine Learning
The field of machine learning has been found to be extremely useful in the following areas relating to software engineering [12]:
1. Data mining problems where large databases may contain valuable implicit regularities that can be discovered automatically.
2. Difficult to understand domains where humans might not have the knowledge to develop effective algorithms.
3. Domains in which the program is required to adapt to dynamic conditions.

In the case of traditional intrusion detection systems, the alerts generated are analysed by human analysts who evaluate them and take suitable actions. However,





this is an extremely onerous task as the number of alerts generated may be quite large and the environment may change continuously [16]. This makes machine learning well suited for intrusion detection.

3. MACHINE LEARNING APPLIED TO COMPUTER SECURITY
3.1 Intrusion Detection as a Machine Learning Task
A machine learning task can be formally defined as shown in section 2.4. We use this notation to formulate intrusion detection as a machine learning task.

Thus, for intrusion detection, we have,
1. Task: To detect intrusions in an accurate manner.
2. Experience: A dataset with instances representing normal as well as attack data.
3. Performance Measure: Accuracy in terms of correct classification of intrusion events and normal events and other statistical metrics including precision, recall, F-measure and kappa statistic which are described in section 3.5.

3.2 Data Set Description
The data set used for evaluation in this paper is a subset of the KDD Cup ’99 data set for intrusion detection obtained from the UCI machine learning repository. The KDD Cup ’99 data set is a version of a data set used at the DARPA Intrusion Detection Evaluation program (www.ll.mit.edu/IST/ideval/data/data index.html).

The data set consists of TCP dump data for a simulated Air Force LAN. In addition to normal LAN simulation, attacks were also simulated and the corresponding TCP data was captured. The attacks were launched on three UNIX machines, Windows NT hosts and a router along with background traffic. Every record in the data set represents a TCP connection. Each connection was labeled as normal or as a specific attack type [15]. The attacks fall into one of the following categories:
   • DOS attacks (Denial of Service attacks)
   • R2L attacks (unauthorised access from a remote machine)
   • U2R attacks (unauthorised access to super user privileges)
   • Probing attacks

A detailed description and format of the dataset can be found in [17].
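To make the four attack categories concrete, the sketch below groups individual attack labels under them. The label names are an assumption: they are taken from the publicly distributed KDD Cup ’99 training data, not from this article, and the helper name `categorise` is hypothetical.

```python
# Assumption: label names follow the public KDD Cup '99 training data;
# the article itself lists only the four category names.
ATTACK_CATEGORIES = {
    "DOS": {"back", "land", "neptune", "pod", "smurf", "teardrop"},
    "R2L": {"ftp_write", "guess_passwd", "imap", "multihop", "phf",
            "spy", "warezclient", "warezmaster"},
    "U2R": {"buffer_overflow", "loadmodule", "perl", "rootkit"},
    "PROBE": {"ipsweep", "nmap", "portsweep", "satan"},
}


def categorise(label):
    """Map a raw connection label to NORMAL, an attack category, or UNKNOWN."""
    # Raw labels in the data files carry a trailing full stop, e.g. "smurf."
    name = label.strip().rstrip(".").lower()
    if name == "normal":
        return "NORMAL"
    for category, names in ATTACK_CATEGORIES.items():
        if name in names:
            return category
    return "UNKNOWN"
```

A label that is not in the mapping is reported as UNKNOWN rather than guessed, which keeps novel attack names visible instead of silently folding them into an existing category.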



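Section 2.5 mentions stratification and cross-validation as remedies for unevenly distributed classes, and the experiments in section 3.4 rely on 10-fold cross-validation. The following is a minimal sketch of how stratified folds could be built from labelled records. It is not Weka's implementation, and the function names are hypothetical.

```python
from collections import defaultdict


def stratified_folds(labels, k):
    """Assign instance indices to k folds so each class is spread evenly."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(k)]
    position = 0  # running counter keeps the fold sizes balanced
    for indices in by_class.values():
        for index in indices:
            folds[position % k].append(index)
            position += 1
    return folds


def train_test_splits(labels, k):
    """Yield (train_indices, test_indices), using each fold once for testing."""
    folds = stratified_folds(labels, k)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

Dealing out the instances class by class means every fold receives roughly the same share of each class, which addresses the uneven distribution that a plain holdout split can produce.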


3.3 Algorithms
The following algorithms were used in the experiments carried out for this paper.

3.3.1 NBTree
The NBTree algorithm is a hybrid between decision-tree classifiers and Naive Bayes classifiers. It represents the learned knowledge in the form of a tree which is constructed recursively. However, the leaf nodes are Naive Bayes categorizers rather than nodes predicting a single class [6]. For continuous attributes, a threshold is chosen so as to minimise the entropy measure. The utility of a node is evaluated by discretizing the data and computing the fivefold cross-validation accuracy estimation using Naive Bayes at the node. The utility of the split is the weighted sum of the utility of the nodes, where the weight of each node depends on the number of instances that go through that node. The NBTree algorithm tries to approximate whether the generalisation accuracy of Naive Bayes at each leaf is higher than that of a single Naive Bayes classifier at the current node. A split is said to be significant if the relative reduction in error is greater than 5% and there are at least 30 instances in the node [6].

3.3.2 VFI
The VFI4 algorithm is a classification algorithm based on voting feature intervals. In VFI, each training instance is represented as a vector of features along with a label that represents the class of the instance. Feature intervals are then constructed for each feature. An interval represents a set of values for a given feature where the same subset of class values are observed. Thus, two adjacent intervals represent different classes.

A detailed explanation of both the above algorithms can be found in [17].

3.4 Experimental Analysis
The experiments done in this paper consist of the evaluation of the performance of the NBTree and VFI algorithms for the task of classifying novel intrusions. The dataset described in section 3.2 was used in the experiments.

Weka [20], a machine learning toolkit, was used for the implementation of the algorithms described in sections 3.3.1 and 3.3.2. Due to the limitation in the available memory and processing power, it was not possible to use the full dataset described in section 3.2. Instead a reduced subset was





used and 10-fold cross-validation (explained in section 2.5) was used to overcome this limitation.

3.5 Evaluation Metrics
In order to analyse and compare the performance of the above mentioned algorithms, metrics like the classification accuracy, precision, recall, F-Measure and kappa statistic were used. These metrics are derived from a basic data structure known as the confusion matrix. A sample confusion matrix for a two-class problem is shown in Table 1.

                            Predicted Class Positive    Predicted Class Negative
  Actual Class Positive     a                           b
  Actual Class Negative     c                           d

Table 1. Confusion Matrix for a two-class problem (Expected predictions)

In this confusion matrix, the value a is called a true positive and the value d is called a true negative. The value b is referred to as a false negative and c is known as a false positive.

In the context of intrusion detection, a true positive is an instance which is normal and is also classified as normal by the intrusion detector. A true negative is an instance which is an attack and is classified as an attack.

3.5.1 Classification Accuracy
Classification accuracy is the most basic measure of the performance of a learning method. It determines the percentage of correctly classified instances. From the confusion matrix, we can say that:

$$\text{Accuracy} = \frac{a + d}{a + b + c + d}$$

This metric gives the proportion of instances from the dataset which are classified correctly, i.e. the ratio of true positives and true negatives to the total number of instances.

3.5.2 Precision, Recall and F-Measure
Precision gives the percentage of slots in the hypothesis that are correct, whereas recall gives the percentage of reference slots for which the hypothesis is correct.
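The confusion matrix of Table 1 and the accuracy formula of section 3.5.1 can be sketched as follows. The function names and the label strings are hypothetical; treating "normal" as the positive class follows the convention stated in section 3.5.

```python
def confusion_counts(actual, predicted, positive="normal"):
    """Return (a, b, c, d) as laid out in Table 1."""
    a = b = c = d = 0
    for truth, guess in zip(actual, predicted):
        if truth == positive:
            if guess == positive:
                a += 1  # true positive
            else:
                b += 1  # false negative
        else:
            if guess == positive:
                c += 1  # false positive
            else:
                d += 1  # true negative
    return a, b, c, d


def accuracy(a, b, c, d):
    """Fraction of instances classified correctly: (a + d) / (a + b + c + d)."""
    return (a + d) / (a + b + c + d)
```

For example, five connections of which three are classified correctly give an accuracy of 3/5, whatever mix of true positives and true negatives produced those three.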





Referring to the confusion matrix, we can define precision and recall for our purposes as [19]:

$$\text{Precision} = \frac{a}{a + c}$$

$$\text{Recall} = \frac{a}{a + b}$$

The precision of an intrusion detection learner would thus indicate the proportion of correctly classified positive instances to the total number of predicted positive instances, and recall would indicate the proportion of correctly classified positive instances to the total number of actual positive instances.

The F-measure is another metric defined as the weighted harmonic mean of precision and recall [8] to address a problem identified in [7], which may be present in any classification scenario.

$$\text{F-measure} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$

3.5.3 Kappa Statistic
The Kappa statistic is used to measure the agreement between predicted and observed categorizations of a dataset, while correcting for agreements that occur by chance. It takes into account the expected figure and deducts it from the predictor’s success and expresses the result as a proportion of the total for a perfect predictor [20].

In addition to the above statistical metrics, the time taken to build the model was also considered as a performance indicator.

3.6 Attribute Selection
Attribute Selection is the process of identifying and removing as much of the redundant and irrelevant information as possible. The experiments conducted in this paper use the information gain attribute selection method which is described in [17].

3.7 Summary of Experiments
Based on the above algorithms, attribute selection methods and the type of cross-validation, Table 2 shows a summary of the experiments conducted.

  Algorithm    Feature Reduction Method    Cross Validation
  NBTree       None                        10-fold
  VFI          None                        10-fold
  NBTree       Information Gain            10-fold
  VFI          Information Gain            10-fold

Table 2. Summary of Experiments
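The formulas of sections 3.5.2 and 3.5.3 can be sketched in terms of the confusion matrix entries a, b, c and d of Table 1. The kappa computation uses the standard Cohen's kappa for a two-class matrix, which is an assumption about the exact form used in the experiments. The function names are hypothetical.

```python
def precision(a, b, c, d):
    """Correctly predicted positives over all predicted positives."""
    return a / (a + c)


def recall(a, b, c, d):
    """Correctly predicted positives over all actual positives."""
    return a / (a + b)


def f_measure(a, b, c, d):
    """Harmonic mean of precision and recall with equal weights."""
    p = precision(a, b, c, d)
    r = recall(a, b, c, d)
    return 2 * p * r / (p + r)


def kappa(a, b, c, d):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = a + b + c + d
    observed = (a + d) / n
    # Chance agreement, computed from the row and column totals.
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / (n * n)
    return (observed - expected) / (1 - expected)
```

The sketch does not guard against empty rows or columns (for instance a + c = 0), so a real evaluation would need to decide how to report a metric when no instance is predicted as positive.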




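Section 3.6 names information gain as the attribute selection method and defers the details to [17]. As a rough illustration only, the textbook definition for a nominal attribute (the entropy of the class minus the weighted entropy after splitting on the attribute) can be sketched as below. The function names and the sample attribute values in the usage note are hypothetical, and the thesis may differ in details such as the handling of numeric attributes.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels):
    """Reduction in class entropy obtained by splitting on a nominal attribute."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Entropy that remains after the split, weighted by group size.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder
```

An attribute whose values separate normal from attack records perfectly, for example a protocol field that is "tcp" for every normal record and "udp" for every attack, has a gain equal to the full class entropy, while an attribute that is independent of the class has a gain of zero. Attributes can then be ranked by this score and the lowest-ranked ones dropped.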


4. EVALUATION
The results of the experiments described in section 3.4 are discussed in this section. A comparison between the NBTree and VFI methods is also made based on the values of the metrics defined in section 3.5.

For an IDS, the accuracy indicates how correct the algorithm is in identifying normal and adversary behaviour.

  Metric                           Value
  Time taken to build the model    1115.05s
  Accuracy                         99.94 %
  Average Precision                90.33 %
  Average Recall                   92.72 %
  Average F-Measure                91.14 %
  Kappa Statistic                  99.99 %

Table 3. Results of NBTree with all attributes

  Metric                           Value
  Time taken to build the model    38.97s
  Accuracy                         99.89 %
  Average Precision                94.54 %
  Average Recall                   90.84 %
  Average F-Measure                92.28 %
  Kappa Statistic                  99.82 %

Table 4. Results of NBTree with selected attributes using information gain measure

  Metric                           Value
  Time taken to build the model    0.92s
  Accuracy                         86.58 %
  Average Precision                41.27 %
  Average Recall                   80.54 %
  Average F-Measure                44.05 %
  Kappa Statistic                  79.50 %

Table 5. Results of VFI with all attributes

  Metric                           Value
  Time taken to build the model    0.2s
  Accuracy                         75.81 %
  Average Precision                35.71 %
  Average Recall                   75.82 %
  Average F-Measure                37.43 %
  Kappa Statistic                  66.21 %

Table 6. Results of VFI with selected attributes using information gain measure

Recall would indicate the proportion of correctly classified normal instances from the total number of actual normal instances, whereas precision would indicate the proportion of correctly classified normal instances from the total number of instances identified as normal by the IDS. The Kappa statistic is a general statistical indicator and the



   F-Measure is related to the problem mentioned in section 3.5.2. In addition to these, the time
   taken by the learning algorithm for model construction is also important as it may have to
   handle extremely large amounts of data.

      The graphs in Figures 2 and 3 show the relative performance of NBTree and VFI for the
   intrusion detection task on the dataset under consideration. Figure 2 shows the comparison
   with all attributes under consideration and Figure 3 depicts the comparison for attributes
   selected using the information gain measure.

   Figure 2. NBTree v/s VFI - All Attributes

   Figure 3. NBTree v/s VFI - Selected Attributes

      As per the definitions in section 3.5.2, a good IDS should have a recall that is as high
   as possible. A high precision is also desired. From our results, we see that the
   classification accuracy of NBTree is better than that of VFI in both cases. There are large
   differences in the precision and recall values of NBTree and VFI, with NBTree exhibiting
   higher precision and higher recall. Also, when all attributes are used, NBTree has a lower
   precision value (90.33 %) than when selected attributes are used (94.54 %). In both these
   cases, the recall is more or less the same (92.72 % and 90.84 %). The F-Measure value is
   also high for NBTree in comparison to VFI. NBTree is seen to perform better than VFI in
   both cases, and it can thus be said that it is more suited to the intrusion detection task
   on the given data set.
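   The measures used in this evaluation can be made concrete with a small worked sketch. It
   assumes the standard definitions (precision, recall, F-Measure as their harmonic mean, and
   Cohen's kappa) and treats "normal" as the positive class, as in the description of recall
   and precision above. The class name, function names and counts are illustrative assumptions;
   the counts are invented and are not taken from the experiments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Binary confusion-matrix counts, with 'normal' as the positive class."""
    tp: int  # normal instances classified as normal
    fp: int  # attack instances classified as normal
    fn: int  # normal instances classified as attack
    tn: int  # attack instances classified as attack

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.fn + self.tn


def precision(c: Counts) -> float:
    """Correctly classified normal instances / all instances labelled normal."""
    return c.tp / (c.tp + c.fp)


def recall(c: Counts) -> float:
    """Correctly classified normal instances / all actual normal instances."""
    return c.tp / (c.tp + c.fn)


def f_measure(c: Counts) -> float:
    """Harmonic mean of precision and recall."""
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r)


def accuracy(c: Counts) -> float:
    """Fraction of all instances that were classified correctly."""
    return (c.tp + c.tn) / c.total


def kappa(c: Counts) -> float:
    """Cohen's kappa: agreement beyond what chance alone would produce."""
    observed = accuracy(c)
    # Chance agreement, from the predicted and actual totals of each class.
    expected = (
        (c.tp + c.fp) * (c.tp + c.fn) + (c.tn + c.fn) * (c.tn + c.fp)
    ) / c.total ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical counts, chosen only to keep the arithmetic easy to follow.
example = Counts(tp=90, fp=10, fn=5, tn=95)

if __name__ == "__main__":
    for name, func in [
        ("precision", precision),
        ("recall", recall),
        ("F-Measure", f_measure),
        ("accuracy", accuracy),
        ("kappa", kappa),
    ]:
        print(f"{name}: {func(example):.4f}")
```

   The tables report averaged values, so their F-Measure need not equal the harmonic mean of
   the averaged precision and recall; the sketch covers a single two-class confusion matrix only.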




   Ron Condon
   UK bureau chief
   searchsecurity.co.UK
   Ron Condon has been writing about developments in the IT industry for more than 30 years.
   In that time, he has charted the evolution from big mainframes, to minicomputers and PCs in
   the 1980s, and the rise of the Internet over the last decade or so. In recent years he has
   specialized in information security. He has edited daily, weekly and monthly publications,
   and has written for national and regional newspapers, in Europe and the U.S.

   5. CONCLUSIONS
   Based on the experiments done in this paper and their corresponding results, we can state
   the following:
      • Machine learning is an effective methodology which can be used in the field of
   computer security.
      • The inherent nature of machine learning algorithms makes them well suited to the
   intrusion detection field of information security. However, their use is not limited to
   intrusion detection. The authors in [10] have developed a tool using machine learning to
   infer access control policies where policy requests and responses are generated by using
   learning algorithms. These are effective with new policy specification languages like
   XACML [13]. Similarly, a classifier-based approach to assigning users to roles and
   vice-versa is described in [18]. Learning algorithms can also be used to develop
   applications which, for instance, can check whether people in an organisation are adhering
   to the defined security policy.
      • It is possible to analyse huge quantities of audit data by using machine learning
   techniques, which is otherwise an extremely difficult task.

      Finally, it can be said that in order to realise the full potential of machine learning
   in the field of computer security, it is essential to experiment with various machine
   learning schemes towards addressing security-related problems and choose the one which is
   the most appropriate to the problem at hand.
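   The advice to try several schemes and choose the most appropriate one can be illustrated
   with a short sketch. It ranks the configurations whose results appear in Tables 3, 4 and 6
   by average recall first and average precision second, following the criterion stated in the
   evaluation that a good IDS should have a recall that is as high as possible. The dictionary
   layout and the function name are illustrative assumptions, and other problems may call for a
   different criterion (for example, one that also weighs model-building time).

```python
from typing import Dict, List

# Average recall and precision (in percent) as reported in Tables 3, 4 and 6.
REPORTED: Dict[str, Dict[str, float]] = {
    "NBTree, all attributes": {"recall": 92.72, "precision": 90.33},
    "NBTree, selected attributes": {"recall": 90.84, "precision": 94.54},
    "VFI, selected attributes": {"recall": 75.82, "precision": 35.71},
}


def rank_by_recall_then_precision(results: Dict[str, Dict[str, float]]) -> List[str]:
    """Order configurations from best to worst: recall first, precision as tie-breaker."""
    return sorted(
        results,
        key=lambda name: (results[name]["recall"], results[name]["precision"]),
        reverse=True,
    )


if __name__ == "__main__":
    for position, name in enumerate(rank_by_recall_then_precision(REPORTED), start=1):
        scores = REPORTED[name]
        print(f"{position}. {name}: recall {scores['recall']} %, precision {scores['precision']} %")
```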




   REFERENCES
   [1] CERT Vulnerability Statistics 1995 - 2006. http://www.cert.org/stats/vulnerability_remediation.html, 2007.
   [2] G. Demiroz and H. A. Guvenir. Classification by voting feature intervals. In European Conference on Machine Learning, pages 85-92, 1997.
   [3] T. Dietterich and P. Langley. Machine learning for cognitive networks: technology assessments and research challenges, Draft of May 11, 2003. http://web.engr.oregonstate.edu/~tgd/kp/dl-report.pdf, 2003.
   [4] M. A. Hall. Correlation-based Feature Selection for Machine Learning. PhD thesis, University of Waikato, Department of Computer Science, 1999.
   [5] R. Kohavi. A study of cross-validation and bootstrap for accuracy estimation and model selection. In Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence, pages 1137-1145, 1995.
   [6] R. Kohavi. Scaling up the accuracy of Naive-Bayes classifiers: a decision-tree hybrid. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, pages 202-207, 1996.
   [7] M. Kubat and S. Matwin. Addressing the curse of imbalanced training sets: one-sided selection. In Proc. 14th International Conference on Machine Learning, pages 179-186. Morgan Kaufmann, 1997.
   [8] J. Makhoul, F. Kubala, R. Schwartz, and R. Weischedel. Performance measures for information extraction. http://www.nist.gov/speech/publications/darpa99/html/dir10/dir10.htm, 1999.
   [9] M. A. Maloof, editor. Machine Learning and Data Mining for Computer Security. Springer, 2006.
   [10] E. Martin and T. Xie. Inferring access-control policy properties via machine learning. In POLICY ’06: Proceedings of the Seventh IEEE International Workshop on Policies for Distributed Systems and Networks (POLICY’06), pages 235-238, Washington, DC, USA, 2006. IEEE Computer Society.
   [11] R. S. Michalski, J. G. Carbonell, and T. M. Mitchell, editors. Machine Learning: An Artificial Intelligence Approach. Tioga Publishing Company, 1983.
   [12] T. M. Mitchell. Machine Learning. McGraw Hill, 1997.
   [13] T. Moses. OASIS eXtensible Access Control Markup Language (XACML) version 2.0. http://docs.oasis-open.org/xacml/2.0/access_control-xacml-2.0-core-spec-os.pdf, 2005.
   [14] N. J. Nilsson. Introduction to Machine Learning - an early draft of a proposed book. http://ai.stanford.edu/~nilsson/MLDraftBook/MLBOOK.pdf, 1996.
   [15] University of California. The UCI KDD Archive. http://kdd.ics.uci.edu/databases/kddcup99/task.html, 1999.
   [16] T. Pietraszek. Using adaptive alert classification to reduce false positives in intrusion detection. Recent Advances in Intrusion Detection, 3224:102-124, 2004.
   [17] S. V. Sabnani. Computer security: A machine learning approach. Master’s thesis, Royal Holloway, University of London, 2007.
   [18] S. Sheng and S. L. Osborn. A classifier-based approach to user-role assignment for web applications. In Secure Data Management, pages 163-171, 2004.
   [19] S. Tesink. Improving intrusion detection systems through machine learning. http://ilk.uvt.nl/downloads/pub/papers/thesis-tesink.pdf, 2007.
   [20] I. H. Witten and E. Frank. Data Mining - Practical Machine Learning Tools and Techniques, Second Edition. Elsevier, 2005.
   [21] S. Wolthusen. Lecture 11 - Intrusion Detection and Prevention, Notes on Network Security, Royal Holloway, University of London, 2006.




Deterring hacking strategies viaDeterring hacking strategies via
Deterring hacking strategies via
 
DETERRING HACKING STRATEGIES VIA TARGETING SCANNING PROPERTIES
DETERRING HACKING STRATEGIES VIA TARGETING SCANNING PROPERTIESDETERRING HACKING STRATEGIES VIA TARGETING SCANNING PROPERTIES
DETERRING HACKING STRATEGIES VIA TARGETING SCANNING PROPERTIES
 
SECURE CLOUD ARCHITECTURE
SECURE CLOUD ARCHITECTURESECURE CLOUD ARCHITECTURE
SECURE CLOUD ARCHITECTURE
 
Investigative analysis of security issues and challenges in cloud computing a...
Investigative analysis of security issues and challenges in cloud computing a...Investigative analysis of security issues and challenges in cloud computing a...
Investigative analysis of security issues and challenges in cloud computing a...
 
Network Security Research Paper
Network Security Research PaperNetwork Security Research Paper
Network Security Research Paper
 
A Novel Framework to Improve Secure Digital Library at Cloud Environment
A Novel Framework to Improve Secure Digital Library at Cloud EnvironmentA Novel Framework to Improve Secure Digital Library at Cloud Environment
A Novel Framework to Improve Secure Digital Library at Cloud Environment
 
Widget and Smart Devices. A Different Approach for Remote and Virtual labs
Widget and Smart Devices. A Different Approach for Remote and Virtual labsWidget and Smart Devices. A Different Approach for Remote and Virtual labs
Widget and Smart Devices. A Different Approach for Remote and Virtual labs
 
An organized and Secured Local Area Network in Naval Post Graduate School
An organized and Secured Local Area Network in Naval Post Graduate SchoolAn organized and Secured Local Area Network in Naval Post Graduate School
An organized and Secured Local Area Network in Naval Post Graduate School
 
A Security Analysis Framework Powered by an Expert System
A Security Analysis Framework Powered by an Expert SystemA Security Analysis Framework Powered by an Expert System
A Security Analysis Framework Powered by an Expert System
 
7132019 Originality Reporthttpsucumberlands.blackboar.docx
7132019 Originality Reporthttpsucumberlands.blackboar.docx7132019 Originality Reporthttpsucumberlands.blackboar.docx
7132019 Originality Reporthttpsucumberlands.blackboar.docx
 
Deep self-taught learning framework for intrusion detection in cloud computin...
Deep self-taught learning framework for intrusion detection in cloud computin...Deep self-taught learning framework for intrusion detection in cloud computin...
Deep self-taught learning framework for intrusion detection in cloud computin...
 
Dismantling intrusion prevention_systems
Dismantling intrusion prevention_systemsDismantling intrusion prevention_systems
Dismantling intrusion prevention_systems
 
Information security approach in open distributed multi agent virtual learnin...
Information security approach in open distributed multi agent virtual learnin...Information security approach in open distributed multi agent virtual learnin...
Information security approach in open distributed multi agent virtual learnin...
 
SYSTEM END-USER ACTIONS AS A THREAT TO INFORMATION SYSTEM SECURITY
SYSTEM END-USER ACTIONS AS A THREAT TO INFORMATION SYSTEM SECURITYSYSTEM END-USER ACTIONS AS A THREAT TO INFORMATION SYSTEM SECURITY
SYSTEM END-USER ACTIONS AS A THREAT TO INFORMATION SYSTEM SECURITY
 
An overview of cyber security data science from a perspective of machine lear...
An overview of cyber security data science from a perspective of machine lear...An overview of cyber security data science from a perspective of machine lear...
An overview of cyber security data science from a perspective of machine lear...
 

Recently uploaded

Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 

Recently uploaded (20)

Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 

Computer security - A machine learning approach

  • 1. Computer Security: A Machine Learning Approach We analyze two learning algorithms, NBTree and VFI, for the task of detecting intrusions. SANDEEP V. SABNANI AND ANDREAS FUCHSBERGER Produced by the Information Security Group at Royal Holloway, University of London in conjunction with TechTarget. Copyright © 2008 TechTarget. All rights reserved.
  • 2. Royal Holloway series Machine learning approaches
    Computer Security: A Machine Learning Approach
    Sandeep V. Sabnani, Information Security Group, Royal Holloway, University of London, Egham, Surrey, TW20 0EX, United Kingdom, s.sabnani@rhul.ac.uk
    Andreas Fuchsberger, Information Security Group, Royal Holloway, University of London, Egham, Surrey, TW20 0EX, United Kingdom, a.fuchsberger@rhul.ac.uk
    ABSTRACT
    Information Security is one of the key areas today as securing computers, especially against novel attacks, becomes a daunting task. Intrusion detection is a method by which unauthorised access to one’s assets is detected. In this paper, we present an application of the field of machine learning to computer security, particularly to intrusion detection. We analyse two learning algorithms (NBTree and VFI) for the task of detecting intrusions and compare their relative performances. We then comment on the suitability of the NBTree algorithm over VFI for the intrusion detection task based on its high accuracy and high recall. We finally state the usefulness of machine learning to the field of computer security.
    This article was prepared by students and staff involved with the award-winning M.Sc. in Information Security offered by the Information Security Group at Royal Holloway, University of London. The student was judged to have produced an outstanding M.Sc. thesis on a business-related topic. The full thesis is available as a technical report on the Royal Holloway website http://www.ma.rhul.ac.uk/tech. For more information about the Information Security Group at Royal Holloway or on the M.Sc. in Information Security, please visit http://www.isg.rhul.ac.uk.
    1. INTRODUCTION
    Computer security has become a challenging task these days with the rapid growth of the internet and the increasing complexity of communication protocols. New and complicated attack methods are being constantly developed by attackers, thereby compromising the confidentiality, integrity and/or availability of one’s data. CERT has reported 8064 new vulnerabilities in the year 2006 and the number has been increasing significantly over the past few years [1].
    There have been quite a few approaches which prevent and/or detect known attacks. Novel or unknown attacks, on the other hand, are more difficult to detect and have received considerable attention in the recent past. Another important problem in the field of computer security has been that of insider threats. Many of the insider threats may be unintentional; nevertheless, it has become essential to ensure that insider behaviour is in sync with the security policy of the organisation.
  • 3. These issues can prove to be quite expensive for organisations to handle. The task of detecting intrusions involves the formulation of efficient rules which require a high level of domain expertise and analysis of large amounts of data, which might make the process slow and unreliable with human experts. For checking compliance with a security policy, an administrator may have to be extremely cautious as normal user behaviour may change over time.
    Machine learning is a field related to artificial intelligence which deals with constructing computer programs that automatically improve with experience [12]. The learning experience is provided in the form of data and actual learning is achieved with the help of algorithms. The two main tasks that are addressed by machine learning are the ability to learn more about the given data and to make predictions about new data based on learning outcomes from the learning experience [9], both of which are difficult and time-consuming for human analysts. Machine learning is thus well suited to problems that depend on rare, expensive and unreliable human experts.
    This paper presents the intrusion detection problem and a machine learning based solution to it.
    2. MACHINE LEARNING
    2.1 Basic Concepts
    Learning can be described in many ways including acquisition of new knowledge, enhancement of existing knowledge, representation of knowledge, organisation of knowledge and discovery of facts through experiments [11]. When such learning is performed with the help of computer programs, it is referred to as machine learning.
    Every computer action can be modeled as a function with sets of inputs and outputs. A learning task may be considered as the estimation of this function by observing the sets of inputs and outputs. (We use the term estimation as the exact function may not be determinate.) The function estimating process usually consists of a search in the hypothesis space (i.e. the space of all such possible functions that might represent the input and output sets under consideration).
    The authors in [14] formally describe the function approximation process. Consider a set of input instances $X = (x_1, x_2, x_3, \ldots, x_n)$. Let f be a function which is to be guessed by the learner. Let h be the learner’s hypothesis about f. Also, we assume a priori that both f and h belong to a class of functions H. The function
  • 4. f maps the input instances in X as,
    $X \xrightarrow{\,h \in H\,} h(X)$
    A machine learning task may thus be defined as a search in this space H. This search results in approximating the relevant h, based on the training instances (i.e. the set X). The approximation is then checked against a set of test instances which are then used to indicate the correctness of h. The search requires algorithms which are efficient and which best-fit the training data [12].
    2.2 Inputs and Outputs
    The inputs and outputs to a machine learning task may be of different kinds. Generally, they are in the form of numeric or nominal attributes. For instance, an attribute like temperature, if used as a numeric attribute, may have values like 25 °C, 28 °C, etc. On the other hand, if it is used as a nominal attribute, it may take values from a fixed set (like high, medium, low). In many cases, the output may also be a boolean value (like yes and no).
    2.3 Production of Knowledge
    The way in which knowledge is learned is an important issue for machine learning. The learning element may be trained in different ways [3]. For classification problems like intrusion detection, knowledge may be learned in a supervised, unsupervised or semi-supervised manner. In this paper, we use supervised learning in which the learner is provided with training examples with the associated classes or values for the attribute to be predicted.
    Figure 1. Life Cycle of a Machine Learning Task: (1) choosing a learning algorithm; (2) training the algorithm using training data; (3) evaluating the algorithm by running it on test data.
    2.4 Defining a Machine Learning Task
    In general, a machine learning task can be defined formally in terms of three elements, viz. the learning experience E, the tasks T and the performance element P. TM Mitchell in Machine Learning [12]
  • 5. defines a learning task more precisely as follows:
    “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.”
    2.5 Life Cycle of a Machine Learning Task
    Figure 1 shows the life cycle of a machine learning task. Depending on the nature of the knowledge to be learned, different types of algorithms may be chosen at different times. The types of inputs and outputs are also instrumental in choosing an algorithm.
    Once the algorithm is selected, the next step is to train the algorithm by providing it with a set of training instances. The training instances are used to build a model that represents the target concept to be learned (i.e. the hypothesis). This model is then evaluated using the set of test instances.
    In the case where a large amount of data is available, the general approach is to construct two independent sets, one for training and the other for testing. On the other hand, if a limited amount of data is available, it becomes difficult to create separate sets for training and testing. In such cases, some data might be held over for testing, and the remainder used for training. This is called the holdout procedure [20]. However, the data in this case might be distributed unevenly between the training and test sets and might not represent the output classes in the correct proportions. Stratification [4] and cross-validation [5] can be used to circumvent this problem.
    2.6 Benefits of Machine Learning
    The field of machine learning has been found to be extremely useful in the following areas relating to software engineering [12]:
    1. Data mining problems where large databases may contain valuable implicit regularities that can be discovered automatically.
    2. Difficult-to-understand domains where humans might not have the knowledge to develop effective algorithms.
    3. Domains in which the program is required to adapt to dynamic conditions.
    In the case of traditional intrusion detection systems, the alerts generated are analysed by human analysts who evaluate them and take suitable actions. However,
  • 6. this is an extremely onerous task as the number of alerts generated may be quite large and the environment may change continuously [16]. This makes machine learning well suited for intrusion detection.
    3. MACHINE LEARNING APPLIED TO COMPUTER SECURITY
    3.1 Intrusion Detection as a Machine Learning Task
    A machine learning task can be formally defined as shown in section 2.4. We use this notation to formulate intrusion detection as a machine learning task. Thus, for intrusion detection, we have:
    1. Task: To detect intrusions in an accurate manner.
    2. Experience: A dataset with instances representing normal as well as attack data.
    3. Performance Measure: Accuracy in terms of correct classification of intrusion events and normal events, and other statistical metrics including precision, recall, F-measure and kappa statistic, which are described in section 3.5.
    3.2 Data Set Description
    The data set used for evaluation in this paper is a subset of the KDD Cup ’99 data set for intrusion detection obtained from the UCI machine learning repository. The KDD Cup ’99 data set is a version of a data set used at the DARPA Intrusion Detection Evaluation program (www.ll.mit.edu/IST/ideval/data/data_index.html).
    The data set consists of TCP dump data for a simulated Air Force LAN. In addition to normal LAN simulation, attacks were also simulated and the corresponding TCP data was captured. The attacks were launched on three UNIX machines, Windows NT hosts and a router along with background traffic. Every record in the data set represents a TCP connection. Each connection was labeled as normal or as a specific attack type [15]. The attacks fall into one of the following categories:
    • DOS attacks (Denial of Service attacks)
    • R2L attacks (unauthorised access from a remote machine)
    • U2R attacks (unauthorised access to super user privileges)
    • Probing attacks
    A detailed description and format of the dataset can be found in [17].
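As a rough illustration of the labelling scheme described in section 3.2, the following sketch groups per-connection labels into the four attack categories. The specific label names are an assumption: they are the ones commonly found in the public KDD Cup ’99 training data and are not listed in this article. The helper name `attack_category` is hypothetical.

```python
# Hypothetical lookup from a KDD Cup '99 connection label to the attack
# category named in section 3.2. The label names are an assumption taken
# from the public training data, not from this article.
CATEGORY_BY_LABEL = {
    "normal": "normal",
    # Denial of Service
    "back": "DOS", "land": "DOS", "neptune": "DOS",
    "pod": "DOS", "smurf": "DOS", "teardrop": "DOS",
    # Unauthorised access from a remote machine
    "ftp_write": "R2L", "guess_passwd": "R2L", "imap": "R2L",
    "multihop": "R2L", "phf": "R2L", "spy": "R2L",
    "warezclient": "R2L", "warezmaster": "R2L",
    # Unauthorised access to super user privileges
    "buffer_overflow": "U2R", "loadmodule": "U2R",
    "perl": "U2R", "rootkit": "U2R",
    # Probing
    "ipsweep": "Probe", "nmap": "Probe",
    "portsweep": "Probe", "satan": "Probe",
}


def attack_category(label: str) -> str:
    """Return the category for a raw label such as 'smurf.' or 'normal'."""
    # Raw records usually end the label with a full stop; strip it.
    key = label.strip().rstrip(".").lower()
    return CATEGORY_BY_LABEL.get(key, "unknown")
```

Labels that are not in the table map to "unknown" rather than raising an error, which seems a reasonable default when a test set contains attack types absent from training data.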
  • 7. 3.3 Algorithms
    The following algorithms were used in the experiments carried out for this paper.
    3.3.1 NBTree
    The NBTree algorithm is a hybrid between decision-tree classifiers and Naive Bayes classifiers. It represents the learned knowledge in the form of a tree which is constructed recursively. However, the leaf nodes are Naive Bayes categorizers rather than nodes predicting a single class [6]. For continuous attributes, a threshold is chosen so as to limit the entropy measure. The utility of a node is evaluated by discretizing the data and computing the fivefold cross-validation accuracy estimation using Naive Bayes at the node. The utility of the split is the weighted sum of utility of the nodes and this depends on the number of instances that go through that node. The NBTree algorithm tries to approximate whether the generalisation accuracy of Naive Bayes at each leaf is higher than a single Naive Bayes classifier at the current node. A split is said to be significant if the relative reduction in error is greater than 5% and there are at least 30 instances in the node [6].
    3.3.2 VFI
    The VFI4 algorithm is a classification algorithm based on voting feature intervals. In VFI, each training instance is represented as a vector of features along with a label that represents the class of the instance. Feature intervals are then constructed for each feature. An interval represents a set of values for a given feature where the same subset of class values are observed. Thus, two adjacent intervals represent different classes.
    A detailed explanation of both the above algorithms can be found in [17].
    3.4 Experimental Analysis
    The experiments done in this paper consist of the evaluation of the performance of NBTree and VFI algorithms for the task of classifying novel intrusions. The dataset described in section 3.2 was used in the experiments.
    Weka [20], a machine learning toolkit, was used for the implementation of the algorithms described in sections 3.3.1 and 3.3.2. Due to the limitation in the available memory and processing power, it was not possible to use the full dataset described in section 3.2. Instead a reduced subset was
  • 8. used and 10-fold cross-validation (explained in section 2.5) was used to overcome this limitation.
    3.5 Evaluation Metrics
    In order to analyse and compare the performance of the above mentioned algorithms, metrics like the classification accuracy, precision, recall, F-Measure and kappa statistic were used. These metrics are derived from a basic data structure known as the confusion matrix. A sample confusion matrix for a two-class problem is shown in Table 1.
                              Predicted Class Positive    Predicted Class Negative
    Actual Class Positive     a                           b
    Actual Class Negative     c                           d
    Table 1. Confusion Matrix for a two-class problem (Expected predictions)
    In this confusion matrix, the value a is called a true positive and the value d is called a true negative. The value b is referred to as a false negative and c is known as a false positive. In the context of intrusion detection, a true positive is an instance which is normal and is also classified as normal by the intrusion detector. A true negative is an instance which is an attack and is classified as an attack.
    3.5.1 Classification Accuracy
    Classification accuracy is the most basic measure of the performance of a learning method. It determines the percentage of correctly classified instances. From the confusion matrix, we can say that:
    $\text{Accuracy} = \frac{a + d}{a + b + c + d}$
    This metric gives the proportion of instances from the dataset which are classified correctly, i.e. the ratio of true positives and true negatives to the total number of instances.
    3.5.2 Precision, Recall and F-Measure
    Precision gives the percentage of slots in the hypothesis that are correct, whereas recall gives the percentage of reference slots for which the hypothesis is correct.
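The confusion-matrix counts of Table 1 and the accuracy formula of section 3.5.1 can be sketched as follows. This is a minimal illustration and not the Weka implementation used in the experiments; the function names and the choice of the string "normal" as the positive class (following the article's convention that a true positive is a normal instance classified as normal) are assumptions.

```python
from typing import Sequence, Tuple


def confusion_counts(actual: Sequence[str],
                     predicted: Sequence[str],
                     positive: str = "normal") -> Tuple[int, int, int, int]:
    """Return (a, b, c, d) laid out as in Table 1.

    a: actual positive, predicted positive (true positive)
    b: actual positive, predicted negative (false negative)
    c: actual negative, predicted positive (false positive)
    d: actual negative, predicted negative (true negative)
    """
    a = b = c = d = 0
    for truth, guess in zip(actual, predicted):
        if truth == positive:
            if guess == positive:
                a += 1
            else:
                b += 1
        else:
            if guess == positive:
                c += 1
            else:
                d += 1
    return a, b, c, d


def accuracy(a: int, b: int, c: int, d: int) -> float:
    """Accuracy = (a + d) / (a + b + c + d)."""
    total = a + b + c + d
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (a + d) / total


# Made-up labels, for illustration only.
actual = ["normal", "normal", "attack", "attack", "normal"]
predicted = ["normal", "attack", "attack", "normal", "normal"]
counts = confusion_counts(actual, predicted)
print(counts, accuracy(*counts))
```

Treating every non-"normal" label as negative collapses all attack types into one class, which matches the two-class setting of Table 1 but not the multi-class averages reported later.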
Referring to the confusion matrix, we can define precision and recall for our purposes as [19]:

$$\text{Precision} = \frac{a}{a + c}$$

$$\text{Recall} = \frac{a}{a + b}$$

The precision of an intrusion detection learner would thus indicate the proportion of correctly classified positive instances to the total number of predicted positive instances, and recall would indicate the proportion of correctly classified positive instances to the total number of actual positive instances.

The F-measure is another metric, defined as the weighted harmonic mean of precision and recall [8], to address a problem identified in [7], which may be present in any classification scenario.

$$\text{F-measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$

3.5.3 Kappa Statistic
The Kappa statistic is used to measure the agreement between predicted and observed categorizations of a dataset, while correcting for agreements that occur by chance. It takes into account the expected figure, deducts it from the predictor's success, and expresses the result as a proportion of the total for a perfect predictor [20].

In addition to the above statistical metrics, the time taken to build the model was also considered as a performance indicator.

3.6 Attribute Selection
Attribute selection is the process of identifying and removing as much of the redundant and irrelevant information as possible. The experiments conducted in this paper use the information gain attribute selection method, which is described in [17].

3.7 Summary of Experiments
Based on the above algorithms, attribute selection methods and the type of cross-validation, Table 2 shows a summary of the experiments conducted.

| Algorithm | Feature Reduction Method | Cross Validation |
| --- | --- | --- |
| NBTree | None | 10-fold |
| VFI | None | 10-fold |
| NBTree | Information Gain | 10-fold |
| VFI | Information Gain | 10-fold |

Table 2. Summary of Experiments
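The metrics of sections 3.5.2 and 3.5.3 can be sketched directly from the four cells a, b, c, d of the confusion matrix. The sketch below is a minimal illustration, not the tooling used for the experiments: the function names and the example counts are hypothetical, and the kappa computation assumes the usual two-class form (observed agreement minus chance agreement, divided by one minus chance agreement), which matches the verbal description above. The paper reports averaged values, which presumably means per-class values averaged over all classes; that averaging step is an assumption and is not shown here.

```python
def precision(a, c):
    """Precision = a / (a + c): correct positives among predicted positives."""
    return a / (a + c) if (a + c) else 0.0


def recall(a, b):
    """Recall = a / (a + b): correct positives among actual positives."""
    return a / (a + b) if (a + b) else 0.0


def f_measure(p, r):
    """Balanced harmonic mean of precision p and recall r."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


def kappa(a, b, c, d):
    """Two-class kappa: agreement corrected for chance agreement."""
    n = a + b + c + d
    observed = (a + d) / n
    # Chance agreement from the row and column totals of the matrix.
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / (n * n)
    if expected == 1.0:
        return 0.0  # degenerate case: no room to improve over chance
    return (observed - expected) / (1.0 - expected)


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
a, b, c, d = 40, 10, 5, 45
p, r = precision(a, c), recall(a, b)
print(p, r, f_measure(p, r), kappa(a, b, c, d))
```

With these counts, precision is 40/45, recall is 40/50, the F-measure is 80/95, and kappa is 0.7, since the observed agreement is 0.85 and the chance agreement is 0.5.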
4. EVALUATION
The results of the experiments described in section 3.4 are discussed in this section. A comparison between NBTree and VFI methods is also made based on the values of the metrics defined in section 3.5.

For an IDS, the accuracy indicates how correct the algorithm is in identifying normal and adversary behaviour.

| Metric | Value |
| --- | --- |
| Time taken to build the model | 1115.05s |
| Accuracy | 99.94% |
| Average Precision | 90.33% |
| Average Recall | 92.72% |
| Average F-Measure | 91.14% |
| Kappa Statistic | 99.99% |

Table 3. Results of NBTree with all attributes

| Metric | Value |
| --- | --- |
| Time taken to build the model | 38.97s |
| Accuracy | 99.89% |
| Average Precision | 94.54% |
| Average Recall | 90.84% |
| Average F-Measure | 92.28% |
| Kappa Statistic | 99.82% |

Table 4. Results of NBTree with selected attributes using information gain measure

| Metric | Value |
| --- | --- |
| Time taken to build the model | 0.92s |
| Accuracy | 86.58% |
| Average Precision | 41.27% |
| Average Recall | 80.54% |
| Average F-Measure | 44.05% |
| Kappa Statistic | 79.50% |

Table 5. Results of VFI with all attributes

| Metric | Value |
| --- | --- |
| Time taken to build the model | 0.2s |
| Accuracy | 75.81% |
| Average Precision | 35.71% |
| Average Recall | 75.82% |
| Average F-Measure | 37.43% |
| Kappa Statistic | 66.21% |

Table 6. Results of VFI with selected attributes using information gain measure

Recall would indicate the proportion of correctly classified normal instances from the total number of actual normal instances whereas precision would indicate the number of correctly classified normal instances from the total number of instances identified as normal by the IDS. The Kappa statistic is a general statistical indicator and the
F-Measure is related to the problem mentioned in section 3.5.2. In addition to these, the time taken by the learning algorithm for model construction is also important, as it may have to handle extremely large amounts of data.

The graphs in Figures 2 and 3 show the relative performance of NBTree and VFI for the intrusion detection task on the dataset under consideration. Figure 2 shows the comparison with all attributes under consideration and Figure 3 depicts the comparison for attributes selected using the information gain measure.

Figure 2. NBTree v/s VFI - All Attributes

As per the definitions in section 3.5.2, a good IDS should have a recall that is as high as possible. A high precision is also desired. From our results, we see that the classification accuracy of NBTree is better than that of VFI in both cases. There are large differences in the precision and recall values of NBTree and VFI, with NBTree exhibiting higher precision and higher recall (with all attributes, for example, an average precision of 90.33% against 41.27% for VFI). Also, when all attributes are used, NBTree has a lower precision value than when selected attributes are used. In both these cases, the recall is more or less the same. The F-Measure value is also high for NBTree in comparison to VFI. NBTree is seen to perform better than VFI in both cases, and it can thus be said that it is more suited to the intrusion detection task on the given data set.

Figure 3. NBTree v/s VFI - Selected Attributes
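The size of the gap between the two algorithms can be read off the values reported in Tables 3 to 6. The short sketch below simply subtracts the reported VFI percentages from the reported NBTree percentages; the dictionary layout and the helper name `gap` are hypothetical, and only the numbers come from the tables.

```python
# Average precision, recall and F-measure (in percent) as reported in
# Tables 3 to 6; "all" = all attributes, "selected" = information gain.
reported = {
    ("NBTree", "all"): {"precision": 90.33, "recall": 92.72, "f_measure": 91.14},
    ("NBTree", "selected"): {"precision": 94.54, "recall": 90.84, "f_measure": 92.28},
    ("VFI", "all"): {"precision": 41.27, "recall": 80.54, "f_measure": 44.05},
    ("VFI", "selected"): {"precision": 35.71, "recall": 75.82, "f_measure": 37.43},
}


def gap(metric, attributes):
    """NBTree minus VFI, in percentage points, for one experiment setting."""
    nbtree = reported[("NBTree", attributes)][metric]
    vfi = reported[("VFI", attributes)][metric]
    return round(nbtree - vfi, 2)


for attributes in ("all", "selected"):
    for metric in ("precision", "recall", "f_measure"):
        print(attributes, metric, gap(metric, attributes))
```

The differences are largest for precision (roughly 49 percentage points with all attributes and 59 with selected attributes), while the recall gap is smaller (roughly 12 and 15 points respectively).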
5. CONCLUSIONS
Based on the experiments done in this paper and their corresponding results, we can state the following:

• Machine learning is an effective methodology which can be used in the field of computer security.

• The inherent nature of machine learning algorithms makes them more suited to the intrusion detection field of information security. However, machine learning is not limited to intrusion detection. The authors in [10] have developed a tool using machine learning to infer access control policies where policy requests and responses are generated by using learning algorithms. These are effective with new policy specification languages like XACML [13]. Similarly, a classifier-based approach to assigning users to roles and vice-versa is described in [18]. Learning algorithms can also be used to develop applications which, for instance, can check whether people in an organisation are adhering to the defined security policy.

• It is possible to analyse huge quantities of audit data by using machine learning techniques, which is otherwise an extremely difficult task.

Finally, it can be said that in order to realise the full potential of machine learning in the field of computer security, it is essential to experiment with various machine learning schemes for addressing security-related problems and to choose the one which is most appropriate to the problem at hand.

Ron Condon
UK bureau chief, searchsecurity.co.UK
Ron Condon has been writing about developments in the IT industry for more than 30 years. In that time, he has charted the evolution from big mainframes, to minicomputers and PCs in the 1980s, and the rise of the Internet over the last decade or so. In recent years he has specialized in information security. He has edited daily, weekly and monthly publications, and has written for national and regional newspapers, in Europe and the U.S.
REFERENCES
[1] CERT Vulnerability Statistics 1995 - 2006. http://www.cert.org/stats/vulnerability_remediation.html, 2007.
[2] G. Demiroz and H. A. Guvenir. Classification by voting feature intervals. In European Conference on Machine Learning, pages 85-92, 1997.
[3] T. Dietterich and P. Langley. Machine learning for cognitive networks: technology assessments and research challenges, Draft of May 11, 2003. http://web.engr.oregonstate.edu/_tgd/kp/dl-report.pdf, 2003.
[4] M. A. Hall. Correlation-based Feature Selection for Machine Learning. PhD thesis, University of Waikato, Department of Computer Science, 1999.
[5] R. Kohavi. A study of cross-validation and bootstrap for accuracy estimation and model selection. In Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence, pages 1137-1145, 1995.
[6] R. Kohavi. Scaling up the accuracy of Naive-Bayes classifiers: a decision-tree hybrid. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, pages 202-207, 1996.
[7] M. Kubat and S. Matwin. Addressing the curse of imbalanced training sets: one-sided selection. In Proc. 14th International Conference on Machine Learning, pages 179-186. Morgan Kaufmann, 1997.
[8] J. Makhoul, F. Kubala, R. Schwartz, and R. Weischedel. Performance measures for information extraction. http://www.nist.gov/speech/publications/darpa99/html/dir10/dir10.htm, 1999.
[9] M. A. Maloof, editor. Machine Learning and Data Mining for Computer Security. Springer, 2006.
[10] E. Martin and T. Xie. Inferring access-control policy properties via machine learning. In POLICY ‘06: Proceedings of the Seventh IEEE International Workshop on Policies for Distributed Systems and Networks (POLICY’06), pages 235-238, Washington, DC, USA, 2006. IEEE Computer Society.
[11] R. S. Michalski, J. G. Carbonell, and T. M. Mitchell, editors. Machine Learning: An Artificial Intelligence Approach. Tioga Publishing Company, 1983.
[12] T. M. Mitchell. Machine Learning. McGraw Hill, 1997.
[13] T. Mose. Oasis, extensible access control markup language, (xacml) version 2.0. http://docs.oasis-open.org/xacml/2.0/access_control-xacml-2.0-core-spec-os.pdf, 2005.
[14] N. J. Nilsson. Introduction to Machine Learning - an early draft of a proposed book. http://ai.stanford.edu/_nilsson/MLDraftBook/MLBOOK.pdf, 1996.
[15] U. Of California. The UCI KDD Archive, University of California. http://kdd.ics.uci.edu/databases/kddcup99/task.html, 1999.
[16] T. Pietraszek. Using adaptive alert classification to reduce false positives in intrusion detection. Recent Advances in Intrusion Detection, 3224:102-124, 2004.
[17] S. V. Sabnani. Computer security: A machine learning approach. Master’s thesis, Royal Holloway, University of London, 2007.
[18] S. Sheng and S. L. Osborn. A classifier-based approach to user-role assignment for web applications. In Secure Data Management, pages 163-171, 2004.
[19] S. Tesink. Improving intrusion detection systems through machine learning. http://ilk.uvt.nl/downloads/pub/papers/thesis-tesink.pdf, 2007.
[20] I. H. Witten and E. Frank. Data Mining - Practical Machine Learning Tools and Techniques, Second Edition. Elsevier, 2005.
[21] S. Wolthusen. Lecture 11 - Intrusion Detection and Prevention, Notes on Network Security, Royal Holloway University of London, 2006.