Muffler: An Approach Using Mutation
to Facilitate Fault Localization
                                                        Tao He
                                              elfinhe@gmail.com
           Department of Computer Science, Sun Yat-Sen University
          Department of Computer Science and Engineering, HKUST

                                               Group Discussion
                                                  February 2012
                                       HKUST, Hong Kong, China




Outline

   Background
   Motivation
   Why does our approach work?
   Our Approach – Muffler
   Empirical Evaluation
   Conclusion



Background

   Coverage-Based Fault Localization (CBFL)
       Input
          Coverage
          Testing results (passed or failed)

       Output
            A ranking list of statements
       Ranking functions
             Most CBFL techniques are similar to each other,
             except that they use different ranking functions to
             compute suspiciousness.
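
The input and output described above can be sketched in a few lines. This is a minimal illustration, not the tooling used in the study: the coverage matrix, the test results, and `toy_score` are hypothetical, and `toy_score` is a placeholder rather than any published ranking function. It also assumes at least one passed and one failed test.

```python
from typing import Callable, List, Tuple

def count_spectra(coverage: List[List[int]], passed: List[bool]) -> List[Tuple[int, int]]:
    """For each statement, count the passed and failed runs that cover it."""
    counts = []
    for s in range(len(coverage[0])):
        p = sum(1 for run, ok in zip(coverage, passed) if run[s] and ok)
        f = sum(1 for run, ok in zip(coverage, passed) if run[s] and not ok)
        counts.append((p, f))
    return counts

def rank_statements(coverage: List[List[int]], passed: List[bool],
                    score: Callable[[int, int, int, int], float]) -> List[int]:
    """Return statement indices sorted from most to least suspicious."""
    total_p = sum(passed)
    total_f = len(passed) - total_p
    counts = count_spectra(coverage, passed)
    susp = {s: score(p, f, total_p, total_f) for s, (p, f) in enumerate(counts)}
    return sorted(susp, key=lambda s: susp[s], reverse=True)

def toy_score(p: int, f: int, total_p: int, total_f: int) -> float:
    # Placeholder ranking function (an assumption for illustration only).
    return f / total_f - p / total_p

# Rows are test runs, columns are statements; 1 means "covered".
coverage = [
    [1, 1, 0],  # test 0
    [1, 0, 1],  # test 1
    [1, 1, 1],  # test 2
]
passed = [True, False, True]
ranking = rank_statements(coverage, passed, toy_score)
```

Swapping `toy_score` for another function changes only the ranking step, which is the sense in which CBFL techniques differ mainly in their ranking functions.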

What is the limitation of existing
CBFL techniques?




Motivation

                     One fundamental assumption [YPW08] of CBFL
                           The observed behaviors from passed runs can precisely
                            represent the correct behaviors of this program;
                           and the observed behaviors from failed runs can represent the
                            incorrect behaviors.
                           Therefore, the different observed behaviors of program
                            entities between passed runs and failed runs will indicate the
                            fault’s location.
                           But this does not always hold.


[YPW08] C. Yilmaz, A. Paradkar, and C. Williams. Time will tell: fault localization using time spectra. In Proceedings
of the 30th International Conference on Software Engineering (ICSE '08). ACM, New York, NY, USA, pp. 81-90, 2008.
Motivation
            Coincidental Correctness (CC)
                 “No failure is detected, even though a fault has been executed.” [RT93]
                 i.e., the passed runs may cover the fault.
            CC weakens the first part of CBFL's assumption:
                 The observed behaviors from passed runs can precisely represent
                  the correct behaviors of this program.
                 Moreover, CC occurs frequently in practice [MAE+09].



[RT93] D.J. Richardson and M.C. Thompson, An analysis of test data selection criteria using the RELAY model of
fault detection, IEEE Transactions on Software Engineering, vol. 19, no. 6, pp. 533-553, 1993.
[MAE+09] W. Masri, R. Abou-Assi, M. El-Ghali, and N. Al-Fatairi, An empirical study of the factors that reduce the
effectiveness of coverage-based fault localization, in Proceedings of the 2nd International Workshop on Defects in
Large Software Systems: Held in conjunction with the ACM SIGSOFT International Symposium on Software Testing
and Analysis (ISSTA 2009), pp. 1-5, 2009.
Our goal is to address the CC issue via mutation analysis
What is the idea?




Why does our approach work?
- Key hypothesis
   Mutating the faulty statement tends to maintain the
    results of passed test cases.
   By contrast, mutating a correct statement tends to
    change the results of passed test cases (from passed to
    failed).
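
The quantity behind this hypothesis is the number of test cases whose result flips from passed to failed after a mutation. A minimal sketch of that count follows; the result vectors are hypothetical and only illustrate the two cases in the hypothesis.

```python
from typing import List

def passed_to_failed_changes(original: List[bool], mutant: List[bool]) -> int:
    """Count test cases that pass on the original program but fail on the mutant."""
    return sum(1 for before, after in zip(original, mutant) if before and not after)

# True = passed, False = failed; hypothetical results for five test cases.
original_results = [True, True, True, False, False]
mutant_of_correct_stmt = [False, False, False, False, False]  # passed runs break
mutant_of_faulty_stmt = [True, True, True, False, True]       # passed runs are kept

changes_correct = passed_to_failed_changes(original_results, mutant_of_correct_stmt)
changes_faulty = passed_to_failed_changes(original_results, mutant_of_faulty_stmt)
```

Only passed-to-failed flips are counted; a failed test that starts passing on the mutant does not contribute.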




Why does our approach work?
- Three comprehensive scenarios (1/3)
  - If we mutate an M in a different basic block from F
      [Diagram: test cases run through the program; M (mutant point) and F (fault point) lie in different basic blocks; each test result is marked passed or failed.]
      3 test results change from passed to failed
  - If we mutate F
      [Diagram: M is applied at F itself.]
      0 test results change from passed to failed
Why does our approach work?
- Three comprehensive scenarios (2/3)
  - If we mutate an M in the same basic block as F
      [Diagram: F and M lie in the same basic block; control flow and data flow are drawn separately.]
      3 test results change from passed to failed, because M affects the output through a different data flow
  - If we mutate F
      [Diagram: M is applied at F itself.]
      0 test results change from passed to failed
Why does our approach work?
- Three comprehensive scenarios (3/3)
  - When CC occurs frequently
  - If we mutate F
      [Diagram: M is applied at F itself.]
      0 test results change from passed to failed, because F has only a weak ability to generate
      an infectious state or to propagate the infectious state to the output
Does this work in real programs?




Why does our approach work?
- A feasibility study

[Figure: Distribution of statements' result changes and the faulty statement's testing result changes.
 Panels: tcas v7, tot_info v17, schedule v4, schedule2 v1, print_tokens v7, print_tokens2 v3, replace v24, space v20.]
       The vertical axis denotes the number of testing result changes (from 'passed' to
       'failed'), and the horizontal width denotes the probability density at the corresponding number of
       testing result changes.
Why does our approach work?
    - Another feasibility study (when CC% ≥ 95%)

[Figure: Frequency distribution of effectiveness when CC% ≥ 95%.
 Horizontal axis: percentage of code examined. Vertical axis: frequency of faulty versions.
 Series: result changes (avg. 16.33%) and Naish (avg. 47.55%).]
   When CC% is greater than or equal to 95%, ranking by result changes reduces the code
    examination effort of Naish by 65.66% (= 100% - 16.33%/47.55%).
   With Naish, only 6 faulty versions require examining less than 20% of the statements,
    compared with 22 versions when using result changes.
How to design our new ranking
function?




Our Approach – Muffler





[NLR11] L. Naish, H. J. Lee, and K. Ramamohanarao, A model for spectra-based software diagnosis. ACM
Transactions on Software Engineering and Methodology, 20(3):11, 2011.
How do we evaluate our approach?
What is the result?




Empirical Evaluation






    Program suite    Number of    Lines of executable    Number of       LOC
                     versions     code                   test cases
    tcas                41            63-67                 1608        133-137
    tot_info            23           122-123                1052        272-273
    schedule             9           149-152                2650        290-294
    schedule2           10           127-129                2710        261-263
    print_tokens         7           189-190                4130        341-343
    print_tokens2       10           199-200                4115        350-355
    replace             32           240-245                5542        508-515
    space               38          3633-3647              13585       5882-5904
Empirical Evaluation

[Figure: Overall effectiveness comparison.
 Horizontal axis: percentage of code examined (0% to 100%). Vertical axis: percentage of faults located (0% to 100%).
 Techniques: Muffler, Naish, Ochiai, Tarantula, Wong3.]
Empirical Evaluation
    % of code
    examined     Tarantula    Ochiai     χDebug        Naish       Muffler
       1%            14         18           19         21           35
       5%            38         48           56         58           74
       10%           54         63           68         68           85
       15%           57         65           80         80           94
       20%           60         67           84         84           99
       30%           79         88           91         92          110
       40%           92         98           98         99          117
       50%           98         99          101        102          121
       60%           99        103          105        106          123
       70%          101        107          117        119          123
       80%          114        122          122        123          123
       90%          123        123          122        123          123
       100%         123        123          123        123          123
        Table: Number of faults located at different levels of code
               examination effort using Naish and Muffler.

    When 1% of the statements have been examined, Naish can reach the
     fault in 17.07% of faulty versions. At the same time, Muffler can reach
     the fault in 28.46% of faulty versions.
Empirical Evaluation
               Tarantula       Ochiai        χDebug          Naish         Muffler
    Min           0.00           0.00          0.00           0.00           0.00
   Max           87.89          84.25          93.85         78.46          55.38
 Median          20.33           9.52          7.69           7.32           3.25
   Mean          27.68          23.62          20.04         19.34           9.62
   Stdev         28.29          26.36          24.61         23.86          13.22

               Table: Statistics of code examination effort.

Among these five techniques, Muffler always scores the best in the rows that correspond to
the minimum, median, and mean code examination effort. In addition, Muffler has a much
lower standard deviation, which means that its performance varies less widely than that of the
others and is more stable in terms of effectiveness. Results also show that
Muffler reduces the average code examination effort of Naish by 50.26% (= 100% -
9.62%/19.34%).
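
The reduction figure is a relative difference between two mean efforts. A short check of the arithmetic, using the mean values from the statistics table; the function name `effort_reduction` is introduced here for illustration only.

```python
def effort_reduction(new_effort: float, baseline_effort: float) -> float:
    """Relative reduction in code examination effort, in percent."""
    return 100.0 * (1.0 - new_effort / baseline_effort)

muffler_mean, naish_mean = 9.62, 19.34  # mean effort (%) from the statistics table
reduction = round(effort_reduction(muffler_mean, naish_mean), 2)
print(reduction)  # 50.26
```

The same function applied to 16.33 and 47.55 gives the 65.66% reported in the feasibility study for CC% ≥ 95%.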

How about the coincidental
correctness issue?




Conclusion and future work
   We propose Muffler, a technique using mutation to
    help locate program faults.
   On 123 faulty versions of seven programs, we compare the
    effectiveness and efficiency of Muffler with those of the
    Naish technique. Results show that Muffler reduces the
    average code examination effort on each faulty version
    by 50.26%.
   For future work, we plan to generalize our approach to
    locate faults in multi-fault programs.



Q&A




Thank you!
Contact me via elfinhe@gmail.com




# Background
                  Mutation analysis, first proposed by Hamlet [Ham77] and
                   DeMillo et al. [DLS78], is a fault-based testing technique
                   used to measure the effectiveness of a test suite.
                  In mutation analysis, one introduces syntactic code
                   changes, one at a time, into a program to generate
                   various faulty programs (called mutants).
                  A mutation operator is a change-seeding rule to
                   generate a mutant from the original program.
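
A mutation operator of this kind can be sketched as follows. This is an illustrative example in Python (3.9 or later, for `ast.unparse`), not the mutation tool used in the study: the operator table `SWAPS` and the function `generate_mutants` are assumptions made for the example. Each mutant differs from the original by exactly one arithmetic operator.

```python
import ast
from typing import List

# Change-seeding rule: swap each arithmetic operator for one alternative.
SWAPS = {ast.Add: ast.Sub, ast.Sub: ast.Add, ast.Mult: ast.Div, ast.Div: ast.Mult}

def _mutation_sites(tree: ast.AST) -> List[ast.BinOp]:
    """Collect the binary operations that the rule above can mutate."""
    return [n for n in ast.walk(tree)
            if isinstance(n, ast.BinOp) and type(n.op) in SWAPS]

def generate_mutants(source: str) -> List[str]:
    """Return one mutant per mutation site, each with a single change."""
    n_sites = len(_mutation_sites(ast.parse(source)))
    mutants = []
    for i in range(n_sites):
        tree = ast.parse(source)  # fresh tree, so changes never accumulate
        site = _mutation_sites(tree)[i]
        site.op = SWAPS[type(site.op)]()
        mutants.append(ast.unparse(tree))
    return mutants

original = "def f(a, b):\n    return a + b * 2\n"
mutants = generate_mutants(original)
for m in mutants:
    print(m)
```

Running a test suite against each mutant and comparing the results with those of the original program is what yields the passed-to-failed counts used elsewhere in these slides.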

[Ham77] R.G. Hamlet, Testing Programs with the Aid of a Compiler, IEEE Transactions on Software Engineering,
vol. SE-3, no. 4, pp. 279-290, 1977.
[DLS78] R.A. DeMillo, R.J. Lipton and F.G. Sayward, Hints on Test Data Selection: Help for the Practicing
Programmer, Computer, vol. 11, no. 4, pp. 34-41, 1978.
# Ranking functions
                       Tarantula [JHS02], Ochiai [AZV07], χDebug [WQZ+07], and Naish [NLR11]




                                               Table: Ranking functions

[JHS02] J.A. Jones, M. J. Harrold, and J. Stasko. Visualization of test information to assist fault localization. In Proceedings of the
24th International Conference on Software Engineering (ICSE '02), pp. 467-477, 2002.
[AZV07] R. Abreu, P. Zoeteweij and A.J.C. Van Gemund, On the accuracy of spectrum-based fault localization, in Proceedings of
Testing: Academic and Industrial Conference Practice and Research Techniques (TAIC PART-MUTATION 2007), pp. 89-98, 2007.
[WQZ+07] W.E. Wong, Yu Qi, Lei Zhao, and Kai-Yuan Cai. Effective Fault Localization using Code Coverage. In Proceedings of the
31st Annual International Computer Software and Applications Conference (COMPSAC '07), Vol. 1, pp. 449-456, 2007.
[NLR11] L. Naish, H. J. Lee, and K. Ramamohanarao, A model for spectra-based software diagnosis. ACM Transactions on Software
Engineering and Methodology, 20(3):11, 2011.
# Our Approach – Muffler
   Inputs: faulty program and test suite
   Instrument the program and execute it against the test suite
        Produces coverage and testing results
   Select statements to mutate
        Produces candidate statements
   Mutate the selected statements
        Produces mutants
   Run the mutants against the test suite
        Produces changes of testing results
   Calculate suspiciousness and sort statements
   Output: ranking list of all statements

     Figure: Dataflow diagram of Muffler.
# Our Approach – Muffler




         Primary key: imprecise when multiple faults occur
         Secondary key: invalid when the coincidental correctness % is high
         Additional key: inclined to handle coincidental correctness




# An Example
TotalPassed = 2440, TotalFailed = 210
Part I: statements with Passed(s) and Failed(s). Part II: Tarantula, Ochiai, χDebug, Naish.

                                                                                                  Tarantula      Ochiai        χDebug          Naish
       Statement                                                         Passed(s)   Failed(s)    susp*   r**    susp    r     susp     r     susp     r
 S1    if (block_queue){                                                   1798        210        0.58     8     0.32    8    205.41    8    510812    8
 S2      count = block_queue->mem_count + 1; /* fault: insert ‘+1’ */      1382        210        0.64     7     0.36    7    205.83    7    511228    7
 S3      n = (int) (count*ratio); /* fault: missing ‘+1’ */                1382        210        0.64     7     0.36    7    205.83    7    511228    7
 S4      proc = find_nth(block_queue, n);                                  1382        210        0.64     7     0.36    7    205.83    7    511228    7
 S5      if (proc) {                                                       1382        210        0.64     7     0.36    7    205.83    7    511228    7
 S6        block_queue = del_ele(block_queue, proc);                       1358        210        0.64     3     0.37    3    205.85    3    511252    3
 S7        prio = proc->priority;                                          1358        210        0.64     3     0.37    3    205.85    3    511252    3
 S8        prio_queue[prio] = append_ele(prio_queue[prio], proc);}}        1358        210        0.64     3     0.37    3    205.85    3    511252    3

       Code examination effort to locate S2 and S3:                                                 88%           88%            88%            88%

Figure: Faulty version v2 of program “schedule”.
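
The suspiciousness values in Part II can be reproduced from Passed(s), Failed(s), and the two totals. The sketch below is a reconstruction, not a copy of the slide's formula table: Tarantula and Ochiai are written in their usual forms, χDebug is assumed to be the Wong3 piecewise formula (the effectiveness chart lists Wong3 in its legend), and the Naish form `Failed(s) * (TotalPassed + 1) - Passed(s)` is inferred from the tabulated numbers. All four agree with the table to the precision shown.

```python
import math

TOTAL_PASSED, TOTAL_FAILED = 2440, 210  # totals from the example

def tarantula(passed_s: int, failed_s: int) -> float:
    fail_ratio = failed_s / TOTAL_FAILED
    pass_ratio = passed_s / TOTAL_PASSED
    return fail_ratio / (fail_ratio + pass_ratio)

def ochiai(passed_s: int, failed_s: int) -> float:
    return failed_s / math.sqrt(TOTAL_FAILED * (failed_s + passed_s))

def chi_debug(passed_s: int, failed_s: int) -> float:
    # Assumed Wong3 form: passed runs are discounted in three tiers.
    if passed_s <= 2:
        h = float(passed_s)
    elif passed_s <= 10:
        h = 2 + 0.1 * (passed_s - 2)
    else:
        h = 2.8 + 0.001 * (passed_s - 10)
    return failed_s - h

def naish(passed_s: int, failed_s: int) -> int:
    # Form inferred from the table: failed runs dominate, passed runs break ties.
    return failed_s * (TOTAL_PASSED + 1) - passed_s

# Row S1 of the example: Passed(s) = 1798, Failed(s) = 210.
s1 = (round(tarantula(1798, 210), 2), round(ochiai(1798, 210), 2),
      round(chi_debug(1798, 210), 2), naish(1798, 210))
print(s1)  # (0.58, 0.32, 205.41, 510812)
```

Because every statement here is covered by all 210 failed runs, all four functions separate statements only by Passed(s), which is why the faulty statements S2 and S3 end up ranked behind S6 to S8.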
# An Example

Part III: mutated statements and their passed-to-failed changes. Part IV: Muffler.

       Mutated statement for each mutant                                      Change p→f   Change p→f   Change p→f   Change p→f   Change p→f    Impact       susp        r
 M1    if (!block_queue ) {                                                      1644         1798         1101         1101         1644       1457.6     509354.4      8
 M2      count = block_queue->mem_count != 1;                                     249         1097         1097          249         1382        814.8     510413.2      2
 M3      n = (int) (count <= ratio) ;                                             249         1116         1101          494         1101        812.2     510415.8      2
 M4      proc = find_nth(block_queue , ratio);                                   1088          638         1136          744         1382        997.6     510230.4      5
 M5      if (!proc) {                                                            1136         1358         1101         1382         1101       1215.6     510012.4      6
 M6        block_queue = del_ele(block_queue , proc-1);                          1123          349         1358          814         1358       1000.4     510251.6      4
 M7        prio /= proc->priority;                                               1358         1358         1101         1101         1358       1255.2     509996.8      7
 M8        prio_queue[prio] = append_ele(prio_queue[__MININT__] , proc); }}       598          598         1138         1358         1101        958.6     510293.4      3

       Code examination effort to locate S2 and S3:                                                                                                           25%

Figure: Faulty version v2 of program “schedule”.
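
The Impact and susp columns can be recomputed from the example data. The relationship used below is inferred from the numbers in the example rather than taken from the slide's formula, which did not survive extraction: Impact is the mean passed-to-failed change over a statement's mutants, and the Muffler suspiciousness is the Naish value `Failed(s) * (TotalPassed + 1) - Passed(s)` minus that Impact.

```python
TOTAL_PASSED = 2440

# statement: (Passed(s), Failed(s), passed-to-failed changes of its five mutants)
ROWS = {
    "S1": (1798, 210, [1644, 1798, 1101, 1101, 1644]),
    "S2": (1382, 210, [249, 1097, 1097, 249, 1382]),
    "S3": (1382, 210, [249, 1116, 1101, 494, 1101]),
    "S4": (1382, 210, [1088, 638, 1136, 744, 1382]),
    "S5": (1382, 210, [1136, 1358, 1101, 1382, 1101]),
    "S6": (1358, 210, [1123, 349, 1358, 814, 1358]),
    "S7": (1358, 210, [1358, 1358, 1101, 1101, 1358]),
    "S8": (1358, 210, [598, 598, 1138, 1358, 1101]),
}

def impact(changes):
    """Mean number of passed-to-failed changes over a statement's mutants."""
    return sum(changes) / len(changes)

def muffler(passed_s, failed_s, changes):
    # Inferred form: Naish value minus mutation impact.
    return failed_s * (TOTAL_PASSED + 1) - passed_s - impact(changes)

susp = {s: round(muffler(p, f, c), 1) for s, (p, f, c) in ROWS.items()}
ranking = sorted(susp, key=susp.get, reverse=True)
print(ranking)  # ['S3', 'S2', 'S8', 'S6', 'S4', 'S5', 'S7', 'S1']
```

The two faulty statements S2 and S3 have the smallest impact and move to the top two positions, so 2 of 8 statements (25%) need to be examined.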
# Empirical Evaluation
                              Versus          Versus          Versus          Versus
                            Tarantula         Ochiai         χDebug           Naish

     More effective             102              96             93              89

   Same effectiveness            19              23             23              25

      Less effective              2              4               7               9

                    Table: Pair-wise comparison between
                      Muffler and existing techniques.

Muffler is more effective (examining fewer statements before encountering the faulty
statement) than Naish for 89 out of 123 faulty versions; is as effective (examining the same
number of statements before encountering the faulty statement) as Naish for 25 out of 123
faulty versions; and is less effective (examining more statements before encountering the
faulty statement) than Naish for only 9 out of 123 faulty versions.


# Empirical Evaluation
   Experience on real faults

         Faulty versions         CC%               Code examination effort
                                                   Naish           Muffler
               v5                1%                 0%               0%
               v9                7%                 1%               0%
               v17               31%               12%               7%
               v28               49%               11%               5%
               v29               99%               25%               9%

                 Table: Results with real faults in space



Five faulty versions are chosen to represent low, medium, and high occurrence of
coincidental correctness. In this table, the column “CC%” presents the percentage of
coincidentally passed test cases out of all passed test cases. The columns under the head
“Code examination effort” present the percentage of code to be examined before the fault is
encountered.


# Empirical Evaluation
   Efficiency analysis
    Program suite                 CBFL (seconds)                   Muffler (seconds)
          tcas                       18.00                              868.68
       tot_info                      11.92                              573.12
       schedule                      34.02                             2703.01
      schedule2                      27.76                             1773.14
     print_tokens                    59.11                             2530.17
    print_tokens2                    62.07                             5062.87
        replace                      69.13                             4139.19
       Average                       40.29                             2521.46
     Table: Time spent by each technique on subject programs.

We have shown experimentally that, by taking advantage of both coverage and mutation
impact, Muffler outperforms Naish regardless of the occurrence of coincidental correctness.
Unfortunately, our approach, Muffler, needs to execute a large number of mutants to compute
mutation impact. The execution of mutants against the test suite may increase the time cost
of fault localization. The time mainly comprises the cost of instrumentation, execution, and
coverage collection. From this table, we observe that Muffler takes, on average, approximately
62.59 times the time cost of the Naish technique.
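
The averages and the 62.59 ratio can be reproduced from the per-program rows of the table. This is a small arithmetic check in Python, using only the numbers listed above; the variable names are illustrative.

```python
from statistics import mean

# Times in seconds, copied from the table above.
cbfl_seconds = {
    "tcas": 18.00, "tot_info": 11.92, "schedule": 34.02, "schedule2": 27.76,
    "print_tokens": 59.11, "print_tokens2": 62.07, "replace": 69.13,
}
muffler_seconds = {
    "tcas": 868.68, "tot_info": 573.12, "schedule": 2703.01, "schedule2": 1773.14,
    "print_tokens": 2530.17, "print_tokens2": 5062.87, "replace": 4139.19,
}

avg_cbfl = mean(cbfl_seconds.values())
avg_muffler = mean(muffler_seconds.values())
ratio = avg_muffler / avg_cbfl  # how many times slower Muffler is on average

print(f"CBFL avg: {avg_cbfl:.2f} s, Muffler avg: {avg_muffler:.2f} s, ratio: {ratio:.2f}")
```

The recomputed averages agree with the “Average” row up to rounding of the per-program entries.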
                                                                                              40/34
# Empirical Evaluation
   Efficiency analysis
   Program               Mutated                Total              Mutants      Time per mutant
     suite              statements           statements                          (seconds)
      tcas                 40.15                65.10             199.90            4.26
   tot_info                39.57               122.96             191.87            2.92
   schedule                80.60               150.20             351.60            7.59
  schedule2                75.33               127.56             327.78            5.32
 print_tokens              67.43               189.86             260.29            9.49
print_tokens2              86.67               199.44             398.67           12.54
    replace                71.14               242.86             305.93           13.30
   Average                 56.52               142.79             256.90            7.92

                Table: Information about mutants generated.

This table shows the number of mutated and total executable statements, the number of
mutants generated, and the time cost of running each mutant. For example, for the program
tcas there are, on average, 40.15 statements mutated by Muffler out of 65.10 executable
statements in total; 199.90 mutants are generated, and each takes 4.26 seconds to run, on
average. Note that there is no need to collect coverage from the mutants' executions, and
running a mutant without instrumentation and coverage collection takes about 1/4 of the
time.
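
The two efficiency tables can be cross-checked against each other. The sketch below assumes that Muffler's total time is roughly the CBFL time plus the number of mutants multiplied by the time per mutant. This decomposition is an assumption made here for illustration, not something stated in the slides. All numbers are copied from the two tables.

```python
# program: (CBFL seconds, Muffler seconds, mutants, seconds per mutant)
rows = {
    "tcas":          (18.00,  868.68, 199.90,  4.26),
    "tot_info":      (11.92,  573.12, 191.87,  2.92),
    "schedule":      (34.02, 2703.01, 351.60,  7.59),
    "schedule2":     (27.76, 1773.14, 327.78,  5.32),
    "print_tokens":  (59.11, 2530.17, 260.29,  9.49),
    "print_tokens2": (62.07, 5062.87, 398.67, 12.54),
    "replace":       (69.13, 4139.19, 305.93, 13.30),
}

rel_errors = {}
for program, (cbfl, muffler, mutants, per_mutant) in rows.items():
    # Assumed model: one CBFL pass plus one test-suite run per mutant.
    estimate = cbfl + mutants * per_mutant
    rel_errors[program] = abs(estimate - muffler) / muffler
    print(f"{program:14s} estimated {estimate:8.2f} s, reported {muffler:8.2f} s")

worst = max(rel_errors.values())
```

Under this assumption the estimate stays within half a percent of the reported total for every program, which suggests that running the mutants accounts for nearly all of Muffler's extra cost.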
                                                                                                 41/34

Muffler a tool using mutation to facilitate fault localization 2.3

  • 1. Muffler: An Approach Using Mutation to Facilitate Fault Localization Tao He elfinhe@gmail.com Department of Computer Science, Sun Yat-Sen University Department of Computer Science and Engineering, HKUST Group Discussion February 2012 HKUST, Hong Kong, China 1/34
  • 2. Outline  Background  Motivation  Why does our approach work?  Our Approach – Muffler  Empirical Evaluation  Conclusion 2/34
  • 3. Background  Coverage-Based Fault Localization (CBFL)  Input  Coverage  Testing results (passed or failed)  Output  A ranking list of statements  Ranking functions  Most CBFL techniques are similar with each other except that different ranking functions are used to compute suspiciousness. 3/34
  • 4. What is the limitation of existing CBFL techniques? 4/34
  • 5. Motivation  One fundamental assumption [YPW08] of CBFL  The observed behaviors from passed runs can precisely represent the correct behaviors of this program;  and the observed behaviors from failed runs can represent the infamous behaviors.  Therefore, the different observed behaviors of program entities between passed runs and failed runs will indicate the fault’s location.  But this does not always hold. [YPW08] C. Yilmaz, A. Paradkar, and C. Williams. Time will tell: fault localization using time spectra. In Proceedings of the 30th international conference on Software engineering (ICSE '08). ACM, New York, NY, USA, 81-90. 2008. 5/34
  • 6. Motivation  Coincidental Correctness (CC)  “No failure is detected, even though a fault has been executed.” [RT93]  i.e., the passed runs may cover the fault.  Weaken the first part of CBFL‟s assumption:  The observed behaviors from passed runs can precisely represent the correct behaviors of this program;  More, CC occurs frequently in practice.[MAE+09] [RT93] D.J. Richardson and M.C. Thompson, An analysis of test data selection criteria using the RELAY model of fault detection, Software Engineering, IEEE Transactions on, vol. 19, (no. 6), pp. 533-553, 1993. [MAE+09] W. Masri, R. Abou-Assi, M. El-Ghali, and N. Al-Fatairi, An empirical study of the factors that reduce the effectiveness of coverage-based fault localization, in Proceedings of the 2nd International Workshop on Defects in Large Software Systems: Held in conjunction with the ACM SIGSOFT International Symposium on Software Testing 6/34 and Analysis (ISSTA 2009), pp. 1-5, 2009.
  • 7. Our goal is to address the CC issue via mutation analysis What is the idea? 7/34
  • 8. Why does our approach work? - Key hypothesis  Mutating the faulty statement tends to maintain the results of passed test cases.  By contrast, mutating a correct statement tends to change the results of passed test cases (from passed to failed). 8/34
  • 9. Why does our approach work? - Three comprehensive scenarios (1/3) - If we mutate an M in different basic blocks with F Test cases Passed Program Failed F M M: Mutant point Test results F: Fault point 3 test results change from passed to failed 9/34
  • 10. Why does our approach work? - Three comprehensive scenarios (1/3) - If we mutate an M in different basic blocks with F Test cases Passed M Program Failed F M: Mutant point Test results F: Fault point 3 test results change from passed to failed 10/34
  • 11. Why does our approach work? - Three comprehensive scenarios (1/3) - If we mutate F Test cases Passed Program Failed F +M M: Mutant point Test results F: Fault point 0 test result changes from passed to failed 11/34
  • 12. Why does our approach work? - Three comprehensive scenarios (2/3) - If we mutate an M in the same basic block with F Test cases Due to different data flow to affect output Passed F Program Failed M M: Mutant point F: Fault point Control Flow Test results 3 test results change from passed to failed Data Flow 12/34
  • 13. Why does our approach work? - Three comprehensive scenarios (2/3) - If we mutate F Test cases Passed F +M Program Failed M: Mutant point F: Fault point Control Flow Test results 0 test result change from passed to failed Data Flow 13/34
  • 14. Why does our approach work? - Three comprehensive scenarios (3/3) - When CC occurs frequently Test cases - If we mutate F Due to weak ability to affect output Passed Program Failed F +M M: Mutant point F: Fault point Test results Weak ability to generate an infectious state or to propagate the infectious state to output 0 test result changes from passed to failed 14/34
  • 15. Does this work in real programs? 15/34
  • 16. Why does our approach work? 1000 - A feasibility study 2500 2000 800 800 2000 1500 600 600 1500 400 1000 400 1000 200 200 500 500 0 0 0 0 tcas v7 tot_info v17 schedule v4 schedule2 v1 4000 4000 4000 150 3000 3000 3000 100 2000 2000 2000 50 1000 1000 1000 0 0 0 0 print_tokens v7 print_tokens2 v3 replace v24 space v20 Figure: Distribution of statements’ result changes and faulty statement’s testing result changes. The vertical axis denotes the number of testing results changes (from „passed‟ to „failed‟), and horizontal width denotes the probability density at corresponding amount of testing results changes. 16/34
  • 17. Why does our approach work? - Another feasibility study (When CC%≥95%) 25 ∎ Result changes (avg. 16.33%) 20 Frequency of faulty versions ∎ Naish (avg. 47.55%) 15 10 5 0 0% 20 % 40 % 60 % 80 % Percentage of code examined Figure: Frequency distribution of effectiveness when CC%≥ 95%.  When CC% is greater or equal than 95%, code examination effort reduction of result changes is 65.66% (=100%-16.33%/47.55%).  Only 6 faulty versions need to examine less than 20% of statements for Naish, while 22 versions by using result changes 17/34
  • 18. How to design our new ranking function? 18/34
  • 19. Our Approach – Muffler  [LRR11] L. Naish, H. J. Lee, and K. Ramamohanarao, A model for spectra-based software diagnosis. ACM Transaction on Software Engineering Methodology, 20(3):11, 2011. 19/34
  • 20. How do we evaluate our approach? What is the result? 20/34
  • 21. Empirical Evaluation  Lines of Number of Number of Program suite Executable LOC versions test cases Code tcas 41 63-67 1608 133-137 tot_info 23 122-123 1052 272-273 schedule 9 149-152 2650 290-294 schedule2 10 127-129 2710 261-263 print_tokens 7 189-190 4130 341-343 print_tokens2 10 199-200 4115 350-355 replace 32 240-245 5542 508-515 21/34 space 38 3633-3647 13585 5882-5904
  • 22. Empirical Evaluation 100% 95% 90% 85% 80% 75% Percentage of fault located 70% 65% 60% 55% 50% 45% 40% 35% 30% Techiniques 25% Muffler 20% Naish 15% Ochiai Tarantula 10% Wong3 5% 0% 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% Percentage of code examined Figure: Overall effectiveness comparison. 22/34
  • 23. Empirical Evaluation % of code Tarantula Ochiai χDebug Naish Muffler examined 1% 14 18 19 21 35 5% 38 48 56 58 74 10% 54 63 68 68 85 15% 57 65 80 80 94 20% 60 67 84 84 99 30% 79 88 91 92 110 Table: Number of faults located at different 99 40% 92 98 98 level of code 117 examination effort using Naish and Muffler. 50% 98 99 101 102 121 60% 99 103 105 106 123 70% 101 107 117 119 123  When 1% of the statements have been examined, 123 can reach the 80% 114 122 122 Naish 123 fault in 17.07% of faulty versions. At 122 same time, Muffler 123 reach 90% 123 123 the 123 can the fault in 28.46% of faulty versions. 100% 123 123 123 123 123 23/34
  • 24. Empirical Evaluation Tarantula Ochiai χDebug Naish Muffler Min 0.00 0.00 0.00 0.00 0.00 Max 87.89 84.25 93.85 78.46 55.38 Median 20.33 9.52 7.69 7.32 3.25 Mean 27.68 23.62 20.04 19.34 9.62 Stdev 28.29 26.36 24.61 23.86 13.22 Table: Statistics of code examination effort. Among these five techniques, Muffler always scores the best in the rows that correspond to the minimum, median, and mean code examination effort. In addition, Muffler gets much lower standard deviation, which means that their performances vary less widely than others, and are shown to be more stable in terms of effectiveness. Results also show that Muffler reduces the average code examination effort from Naish by 50.26% (=100%- (9.62%/19.34%). 24/34
  • 25. How about the coincidental correctness issue? 25/34
  • 27. Conclusion and future work  We propose Muffler, a technique using mutation to help locate program faults.  On 123 faulty versions of seven programs, we conduct a comparison of effectiveness and efficiency with Naish technique. Results show that Muffler reduces the average code examination effort on each faulty version by 50.26%.  For future work, we plan to generalize our approach to locate faults in multi-fault programs. 27/34
  • 28. Q&A 28/34
  • 29. Thank you! Contact me via elfinhe@gmail.com 29/34
  • 30. # Background  Mutation analysis, first proposed by Hamlet [Ham77] and Demilo et al. [DLS78] , is a fault-based testing technique used to measure the effectiveness of a test suite.  In mutation analysis, one introduces syntactic code changes, one at a time, into a program to generate various faulty programs (called mutants).  A mutation operator is a change-seeding rule to generate a mutant from the original program. [Ham77] R.G. Hamlet, Testing Programs with the Aid of a Compiler, Software Engineering, IEEE Transactions on, vol. SE-3, (no. 4), pp. 279- 290, 1977. [DLS78] R.A. DeMillo, R.J. Lipton and F.G. Sayward, Hints on Test Data Selection: Help for the Practicing Programmer, Computer, vol. 11, (no. 4), pp. 34-41, 1978. 30/34
  • 31. # Ranking functions  Tarantula [JHS02], Ochiai [AZV07], χDebug [WQZ+07], and Naish [NLR11] Table: Ranking faunctions [JHS02] J.A. Jones, M. J. Harrold, and J. Stasko. Visualization of test information to assist fault localization. In Proceedings of the 24th International Conference on Software Engineering (ICSE '02), pp. 467-477, 2002. [AZV07] R. Abreu, P. Zoeteweij and A.J.C. Van Gemund, On the accuracy of spectrum-based fault localization, in Proc. Proceedings - Testing: Academic and Industrial Conference Practice and Research Techniques, TAIC PART-Mutation 2007, pp. 89-98, 2007. [WQZ+07] W.E. Wong, Yu Qi, Lei Zhao, and Kai-Yuan Cai. Effective Fault Localization using Code Coverage. In Proceedings of the 31st Annual International Computer Software and Applications Conference (COMPSAC '07), Vol. 1, pp. 449-456, 2007. [NLR11] L. Naish, H. J. Lee, and K. Ramamohanarao, A model for spectra-based software diagnosis. ACM Transaction on Software Engineering Methodology, 20(3):11, 2011. 31/34
  • 32. # Our Approach – Muffler Faulty Test Program Suite Instrument program & Execute against test suite Coverage & Testing Results Select statements to mutate Candidate Statements Mutate selected statements Mutants Run mutants against test suite Legend Changes of testing results Calculate suspiciousness Input & Sort statements Process Ranking List of all Output statements Figure: Dataflow diagram of Muffler. 32/34
  • 33. # Our Approach – Muffler  Primary Key Secondary Key Additional Key (imprecise when (invalid when (inclined to handle multiple faults coincidental coincidental correctness) occurs) correctness% is high) 33/34
  • 34. # An Example TotalPassed TotalFailed Part II Part I 2440 210 Tarantula Ochiai χDebug Naish Statement Passed(s) Failed(s) susp* r** susp r susp r susp r S1 if (block_queue){ 1798 210 0.58 8 0.32 8 205.41 8 510812 8 S2 count = block_queue->mem_count + 1; /* fault: insert ‘+1’ */ 1382 210 0.64 7 0.36 7 205.83 7 511228 7 S3 n = (int) (count*ratio); /* fault: missing ‘+1’ */ 1382 210 0.64 7 0.36 7 205.83 7 511228 7 S4 proc = find_nth(block_queue, n); 1382 210 0.64 7 0.36 7 205.83 7 511228 7 S5 if (proc) { 1382 210 0.64 7 0.36 7 205.83 7 511228 7 S6 block_queue = del_ele(block_queue, proc); 1358 210 0.64 3 0.37 3 205.85 3 511252 3 S7 prio = proc->priority; 1358 210 0.64 3 0.37 3 205.85 3 511252 3 S8 prio_queue[prio] = append_ele(prio_queue[prio], proc);}} 1358 210 0.64 3 0.37 3 205.85 3 511252 3 Code examination effort to locate S2 and S3: 88% 88% 88% 88% Figure: Faulty version v2 of program “schedule”. 34/34
  • 35. # An Example
       Figure: Faulty version v2 of program “schedule”.
       Part III (mutated statement for each mutant):
         M1  if (!block_queue ) {
         M2  count = block_queue->mem_count != 1;
         M3  n = (int) (count <= ratio) ;
         M4  proc = find_nth(block_queue , ratio);
         M5  if (!proc) {
         M6  block_queue = del_ele(block_queue , proc-1);
         M7  prio /= proc->priority;
         M8  prio_queue[prio] = append_ele(prio_queue[__MININT__] , proc); }}
       Part IV (Change p→f per mutant and Impact) and Muffler (susp, r):
         Mutant  Change p→f  Change p→f  Change p→f  Change p→f  Change p→f  Impact   susp      r
         M1      1644        1798        1101        1101        1644        1457.6   509354.4  8
         M2      249         1097        1097        249         1382        814.8    510413.2  2
         M3      249         1116        1101        494         1101        812.2    510415.8  2
         M4      1088        638         1136        744         1382        997.6    510230.4  5
         M5      1136        1358        1101        1382        1101        1215.6   510012.4  6
         M6      1123        349         1358        814         1358        1000.4   510251.6  4
         M7      1358        1358        1101        1101        1358        1255.2   509996.8  7
         M8      598         598         1138        1358        1101        958.6    510293.4  3
       Code examination effort to locate S2 and S3: 25%
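The numbers on this slide are consistent with a simple reading, sketched below as an assumption rather than as the authors' stated formula: Impact is the mean of the five p→f change counts (tests that flip from passed to failed) for a statement's mutants, and the Muffler score is the Naish score of the statement minus that Impact. The Naish values are those shown for S1 to S8 in Part II of the example.

```python
# p->f change counts of the five mutants per statement, copied from the slide.
changes = {
    "S1": [1644, 1798, 1101, 1101, 1644],
    "S2": [249, 1097, 1097, 249, 1382],
    "S3": [249, 1116, 1101, 494, 1101],
    "S4": [1088, 638, 1136, 744, 1382],
    "S5": [1136, 1358, 1101, 1382, 1101],
    "S6": [1123, 349, 1358, 814, 1358],
    "S7": [1358, 1358, 1101, 1101, 1358],
    "S8": [598, 598, 1138, 1358, 1101],
}

# Naish scores of the same statements (Part II of the example).
naish_scores = {
    "S1": 510812,
    "S2": 511228, "S3": 511228, "S4": 511228, "S5": 511228,
    "S6": 511252, "S7": 511252, "S8": 511252,
}

# Impact: average number of passed tests that a statement's mutants turn into failures.
impact = {s: sum(c) / len(c) for s, c in changes.items()}

# Assumed combination: coverage-based score minus mutation impact.
muffler = {s: naish_scores[s] - impact[s] for s in changes}

# Most suspicious first.
order = sorted(muffler, key=muffler.get, reverse=True)
print(order)
```

Under this reading the two faulty statements S2 and S3 come out on top, so examining 2 of 8 statements (25%) reaches both. The intuition is that mutating an already faulty statement tends to break fewer passing tests than mutating a correct one.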
  • 38. # Empirical Evaluation
                             Versus Tarantula  Versus Ochiai  Versus χDebug  Versus Naish
         More effective      102               96             93             89
         Same effectiveness  19                23             23             25
         Less effective      2                 4              7              9
       Table: Pair-wise comparison between Muffler and existing techniques.
       Muffler is more effective (examining fewer statements before encountering the faulty statement) than Naish for 89 out of 123 faulty versions; is as effective (examining the same number of statements before encountering the faulty statement) as Naish for 25 out of 123 faulty versions; and is less effective (examining more statements before encountering the faulty statement) than Naish for only 9 out of 123 faulty versions.
  • 39. # Empirical Evaluation: Experience on real faults
         Faulty versions  CC%   Code examination effort
                                Naish   Muffler
         v5               1%    0%      0%
         v9               7%    1%      0%
         v17              31%   12%     7%
         v28              49%   11%     5%
         v29              99%   25%     9%
       Table: Results with real faults in space.
       Five faulty versions are chosen to represent low, medium, and high occurrence of coincidental correctness. In this table, the column “CC%” presents the percentage of coincidentally passed test cases out of all passed test cases. The columns under the heading “Code examination effort” present the percentage of code to be examined before the fault is encountered.
  • 40. # Empirical Evaluation: Efficiency analysis
         Program suite   CBFL (seconds)  Muffler (seconds)
         tcas            18.00           868.68
         tot_info        11.92           573.12
         schedule        34.02           2703.01
         schedule2       27.76           1773.14
         print_tokens    59.11           2530.17
         print_tokens2   62.07           5062.87
         replace         69.13           4139.19
         Average         40.29           2521.46
       Table: Time spent by each technique on subject programs.
       We have shown experimentally that, by taking advantage of both coverage and mutation impact, Muffler outperforms Naish regardless of the occurrence of coincidental correctness. Unfortunately, Muffler needs to execute a large number of mutants to compute mutation impact, and executing the mutants against the test suite increases the time cost of fault localization. The time mainly consists of the cost of instrumentation, execution, and coverage collection. From this table, we observe that Muffler takes approximately 62.59 times the average time of the Naish technique.
  • 41. # Empirical Evaluation: Efficiency analysis
         Program suite   Mutated statements  Total statements  Mutants  Time per mutant (seconds)
         tcas            40.15               65.10             199.90   4.26
         tot_info        39.57               122.96            191.87   2.92
         schedule        80.60               150.20            351.60   7.59
         schedule2       75.33               127.56            327.78   5.32
         print_tokens    67.43               189.86            260.29   9.49
         print_tokens2   86.67               199.44            398.67   12.54
         replace         71.14               242.86            305.93   13.30
         Average         56.52               142.79            256.90   7.92
       Table: Information about mutants generated.
       This table shows the number of mutated and total executable statements, the number of mutants generated, and the time cost of running each mutant. For example, for the program tcas, on average 40.15 statements are mutated by Muffler out of 65.10 executable statements in total; 199.90 mutants are generated, and each takes 4.26 seconds to run. Note that there is no need to collect coverage from the mutants' executions; running a mutant without instrumentation and coverage collection takes only about 1/4 of the time.

Editor's notes

  1. I assume that you already know many of these techniques, so I give only a quick review.
  2. Please find another definition that uses passed runs to describe CC.
  3. Please remember to annotate the CC, e.g., 1382. Please remember to add animation.
  4. Please remember to annotate the CC, e.g., 1382. Please remember to add animation.
  5. It is worthwhile to mention that Muffler’s time cost can be greatly reduced with a simple test selection strategy: do not re-run a test case that does not cover the mutated statement. Furthermore, because the executions of mutants do not depend on each other, we can parallelize them with little effort. Nonetheless, we have to admit that Muffler needs more time to offer better effectiveness in fault localization.
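The two cost-saving ideas in note 5 can be sketched as follows. This is only an illustration under stated assumptions: the coverage map, the mutant records, and the `run_test` callable are hypothetical stand-ins for whatever a real Muffler implementation would use, and a thread pool is just one way to run independent mutants concurrently.

```python
from concurrent.futures import ThreadPoolExecutor


def select_tests(coverage, mutated_stmt):
    """Test selection: keep only the tests that cover the mutated statement."""
    return [test for test, covered in coverage.items() if mutated_stmt in covered]


def run_mutants(mutants, coverage, run_test, max_workers=4):
    """Run each mutant against its selected tests; mutants are independent of each other."""
    def job(mutant):
        tests = select_tests(coverage, mutant["stmt"])
        # run_test is a caller-supplied (hypothetical) function returning True if the test passes.
        return mutant["id"], {test: run_test(mutant["id"], test) for test in tests}

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(job, mutants))


# Toy data: which statements each test covers in the original program.
coverage = {"t1": {"S1", "S2"}, "t2": {"S1"}, "t3": {"S1", "S2", "S6"}}
mutants = [{"id": "M2", "stmt": "S2"}, {"id": "M6", "stmt": "S6"}]


def fake_run_test(mutant_id, test):
    """Stand-in runner for the demo: pretends that only M2 breaks test t3."""
    return not (mutant_id == "M2" and test == "t3")


results = run_mutants(mutants, coverage, fake_run_test)
print(results)
```

A test that never reaches the mutated statement behaves exactly as on the original program, so skipping it cannot change the observed passed-to-failed counts.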