TOWARDS AUTOMATED SUPPORTS FOR CODE
REVIEWS USING REVIEWER RECOMMENDATION
AND REVIEW QUALITY MODELLING
Mohammad Masudur Rahman, Chanchal K. Roy, Raula G.
Kula, Jason Collins, and Jesse Redl
University of Saskatchewan, Canada, Osaka University, Japan
Vendasta Technologies, Canada
56th COW: Code Review and Continuous Inspection/Integration
CODE REVIEW
2
Code review could be unpleasant!
RECAP ON CODE REVIEW
Formal inspection
Peer code review
Modern code review (MCR)
Code review is a systematic
examination of source code for
detecting bugs or defects and
coding rule violations.
3
Early bug detection
Stop coding rule violations
Enhance developer skills
PARTIES INVOLVED IN MCR
Code Reviewer
Patch Submitter
Suggestion on
appropriate code
reviewers
Help with good/useful
code reviews
4
TODAY’S TALK OUTLINE
Part I: Code Reviewer
Recommendation System
Part II: Prediction Model
for Review Usefulness
5
TODAY’S TALK OUTLINE
Part III: Impact of Continuous Integration on
Code Reviews
6
Part I: Code Reviewer
Recommendation
7
8
Code reviewer selection is challenging for:
Novice developers
Distributed software development
Reviewer assignment delayed by 12 days (Thongtanunam et al, SANER 2015)
EXISTING LITERATURE
Line Change History (LCH): ReviewBot (Balachandran, ICSE 2013)
File Path Similarity (FPS): RevFinder (Thongtanunam et al, SANER 2015); FPS (Thongtanunam et al, CHASE 2014); Tie (Xia et al, ICSME 2015)
Code Review Content and Comments: Tie (Xia et al, ICSME 2015); SNA (Yu et al, ICSME 2014)
9
Issues & Limitations: these techniques mine a developer's contributions from within a single project only.
Our proposal: Library & Technology Similarity
OUTLINE OF THIS STUDY
10
[Flow: exploratory study on the Vendasta codebase (3 research questions) → CORRECT → evaluation using the Vendasta codebase (comparative study) → evaluation using open source projects → conclusion]
EXPLORATORY STUDY (3 RQS)
RQ1: How frequently do the commercial software
projects reuse external libraries from within the
codebase?
RQ2: Does the experience of a developer with
such libraries matter in code reviewer selection by
other developers?
RQ3: How frequently do the commercial projects
adopt specialized technologies (e.g., taskqueue,
mapreduce, urlfetch)?
11
DATASET: EXPLORATORY STUDY
12
 Each project has at least 750 closed pull requests.
 Each library is used at least 10 times on average.
 Each technology is used at least 5 times on average.
10 utility libraries
(Vendasta)
10 commercial projects
(Vendasta)
10 Google App Engine
Technologies
LIBRARY USAGE IN COMMERCIAL PROJECTS (ANSWERED: EXP-RQ1)
 Empirical library usage frequency in 10 projects
 Mostly used: vtest, vauth, and vapi
 Least used: vlogs, vmonitor
13
LIBRARY USAGE IN
PULL REQUESTS (ANSWERED: EXP-RQ2)
 30%-70% of pull requests used at least one of the 10 libraries
 87%-100% of library authors recommended as code reviewers in
the projects using those libraries
 Library experience really matters!
14
[Charts: % of PRs using the selected libraries; % of library authors recommended as code reviewers]
SPECIALIZED TECHNOLOGY USAGE
IN PROJECTS (ANSWERED: EXP-RQ3)
 Empirical technology usage frequency in top 10
commercial projects
 Champion technology: mapreduce
15
TECHNOLOGY USAGE IN PULL REQUESTS
(ANSWERED: EXP-RQ3)
 20%-60% of the pull requests used at least one of the
10 specialized technologies.
 Mostly used in: ARM, CS and VBC
16
SUMMARY OF EXPLORATORY FINDINGS
17
About 50% of the pull requests used one or more of the
selected libraries. (Exp-RQ1)
About 98% of the library authors were later
recommended as pull request reviewers. (Exp-RQ2)
About 35% of the pull requests used one or more
specialized technologies. (Exp-RQ3)
Library experience and specialized technology experience really matter in code reviewer selection and recommendation.
CORRECT: CODE REVIEWER RECOMMENDATION
IN GITHUB USING CROSS-PROJECT &
TECHNOLOGY EXPERIENCE
18
CORRECT: CODE REVIEWER
RECOMMENDATION
19
[Diagram: a new PR is compared with previously reviewed PRs; review similarity scores connect the PR to candidate reviewers R1, R2, and R3. A sketch of this idea follows.]
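To make the review-similarity idea concrete, here is a minimal Python sketch, assuming each pull request is reduced to a bag of library and technology tokens; `recommend_reviewers`, the token extraction, and the toy data are illustrative assumptions, not the exact CORRECT implementation.

```python
from collections import Counter
from math import sqrt

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-frequency vectors."""
    shared = set(a) & set(b)
    dot = sum(a[t] * b[t] for t in shared)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def recommend_reviewers(new_pr_tokens, past_prs, top_k=5):
    """Score each candidate by the similarity of the new PR to the PRs they reviewed."""
    new_vec = Counter(new_pr_tokens)
    scores = Counter()
    for pr in past_prs:
        sim = cosine(new_vec, Counter(pr["tokens"]))
        for reviewer in pr["reviewers"]:
            scores[reviewer] += sim
    return [r for r, s in scores.most_common(top_k) if s > 0]

# Toy usage: tokens mix library names and specialized technologies.
past = [
    {"tokens": ["vauth", "vapi", "mapreduce"], "reviewers": ["alice"]},
    {"tokens": ["vtest", "taskqueue"], "reviewers": ["bob", "carol"]},
]
print(recommend_reviewers(["vapi", "mapreduce"], past))  # ['alice']
```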
OUR CONTRIBUTIONS
20
[Diagram: the state of the art (Thongtanunam et al, SANER 2015) matches a new PR to reviewed PRs by their source files; our proposed technique, CORRECT, matches them by their external libraries and specialized technologies]
EVALUATION OF CORRECT
Two evaluations: (1) the Vendasta codebase, and (2) open source software projects
21
RQ1: Are library experience and technology experience useful proxies for code review skills?
RQ2: Does CORRECT outperform the baseline technique for reviewer recommendation?
RQ3: Does CORRECT perform comparably on both private and public codebases?
RQ4: Does CORRECT show bias toward any of the development frameworks?
EXPERIMENTAL DATASET
Sliding window of 30 past pull requests for learning.
Metrics: Top-K Accuracy, Mean Precision (MP), Mean Recall (MR), and Mean Reciprocal Rank (MRR) (see the sketch below).
22
Vendasta: 10 Python projects, 13,081 pull requests
Open source: 2 Python, 2 Java & 2 Ruby projects, 4,034 pull requests
Gold set: code reviewers drawn from past code reviews
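A sketch of how these metrics can be computed, assuming a ranked recommendation list and a gold set of actual reviewers per pull request; the function names are hypothetical.

```python
def top_k_accuracy(ranked_lists, gold_sets, k=5):
    """Fraction of PRs whose Top-K recommendations contain at least one gold reviewer."""
    hits = sum(1 for ranked, gold in zip(ranked_lists, gold_sets)
               if set(ranked[:k]) & gold)
    return hits / len(ranked_lists)

def mean_reciprocal_rank(ranked_lists, gold_sets):
    """Average of 1/rank of the first correct reviewer (0 if no gold reviewer is ranked)."""
    total = 0.0
    for ranked, gold in zip(ranked_lists, gold_sets):
        for i, r in enumerate(ranked, start=1):
            if r in gold:
                total += 1.0 / i
                break
    return total / len(ranked_lists)

def mean_precision_recall(ranked_lists, gold_sets, k=5):
    """Mean precision and mean recall of the Top-K lists against the gold reviewers."""
    ps, rs = [], []
    for ranked, gold in zip(ranked_lists, gold_sets):
        hit = len(set(ranked[:k]) & gold)
        ps.append(hit / k)
        rs.append(hit / len(gold) if gold else 0.0)
    return sum(ps) / len(ps), sum(rs) / len(rs)
```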
LIBRARY EXPERIENCE & TECHNOLOGY
EXPERIENCE (ANSWERED: RQ1)
Metric     Library Similarity (Top-3 / Top-5)   Technology Similarity (Top-3 / Top-5)   Combined Similarity (Top-3 / Top-5)
Accuracy   83.57% / 92.02%                      82.18% / 91.83%                         83.75% / 92.15%
MRR        0.66 / 0.67                          0.62 / 0.64                             0.65 / 0.67
MP         65.93% / 85.28%                      62.99% / 83.93%                         65.98% / 85.93%
MR         58.34% / 80.77%                      55.77% / 79.50%                         58.43% / 81.39%
23
[ MP = Mean Precision, MR = Mean Recall, MRR = Mean Reciprocal Rank ]
Both library experience and technology experience are good proxies, providing over 90% accuracy.
Combined experience provides the maximum performance:
92.15% recommendation accuracy with 85.93% precision and 81.39% recall.
Evaluation results align with the exploratory study findings.
COMPARATIVE STUDY FINDINGS (ANSWERED:
RQ2)
CORRECT performs better than the competing technique on all metrics (p-value = 0.003 < 0.05 for Top-5 accuracy).
Performs better both on average and on individual projects.
RevFinder computes PR similarity by matching source file names and directory paths.
24
Metric     RevFinder [18] (Top-5)   CORRECT (Top-5)
Accuracy   80.72%                   92.15%
MRR        0.65                     0.67
MP         77.24%                   85.93%
MR         73.27%                   81.39%
[ MP = Mean Precision, MR = Mean Recall,
MRR = Mean Reciprocal Rank ]
COMPARISON ON OPEN SOURCE PROJECTS
(ANSWERED: RQ3)
In OSS projects, CORRECT also performs better than the baseline technique.
85.20% accuracy with 84.76% precision and 78.73% recall, not significantly different from the Vendasta results (p-value = 0.239 > 0.05 for precision).
Results for private and public codebases are quite close.
25
Metric     RevFinder [18] (Top-5)   CORRECT (OSS, Top-5)   CORRECT (VA, Top-5)
Accuracy   62.90%                   85.20%                 92.15%
MRR        0.55                     0.69                   0.67
MP         62.57%                   84.76%                 85.93%
MR         58.63%                   78.73%                 81.39%
[ MP = Mean Precision, MR = Mean Recall, MRR = Mean Reciprocal Rank ]
COMPARISON ON DIFFERENT PLATFORMS
(ANSWERED: RQ4)
Metric     Python (Beets / St2 / Avg.)    Java (OkHttp / Orientdb / Avg.)   Ruby (Rubocop / Vagrant / Avg.)
Accuracy   93.06% / 79.20% / 86.13%       88.77% / 81.27% / 85.02%          89.53% / 79.38% / 84.46%
MRR        0.82 / 0.49 / 0.66             0.61 / 0.76 / 0.69                0.76 / 0.71 / 0.74
MP         93.06% / 77.85% / 85.46%       88.69% / 81.27% / 84.98%          88.49% / 79.17% / 83.83%
MR         87.36% / 74.54% / 80.95%       85.33% / 76.27% / 80.80%          81.49% / 67.36% / 74.43%
26
[ MP = Mean Precision, MR = Mean Recall, MRR = Mean Reciprocal Rank ]
In OSS projects, results across platforms look surprisingly close, except for recall.
Accuracy and precision are close to 85% on average.
CORRECT does NOT show bias toward any particular platform.
THREATS TO VALIDITY
 Threats to Internal Validity
Skewed dataset: each of the 10 selected projects is medium-sized (~1.1K PRs), except CS.
 Threats to External Validity
Limited OSS dataset: only 6 OSS projects were considered, which is not sufficient for generalization.
 Issue of heavy PRs: PRs containing hundreds of files can
make the recommendation slower.
 Threats to Construct Validity
 Top-K Accuracy: Does the metric represent effectiveness
of the technique? Widely used by relevant literature
(Thongtanunam et al, SANER 2015)
27
TAKE-HOME MESSAGES (PART I)
28
[Graphic: six numbered take-home messages shown visually]
Part II: Prediction Model for
Code Review Usefulness
29
RESEARCH PROBLEM: USEFULNESS OF CODE
REVIEW COMMENTS
30
 What makes a review comment
useful or non-useful?
34.5% of review comments are non-useful at Microsoft (Bosu et al., MSR 2015)
 No automated support to detect or
improve such comments so far
STUDY METHODOLOGY
31
1,482 Review
comments (4 systems)
Manual tagging following Bosu et al., MSR 2015
Non-useful
comments (602)
Useful
comments (880)
(1)
Comparative
study
(2)
Prediction
model
COMPARATIVE STUDY: VARIABLES
Independent Variables (8):
  Textual: Reading Ease, Stop word Ratio, Question Ratio, Code Element Ratio, Conceptual Similarity
  Experience: Code Authorship, Code Reviewership, External Lib. Experience
Response Variable (1): Comment Usefulness (Yes / No)
32
Contrast between useful and non-useful comments.
Two paradigms: comment text, and commenter's/developer's experience.
Answers two RQs, one per paradigm.
ANSWERING RQ1: READING EASE
Flesch-Kincaid Reading Ease was applied (see the sketch below).
No significant difference between useful and non-useful review comments.
33
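A minimal sketch of the readability feature, using the standard Flesch Reading Ease formula with a naive vowel-run syllable counter; the paper's exact implementation may differ.

```python
import re

def count_syllables(word: str) -> int:
    """Naive syllable count: runs of vowels (a crude approximation)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words)."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    n = max(1, len(words))
    return 206.835 - 1.015 * (n / sentences) - 84.6 * (syllables / n)

print(flesch_reading_ease("Please rename this variable. It shadows the outer one."))
```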
ANSWERING RQ1: STOP WORD RATIO
Used the Google stop word list and Python keywords (a sketch follows).
Stop word ratio = (#stop words + #keywords) / #all words in a review comment
Non-useful comments contain more stop words than useful comments; the difference is statistically significant.
34
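A sketch of the stop word ratio, assuming a tiny inline stop word list in place of Google's full list; Python keywords come from the standard `keyword` module.

```python
import keyword

# Hypothetical tiny stop word list; the study used Google's full list.
STOP_WORDS = {"the", "a", "an", "is", "to", "of", "and", "in", "it", "this"}

def stop_word_ratio(comment: str) -> float:
    """(#stop words + #Python keywords) / #all words in the comment."""
    words = comment.lower().split()
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in STOP_WORDS or keyword.iskeyword(w))
    return hits / len(words)

print(stop_word_ratio("I think this is a bug in the for loop"))  # 0.6
```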
ANSWERING RQ1: QUESTION RATIO
 Developers treat clarification questions as non-useful
review comments.
 Question ratio = #questions/#sentences of a comment.
 No significant difference between useful and non-useful
comments in question ratio.
35
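A sketch of the question ratio, assuming sentences ending in "?" count as questions; the sentence splitter is a simple assumption.

```python
import re

def question_ratio(comment: str) -> float:
    """#questions / #sentences, treating '?'-terminated sentences as questions."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", comment.strip()) if s]
    if not sentences:
        return 0.0
    questions = sum(1 for s in sentences if s.endswith("?"))
    return questions / len(sentences)

print(question_ratio("Why is this null? Please add a check."))  # 0.5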
ANSWERING RQ1: CODE ELEMENT RATIO
Important code elements (e.g., identifiers) in the comment text possibly trigger the code change (a sketch follows).
Code element ratio = #source tokens / #all tokens
Useful comments have a higher code element ratio than non-useful comments; the difference is statistically significant.
36
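A sketch of the code element ratio, assuming a simple heuristic (camelCase, snake_case, calls, or dotted names) identifies source tokens; the study's actual token classifier may differ.

```python
import re

CODE_LIKE = re.compile(r"([a-z]+[A-Z]\w*|\w+_\w+|\w+\(\)|\w+\.\w+)")

def code_element_ratio(comment: str) -> float:
    """#source-code-like tokens / #all tokens in the comment."""
    tokens = comment.split()
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if CODE_LIKE.search(t))
    return hits / len(tokens)

print(code_element_ratio("Rename getUserId and user_name before merging"))  # ~0.33
```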
ANSWERING RQ1: CONCEPTUAL SIMILARITY
BETWEEN COMMENTS & CHANGED CODE
How relevant is the comment to the changed code (see the sketch below)?
Do comments and changed code share vocabularies?
Yes, useful comments share significantly more vocabulary with the changed code than non-useful ones.
37
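A sketch of the conceptual similarity feature as cosine similarity between word-frequency vectors of the comment and the changed code; the actual preprocessing (stemming, identifier splitting) is not shown.

```python
from collections import Counter
from math import sqrt

def conceptual_similarity(comment: str, changed_code: str) -> float:
    """Cosine similarity between word-frequency vectors of a comment and the changed code."""
    a, b = Counter(comment.lower().split()), Counter(changed_code.lower().split())
    shared = set(a) & set(b)
    dot = sum(a[t] * b[t] for t in shared)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

print(conceptual_similarity("the cache must be invalidated here",
                            "cache = rebuild cache for session"))
```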
ANSWERING RQ2: CODE AUTHORSHIP
File-level authorship did not make much difference, which is a bit counter-intuitive.
Project-level authorship differs between useful and non-useful comments, mostly for Q2 and Q3.
38
ANSWERING RQ2: CODE REVIEWERSHIP
Does reviewing experience matter in providing useful comments?
Yes, it does. File-level reviewing experience matters, especially for Q2 and Q3.
Experienced reviewers provide more useful comments than non-useful ones.
39
ANSWERING RQ2: EXT. LIB. EXPERIENCE
Familiarity with the library used in the changed code on which the comment is posted.
Significantly higher for the authors of useful comments, for Q3 only.
40
SUMMARY OF COMPARATIVE STUDY
41
RQ    Independent Variable        Useful vs. Non-useful Difference
RQ1   Reading Ease                Not significant
      Stop word Ratio             Significant
      Question Ratio              Not significant
      Code Element Ratio          Significant
      Conceptual Similarity       Significant
RQ2   Code Authorship             Somewhat significant
      Code Reviewership           Significant
      External Lib. Experience    Somewhat significant
EXPERIMENTAL DATASET & SETUP
42
1,482 code review
comments
Evaluation set
(1,116)
Validation set
(366)
Model training &
cross-validation
Validation with
unseen comments
REVHELPER: USEFULNESS PREDICTION MODEL
43
Review
comments
Manual classification
using Bosu et al.
Useful & non-useful
comments Model training
Prediction
model
New review comment
Predicts the usefulness of a new review comment before it is submitted.
Applied three ML algorithms: Naive Bayes (NB), Logistic Regression (LR), and Random Forest (RF) (see the sketch below).
Evaluation & validation with different data sets.
Answers three RQs: RQ3, RQ4, and RQ5.
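A sketch of the model-building step with scikit-learn, assuming the eight features are already extracted into a matrix; the random placeholder data and hyperparameters are illustrative, not the study's actual setup.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

# X: one row per review comment, eight feature columns
# (reading ease, stop word ratio, ..., external library experience).
X = np.random.rand(1116, 8)        # placeholder for the 1,116-comment evaluation set
y = np.random.randint(0, 2, 1116)  # 1 = useful, 0 = non-useful (placeholder labels)

for name, model in [("NB", GaussianNB()),
                    ("LR", LogisticRegression(max_iter=1000)),
                    ("RF", RandomForestClassifier(n_estimators=100, random_state=42))]:
    scores = cross_val_score(model, X, y, cv=10)  # 10-fold cross-validation
    print(f"{name}: mean accuracy = {scores.mean():.3f}")
```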
ANSWERING RQ3: MODEL PERFORMANCE
Learning Algorithm    Useful Comments (Precision / Recall)   Non-useful Comments (Precision / Recall)
Naïve Bayes           61.30% / 66.00%                        53.30% / 48.20%
Logistic Regression   60.70% / 71.40%                        54.60% / 42.80%
Random Forest         67.93% / 75.04%                        63.06% / 54.54%
44
The Random Forest based model performs the best.
Both F1-score and accuracy are 66%.
Comment usefulness and the features are not linearly correlated.
As a first step, this prediction could be useful.
ANSWERING RQ4: ROLE OF PARADIGMS
45
ANSWERING RQ4: ROLE OF PARADIGMS
46
ANSWERING RQ5: COMPARISON WITH
BASELINE (VALIDATION)
47
ANSWERING RQ5: COMPARISON WITH
BASELINE (ROC)
48
TAKE-HOME MESSAGES (PART II)
 Usefulness of review comments is complex but a
much needed piece of information.
 No automated support available so far to predict
usefulness of review comments instantly.
 Non-useful comments are significantly different from
useful comments in several textual features (e.g.,
conceptual similarity)
 Reviewing experience matters in providing useful
review comments.
 Our prediction model can predict the usefulness of a
new review comment.
 RevHelper performs better than random guessing and
available alternatives.
49
Part III: Impact of Continuous
Integration on Code Reviews
50
RESEARCH PROBLEM: IMPACT OF
AUTOMATED BUILDS ON CODE REVIEWS
Automated builds are an important part of CI for commit merging & consistency.
Exponential increase of automated builds over the years with Travis CI.
Builds & code reviews act as interleaving steps in pull-based development.
RQ1: Does the status of automated builds influence the code
review participation in open source projects?
RQ2: Do frequent automated builds help improve the
overall quality of peer code reviews?
RQ3: Can we automatically predict whether an automated build
would trigger new code reviews or not? 51
ANSWERING RQ1: BUILD STATUS & REVIEW
PARTICIPATION
Build Status Build Only Builds + Reviews Total
Canceled 2,616 1,368 3,984
Errored 51,729 27,262 78,991
Failed 55,546 39,025 94,571
Passed 236,573 164,174 400,747
All 346,464 231,829 (40%) 578,293
52
 578K PR-based builds
 Four build statuses
 232K (40%) build entries
with code reviews.
Chi-squared test (p-value = 2.2e-16 < 0.05); the sketch below recomputes it from the table.
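The test can be recomputed from the contingency table above; a sketch using SciPy's standard `chi2_contingency` test.

```python
from scipy.stats import chi2_contingency

# Contingency table from the slide: rows = build status,
# columns = (build only, builds + reviews).
table = [
    [2616, 1368],       # Canceled
    [51729, 27262],     # Errored
    [55546, 39025],     # Failed
    [236573, 164174],   # Passed
]
chi2, p, dof, _ = chi2_contingency(table)
print(f"chi2 = {chi2:.1f}, p = {p:.3g}, dof = {dof}")
```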
ANSWERING RQ1: BUILD STATUS & REVIEW
PARTICIPATION
53
Previous Build Status   #PRs with Review Comments
                        Only Added↑   Only Removed↓   Total Changed↑↓
Canceled                20            24              65
Errored                 510           265             812
Failed                  1,542         826             2,316
Passed                  4,235         1,788           5,677
All                     6,307         2,903           8,870 (28%)
31,648 PRs behind the 232K build entries, from 1,000+ projects
For 28% of PRs, the number of review comments changed.
Passed builds triggered 18% of new reviews.
Errored + failed builds triggered 10%.
ANSWERING RQ2: BUILD FREQUENCY &
REVIEW QUALITY
54
Quantile   Issue Comments          PR Comments             All Review Comments
           M     p-value   ∆       M     p-value   ∆       M     p-value   ∆
Q1         0.60  <0.001*   0.35    0.24  <0.001*   0.49    0.84  <0.001*   0.41
Q4         0.99                    0.52                    1.50
M= Mean #review comments, * = Statistically significant, ∆ = Cliff’s Delta
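A sketch of the two statistics behind the table: a nonparametric two-sample test (Mann-Whitney U is assumed here, since the slide only reports p-values) and a direct O(n·m) Cliff's delta; the sample data are hypothetical per-project comment counts.

```python
from scipy.stats import mannwhitneyu

def cliffs_delta(xs, ys):
    """Cliff's delta: P(x > y) - P(x < y) over all pairs (fine for a sketch)."""
    gt = sum(1 for x in xs for y in ys if x > y)
    lt = sum(1 for x in xs for y in ys if x < y)
    return (gt - lt) / (len(xs) * len(ys))

# Hypothetical mean #review comments for frequently (Q4) vs rarely (Q1) built projects.
q4 = [1.5, 1.2, 1.8, 0.9, 1.6]
q1 = [0.8, 0.6, 1.0, 0.7, 0.9]
stat, p = mannwhitneyu(q4, q1, alternative="two-sided")
print(f"U = {stat}, p = {p:.3f}, delta = {cliffs_delta(q4, q1):.2f}")
```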
ANSWERING RQ2: BUILD FREQUENCY &
REVIEW QUALITY
5 projects from Q1 and 5 from Q4, each 3-4 years old
Cumulative #review comments per build over 48 months
Code review quality (i.e., #comments) improved almost linearly for frequently built projects.
This did not happen for their counterparts.
55
ANSWERING RQ3: PREDICTION OF NEW
CODE REVIEW TRIGGERING
Learning Algorithm    Overall Accuracy   New Review Triggered (Precision / Recall)
Naïve Bayes           58.03%             68.70% / 29.50%
Logistic Regression   60.56%             64.50% / 47.00%
J48                   64.04%             69.50% / 50.10%
56
Features: build status, code change statistics, test change statistics, code review comments. Response: new review or unchanged (see the sketch below).
Three ML algorithms with 10-fold cross-validation.
26.5K build entries as the dataset.
J48 performed the best: 64% accuracy, 69.50% precision & 50% recall.
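A sketch of the prediction setup; J48 is Weka's C4.5 decision tree, for which scikit-learn's `DecisionTreeClassifier` serves here as a rough analog, and the feature matrix is a random placeholder rather than the mined build data.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Placeholder features: encoded build status, code/test change stats, prior review comments.
X = np.random.rand(26500, 6)        # ~26.5K build entries, as in the study
y = np.random.randint(0, 2, 26500)  # 1 = new review triggered, 0 = unchanged

tree = DecisionTreeClassifier(max_depth=8, random_state=42)  # rough stand-in for J48 (C4.5)
print(cross_val_score(tree, X, y, cv=10).mean())  # 10-fold cross-validation
```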
TAKE-HOME MESSAGE (PART III)
Automated builds might influence manual code reviews, since the two interleave in modern pull-based development.
Passed builds are more associated with review participation and with new code reviews.
Frequently built projects received more review comments than less frequently built ones.
Code review activities are steady over time in frequently built projects; not so for their counterparts.
Our prediction model can predict whether a build will trigger new code reviews or not.
57
REPLICATION PACKAGES
 CORRECT, RevHelper & Travis CI Miner
 http://www.usask.ca/~masud.rahman/correct/
 http://www.usask.ca/~masud.rahman/revhelper/
 http://www.usask.ca/~masud.rahman/msrch/travis/
Please contact Masud Rahman
(masud.rahman@usask.ca) for further details about these
studies and replications.
58
PUBLISHED PAPERS
[1] M. Masudur Rahman, C.K. Roy, and J. Collins, "CORRECT: Code Reviewer Recommendation in GitHub Based on Cross-Project and Technology Experience", In Proceedings of the 38th International Conference on Software Engineering Companion (ICSE-C 2016), pp. 222-231, Austin, Texas, USA, May 2016.
[2] M. Masudur Rahman, C.K. Roy, J. Redl, and J. Collins, "CORRECT: Code Reviewer Recommendation at GitHub for Vendasta Technologies", In Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering (ASE 2016), pp. 792-797, Singapore, September 2016.
[3] M. Masudur Rahman, C.K. Roy, and R.G. Kula, "Predicting Usefulness of Code Review Comments using Textual Features and Developer Experience", In Proceedings of the 14th International Conference on Mining Software Repositories (MSR 2017), pp. 215-226, Buenos Aires, Argentina, May 2017.
[4] M. Masudur Rahman and C.K. Roy, "Impact of Continuous Integration on Code Reviews", In Proceedings of the 14th International Conference on Mining Software Repositories (MSR 2017), pp. 499-502, Buenos Aires, Argentina, May 2017.
59
THANK YOU!! QUESTIONS?
60
Email: chanchal.roy@usask.ca or
masud.rahman@usask.ca

Contenu connexe

Tendances

Cross-project defect prediction
Cross-project defect predictionCross-project defect prediction
Cross-project defect predictionThomas Zimmermann
 
Defect effort prediction models in software
Defect effort prediction models in softwareDefect effort prediction models in software
Defect effort prediction models in softwareIAEME Publication
 
Achieving quality with tools case study
Achieving quality with tools case studyAchieving quality with tools case study
Achieving quality with tools case studyEosSoftware
 
Machine Learning in Static Analysis of Program Source Code
Machine Learning in Static Analysis of Program Source CodeMachine Learning in Static Analysis of Program Source Code
Machine Learning in Static Analysis of Program Source CodeAndrey Karpov
 
Technology & innovation Management Course - Session 2
Technology & innovation Management Course - Session 2Technology & innovation Management Course - Session 2
Technology & innovation Management Course - Session 2Dan Toma
 
A software fault localization technique based on program mutations
A software fault localization technique based on program mutationsA software fault localization technique based on program mutations
A software fault localization technique based on program mutationsTao He
 
On the Malware Detection Problem: Challenges & Novel Approaches
On the Malware Detection Problem: Challenges & Novel ApproachesOn the Malware Detection Problem: Challenges & Novel Approaches
On the Malware Detection Problem: Challenges & Novel ApproachesMarcus Botacin
 
The Impact of Mislabelling on the Performance and Interpretation of Defect Pr...
The Impact of Mislabelling on the Performance and Interpretation of Defect Pr...The Impact of Mislabelling on the Performance and Interpretation of Defect Pr...
The Impact of Mislabelling on the Performance and Interpretation of Defect Pr...Chakkrit (Kla) Tantithamthavorn
 
Sound Empirical Evidence in Software Testing
Sound Empirical Evidence in Software TestingSound Empirical Evidence in Software Testing
Sound Empirical Evidence in Software TestingJaguaraci Silva
 
The Use of Development History in Software Refactoring Using a Multi-Objectiv...
The Use of Development History in Software Refactoring Using a Multi-Objectiv...The Use of Development History in Software Refactoring Using a Multi-Objectiv...
The Use of Development History in Software Refactoring Using a Multi-Objectiv...Ali Ouni
 
Towards a Better Understanding of the Impact of Experimental Components on De...
Towards a Better Understanding of the Impact of Experimental Components on De...Towards a Better Understanding of the Impact of Experimental Components on De...
Towards a Better Understanding of the Impact of Experimental Components on De...Chakkrit (Kla) Tantithamthavorn
 
Cross-project Defect Prediction Using A Connectivity-based Unsupervised Class...
Cross-project Defect Prediction Using A Connectivity-based Unsupervised Class...Cross-project Defect Prediction Using A Connectivity-based Unsupervised Class...
Cross-project Defect Prediction Using A Connectivity-based Unsupervised Class...Feng Zhang
 
All You Need to Know to Win a Cybersecurity Adversarial Machine Learning Comp...
All You Need to Know to Win a Cybersecurity Adversarial Machine Learning Comp...All You Need to Know to Win a Cybersecurity Adversarial Machine Learning Comp...
All You Need to Know to Win a Cybersecurity Adversarial Machine Learning Comp...Marcus Botacin
 
A PARTICLE SWARM OPTIMIZATION TECHNIQUE FOR GENERATING PAIRWISE TEST CASES
A PARTICLE SWARM OPTIMIZATION TECHNIQUE FOR GENERATING PAIRWISE TEST CASESA PARTICLE SWARM OPTIMIZATION TECHNIQUE FOR GENERATING PAIRWISE TEST CASES
A PARTICLE SWARM OPTIMIZATION TECHNIQUE FOR GENERATING PAIRWISE TEST CASESKula Sekhar Reddy Yerraguntla
 
Synthesizing Knowledge from Software Development Artifacts
Synthesizing Knowledge from Software Development ArtifactsSynthesizing Knowledge from Software Development Artifacts
Synthesizing Knowledge from Software Development ArtifactsJeongwhan Choi
 

Tendances (20)

Cross-project defect prediction
Cross-project defect predictionCross-project defect prediction
Cross-project defect prediction
 
Ssbse12b.ppt
Ssbse12b.pptSsbse12b.ppt
Ssbse12b.ppt
 
Defect effort prediction models in software
Defect effort prediction models in softwareDefect effort prediction models in software
Defect effort prediction models in software
 
Achieving quality with tools case study
Achieving quality with tools case studyAchieving quality with tools case study
Achieving quality with tools case study
 
Machine Learning in Static Analysis of Program Source Code
Machine Learning in Static Analysis of Program Source CodeMachine Learning in Static Analysis of Program Source Code
Machine Learning in Static Analysis of Program Source Code
 
Technology & innovation Management Course - Session 2
Technology & innovation Management Course - Session 2Technology & innovation Management Course - Session 2
Technology & innovation Management Course - Session 2
 
A software fault localization technique based on program mutations
A software fault localization technique based on program mutationsA software fault localization technique based on program mutations
A software fault localization technique based on program mutations
 
Trl and value chain
Trl and value chainTrl and value chain
Trl and value chain
 
On the Malware Detection Problem: Challenges & Novel Approaches
On the Malware Detection Problem: Challenges & Novel ApproachesOn the Malware Detection Problem: Challenges & Novel Approaches
On the Malware Detection Problem: Challenges & Novel Approaches
 
The Impact of Mislabelling on the Performance and Interpretation of Defect Pr...
The Impact of Mislabelling on the Performance and Interpretation of Defect Pr...The Impact of Mislabelling on the Performance and Interpretation of Defect Pr...
The Impact of Mislabelling on the Performance and Interpretation of Defect Pr...
 
Sound Empirical Evidence in Software Testing
Sound Empirical Evidence in Software TestingSound Empirical Evidence in Software Testing
Sound Empirical Evidence in Software Testing
 
The Use of Development History in Software Refactoring Using a Multi-Objectiv...
The Use of Development History in Software Refactoring Using a Multi-Objectiv...The Use of Development History in Software Refactoring Using a Multi-Objectiv...
The Use of Development History in Software Refactoring Using a Multi-Objectiv...
 
Towards a Better Understanding of the Impact of Experimental Components on De...
Towards a Better Understanding of the Impact of Experimental Components on De...Towards a Better Understanding of the Impact of Experimental Components on De...
Towards a Better Understanding of the Impact of Experimental Components on De...
 
nullcon 2011 - Fuzzing with Complexities
nullcon 2011 - Fuzzing with Complexitiesnullcon 2011 - Fuzzing with Complexities
nullcon 2011 - Fuzzing with Complexities
 
Cross-project Defect Prediction Using A Connectivity-based Unsupervised Class...
Cross-project Defect Prediction Using A Connectivity-based Unsupervised Class...Cross-project Defect Prediction Using A Connectivity-based Unsupervised Class...
Cross-project Defect Prediction Using A Connectivity-based Unsupervised Class...
 
All You Need to Know to Win a Cybersecurity Adversarial Machine Learning Comp...
All You Need to Know to Win a Cybersecurity Adversarial Machine Learning Comp...All You Need to Know to Win a Cybersecurity Adversarial Machine Learning Comp...
All You Need to Know to Win a Cybersecurity Adversarial Machine Learning Comp...
 
A PARTICLE SWARM OPTIMIZATION TECHNIQUE FOR GENERATING PAIRWISE TEST CASES
A PARTICLE SWARM OPTIMIZATION TECHNIQUE FOR GENERATING PAIRWISE TEST CASESA PARTICLE SWARM OPTIMIZATION TECHNIQUE FOR GENERATING PAIRWISE TEST CASES
A PARTICLE SWARM OPTIMIZATION TECHNIQUE FOR GENERATING PAIRWISE TEST CASES
 
Saner16b.ppt
Saner16b.pptSaner16b.ppt
Saner16b.ppt
 
Synthesizing Knowledge from Software Development Artifacts
Synthesizing Knowledge from Software Development ArtifactsSynthesizing Knowledge from Software Development Artifacts
Synthesizing Knowledge from Software Development Artifacts
 
M018147883
M018147883M018147883
M018147883
 

Similaire à Code-Review-COW56-Meeting

CORRECT-ToolDemo-ASE2016
CORRECT-ToolDemo-ASE2016CORRECT-ToolDemo-ASE2016
CORRECT-ToolDemo-ASE2016Masud Rahman
 
CodeInsight-SCAM2015
CodeInsight-SCAM2015CodeInsight-SCAM2015
CodeInsight-SCAM2015Masud Rahman
 
SurfClipse-- An IDE based context-aware Meta Search Engine
SurfClipse-- An IDE based context-aware Meta Search EngineSurfClipse-- An IDE based context-aware Meta Search Engine
SurfClipse-- An IDE based context-aware Meta Search EngineMasud Rahman
 
A Tale of Experiments on Bug Prediction
A Tale of Experiments on Bug PredictionA Tale of Experiments on Bug Prediction
A Tale of Experiments on Bug PredictionMartin Pinzger
 
Exploiting Context in Dealing with Programming Errors and Exceptions
Exploiting Context in Dealing with Programming Errors and ExceptionsExploiting Context in Dealing with Programming Errors and Exceptions
Exploiting Context in Dealing with Programming Errors and ExceptionsMasud Rahman
 
Would Static Analysis Tools Help Developers with Code Reviews?
Would Static Analysis Tools Help Developers with Code Reviews?Would Static Analysis Tools Help Developers with Code Reviews?
Would Static Analysis Tools Help Developers with Code Reviews?Sebastiano Panichella
 
Multi step automated refactoring for code smell
Multi step automated refactoring for code smellMulti step automated refactoring for code smell
Multi step automated refactoring for code smelleSAT Publishing House
 
Multi step automated refactoring for code smell
Multi step automated refactoring for code smellMulti step automated refactoring for code smell
Multi step automated refactoring for code smelleSAT Journals
 
CASCON 2023 Most Influential Paper Award Talk
CASCON 2023 Most Influential Paper Award TalkCASCON 2023 Most Influential Paper Award Talk
CASCON 2023 Most Influential Paper Award TalkNikolaos Tsantalis
 
Large Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLarge Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLionel Briand
 
Iwsm2014 application of function points to software based on open source - ...
Iwsm2014   application of function points to software based on open source - ...Iwsm2014   application of function points to software based on open source - ...
Iwsm2014 application of function points to software based on open source - ...Nesma
 
QUICKAR-ASE2016-Singapore
QUICKAR-ASE2016-SingaporeQUICKAR-ASE2016-Singapore
QUICKAR-ASE2016-SingaporeMasud Rahman
 
LIFT: A Legacy InFormation retrieval Tool
LIFT: A Legacy InFormation retrieval ToolLIFT: A Legacy InFormation retrieval Tool
LIFT: A Legacy InFormation retrieval ToolKellyton Brito
 
Towards the Next Generation of Reactive Model Transformations on Low-Code Pla...
Towards the Next Generation of Reactive Model Transformations on Low-Code Pla...Towards the Next Generation of Reactive Model Transformations on Low-Code Pla...
Towards the Next Generation of Reactive Model Transformations on Low-Code Pla...IncQuery Labs
 
R Tool for Visual Studio และการทำงานร่วมกันเป็นทีม โดย เฉลิมวงศ์ วิจิตรปิยะกุ...
R Tool for Visual Studio และการทำงานร่วมกันเป็นทีม โดย เฉลิมวงศ์ วิจิตรปิยะกุ...R Tool for Visual Studio และการทำงานร่วมกันเป็นทีม โดย เฉลิมวงศ์ วิจิตรปิยะกุ...
R Tool for Visual Studio และการทำงานร่วมกันเป็นทีม โดย เฉลิมวงศ์ วิจิตรปิยะกุ...BAINIDA
 
Performance analysis of machine learning approaches in software complexity pr...
Performance analysis of machine learning approaches in software complexity pr...Performance analysis of machine learning approaches in software complexity pr...
Performance analysis of machine learning approaches in software complexity pr...Sayed Mohsin Reza
 
Machine programming
Machine programmingMachine programming
Machine programmingDESMOND YUEN
 

Similaire à Code-Review-COW56-Meeting (20)

CORRECT-ICSE2016
CORRECT-ICSE2016CORRECT-ICSE2016
CORRECT-ICSE2016
 
CORRECT-ToolDemo-ASE2016
CORRECT-ToolDemo-ASE2016CORRECT-ToolDemo-ASE2016
CORRECT-ToolDemo-ASE2016
 
CodeInsight-SCAM2015
CodeInsight-SCAM2015CodeInsight-SCAM2015
CodeInsight-SCAM2015
 
SurfClipse-- An IDE based context-aware Meta Search Engine
SurfClipse-- An IDE based context-aware Meta Search EngineSurfClipse-- An IDE based context-aware Meta Search Engine
SurfClipse-- An IDE based context-aware Meta Search Engine
 
A Tale of Experiments on Bug Prediction
A Tale of Experiments on Bug PredictionA Tale of Experiments on Bug Prediction
A Tale of Experiments on Bug Prediction
 
Exploiting Context in Dealing with Programming Errors and Exceptions
Exploiting Context in Dealing with Programming Errors and ExceptionsExploiting Context in Dealing with Programming Errors and Exceptions
Exploiting Context in Dealing with Programming Errors and Exceptions
 
Test-Driven Code Review: An Empirical Study
Test-Driven Code Review: An Empirical StudyTest-Driven Code Review: An Empirical Study
Test-Driven Code Review: An Empirical Study
 
Would Static Analysis Tools Help Developers with Code Reviews?
Would Static Analysis Tools Help Developers with Code Reviews?Would Static Analysis Tools Help Developers with Code Reviews?
Would Static Analysis Tools Help Developers with Code Reviews?
 
Multi step automated refactoring for code smell
Multi step automated refactoring for code smellMulti step automated refactoring for code smell
Multi step automated refactoring for code smell
 
Multi step automated refactoring for code smell
Multi step automated refactoring for code smellMulti step automated refactoring for code smell
Multi step automated refactoring for code smell
 
CASCON 2023 Most Influential Paper Award Talk
CASCON 2023 Most Influential Paper Award TalkCASCON 2023 Most Influential Paper Award Talk
CASCON 2023 Most Influential Paper Award Talk
 
Large Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLarge Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and Repair
 
Icsm19.ppt
Icsm19.pptIcsm19.ppt
Icsm19.ppt
 
Iwsm2014 application of function points to software based on open source - ...
Iwsm2014   application of function points to software based on open source - ...Iwsm2014   application of function points to software based on open source - ...
Iwsm2014 application of function points to software based on open source - ...
 
QUICKAR-ASE2016-Singapore
QUICKAR-ASE2016-SingaporeQUICKAR-ASE2016-Singapore
QUICKAR-ASE2016-Singapore
 
LIFT: A Legacy InFormation retrieval Tool
LIFT: A Legacy InFormation retrieval ToolLIFT: A Legacy InFormation retrieval Tool
LIFT: A Legacy InFormation retrieval Tool
 
Towards the Next Generation of Reactive Model Transformations on Low-Code Pla...
Towards the Next Generation of Reactive Model Transformations on Low-Code Pla...Towards the Next Generation of Reactive Model Transformations on Low-Code Pla...
Towards the Next Generation of Reactive Model Transformations on Low-Code Pla...
 
R Tool for Visual Studio และการทำงานร่วมกันเป็นทีม โดย เฉลิมวงศ์ วิจิตรปิยะกุ...
R Tool for Visual Studio และการทำงานร่วมกันเป็นทีม โดย เฉลิมวงศ์ วิจิตรปิยะกุ...R Tool for Visual Studio และการทำงานร่วมกันเป็นทีม โดย เฉลิมวงศ์ วิจิตรปิยะกุ...
R Tool for Visual Studio และการทำงานร่วมกันเป็นทีม โดย เฉลิมวงศ์ วิจิตรปิยะกุ...
 
Performance analysis of machine learning approaches in software complexity pr...
Performance analysis of machine learning approaches in software complexity pr...Performance analysis of machine learning approaches in software complexity pr...
Performance analysis of machine learning approaches in software complexity pr...
 
Machine programming
Machine programmingMachine programming
Machine programming
 

Plus de Masud Rahman

HereWeCode 2022: Dalhousie University
HereWeCode 2022: Dalhousie UniversityHereWeCode 2022: Dalhousie University
HereWeCode 2022: Dalhousie UniversityMasud Rahman
 
The Forgotten Role of Search Queries in IR-based Bug Localization: An Empiric...
The Forgotten Role of Search Queries in IR-based Bug Localization: An Empiric...The Forgotten Role of Search Queries in IR-based Bug Localization: An Empiric...
The Forgotten Role of Search Queries in IR-based Bug Localization: An Empiric...Masud Rahman
 
PhD Seminar - Masud Rahman, University of Saskatchewan
PhD Seminar - Masud Rahman, University of SaskatchewanPhD Seminar - Masud Rahman, University of Saskatchewan
PhD Seminar - Masud Rahman, University of SaskatchewanMasud Rahman
 
PhD proposal of Masud Rahman
PhD proposal of Masud RahmanPhD proposal of Masud Rahman
PhD proposal of Masud RahmanMasud Rahman
 
PhD Comprehensive exam of Masud Rahman
PhD Comprehensive exam of Masud RahmanPhD Comprehensive exam of Masud Rahman
PhD Comprehensive exam of Masud RahmanMasud Rahman
 
Doctoral Symposium of Masud Rahman
Doctoral Symposium of Masud RahmanDoctoral Symposium of Masud Rahman
Doctoral Symposium of Masud RahmanMasud Rahman
 
Supporting Source Code Search with Context-Aware and Semantics-Driven Code Se...
Supporting Source Code Search with Context-Aware and Semantics-Driven Code Se...Supporting Source Code Search with Context-Aware and Semantics-Driven Code Se...
Supporting Source Code Search with Context-Aware and Semantics-Driven Code Se...Masud Rahman
 
ICSE2018-Poster-Bug-Localization
ICSE2018-Poster-Bug-LocalizationICSE2018-Poster-Bug-Localization
ICSE2018-Poster-Bug-LocalizationMasud Rahman
 
RACK-Tool-ICSE2017
RACK-Tool-ICSE2017RACK-Tool-ICSE2017
RACK-Tool-ICSE2017Masud Rahman
 
ACER-ASE2017-slides
ACER-ASE2017-slidesACER-ASE2017-slides
ACER-ASE2017-slidesMasud Rahman
 
CMPT470-usask-guest-lecture
CMPT470-usask-guest-lectureCMPT470-usask-guest-lecture
CMPT470-usask-guest-lectureMasud Rahman
 
NLP2API: Replication package accepted by ICSME 2018
NLP2API: Replication package accepted by ICSME 2018NLP2API: Replication package accepted by ICSME 2018
NLP2API: Replication package accepted by ICSME 2018Masud Rahman
 
Effective Reformulation of Query for Code Search using Crowdsourced Knowledge...
Effective Reformulation of Query for Code Search using Crowdsourced Knowledge...Effective Reformulation of Query for Code Search using Crowdsourced Knowledge...
Effective Reformulation of Query for Code Search using Crowdsourced Knowledge...Masud Rahman
 

Plus de Masud Rahman (20)

HereWeCode 2022: Dalhousie University
HereWeCode 2022: Dalhousie UniversityHereWeCode 2022: Dalhousie University
HereWeCode 2022: Dalhousie University
 
The Forgotten Role of Search Queries in IR-based Bug Localization: An Empiric...
The Forgotten Role of Search Queries in IR-based Bug Localization: An Empiric...The Forgotten Role of Search Queries in IR-based Bug Localization: An Empiric...
The Forgotten Role of Search Queries in IR-based Bug Localization: An Empiric...
 
PhD Seminar - Masud Rahman, University of Saskatchewan
PhD Seminar - Masud Rahman, University of SaskatchewanPhD Seminar - Masud Rahman, University of Saskatchewan
PhD Seminar - Masud Rahman, University of Saskatchewan
 
PhD proposal of Masud Rahman
PhD proposal of Masud RahmanPhD proposal of Masud Rahman
PhD proposal of Masud Rahman
 
PhD Comprehensive exam of Masud Rahman
PhD Comprehensive exam of Masud RahmanPhD Comprehensive exam of Masud Rahman
PhD Comprehensive exam of Masud Rahman
 
Doctoral Symposium of Masud Rahman
Doctoral Symposium of Masud RahmanDoctoral Symposium of Masud Rahman
Doctoral Symposium of Masud Rahman
 
Supporting Source Code Search with Context-Aware and Semantics-Driven Code Se...
Supporting Source Code Search with Context-Aware and Semantics-Driven Code Se...Supporting Source Code Search with Context-Aware and Semantics-Driven Code Se...
Supporting Source Code Search with Context-Aware and Semantics-Driven Code Se...
 
ICSE2018-Poster-Bug-Localization
ICSE2018-Poster-Bug-LocalizationICSE2018-Poster-Bug-Localization
ICSE2018-Poster-Bug-Localization
 
MSR2017-Challenge
MSR2017-ChallengeMSR2017-Challenge
MSR2017-Challenge
 
STRICT-SANER2017
STRICT-SANER2017STRICT-SANER2017
STRICT-SANER2017
 
MSR2015-Challenge
MSR2015-ChallengeMSR2015-Challenge
MSR2015-Challenge
 
MSR2014-Challenge
MSR2014-ChallengeMSR2014-Challenge
MSR2014-Challenge
 
STRICT-SANER2015
STRICT-SANER2015STRICT-SANER2015
STRICT-SANER2015
 
CMPT-842-BRACK
CMPT-842-BRACKCMPT-842-BRACK
CMPT-842-BRACK
 
RACK-Tool-ICSE2017
RACK-Tool-ICSE2017RACK-Tool-ICSE2017
RACK-Tool-ICSE2017
 
RACK-SANER2016
RACK-SANER2016RACK-SANER2016
RACK-SANER2016
 
ACER-ASE2017-slides
ACER-ASE2017-slidesACER-ASE2017-slides
ACER-ASE2017-slides
 
CMPT470-usask-guest-lecture
CMPT470-usask-guest-lectureCMPT470-usask-guest-lecture
CMPT470-usask-guest-lecture
 
NLP2API: Replication package accepted by ICSME 2018
NLP2API: Replication package accepted by ICSME 2018NLP2API: Replication package accepted by ICSME 2018
NLP2API: Replication package accepted by ICSME 2018
 
Effective Reformulation of Query for Code Search using Crowdsourced Knowledge...
Effective Reformulation of Query for Code Search using Crowdsourced Knowledge...Effective Reformulation of Query for Code Search using Crowdsourced Knowledge...
Effective Reformulation of Query for Code Search using Crowdsourced Knowledge...
 

Dernier

Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 

Dernier (20)

Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 

Code-Review-COW56-Meeting

  • 1. TOWARDS AUTOMATED SUPPORTS FOR CODE REVIEWS USING REVIEWER RECOMMENDATION AND REVIEW QUALITY MODELLING Mohammad Masudur Rahman, Chanchal K. Roy, Raula G. Kula, Jason Collins, and Jesse Redl University of Saskatchewan, Canada, Osaka University, Japan Vendasta Technologies, Canada 56th COW: Code Review and Continuous Inspection/Integration
  • 2. CODE REVIEW 2 Code review could be unpleasant 
  • 3. RECAP ON CODE REVIEW Formal inspection Peer code review Modern code review (MCR) Code review is a systematic examination of source code for detecting bugs or defects and coding rule violations. 3 Early bug detection Stop coding rule violation Enhance developer skill
  • 4. PARTIES INVOLVED IN MCR Code Reviewer Patch Submitter Suggestion on appropriate code reviewers Help with good/useful code reviews 4
  • 5. TODAY’S TALK OUTLINE Part I: Code Reviewer Recommendation System Part II: Prediction Model for Review Usefulness 5
  • 6. TODAY’S TALK OUTLINE Part III: Impact of Continuous Integration on Code Reviews 6
  • 7. Part I: Code Reviewer Recommendation 7
  • 8. 8  FOR Novice developers Distributed software development Delayed 12 days (Thongtanunam et al, SANER 2015)
  • 9. EXISTING LITERATURE  Line Change History (LCH)  ReviewBot (Balachandran, ICSE 2013)  File Path Similarity (FPS)  RevFinder (Thongtanunam et al, SANER 2015)  FPS (Thongtanunam et al, CHASE 2014)  Tie (Xia et al, ICSME 2015)  Code Review Content and Comments  Tie (Xia et al, ICSME 2015)  SNA (Yu et al, ICSME 2014) 9  Issues & Limitations  Mine developer’s contributions from within a single project only.  Library & Technology Similarity Library Technology
  • 10. OUTLINE OF THIS STUDY 10 Vendasta codebase CORRECT Evaluation using VendAsta code base Evaluation using Open Source Projects Conclusion Comparative study Exploratory study 3 Research questions
  • 11. EXPLORATORY STUDY ( 3 RQS) RQ1: How frequently do the commercial software projects reuse external libraries from within the codebase? RQ2: Does the experience of a developer with such libraries matter in code reviewer selection by other developers? RQ3: How frequently do the commercial projects adopt specialized technologies (e.g., taskqueue, mapreduce, urlfetch)? 11
  • 12. DATASET: EXPLORATORY STUDY 12  Each project has at least 750 closed pull requests.  Each library is used at least 10 times on average.  Each technology is used at least 5 times on average. 10 utility libraries (Vendasta) 10 commercial projects (Vendasta) 10 Google App Engine Technologies
  • 13. LIBRARY USAGE IN COMMERCIAL PROJECTS (ANSWERED: EXP-RQ1 )  Empirical library usage frequency in 10 projects  Mostly used: vtest, vauth, and vapi  Least used: vlogs, vmonitor 13
  • 14. LIBRARY USAGE IN PULL REQUESTS (ANSWERED: EXP-RQ2)  30%-70% of pull requests used at least one of the 10 libraries  87%-100% of library authors recommended as code reviewers in the projects using those libraries  Library experience really matters! 14 % of PR using selected libraries % of library authors as code reviewers
  • 15. SPECIALIZED TECHNOLOGY USAGE IN PROJECTS (ANSWERED: EXP-RQ3)  Empirical technology usage frequency in top 10 commercial projects  Champion technology: mapreduce 15
  • 16. TECHNOLOGY USAGE IN PULL REQUESTS (ANSWERED: EXP-RQ3)  20%-60% of the pull requests used at least one of the 10 specialized technologies.  Mostly used in: ARM, CS and VBC 16
  • 17. SUMMARY OF EXPLORATORY FINDINGS 17 About 50% of the pull requests used one or more of the selected libraries. (Exp-RQ1) About 98% of the library authors were later recommended as pull request reviewers. (Exp-RQ2) About 35% of the pull requests used one or more specialized technologies. (Exp-RQ3) Library experience and Specialized technology experience really matter in code reviewer selection/recommendation
  • 18. CORRECT: CODE REVIEWER RECOMMENDATION IN GITHUB USING CROSS-PROJECT & TECHNOLOGY EXPERIENCE 18
  • 19. CORRECT: CODE REVIEWER RECOMMENDATION 19 R1 R2 R3 PR Review R1 PR Review R2 PR Review R3 Review Similarity Review Similarity
  • 20. OUR CONTRIBUTIONS 20 State-of-the-art (Thongtanunam et al, SANER 2015) IF IF Our proposed technique--CORRECT = New PR = Reviewed PR = Source file = External library & specialized technology
  • 21. EVALUATION OF CORRECT  Two evaluations using-- (1) Vendasta codebase (2) Open source software projects 21 1: Are library experience and technology experience useful proxies for code review skills? 2: Does CoRReCT outperform the baseline technique for reviewer recommendation? 3: Does CoRReCT perform equally/comparably for both private and public codebase? 4: Does CoRReCT show bias to any of the development frameworks
  • 22. EXPERIMENTAL DATASET  Sliding window of 30 past requests for learning.  Metrics: Top-K Accuracy, Mean Precision (MP), Mean Recall (MR), and Mean Reciprocal rank (MRR). 22 10 Python projects 2 Python, 2 Java & 2 Ruby projects 13,081 Pull requests 4,034 Pull requests Code reviews Code reviewers Gold set
  • 23. LIBRARY EXPERIENCE & TECHNOLOGY EXPERIENCE (ANSWERED: RQ1) Metric Library Similarity Technology Similarity Combined Similarity Top-3 Top-5 Top-3 Top-5 Top-3 Top-5 Accuracy 83.57% 92.02% 82.18% 91.83% 83.75% 92.15% MRR 0.66 0.67 0.62 0.64 0.65 0.67 MP 65.93% 85.28% 62.99% 83.93% 65.98% 85.93% MR 58.34% 80.77% 55.77% 79.50% 58.43% 81.39% 23 [ MP = Mean Precision, MR = Mean Recall, MRR = Mean Reciprocal Rank ]  Both library experience and technology experience are found as good proxies, provide over 90% accuracy.  Combined experience provides the maximum performance.  92.15% recommendation accuracy with 85.93% precision and 81.39% recall.  Evaluation results align with exploratory study findings.
  • 24. COMPARATIVE STUDY FINDINGS (ANSWERED: RQ2)  CoRReCT performs better than the competing technique in all metrics (p-value=0.003<0.05 for Top-5 accuracy)  Performs better both on average and on individual projects.  RevFinder uses PR similarity using source file name and file’s directory matching 24 Metric RevFinder[18] CoRReCT Top-5 Top-5 Accuracy 80.72% 92.15% MRR 0.65 0.67 MP 77.24% 85.93% MR 73.27% 81.39% [ MP = Mean Precision, MR = Mean Recall, MRR = Mean Reciprocal Rank ]
  • 25. COMPARISON ON OPEN SOURCE PROJECTS (ANSWERED: RQ3)  In OSS projects, CoRReCT also performs better than the baseline technique.  85.20% accuracy with 84.76% precision and 78.73% recall, and not significantly different than earlier (p-value=0.239>0.05 for precision)  Results for private and public codebase are quite close. 25 Metric RevFinder [18] CoRReCT (OSS) CoRReCT (VA) Top-5 Top-5 Top-5 Accuracy 62.90% 85.20% 92.15% MRR 0.55 0.69 0.67 MP 62.57% 84.76% 85.93% MR 58.63% 78.73% 81.39% [ MP = Mean Precision, MR = Mean Recall, MRR = Mean Reciprocal Rank ]
  • 26. COMPARISON ON DIFFERENT PLATFORMS (ANSWERED: RQ4) Metrics Python Java Ruby Beets St2 Avg. OkHttp Orientdb Avg. Rubocop Vagrant Avg. Accuracy 93.06% 79.20% 86.13% 88.77% 81.27% 85.02% 89.53% 79.38% 84.46% MRR 0.82 0.49 0.66 0.61 0.76 0.69 0.76 0.71 0.74 MP 93.06% 77.85% 85.46% 88.69% 81.27% 84.98% 88.49% 79.17% 83.83% MR 87.36% 74.54% 80.95% 85.33% 76.27% 80.80% 81.49% 67.36% 74.43% 26 [ MP = Mean Precision, MR = Mean Recall, MRR = Mean Reciprocal Rank ]  In OSS projects, results for different platforms look surprisingly close except the recall.  Accuracy and precision are close to 85% on average.  CORRECT does NOT show any bias to any particular platform.
  • 27. THREATS TO VALIDITY  Threats to Internal Validity  Skewed dataset: Each of the 10 selected projects is medium sized (i.e., 1.1K PR) except CS.  Threats to External Validity  Limited OSS dataset: Only 6 OSS projects considered— not sufficient for generalization.  Issue of heavy PRs: PRs containing hundreds of files can make the recommendation slower.  Threats to Construct Validity  Top-K Accuracy: Does the metric represent effectiveness of the technique? Widely used by relevant literature (Thongtanunam et al, SANER 2015) 27
  • 28. TAKE-HOME MESSAGES (PART I) [figure slide: six numbered take-away points, summarized in the speaker notes for this slide]
  • 29. Part II: Prediction Model for Code Review Usefulness 29
  • 30. RESEARCH PROBLEM: USEFULNESS OF CODE REVIEW COMMENTS  What makes a review comment useful or non-useful?  34.5% of review comments at Microsoft are non-useful (Bosu et al., MSR 2015)  No automated support so far to detect or improve such comments
  • 31. STUDY METHODOLOGY  1,482 review comments (4 systems)  Manual tagging using the change-trigger heuristic of Bosu et al. (MSR 2015)  Non-useful comments (602)  Useful comments (880)  (1) Comparative study (2) Prediction model
  • 32. COMPARATIVE STUDY: VARIABLES
    Independent Variables (8)                  Response Variable (1)
    Textual:     Reading Ease                  Comment Usefulness (Yes / No)
                 Stop Word Ratio
                 Question Ratio
                 Code Element Ratio
                 Conceptual Similarity
    Experience:  Code Authorship
                 Code Reviewership
                 External Lib. Experience
     Contrast between useful and non-useful comments.
     Two paradigms: comment texts, and commenter's/developer's experience.
     Answers two RQs, one per paradigm.
  • 33. ANSWERING RQ1: READING EASE  Flesch-Kincaid Reading Ease applied.  No significant difference between useful and non-useful review comments.
  • 34. ANSWERING RQ1: STOP WORD RATIO  Used Google's stop word list and Python keywords.  Stop word ratio = #stop words or keywords / #all words in a review comment.  Non-useful comments contain significantly more stop words than useful comments.
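A minimal sketch of the feature above; the stand-in stop word list is an assumption (the study used Google's list), while the Python keywords come from the standard library.

```python
import keyword
import re

STOP_WORDS = {"the", "a", "an", "is", "are", "to", "of", "in", "this", "that"}  # stand-in list
PY_KEYWORDS = set(keyword.kwlist)

def stop_word_ratio(comment: str) -> float:
    """#stop words or keywords / #all words in the review comment."""
    words = re.findall(r"[A-Za-z_]+", comment.lower())
    if not words:
        return 0.0
    stops = sum(1 for w in words if w in STOP_WORDS or w in PY_KEYWORDS)
    return stops / len(words)
```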
  • 35. ANSWERING RQ1: QUESTION RATIO  Developers often treat clarification questions as a sign of non-useful review comments.  Question ratio = #questions / #sentences in a comment.  However, no significant difference between useful and non-useful comments in question ratio.
  • 36. ANSWERING RQ1: CODE ELEMENT RATIO  Important code elements (e.g., identifiers) in the comment text possibly trigger the code change.  Code element ratio = #source tokens / #all tokens.  Useful comments have a significantly higher code element ratio than non-useful comments.
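A minimal sketch of the ratio above; the regular-expression heuristics for spotting code elements (camelCase, snake_case, call syntax) are assumptions, not the study's exact tokenizer.

```python
import re

# Heuristic patterns for code-like tokens (illustrative assumptions).
CODE_TOKEN = re.compile(
    r"[a-z]+[A-Z]\w*"   # camelCase identifiers
    r"|\w+_\w+"         # snake_case identifiers
    r"|\w+\(\)"         # function calls such as foo()
)

def code_element_ratio(comment: str) -> float:
    """#source-code-like tokens / #all tokens in the review comment."""
    tokens = comment.split()
    if not tokens:
        return 0.0
    code_tokens = sum(1 for t in tokens if CODE_TOKEN.search(t))
    return code_tokens / len(tokens)
```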
  • 37. ANSWERING RQ1: CONCEPTUAL SIMILARITY BETWEEN COMMENTS & CHANGED CODE  How relevant is the comment to the changed code? Do comments & changed code share vocabularies?  Yes: useful comments share significantly more vocabulary with the changed code than non-useful ones.
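A minimal sketch of vocabulary sharing as cosine similarity over word sets; the study's exact similarity measure is not spelled out on the slide, so treat this as an illustrative stand-in.

```python
import re
from math import sqrt

def vocabulary(text: str) -> set:
    """Lower-cased word set of a text fragment."""
    return set(re.findall(r"[A-Za-z_]\w*", text.lower()))

def conceptual_similarity(comment: str, changed_code: str) -> float:
    """Cosine similarity over the (binary) vocabularies of comment and code."""
    vc, vd = vocabulary(comment), vocabulary(changed_code)
    if not vc or not vd:
        return 0.0
    return len(vc & vd) / sqrt(len(vc) * len(vd))
```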
  • 38. ANSWERING RQ2: CODE AUTHORSHIP  File-level authorship did not make much difference, which is a bit counter-intuitive.  Project-level authorship differs between useful and non-useful comments, mostly for Q2 and Q3.
  • 39. ANSWERING RQ2: CODE REVIEWERSHIP  Does reviewing experience matter in providing useful comments?  Yes, it does: file-level reviewing experience matters, especially for Q2 and Q3.  Experienced reviewers provide more useful comments than non-useful ones.
  • 40. ANSWERING RQ2: EXT. LIB. EXPERIENCE  Familiarity with the libraries used in the changed code for which the comment is posted.  Significantly higher for the authors of useful comments, but only for Q3.
  • 41. SUMMARY OF COMPARATIVE STUDY
    RQ     Independent Variable        Useful vs. Non-useful Difference
    RQ1    Reading Ease                Not significant
           Stop Word Ratio             Significant
           Question Ratio              Not significant
           Code Element Ratio          Significant
           Conceptual Similarity      Significant
    RQ2    Code Authorship             Somewhat significant
           Code Reviewership           Significant
           External Lib. Experience    Somewhat significant
  • 42. EXPERIMENTAL DATASET & SETUP  1,482 code review comments, split into an evaluation set (1,116) for model training & cross-validation and a validation set (366) for validation with unseen comments.
  • 43. REVHELPER: USEFULNESS PREDICTION MODEL  Pipeline: review comments → manual classification using the heuristic of Bosu et al. → useful & non-useful comments → model training → prediction model → prediction for a new review comment.  Predicts the usefulness of a new review comment before it is submitted.  Applied three ML algorithms: Naïve Bayes (NB), Logistic Regression (LR), and Random Forest (RF).  Evaluation & validation with separate data sets.  Answers three RQs: RQ3, RQ4, and RQ5.
  • 44. ANSWERING RQ3: MODEL PERFORMANCE
    Learning Algorithm      Useful Comments           Non-useful Comments
                            Precision    Recall       Precision    Recall
    Naïve Bayes             61.30%       66.00%       53.30%       48.20%
    Logistic Regression     60.70%       71.40%       54.60%       42.80%
    Random Forest           67.93%       75.04%       63.06%       54.54%
     The Random Forest based model performs the best: both F1-score and accuracy are about 66%.
     Comment usefulness and the features are not linearly correlated.
     As a first attempt, this prediction could be useful.
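A minimal sketch of the evaluation setup (Random Forest with 10-fold cross-validation); scikit-learn and the placeholder data are assumptions, since the slide does not name a toolkit, and real feature extraction is elided.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Placeholder data: 1,116 evaluation comments, 8 features each (see slide 32).
rng = np.random.default_rng(42)
X = rng.random((1116, 8))
y = rng.integers(0, 2, 1116)  # 1 = useful, 0 = non-useful

model = RandomForestClassifier(n_estimators=100, random_state=42)
scores = cross_val_score(model, X, y, cv=10, scoring="accuracy")
print(f"10-fold accuracy: {scores.mean():.2%}")
```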
  • 45. ANSWERING RQ4: ROLE OF PARADIGMS 45
  • 46. ANSWERING RQ4: ROLE OF PARADIGMS 46
  • 47. ANSWERING RQ5: COMPARISON WITH BASELINE (VALIDATION) 47
  • 48. ANSWERING RQ5: COMPARISON WITH BASELINE (ROC) 48
  • 49. TAKE-HOME MESSAGES (PART II)  Usefulness of review comments is a complex but much-needed piece of information.  No automated support has been available so far to instantly predict the usefulness of review comments.  Non-useful comments differ significantly from useful comments in several textual features (e.g., conceptual similarity).  Reviewing experience matters in providing useful review comments.  Our prediction model can predict the usefulness of a new review comment.  RevHelper performs better than random guessing and the available alternatives.
  • 50. Part III: Impact of Continuous Integration on Code Reviews 50
  • 51. RESEARCH PROBLEM: IMPACT OF AUTOMATED BUILDS ON CODE REVIEWS  Automated builds are an important part of CI for commit merging & consistency.  Automated builds have increased exponentially over the years with Travis CI.  Builds & code reviews are interleaving steps in pull-based development.
 RQ1: Does the status of automated builds influence code review participation in open source projects?
 RQ2: Do frequent automated builds help improve the overall quality of peer code reviews?
 RQ3: Can we automatically predict whether an automated build will trigger new code reviews?
  • 52. ANSWERING RQ1: BUILD STATUS & REVIEW PARTICIPATION
    Build Status   Build Only   Builds + Reviews   Total
    Canceled       2,616        1,368              3,984
    Errored        51,729       27,262             78,991
    Failed         55,546       39,025             94,571
    Passed         236,573      164,174            400,747
    All            346,464      231,829 (40%)      578,293
     578K PR-based builds with four build statuses.
     232K (40%) of the build entries are associated with code reviews.
     Chi-squared tests: build status and review participation are significantly associated (p-value = 2.2e-16 < 0.05).
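For reference, the chi-squared test on the contingency table above can be reproduced like this (a sketch; SciPy is an assumption, not necessarily the authors' tooling):

```python
from scipy.stats import chi2_contingency

#           build only, builds + reviews (counts from the table above)
table = [
    [2616,   1368],    # Canceled
    [51729,  27262],   # Errored
    [55546,  39025],   # Failed
    [236573, 164174],  # Passed
]
chi2, p, dof, _ = chi2_contingency(table)
print(f"chi2 = {chi2:.1f}, dof = {dof}, p = {p:.3g}")  # p << 0.05: association
```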
  • 53. ANSWERING RQ1: BUILD STATUS & REVIEW PARTICIPATION
    Previous Build Status   #PRs with Review Comments
                            Only Added↑   Only Removed↓   Total Changed↑↓
    Canceled                20            24              65
    Errored                 510           265             812
    Failed                  1,542         826             2,316
    Passed                  4,235         1,788           5,677
    All                     6,307         2,903           8,870 (28%)
     31,648 PRs underlying the 232K build entries from 1,000+ projects.
     For 28% of the PRs, the number of review comments changed after a build.
     Passed builds triggered 18% of the new reviews; errored and failed builds together triggered 10%.
  • 54. ANSWERING RQ2: BUILD FREQUENCY & REVIEW QUALITY
    Quantile   Issue Comments           PR Comments              All Review Comments
               M      p-value    ∆      M      p-value    ∆      M      p-value    ∆
    Q1         0.60   <0.001*    0.35   0.24   <0.001*    0.49   0.84   <0.001*    0.41
    Q4         0.99                     0.52                     1.50
    [ M = mean #review comments, * = statistically significant, ∆ = Cliff's delta ]
  • 55. ANSWERING RQ2: BUILD FREQUENCY & REVIEW QUALITY  5 projects from Q1 and 5 from Q4, each 3-4 years old.  Cumulative #review comments per build tracked over 48 months.  Code review quality (i.e., #comments) improved almost linearly for frequently built projects.  This did not hold for the less frequently built counterparts.
  • 56. ANSWERING RQ3: PREDICTION OF NEW CODE REVIEW TRIGGERING
    Learning Algorithm      Overall Accuracy   New Review Triggered
                                               Precision    Recall
    Naïve Bayes             58.03%             68.70%       29.50%
    Logistic Regression     60.56%             64.50%       47.00%
    J48                     64.04%             69.50%       50.10%
     Features: build status, code change statistics, test change statistics, and code review comments. Response: new review triggered or unchanged.
     Three ML algorithms with 10-fold cross-validation on a dataset of 26.5K build entries.
     J48 performed the best: 64% accuracy, 69.50% precision & 50% recall.
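A minimal sketch of the setup; J48 is Weka's C4.5 implementation, so scikit-learn's entropy-based decision tree is used here only as a rough stand-in, with placeholder features in place of the real build/change/review statistics.

```python
import numpy as np
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier

# Placeholder data: ~26.5K build entries, 4 feature groups flattened to 4 columns.
rng = np.random.default_rng(0)
X = rng.random((26500, 4))     # build status, code/test change stats, review stats
y = rng.integers(0, 2, 26500)  # 1 = new review triggered, 0 = unchanged

tree = DecisionTreeClassifier(criterion="entropy")  # C4.5-like splitting criterion
results = cross_validate(tree, X, y, cv=10, scoring=("accuracy", "precision", "recall"))
print({k: round(v.mean(), 3) for k, v in results.items() if k.startswith("test_")})
```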
  • 57. TAKE-HOME MESSAGES (PART III)  Automated builds might influence manual code reviews, since the two interleave in modern pull-based development.  Passed builds are the most associated with review participation and with new code reviews.  Frequently built projects received more review comments than less frequently built ones.  Code review activity stays steady over time in frequently built projects; not so in their counterparts.  Our prediction model can predict whether a build will trigger new code reviews or not.
  • 58. REPLICATION PACKAGES  CORRECT, RevHelper & Travis CI Miner  http://www.usask.ca/~masud.rahman/correct/  http://www.usask.ca/~masud.rahman/revhelper/  http://www.usask.ca/~masud.rahman/msrch/travis/ Please contact Masud Rahman (masud.rahman@usask.ca) for further details about these studies and replications. 58
  • 59. PUBLISHED PAPERS
 [1] M. M. Rahman, C. K. Roy, and J. Collins, "CORRECT: Code Reviewer Recommendation in GitHub Based on Cross-Project and Technology Experience", In Proceedings of the 38th International Conference on Software Engineering Companion (ICSE-C 2016), pp. 222--231, Austin, Texas, USA, May 2016.
 [2] M. M. Rahman, C. K. Roy, J. Redl, and J. Collins, "CORRECT: Code Reviewer Recommendation at GitHub for Vendasta Technologies", In Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering (ASE 2016), pp. 792--797, Singapore, September 2016.
 [3] M. M. Rahman, C. K. Roy, and R. G. Kula, "Predicting Usefulness of Code Review Comments using Textual Features and Developer Experience", In Proceedings of the 14th International Conference on Mining Software Repositories (MSR 2017), pp. 215--226, Buenos Aires, Argentina, May 2017.
 [4] M. M. Rahman and C. K. Roy, "Impact of Continuous Integration on Code Reviews", In Proceedings of the 14th International Conference on Mining Software Repositories (MSR 2017), pp. 499--502, Buenos Aires, Argentina, May 2017.
  • 60. THANK YOU!! QUESTIONS? 60 Email: chanchal.roy@usask.ca or masud.rahman@usask.ca

Editor's notes

  1. When I searched the web, I found this. Obviously, code review is not a very good experience all the time: (1) if you do not select appropriate reviewers, the review could be disastrous, and (2) if you do a poor code review yourself, that could also be embarrassing for you!
  2. We already had a lot of talks on code review. Anyway, just to recap, code review is a systematic examination of source code that identifies defects and coding standard violations. It helps in early bug detection and thus reduces cost. It also ensures code quality by maintaining coding standards. Code review has also evolved: first came formal inspection, which was time-consuming, slow, and costly; then came a less formal form, peer code review; and now we do tool-assisted code review, also called modern code review.
  3. There are two parties involved in modern code reviews: (1) the patch submitter and (2) the code reviewers. Since modern code review promotes tool support, these two parties actually need two different types of tool support for successful code reviews. For example, a patch submitter needs automated support in finding appropriate code reviewers for his/her pull requests. On the other hand, a code reviewer needs a tool that can assist during the code review session; one type of support could be determining the usefulness of review comments in real time.
  4. In our work, we developed tool support for both of these parties. Our talk is divided into two parts: in Part I, we will talk about a code reviewer recommendation system, and in Part II, we will talk about a model that can predict the usefulness of code review comments.
  5. Since our focus is to extend automated supports in various aspects of code reviews, we conduct a third study. In this study, we analyze the impact of continuous integration on code reviews.
  6. Part I
  7. This is an example of code review at GitHub. Once a developer submits a pull request, a way of submitting changes at GitHub, the core developers/reviewers can review the changes and provide their feedback like this. Our goal in this research is to identify appropriate code reviewers for such a pull request. Identifying such code reviewers is very important, especially for novice developers who do not know the skill sets of their fellow developers. It is also essential for distributed development, where the developers rarely meet face to face. Besides, an earlier study suggests that without appropriate code reviewers the whole change submission could be 12 days late on average. However, identifying such reviewers is challenging, since the skill is not obvious and it would require massive mining of the revision history.
  8. The earlier studies analyze the line change history of source code, the file path similarity of source files, and review comments. In short, they mostly considered the work experience of a candidate code reviewer within a single project only. However, some skills span multiple projects, such as working experience with specific API libraries or specialized technologies. Also, in an industrial setting, a developer's contributions scatter throughout different projects within the company codebase. We thus consider the external libraries and APIs included in the changed code and suggest more appropriate code reviewers.
  9. This is the outline of today's talk. We collect commercial projects and libraries from the codebase of Vendasta, a medium-sized Canadian software company. Then we ask 3 research questions and conduct an exploratory study to answer them. Based on those findings, we propose our recommendation technique, CORRECT. In the experiments, we evaluate on commercial projects, compare with the state-of-the-art, and also experiment with open source projects. Finally, we conclude the talk.
  10. We ask these three research questions. In a commercial codebase, there are two types of projects: customer projects and utility projects. The utility projects are also called libraries. We ask: How frequently do the commercial software projects reuse external libraries in their code? Does working experience with such libraries matter in code reviewer selection, that is, does a reviewer with such experience get preference over the others? Does working experience with specialized technologies such as mapreduce or taskqueue matter in code reviewer selection?
  11. This is the connectivity graph of core projects and internal libraries from the Vendasta codebase. We see the graph is pretty well connected; that means most of the libraries are used by most of the projects. For the study, we chose 10 projects and 10 internal libraries, selected based on certain restrictions. Each project should have 750 closed pull requests, meaning it should be pretty big and, most importantly, pretty active. Each internal library should be used at least 10 times on average by each of those projects. Each specialized technology should be used at least 5 times on average by each of those projects. We consider the Google App Engine libraries as the specialized technologies, 10 of them in total.
  12. This is the usage frequency of the selected libraries in the 10 projects we selected for the study. We take the latest snapshot of each project, analyze its source files, and look for imported libraries using an AST parser. We look for those 10 libraries in particular, and this is the box plot of their frequencies. We can see that vtest, vauth, and vapi are the most used, which kind of makes sense, especially for vtest and vauth, since they likely provide generic testing and authentication support. However, vtest has a large variance; some projects used it extensively whereas others didn't use it at all. The least used libraries are vlogs and vmonitor. So, these are the empirical frequencies.
  13. We investigated the ratio of pull requests that used any of the selected libraries. We note that 30%-70% of all pull requests did so in different projects. We also investigated the percentage of library authors who were later recommended as code reviewers for the projects referring to that library. We considered a developer a library author if he/she authored at least one pull request of the library. We note that almost 100% of the authors were later recommended. This is a very interesting finding, suggesting that library experience really matters.
  14. We also calculated the empirical frequency of the ten specialized technologies in the selected Vendasta projects, and this is the box plot. We can see that mapreduce is the champion technology here, and the rest are close competitors.
  15. In the case of the pull requests, 20%-60% of them used at least one of the ten specialized technologies, mostly in ARM, CS, and VBC. So, specialized technologies are also used quite significantly in our selected projects.
  16. So, here are the empirical findings from the exploratory studies we conducted. They suggest that library experience and specialized technology experience really matter. These are new findings, and we exploit them to develop the recommendation algorithm later.
  17. Based on those exploratory findings, we propose CORRECT: Code Reviewer Recommendation based on Cross-Project and Technology Experience.
  18. This is our recommendation algorithm. Once a new pull request R3 is created, we analyze its commits, then its source files, and look for the libraries referred to and the specialized technologies used. Thus, we get a library token list and a technology token list. We combine both lists, and this combined list can be considered a summary of libraries and technologies for the new pull request. Now, we consider the latest 10 closed pull requests and collect their library and technology tokens. It should be noted that the past requests contain their code reviewers. We then estimate the similarity between the new request and each of the past requests, using the cosine similarity score between their token lists, and add that score to the corresponding code reviewers. This way, we finally get a list of reviewers with scores accumulated over different past reviews. They are then ranked, and the top reviewers are recommended. Thus, we use the pull request similarity score to estimate the relevant expertise of code reviewers.
  19. Now, to be technically specific: the state-of-the-art considers two pull requests relevant/similar if they share source code files or directories. On the other hand, we suggest that two pull requests are relevant/similar if they share the same external libraries and specialized technologies. That's the major difference in methodology and our core technical contribution.
  20. We performed two evaluations: one with the Vendasta codebase and the other with the open source codebase. From those experiments, we try to answer four research questions. Are library experience and technology experience useful proxies for code review skills? Can our technique outperform the state-of-the-art technique from the literature? Does it perform equally well for closed source and open source projects? Does it show any bias toward any particular platform?
  21. We conducted experiments using 10 projects from the Vendasta codebase at GitHub and 6 projects from the open source domain. From Vendasta, we collected 13K pull requests, and from open source, we collected 4K pull requests. Gold reviewers are collected from the corresponding pull requests. Vendasta projects are Python-based, whereas the OSS projects are written in Python, Java, and Ruby. We consider four performance metrics: accuracy, precision, recall, and reciprocal rank. In the case of accuracy, if the recommendation contains at least one gold reviewer, we consider the recommendation accurate.
  22. This is how we answer the first RQ. We see that both library similarity and technology similarity are pretty good proxies for code review skills; each of them provides over 90% Top-5 accuracy. However, when we combine them, we get the maximum: 92% Top-5 accuracy. The precision and recall are also greater than 80%, which is highly promising according to the relevant literature.
  23. We then compare with the state-of-the-art, RevFinder. We found that our performance is significantly better than theirs; we get a p-value of 0.003 for Top-5 accuracy with Mann-Whitney U tests. The median accuracy is 95%, and the median precision and recall are between 85% and 90%. On individual projects, our technique also outperformed the state-of-the-art.
  24. We also experimented using 6 open source projects and found 85% Top-5 accuracy. Precision and recall are not significantly different from those with the Vendasta projects; for example, for precision we get a p-value of 0.239, which is greater than 0.05.
  25. This slide shows how CORRECT performed with the projects from 3 programming platforms: Python, Java, and Ruby. We find quite similar performance for each of the platforms, which is interesting. This shows that our findings with commercial projects are quite generalizable.
  26. There are a few threats to the validity of our findings. The dataset from the VA codebase is a bit skewed: most of the projects are medium sized and only one project is big. Also, the set of projects considered from the open source domain is limited. Finally, the technique could be slower for big pull requests.
  27. Now to summarize: code review could be unpleasant or unproductive without appropriate code reviewers. We first motivated our technique using an exploratory study, which suggested that library experience and specialized technology experience really matter for code reviewer selection. Then we proposed our technique, CORRECT, which learns from past review history and then recommends. We experimented using both commercial and open source projects and compared with the state-of-the-art. The results clearly demonstrate the high potential of our technique.
  28. For example, these are two review comments. This one triggers a new code change whereas the other one does not. Professional developers at Microsoft considered the change-triggering comment as the useful one. In this research, we try to answer what makes a review comment useful and, if a comment is not useful, whether we can identify that before sending it to the patch submitter. About 35% of the review comments are found non-useful at Microsoft, which is a significant amount. Unfortunately, there exists no automated support for detecting or improving such comments so far. So, here comes our study.
  29. So, in this research, we try to understand what makes a review comment useful. For that, we need real code review data. We collaborated with Vendasta, a local software company with 150+ employees, and collected about 1,500 recent review comments from four of their systems. Then we applied the change-triggering heuristic of an earlier study and manually annotated the comments: if a review comment triggers a new code change in its vicinity, it is a useful comment, and vice versa. Now we have two distinct sets of comments. We do a comparative analysis between them, which is our first contribution. We also develop a prediction model so that non-useful comments can be predicted before being submitted to the developers. The goal is to improve non-useful review comments through identification and possible suggestions.
  30. In the comparative study, we contrast useful and non-useful comments. We have independent and response variables, and we consider two paradigms: comment texts and the corresponding developer's experience. In the textual paradigm, we consider five features of the comments, such as reading ease, code element ratio, and conceptual similarity between the comment text and the changed code. In the experience paradigm, we consider the authorship, reviewing experience, and library experience of the developers. We have one response variable: whether the comment is useful or not (yes or no).
  31. We applied Flesch-Kincaid reading ease to both groups of review comments, which returns an ease score between 0 and 100. We did not find any significant difference in these scores between useful and non-useful comments.
  32. The second variable is stop word ratio. We used Google's stop word list and the Python keyword list, since our codebase is in Python. We found that non-useful comments contain a significantly higher ratio of stop words than useful comments. The finding is also confirmed by quartile-level analysis.
  33. Developers often treat clarification questions as a sign of non-useful comments. So, we determine question ratio = #questions / #sentences for each of the comments. We did not find any evidence for the developers' claim: there is no significant difference between useful and non-useful comments in question ratio.
  34. From our manual observations, we note that review comments often contain relevant code elements such as method names or identifiers. We therefore also contrast the code element ratio between useful and non-useful comments. We found that useful comments contain more code elements than non-useful comments.
  35. How relevant is the comment to the changed code for which it is posted? This can be determined by the lexical similarity between the comment and the changed code; similar concepts are applied to Stack Overflow questions and answers. We found that useful comments are conceptually more similar to their target code than the non-useful comments. That is, you have to use vocabulary familiar to the patch submitter to make your comment useful.
  36. As opposed to earlier findings, file-level authorship did not make much difference: 46% of the useful comment authors changed the file at least once, while the same statistic for non-useful comments is 49%. However, in the case of project-level authorship, we found some difference between useful and non-useful comments, mostly for Q2 and Q3. For more details, please read the paper.
  37. We also investigate whether reviewing experience differs between the authors of useful and non-useful comments. We found that file-level reviewing experience matters here: reviewing experience reduces the number of non-useful comments from a developer. Also, experienced developers provide more useful comments than less experienced developers.
  38. We repeat the same for external library experience and found that the developers' experience behind useful comments is slightly higher than that behind non-useful comments.
  39. So, these are the summaries from our comparative study. We found that three textual features are significantly different between useful and non-useful comments, which is interesting; no studies explored these aspects earlier. We also confirmed some of the earlier findings on the developer experience paradigm.
  40. We divide our dataset into two groups: the evaluation set and the validation set. With the evaluation set, we evaluate our technique with 10-fold cross-validation. With the validation set, we validate the performance of our prediction model.
  41. These are the steps of our model development. We first annotate the comments using the change-triggering heuristic. Then we extract the textual and experience features of the comments and train our model using three learning algorithms. We then use the model to predict whether a new review comment is useful or not. If a review comment is not useful, we can also explain that with our identified features.
  42. Our evaluation shows that the RF-based model performs the best; in particular, it can identify non-useful comments better than the other variants. Since RF is a tree-based model, this also confirms that comment usefulness and the features are NOT linearly correlated. The model provides 66% accuracy with 10-fold cross-validation. As a first of its kind, this could be useful; something is always better than nothing, we believe.
  43. We also investigate how each of the features performs or contributes during comment usefulness prediction. We see that conceptual similarity, a textual feature, has the strongest role in the prediction; for example, if we discard that feature, our prediction accuracy drops by 25%. The next most important features are the reviewing experience statistics of the developers.
  44. We also contrast two groups of features: textual features and experience features. While experience is a strong metric overall, it has limited practical applicability during usefulness prediction; that is why we introduce the textual features, which also show some interesting results. The ROC and PR curves show that our prediction is much better than random guessing.
  45. We compare with five variants of a baseline model (Bosu et al., MSR 2015). That model considers a set of keywords and sentiment analysis to separate useful and non-useful comments. However, according to our experiments, these metrics are not sufficient, especially for usefulness prediction. We see that our identified features are more effective, and our prediction is much more accurate than theirs.
  46. This is the ROC for our model and the competing models. While the baseline models are close to random guessing, we are doing much better. Obviously, these results can be further improved with more accurate features and possibly more training data.
  47. Possibly you can read them out I guess.
  48. Automated builds are an integral part of CI. A build checks for commit merge success, basically whether the patch is breaking something or not. From the challenge data, we see a significant adoption of Travis CI over the years. In pull-based software development, automated builds and code reviews are basically interleaving steps. Both of them focus on quality assurance, but through different means: automated and manual. Given that code review is a widely adopted practice nowadays, both in industry and in open source, we tried to investigate how automated builds can influence code reviews. So, we formulate three RQs (read them out).
  49. In RQ1, we try to find out whether review participation is correlated with or affected by the automated build status. Out of 578K PR-based build entries, we found 40% are associated with code reviews. In order to contrast entries related and not related to code reviews, we extract two equal-size samples. We consider #review comments > 0 as the indication of review participation. Our chi-squared tests then reported that build status can significantly affect review participation, which rejects the null hypothesis. We also computed the Pearson correlation between build status and #commits with code reviews for each project, and found passed builds to be the most correlated with review participation. That is, people are interested in reviewing code that does not contain any visible errors (e.g., compile errors, merge errors). But errored and failed builds also received some code reviews, which is interesting.
  50. In RQ1, we also investigate whether a previous build can trigger new code reviews. We found that to be true for 28% of the pull requests in our investigation. Passed builds triggered most of them, about 18% of the code reviews, while the errored and failed builds triggered 10% of the reviews.
  51. In RQ2, we investigate how build frequency might influence code review quality. We consider the review comment count as a proxy for code review quality. We divide the projects into four quartiles based on their #build entries per month. Then we contrast the review comments from the projects of Q1 and Q4 in the box plot. We see that frequently built projects have more review comments than less frequently built projects. That is, frequent builds possibly trigger further code reviews compared to the counterparts. The statistical tests below also confirmed these findings.
  52. We also investigate how build frequency can affect code review activities over a certain period. For this, we select the 5 most frequently built projects from Q4 and the 5 least frequently built projects from Q1, each of them 3-4 years old. Then, for each month, we calculate the review comment count per build for each project; the plot actually shows the cumulative version. We see that frequently built projects have an almost linear increase in their review comments. The same did not happen for the least frequent projects; their curves look zigzag. This suggests that regular, frequent builds might have encouraged code reviews and kept up a steady flow.
  53. Whether a particular build needs further code reviews is a vital piece of information for project management. We apply ML on the challenge data to predict it, using 3 algorithms with build status, code change stats, test change stats, and previous code review stats as features. J48 performed the best in our experiment, with 70% precision and 50% recall.
  54. Possibly you can read them out I guess.
  55. Thanks for your time. Questions!!