The “Bellwether” Effect and Its Implications to Transfer Learning
Rahul Krishna (rkrish11@ncsu.edu), Tim Menzies, and Wei Fu
Today’s topic: Transfer Learning
[Turhan09] Data from Turkish toasters can predict defects in NASA flight systems
WeTOSM ‘14
Today’s topic: Simpler Transfer Learning with
“Bell… what?”
Definitions
Bellwether effect
• If a community builds many software projects
• There exists one ∈ many from which
• quality predictors can be built …
• … and used for all
Bellwether method
• find the one
• use it
Note: vastly simpler than other transfer learning methods [Turhan09, Turhan11, Nam13, etc.]
Outline
● Motivation
● Background
○ Evaluating Quality
○ Transfer Learning
○ The “Bellwether”
● Experimental Setup
○ Benchmark Data
○ Prediction Model
○ Statistical Measures
● Results
● Conclusions
The “Cold-Start” Problem
Past Projects → Prediction Model → Upcoming releases (?)
Challenges: Variable Datasets
Growing Volume of Projects
“… New projects are always emerging, and old ones are being rewritten …”
“… the quality, representativeness, and volume of the training data have a major influence on the usefulness and stability of model performance …”
(Rahman et al. [Rah12])
Challenges: Conclusion Instability
• Unstable conclusions are typical in SE [Menzies12]
• Usefulness of some lesson “X” is contradictory
Kitchenham et al. ‘07
• Are data from other organizations as useful as local data?
• Inconclusive
• 3 cases: Just as good. 4 cases: Worse.
Zimmermann et al. ‘09
• 622 pairs of projects
• Only 4% of pairs were useful
How to Reduce this Instability?
• Menzies et al. [Men12] offer several ways
• They ask for better experimental practice.
• Is there a better way?
• Yes! Look for the “Bellwether”
• As long as the bellwether continues to offer good quality predictions
• Then conclusions from one …
• … are conclusions for all
Estimating Quality: Why not Static Analyzers?
• Rahman et al. [Rahman14] compared
  • Code analysis tools: FindBugs, JLint, and PMD
  • with static code defect predictors
  • Found no difference (measurement: AUCEC)
• And, using lightweight parsers, defect predictors can quickly jump to new languages
  • The same is not true for static code analysis tools
• Fewer bugs, better software
Estimating Quality: Static Code Defect Prediction
1. Ubiquitous
• Researchers and industrial practitioners frequently use them, e.g., companies like Google [Lew14] and V&V books [Raktin01]
2. A lot of (ongoing) research
• Tremendous attention [Nam13]
• Better approaches are constantly being proposed
3. They are easy to use
• Software metrics can be collected fast
• Wide variety of tools, open source data miners [sklearn][weka]
• And they work surprisingly well!
• [Ostrand04]: ~80% of the bugs localized in 20% of the code
Transfer Learning: Introduction
• Extract knowledge from source (S) and apply to target (T)
• Data needs to be massaged before use [Zhang15]
  • Careful sub-sampling
  • Transformation
• Based on data source, TL is categorized as:
  • Homogeneous vs. Heterogeneous
• Based on transformation [Nam13, Nam15, Jing15]
  • Similarity vs. Dimensionality
Transfer Learning: Categories
Homogeneous
• Source (S) and Target (T) are quantified using the same attributes
Heterogeneous
• Source (S) and Target (T) are quantified using different attributes
Similarity
• Learn from subsampled rows/columns of the source (S)
Dimensionality
• Manipulate rows/columns of source (S) to match target (T)
This Talk: Homogeneous, Similarity
Homogeneous TL: Burak Filter
• Burak [Tur09] used relevancy filtering
  • Filter using kNN
• Gather two sets of data
  • Validation set (S): test data
  • Candidate set (T): training data
• Use kNN
  • Pick “similar” instances from T
  • Filter T using S
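
The filtering step above can be sketched in a few lines. This is a minimal illustration, not the code from [Tur09]: the function name `knn_filter`, the Euclidean distance, and the default `k=10` are assumptions made here, and the toy rows are invented for the example.

```python
import math

def knn_filter(validation_rows, candidate_rows, k=10):
    """Keep candidate rows (T) that are among the k nearest
    neighbours of at least one validation row (S).

    validation_rows: list of feature tuples (labels are not needed).
    candidate_rows:  list of (features, label) pairs.
    """
    keep = set()
    for v in validation_rows:
        # Rank candidate indices by distance to this validation row.
        ranked = sorted(
            range(len(candidate_rows)),
            key=lambda i: math.dist(v, candidate_rows[i][0]),
        )
        keep.update(ranked[:k])
    # Preserve the original order of the surviving candidates.
    return [candidate_rows[i] for i in sorted(keep)]

# Toy data (hypothetical): four candidates sit near the validation rows,
# one sits far away and should be filtered out.
candidates = [
    ((0.0, 0.0), 0), ((0.1, 0.0), 1),
    ((5.0, 5.0), 0), ((5.1, 5.0), 1),
    ((9.0, 9.0), 0),
]
validation = [(0.0, 0.1), (5.0, 5.1)]
filtered = knn_filter(validation, candidates, k=2)
```

A learner is then trained on `filtered` instead of on all of T.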
Homogeneous TL: Burak Filter
• First study on relevancy
• Their conclusion:
“… The performances of defect predictors based on the NN-filtered data do not give necessary empirical evidence to make a strong conclusion …”
“… Sometimes NN data based models may perform better than WC data based models …”
Homogeneous TL:
Mixed Model Learner
• Turhan et al.[Tur11] proposed a mixed-model learner
• Combine local data with curated non-local data
• Gather two sets of data
• Validation set (S): Pick a random 10% of local data
• Candidate set (T): Remaining 90% and non-local data
• For non-local data, they use Burak filter[Tur09]
• Experiment with various 90%-10% splits
• 400 experiments were conducted to pick the best model
Homogeneous TL: Mixed Model Learner
• Extension to Burak Filter
• Incorporated local data
Challenges
• Similar issues as Burak Filter
• Biased; unstable model.
• The authors report:
“… mixed project models offer only limited improvements, i.e., 3 out of 10 projects” (Turhan ‘11)
Homogeneous TL: Addressing the challenges
• Researchers have offered a bleak view of TL
• Zimmermann et al. [Zimm09]
  • Transfer is not always consistent
  • IE could learn from Firefox but not vice versa
• Rahman et al. [Rahman12]
  • The “imprecision” of learning across projects
• Recent research has resorted to more complex approaches
More Transfer Learners …
Is this complexity necessary?
• Short answer: No
• Just look for the “Bellwether”
  • Use our bellwether method
  • Build your model
  • Et voilà!
The Bellwether Method
Generate
• Project pairs (Pi, Pj)
• Perform a leave-one-out test: train on Pi, test on Pj
• Pick the project with the best model
Apply
• Predict quality on future projects
Monitor
• When predictions fail, restart.
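
The Generate step can be sketched as a round-robin over projects. This is an illustrative sketch under stated assumptions, not the authors' implementation: `find_bellwether`, `train_threshold`, and `accuracy` are hypothetical names; the learner is a toy one-feature threshold rather than the Random Forest used in the study; the score is plain accuracy where higher is better; and per-source scores are summarized by their median.

```python
from statistics import mean, median

def train_threshold(rows):
    """Toy learner (assumption): split one metric halfway between
    the mean of the clean rows and the mean of the buggy rows."""
    clean = [x for x, label in rows if label == 0]
    buggy = [x for x, label in rows if label == 1]
    return (mean(clean) + mean(buggy)) / 2

def accuracy(threshold, rows):
    """Fraction of rows where 'x > threshold' agrees with the label."""
    hits = sum(1 for x, label in rows if (x > threshold) == (label == 1))
    return hits / len(rows)

def find_bellwether(projects, train, score):
    """Train on each project Pi, test on every other Pj, and return
    the project whose model has the best median score."""
    summary = {}
    for source, source_rows in projects.items():
        model = train(source_rows)
        scores = [score(model, rows)
                  for name, rows in projects.items() if name != source]
        summary[source] = median(scores)
    best = max(summary, key=summary.get)
    return best, summary

# Hypothetical community of three projects: (metric value, defect label).
projects = {
    "A": [(10, 0), (20, 0), (60, 1), (70, 1)],
    "B": [(5, 0), (30, 0), (50, 1), (90, 1)],
    "C": [(35, 0), (38, 0), (42, 1), (100, 1)],
}
best, summary = find_bellwether(projects, train_threshold, accuracy)
```

For Apply, the model trained on `best` is used on new projects; for Monitor, the same search is rerun once its scores start to degrade.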
Experiment Setup: Benchmark Data
• 120 Datasets from 4 communities
• Defects in 3 levels of granularity
  • File, Class, and Function
• Open source and Proprietary
Experiment Setup: Benchmark Data
• BTW, Apache has local data
  • Multiple versions
  • Temporally ordered
A total of 54 datasets
Experiment Setup: Prediction Model
• We use Random Forests [Zimmerman08]
  • Build several decision trees from random subsamples
  • Use ensemble learning
• Samples are imbalanced [Pelayo07]
  • More “clean” examples
• Use SMOTE [Chawla01] to rebalance data*
  • Randomly down-sample “clean” instances
  • Up-sample “buggy” instances
*Apply only to training data
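
The rebalancing idea can be illustrated with a simplified sketch. It is not the reference SMOTE implementation: the function name `rebalance`, the `target` size, and the neighbour count `k=2` are assumptions for this example, and it expects at least two “buggy” rows. As the slide notes, it would be applied to training data only.

```python
import math
import random

def rebalance(clean, buggy, target, k=2, seed=0):
    """Return (clean, buggy) lists with `target` rows each.
    Rows are tuples of numeric features."""
    rng = random.Random(seed)
    # Down-sample: keep a random subset of the majority ("clean") rows.
    clean_out = rng.sample(clean, target) if len(clean) > target else list(clean)
    # Up-sample: add synthetic "buggy" rows placed on the line segment
    # between a buggy row and one of its k nearest buggy neighbours.
    buggy_out = list(buggy)
    while len(buggy_out) < target:
        i = rng.randrange(len(buggy))
        others = [b for j, b in enumerate(buggy) if j != i]
        neighbours = sorted(others, key=lambda b: math.dist(buggy[i], b))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        buggy_out.append(
            tuple(x + gap * (y - x) for x, y in zip(buggy[i], other))
        )
    return clean_out, buggy_out

# Hypothetical imbalanced training set: 8 clean rows, 3 buggy rows.
clean = [(float(i), 1.0) for i in range(8)]
buggy = [(10.0, 5.0), (11.0, 6.0), (12.0, 5.0)]
clean_bal, buggy_bal = rebalance(clean, buggy, target=5)
```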
Experiment Setup: Statistical Measures
• Prediction is usually measured using ROC
  • ROC is a plot of Recall vs. False Alarm
  • Plot requires several treatments
  • Obtained by cross validation.
• We refrain from Cross-Validation
  • It tends to mix the test data with the bellwether
• Instead,
  • We use Balance [Ma07]
Experiment Setup: Statistical Measures
• Instead of a set of points for ROC,
  • Produce one point.
  • X, Y = Pd (Recall), Pf (False Alarm)
• Balance is the weighted distance from the ideal point
  • Ideal Point => (Pd, Pf) = (1, 0)
• Balance =
• The lower the Balance, the better the performance
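
The Balance formula itself did not survive in the slide text. The sketch below is one reading consistent with the description above, and it rests on assumptions: an unweighted Euclidean distance from the ideal point (Pd, Pf) = (1, 0), divided by √2 so that values fall in [0, 1]. The names `pd_pf` and `balance_distance` are hypothetical. A common variant in the defect prediction literature reports one minus this distance, so that higher is better.

```python
import math

def pd_pf(actual, predicted):
    """Recall (Pd) and false alarm rate (Pf) from 0/1 labels."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    pd = tp / (tp + fn) if tp + fn else 0.0
    pf = fp / (fp + tn) if fp + tn else 0.0
    return pd, pf

def balance_distance(pd, pf):
    """Normalized distance from the ideal point (Pd, Pf) = (1, 0).
    0 is a perfect predictor; 1 is the worst possible point."""
    return math.sqrt((1 - pd) ** 2 + (0 - pf) ** 2) / math.sqrt(2)

# Two of four predictions are right: one hit, one false alarm.
pd, pf = pd_pf([1, 1, 0, 0], [1, 0, 1, 0])
score = balance_distance(pd, pf)
```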
Experiment Setup: Statistical Measures
• Prediction Model is inherently random
  • Rerun model 40 times with different seeds
  • Collect Balance measure in every run
• Use Scott-Knott Test to compare Balance values
  • Scott-Knott ranks Balance values (best to worst)
  • Rank -> Effect Size Test + Hypothesis Test
• Why SK?
  • It’s been used by recent high profile papers at TSE [Mittas13] and ICSE [Ghotra15]
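
The repeated-run protocol can be sketched as follows. Only the collection of 40 seeded runs is shown; the Scott-Knott ranking is not implemented here. `repeated_runs` and `noisy_balance` are hypothetical names, and `noisy_balance` is a stand-in for “train the learner with this seed and return its Balance”.

```python
import random
from statistics import quantiles

def repeated_runs(evaluate, repeats=40, base_seed=1):
    """Call `evaluate` once per seed and collect the scores."""
    return [evaluate(random.Random(base_seed + i)) for i in range(repeats)]

def noisy_balance(rng):
    # Stand-in (assumption): a real run would train and test a model
    # using `rng` as its source of randomness.
    return min(1.0, max(0.0, rng.gauss(0.3, 0.05)))

scores = repeated_runs(noisy_balance)
# Quartiles summarize the spread before any ranking test is applied.
q1, q2, q3 = quantiles(scores, n=4)
```

Seeding each run explicitly keeps the 40 values reproducible, so the same distributions can be fed to a ranking test later.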
Results: Research Questions
1. How rare are “Bellwethers”?
2. How does the bellwether fare against local models?
3. Is Bellwether better than other transfer learning methods?
4. Can we predict which data set will be bellwether?
5. How much of the “Bellwether” data is required?
Results: Research Question 1
How rare are “Bellwethers”?
Research Answer
Our results suggest bellwethers are not rare.
• Community: Apache. Bellwether: Lucene
• Community: NASA. Bellwether: MC
• Community: AEEEM. Bellwether: LC
• Community: ReLink. Bellwether: Safe
Results: Research Question 2
How does the bellwether fare against local models?
Research Answer
For projects measured with the same quality metrics, training models with the bellwether is just as good as, if not better than, local models.
Results: Research Question 3
Is Bellwether better than other transfer learning methods?
Research Answer
The bellwether outperforms standard homogeneous transfer learners.
Results: Research Question 4
Can we predict which data set will be bellwether?
Research Answer
This is non-trivial. Our attempts to statistically determine whether a project will be a bellwether were unsuccessful. This is open to further examination.
Results: Research Question 5
How much data is required before detecting the “Bellwether”?
Research Answer
A few dozen defective samples from the bellwether are sufficient to build a reliable model.
Practical Implications
• The problem of generality in SE
  • Reproducibility is hard to achieve.
• With Bellwethers, transfer learners can
  • Not only be reproducible
  • But also be stable
  • And reliable
• Identification of Bellwether earlier
  • Would have changed course of research
  • More focus on coarse grain analysis
  • Less on relevancy filtering, model generation
Future Work
• Bellwethers in heterogeneous learners
  • Promising heterogeneous transfer learners [Nam15][Jing15]
  • Perform complex dimensionality mapping transforms
  • Can Bellwethers assist in finding the best mapping?
• Study and quantify bellwether
  • What makes a bellwether a bellwether?
• Bellwethers beyond defect prediction
  • Are there bellwethers in other data?
In conclusion...
• Look for bellwethers
  • To use as a baseline
  • To justify the use of transfer learning
• Stabilize the pace of conclusions
  • Not permanent conclusion stability
• Easy to find
  • Look when necessary
  • New data can be discarded
  • Updated only as they start failing

call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceCALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS LiveVip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 

The “Bellwether” Effect and Its Implications to Transfer Learning

  • 1. The “Bellwether” Effect and Its Implications to Transfer Learning. Rahul Krishna (rkrish11@ncsu.edu), Tim Menzies, and Wei Fu
  • 2. Today’s topic: Transfer Learning • [Turhan09]: Data from Turkish toasters can predict defects in NASA flight systems • WeTOSM ’14
  • 3. Today’s topic: Simpler Transfer Learning with “Bell… what?”
  • 4. Definitions • Bellwether effect: If a community builds many software projects, there exists one ∈ many from which quality predictors can be built and used for all • Bellwether method: find the one, then use it
  • 5. Definitions (continued) • Note: the bellwether method is vastly simpler than other transfer learning methods [Turhan09, Turhan11, Nam13, etc.]
  • 6. Outline • Motivation • Background: Evaluating Quality, Transfer Learning, The “Bellwether” • Experimental Setup: Benchmark Data, Prediction Model, Statistical Measures • Results • Conclusions
  • 7. The “Cold-Start” Problem • Past Projects • Prediction Model • Upcoming releases
  • 8. The “Cold-Start” Problem • Past Projects • Prediction Model • ? • Upcoming releases
  • 9. Challenges: Variable Datasets • Growing volume of projects • “… New projects are always emerging, and old ones are being rewritten …” • “… the quality, representativeness, and volume of the training data have a major influence on the usefulness and stability of model performance …” (Rahman et al. [Rah12])
  • 10. Challenges: Conclusion Instability • Unstable conclusions are typical in SE [Menzies12] • The usefulness of some lesson “X” is contradictory
  • 11. Challenges: Conclusion Instability (continued) • Kitchenham et al. ’07: Are data from other organizations as useful as local data? • Inconclusive: in 3 cases just as good, in 4 cases worse
  • 12. Challenges: Conclusion Instability (continued) • Zimmermann et al. ’09: 622 pairs of projects • Only 4% of pairs were useful
  • 13. How to Reduce this Instability? • Menzies et al. [Men12] offer several ways • They ask for better experimental practice • Is there a better way? Yes! Look for the “Bellwether” • As long as the bellwether continues to offer good quality predictions, conclusions from one are conclusions for all
  • 15. Estimating Quality: Why not Static Analyzers? • Rahman et al. [Rahman14] compared code analysis tools (FindBugs, JLint, and PMD) with static code defect predictors • They found no difference (measured by AUCEC) • Also, using lightweight parsers, defect predictors can quickly jump to new languages • The same is not true for static code analysis tools • Fewer bugs, better software
  • 16. Estimating Quality: Why not Static Analyzers? (continued) • Also, defect predictors work surprisingly well! • [Ostrand04]: ~80% of the bugs are localized in 20% of the code
  • 17. Estimating Quality: Static Code Defect Prediction • 1. Ubiquitous: researchers and industrial practitioners frequently use them, e.g., companies like Google [Lew14] and V&V books [Raktin01] • 2. A lot of (ongoing) research: tremendous attention [Nam13]; better approaches are constantly being proposed • 3. Easy to use: software metrics can be collected fast; wide variety of tools and open source data miners [sklearn][weka]
  • 19. Transfer Learning: Introduction • Extract knowledge from a source (S) and apply it to a target (T) • Data needs to be massaged before use [Zhang15]: careful sub-sampling and transformation • Based on the data source, TL is categorized as homogeneous vs. heterogeneous • Based on the transformation [Nam13, Nam15, Jing15], TL is categorized as similarity vs. dimensionality
  • 20. Transfer Learning: Categories • Homogeneous: source (S) and target (T) are quantified using the same attributes • Heterogeneous: source (S) and target (T) are quantified using different attributes • Similarity: learn from subsampled rows/columns of the source (S) • Dimensionality: manipulate rows/columns of the source (S) to match the target (T)
  • 21. Transfer Learning: Categories (continued) • This talk: homogeneous, similarity-based transfer learning
  • 22. Homogeneous TL: Burak Filter • Burak Turhan et al. [Tur09] used relevancy filtering, with kNN as the filter • Gather two sets of data: a validation set (S), which is the test data, and a candidate set (T), which is the training data • Use kNN to pick the instances of T that are “similar” to S, i.e., filter T using S
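The relevancy filter on the slide above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `knn_relevancy_filter`, the Euclidean distance, and the default `k=10` are assumptions made for the example, and rows are assumed to be numeric feature tuples.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_relevancy_filter(test_rows, candidate_rows, k=10):
    """Keep only the candidate (training) rows that are near some test row.

    test_rows:      list of feature tuples (the validation set S).
    candidate_rows: list of (features, label) pairs (the candidate set T).
    k:              neighbours kept per test row (assumed default).
    """
    selected = set()
    for t in test_rows:
        # Rank candidate indices by distance to this test row.
        ranked = sorted(
            range(len(candidate_rows)),
            key=lambda i: euclidean(t, candidate_rows[i][0]),
        )
        # A candidate picked by several test rows is kept only once.
        selected.update(ranked[:k])
    return [candidate_rows[i] for i in sorted(selected)]
```

For example, with one test row at the origin and `k=2`, only the two candidates closest to the origin survive the filter; a defect predictor would then be trained on the surviving rows only.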
  • 23. Homogeneous TL: Burak Filter • First study on relevancy • Their conclusion: “… The performances of defect predictors based on the NN-filtered data do not give necessary empirical evidence to make a strong conclusion …” “… Sometimes NN data based models may perform better than WC data based models …”
  • 24. Homogeneous TL: Mixed Model Learner • Turhan et al. [Tur11] proposed a mixed-model learner that combines local data with curated non-local data • Gather two sets of data: a validation set (S), which is a random 10% of the local data, and a candidate set (T), which is the remaining 90% plus non-local data • Non-local data is filtered with the Burak filter [Tur09] • Experiment with various 90%-10% splits • 400 experiments were conducted to pick the best model
  • 25. Homogeneous TL: Mixed Model Learner • Extension to the Burak filter that incorporates local data • Challenges: similar issues as the Burak filter; biased, unstable model • The authors report: “… mixed project models offer only limited improvements, i.e., 3 out of 10 projects” (Turhan ’11)
  • 26. Homogeneous TL: Addressing the challenges • Researchers have offered a bleak view of TL • Zimmermann et al. [Zimm09]: transfer is not always consistent; IE could learn from Firefox but not vice versa • Rahman et al. [Rahman12]: the “imprecision” of learning across projects • Recent research has resorted to more complex approaches
  • 27. More Transfer Learners … • WeTOSM ’14
  • 29. Is this complexity necessary? • Short answer: No • Just look for the “Bellwether” • Use our bellwether method • Build your model • Et voilà!
  • 31. The Bellwether Method (Generate, Apply, Monitor) • Generate: form project pairs (P_i, P_j) • Perform a leave-one-out test: train on P_i, test on P_j • Pick the project with the best model
  • 32. The Bellwether Method (Generate, Apply, Monitor) • Apply: predict quality on future projects
  • 33. The Bellwether Method (Generate, Apply, Monitor) • Monitor: when predictions fail, restart
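The Generate step above can be sketched as a round-robin over projects. This is a hedged illustration only: `find_bellwether`, the use of the median to summarize each source project's scores, and the toy threshold learner (`toy_train`, `toy_accuracy`) are all assumptions for the example; the slides do not say how per-pair scores are aggregated.

```python
from statistics import median


def find_bellwether(projects, train, evaluate):
    """Train on each project, test on every other one, return the best source.

    projects: dict mapping project name -> dataset (at least two projects).
    train:    function(dataset) -> model.
    evaluate: function(model, dataset) -> score, where higher is better.
    Aggregating with the median is an assumption of this sketch.
    """
    scores = {}
    for source, source_data in projects.items():
        model = train(source_data)
        others = [
            evaluate(model, data)
            for target, data in projects.items()
            if target != source
        ]
        scores[source] = median(others)
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical stand-ins for a real learner: a one-feature threshold model.
def toy_train(rows):
    """Model = mean of the single feature; rows are (x, label) pairs."""
    return sum(x for x, _ in rows) / len(rows)


def toy_accuracy(threshold, rows):
    """Fraction of rows where (x > threshold) matches the label."""
    hits = sum(1 for x, label in rows if (x > threshold) == bool(label))
    return hits / len(rows)
```

Apply then means reusing the model trained on the returned project for new projects, and Monitor means re-running `find_bellwether` once that model's scores start to degrade.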
  • 36. Experiment Setup: Benchmark Data • 120 datasets from 4 communities • Defects at 3 levels of granularity: file, class, and function • Open source and proprietary
  • 37. Experiment Setup: Benchmark Data • BTW, Apache has local data: multiple versions, temporally ordered • A total of 54 datasets
  • 39. Experiment Setup: Prediction Model • We use Random Forests [Zimmerman08]: build several decision trees from random subsamples and use ensemble learning • Samples are imbalanced [Pelayo07]: there are more “clean” examples • Use SMOTE [Chawla01] to rebalance the data*: randomly down-sample “clean” instances and up-sample “buggy” instances • *Applied only to the training data
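The rebalancing step above can be illustrated with a small SMOTE-style sketch in plain Python. It is not the authors' code and not a library implementation: `smote_like`, `rebalance`, the common target `size`, and `k=5` are assumptions for the example. New “buggy” rows are created by interpolating between a buggy row and one of its nearest buggy neighbours, and “clean” rows are randomly down-sampled.

```python
import math
import random


def smote_like(rows, size, k=5, rng=None):
    """Grow a minority class to `size` rows with interpolated synthetic rows."""
    rng = rng or random.Random(0)
    if len(rows) < 2:
        raise ValueError("need at least two minority rows to interpolate")
    synthetic = []
    while len(rows) + len(synthetic) < size:
        i = rng.randrange(len(rows))
        base = rows[i]
        # The k nearest other minority rows to the chosen base row.
        others = [r for j, r in enumerate(rows) if j != i]
        neighbours = sorted(others, key=lambda r: math.dist(base, r))[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> mate
        synthetic.append(tuple(b + gap * (m - b) for b, m in zip(base, mate)))
    return list(rows) + synthetic


def rebalance(clean, buggy, size, seed=0):
    """Down-sample clean rows and up-sample buggy rows to `size` each.

    Intended for training data only, as the slide notes.
    """
    rng = random.Random(seed)
    kept = rng.sample(clean, size) if len(clean) > size else list(clean)
    grown = smote_like(buggy, size, rng=rng) if len(buggy) < size else list(buggy)
    return kept, grown
```

The rebalanced rows would then be passed to the learner (a random forest in the talk); test data is left untouched so that reported scores reflect the real class distribution.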
  • 41. Experiment Setup: Statistical Measures • Prediction is usually measured using ROC • ROC is a plot of Recall vs. False Alarm • The plot requires several treatments, obtained by cross-validation • We refrain from cross-validation because it tends to mix the test data with the bellwether • Instead, we use Balance [Ma07]
  • 42. Experiment Setup: Statistical Measures • Instead of a set of points for ROC, produce one point: (X, Y) = (Pd (Recall), Pf (False Alarm)) • Balance is the weighted distance from the ideal point • Ideal point: (Pd, Pf) = (1, 0) • Balance = [equation image not preserved in the transcript] • The lower the Balance, the better the performance
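Since the Balance equation itself did not survive in the transcript, the following is only a sketch under stated assumptions: it computes Pd and Pf from binary labels and measures the Euclidean distance from the ideal point (Pd, Pf) = (1, 0), divided by sqrt(2) so the result lies in [0, 1]. The normalization and the names `pd_pf` and `balance_distance` are assumptions. Note that some defect prediction papers report one minus this distance, so that higher is better; the slide's “lower is better” reading matches the plain distance used here.

```python
import math


def pd_pf(actual, predicted):
    """Return (Pd, Pf) for binary labels, where 1 = buggy and 0 = clean."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    pd = tp / (tp + fn) if tp + fn else 0.0  # recall
    pf = fp / (fp + tn) if fp + tn else 0.0  # false alarm rate
    return pd, pf


def balance_distance(pd, pf):
    """Normalized distance from the ideal point (Pd, Pf) = (1, 0).

    0.0 is a perfect predictor and 1.0 is the worst case.
    The division by sqrt(2) is an assumed normalization.
    """
    return math.sqrt((1 - pd) ** 2 + (0 - pf) ** 2) / math.sqrt(2)
```

A single (Pd, Pf) point per run is what makes this measure usable without cross-validation: each trained model yields one number that can be compared across runs.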
  • 43. Experiment Setup: Statistical Measures • The prediction model is inherently random • Rerun the model 40 times with different seeds • Collect the Balance measure in every run • Use the Scott-Knott test to compare Balance values • Scott-Knott ranks Balance values (best to worst) • Rank -> Effect Size Test + Hypothesis Test • Why SK? It has been used by recent high-profile papers at TSE [Mittas13] and ICSE [Ghotra15]
  • 45. Results: Research Questions • RQ1: How rare are “Bellwethers”? • RQ2: How does the bellwether fare against local models? • RQ3: Is Bellwether better than other transfer learning methods? • RQ4: Can we predict which data set will be bellwether? • RQ5: How much of the “Bellwether” data is required?
  • 46. Results: Research Question 1 • How rare are “Bellwethers”?
  • 47. Results: Research Question 1 • How rare are “Bellwethers”? • Research Answer: Our results suggest bellwethers are not rare.
  • 48. Results: Research Question 1 • How rare are “Bellwethers”? • Community: Apache • Bellwether: Lucene
  • 49. Results: Research Question 1 • How rare are “Bellwethers”? • Community: NASA • Bellwether: MC
  • 50. Results: Research Question 1 • How rare are “Bellwethers”? • Community: AEEEM • Bellwether: LC
  • 51. Results: Research Question 1 • How rare are “Bellwethers”? • Community: ReLink • Bellwether: Safe
  • 52. Results: Research Question 2 • How does the bellwether fare against local models?
  • 53. Results: Research Question 2 • How does the bellwether fare against local models? • Research Answer: For projects measured with the same quality metrics, training models with the bellwether is just as good as, if not better than, local models.
  • 54. Results: Research Question 3 • Is Bellwether better than other transfer learning methods?
  • 55. Results: Research Question 3 • Is Bellwether better than other transfer learning methods? • Research Answer: The bellwether outperforms standard homogeneous transfer learners.
  • 56. Results: Research Question 4 • Can we predict which data set will be bellwether?
  • 57. Results: Research Question 4 • Can we predict which data set will be bellwether? • Research Answer: This is non-trivial. Trying to statistically determine if a project will be a bellwether was unsuccessful. This is open to further examination.
  • 58. Results: Research Question 5 • How much of the “Bellwether” data is required?
  • 59. Results: Research Question 5 • How much data is required before detecting the “Bellwether”? • Research Answer: A few dozen defective samples from the bellwether are sufficient to build a reliable model.
  • 61. Practical Implications • The problem of generality in SE: reproducibility is hard to achieve • With bellwethers, transfer learners can be not only reproducible but also stable and reliable • Identifying bellwethers earlier would have changed the course of research: more focus on coarse-grained analysis, less on relevancy filtering and model generation
  • 62. Future Work • Bellwethers in heterogeneous learners: promising heterogeneous transfer learners [Nam15][Jing15] perform complex dimensionality mapping transforms; can bellwethers assist in finding the best mapping? • Study and quantify bellwethers: what makes a bellwether a bellwether? • Bellwethers beyond defect prediction: are there bellwethers in other data?
  • 63. In conclusion... • Look for bellwethers: to use as a baseline and to justify the use of transfer learning • Stabilize the pace of conclusions: not permanent conclusion stability • Easy to find: look when necessary • New data can be discarded: bellwethers are updated only as they start failing