Software Analytics:
Towards Software Mining
that Matters
Tao Xie
University of Illinois at Urbana-Champaign
http://www.cs.illinois.edu/homes/taoxie/
taoxie@illinois.edu
Should I test/review my …?

©A. Hassan
Software analytics is to enable software practitioners
to perform data exploration and analysis in order to
obtain insightful and actionable information for data-driven tasks around software and services.

[MALETS’11 Zhang et al.]
Software Intelligence &
Analytics for Software Development

http://people.engr.ncsu.edu/txie/publications/foser10-si.pdf
http://thomas-zimmermann.com/publications/files/buse-foser-2010.pdf
• use Data Exploration and Analysis
→ Mining Software Repositories (MSR)

• for Software Practitioners
Beyond Software Developers

• obtain Insightful and Actionable info
Need to get real as well

• Analytic Techniques
• Producing Impact on Practice
Look through your
software
data

©A. Hassan
Promise Data
Repository

Mine through
the data!

An international effort to
make software repositories actionable
http://msrconf.org

http://promisedata.org
©A. Hassan
Mining Software Repositories (MSR)
• Transforms static record-keeping repositories to active repositories
• Makes repository data actionable by uncovering hidden patterns and trends
[Figure: field logs, CVS/SVN, Bugzilla, mailing list, crashes]

©A. Hassan
Code Repos: Sourceforge, GoogleCode
Historical Repositories: Source Control (CVS/SVN), Bugzilla, Mailing lists
Runtime Repos: Crash Repos, Field Logs
MSR researchers analyze and cross-link repositories
[Figure: Bugzilla, mailing list, CVS/SVN, and crash repositories, cross-linked through discussions, fixed bugs, buggy and fixing changes, and field crashes]
For a new bug report:
• Estimate fix effort
• Mark duplicates
• Suggest experts and fix!
©A. Hassan
• use Data Exploration and Analysis
→ Mining Software Repositories (MSR)

• for Software Practitioners
Beyond Software Developers

• obtain Insightful and Actionable info
Need to get real as well

• Analytic Techniques
• Producing Impact on Practice
We continue to help
practitioners (esp. developers)

©A. Hassan
Detection and Management of
Code Clones

©A. Hassan
Support
Logs

Source
Code

©A. Hassan
• use Data Exploration and Analysis
→ Mining Software Repositories (MSR)

• for Software Practitioners
Beyond Software Developers

• obtain Insightful and Actionable info
Need to get real as well

• Analytic Techniques
• Case Studies
Predicting Bugs
• Studies have shown that most complexity metrics correlate well with LOC!
– Graves et al. 2000 on commercial systems
– Herraiz et al. 2007 on open source systems

• Noteworthy findings:
– Previous bugs are good predictors of future bugs
– The more a file changes, the more likely it will have bugs in it
– Recent changes affect the bug potential of a file more than older changes do (weighted time damp models)
– Number of developers is of little help in predicting bugs
– Hard to generalize bug predictors across projects unless in similar domains [Nagappan, Ball et al. 2006]
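The time-damped weighting of changes mentioned in the findings can be sketched as follows. This is only an illustration, not the model used in the cited studies: the exponential decay, the `half_life_days` parameter, and the function name `time_damped_change_score` are assumptions made for the example.

```python
from datetime import date


def time_damped_change_score(change_dates, today, half_life_days=90.0):
    """Sum one weight per change; a change that is `half_life_days` old counts half.

    Assumption: exponential decay. The slides only say that recent changes
    should weigh more than older ones.
    """
    score = 0.0
    for changed_on in change_dates:
        age_days = (today - changed_on).days
        score += 0.5 ** (age_days / half_life_days)
    return score


today = date(2013, 1, 1)
recently_changed = [date(2012, 12, 1), date(2012, 12, 15), date(2012, 12, 30)]
changed_long_ago = [date(2010, 1, 1), date(2010, 2, 1), date(2010, 3, 1)]

# Both files have three changes, but the recent ones contribute far more.
print(time_damped_change_score(recently_changed, today))
print(time_damped_change_score(changed_long_ago, today))
```

A plain change count would rank the two files equally; the damped score ranks the recently changed file higher, which is the behavior the finding describes.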
Using Imports in Eclipse to Predict Bugs
71% of files that import compiler packages had to be fixed later on.
import org.eclipse.jdt.internal.compiler.lookup.*;
import org.eclipse.jdt.internal.compiler.*;
import org.eclipse.jdt.internal.compiler.ast.*;
import org.eclipse.jdt.internal.compiler.util.*;
...
import org.eclipse.pde.core.*;
import org.eclipse.jface.wizard.*;
import org.eclipse.ui.*;

14% of all files that import ui packages had to be fixed later on.
[Schröter et al. 06]
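Percentages like the ones above can be computed from per-file history roughly as follows. This is a hedged sketch: the function name `fix_rate_by_import`, the input shape, and the toy data are assumptions made for illustration, and the numbers are not those of the Eclipse study.

```python
from collections import defaultdict


def fix_rate_by_import(files):
    """files: iterable of (imported_packages, was_fixed_later) pairs.

    Returns {package: fraction of importing files that were fixed later}.
    """
    importing = defaultdict(int)
    fixed = defaultdict(int)
    for packages, was_fixed in files:
        for package in set(packages):
            importing[package] += 1
            if was_fixed:
                fixed[package] += 1
    return {package: fixed[package] / importing[package] for package in importing}


# Toy data, invented for illustration only.
history = [
    ({"org.eclipse.jdt.internal.compiler", "org.eclipse.ui"}, True),
    ({"org.eclipse.jdt.internal.compiler"}, True),
    ({"org.eclipse.ui"}, False),
    ({"org.eclipse.ui", "org.eclipse.jface.wizard"}, False),
]
rates = fix_rate_by_import(history)
for package, rate in sorted(rates.items()):
    print(f"{package}: {rate:.0%} of importing files fixed later")
```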
Don’t program on Fridays ;-)

Percentage of bug-introducing changes for Eclipse

[Zimmermann et al. 05]
Failure is a 4-letter Word

[PROMISE’11 Zeller et al.]
Actionable Alone is not Enough!

[PROMISE’11 Zeller et al.]
Who produces more buggy code?

©A. Hassan
• use Data Exploration and Analysis
→ Mining Software Repositories (MSR)

• for Software Practitioners
Beyond Software Developers

• obtain Insightful and Actionable info
Need to get real as well

• Analytic Techniques
• Producing Impact on Practice
Analytic Techniques in SE
• Association rules and frequent patterns
• Classification
• Clustering
• Text mining/Natural language processing
• Visualization
More details are at https://sites.google.com/site/xsoftanalytics/
Solution-Driven / Problem-Driven
Where can I apply X miner?
• Basic mining algorithms (e.g., association rule, frequent itemset mining…)
• Advanced mining algorithms (e.g., frequent partial order mining [ESEC/FSE 07])
What patterns do we really need?
• New/adapted mining algorithms (e.g., [ICSE 09], [ASE 09])
Mining → Searching + Mining
Traditional approaches
[Figure: code repositories 1, 2 (Eclipse, Linux, …) → mining → patterns]
Often lack sufficient relevant data points (e.g., API call sites)

Our new approaches
[Figure: code repositories 1, 2, …, N (open source code on the web) → searching (code search engine, e.g., …) → mining → patterns]
Existing approaches produce a high percentage of false positives
One major observation:
• Programmers often write code in different ways to achieve the same task
• Some ways are more frequent than others
[Figure: patterns are mined from the frequent ways; code that follows the infrequent ways is then reported as violating the mined patterns]
S. Thummalapenta and T. Xie. Alattin: Mining Alternative Patterns for Detecting Neglected Conditions. ASE 2009.
Example: java.util.Iterator.next()
java.util.Iterator.next() throws NoSuchElementException when the iteration has no more elements, for example when it is invoked on a list without any elements
Code Sample 1

void PrintEntries1(ArrayList<String> entries) {
    …
    Iterator<String> it = entries.iterator();
    // Guard: ask the iterator itself whether an element is available
    if (it.hasNext()) {
        String last = it.next();
    }
    …
}

Code Sample 2

void PrintEntries2(ArrayList<String> entries) {
    …
    // Guard: check the size of the list before creating the iterator
    if (entries.size() > 0) {
        Iterator<String> it = entries.iterator();
        String last = it.next();
    }
    …
}
Among 1243 code examples:
• Sample 1 (1218/1243)
• Sample 2 (6/1243)
Mined Pattern from existing approaches:
“boolean check on return of Iterator.hasNext before Iterator.next”
• Require more general patterns (alternative patterns): P1 or P2
– P1: boolean check on return of Iterator.hasNext before Iterator.next
– P2: const check on return of ArrayList.size before Iterator.next
• Cannot be mined by existing approaches, since alternative P2 is infrequent
Our Solution: ImMiner Algorithm
[ASE 09]

• Mines alternative patterns of the form P1 or P2
• Based on the observation that infrequent alternatives such as P2 are frequent among code examples that do not support P1
• 1243 code examples: Sample 1 (1218/1243), Sample 2 (6/1243)
– P2 is infrequent among the entire 1243 code examples
– P2 is frequent among code examples not supporting P1
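The observation can be made concrete with the counts quoted on the slides. The sketch below is only arithmetic on those counts. It assumes that all 6 Sample 2 examples are among the examples that do not support P1, which the slides imply but do not state, and it does not assume any particular support threshold.

```python
TOTAL_EXAMPLES = 1243
SUPPORT_P1 = 1218  # examples following Sample 1 (hasNext check)
SUPPORT_P2 = 6     # examples following Sample 2 (size check)


def relative_support(count, population):
    """Fraction of the population that supports a pattern."""
    return count / population


not_supporting_p1 = TOTAL_EXAMPLES - SUPPORT_P1

p2_overall = relative_support(SUPPORT_P2, TOTAL_EXAMPLES)
# Assumption: every Sample 2 example lacks the hasNext check.
p2_among_non_p1 = relative_support(SUPPORT_P2, not_supporting_p1)

print(f"P2 overall: {p2_overall:.3%}")
print(f"P2 among examples not supporting P1: {p2_among_non_p1:.0%}")
```

Restricting the population to the examples that lack P1 raises the relative support of P2 by roughly a factor of fifty, which is what lets an otherwise invisible alternative surface.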
Alternative Patterns
• ImMiner mines three kinds of alternative patterns of the general form “P1 or P2”
– Balanced: all alternatives (both P1 and P2) are frequent
– Imbalanced: some alternatives (P1) are frequent and others (P2) are infrequent; represented as “P1 or P̂2”
– Single: only one alternative
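The three kinds can be told apart from the relative support of each alternative. The following is a minimal sketch under stated assumptions: the function name `classify_alternative_pattern`, the single `min_sup` threshold, and the premise that at least one alternative is frequent are all choices made for this example, not details taken from the paper.

```python
def classify_alternative_pattern(supports, min_sup):
    """supports: relative support of each alternative in the whole input database.

    Assumes at least one alternative is frequent (support >= min_sup).
    """
    if len(supports) == 1:
        return "single"
    if all(support >= min_sup for support in supports):
        return "balanced"
    return "imbalanced"


# P1 and P2 from the Iterator.next() example, with the min_sup of 0.5 used on the slides.
kind = classify_alternative_pattern([1218 / 1243, 6 / 1243], min_sup=0.5)
print(kind)
```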
ImMiner Algorithm
• Uses frequent-itemset mining [Burdick et al. ICDE 01] iteratively
• An input database with the following APIs for Iterator.next()
[Figure: input database; mapping of IDs to APIs]
ImMiner Algorithm: Frequent Alternatives
[Figure: input database → frequent itemset mining (min_sup 0.5) → frequent item: 1]
P1: boolean check on the return of Iterator.hasNext() before Iterator.next()
ImMiner: Infrequent Alternatives of P1
• Split input database into two databases: Positive and Negative
– Positive database (PSD): entries that support P1
– Negative database (NSD): entries that do not support P1
• Mine patterns that are frequent in NSD and are infrequent in PSD
– Reason: Only such patterns serve as alternatives for P1
• Alternative Pattern: P2 “const check on the return of ArrayList.size() before Iterator.next()”
• Alattin applies the ImMiner algorithm to detect neglected conditions
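The split-and-mine step can be sketched as below. This is a simplified illustration rather than the published algorithm: it looks only at single items instead of running a frequent-itemset miner, and the function names, the two thresholds `min_sup_nsd` and `max_sup_psd`, and the toy database are assumptions introduced for the example.

```python
from collections import Counter


def item_support(database):
    """Relative support of every single item across a list of item sets."""
    if not database:
        return {}
    counts = Counter(item for transaction in database for item in transaction)
    return {item: count / len(database) for item, count in counts.items()}


def infrequent_alternatives(database, frequent_item, min_sup_nsd, max_sup_psd):
    """Items that are frequent in NSD and infrequent in PSD, relative to `frequent_item`."""
    psd = [t for t in database if frequent_item in t]      # supports P1
    nsd = [t for t in database if frequent_item not in t]  # does not support P1
    psd_support = item_support(psd)
    nsd_support = item_support(nsd)
    return sorted(
        item
        for item, support in nsd_support.items()
        if support >= min_sup_nsd and psd_support.get(item, 0.0) <= max_sup_psd
    )


# Toy database (hypothetical IDs): 1 = hasNext check, 2 = size check, 3 = unrelated check.
database = [{1}, {1}, {1, 3}, {1}, {1}, {1, 3}, {1}, {1}, {2}, {2, 3}]
print(infrequent_alternatives(database, frequent_item=1, min_sup_nsd=0.5, max_sup_psd=0.1))
```

Item 3 is frequent in NSD too, but it also appears often enough in PSD that it is not treated as an alternative to item 1; only item 2 passes both conditions.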
Neglected Conditions
• Neglected conditions refer to
– Missing conditions that check the arguments or receiver of the API call before the API call
– Missing conditions that check the return or receiver of the API call after the API call
• One primary reason for many fatal issues
– Security or buffer-overflow vulnerabilities [Chang et al. ISSTA 07]
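How an alternative pattern reduces false positives when detecting neglected conditions can be sketched as follows. The representation of call sites as sets of observed checks, the function name `find_neglected_conditions`, and the third call site `PrintEntries3` are hypothetical and exist only for this illustration; the actual tool works on program code, not on string labels.

```python
def find_neglected_conditions(call_sites, alternatives):
    """Return call sites where none of the alternative checks is present.

    call_sites: {location: set of condition checks seen around the API call}
    alternatives: checks, any one of which satisfies the pattern "P1 or P2"
    """
    return sorted(
        location
        for location, checks in call_sites.items()
        if not (checks & alternatives)
    )


HAS_NEXT_CHECK = "boolean check on Iterator.hasNext()"
SIZE_CHECK = "const check on ArrayList.size()"

call_sites = {
    "PrintEntries1": {HAS_NEXT_CHECK},
    "PrintEntries2": {SIZE_CHECK},
    "PrintEntries3": set(),  # hypothetical call site with no guard at all
}

# Only P1 is known: the size-check variant is wrongly reported.
single_pattern = find_neglected_conditions(call_sites, {HAS_NEXT_CHECK})
# "P1 or P2" is known: only the unguarded call site is reported.
alternative_pattern = find_neglected_conditions(call_sites, {HAS_NEXT_CHECK, SIZE_CHECK})
print(single_pattern)
print(alternative_pattern)
```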
• use Data Exploration and Analysis
→ Mining Software Repositories (MSR)

• for Software Practitioners
Beyond Software Developers

• obtain Insightful and Actionable info
Need to get real as well

• Analytic Techniques
• Producing Impact on Practice
Machine Learning that Matters

[ICML’12 Wagstaff]
http://arxiv.org/ftp/arxiv/papers/1206/1206.4656.pdf
• Hyper-Focus on Benchmark Data Sets
• Hyper-Focus on Abstract Metrics
• Lack of Follow-Through

• Meaningful Evaluation Methods
• Involvement of the World Outside ML
• Eyes on the Prize

MSRA Software Analytics Group
Utilize a data-driven approach to help create highly performing, user-friendly, and efficiently developed and operated software and services.
Research Topics (vertical): Software Users, Software Development Process, Software Systems
Technology Pillars (horizontal): Information Visualization, Analysis Algorithms, Large-scale Computing

Contact: Dongmei Zhang (dongmeiz@microsoft.com)
http://research.microsoft.com/groups/sa/
Software Analytics in Practice
Adoption Challenges for Software Analytics
• Must show value before data quality improves
• Correlation vs. Causation
ICSE Papers: Industry vs. Academia
OSDI 2008 26% vs. xSE ?%
Developers, Programmers, Architects Among All Attendees
Source © Carlo Ghezzi

Keynotes: ICSE 09, MSR 11, ICSM 11, MSR 12, SCAM 12
"Are Automated Debugging [Research] Techniques Actually Helping Programmers?"
• 50 years of automated debugging research
– N papers → only 5 evaluated with actual programmers
[ISSTA11 Parnin&Orso]
Are Regression Testing [Research] Techniques Actually Helping Industry?
• Likely the most studied testing problem
– N papers
[STVR11 Yoo&Harman]
Are [Some] Failure-Proneness Prediction [Research] Techniques Actually Helping?
• Empirical software engineering (on prediction)
– N papers
[PROMISE11 Zeller et al.]
A Researcher's Observation in HCI Research Community
• “The reviewers simply do not value the difficulty of building real systems and how hard controlled studies are to run on real systems for real tasks. This is in contrast with how easy it is to build new interaction techniques and then to run tight, controlled studies on these new techniques with small, artificial tasks”
• “This attitude is a joke and it offers researchers no incentive to do systems work. Why should they? Why should we put 3-4 person years into every CHI publication? Instead we can do 8 weeks of work on an idea piece or create a new interaction technique and test it tightly in 8-12 weeks and get a full CHI paper.”
• “When will this community wake up and understand that they are going to run out any work on creating new systems (rather than small pieces of systems) and cede that important endeavor to industry?”
• “We are our own worst enemies. I think we have been blinded by the perception that "true scientific" research is only found in controlled experiments and nice statistics.”
“I give up on CHI/UIST” by James Landay
http://dubfuture.blogspot.com/2009/11/i-give-up-on-chiuist.html
Source © J. Landay

Does our research community have similar issues??
MS Academic Search: “Pointer Analysis”
“Pointer Analysis: Haven’t We Solved This Problem Yet?” [Hind PASTE’01]
“During the past 21 years, over 75 papers and 9 Ph.D. theses have been published on pointer analysis. Given the tomes of work on this topic one may wonder, “Haven't we solved this problem yet?” With input from many researchers in the field, this paper describes issues related to pointer analysis and remaining open problems.”

Section 4.3 Designing an Analysis for a Client’s Needs
“Barbara Ryder expands on this topic: “… We can all write an unbounded number of papers that compare different pointer analysis approximations in the abstract. However, this does not accomplish the key goal, which is to design and engineer pointer analyses that are useful for solving real software problems for realistic programs.”

Michael Hind. Pointer analysis: haven't we solved this problem yet? In Proc. ACM SIGPLAN-SIGSOFT Workshop on Program Analysis for Software Tools and Engineering (PASTE 2001)
Source © M. Hind & B. Ryder
MS Academic Search: “Clone Detection”
Research typically focuses on and evaluates intermediate steps (e.g., clone detection) instead of ultimate tasks (e.g., bug detection or refactoring), even when the field has already grown mature with n years of effort on intermediate steps
Some Success Stories of Applying Clone Detection [Focus on Ultimate Tasks]
• MSRA XIAO: Yingnong Dang, Dongmei Zhang, Song Ge, Chengyun Chu, Yingjun Qiu, and Tao Xie. XIAO: Tuning Code Clones at Hands of Engineers in Practice. In Proc. ACSAC 2012. http://research.microsoft.com/en-us/groups/sa/
• Zhenmin Li, Shan Lu, Suvda Myagmar, and Yuanyuan Zhou. CP-Miner: a tool for finding copy-paste and related bugs in operating system code. In Proc. OSDI 2004. http://patterninsight.com/
• http://www.blackducksoftware.com/
Suggested Actions → Tech Adoption
• Get research problems from real practice
• Get feedback from real practice
• Collaborate across disciplines
• Collaborate with industry
• Software Analytics
– Data Exploration and Analysis
– For Software Practitioners
– Obtain Insightful and Actionable info
– With Analytic Techniques

• Producing Impact on Practice
Acknowledgments
• Microsoft Research Asia Software Analytics Group
• Ahmed Hassan, Lin Tan, Jian Pei
• Many other colleagues

Q&A

 
Beyond Matching: Applying Data Science Techniques to IOC-based Detection
Beyond Matching: Applying Data Science Techniques to IOC-based DetectionBeyond Matching: Applying Data Science Techniques to IOC-based Detection
Beyond Matching: Applying Data Science Techniques to IOC-based DetectionAlex Pinto
 
Towards a Threat Hunting Automation Maturity Model
Towards a Threat Hunting Automation Maturity ModelTowards a Threat Hunting Automation Maturity Model
Towards a Threat Hunting Automation Maturity ModelAlex Pinto
 

Recently uploaded

Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 

Recently uploaded (20)

Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 

Software Analytics: Towards Software Mining that Matters

  • 1. Software Analytics: Towards Software Mining that Matters Tao Xie University of Illinois at Urbana-Champaign http://www.cs.illinois.edu/homes/taoxie/ taoxie@illinois.edu
  • 2. Should I test/review my? ©A. Hassan
  • 6. Software analytics is to enable software practitioners to perform data exploration and analysis in order to obtain insightful and actionable information for data-driven tasks around software and services. [MALETS’11 Zhang et al.]
  • 7. Software Intelligence & Analytics for Software Development http://people.engr.ncsu.edu/txie/publications/foser10-si.pdf http://thomas-zimmermann.com/publications/files/buse-foser-2010.pdf
  • 8. • use Data Exploration and Analysis: Mining Software Repositories (MSR) • for Software Practitioners: Beyond Software Developers • obtain Insightful and Actionable info: Need to get real as well • Analytic Techniques • Producing Impact on Practice
  • 11. Mine through the data! An international effort to make software repositories actionable http://msrconf.org http://promisedata.org ©A. Hassan
  • 13. Promise Data Repository Mine through the data! An international effort to make software repositories actionable http://msrconf.org http://promisedata.org ©A. Hassan
  • 14. Mining Software Repositories (MSR) • Transforms static record-keeping repositories into active repositories • Makes repository data actionable by uncovering hidden patterns and trends • Repositories shown: field logs, CVS/SVN, Bugzilla, mailing list, crashes ©A. Hassan
  • 21. MSR researchers analyze and cross-link repositories • Repositories: Bugzilla, mailing list, CVS/SVN, crashes ©A. Hassan
  • 22. MSR researchers analyze and cross-link repositories • Repositories: Bugzilla, mailing list, CVS/SVN, crashes • Cross-linked items: discussions, buggy change & fixing change, fixed bug, field crashes ©A. Hassan
  • 23. MSR researchers analyze and cross-link repositories • Repositories: Bugzilla, mailing list, CVS/SVN, crashes • Cross-linked items: discussions, buggy change & fixing change, fixed bug, field crashes • New bug report ©A. Hassan
  • 24. MSR researchers analyze and cross-link repositories • Repositories: Bugzilla, mailing list, CVS/SVN, crashes • Cross-linked items: discussions, buggy change & fixing change, fixed bug, field crashes • New bug report: estimate fix effort, mark duplicates, suggest experts and fix! ©A. Hassan
  • 25. • use Data Exploration and Analysis: Mining Software Repositories (MSR) • for Software Practitioners: Beyond Software Developers • obtain Insightful and Actionable info: Need to get real as well • Analytic Techniques • Producing Impact on Practice
  • 26. We continue to help practitioners (esp. developers) ©A. Hassan
  • 36. Detection and Management of Code Clones ©A. Hassan
  • 39. • use Data Exploration and Analysis: Mining Software Repositories (MSR) • for Software Practitioners: Beyond Software Developers • obtain Insightful and Actionable info: Need to get real as well • Analytic Techniques • Case Studies
  • 40. Predicting Bugs • Studies have shown that most complexity metrics correlate well with LOC! – Graves et al. 2000 on commercial systems – Herraiz et al. 2007 on open source systems • Noteworthy findings: – Previous bugs are good predictors of future bugs – The more a file changes, the more likely it is to contain bugs – Recent changes affect a file's bug potential more than older changes (weighted time damp models) – The number of developers is of little help in predicting bugs – Bug predictors are hard to generalize across projects unless the projects are in similar domains [Nagappan, Ball et al. 2006]
  • 41. Using Imports in Eclipse to Predict Bugs • 71% of files that import compiler packages had to be fixed later on: import org.eclipse.jdt.internal.compiler.lookup.*; import org.eclipse.jdt.internal.compiler.*; import org.eclipse.jdt.internal.compiler.ast.*; import org.eclipse.jdt.internal.compiler.util.*; ... • 14% of all files that import ui packages had to be fixed later on: import org.eclipse.pde.core.*; import org.eclipse.jface.wizard.*; import org.eclipse.ui.*; [Schröter et al. 06]
  • 42. Don’t program on Fridays ;-) Percentage of bug-introducing changes for Eclipse [Zimmermann et al. 05]
  • 43. Failure is a 4-letter Word [PROMISE’11 Zeller et al.]
  • 44. Actionable Alone is not Enough! [PROMISE’11 Zeller et al.]
  • 45. Who produces more buggy code? ©A. Hassan
  • 47. • use Data Exploration and Analysis: Mining Software Repositories (MSR) • for Software Practitioners: Beyond Software Developers • obtain Insightful and Actionable info: Need to get real as well • Analytic Techniques • Producing Impact on Practice
  • 48. Analytic Techniques in SE • Association rules and frequent patterns • Classification • Clustering • Text mining/Natural language processing • Visualization • More details are at https://sites.google.com/site/xsoftanalytics/
  • 49. Solution-Driven vs. Problem-Driven • Where can I apply X miner? • Basic mining algorithms, e.g., association rule, frequent itemset mining… • Advanced mining algorithms, e.g., frequent partial order mining [ESEC/FSE 07] • What patterns do we really need? • New/adapted mining algorithms, e.g., [ICSE 09], [ASE 09]
  • 50. Mining → Searching + Mining • Traditional approaches: code repositories 1, 2 (Eclipse, Linux, …) → mining → patterns
  • 51. Mining → Searching + Mining • Traditional approaches: code repositories 1, 2 (Eclipse, Linux, …) → mining → patterns • Often lack sufficient relevant data points (e.g., API call sites)
  • 52. Mining → Searching + Mining • Traditional approaches: code repositories 1, 2 (Eclipse, Linux, …) → mining → patterns • Often lack sufficient relevant data points (e.g., API call sites) • Our new approaches: code repositories 1, 2, …, N (open source code on the web) → searching with a code search engine → mining → patterns
  • 53. Existing approaches produce a high % of false positives • One major observation: programmers often write code in different ways to achieve the same task • Some ways are more frequent than others S. Thummalapenta and T. Xie. Alattin: Mining Alternative Patterns for Detecting Neglected Conditions. ASE 2009.
  • 54. Existing approaches produce a high % of false positives • One major observation: programmers often write code in different ways to achieve the same task • Some ways are more frequent than others: frequent ways vs. infrequent ways
  • 55. Existing approaches produce a high % of false positives • One major observation: programmers often write code in different ways to achieve the same task • Some ways are more frequent than others: frequent ways vs. infrequent ways • Frequent ways: mine patterns (yielding the mined patterns)
  • 56. Existing approaches produce a high % of false positives • One major observation: programmers often write code in different ways to achieve the same task • Some ways are more frequent than others: frequent ways vs. infrequent ways • Frequent ways: mine patterns (yielding the mined patterns) • Infrequent ways: detect violations
  • 58. Example: java.util.Iterator.next() • java.util.Iterator.next() throws NoSuchElementException when the iteration has no more elements, for example when invoked on the iterator of a list without any elements • Code Sample 1: PrintEntries1(ArrayList<String> entries) { … Iterator it = entries.iterator(); if (it.hasNext()) { String last = (String) it.next(); } … } • Code Sample 2: PrintEntries2(ArrayList<String> entries) { … if (entries.size() > 0) { Iterator it = entries.iterator(); String last = (String) it.next(); } … }
  • 59. Example: java.util.Iterator.next() • Code Sample 1: PrintEntries1(ArrayList<String> entries) { … Iterator it = entries.iterator(); if (it.hasNext()) { String last = (String) it.next(); } … } • Code Sample 2: PrintEntries2(ArrayList<String> entries) { … if (entries.size() > 0) { Iterator it = entries.iterator(); String last = (String) it.next(); } … }
  • 60. Example: java.util.Iterator.next() • Code Sample 1: PrintEntries1(ArrayList<String> entries) { … Iterator it = entries.iterator(); if (it.hasNext()) { String last = (String) it.next(); } … } • Code Sample 2: PrintEntries2(ArrayList<String> entries) { … if (entries.size() > 0) { Iterator it = entries.iterator(); String last = (String) it.next(); } … } • 1243 code examples
  • 61. Example: java.util.Iterator.next() • Code Sample 1: PrintEntries1(ArrayList<String> entries) { … Iterator it = entries.iterator(); if (it.hasNext()) { String last = (String) it.next(); } … } • Code Sample 2: PrintEntries2(ArrayList<String> entries) { … if (entries.size() > 0) { Iterator it = entries.iterator(); String last = (String) it.next(); } … } • 1243 code examples: Sample 1 (1218/1243)
  • 62. Example: java.util.Iterator.next() • Code Sample 1: PrintEntries1(ArrayList<String> entries) { … Iterator it = entries.iterator(); if (it.hasNext()) { String last = (String) it.next(); } … } • Code Sample 2: PrintEntries2(ArrayList<String> entries) { … if (entries.size() > 0) { Iterator it = entries.iterator(); String last = (String) it.next(); } … } • 1243 code examples: Sample 1 (1218/1243), Sample 2 (6/1243)
  • 63. Example: java.util.Iterator.next() • Code Sample 1: PrintEntries1(ArrayList<String> entries) { … Iterator it = entries.iterator(); if (it.hasNext()) { String last = (String) it.next(); } … } • Code Sample 2: PrintEntries2(ArrayList<String> entries) { … if (entries.size() > 0) { Iterator it = entries.iterator(); String last = (String) it.next(); } … } • 1243 code examples: Sample 1 (1218/1243), Sample 2 (6/1243) • Mined pattern from existing approaches: “boolean check on return of Iterator.hasNext before Iterator.next”
  • 66. Example: java.util.Iterator.next() • Code Sample 1: PrintEntries1(ArrayList<String> entries) { … Iterator it = entries.iterator(); if (it.hasNext()) { String last = (String) it.next(); } … } • Code Sample 2: PrintEntries2(ArrayList<String> entries) { … if (entries.size() > 0) { Iterator it = entries.iterator(); String last = (String) it.next(); } … }
  • 68. Example: java.util.Iterator.next() • Code Sample 1: PrintEntries1(ArrayList<String> entries) { … Iterator it = entries.iterator(); if (it.hasNext()) { String last = (String) it.next(); } … } • Code Sample 2: PrintEntries2(ArrayList<String> entries) { … if (entries.size() > 0) { Iterator it = entries.iterator(); String last = (String) it.next(); } … } • Require more general patterns (alternative patterns): P1 or P2
  • 69. Example: java.util.Iterator.next() • Code Sample 1: PrintEntries1(ArrayList<String> entries) { … Iterator it = entries.iterator(); if (it.hasNext()) { String last = (String) it.next(); } … } • Code Sample 2: PrintEntries2(ArrayList<String> entries) { … if (entries.size() > 0) { Iterator it = entries.iterator(); String last = (String) it.next(); } … } • Require more general patterns (alternative patterns): P1 or P2 • P1: boolean check on return of Iterator.hasNext before Iterator.next
  • 70. Example: java.util.Iterator.next() • Code Sample 1: PrintEntries1(ArrayList<String> entries) { … Iterator it = entries.iterator(); if (it.hasNext()) { String last = (String) it.next(); } … } • Code Sample 2: PrintEntries2(ArrayList<String> entries) { … if (entries.size() > 0) { Iterator it = entries.iterator(); String last = (String) it.next(); } … } • Require more general patterns (alternative patterns): P1 or P2 • P1: boolean check on return of Iterator.hasNext before Iterator.next • P2: boolean check on return of ArrayList.size before Iterator.next
  • 71. Example: java.util.Iterator.next() • Code Sample 1: PrintEntries1(ArrayList<String> entries) { … Iterator it = entries.iterator(); if (it.hasNext()) { String last = (String) it.next(); } … } • Code Sample 2: PrintEntries2(ArrayList<String> entries) { … if (entries.size() > 0) { Iterator it = entries.iterator(); String last = (String) it.next(); } … } • Require more general patterns (alternative patterns): P1 or P2 • P1: boolean check on return of Iterator.hasNext before Iterator.next • P2: boolean check on return of ArrayList.size before Iterator.next • Cannot be mined by existing approaches, since alternative P2 is infrequent
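
The two guard idioms P1 and P2 from the slides can be written as complete Java methods, next to an unguarded call that shows what a neglected condition looks like at run time. This is only an illustrative sketch: the class name `IteratorGuards` and its method names are hypothetical and do not come from the slides or from Alattin.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical illustration class; not part of the slides or of Alattin.
class IteratorGuards {
    // P1: boolean check on the return of Iterator.hasNext() before Iterator.next().
    static String firstViaHasNext(List<String> entries) {
        Iterator<String> it = entries.iterator();
        if (it.hasNext()) {
            return it.next();
        }
        return null; // nothing to return for an empty list
    }

    // P2: check on the return of size() before Iterator.next().
    static String firstViaSize(List<String> entries) {
        if (entries.size() > 0) {
            Iterator<String> it = entries.iterator();
            return it.next();
        }
        return null; // nothing to return for an empty list
    }

    // Neglected condition: next() is called without any guard.
    static String firstUnguarded(List<String> entries) {
        return entries.iterator().next();
    }

    public static void main(String[] args) {
        List<String> empty = new ArrayList<>();
        System.out.println(firstViaHasNext(empty)); // prints null
        System.out.println(firstViaSize(empty));    // prints null
        try {
            firstUnguarded(empty);
        } catch (NoSuchElementException e) {
            System.out.println("unguarded next() threw NoSuchElementException");
        }
    }
}
```

Both guarded variants are safe on an empty list, which is why a checker that knows only P1 would wrongly flag code written in the P2 style.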
  • 72. Our Solution: ImMiner Algorithm [ASE 09] • Mines alternative patterns of the form P1 or P2 • Based on the observation that infrequent alternatives such as P2 are frequent among code examples that do not support P1
  • 73. Our Solution: ImMiner Algorithm [ASE 09] • Mines alternative patterns of the form P1 or P2 • Based on the observation that infrequent alternatives such as P2 are frequent among code examples that do not support P1 • 1243 code examples: Sample 1 (1218/1243), Sample 2 (6/1243)
  • 74. Our Solution: ImMiner Algorithm [ASE 09] • Mines alternative patterns of the form P1 or P2 • Based on the observation that infrequent alternatives such as P2 are frequent among code examples that do not support P1 • 1243 code examples: Sample 1 (1218/1243), Sample 2 (6/1243) • P2 is infrequent among the entire set of 1243 code examples
  • 75. Our Solution: ImMiner Algorithm [ASE 09] • Mines alternative patterns of the form P1 or P2 • Based on the observation that infrequent alternatives such as P2 are frequent among code examples that do not support P1 • 1243 code examples: Sample 1 (1218/1243), Sample 2 (6/1243) • P2 is frequent among the code examples not supporting P1
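
The observation can be checked with the counts given on the slides (1243 examples, 1218 supporting P1, 6 supporting P2). The sketch below assumes that the 6 examples supporting P2 are all among the 1243 - 1218 = 25 examples that do not support P1; the slides imply this but do not state it. The class and method names are hypothetical.

```java
// Hypothetical helper; the counts come from the slides, the names do not.
class RelativeSupport {
    // Fraction of examples in a collection that support a pattern.
    static double support(int supporting, int total) {
        if (total <= 0) {
            throw new IllegalArgumentException("total must be positive");
        }
        return (double) supporting / total;
    }

    public static void main(String[] args) {
        int total = 1243;            // all code examples
        int supportP1 = 1218;        // examples following Sample 1 (P1)
        int supportP2 = 6;           // examples following Sample 2 (P2)
        int notP1 = total - supportP1; // 25 examples that do not support P1

        // Assumption: every P2 example is among the non-P1 examples.
        System.out.println("support(P2) over all examples:     " + support(supportP2, total));
        System.out.println("support(P2) over non-P1 examples: " + support(supportP2, notP1));
    }
}
```

Under that assumption the support of P2 rises from about 0.005 over all examples to 0.24 over the non-P1 examples. Whether 0.24 counts as frequent depends on the support threshold used for that subset, which the slides do not state.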
  • 76. Alternative Patterns • ImMiner mines three kinds of alternative patterns of the general form “P1 or P2” – Balanced: all alternatives (both P1 and P2) are frequent – Imbalanced: some alternatives (P1) are frequent and others (P2) are infrequent, represented as “P1 or P^2” – Single: only one alternative
  • 77. ImMiner Algorithm • Uses frequent-itemset mining [Burdick et al. ICDE 01] iteratively • Takes an input database with the following APIs for Iterator.next() [tables on slide: input database; mapping of IDs to APIs]
  • 78. ImMiner Algorithm: Frequent Alternatives • Input database → frequent itemset mining (min_sup 0.5) → frequent item: 1 • P1: boolean check on the return of Iterator.hasNext() before Iterator.next()
  • 79. ImMiner: Infrequent Alternatives of P1
  • 80. ImMiner: Infrequent Alternatives of P1 • Split the input database into two databases: positive and negative
  • 81. ImMiner: Infrequent Alternatives of P1 • Split the input database into two databases: positive and negative – Positive database (PSD)
  • 82. ImMiner: Infrequent Alternatives of P1 • Split the input database into two databases: positive and negative – Positive database (PSD) – Negative database (NSD)
  • 83. ImMiner: Infrequent Alternatives of P1 • Split the input database into two databases: positive and negative – Positive database (PSD) – Negative database (NSD) • Mine patterns that are frequent in NSD and infrequent in PSD • Reason: only such patterns serve as alternatives for P1
  • 84. ImMiner: Infrequent Alternatives of P1 • Split the input database into two databases: positive and negative – Positive database (PSD) – Negative database (NSD) • Mine patterns that are frequent in NSD and infrequent in PSD • Reason: only such patterns serve as alternatives for P1 • Alternative pattern P2: “const check on the return of ArrayList.size() before Iterator.next()” • Alattin applies the ImMiner algorithm to detect neglected conditions
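
The split-and-mine step can be sketched in a few lines of Java. This is a simplified stand-in, not the ImMiner implementation: it counts single items instead of running the frequent-itemset miner cited on the slides, and the thresholds `minSupNsd` and `maxSupPsd`, the item IDs, and the demo database are all hypothetical. Here item 1 plays the role of P1, item 2 the role of P2, and item 3 is an unrelated condition.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Simplified, hypothetical sketch of the PSD/NSD idea; single-item counting only.
class ImMinerSketch {
    // Relative support of every item that occurs in the database.
    static Map<Integer, Double> itemSupport(List<Set<Integer>> db) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (Set<Integer> example : db) {
            for (Integer item : example) {
                counts.merge(item, 1, Integer::sum);
            }
        }
        Map<Integer, Double> support = new HashMap<>();
        for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
            support.put(e.getKey(), e.getValue() / (double) db.size());
        }
        return support;
    }

    // Items that are frequent in NSD (examples without p1) and infrequent in PSD.
    static Set<Integer> infrequentAlternatives(List<Set<Integer>> db, int p1,
                                               double minSupNsd, double maxSupPsd) {
        List<Set<Integer>> psd = new ArrayList<>(); // examples supporting p1
        List<Set<Integer>> nsd = new ArrayList<>(); // examples not supporting p1
        for (Set<Integer> example : db) {
            if (example.contains(p1)) {
                psd.add(example);
            } else {
                nsd.add(example);
            }
        }
        Map<Integer, Double> supPsd = itemSupport(psd);
        Map<Integer, Double> supNsd = itemSupport(nsd);
        Set<Integer> alternatives = new TreeSet<>();
        for (Map.Entry<Integer, Double> e : supNsd.entrySet()) {
            double inPsd = supPsd.getOrDefault(e.getKey(), 0.0);
            if (e.getValue() >= minSupNsd && inPsd <= maxSupPsd) {
                alternatives.add(e.getKey());
            }
        }
        return alternatives;
    }

    // Made-up database: 9 examples with item 1, 4 without it.
    static List<Set<Integer>> demoDatabase() {
        List<Set<Integer>> db = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            db.add(Set.of(1));
        }
        db.add(Set.of(1, 3));
        db.add(Set.of(2));
        db.add(Set.of(2));
        db.add(Set.of(2, 3));
        db.add(Set.of(3));
        return db;
    }

    public static void main(String[] args) {
        // Item 2 has support 0.75 in NSD and 0 in PSD; item 3 has 0.5 in NSD and 1/9 in PSD.
        System.out.println(infrequentAlternatives(demoDatabase(), 1, 0.6, 0.1)); // prints [2]
    }
}
```

Item 3 is rejected twice over: it falls below the NSD threshold and it also appears alongside item 1 in PSD, so it does not behave like an alternative to P1.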
  • 85. Neglected Conditions • Neglected conditions refer to – missing conditions that check the arguments or receiver of the API call before the API call – missing conditions that check the return or receiver of the API call after the API call • One primary reason for many fatal issues – security or buffer-overflow vulnerabilities [Chang et al. ISSTA 07]
  • 86. • use Data Exploration and Analysis: Mining Software Repositories (MSR) • for Software Practitioners: Beyond Software Developers • obtain Insightful and Actionable info: Need to get real as well • Analytic Techniques • Producing Impact on Practice
  • 87. Machine Learning that Matters [ICML’12 Wagstaff] http://arxiv.org/ftp/arxiv/papers/1206/1206.4656.pdf
  • 88. • Hyper-Focus on Benchmark Data Sets • Hyper-Focus on Abstract Metrics • Lack of Follow-Through [ICML’12 Wagstaff] http://arxiv.org/ftp/arxiv/papers/1206/1206.4656.pdf
  • 89. • Meaningful Evaluation Methods • Involvement of the World Outside ML • Eyes on the Prize [ICML’12 Wagstaff] http://arxiv.org/ftp/arxiv/papers/1206/1206.4656.pdf
  • 90. MSRA Software Analytics Group Utilize data-driven approach to help create highly performing, user friendly, and efficiently developed and operated software and services. Contact: Dongmei Zhang (dongmeiz@microsoft.com) http://research.microsoft.com/groups/sa/
  • 91. MSRA Software Analytics Group Utilize data-driven approach to help create highly performing, user friendly, and efficiently developed and operated software and services. Research Topics: Software Users, Software Development Process, Software Systems. Contact: Dongmei Zhang (dongmeiz@microsoft.com) http://research.microsoft.com/groups/sa/
  • 92. MSRA Software Analytics Group Utilize data-driven approach to help create highly performing, user friendly, and efficiently developed and operated software and services. Research Topics: Software Users, Software Development Process, Software Systems. Technology Pillars: Information Visualization, Analysis Algorithms, Large-scale Computing. Contact: Dongmei Zhang (dongmeiz@microsoft.com) http://research.microsoft.com/groups/sa/
  • 94. MSRA Software Analytics Group Utilize data-driven approach to help create highly performing, user friendly, and efficiently developed and operated software and services. Research Topics (Vertical): Software Users, Software Development Process, Software Systems. Technology Pillars (Horizontal): Information Visualization, Analysis Algorithms, Large-scale Computing. Contact: Dongmei Zhang (dongmeiz@microsoft.com) http://research.microsoft.com/groups/sa/
  • 96. Adoption Challenges for Software Analytics • Must show value before data quality improves • Correlation vs. Causation
  • 97. ICSE Papers: Industry vs. Academia Source© Carlo Ghezzi
  • 98. ICSE Papers: Industry vs. Academia OSDI 2008 26% vs. xSE ?% Developers, Programmers, Architects Among All Attendees Source© Carlo Ghezzi
  • 99. ICSE Papers: Industry vs. Academia OSDI 2008 26% vs. xSE ?% Developers, Programmers, Architects Among All Attendees MSR 11 Keynote ICSE 09 Keynote Source© Carlo Ghezzi MSR 12 Keynote ICSM 11 Keynote SCAM 12 Keynote
  • 100. "Are Automated Debugging [Research] Techniques Actually Helping Programmers?" • 50 years of automated debugging research – N papers, only 5 evaluated with actual programmers [ISSTA11 Parnin&Orso]
  • 101. Are Regression Testing [Research] Techniques Actually Helping Industry? • Likely most studied testing problems – N papers [STVR11 Yoo&Harman]
  • 102. Are [Some] Failure-Proneness Prediction [Research] Techniques Actually Helping? • Empirical software engineering (on prediction) – N papers [PROMISE11 Zeller et al.]
  • 103. A Researcher's Observation in HCI Research Community • “The reviewers simply do not value the difficulty of building real systems and how hard controlled studies are to run on real systems for real tasks. This is in contrast with how easy it is to build new interaction techniques and then to run tight, controlled studies on these new techniques with small, artificial tasks” “I give up on CHI/UIST” by James Landay http://dubfuture.blogspot.com/2009/11/i-give-up-on-chiuist.html Source©J. Landay
  • 104. A Researcher's Observation in HCI Research Community • “This attitude is a joke and it offers researchers no incentive to do systems work. Why should they? Why should we put 3-4 person years into every CHI publication? Instead we can do 8 weeks of work on an idea piece or create a new interaction technique and test it tightly in 8-12 weeks and get a full CHI paper.” “I give up on CHI/UIST” by James Landay http://dubfuture.blogspot.com/2009/11/i-give-up-on-chiuist.html Source©J. Landay
  • 105. A Researcher's Observation in HCI Research Community • “When will this community wake up and understand that they are going to run out any work on creating new systems (rather than small pieces of systems) and cede that important endeavor to industry?” • “We are our own worst enemies. I think we have been blinded by the perception that "true scientific" research is only found in controlled experiments and nice statistics.” “I give up on CHI/UIST” by James Landay http://dubfuture.blogspot.com/2009/11/i-give-up-on-chiuist.html Source©J. Landay
  • 106. A Researcher's Observation in HCI Research Community • “When will this community wake up and understand that they are going to run out any work on creating new systems (rather than small pieces of systems) and cede that important endeavor to industry?” • “We are our own worst enemies. I think we have been blinded by the perception that "true scientific" research is only found in controlled experiments and nice statistics.” Does our research community have similar issues?? “I give up on CHI/UIST” by James Landay http://dubfuture.blogspot.com/2009/11/i-give-up-on-chiuist.html Source©J. Landay
  • 107. MS Academic Search: “Pointer Analysis”
  • 108. “Pointer Analysis: Haven’t We Solved This Problem Yet?” [Hind PASTE’01] “During the past 21 years, over 75 papers and 9 Ph.D. theses have been published on pointer analysis. Given the tomes of work on this topic one may wonder, "Haven't we solved this problem yet?" With input from many researchers in the field, this paper describes issues related to pointer analysis and remaining open problems.” Michael Hind. Pointer analysis: haven't we solved this problem yet? In Proc. ACM SIGPLAN-SIGSOFT Workshop on Program Analysis for Software Tools and Engineering (PASTE 2001). Source©M. Hind
  • 109. “Pointer Analysis: Haven’t We Solved This Problem Yet?” [Hind PASTE’01] Section 4.3 Designing an Analysis for a Client’s Needs. Barbara Ryder expands on this topic: “… We can all write an unbounded number of papers that compare different pointer analysis approximations in the abstract. However, this does not accomplish the key goal, which is to design and engineer pointer analyses that are useful for solving real software problems for realistic programs.” Michael Hind. Pointer analysis: haven't we solved this problem yet? In Proc. ACM SIGPLAN-SIGSOFT Workshop on Program Analysis for Software Tools and Engineering (PASTE 2001). Source©M. Hind&B. Ryder
  • 110. MS Academic Search: “Clone Detection”
  • 111. MS Academic Search: “Clone Detection” Research typically focuses on and evaluates intermediate steps (e.g., clone detection) instead of ultimate tasks (e.g., bug detection or refactoring), even after the field has matured through n years of effort on intermediate steps
  • 112. Some Success Stories of Applying Clone Detection [Focus on Ultimate Tasks] • MSRA XIAO: Yingnong Dang, Dongmei Zhang, Song Ge, Chengyun Chu, Yingjun Qiu, and Tao Xie. XIAO: Tuning Code Clones at Hands of Engineers in Practice. In Proc. ACSAC 2012. http://research.microsoft.com/en-us/groups/sa/ • Zhenmin Li, Shan Lu, Suvda Myagmar, and Yuanyuan Zhou. CP-Miner: a tool for finding copy-paste and related bugs in operating system code. In Proc. OSDI 2004. • http://patterninsight.com/ • http://www.blackducksoftware.com/
  • 113. Suggested Actions: Tech Adoption • Get research problems from real practice • Get feedback from real practice • Collaborate across disciplines • Collaborate with industry
  • 114. Software Analytics • Data Exploration and Analysis • For Software Practitioners • Obtain Insightful and Actionable info • With Analytic Techniques • Producing Impact on Practice
  • 115. Acknowledgments • Microsoft Research Asia Software Analytics Group • Ahmed Hassan, Lin Tan, Jian Pei • Many other colleagues
  • 116. Q&A