2. AI in SE: A Success Story
Large, active field, with:
● Growing research community
● Numerous conferences and workshops, such as MSR, PROMISE, and RAISE
● Large data repositories
● History of collaboration between industry and academia
3. We're already good at drawing useful
conclusions. We expect further algorithmic
improvements.
But...
We need to improve our data!
4. Problem 1:
We don't know what data we need.
We are trying to solve complex problems, so we guess which data matters and then collect it.
This results in missing attributes and added noise.
5. Problem 2:
The data we have is often weak.
Solution quality depends on data quality.
Some commonly used data sets are infamous for
missing values, unhelpful attributes, and poor
recording standards.
6. We should improve data standards, but...
We need to use the data we have.
Use a synergy of human feedback and AI to turn
static data models into dynamic models.
Bring a Wikipedia-style model to data sets.
8. Enhanced Feedback Loop
[Diagram: example of one pass through the feedback loop]
● Recommendation: MC/DC
● Helpful? Yes
● New Values for Existing Attributes: Num. Boolean Expressions: 219; Num. Numeric Calculations: 73
● New Attributes to Collect (and Values): Ratio of Boolean to Numeric Calculations: 3:1
● Data to Delete: Projects 1, 3, 7
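One pass through the loop above could be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Feedback` class, its field names, `apply_feedback`, the attribute keys, the responding project's ID, and the starting values in `dataset` are all assumptions made for the example. Only the feedback values (MC/DC, 219, 73, the 3:1 ratio, and projects 1, 3, 7) come from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Feedback:
    """One user's response to a recommendation (all field names are hypothetical)."""
    project_id: int
    recommendation: str
    helpful: bool
    updated_values: dict = field(default_factory=dict)   # new values for existing attributes
    new_attributes: dict = field(default_factory=dict)   # new attributes to collect, with values
    delete_projects: list = field(default_factory=list)  # records the user flags for deletion


def apply_feedback(dataset: dict, fb: Feedback) -> dict:
    """Return a copy of the dataset with one feedback record folded in."""
    # Copy each row so the caller's dataset is left untouched.
    updated = {pid: dict(row) for pid, row in dataset.items()}
    row = updated.setdefault(fb.project_id, {})
    row.update(fb.updated_values)
    row.update(fb.new_attributes)
    # Record the recommendation and whether the user found it helpful.
    row["last_recommendation"] = fb.recommendation
    row["recommendation_helpful"] = fb.helpful
    # Deletions are applied last.
    for pid in fb.delete_projects:
        updated.pop(pid, None)
    return updated


# Starting values are made up for illustration only.
dataset = {
    1: {"num_boolean_expressions": 10},
    2: {"num_boolean_expressions": 200, "num_numeric_calculations": 70},
    3: {"num_boolean_expressions": 5},
    7: {},
}
fb = Feedback(
    project_id=2,  # hypothetical: the slide does not say which project responded
    recommendation="MC/DC",
    helpful=True,
    updated_values={"num_boolean_expressions": 219, "num_numeric_calculations": 73},
    new_attributes={"boolean_to_numeric_ratio": 219 / 73},
    delete_projects=[1, 3, 7],
)
result = apply_feedback(dataset, fb)
print(result)
```

Returning a new dictionary rather than editing in place keeps the earlier state of the data available, which may matter if feedback later needs to be reviewed or reverted.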
9. Why should we enhance our data?
These dynamic data models allow for:
● Low start-up costs.
● A body of evidence built up over time.
● Ways to address data quality issues.
● Human-in-the-loop feedback.
11. Challenge 2:
How do we use feedback?
Fundamental trade-off between human curation
and automated AI learning.
When should attributes be filtered? Un-updated
data phased out? New data added?
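One possible middle ground between curation and automation is for the automated side to only nominate candidates and leave the decision to a human. The sketch below illustrates that idea under stated assumptions: the function names, the age limit, and the missing-value threshold are hypothetical and are not answers proposed in these slides.

```python
from datetime import date, timedelta


def stale_records(last_updated: dict, today: date, max_age_days: int) -> list:
    """IDs of records not updated within max_age_days (candidates for phasing out)."""
    cutoff = today - timedelta(days=max_age_days)
    return sorted(pid for pid, updated_on in last_updated.items() if updated_on < cutoff)


def sparse_attributes(rows: list, max_missing: float) -> list:
    """Attributes whose share of missing (absent or None) values exceeds max_missing."""
    if not rows:
        return []
    names = sorted({name for row in rows for name in row})
    return [
        name for name in names
        if sum(row.get(name) is None for row in rows) / len(rows) > max_missing
    ]


# Illustrative data only; both thresholds are arbitrary choices.
last_updated = {1: date(2012, 1, 1), 2: date(2014, 5, 1)}
rows = [{"loc": 100, "owner": None}, {"loc": 250}, {"loc": 80, "owner": "a"}]

phase_out_candidates = stale_records(last_updated, today=date(2014, 6, 1), max_age_days=365)
filter_candidates = sparse_attributes(rows, max_missing=0.5)
print(phase_out_candidates, filter_candidates)
```

Both functions return lists for review rather than deleting anything, so a curator can accept or reject each nomination.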
14. We propose feedback-driven dynamic
data models maintained by a synergy of
user feedback and automated AI techniques.
We propose that dynamic data will allow for
low start-up costs, a stronger body of
evidence over time, and adaptations to
changing industrial conditions.
15. For discussion...
1. Is this even a good idea?
2. What else can we do to solve data quality
issues, beyond the idea suggested here?
3. What kind of data would benefit from
dynamic adaptation?
4. How do we motivate users to provide
feedback, new data, and update old data?