Data Science Challenges in Personal Program Analysis
1. Data Science Challenges in Personal Program Analysis
Bas van Schaik
New York R Conference (April 2016)
2. - Cloud service for personal program analysis
- Free for OSS projects
- Currently in private beta, release imminent
3. Personal Program Analysis: why?
We are passionate about code.
We wish everyone would write better code.
We help people build better software better.
5. What’s an ‘Alert’?
Short answer: a bug or a violation of good coding practice
Example: define the same key twice in a Python dict
E.g. in OpenStack Designate:
self.target = objects.PoolTarget.from_dict({
    'type': 'powerdns',
    'options': [{
        'key': 'connection', 'value': 'memory://',
        'key': 'host', 'value': '127.0.0.1',
        'key': 'port', 'value': 53}],
})
My guess of what was intended:
self.target = objects.PoolTarget.from_dict({
    'type': 'powerdns',
    'options': [
        {'key': 'connection', 'value': 'memory://'},
        {'key': 'host', 'value': '127.0.0.1'},
        {'key': 'port', 'value': 53}],
})
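A check for this kind of alert can be sketched with Python's standard `ast` module, which keeps repeated keys in a dict literal even though the runtime silently keeps only the last one. This is a minimal illustration, not the query the analysis service actually runs; the function name `find_duplicate_dict_keys` and the `snippet` variable are made up for the example.

```python
import ast


def find_duplicate_dict_keys(source):
    """Return (line, key) pairs for constant keys repeated within one dict literal."""
    alerts = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Dict):
            continue
        seen = set()
        for key in node.keys:
            # key is None for `**mapping` unpacking; computed keys are skipped too.
            if not isinstance(key, ast.Constant):
                continue
            if key.value in seen:
                alerts.append((key.lineno, key.value))
            else:
                seen.add(key.value)
    return alerts


# The inner dict from the Designate example, reduced to a standalone snippet.
snippet = """
options = [{
    'key': 'connection', 'value': 'memory://',
    'key': 'host', 'value': '127.0.0.1',
    'key': 'port', 'value': 53}]
"""

for line, key in find_duplicate_dict_keys(snippet):
    print(f"line {line}: key {key!r} is defined more than once")
```

Run on the corrected version, where each option is its own dict, the function returns an empty list.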
6. What’s an ‘Alert’?
Alerts are found by queries:
● The source code is our database
● Every query result is an alert.
Support for 10 different programming languages (and counting), with a total of more than 1000 queries and metrics.
7. What does a query look like?
from Method m
where m.hasName("hashcode") and
      m.hasNoParameters()
select m, "Should this method be called 'hashCode' rather than 'hashcode'?"
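To see why "every query result is an alert", the same from/where/select shape can be mimicked in Python over a tiny in-memory table of methods. This is only an analogy: the `Method` record and its helper methods are hypothetical stand-ins for the database the real query language runs against.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Method:
    """Hypothetical row in a 'methods' table extracted from source code."""
    name: str
    parameters: tuple

    def has_name(self, name):
        return self.name == name

    def has_no_parameters(self):
        return not self.parameters


def hashcode_alerts(methods):
    """Mirror of the query: from Method m where ... select m, message."""
    message = "Should this method be called 'hashCode' rather than 'hashcode'?"
    return [
        (m, message)
        for m in methods                      # from Method m
        if m.has_name("hashcode")             # where m.hasName("hashcode")
        and m.has_no_parameters()             #   and m.hasNoParameters()
    ]                                         # select m, message


methods = [
    Method("hashcode", ()),         # misspelled override: reported
    Method("hashCode", ()),         # correct spelling: not reported
    Method("hashcode", ("seed",)),  # takes a parameter: not reported
]

for method, message in hashcode_alerts(methods):
    print(method.name, "->", message)
```

Each tuple the comprehension yields corresponds to one alert: the offending program element plus the message shown to the developer.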
8. Making it interesting: project over time
[Chart: OpenStack Nova (Python) over time; series: net alerts, activity, composition, net LOC]
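The slides do not define "net alerts". One plausible reading, assumed here, is alerts introduced minus alerts fixed, accumulated over the project history. The sketch below computes such a running total from per-commit counts; the input tuples and the function name `net_alert_series` are invented for illustration.

```python
from itertools import accumulate


def net_alert_series(commits):
    """Running net alert count from (date, introduced, fixed) tuples sorted by date."""
    commits = list(commits)
    dates = [date for date, _, _ in commits]
    deltas = [introduced - fixed for _, introduced, fixed in commits]
    return list(zip(dates, accumulate(deltas)))


# Hypothetical per-commit counts, not real Nova data.
history = [
    ("2016-01-04", 3, 0),
    ("2016-01-11", 1, 2),
    ("2016-01-18", 0, 4),
]

for date, net in net_alert_series(history):
    print(date, net)
```

A falling curve then means more alerts are being fixed than introduced over that period.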
9. Or: compare different projects
[Chart: alerts against LOC for OpenStack projects: Cinder, Nova, Neutron, Horizon, Heat, Swift, Sahara, Glance, Designate, Keystone, Fuel, Ironic]
10. Even more interesting: make it personal
[Chart: net alerts against net LOC contributed (all OpenStack modules); labels A, B and X]
11. Data Science for PPA: finding fun facts
[Chart: Who's doing what in OpenStack? Contributor categories: Trailblazer, Bug squasher, Refactorer, None; axes: total contributors, % contributors; marker: major release]
12. Data science for PPA: cleaning
PostgreSQL (net churn and net alerts - before cleaning)
PostgreSQL: after cleaning
14. But… why make it personal?
Some developers not so happy:
“are you questioning my ability to write code?”
No. We're helping you to improve.
15. But… why make it personal?
By making it personal, we make people care.
When people care, they improve.
When developers improve, the code improves.
16. But… why make it personal?
When developers improve, the code improves.
● Automated code review on GitHub pull requests
● “On 12/11/2015 you introduced X, fancy fixing that?”
● “You recently fixed alert A in file B. Based on your expertise, you might also be interested in fixing alert X in file Y?”
● “Compared to developers like you, you rank 20 out of 100”
● “… and by fixing these 5 alerts, you'll be in the top 10!”
● Found a bug in your project? Write a query for it, share it!
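The ranking messages above could be backed by something as simple as the sketch below. It assumes developers are ranked by net alerts, fewer being better, and that everyone else's count stays fixed; the slides do not state the actual metric, and the names and numbers here are invented.

```python
def rank_of(developer, net_alerts):
    """1-based rank; fewer net alerts ranks higher, ties share the best rank."""
    mine = net_alerts[developer]
    return 1 + sum(1 for value in net_alerts.values() if value < mine)


def fixes_needed(developer, net_alerts, target_rank):
    """Fewest alerts to fix so that the developer reaches target_rank or better."""
    others = sorted(v for d, v in net_alerts.items() if d != developer)
    if target_rank > len(others):
        return 0  # already guaranteed to be within the target
    # Matching the target_rank-th best of the others is enough to reach the target.
    threshold = others[target_rank - 1]
    return max(0, net_alerts[developer] - threshold)


# Hypothetical net alert counts per developer.
net_alerts = {"ann": 2, "bob": 9, "cat": 5, "dev": 7}

print(rank_of("bob", net_alerts))          # bob's current rank
print(fixes_needed("bob", net_alerts, 2))  # fixes bob needs to reach the top 2
```

In practice the comparison group ("developers like you") would need its own definition, which this sketch leaves out.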
19. Interested in…
Early access to CodingStars?
Having your OSS project analysed?
Working for us in New York, San Francisco, Oxford (UK), or Copenhagen (Denmark)?
Talk to us!
(in person, or bas@semmle.com)