This document contains the slides from a presentation on best practices for scientific computing. It discusses several key findings from studies on software engineering practices. Some of the main points summarized are:
- Early studies found that rigorous code inspections can remove 60-90% of errors before testing begins. Later work refined this by finding the first review and hour matter most.
- A classic 1975 study found that most errors are introduced during requirements and design, and errors become more expensive to fix the later they are found.
- More recent studies found that an individual's distance in an organization's structure is a better predictor of software quality than their geographic distance.
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
What We Actually Know About Programming And What We Ought To Do Next
1. 2013-03-22
1
What We Actually Know
About Programming
And What We Ought To Do Next
Greg Wilson http://software-carpentry.org March 2013
Best Practices for Scientific Computing 2
You are free to:
Copy, share, adapt, or re-mix;
Photograph, film, or broadcast;
Blog, live-blog, or post video of;
This presentation. Provided that:
You attribute the work to its author and respect
the rights and licenses associated with its
components.
Best Practices for Scientific Computing 3
Arrr, Matey
Seven Years’ War (actually 1754-63)
Britain lost 1,512 sailors to enemy action...
...and almost 100,000 to scurvy
Best Practices for Scientific Computing 4
James Lind (1716-94)
1747: (possibly) the first-ever
controlled medical experiment
× cider
× sulfuric acid
× vinegar
× sea water
√ oranges
× barley water
Of course, no-one paid attention until a proper Englishman
repeated the experiment in 1794...
Oh, the Irony
2. 2013-03-22
2
Best Practices for Scientific Computing 5
1950: Hill & Doll publish a
case-control study comparing
smokers with non-smokers
Now called the “British Doctors”
study, it ran until 2001
It Took a While...
Best Practices for Scientific Computing 6
#1: Smoking causes
lung cancer
#2: Most people would
rather fail than change
“What happens ‘on average’ is
of no help when one is faced
with a specific patient…”
What They Found
The Cochrane Collaboration (http://www.cochrane.org/)
now archives results from hundreds of medical studies
Best Practices for Scientific Computing 7
“[Using domain-specific languages] leads to two primary
benefits. The first, and simplest, is improved programmer
productivity... The second...is...communication with
domain experts.”
– Martin Fowler,
IEEE Software,
July/August 2009
So Where Are We?
Best Practices for Scientific Computing 8
One of the smartest guys in the industry...
...made two substantive claims of fact…
Look Closer
...in a peer-reviewed journal...
...without a single citation…
…because nobody expected one
3. 2013-03-22
3
Best Practices for Scientific Computing 9
Growing emphasis on empirical studies in
software engineering since the mid-1990s
Papers describing new tools or
practices routinely include results
from some kind of field study
Many are flawed or incomplete,
but standards are constantly improving
A New Hope
Best Practices for Scientific Computing 10
Rigorous inspections can remove 60-90% of errors before
the first test is run. (Fagan 1975)
A Classic Result
Best Practices for Scientific Computing 11
Rigorous inspections can remove 60-90% of errors before
the first test is run. (Fagan 1975)
The first review and hour matter most. (Cohen 2006)
A Classic Result Refined
Best Practices for Scientific Computing 12
Sackman, Erikson, and Grant (1968): “Exploratory
experimental studies comparing online and offline
programming performance.”
Or 10, or 40, or 100, or whatever other
large number pops into the head of
someone who can’t be bothered to
look up the reference...
The best programmers are
up to 28 times more productive than the worst.
Most Often Misquoted
4. 2013-03-22
4
Best Practices for Scientific Computing 13
1. Study was designed to compare batch vs. interactive,
not measure productivity
2. How was productivity measured, anyway?
3. Best vs. worst exaggerates any effect
4. Twelve programmers for an afternoon
Pick That Apart
Next major study was 54 programmers...
...for up to an hour
Best Practices for Scientific Computing 14
Boehm et al (1975): “Some Experience with
Automated Aids to the Design of Large-Scale
Reliable Software.”
...and many, many more since
1. Most errors are introduced
during requirements analysis
and design
2. The later they are removed,
the most expensive it is to
take them out
time
number/cost
Another Classic Result
Best Practices for Scientific Computing 15
Pessimists: “If we tackle the
hump in the error injection
curve, fewer bugs will get to the
expensive part of the fixing
curve.”
Optimists: “If we
do lots of short
iterations, the total
cost of fixing bugs
will go down.”
That Explains a Lot
Best Practices for Scientific Computing 16
Nagappan et al (2007) & Bird et al (2009):
Geography has little correlation with software quality
Isn’t That Interesting…
5. 2013-03-22
5
Best Practices for Scientific Computing 17
Nagappan et al (2007) & Bird et al (2009):
Distance in the org chart is a much better predictor
Isn’t That Interesting…
Best Practices for Scientific Computing 18
Are any metrics better at predicting faults/effort than LOC?
No.
A Few More Results
Best Practices for Scientific Computing 19
Do more frequent releases improve software quality?
Yes, but it also changes the nature of the bugs.
A Few More Results
Best Practices for Scientific Computing 20
Are there better ways to teach programming?
Yes: media-based instruction and peer instruction.
A Few More Results
6. 2013-03-22
6
Best Practices for Scientific Computing 21
Sampling Bias
I focus on quantitative
studies because they’re
what I know best
A lot of the best work in this
field is using qualitative
methods
Best Practices for Scientific Computing 22
All Together Now
Andy Oram & Greg Wilson (ed):
Making Software: What Really
Works, and Why We Believe It.
O'Reilly, 2010, 978-0596808327.
http://neverworkintheory.org
Best Practices for Scientific Computing 23
Where to Start?
Best Practices for Scientific Computing 24
Where to Start?
Many practices can be monitored
automatically
7. 2013-03-22
7
Best Practices for Scientific Computing 25
Where to Start?
But top-down initiatives usually don’t work
Best Practices for Scientific Computing 26
1. How do you identify people with good ideas?
Where to Start?
Best Practices for Scientific Computing 27
1. How do you identify people with good ideas?
2. How do you reward people for good ideas?
Where to Start?
Best Practices for Scientific Computing 28
1. How do you identify people with good ideas?
2. How do you reward people for good ideas?
3. How do they share those ideas with peers?
Where to Start?
8. 2013-03-22
8
Best Practices for Scientific Computing 29
1. How do you identify people with good ideas?
2. How do you reward people for good ideas?
3. How do they share those ideas with peers?
4. How do you tell if it actually worked?
Where to Start?
Best Practices for Scientific Computing 30
Where to Start?
Remember: some changes
will be qualitative
1. How do you identify people with good ideas?
2. How do you reward people for good ideas?
3. How do they share those ideas with peers?
4. How do you tell if it actually worked?
Best Practices for Scientific Computing 31
Words to Live By
If you build a man a fire,
you'll keep him warm for a night.
If you set a man on fire,
you'll keep him warm for the rest of his life.
— Terry Pratchett
Best Practices for Scientific Computing 32
http://software-carpentry.org
Thank You