3. “Dark Matter Developers”
Scott Hanselman (March 2012)
!
[We] hypothesize that there is another kind
of developer than the ones we meet all the
time. We call them Dark Matter Developers.
They don't read a lot of blogs, they never write blogs,
they don't go to user groups, they don't tweet or
facebook, and you don't often see them at large
conferences... [They] aren't chasing the latest beta
or pushing...limits, they are just producing.
!
!
http://www.hanselman.com/blog/DarkMatterDevelopersTheUnseen99.aspx
!3
4. We’re Not That Optimistic
l
90% of hydrologists have their heads down
−
l
l
Doing hydrology instead of talking about using
Charm++ to asynchronously distribute their CUDAaccelerated heterogeneous N-body Universe
simulations
Not because they don't
want to do all that Star
Trek stuff
But because the
majority of the field isn’t
there yet.
!4
8. Surely You're Exaggerating
1. How many graduate students write shell
scripts to analyze each new data set
instead of running those analyses by hand?
!8
9. Surely You're Exaggerating
2. How many of them use version control
to keep track of their work and collaborate
with colleagues?
!9
10. Surely You're Exaggerating
3.
How many routinely break large problems
into pieces small enough to be
- comprehensible,
- testable, and
- reusable?
!10
11. Surely You're Exaggerating
3.
How many routinely break large problems
into pieces small enough to be
- comprehensible,
- testable, and
- reusable?
And how many know those are the
same things?
!11
13. We've Left the Majority Behind
Other than
Googling for things,
the majority of scientists
do not use computers
more effectively today
than they did 28 years ago.
!13
14. “Not My Problem”
Actually your biggest problem
•
•
•
Application Yesterday’s models are solving today’s
million dollar problems, because nobody knows how to
update them.
Collaboration Hundreds of person-years of effort are
redundantly combining to create a year’s worth of result
Credibility Many published results can’t be trusted,
because the scientific code isn’t modular, can’t be
tested, and sometimes isn’t even released
And Hydrology Loses
!14
15. Jeff Hammond
Your code will outlive your
supercomputer. Most
successful supercomputers
lasts 5 years. The most
successful codes may reach 5
decades. There are currently
40 year-old modeling codes
still being used.
Argonne Leadership
Computing Facility
!15
16. Where Are Your Goalposts?
A hydrologist is computationally competent if she
can build, use, validate, and share software to:
•Manage software and data
•Tell if it's working correctly
•Find and fix problems when it isn’t
•Keep track of what she’s done
•Share work with others
•Do all of these things efficiently
!16
17. A Driver's License Exam
l
l
Developed with the Software Sustainability
Institute for the DiRAC Consortium
Formative assessment
−
l
Do you know what you need to know in order to
get the most out of this gear?
Pencils ready?
Note: actual exam allows for several different
programming languages, version control systems, etc.
!17
18. A Driver's License Exam
1.
2.
3.
4.
5.
6.
7.
Check out a working copy of the exam
materials from a git repository.
Could do it easily
Could struggle through
1
0.5
Wouldn’t know where to start
0
Don’t understand the question
-1
!18
19. A Driver's License Exam
1.
2.
3.
4.
5.
6.
7.
Use find and grep in a pipe to create a
list of all .dat files in the working copy,
redirect the output to a file, and add that
file to the repository.
Could do it easily
Could struggle through
1
0.5
Wouldn’t know where to start
0
Don’t understand the question
-1
!19
20. A Driver's License Exam
1.
2.
3.
4.
5.
6.
7.
Write a shell script that runs a legacy
program for each parameter in a set.
Could do it easily
Could struggle through
1
0.5
Wouldn’t know where to start
0
Don’t understand the question
-1
!20
21. A Driver's License Exam
1.
2.
3.
4.
5.
6.
7.
Edit the Makefile provided so that if any
.dat file changes, analyze.py is re-run to
create the corresponding .out file.
Could do it easily
Could struggle through
1
0.5
Wouldn’t know where to start
0
Don’t understand the question
-1
!21
22. A Driver's License Exam
1.
2.
3.
4.
5.
6.
7.
Write four unit tests to exercise a function
that calculates running sums. Explain why
your four tests are most likely to uncover
bugs in the function.
Could do it easily
Could struggle through
1
0.5
Wouldn’t know where to start
0
Don’t understand the question
-1
!22
23. A Driver's License Exam
1.
2.
3.
4.
5.
6.
7.
Explain when and how a function could
pass your tests, but still fail on real data.
Could do it easily
Could struggle through
1
0.5
Wouldn’t know where to start
0
Don’t understand the question
-1
!23
24. A Driver's License Exam
1.
2.
3.
4.
5.
6.
7.
Do a code review of the legacy program
from Q3 (about 50 lines long) and list
the four most important improvements
you would make.
Could do it easily
Could struggle through
1
0.5
Wouldn’t know where to start
0
Don’t understand the question
-1
!24
26. How Are We Doing?
a) How well did you do?
b) How well do you think most hydrologists would
do?
!26
27. How Are We Doing?
a) How well did you do?
b) How well do you think most hydrologists would do?
c) Do you think a hydrologist could use your software
without having these skills?
d) Or a grasp of the principles behind them?
!27
28. What Has Worked
l
Target graduate students
l
Intensive short courses (2 days to 2 weeks)
l
Focus on practical skills
!28
29. What Has Worked
l
Target graduate students
l
Intensive short courses (2 days to 2 weeks)
l
Focus on practical skills
http://software-carpentry.org
!29
31. What We Teach
l
Unix shell
how to automate repetitive tasks
l
Git and GitHub
how to use distributed version control to manage,
collaborate and share work
l
Python or R
how to grow a program in a structured, modular,
testable, reusable way
l
Databases
the difference between structured and unstructured data
!31
33. “That's Merely Useful”
l
None of this is publishable any longer
−
Which means it's ineffective career-wise for
academic computer scientists
!33
34. “That's Merely Useful”
l
None of this is publishable any longer
−
l
Which means it's ineffective career-wise for
academic computer scientists
But it works
−
Two independent assessments have validated the
impact of what we do
!34
36. Teach a Workshop
l
All our materials are CC-BY licensed
l
We will help train you
Text
???
!36
37. Shine Some Light
l
l
Include a few lines about your software stack
and working practices in your scientific papers
Ask about them when reviewing
−
−
−
Where's the repository containing the code?
What's the coverage of your unit tests?
How did you track provenance?
!37
38. Links!
Find this talk on Twitter: @ahmadia
!
Me:
http://aron.ahmadia.net
!
Software Carpentry:
http://software-carpentry.org
!
US Army Engineer Research and
Development Center:
http://www.erdc.usace.army.mil/
!38