4. On SSL and Heartbleed
“[Heartbleed] is a software flaw that
has left up to two-thirds of the
world’s websites vulnerable to
attack by hackers.”
– The Economist
5. “There is no such thing as bad publicity except
your own obituary.”
– Brendan Behan
6. ● “Most open-source software – and OpenSSL is
no exception – is produced voluntarily by
people who are not paid for creating it. They
do it for love, professional pride or as a way of
demonstrating technical virtuosity. And mostly
they do it in their spare time.”
– John Naughton, The Observer/The Guardian,
'Heartbleed' bug can't be simply blamed on coders,
April 13, 2014
7. “Responsible corporate use of open-source
software should therefore involve some
measure of reciprocity: a corporation that
benefits hugely from such software ought to
put something back, either in the form of
financial support for a particular open-source
project, or – better still – by
encouraging its own software people to
contribute to the project.”
8. “Much of the invisible backbone of
websites from Google to Amazon to the
Federal Bureau of Investigation was built
by volunteer programmers in what is
known as the open-source community.”
9. “... volunteers, connected over the
Internet, work together to build free
software, to maintain and improve it and
to look for bugs. Ideally, they check one
another’s work in a peer review system
similar to that found in science.”
10. Linus's Law:
“Given enough eyeballs, all bugs are
shallow”
– Eric Raymond, The Cathedral and the Bazaar
11. In the case of Heartbleed
“There weren't enough eyeballs”
– Eric Raymond
12. ● Code was created by a grad student
● Reviewed by S. Henson, core developer
of OpenSSL
● Committed to OpenSSL at the end of 2011, released
in OpenSSL 1.0.1 (March 2012)
● Not discovered until April 2014, over two years later!
Budget of OpenSSL:
– US$2,000 for 2013
13. the OpenSSL problem
● important infrastructure projects that are
run by small teams of volunteers
● on April 24, 2014, the Linux Foundation
announced the “Core Infrastructure
Initiative” to address it
14. Core Infrastructure Initiative
● Funded by:
– Amazon, Cisco, Dell, Facebook, Fujitsu, Google,
IBM, Intel, Microsoft, NetApp,
Rackspace, Qualcomm, VMware and The Linux
Foundation
● Funding to core projects:
– Fellowships to core developers
– as well as other resources to assist the project in
improving its security, enabling outside reviews,
and improving responsiveness to patch requests.
15. What is FOSS development?
● Most important feature of FOSS
– its free or open source license
● License
– Guarantees code is available to others to
reuse
– Becomes a social contract
among participants
16. What is OSS development?
● Most frequently defined as:
– Self-organized teams developing software
without a central authority
● Code is open for review
– and reuse!!!
● Anybody can participate
17. What makes OSS development
possible?
● Teams of self-organized developers
and contributors
● The Internet
● A common toolkit
● Version control systems
18. Teams
● Come from all sectors:
– Professionals and hobbyists
– Paid and volunteers
– Novices and experienced developers
– High-school students to PhDs
– All over the world!!!
● Highly motivated!
19. Common Toolkit
● To be able to collaborate, you need a common set of tools
– Programming languages
● gcc, perl, python, java, ruby, lua, php...
– Editors and IDEs
● Emacs, vim, Eclipse, Netbeans...
– Libraries
● boost, maven, cpan, Pypi...
– Infrastructure
● Make, ant, cmake, bugzilla, etc.
– Hosting infrastructure
● Sourceforge, Google Code, github, bitbucket
● They must be available at zero cost to anybody
20. FOSS Toolkit
● I posit that one of the biggest influences
of FOSS on the practice of software
development is the wide use of FOSS
tools to develop software
– Most implementations of popular
programming languages today are open
source
– FOSS Editors and IDEs are
widely used too
21. Free Software Foundation
● The FSF had to bootstrap the development
of the OSS toolkit
– To build an operating system you need a
compiler
– To write a compiler you need an editor,
but to build an editor you need a
compiler
– gcc, emacs, coreutils (ls, echo, cat, etc.), etc.
24. Need for Code Reviews
● Many FOSS teams discovered that to
ship good quality software they needed
to review the source code
25. Fagan Code Inspections
● Code reviews performed at specific stages of
development
● Effective, but not widely used
26. Open Source style Code Reviews
● Fagan inspections were infeasible
– Required participants to be in the same room
● Instead, code reviews became
incremental
– Rather than reviewing the whole, review the
delta (the patch)
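Reviewing the delta can be illustrated with Python's standard difflib module. This is a minimal sketch: the file name and its contents are made up, and it only shows the kind of unified diff (the patch) a reviewer reads instead of the whole file.

```python
import difflib


def make_patch(old: str, new: str, path: str) -> str:
    """Return a unified diff (the 'delta') between two versions of one file."""
    diff = difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(diff)


# Hypothetical file contents, before and after a contribution.
old_version = "def area(r):\n    return 3.14 * r * r\n"
new_version = "import math\n\ndef area(r):\n    return math.pi * r * r\n"

patch = make_patch(old_version, new_version, "geometry.py")
print(patch)  # only changed lines, plus a little context, are shown
```

The reviewer sees two added import lines, one removed line and one added line, with the unchanged `def area(r):` line kept only as context.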
29. code reviews in FOSS
(1) early, frequent reviews
(2) of small, independent, complete
contributions
(3) that are broadcast to a large group of
stakeholders, but only reviewed by a small
set of self-selected experts
(4) resulting in an efficient and effective peer
review technique.
– Peter Rigby
36. Version Control Systems
● At the beginning, FOSS was shared as tar files
via USENET
– the FSF would ship physical tapes!
● Today, version control systems are the norm
– Centralized or Distributed
● FOSS has a continuous and proven track record of
innovation in version control systems
– FOSS democratized VC
37. On Version Control
● The VC is the
circulatory system of
software development
● It brings the code to all
stakeholders
● A contribution is a patch
– one or more commits
38. the patch
● the patch should be reviewed
● most VC systems don't support reviewing
patches
39. the patch and its review
● Two models:
– Commit Then Review (CTR)
● Review the code after it has been integrated
or
– Review Then Commit (RTC)
● Review the patch before it is integrated
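The difference between the two models is only where the review sits relative to integration. The sketch below is a toy model under stated assumptions: the `Patch` and `Repo` records, the single-approval threshold and the `review` callback are all hypothetical, and "undoing" a CTR patch here simply removes it, whereas a real VCS would record a revert commit.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Patch:
    # Hypothetical minimal patch record: a name plus reviewer sign-offs.
    name: str
    approvals: List[str] = field(default_factory=list)


@dataclass
class Repo:
    integrated: List[str] = field(default_factory=list)


def review_then_commit(repo: Repo, patch: Patch, needed: int = 1) -> bool:
    """RTC: the patch is integrated only if it was approved beforehand."""
    if len(patch.approvals) >= needed:
        repo.integrated.append(patch.name)
        return True
    return False


def commit_then_review(repo: Repo, patch: Patch,
                       review: Callable[[Patch], bool]) -> bool:
    """CTR: the patch is integrated first and reviewed afterwards."""
    repo.integrated.append(patch.name)
    if not review(patch):
        # A real VCS would record a revert commit instead of erasing history.
        repo.integrated.remove(patch.name)
        return False
    return True


repo = Repo()
review_then_commit(repo, Patch("fix-typo"))                 # rejected: no approvals
review_then_commit(repo, Patch("fix-leak", ["reviewer1"]))  # integrated
commit_then_review(repo, Patch("new-api"), review=lambda p: False)  # backed out
print(repo.integrated)  # ['fix-leak']
```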
40. Linux
● Linux incorporated RTC early in its
process
● Linus needed the review process integrated
with VC
● No FOSS VC system did it
– he turned to BitKeeper
41. Bitkeeper and Linux
● Symbiotic relationship
– Free (as in beer) licenses to Linux developers, with one
big condition
● Users must not develop competing tools
– BitKeeper rapidly improved the Linux integration process
● simplified integration of reviewed code
– BitKeeper was probably influenced by Linus's workflow
– In 2005, BitKeeper revoked its free license for Linux
developers
42. Git
● Many other distributed version control
systems preceded it
● What makes it special?
– Many features, but especially:
● Pull requests
● git integrates the code review process with a
distributed version control system
– Even via emailed patches
51. Super-repository
● Collection of repositories cloned
(recursively) from the same repo
– At least one per developer
● In their personal computer
– At least one public repository
● The blessed
– In git, there is no way to trace them
52. Moving commits across the superRepo
Method   How it works
Push     Done at source; needs write access to destination
Pull     Done at destination; needs read access to source
Email    Source creates a patch and mails it; recipient applies it
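The three methods can be contrasted with a toy model. This sketch assumes a repository is just a set of commit ids with reader and writer lists (no branches, no history graph); the repo names, users and commit ids are hypothetical. Because an emailed patch is re-applied by the recipient, it usually arrives under a new commit id, so the new id is faked here with a hash.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Repo:
    # Simplification: a repository is just a set of commit ids (no branches).
    name: str
    commits: Set[str] = field(default_factory=set)
    writers: Set[str] = field(default_factory=set)  # users allowed to push here
    readers: Set[str] = field(default_factory=set)  # users allowed to pull from here


def push(user: str, source: Repo, dest: Repo) -> None:
    """Run at the source; needs write access to the destination."""
    if user not in dest.writers:
        raise PermissionError(f"{user} cannot write to {dest.name}")
    dest.commits |= source.commits


def pull(user: str, source: Repo, dest: Repo) -> None:
    """Run at the destination; needs read access to the source."""
    if user not in source.readers:
        raise PermissionError(f"{user} cannot read {source.name}")
    dest.commits |= source.commits


def email_patch(commit_id: str, dest: Repo) -> str:
    """Source mails a patch; the recipient applies it as a NEW commit.

    The new id is faked with a hash here. In git it usually differs from
    the original because the committer and commit date change.
    """
    new_id = hashlib.sha1(f"{commit_id}->{dest.name}".encode()).hexdigest()
    dest.commits.add(new_id)
    return new_id


# Hypothetical users and repos.
blessed = Repo("blessed", {"a1", "b2"}, writers={"linus"}, readers={"linus", "dev"})
dev = Repo("dev", {"a1"}, writers={"dev"}, readers={"dev", "linus"})

pull("dev", blessed, dev)      # dev can read the public repo...
dev.commits.add("c3")          # ...and commits new work locally
blocked = False
try:
    push("dev", dev, blessed)  # but cannot write to the blessed repo
except PermissionError as err:
    blocked = True
    print(err)
new_id = email_patch("c3", blessed)  # so the work travels as an emailed patch
print("c3" in blessed.commits, new_id in blessed.commits)  # False True
```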
56. Continuous Mining of Linux
● Linux has no centralized logging
– Nobody really knows what the superRepo is
– Commits flow without any event
broadcasting mechanism
● How do we find the activity?
– Repos
– Commits
57. Semiautomatic Process
● Every 3 hrs, ask every repo
– What new commits do you have?
– What commits did you delete?
– Automatically resolve propagations
● Commits might propagate before we scan
● Daily:
– Are there commits in a repo by unknown committers?
● Answer:
– Is there a new repo, or is the committer new to the repo?
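One polling round of this process amounts to diffing two snapshots of each repo's commit ids. The sketch below is a minimal illustration under assumptions: `fetch` is a hypothetical callable (in practice it might run `git fetch` followed by `git rev-list --all`), the repo names and ids are made up, and resolving propagations that happened between scans is left out.

```python
from datetime import datetime, timezone
from typing import Callable, Dict, List, Set, Tuple

# One row of the propagation table: <commit-id, added|deleted, repo, when>
Record = Tuple[str, str, str, datetime]


def scan_repo(repo: str, before: Set[str], after: Set[str],
              when: datetime) -> List[Record]:
    """Turn two snapshots of a repo's commit ids into propagation records."""
    added = [(cid, "added", repo, when) for cid in sorted(after - before)]
    deleted = [(cid, "deleted", repo, when) for cid in sorted(before - after)]
    return added + deleted


def scan_all(last_seen: Dict[str, Set[str]],
             fetch: Callable[[str], Set[str]]) -> List[Record]:
    """One polling round over every known repo (the slides poll every 3 hrs).

    `fetch` is a hypothetical callable returning a repo's current commit ids.
    """
    now = datetime.now(timezone.utc)
    records: List[Record] = []
    for repo, before in list(last_seen.items()):
        after = fetch(repo)
        records += scan_repo(repo, before, after, now)
        last_seen[repo] = after  # remember this snapshot for the next round
    return records


def unknown_committers(committers_in_repo: Set[str], known: Set[str]) -> Set[str]:
    """Daily check: committers not previously seen in this repo."""
    return committers_in_repo - known


# Hypothetical previous snapshot, and what the repos hold now:
# alice rebased b1 into b2, and the blessed repo gained c3.
last_seen = {"dev/alice": {"a1", "b1"}, "linus": {"a1"}}
current = {"dev/alice": {"a1", "b2"}, "linus": {"a1", "c3"}}

records = scan_all(last_seen, lambda repo: current[repo])
for cid, op, repo, _ in records:
    print(cid, op, repo)
```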
58. Implementation
● Running since Nov. 2011
– Currently scans 650 repos every 3 hrs
– Retrieved
● 2.3 million commits (compared to 400k in Linus's
repo)
● 109 million records in propagation table
<commit-id, added|deleted, repo, when>
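Rows in this shape are enough to answer simple questions about a commit's journey. The sketch below assumes a handful of hypothetical rows (repo names, ids and timestamps are invented) and treats a commit as having reached the blessed repo if it was ever added there.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

Row = Tuple[str, str, str, datetime]  # <commit-id, added|deleted, repo, when>

# Hypothetical rows; repo names are made up for illustration.
table: List[Row] = [
    ("c1", "added", "dev/alice", datetime(2012, 1, 3, 9)),
    ("c1", "added", "subsys/net", datetime(2012, 1, 5, 12)),
    ("c1", "added", "linus", datetime(2012, 2, 1, 6)),
    ("c2", "added", "dev/bob", datetime(2012, 1, 4, 15)),
    ("c2", "deleted", "dev/bob", datetime(2012, 1, 6, 15)),
]


def first_seen(rows: List[Row], cid: str) -> Optional[Tuple[str, datetime]]:
    """Repo and time where a commit was first observed."""
    seen = [(when, repo) for c, op, repo, when in rows
            if c == cid and op == "added"]
    if not seen:
        return None
    when, repo = min(seen)
    return repo, when


def time_to_blessed(rows: List[Row], cid: str,
                    blessed: str = "linus") -> Optional[timedelta]:
    """How long the commit took to reach the blessed repo (None if it never did)."""
    start = first_seen(rows, cid)
    arrivals = [when for c, op, repo, when in rows
                if c == cid and op == "added" and repo == blessed]
    if start is None or not arrivals:
        return None
    return min(arrivals) - start[1]


print(time_to_blessed(table, "c1"))  # 28 days, 21:00:00
print(time_to_blessed(table, "c2"))  # None: c2 was deleted before reaching Linus
```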
59. Snapshot (Linus) vs. Continuous
                                  Snapshot (Linus)   Continuous
Number of repos                   1                  479
Commits                           64k                533k
Non-merge commits                 59k                485k
Unique non-merges                 58k                135k
% unique non-merges               98.9%              27.9%
Non-merges that reached blessed   n/a                43.1%
Different author emails           3434               5646
Different authors                 2883               4575
Different committer emails        283                1185
Different committers              245                1058
60. Commit vs Patches
● Commit ids are insufficient to track patches
● A large amount of work never reaches the blessed repo
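Tracking patches rather than commits needs an id derived from the change itself. The sketch below is a content fingerprint in the spirit of `git patch-id`, but it is NOT compatible with git's output; it assumes single-file unified diffs and, as an illustrative choice, ignores context lines, hunk line numbers and all whitespace, so the same change keeps its fingerprint after a rebase or cherry-pick.

```python
import hashlib


def patch_fingerprint(diff_text: str) -> str:
    """Content-based id for a single-file unified diff.

    Hashes the target file name and the added/removed lines with all
    whitespace removed, skipping context lines and hunk headers (whose
    line numbers shift when a commit is rebased or cherry-picked).
    """
    h = hashlib.sha1()
    in_hunk = False
    for line in diff_text.splitlines():
        if line.startswith("@@"):
            in_hunk = True
            continue
        if not in_hunk:
            if line.startswith("+++ "):
                h.update(line.encode() + b"\n")  # keep the target file name
            continue
        if line.startswith(("+", "-")):
            body = "".join(line[1:].split())  # ignore whitespace changes
            h.update(line[0].encode() + body.encode() + b"\n")
    return h.hexdigest()


# The same change, applied at a different place and with different spacing.
original = "--- a/f.c\n+++ b/f.c\n@@ -10,2 +10,2 @@\n int x;\n-int y = 1;\n+int y = 2;\n"
rebased = "--- a/f.c\n+++ b/f.c\n@@ -42,2 +42,2 @@\n int x;\n-int  y = 1;\n+int y  =  2;\n"
other = "--- a/f.c\n+++ b/f.c\n@@ -10,2 +10,2 @@\n int x;\n-int y = 1;\n+int y = 3;\n"

print(patch_fingerprint(original) == patch_fingerprint(rebased))  # True
print(patch_fingerprint(original) == patch_fingerprint(other))    # False
```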
68. Linux Dashboard
● We asked two Linux maintainers:
– Can this info be useful?
● Answer:
– “Yes”
… but not for what we expected...
69. Tracking commits in Linux
● Need to track patches, not commits
– Particularly important in consumer
repositories
– Need to cross-reference commits
● What commits contain the same patch?
– Some repos track commits from the blessed repo via
cherry-picking
● so commit ids are useless for matching
● instead, they annotate the log with the original commit id
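Such annotations can be harvested from commit messages. The sketch below assumes two message conventions: the `(cherry picked from commit <sha>)` line that `git cherry-pick -x` appends, and the `commit <sha> upstream.` header commonly seen in Linux stable backports. The log entries and ids are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Annotation appended by `git cherry-pick -x`.
CHERRY_PICK = re.compile(r"\(cherry picked from commit ([0-9a-f]{40})\)")
# Header line used by Linux stable backports: "commit <sha> upstream."
UPSTREAM = re.compile(r"^commit ([0-9a-f]{40}) upstream\.?$", re.MULTILINE)


def origin_ids(message: str) -> List[str]:
    """Original commit ids recorded in a commit message, if any."""
    return CHERRY_PICK.findall(message) + UPSTREAM.findall(message)


def cross_reference(log: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each original commit id to the commits that copied it."""
    copies: Dict[str, List[str]] = defaultdict(list)
    for cid, message in log.items():
        for origin in origin_ids(message):
            copies[origin].append(cid)
    return dict(copies)


# Hypothetical log: commit id -> commit message.
orig = "1" * 40
log = {
    "aaa111": f"net: fix leak\n\n(cherry picked from commit {orig})\n",
    "bbb222": f"commit {orig} upstream.\n\nnet: fix leak\n",
    "ccc333": "docs: fix typo\n",
}
print(cross_reference(log))
```

Both copies of the fix map back to the same original commit, even though all three commit ids differ.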
70. Linux Commits Dashboard
● Where is my commit?
– Has my original commit reached Linus?
● What was merged?
– What commits were merged at once by Linus?
● What commits are related to this one?
– Same patch
● Rebasing
● Cherry picking
– Mentioned in a commit
● This commit fixes a bug introduced in X
● This commit reverts commit X
● http://o.cs.uvic.ca:20810/perl/cid.pl?cid=70cb8bb0d365f0bc8b20fa67347caf9598a4674e
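Commits that mention another commit can be found with a couple of regular expressions. This sketch assumes two conventions: the `Fixes: <abbreviated sha> ("subject")` tag used in Linux commit messages, and the `This reverts commit <sha>.` body that `git revert` writes by default. Abbreviated ids are expanded by prefix against known ids, which may be ambiguous. The messages and ids below are made up.

```python
import re
from typing import Iterable, List, Tuple

# "Fixes: <abbrev sha> ("subject")" tag used in Linux commit messages.
FIXES = re.compile(r"^Fixes:\s*([0-9a-f]{7,40})\b", re.MULTILINE)
# Default message body written by `git revert`.
REVERTS = re.compile(r"This reverts commit ([0-9a-f]{7,40})")


def mentions(message: str) -> List[Tuple[str, str]]:
    """(relation, commit-id prefix) pairs mentioned in a commit message."""
    return ([("fixes", sha) for sha in FIXES.findall(message)] +
            [("reverts", sha) for sha in REVERTS.findall(message)])


def resolve(prefix: str, known_ids: Iterable[str]) -> List[str]:
    """Expand an abbreviated id against known full ids (may be ambiguous)."""
    return [cid for cid in known_ids if cid.startswith(prefix)]


# Hypothetical commit messages and ids.
full = "abcdef012345" + "0" * 28
msg1 = 'foo: handle NULL\n\nFixes: abcdef012345 ("foo: add bar")\n'
msg2 = f'Revert "foo: add bar"\n\nThis reverts commit {full}.\n'

print(mentions(msg1))  # [('fixes', 'abcdef012345')]
print(mentions(msg2))
print(resolve("abcdef012345", [full, "f" * 40]))
```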
74. Researcher states:
“40% of pull requests are not merged”
● Based on simply querying ghtorrent data
● But it ignores what really happens
● Many pull requests are merged without being
marked as merged in github
● Ghtorrent data has many potential threats to
validity
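A rough way to check the "not merged" label is to look for the pull request's work in the main branch. This is a hypothetical heuristic, not GHTorrent's or GitHub's method: it assumes we have the pull request's commit ids and per-commit content hashes (for example from `git patch-id`) plus the same data for the main branch of a local clone. All fields and values are invented.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class PullRequest:
    # Hypothetical fields, e.g. from GHTorrent plus a local clone of the repo.
    number: int
    merged_flag: bool                                  # what github recorded
    commit_ids: Set[str] = field(default_factory=set)
    patch_ids: Set[str] = field(default_factory=set)   # per-commit content hashes


def infer_merged(pr: PullRequest, main_ids: Set[str],
                 main_patch_ids: Set[str]) -> str:
    """Classify a pull request using evidence beyond github's merged flag."""
    if pr.merged_flag:
        return "merged via github"
    if pr.commit_ids and pr.commit_ids <= main_ids:
        return "merged outside github (same commits)"
    if pr.patch_ids and pr.patch_ids <= main_patch_ids:
        return "merged outside github (same patches, new commit ids)"
    return "not merged, as far as this evidence shows"


main_ids = {"c1", "c2", "c9"}
main_patch_ids = {"p1", "p2", "p7"}
prs = [
    PullRequest(1, True, {"c1"}, {"p1"}),
    PullRequest(2, False, {"c2"}, {"p2"}),  # e.g. merged from the command line
    PullRequest(3, False, {"c5"}, {"p7"}),  # e.g. rebased or cherry-picked
    PullRequest(4, False, {"c6"}, {"p6"}),
]
for pr in prs:
    print(pr.number, infer_merged(pr, main_ids, main_patch_ids))
```

Squashed merges combine several commits into one new patch, so this heuristic would still miss them; the last label is deliberately phrased as "as far as this evidence shows".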
82. I. A repository is not necessarily a project
II. Most projects have few commits
III. Most projects are inactive
IV. A large proportion of repositories are not for software
engineering
V. More than two thirds of projects are personal
VI. Only a fraction of repos use pull requests
VII. If the commits in a pull-request are reworked, github only
records the resulting patch
VIII. Most pull-requests appear as non-merged, even though
they were merged
IX. Many active projects do not conduct all their software
development activity in github
86. Self contained?
“Any serious project would have to have some
separate infrastructure - mailing lists, forums, irc
channels and their archives, build farms, etc. [...]
Thus while GitHub and all other project hosts are
used for collaboration, they are not and can not
be a complete solution.”
94. Themes: focus
● Simple tools
– git branching/merging
– github features seem to be enough for most
● Pull requests and issue tracking
● Focused interaction
– code-centric, focused communication
– asynchronous and unobtrusive
95. Focus: independence
● Decentralized work:
– git allows them to work independently
– yet they have visibility of what others do
● Low need for management:
– Need for a clear process (the workflow)
– They shy away from rigid management and team
structure
– Team managers recognize this
– Managers should be educated on using git/github
96. Focus: Exposure
● Easy contribution process
– Fork and potentially contribute without pre-
authorization
● Peer pressure
– Developers are conscious that their code is
readily visible to others
– Adoption of small, frequent contributions
97. OSS mentality
● At the operational level
– the nature of the work allows independence and self-
organization.
– developers are familiar with the idea of working this way and
share the mentality behind it.
● developers are self-driven
● share the mentality of
– self-organizing,
– minimizing communication and coordination needs,
– having ownership of code, and
– operating on a meritocratic, expertise-based model
99. The Github Ecosystem
● github is creating an ecosystem of
proprietary, cloud enabled applications
for software development teams
– Service integration
– JSON API
● Asana, Campfire, Lighthouse, Jira, Travis,
Trello, etc, etc.
101. Conclusions
● git and github are promoting the use of the
pull-request workflow
– small, independent contributions
– that can be reviewed before integration
● Effectively, adopting open source code practices
into their development
– Independent work
– Code reviews of contributions before they are
integrated