1. The Ultimate Debian Database
Israel Herraiz
<israel.herraiz@upm.es>
Davis, CA, July 26th 2012
Download these slides at http://slideshare.net/herraiz/the-ultimate-debian-database
2. Outline
1. Debian: what is it and sources of data
2. The UDD: what is it and where to get it
3. What has been done and what we can do
4. Debian
• GNU/Linux software distribution
• Goal: to deliver an entirely and exclusively free
distribution
• Maintained by volunteers
• Bureaucratic organization (policies, constitution,
social contract)
• Release when ready
• > 10 years of history
• > 500 MSLOC
• > 15k packages
8. Source and Binary Packages
• A source package generates one or more binary
packages
octave-core
octave-doc
octave
liboctave
liboctave-dev
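The one-to-many relation above can be sketched as a lookup table. This is a minimal illustration, not Debian tooling: it assumes the diagram means that the source package octave builds the four other packages listed, and the names `SOURCE_TO_BINARIES` and `invert` are made up for the example.

```python
# Assumed reading of the slide's diagram: the source package "octave"
# builds the four binary packages listed around it.
SOURCE_TO_BINARIES = {
    "octave": ["octave-core", "octave-doc", "liboctave", "liboctave-dev"],
}


def invert(source_to_binaries):
    """Return a binary -> source lookup table (hypothetical helper)."""
    binary_to_source = {}
    for source, binaries in source_to_binaries.items():
        for binary in binaries:
            binary_to_source[binary] = source
    return binary_to_source


BINARY_TO_SOURCE = invert(SOURCE_TO_BINARIES)
```

Looking up `BINARY_TO_SOURCE["liboctave-dev"]` then gives the source package that a binary package was built from.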
9. Package uploads
• There is no project-wide source repository, unlike in
other software projects
• Although developers may privately use version
control systems
• When a bug is fixed, a new version is uploaded
• Uploads == commits
13. Debian Popcon: Tracking Installations
• Popularity: total install counts
• Recent use (< 30 days)
• Old use (beyond 30 days)
• Data collected daily
• Users voluntarily opt in
• Source of bias
14. Debian Bugs
• People find bugs in binary packages
• ~500 bugs per month
• But bugs are linked to source packages
• Bugs can be
• Accepted and solved in Debian
• Rejected
• Forwarded to upstream
• Everything else, similar to other bug tracking
systems
• Life cycle, comments, severity levels…
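Because reports arrive against binary packages but are tracked per source package, counting bugs usually means mapping each report back to its source. A minimal sketch, with made-up report data and a hypothetical helper `bugs_per_source` (none of this is the bug tracker's own code):

```python
from collections import Counter

# Hypothetical binary -> source mapping, reusing the octave example.
BINARY_TO_SOURCE = {
    "octave-core": "octave",
    "octave-doc": "octave",
    "liboctave": "octave",
}

# Made-up reports: (bug id, binary package the reporter filed against).
REPORTS = [(1, "octave-core"), (2, "octave-doc"), (3, "octave-core")]


def bugs_per_source(reports, binary_to_source):
    """Count reports per source package.

    A binary package missing from the mapping is counted under its own name.
    """
    counts = Counter()
    for _bug_id, binary in reports:
        counts[binary_to_source.get(binary, binary)] += 1
    return counts


COUNTS = bugs_per_source(REPORTS, BINARY_TO_SOURCE)
```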
15. 2. The UDD: what is it and where to get it
18. What is the UDD?
• PostgreSQL database holding the data from all
the sources described so far
• http://udd.debian.org
• New dumps available every two days
• ~ 500 MB bz2
• Used for some Debian internal services
• Schema too complex and too big for a slide
• Technical detail: you need a Debian-based
system to load the dump of the UDD
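Once a dump is loaded, the data can be explored with plain SQL. The sketch below only shows the shape of such a query. It uses an in-memory SQLite database as a stand-in for the restored PostgreSQL database, and the table `popcon`, its columns and the numbers are illustrative assumptions, not the real UDD schema or real data.

```python
import sqlite3


def top_installed(conn, limit):
    """Return (package, insts) rows ordered by install count, highest first."""
    cur = conn.execute(
        "SELECT package, insts FROM popcon ORDER BY insts DESC LIMIT ?",
        (limit,),
    )
    return cur.fetchall()


# In-memory stand-in; a real session would connect to the restored
# PostgreSQL database instead.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE popcon (package TEXT PRIMARY KEY, insts INTEGER)")
conn.executemany(
    "INSERT INTO popcon VALUES (?, ?)",
    # Made-up install counts, for illustration only.
    [("octave", 120), ("liboctave-dev", 40), ("octave-doc", 75)],
)
rows = top_installed(conn, 2)
```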
19. Debian sources of data
• Sources / Packages metadata
• Bugs, including *all* archived bugs (1995-96-97)
• Lintian
• Migrations to testing
• Uploads, all the way back to 1998!
• New packages queue
• Carnivore
• Translations status
• Debtags
• Orphaned packages
• Popularity Contest
• Screenshots
• DEHS
21. Bear in mind!
• You can also obtain the source code of the
packages
• Easy to automate
• And the modifications made by the Debian
maintainers
• So add product metrics to the set of data
sources
• But this is not included in the UDD
22. 3. What has been done and what we can do
23. What kind of questions does Debian answer with the UDD?
• High priority packages that have release-critical
(RC) bugs
• Developers with very buggy and/or outdated
packages
• Who uploaded this package to the unstable
release?
• Who reported the RC bugs since the last
release?
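The first question in the list can be sketched as a join between package metadata and bugs. This is an illustration under stated assumptions: the tables `packages` and `bugs` and their columns are hypothetical, "high priority" is taken to mean the priorities required and important, RC bugs are taken to be those of severity critical, grave or serious, and SQLite stands in for PostgreSQL. The rows are made up.

```python
import sqlite3

RC_SEVERITIES = ("critical", "grave", "serious")
HIGH_PRIORITIES = ("required", "important")  # assumption about "high priority"

QUERY = """
SELECT p.package, COUNT(b.id) AS rc_bugs
FROM packages AS p
JOIN bugs AS b ON b.package = p.package
WHERE p.priority IN (?, ?)
  AND b.severity IN (?, ?, ?)
GROUP BY p.package
ORDER BY rc_bugs DESC, p.package
"""


def high_priority_with_rc_bugs(conn):
    """Return (package, rc_bug_count) for high priority packages with RC bugs."""
    return conn.execute(QUERY, HIGH_PRIORITIES + RC_SEVERITIES).fetchall()


# Hypothetical, simplified schema with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE packages (package TEXT PRIMARY KEY, priority TEXT)")
conn.execute(
    "CREATE TABLE bugs (id INTEGER PRIMARY KEY, package TEXT, severity TEXT)"
)
conn.executemany(
    "INSERT INTO packages VALUES (?, ?)",
    [("alpha", "required"), ("beta", "optional"), ("gamma", "important")],
)
conn.executemany(
    "INSERT INTO bugs VALUES (?, ?, ?)",
    [
        (1, "alpha", "grave"),
        (2, "alpha", "serious"),
        (3, "beta", "critical"),
        (4, "gamma", "minor"),
    ],
)
result = high_priority_with_rc_bugs(conn)
```

Here only alpha is returned: beta has an RC bug but a low priority, and gamma has a high priority but no RC bug.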
24. Some questions answered in the literature
• The popularity bias
• http://oa.upm.es/9585/
• Open source projects get more bug reports if
they are popular
• The actual number of bugs is not related to the
number of bugs reported
• So more bug reports can actually mean more quality
• Well, at least more people who decide to use the
software
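One simple way to see why raw report counts mislead is to normalize them by the number of installations. The sketch below is only an illustration of that idea with made-up numbers. It is not the method of the cited paper, and `reports_per_thousand_installs` is a hypothetical helper.

```python
def reports_per_thousand_installs(bug_reports, installs):
    """Bug reports per 1000 installations (a rough exposure-adjusted rate)."""
    if installs <= 0:
        raise ValueError("installs must be positive")
    return 1000 * bug_reports / installs


# Made-up numbers: the popular package has ten times more reports,
# yet far fewer reports per installation.
popular = reports_per_thousand_installs(bug_reports=200, installs=100_000)
niche = reports_per_thousand_installs(bug_reports=20, installs=1_000)
```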
26. Summary
• Packages and sources metadata
• And source code
• Bugs
• All the way back to 1995-96-97!
• Popularity contest
• Maintainers activity (uploads)
• All the way back to 1998!
• And much more…
• Now, what do you think we can do with this?