The Ultimate Debian Database

  Israel Herraiz <israel.herraiz@upm.es>

  Davis, CA, July 26th 2012



Download these slides at http://slideshare.net/herraiz/the-ultimate-debian-database
Outline

1. Debian: what is it and sources of data

2. The UDD: what is it and where to get it

3. What has been done and what we can do




                                             1 / 25
1. Debian: what is it and
sources of data

                            2 / 25
Debian

• GNU/Linux software distribution
   •   Goal: to deliver an entirely and exclusively free
       distribution
• Maintained by volunteers
• Bureaucratic organization (policies, constitution,
  social contract)
• Release when ready
• > 10 years history
• > 500 MSLOC
• > 15k packages
                                                           3 / 25
Debian Releases




                  4 / 25
5 / 25
Debian Source Packages




                         6 / 25
Source and Binary Packages

• A source package generates one or more binary
  packages
                                 octave-core

                                 octave-doc

   octave
                                  liboctave

                                 liboctave-dev


                                                 7 / 25
Package uploads

• There is no project-wide version control repository,
  unlike in other software projects
  •   Although developers may privately use version
      control systems
• When a bug is fixed, a new version is uploaded
  •   Uploads == commits




                                                      8 / 25
Source Packages metadata


Source: octave
Section: math
Priority: extra
Maintainer: Debian Octave Group <pkg-octave-devel@lists.alioth.debian.org>
Uploaders: Thomas Weber <tweber@debian.org>, Sébastien Villemot <sebastien.villemot@ens.fr>
DM-Upload-Allowed: yes
Build-Depends: gfortran, debhelper (>= 9), automake, dh-autoreconf, texinfo ….
Standards-Version: 3.9.3
Homepage: http://www.octave.org/
Vcs-Git: git://git.debian.org/git/pkg-octave/octave.git
Vcs-Browser: http://git.debian.org/?p=pkg-octave/octave.git




                                                                                 9 / 25
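A minimal sketch of how a stanza like the one above could be read into a dictionary. It assumes the usual control-file layout of `Field: value` lines, with continuation lines starting with whitespace. The function name `parse_stanza` is hypothetical, and the sample stanza is a shortened copy of the slide (the elided `Build-Depends` field is left out).

```python
def parse_stanza(text):
    """Parse one Debian control stanza into a dict of field -> value."""
    fields = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines in this single-stanza sketch
        if line[0] in " \t" and current is not None:
            # Continuation line: append it to the previous field's value.
            fields[current] += " " + line.strip()
        else:
            # Split on the first colon only, so URLs in values stay intact.
            name, _, value = line.partition(":")
            current = name.strip()
            fields[current] = value.strip()
    return fields


STANZA = """\
Source: octave
Section: math
Priority: extra
Maintainer: Debian Octave Group <pkg-octave-devel@lists.alioth.debian.org>
Uploaders: Thomas Weber <tweber@debian.org>,
 Sébastien Villemot <sebastien.villemot@ens.fr>
Standards-Version: 3.9.3
Homepage: http://www.octave.org/
Vcs-Git: git://git.debian.org/git/pkg-octave/octave.git
"""

source = parse_stanza(STANZA)
uploaders = [u.strip() for u in source["Uploaders"].split(",")]
print(source["Source"], source["Section"], len(uploaders))
```

Splitting on the first colon keeps values such as the `Homepage` URL whole. A real parser would also need to handle multiple stanzas and comment lines.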
Binary Packages metadata
Package: octave
Priority: extra
Section: math
Installed-Size: 4760
Maintainer: Ubuntu Developers <ubuntu-devel-discuss@lists.ubuntu.com>
Architecture: amd64
Version: 3.6.1-1ubuntu1ppa1~precise1
Recommends: gnuplot, libatlas3gf-base
Replaces: octave3.2
Suggests: octave-info, octave-doc, octave-htmldoc
Depends: libamd2.2.0 (>= 1:3.4.0), libarpack2 (>= 2.1), …
Conflicts: octave3.2
Filename: pool/main/o/octave/octave_3.6.1-1ubuntu1ppa1~precise1_amd64.deb
Size: 1746050
MD5sum: 2c431556d6cf98fd8a341e865ac63058
SHA1: b333c49e6f6cb7d4445378020dfffdb5a1626de7
Description: GNU Octave language for numerical computations…
                                                                        10 / 25
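Relationship fields such as `Depends` and `Recommends` are comma-separated lists, each entry optionally carrying a version constraint in parentheses and alternatives separated by `|`. The sketch below splits such a field into tuples. The function name `parse_depends` is hypothetical, only the entries visible on the slide are used, and architecture qualifiers and the deprecated `<` and `>` operators are deliberately not handled.

```python
import re

# A package name, optionally followed by "(operator version)".
_DEP = re.compile(
    r"^\s*(?P<name>[a-z0-9][a-z0-9+.\-]*)\s*"
    r"(?:\(\s*(?P<op><<|<=|=|>=|>>)\s*(?P<version>[^)\s]+)\s*\))?\s*$"
)


def parse_depends(value):
    """Return a list of dependencies; each is a list of alternatives.

    Every alternative is a (name, operator, version) tuple, where operator
    and version are None when no constraint is given.
    """
    deps = []
    for item in value.split(","):
        if not item.strip():
            continue
        alternatives = []
        for alt in item.split("|"):
            match = _DEP.match(alt)
            if match is None:
                raise ValueError(f"cannot parse dependency: {alt!r}")
            alternatives.append((match["name"], match["op"], match["version"]))
        deps.append(alternatives)
    return deps


depends = parse_depends("libamd2.2.0 (>= 1:3.4.0), libarpack2 (>= 2.1)")
recommends = parse_depends("gnuplot, libatlas3gf-base")
print(depends)
print(recommends)
```

Keeping alternatives as an inner list means a dependency graph built from this output can treat `a | b` as a single edge with several possible targets.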
Debian Popcon: Tracking Installations

• Popularity: total install counts
  •   Recent use (< 30 days)
  •   Old use (beyond 30 days)
• Data collected daily
• Users voluntarily opt in
  •   Source of bias

                                                12 / 25
Debian Bugs

• People find bugs in binary packages
  •   ~500 bugs per month
• But bugs are linked to source packages
• Bugs can be
  •   Accepted and solved in Debian
  •   Rejected
  •   Forwarded to upstream
• Everything else, similar to other bug tracking
  systems
  •   Life cycle, comments, severity levels…
                                               13 / 25
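Because reports arrive against binary packages but are tracked per source package, any per-package bug count first needs a binary-to-source mapping. A small sketch, using the octave binaries from the earlier slide; the list of reports is made up for illustration, and `bugs_per_source` is a hypothetical helper name.

```python
from collections import Counter

# Binary -> source mapping taken from the octave example.
BINARY_TO_SOURCE = {
    "octave-core": "octave",
    "octave-doc": "octave",
    "liboctave": "octave",
    "liboctave-dev": "octave",
}


def bugs_per_source(reported_binaries, binary_to_source):
    """Count bug reports per source package.

    Assumption: a binary with no known mapping is treated as its own
    source package, which is common but not guaranteed.
    """
    counts = Counter()
    for binary in reported_binaries:
        counts[binary_to_source.get(binary, binary)] += 1
    return counts


# Made-up reports, one entry per bug, named by the binary it was filed against.
reports = ["octave-doc", "liboctave", "octave-core", "liboctave", "gnuplot"]
counts = bugs_per_source(reports, BINARY_TO_SOURCE)
print(counts["octave"], counts["gnuplot"])
```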
2. The UDD: what is it and
where to get it

                             14 / 25
Research work: main paper (at MSR 2010)




                                          15 / 25
Other papers at MSR 2010




                           16 / 25
What is the UDD?

• PostgreSQL database with all the information from
  the sources described so far
  •   http://udd.debian.org
• New dumps available every two days
  •   ~ 500 MB bz2
• Used for some Debian internal services
• Schema too complex and too big for a slide
• Technical detail: you need a Debian-based
  system to load the dump of the UDD

                                                17 / 25
Debian sources of data

• Sources / Packages metadata
• Bugs
  •   Including *all* archived bugs
      •   1995-96-97
• Carnivore
• Debtags
• Popularity Contest
• DEHS
• Lintian
• Migrations to testing
• Uploads
  •   All the way back to 1998!
• New packages queue
• Translations status
• Orphaned packages
• Screenshots
                                                              18 / 25
Bear in mind!

• You can also obtain the source code of the
  packages
  •   Easy to automate
• And the modifications done by the Debian
  maintainers
• So add product metrics to the set of data
  sources
• But this is not included in the UDD


                                           20 / 25
3. What has been done and
what we can do

                            21 / 25
What kind of questions does Debian answer with the UDD?

• High priority packages that have release-critical (RC)
  bugs
• Developers with very buggy and/or outdated
  packages
• Who uploaded this package to the unstable
  release?
• Who reported the RC bugs since the last
  release?


                                                      22 / 25
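The first question above can be sketched as a join between package metadata and bugs. The example runs against an in-memory SQLite database so it is self-contained; the table and column names are a toy schema invented for illustration and are not the actual UDD schema, and the rows are made up. It assumes that "high priority" means priorities `required`, `important` and `standard`, and that release-critical bugs are those with severity `critical`, `grave` or `serious`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Toy schema and made-up rows; the real UDD tables differ.
conn.executescript("""
CREATE TABLE packages (package TEXT, source TEXT, priority TEXT);
CREATE TABLE bugs (id INTEGER PRIMARY KEY, source TEXT,
                   severity TEXT, done INTEGER);
INSERT INTO packages VALUES
  ('libfoo1', 'foo', 'required'),
  ('foo-doc', 'foo', 'optional'),
  ('bar',     'bar', 'important'),
  ('baz',     'baz', 'extra');
INSERT INTO bugs VALUES
  (1, 'foo', 'serious',  0),
  (2, 'foo', 'minor',    0),
  (3, 'bar', 'grave',    1),
  (4, 'baz', 'critical', 0);
""")

# Open release-critical bugs on high priority packages, joined through
# the source package, since bugs are linked to sources.
QUERY = """
SELECT p.package, COUNT(DISTINCT b.id) AS rc_bugs
FROM packages AS p
JOIN bugs AS b ON b.source = p.source
WHERE p.priority IN ('required', 'important', 'standard')
  AND b.severity IN ('critical', 'grave', 'serious')
  AND b.done = 0
GROUP BY p.package
ORDER BY rc_bugs DESC, p.package
"""

rows = conn.execute(QUERY).fetchall()
print(rows)
```

Against a loaded UDD dump the same idea would be expressed in PostgreSQL with the real table names; only the shape of the join carries over from this sketch.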
Some questions solved in the literature

• The popularity bias (http://oa.upm.es/9585/)
  •   Open source projects get more bug reports if
      they are popular
  •   The number of bugs reported is not related to the
      actual number of bugs
  •   So more reported bugs can actually mean more quality
      •   Well, at least more people who decide to use the
          software


                                                             23 / 25
The popularity bias

[Figure: scatter plot of Log(Bugs) against Log(installations), with a group of points labeled "Required packages"]
                                         24 / 25
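One way to quantify the relationship plotted above is the Pearson correlation between the logarithms of install counts and bug counts. The sketch below is not the method of the cited paper; it is a plain log-log correlation under the assumption that packages with zero installs or zero bugs are dropped (their logarithm is undefined). The data is synthetic and chosen to be exactly linear in log-log space.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def loglog_correlation(pairs):
    """Correlate log10(installations) with log10(bugs).

    Pairs with a zero count are skipped because log(0) is undefined.
    """
    kept = [(inst, bugs) for inst, bugs in pairs if inst > 0 and bugs > 0]
    xs = [math.log10(inst) for inst, _ in kept]
    ys = [math.log10(bugs) for _, bugs in kept]
    return pearson(xs, ys)


# Synthetic (installations, bugs) pairs, not real popcon or BTS data.
SYNTHETIC = [(10, 1), (100, 10), (1000, 100), (10000, 1000), (50, 0)]
r = loglog_correlation(SYNTHETIC)
print(round(r, 6))
```

Dropping zero-count packages is itself a source of bias; adding one to every count before taking the logarithm is a common alternative.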
Summary

• Packages and sources metadata
     •   And source code
• Bugs
     •   All the way back to 1995-96-97!
• Popularity contest
• Maintainers activity (uploads)
     •   All the way back to 1998!
• And much more…
• Now, what do you think we can do with this?

                                                25 / 25
