The (r)evolution of CI/CD on GitHub

  1. The (r)evolution of CI/CD on GitHub: Promises and Perils of the GitHub Actions ecosystem. Tom Mens, Software Engineering Lab, March 2023. SECO-ASSIST, secoassist.github.io
  4. Collaborative software development: Commits, Issues, Pull Requests, Comments, Code Reviews, Discussions, Project Management, ... Continuous Integration: Quality analysis, Build, Test, Deploy. GitHub Actions
  5. Examples of CI/CD tools
  6. Specifying GitHub Actions workflows: a repository stores its workflows in .github/workflows/. Workflows run in parallel. Each workflow consists of jobs, which run in parallel by default or sequentially; a job may define a strategy. Each job consists of steps, which run sequentially. A step either reuses an action (uses:) or runs a shell command (run:).
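The repository / workflow / job / step hierarchy on this slide can be sketched in Python. This is only an illustration: the workflow below is hypothetical (its name, job names and commands are assumptions) and is written as the nested mapping a YAML parser would typically produce for a workflow file.

```python
# Hypothetical workflow, written as the nested mapping that a YAML parser
# would typically produce for a file under .github/workflows/.
# Job names and shell commands are made up for illustration.
workflow = {
    "name": "CI",
    "on": ["push", "pull_request"],
    "jobs": {
        "test": {
            "runs-on": "ubuntu-latest",
            "steps": [
                {"uses": "actions/checkout@v3"},    # step reusing an Action
                {"uses": "actions/setup-node@v3"},  # step reusing an Action
                {"run": "npm test"},                # step running a shell command
            ],
        },
        "lint": {
            "runs-on": "ubuntu-latest",
            "steps": [
                {"uses": "actions/checkout@v3"},
                {"run": "npm run lint"},
            ],
        },
    },
}


def count_step_kinds(wf):
    """Count steps that reuse an Action (uses:) versus steps that run a command (run:)."""
    counts = {"run": 0, "uses": 0}
    for job in wf["jobs"].values():
        for step in job["steps"]:
            if "uses" in step:
                counts["uses"] += 1
            elif "run" in step:
                counts["run"] += 1
    return counts


print(count_step_kinds(workflow))
```

The two jobs have no dependency on each other, so they would run in parallel by default, while the steps inside each job run in order.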
  7. Running workflows
  8. GitHub Marketplace: reusing Actions from the GitHub Marketplace
  9. On the rise and fall of CI services in GitHub
     Mehdi Golzadeh (mehdi.golzadeh@umons.ac.be), Alexandre Decan (alexandre.decan@umons.ac.be), Tom Mens (tom.mens@umons.ac.be). Software Engineering Lab, University of Mons, Mons, Belgium.
     https://doi.org/10.1109/SANER53432.2022.00084

     Abstract: Continuous integration (CI) services are used in collaborative open source projects to automate parts of the development workflow. Such services have been in widespread use for over a decade, with new CIs being introduced over the years, sometimes overtaking other CIs in popularity. We conducted a longitudinal empirical study over a period of nine years, aiming to better understand this rapidly evolving CI landscape. By analysing the development history of 91,810 GitHub repositories of active npm packages having used at least one CI service, we quantitatively studied the evolution of seven popular CIs, specifically focusing on their co-usage and migration in the considered repositories. We provide statistical evidence of the rise of GitHub Actions, which has become the dominant CI service in less than 18 months. This coincides with the fall of Travis, which has seen an important decrease in usage, likely due to a combination of policy changes and migrations to GitHub Actions.

     Index Terms: Continuous integration, distributed software development, software repositories, GitHub

     I. Introduction

     Continuous integration (CI), deployment and delivery have become the cornerstone of collaborative software development and DevOps practices. CI automates the integration of code changes from multiple contributors into a central repository where automated builds, tests and code quality checks run. Well-known examples of CI services are Jenkins, Travis, CircleCI and AppVeyor. CI services can also be built into social coding platforms such as GitHub and GitLab [1]. GitLab has featured CI capabilities since November 2012.

     Based on popular demand, and in response to CI support integrated in GitLab, GitHub publicly announced the beta version of GitHub Actions (abbreviated to GHA in the remainder of this article) in October 2018. In August 2019, GitHub officially began supporting continuous integration through GHA, and the product was released publicly in November 2019. GHA [2] allows automating a wide range of tasks based on a variety of triggers such as commits, issues, pull requests, comments and many more. GHA can be used to facilitate code reviews, code quality analysis, communication, dependency and security monitoring and management, testing, etc. GHA facilitates the integration with external services, and can even obviate the need for such external services altogether.

     GitHub is by far the largest social coding platform, hosting the development history of millions of collaborative software repositories, and accommodating over 56 million users in September 2020 [3]. Given its popularity and the ease with which GHA allows automating the CI workflow, we hypothesise that GHA has had a significant impact on today's CI landscape. More particularly, we believe that it has increased the awareness of the need for CI, it has reduced the entry barrier for projects to start using CI, and it may have led projects to migrate from other CI services towards GHA.

     This article aims to quantitatively and objectively verify these hypotheses, and discusses their consequences, through a longitudinal analysis of how different CIs have been used over a nine-year period in 91,810 GitHub repositories corresponding to the software development history of reusable Node.JS packages distributed through the npm package registry. This empirical study focuses on four research questions:

     RQ1 How did the CI landscape evolve? We identified 20 different CIs being used in the considered set of repositories, some of which were considerably more prevalent than others. Together with Travis, GHA covers more than 80% of all usages. Moreover, in only 18 months GHA has overtaken all other CIs in popularity.
     RQ2 What are the most frequent combinations of CIs? We observed that many repositories have used multiple CIs during their lifetime. AppVeyor is nearly always used in combination with some other CI. If a repository uses a CI simultaneously with another one, it is mostly in combination with Travis, GHA or CircleCI.
     RQ3 How frequently are CIs being replaced by an alternative? We observed a non-negligible number of CI migrations. GHA attracted most of these migrations. The majority of migrations were moving away from Travis and towards GHA.
     RQ4 How has the CI landscape changed since GHA was introduced? Based on a regression discontinuity design, we found that the usage of Travis, Azure and CircleCI has been negatively affected by the introduction of GHA.

     This article is structured as follows. Section II motivates the selected dataset and discusses the data extraction and cleaning steps that were carried out. Sections III to VI provide answers to each research question. Section VII discusses the ramifications of these answers. Section VIII presents the threats to validity of the conducted research. Section IX presents the related work. Finally, Section X concludes.

     II. Data extraction

     In order to analyse the use of CIs in software development repositories on GitHub, we need a large dataset containing [...]

     On the Use of GitHub Actions in Software Development Repositories
     Alexandre Decan (alexandre.decan@umons.ac.be), Tom Mens (tom.mens@umons.ac.be), Pooya Rostami Mazrae (pooya.rostamimazrae@umons.ac.be), Mehdi Golzadeh (mehdi.golzadeh@umons.ac.be). Software Engineering Lab, University of Mons, Mons, Belgium.
     https://doi.org/10.1109/ICSME55016.2022.00029

     Abstract: GitHub Actions was introduced in 2019 and constitutes an integrated alternative to CI/CD services for GitHub repositories. The deep integration with GitHub allows repositories to easily automate software development workflows. This paper empirically studies the use of GitHub Actions on a dataset comprising 68K repositories on GitHub, of which 43.9% are using GitHub Actions workflows. We analyse which workflows are automated and identify the most frequent automation practices. We show that reuse of actions is a common practice, even if this reuse is concentrated in a limited number of actions. We study which actions are most frequently used and how workflows refer to them. Furthermore, we discuss the related security and versioning aspects. As such, we provide an overview of the use of GitHub Actions, constituting a necessary first step towards a better understanding of this emerging ecosystem and its implications on collaborative software development in the GitHub social coding platform.

     Index Terms: GitHub Actions, continuous integration, collaborative software development, workflow automation

     I. Introduction

     Open source software (OSS) development is a continuous, highly distributed and collaborative endeavour [1]. Development of OSS projects faces many socio-technical challenges [2]–[4]. The multitude of tools (e.g., version control systems, software distribution managers, bug and issue trackers) and development-related activities makes it very challenging for contributor communities to keep up with the rapid pace of producing and maintaining high-quality software releases.

     Automated workflows were introduced to automate numerous repetitive social or technical activities that are inherently part of the collaborative software development process. Continuous integration, deployment and delivery (CI/CD) have become the cornerstone of collaborative software development and DevOps practices. Well-known examples of CI/CD services are Travis, Jenkins, CircleCI and TeamCity. They automate the integration of code changes from multiple contributors into a central repository where automated builds, tests and code quality checks run.

     GitHub is by far the largest social coding platform, hosting the development history of millions of collaborative software repositories, and accommodating over 73 million users in 2021 [5]. GitHub publicly announced the beta version of GitHub Actions (abbreviated to GHA in the remainder of this paper) in October 2018 based on popular demand, and in response to GitLab's integrated CI/CD support [6]. In August 2019, GitHub officially began supporting CI through GHA, and the product was released publicly in November 2019. GHA [7] allows the automation of a wide range of tasks based on a variety of triggers such as commits, issues, pull requests, comments, schedules, and many more. Its deep integration into GitHub implies that GHA can be used not only for executing test suites or deploying new releases as in traditional CI/CD services, but also to facilitate code reviews, communication, dependency and security monitoring and management, etc.

     GHA also promotes the use and sharing of reusable components, called actions, in workflows. These actions are distributed in public repositories and on the GitHub Marketplace. They allow workflow developers to easily integrate specific tasks (e.g., set up a specific programming language environment, publish a release on a package registry, run tests and check code quality) without having to write the corresponding code.

     Since its public release in November 2019, GHA has become the most dominant CI/CD service, only 18 months after its introduction [8]. Its Marketplace of reusable actions has been growing exponentially ever since, reaching 12K reusable actions in February 2022. It is therefore fair to say that GHA has become a software ecosystem of its own, comparable to ecosystems of reusable software libraries (such as npm, RubyGems, CRAN, Maven, and PyPI) that have been empirically studied by many researchers in recent years (e.g., [9]–[14]).

     The emerging GHA ecosystem is worthy of being empirically studied in its own right since it is likely to suffer from the same issues related to dependency management, security vulnerabilities, outdated or obsolete components, backward compatibility, and so on. This article therefore quantitatively studies the use of GHA in 68K repositories on GitHub. We analyse which workflows are automated and identify the most frequent automation practices. We show that reuse of actions is a common practice and identify which actions are reused and how. As such, we provide an overview of the use of GHA, a necessary first step towards a better understanding of the emerging GHA ecosystem and its implications on software development in GitHub repositories. More concretely, we answer the following research questions: [...]

     On the usage, co-usage and migration of CI/CD tools: A qualitative analysis
     Pooya Rostami Mazrae (pooya.rostami.m@gmail.com; pooya.rostamimazrae@umons.ac.be), Tom Mens (tom.mens@umons.ac.be), Mehdi Golzadeh (golzadeh.mehdi@gmail.com), Alexandre Decan (alexandre.decan@umons.ac.be; F.R.S.-FNRS Research Associate). Software Engineering Lab, Université de Mons, Mons, Belgium.
     Empirical Software Engineering (2023) 28:52, https://doi.org/10.1007/s10664-022-10285-5
     Accepted: 28 December 2022. Communicated by: Alexander Serebrenik. © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2023.

     Abstract: Continuous integration, delivery and deployment (CI/CD) is used to support the collaborative software development process. CI/CD tools automate a wide range of activities in the development workflow such as testing, linting, updating dependencies, creating and deploying releases, and so on. Previous quantitative studies have revealed important changes in the landscape of CI/CD usage, with the increasing popularity of cloud-based services, and many software projects migrating to other CI/CD tools. In order to understand the reasons behind these changes in CI/CD usage, this paper presents a qualitative study based on in-depth interviews with 22 experienced software practitioners reporting on their usage, co-usage and migration of 31 different CI/CD tools. Following an inductive and deductive coding process, we analysed the interviews and found a high amount of competition between CI/CD tools. We observe multiple reasons for co-using different CI/CD tools within the same project, and we identify the main reasons and detractors for migrating to different alternatives. Among all reported migrations, we observe a clear trend of migrations away from Travis and migrations towards GitHub Actions and we identify the main reasons behind them.

     Keywords: CI/CD · Collaborative software development · Workflow automation · Qualitative analysis · Empirical software engineering
  10. On the rise and fall of CI services in GitHub. Mehdi Golzadeh, Alexandre Decan, Tom Mens. https://doi.org/10.1109/SANER53432.2022.00084
  11. Dataset (May 2021): 1.6M+ packages (scoped packages) • 803K packages on GitHub • 676K cloned • Excluded 11,557 forks • Excluded inactive repositories • 201,403 repositories • Presence of CI configuration files: 119,033 CI usages in 91,810 repositories
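The "presence of CI configuration files" filter could look roughly like the sketch below. The mapping from CI services to file patterns is an assumption based on the conventional configuration file locations of each service; the slides do not list the exact rules used in the study, and `detect_cis` is a hypothetical helper name.

```python
from fnmatch import fnmatchcase

# Conventional configuration file locations for a few CI services.
# This mapping is an assumption for illustration; the slides do not state
# which files the study actually looked for.
CI_CONFIG_PATTERNS = {
    "GitHub Actions": [".github/workflows/*.yml", ".github/workflows/*.yaml"],
    "Travis": [".travis.yml"],
    "CircleCI": [".circleci/config.yml"],
    "AppVeyor": ["appveyor.yml", ".appveyor.yml"],
    "Azure": ["azure-pipelines.yml"],
    "GitLab CI": [".gitlab-ci.yml"],
}


def detect_cis(paths):
    """Return the sorted names of the CI services whose config files appear in `paths`."""
    found = set()
    for path in paths:
        for ci, patterns in CI_CONFIG_PATTERNS.items():
            if any(fnmatchcase(path, pattern) for pattern in patterns):
                found.add(ci)
    return sorted(found)


# Example: file paths (relative to the repository root) of a hypothetical repository.
repo_files = ["README.md", ".travis.yml", ".github/workflows/ci.yml", "package.json"]
print(detect_cis(repo_files))
```

A repository that yields more than one name here would count as a case of co-usage; applying the same check to successive snapshots of a repository's history is one way to observe migrations.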
  12. How prevalent is CI usage in GitHub repositories? CI services are used in more than half of all considered repositories.
  13. Evolution of GitHub CI/CD landscape: since 2021, GitHub Actions has become the dominant CI/CD tool in GitHub.
  14. Most frequent co-usage of CIs
  15. Analysing CI churn in the last 3 years
  16. Migrations between CIs
  17. Migrations toward GitHub Actions
  18. Migrations away from Travis
  19. What happened to Travis? Travis changed its free plan; GHA was introduced.
  20. On the usage, co-usage and migration of CI/CD tools: A qualitative analysis. Pooya Rostami Mazrae, Tom Mens, Mehdi Golzadeh, Alexandre Decan. Empirical Software Engineering (2023) 28:52, https://doi.org/10.1007/s10664-022-10285-5
  21. Methodology. Interview questionnaire: • Around 30 questions related to CI usage, co-usage and migration. Selection of respondents: • Selected candidates through Twitter, LinkedIn, email, direct messages • Colleagues' referrals (snowballing). Geographic diversity: • Using an online video conferencing tool. Inclusion criteria: • Actively contributed to, or having been responsible for, a software project relying on CI • Sufficient knowledge about which CI tool is used in that software project and how • Having been involved in setting up or maintaining the CI process of the project
  22. Demographics of respondents • 22 respondents • 16 from 7 European countries • 4 from North America • 2 from Asia • Software development experience: average of 12 years and 4 months • Good mix of industrial and open source contributors
  23. CI/CD tools being used • 14 additional tools reported only once • 3 custom-built in-house CI/CD solutions
  24. The good ...
  25. The bad ...
  26. The ugly
  27. CI/CD migrations
  28. Reasons for CI migration
  29. Why is GitHub Actions so popular? • deep integration with GitHub • ease of setup and use • trendy • speed • reliability • free tier for open source projects • large marketplace of reusable Actions • support for major operating systems • company support (Microsoft) • automation beyond CI/CD
  30. Difficulties in CI migration • Learning curve • Fundamental differences between the source and target of the migration • Trial-and-error nature of configuring a new CI tool • Lack of familiarity with the new CI tool • Important missing features
  31. On the Use of GitHub Actions in Software Development Repositories. Alexandre Decan, Tom Mens, Pooya Rostami Mazrae, Mehdi Golzadeh. https://doi.org/10.1109/ICSME55016.2022.00029
  32. Research Questions • What are the characteristics of repositories using workflows? • Which kinds of workflows are automated? • What are the most frequent jobs in workflows? • What are the automation practices? • Which types of Actions are reused?
  33. Dataset • 67,870 repositories • 4 out of 10 repositories use GitHub Actions workflows • 70,278 workflow files • 108,500 jobs • 567,352 steps
  34. Quantification of jobs and workflows. Workflows in repositories: single workflow (49.3%), more than one workflow (50.7%). Jobs in workflows: single job (77.8%), more than one job (22.2%).
  35. Characteristics of GitHub repositories using GitHub Actions (median with workflows / median without workflows / effect size interpretation): • Pull Requests: 124 / 41 / medium • Contributors: 20 / 11 / small • Commits: 598 / 344 / small • Issues: 105 / 59 / small. Repos with GHA workflows tend to have more contributors, pull requests, commits, and issues.
  36. Most frequent event types triggering workflows (%): push 63.4 • PR 56.3 • schedule 16.1 • workflow_dispatch 15.4 • release 6.2 • others 8.6
  37. Different ways of executing code (step type and Action target: % of steps / % of repositories): • run: 49.9% / 93.5% • uses: local path: 0.8% / 2.0% • uses: Docker image: 0.1% / 1.8% • uses: same repository: 0.2% / 0.4% • uses: same owner: 0.7% / 4.3% • uses: other public repository: 48.3% / 99.3%. Reusing Actions in steps is a common practice.
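The step categories in this table can be derived from the step itself. The sketch below assumes the standard forms of a `uses:` reference (`./path` for a local Action, `docker://image` for a Docker image, `owner/repo[/path]@ref` otherwise); `classify_step` and the repository names in the example are hypothetical, and the study's own classification may differ in detail.

```python
def classify_step(step, owner, repo):
    """Classify a workflow step into the categories used in the table above.

    `step` is a mapping with either a "run" or a "uses" key; `owner` and `repo`
    identify the repository that contains the workflow.
    """
    if "run" in step:
        return "run"
    target = step["uses"]
    if target.startswith("./"):
        return "local path"
    if target.startswith("docker://"):
        return "docker image"
    # Remaining form: owner/repo[/path]@ref
    action = target.split("@", 1)[0]
    parts = action.split("/")
    action_owner, action_repo = parts[0].lower(), parts[1].lower()
    if (action_owner, action_repo) == (owner.lower(), repo.lower()):
        return "same repository"
    if action_owner == owner.lower():
        return "same owner"
    return "other public repository"


# Example with a hypothetical repository octo-org/demo.
steps = [
    {"run": "npm test"},
    {"uses": "./.github/actions/build"},
    {"uses": "docker://alpine:3.18"},
    {"uses": "octo-org/tools/lint@v1"},
    {"uses": "actions/checkout@v3"},
]
for s in steps:
    print(classify_step(s, "octo-org", "demo"))
```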
  38. Which Actions are reused? Top 5 most frequent Actions (% of steps / % of repositories): • actions/checkout: 35.5% / 98% • actions/cache: 7.2% / 22% • actions/setup-node: 6.6% / 26% • actions/upload-artifact: 5.9% / 19% • actions/setup-python: 5.8% / 21%. A few Actions concentrate most of the reuse, most of them being developed by GitHub.
  39. On the Outdatedness of Workflows in the GitHub Actions Ecosystem
     Alexandre Decan (alexandre.decan@umons.ac.be; F.R.S.-FNRS Research Associate), Hassan Onsori Delicheh (hassan.onsoridelicheh@umons.ac.be), Tom Mens (tom.mens@umons.ac.be). Software Engineering Lab, University of Mons, Mons, Belgium.
     Preprint submitted to Journal of Systems & Software, March 21, 2023.

     Abstract: GitHub Actions was introduced as a way to automate CI/CD workflows in GitHub, the largest social coding platform. Thanks to its deep integration into GitHub, GitHub Actions can be used to automate a wide range of social and technical activities. Among its main features, it allows automation workflows to rely on reusable components (the so-called Actions) to enable developers to focus on the tasks that should be automated rather than on how to automate them. Like any other kind of reusable software component, Actions are continuously updated, causing many automation workflows to use outdated versions of these Actions. Based on a dataset of nearly one million workflows obtained from 22K+ repositories between November 2019 and September 2022, we provide quantitative empirical evidence that reusing Actions in GitHub workflows is common practice, even if this reuse tends to concentrate on a limited number of Actions. We show that Actions are frequently updated, and we quantify to what extent automation workflows are outdated with respect to these Actions. Using two complementary metrics, technical lag and opportunity lag, we found that most of the workflows are using an outdated Action release, are lagging behind the latest available release by at least 7 months, and had the opportunity to be updated during at least 9 months. This calls for a more rigorous management of Action outdatedness in automation workflows, as well as for better policies and tooling to keep workflows up-to-date.

     Keywords: software ecosystem, dependency management, continuous integration, collaborative software development, workflow automation, technical lag
  40. Outdatedness in the GitHub Actions ecosystem • Four out of five workflows and nearly two thirds of the steps are using an outdated release of an Action. • Steps using Actions provided by GitHub are responsible for most of the outdatedness. • More than one third of the other steps and nearly half of the other workflows are using an outdated release of an Action. Chart annotations: release of actions/checkout@v2, release of actions/checkout@v3, release of actions/setup-*@v2, release of actions/setup-*@v3.
  41. Outdatedness in the GitHub Actions ecosystem. Technical lag of workflows / steps: the time period between the start of reusing a selected Action and the latest release of that Action. Diagram: Action lifeline with releases v1, v2, v3, v4 (latest); the GitHub workflow uses the selected release; the technical lag is measured at the observation date.
  42. Outdatedness in the GitHub Actions ecosystem Technical lag of workflows / steps: the time period between the start of reusing a selected Action and the latest release of that Action. • Technical lag of outdated steps tends to increase over time. • Half of the outdated steps using other Actions are using a version that is lagging behind the latest one for at least 7.3 months. • Main cause of technical lag = Actions provided by GitHub
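The technical lag of a step can be illustrated with a small computation. The sketch below takes one plausible reading of the definition and diagram above: the lag is the time between the release of the version selected by the workflow and the latest release available at the observation date. The release dates are invented for illustration, and `technical_lag` is a hypothetical helper, not the authors' tooling.

```python
from datetime import date

# Hypothetical release dates of an Action's versions (illustrative, not real data).
RELEASES = {
    "v1": date(2020, 1, 10),
    "v2": date(2020, 9, 1),
    "v3": date(2021, 6, 15),
    "v4": date(2022, 3, 1),
}


def technical_lag(releases, selected, observed_on):
    """Days between the selected release and the latest release available at `observed_on`.

    Assumes the selected version was already released at the observation date.
    """
    available = [d for d in releases.values() if d <= observed_on]
    latest = max(available)
    return (latest - releases[selected]).days


# A step pinned to v2, observed in September 2022, lags behind v4.
print(technical_lag(RELEASES, "v2", date(2022, 9, 1)))
# A step already using the latest release has no technical lag.
print(technical_lag(RELEASES, "v4", date(2022, 9, 1)))
```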
  43. Outdatedness in the GitHub Actions ecosystem. Opportunity lag of workflows / steps: the time period during which a workflow could have updated an outdated step to a more recent version of an Action, but didn't. Diagram: Action lifeline with releases v1, v2, v3, v4; the GitHub workflow uses the selected release; the opportunity lag runs from the first update opportunity to the observation time.
  44. Outdatedness in the GitHub Actions ecosystem. Opportunity lag of workflows / steps: the time period during which a workflow could have updated an outdated step to a more recent version of an Action, but didn't. • The opportunity lag of outdated steps tends to increase over time. • On average, maintainers of outdated steps have had the opportunity to update them for 9 months, but have not done so. • Main cause of opportunity lag = Actions provided by GitHub. Chart annotation: new releases for docker/*.
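Opportunity lag can be sketched in the same spirit. The code below assumes that the first update opportunity is the later of two dates: the release of the first version newer than the selected one, and the date on which the workflow started using the selected version. Both this interpretation and the data are assumptions for illustration; `opportunity_lag` is a hypothetical helper.

```python
from datetime import date

# Hypothetical release dates of an Action's versions (illustrative, not real data).
RELEASES = {
    "v1": date(2020, 1, 10),
    "v2": date(2020, 9, 1),
    "v3": date(2021, 6, 15),
    "v4": date(2022, 3, 1),
}


def opportunity_lag(releases, selected, adopted_on, observed_on):
    """Days during which the workflow could have moved to a newer release but did not.

    Versions are assumed to be ordered by release date. Returns 0 when no newer
    release exists at the observation time.
    """
    newer = [d for d in releases.values() if releases[selected] < d <= observed_on]
    if not newer:
        return 0
    # The opportunity starts when a newer release exists AND the workflow uses the old one.
    first_opportunity = max(min(newer), adopted_on)
    return (observed_on - first_opportunity).days


# Workflow adopted v2 in October 2020; v3 appeared in June 2021; observed September 2022.
print(opportunity_lag(RELEASES, "v2", date(2020, 10, 1), date(2022, 9, 1)))
```

Under these assumptions the opportunity lag keeps growing with the observation date as long as the step is not updated, whereas the technical lag only grows when a new release appears.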
  45. Thank you for your attention. Any questions?