Google’s Innovation Factory: Testing, Culture, And Infrastructure
Patrick Copeland, Google
copeland@google.com

ABSTRACT
Google’s external mythology has been one of a brilliant and chaotic innovation machine that produces new products and features at an amazing rate. Behind the curtain of public perception is a company that takes quality seriously and is reinventing how software is created, tested, released, and maintained; a reality that’s even more interesting than the myth.

At Google we’ve learned a lot in the last few years about accelerating very large scale software development; in this paper we'll share what has worked and what hasn't worked for us.

1. DYNAMIC EQUILIBRIA
Since humans began writing software in the middle of the last century, the process has been cumbersome and error prone, and has more often than not created an end product that is low in quality. Most companies are better at talking about software quality than implementing it. [1]

This is clearly not a new problem. In 1962 the “most expensive hyphen in history” forced the destruction of the Mariner I rocket only 293 seconds after it was launched. Instead of its intended flyby of Venus, the rocket ended up in the Atlantic Ocean. [2]

Such events have been a mainstay of computing history ever since. In fact, Googling the search term “software bug” turns up over 80 million hits. Buggy software is part of the industry’s fabric.

1.1 TORRENTIAL PROCESS
There have been numerous attempts over the prior decades to build more reliable software and these have come under many guises. Total Quality, Zero Defect, Six Sigma and Cleanroom have all borrowed ideas that were successful in manufacturing, specifically prescribing more methodical and process-driven approaches to software development. Yet here we are in 2010 still talking about software quality! It’s hard to come to any other conclusion than that the lessons learned from manufacturing don’t translate well to software.

Quality is still very hard to evaluate in software and we end up with estimations that focus on quantifying the measurable and rely on subjectivity for the rest. As Niklaus Wirth recently said,

     "The experience, judgment, and intuition of programmers who have survived the rigors of testing are what make programs of the present day useful, efficient, and correct." [3]

1.2 EMERGING LEFTISM
One thread common to formal models is that they focus on a few of the many variables: improving efficiency, predictable process, estimation of quality, or others. As most practitioners know, a development process is a polynomial wrapped inside of a culture, and solving for a few variables only achieves a momentary local maximum.

While process-heavy development models may work well for manufacturing airplanes and have been successfully applied by some companies [4], they have been viewed by many developers as burdensome and contrary to the creative nature of writing innovative software. Conversely, a “process-less process” can lead to a heroic culture that’s unable to repeatedly deliver. There needs to be balance.

Consider the physics of flight as an analogy to software process. In addition to reasonable flying conditions and an experienced pilot, the key to getting airborne is having an appropriate balance of factors that match the situation: too much weight or too little thrust can be disastrous depending on the situation. Similarly, teams, products and process all have virtual physics. For instance, adding engineers late in a product cycle doesn’t necessarily provide more lift [5]. Adopting a new process may give a team more thrust momentarily, but may also incur a longer term drag that makes them incapable of innovation.

The popularity of Agile, while not a wholesale rejection of more rigid processes, indicates that developers desire more balance and creativity. Whatever we do to make software higher quality and more predictable to build, we must maintain a balance that encourages the innovative aspects of the art form. We need to motivate smart minds to solve hard problems and deliver rich features to customers. In other words, we need to focus on staying airborne for the long term.

1.3 PARADIGM SHIFTS
A lot of software is now released as services and deployed to data centers controlled by the software producers rather than being installed on customer-owned servers/clients of infinite configurations scattered around the globe. Software can be released to early adopters and beta users; bug fixes can be deployed to all users simultaneously or to a small percentage; maintenance and updates are handled centrally by experts and not by end users. With more control of the end product, development teams can experiment and take more risks, providing innovation faster and with less fear. When problems appear, they can be identified and fast-fixed before impacting large groups of users.
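
Deploying a fix "to a small percentage" of users before releasing it to everyone can be sketched in a few lines. This is an illustration only, not a description of Google's deployment systems: the function name `in_rollout`, the feature label, and the hash-bucket scheme are all assumptions introduced here.

```python
import hashlib


def in_rollout(user_id: str, feature: str, percent: float) -> bool:
    """Return True if this user falls inside the rollout percentage.

    Hypothetical helper: hashes the (feature, user) pair into one of
    10,000 buckets, so a given user gets a stable answer for a feature.
    """
    key = f"{feature}:{user_id}".encode("utf-8")
    bucket = int(hashlib.sha256(key).hexdigest(), 16) % 10_000  # 0..9999
    # percent=5.0 selects buckets 0..499, i.e. roughly 5% of users.
    return bucket < percent * 100
```

Because the bucket depends only on the user and the feature, raising the percentage keeps every previously exposed user in the exposed group, which is what allows a problem to be caught while only a small group is affected.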
ICST 2010                                                                               Copeland, Google’s Innovation Factory - 2

But the cloud paradigm is only part of the equation. We also need to think differently about using these capabilities for software development itself. Can we align our culture, tools and processes to take full advantage of this new model? Can we use automation to solve, once and for all, the repetitive, mundane and downright boring aspects of building products? Can we integrate development and testing so tightly that writing good code is easier than writing bad code? Can we encourage big thinking that leads to new ideas? Can we do all of this at the scale and hypercompetitive pace of the Internet?

We’ve been tackling this problem for several years at Google and this paper is a report of our progress. Our approach has been to automate those development tasks that shouldn’t require a human in-the-loop, to focus on building a culture around quality, to promote multiple approaches to innovation, to reduce bureaucratic creep, and to invest in reusable infrastructure.

2. INNOVATION FACTORY
We encourage our engineers to focus on innovation. Eric Schmidt has said, “We take our jobs to be innovators and we are failing if we are not innovating quickly enough.” [6] Many of our best ideas were envisioned by engineers who were passionate about solving a problem. Popular products, like Gmail, were initially developed by a few passionate engineers outside of their normal work.

Linus Pauling is commonly quoted as saying, “The best way to have a good idea is to have lots of ideas.” Google has made its mark on the industry with new approaches to old problems. For example, our systems are built on “flaky” commodity hardware and an infrastructure that dynamically compensates for that flakiness. Initially this was a subversive idea, as other companies at the time were building servers that attempted to eliminate all failures (like the foolproof HAL 9000 from 2001: A Space Odyssey). We expect everything to fail and use redundancy and automated compensation techniques to maintain overall reliability.

2.1 BUILDING FOR SCALE
Outside the walls of Google, this innovation factory has created desirable products for our users. Inside the walls, it has created large repositories of code, data, dependencies and information that must be managed closely. Consider the logistics of delivering at Google’s current pace:
•    More than 6,000 engineers and >40 offices.
•    2,500 ongoing projects (2.5 developers / project).
•    1,600 active external release branches for products.
•    59,000 builds / day each with 10-1000 targets.
•    1.5 million tests / day, both manual and automated.
•    Most products localized into 40 languages.
•    At least bi-weekly release cycles.

2.2 FLAT & AUTONOMOUS
The organizational structure we use is atypical in the industry. For one, Google is a flat organization with many Nooglers being no more than 2-3 steps below senior executives. The company structure can be characterized as flat and autonomous.

At Google, managers are not controllers; they are connectors charged with ensuring that teams make effective use of information and tools. Many managers have 15 or more direct reports, introducing some chaos and reducing the time available to micromanage. Managers are judged on their ability to enable smart people to get things done.

Teams are aligned along business lines we call “focus areas” rather than around strict product lines. People doing similar work, no matter what products they are contributing to, will find themselves in close reporting proximity to their colleagues. This matrix encourages some amount of competition, but also the reuse of good ideas. Projects live and die based on free-market Darwinism, where successful projects are further funded and less successful ones face atrophy. We take many short- and long-term bets, but projects must produce value to survive.

2.3 AVOIDING PLAUSIBLE DENIABILITY
The entire product team is responsible for quality, and is judged on its ability to enable innovation, anticipate problems, make plans, and implement high quality software. Teams adopt processes that are in their own self-interest and that allow them to focus on innovation.

The role of someone doing testing in this environment is structured slightly differently than at other technology companies. Testers avoid becoming codependents within this system and generally do not write unit tests or take on other activities that are best done by the developer. Testing teams focus on higher abstractions, like identifying latencies, system or customer focused testing, and enabling the process.

Code is expected to have high reliability as it is written and we adhere to socially reinforced code review and check-in practices. Development teams write good tests because they care about the products, but also because they want more time to spend writing features and less on debugging. Teams with good testing hygiene upstream have more time to innovate, and are thus more adaptable and competitive. In addition, there is one source tree and poorly written code is quickly identified because it breaks other people’s tests and projects. Aggressive rolling back is employed to keep the tree building “green.”

Unlike traditional testing approaches, teams do not focus on the tail end of the process or pad the schedule for special testing phases. Instead, they look for ways to anticipate issues and solve them proactively in real time. Within each project are experts in the field of software quality and they ensure that the right tools, test cases and test procedures are in place throughout the product lifecycle. When bugs do slip through, or more commonly
unanticipated complex behavior situations occur, we aggressively do postmortems and quickly put in place solutions that prevent them from recurring.

2.4 VIRAL ADOPTION
At an individual project level, uniformity is rarely mandated and adoption of tools and process is left to an internal “market” to decide. Apart from our core systems, discussed later, a large portion of our tools are developed by motivated individuals to solve local challenges. Similarly, process is tailored specifically to projects. While this leads to a healthy amount of chaos, good ideas tend to spread quickly, because they have been proven useful by others. Engineers decide what's best for engineering, articulate the right vision, and drive initiatives in the most sustainable fashion; others then follow after grassroots success. We’ve found that positive experience is an effective means of persuasion.

An example of viral adoption is a “fix it”, an event organized by engineers that encourages Googlers to work on the same problem at the same time. The idea is to get a large amount of work done in a short amount of time by leveraging the power of masses. In the past, these have been focused on fixing 1000 TODOs in the code-base or fixing tests to take advantage of new infrastructure improvements.

Testing on the Toilet is another example of viral adoption. It started as an offhand joke and became a worldwide sensation, making headlines in the Wall Street Journal. The idea was to communicate ideas about testing and to do it in a place where we know people would have the time to read it. It is now published in hundreds of stalls in most Google offices, taking submissions from different programming languages and application domains, and appears on Google's public testing blog [7]. While the articles themselves need to be short enough for people to read while they “do their business”, the ideas create a buzz about specific topics that would otherwise be difficult to achieve.

2.5 CMM WITH A TWIST
One popular grassroots initiative that resembles a more traditional process methodology is called the Test Certified Program. Test Certified is a series of increasingly advanced levels, each defined by a list of measurable testing goals and capabilities. These goals are set by testers and present quality practices, advanced techniques and quality-oriented goals for a development team to strive to achieve. As a development team achieves more goals, using whatever techniques suit their team culture and problem domain, they move up through the Test Certified ladder levels from TC1 to TC5.

At the initial stages of the process, teams are asked to clean up and do several remedial activities, all of which are designed to get them seeing the benefits of testing immediately. Establishing a continuous build that runs a set of fast deterministic tests is the most important aspect of the first phase. Speed is achieved by focusing on small unit level tests and using practices like mocking and distributed execution.

In subsequent levels, code coverage goals are explicitly defined, rules about releasing on “non-green” builds are imposed, and a broader array of testing is expected, such as integration, system level, and various other techniques.

This process is defined not to dictate to developers what to do but to identify goals that will help them develop better software faster and spend less time in later phases fixing bugs. Advancement won’t guarantee higher quality software but it does provide a roadmap that makes good quality more probable.

2.6 ELEMENTS OF CONTROL
As a counterbalance to the randomness incurred by our relatively freeform process, we have a set of release standards and guidelines. These “launch reviews and criteria” are outlined to ensure that products answer common sense questions before release. A few examples are:
•    Is the design secure and customer data private?
•    Will the service scale with the anticipated load?
•    Does the UI meet standards?
•    What are the data center utilization estimates?
•    What are the latency estimates?

The point is that the release process is not friction free. There are many high standards that must be met and it can be frustrating for teams that procrastinate. Pain can be avoided by driving change upstream as early as possible. Given the forewarning, teams can meet the standards in a way appropriate to their constraints.

3. FASTER DEVELOPER WORKFLOW
The build/test system is at the core of day-to-day activity for software engineers at Google. Almost everything deployed in production is developed, tested, and built using this system. Thus, the performance and usability of the build tools has a large impact on engineer productivity, where even small changes are multiplied by the total number of tool interactions.

As traditional companies scale, sub-organizations begin to maintain separate code silos, build tribal release and integration procedures, and duplicate effort. But, more disturbing, they end up sinking a large amount of time into maintenance issues. Time that should be spent adding value is instead used to atone for past sins.

Engineering teams should be able to concentrate a maximum of their time on quality and innovation. At Google that time is achieved, at least in part, by making the hard and the mundane simple and automatic. As a case in point, consider our build and deployment infrastructure.

3.1 DESIGN CONSIDERATIONS
Prior to 2006, Google employed a fairly slow build and test process that was designed for a much smaller company. Back then, builds might be broken for days or weeks, the
“unpaid mortgage” of new code would build up, and lengthy debugging and stabilization phases would follow. We needed an approach that provided developers nearly instant feedback on every code check-in.

We designed the system with the following principles:
•    Speed: All test and analysis systems need to return results very fast. If they take too long, engineers will either ignore the data or not bother looking for it.
•    Feedback: The focus of test systems must be on high quality feedback. We want engineers to keep code at production quality at all times, rather than spending time later to fix code that was broken earlier.
•    Simplicity: Engineers should not have to understand how the underlying build and test systems work. All data and feedback must be easy to understand, integrated into commonly-used productivity tools, and presented in a workflow that allows them to take appropriate action.

Within milliseconds of a code check-in, our build process will automatically select the appropriate tests to run based on dependency analysis, run those tests and report the results. By reducing the window of opportunity for bad code to go unnoticed, overall debugging and bug isolation time is radically reduced. The net result is that the engineering teams no longer sink hours into debugging build problems and test failures.

3.2 ESTIMATING IMPACT
We created a more holistic approach to estimate the overall impact of improvements. We know the general workflow of engineers, and we can estimate how much time engineers spend in each area of the workflow. From this we can model a “representative engineer” which provides a framework for estimating where engineers spend their time with tools. With this model we can measure the effect of improvements on each area of the workflow to estimate overall impact.

TABLE: ESTIMATED MONTHLY ACTIVITY PER DEVELOPER
     ACTIVITY     INITIAL CHECKOUT   CLEAN BUILD   BUILD AFTER EDIT   BUILD AFTER SYNC   RUN TESTS
     FREQUENCY    2                  4             160                20                 60

From the workflow we can identify five key activities involving build tools. These are: Initial Checkout, Clean Build, Build After Edit, Build After Incremental Sync, and Run Tests. The tricky part is estimating the frequency of these actions. This is subjective since the details vary for each engineer. Some do a clean checkout and build for every task. Others never do a full sync/clean build after the initial build is created. By collecting the data and identifying the most frequent use cases, we were able to focus on the largest productivity wins.

3.3 RESULTS
We were able to save the company about 600 person years of time that would otherwise have been spent waiting on tools. We did this by improving the highest-traffic workflows with caching and distributed execution, and by avoiding bottlenecks.

CHART: TIME WAITING ON TOOLS IN HOURS/MONTH/DEVELOPER.

More importantly, we were able to change how products are produced with an emphasis on continual improvement. The chart above shows the number of hours “saved” per month per developer on different types of projects. For instance, “big” projects are defined as having more than 20k files.

5. CONCLUSION
Just as we are witnessing a paradigm shift to cloud computing that stretches our imagination and challenges the limits of software, our process for developing that software is going through an equally dramatic revolution. We are reconsidering the appropriateness of the lessons we’ve taken from manufacturing. We believe that software development models require a new set of physics.

Google has experimented with this new physics with innovative new tools, processes and infrastructure. While there is no magic bullet, there is a pragmatism that can be applied to software development that seeks to balance the art form of creating software with the needs for repeatability, efficiency, and quality. At Google that has meant eliminating the tedious and repetitive tasks with automation and streamlined processes, allowing testers to engage the full extent of their creativity on innovation and meeting the challenges of modern software development.

6. ACKNOWLEDGEMENTS
James Whittaker, Alberto Savoia, Nathan York, Mark Striebeck, and the following teams from Google: Engineering Productivity, and the Test Grouplet.

7. REFERENCES
[1] David N. Wilson, Tracy Hall, Perceptions of software quality: a pilot study, Software Quality Journal 7, (1998) 67–75.
[2] Eric Roberts, Mariner I, The Risk Digest, Volume 5, Issue 66, (1987).
[3] Niklaus Wirth, Opening Talk GTAC 2009, (2009).
[4] Michael Diaz, Joseph Sligo, How Software Process Improvement Helped Motorola, IEEE, 0740-7459, (1997) 75-81.
[5] Fred Brooks, Mythical Man Month, (1995).
[6] Ravi Mattu, World exclusive interview with Google!, ft.com/managementblog, July 8, 2009.
[7] Introducing "Testing on the Toilet", Google Testing Blog, January 21, 2007.
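
As an illustration of the dependency-based test selection described in Section 3.1, the following sketch walks a dependency graph backwards from the changed targets and keeps only the tests that can reach them. It is a minimal sketch under stated assumptions, not the actual build system: the function `affected_tests`, the shape of the `deps` mapping, and the target names are all hypothetical.

```python
from collections import defaultdict, deque


def affected_tests(deps, tests, changed):
    """Return the tests that transitively depend on any changed target.

    deps maps each target to the targets it depends on directly
    (hypothetical format, chosen for this sketch).
    """
    # Invert the graph: for each target, record who depends on it.
    rdeps = defaultdict(set)
    for target, direct in deps.items():
        for dep in direct:
            rdeps[dep].add(target)

    # Breadth-first walk from the changed targets along reverse edges.
    seen = set(changed)
    queue = deque(changed)
    while queue:
        node = queue.popleft()
        for dependent in rdeps[node]:
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return sorted(t for t in tests if t in seen)


# Hypothetical targets, for illustration only.
DEPS = {
    "mail/ui": ["base/strings"],
    "mail/ui_test": ["mail/ui"],
    "search/index": ["base/strings", "base/net"],
    "search/index_test": ["search/index"],
    "base/net_test": ["base/net"],
}
TESTS = {"mail/ui_test", "search/index_test", "base/net_test"}
```

With these inputs, a change to `base/net` selects `base/net_test` and `search/index_test` but leaves `mail/ui_test` alone, which is how selection keeps feedback fast as the tree grows.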

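
The “representative engineer” model of Section 3.2 amounts to multiplying each activity's monthly frequency by the time it takes and summing. The sketch below uses the frequencies from the table in Section 3.2; the per-action durations are invented placeholders (the paper does not report them), and the names `FREQUENCY` and `monthly_wait_hours` are introduced only for this example.

```python
# Monthly frequency per developer, taken from the table in Section 3.2.
FREQUENCY = {
    "initial_checkout": 2,
    "clean_build": 4,
    "build_after_edit": 160,
    "build_after_sync": 20,
    "run_tests": 60,
}


def monthly_wait_hours(seconds_per_action):
    """Hours per month one representative engineer waits on tools."""
    total_seconds = sum(
        FREQUENCY[activity] * seconds_per_action[activity]
        for activity in FREQUENCY
    )
    return total_seconds / 3600.0


# ASSUMED durations in seconds; placeholders, not measurements.
BEFORE = {
    "initial_checkout": 1800,
    "clean_build": 1200,
    "build_after_edit": 90,
    "build_after_sync": 300,
    "run_tests": 240,
}
# Suppose an improvement cuts only the most frequent action.
AFTER = dict(BEFORE, build_after_edit=30)

saved = monthly_wait_hours(BEFORE) - monthly_wait_hours(AFTER)
```

Under these placeholder durations the model shows why the highest-frequency workflow dominates: a one-minute saving on an action performed 160 times a month is worth more than a large saving on an action performed twice.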
Le cloudvupardesexperts 9pov-curationparloicsimon-clubclouddespartenaires
 
Industry Perspective: DevOps - What it Means for the Average Business
Industry Perspective: DevOps - What it Means for the Average BusinessIndustry Perspective: DevOps - What it Means for the Average Business
Industry Perspective: DevOps - What it Means for the Average Business
 
DevOps Perspectives II
DevOps Perspectives IIDevOps Perspectives II
DevOps Perspectives II
 
Building a DevOps Team that isn't Evil
Building a DevOps Team that isn't EvilBuilding a DevOps Team that isn't Evil
Building a DevOps Team that isn't Evil
 
Informatics Platforms for Biologics R&D: 5 Key Capabilities to Look For
Informatics Platforms for Biologics R&D: 5 Key Capabilities to Look ForInformatics Platforms for Biologics R&D: 5 Key Capabilities to Look For
Informatics Platforms for Biologics R&D: 5 Key Capabilities to Look For
 
DevOps: What does this term mean and why should we care?
DevOps: What does this term mean and why should we care?DevOps: What does this term mean and why should we care?
DevOps: What does this term mean and why should we care?
 
Insurecom Case Study
Insurecom Case StudyInsurecom Case Study
Insurecom Case Study
 

Similaire à Google's Innovation Factory (ICST 2010)

Emerging Trends of Software Engineering
Emerging Trends of Software Engineering Emerging Trends of Software Engineering
Emerging Trends of Software Engineering DR. Ram Kumar Pathak
 
HOW TO SCALE AGILE IN OFFSHORE SOFTWARE DEVELOPMENT.pdf
HOW TO SCALE AGILE IN OFFSHORE SOFTWARE DEVELOPMENT.pdfHOW TO SCALE AGILE IN OFFSHORE SOFTWARE DEVELOPMENT.pdf
HOW TO SCALE AGILE IN OFFSHORE SOFTWARE DEVELOPMENT.pdfLaura Miller
 
3 Crucial Application Modernization Strategies for Enterprises.pptx
3 Crucial Application Modernization Strategies for Enterprises.pptx3 Crucial Application Modernization Strategies for Enterprises.pptx
3 Crucial Application Modernization Strategies for Enterprises.pptxArpitGautam20
 
Agile Tour Dublin 2013 - Product Lines and Agile
Agile Tour Dublin 2013 - Product Lines and AgileAgile Tour Dublin 2013 - Product Lines and Agile
Agile Tour Dublin 2013 - Product Lines and AgileParaic Hegarty
 
Pcloudy Unveils a New Platform for a Unified App Testing Experience.pdf
Pcloudy Unveils a New Platform for a Unified App Testing Experience.pdfPcloudy Unveils a New Platform for a Unified App Testing Experience.pdf
Pcloudy Unveils a New Platform for a Unified App Testing Experience.pdfpcloudy2
 
Product Vs Craft
Product Vs CraftProduct Vs Craft
Product Vs CraftMagenTys
 
CTLR 2010 Issue 7 Waterfall Contract
CTLR 2010 Issue 7 Waterfall ContractCTLR 2010 Issue 7 Waterfall Contract
CTLR 2010 Issue 7 Waterfall Contractsusanatkinson
 
Ibm smarter quality_management
Ibm smarter quality_managementIbm smarter quality_management
Ibm smarter quality_managementCristiano Caetano
 
The Software Manager"s Guide to Practical Innovation
The Software Manager"s Guide to Practical InnovationThe Software Manager"s Guide to Practical Innovation
The Software Manager"s Guide to Practical Innovationmacadamian
 
DevOps trends to look out for in 2022.pdf
DevOps trends to look out for in 2022.pdfDevOps trends to look out for in 2022.pdf
DevOps trends to look out for in 2022.pdfEnov8
 
Agile Localization Fundamentals: An Integrative Approach
Agile Localization Fundamentals: An Integrative ApproachAgile Localization Fundamentals: An Integrative Approach
Agile Localization Fundamentals: An Integrative ApproachAlberto Ferreira
 
Unlocking Software Testing Circa 2016
Unlocking Software Testing Circa 2016Unlocking Software Testing Circa 2016
Unlocking Software Testing Circa 2016MentorMate
 
White paper - Adhoc 2.0
White paper - Adhoc 2.0White paper - Adhoc 2.0
White paper - Adhoc 2.0Nuno Brito
 
Agile Corporation for MIT
Agile Corporation for MITAgile Corporation for MIT
Agile Corporation for MITCaio Candido
 
copados-5-steps-to-devops-success-2022.pdf
copados-5-steps-to-devops-success-2022.pdfcopados-5-steps-to-devops-success-2022.pdf
copados-5-steps-to-devops-success-2022.pdfSrinivas Kannan
 
Rational collaborative-lifecycle-management-2012
Rational collaborative-lifecycle-management-2012Rational collaborative-lifecycle-management-2012
Rational collaborative-lifecycle-management-2012Strongback Consulting
 

Similaire à Google's Innovation Factory (ICST 2010) (20)

Emerging Trends of Software Engineering
Emerging Trends of Software Engineering Emerging Trends of Software Engineering
Emerging Trends of Software Engineering
 
HOW TO SCALE AGILE IN OFFSHORE SOFTWARE DEVELOPMENT.pdf
HOW TO SCALE AGILE IN OFFSHORE SOFTWARE DEVELOPMENT.pdfHOW TO SCALE AGILE IN OFFSHORE SOFTWARE DEVELOPMENT.pdf
HOW TO SCALE AGILE IN OFFSHORE SOFTWARE DEVELOPMENT.pdf
 
3 Crucial Application Modernization Strategies for Enterprises.pptx
3 Crucial Application Modernization Strategies for Enterprises.pptx3 Crucial Application Modernization Strategies for Enterprises.pptx
3 Crucial Application Modernization Strategies for Enterprises.pptx
 
Agile Methodologies & Key Principles
Agile Methodologies & Key Principles Agile Methodologies & Key Principles
Agile Methodologies & Key Principles
 
Agile Tour Dublin 2013 - Product Lines and Agile
Agile Tour Dublin 2013 - Product Lines and AgileAgile Tour Dublin 2013 - Product Lines and Agile
Agile Tour Dublin 2013 - Product Lines and Agile
 
Pcloudy Unveils a New Platform for a Unified App Testing Experience.pdf
Pcloudy Unveils a New Platform for a Unified App Testing Experience.pdfPcloudy Unveils a New Platform for a Unified App Testing Experience.pdf
Pcloudy Unveils a New Platform for a Unified App Testing Experience.pdf
 
Product Vs Craft
Product Vs CraftProduct Vs Craft
Product Vs Craft
 
CTLR 2010 Issue 7 Waterfall Contract
CTLR 2010 Issue 7 Waterfall ContractCTLR 2010 Issue 7 Waterfall Contract
CTLR 2010 Issue 7 Waterfall Contract
 
Ibm smarter quality_management
Ibm smarter quality_managementIbm smarter quality_management
Ibm smarter quality_management
 
The Software Manager"s Guide to Practical Innovation
The Software Manager"s Guide to Practical InnovationThe Software Manager"s Guide to Practical Innovation
The Software Manager"s Guide to Practical Innovation
 
Agile.usability
Agile.usabilityAgile.usability
Agile.usability
 
DevOps trends to look out for in 2022.pdf
DevOps trends to look out for in 2022.pdfDevOps trends to look out for in 2022.pdf
DevOps trends to look out for in 2022.pdf
 
Agile Localization Fundamentals: An Integrative Approach
Agile Localization Fundamentals: An Integrative ApproachAgile Localization Fundamentals: An Integrative Approach
Agile Localization Fundamentals: An Integrative Approach
 
Unlocking Software Testing Circa 2016
Unlocking Software Testing Circa 2016Unlocking Software Testing Circa 2016
Unlocking Software Testing Circa 2016
 
Scrum the new silver bullet
Scrum the new silver bulletScrum the new silver bullet
Scrum the new silver bullet
 
White paper - Adhoc 2.0
White paper - Adhoc 2.0White paper - Adhoc 2.0
White paper - Adhoc 2.0
 
Agile Corporation for MIT
Agile Corporation for MITAgile Corporation for MIT
Agile Corporation for MIT
 
Web20 report excerpt
Web20 report excerptWeb20 report excerpt
Web20 report excerpt
 
copados-5-steps-to-devops-success-2022.pdf
copados-5-steps-to-devops-success-2022.pdfcopados-5-steps-to-devops-success-2022.pdf
copados-5-steps-to-devops-success-2022.pdf
 
Rational collaborative-lifecycle-management-2012
Rational collaborative-lifecycle-management-2012Rational collaborative-lifecycle-management-2012
Rational collaborative-lifecycle-management-2012
 

Dernier

How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesThousandEyes
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 

Dernier (20)

How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 

Google's Innovation Factory (ICST 2010)

Google’s Innovation Factory: Testing, Culture, And Infrastructure
Patrick Copeland, Google
copeland@google.com

ABSTRACT
Google’s external mythology has been one of a brilliant and chaotic innovation machine that produces new products and features at an amazing rate. Behind the curtain of public perception is a company that takes quality seriously and is reinventing how software is created, tested, released, and maintained; a reality that’s even more interesting than the myth.

At Google we’ve learned a lot in the last few years about accelerating very large scale software development; in this paper we'll share what has worked and what hasn't worked for us.

1. DYNAMIC EQUILIBRIA
Since humans began writing software in the middle of the last century, the process has been cumbersome and error prone, and has more often than not created an end product that is low in quality. Most companies are better at talking about software quality than implementing it. [1]

This is clearly not a new problem. In 1962 the “most expensive hyphen in history” forced the destruction of the Mariner I rocket only 293 seconds after it was launched. Instead of its intended flyby of Venus, the rocket ended up in the Atlantic Ocean. [2]

Such events have been a mainstay of computing history ever since. In fact, Googling the search term “software bug” turns up over 80 million hits. Buggy software is part of the industry’s fabric.

1.1 TORRENTIAL PROCESS
There have been numerous attempts over the prior decades to build more reliable software and these have come under many guises. Total Quality, Zero Defect, Six Sigma and Cleanroom have all borrowed ideas that were successful in manufacturing, specifically prescribing more methodical and process-driven approaches to software development. Yet here we are in 2010 still talking about software quality! It’s hard to come to any other conclusion than that the lessons learned from manufacturing don’t translate well to software.

Quality is still very hard to evaluate in software and we end up with estimations that focus on quantifying the measurable and rely on subjectivity for the rest. As Niklaus Wirth recently said,

"The experience, judgment, and intuition of programmers who have survived the rigors of testing are what make programs of the present day useful, efficient, and correct." [3]

1.2 EMERGING LEFTISM
One thread common to formal models is that they focus on a few of the many variables: improving efficiency, predictable process, estimation of quality, or others. As most practitioners know, a development process is a polynomial wrapped inside of a culture, and solving for a few variables only achieves a momentary local maximum.

While process-heavy development models may work well for manufacturing airplanes and have been successfully applied by some companies [4], they have been viewed by many developers as burdensome and contrary to the creative nature of writing innovative software. Conversely, a “process-less process” can lead to a heroic culture that’s unable to repeatedly deliver. There needs to be balance.

Consider the physics of flight as an analogy to software process. In addition to reasonable flying conditions and an experienced pilot, the key to getting airborne is having an appropriate balance of factors that match the situation: too much weight or too little thrust can be disastrous depending on the situation. Similarly, teams, products and process all have virtual physics. For instance, adding engineers late in a product cycle doesn’t necessarily provide more lift [5]. Adopting a new process may give a team more thrust momentarily, but may also incur a longer term drag that makes them incapable of innovation.

The popularity of Agile, while not a wholesale rejection of more rigid processes, indicates that developers desire more balance and creativity. Whatever we do to make software higher quality and more predictable to build, we must maintain a balance that encourages the innovative aspects of the art form. We need to motivate smart minds to solve hard problems and deliver rich features to customers. In other words, we need to focus on staying airborne for the long term.

1.3 PARADIGM SHIFTS
A lot of software is now released as services and deployed to data centers controlled by the software producers rather than being installed on customer-owned servers/clients of infinite configurations scattered around the globe. Software can be released to early adopters and beta users, bug fixes can be deployed to all users simultaneously or to a small percentage, and maintenance and updates are handled centrally by experts and not by end users. With more control of the end product, development teams can experiment and take more risks, providing innovation faster and with less fear. When problems appear, they can be identified and fast-fixed before impacting large groups of users.
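As an illustration of deploying a change "to a small percentage" of users, one common approach is to hash each user into a stable bucket and compare the bucket against the rollout percentage. The sketch below is a minimal, hypothetical example of that idea; the function name `in_rollout`, the feature label, and the bucket count are assumptions made for illustration and do not describe Google's actual release tooling.

```python
import hashlib


def in_rollout(user_id: str, feature: str, percent: float) -> bool:
    """Return True if this user falls inside the rollout percentage.

    Hypothetical helper: hashes (feature, user_id) into one of 10,000
    stable buckets, so a given user always gets the same answer and
    raising `percent` only ever adds users, never removes them.
    """
    if not 0.0 <= percent <= 100.0:
        raise ValueError("percent must be between 0 and 100")
    key = f"{feature}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # 0..9999
    return bucket < percent * 100  # e.g. 5% -> buckets 0..499


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(10_000)]
    exposed = sum(in_rollout(u, "new-ui", 5.0) for u in users)
    print(f"{exposed} of {len(users)} users see the change at 5%")
```

Because the bucket depends only on the feature label and the user identifier, widening the rollout from 5% to 50% keeps every early user in the exposed group, which makes it easier to compare behavior before and after a fix.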
  • 2. ICST 2010 Copeland, Google’s Innovation Factory - 2 But the cloud paradigm is only part of the equation. We 2.2 FLAT & AUTONOMOUS also need to think differently about using these capabilities The organizational structure we use is atypical in the for software development itself. Can we align our culture, industry. For one, Google is a flat organization with many tools and processes to take full advantage of this new Nooglers being no more than 2-3 steps below senior model? Can we use automation to solve, once and for all, executives. The company structure can be characterized as: the repetitive, mundane and downright boring aspects of flat and autonomous. building products? Can we integrate development and At Google, managers are not controllers, they are testing so tightly that writing good code is easier than connectors charged with ensuring that teams make writing bad code? Can we encourage big thinking that effective use of information and tools. Many managers leads to new ideas? Can we do all of this at the scale and have 15 or more direct reports, introducing some chaos and hypercompetitive pace of the Internet? reducing the time available to micromanage. Managers are We’ve been tackling this problem for several years at judged on their ability to enable smart people to get things Google and this paper is a report of our progress. Our done. approach has been to automate those development tasks Teams are aligned along business lines we call “focus that shouldn’t require a human in-the-loop, to focus on areas” rather than around strict product lines. People doing building a culture around quality, to promote multiple similar work, no matter what products they are approaches to innovation, reduce bureaucratic creep, and to contributing to, will find themselves in close reporting invest in reusable infrastructure. proximity to their colleagues. This matrix encourages some 2. INNOVATION FACTORY amount of competition, but also the reuse of good ideas. 
Projects live and die based on free-market Darwinism, We encourage our engineers to focus on innovation. Eric where successful projects are further funded and less Schmidt, has said, “We take our jobs to be innovators and successful ones face atrophy. We take many short and long we are failing if we are not innovating quickly enough. [6] ” term bets, but projects must produce value to survive. Many of our best ideas were envisioned by engineers who 2.3 AVOIDING PLAUSIBLE DENIABILITY were passionate about solving a problem. Popular products, like Gmail, were initially developed by a few The entire product team is responsible for quality, and is passionate engineers outside of their normal work. judged on their ability to enable innovation, anticipate problems, make plans, and implement high quality Linus Pauling is commonly quoted as saying, “The best software. Teams adopt processes that are in their own self way to have a good idea is to have lots of ideas.” Google interest and that allow them to focus on innovation. has made its mark on the industry with new approaches to old problems. For example, our systems are built on The role of someone doing testing in this environment is “flaky” commodity hardware and an infrastructure that structured slightly differently than other technology dynamically compensates for that flakiness. Initially this companies. Testers avoid becoming codependents within was a subversive idea, as other companies at the time were this system and generally do not write unit tests or other building servers that attempted to eliminate all failures activities that are best done by the developer. Testing (like the foolproof HAL9000 from 2001). We expect teams focus on higher abstractions, like identifying everything to fail and use redundancy and automated latencies, system or customer focused testing, and enabling compensation techniques to maintain overall reliability. the process. 
Code is expected to have high reliability as it is written and 2.1 BUILDING FOR SCALE we adhere to a socially reinforced code review and check- Outside the walls of Google, this innovation factory has in practices. Development teams write good tests because created desirable products for our users. Inside the walls, it they care about the products, but also because they want has created large repositories of code, data, dependencies more time to spend writing features and less on debugging. and information that must be managed closely. Consider Teams with good testing hygiene upstream have more time the logistics of delivering at Google’s current pace: to innovate, and are thus more adaptable and competitive. • More than 6,000 engineers and >40 offices. In addition, there is one source tree and poorly written • 2,500 ongoing projects (2.5 developers / project). code is quickly identified because it breaks other people’s tests and projects. Aggressive rolling back is employed to • 1,600 active external release branches for products. keep the tree building “green.” • 59,000 builds / day each with 10-1000 targets.. Unlike traditional testing approaches, teams do not focus • 1.5 million tests / day, both manual and automated. on the tail end of the process or pad the schedule for • Most products localized into 40 languages. special testing phases. Instead, they look for ways to • At least bi-weekly release cycles. anticipate issues and solve them proactively in real time. Within each project are experts in the field of software quality and they ensure that the right tools, test cases and test procedures are in place throughout the product lifecycle. When bugs do slip through, or more commonly
unanticipated complex behavior situations occur, we aggressively do postmortems and quickly put in place solutions that prevent them from reoccurring.

2.4 VIRAL ADOPTION
At an individual project level, uniformity is rarely mandated, and adoption of tools and process is left to an internal “market” to decide. Apart from our core systems, discussed later, a large portion of our tools are developed by motivated individuals to solve local challenges. Similarly, process is tailored specifically to projects. While this leads to a healthy amount of chaos, good ideas tend to spread quickly, because they have been proven useful by others. Engineers decide what's best for engineering, articulate the right vision, and drive initiatives in the most sustainable fashion, and then others follow after grassroots success. We’ve found that positive experience is an effective means of persuasion.

An example of viral adoption is a “fix it”, an event organized by engineers that encourages Googlers to work on the same problem at the same time. The idea is to get a large amount of work done in a short amount of time by leveraging the power of masses. In the past, these have been focused on fixing 1000 TODOs in the codebase or fixing tests to take advantage of new infrastructure improvements.

Testing on the Toilet is another example of viral adoption. It started as an offhand joke and became a worldwide sensation, making headlines in the Wall Street Journal. The idea was to communicate ideas about testing and to do it in a place where we know people would have the time to read it. It is now published in hundreds of stalls in most Google offices, takes submissions from different programming languages and application domains, and appears on Google's public testing blog [7]. While the articles themselves need to be short enough for people to read while they “do their business”, the ideas create a buzz about specific topics that would otherwise be difficult to achieve.

2.5 CMM WITH A TWIST
One popular grassroots initiative that resembles a more traditional process methodology is called the Test Certified Program. Test Certified is a series of increasingly advanced levels, each defined by a list of measurable testing goals and capabilities. These goals are set by testers and present quality practices, advanced techniques and quality-oriented goals for a development team to strive to achieve. As a development team achieves more goals, using whatever techniques suit their team culture and problem domain, they move up through the Test Certified ladder levels from TC1 to TC5.

At the initial stages of the process, teams are asked to clean up and do several remedial activities, all of which are designed to get them seeing the benefits of testing immediately. Establishing a continuous build that runs a set of fast deterministic tests is the most important aspect of the first phase. Speed is achieved by focusing on small unit level tests and using practices like mocking and distributed execution.

In subsequent levels, code coverage goals are explicitly defined, rules about releasing on “non-green” builds are imposed, and a broader array of testing is expected, such as integration, system level, and various other techniques.

This process is defined not to dictate to developers what to do but to identify goals that will help them develop better software faster and spend less time in later phases fixing bugs. Advancement won’t guarantee higher quality software, but it does provide a roadmap that makes good quality more probable.

2.6 ELEMENTS OF CONTROL
As a counterbalance to the randomness incurred by our relatively freeform process, there is a set of release standards and guidelines. These “launch reviews and criteria” are outlined to ensure that products answer common sense questions before release. A few examples are:
• Is the design secure and customer data private?
• Will the service scale with the anticipated load?
• Does the UI meet standards?
• What are the data center utilization estimates?
• What are the latency estimates?

The point is that the release process is not friction free. There are many high standards that must be met, and it can be frustrating for teams that procrastinate. Pain can be avoided by driving change upstream as early as possible. Given the forewarning, teams can meet the standards in a way appropriate to their constraints.

3. FASTER DEVELOPER WORKFLOW
The build/test system is at the core of day-to-day activity for software engineers at Google. Almost everything deployed in production is developed, tested, and built using this system. Thus, the performance and usability of the build tools has a large impact on engineer productivity, where even small changes are multiplied by the total number of tool interactions.

As traditional companies scale, sub-organizations begin to maintain separate code silos, build tribal release and integration procedures, and duplicate effort. But, more disturbing, they end up sinking a large amount of time into maintenance issues. Time that should be spent adding value is instead used to atone for past sins.

Engineering teams should be able to concentrate a maximum of their time on quality and innovation. At Google that time is achieved, at least in part, by making the hard and the mundane simple and automatic. As a case in point, consider our build and deployment infrastructure.

3.1 DESIGN CONSIDERATIONS
Prior to 2006, Google employed a fairly slow build and test process that was designed for a much smaller company. Back then, builds might be broken for days or weeks, the
“unpaid mortgage” of new code would build up, and would then be followed by lengthy debugging and stabilization phases. We needed an approach that provided developers nearly instant feedback on every code check-in.

We designed the system with the following principles:
• Speed: All test and analysis systems need to return results very fast. If it takes too long, engineers will either ignore or not bother looking for that data.
• Feedback: The focus of test systems must be on high quality feedback. We want engineers to keep code at production quality at all times, not adding time to fix code that was broken earlier.
• Simplicity: Engineers should not have to understand how the underlying build and test systems work. All data and feedback must be easy to understand, integrated into commonly-used productivity tools, and presented in a workflow that allows them to take appropriate action.

Within milliseconds of a code check-in, our build process will automatically select the appropriate tests to run based on dependency analysis, run those tests and report the results. By reducing the window of opportunity for bad code to go unnoticed, overall debugging and bug isolation time is radically reduced. The net result is that the engineering teams no longer sink hours into debugging build problems and test failures.

3.2 ESTIMATING IMPACT
We created a more holistic approach to estimate the overall impact of improvements. We know the general workflow of engineers, and we can estimate how much time engineers spend in each area of the workflow. From this we can model a “representative engineer” which provides a framework for estimating where engineers spend their time with tools. With this model we can measure the effect of improvements on each area of the workflow to estimate overall impact.

TABLE: ESTIMATED MONTHLY ACTIVITY PER DEVELOPER
ACTIVITY    INITIAL CHECKOUT   CLEAN BUILD   BUILD AFTER EDIT   BUILD AFTER SYNC   RUN TESTS
FREQUENCY   2                  4             160                20                 60

From the workflow we can identify five key activities involving build tools. These are: Initial Checkout, Clean Build, Build After Edit, Build After Incremental Sync, and Run Tests. The tricky part is estimating the frequency of these actions. This is subjective since the details vary for each engineer. Some do a clean checkout and build for every task. Others never do a full sync/clean build after the initial build is created. By collecting the data and identifying the most frequent use cases, we were able to focus on the largest productivity wins.

3.3 RESULTS
We were able to save the company about 600 person years of time that would otherwise have been spent waiting on tools. We did this by improving the highest traffic workflows with caching, distributed execution, and avoiding bottlenecks.

CHART: TIME WAITING ON TOOLS IN HOURS/MONTH/DEVELOPER.

More importantly, we were able to change how products are produced, with an emphasis on continual improvement. The chart above shows the number of hours “saved” per month per developer on different types of projects. For instance, “big” projects are defined as having more than 20k files.

5. CONCLUSION
Just as we are witnessing a paradigm shift to cloud computing that stretches our imagination and challenges the limits of software, our process for developing that software is going through an equally dramatic revolution. We are reconsidering the appropriateness of the lessons we’ve taken from manufacturing. We believe that software development models require a new set of physics.

Google has experimented with this new physics with innovative new tools, processes and infrastructure. While there is no magic bullet, there is a pragmatism that can be applied to software development that seeks to balance the art form of creating software with the needs for repeatability, efficiency, and quality. At Google that has meant eliminating the tedious and repetitive tasks with automation and streamlined processes, allowing testers to engage the full extent of their creativity on innovation and meeting the challenges of modern software development.

6. ACKNOWLEDGEMENTS
James Whittaker, Alberto Savoia, Nathan York, Mark Striebeck, and the following teams from Google: Engineering Productivity, and the Test Grouplet.

7. REFERENCES
[1] David N. Wilson, Tracy Hall, Perceptions of software quality: a pilot study, Software Quality Journal 7, (1998) 67–75.
[2] Eric Roberts, Mariner I, The Risk Digest, Volume 5, Issue 66, (1987).
[3] Niklaus Wirth, Opening Talk GTAC 2009, (2009).
[4] Michael Diaz, Joseph Sligo, How Software Process Improvement Helped Motorola, IEEE, 0740-7459, (1997) 75-81.
[5] Fred Brooks, Mythical Man Month, (1995).
[6] Ravi Mattu, World exclusive interview with Google!, ft.com/managementblog, July 8, 2009.
[7] Introducing "Testing on the Toilet", Google Testing Blog, January 21, 2007.
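
As a worked illustration of the “representative engineer” model from Section 3.2, the sketch below multiplies each activity's monthly frequency (taken from the table in that section) by a per-action wait time and sums the result. The wait times, the dictionary keys, and the function names are hypothetical and chosen only to show the arithmetic; the paper does not publish per-activity durations, so the numbers printed here say nothing about Google's actual savings.

```python
# Monthly frequency of each build-tool activity per developer,
# as listed in the Section 3.2 table.
MONTHLY_FREQUENCY = {
    "initial_checkout": 2,
    "clean_build": 4,
    "build_after_edit": 160,
    "build_after_sync": 20,
    "run_tests": 60,
}


def hours_waiting_per_month(seconds_per_action):
    """Return total monthly wait time in hours: sum of frequency * duration."""
    total_seconds = sum(
        frequency * seconds_per_action[activity]
        for activity, frequency in MONTHLY_FREQUENCY.items()
    )
    return total_seconds / 3600.0


def hours_saved_per_month(before, after):
    """Return the monthly hours saved when wait times drop from before to after."""
    return hours_waiting_per_month(before) - hours_waiting_per_month(after)


# HYPOTHETICAL wait times in seconds per action (assumptions, not paper data).
before = {
    "initial_checkout": 1800,
    "clean_build": 1200,
    "build_after_edit": 90,
    "build_after_sync": 300,
    "run_tests": 240,
}
after = {
    "initial_checkout": 600,
    "clean_build": 300,
    "build_after_edit": 30,
    "build_after_sync": 60,
    "run_tests": 60,
}

if __name__ == "__main__":
    print(f"{hours_waiting_per_month(before):.2f}")       # prints 12.00
    print(f"{hours_saved_per_month(before, after):.2f}")  # prints 8.67
```

With these made-up inputs, Build After Edit and Run Tests each account for 4 of the 12 waiting hours despite their short individual durations, because they occur most often. This matches the paper's point about focusing on the highest-traffic workflows.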