A Scalable Software Build Accelerator:
Break the Build Bottleneck with Faster, More
Accurate Builds
John Ousterhout
Founder and Chairman
John Graham-Cumming
Founder
Executive Summary
For organizations that depend on software innovation, a slow software
build process can be a bottleneck for the entire company. Slow build
times not only impact engineering efficiency, they also affect product
quality and company agility. Furthermore, diagnosing build problems is
difficult or impossible: cryptic output in Make log files can be hard to
decipher and is difficult to relate to the individual build steps in a large
build. Electric Cloud's core products, ElectricAccelerator and
ElectricInsight, solve these problems by reducing software build times
dramatically and providing graphical insight into the performance and
structure of builds at a level impossible with existing tools.
The solution to the build speed problem is at first glance simple: create
a distributed version of industry standard build tools (such as GNU Make
or Microsoft NMAKE) that distributes individual job steps in parallel to a
cluster of inexpensive servers. Over the years, many attempts have been
made to create “parallel” or “distributed” build systems. However, none
are in widespread use due to dependency issues and distributed
computing problems that lead to broken builds.
Electric Cloud, Inc. has developed an automated dependency
management system that makes parallel builds safe, scalable, and
efficient. ElectricAccelerator is a software build accelerator that
significantly reduces software build times by distributing the build over a
large cluster of inexpensive servers. ElectricAccelerator uses its patented
dependency management system to identify and fix problems in real time
that would break traditional parallel builds. ElectricAccelerator plugs
seamlessly into existing Make- or Visual Studio-based infrastructures, and
includes Web-based reporting and management tools.
ElectricAccelerator has been proven in some of the most demanding
software organizations and against large open-source projects.
In one organization, a product took four and a half hours
to build on a single system. It now builds 20x faster on
a 30-node ElectricAccelerator cluster, finishing in less
than 13 minutes.
2. v2007.06
whitepaper ©Electric Cloud, Inc. All rights reserved.
In another organization, a product took 3 hours and 12
minutes to build on a single system. It now builds 16x
faster on a 30-node ElectricAccelerator cluster, finishing
in less than 12 minutes.
The open-source Samba file and print server takes 16
minutes to build on a single processor, but builds 16x
faster, in 58 seconds, on a 20-node ElectricAccelerator
cluster.
The open-source MySQL database takes over 23 minutes
to build on a single processor. It builds 12x faster, in 1
minute 54 seconds, on a 20-node ElectricAccelerator
cluster.
ElectricAccelerator improves the software development process by
reducing build times, so development teams can reduce costs, shorten
time-to-market, and improve quality and customer satisfaction.
ElectricInsight is Electric Cloud’s build visualization tool. A companion
tool to ElectricAccelerator, it takes advantage of structural information
recorded by ElectricAccelerator and provides a graphical display of this
information that makes it easy to understand the behavior of builds, tune
performance, and quickly debug broken builds.
The Real Impact of Slow Builds
Every software engineer has experienced the frustration and delay caused
by slow software builds. Every VP of Engineering has dreamed of rapid,
accurate builds that meet the needs of overextended QA teams, enable
rapid innovation to acquire new customers and allow timely turnaround
for critical bug fixes.
For most organizations such builds are quite simply dreams.
Although the widespread Make-based build infrastructure used by many
large software organizations has been around for over 20 years, it has
lagged behind advanced IDEs, new languages, and template libraries that
churn out more and more lines of code and result in slower and slower
builds.
As these projects grow, they are typically partitioned into a deep
hierarchy of directories with dependencies hidden in the arcane language
of Make. Trying to untangle a complex Makefile is a black art; it creates a
vast legacy of code that any solution must support without change.
Additionally, these deeply recursive Makes lead to brittleness as
dependencies between directories are often implicitly defined by the order
in which jobs are run without taking advantage of Make’s dependency
mechanism.
We have met with hundreds of commercial software development teams
and very few have production build times less than two hours. More than
half of the projects had build times in the 5-10 hour range, and a few
organizations reported that build times had reached 40 hours or more at
some point. Furthermore, the build issues are compounded since most
organizations must simultaneously support multiple platforms and product
versions.
A large tangible cost is engineering team productivity: the time engineers
spend waiting for their builds to complete. Most developers spend at
least two hours per week waiting for builds to complete, and in some
organizations developers spend as many as 10 hours per week waiting for
builds during some development phases. Engineers are often forced to
switch back to a bug fix checked in the day before because an overnight
build has finally shown that there was a problem.
Another cost arises at those points in the software development process when
the team is constrained by the build process. Typically, this is during the
“integration storm” phase of a release. Integration storms are periods of
instability that occur several times during a release cycle when developers
synchronize their changes into the main code line. Inevitably there are
interactions between the changes made by different developers, causing
broken builds and incorrect product behavior. It can take anywhere from
several days to several weeks to iron out all the problems; during this
period virtually the entire engineering organization is tied up fixing
problems or waiting for the code line to stabilize. If builds run overnight,
the organization may not be able to fix more than one or two problems
per day.
Long builds can also impact product quality. If builds take too long,
developers don’t have the resources to quickly do a complete re-build of
their product before they check in. It’s not uncommon for their changes
to break the build, but that problem is not discovered until the nightly
build runs. If that build was intended for the QA team, then after the
problem has been identified and fixed, the team is forced to wait for the
next nightly build. This means one less day testing that build. Since the
period between QA drop builds is often fixed, there is not enough time to
execute all of the scenarios on that build. If this happens near the end of
a release cycle, it’s possible for a bug to make it out to the field, where
customers encounter it.
Traditional Approaches to Improving Build Performance
There have been numerous attempts to improve the performance of Make
over the last two decades. They fall into two general classes: “faster”
approaches that execute pieces of the build in parallel, and “smarter”
approaches that avoid work entirely.
SMP Hardware
One solution to build speed is to buy a large multiprocessor machine and
use GNU Make’s -j switch to force it to run multiple jobs in parallel on the
same machine.
Although this approach gives some speedup (typically 2-4x), it does not
scale well because of the high per-CPU cost in a multiprocessor machine
and because incomplete dependencies (especially in hierarchical Makes)
become the build’s Achilles’ heel. With incomplete dependencies, the
parallel build tends to reorder build steps in ways that break the build,
leading to unpredictable and inaccurate builds.
ElectricAccelerator uses its patented dependency management system to
identify and fix problems in real time that would break traditional parallel
builds. With this complete dependency information, ElectricAccelerator can achieve
speedups of up to 20x.
In addition, builds using GNU Make’s -j switch produce log output that
differs with each run, which makes it difficult to verify and debug builds.
Electric Make ensures that the build log is written in the same order every
time.
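One general way to get a stable log from a parallel build is to buffer each job's output and emit the buffers in serial order. The sketch below illustrates only that idea; the job names and the `run_job` helper are hypothetical, and it does not describe how Electric Make is implemented.

```python
from concurrent.futures import ThreadPoolExecutor

def run_job(name):
    # Hypothetical job: returns the text it would have printed.
    return f"building {name}\n"

def build_log(jobs, workers=4):
    # Jobs may finish in any order, but map() yields results in
    # submission order, so the log reads like a serial build.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return "".join(pool.map(run_job, jobs))

jobs = ["a.o", "b.o", "c.o", "app"]
log = build_log(jobs)
```

Because the output is assembled in submission order, two runs of the same build produce logs that can be compared line by line.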
Distributed Builds
A variation of the parallel build approach is distributed builds, where
builds are run in parallel using a cluster of machines instead of a
multiprocessor.
In addition to all of the build ordering problems of parallel builds
described in the previous section, this approach brings difficulties of its own.
The clocks on the remote machines must be synchronized to ensure that
Make’s timestamp-based dependencies work correctly. All of the
machines must mount a reliable shared file system. Any failure
on an individual node will cause the build to fail. Incomplete dependency
information can still cause inaccurate build results. Furthermore, the time
taken invoking a job (e.g. with ‘rsh’) can be high in traditional
approaches, limiting the performance benefits.
The ElectricAccelerator architecture eliminates several distributed-systems
issues that threaten the correctness or robustness of distributed builds.
Electric Make manages timestamps centrally to avoid clock
synchronization problems. It provides all files to the nodes
through a reliable protocol that can self-heal if a node fails, thus
eliminating the need for a mounted file system and ensuring build
accuracy every time regardless of hardware or operating system failure.
Additionally, Electric Make uses a fast binary protocol to send jobs to
nodes to reduce overhead and help achieve massively distributed builds.
Manually Partition Makefiles
Some organizations have taken the extreme step of manually breaking a
build up into a small number of steps that are run in parallel on different
machines. This difficult and error-prone task requires detailed knowledge
of Makefile internals and typically yields only small speedups as
partitioning the build into smaller and smaller steps requires enormous
effort to ensure correct results.
ElectricAccelerator completely automates the parallelization of builds at
the lowest level possible: individual job steps. It merges hierarchical
builds into a single build and uses a multi-threaded architecture to run as
many jobs as possible at the same time.
Build Avoidance
Another approach for improving build performance is to reduce the
amount of work that must be done, either by doing better incremental
builds or by sharing results between independent builds. In practice, this
usually means relying on incremental builds rather than complete builds.
Very few build organizations are willing to do incremental builds for their
production software; instead they rely on complete builds for QA and
release. The risk of a broken build and the complexity of ensuring that a
build is accurate across multiple product components lead most
organizations to rely on full clean builds.
Summary
In summary, each of the approaches described above offers the potential
for speeding up builds, but each makes the build process more brittle by
increasing the risk that a build will fail or that it will be inconsistent with
the sources.
None of the organizations we have talked with has been able to achieve
more than a 6x speedup reliable enough for production builds, and only a
very few have achieved even a 3x speedup after significant investments
of time and resources. Most organizations run their builds completely
sequentially or with only a small speedup, in order to keep the process as
reliable as possible.
ElectricAccelerator: a Highly Scalable Solution
ElectricAccelerator is a software build accelerator that takes advantage of
the abundant parallelism available in builds and capitalizes on recent
technology improvements in inexpensive servers and fast networks.
Instead of running a build sequentially on a single processor,
ElectricAccelerator executes pieces of the build in parallel on a large
cluster of inexpensive servers (see Figure 1). ElectricAccelerator has
four main software components:
Electric Make, a new version of Make that reads Makefiles, analyzes
dependencies, and coordinates activities on the nodes. Electric Make also
acts as a file server for the nodes in the build cluster.
Electric File System, a special-purpose file system driver that runs on the
nodes in the cluster. It monitors every file access to provide the complete
dependency information that allows ElectricAccelerator to automatically
detect and correct out-of-order build steps.
Electric Agent, a user-level component that runs on the nodes, serves as
an intermediary between Electric Make and Electric File System, and runs
jobs at Electric Make's request.
Cluster Manager, a Web server that allocates nodes for individual builds
and provides reporting and management tools.
To a user, ElectricAccelerator appears identical to other versions of Make
or Visual Studio. Electric Make can be invoked anywhere that other
versions of Make might be invoked, such as engineer workstations or
dedicated build machines. Electric Make can be invoked interactively or
as part of build scripts. The use of a cluster for the builds is invisible to
the Electric Make user, except that the builds run much faster.
[Figure: the build machine runs Electric Make (file server); the Cluster Manager comprises an HTTP Web server, a scheduler, and a database (DB); each cluster node runs an Agent at user level and the Electric File System in the kernel. Links in the diagram are labeled TCP/IP and HTTP.]
Figure 1: The ElectricAccelerator Architecture
Massively Distributed Builds
To achieve build speedups of up to 20x, ElectricAccelerator couples a
cluster of servers running as many jobs in parallel as possible with the
kernel-level Electric File System. The Electric File System monitors every
file access to compute dependencies automatically and ensure that build
results are perfect every time.
When a build starts, Electric Make reads existing Makefiles and
determines the list of jobs that need to be executed. Electric Make then
communicates with the Cluster Manager component. Cluster Manager
controls access to the cluster of nodes and allocates a collection of nodes
to Electric Make.
The Cluster Manager not only controls access but also arbitrates between
competing builds. If multiple builds are requested simultaneously on a
build cluster, the Cluster Manager is able to fairly allocate cluster nodes to
the builds, taking into account the available build resources, the
requirements of a specific build, and the build priorities and related policies set
in the Cluster Manager.
For example, the Cluster Manager might be configured to allow general
use of the cluster with automatic sharing and allow a build manager to
take over the entire cluster when a build must be rapidly produced (for
example, to address a critical bug fix for a customer). Cluster Manager
will ensure that a build manager is able to use the entire cluster without
terminating other running builds. Low priority builds would be placed in
a wait state until the top priority build was completed and then would be
automatically continued. After continuing, normal sharing of the cluster
would resume.
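The take-over scenario can be pictured with a small allocation sketch. It uses a strict-priority greedy rule, which is an assumption made for illustration; the actual policies of the Cluster Manager are not described here, and the build names, priorities, and node counts are hypothetical.

```python
def allocate(total_nodes, builds):
    """builds: list of (name, priority, nodes_wanted).
    Returns {name: nodes_granted}; a grant of 0 means the build waits."""
    grants = {}
    free = total_nodes
    # Highest priority first; sorted() is stable, so ties keep request order.
    for name, _priority, wanted in sorted(builds, key=lambda b: -b[1]):
        grants[name] = min(wanted, free)
        free -= grants[name]
    return grants

# A critical build takes the whole 30-node cluster; the others wait.
during_hotfix = allocate(30, [("nightly", 1, 20), ("hotfix", 9, 30), ("dev", 1, 10)])
# Once it completes, normal sharing resumes.
after_hotfix = allocate(30, [("nightly", 1, 20), ("dev", 1, 10)])
```

In this sketch the waiting builds are not terminated; they simply receive no nodes until the high-priority build is done.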
Electric Make then instructs each node to perform jobs on its behalf. A
node running a job reads and writes a variety of files (such as source and
object files) which are passed dynamically across the network via Electric
Make using a fast binary protocol developed by Electric Cloud. When each
job completes, it sends its results (such as files written and log output) to
Electric Make for final storage on disk. File data is cached on nodes for
the life of a build, in order to minimize network traffic for files that are
reused.
In addition, the Electric File System running on the nodes captures every
single file access performed by jobs and provides that information back to
Electric Make. Using that information, coupled with the Makefiles’
dependency information, Electric Make is able to determine the exact
relationship between jobs and files and fix any missing dependencies in
real time.
If Electric Make detects that two jobs were performed in the wrong order
because missing Makefile dependency information made them appear
independent, it will automatically reschedule them for completion in the
correct order and make a note of the missing dependency.
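The idea of comparing observed file accesses with declared dependencies can be sketched as follows. This is a minimal illustration under stated assumptions, not Electric Make's algorithm: the job records, their field names, and the file names are hypothetical, and only directly declared prerequisites are checked.

```python
def find_missing_dependencies(jobs):
    """jobs: list of dicts in serial (Makefile) order with keys
    'name', 'reads', 'writes', and 'deps' (declared prerequisite jobs).
    Returns (producer, consumer, file) for each undeclared dependency."""
    writer_of = {}   # file path -> job that wrote it earlier in serial order
    missing = []
    for job in jobs:
        for path in job["reads"]:
            producer = writer_of.get(path)
            # The job read a file produced by an earlier job that the
            # Makefile never named as a prerequisite.
            if producer and producer not in job["deps"]:
                missing.append((producer, job["name"], path))
        for path in job["writes"]:
            writer_of[path] = job["name"]
    return missing

jobs = [
    {"name": "gen_header", "reads": ["config.in"], "writes": ["config.h"], "deps": []},
    {"name": "compile_main", "reads": ["main.c", "config.h"], "writes": ["main.o"], "deps": []},
    {"name": "link", "reads": ["main.o"], "writes": ["app"], "deps": ["compile_main"]},
]
missing = find_missing_dependencies(jobs)
```

Here `compile_main` reads `config.h` without declaring `gen_header` as a prerequisite, so a parallel scheduler could run the two jobs in the wrong order unless the relationship is recorded.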
Electric Make saves this missing dependency information to increase the
performance of subsequent runs. Successive builds use the additional
dependency information to run the build steps in the correct order, further
improving performance. Electric Make updates this missing dependency
information after every build, so that even as Makefiles evolve,
performance is automatically maintained. Other approaches to parallel
builds require a large on-going investment in Makefile maintenance to
keep the build from breaking, and to ensure that parallel performance is
preserved.
At the end of the build, Electric Make communicates build results to the
Cluster Manager where they are made available through a web-based
interface for reporting and management.
[Screenshot: ElectricAccelerator’s Cluster Manager]
Accurate Incremental Builds
ElectricAccelerator uses its perfect information about dependencies for
more than just safe parallel builds. It also uses the dependency
information in a feature called eDepend, which enables accurate
incremental builds. Historically, incremental builds have been unreliable:
they only work if perfect dependency information is available, so that
Make knows which subset of files must be regenerated after a particular
file is changed. Since dependencies were not perfect, incremental builds
would sometimes fail to regenerate files affected by a change, so the only
safe approach was to do a complete rebuild.
Electric Make uses dependency information collected during previous
builds to decide what to regenerate during incremental builds. This
makes incremental builds accurate and reliable, and eliminates the need
for clumsy, slow ‘make depend’ steps. eDepend is superior to
dependency searching techniques (such as the free tool makedepend)
because it is completely language agnostic. eDepend detects
dependencies at the file level without the need to search within files and
understand individual languages. This language independence means
that eDepend also detects dependencies between object files (for
example, eDepend can automatically detect a dependency between an
executable and a library that the executable is linked with).
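A file-level, language-agnostic rebuild decision can be sketched as below. It assumes a record of which files each target's job read during a previous build; the record format and file names are hypothetical, and this is not a description of eDepend's internals.

```python
def targets_to_rebuild(recorded_inputs, changed):
    """recorded_inputs maps each target to the files its job read in a
    previous build; changed lists the files modified since then.
    Returns the set of targets that must be regenerated."""
    dirty = set(changed)
    rebuild = set()
    progress = True
    while progress:              # repeat until no new target becomes dirty
        progress = False
        for target, inputs in recorded_inputs.items():
            if target not in dirty and dirty & set(inputs):
                dirty.add(target)
                rebuild.add(target)
                progress = True
    return rebuild

recorded_inputs = {
    "util.o": ["util.c", "util.h"],
    "libutil.a": ["util.o"],
    "main.o": ["main.c", "util.h"],
    "app": ["main.o", "libutil.a"],
}
after_source_change = targets_to_rebuild(recorded_inputs, ["util.c"])
after_header_change = targets_to_rebuild(recorded_inputs, ["util.h"])
```

Nothing in the sketch parses source code: the link between `app` and `libutil.a` is found the same way as the link between `main.o` and `util.h`, purely from recorded file reads.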
Hierarchical Builds
Another patented technique enables Electric Make to achieve massively
distributed builds by flattening deep hierarchies of nested Makes. In a
typical recursive build, a top-level Make executes a set of Makes in
subdirectories to make individual components.
Electric Make treats this hierarchy as a single Make by merging
dependency information from the complete hierarchy and identifying jobs
that can be safely run in parallel. In this way it achieves massive
concurrency with the largest number of jobs running at the same time,
even from different Makes.
To ensure that the build is 100% accurate, Electric Make's eDepend
feature also analyzes file usage information from the Electric File System
to automatically detect dependencies between recursive Makes that were
never specified in the Makefiles.
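The benefit of treating nested Makes as one graph can be illustrated with a short sketch. It merges two hypothetical sub-Make dependency graphs and groups targets into waves that could run concurrently, using Python's standard `graphlib` module (Python 3.9 or later); the directory and target names are invented for the example.

```python
from graphlib import TopologicalSorter

def parallel_waves(*makefiles):
    """Each makefile is a dict mapping a target to its prerequisite targets.
    Returns lists of targets; every target in a wave can run in parallel."""
    merged = {}
    for graph in makefiles:
        for target, prereqs in graph.items():
            merged.setdefault(target, set()).update(prereqs)
    sorter = TopologicalSorter(merged)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all targets whose prerequisites are done
        waves.append(ready)
        sorter.done(*ready)
    return waves

lib = {"lib/a.o": [], "lib/b.o": [], "lib/libx.a": ["lib/a.o", "lib/b.o"]}
app = {"app/main.o": [], "app/app": ["app/main.o", "lib/libx.a"]}
waves = parallel_waves(lib, app)
```

A recursive build would finish everything in `lib` before starting `app`. In the merged graph, `app/main.o` compiles in the first wave alongside the library objects.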
Plug Compatible
Electric Cloud’s replacement for the Make program, known as Electric
Make, is compatible with GNU Make, Microsoft NMAKE and Visual Studio.
It understands the complete GNU Make and Microsoft NMAKE languages
and has identical command-line options.
Starting to use Electric Make is a simple matter of changing invocations of
gmake or nmake to eMake with an appropriate command-line option
specifying the emulation mode. For the ultimate in slot-in deployment, if
the eMake program is renamed to gmake or nmake it automatically
determines whether it is implementing GNU Make or Microsoft NMAKE.
Because Electric Make requires no Makefile changes it’s easy to deploy
and it’s even easier to verify. In addition to reading standard Makefiles
and accepting standard command-line options, Electric Make produces
identical log file output. For example, verifying that Electric Make is doing
the same work as an existing Make is a simple matter of running the diff
program. Electric Make even produces identical error messages in the
event of a broken build step.
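The same comparison can be scripted when it needs to run as part of an automated check. The sketch below uses Python's standard `difflib`; the log contents and the `gmake.log` and `emake.log` labels are hypothetical.

```python
import difflib

def log_differences(reference_log, candidate_log):
    """Return unified-diff lines between two build logs; empty if identical."""
    return list(difflib.unified_diff(
        reference_log.splitlines(),
        candidate_log.splitlines(),
        fromfile="gmake.log",   # hypothetical label for the existing Make's log
        tofile="emake.log",     # hypothetical label for the Electric Make log
        lineterm="",
    ))

same = log_differences("cc -c a.c\nld a.o\n", "cc -c a.c\nld a.o\n")
different = log_differences("cc -c a.c\nld a.o\n", "cc -c a.c\nld b.o\n")
```

An empty result means the two logs match line for line.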
The only difference is that Electric Make runs the build as much as 20x
faster.
Because Electric Make looks just like Make, existing build scripts (such as
Perl wrappers) can be run without change. Electric Make plugs right into
the existing build system.
Robust
ElectricAccelerator’s three user-level components (Electric Make, Cluster
Manager and Electric Agent) are in constant communication ensuring that
all are operating correctly so that failures are detected rapidly and fixed in
real time.
As a build is running, Electric Make keeps track of the files and jobs
running on each machine on the cluster. In the event that a cluster node
fails, Electric Make automatically detects the failure and reruns the
incomplete job on another node.
Electric Make communicates node failures to the Cluster Manager for
reporting and management purposes, and continues the build maintaining
high speed and build accuracy.
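The rerun-on-failure behavior can be pictured with a small sketch. The `execute` callable, the node names, and the use of `ConnectionError` to signal a dead node are all assumptions made for illustration, not part of the product's interface.

```python
def run_job(job, nodes, execute):
    """Try each node in turn and rerun the job elsewhere if a node fails.
    `execute(node, job)` is a hypothetical callable that raises
    ConnectionError when the node is unreachable.
    Returns (result, failed_nodes)."""
    failed = []
    for node in nodes:
        try:
            return execute(node, job), failed
        except ConnectionError:
            failed.append(node)   # would be reported for management purposes
    raise RuntimeError(f"no healthy node left for {job}")

def flaky(node, job):
    # Stand-in for remote execution: node1 is down in this example.
    if node == "node1":
        raise ConnectionError(node)
    return f"{job} built on {node}"

result, failed = run_job("main.o", ["node1", "node2"], flaky)
```

The job's result is the same regardless of which node finally ran it; only the list of failed nodes records that a retry took place.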
As well as being able to handle cluster failures, ElectricAccelerator’s
unique Electric File System automatically makes up for deficiencies in
Makefiles where missing dependencies can cause traditional parallelization
methods to create incorrect builds.
Multi-Platform
Although some organizations have the luxury of working on a single
platform, many face the realities of a heterogeneous world. Electric Cloud
is no different. All of the ElectricAccelerator components work on
Microsoft Windows, Sun Solaris and Linux. Electric Make emulates the
popular GNU Make and Microsoft NMAKE programs.
Real World Performance
ElectricAccelerator has been tested in some of the most demanding
enterprise software organizations and against large open-source projects.
In one organization, a product took four and a half hours to build on a
single system. It now builds 20x faster on a 30-node ElectricAccelerator
cluster, finishing in less than 13 minutes.
In another organization, a product took 3 hours and 12 minutes to build
on a single system. It now builds 16x faster on a 30-node
ElectricAccelerator cluster, finishing in less than 12 minutes.
The open-source Samba file and print server takes 16 minutes to build on
a single processor, but it builds 16x faster, in 58 seconds, on a 20-node
ElectricAccelerator cluster.
The open-source MySQL database takes over 23 minutes to build on a
single processor. It builds 12x faster, in 1 minute 54 seconds, on a 20-
node ElectricAccelerator cluster.
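The open-source figures above can be checked with a little arithmetic. The sketch below computes speedup as serial time divided by parallel time. Parallel efficiency (speedup divided by node count) is a standard measure added here for context and is not a figure reported in this paper. The MySQL serial time is taken as exactly 23 minutes, although the text says "over 23 minutes".

```python
def speedup(serial_seconds, parallel_seconds):
    # Ratio of single-processor build time to cluster build time.
    return serial_seconds / parallel_seconds

samba = speedup(16 * 60, 58)          # 16 minutes vs 58 seconds
mysql = speedup(23 * 60, 60 + 54)     # 23 minutes vs 1 minute 54 seconds

# Fraction of the 20-node cluster's ideal speedup actually achieved.
samba_efficiency = samba / 20
mysql_efficiency = mysql / 20
```

Both results are consistent with the rounded 16x and 12x figures quoted for the 20-node cluster.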
Build Visualization
Complementing ElectricAccelerator is ElectricInsight, a graphical tool that
mines extensive build information generated when a build is run with
Electric Make to provide unprecedented information about the structure of
a large build.
With ElectricInsight it's easy to get an overview of the running of a long
build. This screen shot shows an example build that lasts 4,380s (about
1h 13m) and consists of 1,457 jobs. In the terminology of ElectricInsight,
a job is an individual step from a Makefile such as a compilation or link.
The ElectricInsight display shows the name of the machine running the build (in
this case 'node0') and a horizontal bar chart. Each bar represents a single job and
the bar's length is proportional to that job's running time. The bars are ordered
from left to right in the same order as the jobs were executed by make.
Just glancing at the display gives instant information about the build. In
addition to the figures displayed in the bottom right hand corner (where
you can find the total number of jobs executed and running time in
seconds), the bar chart shows that:
• There are a number of very long running jobs. Very early on in
the build there's a single job (shown as a large patch of blue)
which takes up about 15% of the build.
• Areas of black are when many small jobs are running in
succession with tight packing. Zooming on the display will reveal
the actual jobs in detail.
• There's a gap about half way through the build when no job is
running. That's clearly a waste of time that needs investigating.
• One of the large jobs is highlighted (by hovering the cursor over
it) in pink and the bottom half of the ElectricInsight display shows
the details for the job. In this case the job built
build/motor/output/sharea5mass/debug/a2a5mass.so and
spent 219s (or about 5% of the total time) on it.
One job in the example build is taking 15% of the total build time.
Investigating the job is trivial with a graphical tool like ElectricInsight.
First, it's obvious which jobs are taking up most of the time, and hovering
over the longest one reveals information about it.
The name of the binary being built is revealed:
build/bin/output/binaryserver/debug/a1binaryserver.so. And it's
taking 639.21s; over 10 minutes to compile! ElectricInsight shows that
this one job consumes 14.59% of the entire build.
If the name of the binary isn't enough to track down in the Makefile the
job that's running, a double click on the blue bar brings up an additional
window containing detailed information on the job. Here are
a1binaryserver.so's details.
The Output pane shows the actual output in the Make log that's
associated with this one job. It reveals that the job consists of deleting
four files using the rm command, followed by compiling the object using
compile. All the arguments and options for each command can be seen by clicking
the Show Commands check box.
The Job Details display also helps narrow down the job even further by
giving the name of the Makefile (in this case simply Makefile) and the line
number within the Makefile (in this case 1953) where the rule for this job
is defined.
ElectricInsight also provides valuable information when a build is run in
parallel with the ElectricAccelerator system. The same example build
when run against an ElectricAccelerator build cluster of ten nodes is
shown here.
With ElectricAccelerator, this build has dropped to running in 20m. If the
build manager wants to reduce the build time even more, the
ElectricInsight display shows areas for further optimization:
• The same long running jobs (a1binaryserver.so, a2motor.so,
etc.) still dominate the build and optimizing them would bring
down the overall time.
• There are a number of large gaps meaning that parallelism isn't
perfect. The build runs initially for about 50% of the time with
jobs on every node consuming the CPU resources. After that, a
number of large jobs block the rest of the build. This blocking
effect occurs because the jobs that run at the end of the build are
waiting for the large jobs to complete. Typically the blockage is
caused by explicit dependency information in the Makefile (i.e. the
final jobs are not permitted to run until specific objects have been
built).
In summary, optimizing the long jobs will make this build faster (and
much faster in the parallel case).
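A rough lower bound shows why the long jobs matter more in the parallel case. The sketch below uses the figures quoted for the example build (4,380s of serial work, a longest job of 639.21s, ten nodes, and an observed 20-minute parallel run). It assumes perfectly divisible work and ignores dependency chains, so the real minimum can be higher.

```python
def lower_bound(total_work_s, longest_job_s, nodes):
    # A build cannot finish faster than its work spread evenly over the
    # nodes, nor faster than its single longest job.
    return max(total_work_s / nodes, longest_job_s)

bound = lower_bound(4380, 639.21, 10)      # the longest job is the limit
even_split = 4380 / 10                     # what ten nodes could reach otherwise
observed = 20 * 60                         # the 20-minute parallel run
headroom = observed - bound
```

With ten nodes the even split would be 438s, but the 639.21s job sets the floor, so shortening that one job lowers the best achievable build time directly.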
After optimizing the longest running jobs (perhaps by adding more
memory to the machines they are running on, or getting developer help in
breaking the project apart), ElectricInsight can be used to visualize the
result.
From the ElectricInsight diagram it's clear that all the nodes in the
ElectricAccelerator cluster are being used fully to run the build, leading to
a much reduced build time.
Conclusion
Fast, reliable builds are now a reality.
Electric Cloud's solution provides enterprise-class software that
accelerates the time consuming and costly software build process by as
much as 20x and removes the guess work from keeping builds running
accurately and quickly. ElectricAccelerator transforms inexpensive
servers into highly scalable clusters so that eight-hour builds can finish in
30 minutes. ElectricInsight provides never-before-seen information about
the structure, timing and dependencies in every build. And eDepend
means the end of long dependency generation steps and delivers perfect
incremental builds every time.
With Electric Cloud, development teams can reduce costs, shorten time-to-market,
and improve overall product quality.
About Electric Cloud
Electric Cloud is the leading provider of software production management
solutions that automate, accelerate, and analyze the software
development tasks that follow the check-in of new code. These include
the software build, package, test and deploy processes. The company's
patented and award-winning solutions improve productivity in the face of
increasing product complexity and time-to-market pressures for software
delivery. In addition to ElectricAccelerator and ElectricInsight, Electric
Cloud offers the only enterprise-class build and release management
solution, called ElectricCommander.
Leading companies such as Qualcomm, Intuit, and Expedia rely on
Electric Cloud's Software Production Management solutions to change
software production from a liability to a competitive advantage. For
customer inquiries please contact Electric Cloud at (650) 968-2950 or
www.electric-cloud.com.
Electric Cloud, ElectricInsight, ElectricAccelerator, ElectricCommander and
Electric Make are trademarks of Electric Cloud. Other company and
product names may be trademarks of their respective owners.