Cloud Computing’s Coming of Age

We consider cloud computing first. There’s certainly plenty of buzz about it. For our purposes here, we define cloud computing as accessing computing resources over a network, whether those resources take the form of a complete application (Software as a Service, or SaaS); a developer platform such as Google App Engine or Microsoft Azure; or something more akin to a barebones operating system, storage, or a database (Amazon Web Services).[2]

As recounted by Nick Carr, among others, in his The Big Switch, cloud computing metaphorically mirrors the evolution of power generation and distribution. Industrial Revolution factories (such as those that once occupied many of the riverside brick buildings I overlook from my Nashua, New Hampshire office) built largely customized systems to run looms and other automated tools, powered by water and other sources. These power generation and distribution systems were a competitive differentiator; the more power you had, the more machines you could run, and the more you could produce for sale. Today, by contrast, power (in the form of electricity) is just a commodity for most companies: something that they pull off the grid and pay for based on how much they use.

The economic argument underpinning cloud computing has two basic parts. The first is that firms should generally focus their resources on those things that differentiate them and give them an advantage over competitors. Computer systems, especially those devoted to mundane tasks such as email, aren’t one of those differentiators for many companies.[3] The second part relates to the size and scope of computing facilities. Efficient IT operations involve a high degree of standardization, up-front design, and automated operation; applying this degree of “industrialization” to datacenters and their operation isn’t really viable at small scale.[4]

Open Source in the Cloud

Open source is very much part of cloud computing. The benefit of open source to the cloud providers is clear at several levels.

First, there’s the matter of cost. Open source isn’t necessarily “free as in beer” (to use the popular expression), that is, zero cost; companies often want subscription offerings and support contracts even if the bits are nominally available for free. But it does tend to be less expensive than proprietary alternatives, even when some production-scale features are extra-cost options (as in the case of the monitoring tools in MySQL Enterprise). And this is no small consideration when you look at the size of providers like Amazon and Google, which often seem to add datacenters at the rate at which many companies once added computers.

Open source software is also just a good match for this style of computing. For one thing, cloud providers are, almost by definition, technically savvy and sophisticated. Although they don’t want to reinvent every wheel, they’re generally ready, able, and willing to tweak software and even hardware in the interests of optimization. Open source software and, more broadly, the open source communities with which they can engage are therefore a good fit, given that providers can modify source code and otherwise participate in evolving software in a way that meets their requirements.

There are some areas of friction between open source and cloud computing. We see this in the ongoing social and community pressure on large cloud vendors such as Google to make their “fair share” of contributions to open source projects.[5]

[2] See our To Cloud or Not to Cloud for more discussion of the different forms that cloud computing takes.

[3] One of the earliest examples of widespread outsourcing of a computing task was payroll. This function is certainly important, but having “better payroll” (whatever that would mean) isn’t something that advantages a company.

[4] There’s an ongoing debate over how big “big” needs to be. See our Bigness in the Cloud. But there’s general agreement that the entry point is somewhere around large datacenter scale.

[5] Most “copyleft” open source licenses, such as the GPL, don’t require that code enhancements be contributed back to the community when the
software is delivered only in the form of a service, as is typical with cloud computing.

Proprietary Web-based applications and services (such as those from Google, 37signals, Salesforce.com, and even some traditional software vendors) also tend to mirror certain open source strengths such as easy acquisition.

However, in the main, it’s a largely healthy and mutually beneficial relationship. Open source is widely embraced by all manner of technology companies because they’ve found that, for many purposes, open source is a great way to engage with developer and user communities, and even with competitors. In other words, they’ve found that it’s in their own interests to participate in the ongoing evolution of relevant projects rather than simply taking a version of a project private and then working on it in isolation.

Virtualization

Virtualization is the other buzziest IT topic today. Truth be told, when it comes to enterprise computing, it’s actually of more immediate interest than cloud computing, given that it’s a more developed set of technologies and its use cases are better understood.[6]

To better understand how server virtualization[7] plays with both cloud computing and open source, it helps to think about what virtualization really is and how it is evolving. The core component of server virtualization is a hypervisor, a layer of software that sits between a server’s hardware and the operating system or systems that run on top in their isolated virtual machines (VMs). Essentially, the hypervisor presents an idealized abstraction of the server to the software above. It can also make it appear as if there are multiple such independent servers (all of which, in reality, cooperatively share the physical server’s hardware under the control of the hypervisor).

This ability to share a single (often underutilized) physical server is certainly a salient trait of virtualization. In fact, it’s the main reason that most companies first adopt virtualization: to reduce the number of physical servers they have to purchase to run a given number of workloads. However, looking forward, the abstraction layer that virtualization inserts between hardware and application software is at least as important, whether or not it’s used to run multiple operating system images on a single server.

Historically, once an application was installed on a system, it was pretty much stuck there for life. That’s because the act of installing the application, together with its associated operating system and other components, effectively bound it to the specifics of the physical hardware. Moving the application meant dealing with all manner of dependencies and, in short, breaking “stuff.” Placing a hypervisor in the middle means that the software is now dealing with a relatively standardized abstraction of the hardware rather than actual hardware. The result is greatly increased portability.

Portability, in turn, enables lots of interesting practical uses. For example, administrators can take a snapshot of an entire running system for archive purposes or to roll back to if there’s a problem with a system upgrade. VMs can be transferred from one system to another, without interrupting users, to balance loads or to perform scheduled maintenance on a server. Ultimately, virtualization enables what is often called a virtual infrastructure or a dynamic infrastructure. By whatever name, it is an infrastructure in which workloads move to wherever they are most appropriately run, rather than staying where they happened to be installed once upon a time.

[6] Although various antecedents to, and subsets of, cloud computing go back some time: think hosting providers or even timesharing.

[7] At its most conceptual, virtualization is an approach to system design and management that happens in many places and at many layers in a system. For our purposes here, virtualization refers specifically to the particular approach to server virtualization described.

Open Source in Virtualization

VMware both brought proprietary server virtualization to the mainstream and has been the
vendor who has most benefited from it to date. However, a variety of virtualization options are now available, enabled in part by enhancements to processor hardware from AMD and Intel that simplify some of the more difficult aspects of virtualizing x86 hardware.

Among these options are Xen and KVM. Both open source projects are part of standard Linux distributions.[8] They are also available in the form of a “standalone hypervisor,” essentially a small piece of code (often embedded in flash memory) that lets a server boot directly into a virtualized state without first installing an operating system. “Guest” operating systems can then be installed on top in the usual manner. Xen is the more widely used and more mature of the two today. But Red Hat bought Qumranet, the startup behind KVM, in early September 2008 and is focusing on KVM as its strategic virtualization technology going forward; KVM has also been part of the mainline Linux kernel since version 2.6.20.[9]

Virtualization has a close relationship to cloud computing, especially cloud computing implementations that provide users with an execution environment in the form of a virtual machine.[10] Virtualization brings a lot of the properties you’d want a cloud computing environment to have. You want to be able to store snapshots of your environment to use in the future. Check. You want to be able to spin up applications dynamically and shut them down when they’re no longer needed. Check. You want to insulate users from details of the physical infrastructure so that you can make transparent upgrades and other changes. Check. Virtualization isn’t a universal requirement for all types of cloud computing. High performance computing in particular often uses an alternative form of virtualization, sometimes called a grid, that’s more about distributing a single large job to a large number of servers using certain standard protocols. However, server virtualization is certainly an ideal complement to many cloud computing implementations.

Cloud computing providers have adopted open source virtualization approaches (especially Xen) for many of the same reasons that they’ve widely adopted open source in general. Amazon Elastic Compute Cloud (EC2) is an illustrative and well-known example of virtualization, paired with Linux, in the cloud.[11] With EC2, you rent VMs by the hour. Each VM comes with a specific quantity of CPU, memory, and storage; currently there are five different size combinations available. Users can then build their own complete VM from scratch. More commonly, they’ll start from a standard Amazon Machine Image (AMI), an archived VM pre-loaded with an operating system, middleware, and other software.

Initially, these AMIs consisted almost entirely of community-supported Linux distributions. However, as cloud computing evolves from a developer-centric, kick-the-tires stage to something that supports production applications and even entire businesses,[12] we now see some of the same concerns that are relevant to software running in an enterprise datacenter finding their way into software running at cloud providers.

[8] Xen is also the basis for the server virtualization in Sun’s OpenSolaris and xVM. See our Virtualization Strategies: Sun Microsystems.

[9] Red Hat is doing this for both business and technical reasons. See our Red Hat Makes Buy for KVM—But VDI Too.

[10] Providers of other types of cloud computing, such as SaaS, may also use virtualization, but such details of their technology infrastructure are hidden from users of the service. In fact, that’s sort of the point.

[11] At the end of 2008, Amazon also added support for Microsoft Windows and SQL Server to EC2.

[12] See our SmugMug and Amazon S3.

An example of this trend is AMIs with Red Hat Enterprise Linux (RHEL) and the JBoss Enterprise Application Platform (currently in a supported public beta phase). This allows enterprises running RHEL inside their firewall to run the same operating system on Amazon Web Services (AWS). They might do this as part of migrating applications to the cloud, running just new applications there, or using the cloud to handle temporary workload spikes. Precise support policies can vary by software
vendor, but, in general, running exactly the same operating system and middleware stack in a remote environment as locally should allow applications that are certified and supported in one place to also be certified and supported in the other.

The Green in Green

It’s popular to talk about “green,” environmentally sensitive computing these days. Some green computing is, indeed, green for green’s sake. It may be driven by government regulation[13] or it may be part of a high-level corporate initiative undertaken for brand image or other reasons. However, much of the time, companies undertake energy efficiency and conservation projects for the most pragmatic of reasons: they can help the bottom line. Especially when bringing new IT capacity on-line, it’s profitable to factor power and cooling costs into any financial analysis.

Optimizing power use dovetails in important ways with the other two trends that we’ve discussed, virtualization and cloud computing.

Perhaps the most obvious intersection is with virtualization in its basic guise as a way to consolidate applications onto fewer physical servers. Reducing the number of servers cuts acquisition costs, certainly. However, it’s also the case that the server with the lowest environmental impact (in all dimensions, not just power draw) is one that doesn’t exist.

More broadly, virtualization brings dynamism to an IT environment through features such as the live migration of virtual machines from one server to another and the dynamic allocation of hardware resources in response to workload changes. These in turn enable what’s often called “automation,” essentially distributed workload management in response to user-specified policies. In the open source space, Red Hat Enterprise MRG (messaging, realtime, and grid) is an example of technologies organized around an automation theme.

Among the benefits of automation is that, as the demands on a datacenter’s infrastructure change over the course of a day, over the course of a quarter, or in response to traffic spikes, physical resources that aren’t immediately required can be turned off until they are needed. If fewer Web servers or application servers are needed at night, they needn’t be powered up all the time. There are also analogous examples in the storage arena, where data that doesn’t have to be instantly available can be moved to lower-power near-line storage such as tape or MAID (massive array of idle disks).

[13] Such as the RoHS (Restriction of Hazardous Substances) directive in the European Union.

Automating and Integrating

In practice, most datacenters are still in quite early days when it comes to broadly applying automation. There are a variety of reasons for this. One is simply that it’s a largely new way of thinking about operating servers. After all, it took years before (just about) everyone got comfortable with letting an operating system schedule jobs across multiple processors within a single server. The adoption pattern of automation in a distributed environment will be no different. What’s more, virtualization, to say nothing of the tools that automate its use, is still quite new on the time scale of IT evolution. Short version: These things take time.

There’s also a prosaic but very real issue associated with changing IT operations for the benefit of the power bill: the budget for IT gear, software, and the people needed to run it efficiently is often (indeed, usually) separate from the budget that pays for utilities. And, however much many managers want, in principle, to make the right decisions for their company taken as a whole, in practice actions tend to be driven by individual and departmental budgets and other incentives. Which brings us back to cloud computing.

Many of the things we think of as Green IT fundamentally relate to efficiency. So, fundamentally, does cloud computing. It’s based on the premise that specialized service providers can deliver computing less expensively and with a
better level of service than small operations that don’t have IT as a core competency.

Efficiency partly relates to the idea that larger scale helps to smooth out demand. If different customers consume computing at different times and in different patterns, the aggregate demand is smoothed out and requires less infrastructure (at consequently reduced cost and power consumption) than would be needed if individual customers were to provision themselves for their own peak loads.

It also suggests an “industrialization” of the datacenter, to use a term applied by Irving Wladawsky-Berger of IBM. By this he means professional, disciplined, and efficient IT management. This suggests things like reducing complexity (in terms of number of applications and platform types) and using the sort of automation tools we discussed earlier. Ultimately, one of the big things that cloud providers sell is the quality of their IT, whether manifested as high service levels, low cost, ability to scale, or some other attribute. And, even for those applications that enterprises decide to continue running internally, cloud providers provide a benchmark of what is possible.

If we talk specifically about software services delivered by an external provider, however, one final aspect of cloud computing is especially interesting in a Green IT context. When we buy software in the form of a service rather than installing it locally (say, SugarCRM On-Demand or Zoho Business), we no longer need to buy, operate, or power a local server. The software is still running someplace, of course: on a service provider’s hardware. The difference is that we are now explicitly paying for all those costs as part of our software subscription. They’re no longer hidden in someone else’s budget.

Thus, cloud computing, as it develops, should help lead to more efficient IT operations and, as a sort of side effect, will make the full costs of running an application more visible.

Conclusion

A new wave of IT is now forming at the intersection of cloud computing, virtualization, and a generalized drive towards simplicity and efficiency. Cloud computing’s aim is to provide software, platforms, and infrastructure in the form of a service from providers optimized to do so at scale. The collective goal of technologies and concepts such as automation, orchestration, and virtualization is to break tight bonds between applications and the underlying physical infrastructure in order to use that infrastructure more efficiently and to change the resources available to those applications on the fly.

Each of these individual trends is important. But some of the biggest wins for IT departments will come when they consider the ways that these trends play off and cross-support each other. And their technology choices, including those that involve open source software, should reflect this.

Open source is clearly a significant part of this new wave. Part of its role is essentially more of the same as open source projects proliferate and mature throughout software ecosystems. Open source benefits that have been important to end-users, such as easy acquisition and trial, ability to modify, and general adherence to standards, are equally (or even more) important to the cloud providers delivering a new generation of software services.

However, “free and open source software” (FOSS) has never been just about viewing and modifying bits. It has introduced new ways of thinking about how we collectively develop, consume, and pay for software. And continuing that thinking about what protections and rights the users of software have and should have may be as important a continuing role for FOSS as the actual operating systems, middleware, and applications built on the open source model.
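As a rough illustration of the demand-smoothing argument made in the efficiency discussion, here is a minimal Python sketch. The three customers and their demand figures are invented purely for illustration and are not drawn from any provider’s data. It compares the capacity needed when each customer provisions for its own peak with the capacity a shared provider needs for the combined peak.

```python
# Hypothetical demand, in servers needed, across four periods of a day.
# All names and numbers are made up for illustration only.
demand = {
    "retailer":   [2, 2, 8, 3],  # busiest at mid-day
    "batch_shop": [9, 3, 2, 2],  # busiest overnight
    "media_site": [2, 3, 3, 9],  # busiest in the evening
}


def sum_of_peaks(profiles):
    """Capacity needed if every customer provisions for its own peak."""
    return sum(max(profile) for profile in profiles.values())


def peak_of_sum(profiles):
    """Capacity needed if one provider serves the combined load."""
    combined = [sum(period) for period in zip(*profiles.values())]
    return max(combined)


separate = sum_of_peaks(demand)
shared = peak_of_sum(demand)
print(f"provisioned separately: {separate} servers")
print(f"provisioned as a shared pool: {shared} servers")
```

With these made-up numbers, separate provisioning needs 26 servers while the shared pool needs 14. The gap shrinks as customers’ peaks coincide, which is why the argument depends on demand arriving at different times and in different patterns.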