2. PC to Internet
• computing and storage are moving from
clients to Internet services
• e.g. email, photos, video, and office
applications...
• users can run these applications at low cost
3. Toward WSC
• WSCs are distinguished by the massive scale
of their software infrastructure, data
repositories, and hardware platform
• a departure from the view of a program as
something that runs on a single machine
• the program is an entire Internet service
4. Overview of WSC
• run a smaller number of very large
applications (or Internet services) than
traditional datacenters
• must be designed to gracefully tolerate
large numbers of component faults
• building and operating a large computing
platform is expensive
5. Cost Efficiency
• building and operating a large computing
platform is expensive
• cost efficiency must be defined broadly to
account for all the components of cost
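One way to make "all the components of cost" concrete is a monthly total-cost-of-ownership (TCO) sum. The sketch below is a minimal illustration under stated assumptions: the `CostComponents` fields, the straight-line amortization, and the example figures are all hypothetical, chosen only to show that capital costs (servers, facility) and operating costs (energy, operations) belong in the same equation.

```python
from dataclasses import dataclass


@dataclass
class CostComponents:
    """Hypothetical cost inputs for one WSC, in dollars (illustrative only)."""
    server_capex: float              # purchase price of all servers
    server_lifetime_months: int      # amortization period for servers
    facility_capex: float            # building, power distribution, cooling
    facility_lifetime_months: int    # amortization period for the facility
    energy_per_month: float          # electricity bill
    operations_per_month: float      # staff, repairs, networking, licences


def monthly_tco(c: CostComponents) -> float:
    """Monthly total cost of ownership: straight-line amortized capital
    expenses plus recurring operating expenses."""
    amortized_capex = (c.server_capex / c.server_lifetime_months
                       + c.facility_capex / c.facility_lifetime_months)
    opex = c.energy_per_month + c.operations_per_month
    return amortized_capex + opex


if __name__ == "__main__":
    example = CostComponents(36_000, 36, 120_000, 120, 500, 500)
    print(monthly_tco(example))  # 3000.0
```

In this made-up example, amortized servers, amortized facility, and operating costs each contribute a third, so optimizing only the server purchase price would miss most of the bill.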
6. Example: Web Search
• the computing needs of a Web search system are driven by the following:
1. Increasing popularity means higher request
loads
2. adding about 1,000,000 pages/day means an ever-larger index to build
3. most substantial improvements in result quality demand
additional computing resources for every request
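The three drivers above multiply rather than add. A minimal capacity sketch, assuming hypothetical parameters (`cpu_ms_per_query`, `target_utilization`) that are not from the slides: request load scales queries per second, while a larger index and better-quality ranking are both assumed to raise the CPU time spent per query.

```python
import math


def servers_needed(queries_per_sec: float, cpu_ms_per_query: float,
                   cores_per_server: int, target_utilization: float = 0.5) -> int:
    """Estimate servers required to serve a query load (hypothetical model).

    queries_per_sec grows with popularity (driver 1); cpu_ms_per_query is
    assumed to grow with index size (driver 2) and with quality
    improvements (driver 3). target_utilization leaves headroom for peaks.
    """
    busy_cores = queries_per_sec * cpu_ms_per_query / 1000.0  # cores kept busy on average
    usable_cores_per_server = cores_per_server * target_utilization
    return math.ceil(busy_cores / usable_cores_per_server)


if __name__ == "__main__":
    print(servers_needed(1000, 100, 16))  # baseline
    print(servers_needed(2000, 100, 16))  # twice the request load
    print(servers_needed(1000, 200, 16))  # twice the work per query
```

Doubling either the load or the per-query work roughly doubles the fleet, which is why growth in popularity, data size, and quality together push search toward warehouse scale.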
7. What Is a WSC?
• the computer is the large cluster or
aggregation of servers itself, treated as a
single computing unit
• WSCs have an additional layer of
complexity beyond systems consisting of
individual servers or small groups of servers
• this introduces new challenges to programmer
productivity, perhaps greater than those of
programming multicore systems
8. Why WSC?
• a rack with 40 servers, each with four
8-core dual-threaded CPUs, would contain
more than two thousand hardware threads
• many organizations will soon be able to afford
similarly sized computers at a much lower
cost
• experience with WSCs will be useful for
programming these ubiquitous next-generation
machines
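The thread count in the rack example above is a straight product. A small sketch to check the arithmetic; the function name is hypothetical and the inputs are the slide's own figures.

```python
def hardware_threads(servers: int, cpus_per_server: int,
                     cores_per_cpu: int, threads_per_core: int) -> int:
    """Total hardware threads = servers x CPUs x cores x threads per core."""
    return servers * cpus_per_server * cores_per_cpu * threads_per_core


if __name__ == "__main__":
    # The rack described above: 40 servers, four 8-core CPUs each, 2 threads per core.
    print(hardware_threads(40, 4, 8, 2))  # 2560
```

40 × 4 × 8 × 2 = 2,560, consistent with "more than two thousand" hardware threads in a single rack.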
9. Architecture
Typical elements in warehouse-scale systems: 1U server (left), 7-foot
rack with Ethernet switch (middle), and diagram of a small cluster
with a cluster-level Ethernet switch/router (right)
10. Architecture - Storage
• should disks be connected directly to each
individual server (managed by a distributed
file system such as GFS) or be part of
Network Attached Storage (NAS)?
• NAS provides extra reliability
• GFS implements replication across different
machines
• some WSCs deploy cheaper desktop-class disk
drives instead of enterprise-grade disks
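Replication across machines, as GFS does, is what lets cheaper disks be tolerated: losing one disk or server does not lose the data. The sketch below is a hypothetical placement policy, not GFS's actual algorithm: it assumes machine names are unique, spreads replicas over distinct racks first, and only reuses a rack when there are fewer racks than replicas.

```python
import random
from typing import Dict, List, Optional


def place_replicas(machines_by_rack: Dict[str, List[str]], n_replicas: int = 3,
                   rng: Optional[random.Random] = None) -> List[str]:
    """Pick n_replicas distinct machines, spreading them across racks first
    so that a single machine or rack failure cannot lose every copy."""
    if rng is None:
        rng = random.Random()
    racks = [r for r, ms in machines_by_rack.items() if ms]  # skip empty racks
    rng.shuffle(racks)

    chosen: List[str] = []
    # First pass: at most one replica per rack.
    for rack in racks[:n_replicas]:
        chosen.append(rng.choice(machines_by_rack[rack]))

    # Second pass (fewer racks than replicas): fill with other distinct machines.
    if len(chosen) < n_replicas:
        remaining = [m for r in racks for m in machines_by_rack[r] if m not in chosen]
        rng.shuffle(remaining)
        chosen.extend(remaining[:n_replicas - len(chosen)])

    if len(chosen) < n_replicas:
        raise ValueError("not enough machines for the requested replication factor")
    return chosen
```

Spreading across racks costs extra inter-rack bandwidth on every write, which is the price paid for surviving a rack-level failure.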
11. Architecture - Networking Fabric
• network design trades off speed, scale, and cost
• a switch that has 10 times the bisection
bandwidth costs about 100 times as much
• intra-rack connectivity is often cheaper
than inter-rack connectivity
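Because high-bisection switches are so expensive, rack uplinks are typically provisioned with less bandwidth than the servers beneath them could generate. A minimal sketch of that ratio, assuming hypothetical numbers (40 servers, 1 Gbps server links, 8 Gbps of rack uplink) that are not from the slides:

```python
def oversubscription_ratio(servers_per_rack: int, server_link_gbps: float,
                           uplink_gbps: float) -> float:
    """Ratio of aggregate server bandwidth to the rack's uplink bandwidth."""
    return servers_per_rack * server_link_gbps / uplink_gbps


def off_rack_gbps_per_server(servers_per_rack: int, server_link_gbps: float,
                             uplink_gbps: float) -> float:
    """Bandwidth each server gets to other racks when all of them send at once."""
    return min(server_link_gbps, uplink_gbps / servers_per_rack)


if __name__ == "__main__":
    print(oversubscription_ratio(40, 1.0, 8.0))     # 5.0
    print(off_rack_gbps_per_server(40, 1.0, 8.0))   # 0.2
```

Within the rack each server can use its full link, but off-rack traffic shares the uplink, which is one concrete reason intra-rack connectivity is cheaper than inter-rack connectivity.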
12. Architecture - Storage Hierarchy
• DRAM and disk resources within the rack are
accessible through the first-level rack switches
• all resources in the racks are accessible via the
cluster-level switch
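The two-level hierarchy above determines how many switches an access crosses. A small sketch, assuming a hypothetical `Location(rack, server)` addressing scheme and exactly two switch levels:

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Location:
    """Hypothetical address of a DRAM or disk resource."""
    rack: str
    server: str


def access_path(src: Location, dst: Location) -> List[str]:
    """Switches an access from src to dst must cross in a two-level hierarchy."""
    if src == dst:
        return []                                          # local: no network hop
    if src.rack == dst.rack:
        return ["rack switch"]                             # same rack: first level only
    return ["rack switch", "cluster switch", "rack switch"]  # different racks


if __name__ == "__main__":
    a = Location("rack-1", "s1")
    print(access_path(a, Location("rack-1", "s2")))  # ['rack switch']
```

Each additional switch level adds latency and passes through a more heavily shared (and more expensive) link.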
14. Latency, Bandwidth, Capacity
• the differences in latency, bandwidth, and capacity
between local, rack-level, and cluster-level resources
are much larger than those seen on a single machine
• a key challenge for architects of WSCs is to
smooth out these discrepancies in a cost-efficient
manner
• a key challenge for software architects is to build
cluster infrastructure and services that hide most
of this complexity from application developers
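A first-order way to reason about these discrepancies is to model an access as a fixed latency plus a size-dependent transfer time. This is a simplified sketch: the model ignores queuing and contention, and the numbers in the usage block are placeholders for illustration, not measurements of any real system.

```python
def access_time_s(size_bytes: float, latency_s: float,
                  bandwidth_bytes_per_s: float) -> float:
    """Simple model: fixed access latency plus size divided by bandwidth."""
    return latency_s + size_bytes / bandwidth_bytes_per_s


if __name__ == "__main__":
    # Placeholder numbers for illustration only, not measurements.
    one_mb = 1_000_000
    near = access_time_s(one_mb, latency_s=1e-6, bandwidth_bytes_per_s=10e9)
    far = access_time_s(one_mb, latency_s=1e-3, bandwidth_bytes_per_s=100e6)
    print(f"far/near slowdown: {far / near:.0f}x")
```

Plugging in measured figures for local, rack-level, and cluster-level DRAM and disk shows which tier dominates a given workload, and why infrastructure software tries to keep hot data close to the computation.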
15. Power Usage
• energy-related costs have become an important
component of the total cost of ownership
• CPUs can no longer be the sole focus of
energy efficiency
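To see why energy matters in total cost of ownership, compare a server's yearly electricity bill with its amortized purchase price. A minimal sketch, assuming straight-line amortization and a hypothetical `overhead_factor` for facility overheads such as cooling; the power figure should be whole-server power (CPU plus memory, disks, fans), not CPU power alone.

```python
HOURS_PER_YEAR = 24 * 365


def annual_energy_cost(avg_power_watts: float, price_per_kwh: float,
                       overhead_factor: float = 1.0) -> float:
    """Yearly electricity cost of a server.

    overhead_factor (hypothetical) scales IT power to include facility
    overheads such as cooling and power conversion; 1.0 means none.
    """
    kwh_per_year = avg_power_watts / 1000.0 * HOURS_PER_YEAR * overhead_factor
    return kwh_per_year * price_per_kwh


def energy_share_of_tco(server_price: float, lifetime_years: float,
                        annual_energy: float) -> float:
    """Fraction of a server's yearly cost that goes to energy, assuming
    straight-line amortization of the purchase price."""
    amortized = server_price / lifetime_years
    return annual_energy / (amortized + annual_energy)
```

With hypothetical inputs of 1,000 W average draw at $0.10/kWh, the server uses 8,760 kWh per year, about $876; against a $2,000 server amortized over 4 years ($500/year), energy would be well over half the yearly cost.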
16. Handling Failures
• the sheer scale of WSCs requires that
Internet services software tolerate
relatively high component fault rates
• an application running across thousands of
machines may need to react to failure
conditions on an hourly basis
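The "hourly" claim follows from simple failure-rate arithmetic. A sketch assuming independent machines with a constant failure rate (an exponential model, which is a simplification); the fleet size and MTBF below are hypothetical.

```python
import math


def expected_failures_per_hour(n_machines: int, mtbf_hours: float) -> float:
    """Mean failures per hour, assuming independent machines with a constant
    failure rate of 1 / mtbf_hours each."""
    return n_machines / mtbf_hours


def prob_any_failure(n_machines: int, mtbf_hours: float,
                     window_hours: float = 1.0) -> float:
    """Probability that at least one machine fails within the window
    (Poisson model)."""
    rate = expected_failures_per_hour(n_machines, mtbf_hours)
    return 1.0 - math.exp(-rate * window_hours)
```

For example, 10,000 machines that each fail on average once every 10,000 hours (about 14 months) yield about one failure per hour, and roughly a 63% chance of at least one failure in any given hour, so recovery must be automatic rather than an exceptional event.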