2. What is DCOS?
• Some have declared that “the datacenter is the new computer”
• Claim: this new computer increasingly needs an operating system
• Not necessarily a new host OS, but a common software layer that manages resources and provides shared services for the whole datacenter, like an OS does for one host
3. Why Datacenters Need an OS
• Growing number of applications
– Parallel processing systems: MapReduce, Dryad, Pregel, Percolator, Dremel, MR Online
– Storage systems: GFS, BigTable, Dynamo, SCADS
– Web apps and supporting services
• Growing number of users
– 200+ users of Facebook’s Hadoop data warehouse, running near-interactive ad hoc queries
4. What Operating Systems Provide
• Resource sharing across applications & users
• Data sharing between programs
• Programming abstractions (e.g. threads, IPC)
• Debugging facilities (e.g. ptrace, gdb)
Result: OSes enable a highly interoperable software ecosystem that we now take for granted
5. Today’s Datacenter OS
• Hadoop MapReduce as common execution and resource sharing platform
• Hadoop InputFormat API for data sharing
• Abstractions for productivity programmers, but not for system builders
• Very challenging to debug across all the layers
6. Tomorrow’s Datacenter OS
• Resource sharing:
– Lower-level interfaces for fine-grained sharing (Mesos is a first step in this direction)
– Optimization for a variety of metrics (e.g. energy)
– Integration with network scheduling mechanisms (e.g. Seawall [NSDI ’11], NOX, Orchestra)
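Mesos-style fine-grained sharing works through two-level scheduling: the master decides which framework sees each resource offer, and the framework decides whether to use it. The sketch below is a toy illustration of that idea; the class and method names (`Master`, `Framework`, `Offer`, `on_offer`) are hypothetical and not the real Mesos API.

```python
# Toy sketch of two-level scheduling via resource offers (Mesos-like idea).
# All names here are illustrative, not the real Mesos interfaces.
from dataclasses import dataclass

@dataclass
class Offer:
    node: str
    cpus: int
    mem_gb: int

class Framework:
    def __init__(self, name, cpus_needed):
        self.name = name
        self.cpus_needed = cpus_needed
        self.launched = []

    def on_offer(self, offer):
        # Second-level decision: the framework, not the master,
        # decides which offered resources to accept.
        if offer.cpus >= self.cpus_needed:
            self.launched.append(offer.node)
            return True   # accept the offer
        return False      # decline; the master can re-offer elsewhere

class Master:
    def __init__(self, offers):
        self.offers = offers

    def schedule(self, frameworks):
        # First-level decision: the master chooses which framework
        # sees each offer (simple round-robin here).
        for i, offer in enumerate(self.offers):
            frameworks[i % len(frameworks)].on_offer(offer)

offers = [Offer("node1", 4, 8), Offer("node2", 2, 4)]
spark, mr = Framework("spark", 4), Framework("mapreduce", 2)
Master(offers).schedule([spark, mr])
print(spark.launched, mr.launched)  # ['node1'] ['node2']
```

The point of the split is that the master stays simple and framework-agnostic, while each framework keeps its own scheduling logic (data locality, gang scheduling, etc.).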
7. Tomorrow’s Datacenter OS
• Data sharing:
– Standard interfaces for cluster file systems, key-value stores, etc.
– In-memory data sharing (e.g. Spark, DFS cache), and a unified system to manage this memory
– Streaming data abstractions (analogous to pipes)
– Lineage instead of replication for reliability (RDDs)
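The lineage idea behind RDDs: instead of replicating every partition, remember how it was derived and recompute it on loss. A minimal sketch, where the `Dataset` class is illustrative and not Spark's actual API:

```python
# Minimal sketch of lineage-based recovery in the spirit of RDDs:
# a lost or unmaterialized partition is rebuilt from its parent,
# rather than fetched from a replica.
class Dataset:
    def __init__(self, partitions=None, parent=None, fn=None):
        self._parts = partitions   # cached partition data (may be absent)
        self.parent = parent       # lineage: where this data came from
        self.fn = fn               # lineage: how to derive it

    def map(self, fn):
        # Transformations are lazy: we record lineage, not data.
        return Dataset(parent=self, fn=fn)

    def partition(self, i):
        if self._parts is not None:
            return self._parts[i]
        # Not materialized: recompute this partition from lineage.
        return [self.fn(x) for x in self.parent.partition(i)]

base = Dataset(partitions=[[1, 2], [3, 4]])
squared = base.map(lambda x: x * x)
print(squared.partition(1))  # recomputed from lineage -> [9, 16]
```

Because each partition records a deterministic derivation, recovery costs recomputation time instead of the storage and bandwidth of replication.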
8. Tomorrow’s Datacenter OS
• Programming abstractions:
– Tools that can be used to build the next MapReduce / BigTable in a week (e.g. BOOM)
– Efficient implementations of communication primitives (e.g. shuffle, broadcast)
– New distributed programming models
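As a concrete example of a communication primitive, a shuffle redistributes key-value pairs from map outputs into reduce partitions so that all values for a key land together. The sketch below shows the logic only; real implementations are about making the transfer itself efficient (`shuffle` and its arguments here are hypothetical names):

```python
# Toy shuffle primitive: hash-partition key-value pairs from M map
# outputs into R reduce partitions, grouping values by key.
from collections import defaultdict

def shuffle(map_outputs, num_reducers):
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for output in map_outputs:
        for key, value in output:
            # Hash partitioning: a given key always lands on the
            # same reducer, so its values end up grouped together.
            partitions[hash(key) % num_reducers][key].append(value)
    return partitions

map_outputs = [[("a", 1), ("b", 2)], [("a", 3), ("c", 4)]]
parts = shuffle(map_outputs, num_reducers=2)
# every key appears in exactly one partition; "a" carries values [1, 3]
```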
9. Tomorrow’s Datacenter OS
• Debugging facilities:
– Tracing and debugging tools that work across the cluster software stack (e.g. X-Trace, Dapper)
– Replay debugging that takes advantage of limited languages / computational models
– Unified monitoring infrastructure and APIs
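The core mechanism behind X-Trace/Dapper-style cross-stack tracing is simple: mint a trace id at the top of the stack, propagate it across every layer boundary, and record each layer's events against it. A minimal sketch with hypothetical layer names (`frontend`, `query`, `storage`):

```python
# Minimal sketch of Dapper/X-Trace-style tracing: one trace id
# follows a request through every layer of the stack.
import uuid

TRACE_LOG = []   # in a real system, a separate collector service

def record(trace_id, layer, event):
    TRACE_LOG.append((trace_id, layer, event))

def storage_read(trace_id, key):
    record(trace_id, "storage", f"read {key}")
    return f"value-of-{key}"

def query_engine(trace_id, key):
    record(trace_id, "query", f"lookup {key}")
    # The trace id crosses the layer boundary with the call.
    return storage_read(trace_id, key)

def handle_request(key):
    trace_id = uuid.uuid4().hex   # minted once, at the top of the stack
    record(trace_id, "frontend", f"request {key}")
    return trace_id, query_engine(trace_id, key)

tid, result = handle_request("user42")
spans = [(layer, ev) for t, layer, ev in TRACE_LOG if t == tid]
print(spans)  # frontend -> query -> storage, all under one trace id
```

With the id threaded through, one query can be followed end to end without logging into individual nodes, which is exactly the debugging gap the slide points at.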
10. Putting It All Together
• A successful datacenter OS might let users:
– Build a Hadoop-like software stack in a week using the OS’s abstractions, while gaining other benefits (e.g. cross-stack replay debugging)
– Share data efficiently between independently developed programming models and applications
– Understand cluster behavior without having to log into individual nodes
– Dynamically share the cluster with other users
11. Future of DCOS
• Focus on paradigms, not only performance
– Industry is spending a lot of time on performance
• Explore clean-slate approaches
– Much datacenter software is written from scratch
– People using Erlang, Scala, functional models (MR)
• Bring cluster computing to non-experts
– Most impactful (datacenter as the new workstation)
– Hard to make a Google-scale stack usable without a Google-scale ops team
12. Conclusion
• Datacenters need an OS-like software stack for the same reasons single computers did: manageability, efficiency & programmability
• An OS is already emerging in an ad hoc way
• Researchers can help by taking a long-term approach to these problems
Editor’s notes
Doesn’t have to be a host OS, but rather a software stack that performs the same functions as the host OS on a single computer
Point out that apps are developed independently and assume they have dedicated (slices of) machines
Go back to DC being the new computer
Mention lower level storage interfaces such as block store
Note about how it may be easier to have impact here than in a “real” OS