Overview of the OpenStack deployment at the MIT Computer Science and Artificial Intelligence Lab, with a focus on specific research use cases. Delivered at the OpenStack Hong Kong Summit, November 2013.
MIT/CSAIL OpenStack Use Cases - Hong Kong 2013
1.
2. CSAIL
Computer Science and Artificial Intelligence Laboratory
● Largest Lab at MIT
● 50 Year Legacy
● 107 Principal Investigators
● 1,033 Members Total
● ??? Active Research Projects
5. So Who's Using It?
● 114 Users
● 38 Projects
● 468,866 Instances (lifetime total)
● 6,303,639 vCPU Hours
● As of 25 October ...
● ALFA: http://groups.csail.mit.edu/EVO-DesignOpt
● TREC-KBA: http://trec-kba.org
● Jigsaw: http://people.csail.mit.edu/sanchez
● NMS: http://nms.csail.mit.edu
● LIS: http://lis.csail.mit.edu
● Julia: http://julialang.org/
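The usage totals on this slide can be turned into rough averages. The following is a minimal sketch that uses only the four figures quoted above; the variable names are illustrative, and simple averages say nothing about how usage is actually distributed across users or projects.

```python
# Figures quoted on the "So Who's Using It?" slide.
users = 114
projects = 38
instances = 468_866        # lifetime total
vcpu_hours = 6_303_639

# Simple averages; real usage is unlikely to be this evenly spread.
vcpu_hours_per_instance = vcpu_hours / instances
instances_per_user = instances / users
users_per_project = users / projects

print(f"vCPU-hours per instance: {vcpu_hours_per_instance:.2f}")
print(f"instances per user:      {instances_per_user:.0f}")
print(f"users per project:       {users_per_project:.1f}")
```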
6. ALFA
Anyscale Learning For All
● Scalable Evolutionary Algorithms
● Machine Learning
● Frameworks for knowledge mining, prediction, analytics and optimization
http://groups.csail.mit.edu/EVO-DesignOpt/groupWebSite/
7. TREC-KBA
Text Retrieval Conference's Knowledge Base Acceleration competition
● MIT is the Organizing Team
● Content Stream
  – 462M Texts, 40% English
  – 4,973 hourly chunks of 100k doc/hr stream
  – News, blogs, forums
http://trec-kba.org
8. Ubik
service guarantees for latency-critical workloads
● Hardware & Software System Simulation
● ~ 1 Quadrillion Instructions
● Saw 2x - 3x speed up using cloud in addition to dedicated hardware
http://people.csail.mit.edu/sanchez
9. NMS
Network & Mobile Systems
● Student contest for congestion control algorithms
● Machine learning to get computers to design congestion control algorithms
● Possible future work on intelligent VM placement in clouds
http://nms.csail.mit.edu
12. Julia
● VHLL for Parallel Programming
● Now MIT Licensed Community project
● Serving IJulia cluster used for research and teaching
http://julialang.org
13. Future Work
● Infrastructure as Research Platform
● Internet Facing Applications
● Consolidate Existing Virtualization
● “One Cloud Per Student” ?
● Bare Metal ?
● GPUs ?
● “Cloud Desktops” ?
Editor's Notes
Intro
Sr. Tech. Architect
Charged with ensuring researchers have the “Infrastructure of the Future”
Also ensure today's architecture is deployed properly and yesterday's architecture doesn't collapse...
OpenStack running just over 1yr fits in as “tomorrow” (between today & future)
Approx 50 groups in diverse areas
Robotics
Machine Learning
Biomedical
Computer Architecture
Etc...
All solving different problems using different methods
Highly Complex
Highly Diverse
Incredibly Open Environment
This one is just as read; about 30 sec on what we set up
Again just 30sec on recent changes
Significant change in use since switching to overcommit, though we rarely reach 2:1 on any node and are very nearly 1:1 overall (have seen peaks at 3:1)
Larger multicore instance types and longer runtimes
Pre-overcommit:
Avg runtime 3:45
Avg vCPU 1.8
Post-overcommit:
Avg runtime 39:20
Avg vCPU 4.2
It is not clear if this is causal...
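The before and after averages above can be compared directly. The sketch below assumes both runtimes use the same two-field format (the notes do not say whether it is hours:minutes or minutes:seconds, and the growth factor does not depend on which). The `overcommit_ratio` helper and the 48-vCPU, 24-core node are hypothetical, included only to show what a 2:1 ratio means.

```python
def to_units(duration):
    """Convert an 'A:BB' duration into the smaller unit (A * 60 + BB)."""
    major, minor = duration.split(":")
    return int(major) * 60 + int(minor)


def overcommit_ratio(allocated_vcpus, physical_cores):
    """vCPUs handed out to instances per physical core on one node."""
    return allocated_vcpus / physical_cores


# Averages quoted in the notes.
pre = {"runtime": "3:45", "vcpu": 1.8}
post = {"runtime": "39:20", "vcpu": 4.2}

runtime_growth = to_units(post["runtime"]) / to_units(pre["runtime"])
vcpu_growth = post["vcpu"] / pre["vcpu"]

print(f"average runtime grew {runtime_growth:.2f}x")
print(f"average vCPU count grew {vcpu_growth:.2f}x")

# Hypothetical node: 48 vCPUs allocated on 24 physical cores.
print(f"example node overcommit: {overcommit_ratio(48, 24):.0f}:1")
```

On these figures the average runtime grew roughly tenfold and the average instance size a little more than doubled, which is the shift the notes describe without claiming a cause.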
ALFA: Scalable evolutionary algorithms, machine learning, and frameworks for knowledge mining, prediction, analytics and optimization.
TREC-KBA: the MIT team organizing the 2013 Text Retrieval Conference's Knowledge Base Acceleration competition which seeks to help humans expand knowledge bases like Wikipedia by automatically recommending edits based on incoming content streams. This open evaluation measures an automatic system's ability to filter a large stream of text for new knowledge about entities.
Ubik (sanchez): HW simulation. What is it? Simulated ~1 quadrillion instructions in total, and at its peak OpenStack doubled to tripled the capacity.
Ubik proposes new hardware and software techniques to achieve systems that provide very strict quality of service guarantees for latency-critical workloads, and high throughput for batch workloads. The main motivation is that datacenters burn terawatt-hours, but servers, which make up the bulk of datacenter power, are run at 10-15% of capacity to guarantee QoS for critical services. At the same time, datacenters have a lot of non-critical computing (e.g., MapReduce, overcommitted OpenStack VMs, etc.). In Ubik, we're developing a number of techniques that enable colocating both types of compute in the same system, sharing resources between batch and latency-critical workloads to achieve maximum utilization of CPU, memory, etc., while safely protecting latency-critical workloads from any noticeable degradation.
NMS: used the cluster to host a contest for students in 6.829 (MIT's graduate networking class) to develop the best congestion control algorithms by running the students' algorithms on pre-recorded traces of cellular networks. Also using it for heavy computation, running big machine learning problems to try to get computers to design new congestion control algorithms.
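Running a congestion control algorithm against a pre-recorded trace can be pictured roughly as follows. This is a toy sketch only: the trace values, the `replay` function, the two example controllers and the throughput/delay scoring are all hypothetical and do not reflect the actual 6.829 contest harness or its scoring.

```python
from collections import deque


def replay(trace, controller):
    """Replay a capacity trace against a sending policy.

    trace: packets the link can deliver in each tick (made-up values).
    controller: callable(queue_length) -> packets to send this tick.
    Returns (packets_delivered, mean_queueing_delay_in_ticks).
    """
    queue = deque()      # holds the tick at which each waiting packet was sent
    delivered = 0
    total_delay = 0
    for tick, capacity in enumerate(trace):
        # The sender enqueues as many packets as its policy allows.
        for _ in range(controller(len(queue))):
            queue.append(tick)
        # The link drains up to `capacity` packets in FIFO order.
        for _ in range(min(capacity, len(queue))):
            total_delay += tick - queue.popleft()
            delivered += 1
    mean_delay = total_delay / delivered if delivered else 0.0
    return delivered, mean_delay


def fixed_rate(rate):
    """Always send `rate` packets per tick, ignoring the queue."""
    return lambda queue_length: rate


def queue_aware(target):
    """Send only enough to top the queue up to `target` packets."""
    return lambda queue_length: max(0, target - queue_length)


trace = [3, 0, 1, 4, 0, 2, 5, 1]   # hypothetical per-tick link capacity

for name, policy in [("fixed", fixed_rate(2)), ("queue-aware", queue_aware(2))]:
    packets, delay = replay(trace, policy)
    print(f"{name}: delivered={packets} mean_delay={delay:.2f}")
```

On this toy trace the fixed-rate sender delivers more packets but with a higher average queueing delay, which is the kind of throughput versus delay trade-off a contest of this sort has to score.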
LIS: Learning & Intelligent Systems conducts interdisciplinary research aimed at discovering the principles underlying the design of artificially intelligent robots.
Julia: Julia is a very high level language (VHLL) for parallel computing and is now an MIT-licensed open source project; this is the group at MIT that originally developed it.
https://ijulia.csail.mit.edu:8000 Cert protected frontend
Alan's apparently using this in 18.06 & possibly other classes http://web.mit.edu/18.06/www
NMS – intelligent placement of VMs to reduce network congestion in the data center
Scaling Nota Bene for use with edX (http://nb.mit.edu), and DetectMe, a new project similar to http://labelme.csail.mit.edu
Consolidation isn't “exciting” but is a metric of stability and manageability
OCPS (a bit of a play on OLPC): the idea has been around longer than our cloud. In this case each lab member would get a moderate quota allocation, independent of any particular project, just to hack around and do cool stuff.
Factors preventing some users from taking advantage: the need to access low-level hardware and special coprocessing requirements.
Cloud Desktops: an old idea coming round again, a thin client accessible from anywhere. Suggestion from the community; testing various implementations. Will it be useful? Don't know...