Challenges and Issues of Next
Cloud Computing Platforms
Frédéric Desprez
Frederic.Desprez@inria.fr
Labex UCN@Sophia – Feb. 18th 2016
Labex UCN@Sophia – F. Desprez
Acknowledgements
Gabriel Antoniu Inria (Rennes, Kerdata)
Olivier Beaumont Inria (Bordeaux, CEPAGE)
Alexandru Costan Inria (Rennes, Kerdata)
Thierry Coupaye Orange Labs Grenoble
Paulo Goncalvez Inria (Lyon, Dante)
Shadi Ibrahim Inria (Rennes, Kerdata)
Kate Keahey Argonne National Lab
Cristian Klein Umeå University, Sweden
Adrien Lèbre Inria and Ecole des Mines de Nantes (Ascola)
Laurent Lefèvre Inria (Lyon, Avalon)
Ignacio Llorente Complutense University of Madrid, Spain
Christine Morin Inria (Rennes, Myriads)
Martin Quinson ENS (Rennes, Myriads)
David Margery Inria (Rennes, Myriads)
Anne-Cécile Orgerie CNRS (Rennes, Myriads)
Manish Parashar Rutgers University
Christian Perez Inria (Lyon, Avalon)
Thierry Priol Inria (Rennes, Myriads)
Jonathan Rouzaud-Cornabas Insa (Lyon, Beagle)
Frédéric Suter CNRS/IN2P3 (Lyon, Avalon)
Patrick Valduriez Inria (Montpellier, Zenith)
Rich Wolsky University of California Santa Barbara, USA
Outline
• Introduction and Context
• Energy Issues
• Distributed Clouds
• Big Data
• Other issues
• Conclusions
INTRODUCTION AND CONTEXT
Context
Cloud computing has emerged as a “new” paradigm for many commercial
and scientific venues
• Starting to be widely adopted by industry
• Many platforms and infrastructures available around the world
• Several offers for IaaS, PaaS, and SaaS platforms
• Public, private, community, and hybrid clouds
… But many applications remain that could benefit from such platforms
Several issues still need to be (better) addressed
• Elasticity, availability, self-configuration, heterogeneous computing and storage
capacities
• Several challenges remain to be addressed and transferred into industrial
products
• Energy management
• New applications (IoT)
Clouds Essential Characteristics
• On-demand service
 No human interaction needed to get access to storage and computation resources (Utility
Computing)
• Access through large scale networks
 Access to resources through networks from lightweight and heavy-weight clients (WAN, LAN,
Wireless)
• Resource Pooling
 Resources (CPU, storage, memory, network) are drawn from datacenters with (almost) no
notion of locality
• Elasticity
 Resources can be allocated and freed in an elastic fashion based on the application needs (with an
"infinite" capacity)
• Measured service
 Possibility to monitor resource usage
• Pros
 Availability and extensibility
 Dynamicity
 Fault tolerance
 Resource mutualization
• Cons
 Heterogeneity
 No locality
 Application porting
 Security?
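Elasticity and measured service together form a control loop: monitor resource usage, then grow or shrink the allocation. A toy threshold-based rule illustrates the idea (the thresholds and limits are invented for illustration, not from the talk):

```python
def autoscale(current_vms, cpu_utilization, low=0.3, high=0.8,
              min_vms=1, max_vms=100):
    """Toy elasticity rule: scale out when measured utilization is high,
    scale in when it is low (purely illustrative thresholds)."""
    if cpu_utilization > high and current_vms < max_vms:
        return current_vms + 1          # scale out
    if cpu_utilization < low and current_vms > min_vms:
        return current_vms - 1          # scale in
    return current_vms                  # steady state

print(autoscale(4, 0.9))  # 5
print(autoscale(4, 0.1))  # 3
```

Real autoscalers add cooldown periods and predictive models, but the measured-service/elasticity coupling is the same.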
Transparency is the Key
“I don't care if my cloud computing architecture is powered by a grid, a mainframe, my
neighbour's desktop or an army of monkeys, so long as it's fast, cheap and secure.”
Sam Johnston, Sept. 2008
Research Issues
• Explosion in the amount of research work around Clouds and virtualization!
• Some research challenges
• Energy
• Service composition
• Service Level Agreement (SLA)
• Security
• Fault tolerance and recovery
• Infrastructure management
• Elastic management of resources
• (Big) Data management
• Seamless access to hybrid platforms
• Multi-clouds, Sky computing, federations, infrastructure distribution, edge computing
• New models
• economic, energy
• Application design and description
• New languages, new models
• Simulation and experimentation
• ...
ENERGY ISSUES
Laurent Lefèvre’s team in Avalon (LIP/ENS Lyon & INRIA)
Electrical consumption of ICT, 2013 (source: gwatt.net)
• Devices
• Telecommunication networks = 83 GW
Improving Energy Efficiency of Cloud Infrastructures
• Understanding the energy usage of large scale systems mixing virtual instances
of applications, physical IT resources, and physical infrastructures remains a
real challenge.
• How to profile the energy consumption of large sets of virtual machines (generic metrics,
benchmarks, and energy models)
• Analysis tools and frameworks to support large-scale, energy-efficient management of
resources
• Optimizing the energy consumption of distributed infrastructures and service
compositions in the presence of ever more dynamic service applications
• Use of renewable energies
• Exploring the trade-off between energy savings and performance in large-scale
distributed systems
• Energy efficiency of storage systems and networks
Energy Efficiency: with or without knowledge of applications
and services?
• Exploring 2 different approaches
• With knowledge on the application and services
• Enable the user to choose the least energy-consuming implementation of services
 Estimate the energy consumption of the different implementations
(protocols) of each service
• Without knowledge
• Allow some intelligence to reduce the energy usage
 Autonomically estimate the energy consumption of the HPC system in
order to apply green levers
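The first approach boils down to ranking candidate implementations by their estimated energy and picking the cheapest. A minimal sketch, where the protocol names and joule figures are invented placeholders (a real estimator would come from calibration, as in ECOFIT):

```python
def pick_greenest(estimates):
    """Return the service implementation with the lowest estimated
    energy consumption (joules). Illustrative only."""
    return min(estimates, key=estimates.get)

# Hypothetical per-protocol energy estimates for one resilience service
checkpoint_protocols = {
    "coordinated": 1200.0,
    "uncoordinated": 950.0,
    "hierarchical": 1100.0,
}
print(pick_greenest(checkpoint_protocols))  # uncoordinated
```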
Improving EE with application expertise
• Considered services: resilience & data broadcasting
• 4 steps
• Service analysis, Measurements, Calibration, Estimation
• Helping users make the right choices depending on context and
parameters
M. Diouri, O. Glück, L. Lefèvre, and Franck Cappello. "ECOFIT: A Framework to Estimate Energy Consumption of Fault Tolerance Protocols during HPC executions",
CCGrid2013, the 13th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, Delft, the Netherlands, May 13-16, 2013
Without knowledge of applications and services?
• HPC applications keep growing in complexity
• HPC applications already contain too many bugs; adding energy management
concerns won’t help 😀
• Are HPC programmers ready for eco-design of applications?
• Applications can share the same infrastructure
• Optimizations made for saving energy considering some applications are likely to
impact the performance of others
• Instead of looking at applications and services ⇒ focus on the
infrastructure
• Detect and characterize system’s runtime behaviors/phases
• Optimize each subsystem (storage, memory, interconnect, CPU) accordingly
• Helping users to find the best service
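A minimal sketch of this infrastructure-side idea: segment a resource-usage trace into phases and attach a power-saving lever to each phase type. The threshold, phase labels, and lever names are illustrative, not from the cited framework:

```python
def detect_phases(cpu_trace, threshold=0.5):
    """Split a CPU-load trace into (label, length) runs:
    'compute' when load >= threshold, 'io' otherwise."""
    phases = []
    for load in cpu_trace:
        label = "compute" if load >= threshold else "io"
        if phases and phases[-1][0] == label:
            phases[-1] = (label, phases[-1][1] + 1)   # extend current phase
        else:
            phases.append((label, 1))                 # start a new phase
    return phases

def lever_for(label):
    # Illustrative mapping from phase type to a green lever
    return "max_freq" if label == "compute" else "scale_down_cpu"

trace = [0.9, 0.95, 0.2, 0.1, 0.8]
print(detect_phases(trace))  # [('compute', 2), ('io', 2), ('compute', 1)]
```

Real phase detection works online on several subsystems (memory, disk, network) at once; this shows only the characterization step.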
Without knowledge on applications
Ghislain Landry Tsafack Chetsa, Laurent Lefevre, Jean-Marc Pierson, Patricia Stolf,
Georges Da-Costa. "Application-Agnostic Framework for Improving the Energy
Efficiency of Multiple HPC Subsystems", PDP2015 : 23rd Euromicro International
Conference on Parallel, Distributed and Network-based Processing, 2015.
• Irregular usage of resources
• Phase detection, characterization
• Power saving modes deployment
Towards Energy Proportionality with Heterogeneous
machines
OBSERVATIONS [Barroso and Hölzle 2007]
Average server utilization between 10 and 50 %
→ Most inefficient region
No proportionality due to high idle consumption
→ Can be up to 50 % of peak power
PROPOSITION
Heterogeneous Infrastructure composed of machines with different characteristics in terms of performance
and energy consumption
• Classical servers → Only used at their most energy efficient region
• Low power processors → Reduce static costs
TECHNICAL CHALLENGES
- Application placement: Dynamically find the most suitable combinations of machines
- Infrastructure reconfiguration: Power On/Off machines at the right time
[Barroso and Hölzle, The Case for Energy Proportional Computing, IEEE Computer, 2007]
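The application-placement challenge above can be caricatured as a search for the machine combination that covers the load at minimum power. The capacity and power numbers below are invented for illustration; the real scheduler must also decide when to power machines on and off:

```python
from itertools import product

# Hypothetical machine profiles: (capacity in req/s, power in watts)
MACHINES = {"BIG": (100, 200.0), "MEDIUM": (40, 100.0), "LITTLE": (10, 30.0)}

def cheapest_combination(load, max_each=5):
    """Brute-force the BIG/MEDIUM/LITTLE counts that cover `load`
    with the lowest total power (toy version of the placement problem)."""
    names = list(MACHINES)
    best = None
    for counts in product(range(max_each + 1), repeat=len(names)):
        cap = sum(n * MACHINES[m][0] for n, m in zip(counts, names))
        if cap < load:
            continue  # this combination cannot serve the load
        power = sum(n * MACHINES[m][1] for n, m in zip(counts, names))
        if best is None or power < best[0]:
            best = (power, dict(zip(names, counts)))
    return best

print(cheapest_combination(50))  # (130.0, {'BIG': 0, 'MEDIUM': 1, 'LITTLE': 1})
```

At low load the small machines win; as the load grows, the BIG servers (used in their efficient region) take over, which is exactly the energy-proportionality argument.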
Towards Energy Proportionality with Heterogeneous
machines
V. Villebonnet, G. Da Costa, L. Lefèvre, J.-M. Pierson, P. Stolf, "Big, Medium, Little: Reaching Energy Proportionality with Heterogeneous Computing Scheduler",
Parallel Processing Letters, 25(03), World Scientific Publishing, 2015.
Towards Energy Proportionality with Heterogeneous machines
Application: Stateless Web Servers
Traces: one day of accesses to the 1998 World Cup website
BIG only
Joules per request: 0.2268
Infrastructure utilization: 40.7%
Number of reconfigurations: 4
BML combination
Joules per request: 0.2155
Infrastructure utilization: 69.7%
Number of reconfigurations: 194
⇒ Infrastructure is dynamically reconfigured
to meet the load demand of the application
→ Energy consumption more proportional to
the load
Virtual Machines and Energy efficient Clouds
Taking into account the energy consumption in the
scheduling process
• Energy and resource usage are highly fluctuating
• Large disparities between similar nodes
→ Decisions need to be proactive, based on recent and
historical activity
How to efficiently assign those tasks?
Combine
• A metric to balance performance and energy
consumption
• An interface to express tradeoffs between
users’ and providers’ requirements
• A manager of energy-related events
Results
• Up to 20% energy savings in real-life
experiments
Daniel Balouek-Thomert, Eddy Caron, Laurent Lefevre, "Energy-Aware Server Provisioning by Introducing Middleware-Level Dynamic Green Scheduling", HPPAC
2015: The 11th Workshop on High-Performance, Power-Aware Computing, May 2015
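The performance/energy balancing metric can be illustrated by a simple weighted score, where the weight encodes the user/provider tradeoff expressed through the interface. Hostnames and measurement values are made up; the paper's actual metric is richer:

```python
def score(perf, energy, alpha=0.5):
    """Weighted tradeoff: normalized performance (higher is better)
    against normalized energy (lower is better). Illustrative only."""
    return alpha * perf - (1 - alpha) * energy

# Candidate hosts with (normalized perf, normalized energy) measurements
hosts = {"node-a": (0.9, 0.8), "node-b": (0.6, 0.3), "node-c": (0.7, 0.5)}

def pick_host(hosts, alpha):
    """Assign the task to the host maximizing the tradeoff score."""
    return max(hosts, key=lambda h: score(*hosts[h], alpha))

print(pick_host(hosts, alpha=0.9))  # node-a: performance dominates
print(pick_host(hosts, alpha=0.1))  # node-b: energy dominates
```

Because energy and usage fluctuate, `alpha` and the measurements would be refreshed from recent monitoring data before each decision.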
Virtual Machines and Energy efficient Clouds
Combining energy with other criteria and constraints for a given problem
• Large spectrum of potential solutions
• NP-Hard problem
Daniel Balouek-Thomert, Arya K. Bhattacharya, Eddy Caron, Karunakar Gadireddy, Laurent Lefèvre, Minimizing energy and makespan concurrently in Cloud
Computing workloads using Multi-Objective Differential Evolution, under review
Genetic Approach
• A model to capture affinities
between tasks and resources
• An algorithm that mimics the
“survival of the fittest”: only
efficient servers are used over
time
• A learning engine that integrates
constraints
Strategies need to be validated in terms of correctness and computation
time
Network is Part of the Story: Dynamic, Energy Efficient,
Network Reconfiguration
Van Heddeghem et al. “Power Consumption Modeling in Optical Multilayer
Networks” PNET 24 (2), 86–102, 2012
Carpa R., Gluck O., Lefevre L. and Mignot J.-C., "Improving the energy
efficiency of software-defined backbone networks", Photonic Network
Communications, vol. 30(3), p. 337-347, 2015.
Network energy consumption
40 GW in 2013 (source: gwatt.net)
A lot of improvement possible during off-peak hours
Especially in core networks
Re-route to improve the energy efficiency
Consumption reduced by up to 39 %
Hassidim, A. et al., "Network utilization: the flow view", INFOCOM 2013, IEEE, 1429–1437, 2013
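The off-peak opportunity reduces to finding lightly used links whose traffic can be re-routed so the links can sleep. The sketch below performs only the candidate-selection step and ignores the re-routing feasibility check that a real SDN controller must do; link names, capacities, and the threshold are invented:

```python
def links_to_sleep(link_load, capacity, threshold=0.1):
    """Off-peak heuristic: links whose utilization is below `threshold`
    are candidates for sleeping once their traffic is re-routed."""
    return sorted(l for l, load in link_load.items()
                  if load / capacity[l] < threshold)

loads = {"paris-lyon": 2.0, "lyon-nice": 0.3, "paris-nice": 0.1}  # Gb/s
caps  = {"paris-lyon": 10.0, "lyon-nice": 10.0, "paris-nice": 10.0}
print(links_to_sleep(loads, caps))  # ['lyon-nice', 'paris-nice']
```

The hard part, reflected in the 39% result above, is doing this dynamically while preserving quality of service when the load surges back.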
Energy Efficient Networks
Carpa R., Assuncao M., Gluck O., Lefevre L. and Mignot J.-C., "Responsive Algorithms
for Handling Load Surges and Switching Links On in Green Networks", submitted to
ICC 2016
Simulations of high-speed core networks
• Rerouting in less than a second
• Improved energy efficiency compared to related
work (12 %)
• Same quality of service
NetFPGA + Openflow testbed (Work in progress)
• Targeting access networks
• Few, frequently changing, flows
• Cross-layer L3 / L4 optimizations for stability
DISTRIBUTED CLOUDS
Adrien Lebre’s team in ASCOLA (LINA/EMN Nantes & INRIA)
The Current Situation
• Large offshore DCs
• To cope with increasing UC demand while handling energy concerns
• But
• Jurisdiction concerns (data locality)
• Reliability
• Network overhead
• Locality is a key element in delivering efficient as well as sustainable
Utility Computing solutions
credits: coloandcloud.com
The Cloud from End-Users
[Figure: end-users (Alice, Bob, Charles, Dan, Duke, Paula, Rob, Sam) all see a single cloud]
The Cloud in Reality
[Figure: the same end-users actually reach the cloud through the Internet backbone]
Cloud Evolution
Not only mega data centres !
Courtesy of Thierry Coupaye (Orange)
Trends for Next Generation Clouds
Centralized public clouds are in fact generally distributed over multiple
(mega) data centres for availability reasons
[Maps of distributed data centre locations: Verizon, Orange, Microsoft, Amazon]
Courtesy of Thierry Coupaye (Orange)
Trends for Next Generation Clouds
Hybrid and community clouds are by nature distributed over multiple data
centres/clouds
Courtesy of Thierry Coupaye (Orange)
Trends for Next Generation Clouds
Networks are getting “softwarized” and are converging with a distributed
vision of cloud computing.
3 examples
 Virtual CDN (vCDN)
 Cloud RAN (C-RAN)
 Mobile Edge Computing (MEC)
Courtesy of Thierry Coupaye (Orange)
The DISCOVERY Proposal
• DIStributed and COoperative framework to manage Virtual EnviRonments
autonomously
• Locality-based Utility Computing platform (“LUC-OS”)
• A fully distributed IaaS system, not a distributed system of IaaS systems
• We want to/must go further than high level cloud APIs (cross-cutting concerns such as
energy/security)
• Leverage P2P algorithms and self-* approaches
• Lots of scientific/technical challenges
• Cost of the network?
• Partial view of the system?
• Impact on the other VMs?
• Management of VM images?
• How to take locality aspects into account?
• Which software abstractions to make the development easier and more reliable
(distributed event programming)? …
Lèbre, A., Pastor, J., Bertier, M., Desprez, F., Rouzaud-Cornabas, J., Tedeschi, C., Anedda, P., Zanetti, G., Nou, R., Cortes, T., Riviere, E. and Ropars, T., Beyond The
Cloud, How Should Next Generation Utility Computing Infrastructures Be Designed?, INRIA Research Report 8348, Aug. 2013.
The DISCOVERY Initiative
[Figure: end-users (Alice, Bob, Charles, Dan, Duke, Paula, Rob, Sam, Tom) connected through a cooperative DISCOVERY network instead of a central cloud]
Beyond the Clouds, the DISCOVERY Initiative
Locality-based UC infrastructures / Fog / Edge
A promising way to deliver highly efficient and sustainable UC services is to provide UC platforms as close
as possible to the end-users.
http://www.renater.fr/raccourci?lang=fr
Beyond the Clouds, the DISCOVERY Initiative
• Leveraging network backbones
• Extend any point of presence of network backbones with UC servers (from network hubs
up to major DSLAMs that are operated by telecom companies and network institutions)
• Leveraging wireless backbones
[Figure: UC servers deployed at points of presence of the core backbone, forming the DISCOVERY network close to end-users]
Would OpenStack be the solution?
• Do not reinvent the wheel …
• OpenStack
• Open source IaaS manager with a large community
• Composed of several services dedicated to each aspect of a cloud
Distributing OpenStack
• Services collaborate through
• A messaging queue
• A SQL database
• Alternative solutions exist for storing state over a highly distributed infrastructure ⇒ NoSQL DB
• Few proposals to federate/operate distinct OpenStack DCs
• ‘Flat’ approaches vs. hierarchical approaches
http://beyondtheclouds.github.io/dcc.html
ROME
• Relational Object Mapping Extension for key/value stores
• Jonathan Pastor’s PhD
• Enables querying a key/value store DB with the same interface as SQLAlchemy
• Enables OpenStack Nova to switch to a KVS without being too intrusive
• The KVS is clustered on controllers
• Compute nodes connect to the key/value cluster
[Figure: Nova services (Network, Compute, Scheduler, Conductor) access, through db.api, either the relational MySQL DB or a non-relational key/value DB]
https://github.com/badock/rome
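To give the flavour of the approach, here is a toy SQLAlchemy-style filter over a plain dictionary standing in for the key/value store. This is a hypothetical sketch, not ROME's actual API; class and key names are invented:

```python
class KVQuery:
    """Toy SQLAlchemy-flavoured query over a key/value store, in the
    spirit of ROME (NOT ROME's real interface, just a sketch)."""
    def __init__(self, kvs, table):
        # Rows of a "table" are the values whose keys share its prefix
        self.rows = [v for k, v in kvs.items() if k.startswith(table + "/")]

    def filter_by(self, **kwargs):
        self.rows = [r for r in self.rows
                     if all(r.get(f) == v for f, v in kwargs.items())]
        return self

    def all(self):
        return self.rows

kvs = {
    "instances/1": {"id": 1, "host": "node-a", "state": "active"},
    "instances/2": {"id": 2, "host": "node-b", "state": "error"},
}
active = KVQuery(kvs, "instances").filter_by(state="active").all()
print([r["id"] for r in active])  # [1]
```

Keeping the `filter_by(...).all()` shape is what lets Nova's db.api layer switch back-ends without rewriting the callers.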
The DISCOVERY Initiative: Pros and Cons
• Pros
• Locality (jurisdiction concerns, latency-aware apps, minimize network overhead)
• Reliability/redundancy (no critical point/location/center)
• The infrastructure is naturally distributed throughout multiple areas
• Lead time to delivery
• Leverage current PoPs and extend them according to UC demands
• Energy footprint (on-going investigations with RENATER)
• Bring back part of the revenue to NRENs/Telcos
• Cons
• Security concerns (in terms of who can access the PoPs)
• Operating a full IaaS in a unified but distributed manner at WAN level
• Not suited for all kinds of applications: large tightly coupled HPC workloads (50
nodes/1,000 cores, or 200 nodes/4,000 cores, i.e. 5 racks); 1,000 nodes in one PoP does
not look realistic…
• Peering agreement / economic model between network operators
http://beyondtheclouds.github.io/
BIG DATA
Gabriel Antoniu’s team KERDATA (IRISA & INRIA)
Data Processing, Big Data
• Huge amount of data to be moved and processed
• LHC, simulations, genomics, astrophysics, social networks, sensors, …
• Heterogeneity in their storage (DB, files, …) and processing (cleaning, transformation,
analysis, search, indexing, visualization, ...)
• Challenges
• Resources issues
• Fault tolerance and recovery, energy management
• Handling complex distributed workflows at a large scale (computation and data transfers and
replications)
• Resource management (computation, storage, network), solutions interoperability
• Describing these workflows
• Meta-data management
• Data provenance
• Which transformations were applied
• Programming next generation applications
• Which language for which application
• Strong relations with resource management systems
• Performance and transparency
• Genericity
Sakr, S., Liu, A., Batista, D.M., Alomari, M., A Survey of Large Scale Data Management Approaches in Cloud Environments, IEEE Communications Surveys and
Tutorials, 2011.
Middleton A.M., Data-Intensive Technologies for Cloud Computing, Handbook of Cloud Computing, Springer, 83-135, 2010.
http://research.microsoft.com/en-us/collaboration/fourthparadigm/
Beyond Hadoop: BlobSeer
Scalable Storage for Data-Intensive Analytics
Started in 2008, 6 PhDs (Gilles Kahn/SPECIF PhD Thesis Award in 2011)
Main goal: storage optimized for highly concurrent accesses
Three key ideas
- Decentralized metadata management
- Lock-free concurrent writes (enabled by versioning)
- Data and metadata “patching” rather than updating
A back-end for higher-level data management systems
- Highly scalable distributed file systems
- Storage for cloud services
Approach
- Design and implementation of distributed algorithms
- Experiments on the Grid’5000 testbed
- Validation with “real” apps on “real” platforms: IBM clouds, Microsoft Azure, OpenNebula
- Results on Grid’5000: BlobSeer improves Hadoop by 35% (execution time)
http://blobseer.gforge.inria.fr/
B. Nicolae, G. Antoniu, L. Bougé, D. Moise, A. Carpen-Amarie. “BlobSeer: Next Generation Data Management for Large Scale Infrastructures”, in: Journal of
Parallel and Distributed Computing, February 2011, vol. 71, no 2, pp. 169-184.
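The lock-free-writes-through-versioning idea can be sketched in a few lines: a write never locks or overwrites, it publishes a new immutable snapshot, so concurrent readers always see a consistent version. This is a drastic simplification of BlobSeer (which also decentralizes metadata and stripes data across nodes):

```python
class VersionedBlob:
    """Toy versioned blob: each write publishes a new immutable
    version instead of mutating in place (BlobSeer's core idea,
    greatly simplified and single-node)."""
    def __init__(self):
        self.versions = [b""]          # version 0: empty blob

    def write(self, offset, data):
        old = self.versions[-1]
        # Build the new snapshot; pad with zero bytes if writing past the end
        new = old[:offset].ljust(offset, b"\0") + data + old[offset + len(data):]
        self.versions.append(new)      # publish atomically
        return len(self.versions) - 1  # new version number

    def read(self, version=None):
        """Readers pin a version; None means the latest published one."""
        return self.versions[-1 if version is None else version]

blob = VersionedBlob()
v1 = blob.write(0, b"hello")
v2 = blob.write(0, b"HELLO")
print(blob.read(v1), blob.read(v2))  # b'hello' b'HELLO'
```

A reader holding `v1` is never disturbed by the concurrent write that produced `v2`, which is why no locks are needed.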
BlobSeer on Commercial Clouds
The A-Brain Microsoft Research – Inria Project
[Figure: linking brain images (X, p ~ 10^6 features) to genetic data (Y, q ~ 10^5-6 features) over N ~ 2,000 subjects]
– Anatomical MRI
– Functional MRI
– Diffusion MRI
– DNA array (SNP/CNV)
– gene expression data
– others...
• TomusBlobs storage (based on BlobSeer)
• Processing approach: MapReduce
• Gain over Azure Blobs: 45%
• Scalability: 1,000 cores
http://www.msr-inria.fr/projects/a-brain/
• KerData, PARIETAL teams at INRIA
• European Microsoft Innovation Center (Aachen)
Executing the A-Brain Application at Large-Scale
• The TomusBlobs data-storage layer developed within the A-Brain project was
demonstrated to scale up to 1,000 cores on 3 Azure data centers (from EU, US)
• Gain compared to Azure BLOBs: close to 50%
• Experiment duration: ~ 14 days
• More than 210,000 hours of computation used
• Cost of the experiments: 20,000 euros (VM price, storage, outbound traffic)
• 28,000 map jobs (each lasting about 2 hours) and ~600 reduce jobs
Scientific Discovery:
Provided the first statistical
evidence of the heritability of
functional signals in a failed stop
task in basal ganglia
B. Da Mota, R. Tudoran, A. Costan, G. Varoquaux, G. Brasche, P. J. Conrod, H. Lemaitre, T. Paus, M. Rietschel, V. Frouin, J.-B. Poline, G. Antoniu, B. Thirion. Machine
Learning Patterns for Neuroimaging-Genetic Studies in the Cloud, in: Frontiers in Neuroinformatics, vol. 8 , April 2014.
Going Further: Managing Metadata for Geo-Distributed Workflows
The Z-CloudFlow Microsoft Research – Inria Project
• Multisite cloud = a cloud with multiple data
centers
• Each with its own cluster, data and programs
• Matches well the requirements of scientific apps
• Goal
• Investigate approaches to metadata
management integrated with workflow
execution engine to support multi-site
scheduling
Four Strategies
• Centralized
• Baseline
• Replicated
• Local metadata accesses
• Synchronization agent
• Decentralized non-replicated
• Metadata scattered across sites
• DHT-based
• Decentralized replicated
• Metadata stored locally and replicated to a remote location (using hashing)
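The two decentralized strategies can be sketched with hashing of the file path; the site names and the next-site replica rule below are illustrative, not the paper's exact scheme:

```python
import hashlib

SITES = ["rennes", "lyon", "nantes", "sophia"]

def home_site(path, sites=SITES):
    """Decentralized non-replicated strategy: hash the file path to
    pick the single site holding its metadata (DHT-style)."""
    h = int(hashlib.sha256(path.encode()).hexdigest(), 16)
    return sites[h % len(sites)]

def replica_sites(path, sites=SITES):
    """Decentralized replicated strategy: a primary copy plus one
    remote replica chosen deterministically (illustrative rule)."""
    primary = home_site(path, sites)
    backup = sites[(sites.index(primary) + 1) % len(sites)]
    return primary, backup

p, b = replica_sites("/workflow/task42/out.dat")
print(p != b)  # True: metadata lives on two distinct sites
```

Any site can locate any file's metadata by recomputing the hash, which removes the centralized server from the critical path.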
Matching strategies to workflows
• Centralized
• Small scale
• Replicated
• Intensive computations
• Large files
• Decentralized approaches
• A large number of small files
• Non-replicated
• Parallel jobs
• Replicated
• For sequential, tightly
dependent jobs, data
available locally
L. Pineda-Morales, A. Costan, G. Antoniu. "Towards Multi-site Metadata Management for Geographically Distributed Cloud Workflows", in: CLUSTER 2015 – IEEE
International Conference on Cluster Computing, Chicago, United States, September 2015.
Failure-Aware Scheduling in Hadoop
In large-scale clouds, node failures are inevitable
• 1,000 machine failures in the 1st year of a Google cluster*
• 10%-15% job failure rate in CMU clusters
Failure recovery in Hadoop
• Hadoop re-executes the tasks of failed machines
• Waits an uncertain amount of time for a free slot
• Ignores the data locality of recovery tasks
*J. Dean, “Large-scale distributed systems at Google: Current systems and future directions" in keynote speech at the 3rd ACM SIGOPS International Workshop on
Large Scale Distributed Systems and Middleware, 2009.
Chronos: a Failure-aware scheduler
• Takes early actions upon failures
• Employs a work-conserving preemption technique
• Considers local execution of recovery tasks
• Independent of scheduling policy and increases performance (10-20%) over
state-of-the-art Hadoop schedulers
• It reduces the waiting time of recovery tasks from 46 seconds to 1.5 seconds
on average
O. Yildiz, S. Ibrahim, T.A. Phuong, G. Antoniu. “Chronos: Failure-aware scheduling in shared Hadoop clusters”, The 2015 IEEE International Conference on Big
Data (BigData 2015), Nov 2015.
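The early-action idea can be sketched as follows: instead of waiting for a free slot, a recovery task preempts the lowest-priority running task under a pluggable policy, and the victim is paused rather than killed (work-conserving). All names are hypothetical; this is not Chronos code:

```python
def schedule_recovery(running, recovery_task, policy_rank):
    """If some running task ranks below the recovery task under the
    pluggable `policy_rank`, preempt it (pause, not kill) and start
    the recovery task immediately. Simplified illustration."""
    victim = min(running, key=policy_rank)
    if policy_rank(victim) < policy_rank(recovery_task):
        running.remove(victim)
        running.append(recovery_task)
        return victim                  # paused, to be resumed later
    return None                        # recovery task must wait its turn

rank = {"t-low": 1, "t-high": 9, "recovery": 5}
running = ["t-low", "t-high"]
paused = schedule_recovery(running, "recovery", rank.get)
print(paused, running)  # t-low ['t-high', 'recovery']
```

Keeping the policy as a parameter mirrors Chronos's independence from the underlying scheduling policy (FIFO, Fair, etc.).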
Explore the impact of DVFS in Hadoop clusters
There is significant potential for energy savings by scaling down the CPU frequency
when peak CPU is not needed
• Diversity of MapReduce applications: for one application, CPU load is high (98%)
during almost 75% of the job’s runtime; for another, CPU load is high (80%) during
only 15% of it
• Multiple phases within a MapReduce application: disk I/O, CPU, disk I/O, network
S. Ibrahim, T-D Phan, A. Carpen-Amarie, H-E. Chihoub, D. Moise, G. Antoniu, “Governing Energy Consumption in Hadoop through CPU Frequency Scaling: An
Analysis”, Future Generation Computer Systems, Volume 54, January 2016
Mitigating Stragglers in Hadoop
Performance variation is common in the Cloud
• Stragglers can severely increase the execution time
• Hadoop launches another copy of the straggler hoping it will finish
earlier (i.e., speculation)
[Figure: four tasks on four nodes over time; one task lags behind the others as a straggler]
T-D Phan, S. Ibrahim, G. Antoniu, L. Bougé, "On Understanding the Energy Impact of Speculative Execution in Hadoop", The 2015 IEEE International Conference on Green
Computing and Communications (GreenCom 2015), Dec 2015
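Straggler detection can be caricatured as a progress-rate test: launch a speculative copy when a task progresses much more slowly than its peers. The median rule and the 0.5 ratio below are illustrative, not Hadoop's exact heuristic:

```python
def needs_speculation(progress, elapsed, peers_rate, slow_ratio=0.5):
    """Flag a task for a speculative copy when its progress rate falls
    well below the median rate of its peers (toy straggler detector)."""
    rate = progress / elapsed
    peers = sorted(peers_rate)
    median = peers[len(peers) // 2]
    return rate < slow_ratio * median

# Task at 10% after 100 s while peers progress at roughly 0.4 %/s
print(needs_speculation(0.10, 100.0, [0.004, 0.0045, 0.004]))  # True
```

As the results below show, each speculative copy also burns extra power, so the detector's aggressiveness directly shapes the energy bill.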
Speculation Benefits in a Heterogeneous Environment
[Charts: execution time (10^3 s) and energy consumption (MJ), speculation disabled vs. enabled: −47% execution time, −28% energy]
The energy reduction is not proportional to the execution-time improvement; it strongly
depends on the extra power drawn by the extra resource consumption
[Chart: average power consumption for CloudBurst, Sort, and WordCount, speculation disabled vs. enabled: up to +32%]
New approaches for Data Management in the Cloud
• No “one size fits all” solution
• NoSQL, key-value data stores (e.g. Bigtable, HBase, Cassandra, HyperTable), graph
databases (e.g. Neo4j, Pregel), array data stores (e.g. SciDB), analytical Cloud databases
(e.g. Greenplum and Vertica), analytical Cloud frameworks (e.g. Hadoop Map-Reduce,
Cloudera Impala), document databases (e.g. MongoDB, CouchBase), data stream
management systems (e.g. Storm)
• Wide diversification of data store interfaces and the loss of a common
programming paradigm
• Design of multistore data management systems
• Data management in multisite Clouds
• Divergence between Cloud and HPC storage infrastructures
• New I/O mechanisms to guide I/O systems to deliver the best performance
• Distributed file systems with Cloud capabilities such as elasticity
• Development of a unified architecture for HPC and Cloud storage back-ends
OTHER ISSUES
New Models for Cloud Application Description
• Adaptation to various kinds of hardware resources is mandatory
• Interesting approach: distinguish the description of the various possible
configurations (application architecture description) from the quality of
service sought for a particular execution (minimize cost, maximize
performance, respect a deadline, etc.)
• Challenges
• Description of the structure of the application
• Description of the expected behavior
• Several issues
• Take data into account
• How to model application workflows
• New languages for non-functional objectives (budget, performance/deadline,
security, data)
More Efficient Techniques and Algorithms for Cloud Resource Allocation
• Large number of resources to be used by applications
• Hardware heterogeneity including new resources (GPU, FPGA, …)
• Difficult for users to choose the most appropriate hardware configuration
• Need for performance models for applications
• Seamless choice of resources following user demands and resource availability
• SLAs put the emphasis on providers to deliver robust allocations despite the large
number of hardware failures
• Include a reliability constraint
• Use replication to cope with faults and failures
• Take dynamicity and elasticity into account
• Allocation problems adapted to Cloud constraints (CPU, memory, disks, network,
complex topologies)
• Design of sophisticated algorithms with guarantees on their reliability
• Focus optimization on impactful jobs
• Efficient representation of the search space and theoretical analysis
New Approach to Integrate Cloud, IoT, CPS, and Mobile Devices
Cloud systems are now the cornerstones of the Internet ecosystem, allowing
any connected device, such as things, smartphones, tablets, set-top boxes,
and PCs, to store and share information in a seamless way
• But
• A centralized Internet, increasing impact of failures on Internet users, loss of control over
citizens’ private data, vendor lock-in from hardware and software providers, massive leaks of
sensitive data when Cloud systems are under attack or surveillance by national security
agencies
• Ideas
• More decentralized Cloud infrastructures, i.e. fog computing, taking into account the rapid
evolution of very cheap, low-power hardware
• Use nano-PCs based on Smartphone technologies (ARM-based processors)
• Many challenges
• Seamless integration of nano-PCs within Cloud infrastructures,
• New Cloud services combining nano-PCs and data centers,
• Server-less sharing, security, and privacy
Promoting Simulation to Investigate Cloud Concerns
• Difficult for users to select the Cloud services that best meet their
requirements (in terms of performance, cost, energy, etc)
• Preliminary evaluations with partial deployments on real platforms such as Amazon Web
Service or Microsoft Azure
• Investigation of new hardware and new software mechanisms for Cloud
providers in order to stay competitive
• Provision a part of the Cloud to evaluate the benefits of such change
• Use of simulation for these scenarios
• Reduced development cost
• Control of parameters such as network latency, reliability, scalability, etc.
• Development of accurate and versatile simulation frameworks
SimGrid: Simulator of Distributed Applications
• Scientific instrument for the study of large-scale distributed computing
• Main Features
• Versatile: Grid, P2P, HPC, Volunteer Computing, ..., Clouds
• Valid: accuracy limits studied and pushed further for years
• Scalable and fast (despite precise models)
• Usable: tooling (generators, runner, visualization); open source, portable, ...
• On-going work
• SCHIaaS: simulation of Clouds and hybrid IaaS
• Adding virtualization capabilities into SimGrid (VM migration, boot, …)
http://simgrid.gforge.inria.fr/
http://infra-songs.gforge.inria.fr
GRID’5000 – Real IaaS for Researchers
• Testbed for research on distributed systems
• Born from the observation that we need a better and larger testbed
• HPC, Grids, P2P systems and more recently Cloud computing
Adding virtualization capabilities into Grid’5000, INRIA RR-8026, Jul. 2012
• A complete access to the nodes’ hardware in an exclusive mode
(from one node to the whole infrastructure)
• Current status
• 9 sites, 1,195 machines, 8,184 cores
• Diverse technologies/resources
(Intel, AMD, Myrinet, InfiniBand, two GPU clusters, energy probes)
• Ready-to-use OpenStack distribution
• Last significant experiment
• Dynamic scheduling of 10K VMs across 4 sites
https://www.grid5000.fr/
CONCLUSIONS
Conclusion
• Cloud Computing technology is changing every day: new features, new
requirements (IaaS++ services)
• Many research issues addressed in our research labs that should/will
be transferred into tomorrow’s cloud infrastructures
• Connection between “classical” Cloud infrastructures and next-generation
platforms (IoT)
• Distributed Cloud Computing is happening !
• Dist. CC workshops (UCC 2013, SIGCOMM 2014/2015), FOG Computing
workshop (co-located with IEEE ICC 2013), IEEE CloudNet, …
• How should developers design new applications to benefit from such
geographically distributed infrastructures?
61Labex UCN@Sophia – F. Desprez Feb. 18, 2016
References
• European Commission report on The Future of Cloud Computing
  http://cordis.europa.eu/fp7/ict/ssai/docs/cloud-report-final.pdf
• A Roadmap for Advanced Cloud Technologies under H2020, European Commission, Recommendations by
the Cloud Expert Group, Digital Agenda for Europe, Dec. 2012
• Report on the public consultation for H2020 Work Programme 2016-17: Cloud Computing and Software
  ec.europa.eu/information_society/newsroom/cf/dae/document.cfm?doc_id=8161
• Key Challenges in Cloud Computing: Enabling the Future Internet of Services, Rafael Moreno-Vozmediano,
Ruben S. Montero, and Ignacio M. Llorente, IEEE Internet Computing, July 2013
• NIST Cloud Strategy and Innovation Blog (I. Llorente)
  http://blog.cloudplan.org/
• Above the Clouds: A Berkeley View of Cloud Computing
  http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-28.html
• DRAFT Cloud Computing Synopsis and Recommendations, NIST
  http://csrc.nist.gov/publications/drafts/800-146/Draft-NIST-SP800-146.pdf
• SIENA European Roadmap on Grid and Cloud Standards for e-Science and Beyond
  http://www.sienainitiative.eu/Repository/FileScaricati/8ee3587a-f255-4e5c-aed4-9c2dc7b626f6.pdf
• The Magellan Report on Cloud Computing for Science, Yelick et al., Dec. 2011
  http://science.energy.gov/~/media/ascr/pdf/program-documents/docs/Magellan_final_report.pdf
• Livre blanc sur le calcul intensif, Comité d’orientation pour le calcul intensif (Cocin) du CNRS, 2012
  http://www.cnrs.fr/ins2i/IMG/pdf/Livre_blanc_-_derniere_version.pdf
• Synergistic Challenges in Data-Intensive Science and Exascale Computing, DOE ASCAC Data Subcommittee
Report, March 2013
• Integration of Cloud Computing and Internet of Things: A Survey, A. Botta, W. de Donato, V. Persico, A.
Pescapé, Future Generation Computer Systems, 56 (2016)
62Labex UCN@Sophia – F. Desprez Feb. 18, 2016
QUESTIONS ?
63Labex UCN@Sophia – F. Desprez Feb. 18, 2016

  • 1. Challenges and Issues of Next Cloud Computing Platforms Frédéric Desprez Frederic.Desprez@inria.fr Labex UCN@Sophia – Feb. 18th 2016
  • 2. Labex UCN@Sophia – F. Desprez Acknowledgements Feb. 18, 2016 2 Gabriel Antoniu Inria (Rennes, Kerdata) Olivier Beaumont Inria (Bordeaux, CEPAGE) Alexandru Costan Inria (Rennes, Kerdata) Thierry Coupaye Orange Labs Grenoble Paulo Goncalvez Inria (Lyon, Dante) Shadi Ibrahim Inria (Rennes, Kerdata) Kate Keahey Argonne National Lab Cristian Klein Umea University, Suède Adrien Lèbre Inria et Ecole des Mines de Nantes (Ascola) Laurent Lefèvre Inria, (Lyon, Avalon) Ignacio Llorente Complutense University of Madrid, Espagne Christine Morin Inria (Rennes, Myriads) Martin Quinson ENS (Rennes, Myriads) David Margery Inria (Rennes, Myriads) Anne-Cécile Orgerie CNRS (Rennes, Myriads) Manish Parashar Rutgers University Christian Perez Inria (Lyon, Avalon) Thierry Priol Inria (Rennes, Myriads) Jonathan Rouzaud-Cornabas Insa (Lyon, Beagle) Frédéric Suter CNRS/IN2P3 (Lyon, Avalon) Patrick Valduriez Inria (Montpellier, Zenith) Rich Wolsky University of California Santa Barbara, USA
  • 3. Outline • Introduction and Context • Energy Issues • Distributed Clouds • Big Data • Other issues • Conclusions 3Labex UCN@Sophia – F. Desprez Feb. 18, 2016
  • 5. Context Cloud computing has emerged as a “new” paradigm for many commercial and scientific venues • Starts to be widely adopted by the industries • Many platforms and infrastructures available around the world • Several offers for IaaS, PaaS, and SaaS platforms • Public, private, community, and hybrid clouds … But still many applications left that could benefit from such platforms Several issues still needs to (better) addressed • Elasticity, availability, self-configuration, heterogeneous computing and storage capacities • Several challenges remain to be addressed and transferred into industrial products • Energy management • New applications (IoT) 5Labex UCN@Sophia – F. Desprez Feb. 18, 2016
  • 6. Clouds Essential Characteristics • On-demand service  No need of human interaction to get an access to storage and computation resources (Utility Computing) • Access through large scale networks  Access to resources through networks from lightweight and heavy-weight clients (WAN, LAN, Wireless) • Resource Polling  Resources (CPU, storage, memory, network) are taken from datacenters without (almost) locality notion • Elasticity  Ressources can be allocated and freed in an elastic fashion based on the application needs (with an "infinite" capacity) • Measured service  Possibility to monitor resource usage • Pro  Disponibility and extensibility  Dynamicity  Fault tolerance  Resource mutualization • Cons  Heterogeneity  No locality  Application porting  Security ? 6Labex UCN@Sophia – F. Desprez Feb. 18, 2016
  • 7. Transparency is the Key “I don't care if my cloud computing architecture is powered by a grid, a mainframe, my neighbour's desktop or an army of monkeys, so long as it's fast, cheap and secure.” Sam Johnston, Sept. 2008 7Labex UCN@Sophia – F. Desprez Feb. 18, 2016
  • 8. Research Issues • Explosion of the number of research work around Clouds and virtualization ! • Some research challenges • Energy • Service composition • Service Level Agreement (SLA) • Security • Fault tolerance and recovery • Infrastructure management • Elastic management of resources • (Big) Data management • Seamless access to hybrid platforms • Multi-clouds, Sky computing, federations, infrastructure distribution, edge computing • New models • economic, energy • Application design and description • New languages, new models • Simulation and experimentation • ... 8Labex UCN@Sophia – F. Desprez Feb. 18, 2016
  • 9. ENERGY ISSUES Laurent Lefèvre’s team in Avalon (LIP/ENS Lyon & INRIA)
  • 10. Electrical consumption of ICT…. 2013… gwatt.net Devices Telecommunication networks = 83 GW 10Labex UCN@Sophia – F. Desprez Feb. 18, 2016
  • 11. Improving Energy Efficiency of Cloud Infrastructures • Understanding the energy usage of large scale systems mixing virtual instances of applications, physical IT resources, and physical infrastructures remains a real challenge. • How to profile the energy consumption of large sets of virtual machines (generic metrics, benchmarks, and energy models) • Analyzing tools and frameworks to support large scale energy efficient management of resource • Optimize the energy consumption of distributed infrastructures and service compositions in the presence of ever more dynamic service applications • Use of renewable energies • Exploring the trade-off between energy saving and performance aspects in large-scale distributed system • Energy efficiency of storage systems and networks 11Labex UCN@Sophia – F. Desprez Feb. 18, 2016
  • 12. Energy Efficiency by knowing application and services or not ? • Exploring 2 different approaches • With knowledge on the application and services • Enable the user to choose the less consuming implementation of services  Estimate the energy consumption of the different implementations (protocols) of each service • Without knowledge • Allow some intelligence to reduce the energy usage  Autonomically estimate the energy consumption of the HPC system in order to apply green levers 12Labex UCN@Sophia – F. Desprez Feb. 18, 2016
  • 13. Improving EE with application expertise • Considered services: resilience & data broadcasting • 4 steps • Service analysis, Measurements, Calibration, Estimation • Helping users make the right choices depending on context and parameters M. Diouri, O. Glück, L. Lefèvre, and Franck Cappello. "ECOFIT: A Framework to Estimate Energy Consumption of Fault Tolerance Protocols during HPC executions", CCGrid2013, the 13th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, Delft, the Netherlands, May 13-16, 2013 13Labex UCN@Sophia – F. Desprez Feb. 18, 2016
  • 14. Without knowledge of applications and services ? • HPC applications keep growing in complexity • too many bugs in HPC applications already present, adding energy management and considerations won’t help 😀 • Are HPC programmers ready for eco design of applications ? • Applications can share the same infrastructure • Optimizations made for saving energy considering some applications are likely to impact the performance of others • Instead of looking at applications and service ⇒ Focusing on the infrastructure • Detect and characterize system’s runtime behaviors/phases • Optimize each subsystem (storage, memory, interconnect, CPU) accordingly • Helping users to find the best service 14Labex UCN@Sophia – F. Desprez Feb. 18, 2016
  • 15. Without knowledge on applications Ghislain Landry Tsafack Chetsa, Laurent Lefevre, Jean-Marc Pierson, Patricia Stolf, Georges Da-Costa. "Application-Agnostic Framework for Improving the Energy Efficiency of Multiple HPC Subsystems", PDP2015 : 23rd Euromicro International Conference on Parallel, Distributed and Network-based Processing, 2015. • Irregular usage of resources • Phase detection, characterization • Power saving modes deployment 15Labex UCN@Sophia – F. Desprez Feb. 18, 2016
  • 16. Towards Energy Proportionality with Heterogeneous machines OBSERVATIONS [Barroso and Hölzle 2007] Average server utilization between 10 and 50 % → Most inefficient region No proportionality due to high idle consumption → Can be up to 50 % of peak power PROPOSITION Heterogeneous Infrastructure composed of machines with different characteristics in terms of performance and energy consumption • Classical servers → Only used at their most energy efficient region • Low power processors → Reduce static costs TECHNICAL CHALLENGES - Application placement: Dynamically find the most suitable combinations of machines - Infrastructure reconfiguration: Power On/Off machines at the right time [Barroso and Hölzle, The Case for Energy Proportional Computing, IEEE Computer, 2007 16 Labex UCN@Sophia – F. Desprez Feb. 18, 2016 16 BIG MEDIUM LITTLE
  • 17. Towards Energy Proportionality with Heterogeneous machines V. Villebonnet, G. Da Costa, L. Lefèvre, J-M. Pierson, P. Stolf, “Big, Medium, Little”: Reaching Energy Proportionality with Heterogeneous Computing Scheduler”, Parallel Processing Letters, 25 (03), World Scientific Publishing, 2015. 17Labex UCN@Sophia – F. Desprez Feb. 18, 2016
  • 18. Towards Energy Proportionality with Heterogeneous machines Application: Stateless Web Servers Traces: Day of 98 WorldCup Website access BIG only Joules per request: 0,2268 Infrastructure utilization: 40,7% Number of reconfiguration: 4 BML combination Joules per request: 0,2155 Infrastructure utilization: 69,7% Number of reconfiguration: 194 ⇒ Infrastructure is dynamically reconfigured to meet the load demand of the application → Energy consumption more proportional to the load 18Labex UCN@Sophia – F. Desprez Feb. 18, 2016
  • 19. Virtual Machines and Energy efficient Clouds Taking into account the energy consumption in the scheduling process • Energy and resource usage are highly fluctuating • Large disparities between similar nodes → Decisions needs to be proactive based on recent and historical activity How to efficiently assign those tasks? Combine • A metric to balance performance and energy consumption • An interface to express tradeoffs between users and providers requirements • A manager of energy-related events Results • Up to 20% of energy savings in real-life experimentations Daniel Balouek-Thomert, Eddy Caron, Laurent Lefevre, "Energy-Aware Server Provisioning by Introducing Middleware-Level Dynamic Green Scheduling", HPPAC 2015: The 11th Workshop on High-Performance, Power-Aware Computing, May 2015 19Labex UCN@Sophia – F. Desprez Feb. 18, 2016
  • 20. Virtual Machines and Energy efficient Clouds Combining energy with other criteria and constraints for a given problem • Large spectrum of potential solutions • NP-Hard problem Daniel Balouek-Thomert, Arya K. Bhattacharya, Eddy Caron, Karunakar Gadireddy, Laurent Lefèvre, Minimizing energy and makespan concurrently in Cloud Computing workloads using Multi-Objective Differential Evolution, under reviewing Genetic Approach • A model to capture affinities between tasks and resources • An algorithm that mimicks the “survival of the fittest”: only efficient servers are used through time • A learning engine that integrates constraints Strategies needs to be validated in terms of correctness and computing time 20Labex UCN@Sophia – F. Desprez Feb. 18, 2016
  • 21. Network is Part of the Story: Dynamic, Energy Efficient, Network Reconfiguration Van Heddeghem et al. “Power Consumption Modeling in Optical Multilayer Networks” PNET 24 (2), 86–102, 2012 Carpa R., Gluck O., Lefevre L. and Mignot J.-C., "Improving the energy efficiency of software-defined backbone networks", Photonic Network Communications, vol. 30(3), p. 337-347, 2015. Network energy consumption 40 Gwatts in 2013 (source: gwatt.net) A lot of improvement possible during off-peak hours Especially in core networks Re-route to improve the energy efficiency Consumption reduced by up to 39 % Hassidim, A et al. “Network utilization: The flow view”, INFOCOM, 2013 IEEE, 1429–1437, 2013 21Labex UCN@Sophia – F. Desprez Feb. 18, 2016
  • 22. Energy Efficient Networks Carpa R., Assuncao M.,Gluck O., Lefevre L. and Mignot J.-C., "Responsive Algorithms for Handling Load Surgesand Switching Links On in Green Networks” - Submitted to ICC 2016 Simulations of high-speed core networks • Rerouting in less than a second • Improved energy efficiency compared to related work (12 %) • Same quality of service NetFPGA + Openflow testbed (Work in progress) • Targeting access networks • Few, frequently changing, flows • Cross-layer L3 / L4 optimizations for stability 22Labex UCN@Sophia – F. Desprez Feb. 18, 2016
  • 23. DISTRIBUTED CLOUDS Adrien Lebre’s team in ASCOLA (LINA/EMN Nantes & INRIA)
  • 24. The current Situation • Large off shore DCs • To cope with the increasing UC demand while handling energy concerns • But • Juridiction concerns (data locality) • Reliability • Network overhead • Localization is a key element to deliver efficient as well as sustainable Utility Computing solutions credits: coloandcloud.com 24Labex UCN@Sophia – F. Desprez Feb. 18, 2016
  • 25. The Cloud from End-Users (diagram: end-users Charles, Alice, Paula, Bob, Dan, Sam, Rob and Duke all connected to a single cloud)
  • 26. The Cloud in Reality (diagram: the same end-users, connected through the Internet backbone)
  • 27. Cloud Evolution. Not only mega data centres! (Courtesy of Thierry Coupaye, Orange)
  • 28. Trends for Next-Generation Clouds. Centralized public clouds are in fact generally distributed over multiple (mega) data centres for availability reasons (Verizon, Orange, Microsoft, Amazon). (Courtesy of Thierry Coupaye, Orange)
  • 29. Trends for Next-Generation Clouds. Hybrid and community clouds are by nature distributed over multiple data centres/clouds. (Courtesy of Thierry Coupaye, Orange)
  • 30. Trends for Next-Generation Clouds. Networks are getting "softwarized" and are converging with a distributed vision of cloud computing. Three examples: Virtual CDN (vCDN), Cloud RAN (C-RAN), Mobile Edge Computing (MEC). (Courtesy of Thierry Coupaye, Orange)
  • 31. The DISCOVERY Proposal • DIStributed and COoperative framework to manage Virtual EnviRonments autonomously • Locality-based Utility Computing platform ("LUC-OS") • A fully distributed IaaS system, not a distributed system of IaaS systems • We want to/must go further than high-level cloud APIs (cross-cutting concerns such as energy/security) • Leverage P2P algorithms and self-* approaches • Lots of scientific/technical challenges • Cost of the network? • Partial view of the system? • Impact on the other VMs? • Management of VM images? • How to take locality aspects into account? • Which software abstractions make development easier and more reliable (distributed event programming)? Lèbre, A., Pastor, J., Bertier, M., Desprez, F., Rouzaud-Cornabas, J., Tedeschi, C., Anedda, P., Zanetti, G., Nou, R., Cortes, T., Riviere, E. and Ropars, T., Beyond The Cloud, How Should Next Generation Utility Computing Infrastructures Be Designed?, INRIA Research Report 8348, Aug. 2013.
  • 32. The DISCOVERY Initiative (diagram: the same end-users, Alice, Bob, Charles, Dan, Duke, Paula, Rob, Sam and Tom, each served by a nearby DISCOVERY network node)
  • 33. Beyond the Clouds, the DISCOVERY Initiative. Locality-based UC infrastructures / Fog / Edge: a promising way to deliver highly efficient and sustainable UC services is to provide UC platforms as close as possible to the end-users. http://www.renater.fr/raccourci?lang=fr
  • 34. Beyond the Clouds, the DISCOVERY Initiative • Leveraging network backbones • Extend any point of presence of network backbones with UC servers (from network hubs up to major DSLAMs operated by telecom companies and network institutions) • Leveraging wireless backbones (diagram: end-users attached to DISCOVERY network nodes across the core backbone)
  • 35. Would OpenStack Be the Solution? • Do not reinvent the wheel… • OpenStack • An open-source IaaS manager with a large community • Composed of several services, each dedicated to one aspect of a cloud
  • 36. Distributing OpenStack • Services collaborate through • A messaging queue • An SQL database • Alternative solutions exist for storing states over a highly distributed infrastructure ⇒ NoSQL DB • Few proposals to federate/operate distinct OpenStack DCs • 'Flat' approach • Hierarchical approaches http://beyondtheclouds.github.io/dcc.html
  • 37. ROME • Relational Object Mapping Extension for key/value stores • Jonathan Pastor's PhD • Enables querying a key/value store with the same interface as SQLAlchemy • Enables OpenStack Nova to switch to a KVS without being too intrusive • The KVS is clustered on controllers • Compute nodes connect to the key/value cluster (architecture: Nova Network, Nova Compute, Nova Scheduler and Nova Conductor above db.api, backed either by a relational MySQL DB or a non-relational key/value DB) https://github.com/badock/rome
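An SQLAlchemy-like interface over a key/value store can be pictured with a toy mapper like the one below (illustrative only: the class names and the dict-backed store are assumptions for the sketch, not the real ROME API).

```python
# Toy sketch: SQLAlchemy-style query(...).filter_by(...) over a key/value
# store, here simulated by a dict mapping "table:id" -> record dict.

class KVQuery:
    def __init__(self, store, table):
        prefix = table + ":"
        self._rows = [v for k, v in store.items() if k.startswith(prefix)]

    def filter_by(self, **criteria):
        self._rows = [r for r in self._rows
                      if all(r.get(k) == v for k, v in criteria.items())]
        return self

    def all(self):
        return self._rows

    def first(self):
        return self._rows[0] if self._rows else None

class KVSession:
    """Minimal stand-in for an ORM session backed by a KVS."""
    def __init__(self, store):
        self.store = store

    def add(self, table, key, record):
        self.store[table + ":" + str(key)] = record

    def query(self, table):
        return KVQuery(self.store, table)
```

The point ROME makes is that, with such a shim behind `db.api`, the calling code (here, Nova) keeps its relational-looking queries while the backend becomes a clustered KVS; filtering then happens client-side or via the store's own indexes.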
  • 38. The DISCOVERY Initiative: Pros and Cons • Pros • Locality (jurisdiction concerns, latency-aware apps, minimized network overhead) • Reliability/redundancy (no critical point/location/center): the infrastructure is naturally distributed throughout multiple areas • Lead time to delivery: leverage current PoPs and extend them according to UC demands • Energy footprint (ongoing investigations with RENATER) • Bring back part of the revenue to NRENs/Telcos • Cons • Security concerns (in terms of who can access the PoPs) • Operating a full IaaS in a unified but distributed manner at WAN level • Not suited for all kinds of applications: large tightly coupled HPC workloads (50 nodes/1,000 cores or 200 nodes/4,000 cores (5 racks) are plausible, but 1,000 nodes in one PoP does not look realistic) • Peering agreements / economic model between network operators http://beyondtheclouds.github.io/
  • 39. BIG DATA Gabriel Antoniu’s team KERDATA (IRISA & INRIA)
  • 40. Data Processing, Big Data • Huge amounts of data to be moved and processed • LHC, simulations, genomics, astrophysics, social networks, sensors, … • Heterogeneity in their storage (DB, files, …) and processing (cleaning, transformation, analysis, search, indexing, visualization, …) • Challenges • Resource issues • Fault tolerance and recovery, energy management • Handling complex distributed workflows at a large scale (computation, data transfers and replication) • Resource management (computation, storage, network), interoperability of solutions • Describing these workflows • Metadata management • Data provenance • Which transformations were applied • Programming next-generation applications • Which language for which application • Strong relations with resource management systems • Performance and transparency • Genericity. Sakr, S., Liu, A., Batista, D.M., Alomari, M., A Survey of Large Scale Data Management Approaches in Cloud Environments, IEEE Communications Surveys and Tutorials, 2011. Middleton, A.M., Data-Intensive Technologies for Cloud Computing, Handbook of Cloud Computing, Springer, 83-135, 2010. http://research.microsoft.com/en-us/collaboration/fourthparadigm/
  • 41. Beyond Hadoop: BlobSeer, Scalable Storage for Data-Intensive Analytics. Started in 2008, 6 PhDs (Gilles Kahn/SPECIF PhD Thesis Award in 2011). Main goal: optimized data access under heavy concurrency. Three key ideas • Decentralized metadata management • Lock-free concurrent writes (enabled by versioning) • Data and metadata "patching" rather than updating. A back-end for higher-level data management systems • Highly scalable distributed file systems • Storage for cloud services. Approach • Design and implementation of distributed algorithms • Experiments on the Grid'5000 testbed • Validation with "real" apps on "real" platforms: IBM clouds, Microsoft Azure, OpenNebula • Results on Grid'5000: BlobSeer improves Hadoop by 35% (execution time). B. Nicolae, G. Antoniu, L. Bougé, D. Moise, A. Carpen-Amarie, "BlobSeer: Next-Generation Data Management for Large Scale Infrastructures", Journal of Parallel and Distributed Computing, vol. 71, no. 2, pp. 169-184, February 2011. http://blobseer.gforge.inria.fr/
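The versioning idea behind the lock-free concurrent writes can be sketched with a deliberately simplified in-memory model (BlobSeer actually works on distributed chunks and a versioned metadata tree; this sketch only shows the principle that writers publish new immutable versions instead of updating in place):

```python
# Minimal sketch of versioning-based writes: each write "patches" a base
# version into a new immutable one, so readers pinned to a version never
# see partial updates and writers never block each other.

class VersionedBlob:
    def __init__(self, size):
        self.versions = [tuple([0] * size)]    # version 0: zero-filled

    def write(self, offset, data, base=None):
        """Publish a new version: a copy of `base` with `data` at `offset`.
        Returns the new version number."""
        if base is None:
            base = len(self.versions) - 1      # default: latest version
        snapshot = list(self.versions[base])
        snapshot[offset:offset + len(data)] = data
        self.versions.append(tuple(snapshot))
        return len(self.versions) - 1

    def read(self, version, offset, length):
        """Read from a pinned version: immune to concurrent writes."""
        return list(self.versions[version][offset:offset + length])
```

In the real system the "copy" is logical, not physical: a new version only stores the patched chunks plus metadata pointing back to unchanged ones, which is what makes concurrent writes cheap.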
  • 42. BlobSeer on Commercial Clouds: the A-Brain Microsoft Research – Inria Project. Correlating brain images Y (q ≈ 10^5-10^6 features) with genetic data X (p ≈ 10^6 features) over N ≈ 2,000 subjects • Anatomical MRI • Functional MRI • Diffusion MRI • DNA array (SNP/CNV) • Gene expression data • Others… • TomusBlobs storage (based on BlobSeer) • Processing approach: MapReduce • Gain over Azure Blobs: 45% • Scalability: 1,000 cores • KerData and PARIETAL teams at INRIA • European Microsoft Innovation Center (Aachen) http://www.msr-inria.fr/projects/a-brain/
  • 43. Executing the A-Brain Application at Large Scale • The TomusBlobs data-storage layer developed within the A-Brain project was demonstrated to scale up to 1,000 cores across 3 Azure data centers (EU, US) • Gain compared to Azure Blobs: close to 50% • Experiment duration: ~14 days • More than 210,000 hours of computation used • Cost of the experiments: 20,000 euros (VM price, storage, outbound traffic) • 28,000 map jobs (each lasting about 2 hours) and ~600 reduce jobs • Scientific discovery: provided the first statistical evidence of the heritability of functional signals in a failed-stop task in basal ganglia. B. Da Mota, R. Tudoran, A. Costan, G. Varoquaux, G. Brasche, P. J. Conrod, H. Lemaitre, T. Paus, M. Rietschel, V. Frouin, J.-B. Poline, G. Antoniu, B. Thirion, "Machine Learning Patterns for Neuroimaging-Genetic Studies in the Cloud", Frontiers in Neuroinformatics, vol. 8, April 2014.
  • 44. Going Further: Managing Metadata for Geo-Distributed Workflows, the Z-CloudFlow Microsoft Research – Inria Project • Multisite cloud = a cloud with multiple data centers • Each with its own cluster, data and programs • Matches well the requirements of scientific apps • Goal • Investigate approaches to metadata management, integrated with the workflow execution engine, to support multi-site scheduling
  • 45. Four Strategies • Centralized: baseline • Replicated: local metadata accesses, with a synchronization agent • Decentralized non-replicated: metadata scattered across sites, DHT-based • Decentralized replicated: metadata stored locally and replicated to a remote location (using hashing)
  • 46. Matching Strategies to Workflows • Centralized: small scale • Replicated: intensive computations, large files • Decentralized approaches: a large number of small files • Non-replicated: parallel jobs • Replicated: sequential, tightly dependent jobs, with data available locally. L. Pineda-Morales, A. Costan, G. Antoniu, "Towards Multi-site Metadata Management for Geographically Distributed Cloud Workflows", CLUSTER 2015 - IEEE International Conference on Cluster Computing, Chicago, United States, September 2015.
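The DHT-based placement with optional replication to a remote site can be sketched as follows (the helper names and the successor-site replication rule are illustrative assumptions, not the paper's exact scheme):

```python
# Toy sketch: hash each metadata key to a home site; with replication on,
# also store a copy on the "next" site in the ring.
import hashlib

def site_of(key, sites):
    """Deterministically map a metadata key to its home site."""
    digest = int(hashlib.sha256(key.encode()).hexdigest(), 16)
    return sites[digest % len(sites)]

def place(key, sites, replicated=False):
    """Return the list of sites holding metadata for `key`: the home site,
    plus its successor in the site list when replication is enabled."""
    home = site_of(key, sites)
    if not replicated:
        return [home]
    idx = sites.index(home)
    return [home, sites[(idx + 1) % len(sites)]]
```

With such a scheme, lookups from any site cost at most one remote hop to the home site, and the replicated variant keeps a copy reachable even if one data center is unavailable.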
  • 47. Failure-Aware Scheduling in Hadoop. In large-scale clouds, node failures are inevitable • 1,000 machine failures in the first year of a Google cluster* • 10-15% job failure rate in CMU clusters. Failure recovery in Hadoop • Hadoop re-executes the tasks of failed machines • Waits an uncertain amount of time for a free slot • Ignores the data locality of the recovery tasks. *J. Dean, "Large-scale distributed systems at Google: Current systems and future directions", keynote speech at the 3rd ACM SIGOPS International Workshop on Large Scale Distributed Systems and Middleware, 2009.
  • 48. Chronos: a Failure-Aware Scheduler • Takes early actions upon failures • Employs a work-conserving preemption technique • Considers local execution of recovery tasks • Independent of the scheduling policy; increases performance by 10-20% over state-of-the-art Hadoop schedulers • Reduces the waiting time of recovery tasks from 46 seconds to 1.5 seconds on average. O. Yildiz, S. Ibrahim, T.A. Phuong, G. Antoniu, "Chronos: Failure-aware scheduling in shared Hadoop clusters", the 2015 IEEE International Conference on Big Data (BigData 2015), Nov. 2015.
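The preemption decision at the heart of such a failure-aware scheduler might look like the sketch below (illustrative data structures, not Chronos's actual code): when a recovery task needs a slot, pick a node that stores its input data and preempt the lowest-priority task running there, instead of waiting for a slot to free up.

```python
# Hedged sketch of a data-local, work-conserving preemption choice.

def pick_preemption(recovery_task, running, data_locations):
    """running: {node: [(task_id, priority), ...]} of executing tasks.
    data_locations: {task_id: [nodes holding its input data]}.
    Returns (node, victim_task, victim_priority), or None if no node
    holding the recovery task's data is running anything."""
    best = None
    for node in data_locations.get(recovery_task, []):
        for task_id, prio in running.get(node, []):
            if best is None or prio < best[2]:
                best = (node, task_id, prio)
    return best
```

Work-conserving here means the victim is paused rather than killed, so its partial work is not lost; that is what keeps the overhead of early action low.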
  • 49. Exploring the Impact of DVFS in Hadoop Clusters. There is significant potential for energy saving by scaling down the CPU frequency when peak CPU is not needed. Diversity of MapReduce applications; multiple phases within one MapReduce application (disk I/O, CPU, network): in one application, CPU load is high (98%) during almost 75% of the job run; in another, CPU load is high (80%) during only 15% of the job run. S. Ibrahim, T-D. Phan, A. Carpen-Amarie, H-E. Chihoub, D. Moise, G. Antoniu, "Governing Energy Consumption in Hadoop through CPU Frequency Scaling: An Analysis", Future Generation Computer Systems, vol. 54, January 2016.
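A minimal sketch of the underlying DVFS decision (assuming, as a simplification, that throughput scales linearly with frequency; real governors also weigh switching latency and voltage steps) picks the lowest available frequency that still covers the CPU demand of the current phase:

```python
def choose_frequency(cpu_load, freqs):
    """Pick the lowest frequency that still serves the observed CPU demand.
    cpu_load: fraction of peak CPU needed in this phase (0..1).
    freqs: available frequencies in Hz, in any order."""
    fmax = max(freqs)
    for f in sorted(freqs):
        if f >= cpu_load * fmax:   # linear-scaling assumption
            return f
    return fmax
```

So a CPU-bound phase at 98% load stays at the top frequency, while an I/O-bound phase at 15% load can drop to the lowest step, which is where the paper's saving potential comes from.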
  • 50. Mitigating Stragglers in Hadoop. Performance variation is common in the Cloud • Stragglers can severely increase the execution time • Hadoop launches another copy of the straggler in the hope that it will finish earlier (i.e., speculation). (Diagram: four tasks on four nodes over time; the straggler delays job completion.) T-D. Phan, S. Ibrahim, G. Antoniu, L. Bougé, "On Understanding the Energy Impact of Speculative Execution in Hadoop", the 2015 IEEE International Conference on Green Computing and Communications (GreenCom 2015), Dec. 2015.
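Speculation relies on a straggler-detection heuristic, which can be sketched as follows (simplified: Hadoop's actual detector uses its own progress scores and thresholds, so the names and the 0.5 ratio here are assumptions):

```python
def is_straggler(progress, elapsed, peers, slow_ratio=0.5):
    """Flag a task for speculative re-execution when its progress rate is
    well below the average rate of its peers.
    progress: fraction done (0..1); elapsed: seconds since launch;
    peers: list of (progress, elapsed) for the job's other tasks."""
    if elapsed == 0 or not peers:
        return False
    rate = progress / elapsed
    avg = sum(p / t for p, t in peers if t > 0) / len(peers)
    return rate < slow_ratio * avg
```

The energy question the slide raises follows directly: every task this predicate flags gets a duplicate, so a loose threshold trades shorter makespan for extra busy machines.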
  • 51. Speculation Benefit in a Heterogeneous Environment. Enabling speculation reduces execution time by 47% but energy consumption by only 28% (CloudBurst, Sort and WordCount benchmarks; average power consumption rises by 32% for CloudBurst). The energy reduction is not proportional to the execution-time improvement: it strongly depends on the extra power drawn by the extra resource consumption.
  • 52. New Approaches for Data Management in the Cloud • No "one size fits all" solution • NoSQL key/value data stores (e.g. Bigtable, HBase, Cassandra, HyperTable), graph databases (e.g. Neo4j, Pregel), array data stores (e.g. SciDB), analytical Cloud databases (e.g. Greenplum and Vertica), analytical Cloud frameworks (e.g. Hadoop MapReduce, Cloudera Impala), document databases (e.g. MongoDB, Couchbase), data stream management systems (e.g. Storm) • Wide diversification of data store interfaces and loss of a common programming paradigm • Design of multistore data management systems • Data management in multisite Clouds • Divergence between Cloud and HPC storage infrastructures • New I/O mechanisms to guide I/O systems in order to deliver the best performance • Distributed file systems with Cloud capabilities such as elasticity • Development of a unified architecture for HPC and Cloud storage back-ends
  • 54. New Models for Cloud Application Description • Adaptation to various kinds of hardware resources is mandatory • An interesting approach: distinguish the description of the various possible configurations (application architecture description) from the quality of service sought for a particular execution (minimize cost, maximize performance, respect a deadline, etc.) • Challenges • Description of the structure of the application • Description of the expected behavior • Several issues • Taking data into account • How to model application workflows • New languages for non-functional objectives (budget, performance/deadline, security, data)
  • 55. More Efficient Techniques and Algorithms for Cloud Resource Allocation • Large number of resources to be used by applications • Hardware heterogeneity, including new resources (GPU, FPGA, …) • Difficult for users to choose the most appropriate hardware configuration • Need for performance models of applications • Seamless choice of resources following user demands and resource availability • SLAs put pressure on providers to deliver robust allocations despite the large number of hardware failures • Include a reliability constraint • Use replication to cope with faults and failures • Take dynamicity and elasticity into account • Allocation problems adapted to Cloud constraints (CPU, memory, disks, network, complex topologies) • Design of sophisticated algorithms with guarantees on their reliability • Focus the optimization on impactful jobs • Efficient representation of the search space and theoretical analysis
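The effect of replication on the reliability constraint can be made concrete with a small calculation (a textbook independence model, not a claim about any specific allocator): if each replica fails independently with probability p_fail, the smallest replica count meeting an availability target satisfies 1 - p_fail^k >= target.

```python
def replicas_needed(p_fail, target):
    """Smallest replica count k with 1 - p_fail**k >= target, i.e. the
    probability that at least one replica survives meets the SLA.
    Assumes independent failures -- optimistic for correlated outages."""
    if not 0 < p_fail < 1 or not 0 < target < 1:
        raise ValueError("p_fail and target must be in (0, 1)")
    k = 1
    while 1 - p_fail ** k < target:
        k += 1
    return k
```

For example, with 10% per-replica failure probability, three replicas already push survival probability to 99.9%; an allocator can then fold this k into its placement cost.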
  • 56. A New Approach to Integrating Clouds, IoT, CPS, and Mobile Devices. Cloud systems are now the cornerstones of the Internet ecosystem, allowing any connected device, such as things, smartphones, tablets, set-top boxes and PCs, to store and share information in a seamless way • But • A centralized Internet, increasing impact of failures on Internet users, loss of control over citizens' private data, vendor lock-in from hardware and software providers, massive leaks of sensitive data when Cloud systems are under attack and surveillance by national security agencies • Ideas • More decentralized Cloud infrastructures, i.e. fog computing, taking into account the rapid evolution of very cheap and low-power hardware • Use nano-PCs based on smartphone technologies (ARM-based processors) • Many challenges • Seamless integration of nano-PCs within Cloud infrastructures • New Cloud services combining nano-PCs and data centers • Server-less sharing, security, and privacy
  • 57. Promoting Simulation to Investigate Cloud Concerns • Difficult for users to select the Cloud services that best meet their requirements (in terms of performance, cost, energy, etc.) • Preliminary evaluations with partial deployments on real platforms such as Amazon Web Services or Microsoft Azure • Investigation of new hardware and software mechanisms for Cloud providers in order to stay competitive • Provisioning part of the Cloud to evaluate the benefits of such a change • Use of simulation for these scenarios • Reduced development cost • Control of parameters such as network latency, reliability, scalability, etc. • Development of accurate and versatile simulation frameworks
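The kind of controlled what-if study simulation enables can be illustrated with a toy discrete-event model, written from scratch below (a real framework such as SimGrid supplies validated platform and application models instead of the fixed boot/run delays assumed here):

```python
# Toy discrete-event simulation: each VM is submitted at t=0, boots, then
# runs one task; events are processed in timestamp order from a heap.
import heapq

def simulate_boot_and_run(vms, boot_time, task_time):
    """Returns {vm: completion time}. Varying boot_time or task_time plays
    the role of the 'controlled parameter' a cloud simulator gives you."""
    queue = [(0.0, vm, "submit") for vm in vms]
    heapq.heapify(queue)
    done = {}
    while queue:
        t, vm, kind = heapq.heappop(queue)
        if kind == "submit":
            heapq.heappush(queue, (t + boot_time, vm, "booted"))
        elif kind == "booted":
            heapq.heappush(queue, (t + task_time, vm, "finished"))
        else:
            done[vm] = t
    return done
```

Rerunning this with, say, a doubled boot latency immediately shows its impact on completion time, with no real platform, no deployment cost, and perfect repeatability, which is precisely the argument made above.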
  • 58. SimGrid: Simulator of Distributed Applications • A scientific instrument for the study of large-scale distributed computing • Main features • Versatile: Grid, P2P, HPC, Volunteer Computing, …, Clouds • Valid: accuracy limits studied and pushed further for years • Scalable and fast (despite precise models) • Usable: tooling (generators, runner, visualization); open source, portable, … • Ongoing work • SCHIaaS: simulation of Clouds and hybrid IaaS • Adding virtualization capabilities into SimGrid (VM migration, boot, …) http://infra-songs.gforge.inria.fr simgrid.gforge.inria.fr/
  • 59. GRID'5000 – A Real IaaS for Researchers • Testbed for research on distributed systems • Born from the observation that we need a better and larger testbed • HPC, Grids, P2P systems and, more recently, Cloud computing • Complete access to the nodes' hardware in an exclusive mode (from one node to the whole infrastructure) • Current status • 9 sites, 1,195 machines, 8,184 cores • Diverse technologies/resources (Intel, AMD, Myrinet, InfiniBand, two GPU clusters, energy probes) • Ready-to-use OpenStack distribution • Last significant experiment • Dynamic scheduling of 10K VMs across 4 sites. Adding virtualization capabilities into Grid'5000, INRIA RR-8026, Jul. 2012. https://www.grid5000.fr/
  • 61. Conclusion • Cloud Computing technology is changing every day: new features, new requirements (IaaS++ services) • Many research issues addressed in our research labs should/will be transferred into tomorrow's cloud infrastructures • Connecting "classical" Cloud infrastructures to next-generation platforms (IoT) • Distributed Cloud Computing is happening! • Distributed CC workshops (UCC 2013, SIGCOMM 2014/2015), Fog Computing workshop (co-located with IEEE ICC 2013), IEEE CloudNet, … • How should developers design new applications to benefit from such geographically distributed infrastructures?
  • 62. References • European Commission report on The Future of Cloud Computing: http://cordis.europa.eu/fp7/ict/ssai/docs/cloud-report-final.pdf • A Roadmap for Advanced Cloud Technologies under H2020, European Commission, Recommendations by the Cloud Expert Group, Digital Agenda for Europe, Dec. 2012 • Report on the public consultation for H2020 Work Programme 2016-17: Cloud Computing and Software: ec.europa.eu/information_society/newsroom/cf/dae/document.cfm?doc_id=8161 • Rafael Moreno-Vozmediano, Ruben S. Montero, and Ignacio M. Llorente, Key Challenges in Cloud Computing: Enabling the Future Internet of Services, IEEE Internet Computing, Jul. 2013 • NIST Cloud Strategy and Innovation Blog (I. Llorente): http://blog.cloudplan.org/ • Above the Clouds: A Berkeley View of Cloud Computing: http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-28.html • DRAFT Cloud Computing Synopsis and Recommendations, NIST: http://csrc.nist.gov/publications/drafts/800-146/Draft-NIST-SP800-146.pdf • SIENA European Roadmap on Grid and Cloud Standards for e-Science and Beyond: http://www.sienainitiative.eu/Repository/FileScaricati/8ee3587a-f255-4e5c-aed4-9c2dc7b626f6.pdf • The Magellan Report on Cloud Computing for Science, Yelick et al., Dec. 2011: http://science.energy.gov/~/media/ascr/pdf/program-documents/docs/Magellan_final_report.pdf • Livre blanc sur le calcul intensif (white paper on high-performance computing), Comité d'orientation pour le calcul intensif (Cocin) du CNRS, 2012: http://www.cnrs.fr/ins2i/IMG/pdf/Livre_blanc_-_derniere_version.pdf • Synergistic Challenges in Data-Intensive Science and Exascale Computing, DOE ASCAC Data Subcommittee Report, March 2013 • A. Botta, W. de Donato, V. Persico, A. Pescapé, Integration of Cloud Computing and Internet of Things: A Survey, Future Generation Computer Systems, 56 (2016)
  • 63. QUESTIONS?

Editor's Notes

  1. 6% of the total energy of the world: 122 gigawatts (1 nuclear plant = 1 gigawatt; devices alone = 7 NYCs)
  2. EV = Execution Vector
  3. Cloud Radio Access Networks (C-RAN)
  4. UC platforms should be tightly coupled with any facilities available through the Internet, starting from the core routers of the backbone, the different network access points, and any small and medium-size computing infrastructures that may be provisioned by Internet Service Providers (ISPs), governments and academic institutions. The definition of a complete distributed system in charge of turning a complex and diverse network of resources into a collection of abstracted computing facilities that is both easy to operate and reliable. Note the pizza example: no interest from the restaurant's point of view, but interesting in terms of performance impact (latency) + energy footprint.
  5. Same as note 4.
  6. Two main approaches. Flat: distributing the DB thanks to Galera (active replication). Hierarchical: Cells, where (as in the right-hand figure) a "top cell" (API cell) exposes the API and then distributes the workload over compute cells; as the top cell is not distributed, it exposes the infrastructure to a SPOF. Cascading OpenStack: a recent solution developed by engineers from Huawei, in which a top OpenStack infrastructure exposes the OpenStack API and distributes the workload to child OpenStack infrastructures. We are interested in the first approach; however, we do not like the "active replication" part of the solution when scaling to hundreds of sites.
  7. Add logos.
  8. To validate the benefits of the BlobSeer approach, we experimented with it. Objective: a MapReduce platform optimized for clouds and hybrid architectures (massively concurrent data accesses, fault tolerance, scheduling). Role of BlobSeer: storage of application data; storage of virtual machine images (multi-deployment, multi-snapshotting). ----- Meeting notes (01/10/12 11:58) ----- Q: it is unclear which challenges have been met and which remain. Split into two + a MapReduce figure.
  9. Makespan. We first confirm that, at small scale, a decentralized approach actually adds overhead to the computation; hence centralized solutions are best for smaller settings, regardless of the workflow layout. Overall, we assert that our decentralized solutions fit better complex workflow execution environments, notably metadata-intensive applications, where we achieved a 15% gain in a near-pipeline workflow (BuzzFlow) and 28% in a parallel, geo-distributed application (Montage) compared to the centralized baseline. With tasks taking long enough to process large files, the agent has sufficient time to synchronize the registry instances and to provide consistency guarantees that enable easy reasoning about concurrency at the application level. We noticed that workflow execution engines schedule sequential jobs with tight data dependencies in the same site so as to prevent unnecessary data movements. With our approach, when two consecutive tasks are scheduled in the same data center, the metadata is available locally.
  10. CDF = Cumulative Distribution Function
  11. Mix the two aforementioned scenarios to consider both the interests of the Cloud operator and those of the users, calling for generic tools allowing global studies.
  12. SimGrid Cloud Broker (simulate the cost of using Amazon; simulate the performance you might expect for your hybrid cloud). Provide sound models for live migration on highly consolidated clusters. To provide such models we should first understand, and second model, cloud platforms as well as application behaviors.
  13. AL: Where is the big data part?