Challenges and Issues of Next
Cloud Computing Platforms
Frédéric Desprez
Frederic.Desprez@inria.fr
Labex UCN@Sophia – Feb. 18th 2016
Labex UCN@Sophia – F. Desprez
Acknowledgements
Gabriel Antoniu Inria (Rennes, Kerdata)
Olivier Beaumont Inria (Bordeaux, CEPAGE)
Alexandru Costan Inria (Rennes, Kerdata)
Thierry Coupaye Orange Labs Grenoble
Paulo Goncalvez Inria (Lyon, Dante)
Shadi Ibrahim Inria (Rennes, Kerdata)
Kate Keahey Argonne National Lab
Cristian Klein Umeå University, Sweden
Adrien Lèbre Inria and Ecole des Mines de Nantes (Ascola)
Laurent Lefèvre Inria (Lyon, Avalon)
Ignacio Llorente Complutense University of Madrid, Spain
Christine Morin Inria (Rennes, Myriads)
Martin Quinson ENS (Rennes, Myriads)
David Margery Inria (Rennes, Myriads)
Anne-Cécile Orgerie CNRS (Rennes, Myriads)
Manish Parashar Rutgers University
Christian Perez Inria (Lyon, Avalon)
Thierry Priol Inria (Rennes, Myriads)
Jonathan Rouzaud-Cornabas Insa (Lyon, Beagle)
Frédéric Suter CNRS/IN2P3 (Lyon, Avalon)
Patrick Valduriez Inria (Montpellier, Zenith)
Rich Wolsky University of California Santa Barbara, USA
Outline
• Introduction and Context
• Energy Issues
• Distributed Clouds
• Big Data
• Other issues
• Conclusions
INTRODUCTION AND CONTEXT
Context
Cloud computing has emerged as a “new” paradigm for many commercial
and scientific venues
• Starting to be widely adopted by industry
• Many platforms and infrastructures available around the world
• Several offers for IaaS, PaaS, and SaaS platforms
• Public, private, community, and hybrid clouds
… But many applications remain that could benefit from such platforms
Several issues still need to be (better) addressed
• Elasticity, availability, self-configuration, heterogeneous computing and storage
capacities
• Several challenges remain to be addressed and transferred into industrial
products
• Energy management
• New applications (IoT)
Clouds Essential Characteristics
• On-demand service
 No human interaction needed to get access to storage and computation resources (Utility
Computing)
• Access through large scale networks
 Access to resources through networks from lightweight and heavy-weight clients (WAN, LAN,
Wireless)
• Resource Pooling
 Resources (CPU, storage, memory, network) are drawn from datacenters with (almost) no
notion of locality
• Elasticity
 Resources can be allocated and freed in an elastic fashion based on the application needs (with an
"infinite" capacity)
• Measured service
 Possibility to monitor resource usage
• Pros
 Availability and extensibility
 Dynamicity
 Fault tolerance
 Resource mutualization
• Cons
 Heterogeneity
 No locality
 Application porting
 Security?
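Elasticity and measured service together form a control loop: monitor resource usage, then grow or shrink the allocation. A toy threshold-based rule illustrates the idea (the thresholds and limits are invented for illustration, not from the talk):

```python
def autoscale(current_vms, cpu_utilization, low=0.3, high=0.8,
              min_vms=1, max_vms=100):
    """Toy elasticity rule: scale out when measured utilization is high,
    scale in when it is low (purely illustrative thresholds)."""
    if cpu_utilization > high and current_vms < max_vms:
        return current_vms + 1          # scale out
    if cpu_utilization < low and current_vms > min_vms:
        return current_vms - 1          # scale in
    return current_vms                  # steady state

print(autoscale(4, 0.9))  # 5
print(autoscale(4, 0.1))  # 3
```

Real autoscalers add cooldown periods and predictive models, but the measured-service/elasticity coupling is the same.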
Transparency is the Key
“I don't care if my cloud computing architecture is powered by a grid, a mainframe, my
neighbour's desktop or an army of monkeys, so long as it's fast, cheap and secure.”
Sam Johnston, Sept. 2008
Research Issues
• Explosion in the amount of research work around Clouds and virtualization!
• Some research challenges
• Energy
• Service composition
• Service Level Agreement (SLA)
• Security
• Fault tolerance and recovery
• Infrastructure management
• Elastic management of resources
• (Big) Data management
• Seamless access to hybrid platforms
• Multi-clouds, Sky computing, federations, infrastructure distribution, edge computing
• New models
• economic, energy
• Application design and description
• New languages, new models
• Simulation and experimentation
• ...
ENERGY ISSUES
Laurent Lefèvre’s team in Avalon (LIP/ENS Lyon & INRIA)
Electrical consumption of ICT, 2013 (source: gwatt.net)
• Devices
• Telecommunication networks = 83 GW
Improving Energy Efficiency of Cloud Infrastructures
• Understanding the energy usage of large scale systems mixing virtual instances
of applications, physical IT resources, and physical infrastructures remains a
real challenge.
• How to profile the energy consumption of large sets of virtual machines (generic metrics,
benchmarks, and energy models)
• Analysis tools and frameworks to support large-scale, energy-efficient management of
resources
• Optimizing the energy consumption of distributed infrastructures and service
compositions in the presence of ever more dynamic service applications
• Use of renewable energies
• Exploring the trade-off between energy savings and performance in large-scale
distributed systems
• Energy efficiency of storage systems and networks
Energy Efficiency: with or without knowledge of applications
and services?
• Exploring 2 different approaches
• With knowledge on the application and services
• Enable the user to choose the least energy-consuming implementation of services
 Estimate the energy consumption of the different implementations
(protocols) of each service
• Without knowledge
• Allow some intelligence to reduce the energy usage
 Autonomically estimate the energy consumption of the HPC system in
order to apply green levers
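The first approach boils down to ranking candidate implementations by their estimated energy and picking the cheapest. A minimal sketch, where the protocol names and joule figures are invented placeholders (a real estimator would come from calibration, as in ECOFIT):

```python
def pick_greenest(estimates):
    """Return the service implementation with the lowest estimated
    energy consumption (joules). Illustrative only."""
    return min(estimates, key=estimates.get)

# Hypothetical per-protocol energy estimates for one resilience service
checkpoint_protocols = {
    "coordinated": 1200.0,
    "uncoordinated": 950.0,
    "hierarchical": 1100.0,
}
print(pick_greenest(checkpoint_protocols))  # uncoordinated
```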
Improving EE with application expertise
• Considered services: resilience & data broadcasting
• 4 steps
• Service analysis, Measurements, Calibration, Estimation
• Helping users make the right choices depending on context and
parameters
M. Diouri, O. Glück, L. Lefèvre, and Franck Cappello. "ECOFIT: A Framework to Estimate Energy Consumption of Fault Tolerance Protocols during HPC executions",
CCGrid2013, the 13th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, Delft, the Netherlands, May 13-16, 2013
Without knowledge of applications and services?
• HPC applications keep growing in complexity
• HPC applications already contain too many bugs; adding energy management
concerns won’t help 😀
• Are HPC programmers ready for eco-design of applications?
• Applications can share the same infrastructure
• Optimizations made for saving energy considering some applications are likely to
impact the performance of others
• Instead of looking at applications and services ⇒ focus on the
infrastructure
• Detect and characterize system’s runtime behaviors/phases
• Optimize each subsystem (storage, memory, interconnect, CPU) accordingly
• Helping users to find the best service
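A minimal sketch of this infrastructure-side idea: segment a resource-usage trace into phases and attach a power-saving lever to each phase type. The threshold, phase labels, and lever names are illustrative, not from the cited framework:

```python
def detect_phases(cpu_trace, threshold=0.5):
    """Split a CPU-load trace into (label, length) runs:
    'compute' when load >= threshold, 'io' otherwise."""
    phases = []
    for load in cpu_trace:
        label = "compute" if load >= threshold else "io"
        if phases and phases[-1][0] == label:
            phases[-1] = (label, phases[-1][1] + 1)   # extend current phase
        else:
            phases.append((label, 1))                 # start a new phase
    return phases

def lever_for(label):
    # Illustrative mapping from phase type to a green lever
    return "max_freq" if label == "compute" else "scale_down_cpu"

trace = [0.9, 0.95, 0.2, 0.1, 0.8]
print(detect_phases(trace))  # [('compute', 2), ('io', 2), ('compute', 1)]
```

Real phase detection works online on several subsystems (memory, disk, network) at once; this shows only the characterization step.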
Without knowledge on applications
Ghislain Landry Tsafack Chetsa, Laurent Lefevre, Jean-Marc Pierson, Patricia Stolf,
Georges Da-Costa. "Application-Agnostic Framework for Improving the Energy
Efficiency of Multiple HPC Subsystems", PDP2015 : 23rd Euromicro International
Conference on Parallel, Distributed and Network-based Processing, 2015.
• Irregular usage of resources
• Phase detection, characterization
• Power saving modes deployment
Towards Energy Proportionality with Heterogeneous
machines
OBSERVATIONS [Barroso and Hölzle 2007]
Average server utilization between 10 and 50 %
→ Most inefficient region
No proportionality due to high idle consumption
→ Can be up to 50 % of peak power
PROPOSITION
Heterogeneous Infrastructure composed of machines with different characteristics in terms of performance
and energy consumption
• Classical servers → Only used at their most energy efficient region
• Low power processors → Reduce static costs
TECHNICAL CHALLENGES
- Application placement: Dynamically find the most suitable combinations of machines
- Infrastructure reconfiguration: Power On/Off machines at the right time
[Barroso and Hölzle, The Case for Energy Proportional Computing, IEEE Computer, 2007]
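The application-placement challenge above can be caricatured as a search for the machine combination that covers the load at minimum power. The capacity and power numbers below are invented for illustration; the real scheduler must also decide when to power machines on and off:

```python
from itertools import product

# Hypothetical machine profiles: (capacity in req/s, power in watts)
MACHINES = {"BIG": (100, 200.0), "MEDIUM": (40, 100.0), "LITTLE": (10, 30.0)}

def cheapest_combination(load, max_each=5):
    """Brute-force the BIG/MEDIUM/LITTLE counts that cover `load`
    with the lowest total power (toy version of the placement problem)."""
    names = list(MACHINES)
    best = None
    for counts in product(range(max_each + 1), repeat=len(names)):
        cap = sum(n * MACHINES[m][0] for n, m in zip(counts, names))
        if cap < load:
            continue  # this combination cannot serve the load
        power = sum(n * MACHINES[m][1] for n, m in zip(counts, names))
        if best is None or power < best[0]:
            best = (power, dict(zip(names, counts)))
    return best

print(cheapest_combination(50))  # (130.0, {'BIG': 0, 'MEDIUM': 1, 'LITTLE': 1})
```

At low load the small machines win; as the load grows, the BIG servers (used in their efficient region) take over, which is exactly the energy-proportionality argument.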
Towards Energy Proportionality with Heterogeneous
machines
V. Villebonnet, G. Da Costa, L. Lefèvre, J.-M. Pierson, P. Stolf, "Big, Medium, Little: Reaching Energy Proportionality with Heterogeneous Computing Scheduler",
Parallel Processing Letters, 25(03), World Scientific Publishing, 2015.
Towards Energy Proportionality with Heterogeneous machines
Application: Stateless Web Servers
Traces: one day of accesses to the 1998 World Cup website
BIG only
Joules per request: 0.2268
Infrastructure utilization: 40.7%
Number of reconfigurations: 4
BML combination
Joules per request: 0.2155
Infrastructure utilization: 69.7%
Number of reconfigurations: 194
⇒ Infrastructure is dynamically reconfigured
to meet the load demand of the application
→ Energy consumption more proportional to
the load
Virtual Machines and Energy efficient Clouds
Taking into account the energy consumption in the
scheduling process
• Energy and resource usage are highly fluctuating
• Large disparities between similar nodes
→ Decisions need to be proactive, based on recent and
historical activity
How to efficiently assign those tasks?
Combine
• A metric to balance performance and energy
consumption
• An interface to express tradeoffs between
users’ and providers’ requirements
• A manager of energy-related events
Results
• Up to 20% energy savings in real-life
experiments
Daniel Balouek-Thomert, Eddy Caron, Laurent Lefevre, "Energy-Aware Server Provisioning by Introducing Middleware-Level Dynamic Green Scheduling", HPPAC
2015: The 11th Workshop on High-Performance, Power-Aware Computing, May 2015
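The performance/energy balancing metric can be illustrated by a simple weighted score, where the weight encodes the user/provider tradeoff expressed through the interface. Hostnames and measurement values are made up; the paper's actual metric is richer:

```python
def score(perf, energy, alpha=0.5):
    """Weighted tradeoff: normalized performance (higher is better)
    against normalized energy (lower is better). Illustrative only."""
    return alpha * perf - (1 - alpha) * energy

# Candidate hosts with (normalized perf, normalized energy) measurements
hosts = {"node-a": (0.9, 0.8), "node-b": (0.6, 0.3), "node-c": (0.7, 0.5)}

def pick_host(hosts, alpha):
    """Assign the task to the host maximizing the tradeoff score."""
    return max(hosts, key=lambda h: score(*hosts[h], alpha))

print(pick_host(hosts, alpha=0.9))  # node-a: performance dominates
print(pick_host(hosts, alpha=0.1))  # node-b: energy dominates
```

Because energy and usage fluctuate, `alpha` and the measurements would be refreshed from recent monitoring data before each decision.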
Virtual Machines and Energy efficient Clouds
Combining energy with other criteria and constraints for a given problem
• Large spectrum of potential solutions
• NP-Hard problem
Daniel Balouek-Thomert, Arya K. Bhattacharya, Eddy Caron, Karunakar Gadireddy, Laurent Lefèvre, Minimizing energy and makespan concurrently in Cloud
Computing workloads using Multi-Objective Differential Evolution, under review
Genetic Approach
• A model to capture affinities
between tasks and resources
• An algorithm that mimics the
“survival of the fittest”: only
efficient servers are used over
time
• A learning engine that integrates
constraints
Strategies need to be validated in terms of correctness and computation
time
Network is Part of the Story: Dynamic, Energy Efficient,
Network Reconfiguration
Van Heddeghem et al. “Power Consumption Modeling in Optical Multilayer
Networks” PNET 24 (2), 86–102, 2012
Carpa R., Gluck O., Lefevre L. and Mignot J.-C., "Improving the energy
efficiency of software-defined backbone networks", Photonic Network
Communications, vol. 30(3), p. 337-347, 2015.
Network energy consumption
40 GW in 2013 (source: gwatt.net)
A lot of improvement possible during off-peak hours
Especially in core networks
Re-route to improve the energy efficiency
Consumption reduced by up to 39 %
Hassidim, A. et al., "Network utilization: the flow view", INFOCOM 2013, IEEE, 1429–1437, 2013
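The off-peak opportunity reduces to finding lightly used links whose traffic can be re-routed so the links can sleep. The sketch below performs only the candidate-selection step and ignores the re-routing feasibility check that a real SDN controller must do; link names, capacities, and the threshold are invented:

```python
def links_to_sleep(link_load, capacity, threshold=0.1):
    """Off-peak heuristic: links whose utilization is below `threshold`
    are candidates for sleeping once their traffic is re-routed."""
    return sorted(l for l, load in link_load.items()
                  if load / capacity[l] < threshold)

loads = {"paris-lyon": 2.0, "lyon-nice": 0.3, "paris-nice": 0.1}  # Gb/s
caps  = {"paris-lyon": 10.0, "lyon-nice": 10.0, "paris-nice": 10.0}
print(links_to_sleep(loads, caps))  # ['lyon-nice', 'paris-nice']
```

The hard part, reflected in the 39% result above, is doing this dynamically while preserving quality of service when the load surges back.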
Energy Efficient Networks
Carpa R., Assuncao M., Gluck O., Lefevre L. and Mignot J.-C., "Responsive Algorithms
for Handling Load Surges and Switching Links On in Green Networks", submitted to
ICC 2016
Simulations of high-speed core networks
• Rerouting in less than a second
• Improved energy efficiency compared to related
work (12 %)
• Same quality of service
NetFPGA + Openflow testbed (Work in progress)
• Targeting access networks
• Few, frequently changing, flows
• Cross-layer L3 / L4 optimizations for stability
DISTRIBUTED CLOUDS
Adrien Lebre’s team in ASCOLA (LINA/EMN Nantes & INRIA)
The Current Situation
• Large offshore DCs
• To cope with increasing UC demand while handling energy concerns
• But
• Jurisdiction concerns (data locality)
• Reliability
• Network overhead
• Locality is a key element in delivering efficient as well as sustainable
Utility Computing solutions
credits: coloandcloud.com
The Cloud from End-Users
[Figure: end-users (Alice, Bob, Charles, Dan, Duke, Paula, Rob, Sam) all see a single cloud]
The Cloud in Reality
[Figure: the same end-users actually reach the cloud through the Internet backbone]
Cloud Evolution
Not only mega data centres !
Courtesy of Thierry Coupaye (Orange)
Trends for Next Generation Clouds
Centralized public clouds are in fact generally distributed over multiple
(mega) data centres for availability reasons
[Maps of distributed data centre locations: Verizon, Orange, Microsoft, Amazon]
Courtesy of Thierry Coupaye (Orange)
Trends for Next Generation Clouds
Hybrid and community clouds are by nature distributed over multiple data
centres/clouds
Courtesy of Thierry Coupaye (Orange)
Trends for Next Generation Clouds
Networks are getting “softwarized” and are converging with a distributed
vision of cloud computing.
3 examples
 Virtual CDN (vCDN)
 Cloud RAN (C-RAN)
 Mobile Edge Computing (MEC)
Courtesy of Thierry Coupaye (Orange)
The DISCOVERY Proposal
• DIStributed and COoperative framework to manage Virtual EnviRonments
autonomously
• Locality-based Utility Computing platform (“LUC-OS”)
• A fully distributed IaaS system, not a distributed system of IaaS systems
• We want to/must go further than high level cloud APIs (cross-cutting concerns such as
energy/security)
• Leverage P2P algorithms and self-* approaches
• Lots of scientific/technical challenges
• Cost of the network?
• Partial view of the system?
• Impact on the other VMs?
• Management of VM images?
• How to take locality aspects into account?
• Which software abstractions to make the development easier and more reliable
(distributed event programming)? …
Lèbre, A., Pastor, J., Bertier, M., Desprez, F., Rouzaud-Cornabas, J., Tedeschi, C., Anedda, P., Zanetti, G., Nou, R., Cortes, T., Riviere, E. and Ropars, T., Beyond The
Cloud, How Should Next Generation Utility Computing Infrastructures Be Designed?, INRIA Research Report 8348, Aug. 2013.
The DISCOVERY Initiative
[Figure: end-users (Alice, Bob, Charles, Dan, Duke, Paula, Rob, Sam, Tom) connected through a cooperative DISCOVERY network instead of a central cloud]
Beyond the Clouds, the DISCOVERY Initiative
Locality-based UC infrastructures / Fog / Edge
A promising way to deliver highly efficient and sustainable UC services is to provide UC platforms as close
as possible to the end-users.
http://www.renater.fr/raccourci?lang=fr
Beyond the Clouds, the DISCOVERY Initiative
• Leveraging network backbones
• Extend any point of presence of network backbones with UC servers (from network hubs
up to major DSLAMs that are operated by telecom companies and network institutions)
• Leveraging wireless backbones
[Figure: UC servers deployed at points of presence of the core backbone, forming the DISCOVERY network close to end-users]
Would OpenStack be the solution?
• Do not reinvent the wheel …
• OpenStack
• Open source IaaS manager with a large community
• Composed of several services dedicated to each aspect of a cloud
Distributing OpenStack
• Services collaborate through
• A messaging queue
• A SQL database
• Alternative solutions exist for storing state over a highly distributed infrastructure ⇒ NoSQL DB
• Few proposals to federate/operate distinct OpenStack DCs
• ‘Flat’ approaches vs. hierarchical approaches
http://beyondtheclouds.github.io/dcc.html
ROME
• Relational Object Mapping Extension for key/value stores
• Jonathan Pastor’s PhD
• Enables querying a key/value store DB with the same interface as SQLAlchemy
• Enables OpenStack Nova to switch to a KVS without being too intrusive
• The KVS is clustered on controllers
• Compute nodes connect to the key/value cluster
[Figure: Nova services (Network, Compute, Scheduler, Conductor) access, through db.api, either the relational MySQL DB or a non-relational key/value DB]
https://github.com/badock/rome
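To give the flavour of the approach, here is a toy SQLAlchemy-style filter over a plain dictionary standing in for the key/value store. This is a hypothetical sketch, not ROME's actual API; class and key names are invented:

```python
class KVQuery:
    """Toy SQLAlchemy-flavoured query over a key/value store, in the
    spirit of ROME (NOT ROME's real interface, just a sketch)."""
    def __init__(self, kvs, table):
        # Rows of a "table" are the values whose keys share its prefix
        self.rows = [v for k, v in kvs.items() if k.startswith(table + "/")]

    def filter_by(self, **kwargs):
        self.rows = [r for r in self.rows
                     if all(r.get(f) == v for f, v in kwargs.items())]
        return self

    def all(self):
        return self.rows

kvs = {
    "instances/1": {"id": 1, "host": "node-a", "state": "active"},
    "instances/2": {"id": 2, "host": "node-b", "state": "error"},
}
active = KVQuery(kvs, "instances").filter_by(state="active").all()
print([r["id"] for r in active])  # [1]
```

Keeping the `filter_by(...).all()` shape is what lets Nova's db.api layer switch back-ends without rewriting the callers.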
The DISCOVERY Initiative: Pros and Cons
• Pros
• Locality (jurisdiction concerns, latency-aware apps, minimize network overhead)
• Reliability/redundancy (no critical point/location/center)
• The infrastructure is naturally distributed throughout multiple areas
• Lead time to delivery
• Leverage current PoPs and extend them according to UC demands
• Energy footprint (on-going investigations with RENATER)
• Bring back part of the revenue to NRENs/Telcos
• Cons
• Security concerns (in terms of who can access the PoPs)
• Operating a full IaaS in a unified but distributed manner at WAN level
• Not suited for all kinds of applications: large tightly coupled HPC workloads (50
nodes/1,000 cores, or 200 nodes/4,000 cores, i.e. 5 racks); 1,000 nodes in one PoP does
not look realistic…
• Peering agreement / economic model between network operators
http://beyondtheclouds.github.io/
BIG DATA
Gabriel Antoniu’s team KERDATA (IRISA & INRIA)
Data Processing, Big Data
• Huge amount of data to be moved and processed
• LHC, simulations, genomics, astrophysics, social networks, sensors, …
• Heterogeneity in their storage (DB, files, …) and processing (cleaning, transformation,
analysis, search, indexing, visualization, ...)
• Challenges
• Resources issues
• Fault tolerance and recovery, energy management
• Handling complex distributed workflows at a large scale (computation and data transfers and
replications)
• Resource management (computation, storage, network), solutions interoperability
• Describing these workflows
• Meta-data management
• Data provenance
• Which transformations were applied
• Programming next generation applications
• Which language for which application
• Strong relations with resource management systems
• Performance and transparency
• Genericity
Sakr, S., Liu, A., Batista, D.M., Alomari, M., A Survey of Large Scale Data Management Approaches in Cloud Environments, IEEE Communications Surveys and
Tutorials, 2011.
Middleton A.M., Data-Intensive Technologies for Cloud Computing, Handbook of Cloud Computing, Springer, 83-135, 2010.
http://research.microsoft.com/en-us/collaboration/fourthparadigm/
Beyond Hadoop: BlobSeer
Scalable Storage for Data-Intensive Analytics
Started in 2008, 6 PhDs (Gilles Kahn/SPECIF PhD Thesis Award in 2011)
Main goal: storage optimized for highly concurrent accesses
Three key ideas
- Decentralized metadata management
- Lock-free concurrent writes (enabled by versioning)
- Data and metadata “patching” rather than updating
A back-end for higher-level data management systems
- Highly scalable distributed file systems
- Storage for cloud services
Approach
- Design and implementation of distributed algorithms
- Experiments on the Grid’5000 testbed
- Validation with “real” apps on “real” platforms: IBM clouds, Microsoft Azure, OpenNebula
- Results on Grid’5000: BlobSeer improves Hadoop by 35% (execution time)
http://blobseer.gforge.inria.fr/
B. Nicolae, G. Antoniu, L. Bougé, D. Moise, A. Carpen-Amarie. “BlobSeer: Next Generation Data Management for Large Scale Infrastructures”, in: Journal of
Parallel and Distributed Computing, February 2011, vol. 71, no 2, pp. 169-184.
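The lock-free-writes-through-versioning idea can be sketched in a few lines: a write never locks or overwrites, it publishes a new immutable snapshot, so concurrent readers always see a consistent version. This is a drastic simplification of BlobSeer (which also decentralizes metadata and stripes data across nodes):

```python
class VersionedBlob:
    """Toy versioned blob: each write publishes a new immutable
    version instead of mutating in place (BlobSeer's core idea,
    greatly simplified and single-node)."""
    def __init__(self):
        self.versions = [b""]          # version 0: empty blob

    def write(self, offset, data):
        old = self.versions[-1]
        # Build the new snapshot; pad with zero bytes if writing past the end
        new = old[:offset].ljust(offset, b"\0") + data + old[offset + len(data):]
        self.versions.append(new)      # publish atomically
        return len(self.versions) - 1  # new version number

    def read(self, version=None):
        """Readers pin a version; None means the latest published one."""
        return self.versions[-1 if version is None else version]

blob = VersionedBlob()
v1 = blob.write(0, b"hello")
v2 = blob.write(0, b"HELLO")
print(blob.read(v1), blob.read(v2))  # b'hello' b'HELLO'
```

A reader holding `v1` is never disturbed by the concurrent write that produced `v2`, which is why no locks are needed.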
BlobSeer on Commercial Clouds
The A-Brain Microsoft Research – Inria Project
[Figure: linking brain images (X, p ~ 10^6 features) to genetic data (Y, q ~ 10^5-6 features) over N ~ 2,000 subjects]
– Anatomical MRI
– Functional MRI
– Diffusion MRI
– DNA array (SNP/CNV)
– gene expression data
– others...
• TomusBlobs storage (based on BlobSeer)
• Processing approach: MapReduce
• Gain over Azure Blobs: 45%
• Scalability: 1,000 cores
http://www.msr-inria.fr/projects/a-brain/
• KerData, PARIETAL teams at INRIA
• European Microsoft Innovation Center (Aachen)
Executing the A-Brain Application at Large-Scale
• The TomusBlobs data-storage layer developed within the A-Brain project was
demonstrated to scale up to 1,000 cores on 3 Azure data centers (from EU, US)
• Gain compared to Azure BLOBs: close to 50%
• Experiment duration: ~ 14 days
• More than 210,000 hours of computation used
• Cost of the experiments: 20,000 euros (VM price, storage, outbound traffic)
• 28,000 map jobs (each lasting about 2 hours) and ~600 reduce jobs
Scientific Discovery:
Provided the first statistical
evidence of the heritability of
functional signals in a failed stop
task in basal ganglia
B. Da Mota, R. Tudoran, A. Costan, G. Varoquaux, G. Brasche, P. J. Conrod, H. Lemaitre, T. Paus, M. Rietschel, V. Frouin, J.-B. Poline, G. Antoniu, B. Thirion. Machine
Learning Patterns for Neuroimaging-Genetic Studies in the Cloud, in: Frontiers in Neuroinformatics, vol. 8 , April 2014.
Going Further: Managing Metadata for Geo-Distributed Workflows
The Z-CloudFlow Microsoft Research – Inria Project
• Multisite cloud = a cloud with multiple data
centers
• Each with its own cluster, data and programs
• Matches well the requirements of scientific apps
• Goal
• Investigate approaches to metadata
management integrated with workflow
execution engine to support multi-site
scheduling
Four Strategies
• Centralized
• Baseline
• Replicated
• Local metadata accesses
• Synchronization agent
• Decentralized non-replicated
• Metadata scattered across sites
• DHT-based
• Decentralized replicated
• Metadata stored locally and replicated to a remote location (using hashing)
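The two decentralized strategies can be sketched with hashing of the file path; the site names and the next-site replica rule below are illustrative, not the paper's exact scheme:

```python
import hashlib

SITES = ["rennes", "lyon", "nantes", "sophia"]

def home_site(path, sites=SITES):
    """Decentralized non-replicated strategy: hash the file path to
    pick the single site holding its metadata (DHT-style)."""
    h = int(hashlib.sha256(path.encode()).hexdigest(), 16)
    return sites[h % len(sites)]

def replica_sites(path, sites=SITES):
    """Decentralized replicated strategy: a primary copy plus one
    remote replica chosen deterministically (illustrative rule)."""
    primary = home_site(path, sites)
    backup = sites[(sites.index(primary) + 1) % len(sites)]
    return primary, backup

p, b = replica_sites("/workflow/task42/out.dat")
print(p != b)  # True: metadata lives on two distinct sites
```

Any site can locate any file's metadata by recomputing the hash, which removes the centralized server from the critical path.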
Matching strategies to workflows
• Centralized
• Small scale
• Replicated
• Intensive computations
• Large files
• Decentralized approaches
• A large number of small files
• Non-replicated
• Parallel jobs
• Replicated
• For sequential, tightly
dependent jobs, data
available locally
L. Pineda-Morales, A. Costan, G. Antoniu. "Towards Multi-site Metadata Management for Geographically Distributed Cloud Workflows", in: CLUSTER 2015 – IEEE
International Conference on Cluster Computing, Chicago, United States, September 2015.
Failure-Aware Scheduling in Hadoop
In large-scale clouds, node failures are inevitable
• 1,000 machine failures in the 1st year of a Google cluster*
• 10%-15% job failure rate in CMU clusters
Failure recovery in Hadoop
• Hadoop re-executes the tasks of failed machines
• Waits an uncertain amount of time for a free slot
• Ignores the data locality of recovery tasks
*J. Dean, “Large-scale distributed systems at Google: Current systems and future directions" in keynote speech at the 3rd ACM SIGOPS International Workshop on
Large Scale Distributed Systems and Middleware, 2009.
Chronos: a Failure-aware scheduler
• Takes early actions upon failures
• Employs a work-conserving preemption technique
• Considers local execution of recovery tasks
• Independent of scheduling policy and increases performance (10-20%) over
state-of-the-art Hadoop schedulers
• It reduces the waiting time of recovery tasks from 46 seconds to 1.5 seconds
on average
O. Yildiz, S. Ibrahim, T.A. Phuong, G. Antoniu. “Chronos: Failure-aware scheduling in shared Hadoop clusters”, The 2015 IEEE International Conference on Big
Data (BigData 2015), Nov 2015.
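The early-action idea can be sketched as follows: instead of waiting for a free slot, a recovery task preempts the lowest-priority running task under a pluggable policy, and the victim is paused rather than killed (work-conserving). All names are hypothetical; this is not Chronos code:

```python
def schedule_recovery(running, recovery_task, policy_rank):
    """If some running task ranks below the recovery task under the
    pluggable `policy_rank`, preempt it (pause, not kill) and start
    the recovery task immediately. Simplified illustration."""
    victim = min(running, key=policy_rank)
    if policy_rank(victim) < policy_rank(recovery_task):
        running.remove(victim)
        running.append(recovery_task)
        return victim                  # paused, to be resumed later
    return None                        # recovery task must wait its turn

rank = {"t-low": 1, "t-high": 9, "recovery": 5}
running = ["t-low", "t-high"]
paused = schedule_recovery(running, "recovery", rank.get)
print(paused, running)  # t-low ['t-high', 'recovery']
```

Keeping the policy as a parameter mirrors Chronos's independence from the underlying scheduling policy (FIFO, Fair, etc.).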
Explore the impact of DVFS in Hadoop clusters
There is significant potential for energy savings by scaling down the CPU frequency
when peak CPU is not needed
• Diversity of MapReduce applications: for one application, CPU load is high (98%)
during almost 75% of the job’s runtime; for another, CPU load is high (80%) during
only 15% of it
• Multiple phases within a MapReduce application: disk I/O, CPU, disk I/O, network
S. Ibrahim, T-D Phan, A. Carpen-Amarie, H-E. Chihoub, D. Moise, G. Antoniu, “Governing Energy Consumption in Hadoop through CPU Frequency Scaling: An
Analysis”, Future Generation Computer Systems, Volume 54, January 2016
Mitigating Stragglers in Hadoop
Performance variation is common in the Cloud
• Stragglers can severely increase the execution time
• Hadoop launches another copy of the straggler hoping it will finish
earlier (i.e., speculation)
[Figure: four tasks on four nodes over time; one task lags behind the others as a straggler]
T-D Phan, S. Ibrahim, G. Antoniu, L. Bougé, "On Understanding the Energy Impact of Speculative Execution in Hadoop", The 2015 IEEE International Conference on Green
Computing and Communications (GreenCom 2015), Dec 2015
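Straggler detection can be caricatured as a progress-rate test: launch a speculative copy when a task progresses much more slowly than its peers. The median rule and the 0.5 ratio below are illustrative, not Hadoop's exact heuristic:

```python
def needs_speculation(progress, elapsed, peers_rate, slow_ratio=0.5):
    """Flag a task for a speculative copy when its progress rate falls
    well below the median rate of its peers (toy straggler detector)."""
    rate = progress / elapsed
    peers = sorted(peers_rate)
    median = peers[len(peers) // 2]
    return rate < slow_ratio * median

# Task at 10% after 100 s while peers progress at roughly 0.4 %/s
print(needs_speculation(0.10, 100.0, [0.004, 0.0045, 0.004]))  # True
```

As the results below show, each speculative copy also burns extra power, so the detector's aggressiveness directly shapes the energy bill.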
Speculation Benefits in a Heterogeneous Environment
[Charts: execution time (10^3 s) and energy consumption (MJ), speculation disabled vs. enabled: −47% execution time, −28% energy]
The energy reduction is not proportional to the execution-time improvement; it strongly
depends on the extra power drawn by the extra resource consumption
[Chart: average power consumption for CloudBurst, Sort, and WordCount, speculation disabled vs. enabled: up to +32%]
New approaches for Data Management in the Cloud
• No “one size fits all” solution
• NoSQL, key-value data stores (e.g. Bigtable, HBase, Cassandra, HyperTable), graph
databases (e.g. Neo4j, Pregel), array data stores (e.g. SciDB), analytical Cloud databases
(e.g. Greenplum and Vertica), analytical Cloud frameworks (e.g. Hadoop Map-Reduce,
Cloudera Impala), document databases (e.g. MongoDB, CouchBase), data stream
management systems (e.g. Storm)
• Wide diversification of data store interfaces and the loss of a common
programming paradigm
• Design of multistore data management systems
• Data management in multisite Clouds
• Divergence between Cloud and HPC storage infrastructures
• New I/O mechanisms to guide I/O systems to deliver the best performance
• Distributed file systems with Cloud capabilities such as elasticity
• Development of a unified architecture for HPC and Cloud storage back-ends
OTHER ISSUES
New Models for Cloud Application Description
• Adaptation to various kinds of hardware resources is mandatory
• Interesting approach: distinguish the description of the various possible
configurations (application architecture description) from the quality of
service sought for a particular execution (minimize cost, maximize
performance, respect a deadline, etc.)
• Challenges
• Description of the structure of the application
• Description of the expected behavior
• Several issues
• Take data into account
• How to model application workflows
• New languages for non-functional objectives (budget, performance/deadline,
security, data)
More Efficient Techniques and Algorithms for Cloud Resource Allocation
• Large number of resources to be used by applications
• Hardware heterogeneity including new resources (GPU, FPGA, …)
• Difficult for users to choose the most appropriate hardware configuration
• Need for performance models for applications
• Seamless choice of resources following user demands and resource availability
• SLAs put the emphasis on providers to deliver robust allocations despite the large
number of hardware failures
• Include a reliability constraint
• Use replication to cope with faults and failures
• Take dynamicity and elasticity into account
• Allocation problems adapted to Cloud constraints (CPU, memory, disks, network,
complex topologies)
• Design of sophisticated algorithms with guarantees on their reliability
• Focus optimization on impactful jobs
• Efficient representation of the search space and theoretical analysis
New Approach to Integrate Cloud, IoT, CPS, and Mobile Devices
Cloud systems are now the cornerstones of the Internet ecosystem, allowing
any connected device, such as things, smartphones, tablets, set-top boxes,
and PCs, to store and share information in a seamless way
• But
• A centralized Internet, increasing impact of failures on Internet users, loss of control over
citizens’ private data, vendor lock-in from hardware and software providers, massive leaks of
sensitive data when Cloud systems are under attack or surveillance by national security
agencies
• Ideas
• More decentralized Cloud infrastructures, i.e. fog computing, taking into account the rapid
evolution of very cheap, low-power hardware
• Use nano-PCs based on Smartphone technologies (ARM-based processors)
• Many challenges
• Seamless integration of nano-PCs within Cloud infrastructures,
• New Cloud services combining nano-PCs and data centers,
• Server-less sharing, security, and privacy
Promoting Simulation to Investigate Cloud Concerns
• Difficult for users to select the Cloud services that best meet their
requirements (in terms of performance, cost, energy, etc)
• Preliminary evaluations with partial deployments on real platforms such as Amazon Web
Service or Microsoft Azure
• Investigation of new hardware and new software mechanisms for Cloud
providers in order to stay competitive
• Provision a part of the Cloud to evaluate the benefits of such change
• Use of simulation for these scenarios
• Reduced development cost
• Control of parameters such as network latency, reliability, scalability, etc.
• Development of accurate and versatile simulation frameworks
SimGrid: Simulator of Distributed Applications
• Scientific instrument for the study of large-scale distributed computing
• Main Features
• Versatile: Grid, P2P, HPC, Volunteer Computing, ..., Clouds
• Valid: accuracy limits studied and pushed further for years
• Scalable and fast (despite precise models)
• Usable: tooling (generators, runner, visualization); open source, portable, ...
• On-going work
• SCHIaaS: simulation of Clouds and hybrid IaaS
• Adding virtualization capabilities into SimGrid (VM migration, boot, …)
http://simgrid.gforge.inria.fr/
http://infra-songs.gforge.inria.fr
GRID’5000 – Real IaaS for Researchers
• Testbed for research on distributed systems
• Born from the observation that we need a better and larger testbed
• HPC, Grids, P2P systems and more recently Cloud computing
Adding virtualization capabilities into Grid’5000, INRIA RR-8026, Jul. 2012
• A complete access to the nodes’ hardware in an exclusive mode
(from one node to the whole infrastructure)
• Current status
• 9 sites, 1,195 machines, 8,184 cores
• Diverse technologies/resources
(Intel, AMD, Myrinet, InfiniBand, two GPU clusters, energy probes)
• Ready-to-use OpenStack distribution
• Last significant experiment
• Dynamic scheduling of 10K VMs across 4 sites
https://www.grid5000.fr/
CONCLUSIONS
Conclusion
• Cloud Computing technology is changing every day: new features, new
requirements (IaaS++ services)
• Many research issues addressed in our research labs that should/will
be transferred into tomorrow’s cloud infrastructures
• Connection between “classical” Cloud infrastructures and next-generation
platforms (IoT)
• Distributed Cloud Computing is happening !
• Dist. CC workshops (UCC 2013, SIGCOMM 2014/2015), FOG Computing
workshop (co-located with IEEE ICC 2013), IEEE CloudNet, …
• How should developers design new applications to benefit from such
geographically distributed infrastructures?
61Labex UCN@Sophia – F. Desprez Feb. 18, 2016
References
• European Commission report on The Future of Cloud Computing
  http://cordis.europa.eu/fp7/ict/ssai/docs/cloud-report-final.pdf
• A Roadmap for Advanced Cloud Technologies under H2020, European Commission, Recommendations by
the Cloud Expert Group, Digital Agenda for Europe, Dec. 2012
• Report on the public consultation for H2020 Work Programme 2016-17: Cloud Computing and Software
  ec.europa.eu/information_society/newsroom/cf/dae/document.cfm?doc_id=8161
• Key Challenges in Cloud Computing: Enabling the Future Internet of Services, Rafael Moreno-Vozmediano,
Ruben S. Montero, and Ignacio M. Llorente, IEEE Internet Computing, July 2013
• NIST Cloud Strategy and Innovation Blog (I. Llorente)
  http://blog.cloudplan.org/
• Above the Clouds: A Berkeley View of Cloud Computing
  http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-28.html
• DRAFT Cloud Computing Synopsis and Recommendations, NIST
  http://csrc.nist.gov/publications/drafts/800-146/Draft-NIST-SP800-146.pdf
• SIENA European Roadmap on Grid and Cloud Standards for e-Science and Beyond
  http://www.sienainitiative.eu/Repository/FileScaricati/8ee3587a-f255-4e5c-aed4-9c2dc7b626f6.pdf
• The Magellan Report on Cloud Computing for Science, Yelick et al., Dec. 2011
  http://science.energy.gov/~/media/ascr/pdf/program-documents/docs/Magellan_final_report.pdf
• Livre blanc sur le calcul intensif, Comité d’orientation pour le calcul intensif (Cocin) du CNRS, 2012
  http://www.cnrs.fr/ins2i/IMG/pdf/Livre_blanc_-_derniere_version.pdf
• Synergistic Challenges in Data-Intensive Science and Exascale Computing, DOE ASCAC Data Subcommittee
Report, March 2013
• Integration of Cloud Computing and Internet of Things: A Survey, A. Botta, W. de Donato, V. Persico, A.
Pescapé, Future Generation Computer Systems, 56 (2016)
62Labex UCN@Sophia – F. Desprez Feb. 18, 2016
QUESTIONS ?
63Labex UCN@Sophia – F. Desprez Feb. 18, 2016

  • 1. Challenges and Issues of Next Cloud Computing Platforms Frédéric Desprez Frederic.Desprez@inria.fr Labex UCN@Sophia – Feb. 18th 2016
  • 2. Labex UCN@Sophia – F. Desprez Acknowledgements Feb. 18, 2016 2 Gabriel Antoniu Inria (Rennes, Kerdata) Olivier Beaumont Inria (Bordeaux, CEPAGE) Alexandru Costan Inria (Rennes, Kerdata) Thierry Coupaye Orange Labs Grenoble Paulo Goncalvez Inria (Lyon, Dante) Shadi Ibrahim Inria (Rennes, Kerdata) Kate Keahey Argonne National Lab Cristian Klein Umea University, Suède Adrien Lèbre Inria et Ecole des Mines de Nantes (Ascola) Laurent Lefèvre Inria, (Lyon, Avalon) Ignacio Llorente Complutense University of Madrid, Espagne Christine Morin Inria (Rennes, Myriads) Martin Quinson ENS (Rennes, Myriads) David Margery Inria (Rennes, Myriads) Anne-Cécile Orgerie CNRS (Rennes, Myriads) Manish Parashar Rutgers University Christian Perez Inria (Lyon, Avalon) Thierry Priol Inria (Rennes, Myriads) Jonathan Rouzaud-Cornabas Insa (Lyon, Beagle) Frédéric Suter CNRS/IN2P3 (Lyon, Avalon) Patrick Valduriez Inria (Montpellier, Zenith) Rich Wolsky University of California Santa Barbara, USA
  • 3. Outline • Introduction and Context • Energy Issues • Distributed Clouds • Big Data • Other issues • Conclusions 3Labex UCN@Sophia – F. Desprez Feb. 18, 2016
  • 5. Context Cloud computing has emerged as a “new” paradigm for many commercial and scientific venues • Starts to be widely adopted by the industries • Many platforms and infrastructures available around the world • Several offers for IaaS, PaaS, and SaaS platforms • Public, private, community, and hybrid clouds … But still many applications left that could benefit from such platforms Several issues still needs to (better) addressed • Elasticity, availability, self-configuration, heterogeneous computing and storage capacities • Several challenges remain to be addressed and transferred into industrial products • Energy management • New applications (IoT) 5Labex UCN@Sophia – F. Desprez Feb. 18, 2016
  • 6. Clouds Essential Characteristics • On-demand service  No need of human interaction to get an access to storage and computation resources (Utility Computing) • Access through large scale networks  Access to resources through networks from lightweight and heavy-weight clients (WAN, LAN, Wireless) • Resource Polling  Resources (CPU, storage, memory, network) are taken from datacenters without (almost) locality notion • Elasticity  Ressources can be allocated and freed in an elastic fashion based on the application needs (with an "infinite" capacity) • Measured service  Possibility to monitor resource usage • Pro  Disponibility and extensibility  Dynamicity  Fault tolerance  Resource mutualization • Cons  Heterogeneity  No locality  Application porting  Security ? 6Labex UCN@Sophia – F. Desprez Feb. 18, 2016
  • 7. Transparency is the Key “I don't care if my cloud computing architecture is powered by a grid, a mainframe, my neighbour's desktop or an army of monkeys, so long as it's fast, cheap and secure.” Sam Johnston, Sept. 2008 7Labex UCN@Sophia – F. Desprez Feb. 18, 2016
  • 8. Research Issues • Explosion of the number of research work around Clouds and virtualization ! • Some research challenges • Energy • Service composition • Service Level Agreement (SLA) • Security • Fault tolerance and recovery • Infrastructure management • Elastic management of resources • (Big) Data management • Seamless access to hybrid platforms • Multi-clouds, Sky computing, federations, infrastructure distribution, edge computing • New models • economic, energy • Application design and description • New languages, new models • Simulation and experimentation • ... 8Labex UCN@Sophia – F. Desprez Feb. 18, 2016
  • 9. ENERGY ISSUES Laurent Lefèvre’s team in Avalon (LIP/ENS Lyon & INRIA)
  • 10. Electrical consumption of ICT…. 2013… gwatt.net Devices Telecommunication networks = 83 GW 10Labex UCN@Sophia – F. Desprez Feb. 18, 2016
  • 11. Improving Energy Efficiency of Cloud Infrastructures • Understanding the energy usage of large scale systems mixing virtual instances of applications, physical IT resources, and physical infrastructures remains a real challenge. • How to profile the energy consumption of large sets of virtual machines (generic metrics, benchmarks, and energy models) • Analyzing tools and frameworks to support large scale energy efficient management of resource • Optimize the energy consumption of distributed infrastructures and service compositions in the presence of ever more dynamic service applications • Use of renewable energies • Exploring the trade-off between energy saving and performance aspects in large-scale distributed system • Energy efficiency of storage systems and networks 11Labex UCN@Sophia – F. Desprez Feb. 18, 2016
  • 12. Energy Efficiency by knowing application and services or not ? • Exploring 2 different approaches • With knowledge on the application and services • Enable the user to choose the less consuming implementation of services  Estimate the energy consumption of the different implementations (protocols) of each service • Without knowledge • Allow some intelligence to reduce the energy usage  Autonomically estimate the energy consumption of the HPC system in order to apply green levers 12Labex UCN@Sophia – F. Desprez Feb. 18, 2016
  • 13. Improving EE with application expertise • Considered services: resilience & data broadcasting • 4 steps • Service analysis, Measurements, Calibration, Estimation • Helping users make the right choices depending on context and parameters M. Diouri, O. Glück, L. Lefèvre, and Franck Cappello. "ECOFIT: A Framework to Estimate Energy Consumption of Fault Tolerance Protocols during HPC executions", CCGrid2013, the 13th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, Delft, the Netherlands, May 13-16, 2013 13Labex UCN@Sophia – F. Desprez Feb. 18, 2016
  • 14. Without knowledge of applications and services ? • HPC applications keep growing in complexity • too many bugs in HPC applications already present, adding energy management and considerations won’t help 😀 • Are HPC programmers ready for eco design of applications ? • Applications can share the same infrastructure • Optimizations made for saving energy considering some applications are likely to impact the performance of others • Instead of looking at applications and service ⇒ Focusing on the infrastructure • Detect and characterize system’s runtime behaviors/phases • Optimize each subsystem (storage, memory, interconnect, CPU) accordingly • Helping users to find the best service 14Labex UCN@Sophia – F. Desprez Feb. 18, 2016
  • 15. Without knowledge on applications Ghislain Landry Tsafack Chetsa, Laurent Lefevre, Jean-Marc Pierson, Patricia Stolf, Georges Da-Costa. "Application-Agnostic Framework for Improving the Energy Efficiency of Multiple HPC Subsystems", PDP2015 : 23rd Euromicro International Conference on Parallel, Distributed and Network-based Processing, 2015. • Irregular usage of resources • Phase detection, characterization • Power saving modes deployment 15Labex UCN@Sophia – F. Desprez Feb. 18, 2016
  • 16. Towards Energy Proportionality with Heterogeneous machines OBSERVATIONS [Barroso and Hölzle 2007] Average server utilization between 10 and 50 % → Most inefficient region No proportionality due to high idle consumption → Can be up to 50 % of peak power PROPOSITION Heterogeneous Infrastructure composed of machines with different characteristics in terms of performance and energy consumption • Classical servers → Only used at their most energy efficient region • Low power processors → Reduce static costs TECHNICAL CHALLENGES - Application placement: Dynamically find the most suitable combinations of machines - Infrastructure reconfiguration: Power On/Off machines at the right time [Barroso and Hölzle, The Case for Energy Proportional Computing, IEEE Computer, 2007 16 Labex UCN@Sophia – F. Desprez Feb. 18, 2016 16 BIG MEDIUM LITTLE
  • 17. Towards Energy Proportionality with Heterogeneous machines V. Villebonnet, G. Da Costa, L. Lefèvre, J-M. Pierson, P. Stolf, “Big, Medium, Little”: Reaching Energy Proportionality with Heterogeneous Computing Scheduler”, Parallel Processing Letters, 25 (03), World Scientific Publishing, 2015. 17Labex UCN@Sophia – F. Desprez Feb. 18, 2016
  • 18. Towards Energy Proportionality with Heterogeneous machines Application: Stateless Web Servers Traces: Day of 98 WorldCup Website access BIG only Joules per request: 0,2268 Infrastructure utilization: 40,7% Number of reconfiguration: 4 BML combination Joules per request: 0,2155 Infrastructure utilization: 69,7% Number of reconfiguration: 194 ⇒ Infrastructure is dynamically reconfigured to meet the load demand of the application → Energy consumption more proportional to the load 18Labex UCN@Sophia – F. Desprez Feb. 18, 2016
  • 19. Virtual Machines and Energy efficient Clouds Taking into account the energy consumption in the scheduling process • Energy and resource usage are highly fluctuating • Large disparities between similar nodes → Decisions needs to be proactive based on recent and historical activity How to efficiently assign those tasks? Combine • A metric to balance performance and energy consumption • An interface to express tradeoffs between users and providers requirements • A manager of energy-related events Results • Up to 20% of energy savings in real-life experimentations Daniel Balouek-Thomert, Eddy Caron, Laurent Lefevre, "Energy-Aware Server Provisioning by Introducing Middleware-Level Dynamic Green Scheduling", HPPAC 2015: The 11th Workshop on High-Performance, Power-Aware Computing, May 2015 19Labex UCN@Sophia – F. Desprez Feb. 18, 2016
  • 20. Virtual Machines and Energy efficient Clouds Combining energy with other criteria and constraints for a given problem • Large spectrum of potential solutions • NP-Hard problem Daniel Balouek-Thomert, Arya K. Bhattacharya, Eddy Caron, Karunakar Gadireddy, Laurent Lefèvre, Minimizing energy and makespan concurrently in Cloud Computing workloads using Multi-Objective Differential Evolution, under reviewing Genetic Approach • A model to capture affinities between tasks and resources • An algorithm that mimicks the “survival of the fittest”: only efficient servers are used through time • A learning engine that integrates constraints Strategies needs to be validated in terms of correctness and computing time 20Labex UCN@Sophia – F. Desprez Feb. 18, 2016
  • 21. Network is Part of the Story: Dynamic, Energy Efficient, Network Reconfiguration Van Heddeghem et al. “Power Consumption Modeling in Optical Multilayer Networks” PNET 24 (2), 86–102, 2012 Carpa R., Gluck O., Lefevre L. and Mignot J.-C., "Improving the energy efficiency of software-defined backbone networks", Photonic Network Communications, vol. 30(3), p. 337-347, 2015. Network energy consumption 40 Gwatts in 2013 (source: gwatt.net) A lot of improvement possible during off-peak hours Especially in core networks Re-route to improve the energy efficiency Consumption reduced by up to 39 % Hassidim, A et al. “Network utilization: The flow view”, INFOCOM, 2013 IEEE, 1429–1437, 2013 21Labex UCN@Sophia – F. Desprez Feb. 18, 2016
  • 22. Energy Efficient Networks Carpa R., Assuncao M.,Gluck O., Lefevre L. and Mignot J.-C., "Responsive Algorithms for Handling Load Surgesand Switching Links On in Green Networks” - Submitted to ICC 2016 Simulations of high-speed core networks • Rerouting in less than a second • Improved energy efficiency compared to related work (12 %) • Same quality of service NetFPGA + Openflow testbed (Work in progress) • Targeting access networks • Few, frequently changing, flows • Cross-layer L3 / L4 optimizations for stability 22Labex UCN@Sophia – F. Desprez Feb. 18, 2016
  • 23. DISTRIBUTED CLOUDS Adrien Lebre’s team in ASCOLA (LINA/EMN Nantes & INRIA)
  • 24. The current Situation • Large off shore DCs • To cope with the increasing UC demand while handling energy concerns • But • Juridiction concerns (data locality) • Reliability • Network overhead • Localization is a key element to deliver efficient as well as sustainable Utility Computing solutions credits: coloandcloud.com 24Labex UCN@Sophia – F. Desprez Feb. 18, 2016
  • 25. The Cloud from End-Users (diagram: end-users Charles, Alice, Paula, Bob, Dan, Sam, Rob and Duke all connected to a single cloud)
  • 26. The Cloud in Reality (diagram: the same end-users, connected through the Internet backbone)
  • 27. Cloud Evolution. Not only mega data centres! (Courtesy of Thierry Coupaye, Orange)
  • 28. Trends for Next-Generation Clouds. Centralized public clouds are in fact generally distributed over multiple (mega) data centres for availability reasons (Verizon, Orange, Microsoft, Amazon). (Courtesy of Thierry Coupaye, Orange)
  • 29. Trends for Next-Generation Clouds. Hybrid and community clouds are by nature distributed over multiple data centres/clouds. (Courtesy of Thierry Coupaye, Orange)
  • 30. Trends for Next-Generation Clouds. Networks are getting "softwarized" and are converging with a distributed vision of cloud computing. Three examples: Virtual CDN (vCDN), Cloud RAN (C-RAN), Mobile Edge Computing (MEC). (Courtesy of Thierry Coupaye, Orange)
  • 31. The DISCOVERY Proposal • DIStributed and COoperative framework to manage Virtual EnviRonments autonomously • Locality-based Utility Computing platform ("LUC-OS") • A fully distributed IaaS system, not a distributed system of IaaS systems • We want to/must go further than high-level cloud APIs (cross-cutting concerns such as energy/security) • Leverage P2P algorithms and self-* approaches • Lots of scientific/technical challenges • Cost of the network? • Partial view of the system? • Impact on the other VMs? • Management of VM images? • How to take locality aspects into account? • Which software abstractions make development easier and more reliable (distributed event programming)? Lèbre, A., Pastor, J., Bertier, M., Desprez, F., Rouzaud-Cornabas, J., Tedeschi, C., Anedda, P., Zanetti, G., Nou, R., Cortes, T., Riviere, E. and Ropars, T., Beyond The Cloud, How Should Next Generation Utility Computing Infrastructures Be Designed?, INRIA Research Report 8348, Aug. 2013.
  • 32. The DISCOVERY Initiative (diagram: the same end-users, Alice, Bob, Charles, Dan, Duke, Paula, Rob, Sam and Tom, each served by a nearby DISCOVERY network node)
  • 33. Beyond the Clouds, the DISCOVERY Initiative. Locality-based UC infrastructures / Fog / Edge: a promising way to deliver highly efficient and sustainable UC services is to provide UC platforms as close as possible to the end-users. http://www.renater.fr/raccourci?lang=fr
  • 34. Beyond the Clouds, the DISCOVERY Initiative • Leveraging network backbones • Extend any point of presence of network backbones with UC servers (from network hubs up to major DSLAMs operated by telecom companies and network institutions) • Leveraging wireless backbones (diagram: end-users attached to DISCOVERY network nodes across the core backbone)
  • 35. Would OpenStack Be the Solution? • Do not reinvent the wheel… • OpenStack • An open-source IaaS manager with a large community • Composed of several services, each dedicated to one aspect of a cloud
  • 36. Distributing OpenStack • Services collaborate through • A messaging queue • An SQL database • Alternative solutions exist for storing states over a highly distributed infrastructure ⇒ NoSQL DB • Few proposals to federate/operate distinct OpenStack DCs • 'Flat' approach • Hierarchical approaches http://beyondtheclouds.github.io/dcc.html
  • 37. ROME • Relational Object Mapping Extension for key/value stores • Jonathan Pastor's PhD • Enables querying a key/value store with the same interface as SQLAlchemy • Enables OpenStack Nova to switch to a KVS without being too intrusive • The KVS is clustered on controllers • Compute nodes connect to the key/value cluster (architecture: Nova Network, Nova Compute, Nova Scheduler and Nova Conductor above db.api, backed either by a relational MySQL DB or a non-relational key/value DB) https://github.com/badock/rome
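An SQLAlchemy-like interface over a key/value store can be pictured with a toy mapper like the one below (illustrative only: the class names and the dict-backed store are assumptions for the sketch, not the real ROME API).

```python
# Toy sketch: SQLAlchemy-style query(...).filter_by(...) over a key/value
# store, here simulated by a dict mapping "table:id" -> record dict.

class KVQuery:
    def __init__(self, store, table):
        prefix = table + ":"
        self._rows = [v for k, v in store.items() if k.startswith(prefix)]

    def filter_by(self, **criteria):
        self._rows = [r for r in self._rows
                      if all(r.get(k) == v for k, v in criteria.items())]
        return self

    def all(self):
        return self._rows

    def first(self):
        return self._rows[0] if self._rows else None

class KVSession:
    """Minimal stand-in for an ORM session backed by a KVS."""
    def __init__(self, store):
        self.store = store

    def add(self, table, key, record):
        self.store[table + ":" + str(key)] = record

    def query(self, table):
        return KVQuery(self.store, table)
```

The point ROME makes is that, with such a shim behind `db.api`, the calling code (here, Nova) keeps its relational-looking queries while the backend becomes a clustered KVS; filtering then happens client-side or via the store's own indexes.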
  • 38. The DISCOVERY Initiative: Pros and Cons • Pros • Locality (jurisdiction concerns, latency-aware apps, minimized network overhead) • Reliability/redundancy (no critical point/location/center): the infrastructure is naturally distributed throughout multiple areas • Lead time to delivery: leverage current PoPs and extend them according to UC demands • Energy footprint (ongoing investigations with RENATER) • Bring back part of the revenue to NRENs/Telcos • Cons • Security concerns (in terms of who can access the PoPs) • Operating a full IaaS in a unified but distributed manner at WAN level • Not suited for all kinds of applications: large tightly coupled HPC workloads (50 nodes/1,000 cores or 200 nodes/4,000 cores (5 racks) are plausible, but 1,000 nodes in one PoP does not look realistic) • Peering agreements / economic model between network operators http://beyondtheclouds.github.io/
  • 39. BIG DATA Gabriel Antoniu’s team KERDATA (IRISA & INRIA)
  • 40. Data Processing, Big Data • Huge amounts of data to be moved and processed • LHC, simulations, genomics, astrophysics, social networks, sensors, … • Heterogeneity in their storage (DB, files, …) and processing (cleaning, transformation, analysis, search, indexing, visualization, …) • Challenges • Resource issues • Fault tolerance and recovery, energy management • Handling complex distributed workflows at a large scale (computation, data transfers and replication) • Resource management (computation, storage, network), interoperability of solutions • Describing these workflows • Metadata management • Data provenance • Which transformations were applied • Programming next-generation applications • Which language for which application • Strong relations with resource management systems • Performance and transparency • Genericity. Sakr, S., Liu, A., Batista, D.M., Alomari, M., A Survey of Large Scale Data Management Approaches in Cloud Environments, IEEE Communications Surveys and Tutorials, 2011. Middleton, A.M., Data-Intensive Technologies for Cloud Computing, Handbook of Cloud Computing, Springer, 83-135, 2010. http://research.microsoft.com/en-us/collaboration/fourthparadigm/
  • 41. Beyond Hadoop: BlobSeer, Scalable Storage for Data-Intensive Analytics. Started in 2008, 6 PhDs (Gilles Kahn/SPECIF PhD Thesis Award in 2011). Main goal: optimized data access under heavy concurrency. Three key ideas • Decentralized metadata management • Lock-free concurrent writes (enabled by versioning) • Data and metadata "patching" rather than updating. A back-end for higher-level data management systems • Highly scalable distributed file systems • Storage for cloud services. Approach • Design and implementation of distributed algorithms • Experiments on the Grid'5000 testbed • Validation with "real" apps on "real" platforms: IBM clouds, Microsoft Azure, OpenNebula • Results on Grid'5000: BlobSeer improves Hadoop by 35% (execution time). B. Nicolae, G. Antoniu, L. Bougé, D. Moise, A. Carpen-Amarie, "BlobSeer: Next-Generation Data Management for Large Scale Infrastructures", Journal of Parallel and Distributed Computing, vol. 71, no. 2, pp. 169-184, February 2011. http://blobseer.gforge.inria.fr/
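The versioning idea behind the lock-free concurrent writes can be sketched with a deliberately simplified in-memory model (BlobSeer actually works on distributed chunks and a versioned metadata tree; this sketch only shows the principle that writers publish new immutable versions instead of updating in place):

```python
# Minimal sketch of versioning-based writes: each write "patches" a base
# version into a new immutable one, so readers pinned to a version never
# see partial updates and writers never block each other.

class VersionedBlob:
    def __init__(self, size):
        self.versions = [tuple([0] * size)]    # version 0: zero-filled

    def write(self, offset, data, base=None):
        """Publish a new version: a copy of `base` with `data` at `offset`.
        Returns the new version number."""
        if base is None:
            base = len(self.versions) - 1      # default: latest version
        snapshot = list(self.versions[base])
        snapshot[offset:offset + len(data)] = data
        self.versions.append(tuple(snapshot))
        return len(self.versions) - 1

    def read(self, version, offset, length):
        """Read from a pinned version: immune to concurrent writes."""
        return list(self.versions[version][offset:offset + length])
```

In the real system the "copy" is logical, not physical: a new version only stores the patched chunks plus metadata pointing back to unchanged ones, which is what makes concurrent writes cheap.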
  • 42. BlobSeer on Commercial Clouds: the A-Brain Microsoft Research – Inria Project. Correlating brain images Y (q ≈ 10^5-10^6 features) with genetic data X (p ≈ 10^6 features) over N ≈ 2,000 subjects • Anatomical MRI • Functional MRI • Diffusion MRI • DNA array (SNP/CNV) • Gene expression data • Others… • TomusBlobs storage (based on BlobSeer) • Processing approach: MapReduce • Gain over Azure Blobs: 45% • Scalability: 1,000 cores • KerData and PARIETAL teams at INRIA • European Microsoft Innovation Center (Aachen) http://www.msr-inria.fr/projects/a-brain/
  • 43. Executing the A-Brain Application at Large Scale • The TomusBlobs data-storage layer developed within the A-Brain project was demonstrated to scale up to 1,000 cores across 3 Azure data centers (EU, US) • Gain compared to Azure Blobs: close to 50% • Experiment duration: ~14 days • More than 210,000 hours of computation used • Cost of the experiments: 20,000 euros (VM price, storage, outbound traffic) • 28,000 map jobs (each lasting about 2 hours) and ~600 reduce jobs • Scientific discovery: provided the first statistical evidence of the heritability of functional signals in a failed-stop task in basal ganglia. B. Da Mota, R. Tudoran, A. Costan, G. Varoquaux, G. Brasche, P. J. Conrod, H. Lemaitre, T. Paus, M. Rietschel, V. Frouin, J.-B. Poline, G. Antoniu, B. Thirion, "Machine Learning Patterns for Neuroimaging-Genetic Studies in the Cloud", Frontiers in Neuroinformatics, vol. 8, April 2014.
  • 44. Going Further: Managing Metadata for Geo-Distributed Workflows, the Z-CloudFlow Microsoft Research – Inria Project • Multisite cloud = a cloud with multiple data centers • Each with its own cluster, data and programs • Matches well the requirements of scientific apps • Goal • Investigate approaches to metadata management, integrated with the workflow execution engine, to support multi-site scheduling
  • 45. Four Strategies • Centralized: baseline • Replicated: local metadata accesses, with a synchronization agent • Decentralized non-replicated: metadata scattered across sites, DHT-based • Decentralized replicated: metadata stored locally and replicated to a remote location (using hashing)
  • 46. Matching Strategies to Workflows • Centralized: small scale • Replicated: intensive computations, large files • Decentralized approaches: a large number of small files • Non-replicated: parallel jobs • Replicated: sequential, tightly dependent jobs, with data available locally. L. Pineda-Morales, A. Costan, G. Antoniu, "Towards Multi-site Metadata Management for Geographically Distributed Cloud Workflows", CLUSTER 2015 - IEEE International Conference on Cluster Computing, Chicago, United States, September 2015.
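The DHT-based placement with optional replication to a remote site can be sketched as follows (the helper names and the successor-site replication rule are illustrative assumptions, not the paper's exact scheme):

```python
# Toy sketch: hash each metadata key to a home site; with replication on,
# also store a copy on the "next" site in the ring.
import hashlib

def site_of(key, sites):
    """Deterministically map a metadata key to its home site."""
    digest = int(hashlib.sha256(key.encode()).hexdigest(), 16)
    return sites[digest % len(sites)]

def place(key, sites, replicated=False):
    """Return the list of sites holding metadata for `key`: the home site,
    plus its successor in the site list when replication is enabled."""
    home = site_of(key, sites)
    if not replicated:
        return [home]
    idx = sites.index(home)
    return [home, sites[(idx + 1) % len(sites)]]
```

With such a scheme, lookups from any site cost at most one remote hop to the home site, and the replicated variant keeps a copy reachable even if one data center is unavailable.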
  • 47. Failure-Aware Scheduling in Hadoop. In large-scale clouds, node failures are inevitable • 1,000 machine failures in the first year of a Google cluster* • 10-15% job failure rate in CMU clusters. Failure recovery in Hadoop • Hadoop re-executes the tasks of failed machines • Waits an uncertain amount of time for a free slot • Ignores the data locality of the recovery tasks. *J. Dean, "Large-scale distributed systems at Google: Current systems and future directions", keynote speech at the 3rd ACM SIGOPS International Workshop on Large Scale Distributed Systems and Middleware, 2009.
  • 48. Chronos: a Failure-Aware Scheduler • Takes early actions upon failures • Employs a work-conserving preemption technique • Considers local execution of recovery tasks • Independent of the scheduling policy; increases performance by 10-20% over state-of-the-art Hadoop schedulers • Reduces the waiting time of recovery tasks from 46 seconds to 1.5 seconds on average. O. Yildiz, S. Ibrahim, T.A. Phuong, G. Antoniu, "Chronos: Failure-aware scheduling in shared Hadoop clusters", the 2015 IEEE International Conference on Big Data (BigData 2015), Nov. 2015.
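The preemption decision at the heart of such a failure-aware scheduler might look like the sketch below (illustrative data structures, not Chronos's actual code): when a recovery task needs a slot, pick a node that stores its input data and preempt the lowest-priority task running there, instead of waiting for a slot to free up.

```python
# Hedged sketch of a data-local, work-conserving preemption choice.

def pick_preemption(recovery_task, running, data_locations):
    """running: {node: [(task_id, priority), ...]} of executing tasks.
    data_locations: {task_id: [nodes holding its input data]}.
    Returns (node, victim_task, victim_priority), or None if no node
    holding the recovery task's data is running anything."""
    best = None
    for node in data_locations.get(recovery_task, []):
        for task_id, prio in running.get(node, []):
            if best is None or prio < best[2]:
                best = (node, task_id, prio)
    return best
```

Work-conserving here means the victim is paused rather than killed, so its partial work is not lost; that is what keeps the overhead of early action low.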
  • 49. Exploring the Impact of DVFS in Hadoop Clusters. There is significant potential for energy saving by scaling down the CPU frequency when peak CPU is not needed. Diversity of MapReduce applications; multiple phases within one MapReduce application (disk I/O, CPU, network): in one application, CPU load is high (98%) during almost 75% of the job run; in another, CPU load is high (80%) during only 15% of the job run. S. Ibrahim, T-D. Phan, A. Carpen-Amarie, H-E. Chihoub, D. Moise, G. Antoniu, "Governing Energy Consumption in Hadoop through CPU Frequency Scaling: An Analysis", Future Generation Computer Systems, vol. 54, January 2016.
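A minimal sketch of the underlying DVFS decision (assuming, as a simplification, that throughput scales linearly with frequency; real governors also weigh switching latency and voltage steps) picks the lowest available frequency that still covers the CPU demand of the current phase:

```python
def choose_frequency(cpu_load, freqs):
    """Pick the lowest frequency that still serves the observed CPU demand.
    cpu_load: fraction of peak CPU needed in this phase (0..1).
    freqs: available frequencies in Hz, in any order."""
    fmax = max(freqs)
    for f in sorted(freqs):
        if f >= cpu_load * fmax:   # linear-scaling assumption
            return f
    return fmax
```

So a CPU-bound phase at 98% load stays at the top frequency, while an I/O-bound phase at 15% load can drop to the lowest step, which is where the paper's saving potential comes from.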
  • 50. Mitigating Stragglers in Hadoop. Performance variation is common in the Cloud • Stragglers can severely increase the execution time • Hadoop launches another copy of the straggler in the hope that it will finish earlier (i.e., speculation). (Diagram: four tasks on four nodes over time; the straggler delays job completion.) T-D. Phan, S. Ibrahim, G. Antoniu, L. Bougé, "On Understanding the Energy Impact of Speculative Execution in Hadoop", the 2015 IEEE International Conference on Green Computing and Communications (GreenCom 2015), Dec. 2015.
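Speculation relies on a straggler-detection heuristic, which can be sketched as follows (simplified: Hadoop's actual detector uses its own progress scores and thresholds, so the names and the 0.5 ratio here are assumptions):

```python
def is_straggler(progress, elapsed, peers, slow_ratio=0.5):
    """Flag a task for speculative re-execution when its progress rate is
    well below the average rate of its peers.
    progress: fraction done (0..1); elapsed: seconds since launch;
    peers: list of (progress, elapsed) for the job's other tasks."""
    if elapsed == 0 or not peers:
        return False
    rate = progress / elapsed
    avg = sum(p / t for p, t in peers if t > 0) / len(peers)
    return rate < slow_ratio * avg
```

The energy question the slide raises follows directly: every task this predicate flags gets a duplicate, so a loose threshold trades shorter makespan for extra busy machines.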
  • 51. Speculation Benefit in a Heterogeneous Environment. Enabling speculation reduces execution time by 47% but energy consumption by only 28% (CloudBurst, Sort and WordCount benchmarks; average power consumption rises by 32% for CloudBurst). The energy reduction is not proportional to the execution-time improvement: it strongly depends on the extra power drawn by the extra resource consumption.
  • 52. New Approaches for Data Management in the Cloud • No "one size fits all" solution • NoSQL key/value data stores (e.g. Bigtable, HBase, Cassandra, HyperTable), graph databases (e.g. Neo4j, Pregel), array data stores (e.g. SciDB), analytical Cloud databases (e.g. Greenplum and Vertica), analytical Cloud frameworks (e.g. Hadoop MapReduce, Cloudera Impala), document databases (e.g. MongoDB, Couchbase), data stream management systems (e.g. Storm) • Wide diversification of data store interfaces and loss of a common programming paradigm • Design of multistore data management systems • Data management in multisite Clouds • Divergence between Cloud and HPC storage infrastructures • New I/O mechanisms to guide I/O systems in order to deliver the best performance • Distributed file systems with Cloud capabilities such as elasticity • Development of a unified architecture for HPC and Cloud storage back-ends
  • 54. New Models for Cloud Application Description • Adaptation to various kinds of hardware resources is mandatory • An interesting approach: distinguish the description of the various possible configurations (application architecture description) from the quality of service sought for a particular execution (minimize cost, maximize performance, respect a deadline, etc.) • Challenges • Description of the structure of the application • Description of the expected behavior • Several issues • Taking data into account • How to model application workflows • New languages for non-functional objectives (budget, performance/deadline, security, data)
  • 55. More Efficient Techniques and Algorithms for Cloud Resource Allocation • Large number of resources to be used by applications • Hardware heterogeneity, including new resources (GPU, FPGA, …) • Difficult for users to choose the most appropriate hardware configuration • Need for performance models of applications • Seamless choice of resources following user demands and resource availability • SLAs put pressure on providers to deliver robust allocations despite the large number of hardware failures • Include a reliability constraint • Use replication to cope with faults and failures • Take dynamicity and elasticity into account • Allocation problems adapted to Cloud constraints (CPU, memory, disks, network, complex topologies) • Design of sophisticated algorithms with guarantees on their reliability • Focus the optimization on impactful jobs • Efficient representation of the search space and theoretical analysis
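The effect of replication on the reliability constraint can be made concrete with a small calculation (a textbook independence model, not a claim about any specific allocator): if each replica fails independently with probability p_fail, the smallest replica count meeting an availability target satisfies 1 - p_fail^k >= target.

```python
def replicas_needed(p_fail, target):
    """Smallest replica count k with 1 - p_fail**k >= target, i.e. the
    probability that at least one replica survives meets the SLA.
    Assumes independent failures -- optimistic for correlated outages."""
    if not 0 < p_fail < 1 or not 0 < target < 1:
        raise ValueError("p_fail and target must be in (0, 1)")
    k = 1
    while 1 - p_fail ** k < target:
        k += 1
    return k
```

For example, with 10% per-replica failure probability, three replicas already push survival probability to 99.9%; an allocator can then fold this k into its placement cost.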
  • 56. A New Approach to Integrating Clouds, IoT, CPS, and Mobile Devices. Cloud systems are now the cornerstones of the Internet ecosystem, allowing any connected device, such as things, smartphones, tablets, set-top boxes and PCs, to store and share information in a seamless way • But • A centralized Internet, increasing impact of failures on Internet users, loss of control over citizens' private data, vendor lock-in from hardware and software providers, massive leaks of sensitive data when Cloud systems are under attack and surveillance by national security agencies • Ideas • More decentralized Cloud infrastructures, i.e. fog computing, taking into account the rapid evolution of very cheap and low-power hardware • Use nano-PCs based on smartphone technologies (ARM-based processors) • Many challenges • Seamless integration of nano-PCs within Cloud infrastructures • New Cloud services combining nano-PCs and data centers • Server-less sharing, security, and privacy
  • 57. Promoting Simulation to Investigate Cloud Concerns • Difficult for users to select the Cloud services that best meet their requirements (in terms of performance, cost, energy, etc.) • Preliminary evaluations with partial deployments on real platforms such as Amazon Web Services or Microsoft Azure • Investigation of new hardware and software mechanisms for Cloud providers in order to stay competitive • Provisioning part of the Cloud to evaluate the benefits of such a change • Use of simulation for these scenarios • Reduced development cost • Control of parameters such as network latency, reliability, scalability, etc. • Development of accurate and versatile simulation frameworks
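The kind of controlled what-if study simulation enables can be illustrated with a toy discrete-event model, written from scratch below (a real framework such as SimGrid supplies validated platform and application models instead of the fixed boot/run delays assumed here):

```python
# Toy discrete-event simulation: each VM is submitted at t=0, boots, then
# runs one task; events are processed in timestamp order from a heap.
import heapq

def simulate_boot_and_run(vms, boot_time, task_time):
    """Returns {vm: completion time}. Varying boot_time or task_time plays
    the role of the 'controlled parameter' a cloud simulator gives you."""
    queue = [(0.0, vm, "submit") for vm in vms]
    heapq.heapify(queue)
    done = {}
    while queue:
        t, vm, kind = heapq.heappop(queue)
        if kind == "submit":
            heapq.heappush(queue, (t + boot_time, vm, "booted"))
        elif kind == "booted":
            heapq.heappush(queue, (t + task_time, vm, "finished"))
        else:
            done[vm] = t
    return done
```

Rerunning this with, say, a doubled boot latency immediately shows its impact on completion time, with no real platform, no deployment cost, and perfect repeatability, which is precisely the argument made above.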
  • 58. SimGrid: Simulator of Distributed Applications • A scientific instrument for the study of large-scale distributed computing • Main features • Versatile: Grid, P2P, HPC, Volunteer Computing, …, Clouds • Valid: accuracy limits studied and pushed further for years • Scalable and fast (despite precise models) • Usable: tooling (generators, runner, visualization); open source, portable, … • Ongoing work • SCHIaaS: simulation of Clouds and hybrid IaaS • Adding virtualization capabilities into SimGrid (VM migration, boot, …) http://infra-songs.gforge.inria.fr simgrid.gforge.inria.fr/
  • 59. GRID'5000 – A Real IaaS for Researchers • Testbed for research on distributed systems • Born from the observation that we need a better and larger testbed • HPC, Grids, P2P systems and, more recently, Cloud computing • Complete access to the nodes' hardware in an exclusive mode (from one node to the whole infrastructure) • Current status • 9 sites, 1,195 machines, 8,184 cores • Diverse technologies/resources (Intel, AMD, Myrinet, InfiniBand, two GPU clusters, energy probes) • Ready-to-use OpenStack distribution • Last significant experiment • Dynamic scheduling of 10K VMs across 4 sites. Adding virtualization capabilities into Grid'5000, INRIA RR-8026, Jul. 2012. https://www.grid5000.fr/
  • 61. Conclusion • Cloud Computing technology is changing every day: new features, new requirements (IaaS++ services) • Many research issues addressed in our research labs should/will be transferred into tomorrow's cloud infrastructures • Connecting "classical" Cloud infrastructures to next-generation platforms (IoT) • Distributed Cloud Computing is happening! • Distributed CC workshops (UCC 2013, SIGCOMM 2014/2015), Fog Computing workshop (co-located with IEEE ICC 2013), IEEE CloudNet, … • How should developers design new applications to benefit from such geographically distributed infrastructures?
  • 62. References • European Commission report on The Future of Cloud Computing: http://cordis.europa.eu/fp7/ict/ssai/docs/cloud-report-final.pdf • A Roadmap for Advanced Cloud Technologies under H2020, European Commission, Recommendations by the Cloud Expert Group, Digital Agenda for Europe, Dec. 2012 • Report on the public consultation for H2020 Work Programme 2016-17: Cloud Computing and Software: ec.europa.eu/information_society/newsroom/cf/dae/document.cfm?doc_id=8161 • Rafael Moreno-Vozmediano, Ruben S. Montero, and Ignacio M. Llorente, Key Challenges in Cloud Computing: Enabling the Future Internet of Services, IEEE Internet Computing, Jul. 2013 • NIST Cloud Strategy and Innovation Blog (I. Llorente): http://blog.cloudplan.org/ • Above the Clouds: A Berkeley View of Cloud Computing: http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-28.html • DRAFT Cloud Computing Synopsis and Recommendations, NIST: http://csrc.nist.gov/publications/drafts/800-146/Draft-NIST-SP800-146.pdf • SIENA European Roadmap on Grid and Cloud Standards for e-Science and Beyond: http://www.sienainitiative.eu/Repository/FileScaricati/8ee3587a-f255-4e5c-aed4-9c2dc7b626f6.pdf • The Magellan Report on Cloud Computing for Science, Yelick et al., Dec. 2011: http://science.energy.gov/~/media/ascr/pdf/program-documents/docs/Magellan_final_report.pdf • Livre blanc sur le calcul intensif (white paper on high-performance computing), Comité d'orientation pour le calcul intensif (Cocin) du CNRS, 2012: http://www.cnrs.fr/ins2i/IMG/pdf/Livre_blanc_-_derniere_version.pdf • Synergistic Challenges in Data-Intensive Science and Exascale Computing, DOE ASCAC Data Subcommittee Report, March 2013 • A. Botta, W. de Donato, V. Persico, A. Pescapé, Integration of Cloud Computing and Internet of Things: A Survey, Future Generation Computer Systems, 56 (2016)
  • 63. QUESTIONS?

Editor's Notes

  1. 6% of the total energy of the world: 122 gigawatts (1 nuclear plant = 1 gigawatt; devices alone = 7 NYCs)
  2. EV = Execution Vector
  3. Cloud Radio Access Networks (C-RAN)
  4. UC platforms should be tightly coupled with any facilities available through the Internet, starting from the core routers of the backbone, the different network access points, and any small and medium-size computing infrastructures that may be provisioned by Internet Service Providers (ISPs), governments and academic institutions. The definition of a complete distributed system in charge of turning a complex and diverse network of resources into a collection of abstracted computing facilities that is both easy to operate and reliable. Note the pizza example: no interest from the restaurant's point of view, but interesting in terms of performance impact (latency) + energy footprint.
  5. Same as note 4.
  6. Two main approaches. Flat: distributing the DB thanks to Galera (active replication). Hierarchical: Cells, where (as in the right-hand figure) a "top cell" (API cell) exposes the API and then distributes the workload over compute cells; as the top cell is not distributed, it exposes the infrastructure to a SPOF. Cascading OpenStack: a recent solution developed by engineers from Huawei, in which a top OpenStack infrastructure exposes the OpenStack API and distributes the workload to child OpenStack infrastructures. We are interested in the first approach; however, we do not like the "active replication" part of the solution when scaling to hundreds of sites.
  7. Add logos.
  8. To validate the benefits of the BlobSeer approach, we experimented with it. Objective: a MapReduce platform optimized for clouds and hybrid architectures (massively concurrent data accesses, fault tolerance, scheduling). Role of BlobSeer: storage of application data; storage of virtual machine images (multi-deployment, multi-snapshotting). ----- Meeting notes (01/10/12 11:58) ----- Q: it is unclear which challenges have been met and which remain. Split into two + a MapReduce figure.
  9. Makespan. We first confirm that, at small scale, a decentralized approach actually adds overhead to the computation; hence centralized solutions are best for smaller settings, regardless of the workflow layout. Overall, we assert that our decentralized solutions fit better complex workflow execution environments, notably metadata-intensive applications, where we achieved a 15% gain in a near-pipeline workflow (BuzzFlow) and 28% in a parallel, geo-distributed application (Montage) compared to the centralized baseline. With tasks taking long enough to process large files, the agent has sufficient time to synchronize the registry instances and to provide consistency guarantees that enable easy reasoning about concurrency at the application level. We noticed that workflow execution engines schedule sequential jobs with tight data dependencies in the same site so as to prevent unnecessary data movements. With our approach, when two consecutive tasks are scheduled in the same data center, the metadata is available locally.
  10. CDF = Cumulative Distribution Function
  11. Mix the two aforementioned scenarios to consider both the interests of the Cloud operator and those of the users, calling for generic tools allowing global studies.
  12. SimGrid Cloud Broker (simulate the cost of using Amazon; simulate the performance you might expect for your hybrid cloud). Provide sound models for live migration on highly consolidated clusters. To provide such models we should first understand, and second model, cloud platforms as well as application behaviors.
  13. AL: Where is the big data part?