New approaches for Data Management in the Cloud
• No “one size fits all” solution
• NoSQL, key-value data stores (e.g. Big...
OTHER ISSUES
New Models for Cloud Application Description
• Adaptation to various kinds of hardware resources is mandatory
• Interestin...
More Efficient Techniques and Algorithms for Cloud Resource Allocation
• Large number of resources to be used by applicati...
New Approach to Integrate Cloud, IoT, CPS, and Mobile Devices
Cloud systems are now the cornerstones of the Internet ecosy...
Promoting simulations to Investigate Cloud Concerns
• Difficult for users to select the Cloud services that best meet thei...
• Scientific instrument for the study of large scale distributed computing
• Main Features
• Versatile: Grid, P2P, HPC, Vo...
GRID’5000 – Real IaaS for Researchers
• Testbed for research on distributed systems
• Born from the observation that we ne...
CONCLUSIONS
Conclusion
• Cloud Computing technology is changing every day: new features, new
requirements (IaaS ++ services)
• Many res...
References
• European Commission report on The Future of Cloud Computing
 http://cordis.europa.eu/fp7/ict/ssai/docs/cloud...
QUESTIONS ?

Challenges and Issues of Next Cloud Computing Platforms

Cloud computing has now crossed the frontier from research into industry. It is used every day, whether to exchange emails or to make
reservations on web sites. However, much research remains to be done to improve the performance and functionality of the platforms of tomorrow. In this talk, I will give an overview of some of the theoretical and applied research done at INRIA, particularly on cloud distribution, energy monitoring and management, massive data processing and exchange, and resource management.

Published in: Engineering
  1. Challenges and Issues of Next Cloud Computing Platforms. Frédéric Desprez (Frederic.Desprez@inria.fr), Labex UCN@Sophia, Feb. 18th 2016
  2. Acknowledgements: Gabriel Antoniu, Inria (Rennes, Kerdata); Olivier Beaumont, Inria (Bordeaux, CEPAGE); Alexandru Costan, Inria (Rennes, Kerdata); Thierry Coupaye, Orange Labs Grenoble; Paulo Goncalvez, Inria (Lyon, Dante); Shadi Ibrahim, Inria (Rennes, Kerdata); Kate Keahey, Argonne National Lab; Cristian Klein, Umea University, Sweden; Adrien Lèbre, Inria and Ecole des Mines de Nantes (Ascola); Laurent Lefèvre, Inria (Lyon, Avalon); Ignacio Llorente, Complutense University of Madrid, Spain; Christine Morin, Inria (Rennes, Myriads); Martin Quinson, ENS (Rennes, Myriads); David Margery, Inria (Rennes, Myriads); Anne-Cécile Orgerie, CNRS (Rennes, Myriads); Manish Parashar, Rutgers University; Christian Perez, Inria (Lyon, Avalon); Thierry Priol, Inria (Rennes, Myriads); Jonathan Rouzaud-Cornabas, Insa (Lyon, Beagle); Frédéric Suter, CNRS/IN2P3 (Lyon, Avalon); Patrick Valduriez, Inria (Montpellier, Zenith); Rich Wolsky, University of California Santa Barbara, USA
  3. Outline • Introduction and Context • Energy Issues • Distributed Clouds • Big Data • Other issues • Conclusions
  4. INTRODUCTION AND CONTEXT
  5. Context. Cloud computing has emerged as a “new” paradigm for many commercial and scientific venues • Starting to be widely adopted by industry • Many platforms and infrastructures available around the world • Several offers for IaaS, PaaS, and SaaS platforms • Public, private, community, and hybrid clouds. But many applications remain that could benefit from such platforms. Several issues still need to be (better) addressed • Elasticity, availability, self-configuration, heterogeneous computing and storage capacities • Several challenges remain to be addressed and transferred into industrial products • Energy management • New applications (IoT)
  6. Clouds Essential Characteristics • On-demand service: no human interaction is needed to get access to storage and computation resources (Utility Computing) • Access through large scale networks: access to resources through networks from lightweight and heavyweight clients (WAN, LAN, wireless) • Resource pooling: resources (CPU, storage, memory, network) are taken from datacenters with (almost) no notion of locality • Elasticity: resources can be allocated and freed in an elastic fashion based on the application needs (with an "infinite" capacity) • Measured service: possibility to monitor resource usage • Pros: availability and extensibility, dynamicity, fault tolerance, resource sharing • Cons: heterogeneity, no locality, application porting, security?
  7. Transparency is the Key. “I don't care if my cloud computing architecture is powered by a grid, a mainframe, my neighbour's desktop or an army of monkeys, so long as it's fast, cheap and secure.” Sam Johnston, Sept. 2008
  8. Research Issues • Explosion in the number of research works on Clouds and virtualization! • Some research challenges • Energy • Service composition • Service Level Agreement (SLA) • Security • Fault tolerance and recovery • Infrastructure management • Elastic management of resources • (Big) Data management • Seamless access to hybrid platforms • Multi-clouds, Sky computing, federations, infrastructure distribution, edge computing • New models (economic, energy) • Application design and description • New languages, new models • Simulation and experimentation • ...
  9. ENERGY ISSUES. Laurent Lefèvre’s team in Avalon (LIP/ENS Lyon & INRIA)
  10. Electrical consumption of ICT, 2013 (source: gwatt.net). [Figure labels: Devices; Telecommunication networks; = 83 GW]
  11. Improving Energy Efficiency of Cloud Infrastructures • Understanding the energy usage of large scale systems mixing virtual instances of applications, physical IT resources, and physical infrastructures remains a real challenge • How to profile the energy consumption of large sets of virtual machines (generic metrics, benchmarks, and energy models) • Analyzing tools and frameworks to support large scale energy-efficient management of resources • Optimize the energy consumption of distributed infrastructures and service compositions in the presence of ever more dynamic service applications • Use of renewable energies • Exploring the trade-off between energy saving and performance in large-scale distributed systems • Energy efficiency of storage systems and networks
  12. Energy efficiency with or without knowledge of applications and services? • Exploring 2 different approaches • With knowledge of the applications and services • Enable the user to choose the least energy-consuming implementation of services → Estimate the energy consumption of the different implementations (protocols) of each service • Without knowledge • Allow some intelligence to reduce the energy usage → Autonomically estimate the energy consumption of the HPC system in order to apply green levers
  13. Improving EE with application expertise • Considered services: resilience & data broadcasting • 4 steps: service analysis, measurements, calibration, estimation • Helping users make the right choices depending on context and parameters. M. Diouri, O. Glück, L. Lefèvre, and F. Cappello, "ECOFIT: A Framework to Estimate Energy Consumption of Fault Tolerance Protocols during HPC executions", CCGrid2013, the 13th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, Delft, the Netherlands, May 13-16, 2013
  14. Without knowledge of applications and services? • HPC applications keep growing in complexity • Too many bugs are already present in HPC applications; adding energy management and considerations won’t help 😀 • Are HPC programmers ready for eco-design of applications? • Applications can share the same infrastructure • Optimizations made to save energy for some applications are likely to impact the performance of others • Instead of looking at applications and services ⇒ focus on the infrastructure • Detect and characterize the system’s runtime behaviors/phases • Optimize each subsystem (storage, memory, interconnect, CPU) accordingly • Helping users to find the best service
  15. Without knowledge of applications • Irregular usage of resources • Phase detection, characterization • Power saving modes deployment. Ghislain Landry Tsafack Chetsa, Laurent Lefevre, Jean-Marc Pierson, Patricia Stolf, Georges Da-Costa, "Application-Agnostic Framework for Improving the Energy Efficiency of Multiple HPC Subsystems", PDP2015: 23rd Euromicro International Conference on Parallel, Distributed and Network-based Processing, 2015.
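The phase-detection idea on slide 15 can be illustrated with a minimal sketch. It is not the framework from the cited paper: the change criterion (a sample deviating from the running mean of the current phase by more than `threshold`), the `label` rule, and the sample trace are all assumptions made for illustration.

```python
from statistics import mean


def detect_phases(samples, threshold=0.2):
    """Split a utilization trace into phases.

    A new phase starts when a sample differs from the mean of the
    current phase by more than `threshold` (a hypothetical criterion).
    Returns a list of (start, end_exclusive, mean_load) tuples.
    """
    phases = []
    start = 0
    for i in range(1, len(samples)):
        current_mean = mean(samples[start:i])
        if abs(samples[i] - current_mean) > threshold:
            phases.append((start, i, current_mean))
            start = i
    if samples:
        phases.append((start, len(samples), mean(samples[start:])))
    return phases


def label(mean_load, low=0.3):
    """Map a phase to a hypothetical power lever."""
    return "low-power mode" if mean_load < low else "full speed"


# Made-up CPU utilization samples: busy, idle, busy again.
trace = [0.9, 0.95, 0.9, 0.1, 0.15, 0.1, 0.85, 0.9]
phases = detect_phases(trace)
for start, end, load in phases:
    print(start, end, round(load, 2), label(load))
```

A real system would track several subsystems (storage, memory, interconnect, CPU) and use a more robust change detector; the sketch only shows where a power-saving mode could be switched on between busy phases.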
  16. Towards Energy Proportionality with Heterogeneous machines. OBSERVATIONS [Barroso and Hölzle 2007]: average server utilization is between 10 and 50% → most inefficient region; no proportionality due to high idle consumption → can be up to 50% of peak power. PROPOSITION: a heterogeneous infrastructure composed of machines with different characteristics in terms of performance and energy consumption • Classical servers → only used in their most energy-efficient region • Low-power processors → reduce static costs. TECHNICAL CHALLENGES • Application placement: dynamically find the most suitable combinations of machines • Infrastructure reconfiguration: power machines on/off at the right time. [Figure labels: BIG, MEDIUM, LITTLE] [Barroso and Hölzle, The Case for Energy Proportional Computing, IEEE Computer, 2007]
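To see why the 10 to 50% utilization range on slide 16 is the inefficient region, here is a small sketch under a linear power model, which is an assumption and not taken from the slides: power grows linearly from idle power to peak power. The 200 W peak is a made-up value; the 50% idle fraction is the upper bound quoted on the slide.

```python
def power(utilization, p_peak=200.0, idle_fraction=0.5):
    """Assumed linear model: idle power plus a load-proportional part."""
    p_idle = idle_fraction * p_peak
    return p_idle + (p_peak - p_idle) * utilization


def relative_efficiency(utilization, idle_fraction=0.5):
    """Work done per joule, normalized to 1.0 at full load."""
    if utilization <= 0:
        return 0.0
    full = power(1.0, idle_fraction=idle_fraction)
    return utilization * full / power(utilization, idle_fraction=idle_fraction)


for u in (0.1, 0.3, 0.5, 1.0):
    print(f"utilization {u:.0%}: relative efficiency {relative_efficiency(u):.2f}")
```

With `idle_fraction=0.0` the same function returns about 1.0 at every load, which is what energy proportionality means in this model.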
  17. Towards Energy Proportionality with Heterogeneous machines. V. Villebonnet, G. Da Costa, L. Lefèvre, J-M. Pierson, P. Stolf, "“Big, Medium, Little”: Reaching Energy Proportionality with Heterogeneous Computing Scheduler", Parallel Processing Letters, 25 (03), World Scientific Publishing, 2015.
  18. Towards Energy Proportionality with Heterogeneous machines. Application: stateless web servers. Traces: one day of 1998 World Cup website accesses. BIG only: Joules per request: 0.2268; infrastructure utilization: 40.7%; number of reconfigurations: 4. BML combination: Joules per request: 0.2155; infrastructure utilization: 69.7%; number of reconfigurations: 194. ⇒ Infrastructure is dynamically reconfigured to meet the load demand of the application → Energy consumption more proportional to the load
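The figures on slide 18 can be compared directly. The short script below only does the arithmetic on the numbers reported there; the dictionary layout and the `relative_change` helper are illustrative.

```python
# Values as reported on the slide (decimal commas converted to points).
big_only = {"joules_per_request": 0.2268, "utilization": 0.407, "reconfigurations": 4}
bml = {"joules_per_request": 0.2155, "utilization": 0.697, "reconfigurations": 194}


def relative_change(before, after):
    """Relative change from `before` to `after` (negative means a reduction)."""
    return (after - before) / before


energy_change = relative_change(big_only["joules_per_request"],
                                bml["joules_per_request"])
utilization_gain = bml["utilization"] - big_only["utilization"]

print(f"Energy per request: {energy_change:+.1%}")
print(f"Infrastructure utilization: {utilization_gain * 100:+.1f} percentage points")
```

Energy per request drops by about 5% while utilization rises by 29 percentage points, at the price of many more reconfigurations (194 against 4).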
  19. Virtual Machines and Energy efficient Clouds. Taking into account the energy consumption in the scheduling process • Energy and resource usage are highly fluctuating • Large disparities between similar nodes → Decisions need to be proactive, based on recent and historical activity. How to efficiently assign those tasks? Combine • A metric to balance performance and energy consumption • An interface to express trade-offs between users' and providers' requirements • A manager of energy-related events. Results • Up to 20% of energy savings in real-life experiments. Daniel Balouek-Thomert, Eddy Caron, Laurent Lefevre, "Energy-Aware Server Provisioning by Introducing Middleware-Level Dynamic Green Scheduling", HPPAC 2015: The 11th Workshop on High-Performance, Power-Aware Computing, May 2015
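Slide 19 mentions a metric that balances performance and energy consumption without defining it. The sketch below shows one possible shape for such a metric, a weighted sum with a user-chosen preference `alpha`. The formula, the `Server` fields, and the sample values are assumptions, not the metric from the cited paper.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    flops: float  # relative compute speed, higher is faster
    watts: float  # recent average power draw


def score(server, servers, alpha):
    """Lower is better. alpha=1 considers performance only, alpha=0 energy only."""
    max_flops = max(s.flops for s in servers)
    max_watts = max(s.watts for s in servers)
    slowness = 1.0 - server.flops / max_flops
    energy = server.watts / max_watts
    return alpha * slowness + (1.0 - alpha) * energy


def pick(servers, alpha):
    """Select the server with the best (lowest) score for this preference."""
    return min(servers, key=lambda s: score(s, servers, alpha))


servers = [Server("fast", 100.0, 300.0), Server("frugal", 60.0, 120.0)]
for alpha in (1.0, 0.5, 0.0):
    print(alpha, pick(servers, alpha).name)
```

The single parameter `alpha` plays the role of the interface for expressing trade-offs: a provider could cap it, and a user could set it per request.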
  20. Virtual Machines and Energy efficient Clouds. Combining energy with other criteria and constraints for a given problem • Large spectrum of potential solutions • NP-hard problem. Genetic approach • A model to capture affinities between tasks and resources • An algorithm that mimics the “survival of the fittest”: only efficient servers are used through time • A learning engine that integrates constraints. Strategies need to be validated in terms of correctness and computing time. Daniel Balouek-Thomert, Arya K. Bhattacharya, Eddy Caron, Karunakar Gadireddy, Laurent Lefèvre, "Minimizing energy and makespan concurrently in Cloud Computing workloads using Multi-Objective Differential Evolution", under review
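Minimizing energy and makespan concurrently means there is usually no single best schedule, only a set of non-dominated trade-offs. The sketch below shows the Pareto dominance test that multi-objective methods rely on. It is not the differential evolution algorithm from the cited work, and the candidate values are made up.

```python
def dominates(a, b):
    """True if `a` is no worse than `b` on both objectives and better on one.

    Each candidate is an (energy, makespan) pair; both are minimized.
    """
    return a[0] <= b[0] and a[1] <= b[1] and (a[0] < b[0] or a[1] < b[1])


def pareto_front(candidates):
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Hypothetical schedules as (energy in kJ, makespan in s).
candidates = [(100.0, 50.0), (80.0, 70.0), (120.0, 60.0), (80.0, 65.0)]
front = pareto_front(candidates)
print(front)
```

An evolutionary search would generate new candidates and keep refining this front over generations.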
  21. Network is Part of the Story: Dynamic, Energy Efficient, Network Reconfiguration • Network energy consumption: 40 GW in 2013 (source: gwatt.net) • A lot of improvement is possible during off-peak hours, especially in core networks • Re-route to improve the energy efficiency • Consumption reduced by up to 39%. Van Heddeghem et al., “Power Consumption Modeling in Optical Multilayer Networks”, PNET 24 (2), 86–102, 2012. Carpa R., Gluck O., Lefevre L. and Mignot J.-C., "Improving the energy efficiency of software-defined backbone networks", Photonic Network Communications, vol. 30(3), p. 337-347, 2015. Hassidim, A. et al., “Network utilization: The flow view”, INFOCOM, 2013 IEEE, 1429–1437, 2013
  22. Energy Efficient Networks. Carpa R., Assuncao M., Gluck O., Lefevre L. and Mignot J.-C., "Responsive Algorithms for Handling Load Surges and Switching Links On in Green Networks", submitted to ICC 2016. Simulations of high-speed core networks • Rerouting in less than a second • Improved energy efficiency compared to related work (12%) • Same quality of service. NetFPGA + OpenFlow testbed (work in progress) • Targeting access networks • Few, frequently changing, flows • Cross-layer L3/L4 optimizations for stability
  23. DISTRIBUTED CLOUDS. Adrien Lebre’s team in ASCOLA (LINA/EMN Nantes & INRIA)
  24. The Current Situation • Large offshore DCs • To cope with the increasing UC demand while handling energy concerns • But • Jurisdiction concerns (data locality) • Reliability • Network overhead • Localization is a key element to deliver efficient as well as sustainable Utility Computing solutions. Credits: coloandcloud.com
  25. The Cloud from End-Users. [Figure labels: Charles, Alice, Paula, Bob, Dan, Sam, Rob, Duke]
  26. The Cloud in Reality. [Figure labels: Charles, Alice, Paula, Bob, Dan, Sam, Rob, Duke; Internet backbone]
  27. Cloud Evolution. Not only mega data centres! Courtesy of Thierry Coupaye (Orange)
  28. Trends for Next Generation Clouds. Centralized public clouds are in fact generally distributed over multiple (mega) data centres for availability reasons. [Figure credits: Verizon (©), Orange (©), Microsoft (©), Amazon (©)] Courtesy of Thierry Coupaye (Orange)
  29. Trends for Next Generation Clouds. Hybrid and community clouds are by nature distributed over multiple data centres/clouds. Courtesy of Thierry Coupaye (Orange)
  30. Trends for Next Generation Clouds. Networks are getting “softwarized” and are converging with a distributed vision of cloud computing. 3 examples • Virtual CDN (vCDN) • Cloud RAN (C-RAN) • Mobile Edge Computing (MEC). Courtesy of Thierry Coupaye (Orange)
  31. The DISCOVERY Proposal • DIStributed and COoperative framework to manage Virtual EnviRonments autonomously • Locality-based Utility Computing platform (“LUC-OS”) • A fully distributed IaaS system, not a distributed system of IaaS systems • We want to/must go further than high level cloud APIs (cross-cutting concerns such as energy/security) • Leverage P2P algorithms and self-* approaches • Lots of scientific/technical challenges • Cost of the network? • Partial view of the system? • Impact on the other VMs? • Management of VM images? • How to take into account locality aspects? • Which software abstractions to make the development easier and more reliable (distributed event programming)? … Lèbre, A., Pastor, J., Bertier, M., Desprez, F., Rouzaud-Cornabas, J., Tedeschi, C., Anedda, P., Zanetti, G., Nou, R., Cortes, T., Riviere, E. and Ropars, T., Beyond The Cloud, How Should Next Generation Utility Computing Infrastructures Be Designed? INRIA Research Report 8348, Aug. 2013.
  32. The DISCOVERY Initiative. [Figure labels: Alice, Duke, Paula, Tom, Charles, Bob, Dan, Sam, Rob; several nodes labelled “DISCOVERY Network”]
  33. Beyond the Clouds, the DISCOVERY Initiative. Locality-based UC infrastructures / Fog / Edge. A promising way to deliver highly efficient and sustainable UC services is to provide UC platforms as close as possible to the end-users. http://www.renater.fr/raccourci?lang=fr
  34. Beyond the Clouds, the DISCOVERY Initiative • Leveraging network backbones • Extend any point of presence of network backbones with UC servers (from network hubs up to major DSLAMs that are operated by telecom companies and network institutions) • Leveraging wireless backbones. [Figure labels: Paula, Bob, Alice, Duke, Charles, Pam, Bob; core backbone; DISCOVERY; several nodes labelled “DISCOVERY Network”]
  35. Would OpenStack be the solution? • Do not reinvent the wheel … • OpenStack • Open source IaaS manager with a large community • Composed of several services dedicated to each aspect of a cloud
  36. Distributing OpenStack • Services collaborate through • A messaging queue • A SQL database • Alternative solutions exist for storing state over a highly distributed infrastructure ⇒ NoSQL DB • Few proposals to federate/operate distinct OpenStack DCs • ‘Flat’ approach • Hierarchical approaches. http://beyondtheclouds.github.io/dcc.html
  37. ROME • Relational Object Mapping Extension for key/value stores • Jonathan Pastor’s PhD • Enables querying a key/value store DB with the same interface as SQLAlchemy • Enables OpenStack Nova to switch to a KVS without being too intrusive • The KVS is clustered on controllers • Compute nodes connect to the key/value cluster. [Figure labels: Nova Network, Nova Compute, Nova Scheduler, Nova Conductor; db.api; Relational: MySQL DB; Non-Relational: Key/Value DB] https://github.com/badock/rome
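The idea on slide 37 is to keep a relational-looking query interface while the data lives in a key/value store. The toy sketch below illustrates that idea only. `KVQuery` is a hypothetical class, the `table:id` key layout is an assumption, and none of this is ROME's actual API; only the `filter_by(...)`, `all()`, and `first()` method names are borrowed from the SQLAlchemy query style.

```python
class KVQuery:
    """Hypothetical query object over a dict acting as the key/value store."""

    def __init__(self, store, table):
        # Assumed key layout: "<table>:<id>" maps to a dict of column values.
        prefix = table + ":"
        self._rows = [value for key, value in store.items()
                      if key.startswith(prefix)]

    def filter_by(self, **criteria):
        """Keep rows whose fields equal all the given keyword values."""
        self._rows = [row for row in self._rows
                      if all(row.get(field) == wanted
                             for field, wanted in criteria.items())]
        return self

    def all(self):
        return list(self._rows)

    def first(self):
        return self._rows[0] if self._rows else None


store = {
    "instances:1": {"id": 1, "host": "node-a", "state": "active"},
    "instances:2": {"id": 2, "host": "node-b", "state": "stopped"},
    "instances:3": {"id": 3, "host": "node-a", "state": "stopped"},
    "networks:1": {"id": 1, "cidr": "10.0.0.0/24"},
}

active_on_a = KVQuery(store, "instances").filter_by(host="node-a", state="active").all()
print(active_on_a)
```

Because the calling code only sees `filter_by` and `all`, the storage behind it can change from a SQL database to a key/value cluster without touching the callers, which is the low-intrusion property the slide describes.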
  38. The DISCOVERY Initiative: Pros and Cons • Pros • Locality (jurisdiction concerns, latency-aware apps, minimized network overhead) • Reliability/redundancy (no critical point/location/center) • The infrastructure is naturally distributed throughout multiple areas • Lead time to delivery • Leverage current PoPs and extend them according to UC demands • Energy footprint (ongoing investigations with RENATER) • Bring back part of the revenue to NRENs/Telcos • Cons • Security concerns (in terms of who can access the PoPs) • Operating a full IaaS in a unified but distributed manner at WAN level • Not suited to all kinds of applications: large tightly coupled HPC workloads (50 nodes / 1000 cores, 200 nodes / 4000 cores (5 racks), so 1000 nodes in one PoP does not look realistic) • Peering agreement / economic model between network operators. http://beyondtheclouds.github.io/
  39. BIG DATA. Gabriel Antoniu’s team KERDATA (IRISA & INRIA)
  40. Data Processing, Big Data • Huge amount of data to be moved and processed • LHC, simulations, genomics, astrophysics, social networks, sensors, … • Heterogeneity in their storage (DB, files, …) and processing (cleaning, transformation, analysis, search, indexing, visualization, ...) • Challenges • Resource issues • Fault tolerance and recovery, energy management • Handling complex distributed workflows at a large scale (computation, data transfers, and replications) • Resource management (computation, storage, network), interoperability of solutions • Describing these workflows • Metadata management • Data provenance • Which transformations were applied • Programming next generation applications • Which language for which application • Strong relations with resource management systems • Performance and transparency • Genericity. Sakr, S., Liu, A., Batista, D.M., Alomari, M., A Survey of Large Scale Data Management Approaches in Cloud Environments, IEEE Communications Surveys and Tutorials, 2011. Middleton A.M., Data-Intensive Technologies for Cloud Computing, Handbook of Cloud Computing, Springer, 83-135, 2010. http://research.microsoft.com/en-us/collaboration/fourthparadigm/
  41. Beyond Hadoop: BlobSeer, Scalable Storage for Data-Intensive Analytics. Started in 2008, 6 PhD theses (Gilles Kahn/SPECIF PhD Thesis Award in 2011). Main goal: optimized for concurrent accesses under heavy concurrency. Three key ideas • Decentralized metadata management • Lock-free concurrent writes (enabled by versioning) • Data and metadata “patching” rather than updating. A back-end for higher-level data management systems • Highly scalable distributed file systems • Storage for cloud services. Approach • Design and implementation of distributed algorithms • Experiments on the Grid’5000 testbed • Validation with “real” apps on “real” platforms: IBM clouds, Microsoft Azure, OpenNebula • Results on Grid’5000: BlobSeer improves Hadoop by 35% (execution time). http://blobseer.gforge.inria.fr/ B. Nicolae, G. Antoniu, L. Bougé, D. Moise, A. Carpen-Amarie, “BlobSeer: Next Generation Data Management for Large Scale Infrastructures”, Journal of Parallel and Distributed Computing, February 2011, vol. 71, no. 2, pp. 169-184.
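The "patching rather than updating" idea on slide 41 can be illustrated with a small single-process sketch: every write is recorded as a patch that creates a new version, and older versions stay readable. `VersionedBlob` is a hypothetical class written for this illustration, not BlobSeer's API, and it does not model the distributed, lock-free machinery of the real system.

```python
class VersionedBlob:
    """Toy versioned blob: writes append patches, nothing is overwritten."""

    def __init__(self, size):
        self._base = bytes(size)  # version 0: all zero bytes
        self._patches = []        # patch i produces version i + 1

    def write(self, offset, data):
        """Record a patch and return the version number it creates."""
        if offset < 0 or offset + len(data) > len(self._base):
            raise ValueError("write outside blob bounds")
        self._patches.append((offset, bytes(data)))
        return len(self._patches)

    def read(self, version=None):
        """Rebuild the requested version (latest by default)."""
        if version is None:
            version = len(self._patches)
        buf = bytearray(self._base)
        for offset, data in self._patches[:version]:
            buf[offset:offset + len(data)] = data
        return bytes(buf)


blob = VersionedBlob(8)
v1 = blob.write(0, b"abcd")
v2 = blob.write(2, b"XY")
print(blob.read(v1), blob.read(v2))
```

A reader holding version `v1` is unaffected by the later write, which is why versioning lets readers and writers proceed without blocking each other.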
  42. BlobSeer on Commercial Clouds: the A-Brain Microsoft Research – Inria Project. [Figure: brain image data Y (q ~ 10^5 to 10^6; anatomical MRI, functional MRI, diffusion MRI) and genetic data X (p ~ 10^6; DNA array (SNP/CNV), gene expression data, others), with N ~ 2000] • TomusBlobs storage (based on BlobSeer) • Processing approach: MapReduce • Gain over Azure Blobs: 45% • Scalability: 1000 cores • KerData and PARIETAL teams at INRIA • European Microsoft Innovation Center (Aachen). http://www.msr-inria.fr/projects/a-brain/
  43. Executing the A-Brain Application at Large-Scale • The TomusBlobs data-storage layer developed within the A-Brain project was demonstrated to scale up to 1,000 cores on 3 Azure data centers (from EU, US) • Gain compared to Azure BLOBs: close to 50% • Experiment duration: ~14 days • More than 210,000 hours of computation used • Cost of the experiments: 20,000 euros (VM price, storage, outbound traffic) • 28,000 map jobs (each lasting about 2 hours) and ~600 reduce jobs. Scientific discovery: provided the first statistical evidence of the heritability of functional signals in a failed stop task in basal ganglia. B. Da Mota, R. Tudoran, A. Costan, G. Varoquaux, G. Brasche, P. J. Conrod, H. Lemaitre, T. Paus, M. Rietschel, V. Frouin, J.-B. Poline, G. Antoniu, B. Thirion, Machine Learning Patterns for Neuroimaging-Genetic Studies in the Cloud, Frontiers in Neuroinformatics, vol. 8, April 2014.
  44. Going Further: Managing Metadata for Geo-Distributed Workflows. The Z-CloudFlow Microsoft Research – Inria Project • Multisite cloud = a cloud with multiple data centers • Each with its own cluster, data and programs • Matches the requirements of scientific apps well • Goal • Investigate approaches to metadata management integrated with the workflow execution engine to support multi-site scheduling
  45. Four Strategies. Centralized • Baseline. Replicated • Local metadata accesses • Synchronization agent. Decentralized, non-replicated • Metadata scattered across sites • DHT-based. Decentralized, replicated • Metadata stored locally and replicated to a remote location (using hashing)
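The four strategies on slide 45 differ mainly in where a metadata entry is stored. The sketch below makes that concrete for a hypothetical three-site deployment. The site names, the choice of SHA-256, the use of the first site as the central one, and the rule of moving to the next site when the hash points at the local site are all assumptions made for illustration, not details from the cited work.

```python
import hashlib

SITES = ["site-eu", "site-us-east", "site-us-west"]  # hypothetical data centers


def home_index(key, sites=SITES):
    """Map a metadata key to a site index by hashing (DHT-like placement)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % len(sites)


def placement(key, local_site, strategy, sites=SITES):
    """Return the list of sites that store the metadata entry for `key`."""
    if strategy == "centralized":
        return [sites[0]]                      # one fixed metadata server
    if strategy == "replicated":
        return list(sites)                     # every site holds a copy
    if strategy == "decentralized-non-replicated":
        return [sites[home_index(key, sites)]]
    if strategy == "decentralized-replicated":
        idx = home_index(key, sites)
        if sites[idx] == local_site:           # assumed rule: pick another site
            idx = (idx + 1) % len(sites)
        return [local_site, sites[idx]]
    raise ValueError(f"unknown strategy: {strategy}")


for strategy in ("centralized", "replicated",
                 "decentralized-non-replicated", "decentralized-replicated"):
    print(strategy, placement("workflow42/output.dat", "site-us-east", strategy))
```

In the replicated strategy a synchronization agent would also be needed to keep the copies consistent; the sketch only shows placement.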
  46. Matching strategies to workflows • Centralized • Small scale • Replicated • Intensive computations • Large files • Decentralized approaches • A large number of small files • Non-replicated • Parallel jobs • Replicated • For sequential, tightly dependent jobs, data available locally. L. Pineda-Morales, A. Costan, G. Antoniu, “Towards Multi-site Metadata Management for Geographically Distributed Cloud Workflows”, CLUSTER 2015 - IEEE International Conference on Cluster Computing, Chicago, United States, September 2015.
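The matching on slide 46 reads like a small decision procedure. The function below is one possible encoding of it; the three boolean inputs and the order in which they are checked are an interpretation of the slide, not something the cited paper specifies.

```python
def choose_strategy(small_scale, many_small_files, parallel_jobs):
    """Pick a metadata strategy from coarse workflow characteristics.

    small_scale: the workflow runs at small scale
    many_small_files: it handles a large number of small files
        (otherwise: intensive computations on large files)
    parallel_jobs: jobs are parallel (otherwise: sequential and
        tightly dependent)
    """
    if small_scale:
        return "centralized"
    if not many_small_files:
        return "replicated"
    if parallel_jobs:
        return "decentralized non-replicated"
    return "decentralized replicated"


print(choose_strategy(small_scale=False, many_small_files=True, parallel_jobs=False))
```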
  47. Failure-Aware Scheduling in Hadoop. In large-scale clouds, node failures are inevitable • 1000 machine failures in the 1st year of a Google cluster* • 10%-15% job failure rate in CMU clusters. Failure recovery in Hadoop • Hadoop re-executes the tasks of failed machines • Waits an uncertain amount of time for a free slot • Ignores the data locality of the recovery tasks. *J. Dean, "Large-scale distributed systems at Google: Current systems and future directions", keynote speech at the 3rd ACM SIGOPS International Workshop on Large Scale Distributed Systems and Middleware, 2009.
  48. 48. Chronos: a Failure-aware scheduler
      • Takes early action upon failures
      • Employs a work-conserving preemption technique
      • Considers local execution of recovery tasks
      • Is independent of the scheduling policy and improves performance by 10-20% over state-of-the-art Hadoop schedulers
      • Reduces the waiting time of recovery tasks from 46 seconds to 1.5 seconds on average
      O. Yildiz, S. Ibrahim, T.A. Phuong, G. Antoniu. “Chronos: Failure-aware scheduling in shared Hadoop clusters”, The 2015 IEEE International Conference on Big Data (BigData 2015), Nov 2015.
  49. 49. Explore the impact of DVFS in Hadoop clusters
      • DVFS (dynamic voltage and frequency scaling) offers significant potential for energy savings by scaling down the CPU frequency when peak CPU performance is not needed
      • Diversity of MapReduce applications
      • Multiple phases in a MapReduce application (disk I/O, CPU, network)
      [Figure: CPU load over time for two jobs. In one, CPU load is high (98%) during almost 75% of the job's running time; in the other, CPU load is high (80%) during only 15% of the job's running time.]
      S. Ibrahim, T-D Phan, A. Carpen-Amarie, H-E. Chihoub, D. Moise, G. Antoniu, “Governing Energy Consumption in Hadoop through CPU Frequency Scaling: An Analysis”, Future Generation Computer Systems, Volume 54, January 2016
  50. 50. Mitigating Stragglers in Hadoop
      Performance variation is common in the Cloud
      • Stragglers can severely increase the execution time
      • Hadoop launches another copy of the straggler in the hope that it will finish earlier (i.e., speculation)
      [Figure: timeline of Tasks 1 to 4 running on Nodes 1 to 4, with one task marked as the straggler.]
      T-D Phan, S. Ibrahim, G. Antoniu, L. Bouge, “On Understanding the Energy Impact of Speculative Execution in Hadoop”, The 2015 IEEE International Conference on Green Computing and Communications (GreenCom 2015), Dec 2015
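To make the notion of a straggler concrete, here is a minimal sketch of a detection rule. It is a simplified heuristic written for this example, not Hadoop's actual speculation logic: the function name, the progress-rate definition and the `slowdown` threshold of 0.5 are assumptions.

```python
def find_stragglers(progress, elapsed_s, slowdown=0.5):
    """Flag tasks whose progress rate is below `slowdown` times the mean rate.

    progress  -- dict mapping task id to completed fraction in [0, 1]
    elapsed_s -- seconds since the tasks started (same start time assumed)
    """
    if not progress or elapsed_s <= 0:
        return []
    rates = {task: done / elapsed_s for task, done in progress.items()}
    mean_rate = sum(rates.values()) / len(rates)
    return sorted(task for task, rate in rates.items() if rate < slowdown * mean_rate)

if __name__ == "__main__":
    snapshot = {"task1": 0.90, "task2": 0.85, "task3": 0.20, "task4": 0.95}
    # task3 progresses far more slowly than its peers, so a speculative
    # copy would be launched for it on another node.
    print(find_stragglers(snapshot, elapsed_s=100))
```

A scheduler using such a rule would launch a second copy of each flagged task and keep whichever copy finishes first; the extra copy is what costs additional energy.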
  51. 51. Speculation benefit in a heterogeneous environment
      [Figure: bar charts comparing speculation disabled and speculation enabled: execution time (10^3 s), energy consumption (MJ), and average power consumption for CloudBurst, Sort and WordCount. Annotated differences: -47%, -28%, +32%.]
      The energy reduction is not proportional to the execution time improvement. This strongly depends on the extra power due to the extra resource consumption.
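Why the energy saving lags behind the time saving follows from energy = average power × time. The short sketch below works through that arithmetic. The function name is hypothetical, and the inputs reuse two percentages from the slide purely as an illustration: the transcript does not say which application or run each percentage belongs to.

```python
def relative_energy_change(time_change, power_change):
    """Relative change in energy when time and average power both change.

    Energy = average power * time, so the relative factors multiply:
    (1 + dE) = (1 + dT) * (1 + dP).
    """
    return (1.0 + time_change) * (1.0 + power_change) - 1.0

if __name__ == "__main__":
    # Illustrative inputs: execution time -47%, average power +32%.
    change = relative_energy_change(-0.47, 0.32)
    print(f"energy change: {change:+.1%}")
```

Under these assumed inputs, a 47% shorter run at 32% higher average power uses about 30% less energy, not 47% less: the extra power drawn by the speculative copies eats into the gain.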
  52. 52. New approaches for Data Management in the Cloud
      • No “one size fits all” solution
      • NoSQL, key-value data stores (e.g. Bigtable, HBase, Cassandra, HyperTable), graph databases (e.g. Neo4j, Pregel), array data stores (e.g. SciDB), analytical Cloud databases (e.g. Greenplum and Vertica), analytical Cloud frameworks (e.g. Hadoop Map-Reduce, Cloudera Impala), document databases (e.g. MongoDB, CouchBase), data stream management systems (e.g. Storm)
      • Wide diversification of data store interfaces and the loss of a common programming paradigm
      • Design of multistore data management systems
      • Data management in multisite Clouds
      • Deviation between Cloud and HPC storage infrastructures
      • New I/O mechanisms to guide I/O systems in order to deliver the best performance
      • Distributed file systems with Cloud capabilities such as elasticity
      • Development of a unified architecture for HPC and Cloud storage back-ends
  53. 53. OTHER ISSUES
  54. 54. New Models for Cloud Application Description
      • Adaptation to various kinds of hardware resources is mandatory
      • Interesting approach: separate the description of the various possible configurations (application architecture description) from the quality of service sought for a particular execution (minimize cost, maximize performance, respect a deadline, etc.)
      • Challenges
      • Description of the structure of the application
      • Description of the expected behavior
      • Several issues
      • Take data into account
      • How to model application workflows
      • New languages for non-functional objectives (budget, performance/deadline, security, data)
  55. 55. More Efficient Techniques and Algorithms for Cloud Resource Allocation
      • Large number of resources to be used by applications
      • Hardware heterogeneity including new resources (GPU, FPGA, …)
      • Difficult for users to choose the most appropriate hardware configuration
      • Need for performance models of applications
      • Seamless choice of resources following user demands and resource availability
      • SLAs put the onus on providers to deliver robust allocations despite the large number of hardware failures
      • Include a reliability constraint
      • Use replication to cope with faults and failures
      • Take dynamicity and elasticity into account
      • Allocation problems adapted to Cloud constraints (CPU, memory, disks, network, complex topologies)
      • Design of sophisticated algorithms with guarantees on their reliability
      • Focus the optimization on impactful jobs
      • Efficient representation of the search space and theoretical analysis
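The link between a reliability constraint and replication can be made concrete with a very small model. The sketch below assumes that each replica fails independently with the same probability and that the service survives as long as one replica is alive; both assumptions, and the function name, are introduced for this example and are not taken from the talk.

```python
def replicas_needed(p_fail, target_reliability):
    """Smallest replica count n such that 1 - p_fail**n >= target_reliability.

    p_fail             -- probability that one replica fails during the job (0 < p < 1)
    target_reliability -- required probability that at least one replica survives
    """
    if not 0.0 < p_fail < 1.0:
        raise ValueError("p_fail must be strictly between 0 and 1")
    if not 0.0 < target_reliability < 1.0:
        raise ValueError("target_reliability must be strictly between 0 and 1")
    n = 1
    # All n replicas fail with probability p_fail**n under the independence assumption.
    while p_fail ** n > 1.0 - target_reliability:
        n += 1
    return n

if __name__ == "__main__":
    for p in (0.01, 0.05, 0.2):
        print(f"p_fail={p}: {replicas_needed(p, 0.999)} replica(s) for 99.9% reliability")
```

Real allocation problems are harder than this because replicas also consume CPU, memory and network capacity, and failures on machines sharing a rack or a power supply are correlated, which is why the slide calls for algorithms with reliability guarantees.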
  56. 56. New Approach to Integrate Cloud, IoT, CPS, and Mobile Devices
      Cloud systems are now the cornerstones of the Internet ecosystem, allowing any connected device, such as things, smartphones, tablets, set-top boxes and PCs, to store and share information in a seamless way
      • But
      • Centralized Internet, increasing impact of failures on Internet users, loss of control over citizens' private data, vendor lock-in from hardware and software providers, massive leaks of sensitive data when Cloud systems are under attack, and surveillance by national security agencies
      • Ideas
      • More decentralized Cloud infrastructures, i.e. fog computing, taking into account the rapid evolution of very cheap, low-power hardware
      • Use nano-PCs based on smartphone technologies (ARM-based processors)
      • Many challenges
      • Seamless integration of nano-PCs within Cloud infrastructures
      • New Cloud services combining nano-PCs and data centers
      • Server-less sharing, security, and privacy
  57. 57. Promoting Simulations to Investigate Cloud Concerns
      • Difficult for users to select the Cloud services that best meet their requirements (in terms of performance, cost, energy, etc.)
      • Preliminary evaluations with partial deployments on real platforms such as Amazon Web Services or Microsoft Azure
      • Investigation of new hardware and new software mechanisms for Cloud providers in order to stay competitive
      • Provision a part of the Cloud to evaluate the benefits of such a change
      • Use of simulation for these scenarios
      • Reduction of development cost
      • Control of parameters such as network latency, reliability, scalability, etc.
      • Development of an accurate and versatile simulation framework
  58. 58. SimGrid: Simulator of Distributed Applications
      simgrid.gforge.inria.fr/
      • Scientific instrument for the study of large scale distributed computing
      • Main Features
      • Versatile: Grid, P2P, HPC, Volunteer Computing, ..., Clouds
        http://infra-songs.gforge.inria.fr
      • Valid: Accuracy limits studied and pushed further for years
      • Scalable and Fast (despite precise models)
      • Usable: Tooling (generators, runner, visualization); Open-source, Portable, ...
      • On-Going work
      • SCHIaaS: Simulation of Clouds and Hybrid IaaS
      • Adding virtualization capabilities into SimGrid (VM migration, boot, …)
  59. 59. GRID’5000 – Real IaaS for Researchers
      https://www.grid5000.fr/
      • Testbed for research on distributed systems
      • Born from the observation that we need a better and larger testbed
      • HPC, Grids, P2P systems and more recently Cloud computing
      • A complete access to the nodes’ hardware in an exclusive mode (from one node to the whole infrastructure)
      • Current status
      • 9 sites, 1195 machines, 8184 cores
      • Diverse technologies/resources (Intel, AMD, Myrinet, Infiniband, two GPU clusters, energy probes)
      • Ready-to-use OpenStack distribution
      • Last significant experiment
      • Dynamic scheduling of 10K VMs across 4 sites
      Adding virtualization capabilities into Grid’5000, INRIA RR8026, Jul 2012
  60. 60. CONCLUSIONS
  61. 61. Conclusion
      • Cloud Computing technology is changing every day
      • New features, new requirements (IaaS ++ services)
      • Many research issues are being addressed in our research labs, with results that should/will be transferred to tomorrow’s cloud infrastructures
      • Connection of “classical” Cloud infrastructures to next-generation platforms (IoT)
      • Distributed Cloud Computing is happening!
      • Dist. CC workshop (UCC 2013, SIGCOMM 2014/2015), FOG Computing workshop (collocated with IEEE ICC 2013), IEEE CloudNet …
      • How should developers build new applications to benefit from such geographically distributed infrastructures?
  62. 62. References
      • European Commission report on The Future of Cloud Computing, http://cordis.europa.eu/fp7/ict/ssai/docs/cloud-report-final.pdf
      • A Roadmap for Advanced Cloud Technologies under H2020, European Commission, Recommendations by the Cloud Expert Group, Digital Agenda for Europe, Dec. 2012
      • Report on the public consultation for H2020 Work Programme 2016-17: Cloud Computing and Software, ec.europa.eu/information_society/newsroom/cf/dae/document.cfm?doc_id=8161
      • Key Challenges in Cloud Computing: Enabling the Future Internet of Services, Rafael Moreno-Vozmediano, Ruben S. Montero, and Ignacio M. Llorente, IEEE Internet Computing, Jul 2013
      • NIST Cloud Strategy and Innovation Blog (I. Llorente), http://blog.cloudplan.org/
      • Above the Clouds: A Berkeley View of Cloud Computing, http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-28.html
      • DRAFT Cloud Computing Synopsis and Recommendations, NIST, http://csrc.nist.gov/publications/drafts/800-146/Draft-NIST-SP800-146.pdf
      • SIENA European Roadmap on Grid and Cloud Standards for e-Science and Beyond, http://www.sienainitiative.eu/Repository/FileScaricati/8ee3587a-f255-4e5c-aed4-9c2dc7b626f6.pdf
      • The Magellan Report on Cloud Computing for Science, Yelick et al., Dec. 2011, http://science.energy.gov/~/media/ascr/pdf/program-documents/docs/Magellan_final_report.pdf
      • Livre blanc sur le calcul intensif, Comité d’orientation pour le calcul intensif (Cocin) du CNRS, 2012, http://www.cnrs.fr/ins2i/IMG/pdf/Livre_blanc_-_derniere_version.pdf
      • Synergistic Challenges in Data-Intensive Science and Exascale Computing, DOE ASCAC Data Subcommittee Report, March 2013
      • Integration of Cloud computing and Internet of Things: A survey, A. Botta, W. de Donato, V. Persico, A. Pescapé, Future Generation Computer Systems, 56 (2016)
  63. 63. QUESTIONS?
