Suning OpenStack Cloud and Heat

An experience report on the OpenStack deployment at Suning.com, a large online retailer in China. The talk presents the challenges and opportunities of orchestrating enterprise workloads with Heat.

  1. Xiaobin Zhang, zhangxbk@cnsuning.com; Long Jin, jinlongb@cnsuning.com; Qiming Teng, tengqim@cn.ibm.com
  2. • Suning Overview
     • Suning OpenStack Journey
     • Suning Cloud Workload
     • Lessons Learnt
     • Wishlists
  3. • Basic Information
       • Established in 1990
       • The largest commercial enterprise in China
       • Top 3 Chinese private enterprises
       • The 50th among the Top 500 enterprises in China
     • Business Lines
       • retail, logistics, supply chain, real estate, investments, ...
     • By the end of 2012
       • Suning has stores in 700+ cities in China and other countries
       • The total number of staff is 180,000
       • 4 R&D centers: Beijing, Shanghai, Nanjing, Silicon Valley
       • Brand value of $13 B, annual revenue of $37 B
     • Figure labels: $175 billion; 43.9% YTY growth; 1H 2014, China
  4. • Opportunities
       • Improve Efficiency, Collaboration Paradigms and Business Models
       • Traditional e-Commerce -> O2O (Online-to-Offline) -> Cloudified Whole Value Chain
     • Diagram labels: Personalized shopping experiences; Efficient merchandising and supply network; Transform and optimize operations; Operating efficiency; Revenue growth; Ecosystem domination
  5. Suning Private Cloud
       • Multiple Data Centers
       • 1000s of Hosts
       • 10,000s of Virtual Machines
       • Rich & Customized Middleware
       • Automated Deployment / Operation
       • Workflow Consolidation
     Suning Public Cloud
       • Cloud Server
       • VPC
       • Shared, Object Storage
       • Cloud Database
       • Fast Deployment
       • Monitoring and Billing
  6. • The journey started in early 2013
       • single deployment -> multi-region deployment across data centers
       • R&D workload testing -> Internet/production workload
     Status by domain:
       • Compute: 256 GB memory, 4 GB NICs, 64 cores; Windows/Linux guests
       • Network: Isolated networks for admin, data and storage; OVS bonding; HW LB
       • Storage: LVM and GlusterFS resource pool with QoS support; Cinder multi-backend
       • Container: Docker resource pool and Docker repo with HA enabled
       • Deployment: Cobbler, Puppet
       • Management: 3-node HA setup for controllers; RabbitMQ cluster
       • Monitoring: Proprietary resource/service monitoring tools, guest agents for data collection; RabbitMQ portal and LogStash
       • Optimization: Resource scheduling for standalone, clustered and layered applications
  7. • 100+ applications of diverse characteristics
     • Mixed CPU-intensive and I/O-intensive workload:
       • CPU-intensive, long-running mobile application compilation and building
       • huge storage and volume (800 GB to 1 TB)
       • search engine compilation
       • big data analytics, e.g. sentiment analysis
       • thumbnail generation
     • Different software stacks for Internet applications
       • Apache + JBoss + MySQL
       • IHS + WAS + DB2
       • Others
  8. Diagram: three application tiers
     • Web / Front-End
       • Optionally Clustering
       • Optionally Auto-Scaling
       • Dispatch to different hosts, regions, networks, ...
     • Between Web and AppServer: Dynamic Discovery, Live Registration, Request Granularity
     • AppServer / Middle-Tier
       • Optionally Clustering
       • Optionally Auto-Scaling
       • Schedule to different hosts, regions, networks, ...
       • Short upgrade cycle: 1-4 weeks (not whole system)
     • Between AppServer and Database: Dynamic Discovery, Live Registration, Transaction Dispatching
     • Database / Back-End
       • Optionally Active/Passive
       • Dispatch to different hosts, networks, regions, ...
  9. • A component/service may play different roles
       • Apache: web server and/or reverse proxy and/or load balancer
       • JBoss: front-end, back-end or both
       • Master agent, Host agent, JBoss instances
     • Service discovery and registration is complex
       • IT requirements like SSH key, service user, password, directory, package repository…
       • Legacy scripts and automation tools (adopt or discard)
     • Workload distribution has to be planned ahead
       • traditional process forking is not acceptable on a virtualized platform
       • VMs become the management unit on cloud
       • tuning specs: quota, profile, application characteristics
       • scaling VMs instead of forking new processes
  10. • An orchestrator sitting above compute, storage and network
      • Template-based VM provisioning, a.k.a. stack creation
      • Heat's auto-scaling solution is valuable for Suning's Internet applications
      • A standardized approach to cloud application deployment
        • and orchestration?
      • milk powder: 5 million cans; milk: 100 containers; promotion season: 3 days
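
To make "template-based VM provisioning" on slide 10 more concrete, here is a minimal sketch that builds a HOT-style template as a Python dictionary. The resource types and property names are the ones documented for Heat as far as we are aware, but the image name, flavor name, sizes and cooldown are hypothetical placeholders, and the slides do not show Suning's actual templates.

```python
import json

# Hypothetical image and flavor names; substitute values from your own cloud.
IMAGE = "apache-base-image"
FLAVOR = "m1.small"


def build_autoscaling_template(min_size: int, max_size: int) -> dict:
    """Return a HOT-style template describing a scalable group of web servers."""
    if not 0 < min_size <= max_size:
        raise ValueError("require 0 < min_size <= max_size")
    return {
        "heat_template_version": "2013-05-23",
        "description": "Sketch: auto-scaled web tier",
        "resources": {
            # The group owns N copies of the nested server definition.
            "web_group": {
                "type": "OS::Heat::AutoScalingGroup",
                "properties": {
                    "min_size": min_size,
                    "max_size": max_size,
                    "resource": {
                        "type": "OS::Nova::Server",
                        "properties": {"image": IMAGE, "flavor": FLAVOR},
                    },
                },
            },
            # The policy adds one member each time it is signalled.
            "scale_out_policy": {
                "type": "OS::Heat::ScalingPolicy",
                "properties": {
                    "adjustment_type": "change_in_capacity",
                    "auto_scaling_group_id": {"get_resource": "web_group"},
                    "scaling_adjustment": 1,
                    "cooldown": 60,
                },
            },
        },
    }


if __name__ == "__main__":
    # Serialized as JSON only for inspection; templates are usually written in YAML.
    print(json.dumps(build_autoscaling_template(2, 10), indent=2))
```

In a real deployment an alarm would signal the policy; that wiring is omitted here to keep the sketch short.
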
  11. • Standard images and software packages
      • Post-launch configuration
        • creation of user accounts
        • key distribution and revocation
        • VM role assignment
        • package update or upgrade
        • middleware install and configuration
        • application install and configuration
        • monitoring tools install and configuration
        • service discovery and registration ???
        • ...
  12. • Deployment and Orchestration
        • Heat-based deployment only covers part of the story
        • Cloud-init is only concerned with the initial deployment
        • What we need is an integrated end-to-end tool chain that covers runtime/maintenance orchestration as well
      • Orchestration is not thoroughly tested in the community (e.g. auto-scaling)
        • involves Heat, Ceilometer, Nova, Keystone...
        • rolling update may not work as expected
        • scaling out may jump from one to many directly
        • Ceilometer alarm evaluator may not work
        • ...
        • fixing these is not an easy job
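
The "jump from one to many" concern on slide 12 can be illustrated with a small, self-contained model of a scaling group that caps each scale-out step and enforces a cooldown. This is a sketch of the desired behaviour under assumed parameters, not Heat's implementation; `ScalingGroup`, `max_step` and the default values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ScalingGroup:
    size: int
    max_size: int
    max_step: int = 1                    # hypothetical cap on instances added per action
    cooldown: float = 60.0               # seconds to wait between scaling actions
    last_action: float = float("-inf")   # time of the last successful action

    def scale_out(self, requested: int, now: float) -> int:
        """Add up to `requested` instances; return how many were actually added."""
        # Ignore the request while the previous action is still cooling down.
        if requested <= 0 or now - self.last_action < self.cooldown:
            return 0
        # Never exceed the per-action cap or the group's maximum size.
        step = min(requested, self.max_step, self.max_size - self.size)
        if step > 0:
            self.size += step
            self.last_action = now
        return step


if __name__ == "__main__":
    group = ScalingGroup(size=1, max_size=10, max_step=2)
    for t in (0, 30, 60, 120):
        added = group.scale_out(requested=8, now=t)
        print(f"t={t:>3}s added={added} size={group.size}")
```

Even when an alarm asks for eight more instances at once, the group grows by at most two per cooldown period, which gives monitoring time to confirm that the new members are healthy.
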
  13. • Triggers for scaling
        • network metrics (packets processed, bytes transferred) sound interesting, but
        • CPU and memory are still the primary bottlenecks
        • Scaling may be triggered by a combination of CPU, memory, disk I/O and/or network I/O, using a customized algorithm
      • Rolling update is of critical importance
        • ensure a given number of instances are always online when performing updates
      • Deletion policy for resource groups
        • sometimes the newest members are preferred over the oldest members for deletion, considering that
        • old members may have state cached, may have proved to be stable, ...
      • Fast detection and fast scaling (seconds level)
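
The rolling-update requirement on slide 13 (keep a given number of instances online during updates) can be sketched as a batch planner. This is an illustrative simplification, not Heat's algorithm: it assumes each batch is back online before the next one starts, and the function and parameter names are hypothetical.

```python
def plan_rolling_update(instances, min_in_service, max_batch_size):
    """Split `instances` into update batches.

    At most `max_batch_size` members are taken offline at once, and at least
    `min_in_service` members stay online during every batch.
    """
    total = len(instances)
    if min_in_service >= total:
        raise ValueError("min_in_service must be smaller than the group size")
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    # The batch size is limited by both the configured cap and the spare capacity.
    batch = min(max_batch_size, total - min_in_service)
    return [instances[i:i + batch] for i in range(0, total, batch)]


if __name__ == "__main__":
    members = ["vm1", "vm2", "vm3", "vm4", "vm5"]
    for n, group in enumerate(plan_rolling_update(members, 3, 4), start=1):
        print(f"batch {n}: {group}")
```

With five members and three required online, only two can be updated at a time even though the configured batch cap is four.
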
  14. • Availability
        • Storage Reliability/Availability
          • Hard disk errors are common
        • VM High-Availability (a.k.a. the "Pets" story)
          • it doesn't seem like a single project mission
          • host failure, network failure, storage failure, guest failure, application failure ...
          • may need to get Nova, Ceilometer (Zaqar?), Heat, Keystone to work together
      • AutoScaling
        • Semi-AutoScaling (scale at a given point in time)
      • Smarter VM placement, a.k.a. Global Scheduling
        • e.g. 3 Apache servers per host is okay, but 9 Apache servers per host is risky
        • VM placement is mainly concerned with service availability
        • Scaling across availability zones, across regions
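
The placement concern on slide 14 (3 Apache servers per host is fine, 9 is risky) amounts to spreading instances while capping how many land on one host. The following is a minimal sketch of that idea; it is not how the Nova scheduler works, and `place_instances` and its default limit of 3 are illustrative assumptions.

```python
from collections import Counter


def place_instances(hosts, count, max_per_host=3):
    """Spread `count` instances across `hosts`, never exceeding `max_per_host`."""
    if count > len(hosts) * max_per_host:
        raise ValueError("not enough capacity under the per-host limit")
    load = Counter({h: 0 for h in hosts})
    placement = []
    for _ in range(count):
        # Pick the least-loaded host; ties are broken by the order in `hosts`.
        target = min(hosts, key=lambda h: load[h])
        load[target] += 1
        placement.append(target)
    return placement


if __name__ == "__main__":
    print(place_instances(["host-a", "host-b", "host-c"], 5))
```

Because the least-loaded host is always chosen, a single host failure removes roughly an equal share of the service rather than most of it.
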
  15. • Application Profile and Management
        • Each application has a unique architecture where some components are reusable
        • Most components are capable of playing different roles (e.g. front-end vs back-end)
          • domain role, slave role, host role, etc.
        • Combinations are difficult to predict and manage
        • Solum? Murano?
      • Provider Templates?
        + promote template reusability
        + facilitate fine-grained version control
        - difficult to reference resources (attributes) from outer/inner templates
        - difficult to get dependencies done right
        - Tools to standardize Heat template collections
  16. • Configurable frequency for Heat engine calls
        • mostly from os-xxx-config
        • may need a short interval during bootup, then switch to a longer interval
      • Tools and guidance for the establishment of standard workflows
        • need to abstract away common features and parameters
        • need to simplify the deployment and management process
        • need to adapt to new technologies
        • e.g. transition from shared disk volumes to a storage cloud
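
The wish on slide 16 for a short polling interval during bootup followed by a longer one can be sketched as a simple two-phase schedule. The function names, the 300-second boot window and the 5 and 60 second intervals are all hypothetical; the slides do not say what values Suning wanted, and this is not the API of any os-*-config tool.

```python
def poll_interval(uptime_seconds, boot_window=300.0, fast=5.0, slow=60.0):
    """Return the delay before the next poll, given how long the VM has been up."""
    if uptime_seconds < 0:
        raise ValueError("uptime_seconds must be non-negative")
    # Poll quickly while the instance is still being configured, then back off.
    return fast if uptime_seconds < boot_window else slow


def poll_schedule(total_seconds, **kwargs):
    """List the poll times (seconds since boot) that fall before `total_seconds`."""
    times, t = [], 0.0
    while t < total_seconds:
        times.append(t)
        t += poll_interval(t, **kwargs)
    return times


if __name__ == "__main__":
    print(poll_schedule(200, boot_window=20, fast=5, slow=60))
```

Backing off after boot reduces the number of calls each long-lived VM makes to the Heat engine, while keeping initial configuration responsive.
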
  17. Thank You!
