Inoreader OpenNebula + StorPool migration


How Inoreader Migrated from Bare-metal Servers to OpenNebula + StorPool

Published in: Technology

  1. Introduction
     • Introduction: presenter and company intro. Who are we and what do we do?
     • Inoreader: what is Inoreader, and what challenges did we face while building and maintaining it?
     • Infrastructure issues: we were facing numerous scalability issues while, at the same time, we had an array of servers doing mostly nothing because their storage was full. At a certain point we hit a brick wall.
     • Migration to OpenNebula and StorPool: to fix our scalability problems we pinpointed the need for a virtualization layer and distributed storage. After thorough research we ended up with OpenNebula and StorPool.
     • Tips: some useful takeaways for you.
     • Q&A: if you have any questions, I will gladly answer them.
  2. Yordan Yordanov, CEO, Innologica. I have 10+ years of experience in the telco IT sector, working with large enterprise solutions as well as building specialized solutions from scratch. I founded a company called Innologica in 2013 with the mission of developing next-gen OSS and BSS solutions. A side project called Inoreader was born back then. It quickly turned into a leading platform for content consumption and is now a core product of the company.
  3. Who Are We?
     • Product company: we are not a sweatshop. We make successful products.
     • International market: our customers are all over the globe.
     • Relaxed environment: we do not push the devs, but we cherish top performers.
     • Smart team: the team is small, but each member brings great value.
  4. Inoreader: RSS news aggregator and information hub
     • 150,000 DAU: we have 150k daily active users (DAU) and more than 30k simultaneous sessions at peak times. Closing in on 1M registered users soon. 10k premium subscribers and counting.
     • 15,000,000,000 articles in MySQL and ES: we keep the full archive in enormous MySQL databases and a separate Elasticsearch cluster just for searching. Around 20TB of data without the replicas. 10M+ new articles per day.
     • 1,000,000 feed updates per hour: we need to update our 10+ million feeds in a timely manner. A lot of machines are dedicated to this task only.
     • 40 VMs and 10 physical hosts: the platform is currently running on 30 virtual machines, mainly in our main DC. There are some physical hosts that were not good candidates for virtualization, mainly for Elasticsearch.
  5. Extreme Makeover: the old and the new setup
     • 100% virtualized: no more services running directly on bare metal.
     • Lighter power footprint: 300% more capacity with 60% of the previous servers, with room for expansion.
     • Performance gains: huge compute and storage performance gains. Maintainability is a breeze too.
  6. INFRASTRUCTURE ISSUES: our main drivers for migrating to a fully virtualized environment
  7. Hardware capacity. We needed to constantly buy new servers just to keep up with the growing databases, because local storage was quickly being exhausted. We were using expensive RAID cards and RAID-10 setups for all databases. Those servers never used more than 10% of their CPUs, so it was a complete waste of resources.
     [Chart "Our problem": utilization of CPU (10%), memory, storage, and rack space. The remaining values shown are 50%, 90%, and 100%.]
  8. Hardware failures: not so common, but always hair-pulling
     • All components are bound to fail. Whenever we lost a server, there was always at least some service disruption, if not a whole outage.
     • All databases needed to have replicas, which skyrocketed server costs and didn’t provide automatic HA.
     • If a hard drive fails in a RAID-10 setup, you need to replace it ASAP. Bigger drives are more prone to cause errors while rebuilding.
     • Large databases on RAID-10 are slow to recover from crashes, so replicas should be carefully set up and should run on identical (expensive) hardware in case a replica has to be promoted to master.
     • Nobody likes to go to a DC on Saturday to replace a failed drive, reinstall the OS, and rotate replicas. We much prefer to ride bikes!
  9. CHOSEN SOLUTION: we chose to virtualize everything using OpenNebula + StorPool
  10. Project Timeline
      • 2017: PROJECT START. We knew for quite a while that we needed a solution to the growth problem.
      • Nov 2017: CHOOSING A SOLUTION. We held some meetings with vendors and researched different solutions.
      • Nov 2017 to Jan 2018: PLANNING AND FIRST TESTS. While the hardware was in transit, we took our time to learn OpenNebula and test it as much as possible.
      • Feb 2018: EXECUTION. We migrated all servers through several iterations, which are described in more detail later in this deck.
      • Mar 2018: SUCCESS. We finally migrated our last server, and all VMs were happily running on OpenNebula and StorPool.
  11. Hardware
      • StorPool nodes: we chose three standard Supermicro SC836 3U servers.
      • Switches: as recommended by StorPool, we chose the Quanta LB8 for the 10G network and the Quanta LB4M for the Gigabit network.
      • Hypervisors: we reused our old servers, but upgraded their CPUs and memory.
      • Others: 10G LAN cards and cables.
  12. StorPool Nodes. StorPool recommends using commodity hardware. Supermicro offers a good platform without vendor-specific requirements for RAID cards, etc., and is very budget friendly. Our setup:
      • Supermicro CSE-836B chassis
      • Supermicro X10SRL-F motherboard
      • 1x Intel Xeon E5-1620 v4 CPU (8 threads @ 3.5 GHz)
      • 64GB DDR4-2666 RAM
      • Avago 3108L RAID controller with 2G cache
      • Intel X520-DA2 10G Ethernet card
      • 8x 4TB HDD LFF SATA3 7200 RPM
      • 8x 2TB HDD LFF SATA3 7200 RPM (reused from older servers)
      Around 3300 EUR per server.
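
The drive list above allows a rough capacity estimate. The sketch below is only back-of-the-envelope arithmetic: the node count (three) and the 3x redundancy figure are taken from other slides in this deck, treating 3x redundancy as three full copies of the data is an assumption, and metadata or free-space overhead is ignored.

```python
# Back-of-the-envelope capacity estimate for the StorPool cluster.
# Drive counts and sizes come from the slide above. The node count (3) and
# the replication factor (3) are taken from other slides; interpreting
# "3x redundancy" as three full copies is an assumption.

DRIVES_PER_NODE_TB = [4] * 8 + [2] * 8   # 8x 4TB + 8x 2TB per node
NODES = 3
REPLICATION = 3


def raw_capacity_tb(drives_tb, nodes):
    """Total raw disk capacity across all nodes, in TB."""
    return sum(drives_tb) * nodes


def usable_capacity_tb(drives_tb, nodes, replication):
    """Capacity left once every block is stored `replication` times."""
    return raw_capacity_tb(drives_tb, nodes) / replication


if __name__ == "__main__":
    print("raw TB per node:", sum(DRIVES_PER_NODE_TB))
    print("raw TB in cluster:", raw_capacity_tb(DRIVES_PER_NODE_TB, NODES))
    print("usable TB:", usable_capacity_tb(DRIVES_PER_NODE_TB, NODES, REPLICATION))
```

Under these assumptions each node holds 48 TB raw, so the usable figure equals roughly one node's worth of disks.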
  13. Gigabit Network: Quanta LB4M. We were struggling with some old TP-Link SG2424 switches that we wanted to replace, so we used the opportunity to upgrade the regular 1G network too. We chose the Quanta LB4M. Key aspects:
      • 48x Gigabit RJ45 ports
      • 2x 10G SFP+ ports
      • Redundant power supplies
      • Very cheap!
      • EOL: you might want to stock up on some spare switches!
      • Stable (4 months without a single failure so far)
      Around 250 EUR per switch from eBay.
  14. 10G Network: Quanta LB8. Again following StorPool's recommendation, we procured three Quanta LB8 switches. They seem to be performing great so far. Key aspects:
      • 48x 10G SFP+ ports
      • Redundant power supplies
      • Very cheap for what they offer!
      • EOL: you might want to stock up on some spare switches!
      • Stable (4 months without a single failure so far)
      700-1000 EUR per switch from eBay, including customs taxes.
  15. Hypervisors. We reused our old servers, but with some significant upgrades. We currently have 12 hypervisors with the following configuration:
      • Supermicro 1U chassis with X9DRW motherboards
      • 2x Intel Xeon E5-2650 v2 CPU (32 total threads)
      • Dual power supply
      • 128 GB DDR3 12800R memory
      • Intel X520-DA2 10G card
      • 2x HDD in mdraid for the OS only
  16. EXECUTION: story with pictures
  17. New Rack. We rented a new rack in our colocation center, since we didn’t have any more space available in the old rack. The idea was simple: deploy StorPool in the new rack only and gradually migrate the hypervisors.
  18. StorPool Nodes. The servers landed in our office in late January. It was Friday afternoon, but we quickly installed them in the lab and let the StorPool guys do their magic over the weekend.
  19. Installation Day. The next Monday StorPool finished all tests and the equipment was ready to be installed in our DC.
  20. Installation Day. Fast forward several hours and we had our first StorPool cluster up and running. Still no hypervisors. StorPool needed to perform a full cluster check in the real environment to see if everything worked well.
  21. First hypervisors. The very next day we installed our first hypervisors: the temporary ones that were holding the VMs installed during our test period. Those VMs were still running on local storage and NFS. The next step was to migrate them to StorPool.
  22. VM Migration to StorPool. StorPool helps their customers with this step, but here’s a summary of what we did.
      01 Shut down the VM. Use Sunstone or the CLI to shut down the VM.
      02 Create StorPool volumes. On the host, use the storpool CLI to create volume(s) for the VM with the exact size of the original images.
      03 Copy the volumes. Use dd for raw images or qemu-img convert for qcow2 images to copy the images to the StorPool volumes.
      04 Reattach images. Detach the local images and attach the StorPool ones. Mind the order. There’s a catch with large images*.
      05 Power up the VM. Check that the VM boots properly. We’re not done yet...
      06 Finalize the migration. To fully migrate persistent VMs, use the Recover -> delete-recreate function to redeploy all files to StorPool.
      * Large images (100G+) take forever to detach on slow local storage, so we had to kill the cp process and use the onevm recover --success option to make OpenNebula believe that the detach had actually completed. This is risky, but it saves a LOT of downtime.
      After all VMs are migrated, you can delete the old system and image datastores and leave only the StorPool datastores. At this point we are completely on StorPool!
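
The copy step (03) can be sketched as a small helper that picks the right tool for each image format. This is an illustration, not the procedure StorPool or the author actually scripted: the function name, the example image paths, and the assumption that a StorPool volume shows up as a block device under /dev/storpool/ are all hypothetical and should be checked against your own setup. The helper only builds the command; it does not run it.

```python
import shlex


def copy_command(image_path, image_format, volume_name):
    """Build the command that copies a local image onto a StorPool volume.

    Assumption: the volume is visible as /dev/storpool/<volume_name>.
    """
    target = f"/dev/storpool/{volume_name}"
    if image_format == "raw":
        # Raw images are copied byte for byte.
        return ["dd", f"if={image_path}", f"of={target}", "bs=1M", "conv=fsync"]
    if image_format == "qcow2":
        # qcow2 images must be converted to raw while copying.
        return ["qemu-img", "convert", "-f", "qcow2", "-O", "raw", image_path, target]
    raise ValueError(f"unsupported image format: {image_format}")


if __name__ == "__main__":
    # Hypothetical image paths and volume names, for illustration only.
    for path, fmt, vol in [
        ("/var/lib/one/datastores/1/abc", "raw", "vm42-disk0"),
        ("/var/lib/one/datastores/1/def", "qcow2", "vm42-disk1"),
    ]:
        print(shlex.join(copy_command(path, fmt, vol)))
```

Printing the command first and running it by hand keeps a human in the loop, which suits a step where writing to the wrong device would destroy data.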
  23. Next hypervisors. From here on we had several iterations that consisted of roughly the following:
      • Create a list of servers for migration. The more hypervisors we have, the more servers we can move in a single iteration
      • Create VMs and migrate the services there
      • Use the opportunity to untangle microservices running on the same machine
      • Make sure the servers are completely drained of any services
      • Shut down the servers and plan a visit to the DC the next day
      • Continue on the next slide...
  24. Remove servers from the old rack
  25. Remove HDDs and RAID controllers
  26. Upgrade CPUs and RAM
  27. Install 10G card and smaller HDDs and reinstall OS
  28. Install the servers in the new rack and hand over to StorPool
  29. RINSE AND REPEAT: at each iteration we move more servers at once, because we have more capacity for VMs
  30. Current capacity. We did it! In the end we achieved a 3x capacity boost in terms of processing power and memory with just a fraction of our previous servers, because with virtualization we can distribute the resources however we’d like. In terms of storage we are on a completely different level: we are no longer restricted to the capacity of a single machine, we have 3x redundancy, and we have all the performance we need.
      [Chart: allocated CPU (37%), allocated memory, storage, and rack space. The remaining values shown are 32%, 67%, and 70%.]
  31. Our Dashboard. A glimpse at our OpenNebula dashboard: 336 CPU cores and 1.2TB of RAM in just 12 hypervisors.
  32. Hypervisor view. All hypervisors are nicely balanced using the default scheduler. There’s always enough room to move VMs around in case a hypervisor crashes or we need to reboot a host.
  33. SOME TIPS
  34. Optimize CPU for homogeneous clusters. Set the CPU model to host-passthrough, which is available as a template setting since OpenNebula 5.4.6. This option presents the real CPU model to the VMs instead of the default QEMU CPU. It can substantially increase performance, especially if instructions like aes are needed. Do not use it if you have different CPU models across the cluster, since it will cause the VMs to crash after live migration. For older OpenNebula setups, set this as RAW DATA in the template: <cpu mode="host-passthrough"/>
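
Before enabling host-passthrough it is worth confirming that the cluster really is homogeneous. A minimal sketch, assuming you have already collected the contents of /proc/cpuinfo from each hypervisor by whatever means you prefer; the host names and function names below are hypothetical, and comparing model names is only a coarse proxy for identical CPU features.

```python
def cpu_model_from_cpuinfo(cpuinfo_text):
    """Return the first 'model name' value found in /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("model name"):
            return line.split(":", 1)[1].strip()
    return None


def passthrough_is_safe(host_cpu_models):
    """True only if every host reports exactly the same CPU model.

    `host_cpu_models` maps a host name to its CPU model string.
    An empty mapping or an unknown (None) model is treated as unsafe.
    """
    models = set(host_cpu_models.values())
    return len(models) == 1 and None not in models


if __name__ == "__main__":
    sample = "processor\t: 0\nmodel name\t: Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz\n"
    model = cpu_model_from_cpuinfo(sample)
    # Hypothetical host names, for illustration only.
    print(passthrough_is_safe({"hv01": model, "hv02": model}))
    print(passthrough_is_safe({"hv01": model, "hv02": "some other CPU"}))
```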
  35. Beware of mkfs.xfs on large StorPool volumes inside VMs. We noticed that when running mkfs.xfs on large StorPool volumes (e.g. 4TB) there was a long delay before the command completed. What’s worse is that during this time all VMs on the host starve for IO, because the storpool_block.bin process is using 100% CPU time. The image shown on the left is for a 1TB volume. The reason is that mkfs uses TRIM by default and the StorPool driver supports it. To avoid this, use the -K option for mkfs.xfs or -E nodiscard for mkfs.ext4, e.g.:
      • mkfs.xfs -K /dev/sdb1
      • mkfs.ext4 -E nodiscard /dev/sdb1
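
If volumes are formatted from provisioning scripts, the no-discard flags from this slide can be baked into a small helper so nobody forgets them. This is a sketch only: the function and table names are hypothetical, it covers just the two filesystems mentioned above, and it builds the command without executing it.

```python
# Flags that stop mkfs from issuing TRIM/discard, as listed on the slide.
NO_DISCARD_FLAGS = {
    "xfs": ["-K"],
    "ext4": ["-E", "nodiscard"],
}


def mkfs_command(fstype, device):
    """Build an mkfs command line that skips the initial discard pass."""
    if fstype not in NO_DISCARD_FLAGS:
        raise ValueError(f"no known no-discard flag for {fstype}")
    return [f"mkfs.{fstype}", *NO_DISCARD_FLAGS[fstype], device]


if __name__ == "__main__":
    print(" ".join(mkfs_command("xfs", "/dev/sdb1")))
    print(" ".join(mkfs_command("ext4", "/dev/sdb1")))
```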
  36. Use the 10G network for OpenNebula too. This is probably an obvious one, but it deserves to be mentioned. By default your hosts will probably resolve each other via the regular Gigabit network. Forcing them to talk through the 10G storage network will drastically improve live VM migration. The migration is not IO bound, so it will completely saturate the network. Usually this is a simple /etc/hosts modification. Consult with StorPool for your specific use case before doing that. Live migrating a VM with 8G of RAM takes 7 seconds on 10G. The same VM will take about 1.5 minutes on a Gigabit network and will probably disturb VM communications if the network is saturated. Live migration of highly loaded VMs can take significantly longer and should be monitored. In some cases it’s enough to stop busy services for just a second for the migration to complete.
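
The timings quoted above are close to what simple arithmetic predicts. The sketch below computes an ideal lower bound for moving a VM's memory over a link; it assumes 8G means 8 GiB, a fully saturated link, no protocol overhead, and no memory pages dirtied during the copy, so real migrations take longer.

```python
def transfer_seconds(ram_gib, link_gbit_per_s):
    """Ideal time to push `ram_gib` GiB of memory through a link.

    Ignores protocol overhead and pages that are re-sent because the
    guest dirtied them during migration (assumption: idle guest).
    """
    bits = ram_gib * 2**30 * 8
    return bits / (link_gbit_per_s * 1e9)


if __name__ == "__main__":
    for link in (10, 1):
        print(f"{link}G link: {transfer_seconds(8, link):.1f} s")
```

The result for 10G lands just under the 7 seconds reported on the slide, and the Gigabit result is a little over a minute, in line with the reported 1.5 minutes once overhead is included.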
  37. Other tips. These are the more obvious ones that probably everyone uses in production, but they are still worth mentioning.
      • Use cache=none, io=native when attaching volumes
      • Use virtio networking instead of the default 8139 NIC. The latter has performance issues and drops packets when host IO is high
      • Measure IO latency instead of IO load to judge saturation. We have several machines with a constant 99% IO load which are doing perfectly fine
      /etc/one/vmm_exec/vmm_exec_kvm.conf:
      …
      DISK = [ driver = "raw", cache = "none", io = "native", discard = "unmap", bus = "scsi" ]
      NIC = [ filter = "clean-traffic", model = "virtio" ]
      …
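
The difference between IO load and IO latency can be made concrete with two samples of a device's line in /proc/diskstats on Linux. The following is a minimal sketch, not the author's monitoring setup: the function names are hypothetical and the sample numbers are made up to show a disk that is 99% busy yet answers each request in about 1 ms.

```python
def parse_diskstats_line(line):
    """Extract the counters we need from one /proc/diskstats line."""
    f = line.split()
    return {
        "name": f[2],
        "reads": int(f[3]),      # reads completed
        "read_ms": int(f[6]),    # time spent reading (ms)
        "writes": int(f[7]),     # writes completed
        "write_ms": int(f[10]),  # time spent writing (ms)
        "busy_ms": int(f[12]),   # time spent doing I/O (ms)
    }


def io_metrics(before, after, interval_s):
    """Return (average latency per request in ms, utilization in %)."""
    ios = (after["reads"] - before["reads"]) + (after["writes"] - before["writes"])
    wait_ms = (after["read_ms"] - before["read_ms"]) + (after["write_ms"] - before["write_ms"])
    await_ms = wait_ms / ios if ios else 0.0
    util_pct = 100.0 * (after["busy_ms"] - before["busy_ms"]) / (interval_s * 1000.0)
    return await_ms, util_pct


if __name__ == "__main__":
    # Made-up samples taken one second apart.
    t0 = parse_diskstats_line("8 0 sda 1000 0 0 2000 500 0 0 1000 0 5000 0")
    t1 = parse_diskstats_line("8 0 sda 1400 0 0 2400 700 0 0 1200 0 5990 0")
    latency, util = io_metrics(t0, t1, interval_s=1.0)
    print(f"await: {latency:.2f} ms, utilization: {util:.0f}%")
```

A device like this looks saturated if you only watch utilization, while the per-request latency shows it is keeping up comfortably.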
  38. MONITORING: dashboards
  39. Grafana Dashboards. We adapted the OpenNebula Dashboards with Graphite and Grafana scripts by Sebastian Mangelkramer and used them to create our own Grafana dashboards, so we can see at a glance which hypervisors are most loaded and how much overall capacity we have.
  40. Grafana TV Dashboard. Why not have a master dashboard on the TV at the office? This gives our team a very quick and easy way to tell if everything is working smoothly. If all you see is green, we’re good. This dashboard shows our main DC on the first row, our backup DC on the second, and then some other critical aspects of our system. It’s still a WIP, hence the empty space. At the top is our Geckoboard, which we use for more business-oriented KPIs.
  41. Server Power Usage in Grafana. Part of our virtualization project was to optimize the electricity bill by using fewer servers. We were able to easily measure our power usage by using Graphite and Grafana. If you are interested, the script for getting the data into Graphite is here: https://gist.github.com/Jacketbg/6973efdb41a2ecfcf2a83ea84c086887 The Grafana dashboard can be found here: https://gist.github.com/Jacketbg/7255b4f81ebb2de0e8a5708b4335c9d7 Obviously you will need to tweak it, especially the formula for the power bill.
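
For readers who want the general idea without opening the gists, here is a minimal sketch of feeding power readings into Graphite's plaintext protocol and estimating a monthly bill. It is not the author's script: the metric path, the price per kWh, the 730-hour month, and all function names are assumptions for illustration, and how you read the wattage from your servers is left out.

```python
import socket


def graphite_line(path, value, timestamp):
    """Format one metric in Graphite's plaintext protocol."""
    return f"{path} {value} {int(timestamp)}\n"


def send_to_graphite(lines, host, port=2003):
    """Send pre-formatted lines to a Carbon plaintext listener."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall("".join(lines).encode("ascii"))


def monthly_cost(avg_watts, price_per_kwh, hours=730):
    """Estimated monthly bill for a constant average power draw."""
    return avg_watts / 1000.0 * hours * price_per_kwh


if __name__ == "__main__":
    # Hypothetical metric name, reading, and electricity price.
    print(graphite_line("servers.hv01.power_watts", 215, 1520000000), end="")
    print(f"{monthly_cost(avg_watts=215, price_per_kwh=0.12):.2f} per month")
```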
  42. StorPool’s Grafana. StorPool kindly gave us access to their own Grafana instance, where they collect a lot of internal data about the system and KPIs. It gives us great insights that we couldn’t get otherwise, so we can plan and estimate the system load very well.
  43. What’s Left?
      • SSD pool: we are currently using only an HDD pool, but we could benefit from a smaller SSD pool for picky MySQL databases.
      • Add more hypervisors: as the service grows, our needs will too. We will probably have enough rack space for the next few years.
      • Add more StorPool nodes: we have maxed out the HDD bays on our current nodes, so we’ll probably need to add more nodes in the future.
      • Upgrade StorPool nodes to 40G: currently the nodes use 2x 10G ports, like the hypervisors. After adding an SSD pool, we are considering upgrading to 40G.