  1. How We Defined Our Own Cloud May 19th, 2022 Andrew Hajinikitas Cloud Platform Department Rakuten Group, Inc.
  2. About Me Andrew Hajinikitas • More than 20 years in the industry: seen a lot of change and made a lot of change! • Currently Vice Senior Manager of the Core Infrastructure Section, Cloud Platform Department, Rakuten • Started at Rakuten in 2018 as an Architect in the Network Group, Core Infrastructure, Cloud Platform Department • Specializing in pipelining automation, infrastructure, networking, security, and much more!
  3. CONTENTS 1. Some of Rakuten's key infrastructure 2. About the current cloud platform 3. Potential future efforts
  4. Metal Instance • A machine is one of the fundamental bricks of our Cloud, alongside the network • We’re here, providing building blocks • Diagram labels: Metal, Containers / Hypervisors, Internet projects, Customer projects, One Cloud Services
  5. Microservers: Quanta X11C-8N • Intel Xeon E-2100 system • Four DDR4 DIMM slots • Storage: each sled has one 2.5” SFF SATA drive bay and can also take either two M.2 drives or two NF1 drives • Networking: either dual 10GbE or a single 25GbE uplink • Reference Page: server/Microserver/QuantaMicro-X11C-8N
  6. Internals of a Sled Reference Page:
  7. GPU Servers: Quanta SYS-420GP-TNAR • 2x 32-core Intel Xeon 8358 @ 2.6 GHz • 2 TB of RAM • 8x NVIDIA A100 80 GB GPUs • 2x 960 GB (OS) • 4x 3.84 TB (data) • 8x HDR (200 Gb/s, GPU) • 2x HDR (200 Gb/s, CPU) • 2x 100 Gb/s • 2x 200 Gb/s • 0.16 PFLOPS (FP32) • 1.25 PFLOPS (FP32+TC) • Reference Page:
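The two peak-throughput figures on this slide can be reproduced from per-GPU numbers. The following is a minimal sketch, not taken from the slides: it assumes that "FP32+TC" refers to TF32 on Tensor Cores, and it uses NVIDIA's published dense peak figures for the A100 (19.5 TFLOPS FP32, 156 TFLOPS TF32 Tensor Core). The constant and function names are illustrative only.

```python
# Assumed per-GPU peak figures for the NVIDIA A100 (dense, without sparsity).
A100_FP32_TFLOPS = 19.5       # standard FP32
A100_TF32_TC_TFLOPS = 156.0   # TF32 on Tensor Cores (assumed meaning of "FP32+TC")
GPUS_PER_SERVER = 8           # from the slide: 8x A100


def server_peak_pflops(per_gpu_tflops, gpus=GPUS_PER_SERVER):
    """Aggregate theoretical peak of one server, converted from TFLOPS to PFLOPS."""
    return per_gpu_tflops * gpus / 1000.0


fp32_pflops = server_peak_pflops(A100_FP32_TFLOPS)
tc_pflops = server_peak_pflops(A100_TF32_TC_TFLOPS)

print(f"FP32:    {fp32_pflops:.3f} PFLOPS")
print(f"FP32+TC: {tc_pflops:.3f} PFLOPS")
```

Rounded to two decimals, these match the 0.16 and 1.25 PFLOPS quoted on the slide. They are theoretical peaks per server, not measured application throughput.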
  8. GPU Servers
  9. Software Tech Stack
  10. Current Status • 9 data centers in operation in Japan and overseas (JP, US, EU) • Providing around 30,000 servers • Expansion of new managed services • Expanding the running and use of study sessions for in-house users • New Managed Services: LBaaS, DBaaS, Storage, CaaS, Monitoring
  11. What’s Coming Up? • Extension of network self-service • Making GPU resources a platform service • Improving efficiency, including DC resources • Resource-efficient selection of physical equipment according to the usage of each upper service layer • HW / NW resource management and stable supply (responding to the silicon shortage) • Network CI / CD (automation of upgrades and verifications)
  12. What’s Coming Up? • We have built a full-fledged integrated private cloud to support the Rakuten ecosystem and have begun using it in production environments • Steer towards managed services and simplify core infrastructure • Provide high-performance physical server / network resources at low cost • Support multi-tenancy and clarify the scope of responsibility between each platform and its users • We are promoting the use of Rakuten services not only in Japan but also overseas, and have already expanded to 9 DCs in Japan and overseas
  13. Goal: “Manage the Instance Life Cycle from One Dashboard”
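One way to picture this goal is a single model of instance state that a dashboard could read and drive. The sketch below is purely illustrative and is not Rakuten's implementation: the state names, the allowed transitions, and the `Instance` class are all hypothetical assumptions made for the example.

```python
from enum import Enum


class State(Enum):
    # Hypothetical life-cycle states; the slides do not define these.
    REQUESTED = "requested"
    PROVISIONING = "provisioning"
    RUNNING = "running"
    MAINTENANCE = "maintenance"
    DECOMMISSIONED = "decommissioned"


# Hypothetical transition table: which states may follow each state.
ALLOWED = {
    State.REQUESTED: {State.PROVISIONING},
    State.PROVISIONING: {State.RUNNING, State.DECOMMISSIONED},
    State.RUNNING: {State.MAINTENANCE, State.DECOMMISSIONED},
    State.MAINTENANCE: {State.RUNNING, State.DECOMMISSIONED},
    State.DECOMMISSIONED: set(),
}


class Instance:
    """A metal or virtual instance tracked through its life cycle."""

    def __init__(self, name):
        self.name = name
        self.state = State.REQUESTED
        self.history = [State.REQUESTED]  # what a dashboard could display

    def transition(self, new_state):
        """Move to new_state, rejecting transitions not in ALLOWED."""
        if new_state not in ALLOWED[self.state]:
            raise ValueError(
                f"{self.name}: {self.state.value} -> {new_state.value} not allowed"
            )
        self.state = new_state
        self.history.append(new_state)


server = Instance("gpu-node-01")  # hypothetical instance name
server.transition(State.PROVISIONING)
server.transition(State.RUNNING)
print([s.value for s in server.history])
```

Keeping the transition table in one place means every action offered by a dashboard is validated against the same rules, whichever layer (metal, hypervisor, or container) the instance belongs to.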