A retrospective on 20 years of storage best practices, the original reasons behind them, and the slaughtering of the sacred cows of storage architecture that must take place in this new era of infinite bandwidth, IOPs, capacity, and customer choice in the datacenter. Attendees will see real examples of storage architecture as applied at Booking.com, many of which will be surprising.
Peter is Booking.com’s resident storage nerd. From his secret base on an Amsterdam houseboat, he is working to make Ethernet storage great again. With 24+ years of experience in datacenters and storage (including 7 years at NetApp), he now applies that experience (painstakingly acquired through many failures) at Booking.com, one of the largest travel e-commerce companies in the world. He is obsessive about latency and still thinks the cloud is just someone else’s computer.
Lies, damned lies, and best practices
2. Peter Buschman
Lies, Damned Lies, and Best Practices
13.10.-15.10.2020
data://disrupted
(Storage in the Age of Abundance)
3. About Booking.com
● Makes up the majority of Booking Holdings (formerly Priceline Group)
● One of the largest e-Commerce websites in the world
● The largest online accommodation website in the world
● >1.5 Million properties in 220+ countries and territories
● 1.55 Million room nights booked every 24 hours
● >15,000 employees in 198 offices in 70 countries
● 1000s of LUNs, NFS shares, and S3 buckets
● Managed by a storage team of only 4 people
(as of February, 2020)
4. About the Title
"There are three kinds of lies:
lies, damned lies, and statistics."
--anonymous
"There are three kinds of lies:
lies, damned lies, and best practices."
--me
5. The Age of Abundance
No more scarcity of...
● IOPs
● Bandwidth
● Latency
7. Jumbo Frames
● Definition
● Ethernet frames with an MTU > 1500 Bytes (typically 9000)
● Recommended for 20+ years for
● NAS protocols (NFS, SMB)
● Increased throughput
● "Performance"
● No longer a best practice because
● Network speed has increased to a level we simply cannot saturate
● Modern NICs can achieve max performance with 1500 Bytes
● Efficiencies gained no longer worth the added complexity
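To put a number on the "efficiencies gained" above, here is a minimal sketch of the framing arithmetic. It assumes standard Ethernet framing without a VLAN tag and plain IPv4 + TCP headers without options; those assumptions are mine, not from the slides. The function name `payload_efficiency` is illustrative.

```python
# Per-frame overhead on the wire, in bytes (standard Ethernet, no VLAN tag):
# preamble (7) + start-of-frame delimiter (1) + header (14) + FCS (4)
# + inter-frame gap (12).
ETH_OVERHEAD = 7 + 1 + 14 + 4 + 12

# Assumed L3/L4 headers inside the MTU: IPv4 (20) + TCP (20), no options.
IP_TCP_HEADERS = 20 + 20


def payload_efficiency(mtu: int) -> float:
    """Fraction of bytes on the wire that carry application payload."""
    payload = mtu - IP_TCP_HEADERS
    wire_bytes = mtu + ETH_OVERHEAD
    return payload / wire_bytes


standard = payload_efficiency(1500)
jumbo = payload_efficiency(9000)

print(f"MTU 1500: {standard:.1%} payload")
print(f"MTU 9000: {jumbo:.1%} payload")
print(f"Gain from jumbo frames: {jumbo - standard:.1%}")
```

Under these assumptions the gain is a few percentage points of wire efficiency, which is the quantity being weighed against the operational complexity of running a non-default MTU end to end.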
14. FibreChannel SANs
● Definition
● Separate network for storage IO using the FibreChannel protocol
● FibreChannel most commonly used
● Recommended for
● Performance and latency
● Elimination of storage islands within servers
● No longer a best practice because
● Performance and latency now worse than Ethernet
● High cost of investment / poor port density
● Complexity of maintaining another network (2 of them!)
● No support for file or object protocols
19. Lossless Ethernet
● Definition
● Ethernet enhancements for guaranteeing performance to certain applications and protocols (Converged Enhanced Ethernet)
● Data Center Bridging, Priority Flow Control, etc.
● Recommended for
● Prioritizing important applications over less important ones
● Preventing noisy neighbors
● No longer a best practice because
● Bandwidth is now an abundant resource
● Added complexity is no longer worth it
● Performance can actually be better with QoS turned off!
20. RAID Controllers with SSDs
● Definition
● Hardware RAID (including cache) in front of SAS or SATA SSDs
● Recommended for
● Performance
● Redundancy
● No longer a best practice because
● Performance is saturated with very few SSDs
● Reliability of SSDs is greater than that of the controllers
● Latency of SSDs is lower without the controller in front of it
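The "saturated with very few SSDs" point can be illustrated with a rough upper-bound calculation. This is a sketch under assumptions that are not in the slides: a controller attached through a PCIe 3.0 x8 host link, and SSDs that can fill their SAS-3 or SATA III interface. Real controllers may hit an internal processing limit well before the host link does, so the drive counts here are, if anything, generous.

```python
import math


def pcie3_usable_gb_per_s(lanes: int) -> float:
    """Usable PCIe 3.0 bandwidth: 8 GT/s per lane, 128b/130b encoding."""
    return lanes * 8 * (128 / 130) / 8


def link_usable_gb_per_s(line_rate_gbit: float) -> float:
    """Usable SAS/SATA bandwidth: 8b/10b encoding leaves 80% for data."""
    return line_rate_gbit * 0.8 / 8


# Assumption: the RAID controller sits on a PCIe 3.0 x8 host link.
controller_limit = pcie3_usable_gb_per_s(8)

# Assumption: each SSD can stream at its full interface rate.
sas_ssd = link_usable_gb_per_s(12)   # SAS-3, 12 Gb/s
sata_ssd = link_usable_gb_per_s(6)   # SATA III, 6 Gb/s

sas_needed = math.ceil(controller_limit / sas_ssd)
sata_needed = math.ceil(controller_limit / sata_ssd)

print(f"Controller host link: {controller_limit:.2f} GB/s")
print(f"SAS SSDs needed to fill it:  {sas_needed}")
print(f"SATA SSDs needed to fill it: {sata_needed}")
```

Any drives added beyond that count share the same host link, so they add capacity but not throughput.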
25. SAS Disk Drives
● Definition
● Hard Disk Drives with Serial Attached SCSI (SAS) interfaces
● Recommended for
● Performance vs. Serial ATA (SATA) drives
● Dual ports for High Availability (HA) configurations
● No longer a best practice because
● Most disk deployments these days are nearline capacity
● Nearline drives (7200rpm) cannot exceed SATA performance
● SATA edges out SAS on both cost and power metrics
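A quick way to check the claim that nearline drives cannot exceed SATA performance is to compare the interface rate with what the platters can deliver. The sketch below uses the published SATA III and SAS-3 line rates; the sustained drive throughput `HDD_SUSTAINED_MB_S` is a hypothetical round figure chosen for illustration, not a measurement from the talk.

```python
def usable_mb_per_s(line_rate_gbit: float) -> float:
    """Usable interface bandwidth in MB/s after 8b/10b encoding."""
    return line_rate_gbit * 1000 * 0.8 / 8


SATA3_MB_S = usable_mb_per_s(6)    # SATA III, 6 Gb/s
SAS3_MB_S = usable_mb_per_s(12)    # SAS-3, 12 Gb/s

# Hypothetical sustained sequential rate of a 7200rpm nearline drive.
HDD_SUSTAINED_MB_S = 260.0

sata_headroom = SATA3_MB_S / HDD_SUSTAINED_MB_S
sas_headroom = SAS3_MB_S / HDD_SUSTAINED_MB_S

print(f"SATA III link: {SATA3_MB_S:.0f} MB/s ({sata_headroom:.1f}x the drive)")
print(f"SAS-3 link:    {SAS3_MB_S:.0f} MB/s ({sas_headroom:.1f}x the drive)")
```

As long as the drive's sustained rate stays below the SATA link rate, the extra SAS bandwidth is headroom the drive cannot use.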
29. SAS vs. SATA
Image credits: Lothar Spurzem - Wikimedia Commons
Image credits: sv1ambo - Wikimedia Commons
Nearline SAS vs. Nearline SATA
"Same engine, different chassis. Not actually faster"
30. Write Intensive SSDs
● Definition
● SSDs rated for an especially high number of write cycles
● Recommended for
● Write intensive applications with a high degree of data churn
● Protection against premature wearing out of NAND
● No longer a best practice because
● Manufacturers greatly overestimated NAND wear rates
● No physical difference between drive SKUs
● Read intensive drives can be re-formatted to increase write cycles
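The effect of re-formatting a drive to a smaller user capacity can be sketched with the usual drive-writes-per-day (DWPD) arithmetic. Everything numeric below is hypothetical and chosen only for illustration: the raw NAND size, the program/erase cycle rating, and the write amplification factors (WAF) are assumptions, not vendor data or figures from the talk. The function name `dwpd` is also illustrative.

```python
def dwpd(raw_nand_tb: float, user_capacity_tb: float, pe_cycles: int,
         waf: float, warranty_years: int = 5) -> float:
    """Drive writes per day the NAND can absorb over the warranty period.

    Host writes the NAND can take = raw NAND * P/E cycles / WAF.
    DWPD = host writes / (user capacity * days in warranty).
    """
    host_writes_tb = raw_nand_tb * pe_cycles / waf
    return host_writes_tb / (user_capacity_tb * 365 * warranty_years)


# Hypothetical drive: same physical NAND in both cases.
RAW_NAND_TB = 4.0
PE_CYCLES = 3000

# "Read intensive" format: most of the NAND exposed, higher assumed WAF.
read_intensive = dwpd(RAW_NAND_TB, user_capacity_tb=3.84,
                      pe_cycles=PE_CYCLES, waf=3.0)

# Re-formatted: less user capacity, more spare area, lower assumed WAF.
reformatted = dwpd(RAW_NAND_TB, user_capacity_tb=3.2,
                   pe_cycles=PE_CYCLES, waf=2.0)

print(f"Read-intensive format: {read_intensive:.2f} DWPD")
print(f"Re-formatted:          {reformatted:.2f} DWPD")
```

Two things move in the same direction here: the denominator shrinks because a "drive write" is now smaller, and the extra spare area is assumed to reduce write amplification, so the same NAND supports a higher DWPD rating.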
44. Summary
Storage Best Practices in the Age of Abundance
Old                     New
Jumbo Frames            Standard 1500B MTU above 10Gb/s
FibreChannel SANs       Converge network on 100Gb Ethernet
Lossless Ethernet       Over-provision bandwidth
RAID Controllers        High quality direct attached SSDs
SAS Disk Drives         Use SATA for high capacity nearline use cases
Write Intensive SSDs    Read Intensive SSDs formatted to desired endurance
45. What lies have you been told lately?
Image credit: U.S. Public Domain