HopsFS: Huawei internal conference, July 2021
1. Jim Dowling
Associate Professor, KTH Royal Institute of Technology
CEO, Logical Clocks
July 2021
HopsFS
Huawei Cloud InnovWave 2021
2. Evolutionary History of Distributed OLTP Databases
[Timeline diagram, Past to Present; axis label: Serializable]
SQL: Oracle RAC. Two-Phase Commit, Shared State.
NoSQL: DynamoDB, Cassandra. Quorum-Based, Eventually Consistent.
NewSQL: Spanner, Cockroach, RonDB. “Transactions” with Timestamps (Wallclock or Logical Time).
3. Evolutionary History of Scalable SQL Data Warehouses
[Timeline diagram, Past to Present]
Scalable SQL: Apache Hive. Metadata Server (MySQL).
Lakehouse: Delta Lake, Apache Hudi. File-Based Metadata (S3, HDFS).
Scaleout Metadata Warehouses: Snowflake, BigQuery. Scaleout Metadata Database (FoundationDB, Spanner).
4. Evolutionary History of Distributed File Systems
[Timeline diagram, Past to Present; axis label: POSIX]
Distributed File Systems: NFS, HDFS. Strongly Consistent Metadata (1 Server).
Object Stores: S3, GCS, ABFS. Eventually Consistent Metadata (Distributed).
Scaleout Metadata Filesystems: HopsFS, Colossus, Tectonic, ADLSv2. Strongly Consistent Metadata (Distributed).
7. Read Hotspots: Scale out by adding Data Nodes (partitions) or Scale Up by adding Query Threads
RonDB Thread Pipeline Architecture
Cache-Aware Thread Pipeline Architecture
https://www.logicalclocks.com/blog/rondb-automatic-thread-configuration
https://www.logicalclocks.com/blog/designing-a-thread-pipeline-for-optimal-database-throughput-with-high-ipc-and-low-cpu-cache-misses
8. HopsFS Architecture
Niazi et al., HopsFS: Scaling Hierarchical File System Metadata Using NewSQL Databases, USENIX FAST 2017
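The idea of keeping hierarchical file system metadata in a database can be illustrated with a minimal sketch. It assumes each inode is a row keyed by (parent inode id, name), so that resolving a path is one primary-key lookup per path component. The table layout, `ROOT_ID`, and `resolve_path` are hypothetical names for illustration, not HopsFS's actual schema or API, and a plain dictionary stands in for the database.

```python
from typing import Dict, Optional, Tuple

ROOT_ID = 1  # hypothetical inode id for "/"

# Toy stand-in for a database table keyed by (parent_id, name).
# Each entry maps to the inode id of that child.
InodeTable = Dict[Tuple[int, str], int]


def resolve_path(table: InodeTable, path: str) -> Optional[int]:
    """Walk the path one component at a time, one keyed lookup per component."""
    inode_id: Optional[int] = ROOT_ID
    for name in (part for part in path.split("/") if part):
        inode_id = table.get((inode_id, name))
        if inode_id is None:
            return None  # a path component does not exist
    return inode_id


# Example directory tree: /data/images/cat.jpg
table: InodeTable = {
    (ROOT_ID, "data"): 2,
    (2, "images"): 3,
    (3, "cat.jpg"): 4,
}
```

In this sketch, `resolve_path(table, "/data/images/cat.jpg")` walks three keys and `resolve_path(table, "/data/missing")` stops at the first absent component.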
9. HopsFS Metadata is Highly Available across Availability Zones in the Cloud
Ismail et al, Distributed Hierarchical File Systems strike back in the Cloud, ICDCS 2020
HopsFS outperforms CephFS on a Spotify workload
11. Small Files in HopsFS using RonDB and On-Disk Data (NVMe disks)
Niazi et al, Size Matters: Improving the Performance of Small Files in Hadoop, Middleware 2018
At Spotify and Yahoo, up to 40% of files are small files.
Store small files (under 64-128 KB) in RonDB.
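The size-based placement rule can be sketched as a single threshold check. This is an illustration only: the 64 KB cutoff is an assumed value picked from the 64-128 KB range on the slide, and `choose_store` and the tier names `"database"` and `"block_storage"` are hypothetical, not part of any HopsFS API.

```python
# Assumed cutoff; the slide gives a 64-128 KB range rather than one value.
SMALL_FILE_THRESHOLD_BYTES = 64 * 1024


def choose_store(file_size_bytes: int,
                 threshold: int = SMALL_FILE_THRESHOLD_BYTES) -> str:
    """Return the tier that should hold the file's data in this sketch."""
    if file_size_bytes < 0:
        raise ValueError("file size must be non-negative")
    # Files strictly under the threshold live next to their metadata in the
    # database; larger files take the regular block storage path.
    return "database" if file_size_bytes < threshold else "block_storage"
```

Making the threshold a parameter keeps the sketch usable for either end of the stated range, for example `choose_store(100 * 1024, threshold=128 * 1024)`.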
12. Polyglot Metadata - Adding Free Text Search to HopsFS
HopsFS
Get all images with 1 cat and 1 guitar
Free-text search is not supported by RonDB
13. Free-Text Search for Files enabled by ePipe and Elasticsearch
Ismail et al, ePipe: Near Real-Time Polyglot Persistence of HopsFS Metadata, CCGrid 2019
[Diagram: HopsFS stores Inode (File/Dir) metadata in RonDB; ePipe syncs the Inode metadata to Elasticsearch, which provides Free-Text Search]
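The polyglot pattern on this slide, where metadata changes are replayed from the primary store into a search index, can be sketched in memory. This is a toy under stated assumptions: `SearchIndex` is a small inverted index standing in for Elasticsearch, the change-log tuple format and the `sync` function are hypothetical, and nothing here reflects ePipe's real interfaces.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical change-log entry: (operation, inode_id, searchable text).
Change = Tuple[str, int, str]


class SearchIndex:
    """Tiny inverted index standing in for the search engine."""

    def __init__(self) -> None:
        self._postings: Dict[str, Set[int]] = defaultdict(set)
        self._docs: Dict[int, Set[str]] = {}

    def upsert(self, inode_id: int, text: str) -> None:
        self.delete(inode_id)  # drop stale terms before re-indexing
        terms = set(text.lower().split())
        self._docs[inode_id] = terms
        for term in terms:
            self._postings[term].add(inode_id)

    def delete(self, inode_id: int) -> None:
        for term in self._docs.pop(inode_id, set()):
            self._postings[term].discard(inode_id)

    def search(self, query: str) -> List[int]:
        """Return ids of inodes whose text contains every query term."""
        terms = query.lower().split()
        if not terms:
            return []
        hits = set.intersection(*(self._postings.get(t, set()) for t in terms))
        return sorted(hits)


def sync(changes: List[Change], index: SearchIndex) -> None:
    """Replay metadata changes, in order, into the search index."""
    for op, inode_id, text in changes:
        if op == "delete":
            index.delete(inode_id)
        else:  # "add" or "update"
            index.upsert(inode_id, text)


changes: List[Change] = [
    ("add", 4, "cat guitar photo"),
    ("add", 5, "cat photo"),
    ("add", 6, "guitar cat"),
    ("delete", 6, ""),
]
index = SearchIndex()
sync(changes, index)
print(index.search("cat guitar"))
```

Replaying changes in log order is what keeps the index consistent with the primary store in this sketch: the deleted inode 6 no longer matches even though it was indexed earlier.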
15. Building a Unified Machine Learning Platform on HopsFS’ Scaleout Metadata
16. Complete MLOps Infrastructure with Hopsworks, Feature Store using HopsFS/RonDB
[Diagram labels]
HopsFS
Code and configuration
Data Lake, Warehouse, Kafka
Model Registry
Feature Engineering
Model Serving
Model Training
Model Deploy
Model Monitoring
Model Development
Retrieve Online Features
Log Predictions
Training Data Statistics
Sync
Experiment Versioning
Feature Versioning/Statistics
A/B Test
Model Versioning & Statistics
Serving Statistics
Free-text Search
Feature Store
Elasticsearch
RonDB