2. About Spreadshirt
Spread it with Spreadshirt
A global e-commerce platform for everyone to create, sell and buy
ideas on clothing and accessories across many points of sale.
• 12 languages, 11 currencies
• 19 markets
• 150+ shipping regions
• Community of >70,000 active sellers
• € 72M revenue (2014)
• >3.3M items shipped (2014)
3. Object Storage at Spreadshirt
• What?
– Store and read primarily user generated content, mostly images
• Typical sizes:
– from a few dozen KB to a few MB
• Some tens of terabytes (TB) of data
• Read > Write
• "Never change a running system"?
– The current solution from the early days (big storage + lots of files /
directories) no longer works
• Regular UNIX tools become unusable in practice
• Not designed for "the cloud" (e.g. replication is an issue)
– Growing number of users → more content
– Build a truly global platform (multiple regions and data centers)
4. Ceph
• Why Ceph?
– Vendor independent
– Open source
– Runs on commodity hardware
– Local installation for minimal latency
– Existing knowledge and experience
– S3 API (see the sketch after this list)
• Simple bucket-to-bucket replication
– A good fit also for < Petabyte
– Easy to add more storage
– (Can be used later for block storage)
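To give an idea of the S3 API point above: a minimal sketch of talking to RadosGW with the Python boto library. Host, port and credentials are made-up placeholders, not the actual Spreadshirt setup.

    import boto
    import boto.s3.connection

    # Connect to a RadosGW endpoint over plain HTTP (hypothetical host/port).
    conn = boto.connect_s3(
        aws_access_key_id='ACCESS_KEY',        # placeholder credentials
        aws_secret_access_key='SECRET_KEY',
        host='radosgw.example.com',
        port=7480,
        is_secure=False,                       # HTTP inside the LAN
        calling_format=boto.s3.connection.OrdinaryCallingFormat(),
    )

    # Round-trip one small object, the typical workload described earlier.
    bucket = conn.create_bucket('demo-bucket')
    key = bucket.new_key('hello.txt')
    key.set_contents_from_string('Hello Ceph!')
    print(key.get_contents_as_string())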
5. Ceph Object Storage Architecture
Overview (diagram)
• Client → HTTP (S3 or SWIFT API) → Ceph Object Gateway
• Behind the gateway: RADOS (reliable autonomic distributed object store) with Monitors and a lot of OSD nodes and disks
• Public network towards clients, separate cluster network for the OSDs
6. Ceph Object Storage Architecture
A little more detailed (diagram)
• Client → HTTP (S3 or SWIFT API) → RadosGW (the Ceph Object Gateway), which talks to the cluster via librados
• 3 Monitors: an odd number, to form a quorum
• OSD nodes: JBOD (no RAID), more HDDs for data, some SSDs for journals
• Public network 1G, cluster network 10G (the more the better)
• RADOS (reliable autonomic distributed object store) spans all OSD nodes
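The layout in the diagram would be expressed in ceph.conf. A hedged sketch: the option names are standard Ceph options, but the fsid, host names and networks are invented for illustration.

    [global]
    fsid = 00000000-0000-0000-0000-000000000000   ; placeholder cluster id
    mon_initial_members = mon1, mon2, mon3        ; odd number for quorum
    mon_host = 10.0.0.11, 10.0.0.12, 10.0.0.13
    public_network = 10.0.0.0/24                  ; client-facing traffic
    cluster_network = 10.1.0.0/24                 ; OSD replication traffic

    [osd]
    osd_journal_size = 10240                      ; journals on the SSDs, in MB

    [client.rgw.gateway]
    rgw_frontends = civetweb port=7480            ; embedded HTTP frontend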
7. Ceph Object Storage Architecture
Initial Setup (planned) (diagram)
• Client → HTTP (S3 or SWIFT API) → HAProxy → RadosGW (one instance on each node)
• Cluster nodes: 3 x SSD (journal / index), 9 x HDD (data) each
• 3 Monitors
• Public network: 2 x 1G, IPv4
• Cluster network (OSD replication): 2 x 10G, IPv6
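HAProxy in front of the per-node RadosGW instances can be as simple as round-robin HTTP balancing. A minimal haproxy.cfg sketch; node addresses and ports are hypothetical.

    defaults
        mode http
        timeout connect 5s
        timeout client  30s
        timeout server  30s

    frontend rgw_in
        bind *:80                       # S3/SWIFT clients connect here
        default_backend rgw_nodes

    backend rgw_nodes
        balance roundrobin              # spread requests over all gateways
        option httpchk GET /
        server node1 10.0.0.11:7480 check
        server node2 10.0.0.12:7480 check
        server node3 10.0.0.13:7480 check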
8. Ceph Object Storage Performance
Some smoke tests
• How fast is RadosGW? Get an impression.
– Response times (read / write)
• Average?
• Percentiles (P99)?
– Compared to AWS S3?
• A very minimalistic test setup
– 3 VMs (KVM), each running RadosGW, Monitor and OSD
• 2 Cores, 4GB RAM, 1 OSD each (15 GB + 5GB), 10G Network
between nodes, HAProxy (round-robin), LAN, HTTP
– No further optimizations
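The kind of smoke test meant here fits in a few lines of Python. A rough sketch, not the actual test harness: random 4 KB reads from parallel threads against an S3 endpoint, reporting average and P99 latency. Endpoint, bucket and object names are invented, and the bucket is assumed to be pre-filled with 4 KB objects.

    import random
    import time
    from concurrent.futures import ThreadPoolExecutor

    import boto
    import boto.s3.connection
    from boto.s3.key import Key

    THREADS = 16      # parallel clients
    REQUESTS = 1000   # total reads to sample
    NUM_KEYS = 1000   # pre-uploaded 4 KB objects named obj-0 .. obj-999

    conn = boto.connect_s3(
        aws_access_key_id='ACCESS_KEY', aws_secret_access_key='SECRET_KEY',
        host='haproxy.example.com', is_secure=False,
        calling_format=boto.s3.connection.OrdinaryCallingFormat())
    bucket = conn.get_bucket('bench')

    def timed_read(_):
        # Pick a random pre-uploaded object and time a single GET, in ms.
        key = Key(bucket, 'obj-%d' % random.randrange(NUM_KEYS))
        start = time.time()
        key.get_contents_as_string()
        return (time.time() - start) * 1000.0

    with ThreadPoolExecutor(max_workers=THREADS) as pool:
        latencies = sorted(pool.map(timed_read, range(REQUESTS)))

    print('avg %.1f ms' % (sum(latencies) / len(latencies)))
    print('P99 %.1f ms' % latencies[int(len(latencies) * 0.99)])  # approximate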
9. Ceph Object Storage Performance
Some smoke tests
• How fast is RadosGW?
– Random read and write
– Object size: 4 KB
• Results: Pretty promising!
– E.g. 16 parallel threads, read:
• Avg 9 ms
• P99 49 ms
• >1,300 requests/s
10. Ceph Object Storage Performance
Some smoke tests
• Compared to Amazon S3?
– Comparing apples and oranges (unfair, but interesting)
• http vs. https, LAN vs. WAN etc.
• Response times
– Random read, object size: 4KB, 4 parallel threads, location: Leipzig
             | Ceph S3  | AWS S3 (eu-central-1) | AWS S3 (eu-west-1)
Location     | Leipzig  | Frankfurt             | Ireland
Avg          | 6 ms     | 25 ms                 | 56 ms
P99          | 47 ms    | 128 ms                | 374 ms
Requests/s   | 405      | 143                   | 62
11. Global Availability
• 1 Ceph cluster per data center
• S3 bucket-to-bucket replication
• Multiple regions, local delivery
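As an illustration of the bucket-to-bucket idea (not Spreadshirt's actual replication tooling): a naive one-shot copy of a bucket from one Ceph cluster's S3 endpoint to another's, in Python with boto. Hosts, credentials and bucket names are placeholders, and a production setup would replicate incrementally rather than copy everything on each run.

    import boto
    import boto.s3.connection

    def s3(host, access_key, secret_key):
        # Helper: open an S3 connection to one Ceph cluster's RadosGW.
        return boto.connect_s3(
            aws_access_key_id=access_key, aws_secret_access_key=secret_key,
            host=host, is_secure=False,
            calling_format=boto.s3.connection.OrdinaryCallingFormat())

    # One cluster per data center (placeholder endpoints and credentials).
    src = s3('ceph-eu.example.com', 'KEY_EU', 'SECRET_EU').get_bucket('images')
    dst = s3('ceph-us.example.com', 'KEY_US', 'SECRET_US').create_bucket('images')

    # Copy every object from the source bucket to the destination bucket.
    for key in src.list():
        dst.new_key(key.name).set_contents_from_string(
            key.get_contents_as_string())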