5. What is Dinahosting?
Our main business is web hosting and domain registration.
We offer users all the tools they need to develop their project on
the Internet with guarantees:
- Domain name for your site.
- E-mail services.
- Hosting plans: from the simplest ones to complex and
powerful solutions like Cloud Hosting, as well as VPS and
Dedicated Servers.
6. Where are we?
Presence in more than
130 international markets:
México, Argentina, Colombia,
Chile, Portugal, Peru, Venezuela,
USA, Brazil, Ecuador, France,
United Kingdom, Italy, Denmark,
Netherlands, Uruguay, Bolivia,
Japan, China, Senegal, etc.
Based in Santiago.
9. Customer service
- Toll-free phone number.
- Chat.
- E-mail.
- Social network presence.
- 24/7 service.
- No call-center auto-attendant.
10. Backups
• Only for clients of managed services.
• File-level restores.
• 30 days maximum retention.
• Weekly full backup, incremental backups the rest of the week.
• ~3000 machines.
• ~1PB of available space.
• ~30 bare-metal storage servers.
• Full backup size ~125TB.
Data size increases year after year, and so does its management complexity.
13. RAID: the end of an era
• Slow recovery.
• Hazardous recovery.
• Painful recovery.
• Disk incompatibility.
• Disks wasted on hot spares.
• Expensive storage cards.
• Hard to scale.
• False sense of security.
[2]
14. RAID: the end of an era
Would we be protected against…?
- Hardware error.
- Network outage.
- Datacenter disaster.
- Power supply failure.
- Operating system error.
- Filesystem failure.
15. Problems managing files
- Backwards compatibility.
- Wasted space.
- Storage node partition table?
- Corrupted files when a disk fills up.
- Many hours spent by SysOps.
- Forced to deploy and maintain an API.
19. Uploading backups to the cloud: price per month for 1PB of cloud storage

AWS                         Cost         Blocking elements
S3 Infrequent Access (IA)   ~15.000 €    Price
S3 Glacier                  ~5.000 €*    Slow data retrieval**, limited availability***, 500TB upload limit****

* Files deleted before 90 days incur a pro-rated charge.
** Expedited retrievals: 1-5 min; 3 expedited retrievals can be performed every 5 minutes,
   and each unit of provisioned capacity costs $100 per month. Standard retrieval: 3-5 hours.
*** Glacier inventory refresh every 24h.
**** The upload limit can be raised by contacting AWS support.

AZURE                       Cost         Blocking elements
Storage Standard Cool       ~9.000 €     Price
Storage Standard Archive    ~5.000 €     Restores < 15h
Azure charges an extra cost if files are deleted before 30 and 180 days respectively.

GCP                         Cost         Blocking elements
Nearline Storage            ~8.000 €     Price
Coldline Storage            ~5.000 €     Price
In both storage classes, data access latency is measured in milliseconds.
22. A unified, distributed storage system, with the intelligence in the software.
- Open source.
- Massively scalable.
- Independent components.
- No SPOF.
- High performance.
- S3 API.
- Active community.
- Use of commodity hardware.
23. Architecture (diagram): clients consume Object Storage, Block Storage and File Storage on top of the Ceph Storage Cluster. The Ceph SDS layer runs on a Linux OS over commodity hardware (CPU, memory, HDDs, network) on every node, from Server1 to ServerN, and ties them together as distributed storage.
24. OSDs (object storage daemons)
- From one to thousands.
- Generally speaking, 1 OSD = 1 hard disk.
- Communicate with each other to replicate data and perform recoveries.
Monitors
- Maintain the cluster maps.
- Provide consensus on data distribution decisions.
- Small, odd number of them.
- Do not store data.
Gateways
- Entry points to the cluster.
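As a quick illustration (not from the original slides), the three roles can be inspected with standard Ceph commands; only the commands are shown here, no real output:

  ceph mon stat      # monitors and quorum status
  ceph osd tree      # hosts and the OSDs (one per disk) hanging from them
  ceph -s            # overall cluster health and capacity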
25. Data flow on OSDs (diagram: Server1, Server2 and Server3, each node running one OSD per disk)
READ
1. RADOS sends the read request to the primary OSD.
2. The primary OSD reads the data from its local disk and notifies the Ceph client.
WRITE
1. The client writes data; RADOS creates the object and sends the data to the primary OSD.
2. The primary OSD finds the number of replicas and sends the data to the replica OSDs.
3. The replica OSDs write the data and send completion to the primary OSD.
4. The primary OSD signals write completion to the Ceph client.
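Outside the original slides, the same write/read flow can be exercised from a client with the rados CLI; the pool, object and file names below are placeholders:

  # Write: RADOS creates the object and sends it to the primary OSD (and its replicas).
  rados -p testpool put backup-001 /tmp/backup-001.tar.gz
  # Read: served back by the primary OSD from its local disk.
  rados -p testpool get backup-001 /tmp/restored.tar.gz
  # Show which placement group and OSDs (acting set) hold the object.
  ceph osd map testpool backup-001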
27. Hardware planning
Ceph OSDs
• DELL R720XD / R730XD
• CPU: 2x E5-2660, 8 cores, 2.20 GHz
• RAM: 64GB-96GB
• Disks:
  • 12x 8TB SATA
  • 1 SATA disk for the OS
• NIC: 10G
• Controller: H730 / LSI JBOD
Ceph monitors
• VPS.
• 4 vcores.
• RAM: 4GB.
• NIC: 10G.
Ceph gateways
• DELL R230
• CPU: E3-1230v5 3.4GHz (8 threads)
• RAM: 8GB
• NIC: 10G
Optimize for cost or performance? In our case, the main objective is to optimize the total cost per GB.
28. Hardware planning
What happens to the OSDs if the OS disk dies?
"We recommend using a dedicated drive for the operating system and software,
and one drive for each OSD Daemon you run on the host."
So… where do we put the operating system of the OSD node?

OS in RAID1
  Pros: cluster protected against OS failures; hot-swap disks.
  Cons: we do not have a RAID card*; we would need 1 extra disk.
OS on a single disk
  Pros: only 1 disk slot used; high reliability if the disk is monitored with SMART (see the smartctl sketch below).
  Cons: if the disk dies, all OSDs on that machine die too.
OS on SATADOM
  Pros: all disk slots available for OSDs.
  Cons: not reliable after months of use.
OS from SAN
  Pros: all disk slots available for OSDs; RAID protected.
  Cons: we depend on the network and remote storage.
OS on SD card
  Pros: all disk slots available for OSDs.
  Cons: poor performance, not reliable.

* The PERC H730 supports RAID.
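The SMART monitoring mentioned for the single-disk option can be as simple as the following checks (the device name is a placeholder):

  smartctl -H /dev/sda     # quick health verdict for the OS disk
  smartctl -a /dev/sda     # full attributes, useful for trending reallocated/pending sectors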
29. Hardware planning
Rules of thumb for a Ceph installation:
- 10G networking as a minimum.
- Deep knowledge of the hardware you wish to use.
- Always use at least 3 replicas.
- Try to use enterprise SSDs.
- Don't use configuration options you don't understand.
- Do power-loss testing.
- Have a recovery plan.
- Use a CM (configuration management) system.
30. https://github.com/ceph/ceph-ansible.git
• Gradual learning curve.
• Plain deployment; no lifecycle management.
• No orchestration.
• No server needed.
• Evolution of the ceph-deploy tool.
http://docs.ceph.com/ceph-ansible/
TIPS:
- Use a compatible Ansible version (bleeding-edge versions are not supported).
- Do not use the master branch unless you like strong emotions.
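As a hedged sketch (host names are illustrative, not our real layout), a minimal ceph-ansible run boils down to an inventory plus the provided site.yml playbook:

  $ cat inventory/hosts
  [mons]
  mon1
  mon2
  mon3

  [osds]
  osd1
  osd2
  osd3

  [rgws]
  rgw1

  $ ansible-playbook -i inventory/hosts site.yml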
37. Tuning
No silver bullets.
root@ceph:~# ceph --show-config | wc -l
1397
root@ceph:~#
Default options are designed for general use cases. Most of the time you need to make some adjustments in order to achieve real performance.
The Ceph documentation is highly valuable and extensive: http://docs.ceph.com/docs/master/
38. Tuning
• Enable jumbo frames: raising the MTU from the standard 1522-byte frame to 9000 bytes improves the data-to-overhead ratio. Verify end to end with:
  ping6 -M do -s 8972 <ip>
• Monitor options:
  [mon]
  mon osd nearfull ratio = .90
  mon osd down out subtree limit = host
• OSD options:
  [osd]
  osd scrub sleep = .1
  osd scrub load threshold = 1.0
  osd scrub begin hour = 12
  osd scrub end hour = 0
• Daily reweight:
  ceph osd reweight-by-utilization [threshold]
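One way to schedule the daily reweight is a plain cron entry; the 04:00 schedule and the threshold of 120 (percent of average utilization) are illustrative values, not necessarily the ones we run with:

  # /etc/cron.d/ceph-reweight
  0 4 * * * root /usr/bin/ceph osd reweight-by-utilization 120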
39. Erasure Code
Replicated pool vs. erasure-coded pool (diagram: the replicated pool stores full copies of an object across the Ceph storage cluster; the erasure-coded pool splits it into data chunks 1-3 and coding chunks X and Y).

Replicated pool: full copies of stored objects.
• High durability.
• 3x (200% overhead).
• Quicker recovery.
• Supports all kinds of operations.
• Uses fewer resources (CPU).

Erasure-coded pool: one copy plus parity.
• Cost-effective durability.
• 1.5x (50% overhead).
• Expensive recovery.
• Partial writes not supported*.
• Higher CPU usage.
40-42. Erasure Code: how does erasure coding work? Reads (diagram built up over three slides): the Ceph client sends a READ to the erasure-coded pool in the Ceph storage cluster; the request fans out as READs to the OSDs holding the chunks (1, 2, 3, 4, X, Y), and the reconstructed object is returned to the client as the READ REPLY.
46. Erasure Code: how does erasure coding work?
Two variables: K + M
K = data shards
M = erasure code shards

K+M     Usable space    OSDs allowed to fail
3+1     75%             1
4+2     66%             2
18+2    90%             2
11+3    78.5%           3

n = k + m   (n = total shards)
r = k / n   (r = encoding rate)
Example: n = 4 + 2 = 6, r = 4/6 ≈ 0.66
CERN
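In Ceph these variables map directly onto an erasure-code profile. A sketch for the 4+2 case (profile name, pool name and PG count are placeholders, not our production values):

  ceph osd erasure-code-profile set backup-ec k=4 m=2 crush-failure-domain=host
  ceph osd erasure-code-profile get backup-ec
  ceph osd pool create ecpool 256 256 erasure backup-ec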
49. How do we sync backups? (diagram)
Each server publishes a BACKUP_DONE message to the message broker when its backup finishes; the agent consumes n elements from the queue, and the backup is written to the cluster (WRITE) with the read-write user (User-RW).
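The talk does not show the agent's code or name the broker, but the write step itself comes down to an S3 upload with the read-write credentials. Using s3cmd purely as an example client, with placeholder paths, bucket and config file:

  s3cmd --config=/etc/s3cmd/rw.cfg put /backups/server42/full.tar.gz s3://backups/server42/full.tar.gz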
50. How do we restore backups? (diagram)
Method 1: restores ordered from the control panel; the panel talks to the agent, which generates temporary links (see the signurl example below).
Method 2: restores from the same machine; the server GETs the objects with the read-only user (User-RO).
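Temporary links like the ones in method 1 can be produced with s3cmd's signurl subcommand; the bucket, object and one-hour expiry below are placeholders:

  s3cmd signurl s3://backups/server42/full.tar.gz +3600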
52. Ceph client requirements (candidates: s3cmd, minio)
• No dependencies.
• Multi-OS compatible.
• Low resource requirements.
• Active development.
• Bandwidth limiting: we need to limit the bandwidth used.
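s3cmd, for example, exposes a rate-limiting flag in recent versions; the 20 MB/s value and the paths/bucket are arbitrary placeholders:

  s3cmd put --limit-rate=20m /backups/server42/full.tar.gz s3://backups/server42/full.tar.gz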
53. CPU and IO problems (diagram comparing CPU and 1G network usage in four scenarios: a powerful machine with a fixed bandwidth limit, a not-so-powerful machine with no limits, a powerful machine with no limits, and an elastic limit).
54. CPU and IO adjustments: Linux Traffic Control (TC)
Default behaviour (diagram): all flows, including the Ceph traffic, share a single FIFO queue on the port.
Applying a tc policy (diagram): a classifier steers the Ceph traffic into its own Hierarchical Token Bucket (HTB) class, while the remaining flows keep using the FIFO queue.
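A hedged sketch of such a policy (device, rates and the gateway port are placeholders, not our production values): a root HTB qdisc, one class for the Ceph/S3 traffic, one for everything else, and a filter classifying by destination port.

  tc qdisc add dev eth0 root handle 1: htb default 20
  tc class add dev eth0 parent 1:  classid 1:1  htb rate 1gbit
  tc class add dev eth0 parent 1:1 classid 1:10 htb rate 300mbit ceil 600mbit   # Ceph/S3 traffic
  tc class add dev eth0 parent 1:1 classid 1:20 htb rate 700mbit ceil 1gbit     # everything else
  tc filter add dev eth0 parent 1: protocol ip u32 match ip dport 80 0xffff flowid 1:10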
55. CPU and IO adjustments: Linux Traffic Control (TC)
Regulate outgoing traffic using system load (diagram): the transfer rate is reduced or increased to keep the CPU load within an allowed range, up to the configured network limit.
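A minimal sketch of this feedback loop, assuming the HTB class 1:10 from the previous slide; the interface, thresholds, interval and rates are illustrative only:

  #!/bin/bash
  DEV=eth0; CLASS=1:10
  LOW=2.0; HIGH=6.0            # allowed 1-minute load range
  MIN=100mbit; MAX=800mbit     # transfer-rate bounds
  while sleep 30; do
      load=$(cut -d' ' -f1 /proc/loadavg)
      if awk -v l="$load" -v h="$HIGH" 'BEGIN{exit !(l>h)}'; then
          rate=$MIN            # CPU too busy: back off
      elif awk -v l="$load" -v lo="$LOW" 'BEGIN{exit !(l<lo)}'; then
          rate=$MAX            # CPU idle enough: speed up
      else
          continue             # inside the allowed range: leave it alone
      fi
      tc class change dev "$DEV" parent 1:1 classid "$CLASS" htb rate "$rate"
  done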