200
● SSD
2000
● NVMe
4000
Tune for your hardware. Higher is better but avoid over-committing IOPS.
innodb_flush_log_at_trx_commit 1 Flush logs at each transaction commit for ACID compliance.
innodb_log_buffer_size 16M-64M Default is 8M. Increase for more transactions per second.
innodb_log_file_size 1G Default is 48M. Increase for more transactions per second.
innodb_flush_method O_DIRECT Bypass OS cache for better durability.
innodb_thread_concurrency 0 Allow InnoDB to manage thread concurrency level.
Tips to drive maria db cluster performance for nextcloud
1.
2. Copyright 2021 Severalnines AB
2
Hello! I'm Vinay from the Severalnines Team and I'm your host for today's webinar!
Feel free to ask any questions via the Q&A chat.
You can also contact me directly via the chat box or via email:
info@severalnines.com during or after the webinar.
Björn Schiessle
Co-founder and PreSales Lead at Nextcloud
Ashraf Sharif
Senior Support Engineer at Severalnines
Our Presenters:
Welcome note
5. PRESENTER:
Björn Schiessle , Co-founder and PreSales Lead at Nextcloud
bjoern.schiessle@nextcloud.com | www.nextcloud.com | @nextclouders
Nextcloud Architecture
June 15th, 2021
10. Copyright 2021 Severalnines AB
From small to large Enterprise setups
Single Server Setup
(<= 150 users)
Cluster
(5.000 - 100.000 users)
Global Scale
(Millions of users)
10
20. Copyright 2021 Severalnines AB
WSREP API
WSREP API
WSREP API
Galera
What is MariaDB Cluster?
● MariaDB Server patched with:
○ MySQL-writeset replication API (wsrep)
○ Galera wsrep provider library for group
communication and writeset replication
● Linux only, XtraDB & InnoDB storage engine only
● Part of MariaDB since MariaDB 10.1
● 3 nodes for HA ("1-node cluster" is also possible)
● Features:
○ Active-active multi-master
○ Automatic membership control
○ Automatic node joining
○ Parallel slave applying
○ Native MySQL look & feel
● Benefits:
○ Almost no slave lag
○ No lost transactions
○ Read scalability
○ Small client latency
20
21. Copyright 2021 Severalnines AB
MariaDB version Galera version EOL
10.1 3.x 17th Oct 2020
10.2 3.x 23rd May 2022
10.3 3.x 25th May 2023
10.4 4.x 18th June 2024
10.5 4.x 24th June 2025
Supported Versions
21
22. Copyright 2021 Severalnines AB
How MariaDB Cluster works?
1. During the commit stage on the originator node:
a. Pre-commit the transaction on the node
b. Encapsulate into writeset (key, seqno, binlog
events)
c. Broadcast to all nodes
2. After receiving the writeset, on all receiver nodes:
a. Galera certifies the replicated writeset
b. If certification succeeds, queue the replicated
writeset to be applied
c. If certification fails, discard the writeset
3. Back on the originator node:
a. If 2b, return OK to client
b. If 2c, return ERROR and rollback the transaction
on the originator node
22
24. Copyright 2021 Severalnines AB
Network latency
(RTT to the farthest node)
Flow control
(throttle the replication if
necessary)
Parallel slave threads
(number of threads to use in
applying slave write-sets)
Hardware specs
(CPU clock, cores, disk
subsystem, RAM, swap,
NIC, etc)
MariaDB + InnoDB
configuration
(I/O, threads, buffers,
caches, etc)
MariaDB Cluster Performance Factors
24
25. Copyright 2021 Severalnines AB
Design Concept
● Galera Cluster write and replication performance are depending on:
○ Network latency: Lower round trip time (RTT) to the farthest cluster node from originator node is better.
○ Flow control: Galera is as fast as the slowest node in the cluster. Galera will slow down its replication to allow the
slowest node to keep up in attempt to minimize the slave lag.
● Standard MySQL & InnoDB optimizations can be applied for buffers, caches, threads, queries and indexing.
● To take full advantage of a Galera cluster, use a database load balancer (reverse proxy), e.g:
○ ProxySQL
○ MariaDB MaxScale
○ HAProxy
● The importance of database load balancer:
○ Distribute database load to multiple servers
○ Split writes from reads to avoid Galera deadlocks (requires ProxySQL or MaxScale)
○ Automatic redirect connections to the next available node
○ Connection queueing and overload protection
○ Single endpoint for applications
25
26. Copyright 2021 Severalnines AB
Database Load Balancer
● Centralized:
○ Independent tier
○ Simple and easy to manage
○ Additional hosts for LBs
○ Combine with Virtual IP to eliminate SPOF
● Distributed:
○ Co-locate with each application server
○ Faster from application standpoint (caching, query
rerouting, query firewall)
○ Harder to manage
○ Might affect the health check performance with too
many LB instances
26
Nextcloud
App
Nextcloud
App
APP
LB
LB
APP VIP
LB
LB
LB
Nextcloud
App
Centralized Reverse
Proxies
Distributed Reverse Proxies
Nextcloud
App
27. Copyright 2021 Severalnines AB
ProxySQL
(Active)
ProxySQL
(Passive)
Virtual IP
Address
RO
RO
RO
RW
RO
RW
Primary writer
Backup writer
Backup writer
Read-Write Splitting
● For Nextcloud workload, read-write splitting is mandatory,
due to:
○ PHP PDO does not support multiple DB hosts per
PDO instance.
○ Some of the Nextcloud tables are missing primary
keys (which is critical for certification and conflict
resolution in Galera replication). Thus, write to a
single node in Galera to eliminate the risk of write-
conflicts.
● Use database load balancer that supports read-write
splitting natively:
○ ProxySQL (via query rules)
○ MariaDB MaxScale (via readwritesplit router)
○ HAProxy (via source IP hash, a.k.a session
stickiness)
27
28. Copyright 2021 Severalnines AB
● There are two types of I/O operations on MySQL files:
○ Sequential I/O-oriented files:
■ InnoDB system tablespace (ibdata)
■ MySQL log files:
● Binary/Relay logs
● REDO logs (ib_logfile*)
● General logs
● Slow query logs
● Error log
○ Random I/O-oriented files:
■ InnoDB file-per-table data file (*.ibd)
● Storage layer is critical in InnoDB:
○ Use at least SSD for MySQL data directory (or at least
the random I/O-oriented files).
○ RAID 10 is generally good for all workloads.
○ Use filesystem EXT4 or XFS on Linux.
● Mount your file systems with the -o noatime option to skip updates to the last access time
in inodes on the file system, which avoids some disk seeks.
Disk I/O Subsystem
28
29. Copyright 2021 Severalnines AB
Target Nextcloud active users: 1,000 - 10,000 users
Recommended Architecture I
29
APP
ProxySQL
(Active)
ProxySQL
(Passive)
APP
Virtual IP
Address
Nextcloud App
RO
RO
RO
RW
RO
RW
30. Copyright 2021 Severalnines AB
Target Nextcloud active users: 10,000 - 100,000 users.
Beyond this, use Nextcloud Federation feature.
Recommended Architecture II
30
APP
ProxySQL
(Active)
ProxySQL
(Passive)
APP
Virtual IP
Address
Nextcloud App
RO
RO
RO
RW
RO
RW
Redis
Redis
Redis Asynchronous
slave
Cache
32. Copyright 2021 Severalnines AB
Performance Tuning
● Tuning motivation:
○ Improve database read & write performance.
○ As minimal tradeoffs as possible.
● Requirements and specs:
○ Database vendor and server version: MariaDB 10.5.4
○ Database cluster type: Galera cluster
○ Nextcloud version: Stable 19.0.1
○ Target Nextcloud users: 10,000 users
○ Target Nextcloud database size: ~10 GB
○ Database load balancer: ProxySQL 2.0.13
● Some of the suggested MariaDB configuration options are applicable to MySQL 5.6 & MySQL 5.7.
32
33. Copyright 2021 Severalnines AB
MariaDB Cluster Performance Factors
Network latency
(RTT to the farthest
node)
Flow control
(throttle the
replication if
necessary)
Parallel slave threads
(number of threads to
use in applying slave
write-sets)
Hardware specs
(CPU clock, cores,
disk subsystem, RAM,
swap, NIC, etc)
MariaDB + InnoDB
configuration
(I/O, threads, buffers,
caches, etc)
34. Copyright 2021 Severalnines AB
Parameter Recommended Value Justification
max_connections 500 ProxySQL can handle thousands of client connections with
connection pooling and multiplexing. Don't set this too high.
performance_schema OFF Default is enabled. 3-5% performance overhead if enabled.
log_bin OFF (unset) Default is disabled. 10-15% performance overhead if enabled.
skip_name_resolve ON If you need to use hostname or FQDN, make sure /etc/hosts is
mapped with all hosts that is going to connect to the cluster on all
Galera nodes. Do not rely on DNS resolver.
max_allowed_packet 64M The default is 16M. You can go up to 1G to improve packet
transmission and faster mysqldump. Increase in 16,32,64 style.
sync_binlog OFF (unset) Default is disabled (usually this would be enabled together with
log_bin). 20-30% performance overhead if enabled.
*Based on MariaDB 10.5.4 default values.
General MariaDB Tuning
34
35. Copyright 2021 Severalnines AB
Parameter Recommended Value Justification
innodb_buffer_pool_size 50 - 70% of RAM, specify with K, M, G.
Example: 20G
Try to fit total data + index into memory to avoid hitting disk.
innodb_buffer_pool_instances Starts with 8, maximum is 64. If buffer pool > 8G, starts with 8. Go with 16, 32, 64 incremental if
necessary. Specify a combination of innodb_buffer_pool_instances
and innodb_buffer_pool_size so that each buffer pool instance is
at least 1 GB.
tx_isolation 'READ-COMMITTED' Transaction isolation level recommended by Nextcloud.
innodb_io_capacity ● SATA SSD starts with 1000
● NVMe starts with 10000
The default value 200. Should be set to around the number of IOPS
that system can handle. Ideally, opt for a lower setting, as at higher
value data is removed from the buffers too quickly, reducing the
effectiveness of caching.
innodb_io_capacity_max ● SATA SSD starts with 4000
● NVMe starts with 20000
The default value 2000. Hard limit of innodb_io_capacity.
innodb_flush_method 'O_DIRECT' Default is fsync. Use this if the MariaDB data directory is located on
a direct physical disk.
*Based on MariaDB 10.5.4 default values.
InnoDB Tuning (1 of 2)
36. Copyright 2021 Severalnines AB
Parameter Recommended Value Justification
innodb_read_io_threads Number of CPU cores Default is 4. Max is 64. The total number of innodb_*_io_threads
should not surpass the total number of CPU cores available to
MariaDB.
innodb_write_io_threads Number of CPU cores Default is 4. Max is 64. The total number of innodb_*_io_threads
should not surpass the total number of CPU cores available to
MariaDB.
innodb_log_file_size 512M Default is 96M. The larger the value, the less checkpoint flush
activity is required in the buffer pool, saving disk I/O with a
tradeoff of slower recovery. However, it doesn't really matter in
Galera since we have other operational nodes to cover the service
availability while the node is recovering.
innodb_flush_log_at_trx_commit 0 Default is 1 for full durability. This can be relaxed in Galera, since
we always have other nodes to cover write durability. Set to 0 so
InnoDB log buffer is written to file once per second. Up to 100% to
200% improvement.
*Based on MariaDB 10.5.4 default values.
InnoDB Tuning (2 of 2)
37. Copyright 2021 Severalnines AB
InnoDB Buffer Pool Best Practice
• Reading a row from disk is about 100,000 times slower than
reading the same row from RAM.
• Therefore, InnoDB uses its own buffer,
innodb_buffer_pool_size as the "working area".
• The first read of data copies the data to RAM, and then
subsequent reads take advantage of the faster performance
of RAM.
• Basic rule-of-thumb:
◦ If your database size can fit into the buffer pool size,
MariaDB will very likely not hitting the disk.
◦ Try to reach 1000/1000 in the buffer pool hit rate.
37
mysql> SHOW ENGINE INNODB STATUSG
...
----------------------
BUFFER POOL AND MEMORY
----------------------
...
Buffer pool hit rate 929 / 1000 ...
...
The hit rate of 929 / 1000 indicates that out of 1000
row reads, it was able to read the row in RAM 929
times. The remaining 71 times, it had to read the row
of data from disk.
This is not a bad ratio, but if your frequently-accessed
data fits fully in RAM, you'll see this ratio increase to
999 / 1000 or even round up to 1000 / 1000.
38. Copyright 2021 Severalnines AB
Parameter Recommended Value Justification
wsrep_slave_threads Number of CPU cores x 2 The minimum value for the slave threads should be 2 x number of
CPU cores, and must not exceed the wsrep_cert_deps_distance
value.
wsrep_provider_options "gcache.size=2G" Default is 128M. The bigger gcache size, the higher chance of a
joiner can join with incremental state transfer (IST). IST is faster
and lesser impact than full syncing (SST).
*Based on MariaDB 10.5.4 default values.
Galera Tuning
38
39. Copyright 2021 Severalnines AB
● If you opt for multi-writer Galera setup, make sure every
table has an explicit primary key defined.
○ Some tables in Nextcloud do not have a PK defined. Simply
add a primary key with auto-increment column.
● DDL operations (ALTER, CREATE, TRUNCATE, DROP) might cause
the cluster to stall (wsrep_OSU_method is 'TOI') until the
operation completes.
○ Perform NextCloud upgrade (commonly contains DDL
operations) during a maintenance window, or low-peak
hours or use online schema change tools like pt-online-
schema-change.
● Due to flow-control, if one of the nodes slows down, very
likely the replication performance will slow down as well.
○ Run heavy operations like backup, reporting or analytics
during low-peak hours.
○ Attach an asynchronous replication slave, replicate from
one of the Galera nodes.
/* Get tables without primary keys */
SELECT DISTINCT CONCAT(t.table_schema,'.',t.table_name) AS
tbl, t.engine, IF(ISNULL(c.constraint_name),'NOPK','') AS
nopk, IF(s.index_type = 'FULLTEXT','FULLTEXT','') AS ftidx,
IF(s.index_type = 'SPATIAL','SPATIAL','') AS gisidx
FROM information_schema.tables AS t LEFT JOIN
information_schema.key_column_usage AS c ON (t.table_schema
= c.constraint_schema AND t.table_name = c.table_name AND
c.constraint_name = 'PRIMARY') LEFT JOIN
information_schema.statistics AS s ON (t.table_schema =
s.table_schema AND t.table_name = s.table_name AND
s.index_type IN ('FULLTEXT','SPATIAL'))
WHERE t.table_schema NOT IN
('information_schema','performance_schema','mysql') AND
t.table_type = 'BASE TABLE' AND (t.engine <> 'InnoDB' OR
c.constraint_name IS NULL OR s.index_type IN
('FULLTEXT','SPATIAL'))
ORDER BY t.table_schema,t.table_name;
/* Add an auto-increment primary key */
ALTER TABLE nextcloud.oc_collres_accesscache ADD COLUMN `id` INT
PRIMARY KEY AUTO_INCREMENT;
Galera Best Practice
39
40. Copyright 2021 Severalnines AB
● Enable Redis to offload the database cluster from handling the
transactional file locking job.
● A performance boost on installations going from tens of thousands to
hundreds of thousands of users
○ E.g, when editing a single text file, the database writes decreases by
~40-50% (26 writes vs 14 writes).
○ Reduce write contention on table oc_file_locks.
● Package php-redis must be installed
○ If Redis is running on the same system as Nextcloud, configure it to
listen on an Unix socket.
○ Redis Cluster is also supported, while authentication requires php-redis
4.2.1+.
○ With cluster-mode enabled, there is no need to use a load balancer
since php-redis is already cluster-aware (php-redis 3.0.0+).
● You can also use Redis as PHP sessions handler (requires changes to
php.ini).
// Nextcloud config.php
'filelocking.enabled' => true,
'memcache.locking' => 'OCMemcacheRedis',
'memcache.distributed' => 'OCMemcacheRedis',
'redis' => [
'host' => '/var/run/redis/redis.sock',
'port' => 0,
],
// Nextcloud config.php
'redis.cluster' => [
'seeds' => [
'192.168.10.101:6379',
'192.168.10.103:6379',
'192.168.10.102:6379',
],
'timeout' => 0.0,
'read_timeout' => 0.0,
'failover_mode' => RedisCluster::FAILOVER_ERROR,
'password' => '',
],
// php.ini
session.save_handler = 'redis';// or 'rediscluster'
session.save_path = 'tcp://127.0.0.1:6379'
Caching
40