At Pinterest, the ProxySQL infrastructure fronts numerous heterogeneous databases. Due to the high number of unique configurations and dynamic nature of cloud deployments, it’s a challenge to reliably provision, change, auto-scale, and monitor ProxySQL servers. Applying Infrastructure as Code principles using Terraform, we made it possible to manage such a large fleet of ProxySQL instances confidently. Come learn how we automated provisioning, testing, and monitoring of ProxySQL at scale.
11. What module creates
resource "aws_route53_record" "elb" {
  name    = "${var.backend_type}.proxysql"
  type    = "CNAME"
  ttl     = local.dns_ttl
  zone_id = data.aws_route53_zone.nlb_zone.id
  records = [
    aws_lb.proxysql.dns_name
  ]
}
auto_scaling.tf dns.tf
load_balance.tf outputs.tf
12. What module creates
output "nlb_dns_name" {
  value = aws_lb.proxysql.dns_name
}

output "nlb_cname" {
  value = aws_route53_record.elb.name
}
13. Healthchecks
Two decisions to make:
1. Shall I forward traffic to X?
2. Shall I terminate X?
resource "aws_autoscaling_group" "proxysql_zero" {
  …
  health_check_type = "EC2"  # or "ELB"
  …
}
14. ELB type Healthchecks
● Traffic port vs health check port
● TCP connection success?
○ Yes
■ Forward traffic
○ No
■ Do not forward traffic
■ Terminate instance
Ports in the diagram: :8080, :3306
15. EC2 type Healthchecks
● Traffic port == health check port
● TCP connection success?
○ Yes
■ Forward traffic
○ No
■ Do not forward traffic
● ProxySQL healthcheck agent
Ports in the diagram: :8080, :3306
try:
    self.check_health(cursor)
    self.check_at_least_one_live(cursor)
    self.check_whg_count(cursor)
    self.check_instance_state(local_instance)
    self.mark_healthy(local_instance)
except Exception:  # any failed check marks the instance unhealthy
    self.mark_unhealthy(local_instance)
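The two health check types map onto the two decisions differently. The sketch below restates slides 14 and 15 as a small decision function. `Decision`, `decide`, and the `agent_marked_unhealthy` flag are hypothetical names, and the EC2 branch assumes that termination is driven by the ProxySQL healthcheck agent rather than by the load balancer probe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    forward_traffic: bool  # should the load balancer send traffic to the instance?
    terminate: bool        # should the Auto Scaling group replace the instance?


def decide(health_check_type, tcp_check_ok, agent_marked_unhealthy=False):
    """Combine a health check type and probe result into the two decisions.

    health_check_type: "ELB" or "EC2", as set on the Auto Scaling group.
    tcp_check_ok: result of the load balancer's TCP probe.
    agent_marked_unhealthy: hypothetical flag standing in for the verdict
        of the ProxySQL healthcheck agent (only used for "EC2").
    """
    if health_check_type not in ("ELB", "EC2"):
        raise ValueError(f"unknown health check type: {health_check_type!r}")
    # In both cases a failed probe stops traffic to the instance.
    forward = tcp_check_ok
    if health_check_type == "ELB":
        # A failed load balancer probe also gets the instance terminated.
        terminate = not tcp_check_ok
    else:
        # Assumption: termination is left to the healthcheck agent.
        terminate = agent_marked_unhealthy
    return Decision(forward_traffic=forward, terminate=terminate)
```

With the EC2 type, a failed TCP probe alone only takes the instance out of rotation, which leaves room for the agent to apply richer checks before an instance is replaced.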
16. Zero downtime migration
● Needed when changing:
○ AMI, subnet, security group, IAM role
○ Example: ami-12345678 -> ami-87654321
● Blue/green deployment
● Must be tested to prove it works
20. Zero downtime migrations: How to test
Must be tested (tests/zero_downtime/test_zero_downtime.py)
Plan:
● ProxySQL acts as an HTTP server
● Create pool A
● Start a probe in a loop
● Create pool B
● Watch for probe errors
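The probe in this plan can be sketched as follows. The slides do not show the probe itself, so this is only one possible shape, assuming it is called as `ping(url, que)`, is stopped with SIGTERM, and reports a dict with an `error` counter. `probe_once` and the `interval` and `check` parameters are hypothetical.

```python
import signal
import time
from http.client import HTTPException
from urllib.request import urlopen


def probe_once(url, timeout=2.0):
    """Return True if one HTTP GET against url answers with status 200."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, HTTPException):
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False


def ping(url, que, interval=0.1, check=probe_once):
    """Probe url in a loop until SIGTERM arrives, then put the counters on que."""
    stop = {"requested": False}

    def _on_sigterm(signum, frame):
        stop["requested"] = True

    previous = signal.signal(signal.SIGTERM, _on_sigterm)
    result = {"ok": 0, "error": 0}
    try:
        while not stop["requested"]:
            result["ok" if check(url) else "error"] += 1
            time.sleep(interval)
    finally:
        # Restore whatever handler was installed before the probe started.
        signal.signal(signal.SIGTERM, previous)
    que.put(result)
```

Catching SIGTERM instead of dying immediately lets the probe hand its counters back to the parent process before it exits, so the test can assert that no request failed while pool B replaced pool A.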
21. Testing a module: terraform-ci
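from multiprocessing import Process, Queue
from os import path as osp

from terraform_ci import terraform_apply


# terraform_dir, configuration_template and ping are not shown on the slide.
def test_zero_downtime(ec2_client):
    # Pool A
    with open(osp.join(terraform_dir, "configuration.tfvars"), "w") as fp:
        fp.write(configuration_template.format(payload="old payload"))
    with terraform_apply(terraform_dir) as tf_output:
        proxysql_nlb = tf_output["proxysql_nlb"]["value"]
        # Probe
        url = f"http://{proxysql_nlb}"  # assumption: the slide does not show how url is built
        que = Queue()
        child = Process(target=ping, args=(url, que))
        child.start()  # not shown on the slide; the probe must run while pool B is created
        # Pool B
        with open(osp.join(terraform_dir, "configuration.tfvars"), "w") as fp:
            fp.write(configuration_template.format(payload="new payload"))
        with terraform_apply(terraform_dir):
            child.terminate()
            result = que.get()
            assert result["error"] == 0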
terraform-ci · PyPI
39. Source of Truth(s)
● SourceOfTruth(ABC):
● SOTLocalProxy(SourceOfTruth):
● SOTRemoteMySQL(SourceOfTruth):
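A minimal sketch of what such a hierarchy could look like. Only the class name `SourceOfTruth`, the `read(table, backend_type)` call, the `write(pslist)` call, and the `version` attribute appear in the slides. `VersionedRows` and `SOTInMemory` are hypothetical stand-ins, and the real classes may look quite different.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class VersionedRows:
    """Hypothetical stand-in for the versioned list returned by read()."""
    table: str
    backend_type: str
    version: int = 0
    rows: list = field(default_factory=list)


class SourceOfTruth(ABC):
    """Common interface for every place a ProxySQL config can live."""

    @abstractmethod
    def read(self, table, backend_type):
        """Return the VersionedRows stored for table and backend_type."""

    @abstractmethod
    def write(self, pslist):
        """Store pslist under its own table and backend_type."""


class SOTInMemory(SourceOfTruth):
    """Hypothetical dict-backed implementation, useful only for tests."""

    def __init__(self):
        self._store = {}

    def read(self, table, backend_type):
        # An unknown table reads as an empty list at version 0.
        key = (backend_type, table)
        return self._store.get(key, VersionedRows(table, backend_type))

    def write(self, pslist):
        self._store[(pslist.backend_type, pslist.table)] = pslist
```

Because the local ProxySQL and the remote MySQL share one interface, the agent can copy a table from one to the other without knowing which backend it is talking to.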
40. ProxySQL Manager
# proxysql-manager
Usage: proxysql-manager [OPTIONS] COMMAND [ARGS]...

  ProxySQL manager helps to manage local ProxySQL instance

Options:
  --debug         Print debug messages
  -q, --quiet     Print only errors
  --logfile TEXT  Log messages to this file  [default: /var/log/proxysql-manager.log]
  --version       Show the version and exit.
  --help          Show this message and exit.

Commands:
  admin                 ProxySQL admin credentials
  deregister-server     Deregister MySQL server from a ProxySQL Source of...
  deregister-unknown    Deregister non-ZK MySQL servers.
  expert                ProxySQL manager advanced commands.
  generate-config       Generate ProxySQL config and save it on disk.
  is-server-registered  The tool will inspect the latest ProxySQL config...
  register-server       Register MySQL server in a ProxySQL Source of Truth.
  show-backends         Show all backend types and their NLB DNS name.
  show-instance         Show ProxySQL instance for given backend_type
  show-instances        Same as ``proxysql-manager show-instance`` command.
  start-agent           Update latest configuration to local ProxySQL.
  start-healthcheck     Health check for upstream NLB.
  sync-sot              Synchronize ProxySQL config to remote Source of...
41. ProxySQL Manager: Agent
def _watch_table(self, klass, interval):
    """
    Start agent daemon.

    :param klass: Table class to watch
    :param int interval: Time in seconds between SoT checks.
    """
    while True:
        with global_lock():
            if klass == PSMysqlUsers:
                self._sync_mysql_users_with_knox()
            else:
                self.update_one_table(klass)
        sleep(interval)
42. ProxySQL Manager: Agent (cont)
def update_one_table(self, tblclass):
    try:
        pslist = self._source_of_truth.read(tblclass, self._backend_type)
        if pslist.version and pslist.version > self.versions[tblclass]:
            # pslist is empty or pslist has newer version
            LOG.info(
                "It is a new version %d for table %s",
                pslist.version,
                tblclass.TABLE,
            )
            self._local_proxysql.write(pslist)
            self.versions[tblclass] = pslist.version
            self.update_versions_file(tblclass=tblclass, version=pslist.version)
        self._handle_fallback(pslist)
    # (the except clause is not shown on the slide)
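The version gate above only helps across restarts if the applied versions are persisted. The sketch below shows one way that could work. The slides only show the `update_versions_file` call, so the JSON file format and the names `load_versions`, `save_version`, and `should_apply` are assumptions.

```python
import json
from pathlib import Path


def load_versions(path):
    """Return the table -> version mapping saved at path, or {} if absent."""
    path = Path(path)
    if not path.exists():
        return {}
    return json.loads(path.read_text())


def save_version(path, table, version):
    """Record version for table, writing a temp file and renaming it."""
    versions = load_versions(path)
    versions[table] = version
    tmp = Path(str(path) + ".tmp")
    tmp.write_text(json.dumps(versions))
    # Renaming avoids leaving a half-written versions file behind.
    tmp.replace(path)


def should_apply(remote_version, applied_version):
    """Mirror the slide's check: apply only a set, strictly newer version."""
    return bool(remote_version) and remote_version > applied_version
```

A restarted agent would call `load_versions` once at startup and skip any table whose remote version is not strictly newer than the saved one.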
43. ProxySQL Manager: Healthcheck
def watcher(self, credentials):
    while True:
        try:
            with connection.cursor() as cursor:
                self.check_health(cursor)
                self._check_at_least_one_live(cursor)
                self._check_whg_count(cursor)
                self._check_instance_state(local_instance)
                self._mark_healthy(local_instance)
        except (
            ProxySQLNotHealthy,
            MySQLError,
            ConnectionRefusedError,
        ) as err:
            self._unhealthy_count += 1
            LOG.error(err)
            if self._unhealthy_count > self._healthy_threshold:
                self._mark_unhealthy(local_instance)
            else:
                LOG.warning("Health check failed %d times", self._unhealthy_count)
        sleep(self._probe_interval)
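The threshold logic in the watcher can be isolated into a small counter, which makes it easy to test without a running ProxySQL. `FailureCounter` is a hypothetical name. Resetting the streak after a successful probe is an assumption, since the slide does not show what happens to the counter on success.

```python
class FailureCounter:
    """Count consecutive failed probes and flag when a threshold is passed."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.failures = 0

    def record(self, probe_ok):
        """Record one probe result.

        Returns True when the instance should be marked unhealthy, that is
        when the number of consecutive failures exceeds the threshold.
        """
        if probe_ok:
            # Assumption: a successful probe resets the streak.
            self.failures = 0
            return False
        self.failures += 1
        return self.failures > self.threshold
```

With a threshold of 2, the first two failures only produce warnings and the third one marks the instance unhealthy, which keeps a single transient error from removing a ProxySQL server from the pool.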