The continuous growth in the services QBerg delivers, and in the countries it delivers them to, demands ever more resources. Over the last year QBerg reached a critical point: it was storing so much transactional data that standard relational databases could no longer meet the SLAs, or support the features, its customers required. For example, web analytics had to be capped at a maximum of four months of history. Introducing MariaDB ColumnStore alongside the existing MariaDB Server databases will not only allow QBerg to store multiple years’ worth of historical data for analytics – it decreased overall processing time by an order of magnitude right off the bat. The move to a unified platform was incremental, using MariaDB MaxScale as both a router and a replicator. QBerg can now replicate full InnoDB schemas to MariaDB ColumnStore and incrementally update big tables without impacting the performance of ongoing transactions.
How QBerg scaled to store data longer, query it faster
1. How QBerg scaled to store data longer, query it faster
MariaDB OpenWorks 2019
2. QBerg: the company
▪ QBerg is a market research institute
▪ QBerg deals with price intelligence for consumer goods in Italy, the rest of Europe and Latin America.
▪ What we do:
- Collect the price & presence of products in stores, flyers, e-commerce sites and newsletters;
- Manage the collected data with automatic and human activity;
- Deliver aggregated data or raw data to our customers in many ways (portal with analysis and research functions, spreadsheets, alert e-mails, PPTx, CSV, etc.)
3. QPoint
▪ QBerg launched its new innovative app in early February 2019:
https://vimeo.com/channels/qpointeng/316057717
6. Master common schemas
▪ Common data (3GB)
▪ Store observations (3GB)
▪ Flyer observations (14GB)
▪ Web observations (70GB)
▪ User logs/actions (18GB)
▪ Third party catalogues (3GB)
▪ User segmentations (3GB)
TOTAL 114 GB (Master schema InnoDB tables)
7. «One-day» schemas (Datamarts)
▪ Every night batch procedures produce several «one-day»
databases (datamarts). These databases are used by users’
frontend and backend procedures that process and produce
outputs in several formats.
▪ A datamart is defined by:
▪ Type: store, web, flyer
▪ Time period: last 2 years, last 6 months, last 36 weeks, etc…
▪ Countries or regions: Italy, Spain, Colombia, etc
▪ Product Families: Flat TV, Washing machines, Bakery and pastries, etc.
▪ Current procedures make massive use of:
▪ CREATE TABLE <DMs> SELECT FROM <MASTER DB>
▪ INSERT INTO <DMs> SELECT FROM <MASTER DB>
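As a hedged illustration of that pattern (the table and column names here are hypothetical, not QBerg’s actual schema), a nightly web datamart build filtered by type, period, country and product family could look like:

```sql
-- Hypothetical sketch of a nightly datamart build; names are illustrative only.
CREATE TABLE dm_web_it_24m AS
SELECT o.product_id, o.site_id, o.price, o.observed_at
FROM master.web_observations o
WHERE o.country = 'IT'
  AND o.observed_at >= CURDATE() - INTERVAL 24 MONTH;

-- Incremental top-up on subsequent nights:
INSERT INTO dm_web_it_24m
SELECT o.product_id, o.site_id, o.price, o.observed_at
FROM master.web_observations o
WHERE o.country = 'IT'
  AND o.observed_at >= CURDATE() - INTERVAL 1 DAY;
```

Run against the full master schemas, statements of this shape are exactly the heavy ETL workload the following slides set out to offload.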
10. Issues
▪ General issues:
▪ The crawler queue was very heavy (a concurrency of 200)
▪ Having OLTP and OLAP operations on the same db machine is not a
good idea…
▪ Web datamarts:
▪ Creating them with the CREATE … SELECT ETL was very slow
▪ Customer queries were slow
▪ The number of periods (the historical time span) was too small
11. Targets
▪ Make customer queries faster
▪ Uncouple OLTP and OLAP operations
▪ Increase datamart periods (from 4 to 24 months of web prices)
12. Solution phase 1
▪ Introduced MariaDB AX using InnoDB and ColumnStore:
▪ InnoDB engine to manage the master schemas
▪ ColumnStore engine to manage the store and web datamart schemas
▪ Datamart schemas are produced with the current procedure and
copied from TX to AX with cpimport
▪ Introduced MariaDB MaxScale:
▪ Routes queries to TX (master/slave) or AX, based on the schema used by the query (matched with a regex)
▪ Duplicates DDL (Data Definition Language) statements on MariaDB AX
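A minimal sketch of what such regex-based routing could look like with MaxScale’s namedserverfilter (the filter name, server name and schema-name pattern are assumptions for illustration, not QBerg’s actual configuration):

```
# Hypothetical MaxScale filter section: queries matching the
# datamart schema naming pattern are diverted to the AX server;
# all other traffic follows the normal TX read/write split.
[DatamartToAX]
type=filter
module=namedserverfilter
match=dm_
server=AXServer
```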
14. Replicate Table On-the-fly
▪ When data needs to be merged between TX and AX, it is possible to copy it from TX to AX with a simple script like this:
mysql -h $DBSRC -q -e "$QUERY;" -N temp | cpimport -n1 -s '\t' $DBDST $TABLEDST
▪ Note:
▪ To be run on the AX (UM) server.
▪ The destination table must exist in advance
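Scaled up to several tables, the one-liner can be generated per table. A minimal sketch, which only prints the commands it would run (so it can be dry-run or piped to sh); the hosts, schema and table names are hypothetical:

```shell
#!/bin/sh
# Emit one TX-to-AX copy command per table. Printing instead of
# executing keeps the sketch self-contained; pipe to sh to run it.
DBSRC=tx-host        # hypothetical source TX server
SRCSCHEMA=master     # hypothetical source schema
DBDST=dm_web         # hypothetical destination ColumnStore schema

for TABLE in observations products stores; do
  printf '%s\n' "mysql -h $DBSRC -q -N -e 'SELECT * FROM $SRCSCHEMA.$TABLE' | cpimport -n1 -s '\t' $DBDST $TABLE"
done
```

In practice the table list would come from information_schema on the source server rather than being hard-coded.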
15. Replicate schema
▪ An entire schema can be replicated by running the script from the previous slide for every table of the source schema.
▪ It may be necessary to change datatypes:
▪ ENUM is not supported in ColumnStore -> CHAR could be good
▪ TIMESTAMP is not supported in ColumnStore -> DATETIME could be good
▪ MEDIUMINT is not supported in ColumnStore -> BIGINT could be good
▪ BINARY is not supported in ColumnStore -> BIGINT could be good
▪ Note:
▪ 246 tables imported in 1,136 secs (~19’)
▪ 43M rows table imported in 400 secs (6’40")
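The datatype substitutions above can be sketched as a small stream filter over the DDL. A minimal sketch assuming GNU/POSIX sed and uppercase type keywords; the CHAR(32) width is an arbitrary assumption, to be sized to the real ENUM values:

```shell
#!/bin/sh
# Rewrite InnoDB column types that ColumnStore does not support,
# following the substitution table above. CHAR(32) is a guess at a
# safe width; pick the real maximum ENUM label length in practice.
map_types() {
  sed -e 's/ENUM([^)]*)/CHAR(32)/g' \
      -e 's/TIMESTAMP/DATETIME/g' \
      -e 's/MEDIUMINT/BIGINT/g' \
      -e 's/BINARY([0-9]*)/BIGINT/g'
}

echo "price_flag ENUM('y','n'), seen_at TIMESTAMP, qty MEDIUMINT" | map_types
# prints: price_flag CHAR(32), seen_at DATETIME, qty BIGINT
```

Feeding each table’s SHOW CREATE TABLE output through such a filter yields DDL that can be applied on the AX side before running cpimport.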
16. Master-slave delay
▪ MaxScale implements a policy that routes queries to the slave only if the replication delay is under a threshold (configurable, e.g. 1 s)
▪ MaxScale polls the slave delay at a configurable interval (e.g. every 0.5 s)
▪ If you have a classic master-detail interface (a master list plus the details of the currently selected item), there are several ways to make sure the list includes the last inserted record:
▪ Insert a sleep into the application, waiting for the slave to catch up;
▪ Alter the statement so it is forcefully routed to the master (for example “<space>SELECT”, a leading space that bypasses the read-routing regex);
▪ Exclude reads from the slave altogether.
▪ QBerg has a lot of PHP code, written over more than 10 years. For the moment we chose the last option. We will route SELECT-only queries to the slave once the migration to the new application architecture is complete.
17. Production phase 2 (architecture diagram)
▪ Applications connect through MariaDB MaxScale.
▪ MaxScale sends writes (all data) and reads of current data to the MariaDB Server primary (InnoDB master schemas), which replicates to a MariaDB Server secondary used for backups.
▪ MaxScale sends writes (all data) and reads of historical data to the MariaDB Server UM of ColumnStore, whose PM storage holds the web/store datamarts.
▪ Data flows from InnoDB (master) to ColumnStore via replication and cpimport.
19. A team job
▪ 21 support requests in 5 months
▪ 8 different support engineers working to help
▪ Average resolution time for big issues: 4.5 days
▪ Found a bug in MaxScale 2.2.13 (MXS-2103), immediately resolved with a custom fix by Marko Mäkelä