Say Hello to MyRocks
Sergei Petrunia | Senior Software Engineer, MariaDB
sergey@mariadb.com
M|17, April 11-12th, 2017
What can be done about this?
Data vs Disk
● Put some data into the database
● How much is written to disk?
INSERT INTO tbl1 VALUES ('foo')
$$\text{amplification} = \frac{\text{size of data on disk}}{\text{size of data}}$$
Amplification
● Write amplification
● Space amplification
● Read amplification
$$\text{amplification} = \frac{\text{size of data on disk}}{\text{size of data}}$$
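The same ratio applies to each kind of amplification. Below is a minimal sketch; the counter names are hypothetical, and real numbers would come from OS or storage-engine statistics rather than a hand-filled record. Read amplification is often quoted in pages or I/Os per query instead of bytes; bytes are used here only to match the formula.

```python
from dataclasses import dataclass


@dataclass
class IoCounters:
    """Hypothetical byte counters collected for one workload."""
    data_bytes_written: int   # bytes the application asked to store
    disk_bytes_written: int   # bytes the storage engine actually wrote
    live_data_bytes: int      # logical size of the data
    disk_space_bytes: int     # space that data occupies on disk
    data_bytes_needed: int    # bytes the queries actually needed
    disk_bytes_read: int      # bytes read from disk to answer them

    def write_amplification(self) -> float:
        return self.disk_bytes_written / self.data_bytes_written

    def space_amplification(self) -> float:
        return self.disk_space_bytes / self.live_data_bytes

    def read_amplification(self) -> float:
        return self.disk_bytes_read / self.data_bytes_needed


# Assumption: INSERT INTO tbl1 VALUES ('foo') stores 3 bytes of data
# but causes one whole 16 KB page (InnoDB's default page size) to be written.
c = IoCounters(
    data_bytes_written=3, disk_bytes_written=16 * 1024,
    live_data_bytes=3, disk_space_bytes=16 * 1024,
    data_bytes_needed=3, disk_bytes_read=16 * 1024,
)
print(round(c.write_amplification(), 1))  # prints 5461.3
```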
Amplification in InnoDB
● B*-tree
● Read amplification
– Assume a random data lookup
– Locate the page, read it
– Root and other non-leaf pages are usually cached
– ~1 disk read for the leaf page
● => Read amplification is OK
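The "~1 disk read" figure follows from how shallow a B-tree is. A minimal sketch, assuming every page (internal or leaf) holds `fanout` entries and that the top `cached_levels` levels stay in the buffer pool; the fanout of 1000 is an illustrative number, not an InnoDB constant.

```python
def btree_height(n_rows: int, fanout: int) -> int:
    """Smallest number of levels such that fanout**levels >= n_rows."""
    height, capacity = 1, fanout
    while capacity < n_rows:
        capacity *= fanout
        height += 1
    return height


def disk_reads_per_lookup(n_rows: int, fanout: int, cached_levels: int) -> int:
    """Pages that must come from disk for one random point lookup."""
    return max(btree_height(n_rows, fanout) - cached_levels, 0)


# One billion rows, ~1000 entries per page: root + one internal level + leaves.
print(btree_height(10**9, 1000))              # 3
print(disk_reads_per_lookup(10**9, 1000, 2))  # 1: only the leaf page is read
```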
Write amplification in InnoDB
● Locate the page to update
● Read; modify; write
– Write more if the update caused a page split
● One page write per update!
– Changing a single int: page_size / sizeof(int)
● Write amplification is an issue
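To put a number on page_size / sizeof(int): InnoDB's default `innodb_page_size` is 16 KB, so rewriting a whole page to change one 4-byte int gives the ratio below. A minimal sketch; it ignores redo-log and doublewrite-buffer writes, and counting a page split as three page writes (old page, new page, parent) is a simplifying assumption.

```python
PAGE_SIZE = 16 * 1024  # InnoDB default innodb_page_size
SIZEOF_INT = 4


def page_write_amplification(changed_bytes: int,
                             pages_written: int = 1,
                             page_size: int = PAGE_SIZE) -> float:
    """Bytes written to disk per byte of data actually changed."""
    return pages_written * page_size / changed_bytes


print(page_write_amplification(SIZEOF_INT))                   # 4096.0
print(page_write_amplification(SIZEOF_INT, pages_written=3))  # 12288.0 if the update splits a page
```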
Space amplification in InnoDB
● Page fill factor < 100%
– to leave room for updates
● Compression is done per page
– Compressing bigger portions would be better
– Page alignment: a page compressed to 3 KB still occupies a 4 KB block
● => Space amplification is an issue
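The 3 KB-on-a-4 KB-block example works out as follows. A minimal sketch assuming compressed pages are stored in fixed 4 KB blocks; it leaves out the fill-factor effect, which would add further overhead on top.

```python
def on_disk_bytes(compressed_bytes: int, block_size: int = 4096) -> int:
    """Round a compressed page up to whole blocks, as page alignment requires."""
    blocks = -(-compressed_bytes // block_size)  # ceiling division
    return blocks * block_size


def alignment_overhead(compressed_bytes: int, block_size: int = 4096) -> float:
    """Space used on disk divided by the compressed size."""
    return on_disk_bytes(compressed_bytes, block_size) / compressed_bytes


print(on_disk_bytes(3 * 1024))                 # 4096
print(round(alignment_overhead(3 * 1024), 2))  # 1.33
```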
InnoDB amplification summary
● Read amplification is OK
● Write and space amplification are an issue
– Saturated IO
– Faster SSD wear-out
– Need more space on SSD
● => Low storage efficiency
LSM tree
Log-structured merge tree
● Writes go to
– Log
– MemTable
● MemTable is flushed to a Sorted String Table (SST)
● Data is written twice (log + SST)
– But only useful data
[Diagram: Write → Log + MemTable; MemTable flushed → SST]
Log-structured merge tree
● Writing produces more SSTs
● SSTs are immutable
● SSTs may hold multiple versions of the same data
[Diagram: Write → Log + MemTable → SST, SST, ...]
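The write path on these two slides can be modeled in a few lines. This is a toy sketch, not RocksDB's implementation: the class name and the flush-by-key-count rule are assumptions (a real MemTable is flushed by size), and "files" are plain Python lists. It shows both effects from the slides. Every write goes to the log, but a value overwritten while still in the MemTable never reaches an SST, and the same key can end up in several SSTs.

```python
class ToyLSMWriter:
    """Illustrative LSM write path: log + MemTable, flushed to immutable SSTs."""

    def __init__(self, memtable_limit: int):
        self.memtable_limit = memtable_limit  # flush after this many distinct keys
        self.log = []        # append-only log of (key, value)
        self.memtable = {}   # in-memory buffer, newest value per key
        self.ssts = []       # flushed SSTs (sorted lists), oldest first
        self.bytes_logged = 0
        self.bytes_flushed = 0

    def put(self, key: str, value: str) -> None:
        self.log.append((key, value))      # first write: the log
        self.bytes_logged += len(key) + len(value)
        self.memtable[key] = value         # overwrites stay in memory
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self) -> None:
        sst = sorted(self.memtable.items())  # second write: one sorted, immutable file
        self.bytes_flushed += sum(len(k) + len(v) for k, v in sst)
        self.ssts.append(sst)
        self.memtable = {}
        self.log = []  # the log is no longer needed once its data is in an SST


w = ToyLSMWriter(memtable_limit=2)
for k, v in [("a", "1"), ("a", "2"), ("b", "3"), ("a", "4"), ("c", "5")]:
    w.put(k, v)
print(w.ssts)                           # [[('a', '2'), ('b', '3')], [('a', '4'), ('c', '5')]]
print(w.bytes_logged, w.bytes_flushed)  # 10 8
```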
Reads in LSM tree
● Need to merge the data on read
– Read amplification suffers
● Should not have too many SSTs
[Diagram: Read → MemTable + SST, SST, ..., SST]
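A point read consults the MemTable and then the SSTs from newest to oldest until it finds the key, which is where the read amplification comes from. A toy sketch with sorted Python lists standing in for SST files; RocksDB can also use per-file Bloom filters and block indexes to skip most files, which this omits.

```python
from bisect import bisect_left


def lsm_get(key, memtable, ssts):
    """Point lookup: MemTable first, then SSTs from newest to oldest.

    ssts is a list of sorted (key, value) lists, oldest first.
    Returns (value or None, number of SSTs probed).
    """
    if key in memtable:
        return memtable[key], 0
    probed = 0
    for sst in reversed(ssts):        # newest SST holds the newest version
        probed += 1
        i = bisect_left(sst, (key,))  # first entry whose key is >= key
        if i < len(sst) and sst[i][0] == key:
            return sst[i][1], probed
    return None, probed


memtable = {"c": 3}
ssts = [[("a", 1), ("f", 10)], [("f", 20), ("z", 26)]]
print(lsm_get("f", memtable, ssts))  # (20, 1): newest version wins
print(lsm_get("a", memtable, ssts))  # (1, 2): had to probe both SSTs
print(lsm_get("q", memtable, ssts))  # (None, 2): a miss probes every SST
```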
Compaction
● Merge multiple SSTs into one
● Removes old data versions
● Reduces the number of files
● Write amplification++ :-(
[Diagram: SST, SST, ..., SST merged into one SST]
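Compaction is essentially a k-way merge of sorted runs that keeps only the newest version of each key. A toy sketch under the same in-memory model (sorted lists standing in for SSTs, oldest first); deletions and tombstones, which real compaction must also handle, are not modeled.

```python
import heapq


def compact(ssts):
    """Merge sorted SSTs (oldest first) into one SST with only the newest version of each key."""
    # Tag entries with -seq (higher seq = newer SST) so that, for equal keys,
    # the entry from the newest SST comes first in the merged stream.
    tagged = [[(key, -seq, value) for key, value in sst] for seq, sst in enumerate(ssts)]
    merged = []
    for key, _, value in heapq.merge(*tagged):
        if merged and merged[-1][0] == key:
            continue  # an older version of a key we already kept: drop it
        merged.append((key, value))
    return merged


old = [("a", 1), ("f", 10), ("k", 11)]
new = [("f", 20), ("z", 26)]
print(compact([old, new]))  # [('a', 1), ('f', 20), ('k', 11), ('z', 26)]
```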
Compaction considerations
● Find the sweet spot
– Reduce # SSTs
– Don’t compact too often
● Be efficient
– Compact files of similar size
– Remove duplicate versions ASAP
● Many strategies
– Leveled
– Size-tiered
– ...
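"Compact files of similar size" is the core idea of size-tiered compaction. A simplified sketch of the bucketing step only; the `ratio` and `min_files` thresholds are hypothetical parameters, and this is not RocksDB's own compaction picker.

```python
def similar_size_groups(sizes, ratio=1.5, min_files=4):
    """Group file sizes that are within `ratio` of the smallest file in the group.

    Only groups with at least `min_files` files are returned as compaction candidates.
    """
    groups, current = [], []
    for size in sorted(sizes):
        if current and size > current[0] * ratio:
            if len(current) >= min_files:
                groups.append(current)
            current = []
        current.append(size)
    if len(current) >= min_files:
        groups.append(current)
    return groups


sizes_mb = [5, 5, 6, 5, 50, 52, 48, 55, 500]
print(similar_size_groups(sizes_mb))  # [[5, 5, 5, 6], [48, 50, 52, 55]]
```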
Leveled Compaction
[Diagram: MemTable; L0 with overlapping SSTs (a-c, a-f, b-g, ...); L1 with a-e, f-i, j-n, o-z; L2 and further levels below; level size labels 5 MB, 50 MB, 500 MB]
Leveled Compaction
[Diagram: same levels, highlighting a lookup read('f')]
Leveled Compaction
[Diagram: same levels, two further animation steps]
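In leveled compaction, L0 files may overlap, while each deeper level is split into non-overlapping key ranges. A point lookup such as read('f') therefore checks every L0 file whose range covers the key and then at most one file per level. A toy sketch of that file selection; the L0 and L1 ranges come from the diagram, while the L2 ranges are made up for the example.

```python
from bisect import bisect_right


def files_to_check(key, levels):
    """Return (level, key_range) for every file a point lookup of `key` must consult.

    levels[0] holds L0 files, whose key ranges may overlap.
    levels[1:] hold sorted, non-overlapping key ranges, one list per level.
    """
    hits = [(0, r) for r in levels[0] if r[0] <= key <= r[1]]
    for level, files in enumerate(levels[1:], start=1):
        starts = [lo for lo, _ in files]
        i = bisect_right(starts, key) - 1  # last file starting at or before key
        if i >= 0 and key <= files[i][1]:
            hits.append((level, files[i]))
    return hits


levels = [
    [("a", "c"), ("a", "f"), ("b", "g")],              # L0 (from the diagram)
    [("a", "e"), ("f", "i"), ("j", "n"), ("o", "z")],  # L1 (from the diagram)
    [("a", "d"), ("e", "m"), ("n", "z")],              # L2 (made-up ranges)
]
print(files_to_check("f", levels))
# [(0, ('a', 'f')), (0, ('b', 'g')), (1, ('f', 'i')), (2, ('e', 'm'))]
```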
LSM Tree summary
● LSM architecture
– Data is stored in the log, then in SST files
– Writes to SST files are sequential and efficient
– Better compression
● Reads
– Have to read from multiple SST files
– The compaction process merges SST files
● Efficiency
– Write amplification is reduced
– Space amplification is reduced
– Read amplification increases
RocksDB
RocksDB
● “An embeddable key-value store for fast storage environments”
● Uses LSM architecture
– Leveled compaction
– Server-grade
● Initially a fork of LevelDB
● Developed at Facebook
– First release in 2012
– Used at Facebook and many other companies
RocksDB properties
● Embedded library
● Stores (key, value) pairs
– No data types
– No secondary indexes
– No SQL-like tables
● Column families are similar to tablespaces
● No replication support
– There is a 3rd-party add-on
● Efficient, but hard to work with
MyRocks
MyRocks
● A MySQL storage engine
● Uses RocksDB for storage
● Implements on top of RocksDB:
– Secondary indexes
– Data types
– SQL transactions
– …
● Developed* and used by Facebook
– *with some MariaDB involvement
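MyRocks builds tables and indexes on top of plain key-value pairs: each index entry's key starts with an internal index id followed by the indexed columns in a byte-comparable form, and secondary index keys also carry the primary key. The sketch below illustrates only that general idea; it is not MyRocks' actual record format, and the index ids, the value layout, and the helper names are hypothetical.

```python
import struct

PK_INDEX_ID = 256   # hypothetical index id of the primary key
AGE_INDEX_ID = 257  # hypothetical index id of a secondary index on `age`


def memcmp_int(v: int) -> bytes:
    """Encode a signed 32-bit int so that byte order equals numeric order."""
    return struct.pack(">I", v + 2**31)  # shift into 0..2**32-1, big-endian


def decode_int(b: bytes) -> int:
    return struct.unpack(">I", b)[0] - 2**31


def row_to_kv(row_id: int, age: int, name: str):
    """Turn one row (row_id PRIMARY KEY, age, name) into key-value pairs."""
    pk_key = struct.pack(">I", PK_INDEX_ID) + memcmp_int(row_id)
    pk_value = memcmp_int(age) + name.encode("utf-8")  # the rest of the row
    sk_key = struct.pack(">I", AGE_INDEX_ID) + memcmp_int(age) + memcmp_int(row_id)
    return [(pk_key, pk_value), (sk_key, b"")]


rows = [(1, 30, "ann"), (2, 25, "bob"), (3, -5, "cy")]
kv = sorted(pair for row in rows for pair in row_to_kv(*row))
# A range scan over the secondary-index prefix returns rows in `age` order.
prefix = struct.pack(">I", AGE_INDEX_ID)
ids_by_age = [decode_int(k[-4:]) for k, _ in kv if k.startswith(prefix)]
print(ids_by_age)  # [3, 2, 1]
```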
Size amplification benchmark
● Benchmark data and chart from Facebook
● LinkBench run, 24 hours
Write amplification benchmark
● Benchmark data and chart from Facebook
● LinkBench
QPS
● Benchmark data and chart from Facebook
● LinkBench
QPS on in-memory workload
● Benchmark data and chart from Facebook
● Sysbench read-write, in-memory
● MyRocks doesn’t always beat InnoDB
QPS
● Benchmark data and chart from Facebook
● LinkBench
A real efficiency test
“For a certain workload we could reduce the number of MySQL servers by 50%”
– Yoshinori Matsunobu
Another write amplification test
● InnoDB vs. MyRocks serving 2x data
[Chart: InnoDB vs. 2x RocksDB, y-axis 0 to 12; stacked components: Flash GC, Binlog / Relay log, Storage engine]
Cumulative response time
[Chart: cumulative response time for RocksDB x2 and InnoDB; axis in kilo-queries, 0 to 300,000]
CPU usage is higher with MyRocks
[Chart: CPU usage over time (0% to 100%) for InnoDB and 2x RocksDB; annotated levels 80% and 50-60%]
MyRocks limitations
● Transactional storage engine
– REPEATABLE READ, READ COMMITTED
– No SERIALIZABLE
● Must use row-based replication
● No cross-engine transactions
● A transaction must fit in memory
● Online DDL is more limited than in InnoDB
MyRocks availability
● Part of github.com/facebook/mysql-5.6
● No binaries
● No packages
● Facebook’s branch of MySQL
– Special extensions
– Special ways to compile, run tests, etc.
● Not easy to use
MyRocks in MariaDB
MyRocks in MariaDB
● New technology
● Built and used at Facebook’s scale
● Adoption
● Packaging
● Community
● MariaDB features
● ...
Getting MyRocks into MariaDB
● Port MyRocks into MariaDB 10.2
– Decouple it from facebook/mysql-5.6 features
– Make it work with MariaDB’s features
● Set up a merge process
– Need to follow Facebook’s progress
● Set up the process to build packages
● Documentation
● Expertise
● ...
Getting MyRocks into MariaDB
[Same list as the previous slide, annotated with status marks: ✘ ✔ ✘ ✘✔ ✔ ✔ ✔]
Current status
● “MariaDB 10.2.5 RC includes an ALPHA version of MyRocks storage engine”
● It’s a loadable plugin (ha_rocksdb.so)
● Packages
– Bintar, deb, rpm, win64 zip + MSI
– Only for recent OS versions, because RocksDB requires recent compilers
● Not all features work yet
– Optimizer and SQL features work
– Replication/binlog features don’t work yet
“Is it stable?”
● The components are stable
– MyRocks + RocksDB run in production at Facebook
– RocksDB is also used elsewhere
– MyRocks, not much yet
● Integration points with MariaDB
– Some are stable
– Some are [nearly] missing
[Diagram: MyRocks, MariaDB, RocksDB]
Plans
● Finish the missing pieces
– Storage engine + binlog integration
● Improve support for multiple storage engines
– MDEV-12179
● Increase maturity
– Pass the tests
– More test coverage
– Benchmarks
● Documentation
References
● https://www.slideshare.net/profyclub_ru/making-the-case-for-writeoptimized-database-algorithms-mark-callaghan-facebook
● https://code.facebook.com/posts/190251048047090/myrocks-a-space-and-write-optimized-mysql-database
● https://www.facebook.com/atscaleevents/videos/1775545312718565/
● ...
Further information
● 10.00 Real World: Deploying MyRocks in Production at Facebook, Yoshinori Matsunobu
● http://myrocks.io
● https://mariadb.com/kb/en/mariadb/myrocks-in-mariadb/
Thanks!
Q&A
