Use EXPLAIN to profile the query execution plan
Use the Slow Query Log (always have it on!)
Don’t use DISTINCT when you have or could use GROUP BY
Use proper data partitions
1. For Cluster: start thinking about Cluster *before* you need them
Insert performance
1. Batch INSERT and REPLACE
2. Use LOAD DATA instead of INSERT
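A minimal sketch of the batching idea, assuming a hypothetical `log` table. Python's built-in sqlite3 module is used here only as a stand-in so the example runs anywhere; the multi-row `INSERT ... VALUES (...), (...)` form is the same shape you would send to MySQL through your own driver.

```python
import sqlite3

def batched_insert(conn, rows, batch_size=100):
    """Insert (id, msg) rows using one multi-row INSERT per batch.

    Returns the number of INSERT statements that were executed.
    """
    statements = 0
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        # One "(?, ?)" group per row, all sent in a single statement.
        placeholders = ", ".join(["(?, ?)"] * len(batch))
        params = [value for row in batch for value in row]
        conn.execute(f"INSERT INTO log (id, msg) VALUES {placeholders}", params)
        statements += 1
    conn.commit()
    return statements

# Hypothetical table and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (id INTEGER PRIMARY KEY, msg TEXT)")
rows = [(i, f"event {i}") for i in range(250)]
statements_run = batched_insert(conn, rows)
row_count = conn.execute("SELECT COUNT(*) FROM log").fetchone()[0]
```

Here 250 rows go in with 3 statements instead of 250. The batch size is an assumption to tune against your own packet and parameter limits.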
LIMIT m,n may not be as fast as it sounds (the server still reads and discards the first m rows)
Don’t use ORDER BY RAND() if you have > ~2K records
Use SQL_NO_CACHE when you are SELECTing frequently updated data or large sets of
data
avoid wildcards at the start of LIKE queries
avoid correlated subqueries in the SELECT and WHERE clauses (try to avoid IN)
config params --
no calculated comparisons -- isolate indexed columns
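The "isolate indexed columns" tip can be sketched as follows. This uses SQLite's `EXPLAIN QUERY PLAN` through Python's sqlite3 module as a stand-in for MySQL's EXPLAIN, so the plan text differs from what MySQL prints; the `orders` table and `idx_created` index are hypothetical. The point it illustrates is the same: arithmetic on the indexed column hides it from the index, while moving the arithmetic to the constant side does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, created INTEGER, note TEXT)"
)
conn.execute("CREATE INDEX idx_created ON orders (created)")

def plan(sql):
    """Return the plan detail text for a query (last column of each row)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

# Calculation applied to the indexed column: the index cannot be used.
calculated = plan("SELECT note FROM orders WHERE created - 86400 = 1700000000")

# Same condition with the column isolated: the index can be used.
isolated = plan("SELECT note FROM orders WHERE created = 1700000000 + 86400")

print(calculated)
print(isolated)
```

In MySQL the equivalent check is to run EXPLAIN on both forms and compare the `key` and `rows` columns.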
innodb_flush_log_at_trx_commit=0 can help reduce slave lag (at some cost in durability)
ORDER BY and LIMIT work best with equalities and covered indexes
isolate workloads: don’t let administrative work (e.g. backups) interfere with customer
performance
use optimistic locking, not pessimistic locking. try to use shared locks, not exclusive
locks (LOCK IN SHARE MODE vs. FOR UPDATE)
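One common way to do optimistic locking is a version column: read the row without locking it, then make the UPDATE conditional on the version you read. The sketch below assumes a hypothetical `account` table and uses Python's sqlite3 module as a stand-in connection; the UPDATE itself is plain SQL that MySQL accepts as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER, version INTEGER)"
)
conn.execute("INSERT INTO account VALUES (1, 100, 1)")
conn.commit()

def update_balance(conn, account_id, new_balance, expected_version):
    """Write only if nobody changed the row since we read expected_version."""
    cur = conn.execute(
        "UPDATE account SET balance = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_balance, account_id, expected_version),
    )
    conn.commit()
    # rowcount is 0 when the version no longer matches (someone else won).
    return cur.rowcount == 1

first = update_balance(conn, 1, 150, expected_version=1)   # succeeds
stale = update_balance(conn, 1, 175, expected_version=1)   # stale version, rejected
final = conn.execute("SELECT balance, version FROM account WHERE id = 1").fetchone()
```

When the update is rejected, the caller re-reads the row and retries. No lock is held between the read and the write.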
use row-level instead of table-level locking for OLTP workloads
Know your storage engines and what performs best for your needs, know that different
ones exist.
1. use MERGE tables or ARCHIVE tables for logs
Optimize for data types, use consistent data types. Use PROCEDURE ANALYSE() to help
determine whether a smaller type would do
separate text/blobs from metadata, don’t put text/blobs in results if you don’t need them
if you can, compress text/blobs
compress static data
don’t back up static data as often
derived tables (subqueries in the FROM clause) can be useful for retrieving BLOBs w/out
sorting them. (a self-join can speed up a query if the 1st part finds the IDs and the 2nd
part uses them to fetch the rest)
enable and increase the query and buffer caches if appropriate
ALTER TABLE…ORDER BY can take chronological data and re-order it by a different field
InnoDB ALWAYS keeps the primary key as part of each index, so do not make the primary
key very large
1. Be careful of redundant columns in an index; avoiding them can make the query faster
2. Do not duplicate indexes
Utilize different storage engines on master and slave, e.g. if you need fulltext indexing
on a table.
BLACKHOLE engine and replication is much faster than FEDERATED tables for things like
logs.
Design sane query schemas. Don’t be afraid of table joins, often they are faster than
denormalization
Don’t use boolean flags
Use a clever key and ORDER BY instead of MAX
Keep the database host as clean as possible. Do you really need a windowing system on
that server?
Utilize the strengths of the OS
Hire a MySQL ™ Certified DBA
Know that there are many consulting companies out there that can help, as well as
MySQL’s Professional Services.
Config variables & tips:
1. use one of the supplied config files
2. key_buffer, unix cache (leave some RAM free), per-connection variables, innodb
memory variables
3. be aware of global vs. per-connection variables
4. check SHOW STATUS and SHOW VARIABLES (GLOBAL|SESSION in 5.0 and up)
5. be aware of swapping esp. with Linux, “swappiness” (bypass OS filecache for innodb
data files, innodb_flush_method=O_DIRECT if possible (this is also OS specific))
6. defragment tables, rebuild indexes, do table maintenance
7. If you use innodb_flush_log_at_trx_commit=1, use a battery-backed hardware write cache
controller
8. more RAM is good, and so is faster disk speed
9. use 64-bit architectures
Know when to split a complex query and join smaller ones
Debugging sucks, testing rocks!
Delete small amounts at a time if you can
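A sketch of deleting in small chunks, assuming a hypothetical `log` table with a `created` column. It runs against SQLite through Python's sqlite3 module, which is why the chunk is selected with `id IN (SELECT ... LIMIT ?)`; in MySQL you would normally write `DELETE FROM log WHERE created < ? LIMIT n` directly.

```python
import sqlite3

def delete_in_chunks(conn, cutoff, chunk_size=100):
    """Delete rows older than cutoff a chunk at a time.

    Returns (rows_deleted, non_empty_batches).
    """
    total = 0
    batches = 0
    while True:
        cur = conn.execute(
            "DELETE FROM log WHERE id IN "
            "(SELECT id FROM log WHERE created < ? LIMIT ?)",
            (cutoff, chunk_size),
        )
        if cur.rowcount == 0:
            break
        # Commit per chunk so each transaction (and its locks) stays short.
        conn.commit()
        total += cur.rowcount
        batches += 1
    return total, batches

# Hypothetical data: 1000 rows, of which 700 fall below the cutoff.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (id INTEGER PRIMARY KEY, created INTEGER)")
conn.executemany(
    "INSERT INTO log (id, created) VALUES (?, ?)", [(i, i) for i in range(1000)]
)
conn.commit()
deleted, batches = delete_in_chunks(conn, cutoff=700)
remaining = conn.execute("SELECT COUNT(*) FROM log").fetchone()[0]
```

The chunk size is an assumption; pick one small enough that each statement finishes quickly on your hardware.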
Archive old data -- don’t be a pack-rat! 2 common engines for this are ARCHIVE tables and
MERGE tables
use INET_ATON and INET_NTOA for IP addresses, not char or varchar
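To show what the conversion buys you, here is a small Python sketch of the same mapping that INET_ATON and INET_NTOA perform for IPv4 addresses. The helper names are illustrative and simply mirror the SQL function names; in MySQL the resulting value fits in an INT UNSIGNED column (4 bytes) instead of a string of up to 15 characters.

```python
import ipaddress

def inet_aton(dotted):
    """Dotted-quad IPv4 string -> unsigned 32-bit integer."""
    return int(ipaddress.IPv4Address(dotted))

def inet_ntoa(number):
    """Unsigned 32-bit integer -> dotted-quad IPv4 string."""
    return str(ipaddress.IPv4Address(number))

as_int = inet_aton("192.168.0.1")   # 192*2**24 + 168*2**16 + 0*2**8 + 1
round_trip = inet_ntoa(as_int)
print(as_int, round_trip)
```

Because the integers sort in address order, a subnet lookup becomes a simple range condition on an indexed integer column.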
make it a habit to REVERSE() email addresses, so you can easily search domains
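A sketch of why reversing helps: once the address is stored reversed, "everything at this domain" becomes a prefix match, which an index can serve, rather than a leading-wildcard LIKE. The `users` table and `email_rev` column are hypothetical, the reversal is done in Python instead of with REVERSE(), and sqlite3 is only a stand-in connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT, email_rev TEXT)")
conn.execute("CREATE INDEX idx_email_rev ON users (email_rev)")

emails = ["ann@example.com", "bob@example.org", "cy@example.com"]
# Store each address alongside its reversed form.
conn.executemany(
    "INSERT INTO users VALUES (?, ?)", [(e, e[::-1]) for e in emails]
)

def find_by_domain(domain):
    """Find addresses at a domain with a prefix match on the reversed column.

    Note: a domain containing % or _ would need escaping for LIKE.
    """
    prefix = ("@" + domain)[::-1]          # "@example.com" -> "moc.elpmaxe@"
    rows = conn.execute(
        "SELECT email FROM users WHERE email_rev LIKE ? ORDER BY email",
        (prefix + "%",),
    )
    return [row[0] for row in rows]

matches = find_by_domain("example.com")
print(matches)
```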
use --skip-name-resolve (avoids a DNS lookup on every connection)
increase myisam_sort_buffer_size to optimize large inserts (this is a per-connection
variable)
look up memory tuning parameter for on-insert caching
increase temp table size (tmp_table_size) in a data warehousing environment (default is
32MB) so it doesn’t write to disk (also constrained by max_heap_table_size, default 16MB)
Normalize first, and denormalize where appropriate.
Databases are not spreadsheets, even though Access really really looks like one. Then
again, Access isn’t a real database
In 5.1 BOOL/BIT NOT NULL type is 1 bit, in previous versions it’s 1 byte.
1. A nullable column can take more room to store than a NOT NULL one
Choose appropriate character sets & collations -- UTF16 will store each character in at
least 2 bytes, whether it needs them or not; latin1 is faster than UTF8.
make similar queries consistent so cache is used
Don’t use deprecated features
Use Triggers wisely
Run in SQL_MODE=STRICT to help identify warnings
Turning OR on multiple index fields (<5.0) into UNION may speed things up (with LIMIT).
From 5.0 on, the index_merge optimization should pick this up.
put tmpdir on a battery-backed write cache
consider battery-backed RAM for InnoDB log files
use min_rows and max_rows to specify approximate data size so space can be
pre-allocated and reference points can be calculated.
as your data grows, indexing may change (cardinality and selectivity change). Your schema
structure may need to change too. Make your schema as modular as your code. Make your code
able to scale. Plan and embrace change, and get developers to do the same.
pare down cron scripts
create a test environment
try out a few schemas and storage engines in your test environment before picking one.
Use HASH indexing for indexing across columns with similar data prefixes
Use myisam_pack_keys for int data
Don’t use COUNT(*) on InnoDB tables for every search; do it a few times and/or use summary
tables, or if you need it for the total # of rows, use SQL_CALC_FOUND_ROWS and SELECT
FOUND_ROWS()
use --safe-updates for client
Redundant data is redundant
Use INSERT … ON DUPLICATE KEY UPDATE (or INSERT IGNORE) to avoid having to SELECT first
use groupwise maximum instead of subqueries
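One well-known way to get a groupwise maximum without a subquery is a LEFT JOIN of the table against itself, keeping only the rows for which no larger value exists in the same group. The sketch below assumes a hypothetical `prices` table and runs through Python's sqlite3 module as a stand-in; the query itself is standard SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (article TEXT, dealer TEXT, price INTEGER)")
conn.executemany(
    "INSERT INTO prices VALUES (?, ?, ?)",
    [("a", "x", 3), ("a", "y", 5), ("b", "x", 2), ("b", "z", 1)],
)

# For each article, keep the row that has no higher-priced row (p2) in its group.
top_rows = conn.execute(
    """
    SELECT p1.article, p1.dealer, p1.price
    FROM prices AS p1
    LEFT JOIN prices AS p2
      ON p2.article = p1.article AND p2.price > p1.price
    WHERE p2.article IS NULL
    ORDER BY p1.article
    """
).fetchall()
print(top_rows)
```

Ties on the maximum return every tied row, so add a tie-breaker to the join condition if you need exactly one row per group.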
be able to change your schema without ruining functionality of your code
source control schema and config files
for LVM InnoDB backups, restore to a different instance of MySQL so InnoDB can roll
forward
use multi_query if appropriate to reduce round-trips
partition appropriately
partition your database when you have real data
segregate tables/databases that benefit from different configuration variables