Very few developers are learning Structured Query Language (about 2%), yet they wonder why their database queries stink. This presentation covers five common database problems and how to fix them.
2. Hello! I am Dave Stokes
MySQL Community Manager
david.stokes@oracle.com @Stoker elephantanddolphin.blooger.com
Slideshare.net/davidmstokes
3. Quick MySQL Catch Up
21 Years Old: MySQL has been part of Oracle’s family of databases for six years.
MySQL 8: MySQL 5.7 is the current release, but the next version will be MySQL 8. The big feature is a real-time data dictionary.
Group Replication: Active master-master replication.
JSON: A new native JSON datatype to store documents in a column of a table.
Document Store: Programmers who do not know SQL but need a database? The X DevAPI allows them to use an RDBMS from their language of choice.
Encryption: Use Oracle Key Vault to encrypt your data at rest.
4. 1. Not understanding how queries work
You wrote a query - then what? You have produced a statement in SQL for processing by a database.
5. Structured Query Language is a
special-purpose programming
language designed for managing
data held in a relational database
management system (RDBMS), or
for stream processing in a
relational data stream
management system. --
Wikipedia
6. SQL Brief History Break
× Efficient storage of data, minimal duplication
× Been around for decades
× Based on set theory and relational theory
Needs normalized data, with the data divided up into logical components -> Requires the data to be architected and data integrity rules to be enforced
7. Big
Roughly 2% of developers receive any formal training
in SQL, relational theory, etc.
And they wonder why their queries perform poorly.
8. Goal:
Using the MySQL World database, find the names of all cities and their corresponding country names.
So let's create a query
Query:
SELECT City.Name,
       Country.Name
FROM City
JOIN Country ON
  (Country.Code =
   City.CountryCode);
9. Query:
SELECT City.Name,
       Country.Name
FROM City
JOIN Country ON
  (Country.Code =
   City.CountryCode);
Previously someone split the data for cities and countries into two tables. They also established a link between the two tables, using three-character country codes to join them.
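The link between the two tables can be tried out without a MySQL server. What follows is a minimal sketch that assumes Python's built-in sqlite3 module as a stand-in for MySQL and three made-up rows instead of the real World database; only the table and column names come from the slides.

```python
import sqlite3

# In-memory SQLite database standing in for a MySQL server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Country (Code TEXT PRIMARY KEY, Name TEXT);
    CREATE TABLE City (ID INTEGER PRIMARY KEY, Name TEXT, CountryCode TEXT);
    INSERT INTO Country VALUES ('USA', 'United States'), ('NLD', 'Netherlands');
    INSERT INTO City (Name, CountryCode) VALUES
        ('Dallas', 'USA'), ('Amsterdam', 'NLD'), ('Austin', 'USA');
""")

# The three-character country code is what ties each city to its country.
rows = conn.execute("""
    SELECT City.Name, Country.Name
    FROM City
    JOIN Country ON (Country.Code = City.CountryCode)
    ORDER BY City.Name
""").fetchall()

for city, country in rows:
    print(city, "-", country)
```

The ORDER BY is only there to make the toy output predictable; the query on the slide does not need it.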
10. The query goes to the
Database server
A fair number of developers see the database server as an arcane, messy, dank factory spewing smoke and mess.
And they may be right.
11. Actual PHP code
$query = "SELECT City.Name, Country.Name
          FROM City JOIN Country ON (Country.Code =
          City.CountryCode)";
$result = mysqli_query($link, $query);
13. Your query goes to server
Your Application -> Network -> Server
14. MySQL Authentication
One: Is your system allowed to connect to the server?
Two: Are you using a valid account? Are there limits on this account?
Three: Do you have proper permission to access the data you seek?
15. Let's assume that your login is successful
Login: Okay
System: Okay
Permissions: Good
16. Is the syntax correct? Checking the basics
Build query plan: Find the best way to assemble the data
Parse the query: Find the pieces needed
17. Cost Model
How do we return the needed data the least expensive way?
18. Cost model based heavily on disk reads
Reading from disk is slow, 100,000 times slower than reading from memory.
Note: Many vendors are looking at changing the cost model to accommodate new technologies, mixes of hardware technologies, and latencies.
See the mysql.server_cost and mysql.engine_cost tables for details.
20. The Optimizer
The optimizer knows from stored statistics just what it has had to do on similar queries in the past and guesstimates what it will take to get this query done.
22. A little bigger
"nested_loop": [
  {
    "table": {
      "table_name": "Country",
      "access_type": "ALL",
      "possible_keys": [
        "PRIMARY"
      ],
      "rows_examined_per_scan": 239,
      "rows_produced_per_join": 239,
      "filtered": "100.00",
      "cost_info": {
        "read_cost": "6.00",
        "eval_cost": "47.80",
        "prefix_cost": "53.80",
        "data_read_per_join": "61K"
      },
      "used_columns": [
        "Code",
        "Name"
      ]
Nested loop join for the Country table
We DO have an index for the JOIN :-)
Statistics
23. Every new table joined adds a factorial of complexity to getting the data. Adding a third table would take our simple query from two to six possible join orders!
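The jump from two to six is just 2! versus 3!. Here is a small sketch of that arithmetic, reading the "possibilities" as the orders in which tables could be joined; the third table name is only an illustration, and a real optimizer prunes this search rather than trying every order.

```python
from itertools import permutations
from math import factorial


def join_orders(tables):
    """Return every order in which the given tables could be joined."""
    return list(permutations(tables))


two = join_orders(["City", "Country"])
# "CountryLanguage" is a hypothetical third table added for illustration.
three = join_orders(["City", "Country", "CountryLanguage"])

print(len(two), "orders for two tables")
print(len(three), "orders for three tables")
print(factorial(4), "orders for four tables")
```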
24. Some databases allow you to lock a query plan, so once you have it optimized you get consistent results. MySQL does not do this!
25. EXPLAIN -- Prepend to query
So now the optimizer has figured out which indexes (keys) to use and has estimated it will have to read 239 x 17 rows to deliver the requested data.
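Asking for a plan looks much the same from application code. The sketch below assumes SQLite as a stand-in: its spelling is EXPLAIN QUERY PLAN, where MySQL uses plain EXPLAIN in front of the query, and its output format is different from MySQL's. The tables are empty copies of the two used in this deck.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Country (Code TEXT PRIMARY KEY, Name TEXT);
    CREATE TABLE City (ID INTEGER PRIMARY KEY, Name TEXT, CountryCode TEXT);
""")

query = """
    SELECT City.Name, Country.Name
    FROM City
    JOIN Country ON (Country.Code = City.CountryCode)
"""

# Prepend the explain keyword to the query; nothing is actually executed.
plan = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()

# The last column of each plan row is the human-readable step.
details = [row[-1] for row in plan]
for step in details:
    print(step)
```

The exact wording of each step depends on the SQLite version, so treat the printed text as informational only.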
26. MySQL Workbench (our second most popular FREE download) has the ability to graphically display the output of the EXPLAIN command. This is the same query as on the previous slide.
27. Explain another process
Use SHOW PROCESSLIST; to find a running query on the server.
Use EXPLAIN FOR CONNECTION N to see the output, where N is the connection Id shown by SHOW PROCESSLIST.
28. Indexes
Indexes let you go to the exact record(s) desired. They do have some overhead and will slow down bulk inserts.
Without indexes the entire table/file has to be read, AKA a FULL TABLE SCAN -- to be avoided if your goal is not to read the entire table.
Compound indexes let you use multiple columns like YEAR-MONTH-DATE and allow YEAR-MONTH-DATE, YEAR-MONTH, and YEAR searches.
The optimizer looks to use indexes as much as possible!
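The compound-index rule can be checked by asking for a plan. This is a sketch under stated assumptions: SQLite stands in for MySQL, and the sales table, its columns, and the idx_ymd index are all hypothetical names made up for the example. A search on the leading column(s) can use the index; a search on the last column alone cannot.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (id INTEGER PRIMARY KEY, yr INT, mon INT, dy INT, amount REAL);
    CREATE INDEX idx_ymd ON sales (yr, mon, dy);  -- compound YEAR-MONTH-DATE index
""")


def plan(where):
    """Return SQLite's plan text for a query with the given WHERE clause."""
    sql = "EXPLAIN QUERY PLAN SELECT amount FROM sales WHERE " + where
    return " | ".join(row[-1] for row in conn.execute(sql))


year_only = plan("yr = 2017")               # leading column: index usable
year_month = plan("yr = 2017 AND mon = 6")  # leading two columns: index usable
day_only = plan("dy = 15")                  # skips the leading columns: full scan

print(year_only)
print(year_month)
print(day_only)
```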
29. Your data is then packaged up and sent to your application
And those are the basics.
34. N+1 Problem: Bad practice, may come from your ORM
Can be hard to catch: Lots of fussy little queries
Avoid by thinking in sets: Let the database do the heavy lifting
35. Remember all that stuff from item 1?
1. Query goes to server
2. Authentication
3. Permissions
4. Query plan
5. Execution
6. Send back results
Every query has all these steps plus network overhead.
36. What is the N+1 problem?!?
You chain a set of queries together to answer a question: look up employees, then the ones who live near you, and then the ones who have a parking permit, so you can get a ride to work.
Each dive into the database has a cost. Databases can handle one BIG request better than a lot of little ones. So use one query for someone who lives near you and has a parking permit.
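The difference is easy to count. The following sketch assumes SQLite as a stand-in for MySQL and reuses the City and Country tables from earlier in the deck with three made-up rows: the first half issues one query plus one more per city, the second half gets the same answer from a single JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Country (Code TEXT PRIMARY KEY, Name TEXT);
    CREATE TABLE City (ID INTEGER PRIMARY KEY, Name TEXT, CountryCode TEXT);
    INSERT INTO Country VALUES ('USA', 'United States'), ('NLD', 'Netherlands');
    INSERT INTO City (Name, CountryCode) VALUES
        ('Dallas', 'USA'), ('Amsterdam', 'NLD'), ('Austin', 'USA');
""")

# N+1 style: one query for the cities, then one more query per city.
cities = conn.execute("SELECT Name, CountryCode FROM City ORDER BY Name").fetchall()
queries_sent = 1
n_plus_one = []
for name, code in cities:
    row = conn.execute("SELECT Name FROM Country WHERE Code = ?", (code,)).fetchone()
    queries_sent += 1
    n_plus_one.append((name, row[0]))

# Set-based style: a single JOIN returns the same rows in one round trip.
joined = conn.execute("""
    SELECT City.Name, Country.Name
    FROM City
    JOIN Country ON (Country.Code = City.CountryCode)
    ORDER BY City.Name
""").fetchall()

print(queries_sent, "queries in the loop versus 1 for the JOIN")
```

With three cities the loop costs four queries; with three thousand it costs three thousand and one, each paying for every step on the server plus the network.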
37. So do not loop over single queries*
*Unless you HAVE to!
39. Which is better?
Your boss asks you to give all the sales staff a 20% pay increase
foreach ($sales_employees as $sale_emp)
    $pay = $pay * 1.20;

UPDATE employees
SET pay_rate = pay_rate * 1.20
WHERE department = 'sales';
40. Which is better?
foreach ($sales_employees as $sale_emp)
    $pay = $pay * 1.20;

START TRANSACTION;
UPDATE employees
SET pay_rate = pay_rate * 1.20
WHERE department = 'sales';
COMMIT;
A transaction does all the work as a single unit and can be rolled back!!
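Rolling back is worth seeing once. This is a minimal sketch that assumes SQLite as a stand-in for MySQL and a made-up three-row employees table; a 20% raise is written as pay_rate * 1.20. SQLite spells the opening statement BEGIN where the slide uses START TRANSACTION.

```python
import sqlite3

# isolation_level=None leaves transaction control to the explicit statements below.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, pay_rate REAL)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ann", "sales", 100.0), ("Bob", "sales", 200.0), ("Cy", "support", 150.0)],
)

RAISE_SQL = "UPDATE employees SET pay_rate = pay_rate * 1.20 WHERE department = 'sales'"


def sales_pay():
    """Return the current pay rates of the sales staff, ordered by name."""
    sql = "SELECT pay_rate FROM employees WHERE department = 'sales' ORDER BY name"
    return [row[0] for row in conn.execute(sql)]


# Give the raise, then change our mind: ROLLBACK undoes the whole UPDATE.
conn.execute("BEGIN")
conn.execute(RAISE_SQL)
conn.execute("ROLLBACK")
after_rollback = sales_pay()

# Give the raise and keep it: COMMIT makes it permanent.
conn.execute("BEGIN")
conn.execute(RAISE_SQL)
conn.execute("COMMIT")
after_commit = sales_pay()

print(after_rollback, after_commit)
```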
41. Transaction
A transaction is a sequence of operations performed as a single
logical unit of work. A logical unit of work must exhibit four
properties, called the atomicity, consistency, isolation, and
durability (ACID) properties, to qualify as a transaction. Atomicity.
Transactions (Database Engine) - TechNet - Microsoft
https://technet.microsoft.com/en-us/library/ms190612(v=sql.105).aspx
42. 4. SQL
SQL is a pretty simple declarative language, so what is so hard to write?
44. SELECT City.Name as City,
       Country.Name as Country
FROM City
JOIN Country ON
  (City.CountryCode =
   Country.Code);
Which runs FASTER??
SELECT City.Name as City,
       Country.Name as Country
FROM City
JOIN Country ON
  (City.CountryCode =
   Country.Code)
LIMIT 5;
45. Answer
The server has to do all the same work to figure out just the top five as it does to get the entire list -- usually.
Optimizers are getting smarter!
46. SELECT City.Name as City,
       Country.Name as Country
FROM City
JOIN Country ON
  (City.CountryCode =
   Country.Code)
ORDER BY
  City.Population;
Which runs FASTER??
SELECT City.Name as City,
       Country.Name as Country
FROM City
JOIN Country ON
  (City.CountryCode =
   Country.Code)
GROUP BY Country.Name;
47. That was a trick question
You can not tell without looking at the
and using EXPLAIN on a query. There is no way to tell if a query is good or bad just by looking.
48. Some clues
Is the ORDER BY or GROUP BY on an indexed field?
Do you need a temp table for sorting?
Are you using the indexes you wanted to use?
Redundant indexes give the optimizer another choice (and it may choose wrong)!
Locks?
MAX_EXECUTION_TIME?
49. 5. Do not do things in your code you do not need to do!
Databases can do many functions more efficiently than your application. And other options!
50. Databases have many useful functions for maximums, minimums, averages, standard deviations, and more. No need to crunch numbers in your app.
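As a quick illustration, the sketch below lets the database compute a maximum, minimum, and average in one statement. It assumes SQLite as a stand-in for MySQL and three invented population figures; SQLite has no built-in standard deviation function, so the example sticks to MAX, MIN, and AVG.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE City (Name TEXT, Population INTEGER)")
conn.executemany(
    "INSERT INTO City VALUES (?, ?)",
    [("Alpha", 100), ("Beta", 300), ("Gamma", 200)],  # made-up figures
)

# One row comes back, no matter how many cities the table holds.
largest, smallest, average = conn.execute(
    "SELECT MAX(Population), MIN(Population), AVG(Population) FROM City"
).fetchone()

print(largest, smallest, average)
```

The application receives three numbers instead of every row in the table, which also keeps the network traffic small.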
51. Report Writers
You may be able to use business intelligence and reporting tools like Pentaho and BIRT (there are many more) to create reports, dashboards, and graphics. And they can be generated automatically.
52. What if I just don’t care, Dave?
MySQL and other RDBMSs offer a JSON data type. You can store a JSON document without a schema. Great for JSON-formatted data and those who just don’t care!
This works but is not as fast as normalized data, no rigor is applied to the data, the standard is still being worked on, and this frustrates old timers, dag-nabbit!!
53. What if I just don’t care, Dave? - 2
MySQL offers the X DevAPI and acts as a docstore from your language of choice for CRUD (Create, Read, Update, Delete), so you do not need knowledge of SQL.
Works very well, but you may later need a DBA/architect to normalize some of the data with generated columns for performance.
54. Some last minute hints
Data: The first step in getting great performance out of your database is proper data normalization.
Indexes: Indexes greatly speed searches but add overhead. General rule: index primary keys and columns used in WHERE clauses.
Heavy Lifting: Databases are great at handling big chunks of data and can perform transactions. Try to maximize the effect of each query.
Disks: Do not go cheap on disks. SSDs do pay for themselves. Move heavy uses to separate disks/controllers.
Slow Query Log: Turn on the slow query log and pay attention. Note: some queries are in the log because they legitimately run a long time, not because they are slow.
Sys Schema: The Sys Schema was designed to let you peer into the heart of your instances and answer questions like which indexes are not used and who is hogging I/O.
55. THANKS! Any questions?
You can find me at @stoker & david.stokes@oracle.com
Slides can be found at http://slideshare.net/davidmstokes