Performance Tipping Points - Hitting Hardware Bottlenecks

#MDBW17
Hitting hardware bottlenecks
PERFORMANCE TIPPING POINTS
Akira Kurogane
Senior TSE, MongoDB (Sydney)

WHY YOU'RE HERE TODAY
Curious about DB performance
Responsible for DB performance
You've experienced a bad
"Everything in production is slow!"
day at work before, and you'd
like to never have one again.

#MDBW17
DB PERFORMANCE IS A
COMPLEX EQUATION
• "What if the queries per second rate increased by 50%
compared to now?"
• "What if the queries and aggregations get larger on average?"
• "What if the read to write ratio changes?"
• "What if I downsize the server or use cheaper disk storage to
reduce cost?"
δx = . . . . .?

#MDBW17
DIFFERENT TECHNOLOGIES,
DIFFERENT PHYSICAL LIMITS
• CPU
• SRAM
• DRAM
• Flash memory
• (Magnetic) Disk heads
Powers of ten apart in data delivery
speed.

#MDBW17
TWO DIFFERENT PERFORMANCE-LIMITING
MECHANISMS
1. For any channel
The channel's throughput capacity is saturated -> bottleneck.
Mainly influenced by: The rate of db ops / sec * cost of avg op
2. Storage I/O channels
Small, fast storage layer is full -> the next, slower level is used
L1 Cache -> L2 -> L3 -> RAM -> Disk
Mainly influenced by: How much data is 'active'

#MDBW17
"ACTIVE DATA SET"
Active data set size is not derived simply from total data size, or server
specs.
My definition: The portion of your data where 99%* of reads
are expected to be completed within a fixed latency.
* or 99.9%, or 99.99%, etc., according to your preference
At PerformanceShopper.com we found 99.9% of reads are either on recently-inserted
documents, or from certain small collections.
If we get in-memory latencies for that 99.9% then disk latency for the other 0.1% is fine.
Our total data size may be ~1TB but the "active" documents are < 100GB.
Example:

#MDBW17
DEMO #1: CPU BOTTLENECK
db.collection.aggregate([
{ $match: { city: X } }, //Indexed lookup
{ $limit: 20000 }
])
db.collection.aggregate([
{ $match: { city: X } }, //Indexed lookup
{ $limit: 20000 },
{ $sort: { first_name: 1 } } //Test diff. Unindexed field
])

#MDBW17
Normal query/sec rate
Unwelcome surprise 100% CPU
bottleneck
A SUDDEN CHANGE IN PERFORMANCE

#MDBW17
DEMO #2: NETWORK BOTTLENECK
A simple query that can be switched between tiny and huge result
sizes.
db.collection.find(
/* find */ { _id: X, nested_array: Y },
//Nested array is large in every document
/*project*/ { _id: true, "nested_array.$": true }
//Oops: Let's 'accidentally' forget the ".$".
)
Avg result is ~ 0.4 kB when nested_array.$ is used; ~1.8 MB when it is not

#MDBW17
Large drop in
ops/sec
(Test-created)
huge change in
network usage
Disproportionately
small change to
server-side
latency
DEMO 2B: NETWORK BOTTLENECK (MIXED)

#MDBW17
DEMO #3: ACTIVE DATA SET GROWS BEYOND
WIREDTIGER CACHE
Collection "foo": 160 GB. Average document size is 1kb,
when uncompressed in the WiredTiger cache.
RAM on the server is 15 GB: WiredTiger cache set to
10GB.
Test query:
db.foo.find({_id: <val>})
for random _id within limited range.

#MDBW17
WHAT YOU SEE IN MONGODB
OPCOUNTERS

#MDBW17
SLOW COMMANDS IN THE LOGS
2017-05-28T23:03:29.710+0000 I COMMAND [conn19764] command ptp_as_demo.foo command: find
{ find: "foo", filter: { _id: 24401243 }, projection: { long_string: false } }
planSummary: IDHACK keysExamined:1 docsExamined:1 cursorExhausted:1
numYields:0 nreturned:1 reslen:459
locks:{ Global: { acquireCount: { r: 2 } }, Database: { acquireCount: { r: 1 } },
Collection: { acquireCount: { r: 1 } } }
protocol:op_query 1ms

#MDBW17
THE STORY IN WIREDTIGER CACHE
ACTIVITY
N.b. to make decompression
run slower than typical in this
test I artificially constrained
CPU cores to just 2.
2 cores
8 cores
100 MB/s
200 MB/s
300 MB/s

This test’s active data set size was kept within
that size.
So far this test is rigged to be pure RAM.
Disk was avoided.
• Default WiredTiger cache size 60% of RAM.
• Leaves 40% for OS and filesystem cache.
‒ Let's say 35% for filesystem page cache
Even with mildly compressible document data
more than twice the cache size of document
data will be in RAM.
It just needs to be decompressed on the fly.
RAM

#MDBW17
DEMO #4: ACTIVE DATA SET GROWS INTO
DISK RANGES
Continue the same test, gradually increasing the range
of data being queried to be ~10x than the WiredTiger
cache size.
Rough calculation: By the end > 70% of queries will
need to wait for disk.

#MDBW17
WHAT YOU SEE IN MONGODB OP
COUNTERS
Linear increase in active data set size
Welcome to disk-land!

#MDBW17
THE STORY IN WIREDTIGER CACHE ACTIVITY
RAM
RAM
↓
Decompress
↓
RAM
Disk
WT Cache Activity
(before)
Where now?

#MDBW17
AGGREGATE SUMS OF
DIFFERENT LATENCIES
Classic latency comparison numbers
--------------------------
Main memory reference 0.1 μs
Compress 1K bytes with snappy 3 μs
Read 4K randomly from SSD 150 μs
Read 1 MB sequentially from memory 250 μs
Read 1 MB seq'ly from 300MB/s SSD 3,300 μs
Disk seek 10,000 μs
Read 1 MB sequentially from disk 20,000 μs
105
102

#MDBW17
RAM vs Disk % Latency What the users say
100 / 0 25 ms (normal)
99 / 1 42 ms "Everything's really slow"
90 / 10 200 ms "Everything's broken'
50 / 50 1000 ms "What do you mean ETA ..."
0 / 100 2000 ms "... is next week?!"
Magnetic disk
THEORETICAL 100MB READ LATENCIES

#MDBW17
RAM vs SSD % Latency What the users say
100 / 0 25 ms (normal)
99 / 1 28 ms (normal)
90 / 10 50 ms "Everything's really slow"
50 / 50 175 ms "Everything's broken"
0 / 100 330 ms "ETA within today?"
THEORETICAL 100MB READ LATENCIES
Low-end SSD, 300MB/s

#MDBW17
WRITE LOADS
The previous demonstrations focused on read-only cases alone.
Writes are more I/O bound that reads.
Every write is involved in disk access at two points.
• First all writes go to journal. (Commits ~10 times per second)
• Asynchronously WiredTiger cache block marked 'dirty' ->
compressed -> fdatasync'ed to disk (once per min)
Key point: focus even more on disk util% and WiredTiger Cache
Activity than we did in the previous demonstrations.

#MDBW17
CPU, NETWORK BOTTLENECKS
• It's unlikely you're suffering from these
• But on the other hand it's not hard to check them
• Check it, forget it, move onto storage I/O

#MDBW17
STORAGE
• On a logarithmic scale the differences between disk latency and
RAM latency doesn't look so bad ....
... but here in the real, linear-time universe it is.
• Increased read MB/s into WiredTiger cache is not a problem if
it's being read from filesystem page cache, but:
• That metric growing from near-zero to 100's of MB/s gives you a
warning that the active data set is getting closer to ‘disk-land’.

Performance Tipping Points - Hitting Hardware Bottlenecks

Recommandé

Recommandé

Contenu connexe

Tendances

Tendances (20)

Similaire à Performance Tipping Points - Hitting Hardware Bottlenecks

Similaire à Performance Tipping Points - Hitting Hardware Bottlenecks (20)

Plus de MongoDB

Plus de MongoDB (20)

Dernier

Dernier (20)

Performance Tipping Points - Hitting Hardware Bottlenecks

Notes de l'éditeur