CaSSanDra: An SSD Boosted Key-Value Store

Prashanth Menon, Tilmann Rabl, Mohammad Sadoghi, Hans-Arno Jacobsen
Middleware Systems Research Group (MSRG.ORG), University of Toronto
Outline
• Application Performance Management
• Cassandra and SSDs
• Extending Cassandra's Row Cache
• Implementing a Dynamic Schema Catalogue
• Conclusions
Modern Enterprise Architecture

• Many different software systems
• Complex interactions
• Stateful systems often distributed/partitioned/replicated
• Stateless systems certainly duplicated
Application Performance Management

• Lightweight agent attached to each software system instance
• Monitors system health
• Traces transactions
• Determines root causes
• Raw APM metric:

[Figure: APM agents attached to each component of the enterprise architecture]
Application Performance Management

• Problem: Agents have short memory and only have a local view
• What was the average response time for requests served by servlet X between December 18-31, 2011?
• What was the average time spent in each service/database to respond to client requests?
APM Metrics Datastore

• All agents store metric data in a high write-throughput datastore
• Metric data is at a fine granularity (per-action, millisecond, etc.)
• User now has a global view of metrics
• What is the best database to store APM metrics?
Cassandra Wins APM

• APM experiments performed by Rabl et al. [1] show Cassandra performs best for the APM use case
  • In-memory workloads including 95%, 50%, and 5% reads
  • Workloads requiring disk access with 95%, 50%, and 5% reads
[Figures (from Rabl et al. [1]): throughput and latency for 95%- and 50%-read workloads on 2-12 nodes, comparing Cassandra, HBase, Voldemort, VoltDB, Redis, and MySQL. Cassandra scales best as nodes are added; on a single node Redis leads with more than 50K ops/sec, Cassandra and MySQL reach about half that (25K ops/sec), Voldemort about 12K ops/sec, and HBase trails at 2.5K ops/sec.]
[1] http://msrg.org/publications/pdf_files/2012/vldb12-bigdata-Solving_Big_Data_Challenges_fo.pdf
Cassandra
• Built at Facebook by previous Dynamo engineers
• Open sourced to Apache in 2009
• DHT with consistent hashing (a minimal ring sketch follows the figure)
  • MD5 hash of key
  • Multiple nodes handle segments of the ring for load balancing
• Dynamo distribution and replication model + BigTable storage model

[Figure: Cassandra storage engine with commit log, memtable, and SSTables]
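To make the placement rule concrete, here is a minimal sketch of consistent hashing with MD5, assuming a toy ring of hypothetical named nodes and tokens; it illustrates the idea only and is not Cassandra's implementation:

```python
import bisect
import hashlib

class Ring:
    """Toy consistent-hash ring: each key belongs to the first node token
    at or after the key's MD5 position, wrapping around the ring."""
    def __init__(self, node_tokens):
        # node_tokens: {token (int in [0, 2**128)) -> node name}
        self.tokens = sorted(node_tokens)
        self.owners = node_tokens

    def node_for(self, key: str) -> str:
        h = int(hashlib.md5(key.encode()).hexdigest(), 16)  # MD5 hash of key
        i = bisect.bisect_left(self.tokens, h) % len(self.tokens)  # wrap around
        return self.owners[self.tokens[i]]

# Two hypothetical nodes splitting the 128-bit MD5 space in half
ring = Ring({0: "node-a", 2**127: "node-b"})
print(ring.node_for("HostA/AgentX/AVGResponse"))
```

Adding a node only inserts new tokens into the sorted list, so only keys in the affected ring segments move, which is what makes adding nodes cheap for load balancing.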
Cassandra and SSDs

• Improve performance by either adding nodes or improving per-node performance
• Node performance is directly dependent on the disk I/O performance of the system
• Cassandra stores two entities on disk:
  • Commit Log
  • SSTables
• Should SSDs be used to store both?
• We evaluated each possible configuration
Experiment Setup

• Server specification:
  • 2x Intel 8-core X5450, 16GB RAM, 2x 2TB RAID0 HDD, 2x 250GB Intel x520 SSD
• Apache Cassandra 1.10
• Used the YCSB benchmark (an illustrative workload file follows this list)
  • 100M rows, 50GB total raw data, 'latest' distribution
  • 95% read, 5% write
• Minimum three runs per workload, fresh data on each run
• Broken into phases:
  • Data load
  • Fragmentation
  • Cache warm-up
  • Workload (> 12h process)
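As a rough sketch of the workload above, a YCSB core-workload properties file along these lines would approximate it; the field count and field length are assumptions chosen so that 100M rows total roughly 50GB of raw data, and are not taken from the slides:

```
# Hypothetical YCSB configuration approximating the described setup
# (fieldcount/fieldlength assumed: 5 x 100B fields = ~500B/row = ~50GB raw)
workload=com.yahoo.ycsb.workloads.CoreWorkload
recordcount=100000000
readproportion=0.95
updateproportion=0.05
requestdistribution=latest
fieldcount=5
fieldlength=100
```

A run would then look something like `bin/ycsb run cassandra-10 -P <this file>`; the cassandra-10 binding was YCSB's client for Cassandra 1.x at the time.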
SSD vs. HDD

• Location of log is irrelevant
• Location of data is important
• Dramatic performance improvement of SSD over HDD
• SSD benefits from high parallelism
Configuration  # of clients  # of threads/client  Location of Data  Location of Commit Log
C1             1             2                    RAID (HDD)        RAID (HDD)
C2             1             2                    RAID (HDD)        SSD
C3             1             2                    SSD               RAID (HDD)
C4             1             2                    SSD               SSD
C5             4             16                   RAID (HDD)        RAID (HDD)
C6             4             16                   SSD               SSD
[Figure 4(a)-(c) (from the paper): throughput and latency for configurations C1-C6, and 99%-fill throughput on HDD. The spilled paper text notes that bulk, infrequently accessed data is best kept on HDD because SSD performance degrades at higher fill ratios, as Figure 4(c) shows.]
SSD vs. HDD (II)

• SSD offers more than a 7x improvement to throughput on an empty disk
• SSD performance degrades by half as the storage device fills up
• Filling the SSD or running it near capacity is not advisable
[Figure 4(c)-(d) (from the paper): 99%-fill HDD vs. SSD throughput and latency, empty disk vs. full disk. The accompanying paper text explains that a larger portion of hot data is cached on the SSD: the configuration stored more than twice the data of an in-memory cache alone, achieving a cache-hit ratio above 85%, and a read that misses the off-heap memory cache needs only a single SSD seek.]
SSD vs. HDD: Summary

• Cassandra benefits most when storing data on SSD (not the log)
  • Location of commit log not important
• SSD performance inversely proportional to fill ratio
• Storing all data on SSD is uneconomical
  • Replacing a 3TB HDD with 3x 1TB SSDs is 10x more costly
  • SSDs have limited lifetimes (10-50K write-erase cycles), so they need replacement more frequently
  • Rabl et al. [1] show adding a node is 100% costlier, with 100% throughput improvement
• Build a hybrid system to get comparable performance for marginal cost
Cassandra: Read + Write Path

• Write path is fast:
  1. Write update into commit log
  2. Write update into Memtable
• Memtables flush to SSTables asynchronously when full
  • Never blocks writes
• Read path can be slow:
  1. Read key-value from Memtable
  2. Read key-value from each SSTable on disk
  3. Construct merged view of row from each input source
• Each read needs to do O(# of SSTables) I/O (see the merge sketch below)

[Figure: reads consult the memtable in memory and every SSTable on disk; updates go to the commit log and memtable]
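The bullet above is easiest to see in code. The following is a minimal sketch, assuming dict-shaped row fragments of the form {column: (timestamp, value)}, of the per-column, last-writer-wins merge; it is an illustration, not Cassandra's actual read path:

```python
# Why reads touch O(# of SSTables) sources: the freshest value for each
# column must be merged from the memtable and every SSTable fragment.
def read_row(key, memtable, sstables):
    merged = {}                               # column -> (timestamp, value)
    sources = [memtable] + sstables           # memtable is in memory; each
    for table in sources:                     # SSTable lookup costs a disk I/O
        row = table.get(key)
        if row is None:
            continue
        for col, (ts, val) in row.items():
            if col not in merged or ts > merged[col][0]:
                merged[col] = (ts, val)       # last writer wins per column
    return {col: val for col, (ts, val) in merged.items()}

memtable = {"k1": {"name": (5, "Prashanth")}}
sstables = [{"k1": {"name": (1, "P."), "age": (1, 25)}}]
print(read_row("k1", memtable, sstables))     # {'name': 'Prashanth', 'age': 25}
```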
Cassandra: SSTables

• Cassandra allows blind writes
• Row data can be fragmented over multiple SSTables over time
• Bloom filters and indexes can potentially help
• Ultimately, multiple fragments need to be read from disk

[Example: a single Employee row (ID 99231234, Prashanth Menon, age 25, department MSRG) whose columns end up spread across several SSTables]
Cassandra: Row Cache

• Row cache buffers the full merged row in memory
• Cache miss follows the regular read path, constructs the merged row, brings it into the cache
• Makes the read path faster for frequently accessed data
• Problem: Row cache occupies memory
  • Takes away precious memory from the rest of the system
• Extend the row cache efficiently onto SSD

[Figure: row cache sits in memory in front of the memtable/SSTable read path]
Extended Row Cache

• Extend the row cache onto SSD
  • Chained with the in-memory row cache
  • LRU in memory, overflow onto LRU SSD row cache
• Implemented as append-only cache files
  • Efficient sequential writes
  • Fast random reads
• Zero I/O for a hit in the first-level row cache
• One random I/O on SSD for the second-level row cache (a toy sketch follows the figure)
[Figure: two-level row cache, with the first level in memory and the second level on SSD behind an in-memory index, in front of the memtable and SSTables]
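As a rough illustration of the design above, here is a toy two-level cache, assuming pickled rows and a single append-only file; the real extension differs in details such as file rotation and eviction policy, which the slides do not specify:

```python
# Toy two-level row cache (assumptions, not the paper's implementation):
# an in-memory LRU in front of an append-only SSD file. Evicted rows are
# appended sequentially; an in-memory index maps key -> file offset, so a
# second-level hit costs exactly one random SSD read.
import os
import pickle
from collections import OrderedDict

class TwoLevelRowCache:
    def __init__(self, mem_capacity, ssd_path):
        self.mem = OrderedDict()               # first level: in-memory LRU
        self.cap = mem_capacity
        self.index = {}                        # key -> (offset, length) on SSD
        self.ssd = open(ssd_path, "ab+")       # second level: append-only file

    def put(self, key, row):
        self.mem[key] = row
        self.mem.move_to_end(key)
        if len(self.mem) > self.cap:           # overflow: evict LRU row to SSD
            old_key, old_row = self.mem.popitem(last=False)
            blob = pickle.dumps(old_row)
            self.ssd.seek(0, os.SEEK_END)
            self.index[old_key] = (self.ssd.tell(), len(blob))
            self.ssd.write(blob)               # efficient sequential write

    def get(self, key):
        if key in self.mem:                    # level-1 hit: zero I/O
            self.mem.move_to_end(key)
            return self.mem[key]
        if key in self.index:                  # level-2 hit: one random read
            off, length = self.index[key]
            self.ssd.seek(off)
            return pickle.loads(self.ssd.read(length))
        return None                            # miss: fall back to read path

cache = TwoLevelRowCache(mem_capacity=2, ssd_path="rowcache.bin")
for k in ("a", "b", "c"):
    cache.put(k, {"value": k})
print(cache.get("a"))                          # served from SSD after eviction
```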
Evaluation: SSD Row Cache

• Setup:
  • 100M rows, 50GB total data, 6GB row cache
• Results:
  • 75% improvement in throughput
  • 75% improvement in latency
  • RAM-only cache has too low a hit ratio
[Figure 5(a)-(b) (from the paper): throughput and latency at 95%, 85%, and 75% reads with the row cache disabled, RAM-only, and RAM+SSD. The spilled paper text reports the dynamic-schema result: after loading 40 million keys, data sizes averaged 6.8GB compressed versus 6.01GB with the modified Cassandra, a savings of roughly 10% that grows with the number and length of column names.]
Dynamic Schema

• Key-value stores covet a schema-less data model
  • Very flexible, good for highly varying data
  • Schemas often change, so defining them up front can be detrimental
• Observation: many big data applications have relatively stable schemas
  • e.g., click stream, APM, sensor data, etc.
• Redundant schemas have significant overhead in I/O and space usage

[Figure: on-disk format vs. application format. On disk every row repeats its column names, e.g. Metric Name=HostA/AgentX/AVGResponse, Timestamp=1332988833, Value=4, Max=6, Min=1; in the application format the schema (Metric Name, Timestamp, Value, Max, Min) appears once as a header.]
Dynamic Schema (III)

• Don't serialize the redundant schema with rows
• Extract the schema from the data, store it on SSD, and serialize the schema ID with the data (a minimal catalogue sketch follows the figure)
• Allows for a large number of schemas
[Figure: old disk format vs. new disk format with a schema catalogue on SSD. The catalogue stores each distinct schema once (S1: Metric Name, Timestamp, Value, Max, Min; S2: Metric Name, Timestamp, All, Warn, Error), and each row serializes only its schema ID plus values.]
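A minimal sketch of the cataloguing idea, assuming an in-memory catalogue (the real system persists it to SSD) and tuple-encoded rows; the class and method names are illustrative, not the paper's:

```python
# Intern each row's ordered column names to a small schema ID, and serialize
# rows as (schema ID, values) instead of repeating the column names per row.
class SchemaCatalogue:
    def __init__(self):
        self.by_columns = {}   # tuple of column names -> schema id
        self.by_id = {}        # schema id -> tuple of column names

    def intern(self, columns):
        key = tuple(columns)
        if key not in self.by_columns:
            sid = f"S{len(self.by_columns) + 1}"
            self.by_columns[key] = sid
            self.by_id[sid] = key
        return self.by_columns[key]

    def encode(self, row: dict):
        cols = tuple(row)                      # preserves insertion order
        return (self.intern(cols), tuple(row[c] for c in cols))

    def decode(self, encoded):
        sid, values = encoded
        return dict(zip(self.by_id[sid], values))

cat = SchemaCatalogue()
row = {"Metric Name": "HostA/AgentX/AVGResponse", "Timestamp": 1332988833,
       "Value": 4, "Max": 6, "Min": 1}
enc = cat.encode(row)   # ('S1', ('HostA/AgentX/AVGResponse', 1332988833, 4, 6, 1))
assert cat.decode(enc) == row
```

Rows sharing a schema pay only the cost of the small ID, which is where the I/O and space savings come from.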
Evaluation: Dynamic Schema

• Setup:
  • 40M rows, variable 5-10 columns (638 schemas), 6GB row cache
• Results:
  • 10% reduction in disk usage (6.8GB vs. 6GB)
  • Slightly improved throughput, stable latency
  • Effective SSD usage (only random reads), with reduced I/O and space usage
[Figure 5(c)-(d) (from the paper): dynamic schema vs. regular Cassandra throughput and latency at 95%, 50%, and 5% reads. The spilled conclusion text summarizes the paper: it benchmarks different SSD/HDD combinations and proposes two SSD-specific optimizations, the row-cache extension and the dynamic cataloguing technique.]
Conclusions
• Storing Cassandra commit logs on SSD doesn't help
• Running SSDs at capacity degrades their performance
• Using SSDs as a secondary row cache dramatically improves performance
• Extracting redundant schemas onto an SSD reduces disk space usage and required I/O
Thanks!
• Questions?
• Contact:
  • Prashanth Menon (prashanth.menon@utoronto.ca)
Future Work

• What types of tables benefit most from a dynamic schema?
• Impact of compaction on read-heavy workloads
• How can SSDs be used to improve the performance of compaction?
• How does performance change when storing only SSTable indexes on SSD?
Contenu connexe

Tendances

Highly available, scalable and secure data with Cassandra and DataStax Enterp...
Highly available, scalable and secure data with Cassandra and DataStax Enterp...Highly available, scalable and secure data with Cassandra and DataStax Enterp...
Highly available, scalable and secure data with Cassandra and DataStax Enterp...Johnny Miller
 
Journey to Stability: Petabyte Ceph Cluster in OpenStack Cloud
Journey to Stability: Petabyte Ceph Cluster in OpenStack CloudJourney to Stability: Petabyte Ceph Cluster in OpenStack Cloud
Journey to Stability: Petabyte Ceph Cluster in OpenStack CloudPatrick McGarry
 
Ceph Day San Jose - Object Storage for Big Data
Ceph Day San Jose - Object Storage for Big Data Ceph Day San Jose - Object Storage for Big Data
Ceph Day San Jose - Object Storage for Big Data Ceph Community
 
Global Azure Virtual 2020 What's new on Azure IaaS for SQL VMs
Global Azure Virtual 2020 What's new on Azure IaaS for SQL VMsGlobal Azure Virtual 2020 What's new on Azure IaaS for SQL VMs
Global Azure Virtual 2020 What's new on Azure IaaS for SQL VMsMarco Obinu
 
2016-JAN-28 -- High Performance Production Databases on Ceph
2016-JAN-28 -- High Performance Production Databases on Ceph2016-JAN-28 -- High Performance Production Databases on Ceph
2016-JAN-28 -- High Performance Production Databases on CephCeph Community
 
Walk Through a Software Defined Everything PoC
Walk Through a Software Defined Everything PoCWalk Through a Software Defined Everything PoC
Walk Through a Software Defined Everything PoCCeph Community
 
Ceph Day Melbourne - Ceph on All-Flash Storage - Breaking Performance Barriers
Ceph Day Melbourne - Ceph on All-Flash Storage - Breaking Performance BarriersCeph Day Melbourne - Ceph on All-Flash Storage - Breaking Performance Barriers
Ceph Day Melbourne - Ceph on All-Flash Storage - Breaking Performance BarriersCeph Community
 
Raid data recovery Tips
Raid data recovery TipsRaid data recovery Tips
Raid data recovery TipsHone Software
 
Cassandra Summit 2014: Lesser Known Features of Cassandra 2.1
Cassandra Summit 2014: Lesser Known Features of Cassandra 2.1Cassandra Summit 2014: Lesser Known Features of Cassandra 2.1
Cassandra Summit 2014: Lesser Known Features of Cassandra 2.1DataStax Academy
 
NGENSTOR_ODA_HPDA
NGENSTOR_ODA_HPDANGENSTOR_ODA_HPDA
NGENSTOR_ODA_HPDAUniFabric
 
Ceph Day San Jose - Ceph at Salesforce
Ceph Day San Jose - Ceph at Salesforce Ceph Day San Jose - Ceph at Salesforce
Ceph Day San Jose - Ceph at Salesforce Ceph Community
 
Ceph Day San Jose - All-Flahs Ceph on NUMA-Balanced Server
Ceph Day San Jose - All-Flahs Ceph on NUMA-Balanced Server Ceph Day San Jose - All-Flahs Ceph on NUMA-Balanced Server
Ceph Day San Jose - All-Flahs Ceph on NUMA-Balanced Server Ceph Community
 
CephFS in Jewel: Stable at Last
CephFS in Jewel: Stable at LastCephFS in Jewel: Stable at Last
CephFS in Jewel: Stable at LastCeph Community
 
Ceph Day Shanghai - Recovery Erasure Coding and Cache Tiering
Ceph Day Shanghai - Recovery Erasure Coding and Cache TieringCeph Day Shanghai - Recovery Erasure Coding and Cache Tiering
Ceph Day Shanghai - Recovery Erasure Coding and Cache TieringCeph Community
 
Cassandra Day Chicago 2015: DataStax Enterprise & Apache Cassandra Hardware B...
Cassandra Day Chicago 2015: DataStax Enterprise & Apache Cassandra Hardware B...Cassandra Day Chicago 2015: DataStax Enterprise & Apache Cassandra Hardware B...
Cassandra Day Chicago 2015: DataStax Enterprise & Apache Cassandra Hardware B...DataStax Academy
 
Managing Cassandra at Scale by Al Tobey
Managing Cassandra at Scale by Al TobeyManaging Cassandra at Scale by Al Tobey
Managing Cassandra at Scale by Al TobeyDataStax Academy
 
SSD Deployment Strategies for MySQL
SSD Deployment Strategies for MySQLSSD Deployment Strategies for MySQL
SSD Deployment Strategies for MySQLYoshinori Matsunobu
 

Tendances (19)

Highly available, scalable and secure data with Cassandra and DataStax Enterp...
Highly available, scalable and secure data with Cassandra and DataStax Enterp...Highly available, scalable and secure data with Cassandra and DataStax Enterp...
Highly available, scalable and secure data with Cassandra and DataStax Enterp...
 
Journey to Stability: Petabyte Ceph Cluster in OpenStack Cloud
Journey to Stability: Petabyte Ceph Cluster in OpenStack CloudJourney to Stability: Petabyte Ceph Cluster in OpenStack Cloud
Journey to Stability: Petabyte Ceph Cluster in OpenStack Cloud
 
Ceph Day San Jose - Object Storage for Big Data
Ceph Day San Jose - Object Storage for Big Data Ceph Day San Jose - Object Storage for Big Data
Ceph Day San Jose - Object Storage for Big Data
 
Global Azure Virtual 2020 What's new on Azure IaaS for SQL VMs
Global Azure Virtual 2020 What's new on Azure IaaS for SQL VMsGlobal Azure Virtual 2020 What's new on Azure IaaS for SQL VMs
Global Azure Virtual 2020 What's new on Azure IaaS for SQL VMs
 
2016-JAN-28 -- High Performance Production Databases on Ceph
2016-JAN-28 -- High Performance Production Databases on Ceph2016-JAN-28 -- High Performance Production Databases on Ceph
2016-JAN-28 -- High Performance Production Databases on Ceph
 
Walk Through a Software Defined Everything PoC
Walk Through a Software Defined Everything PoCWalk Through a Software Defined Everything PoC
Walk Through a Software Defined Everything PoC
 
Ceph Day Melbourne - Ceph on All-Flash Storage - Breaking Performance Barriers
Ceph Day Melbourne - Ceph on All-Flash Storage - Breaking Performance BarriersCeph Day Melbourne - Ceph on All-Flash Storage - Breaking Performance Barriers
Ceph Day Melbourne - Ceph on All-Flash Storage - Breaking Performance Barriers
 
Raid data recovery Tips
Raid data recovery TipsRaid data recovery Tips
Raid data recovery Tips
 
Raid level 4
Raid level 4Raid level 4
Raid level 4
 
Understanding RAID Controller
Understanding RAID ControllerUnderstanding RAID Controller
Understanding RAID Controller
 
Cassandra Summit 2014: Lesser Known Features of Cassandra 2.1
Cassandra Summit 2014: Lesser Known Features of Cassandra 2.1Cassandra Summit 2014: Lesser Known Features of Cassandra 2.1
Cassandra Summit 2014: Lesser Known Features of Cassandra 2.1
 
NGENSTOR_ODA_HPDA
NGENSTOR_ODA_HPDANGENSTOR_ODA_HPDA
NGENSTOR_ODA_HPDA
 
Ceph Day San Jose - Ceph at Salesforce
Ceph Day San Jose - Ceph at Salesforce Ceph Day San Jose - Ceph at Salesforce
Ceph Day San Jose - Ceph at Salesforce
 
Ceph Day San Jose - All-Flahs Ceph on NUMA-Balanced Server
Ceph Day San Jose - All-Flahs Ceph on NUMA-Balanced Server Ceph Day San Jose - All-Flahs Ceph on NUMA-Balanced Server
Ceph Day San Jose - All-Flahs Ceph on NUMA-Balanced Server
 
CephFS in Jewel: Stable at Last
CephFS in Jewel: Stable at LastCephFS in Jewel: Stable at Last
CephFS in Jewel: Stable at Last
 
Ceph Day Shanghai - Recovery Erasure Coding and Cache Tiering
Ceph Day Shanghai - Recovery Erasure Coding and Cache TieringCeph Day Shanghai - Recovery Erasure Coding and Cache Tiering
Ceph Day Shanghai - Recovery Erasure Coding and Cache Tiering
 
Cassandra Day Chicago 2015: DataStax Enterprise & Apache Cassandra Hardware B...
Cassandra Day Chicago 2015: DataStax Enterprise & Apache Cassandra Hardware B...Cassandra Day Chicago 2015: DataStax Enterprise & Apache Cassandra Hardware B...
Cassandra Day Chicago 2015: DataStax Enterprise & Apache Cassandra Hardware B...
 
Managing Cassandra at Scale by Al Tobey
Managing Cassandra at Scale by Al TobeyManaging Cassandra at Scale by Al Tobey
Managing Cassandra at Scale by Al Tobey
 
SSD Deployment Strategies for MySQL
SSD Deployment Strategies for MySQLSSD Deployment Strategies for MySQL
SSD Deployment Strategies for MySQL
 

En vedette

Introducing MagnetoDB, a key-value storage sevice for OpenStack
Introducing MagnetoDB, a key-value storage sevice for OpenStackIntroducing MagnetoDB, a key-value storage sevice for OpenStack
Introducing MagnetoDB, a key-value storage sevice for OpenStackMirantis
 
Arakoon: A distributed consistent key-value store
Arakoon: A distributed consistent key-value storeArakoon: A distributed consistent key-value store
Arakoon: A distributed consistent key-value storeNicolas Trangez
 
SILT: A Memory-Efficient, High-Performance Key-Value Store
SILT: A Memory-Efficient, High-Performance Key-Value StoreSILT: A Memory-Efficient, High-Performance Key-Value Store
SILT: A Memory-Efficient, High-Performance Key-Value StoreMahdi Atawneh
 
C22 スプリットブレインになっても一貫性を保証するインメモリデータグリッド製品 by Taichi Umeda
C22 スプリットブレインになっても一貫性を保証するインメモリデータグリッド製品 by  Taichi UmedaC22 スプリットブレインになっても一貫性を保証するインメモリデータグリッド製品 by  Taichi Umeda
C22 スプリットブレインになっても一貫性を保証するインメモリデータグリッド製品 by Taichi UmedaInsight Technology, Inc.
 
Pengertian FO (Fiber Optik)
Pengertian FO (Fiber Optik)Pengertian FO (Fiber Optik)
Pengertian FO (Fiber Optik)Febry San
 
Distribute Key Value Store
Distribute Key Value StoreDistribute Key Value Store
Distribute Key Value StoreSantal Li
 
Cassandra 2.1 boot camp, Read/Write path
Cassandra 2.1 boot camp, Read/Write pathCassandra 2.1 boot camp, Read/Write path
Cassandra 2.1 boot camp, Read/Write pathJoshua McKenzie
 
IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...
IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...
IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...In-Memory Computing Summit
 
なぜApache HBaseを選ぶのか? #cwt2013
なぜApache HBaseを選ぶのか? #cwt2013なぜApache HBaseを選ぶのか? #cwt2013
なぜApache HBaseを選ぶのか? #cwt2013Cloudera Japan
 
ceph optimization on ssd ilsoo byun-short
ceph optimization on ssd ilsoo byun-shortceph optimization on ssd ilsoo byun-short
ceph optimization on ssd ilsoo byun-shortNAVER D2
 
Redis on NVMe SSD - Zvika Guz, Samsung
 Redis on NVMe SSD - Zvika Guz, Samsung Redis on NVMe SSD - Zvika Guz, Samsung
Redis on NVMe SSD - Zvika Guz, SamsungRedis Labs
 
09. Memory, Storage (RAM, Cache, HDD, ODD, SSD, Flashdrives)
09. Memory, Storage (RAM, Cache, HDD, ODD, SSD, Flashdrives)09. Memory, Storage (RAM, Cache, HDD, ODD, SSD, Flashdrives)
09. Memory, Storage (RAM, Cache, HDD, ODD, SSD, Flashdrives)Akhila Dakshina
 
5分でわかる Apache HBase 最新版 #hcj2014
5分でわかる Apache HBase 最新版 #hcj20145分でわかる Apache HBase 最新版 #hcj2014
5分でわかる Apache HBase 最新版 #hcj2014Cloudera Japan
 

En vedette (18)

Introducing MagnetoDB, a key-value storage sevice for OpenStack
Introducing MagnetoDB, a key-value storage sevice for OpenStackIntroducing MagnetoDB, a key-value storage sevice for OpenStack
Introducing MagnetoDB, a key-value storage sevice for OpenStack
 
Cassandra for Rails
Cassandra for RailsCassandra for Rails
Cassandra for Rails
 
Arakoon: A distributed consistent key-value store
Arakoon: A distributed consistent key-value storeArakoon: A distributed consistent key-value store
Arakoon: A distributed consistent key-value store
 
SILT: A Memory-Efficient, High-Performance Key-Value Store
SILT: A Memory-Efficient, High-Performance Key-Value StoreSILT: A Memory-Efficient, High-Performance Key-Value Store
SILT: A Memory-Efficient, High-Performance Key-Value Store
 
Gossip事始め
Gossip事始めGossip事始め
Gossip事始め
 
Consistency level
Consistency levelConsistency level
Consistency level
 
C22 スプリットブレインになっても一貫性を保証するインメモリデータグリッド製品 by Taichi Umeda
C22 スプリットブレインになっても一貫性を保証するインメモリデータグリッド製品 by  Taichi UmedaC22 スプリットブレインになっても一貫性を保証するインメモリデータグリッド製品 by  Taichi Umeda
C22 スプリットブレインになっても一貫性を保証するインメモリデータグリッド製品 by Taichi Umeda
 
Cassandra0.7
Cassandra0.7Cassandra0.7
Cassandra0.7
 
Pengertian FO (Fiber Optik)
Pengertian FO (Fiber Optik)Pengertian FO (Fiber Optik)
Pengertian FO (Fiber Optik)
 
Distribute Key Value Store
Distribute Key Value StoreDistribute Key Value Store
Distribute Key Value Store
 
Cassandra3.0
Cassandra3.0Cassandra3.0
Cassandra3.0
 
Cassandra 2.1 boot camp, Read/Write path
Cassandra 2.1 boot camp, Read/Write pathCassandra 2.1 boot camp, Read/Write path
Cassandra 2.1 boot camp, Read/Write path
 
IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...
IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...
IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...
 
なぜApache HBaseを選ぶのか? #cwt2013
なぜApache HBaseを選ぶのか? #cwt2013なぜApache HBaseを選ぶのか? #cwt2013
なぜApache HBaseを選ぶのか? #cwt2013
 
ceph optimization on ssd ilsoo byun-short
ceph optimization on ssd ilsoo byun-shortceph optimization on ssd ilsoo byun-short
ceph optimization on ssd ilsoo byun-short
 
Redis on NVMe SSD - Zvika Guz, Samsung
 Redis on NVMe SSD - Zvika Guz, Samsung Redis on NVMe SSD - Zvika Guz, Samsung
Redis on NVMe SSD - Zvika Guz, Samsung
 
09. Memory, Storage (RAM, Cache, HDD, ODD, SSD, Flashdrives)
09. Memory, Storage (RAM, Cache, HDD, ODD, SSD, Flashdrives)09. Memory, Storage (RAM, Cache, HDD, ODD, SSD, Flashdrives)
09. Memory, Storage (RAM, Cache, HDD, ODD, SSD, Flashdrives)
 
5分でわかる Apache HBase 最新版 #hcj2014
5分でわかる Apache HBase 最新版 #hcj20145分でわかる Apache HBase 最新版 #hcj2014
5分でわかる Apache HBase 最新版 #hcj2014
 

Similaire à CaSSanDra: An SSD Boosted Key-Value Store

BigData as a Platform: Cassandra and Current Trends
BigData as a Platform: Cassandra and Current TrendsBigData as a Platform: Cassandra and Current Trends
BigData as a Platform: Cassandra and Current TrendsMatthew Dennis
 
AWS June Webinar Series - Getting Started: Amazon Redshift
AWS June Webinar Series - Getting Started: Amazon RedshiftAWS June Webinar Series - Getting Started: Amazon Redshift
AWS June Webinar Series - Getting Started: Amazon RedshiftAmazon Web Services
 
Understanding AWS Database Options (DAT201) | AWS re:Invent 2013
Understanding AWS Database Options (DAT201) | AWS re:Invent 2013Understanding AWS Database Options (DAT201) | AWS re:Invent 2013
Understanding AWS Database Options (DAT201) | AWS re:Invent 2013Amazon Web Services
 
Oracle R12 EBS Performance Tuning
Oracle R12 EBS Performance TuningOracle R12 EBS Performance Tuning
Oracle R12 EBS Performance TuningScott Jenner
 
SQL Server Reporting Services Disaster Recovery webinar
SQL Server Reporting Services Disaster Recovery webinarSQL Server Reporting Services Disaster Recovery webinar
SQL Server Reporting Services Disaster Recovery webinarDenny Lee
 
Day 4 - Big Data on AWS - RedShift, EMR & the Internet of Things
Day 4 - Big Data on AWS - RedShift, EMR & the Internet of ThingsDay 4 - Big Data on AWS - RedShift, EMR & the Internet of Things
Day 4 - Big Data on AWS - RedShift, EMR & the Internet of ThingsAmazon Web Services
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
 
M6d cassandrapresentation
M6d cassandrapresentationM6d cassandrapresentation
M6d cassandrapresentationEdward Capriolo
 
Getting Started with Managed Database Services on AWS - September 2016 Webina...
Getting Started with Managed Database Services on AWS - September 2016 Webina...Getting Started with Managed Database Services on AWS - September 2016 Webina...
Getting Started with Managed Database Services on AWS - September 2016 Webina...Amazon Web Services
 
http://www.hfadeel.com/Blog/?p=151
http://www.hfadeel.com/Blog/?p=151http://www.hfadeel.com/Blog/?p=151
http://www.hfadeel.com/Blog/?p=151xlight
 
SQL Server Reporting Services Disaster Recovery Webinar
SQL Server Reporting Services Disaster Recovery WebinarSQL Server Reporting Services Disaster Recovery Webinar
SQL Server Reporting Services Disaster Recovery WebinarDenny Lee
 
Best Practices for Supercharging Cloud Analytics on Amazon Redshift
Best Practices for Supercharging Cloud Analytics on Amazon RedshiftBest Practices for Supercharging Cloud Analytics on Amazon Redshift
Best Practices for Supercharging Cloud Analytics on Amazon RedshiftSnapLogic
 
Ops Jumpstart: MongoDB Administration 101
Ops Jumpstart: MongoDB Administration 101Ops Jumpstart: MongoDB Administration 101
Ops Jumpstart: MongoDB Administration 101MongoDB
 
Selecting the Right AWS Database Solution - AWS 2017 Online Tech Talks
Selecting the Right AWS Database Solution - AWS 2017 Online Tech TalksSelecting the Right AWS Database Solution - AWS 2017 Online Tech Talks
Selecting the Right AWS Database Solution - AWS 2017 Online Tech TalksAmazon Web Services
 
DAT202_Getting started with Amazon Aurora
DAT202_Getting started with Amazon AuroraDAT202_Getting started with Amazon Aurora
DAT202_Getting started with Amazon AuroraAmazon Web Services
 
Storage Systems for High Scalable Systems Presentation
Storage Systems for High Scalable Systems PresentationStorage Systems for High Scalable Systems Presentation
Storage Systems for High Scalable Systems Presentationandyman3000
 
Extreme Availability using Oracle 12c Features: Your very last system shutdown?
Extreme Availability using Oracle 12c Features: Your very last system shutdown?Extreme Availability using Oracle 12c Features: Your very last system shutdown?
Extreme Availability using Oracle 12c Features: Your very last system shutdown?Toronto-Oracle-Users-Group
 
Five Lessons in Distributed Databases
Five Lessons  in Distributed DatabasesFive Lessons  in Distributed Databases
Five Lessons in Distributed Databasesjbellis
 

Similaire à CaSSanDra: An SSD Boosted Key-Value Store (20)

BigData as a Platform: Cassandra and Current Trends
BigData as a Platform: Cassandra and Current TrendsBigData as a Platform: Cassandra and Current Trends
BigData as a Platform: Cassandra and Current Trends
 
AWS June Webinar Series - Getting Started: Amazon Redshift
AWS June Webinar Series - Getting Started: Amazon RedshiftAWS June Webinar Series - Getting Started: Amazon Redshift
AWS June Webinar Series - Getting Started: Amazon Redshift
 
Understanding AWS Database Options (DAT201) | AWS re:Invent 2013
Understanding AWS Database Options (DAT201) | AWS re:Invent 2013Understanding AWS Database Options (DAT201) | AWS re:Invent 2013
Understanding AWS Database Options (DAT201) | AWS re:Invent 2013
 
Oracle R12 EBS Performance Tuning
Oracle R12 EBS Performance TuningOracle R12 EBS Performance Tuning
Oracle R12 EBS Performance Tuning
 
SQL Server Reporting Services Disaster Recovery webinar
SQL Server Reporting Services Disaster Recovery webinarSQL Server Reporting Services Disaster Recovery webinar
SQL Server Reporting Services Disaster Recovery webinar
 
Day 4 - Big Data on AWS - RedShift, EMR & the Internet of Things
Day 4 - Big Data on AWS - RedShift, EMR & the Internet of ThingsDay 4 - Big Data on AWS - RedShift, EMR & the Internet of Things
Day 4 - Big Data on AWS - RedShift, EMR & the Internet of Things
 
Clustering van IT-componenten
Clustering van IT-componentenClustering van IT-componenten
Clustering van IT-componenten
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
M6d cassandrapresentation
M6d cassandrapresentationM6d cassandrapresentation
M6d cassandrapresentation
 
Getting Started with Managed Database Services on AWS - September 2016 Webina...
Getting Started with Managed Database Services on AWS - September 2016 Webina...Getting Started with Managed Database Services on AWS - September 2016 Webina...
Getting Started with Managed Database Services on AWS - September 2016 Webina...
 
http://www.hfadeel.com/Blog/?p=151
http://www.hfadeel.com/Blog/?p=151http://www.hfadeel.com/Blog/?p=151
http://www.hfadeel.com/Blog/?p=151
 
SQL Server Reporting Services Disaster Recovery Webinar
SQL Server Reporting Services Disaster Recovery WebinarSQL Server Reporting Services Disaster Recovery Webinar
SQL Server Reporting Services Disaster Recovery Webinar
 
Best Practices for Supercharging Cloud Analytics on Amazon Redshift
Best Practices for Supercharging Cloud Analytics on Amazon RedshiftBest Practices for Supercharging Cloud Analytics on Amazon Redshift
Best Practices for Supercharging Cloud Analytics on Amazon Redshift
 
disertation
disertationdisertation
disertation
 
Ops Jumpstart: MongoDB Administration 101
Ops Jumpstart: MongoDB Administration 101Ops Jumpstart: MongoDB Administration 101
Ops Jumpstart: MongoDB Administration 101
 
Selecting the Right AWS Database Solution - AWS 2017 Online Tech Talks
Selecting the Right AWS Database Solution - AWS 2017 Online Tech TalksSelecting the Right AWS Database Solution - AWS 2017 Online Tech Talks
Selecting the Right AWS Database Solution - AWS 2017 Online Tech Talks
 
DAT202_Getting started with Amazon Aurora
DAT202_Getting started with Amazon AuroraDAT202_Getting started with Amazon Aurora
DAT202_Getting started with Amazon Aurora
 
Storage Systems for High Scalable Systems Presentation
Storage Systems for High Scalable Systems PresentationStorage Systems for High Scalable Systems Presentation
Storage Systems for High Scalable Systems Presentation
 
Extreme Availability using Oracle 12c Features: Your very last system shutdown?
Extreme Availability using Oracle 12c Features: Your very last system shutdown?Extreme Availability using Oracle 12c Features: Your very last system shutdown?
Extreme Availability using Oracle 12c Features: Your very last system shutdown?
 
Five Lessons in Distributed Databases
Five Lessons  in Distributed DatabasesFive Lessons  in Distributed Databases
Five Lessons in Distributed Databases
 

Plus de Tilmann Rabl

TPC-DI - The First Industry Benchmark for Data Integration
TPC-DI - The First Industry Benchmark for Data IntegrationTPC-DI - The First Industry Benchmark for Data Integration
TPC-DI - The First Industry Benchmark for Data IntegrationTilmann Rabl
 
Crafting bigdatabenchmarks
Crafting bigdatabenchmarksCrafting bigdatabenchmarks
Crafting bigdatabenchmarksTilmann Rabl
 
Big Data Benchmarking Tutorial
Big Data Benchmarking TutorialBig Data Benchmarking Tutorial
Big Data Benchmarking TutorialTilmann Rabl
 
A BigBench Implementation in the Hadoop Ecosystem
A BigBench Implementation in the Hadoop EcosystemA BigBench Implementation in the Hadoop Ecosystem
A BigBench Implementation in the Hadoop EcosystemTilmann Rabl
 
MADES - A Multi-Layered, Adaptive, Distributed Event Store
MADES - A Multi-Layered, Adaptive, Distributed Event StoreMADES - A Multi-Layered, Adaptive, Distributed Event Store
MADES - A Multi-Layered, Adaptive, Distributed Event StoreTilmann Rabl
 
Rapid Development of Data Generators Using Meta Generators in PDGF
Rapid Development of Data Generators Using Meta Generators in PDGFRapid Development of Data Generators Using Meta Generators in PDGF
Rapid Development of Data Generators Using Meta Generators in PDGFTilmann Rabl
 
Solving Big Data Challenges for Enterprise Application Performance Management
Solving Big Data Challenges for Enterprise Application Performance ManagementSolving Big Data Challenges for Enterprise Application Performance Management
Solving Big Data Challenges for Enterprise Application Performance ManagementTilmann Rabl
 

Plus de Tilmann Rabl (7)

TPC-DI - The First Industry Benchmark for Data Integration
TPC-DI - The First Industry Benchmark for Data IntegrationTPC-DI - The First Industry Benchmark for Data Integration
TPC-DI - The First Industry Benchmark for Data Integration
 
Crafting bigdatabenchmarks
Crafting bigdatabenchmarksCrafting bigdatabenchmarks
Crafting bigdatabenchmarks
 
Big Data Benchmarking Tutorial
Big Data Benchmarking TutorialBig Data Benchmarking Tutorial
Big Data Benchmarking Tutorial
 
A BigBench Implementation in the Hadoop Ecosystem
A BigBench Implementation in the Hadoop EcosystemA BigBench Implementation in the Hadoop Ecosystem
A BigBench Implementation in the Hadoop Ecosystem
 
MADES - A Multi-Layered, Adaptive, Distributed Event Store
MADES - A Multi-Layered, Adaptive, Distributed Event StoreMADES - A Multi-Layered, Adaptive, Distributed Event Store
MADES - A Multi-Layered, Adaptive, Distributed Event Store
 
Rapid Development of Data Generators Using Meta Generators in PDGF
Rapid Development of Data Generators Using Meta Generators in PDGFRapid Development of Data Generators Using Meta Generators in PDGF
Rapid Development of Data Generators Using Meta Generators in PDGF
 
Solving Big Data Challenges for Enterprise Application Performance Management
Solving Big Data Challenges for Enterprise Application Performance ManagementSolving Big Data Challenges for Enterprise Application Performance Management
Solving Big Data Challenges for Enterprise Application Performance Management
 

Dernier

Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 

Dernier (20)

Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 

CaSSanDra: An SSD Boosted Key-Value Store

  • 1. UNIVERSITY OF TORONTO UNIVERSITY OF TORONTO Fighting back: Using observability tools to improve the DBMS (not just diagnose it) Ryan Johnson UNIVERSITY OF TORONTO UNIVERSITY OF TORONTO Fighting back: Using observability tools to improve the DBMS (not just diagnose it) Ryan Johnson MIDDLEWARE SYSTEMS RESEARCH GROUP MSRG.ORG CaSSanDra:  An  SSD   Boosted  Key-­‐Value  Store Prashanth  Menon,  Tilmann  Rabl,  Mohammad  Sadoghi  (*),   Hans-­‐Arno  Jacobsen !1 *
  • 2. UNIVERSITY OF TORONTO UNIVERSITY OF TORONTO Fighting back: Using observability tools to improve the DBMS (not just diagnose it) Ryan Johnson MIDDLEWARE SYSTEMS RESEARCH GROUP MSRG.ORG Outline • ApplicaHon  Performance  Management   • Cassandra  and  SSDs   • Extending  Cassandra’s  Row  Cache   • ImplemenHng  a  Dynamic  Schema  Catalogue   • Conclusions !2
  • 3. UNIVERSITY OF TORONTO UNIVERSITY OF TORONTO Fighting back: Using observability tools to improve the DBMS (not just diagnose it) Ryan Johnson MIDDLEWARE SYSTEMS RESEARCH GROUP MSRG.ORG Modern  Enterprise  Architecture • Many  different  soPware  systems   • Complex  interacHons   • Stateful  systems  oPen  distributed/parHHoned/replicated   • Stateless  systems  certainly  duplicated !3
  • 4. UNIVERSITY OF TORONTO UNIVERSITY OF TORONTO Fighting back: Using observability tools to improve the DBMS (not just diagnose it) Ryan Johnson MIDDLEWARE SYSTEMS RESEARCH GROUP MSRG.ORG ApplicaHon  Performance  Management • Lightweight  agent  aSached  to  each  soPware  system  instance   • Monitors  system  health   • Traces  transacHons   • Determines  root  causes   • Raw  APM  metric: !4 Agent Agent Agent Agent Agent Agent AgentAgent Agent Agent Agent Agent
  • 5. UNIVERSITY OF TORONTO UNIVERSITY OF TORONTO Fighting back: Using observability tools to improve the DBMS (not just diagnose it) Ryan Johnson MIDDLEWARE SYSTEMS RESEARCH GROUP MSRG.ORG ApplicaHon  Performance  Management • Problem:  Agents  have  short  memory  and  only  have  a  local  view   • What  was  the  average  response  Hme  for  requests  served  by  servlet  X   between  December  18-­‐31  2011?   • What  was  the  average  Hme  spent  in  each  service/database  to  respond   to  client  requests? !5 Agent Agent Agent Agent Agent Agent AgentAgent Agent Agent Agent Agent
  • 6. UNIVERSITY OF TORONTO UNIVERSITY OF TORONTO Fighting back: Using observability tools to improve the DBMS (not just diagnose it) Ryan Johnson MIDDLEWARE SYSTEMS RESEARCH GROUP MSRG.ORG APM  Metrics  Datastore • All  agents  store  metric  data  in  high  write-­‐throughput  datastore   • Metric  data  is  at  a  fine  granularity  (per-­‐acHon,  millisecond  etc)   • User  now  has  global  view  of  metrics   • What  is  the  best  database  to  store  APM  metrics? !6 Agent Agent Agent Agent Agent Agent AgentAgent Agent Agent Agent Agent ?
  • 7. UNIVERSITY OF TORONTO UNIVERSITY OF TORONTO Fighting back: Using observability tools to improve the DBMS (not just diagnose it) Ryan Johnson MIDDLEWARE SYSTEMS RESEARCH GROUP MSRG.ORG Cassandra  Wins  APM • APM  experiments  performed  by  Rabl  et  al.  [1]    show  Cassandra  performs   best  for  APM  use  case   • In  memory  workloads  including  95%,  50%  and  5%  read   • Workloads  requiring  disk  access  with  95%,  50%  and  5%  reads !7 Read: 95% 0 50000 100000 150000 200000 250000 2 4 6 8 10 12 Throughput(Ops/sec) Number of Nodes Cassandra HBase Voldemort VoltDB Redis MySQL Figure 6: Throughput for Workload RW 0.1 1 10 100 1000 2 4 6 8 10 12 Latency(ms)-Logarithmic Number of Nodes Cassandra HBase Voldemort VoltDB Redis MySQL Read: 50% 0 20000 40000 60000 80000 100000 120000 140000 160000 180000 2 4 6 8 10 12 Throughput(Operations/sec) Number of Nodes Cassandra HBase Voldemort VoltDB Redis MySQL Figure 3: Throughput for Workload R million records per node, thus, scaling the problem size with the cluster size. For each run, we used a freshly installed system and loaded the data. We ran the workload for 10 minutes with max- imum throughput. Figure 3 shows the maximum throughput for workload R for all six systems. In the experiment with only one node, Redis has the highest throughput (more than 50K ops/sec) followed by VoltDB. There are no significant differences between the throughput of Cassan- dra and MySQL, which is about half that of Redis (25K ops/sec). Voldemort is 2 times slower than Cassandra (with 12K ops/sec). The slowest system in this test on a single node is HBase with 2.5K operation per second. However, it is interesting to observe that the 0.1 1 10 100 2 4 6 8 10 12 Latency(ms)-Logarithmic Number of Nodes Cassandra HBase Voldemort VoltDB Redis MySQL Figure 4: Read latency for Workload R 0.01 0.1 1 10 100 2 4 6 8 10 12 Latency(ms)-Logarithmic Number of Nodes Cassandra HBase Voldemort VoltDB Redis MySQL Figure 5: Write latency for Workload R [1] http://msrg.org/publications/pdf_files/2012/vldb12-bigdata-Solving_Big_Data_Challenges_fo.pdf
• 8. Cassandra
  • Built at Facebook by previous Dynamo engineers
  • Open sourced to Apache in 2009
  • DHT with consistent hashing
    • MD5 hash of key
    • Multiple nodes handle segments of the ring for load balancing
  • Dynamo distribution and replication model + BigTable storage model
  (Diagram: storage engine with Commit Log, Memtable, and SSTables.)
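To make ring placement concrete, below is a minimal sketch of MD5-based consistent hashing in the spirit of Cassandra's RandomPartitioner; the class and method names are illustrative, not Cassandra's actual API.

    import java.math.BigInteger;
    import java.security.MessageDigest;
    import java.util.SortedMap;
    import java.util.TreeMap;

    // Each node owns the ring segment ending at its token; a key is placed
    // on the first node clockwise from its MD5-derived token.
    class HashRing {
        private final SortedMap<BigInteger, String> ring = new TreeMap<>();

        void addNode(String node, BigInteger token) {
            ring.put(token, node);
        }

        String nodeFor(String key) throws Exception {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            BigInteger token = new BigInteger(1, md5.digest(key.getBytes("UTF-8")));
            // first token at or after the key's token, wrapping around the ring
            SortedMap<BigInteger, String> tail = ring.tailMap(token);
            return tail.isEmpty() ? ring.get(ring.firstKey()) : tail.get(tail.firstKey());
        }
    }

Adding a node simply splits an existing ring segment, which is how load is rebalanced across nodes.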
• 9. Cassandra and SSDs
  • Improve performance by either adding nodes or improving per-node performance
  • Node performance is directly dependent on the disk I/O performance of the system
  • Cassandra stores two entities on disk:
    • Commit Log
    • SSTables
  • Should SSDs be used to store both?
  • We evaluated each possible configuration
• 10. Experiment Setup
  • Server specification:
    • 2x Intel 8-core X5450, 16GB RAM, 2x 2TB RAID0 HDD, 2x 250GB Intel x520 SSD
  • Apache Cassandra 1.10
  • Used the YCSB benchmark (see the configuration sketch below)
    • 100M rows, 50GB total raw data, 'latest' distribution
    • 95% read, 5% write
  • Minimum three runs per workload, fresh data on each run
  • Broken into phases:
    • Data load
    • Fragmentation
    • Cache warm-up
    • Workload (> 12h process)
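A YCSB core-workload configuration approximating this setup might look as follows; this is a sketch, and everything beyond the slide's stated record count, read/write mix, and distribution is an assumption.

    # Sketch of a YCSB workload file approximating the setup above
    workload=com.yahoo.ycsb.workloads.CoreWorkload
    recordcount=100000000        # 100M rows
    readproportion=0.95          # 95% reads
    updateproportion=0.05        # 5% writes
    requestdistribution=latest   # skew reads toward recently written keys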
• 11. SSD vs. HDD
  • Location of log is irrelevant
  • Location of data is important
  • Dramatic performance improvement of SSD over HDD
  • SSD benefits from high parallelism

  Configuration | # of clients | # of threads/client | Location of Data | Location of Commit Log
  C1 | 1 | 2 | RAID (HDD) | RAID (HDD)
  C2 | 1 | 2 | RAID (HDD) | SSD
  C3 | 1 | 2 | SSD | RAID (HDD)
  C4 | 1 | 2 | SSD | SSD
  C5 | 4 | 16 | RAID (HDD) | RAID (HDD)
  C6 | 4 | 16 | SSD | SSD

  (Fig. 4: throughput and latency results for configurations C1-C6: (a) HDD vs SSD throughput; (b) HDD vs SSD latency; (c) throughput on 99%-filled disks.)
• 12. SSD vs. HDD (II)
  • SSD offers more than 7x improvement to throughput on an empty disk
  • SSD performance degrades by half as the storage device fills up
  • Filling the SSD or running it near capacity is not advisable
  (Fig. 4(c)/(d): throughput and latency for HDD vs SSD, empty disk vs. 99%-full disk.)
• 13. SSD vs. HDD: Summary
  • Cassandra benefits most when storing data on SSD (not the log)
    • Location of the commit log is not important
  • SSD performance is inversely proportional to fill ratio
  • Storing all data on SSD is uneconomical
    • Replacing a 3TB HDD with 3x 1TB SSDs is 10x more costly
    • SSDs have a limited lifetime (10-50K write-erase cycles), so they must be replaced more frequently
    • Rabl et al. [1] show that adding a node costs 100% more for a 100% throughput improvement
  • Build a hybrid system to get comparable performance for marginal cost
• 14. Cassandra: Read + Write Path
  • Write path is fast:
    1. Write update into commit log
    2. Write update into Memtable
  • Memtables flush to SSTables asynchronously when full
    • Never blocks writes
  • Read path can be slow:
    1. Read key-value from Memtable
    2. Read key-value from each SSTable on disk
    3. Construct merged view of row from each input source
  • Each read needs to do O(# of SSTables) I/O (see the sketch below)
  (Diagram: updates flow into the log and Memtable in memory; reads consult the Memtable and every SSTable on disk.)
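A minimal sketch of that merged read, assuming rows can be modeled as plain column-to-value maps; the classes here are illustrative, not Cassandra's internal API.

    import java.util.*;

    class ReadPath {
        // rowKey -> (column -> value); one map for the memtable and
        // one per on-disk SSTable, ordered oldest to newest
        Map<String, Map<String, String>> memtable = new HashMap<>();
        List<Map<String, Map<String, String>>> sstables = new ArrayList<>();

        Map<String, String> read(String rowKey) {
            Map<String, String> merged = new HashMap<>();
            // apply fragments oldest-first so newer values overwrite older ones;
            // each SSTable probe is a potential disk I/O: O(# of SSTables) total
            for (Map<String, Map<String, String>> sst : sstables) {
                Map<String, String> frag = sst.get(rowKey);
                if (frag != null) merged.putAll(frag);
            }
            // the memtable holds the most recent updates, so it is applied last
            Map<String, String> recent = memtable.get(rowKey);
            if (recent != null) merged.putAll(recent);
            return merged;
        }
    }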
• 15. Cassandra: SSTables
  • Cassandra allows blind-writes
  • Row data can be fragmented over multiple SSTables over time
  (Example: a single employee row, with Employee ID 99231234, First Name Prashanth, Last Name Menon, Age 25, Department ID MSRG, whose columns are spread across several SSTables.)
  • Bloom filters and indexes can potentially help (sketched below)
  • Ultimately, multiple fragments need to be read from disk
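For intuition, here is a toy per-SSTable Bloom filter, assuming two simple hash functions and a fixed bit-array size; real implementations derive the array size and hash count from a target false-positive rate.

    import java.util.BitSet;

    // A negative answer proves the key is absent from an SSTable, letting the
    // read skip it and save a disk I/O; a positive answer may be a false
    // positive and still costs the probe.
    class BloomFilter {
        private static final int SIZE = 1 << 20;
        private final BitSet bits = new BitSet(SIZE);

        private int h1(String key) { return (key.hashCode() & 0x7fffffff) % SIZE; }
        private int h2(String key) { return ((key + "#").hashCode() & 0x7fffffff) % SIZE; }

        void add(String key) { bits.set(h1(key)); bits.set(h2(key)); }

        boolean mightContain(String key) { return bits.get(h1(key)) && bits.get(h2(key)); }
    }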
• 16. Cassandra: Row Cache
  • Row cache buffers the full merged row in memory
  • Cache miss follows the regular read path, constructs the merged row, brings it into the cache
  • Makes the read path faster for frequently accessed data
  • Problem: the row cache occupies memory
    • Takes away precious memory from the rest of the system
  • Extend the row cache efficiently onto SSD
  (Diagram: read path with an in-memory row cache in front of the Memtable and SSTables.)
• 17. Extended Row Cache
  • Extend the row cache onto SSD
    • Chained with the in-memory row cache
    • LRU in-memory, overflow onto LRU SSD row cache
  • Implemented as append-only cache files
    • Efficient sequential writes
    • Fast random reads
  • Zero I/O for a hit in the first-level row cache
  • One random I/O on SSD for the second-level row cache (see the sketch below)
  (Diagram: 1st-level row cache and 2nd-level cache index in memory, 2nd-level row cache on SSD, log and SSTables on disk.)
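A minimal sketch of the chained cache, under the assumption that the SSD level can be modeled as a second map (standing in for the append-only cache files plus their in-memory index); all names here are illustrative.

    import java.util.LinkedHashMap;
    import java.util.Map;

    class TwoLevelRowCache {
        private final int ramCapacity, ssdCapacity;
        private final Map<String, byte[]> ssd;   // stand-in for SSD cache files
        private final Map<String, byte[]> ram;

        TwoLevelRowCache(int ramCapacity, int ssdCapacity) {
            this.ramCapacity = ramCapacity;
            this.ssdCapacity = ssdCapacity;
            // access-order LinkedHashMaps give LRU behavior at both levels
            this.ssd = new LinkedHashMap<>(16, 0.75f, true) {
                protected boolean removeEldestEntry(Map.Entry<String, byte[]> e) {
                    return size() > TwoLevelRowCache.this.ssdCapacity; // drop LRU row
                }
            };
            this.ram = new LinkedHashMap<>(16, 0.75f, true) {
                protected boolean removeEldestEntry(Map.Entry<String, byte[]> e) {
                    if (size() > TwoLevelRowCache.this.ramCapacity) {
                        ssd.put(e.getKey(), e.getValue()); // overflow to second level
                        return true;
                    }
                    return false;
                }
            };
        }

        byte[] get(String rowKey) {
            byte[] row = ram.get(rowKey);              // zero I/O on a first-level hit
            if (row == null) {
                row = ssd.remove(rowKey);              // one random SSD read in reality
                if (row != null) ram.put(rowKey, row); // promote back to memory
            }
            return row;                                // null => fall back to read path
        }

        void put(String rowKey, byte[] mergedRow) { ram.put(rowKey, mergedRow); }
    }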
• 18. Evaluation: SSD Row Cache
  • Setup:
    • 100M rows, 50GB total data, 6GB row cache
  • Results:
    • 75% improvement in throughput
    • 75% improvement in latency
    • A RAM-only cache achieves too low a hit ratio
  (Fig. 5(a)/(b): throughput and latency at 95%, 85%, and 75% reads with the row cache disabled, RAM-only, and RAM+SSD.)
• 19. Dynamic Schema
  • Key-value stores covet a schema-less data model
    • Very flexible, good for highly varying data
    • Schemas often change; defining them up front can be detrimental
  • Observation: many big data applications have relatively stable schemas
    • e.g., click streams, APM, sensor data, etc.
  • Redundant schemas have significant overhead in I/O and space usage
  (Figure: the on-disk format repeats column names such as Metric Name, Timestamp, Value, Max, Min in every record, while the application format stores them once per row.)
• 20. Dynamic Schema (III)
  • Don't serialize the redundant schema with rows
  • Extract the schema from the data, store it on SSD, and serialize a schema ID with the data (see the sketch below)
  • Allows for a large number of schemas
  (Figure: the old disk format repeats column names in every record; the new format stores schemas S1, S2 once in a schema catalogue on SSD and prefixes each record with its schema ID.)
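A minimal sketch of such a schema catalogue: each distinct column list is interned once and rows carry only a small integer schema ID. The class is illustrative; in the real system the catalogue itself would be persisted on SSD.

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    class SchemaCatalogue {
        private final Map<List<String>, Integer> idsBySchema = new HashMap<>();
        private final List<List<String>> schemasById = new ArrayList<>();

        synchronized int intern(List<String> columnNames) {
            List<String> schema = List.copyOf(columnNames);
            Integer id = idsBySchema.get(schema);
            if (id == null) {
                id = schemasById.size();           // next schema ID: S1, S2, ...
                schemasById.add(schema);
                idsBySchema.put(schema, id);
            }
            return id;
        }

        List<String> schema(int id) { return schemasById.get(id); } // for reads
    }

With this in place, a metric row such as (HostA/AgentX/AVGResponse, 1332988833, 4, 6, 1) is serialized as its schema ID plus values only, instead of repeating every column name.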
• 21. Evaluation: Dynamic Schema
  • Setup:
    • 40M rows, variable columns 5-10 (638 schemas), 6GB row cache
  • Results:
    • 10% reduction in disk usage (6.8GB vs 6GB)
    • Savings grow with the number of columns and the length of column names
    • Slightly improved throughput, stable latency
    • Effective SSD usage (only random reads) and reduced I/O and space usage
  (Fig. 5(c)/(d): regular vs. dynamic schema throughput and latency at 95%, 50%, and 5% reads.)
• 22. Conclusions
  • Storing Cassandra commit logs on SSD doesn't help
  • Running SSDs at capacity degrades their performance
  • Using SSDs as a secondary row cache dramatically improves performance
  • Extracting redundant schemas onto an SSD reduces disk space usage and required I/O
• 23. Thanks!
  • Questions?
  • Contact:
    • Prashanth Menon (prashanth.menon@utoronto.ca)
• 24. Future Work
  • What types of tables benefit most from a dynamic schema?
  • Impact of compaction on read-heavy workloads
  • How can SSDs be used to improve the performance of compaction?
  • How does performance change when storing only SSTable indexes on SSD?