Publicité

When it all goes wrong | PGConf EU 2019 | Will Leinweber

24 Oct 2019
Publicité

Contenu connexe

Publicité

Plus de Citus Data(20)

Publicité

When it all goes wrong | PGConf EU 2019 | Will Leinweber

  1. When it all Goes Wrong
  2. @leinweber Will Leinweber @leinweber Citus Data (Microsoft) bitfission.com
 (warning autoplays midi)
  3. @leinweber coming from citus cloud heroku postgres
  4. @leinweber special thanks citus cloud
 — dan farina (@danfarina) heroku postgres
 — maciek sakrejda (@uhoh_itsmaciek)
  5. @leinweber same sorts of problems from pages & alerts from support tickets
  6. @leinweber this talk more app dev who uses postgres
 rather than dba
  7. @leinweber the problem with Postgres it’s pretty good you don’t get experience with how it breaks
  8. @leinweber what to do for a problem
  9. @leinweber what to do for a problem
  10. @leinweber complicated system network hardware o/s postgres
  11. @leinweber using the database (too much) 95% application 4% auto vacuum 1% everything else
  12. @leinweber hard to convince all the graphs saying DB is slow and nothing has changed …must be the database!
  13. @leinweber https://upload.wikimedia.org/wikipedia/commons/9/98/Survivorship-bias.png
  14. @leinweber “but I didn’t change anything” no deploys! no database migrations! no scaling!
  15. @leinweber “but I didn’t change anything” https://upload.wikimedia.org/wikipedia/commons/0/09/Redherring.gif
  16. @leinweber “but I didn’t change anything” more traffic? change in access patterns? one big user logged in?
  17. @leinweber run out of a resource
  18. @leinweber snowball
  19. @leinweber example manageable user 1s query => 2x expensive frequent, small queries 3ms => 12ms
  20. @leinweber assumptions hardware maintenance app
  21. @leinweber assumptions postgres should not crash …with overcommit off and no containers large extensions increase chance
  22. @leinweber if not postgres, then what
  23. @leinweber system resources cpu memory disk parallelism / backends locks
  24. @leinweber cpu mem disk parallelismcpu mem disk parallelism
  25. @leinweber cpu mem disk parallelism credentials wrong networking broken locking issue, check pg_locks idle in transaction
  26. @leinweber cpu mem disk parallelism application submitting backlogged workload connection leak pool sizes set too large pg_lock issue + application backlog
  27. @leinweber cpu mem disk parallelism workload skew causing thrashing unusual sequential scan workload failover or restart => no cache pg_prewarm
  28. @leinweber cpu mem disk parallelism same as just disk, but also the application is piling on
  29. @leinweber cpu mem disk parallelism large GROUP BYs high disk latency due to unusual page dispersion pattern in the workload
  30. @leinweber cpu mem disk parallelism workload has high mem (GROUP BY)
 + app adding backlog lock contention slowing mem release
  31. @leinweber cpu mem disk parallelism large GROUP BYs + paging in unusual data
  32. @leinweber cpu mem disk parallelism Look for what is causing disk access
  33. @leinweber cpu mem disk parallelism small, in-memory workload lots of seq scans on small table index scan w/ filter dropping lots
  34. @leinweber cpu mem disk parallelism app backlog 
 + too much processing on small data simply a lot of work
  35. @leinweber cpu mem disk parallelism large seq scans
  36. @leinweber cpu mem disk parallelism loading cold data + application backlog
  37. @leinweber cpu mem disk parallelism small # of backends doing a lot more work
  38. @leinweber cpu mem disk parallelism entity, workload, entity*workload soft deletes and non-conditional indexes
  39. @leinweber cpu mem disk parallelism reporting query
  40. @leinweber cpu mem disk parallelism app backlog, but with CPU/mem problems
  41. @leinweber tools of the trade
  42. @leinweber tools of the trade C symbols
  43. @leinweber tools of the trade: perf perf record -p <pid> && perf report
  44. @leinweber tools of the trade: perf perf top
  45. @leinweber tools of the trade: perf www.brendangregg.com/perf.html
  46. @leinweber tools of the trade: gdb gdb -batch -ex 'bt' -p <pid>
  47. @leinweber
  48. @leinweber
  49. @leinweber tools of the trade: iostat iostat -xm 10
  50. @leinweber tools of the trade: iotop
  51. @leinweber tools of the trade: htop
  52. @leinweber Tools of the trade: bwm-ng
  53. @leinweber tools of the trade: backends pgrep -lf postgres + grep + wc select * from pg_stat_activity
  54. @leinweber tools of the trade: pg_s_s select * from pg_stat_statements
  55. @leinweber tools of the trade: summary cpu mem disk parallelism network perf x gdb x iostat x iotop x htop x x bwm x pgrep x
  56. @leinweber what to do
  57. @leinweber what to do configuration change
  58. @leinweber what to do db change
  59. @leinweber what to do code change
  60. @leinweber flirting with disaster Velocity NY 2013: Richard Cook
 "Resilience In Complex Adaptive Systems” Jens Rasmussen:
 Risk management in a dynamic society: a modeling problem
  61. @leinweber flirting with disaster economic boundary
  62. @leinweber flirting with disaster economic boundary workload boundary
  63. @leinweber flirting with disaster economic boundary workload boundary performance boundary
  64. @leinweber flirting with disaster economic boundary workload boundary performance boundary error margin
  65. @leinweber flirting with disaster economic boundary workload boundary performance boundary
  66. @leinweber flirting with disaster economic boundary workload boundary performance boundary error margin
  67. @leinweber flirting with disaster economic boundary workload boundary performance boundary error margin
  68. @leinweber flirting with disaster Velocity NY 2013: Richard Cook
 "Resilience In Complex Adaptive Systems” Jens Rasmussen: 
 Risk management in a dynamic society: a modeling problem
  69. @leinweber thank youWill Leinweber @leinweber citusdata.com
Publicité