PostgreSQL on EXT4, XFS, BTRFS and ZFS

A comparison of how PostgreSQL performs on current Linux file systems - ext4, XFS, BTRFS and ZFS, with pgbench and (a subset of) TPC-DS.

Published in: Software

  1. PostgreSQL on EXT3/4, XFS, BTRFS and ZFS
     comparing modern (Linux) file systems
     Tomas Vondra <tomas@2ndquadrant.com>
  2. Linux file systems
     ● plenty of choices, with different
       – goals, features, tuning options
       – maturity level, reliability
     ● ext3/4, XFS
       – traditional, design from the 90s
       – improving over time, reasonably “modern”
     ● BTRFS, ZFS
       – next-generation, new architecture / design
     ● other (not included in this talk)
       – log-organized file systems, distributed, clustered, ...
  3. EXT3, EXT4, XFS
  4. EXT3, EXT4, XFS - history
     ● ext3 (2001) / ext4 (2008)
       – evolution of original Linux filesystem (ext, ext2, ...)
       – continuous improvements / fixes
     ● XFS (2002)
       – originally from SGI Irix 5.3 (1994)
       – 2000 released under GPL
       – 2002 merged into 2.5.36
     ● both are
       – reliable journaling file systems
       – proven by time on many deployments
  5. EXT3, EXT4, XFS - features
     ● traditional design with journal
     ● not handling
       – multiple devices
       – volume management
       – snapshots
       – ...
     ● need additional layers for those things
       – hardware RAID
       – software RAID (dm)
       – LVM / LVM2
  6. EXT3, EXT4, XFS - evolution
     ● conceived in times of rotational storage
       – mostly work with SSD
       – stop-gap for future storage (NVRAM, ...)
     ● evolution, not a revolution (mostly)
       – fixing bugs (some real, some imaginary)
       – adding features (e.g. TRIM, barriers, ...)
       – scalability improvements (metadata, ...)
       – be careful when reading old articles / benchmarks
       – be wary of anecdotal evidence (without context)
       – synthetic benchmarks are misleading
  7. EXT3, EXT4, XFS - sources
     ● Linux Filesystems: Where did they come from? (Dave Chinner @ linux.conf.au 2014)
       https://www.youtube.com/watch?v=SMcVdZk7wV8
     ● Ted Ts'o on the ext4 Filesystem (Ted Ts'o, NYLUG, 2013)
       https://www.youtube.com/watch?v=2mYDFr5T4tY
     ● XFS: There and Back … and There Again? (Dave Chinner @ Vault 2015)
       https://lwn.net/Articles/638546/
     ● XFS: Recent and Future Adventures in Filesystem Scalability (Dave Chinner, linux.conf.au 2012)
       https://www.youtube.com/watch?v=FegjLbCnoBw
     ● XFS: the filesystem of the future? (Jonathan Corbet, Dave Chinner, LWN, 2012)
       http://lwn.net/Articles/476263/
  8. BTRFS, ZFS
  9. BTRFS, ZFS - goals
     ● ideas
       – integrate the layers
       – design for commodity hardware (expect failures)
       – design for huge data volumes
     ● so that we get …
       – flexible management
       – built-in snapshotting
       – compression, deduplication
       – checksums
       – ...
  10. BTRFS, ZFS - history
      ● BTRFS
        – merged in 2009, but considered “experimental”
        – on-disk format “stable” (1.0)
        – some claim it’s “stable” but I doubt that …
        – (What are the criteria for filesystem to be “stable”?)
      ● ZFS
        – originally from Solaris, but got Oracled :-(
        – today a bit fragmented development
        – available on other BSD systems (FreeBSD)
        – “ZFS on Linux” project (CDDL vs. GPL)
  11. Tuning options
  12. Generic tuning options
      ● TRIM (discard)
        – enable / disable TRIM on SSDs
        – impacts garbage collection / wear leveling
      ● write barriers
        – prevent disk from optimizing order of writes
        – still may lose data, but no filesystem corruption
        – write cache + battery => disable barriers
      ● SSD alignment
        – alignment on SSDs matters (pages, blocks, …)
        – no dedicated tuning options (can use stripe unit / width)
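To check which of these options a file system is actually mounted with, the options column of `/proc/mounts` can be inspected. The following is a minimal sketch, not part of the talk: the function names, the sample line and the mount point `/mnt/pgdata` are all hypothetical, and it only looks for the option spellings used in these slides (`discard`, `nobarrier`, `noatime`).

```python
def mount_options(mounts_text, mount_point):
    """Return the set of mount options for mount_point, or None if not found."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts columns: device, mount point, fs type, options, dump, pass
        if len(fields) >= 4 and fields[1] == mount_point:
            return set(fields[3].split(","))
    return None


def check_tuning(options):
    """Report which of the generic tuning options are present."""
    return {
        "trim_online": "discard" in options,
        "barriers_disabled": "nobarrier" in options or "barrier=0" in options,
        "noatime": "noatime" in options,
    }


# Hypothetical sample line; on a real system read open("/proc/mounts").read()
sample = "/dev/sda1 /mnt/pgdata ext4 rw,noatime,discard,nobarrier,stripe=256 0 0\n"
opts = mount_options(sample, "/mnt/pgdata")
print(check_tuning(opts))
```

The sketch returns the first matching entry only, which is enough for a single data directory mounted once.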
  13. BTRFS / ZFS tuning options
      ● nodatacow (BTRFS)
        – disable copy on write
        – still can do snapshots (will do necessary COW)
        – disables checksums (needs full COW)
      ● zfs_arc_max (ZFS)
        – limit the size of ARC cache
        – should be released automatically, but ...
  14. ZFS tuning options
      ● recordsize=8kB
        – match the fs page with PostgreSQL page
      ● ashift=13 (8kB)
        – align the writes to SSD pages
      ● primarycache=metadata
        – prevent double buffering (shared buffers)
      http://open-zfs.org/wiki/Performance_tuning
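The arithmetic behind these two values can be sketched as follows. `ashift` is the base-2 logarithm of the sector size ZFS assumes, so 13 corresponds to 8192 bytes, and ZFS's default `recordsize` of 128kB spans sixteen 8kB PostgreSQL pages. The helper names are hypothetical, and the 8kB page size is PostgreSQL's default build setting, assumed here rather than stated in the slides.

```python
PG_PAGE_BYTES = 8192  # assumption: PostgreSQL built with the default 8kB block size


def ashift_for(sector_bytes):
    """Return the ashift (log2 of the sector size) for a power-of-two size."""
    if sector_bytes <= 0 or sector_bytes & (sector_bytes - 1):
        raise ValueError("sector size must be a power of two")
    return sector_bytes.bit_length() - 1


def pages_per_record(recordsize_bytes):
    """Number of PostgreSQL pages that fit into one ZFS record."""
    return recordsize_bytes // PG_PAGE_BYTES


print(ashift_for(8192))              # ashift for 8kB SSD pages
print(pages_per_record(128 * 1024))  # ZFS default recordsize
print(pages_per_record(8 * 1024))    # recordsize=8k
```

With `recordsize=8k` each record holds exactly one page, so rewriting a single page does not involve a larger record.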
  15. file systems
  16. ● ext3 (default)
        – default
      ● ext4
        – default
        – discard, nobarrier, stripe-width
      ● xfs
        – default
        – LVM
        – LVM + snapshot
        – discard, nobarrier
        – discard, nobarrier, agcount, sunit/swidth
  17. ● btrfs
        – default
        – nodatacow
        – nodiscard (+fstrim)
      ● zfs
        – default
        – recordsize=8k, ashift=13, primarycache=metadata (open-zfs)
        – recordsize=8k, ashift=13, zfs_arc_max=5GB (custom)
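On ZFS on Linux the ARC limit is a kernel module parameter given in bytes, so the 5GB cap of the custom configuration has to be converted first. A minimal sketch, assuming "5GB" means 5 GiB and that the setting is made persistent through a modprobe options line (for example in `/etc/modprobe.d/zfs.conf`); the helper name is hypothetical and the slides do not say how the limit was applied.

```python
GIB = 1024 ** 3


def arc_max_modprobe_line(limit_gib):
    """Build a modprobe options line capping the ZFS ARC at limit_gib GiB."""
    return f"options zfs zfs_arc_max={int(limit_gib * GIB)}"


print(arc_max_modprobe_line(5))
```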
  18. benchmarks
  19. pgbench (TPC-B)
      ● transactional benchmark
        – small queries (access by PK, ...)
      ● modes
        – read-only
        – read-write
      ● scales
        – small (~200MB)
        – medium (~50% RAM)
        – large (~200% RAM)
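The three data set sizes translate into pgbench scale factors roughly as sketched below. This is an illustration under stated assumptions, not the script used for the talk: the 15MB-per-scale-unit figure is a common rule of thumb, the 8GB of RAM matches the benchmark machine, and the client count, run duration and database name `bench` are hypothetical defaults. The pgbench flags themselves (`-i`, `-s`, `-c`, `-j`, `-T`, `-S`) are standard.

```python
import math

MB_PER_SCALE_UNIT = 15  # assumption: approximate database size per scale unit


def scale_for_size(target_mb):
    """Estimate the pgbench scale factor needed to reach target_mb."""
    return max(1, math.ceil(target_mb / MB_PER_SCALE_UNIT))


def init_command(scale, dbname="bench"):
    """Command line that initializes the pgbench tables at the given scale."""
    return ["pgbench", "-i", "-s", str(scale), dbname]


def run_command(read_only, clients=4, seconds=300, dbname="bench"):
    """Command line for a timed run; -S selects the read-only script."""
    cmd = ["pgbench", "-c", str(clients), "-j", str(clients), "-T", str(seconds)]
    if read_only:
        cmd.append("-S")
    return cmd + [dbname]


ram_mb = 8 * 1024  # 8GB RAM on the test system
targets = {"small": 200, "medium": ram_mb // 2, "large": ram_mb * 2}
scales = {name: scale_for_size(mb) for name, mb in targets.items()}
print(scales)
print(" ".join(init_command(scales["medium"])))
print(" ".join(run_command(read_only=True)))
```

Without `-S`, pgbench runs its default TPC-B-like script, which corresponds to the read-write mode.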
  20. TPC-DS
      ● warehouse, analytical
        – large amounts of data
        – queries processing a lot of data
      ● complex queries
        – aggregations
        – joins
        – CTEs
        – …
      ● successor to TPC-H
        – more elaborate / realistic
  21. System
      ● PostgreSQL 9.4.1
      ● Gentoo with kernel 3.17
      ● CPU: Intel i5-2500k
        – 4 cores @ 3.3 GHz (3.7GHz)
        – 6MB cache
        – 2011-2013
      ● 8GB RAM (DDR3 1333)
      ● SSD Intel S3500 100GB (SATA)
  22. pgbench read-only
  23. [Chart: pgbench / small (150MB) / read-only, transactions per second, for btrfs, btrfs-nodatacow, btrfs-nodiscard-fstrim, ext3, ext4, ext4-discard-nobarrier-stripe, xfs, xfs-discard-lvm-snapshot, xfs-discard-nobarrier, xfs-lvm, xfs-tuned-agcount-su-sw, zfs, zfs-tuned, zfs-tuned-2]
  24. [Chart: pgbench / medium (50% RAM) / read-only, transactions per second, for btrfs, btrfs-nodatacow, btrfs-nodiscard-fstrim, ext3, ext4, ext4-discard-nobarrier-stripe, xfs, xfs-discard-lvm-snapshot, xfs-discard-nobarrier, xfs-lvm, xfs-tuned-agcount-su-sw, zfs, zfs-tuned, zfs-tuned-2]
  25. [Chart: pgbench / large (200% RAM) / read-only, transactions per second, for btrfs, btrfs-nodatacow, btrfs-nodiscard-fstrim, ext3, ext4, ext4-discard-lvm-snapshot, ext4-discard-nobarrier-stripe, xfs, xfs-discard-lvm-snapshot, xfs-discard-nobarrier, xfs-lvm, xfs-tuned-agcount-su-sw, zfs, zfs-tuned, zfs-tuned-2]
  26. pgbench read-write
  27. [Chart: pgbench / small (150MB) / read-write, transactions per second, for btrfs, btrfs-nodatacow, btrfs-nodiscard-fstrim, ext3, ext4, ext4-discard-nobarrier-stripe, xfs, xfs-discard-lvm-snapshot, xfs-discard-nobarrier, xfs-lvm, xfs-tuned-agcount-su-sw, zfs, zfs-tuned, zfs-tuned-2]
  28. [Chart: pgbench / medium (50% RAM) / read-write, transactions per second, for btrfs, btrfs-nodatacow, btrfs-nodiscard-fstrim, ext3, ext4, ext4-discard-nobarrier-stripe, xfs, xfs-discard-lvm-snapshot, xfs-discard-nobarrier, xfs-lvm, xfs-tuned-agcount-su-sw, zfs, zfs-tuned, zfs-tuned-2]
  29. [Chart: pgbench / large (200% RAM) / read-write, transactions per second, for btrfs, btrfs-nodatacow, btrfs-nodiscard-fstrim, ext3, ext4, ext4-discard-lvm-snapshot, ext4-discard-nobarrier-stripe, xfs, xfs-discard-lvm-snapshot, xfs-discard-nobarrier, xfs-lvm, xfs-tuned-agcount-su-sw, zfs, zfs-tuned, zfs-tuned-2]
  30. performance variability
  31. EXT / XFS conclusions
      EXT4
      ● good “default” choice
      ● disable barriers (with protected write cache)
      ● tune alignment to match the SSD
      ● very “smooth” results
      XFS
      ● does not outperform ext4 (in this test)
      ● not much worse, if properly tuned
      ● disable write barriers, tune alignment to SSD
      ● more anomalies than ext4 (sudden performance drops, ...)
  32. BTRFS & ZFS
  33. TPC-DS
  34. mkfs / mount options
      ● ext4, xfs
        – mkfs.ext4 -E stripe-width=256 /dev/sda1
        – mkfs.xfs -d su=512k,sw=1 -l su=512k -f /dev/sda1
        – mount: defaults,noatime,discard,nobarrier
      ● btrfs
        – mkfs.btrfs -l 8192 -L pgdata /dev/sda1
        – mount: defaults,noatime,ssd,discard,nobarrier [compress=lzo]
      ● zfs
        – zpool create pgpool /dev/sda1
        – zfs create pgpool/pgdata
        – zfs set recordsize=8k pgpool/pgdata
        – zfs set atime=off pgpool/pgdata
  35. [Chart: TPC-DS load duration on EXT4, XFS, BTRFS and ZFS; duration in seconds, split into data and indexes, for ext4, xfs, btrfs, btrfs (lzo), zfs, zfs (lz4)]
  36. [Chart: TPC-DS query performance on EXT4, XFS, BTRFS and ZFS; duration in seconds, for ext4, xfs, btrfs, btrfs (lzo), zfs, zfs (lz4)]
  37. [Chart: TPC-DS space used on EXT4, XFS, BTRFS and ZFS; size in GB, for ext4, xfs, btrfs, btrfs (lzo), zfs, zfs (lz4)]
  38. TPC-DS summary
      ● EXT4, XFS, BTRFS
        – about the same performance
      ● compression is nice
        – uncompressed: 60GB
        – compressed: ~30GB
      ● mostly storage capacity, queries not faster
      ● ZFS much slower :-(
