
OpenZFS Postgres tuning recommendations

Tl;dr - tips for running Postgres on (Open)ZFS-on-Linux (ZoL), which I'm nicknaming "PoZoL": check out the TOC (or skip to the resource list at the bottom) and browse to whatever looks interesting. This post is a shameless ripoff/reformulation of everything I've seen so far across conference presentations, the internet, Reddit, forum posts and other places about how to run Postgres on ZFS reliably and with good performance. After lots of feedback on Reddit (thanks /u/BucketOfSpinningRust!) and doing some more experimenting and digging, I've updated this post with more information - new/updated sections are marked "Update".

I've been doing a lot of tinkering with my storage setup (deciding on mdraid +/- LVM vs ZFS for direct disk access, Ceph vs Longhorn for distributed storage, etc.) since I'm gearing up to offer Postgres as a Service on NimbusWS, but I thought I should share some of the information I've managed to gather. As you might imagine, this post is not exhaustive - there are great companies out there that can take your Postgres installation to the next level. They've almost certainly got lots of hard-won secret-sauce tunables from years of experience, so obviously you're going to want to pay them first if you need problems diagnosed or sage advice ASAP. If you're still here, what we're going to be talking about is running Postgres on (Open)ZFS-on-Linux; I'm going to call the effort "PoZoL".

While the default recordsize in ZFS is 128k, a recordsize of 8k exactly matches Postgres's page size. This is great for Postgres since it cuts down on unnecessary writes (especially to solid-state disks), and it gets even better with the compression made available by ZFS.
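For example (a sketch, not a prescription - the pool name "tank", the dataset name "pgdata", and the lz4 compression choice are all placeholders/assumptions on my part), a dedicated dataset for the Postgres data directory could be created like this:

$> zfs create -o recordsize=8k -o compression=lz4 tank/pgdata
$> zfs get recordsize,compression tank/pgdata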

One thing I misunderstood about this (and thankfully was corrected by /u/mercenary_sysadmin on Reddit) is that while the default recordsize of 128k can be considered an "upper bound", random I/O in large files is done in increments of recordsize. He's laid it out pretty nicely, so I'll copy his comment here:

Recordsize is only "an upper bound" in the sense that small files get small records. For any workload which does random I/O inside larger files, recordsize is exactly the increment which the random I/O happens in. If you set recordsize=1M as you originally did, but then test with e.g. blocksize=4K, you get an amplification factor of 256x... and, even worse, you get a read-modify-write cycle when re-writing blocks. Let's say you ask fio to do a bs=4K run when recordsize=1M: when fio wants to write 4K of new data to the middle of a testfile, it first has to read in 1M of data, modify 4K of it, then write all 1M back out again. If you re-run the test on a dataset with recordsize=4K, there is no RMW and no amplification - you ask fio to write 4K of data to the middle of the testfile, it writes 4K of data, done.
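To make that concrete, here's roughly what such a fio run looks like (the dataset name, mountpoint and file size here are made up for illustration):

$> zfs create -o recordsize=1M tank/fiotest
$> fio --name=rmw-test --directory=/tank/fiotest --rw=randwrite --bs=4k --size=1G --ioengine=psync
$> zfs set recordsize=4k tank/fiotest    # only affects newly written files, so delete and re-create the test file before re-running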

ZFS also changes the calculus on one Postgres setting in particular. Consider the Postgres full_page_writes parameter. Tomas Vondra has written about it in the past as a necessity to prevent WAL corruption due to partial writes. The WAL is used for both streaming replication and crash recovery, so its integrity is of utmost importance. As a result, this is one parameter almost everyone should leave alone.
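If you're curious where your own cluster stands, the current value (it defaults to on) is easy to check:

$> psql -c "SHOW full_page_writes;"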

On ZFS, however, every write is copy-on-write: a page can't be partially written without reverting to the previous copy, so it's not possible to have a torn page. This means we can actually turn off full_page_writes in the Postgres config. The results are some fairly startling performance gains:

$> pgbench -j 32 -c 32 -M prepared -T 10 pgbench
tps = 10325.200812 (including connections establishing)
tps = 10336.807218 (excluding connections establishing)
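If you do decide to flip it - and to be clear, that's only safe on a filesystem that genuinely can't produce torn pages, which is the claim being made about ZFS here - full_page_writes only needs a config reload rather than a restart, so one way is:

$> psql -c "ALTER SYSTEM SET full_page_writes = off;"
$> psql -c "SELECT pg_reload_conf();"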

Due to the write amplification caused by full page writes, Postgres produced 1.2GB of WAL files during a 1-minute pgbench test, but only 160MB with full page writes disabled. To be fair, a 32-thread pgbench write test is extremely abusive and certainly not a typical usage scenario. Still, ZFS just ensured our storage sees a much lower write load by altering one single parameter, which means the capabilities of the hardware have also been extended to higher write workloads, since I/O bandwidth is no longer being consumed by WAL traffic.
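If you want to measure WAL volume on your own setup, one approach on PostgreSQL 10 or newer (LSN_BEFORE and LSN_AFTER below are placeholders for the two values you note down) is to diff the WAL position before and after a timed run:

$> psql -c "SELECT pg_current_wal_lsn();"    # note this LSN before the run
$> pgbench -j 32 -c 32 -M prepared -T 60 pgbench
$> psql -c "SELECT pg_current_wal_lsn();"    # and again afterwards
$> psql -c "SELECT pg_size_pretty(pg_wal_lsn_diff('LSN_AFTER', 'LSN_BEFORE'));"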
