Postgres UPDATE and Autovacuum

2024-08-12

A common misconception of developers who haven't learned much about SQL databases, Postgres specifically, is that updating a row would modifies and overwrite the data on disk directly. It doesn't.

An UPDATE is just a INSERT + DELETE. Postgres will create a completely new version of the row with the modified data, and DELETE the old row. Which also doesn't actually remove the data right away, but only marks the row as "to be deleted".

The reason for is is because Postgres uses a non-locking concurrency method called Multi-version concurrency control. When a row is being updated, other queries can still read the old version of the row. This allow the user to run SELECT queries even when another UPDATE or DELETE query are being executed on the same row, and avoid the issue of seeing a half-written, inconsistent piece of data.

When are these old rows removed?

It's obvious that we do not want to keep them forever. Eventually there need to be a process to actually delete these deleted data. In Postgres it's called VACUUM, and the process to routinely run it is called autovacuum, which is enabled by default. And it's absolutely crucial that you keep it on.

Two main jobs of a VACUUM operation is:

Do I need to modify autovacuum settings?

By default, autovacuum should works decently. But there can be some problems which may require your attention:

The configuration of autovacuum can be seen from the settings:

SELECT *
FROM pg_settings
WHERE name ~ 'autovacuum';

We are interested in these 2 settings which control when a table needs VACUUM : autovacuum_vacuum_scale_factor and autovacuum_vacuum_threshold. By default, these are set to 0.2 and 50 respectively, meaning that VACUUM will be called when at least 20% of the table size belongs to dead tuples, and there are more than 50 of them.

This works but when you have a large table with

So if you are having some issues with autovacuum, I would probably start looking at these settings.