Database vacuuming is a way to increase the table and database performance of a PostgreSQL database. Learn how to use the vacuum process to clean the database.


Introduction

VACUUM is a garbage collection mechanism in PostgreSQL. It reclaims storage occupied by dead tuples. In normal PostgreSQL operation, tuples that are modified by an update/delete are not physically removed from their table; they remain present until a VACUUM is done. Therefore it's necessary to do VACUUM periodically, especially on tables. 

For more information, see https://www.postgresql.org/docs/current/sql-vacuum.html

Vacuuming basics

Vacuuming achieves the following:

  • Cleans out dead rows (VACUUM)

  • Keeps database functional (FREEZE)

  • Updates info about relations (ANALYZE)

  • Automates the vacuum for maintenance on a regular basis (autovacuum)

From a broad perspective, the vacuuming can be done in two ways:·      

  • Plain VACUUM (without FULL) simply reclaims space and makes it available for reuse

  • VACUUM FULL can reclaim space and give it to the operating system.

Key vacuum operations

  • VACUUM

  • VACUUM FULL (or CLUSTER)

  • VACUUM FREEZE

  • VACUUM ANALYZE


  1. Plain VACUUM: Frees up space for reuse.

    Connect to your database, here for example, "corpus" and run the below command:

    vacuum [tablename];
    CODE
  2. Full VACUUM: Locks the database table and puts the exclusive lock on the table but reclaims more space than a plain VACUUM. 

    Connect to your database, here for example, "corpus" and run the below command. It runs on all the tables of the "corpus" database.

    vacuum full;
    CODE

    If you want to run vacuum full on a particular table, use the below command:

    vacuum full [tablename];
    
    CODE
  3. Full VACUUM and ANALYZE:  Performs a Full VACUUM and gathers new statistics on query executions paths using ANALYZE:

    vacuum full analyze [tablename];
    CODE
  4. Verbose Full VACUUM and ANALYZE:  Same as #3 but with verbose progress output:

    vacuum full analyze verbose [tablename];
    
    CODE

    ANALYZE

    ANALYZE gathers statistics for query planners to create the most efficient query execution paths. According to the PostgreSQL documentation, accurate statistics help planners to choose the most appropriate query plan, and thereby improve the speed of query processing.

    Example

    In the example below, [tablename] is optional. Without a table specified, ANALYZE is run on available tables in the current schema that the user has access to.

    analyze verbose [tablename];
    
    CODE
  5. VACUUM FREEZE:  Freezes the transaction IDs for all pages no matter whether they have been modified, so that all current rows are seen as old for all new transactions. VACUUM FREEZE marks a table's contents with a specific transaction time stamp that tells Postgres that it does not need to be vacuumed, ever. With the next update, this frozen ID disappears.

    To run VACUUM FREEZE on a database, use the below command after connecting to the database:

    vacuum freeze verbose
    CODE

    To run VACUUM Full on a particular table, use the below command after connecting to the database:

    vacuum freeze [tablename];
    CODE

Automate routine vacuum maintenance

Autovacuum is one of the background utility processes that starts automatically when you start PostgreSQL. Autovacuum in Postgres refers to both automatic VACUUM and ANALYZE and not just VACUUM:

  • We need VACUUM to remove dead tuples so that the space occupied by dead tuples can be reused by the table for future inserts/updates. 

  • We need ANALYZE on the table that updates the table statistics so that the optimizer can choose optimal execution plans for an SQL statement.

It is the autovacuum in Postgres that is responsible for performing both vacuum and analyze on tables.

Prerequisites

To start autovacuum, you must have the parameter autovacuum set to ON in the postgresql.conf file. By default, it is on.

Parameters

autovacuum = on  # ( ON by default )

track_counts = on # ( ON by default )

When autovacuum runs

An automatic vacuum or analyze runs on a table depending on the following mathematic equations.

The formula for calculating the effective table level autovacuum threshold is:

Autovacuum VACUUM threshold for a table = autovacuum_vacuum_scale_factor * number of tuples + autovacuum_vacuum_threshold

With the equation above, it is clear that if the actual number of dead tuples in a table exceeds this effective threshold, due to updates and deletes, that table becomes a candidate for automatic VACUUM. 

The below equation says that any table with a total number of inserts/deletes/updates exceeding this threshold—since last analyze—is eligible for automatic ANALYZE. 

Autovacuum ANALYZE threshold for a table = autovacuum_analyze_scale_factor * number of tuples + autovacuum_analyze_threshold

Details

Let’s understand these parameters in detail.

  • autovacuum_vacuum_scale_factor OR autovacuum_analyze_scale_factor:  Fraction of the table records that will be added to the formula. For example, a value of 0.2 equals 20% of the table records.

  • autovacuum_vacuum_threshold OR autovacuum_analyze_threshold:  Minimum number of obsolete records or DML needed to trigger an autovacuum.

Example

Let’s consider a table "abc" with 1000 records and the following autovacuum parameters in the postgresql.conf file by default:

1

autovacuum_vacuum_scale_factor = 0.2

2

autovacuum_vacuum_threshold = 50

3

autovacuum_analyze_scale_factor = 0.1

4

autovacuum_analyze_threshold = 50


Using the above mentioned mathematical formulas as reference:

  • Table "abc" is a candidate for autovacuum VACUUM when:

    Total number of Obsolete records = (0.2 * 1000) + 50 = 250

  • Table "abc" is a candidate for autovacuum ANALYZE when:

    Total number of Inserts/Deletes/Updates = (0.1 * 1000) + 50 = 150

Useful commands

check autovacuum status

By default, autovacuuming should already be turned on, but let’s double-check with below command:

SELECT name, setting FROM pg_settings WHERE name='autovacuum';
CODE

check track_counts status

You can check if the statistics collector is enabled by consulting the "Runtime Statistics" section of your postgresql.conf configuration file to see if track_counts is on, or by running the following query:

SELECT name, setting FROM pg_settings WHERE name='track_counts';
CODE

check settings related to autovacuum

You can view the current and default settings of the autovacuum parameters by the pg_settings:

select name, setting, unit, category, short_desc from pg_settings where category like 'Autovacuum';

check for disk space use

The following query shows you the table that is using the most disk space in your database:

SELECT
       relname AS "table_name",
       pg_size_pretty(pg_table_size(C.oid)) AS "table_size"
FROM
       pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE nspname NOT IN ('pg_catalog', 'information_schema') 
AND nspname !~ '^pg_toast' AND relkind IN ('r')
ORDER BY pg_table_size(C.oid)
DESC LIMIT 1;
CODE

check for last (auto)vacuum run

The built-in view pg_stat_user_tables enables you to find out the last time a vacuuming or autovacuuming process successfully ran on each of your tables, connect your "corpus" database and run this command:

SELECT relname, last_vacuum, last_autovacuum FROM pg_stat_user_tables;
CODE

check for disk space reclaim after full vacuum

First, connect your PostgreSQL database with superuser and create the extension "pgstattuple". After this, run the select command with your <tablename>:

\c corpus postgres;
CREATE EXTENSION pgstattuple;
SELECT (dead_tuple_len + approx_free_space)/1073741824.0 
AS reclaimable FROM pgstattuple_approx('<tablename>');
CODE

tune autovacuum for tables

To tune autovacuum for tables individually, you must know the number of inserts/deletes/updates on a table for an interval. You can also view the Postgres catalog view to get that information: pg_stat_user_tables

SELECT n_tup_ins as "inserts",n_tup_upd as "updates",n_tup_del as "deletes", 
n_live_tup as "live_tuples", n_dead_tup as "dead_tuples" 
FROM pg_stat_user_tables;
CODE

So autovacuum is triggered when below formula meets:

n_dead_tup > n_live_top * 0.2

check for running vaccum tasks

Use this view to get a quick overview of running vacuum tasks:

SELECT * FROM pg_stat_progress_vacuum;
CODE

The final query has a bit more fields but, at the same time, it’s more informative than the default content of pg_stat_progress_vacuum. Run it to connect with your <database> by psql.

SELECT
p.pid,
now() - a.xact_start AS duration,
coalesce(wait_event_type ||'.'|| wait_event, 'f') AS waiting,
CASE
WHEN a.query ~*'^autovacuum.*to prevent wraparound' THEN 'wraparound'
WHEN a.query ~*'^vacuum' THEN 'user'
ELSE 'regular'
END AS mode,
p.datname AS database,
p.relid::regclass AS table,
p.phase,
pg_size_pretty(p.heap_blks_total * 
current_setting('block_size')::int) AS table_size,
pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
pg_size_pretty(p.heap_blks_scanned *
current_setting('block_size')::int) AS scanned,
pg_size_pretty(p.heap_blks_vacuumed * 
current_setting('block_size')::int) AS vacuumed,
round(100.0 * p.heap_blks_scanned / p.heap_blks_total, 1) AS scanned_pct,
round(100.0 * p.heap_blks_vacuumed / p.heap_blks_total, 1) AS vacuumed_pct,
p.index_vacuum_count,
round(100.0 * p.num_dead_tuples / p.max_dead_tuples,1) AS dead_pct
FROM pg_stat_progress_vacuum p
JOIN pg_stat_activity a using (pid)
ORDER BY now() - a.xact_start DESC;
CODE