In this article we will explore how the column order of a materialized view can have a big impact on the performance of running REFRESH MATERIALIZED VIEW CONCURRENTLY.
All execution times are taken from my 2019 MacBook Pro with a 6-core Intel i7 and 16GB RAM running Postgres 17. I have also observed the same behaviour on Postgres instances running in the cloud.
My Postgres settings are the defaults (installed via homebrew from petere/postgresql), except work_mem, which is set to 512MB.
The data
Let’s create a simple table and fill it with some data:
CREATE TABLE numbers (
a INT NOT NULL,
b INT NOT NULL
);
INSERT INTO numbers
SELECT
num AS a,
num % 100 as b
FROM generate_series(1, 1000000) AS num;
Table numbers now contains 1 000 000 rows. In each row, column a contains a number from 1 to 1 000 000 and column b contains the same number modulo 100 (you can think of b as containing the last 2 digits of a).
The important thing here is that column a contains 1 000 000 unique values while column b contains only 100 distinct values (0 - 99).
We will get to why this is important later.
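If you want to double-check those cardinalities, you can ask Postgres directly (a quick sanity check, not part of anything that follows):

```sql
-- Count the distinct values in each column
SELECT count(DISTINCT a) AS distinct_a,
       count(DISTINCT b) AS distinct_b
FROM numbers;
-- distinct_a | distinct_b
-- 1000000    | 100
```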
The problem
Let's create 2 materialized views and a unique index on each of them (required for REFRESH MATERIALIZED VIEW CONCURRENTLY):
CREATE MATERIALIZED VIEW numbers_mv_ab AS (
SELECT
a, b
FROM numbers
);
CREATE UNIQUE INDEX numbers_mv_ab_uniq ON numbers_mv_ab (a);
CREATE MATERIALIZED VIEW numbers_mv_ba AS (
SELECT
b, a
FROM numbers
);
CREATE UNIQUE INDEX numbers_mv_ba_uniq ON numbers_mv_ba (a);
The views contain basically identical data; the only difference is the column order: the first view has (a, b) and the second has (b, a). Creating both views takes the same amount of time (about 1s).
Now we will refresh the views concurrently. Note that no changes were made to the data in the numbers table, so the data in the views will actually stay the same.
We would expect the refreshes to take the same amount of time, right? Wrong:
REFRESH MATERIALIZED VIEW CONCURRENTLY numbers_mv_ab; -- Takes ~3s
REFRESH MATERIALIZED VIEW CONCURRENTLY numbers_mv_ba; -- Takes ~11s
So refreshing the view with column order (b, a) takes almost 4x longer than the view with (a, b).
The basic takeaway from this is that if you want to (possibly) speed up your materialized view refreshes, make sure to put columns with more unique values first in the query that defines your materialized view.
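If you are unsure which columns of an existing table have the most unique values, the planner statistics can tell you. A quick sketch (this assumes the table has been ANALYZEd; n_distinct is an estimate, and a negative value means a fraction of the row count rather than an absolute number):

```sql
-- Estimated number of distinct values per column, from the planner statistics
SELECT attname, n_distinct
FROM pg_stats
WHERE tablename = 'numbers';
```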
As a side note: I tested creating different indexes (different columns and/or column orders) on the views and didn't see any correlation with the refresh speeds in this example.
If you just want to speed up your views and don't care about why this happens, then you can probably stop reading here, but I encourage you to keep reading ;)
The reason
First of all, I should clarify that a materialized view is basically a table that you can't directly edit (INSERT/UPDATE/DELETE), but internal Postgres code can still edit it as if it were a normal table.
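You can see this for yourself by trying to modify a view directly; Postgres refuses (this is the error my Postgres 17 produces, the exact wording may vary by version):

```sql
INSERT INTO numbers_mv_ab VALUES (0, 0);
-- ERROR:  cannot change materialized view "numbers_mv_ab"
```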
So I dug into Postgres source code to see what actually happens when you run a concurrent refresh of a materialized view.
In short:
1. The query that defines the view is executed and the resulting data is loaded into a temp table, let's call this table tmp
2. ANALYZE tmp; is run
3. tmp is checked for duplicate rows; if there are any, an error is raised and the refresh is cancelled
4. A temp table to store the difference between the new and old data is created, let's call it diff
5. diff is filled with the differences between tmp and the old data in the materialized view; the contents of this table now say which rows should be inserted into / deleted from the materialized view
6. Rows are inserted into / deleted from the materialized view based on the contents of diff
All of these steps are performed by executing actual SQL queries, so we can reproduce each step by running the same queries ourselves. I did exactly that and figured out that the culprit of the slow refresh is step number 3: checking tmp for duplicate rows.
So let's replicate the process for the view with column order (a, b) up to step 3:
-- Step 1:
CREATE TEMP TABLE tmp AS (
SELECT
a, b
FROM numbers
);
-- Step 2:
ANALYZE tmp;
-- Step 3:
SELECT
newdata.*::tmp
FROM tmp newdata
WHERE
newdata.* IS NOT NULL
AND EXISTS (
SELECT
1
FROM tmp newdata2
WHERE
newdata2.* IS NOT NULL
AND newdata2.* OPERATOR(pg_catalog.*=) newdata.*
AND newdata2.ctid OPERATOR(pg_catalog.<>) newdata.ctid
);
The query performs a self-join on the table, joining all rows that have the same values in all columns, no nulls, and different ctids.
In this example the query from step 3 takes ~1.7s to execute.
The process is almost identical for the view with column order (b, a)
, we just need to change the column order in the query in the first step:
-- Step 1:
CREATE TEMP TABLE tmp AS (
SELECT
b, a -- Different column order
FROM numbers
);
-- Step 2:
ANALYZE tmp;
-- Step 3:
SELECT
newdata.*::tmp
FROM tmp newdata
WHERE
newdata.* IS NOT NULL
AND EXISTS (
SELECT
1
FROM tmp newdata2
WHERE
newdata2.* IS NOT NULL
AND newdata2.* OPERATOR(pg_catalog.*=) newdata.*
AND newdata2.ctid OPERATOR(pg_catalog.<>) newdata.ctid
);
If we execute this script (on a different connection so the table names don’t collide) the query in step 3 takes ~10s.
But why does this happen?
Let's look at the EXPLAIN ANALYZE output of the query for each view.
The query plans
Fast:
Merge Semi Join (cost=227207.15..86859369.65 rows=497500 width=32) (actual time=2064.456..2064.457 rows=0 loops=1)
Merge Cond: (newdata.* *= newdata2.*)
Join Filter: (newdata2.ctid <> newdata.ctid)
Rows Removed by Join Filter: 1000000
-> Sort (cost=113603.58..116091.08 rows=995000 width=38) (actual time=594.277..652.592 rows=1000000 loops=1)
Sort Key: newdata.* USING *<
Sort Method: quicksort Memory: 79264kB
-> Seq Scan on tmp newdata (cost=0.00..14480.00 rows=995000 width=38) (actual time=0.055..292.684 rows=1000000 loops=1)
Filter: (newdata.* IS NOT NULL)
-> Sort (cost=113603.58..116091.08 rows=995000 width=38) (actual time=595.846..654.321 rows=1000000 loops=1)
Sort Key: newdata2.* USING *<
Sort Method: quicksort Memory: 79264kB
-> Seq Scan on tmp newdata2 (cost=0.00..14480.00 rows=995000 width=38) (actual time=0.028..280.065 rows=1000000 loops=1)
Filter: (newdata2.* IS NOT NULL)
Planning Time: 0.098 ms
Execution Time: 2078.938 ms
Slow:
Merge Semi Join (cost=227207.15..86859369.65 rows=497500 width=32) (actual time=10330.515..10330.516 rows=0 loops=1)
Merge Cond: (newdata.* *= newdata2.*)
Join Filter: (newdata2.ctid <> newdata.ctid)
Rows Removed by Join Filter: 1000000
-> Sort (cost=113603.58..116091.08 rows=995000 width=38) (actual time=4495.189..4676.238 rows=1000000 loops=1)
Sort Key: newdata.* USING *<
Sort Method: quicksort Memory: 79264kB
-> Seq Scan on tmp newdata (cost=0.00..14480.00 rows=995000 width=38) (actual time=0.035..278.498 rows=1000000 loops=1)
Filter: (newdata.* IS NOT NULL)
-> Sort (cost=113603.58..116091.08 rows=995000 width=38) (actual time=4225.981..4402.503 rows=1000000 loops=1)
Sort Key: newdata2.* USING *<
Sort Method: quicksort Memory: 79264kB
-> Seq Scan on tmp newdata2 (cost=0.00..14480.00 rows=995000 width=38) (actual time=0.028..272.560 rows=1000000 loops=1)
Filter: (newdata2.* IS NOT NULL)
Planning Time: 0.164 ms
Execution Time: 10345.674 ms
The query plans and cost estimates are identical, but there is a huge time difference in the sort nodes. Sort Key: newdata.* USING *< means the rows get sorted using the values of all columns, taking the columns in the order they are defined in the table. In our case that is (a, b) for the fast query and (b, a) for the slow query.
So the fast query sorts the rows by a and then uses b as a tie-breaker in case multiple rows have the same value in a. But the values in a are unique, so the rows are already fully ordered by comparing only the values in a, and Postgres never even has to compare the values in column b.
The slow query, on the other hand, sorts the rows by b, which has only 100 distinct values (0 - 99), so there are many collisions where 2 rows have the same value in b and the sort has to check the value in column a as well. It therefore has to do more comparisons than the sort in the fast query.
We can also see that the Merge Semi Join node in the slow query takes longer, for the same reason as the sorts. The node compares rows by the values in all columns, in the order the columns appear in the temp table. The fast query compares rows by a first, which determines that the vast majority of rows shouldn't be joined, so b values only have to be compared for very few rows. The slow query, on the other hand, compares b values first, which have a big overlap (there are 1 000 000 rows with 100 distinct b values, so each value is present in about 10 000 rows), and the join has to compare both columns for a lot of rows, hence doing more work.