Benchmark: Is it worth to use enum instead of text in Postgres?

performance

postgres performance

I got curious whether there is a performance difference to using an enum vs a text column in Postgres so I decided to benchmark it.

An enum value is always represented by 4 bytes so it makes sense there could be performance benefits to querying an enum column since it could be faster to compare 4 bytes long values than longer text values.

All results are from Macbook Pro 2024 M4 Max with 36GB memory. work_mem is set to 512MB and shared_buffers to 2GB, other settings are defaults.

I created one table with a text column and another one with an enum column. I filled both tables with 1 000 000 rows with 5 distinct values, each repeated 200 000 times.

-- Text
CREATE TABLE foo_text (
	name TEXT
);

CREATE INDEX foo_text_name ON foo_text (name);

INSERT INTO foo_text (name)
SELECT
	x.name
FROM GENERATE_SERIES(1, 1000000), (VALUES ('CAR'), ('HAMBURGER'), ('CAT'), ('COFFEE'), ('CARPET')) x (name);


-- Enum
CREATE TYPE foo_name AS ENUM ('CAR', 'HAMBURGER', 'CAT', 'COFFEE', 'CARPET');

CREATE TABLE foo_enum (
	name foo_name
);

CREATE INDEX foo_enum_name ON foo_enum (name);

INSERT INTO foo_enum (name)
SELECT
	x.name::foo_name
FROM GENERATE_SERIES(1, 1000000), (VALUES ('CAR'), ('HAMBURGER'), ('CAT'), ('COFFEE'), ('CARPET')) x (name);


-- Vacuum
VACUUM ANALYZE foo_text, foo_enum;


Then I ran pg_bench with this SQL script

SELECT * FROM foo_text WHERE name = 'CAR';
SELECT * FROM foo_text WHERE name = 'HAMBURGER';
SELECT * FROM foo_text WHERE name = 'CAT';
SELECT * FROM foo_text WHERE name = 'COFFEE';
SELECT * FROM foo_text WHERE name = 'CARPET';


The benchmark script for enum looks the same except it selects from foo_enum instead of foo_text

I ran both the text and enum benchmarks 3 times and took the fastest time for each.

Results

Text:

pgbench (17.1 (Homebrew petere/postgresql))
transaction type: bench-text.sql
scaling factor: 1
query mode: simple
number of clients: 10
number of threads: 10
maximum number of tries: 1
number of transactions per client: 100
number of transactions actually processed: 1000/1000
number of failed transactions: 0 (0.000%)
latency average = 455.324 ms
initial connection time = 4.233 ms
tps = 21.962362 (without initial connection time)

Enum:

pgbench (17.1 (Homebrew petere/postgresql))
transaction type: bench-enum.sql
scaling factor: 1
query mode: simple
number of clients: 10
number of threads: 10
maximum number of tries: 1
number of transactions per client: 100
number of transactions actually processed: 1000/1000
number of failed transactions: 0 (0.000%)
latency average = 502.367 ms
initial connection time = 4.499 ms
tps = 19.905753 (without initial connection time)


This result surprised me, I expected enum to be faster so I decided to do the same test again but with longer text values / enum identifiers.

Long identifiers

The long enum looks like this:

CREATE TYPE foo_name AS ENUM (
  'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA', 
  'BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB', 
  'CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC', 
  'DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD', 
  'EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE'
);

The text data looks the same. Same number and distribution of rows as before.

Results

text:

pgbench (17.1 (Homebrew petere/postgresql))
transaction type: bench-long-text.sql
scaling factor: 1
query mode: simple
number of clients: 10
number of threads: 10
maximum number of tries: 1
number of transactions per client: 100
number of transactions actually processed: 1000/1000
number of failed transactions: 0 (0.000%)
latency average = 1044.358 ms
initial connection time = 4.412 ms
tps = 9.575260 (without initial connection time)

enum:

pgbench (17.1 (Homebrew petere/postgresql))
transaction type: bench-long-enum.sql
scaling factor: 1
query mode: simple
number of clients: 10
number of threads: 10
maximum number of tries: 1
number of transactions per client: 100
number of transactions actually processed: 1000/1000
number of failed transactions: 0 (0.000%)
latency average = 1016.029 ms
initial connection time = 4.304 ms
tps = 9.842239 (without initial connection time)


The enum approach was faster by only a very small margin, I would say within error margin, so my verdict is that enum does not offer any major speed improvements over text.

Relation size

One more thing to look at is the overall size of both the tables.

pg_total_relation_size of the tables:

short text: 211 MB
short enum: 204 MB

long text: 477 MB
long enum: 204 MB

As you can see using an enum will save a considerable amount of space compared to text, especially if the identifiers are longer.

You can find all the results, setup and benchmark scripts on github: https://github.com/pert5432/pg_benchmarks/tree/main/enum-text