I got curious whether there is a performance difference to using an enum
vs a text
column in Postgres so I decided to benchmark it.
An enum
value is always represented by 4 bytes so it makes sense there could be performance benefits to querying an enum
column since it could be faster to compare 4 bytes long values than longer text
values.
All results are from Macbook Pro 2024 M4 Max with 36GB memory. work_mem
is set to 512MB
and shared_buffers
to 2GB
, other settings are defaults.
I created one table with a text
column and another one with an enum
column. I filled both tables with 1 000 000 rows with 5 distinct values, each repeated 200 000 times.
-- Text
CREATE TABLE foo_text (
name TEXT
);
CREATE INDEX foo_text_name ON foo_text (name);
INSERT INTO foo_text (name)
SELECT
x.name
FROM GENERATE_SERIES(1, 1000000), (VALUES ('CAR'), ('HAMBURGER'), ('CAT'), ('COFFEE'), ('CARPET')) x (name);
-- Enum
CREATE TYPE foo_name AS ENUM ('CAR', 'HAMBURGER', 'CAT', 'COFFEE', 'CARPET');
CREATE TABLE foo_enum (
name foo_name
);
CREATE INDEX foo_enum_name ON foo_enum (name);
INSERT INTO foo_enum (name)
SELECT
x.name::foo_name
FROM GENERATE_SERIES(1, 1000000), (VALUES ('CAR'), ('HAMBURGER'), ('CAT'), ('COFFEE'), ('CARPET')) x (name);
-- Vacuum
VACUUM ANALYZE foo_text, foo_enum;
Then I ran pg_bench with this SQL script
SELECT * FROM foo_text WHERE name = 'CAR';
SELECT * FROM foo_text WHERE name = 'HAMBURGER';
SELECT * FROM foo_text WHERE name = 'CAT';
SELECT * FROM foo_text WHERE name = 'COFFEE';
SELECT * FROM foo_text WHERE name = 'CARPET';
The benchmark script for enum looks the same except it selects from foo_enum
instead of foo_text
I ran both the text and enum benchmarks 3 times and took the fastest time for each.
Results
Text:
pgbench (17.1 (Homebrew petere/postgresql))
transaction type: bench-text.sql
scaling factor: 1
query mode: simple
number of clients: 10
number of threads: 10
maximum number of tries: 1
number of transactions per client: 100
number of transactions actually processed: 1000/1000
number of failed transactions: 0 (0.000%)
latency average = 455.324 ms
initial connection time = 4.233 ms
tps = 21.962362 (without initial connection time)
Enum:
pgbench (17.1 (Homebrew petere/postgresql))
transaction type: bench-enum.sql
scaling factor: 1
query mode: simple
number of clients: 10
number of threads: 10
maximum number of tries: 1
number of transactions per client: 100
number of transactions actually processed: 1000/1000
number of failed transactions: 0 (0.000%)
latency average = 502.367 ms
initial connection time = 4.499 ms
tps = 19.905753 (without initial connection time)
This result surprised me, I expected enum
to be faster so I decided to do the same test again but with longer text
values / enum
identifiers.
Long identifiers
The long enum looks like this:
CREATE TYPE foo_name AS ENUM (
'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA',
'BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB',
'CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC',
'DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD',
'EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE'
);
The text
data looks the same. Same number and distribution of rows as before.
Results
text:
pgbench (17.1 (Homebrew petere/postgresql))
transaction type: bench-long-text.sql
scaling factor: 1
query mode: simple
number of clients: 10
number of threads: 10
maximum number of tries: 1
number of transactions per client: 100
number of transactions actually processed: 1000/1000
number of failed transactions: 0 (0.000%)
latency average = 1044.358 ms
initial connection time = 4.412 ms
tps = 9.575260 (without initial connection time)
enum:
pgbench (17.1 (Homebrew petere/postgresql))
transaction type: bench-long-enum.sql
scaling factor: 1
query mode: simple
number of clients: 10
number of threads: 10
maximum number of tries: 1
number of transactions per client: 100
number of transactions actually processed: 1000/1000
number of failed transactions: 0 (0.000%)
latency average = 1016.029 ms
initial connection time = 4.304 ms
tps = 9.842239 (without initial connection time)
The enum
approach was faster by only a very small margin, I would say within error margin, so my verdict is that enum
does not offer any major speed improvements over text
.
Relation size
One more thing to look at is the overall size of both the tables.
pg_total_relation_size of the tables:
short text: 211 MB
short enum: 204 MB
long text: 477 MB
long enum: 204 MB
As you can see using an enum
will save a considerable amount of space compared to text
, especially if the identifiers are longer.
You can find all the results, setup and benchmark scripts on github: https://github.com/pert5432/pg_benchmarks/tree/main/enum-text