pgvector index tuning: HNSW vs IVFFlat in production
The defaults under-deliver at scale. Covers which parameters to change, how to measure recall honestly, and when to move to a dedicated vector database.
By Ethan
If your vector queries are slow or your RAG pipeline is returning off-target results after your table grew past a few hundred thousand rows, the culprit is almost certainly the index — or the absence of tuning. pgvector ships with conservative defaults. They’re safe. They’re not fast.
The short version: use HNSW, set m = 32 and ef_construction = 200 when you build the index, and tune ef_search at query time against a recall target you measure yourself. If you’re on IVFFlat, start with lists = rows / 1000 (sqrt(rows) once the table passes 1M rows) and probes = sqrt(lists). The rest of this article explains why those numbers, what to do if they’re wrong for your workload, and where pgvector runs out of road.
Who this is for
Backend engineers running a RAG pipeline or semantic-search feature on an existing Postgres stack. This assumes you’ve already decided to use pgvector. If you’re scaling to hundreds of millions of vectors, read the “When to move on” section first.
For teams managing pgvector across multiple tenants, multi-tenant Postgres patterns covers the isolation strategies that affect how you scope vector queries.
What pgvector is (and what changed in v0.5.0)
pgvector adds a vector column type and two index types — IVFFlat and HNSW — to standard PostgreSQL. It requires PostgreSQL 13 or later; PostgreSQL 12 support was dropped in v0.8.0, released October 30, 2024.
The version history matters here. HNSW was introduced in v0.5.0 on August 28, 2023. Before that, IVFFlat was the only index option. A lot of tutorials written before that date are still ranking well — they recommend IVFFlat because HNSW didn’t exist yet. Check the date before following any pgvector tuning advice.
HNSW vs IVFFlat
The tradeoffs come directly from the pgvector README:
“HNSW has better query performance than IVFFlat (in terms of speed-recall tradeoff), but has slower build times and uses more memory.”
“IVFFlat has faster build times and uses less memory than HNSW, but has lower query performance (in terms of speed-recall tradeoff).”
That’s the complete picture from the primary source. Any blog post claiming a precise multiplier — “6× faster,” “30× better throughput” — is extrapolating from benchmarks that don’t match your hardware, your embedding dimensions, or your query distribution. Adversarial verification found none of these multipliers held up against primary sources.
In practice:
- Use HNSW when query latency is your SLA. HNSW maintains recall better as you scale up, because the graph structure handles high-dimensional spaces more gracefully than cluster-based indexing.
- Use IVFFlat when you’re rebuilding the index frequently (a new batch of embeddings, regular maintenance cycles) or when RAM is a hard constraint. IVFFlat builds faster and costs less memory.
One operational note: IVFFlat requires enough data to partition into meaningful clusters. As a rule of thumb, build the index only after the table has some data — sparse lists degrade recall. HNSW has no cold-start constraint.
HNSW parameters
Three knobs:
| Parameter | Default | What it controls | Production starting point |
|---|---|---|---|
| m | 16 | Max connections per layer in the graph | 32 |
| ef_construction | 64 | Candidate list size during index build | 200 |
| ef_search | 40 | Candidate list size at query time | Tune per workload |
m and ef_construction are set at CREATE INDEX time and require a full rebuild to change. ef_search is a session-level setting (GUC): SET hnsw.ef_search = 100 applies to the rest of the session, and SET LOCAL inside a transaction scopes it to a single query.
m = 32 over the default 16: more edges per node means the graph navigates to good candidates in fewer hops. SIGMOD 2026 benchmarks on 5–10M vector datasets (Brown University, Google, Université Paris Cité, and ETH Zurich) used m=32. Doubling m increases build time and index size, but recall improves at scale.
ef_construction = 200 over the default 64: a larger candidate pool during build produces a better-connected graph. Those same SIGMOD 2026 benchmarks used 200. Higher values cost more build time; they don’t affect query latency.
ef_search is the recall-latency dial at query time. Higher gives better recall at the cost of latency. The right value depends on your SLA. Run the benchmark in the next section to find your own crossover.
-- Build with production-grade parameters
CREATE INDEX ON embeddings USING hnsw (embedding vector_cosine_ops)
WITH (m = 32, ef_construction = 200);
-- Tune ef_search at query time
SET hnsw.ef_search = 100;
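A plain SET stays in effect on the connection, which can surprise you behind a connection pool. The pgvector README describes using SET LOCAL inside a transaction to limit the setting to one query. The sketch below only assembles those statements as strings; the helper name `scoped_ef_search` and the sample query are illustrative assumptions, not part of pgvector or of any driver.

```python
def scoped_ef_search(query_sql: str, ef_search: int) -> list[str]:
    """Wrap one query in a transaction so hnsw.ef_search applies only to it."""
    if isinstance(ef_search, bool) or not isinstance(ef_search, int) or ef_search < 1:
        raise ValueError("ef_search must be a positive integer")
    return [
        "BEGIN",
        # SET LOCAL reverts automatically when the transaction ends.
        f"SET LOCAL hnsw.ef_search = {ef_search}",
        query_sql.strip().rstrip(";"),
        "COMMIT",
    ]


# Hypothetical query; %s stands in for the driver's query-vector placeholder.
query = "SELECT id FROM embeddings ORDER BY embedding <=> %s LIMIT 10;"
for statement in scoped_ef_search(query, 100):
    print(statement)
```

Execute the returned statements in order on a single connection with whatever driver you already use.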
IVFFlat parameters
Two knobs:
| Parameter | Default | Recommended starting point |
|---|---|---|
| lists | 100 | rows / 1000 for ≤1M rows; sqrt(rows) for >1M rows |
| probes | 1 | sqrt(lists) |
lists is the number of clusters the index partitions vectors into. More lists means finer-grained partitioning: each list holds fewer vectors, so each probe scans less data, but a query needs more probes to cover the same share of the table. The formulas above are from the pgvector README.
probes is how many lists the query scans. The default is 1 — one cluster, which is very fast and has poor recall at scale. Setting probes = lists forces a full scan of all clusters; the query planner recognizes this as equivalent to a sequential scan and bypasses the index entirely. You get exact results, not approximate ones.
The sqrt(lists) starting point balances recall and latency. If recall@10 isn’t meeting your target, raise probes; query time grows roughly in proportion to the number of lists scanned.
-- Build index for a 500k-row table
CREATE INDEX ON embeddings USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 500); -- rows / 1000 = 500k / 1000
-- Set probes for query session
SET ivfflat.probes = 23; -- sqrt(500) ≈ 22.4, round up
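The starting-point arithmetic is easy to get wrong by hand when the row count changes. Here is a minimal Python sketch of the two rules from the table. The function name `ivfflat_starting_params` is an assumption for illustration, and rounding probes up simply mirrors the SQL example above rather than anything pgvector mandates.

```python
import math


def ivfflat_starting_params(rows: int) -> tuple[int, int]:
    """Return (lists, probes) starting values for a table with `rows` rows."""
    if rows < 1:
        raise ValueError("rows must be positive")
    if rows <= 1_000_000:
        lists = max(1, rows // 1000)              # rows / 1000 up to 1M rows
    else:
        lists = max(1, round(math.sqrt(rows)))    # sqrt(rows) above 1M rows
    probes = max(1, math.ceil(math.sqrt(lists)))  # sqrt(lists), rounded up
    return lists, probes


print(ivfflat_starting_params(500_000))    # (500, 23)
print(ivfflat_starting_params(4_000_000))  # (2000, 45)
```

These are starting values only; the recall measurement is what decides whether they stay.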
When exact scan wins
Without an index, pgvector performs exact nearest-neighbor search by scanning every row. A query plan with no index in it is correct behavior, not a bug.
Exact scan is the right choice in three scenarios:
- No index exists: this is the default. If you SELECT ... ORDER BY embedding <-> query LIMIT 10 with no index, you get perfect recall. It scales poorly past a few hundred thousand rows, but for small datasets the overhead isn’t meaningful.
- Small tables: datasets where the index’s system overhead costs more than scanning the table directly. A 2026 SIGMOD preprint found that heap tuple ID resolution (buffer management, TID indirection, and page locking, not distance computation) consumed 60–75% of CPU cycles in PostgreSQL vector workloads. For small row counts, exact scan skips all of that.
- IVFFlat with probes = lists: setting probes equal to lists tells the planner to scan every cluster. It recognizes this as a sequential scan and bypasses the index. You get perfect recall without index overhead. Use this when an IVFFlat index already exists and a specific query needs 100% recall more than it needs low latency.
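It helps to see what an exact scan computes: the distance from the query to every row, then a sort. The pure-Python sketch below illustrates that with cosine distance (1 minus cosine similarity, the quantity the `<=>` operator returns). It is a model of the idea and not pgvector's implementation; the function names and the toy vectors are made up for the example, and it assumes no zero-length vectors.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity. Assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def exact_top_k(query: list[float], rows: dict[int, list[float]], k: int = 10) -> list[int]:
    """Compare the query against every row and return the k nearest ids."""
    ranked = sorted(rows, key=lambda row_id: cosine_distance(query, rows[row_id]))
    return ranked[:k]


# Toy 2-dimensional data, for illustration only.
toy_rows = {1: [1.0, 0.0], 2: [0.0, 1.0], 3: [1.0, 1.0]}
print(exact_top_k([1.0, 0.1], toy_rows, k=2))  # [1, 3]
```

The work grows linearly with the row count, which is why this is the ground truth for recall measurements but not a scaling strategy.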
Benchmarking yourself
Every benchmark number from secondary sources — blog posts, conference talks, vendor comparisons — was adversarially verified for this article. None of the commonly cited multipliers held up against primary sources. The only benchmark that applies to your workload is one you run on your hardware, with your embedding dimensions, with your query distribution.
Here’s the methodology. ANN-Benchmarks uses exactly this framework — recall@10 vs queries per second — to compare pgvector against dedicated vector databases. Running it yourself gives you numbers you can actually act on.
-- 1. Create test table
CREATE TABLE bench_vectors (id bigserial PRIMARY KEY, embedding vector(1536));
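-- 2. Insert synthetic data (replace with your real embeddings)
-- Each dimension gets its own random() draw. The subquery references row_id
-- so that it is re-evaluated for every row instead of once for the whole insert.
INSERT INTO bench_vectors (embedding)
SELECT (SELECT array_agg(random()) FROM generate_series(1, 1536) WHERE row_id > 0)::vector
FROM generate_series(1, 100000) AS row_id;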
-- 3. Build HNSW with production parameters
CREATE INDEX ON bench_vectors USING hnsw (embedding vector_cosine_ops)
WITH (m = 32, ef_construction = 200);
-- 4. Set ef_search
SET hnsw.ef_search = 100;
-- 5a. Ground truth: exact scan
SET enable_indexscan = off;
-- Run your query → record top-10 IDs as set A
-- 5b. ANN results: HNSW index
SET enable_indexscan = on;
-- Run same query → record top-10 IDs as set B
-- recall@10 = |A ∩ B| / 10
-- 6. Profile the index path
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, embedding <=> '[0.1, 0.2, ...]'::vector AS distance
FROM bench_vectors
ORDER BY embedding <=> '[0.1, 0.2, ...]'::vector
LIMIT 10;
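IVFFlat equivalent (drop the HNSW index first so the planner can’t choose it):
-- bench_vectors_embedding_idx is the default name Postgres gives the index from step 3; check \d bench_vectors if yours differs
DROP INDEX bench_vectors_embedding_idx;
CREATE INDEX ON bench_vectors USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
SET ivfflat.probes = 10;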
Run this at several ef_search (or probes) values. Plot recall@10 against wall-clock latency. Find the crossover that meets your SLA. If you want to understand how pgvector sits relative to Qdrant, Weaviate, or Pinecone, ANN-Benchmarks has the recall@10 vs QPS plots across datasets and hardware configurations.
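Once you have the two ID lists per query, the scoring and the crossover search are a few lines of code. The Python sketch below assumes you have already collected the exact and ANN top-10 IDs and a latency figure per setting. The function names, the tuple layout of `sweep`, and the numbers in the usage lines are placeholders for illustration, not measurements.

```python
def recall_at_k(exact_ids: list[int], ann_ids: list[int], k: int = 10) -> float:
    """Fraction of the exact top-k that the ANN top-k also returned."""
    truth = set(exact_ids[:k])
    if not truth:
        raise ValueError("exact_ids is empty")
    # len(truth) equals k whenever the exact scan returned a full top-k.
    return len(truth & set(ann_ids[:k])) / len(truth)


def smallest_setting_meeting_target(sweep, recall_target, latency_budget_ms):
    """sweep: iterable of (setting, mean_recall, latency_ms) tuples.

    Returns the lowest ef_search (or probes) value that meets both the
    recall target and the latency budget, or None if no setting does.
    """
    for setting, recall, latency_ms in sorted(sweep):
        if recall >= recall_target and latency_ms <= latency_budget_ms:
            return setting
    return None


exact = list(range(1, 11))
approx = [1, 2, 3, 4, 5, 6, 7, 8, 11, 12]
print(recall_at_k(exact, approx))  # 0.8

# Placeholder sweep results: (ef_search, mean recall@10, latency in ms).
placeholder_sweep = [(40, 0.82, 3.0), (100, 0.95, 6.0), (200, 0.99, 14.0)]
print(smallest_setting_meeting_target(placeholder_sweep, 0.95, 10.0))  # 100
```

Average recall over a representative sample of real queries before feeding it into the sweep; a single query says little about the distribution.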
When to move on
pgvector runs out of road at some point. No verified row-count threshold survived adversarial verification — “millions of vectors” is directionally correct but the real limit depends on your workload shape, not raw size.
Signs it’s time to evaluate dedicated databases:
- Filtered queries (vector similarity combined with metadata filters) are consistently slower than your SLA
- You need tenant-level isolation with per-tenant index tuning
- Sub-millisecond latency is non-negotiable at your scale
- The ANN-Benchmarks recall-vs-QPS curve for pgvector no longer clears your target
If you’re already on Postgres and want managed pgvector with minimal operational overhead, Supabase has shipped HNSW since pgvector v0.5.0 (September 2023) and handles infrastructure-level tuning. It’s the natural path before switching to a dedicated vector database. The Neon vs Supabase comparison covers both options in detail if you’re deciding between managed pgvector providers.
If your scale-up path requires the pgvectorscale extension — streaming inserts at high ingest rates, DiskANN-based indexing — Timescale Cloud ships it as a managed service.
Decision guide
Work through this in order:
- Table under 50k rows, no latency SLA? — No index. Exact scan. Done.
- Index built infrequently, query latency is the SLA? — HNSW, m=32, ef_construction=200. Tune ef_search to your recall target.
- Rebuilding the index often, or RAM-constrained? — IVFFlat. lists = rows/1000 (or sqrt(rows) above 1M). probes = sqrt(lists).
- Need 100% recall? — Exact scan: no index, or IVFFlat with probes=lists.
- Tuning maxed out and still missing SLA? — Run ANN-Benchmarks for your dataset and evaluate Qdrant or Weaviate.
Measure recall@10 before and after any parameter change. Intuition about what “should” be faster is frequently wrong in high-dimensional space.
Caveats
The SIGMOD 2026 paper (Brown University, Google, Université Paris Cité, and ETH Zurich) was published in PACMMOD vol. 4, no. 3, article 134, June 2026 (DOI 10.1145/3802011). This article cites the arXiv preprint (arXiv:2603.23710) as its source; the published paper is the canonical reference. Treat the TID-resolution figure as directional rather than as a precise measurement for your workload.
No QPS multipliers appear in this article because none survived adversarial verification against primary sources.