pgvector lets you add semantic search to a Postgres database without standing up a dedicated vector store. If you already run Postgres, this is the fastest path to embedding-based retrieval — no Pinecone account, no new service to monitor, no extra bill.

This tutorial walks from zero to a working semantic search query. By the end, you’ll have a documents table with a 1,536-dimension vector column, an HNSW index, and a Node.js function that returns the closest matches to a query string using OpenAI embeddings.

Who this is for

You know Postgres. You’ve heard about embeddings and RAG but haven’t wired them up. You want semantic search in an app that already uses Postgres — not a reason to evaluate a different database.

If you’re working with 100M+ vectors under high-QPS writes, pgvector will hit its ceiling. That’s a different problem than this tutorial solves.

Step 1: Install pgvector

The current stable release is v0.8.2 (February 25, 2026). Versions 0.8.0 and 0.8.1 have a confirmed buffer overflow in parallel HNSW index builds. Use v0.8.2. Postgres 13 or higher is required.

Self-hosted Postgres

# Ubuntu / Debian — replace 16 with your Postgres major version
sudo apt-get install postgresql-16-pgvector

CREATE EXTENSION IF NOT EXISTS vector;

Other installation paths: the pgvector/pgvector Docker image ships with the extension pre-loaded; Homebrew (brew install pgvector), PGXN, conda-forge, and Postgres.app all work.

Supabase

pgvector is pre-installed. Enable it from the Dashboard (Database → Extensions → vector) or with SQL:

CREATE EXTENSION vector WITH SCHEMA extensions;

One Supabase-specific gotcha: PostgREST, the REST layer Supabase uses, doesn’t support pgvector distance operators directly. Wrap similarity queries in Postgres functions and call them via rpc() from the client library. Production Supabase apps come with other sharp edges — see Supabase RLS pitfalls in production before exposing query results to untrusted clients.

Supabase has a generous free tier and is the lowest-friction hosted path if you’re not already on another provider.

Neon

CREATE EXTENSION IF NOT EXISTS vector;

Neon tracks the latest pgvector version and one prior. The serverless model (scale to zero) suits bursty or low-traffic workloads well. Neon-specific tip: the default maintenance_work_mem varies by compute size. Set it to 50–60% of available RAM before building a large HNSW index or the build will slow significantly.

Railway and Render

Both support pgvector on managed Postgres:

CREATE EXTENSION IF NOT EXISTS vector;

Railway provides a pgvector template that ships pre-enabled. Railway supports enabling pgvector directly — use the CREATE EXTENSION command above. Render enables it directly — no migration required.

Step 2: Create a table with a vector column

CREATE TABLE documents (
  id        BIGSERIAL PRIMARY KEY,
  content   TEXT NOT NULL,
  embedding VECTOR(1536)  -- matches text-embedding-3-small default output
);

VECTOR(n) stores a fixed-length array of float4 values. For HNSW indexing, the maximum dimension is 2,000 for vector and 4,000 for halfvec. OpenAI’s text-embedding-3-small outputs 1,536 dimensions by default — within the limit.

If you’re using text-embedding-3-large (3,072 dims by default), switch to HALFVEC(3072) or pass dimensions=1536 to the API to reduce the size.

Step 3: Generate embeddings

You need vectors before you can insert or query. This tutorial uses OpenAI’s text-embedding-3-small: 1,536 dimensions, $0.02 per million input tokens, up to 8,192 input tokens per request.

Node.js

import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from env

async function embed(text) {
  const response = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: text,
    // dimensions: 512  // optional: shorter embeddings trade some recall for storage savings
  });
  return response.data[0].embedding; // float array, length 1536
}

Python

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from env

def embed(text: str) -> list[float]:
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=text,
        # dimensions=512  # optional
    )
    return response.data[0].embedding  # list of 1536 floats

Failure mode: passing an empty string returns a zero vector. pgvector skips zero vectors when building a cosine index, so those rows won’t show up in similarity results. Filter empty strings before embedding.

Step 4: Insert vectors

With the embedding as a float array, insert it alongside the source text.

Node.js (with `pg`)

import { Pool } from "pg";

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

async function insertDocument(content) {
  const vector = await embed(content);
  await pool.query(
    "INSERT INTO documents (content, embedding) VALUES ($1, $2)",
    [content, JSON.stringify(vector)]
  );
}

Python (with `psycopg2`)

import os
import psycopg2

conn = psycopg2.connect(os.environ["DATABASE_URL"])
cur = conn.cursor()

def insert_document(content: str):
    vector = embed(content)
    cur.execute(
        "INSERT INTO documents (content, embedding) VALUES (%s, %s)",
        (content, vector)
    )
    conn.commit()

The pg Node driver accepts the vector as a JSON-stringified array. The psycopg2 Python driver accepts the raw Python list — pgvector’s adapter handles the cast.

Step 5: Query with cosine similarity

The <=> operator computes cosine distance — lower means more similar. The <-> operator computes L2 (Euclidean) distance.

For text embeddings, cosine is almost always the right choice. It measures angle, not magnitude, so vector length doesn’t bias results.

-- Return the 10 documents most similar to a query embedding
SELECT
  id,
  content,
  1 - (embedding <=> $1::vector) AS similarity
FROM documents
ORDER BY embedding <=> $1::vector
LIMIT 10;

1 - distance converts cosine distance to similarity (1 = identical, 0 = orthogonal). The ORDER BY uses raw distance — this is cosmetic.

Failure mode: wrapping the embedding column in a function or cast in the ORDER BY clause breaks index usage. The planner won’t hit the HNSW index if you do ORDER BY normalize(embedding) <=> $1. Keep it bare.

Step 6: Add an HNSW index for scale

Without an index, every query scans the full table. Query time grows linearly with row count — fine at a few thousand rows, a bottleneck in production. An HNSW index replaces the sequential scan with an approximate graph traversal, keeping lookups in the low milliseconds as the table grows.

CREATE INDEX ON documents
  USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);

Tuning parameters:

m (default 16): connections per node in the HNSW graph. Higher values mean better recall and more memory. Don’t exceed 32 unless recall is measurably below target.
ef_construction (default 64): search scope during the index build. Use 256–512 for high-accuracy use cases; returns diminish past ~128 with slower builds.
hnsw.ef_search (default 40, set per session): scope of the query-time search. Increase to 100+ when recall matters more than latency.

-- Adjust query-time recall vs. latency
SET hnsw.ef_search = 100;

HNSW takes longer to build than IVFFlat and uses more memory during construction. For data that’s written once and queried heavily, that’s a one-time cost. For heavy real-time writes, watch insert latency — the index updates on every row.

The HNSW index must fit in memory. If it’s evicted from shared_buffers by competing queries, latency spikes. Size the index to fit comfortably in memory before going to production.

NULL and zero vector handling: rows with a NULL embedding are skipped by the index. Rows with a zero vector are skipped for cosine distance. Use a partial index if you want to enforce coverage:

CREATE INDEX ON documents
  USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64)
  WHERE embedding IS NOT NULL;

Step 7: Filtered queries with iterative scans

When you combine an ORDER BY distance clause with a WHERE filter, the HNSW index may not find enough candidates in its initial scan and will return fewer rows than your LIMIT. This is the most common production gotcha with pgvector.

Fix: enable iterative scans (introduced in v0.8.0):

SET hnsw.iterative_scan = relaxed_order;

With relaxed_order, pgvector scans further if the first pass doesn’t find enough qualifying rows. It trades strict distance ordering for result completeness. For most filtered semantic search use cases, completeness matters more than strict ordering.

Step 8: End-to-end example — semantic document search

This combines every step into a working Node.js module.

import OpenAI from "openai";
import { Pool } from "pg";

const openai = new OpenAI();
const pool = new Pool({ connectionString: process.env.DATABASE_URL });

async function embed(text) {
  const res = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: text,
  });
  return res.data[0].embedding;
}

async function addDocument(content) {
  if (!content.trim()) {
    throw new Error("Empty content produces a zero vector — skipped by cosine index");
  }
  const vector = await embed(content);
  await pool.query(
    "INSERT INTO documents (content, embedding) VALUES ($1, $2)",
    [content, JSON.stringify(vector)]
  );
}

async function search(query, limit = 5) {
  const vector = await embed(query);
  const { rows } = await pool.query(
    `SELECT id, content, 1 - (embedding <=> $1::vector) AS similarity
     FROM documents
     ORDER BY embedding <=> $1::vector
     LIMIT $2`,
    [JSON.stringify(vector), limit]
  );
  return rows;
}

// Example
await addDocument("pgvector adds vector similarity search to Postgres");
await addDocument("HNSW indexes trade memory for fast approximate nearest neighbor queries");

const results = await search("how does pgvector handle indexing?");
console.log(results);
// [
//   { id: 2, content: 'HNSW indexes trade memory...', similarity: 0.94 },
//   { id: 1, content: 'pgvector adds vector...', similarity: 0.81 }
// ]

Known limitations

Prisma: no native pgvector support as of late 2025. Use raw SQL or Drizzle ORM, which has a pgvector extension.
text-embedding-3-large (3,072 dims): exceeds the 2,000-dim HNSW limit for the vector type. Use HALFVEC(3072) or request dimensions=1536 from the API.
>10M vectors at high write QPS: dedicated vector databases (Pinecone, Qdrant, Weaviate) have purpose-built sharding. pgvector doesn’t. For most RAG workloads — hundreds of thousands to a few million documents — this boundary won’t matter.

References

pgvector README — operators, index types, dimension limits
pgvector CHANGELOG — v0.8.2 security fix details
Supabase: vector columns guide
Neon: pgvector guide
Railway: pgvector blog post
Render: supported extensions
OpenAI: text-embedding-3-small announcement
Instaclustr: HNSW vs IVFFlat benchmark