· rag / pgvector / drizzle-orm
How to build a RAG pipeline with pgvector and Drizzle ORM
Step-by-step tutorial: store embeddings in Neon Postgres, query with cosine similarity via Drizzle ORM, and wire up GPT-4o for answers — all in TypeScript.
By Ethan
2,064 words · 11 min read
By the end of this tutorial you’ll have a working RAG pipeline in TypeScript: documents stored with embeddings in Postgres, retrieved by cosine similarity, and fed to GPT-4o for answers. You don’t need a separate vector database — pgvector runs inside the Postgres you already have.
Who this is for
TypeScript developers who know Postgres and Drizzle ORM but haven’t done vector search before. If you’ve already shipped a RAG app with Pinecone and want to know whether pgvector is worth switching to, skip to the comparison section near the end.
What you’ll build
A DocumentStore class that can:
- Embed any text with OpenAI’s
text-embedding-3-small - Store the embedding in Postgres alongside the source text
- Return the
kmost similar documents for a query - Generate a grounded answer with GPT-4o using those documents as context
Everything runs against one Postgres database. No extra infrastructure.
Prerequisites
- Node.js 20+ or Bun 1.1+
- An OpenAI API key with access to
text-embedding-3-smallandgpt-4o - A Postgres database — Neon’s free tier fits the whole tutorial; 0.5 GB storage and serverless compute, no credit card required
- pgvector extension ≥ 0.8.2 (Neon has it pre-installed)
If this is your first time wiring Drizzle to Postgres, How to Set Up Drizzle ORM with Postgres and pgvector covers the initial project setup — this tutorial builds on that foundation.
Install dependencies:
npm install drizzle-orm@^0.31.0 postgres@^3.x openai@^4.x
npm install -D drizzle-kit@^0.22.0
Set your environment variables:
DATABASE_URL=postgresql://user:pass@host/dbname
OPENAI_API_KEY=sk-...
Step 1: Create the pgvector extension
This is the step that trips people up most. Drizzle does not automatically run CREATE EXTENSION IF NOT EXISTS vector; — if you skip this, the vector column type fails at migration time with a confusing error (type "vector" does not exist).
Create a custom SQL migration file before any Drizzle-generated schema:
-- migrations/0000_enable_pgvector.sql
CREATE EXTENSION IF NOT EXISTS vector;
If you’re using drizzle-kit migrate, name the file so it sorts before Drizzle’s output (numeric prefix 0000_ works). Run it once before applying the schema migration:
psql $DATABASE_URL -f migrations/0000_enable_pgvector.sql
Failure mode: If you’re on a managed provider that has pgvector pre-installed (like Neon), this still needs to run — the extension is available but not enabled in your database until you CREATE EXTENSION.
Step 2: Define the schema
Drizzle ORM ≥ 0.31.0 ships a native vector column type. No customType wrapper, no raw SQL — it’s first-class:
// src/db/schema.ts
import { pgTable, serial, text, index } from 'drizzle-orm/pg-core';
import { vector } from 'drizzle-orm/pg-core';
export const documents = pgTable(
'documents',
{
id: serial('id').primaryKey(),
content: text('content').notNull(),
metadata: text('metadata'),
embedding: vector('embedding', { dimensions: 1536 }).notNull(),
},
(table) => [
index('documents_embedding_idx')
.using('hnsw', table.embedding.op('vector_cosine_ops'))
.with({ m: 16, ef_construction: 64 }),
],
);
A few things worth explaining:
Why 1536 dimensions? text-embedding-3-small outputs 1536-dimensional vectors. The schema must match the model’s output exactly — a mismatch throws at insert time.
Why HNSW, not IVFFlat? HNSW (Hierarchical Navigable Small World) builds the index incrementally, so it’s fast to query even on a growing dataset. IVFFlat requires knowing the approximate dataset size upfront (for the lists parameter) and needs a full index rebuild to stay accurate as rows are added. For a tutorial and most production workloads under 10M vectors, HNSW is the right default.
vector_cosine_ops: tells pgvector to optimize the index for cosine distance queries. If you switch to Euclidean (vector_l2_ops) or inner product (vector_ip_ops), the operator class must match.
Generate and apply the migration:
npx drizzle-kit generate
npx drizzle-kit migrate
Failure mode: If drizzle-kit migrate exits with type "vector" does not exist, Step 1 didn’t run. Go back and enable the extension first.
Step 3: Generate embeddings
Use text-embedding-3-small, not text-embedding-ada-002. ada-002 is the legacy model; OpenAI continues to serve it but it’s been superseded by the 3-small/3-large family, which has better retrieval quality at lower cost.
// src/embeddings.ts
import OpenAI from 'openai';
const openai = new OpenAI();
export async function embed(text: string): Promise<number[]> {
const response = await openai.embeddings.create({
model: 'text-embedding-3-small',
input: text,
});
return response.data[0].embedding;
}
That’s it — the function returns a plain number array, which is what Drizzle’s vector column expects.
Failure mode: text-embedding-3-small returns exactly 1536 floats. If you use a different model and its dimension count doesn’t match the column definition, Postgres will reject the insert with a dimension mismatch error. Verify with response.data[0].embedding.length === 1536 if something’s off.
Step 4: Store documents
// src/store.ts
import { drizzle } from 'drizzle-orm/postgres-js';
import postgres from 'postgres';
import { documents } from './db/schema';
import { embed } from './embeddings';
const client = postgres(process.env.DATABASE_URL!);
const db = drizzle(client);
export async function storeDocument(content: string, metadata?: string) {
const embedding = await embed(content);
await db.insert(documents).values({
content,
metadata,
embedding,
});
}
Nothing surprising. embed() returns a number[], and Drizzle serializes it to the Postgres vector wire format automatically.
Failure mode: Drizzle will serialize the embedding as a bracketed array string ([0.1, 0.2, ...]). If you see a serialization error, make sure drizzle-orm is ≥ 0.31.0 — earlier versions don’t know the vector type and will fall back to generic JSON encoding, which Postgres rejects.
Step 5: Search by cosine similarity
pgvector’s <=> operator computes cosine distance. Lower is more similar — 0 is identical, 2 is maximally dissimilar. Similarity = 1 − distance.
Drizzle ≥ 0.31.0 exposes a cosineDistance helper so you don’t write raw SQL:
// src/search.ts
import { cosineDistance, desc, sql } from 'drizzle-orm';
import { db } from './store';
import { documents } from './db/schema';
import { embed } from './embeddings';
export async function searchDocuments(query: string, topK = 5) {
const queryEmbedding = await embed(query);
const similarity = sql<number>`1 - (${cosineDistance(documents.embedding, queryEmbedding)})`;
const results = await db
.select({
id: documents.id,
content: documents.content,
metadata: documents.metadata,
similarity,
})
.from(documents)
.where(sql`${cosineDistance(documents.embedding, queryEmbedding)} < 0.3`)
.orderBy(desc(similarity))
.limit(topK);
return results;
}
The .where clause filters to cosine distance < 0.3, which means similarity > 0.7. Adjust the threshold to taste — lower thresholds return higher-confidence matches at the cost of recall.
Metadata filtering gotcha: If you filter on metadata AND limit to topK rows, the HNSW index is probed first (returning ef_search candidates), then the metadata filter is applied. If most candidates fail the filter, you can end up with fewer than topK results — pgvector doesn’t re-probe for more. Solution: set ef_search higher (see the tuning section), pre-filter in a subquery, or accept that result count varies.
Failure mode: The query runs but returns 0 rows even with a very large distance threshold. The most common cause is the HNSW index being built on an empty table — it requires at least one row to be queryable. Insert a document first, then query.
Step 6: Wire up GPT-4o
Take the top-k search results, concatenate their content as context, and ask GPT-4o to answer the user’s question:
// src/rag.ts
import OpenAI from 'openai';
import { searchDocuments } from './search';
const openai = new OpenAI();
export async function ask(question: string): Promise<string> {
const results = await searchDocuments(question, 5);
if (results.length === 0) {
return "I don't have enough context to answer that.";
}
const context = results
.map((r, i) => `[${i + 1}] ${r.content}`)
.join('\n\n');
const response = await openai.chat.completions.create({
model: 'gpt-4o',
messages: [
{
role: 'system',
content:
'You are a helpful assistant. Answer the user\'s question using only the provided context. ' +
'If the context doesn\'t contain the answer, say so. ' +
'Cite your sources with [1], [2], etc.',
},
{
role: 'user',
content: `Context:\n\n${context}\n\nQuestion: ${question}`,
},
],
});
return response.choices[0].message.content ?? '';
}
The system prompt keeps GPT-4o grounded to the retrieved context. Without it, the model happily supplements gaps with training knowledge — useful for chat, wrong for RAG.
Failure mode: Long context passages can exceed GPT-4o’s context window on large datasets. If you’re storing multi-page documents as single rows, chunk them first (paragraph or fixed-token chunks) before embedding, then reassemble the relevant chunks at retrieval time.
Step 7 (optional): HNSW index tuning
The default m: 16, ef_construction: 64 values work for most datasets. If you care about recall versus speed, here’s what each parameter does:
m: number of neighbors each node links to in the graph. Higherm= better recall, more memory. Range 5–48; 16 is a good starting point.ef_construction: search width during index build. Higher = better index quality, slower build. 64 is sensible for ≤1M rows.ef_search(query-time, not schema-time): controls how many candidates the search explores. Default is 40. Increase it for higher recall on filtered queries:
SET hnsw.ef_search = 100;
Or per-session in a Drizzle transaction:
await db.transaction(async (tx) => {
await tx.execute(sql`SET LOCAL hnsw.ef_search = 100`);
return tx.select(...);
});
HNSW recall degrades with aggressive metadata filters — tune ef_search to recover it. See the gotcha in Step 5.
pgvector vs. dedicated vector databases
If you’re choosing a vector layer, here’s the honest tradeoff:
pgvector runs inside Postgres, so you have zero extra infrastructure, no synchronization lag between your relational tables and your vector index, and the full Postgres backup/observability story. Recall is tunable via ef_search — you trade query latency for accuracy. Under ~10M vectors the query performance is competitive with dedicated databases. Beyond that scale, or if your workload is purely vector search with no relational joins, a dedicated database may outperform it.
Pinecone is fully managed — no ops overhead, serverless billing, and fast time-to-first-query. The tradeoff is vendor lock-in, an opaque recall model (you can’t tune the internal index), and a second service to keep synchronized with your relational data.
Qdrant is purpose-built for high-recall, high-throughput vector search and supports both self-hosted and managed deployments. It shines on very large datasets and complex filtered queries. The cost is an extra service to run and monitor.
pgvector is the right starting point if you’re already on Postgres and your dataset fits within ~10M vectors. Migrate to a dedicated database only if you hit a concrete performance wall — not before.
For a side-by-side benchmark of all major options, Best Vector Database in 2026 covers Qdrant, Pinecone, Cloudflare Vectorize, and pgvector in detail.
Alternative: If you’re already on Supabase, pgvector is available there too. Supabase has a similar free tier and the same pgvector installation. The Drizzle code in this tutorial works identically — swap the
DATABASE_URLand you’re done.
Putting it together
A complete usage example:
// src/main.ts
import { storeDocument } from './store';
import { ask } from './rag';
async function main() {
// Index some documents
await storeDocument(
'Drizzle ORM 0.31.0 ships native vector column support with cosineDistance helper.',
JSON.stringify({ source: 'drizzle-changelog', date: '2024-05' }),
);
await storeDocument(
'pgvector 0.8.2 supports two index types: HNSW builds the index incrementally and suits growing datasets; IVFFlat requires knowing the approximate final size upfront for the lists parameter. The pgvector docs present them as alternatives with documented tradeoffs — HNSW offers better query performance while IVFFlat has a cheaper build cost.',
JSON.stringify({ source: 'pgvector-readme', date: '2024-04' }),
);
// Query
const answer = await ask('What index type should I use with pgvector?');
console.log(answer);
}
main();
Expected output (paraphrased):
pgvector 0.8.2 supports two index types: HNSW and IVFFlat [2].
HNSW builds incrementally and suits growing datasets; IVFFlat requires knowing the approximate final size upfront.
For most use cases, HNSW is a better default because it handles dataset growth without a rebuild.
Wrapping up
You now have a full RAG pipeline in under 150 lines of TypeScript: schema defined in Drizzle, migrations applied against Neon Postgres, embeddings via text-embedding-3-small, cosine search with cosineDistance, and GPT-4o generating grounded answers.
The three things that catch people are: enabling the pgvector extension before migration, using the right embedding model (3-small, not ada-002), and understanding that <=> is distance not similarity.
From here, the natural next steps are chunking longer documents before embedding, adding metadata columns for filtering, and wiring the ask() function into an API route. If you want to go further, How to build an AI agent in TypeScript shows how to wrap a retrieval pipeline like this into a full agentic loop with tool calling and memory.
Neon’s free tier covers everything in this tutorial. Upgrade only when you hit the storage limit — which, for an embedding workload, is roughly 300K 1536-dim vectors.