· mongodb / postgresql / database
How to Migrate MongoDB to PostgreSQL Without Downtime
How to migrate MongoDB to PostgreSQL without downtime: dual-write strategy, TypeScript ETL, schema mapping, data validation, and 8 production gotchas.
By Ethan
1,923 words · 10 min read
Migrating from MongoDB to PostgreSQL isn’t the hard part. The hard part is doing it without a maintenance window while production traffic keeps flowing. This guide walks through a dual-write strategy that gives you a live rollback option at every step. You can run the ETL over a weekend and flip the cutover flag on a Tuesday morning without paging anyone.
Tested against MongoDB 8.0 and PostgreSQL 18.3.
Who this is for
Backend developers running MongoDB in production — on Atlas or self-hosted — who need to move to PostgreSQL. This guide is for codebases under ~50 GB. If you’re over that or need ongoing CDC sync, skip to the AWS DMS section at the end.
Tooling decision
Do not use pgloader. pgloader 3.6.9 supports MySQL, SQLite, and MS SQL Server — not MongoDB. Any tutorial that recommends it for a MongoDB migration is wrong.
The right tooling depends on scale:
| Tool | Best for | Cost |
|---|---|---|
mongodump + custom ETL script | Indie devs, small SaaS (<50 GB) | Free |
| AWS DMS | Mid-size teams, >50 GB, complex types | ~$200/month EC2 |
| Airbyte / Hevo | Managed CDC, ongoing sync | Paid SaaS |
This guide covers the free path in full. The DMS and Airbyte paths are covered at the end.
Schema mapping cheatsheet
Before writing a line of code, translate every MongoDB concept to a PostgreSQL equivalent.
| MongoDB concept | PostgreSQL equivalent |
|---|---|
| Collection | Table |
_id (ObjectId) | mongo_id TEXT + id UUID DEFAULT gen_random_uuid() |
| Nested document | JSONB column (or normalize to a child table) |
| Array field | JSONB column (or child table with FK) |
ISODate | TIMESTAMP WITH TIME ZONE |
| No schema enforced | Add NOT NULL constraints after backfill is confirmed |
The mongo_id column is your migration safety net. Keep it until you’ve verified every foreign key reference is updated. Drop it after the cutover is stable.
If you’re building a multi-tenant SaaS, read multi-tenant Postgres patterns in 2026 before committing to a schema design — the row-level security vs. schema-per-tenant decision is hard to undo once your data is in.
Step 1: Create the PostgreSQL schema
Map each collection to a table. For a typical users collection:
CREATE TABLE users (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
mongo_id TEXT UNIQUE NOT NULL,
email TEXT NOT NULL,
name TEXT,
metadata JSONB,
created_at TIMESTAMPTZ NOT NULL,
updated_at TIMESTAMPTZ NOT NULL
);
-- Recreate Mongo indexes as concurrent builds so production reads don't stall
CREATE INDEX CONCURRENTLY idx_users_email ON users (email);
Failure mode: Setting NOT NULL on fields before the backfill runs will reject rows that have missing data in MongoDB. Leave constraints off until Step 5 confirms the backfill is clean.
Step 2: Capture a MongoDB dump without downtime
For replica sets (self-hosted or Atlas M10+), use --oplog to capture writes that land during the dump:
mongodump \
--uri="mongodb+srv://user:[email protected]/mydb" \
--oplog \
--readPreference=secondary \
--gzip \
--out=./dump
--readPreference=secondary keeps the dump off the primary. --oplog records any writes that happened during the dump window so you can replay them.
Failure mode: --oplog is blocked on Atlas M0, M2, and M5 (free/shared clusters). On those tiers, use mongoexport per collection instead:
mongoexport \
--uri="mongodb+srv://user:[email protected]/mydb" \
--collection=users \
--out=./dump/users.json
You’ll lose the oplog capture window, so run the ETL during low-traffic hours and accept a small lag window.
Step 3: Write the ETL script
This TypeScript script reads from MongoDB directly and writes to PostgreSQL in batches. It’s idempotent — run it multiple times and it won’t duplicate rows.
Install the dependencies:
npm install mongodb pg
npm install --save-dev @types/pg tsx
// migrate.ts
import { MongoClient } from 'mongodb';
import { Pool } from 'pg';
const MONGO_URI = process.env.MONGO_URI!;
const PG_URI = process.env.PG_URI!;
const BATCH_SIZE = 500;
async function migrate() {
const mongo = new MongoClient(MONGO_URI);
const pg = new Pool({ connectionString: PG_URI });
await mongo.connect();
const db = mongo.db('mydb');
const collection = db.collection('users');
let skip = 0;
let processed = 0;
while (true) {
const docs = await collection
.find()
.sort({ _id: 1 })
.skip(skip)
.limit(BATCH_SIZE)
.toArray();
if (docs.length === 0) break;
const values = docs.map((doc) => ({
mongo_id: doc._id.toString(),
email: doc.email as string,
name: (doc.name as string) ?? null,
// JSON null vs SQL NULL — normalize explicitly
metadata: doc.metadata ? JSON.stringify(doc.metadata) : null,
// ISODate → new Date(milliseconds) for TIMESTAMP conversion
created_at: doc.createdAt instanceof Date
? doc.createdAt
: new Date(doc.createdAt),
updated_at: doc.updatedAt instanceof Date
? doc.updatedAt
: new Date(doc.updatedAt),
}));
await pg.query(
`INSERT INTO users (mongo_id, email, name, metadata, created_at, updated_at)
SELECT
v.mongo_id, v.email, v.name,
v.metadata::jsonb,
v.created_at::timestamptz, v.updated_at::timestamptz
FROM jsonb_to_recordset($1::jsonb) AS v(
mongo_id text, email text, name text,
metadata text, created_at text, updated_at text
)
ON CONFLICT (mongo_id) DO NOTHING`,
[JSON.stringify(values)]
);
processed += docs.length;
skip += BATCH_SIZE;
console.log(`Migrated ${processed} rows`);
}
await mongo.close();
await pg.end();
}
migrate().catch(console.error);
Run it:
MONGO_URI="mongodb+srv://..." PG_URI="postgresql://..." npx tsx migrate.ts
Failure mode: Numbers stored as strings in MongoDB will insert as TEXT and fail a NUMERIC column constraint. Add a type guard in the ETL:
// In the values map, before inserting:
const price = typeof doc.price === 'string'
? parseFloat(doc.price)
: (doc.price as number) ?? null;
Failure mode: ON CONFLICT (mongo_id) DO NOTHING means re-runs are safe, but it also silently skips rows where the first insert failed for a different reason. Check row counts after each run (Step 5).
Step 4: Set up dual-write with feature flag
Don’t cut over in one shot. Run both databases in parallel and shift traffic incrementally.
// db-router.ts
type DbTarget = 'mongo' | 'both' | 'postgres';
const DB_TARGET: DbTarget = (process.env.DB_TARGET as DbTarget) ?? 'mongo';
export async function writeUser(data: UserData) {
if (DB_TARGET === 'mongo' || DB_TARGET === 'both') {
await mongoDb.collection('users').insertOne(toMongoDoc(data));
}
if (DB_TARGET === 'both' || DB_TARGET === 'postgres') {
await pgPool.query(
`INSERT INTO users (mongo_id, email, name, created_at, updated_at)
VALUES ($1, $2, $3, $4, $5)
ON CONFLICT (mongo_id) DO UPDATE SET email = EXCLUDED.email`,
[data.mongoId, data.email, data.name, data.createdAt, data.updatedAt]
);
}
}
export async function readUser(id: string) {
if (DB_TARGET === 'mongo') return mongoDb.collection('users').findOne({ _id: id });
return pgPool.query('SELECT * FROM users WHERE mongo_id = $1', [id]).then(r => r.rows[0]);
}
The four phases:
| Phase | DB_TARGET | Effect |
|---|---|---|
| 1 — Backfill | mongo | Production reads/writes to Mongo; ETL copies history to PG |
| 2 — Dual-write | both | All writes go to both; reads from Mongo |
| 3 — PG reads | both (reads from PG) | Writes to both; reads from PG |
| 4 — Cutover | postgres | Mongo receives nothing |
Rollback at any phase: set DB_TARGET back one step. No data loss — Mongo still has everything up to the cutover.
Failure mode: If your service runs multiple instances, set DB_TARGET in your environment config (Railway, Fly, Kubernetes env), not per-instance. An inconsistent flag across instances causes write splits that are a pain to reconcile.
Step 5: Validate data
Run these checks before Phase 4.
// validate.ts
import { MongoClient } from 'mongodb';
import { Pool } from 'pg';
async function validate() {
const mongo = new MongoClient(process.env.MONGO_URI!);
const pg = new Pool({ connectionString: process.env.PG_URI! });
await mongo.connect();
const mongoCount = await mongo.db('mydb').collection('users').countDocuments();
const pgResult = await pg.query('SELECT COUNT(*) FROM users');
const pgCount = parseInt(pgResult.rows[0].count, 10);
console.log(`MongoDB: ${mongoCount} rows`);
console.log(`PostgreSQL: ${pgCount} rows`);
if (mongoCount !== pgCount) {
console.error(`Row count mismatch: ${mongoCount - pgCount} rows missing`);
}
// Spot-check 20 random IDs
const samples = await mongo
.db('mydb')
.collection('users')
.aggregate([{ $sample: { size: 20 } }])
.toArray();
for (const doc of samples) {
const pgRow = await pg.query('SELECT * FROM users WHERE mongo_id = $1', [
doc._id.toString(),
]);
if (pgRow.rows.length === 0) {
console.error(`Missing in PG: ${doc._id.toString()}`);
}
}
// NULL check for required fields
const nullEmails = await pg.query(
'SELECT COUNT(*) FROM users WHERE email IS NULL'
);
console.log(`Rows with NULL email: ${nullEmails.rows[0].count}`);
await mongo.close();
await pg.end();
}
validate().catch(console.error);
If row counts match and the spot-check passes, you’re ready for Phase 4.
Failure mode: SELECT DISTINCT queries can be significantly slower on PostgreSQL than on MongoDB because Mongo avoids table scans with its document model. Benchmark your most-common read patterns before the cutover. If a query regresses, add a partial index or rewrite it.
Step 6: Cut over traffic
# Deploy with DB_TARGET=postgres
DB_TARGET=postgres railway up # or fly deploy, kubectl set env, etc.
Watch error rates for 30 minutes. If anything looks wrong, roll back with:
DB_TARGET=mongo railway up
Once you’re confident, stop the dual-write writes to Mongo by removing the both path, and schedule Mongo decommissioning for two weeks out. Don’t drop the database on cutover day.
With the migration complete, the next step is usually an ORM for type-safe queries. The Drizzle ORM with Postgres setup guide covers schema declarations, migrations, and pgvector if you’re planning to add vector search.
8 production gotchas (from Voucherify’s migration)
These are real failure modes, sourced from Voucherify’s public post-mortem on their production MongoDB→PostgreSQL migration.
-
ISODateconversion: MongoDB’sISODatestores milliseconds since epoch. Usenew Date(milliseconds)explicitly in your ETL — don’t assume the driver serializes it correctly for every field. -
JSON null vs SQL NULL: MongoDB documents can have a field explicitly set to
null. That’s different from a missing field. Normalize withdoc.field ?? nullto avoid inserting the string"null"into a JSONB column. -
Numbers stored as strings: MongoDB’s schemaless nature allows
{ price: "9.99" }and{ price: 9.99 }in the same collection. Add a type guard in the ETL before inserting into aNUMERICcolumn. -
Don’t set
NOT NULLbefore backfill is confirmed: A constraint applied before the ETL finishes will reject partial rows. Add constraints in a separate migration after Step 5 passes. -
SELECT DISTINCTregression: Postgres can regress onDISTINCTcompared to Mongo’s indexed reads. Benchmark before cutover, not after. -
--oplogblocked on Atlas M0/M2/M5: Free and shared clusters don’t expose the oplog. Usemongoexportper collection and accept a lag window. -
JSONB update bloat on high-churn nested docs: If a JSONB column is updated frequently, Postgres’s MVCC creates row versions that bloat the table. Extract hot keys to typed columns and reserve JSONB for cold or append-only data.
-
Index recreation: Every Mongo index needs a
CREATE INDEX CONCURRENTLYequivalent in Postgres. Don’t skip this — a missing index on a large table will surface as a production query regression, not an error.
For larger migrations: AWS DMS
If your dataset is over 50 GB or your schema has complex types (polymorphic documents, deeply nested arrays used in queries), AWS DMS is the practical choice. It handles ongoing CDC replication, so you can run Mongo and Postgres in parallel for weeks and do a near-zero-lag cutover.
Cost is roughly $200/month for an r5.large replication instance. Run it for the migration period, then terminate it.
Where to host the destination PostgreSQL: Neon is worth looking at for serverless deployments — branching and point-in-time restore simplify the validation step significantly. The Neon vs Supabase comparison covers the key differences if you’re deciding between the two.