· mongodb / postgresql / database

How to Migrate MongoDB to PostgreSQL Without Downtime

How to migrate MongoDB to PostgreSQL without downtime: dual-write strategy, TypeScript ETL, schema mapping, data validation, and 8 production gotchas.

By

1,923 words · 10 min read

Migrating from MongoDB to PostgreSQL isn’t the hard part. The hard part is doing it without a maintenance window while production traffic keeps flowing. This guide walks through a dual-write strategy that gives you a live rollback option at every step. You can run the ETL over a weekend and flip the cutover flag on a Tuesday morning without paging anyone.

Tested against MongoDB 8.0 and PostgreSQL 18.3.

Who this is for

Backend developers running MongoDB in production — on Atlas or self-hosted — who need to move to PostgreSQL. This guide is for codebases under ~50 GB. If you’re over that or need ongoing CDC sync, skip to the AWS DMS section at the end.

Tooling decision

Do not use pgloader. pgloader 3.6.9 supports MySQL, SQLite, and MS SQL Server — not MongoDB. Any tutorial that recommends it for a MongoDB migration is wrong.

The right tooling depends on scale:

ToolBest forCost
mongodump + custom ETL scriptIndie devs, small SaaS (<50 GB)Free
AWS DMSMid-size teams, >50 GB, complex types~$200/month EC2
Airbyte / HevoManaged CDC, ongoing syncPaid SaaS

This guide covers the free path in full. The DMS and Airbyte paths are covered at the end.

Schema mapping cheatsheet

Before writing a line of code, translate every MongoDB concept to a PostgreSQL equivalent.

MongoDB conceptPostgreSQL equivalent
CollectionTable
_id (ObjectId)mongo_id TEXT + id UUID DEFAULT gen_random_uuid()
Nested documentJSONB column (or normalize to a child table)
Array fieldJSONB column (or child table with FK)
ISODateTIMESTAMP WITH TIME ZONE
No schema enforcedAdd NOT NULL constraints after backfill is confirmed

The mongo_id column is your migration safety net. Keep it until you’ve verified every foreign key reference is updated. Drop it after the cutover is stable.

If you’re building a multi-tenant SaaS, read multi-tenant Postgres patterns in 2026 before committing to a schema design — the row-level security vs. schema-per-tenant decision is hard to undo once your data is in.

Step 1: Create the PostgreSQL schema

Map each collection to a table. For a typical users collection:

CREATE TABLE users (
  id         UUID        PRIMARY KEY DEFAULT gen_random_uuid(),
  mongo_id   TEXT        UNIQUE NOT NULL,
  email      TEXT        NOT NULL,
  name       TEXT,
  metadata   JSONB,
  created_at TIMESTAMPTZ NOT NULL,
  updated_at TIMESTAMPTZ NOT NULL
);

-- Recreate Mongo indexes as concurrent builds so production reads don't stall
CREATE INDEX CONCURRENTLY idx_users_email ON users (email);

Failure mode: Setting NOT NULL on fields before the backfill runs will reject rows that have missing data in MongoDB. Leave constraints off until Step 5 confirms the backfill is clean.

Step 2: Capture a MongoDB dump without downtime

For replica sets (self-hosted or Atlas M10+), use --oplog to capture writes that land during the dump:

mongodump \
  --uri="mongodb+srv://user:[email protected]/mydb" \
  --oplog \
  --readPreference=secondary \
  --gzip \
  --out=./dump

--readPreference=secondary keeps the dump off the primary. --oplog records any writes that happened during the dump window so you can replay them.

Failure mode: --oplog is blocked on Atlas M0, M2, and M5 (free/shared clusters). On those tiers, use mongoexport per collection instead:

mongoexport \
  --uri="mongodb+srv://user:[email protected]/mydb" \
  --collection=users \
  --out=./dump/users.json

You’ll lose the oplog capture window, so run the ETL during low-traffic hours and accept a small lag window.

Step 3: Write the ETL script

This TypeScript script reads from MongoDB directly and writes to PostgreSQL in batches. It’s idempotent — run it multiple times and it won’t duplicate rows.

Install the dependencies:

npm install mongodb pg
npm install --save-dev @types/pg tsx
// migrate.ts
import { MongoClient } from 'mongodb';
import { Pool } from 'pg';

const MONGO_URI = process.env.MONGO_URI!;
const PG_URI = process.env.PG_URI!;
const BATCH_SIZE = 500;

async function migrate() {
  const mongo = new MongoClient(MONGO_URI);
  const pg = new Pool({ connectionString: PG_URI });

  await mongo.connect();
  const db = mongo.db('mydb');
  const collection = db.collection('users');

  let skip = 0;
  let processed = 0;

  while (true) {
    const docs = await collection
      .find()
      .sort({ _id: 1 })
      .skip(skip)
      .limit(BATCH_SIZE)
      .toArray();

    if (docs.length === 0) break;

    const values = docs.map((doc) => ({
      mongo_id: doc._id.toString(),
      email: doc.email as string,
      name: (doc.name as string) ?? null,
      // JSON null vs SQL NULL — normalize explicitly
      metadata: doc.metadata ? JSON.stringify(doc.metadata) : null,
      // ISODate → new Date(milliseconds) for TIMESTAMP conversion
      created_at: doc.createdAt instanceof Date
        ? doc.createdAt
        : new Date(doc.createdAt),
      updated_at: doc.updatedAt instanceof Date
        ? doc.updatedAt
        : new Date(doc.updatedAt),
    }));

    await pg.query(
      `INSERT INTO users (mongo_id, email, name, metadata, created_at, updated_at)
       SELECT
         v.mongo_id, v.email, v.name,
         v.metadata::jsonb,
         v.created_at::timestamptz, v.updated_at::timestamptz
       FROM jsonb_to_recordset($1::jsonb) AS v(
         mongo_id text, email text, name text,
         metadata text, created_at text, updated_at text
       )
       ON CONFLICT (mongo_id) DO NOTHING`,
      [JSON.stringify(values)]
    );

    processed += docs.length;
    skip += BATCH_SIZE;
    console.log(`Migrated ${processed} rows`);
  }

  await mongo.close();
  await pg.end();
}

migrate().catch(console.error);

Run it:

MONGO_URI="mongodb+srv://..." PG_URI="postgresql://..." npx tsx migrate.ts

Failure mode: Numbers stored as strings in MongoDB will insert as TEXT and fail a NUMERIC column constraint. Add a type guard in the ETL:

// In the values map, before inserting:
const price = typeof doc.price === 'string'
  ? parseFloat(doc.price)
  : (doc.price as number) ?? null;

Failure mode: ON CONFLICT (mongo_id) DO NOTHING means re-runs are safe, but it also silently skips rows where the first insert failed for a different reason. Check row counts after each run (Step 5).

Step 4: Set up dual-write with feature flag

Don’t cut over in one shot. Run both databases in parallel and shift traffic incrementally.

// db-router.ts
type DbTarget = 'mongo' | 'both' | 'postgres';
const DB_TARGET: DbTarget = (process.env.DB_TARGET as DbTarget) ?? 'mongo';

export async function writeUser(data: UserData) {
  if (DB_TARGET === 'mongo' || DB_TARGET === 'both') {
    await mongoDb.collection('users').insertOne(toMongoDoc(data));
  }
  if (DB_TARGET === 'both' || DB_TARGET === 'postgres') {
    await pgPool.query(
      `INSERT INTO users (mongo_id, email, name, created_at, updated_at)
       VALUES ($1, $2, $3, $4, $5)
       ON CONFLICT (mongo_id) DO UPDATE SET email = EXCLUDED.email`,
      [data.mongoId, data.email, data.name, data.createdAt, data.updatedAt]
    );
  }
}

export async function readUser(id: string) {
  if (DB_TARGET === 'mongo') return mongoDb.collection('users').findOne({ _id: id });
  return pgPool.query('SELECT * FROM users WHERE mongo_id = $1', [id]).then(r => r.rows[0]);
}

The four phases:

PhaseDB_TARGETEffect
1 — BackfillmongoProduction reads/writes to Mongo; ETL copies history to PG
2 — Dual-writebothAll writes go to both; reads from Mongo
3 — PG readsboth (reads from PG)Writes to both; reads from PG
4 — CutoverpostgresMongo receives nothing

Rollback at any phase: set DB_TARGET back one step. No data loss — Mongo still has everything up to the cutover.

Failure mode: If your service runs multiple instances, set DB_TARGET in your environment config (Railway, Fly, Kubernetes env), not per-instance. An inconsistent flag across instances causes write splits that are a pain to reconcile.

Step 5: Validate data

Run these checks before Phase 4.

// validate.ts
import { MongoClient } from 'mongodb';
import { Pool } from 'pg';

async function validate() {
  const mongo = new MongoClient(process.env.MONGO_URI!);
  const pg = new Pool({ connectionString: process.env.PG_URI! });

  await mongo.connect();

  const mongoCount = await mongo.db('mydb').collection('users').countDocuments();
  const pgResult = await pg.query('SELECT COUNT(*) FROM users');
  const pgCount = parseInt(pgResult.rows[0].count, 10);

  console.log(`MongoDB: ${mongoCount} rows`);
  console.log(`PostgreSQL: ${pgCount} rows`);

  if (mongoCount !== pgCount) {
    console.error(`Row count mismatch: ${mongoCount - pgCount} rows missing`);
  }

  // Spot-check 20 random IDs
  const samples = await mongo
    .db('mydb')
    .collection('users')
    .aggregate([{ $sample: { size: 20 } }])
    .toArray();

  for (const doc of samples) {
    const pgRow = await pg.query('SELECT * FROM users WHERE mongo_id = $1', [
      doc._id.toString(),
    ]);
    if (pgRow.rows.length === 0) {
      console.error(`Missing in PG: ${doc._id.toString()}`);
    }
  }

  // NULL check for required fields
  const nullEmails = await pg.query(
    'SELECT COUNT(*) FROM users WHERE email IS NULL'
  );
  console.log(`Rows with NULL email: ${nullEmails.rows[0].count}`);

  await mongo.close();
  await pg.end();
}

validate().catch(console.error);

If row counts match and the spot-check passes, you’re ready for Phase 4.

Failure mode: SELECT DISTINCT queries can be significantly slower on PostgreSQL than on MongoDB because Mongo avoids table scans with its document model. Benchmark your most-common read patterns before the cutover. If a query regresses, add a partial index or rewrite it.

Step 6: Cut over traffic

# Deploy with DB_TARGET=postgres
DB_TARGET=postgres railway up   # or fly deploy, kubectl set env, etc.

Watch error rates for 30 minutes. If anything looks wrong, roll back with:

DB_TARGET=mongo railway up

Once you’re confident, stop the dual-write writes to Mongo by removing the both path, and schedule Mongo decommissioning for two weeks out. Don’t drop the database on cutover day.

With the migration complete, the next step is usually an ORM for type-safe queries. The Drizzle ORM with Postgres setup guide covers schema declarations, migrations, and pgvector if you’re planning to add vector search.

8 production gotchas (from Voucherify’s migration)

These are real failure modes, sourced from Voucherify’s public post-mortem on their production MongoDB→PostgreSQL migration.

  1. ISODate conversion: MongoDB’s ISODate stores milliseconds since epoch. Use new Date(milliseconds) explicitly in your ETL — don’t assume the driver serializes it correctly for every field.

  2. JSON null vs SQL NULL: MongoDB documents can have a field explicitly set to null. That’s different from a missing field. Normalize with doc.field ?? null to avoid inserting the string "null" into a JSONB column.

  3. Numbers stored as strings: MongoDB’s schemaless nature allows { price: "9.99" } and { price: 9.99 } in the same collection. Add a type guard in the ETL before inserting into a NUMERIC column.

  4. Don’t set NOT NULL before backfill is confirmed: A constraint applied before the ETL finishes will reject partial rows. Add constraints in a separate migration after Step 5 passes.

  5. SELECT DISTINCT regression: Postgres can regress on DISTINCT compared to Mongo’s indexed reads. Benchmark before cutover, not after.

  6. --oplog blocked on Atlas M0/M2/M5: Free and shared clusters don’t expose the oplog. Use mongoexport per collection and accept a lag window.

  7. JSONB update bloat on high-churn nested docs: If a JSONB column is updated frequently, Postgres’s MVCC creates row versions that bloat the table. Extract hot keys to typed columns and reserve JSONB for cold or append-only data.

  8. Index recreation: Every Mongo index needs a CREATE INDEX CONCURRENTLY equivalent in Postgres. Don’t skip this — a missing index on a large table will surface as a production query regression, not an error.

For larger migrations: AWS DMS

If your dataset is over 50 GB or your schema has complex types (polymorphic documents, deeply nested arrays used in queries), AWS DMS is the practical choice. It handles ongoing CDC replication, so you can run Mongo and Postgres in parallel for weeks and do a near-zero-lag cutover.

Cost is roughly $200/month for an r5.large replication instance. Run it for the migration period, then terminate it.

Where to host the destination PostgreSQL: Neon is worth looking at for serverless deployments — branching and point-in-time restore simplify the validation step significantly. The Neon vs Supabase comparison covers the key differences if you’re deciding between the two.

References