How to rate-limit Cloudflare Workers without Redis

If you need rate limiting on Cloudflare Workers, you don’t need Redis. You don’t need Upstash. Cloudflare ships three primitives that cover the full spectrum from “stop the script-kiddie bots” to “count every request for a billing system.” Pick the right one for your use case and you’re done in under an hour.

Who this is for

Workers developers who want rate limiting without wiring in an external cache layer. If you already have Redis in your stack and are happy with it, skip this — adding a Workers binding won’t save you much. If Redis is still a consideration elsewhere in your stack, see Redis vs Valkey 2026.

For a broader look at the Cloudflare Workers platform before committing, see Cloudflare Workers vs AWS Lambda and Cloudflare Workers vs Vercel Functions.

What we tested

All three approaches run on Wrangler 4.94.0 (current as of 2026-05-27). Wrangler ≥ 4.36.0 is required for the Rate Limiting API. The Workers runtime is Cloudflare’s standard V8 isolate. Free plan limits apply throughout; where the Paid plan changes the numbers, that is noted explicitly.

The Rate Limiting API

Start here. Available on all plans including Free. Requires zero extra infrastructure. Added latency: none — the counter lives on the same machine as your Worker.

The API hit general availability on September 19, 2025. It’s backed by the same memcached-based sliding window algorithm that powers Cloudflare’s WAF rate limiting — the same system handling billions of requests per day with a measured 0.003% error rate.

One [[ratelimits]] block in wrangler.toml, one method call in your handler:

# wrangler.toml
[[ratelimits]]
name = "MY_RATE_LIMITER"
namespace_id = "1001"          # any positive integer, unique per account

  [ratelimits.simple]
  limit = 100                  # requests allowed per period
  period = 60                  # 10 or 60 seconds — those are the only options

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const userId = request.headers.get("x-user-id") ?? "anonymous";

    const { success } = await env.MY_RATE_LIMITER.limit({ key: userId });

    if (!success) {
      return new Response("429 Rate limit exceeded", { status: 429 });
    }
    return new Response("OK");
  },
};

You can differentiate free and paid users with two bindings:

[[ratelimits]]
name = "FREE_RL"
namespace_id = "1001"
[ratelimits.simple]
limit = 100
period = 60

[[ratelimits]]
name = "PAID_RL"
namespace_id = "1002"
[ratelimits.simple]
limit = 1000
period = 60

const limiter = isPaidUser ? env.PAID_RL : env.FREE_RL;
const { success } = await limiter.limit({ key: userId });

Latency: The counter check is a local operation, not a call to a remote service. Cloudflare describes it as adding no meaningful latency because the lookup is served from within the same PoP as the isolate, so nothing leaves the location that is handling the request.

The catch: limits are per Cloudflare PoP, not global. A user whose requests split between two Cloudflare locations gets 100 requests per minute at each location — 200 total, not 100. This is fine for abuse prevention. It’s a hard no for billing. The docs say explicitly: it’s “intentionally designed to not be used as an accurate accounting system.”
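
To make the per-PoP arithmetic concrete, here is a minimal simulation. It is a toy model, not Cloudflare’s implementation: each location is an independent counter, the window never resets, and the names PopCounter and simulateSplitTraffic are invented for illustration.

```typescript
// Illustrative model only: each PoP keeps its own independent counter.
// This is NOT how Cloudflare implements the API internally; it only
// reproduces the "limit applies per location" arithmetic.
class PopCounter {
  private counts = new Map<string, number>();
  private readonly limit: number;

  constructor(limit: number) {
    this.limit = limit;
  }

  // Returns true while the key is still under this PoP's local limit.
  allow(key: string): boolean {
    const used = this.counts.get(key) ?? 0;
    if (used >= this.limit) return false;
    this.counts.set(key, used + 1);
    return true;
  }
}

// Sends `requests` requests for one user, round-robin across `popCount`
// locations inside a single window, and returns how many were allowed.
function simulateSplitTraffic(limit: number, popCount: number, requests: number): number {
  const pops = Array.from({ length: popCount }, () => new PopCounter(limit));
  let allowed = 0;
  for (let i = 0; i < requests; i++) {
    if (pops[i % popCount]?.allow("user-123")) allowed++;
  }
  return allowed;
}

const allowedAtOnePop = simulateSplitTraffic(100, 1, 500);      // 100
const allowedAcrossTwoPops = simulateSplitTraffic(100, 2, 500); // 200
```

The effective ceiling in this model is limit × number of locations the client reaches, which is why the nominal limit should be sized for abuse prevention rather than accounting.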

Two other constraints worth knowing upfront:

Period windows are 10 seconds or 60 seconds. No 5-minute windows, no hourly caps.
IP-based keys are discouraged — mobile NAT and corporate proxies bundle many users behind one IP. Prefer user IDs or API keys as the key value.
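
One way to apply the key advice above is a small helper that prefers the most specific identity and only falls back to the IP as a last resort. This is a sketch under assumptions: x-api-key is a hypothetical header name, x-user-id is the header used in the handler examples in this article, and cf-connecting-ip is the client IP header Cloudflare adds to incoming requests.

```typescript
// Minimal structural type so the helper accepts a Workers `Headers`
// object or any test double that exposes get().
interface HeaderReader {
  get(name: string): string | null;
}

// Picks the most specific identity available for the rate-limit key.
// Prefixes keep the three key spaces from colliding with each other.
function rateLimitKey(headers: HeaderReader): string {
  const apiKey = headers.get("x-api-key"); // hypothetical header name
  if (apiKey) return `key:${apiKey}`;

  const userId = headers.get("x-user-id");
  if (userId) return `user:${userId}`;

  // Last resort: many users may share this IP (mobile NAT, corporate proxies).
  const ip = headers.get("cf-connecting-ip");
  return ip ? `ip:${ip}` : "anonymous";
}
```

In a handler this would be called as env.MY_RATE_LIMITER.limit({ key: rateLimitKey(request.headers) }).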

Workers KV

For daily quotas and soft caps. Eventually consistent, globally replicated, cheap to read. The 60-second global propagation lag is its defining characteristic — not a bug, just the design.

The use case: you have a “500 AI calls per day” plan limit. Exact counting across concurrent requests doesn’t matter much because users hitting the daily cap are already deep into their quota. A few extra requests slipping through at midnight UTC is acceptable. A few extra requests slipping through a rate limit every second is not.

# wrangler.toml
[[kv_namespaces]]
binding = "KV"
id = "<your-kv-namespace-id>"

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const userId = request.headers.get("x-user-id") ?? "anonymous";
    // Daily key: one KV entry per user per UTC day
    const key = `quota:${userId}:${new Date().toISOString().slice(0, 10)}`;

    const raw = await env.KV.get(key);
    const count = raw ? parseInt(raw, 10) : 0;
    const DAILY_LIMIT = 500;

    if (count >= DAILY_LIMIT) {
      return new Response("429 Daily quota exceeded", { status: 429 });
    }

    // The TTL restarts on every write, so the key expires 24 hours after the last counted request of that UTC day
    await env.KV.put(key, String(count + 1), { expirationTtl: 86400 });
    return new Response("OK");
  },
};
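
If you would rather have the key disappear at the end of the quota day than linger for 24 hours after the last write, the TTL can be computed from the clock. The helpers below are a sketch: the function names are made up for this example, and the 60-second floor reflects KV’s minimum expirationTtl.

```typescript
// UTC day key in the same format as the handler above:
// quota:<user>:<YYYY-MM-DD>
function dailyQuotaKey(userId: string, now: Date): string {
  return `quota:${userId}:${now.toISOString().slice(0, 10)}`;
}

// Seconds from `now` until the next UTC midnight, clamped to the
// 60-second minimum that KV accepts for expirationTtl.
function secondsUntilUtcMidnight(now: Date): number {
  const nextMidnight = Date.UTC(
    now.getUTCFullYear(),
    now.getUTCMonth(),
    now.getUTCDate() + 1, // day overflow rolls into the next month
  );
  const remaining = Math.ceil((nextMidnight - now.getTime()) / 1000);
  return Math.max(60, remaining);
}
```

The put call then becomes env.KV.put(key, String(count + 1), { expirationTtl: secondsUntilUtcMidnight(new Date()) }). Because the date is already part of the key, this only changes how long stale keys stay around, not how the quota is counted.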

Latency: Hot-cached reads return quickly from the edge. Cold reads (first access or post-TTL) fetch from the central store — expect meaningfully higher latency than a cache hit. Writes acknowledge locally immediately; global propagation takes up to 60 seconds.

The hard limits:

1 write per second per key. This is a KV platform constraint. If a single user sends more than 1 req/s, the writes to that user’s key exceed the limit and KV rejects them with HTTP 429. That 429 is a KV infrastructure error, not your rate-limit response. KV is not appropriate for per-request rate limiting on active APIs.
Race condition by design. Two requests from different PoPs can both read count = 49, both pass the check, and both write count = 50 against a limit of 50. Two requests were served but only one was counted, so you over-serve. For daily quotas, that’s acceptable. For anything where a missed increment costs money (billing, API credits), it’s not.
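
The race is easy to reproduce locally. The sketch below uses an in-memory stand-in for KV (FakeKv is hypothetical; real KV adds caching and propagation delay, which only widen the window) and runs the same read-check-write sequence as the quota handler twice, concurrently.

```typescript
// In-memory stand-in for a KV namespace. Hypothetical test double:
// it has no caching or replication, only async get/put.
class FakeKv {
  private data = new Map<string, string>();

  async get(key: string): Promise<string | null> {
    return this.data.get(key) ?? null;
  }

  async put(key: string, value: string): Promise<void> {
    this.data.set(key, value);
  }
}

// Same read-check-write sequence as the quota handler.
async function tryConsume(kv: FakeKv, key: string, limit: number): Promise<boolean> {
  const raw = await kv.get(key);
  const count = raw ? parseInt(raw, 10) : 0;
  if (count >= limit) return false;
  await kv.put(key, String(count + 1));
  return true;
}

// Two requests race for the last slot of a 50-request quota.
async function demoRace(): Promise<{ served: number; stored: string | null }> {
  const kv = new FakeKv();
  await kv.put("quota:u1", "49");
  const results = await Promise.all([
    tryConsume(kv, "quota:u1", 50),
    tryConsume(kv, "quota:u1", 50),
  ]);
  return {
    served: results.filter(Boolean).length,
    stored: await kv.get("quota:u1"),
  };
}
```

Both calls read 49 before either writes, so demoRace() resolves with two requests served and a stored count of "50": one increment is lost.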

Free plan: 100,000 reads/day, 1,000 writes/day. The write limit makes KV effectively unusable for any user with meaningful daily traffic on Free — 1,000 writes is 1,000 distinct users hitting the quota endpoint exactly once. Factor that in before choosing.
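
The write budget is worth running through before committing. A back-of-envelope helper, assuming every counted request costs exactly one KV write as in the handler above (the function name is made up for this example):

```typescript
const FREE_KV_WRITES_PER_DAY = 1000; // Free plan figure quoted above

// Each counted request costs one KV write, so the daily write budget
// caps the total number of counted requests across ALL users combined.
function maxUsersWithinWriteBudget(requestsPerUserPerDay: number): number {
  if (requestsPerUserPerDay <= 0) {
    throw new RangeError("requestsPerUserPerDay must be greater than 0");
  }
  return Math.floor(FREE_KV_WRITES_PER_DAY / requestsPerUserPerDay);
}

const usersAtOneRequestEach = maxUsersWithinWriteBudget(1);   // 1000
const usersAtFullQuota = maxUsersWithinWriteBudget(500);      // 2
```

With the “500 AI calls per day” plan from earlier, two users who exhaust their quota consume the entire Free write allowance.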

Durable Objects

For strict global counters. Strong consistency, globally accurate, higher code complexity. If the Rate Limiting API’s per-PoP eventual consistency is a problem and KV’s 1-write/s cap rules it out, this is the primitive.

Durable Objects are singleton processes with private, serialized access to SQLite storage. Route all requests for a given user to the same named object, and every increment is serialized through that object’s single thread. No race conditions. No lost increments. Every request everywhere counts exactly once.

# wrangler.toml
[[durable_objects.bindings]]
name = "RATE_LIMITER"
class_name = "RateLimiter"

[[migrations]]
tag = "v1"
new_sqlite_classes = ["RateLimiter"]

import { DurableObject } from "cloudflare:workers";

export class RateLimiter extends DurableObject<Env> {
  async check(limit: number, windowSec: number): Promise<boolean> {
    const now = Date.now();
    const windowStart = now - windowSec * 1000;

    this.ctx.storage.sql.exec(`CREATE TABLE IF NOT EXISTS hits (ts INTEGER)`);
    this.ctx.storage.sql.exec(`DELETE FROM hits WHERE ts < ?`, windowStart);

    const { count } = this.ctx.storage.sql
      .exec(`SELECT COUNT(*) as count FROM hits`)
      .one() as { count: number };

    if (count >= limit) return false;

    this.ctx.storage.sql.exec(`INSERT INTO hits VALUES (?)`, now);
    return true;
  }
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const userId = request.headers.get("x-user-id") ?? "anonymous";
    // One DO per user — getByName routes globally to the same instance
    const stub = env.RATE_LIMITER.getByName(userId);
    const allowed = await stub.check(100, 60); // 100 req/min

    if (!allowed) {
      return new Response("429 Too Many Requests", { status: 429 });
    }
    return new Response("OK");
  },
};
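
The sliding-window logic inside check() can be exercised outside the Workers runtime with an in-memory mirror. This is a sketch for local experiments only: SlidingWindowLog is an invented name, it has none of the persistence a Durable Object provides, and it takes the current time as a parameter (instead of calling Date.now()) so the behavior is deterministic.

```typescript
// In-memory mirror of the SQL logic in RateLimiter.check(): drop hits
// older than the window, count what is left, record the hit if allowed.
class SlidingWindowLog {
  private hits: number[] = [];

  // `now` is a millisecond timestamp supplied by the caller.
  check(limit: number, windowSec: number, now: number): boolean {
    const windowStart = now - windowSec * 1000;

    // Equivalent of: DELETE FROM hits WHERE ts < windowStart
    this.hits = this.hits.filter((ts) => ts >= windowStart);

    // Equivalent of: SELECT COUNT(*) FROM hits, then the limit comparison
    if (this.hits.length >= limit) return false;

    // Equivalent of: INSERT INTO hits VALUES (now)
    this.hits.push(now);
    return true;
  }
}
```

Note that rejected requests are not recorded, matching the Durable Object version: a client that keeps hammering after hitting the limit does not push its own window further out.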

Latency: Requests from the same Cloudflare PoP as the DO’s home region are fast after warmup. Requests from a different continent pay a cross-region network hop — expect meaningful latency overhead. That’s the trade-off — global consistency costs a network round-trip for users far from the DO’s home.

Cold starts are possible once the object evicts due to inactivity. For an API with consistent traffic this is rare. For bursty or intermittent traffic, plan for occasional cold-start spikes.

Do not use one global DO for all traffic. One object serializes one request at a time; at ~500–1,000 req/s it becomes the bottleneck. Name your objects per user: env.RATE_LIMITER.getByName(userId).

Do not keep counters in instance variables. Object instances evict from memory after a period of inactivity; in-memory state disappears. Always write to ctx.storage.sql — that’s what persists across hibernation cycles.
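
The failure mode is easy to picture with a toy model. Everything here is a stand-in: DurableStore plays the role that ctx.storage plays in a real Durable Object, and eviction is simulated by discarding the instance and constructing a new one.

```typescript
// Stand-in for storage that outlives any single in-memory instance.
// Hypothetical model; in a real Durable Object this is ctx.storage.
class DurableStore {
  private value = 0;
  read(): number {
    return this.value;
  }
  write(v: number): void {
    this.value = v;
  }
}

class CounterObject {
  private inMemoryCount = 0; // lost whenever the instance is evicted
  private readonly store: DurableStore;

  constructor(store: DurableStore) {
    this.store = store;
  }

  hit(): void {
    this.inMemoryCount++;
    this.store.write(this.store.read() + 1);
  }

  memoryCount(): number {
    return this.inMemoryCount;
  }

  storedCount(): number {
    return this.store.read();
  }
}

// Three hits, an eviction, then one more hit on a fresh instance.
function simulateEviction(): { memory: number; stored: number } {
  const store = new DurableStore();
  let obj = new CounterObject(store);
  obj.hit();
  obj.hit();
  obj.hit();
  obj = new CounterObject(store); // eviction followed by re-instantiation
  obj.hit();
  return { memory: obj.memoryCount(), stored: obj.storedCount() };
}
```

After the simulated eviction the instance variable reports 1 while the stored count reports 4; a limiter built on the former would quietly reset every time the object goes idle.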

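Plan considerations: Free plan allows 100,000 DO requests/day and 13,000 GB-seconds of compute per day. Production APIs will hit both caps. Paid plan ($5/month minimum) includes 1M requests/month and 400,000 GB-seconds. That included allowance is smaller than a month of Free daily caps, so the real gain is that traffic beyond it is billed as usage instead of being cut off at a daily ceiling, which is the better fit for sustained traffic.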
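
To see how quickly the Free request cap runs out, convert it to a sustained rate. The helpers below are plain arithmetic and assume one Durable Object request per API request, as in the handler above; the function names are invented for this example.

```typescript
const SECONDS_PER_DAY = 86400;

// Average requests per second that a daily cap can sustain around the clock.
function sustainedRequestsPerSecond(dailyCap: number): number {
  return dailyCap / SECONDS_PER_DAY;
}

// Hours until a daily cap is exhausted at a constant request rate.
function hoursUntilCapReached(dailyCap: number, requestsPerSecond: number): number {
  return dailyCap / requestsPerSecond / 3600;
}

const freeSustainedRate = sustainedRequestsPerSecond(100000); // about 1.16 req/s
const hoursAtTenRps = hoursUntilCapReached(100000, 10);       // about 2.8 hours
```

In other words, the Free cap covers a little over one request per second averaged across the whole day, for all users combined.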

Decision matrix

	Rate Limiting API	Workers KV	Durable Objects
Consistency	Eventually consistent (per-PoP)	Eventually consistent (up to 60 s global)	Strong (globally exact)
Added latency	0 ms (same machine)	Fast when cached; slower cold read	Cross-region network hop
Max write frequency	No documented limit	1 write/s per key	~500–1,000 req/s per object
Free plan	✓	✓ (with caveats)	✓ (100K req/day)
Cold start risk	None	None	Yes, after idle period
Code complexity	Low	Medium	High
Suitable for billing	No	No	Yes
Window flexibility	10 s or 60 s only	Any (via TTL)	Any (via custom SQL)
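
The matrix can be condensed into a small decision function. This is one possible reading of the table, not an official rule: the type names, field names, and the order of the checks are choices made for this sketch.

```typescript
type LimiterChoice = "Rate Limiting API" | "Workers KV" | "Durable Objects";

interface LimitRequirements {
  needsExactGlobalCount: boolean;    // billing, credit deduction
  windowSeconds: number;             // length of the limiting window
  peakWritesPerSecondPerKey: number; // expected per-user request rate
}

function choosePrimitive(req: LimitRequirements): LimiterChoice {
  // Only Durable Objects are marked suitable for billing.
  if (req.needsExactGlobalCount) return "Durable Objects";

  // The Rate Limiting API supports 10 s and 60 s windows only.
  if (req.windowSeconds === 10 || req.windowSeconds === 60) {
    return "Rate Limiting API";
  }

  // KV handles longer windows as long as each key sees <= 1 write/s.
  if (req.peakWritesPerSecondPerKey <= 1) return "Workers KV";

  // Custom window AND high per-key write rate: fall back to Durable Objects.
  return "Durable Objects";
}
```

For example, a daily quota (windowSeconds of 86400) for users who send well under one request per second resolves to Workers KV, while the same quota for a chatty client resolves to Durable Objects.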

Verdict per workload tier

Hobby / abuse prevention: Rate Limiting API. One config block, one method call, zero latency cost, works on Free. The per-PoP accounting is fine: you’re stopping bots, not counting API credits.

Production API with soft quotas (daily or weekly caps): Workers KV, with the understanding that users firing more than 1 req/s will start seeing KV write errors. Pair a KV quota check with a Rate Limiting API binding to catch burst traffic before it hits the quota write.
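
A rough sketch of that pairing follows. The two interfaces are minimal structural types written for this example so it can run without the Workers runtime; they describe only the methods used here, and the names BurstLimiter, QuotaStore, and checkLayered are hypothetical.

```typescript
// Only the methods this sketch needs; a Rate Limiting binding and a
// KV namespace binding both satisfy these shapes.
interface BurstLimiter {
  limit(options: { key: string }): Promise<{ success: boolean }>;
}

interface QuotaStore {
  get(key: string): Promise<string | null>;
  put(key: string, value: string, options?: { expirationTtl?: number }): Promise<void>;
}

type LayeredVerdict = "ok" | "burst_limited" | "quota_exceeded";

async function checkLayered(
  burst: BurstLimiter,
  quota: QuotaStore,
  userId: string,
  dailyLimit: number,
  now: Date,
): Promise<LayeredVerdict> {
  // Layer 1: per-PoP burst check, evaluated before any KV traffic.
  const { success } = await burst.limit({ key: userId });
  if (!success) return "burst_limited";

  // Layer 2: soft daily quota in KV, same key format as the KV handler.
  const key = `quota:${userId}:${now.toISOString().slice(0, 10)}`;
  const raw = await quota.get(key);
  const count = raw ? parseInt(raw, 10) : 0;
  if (count >= dailyLimit) return "quota_exceeded";

  await quota.put(key, String(count + 1), { expirationTtl: 86400 });
  return "ok";
}
```

In a Worker this would be called as checkLayered(env.MY_RATE_LIMITER, env.KV, userId, 500, new Date()). The burst layer reduces how often a user reaches the KV write, but because its limit is per PoP and its shortest window is 10 seconds, it does not guarantee that writes to one key stay under one per second.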

Production API with strict per-user billing or credit deduction: Durable Objects. Accept the cross-region latency overhead; it’s the cost of global accuracy. Upgrade to the Paid plan ($5/month) before hitting the 100K req/day Free limit — you’ll hit it faster than you expect on any real API.

Enterprise / multi-region strict: Durable Objects on Paid plan with per-user named objects. If throughput for a single user exceeds ~500 req/s, one object per user stops being enough and that user’s traffic has to be split across several objects, though at that scale you probably have infra people who should be in the room.

Caveats

The Rate Limiting API’s per-PoP behavior is the most commonly misunderstood characteristic here. Every Cloudflare location maintains its own counter independently. A client that rotates through multiple PoPs (a CDN, a mobile carrier with anycast, a VPN) can exceed the nominal limit without the API detecting it. This is documented behavior, not a bug — size your limits accordingly.

Workers KV’s 60-second replication lag means that after you ban a user or they exhaust a quota, they can continue sending requests from a different PoP for up to a minute. For daily quotas, this is noise. For security-critical limits, it’s not acceptable.

Durable Objects cold starts are predictable but not eliminable. If your API has long idle periods between bursts, expect a cold-start penalty on the first request of each burst. There’s no way to keep a DO warm indefinitely on the Free plan without generating fake traffic.

None of these approaches are substitutes for Cloudflare’s WAF-level rate limiting if you need protection before traffic hits your Worker. WAF rules fire at the edge before the isolate boots.

For a detailed breakdown of Cloudflare costs at scale — including Workers, R2, and D1 pricing vs AWS equivalents — see Cloudflare vs AWS: the complete cost breakdown.