Best AI image API in 2026: pricing, rate limits, SDK

fal.ai FLUX.1 [schnell] at $0.003/image for cost, FLUX.2 [pro] for production quality, Stable Diffusion open weights for self-hosting. Developer breakdown.

By Ethan · Updated May 20, 2026

Cost: FLUX.1 [schnell] via fal.ai or Replicate at $0.003 per image, sub-second latency. Production quality: FLUX.2 [pro] via Black Forest Labs direct or fal.ai at $0.030 per image, 2× faster after the March 2026 update. Self-hosting: FLUX.1 [schnell] (Apache 2.0) or SDXL open weights, with no per-image fee if you manage your own GPUs. FLUX.1 [dev] weights are also downloadable, but under a non-commercial license rather than Apache 2.0. Midjourney is Enterprise-only as of May 2026; skip it unless you have $120+/month and manual approval time to spare.

Who this is for

Developers shipping image generation as a feature. You need to pick an API, understand the pricing cliff when you hit 10,000 images a month, and know which providers will return a 429 at exactly the wrong moment. This is not a “which one makes prettier pictures” article.

What we tested

Pricing, rate limits, SDK availability, and latency data collected from primary sources on 2026-05-20:

  • FLUX.1 [schnell] via fal.ai (fal-ai/flux/schnell) and Replicate
  • FLUX.2 [pro] via Black Forest Labs direct (api.bfl.ai) and fal.ai
  • FLUX.2 [klein] 4B via BFL direct
  • FLUX1.1 [pro] via BFL direct and Replicate
  • Stable Image Core, SD3.5 Large, Stable Image Ultra via Stability AI
  • Midjourney API — documented from official docs only; no live access (Enterprise-gated)

Latency figures come from third-party benchmark trackers (artificialanalysis.ai) and provider documentation, not a controlled test rig. A live test script against each API would produce cleaner numbers.

Findings

Pricing and credit model

This is where the decisions actually happen. All prices at 1024×1024 (1 megapixel) unless noted.

Model | Provider | Price per image
FLUX.1 [schnell] | fal.ai | $0.003
FLUX.1 [schnell] | Replicate | $0.003
FLUX.2 [klein] 4B | BFL direct | $0.014
FLUX.1 [dev] | fal.ai | $0.025
FLUX.1 [dev] | Replicate | $0.025
FLUX.2 [pro] | BFL direct / fal.ai | $0.030
Stable Image Core | Stability AI | $0.030
FLUX1.1 [pro] | BFL direct / Replicate | $0.040
SD3.5 Large | Stability AI | $0.065
Stable Image Ultra | Stability AI | $0.080
Midjourney API | Enterprise only | Not disclosed

The spread is significant. FLUX.1 [schnell] at $0.003 is 10× cheaper than FLUX.2 [pro] and roughly 27× cheaper than Stable Image Ultra. At 10,000 images a month that gap is $30 vs. $300 vs. $800, before any volume discount.

Stability AI credit system: 1 credit = $0.01. Pay-as-you-go at $10 per 1,000 credits. A $20/month membership includes 6,000 credits, which covers roughly 2,000 Stable Image Core generations. Beyond that, you pay overages at the same per-credit rate.
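The credit arithmetic is easy to check in a few lines. This is a minimal sketch using only the figures quoted above; the function names are illustrative and not part of any Stability AI SDK.

```python
CREDIT_VALUE_USD = 0.01  # 1 credit = $0.01, as quoted in this article


def credits_per_image(price_usd: float) -> float:
    """Convert a per-image dollar price into credits."""
    return round(price_usd / CREDIT_VALUE_USD, 2)


def images_covered(credits: int, price_usd: float) -> int:
    """Number of whole generations a credit balance pays for."""
    return int(credits // credits_per_image(price_usd))


# Stable Image Core at $0.030 per image, against the 6,000 membership credits
print(credits_per_image(0.030))
print(images_covered(6000, 0.030))
```

Swapping in the SD3.5 Large or Stable Image Ultra price shows how quickly the same credit balance shrinks at the higher tiers.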

BFL and fal.ai pricing: BFL uses megapixel-based pricing for FLUX.2 models. The first megapixel is billed at the listed rate, and additional megapixels are billed proportionally. fal.ai matches BFL pricing on FLUX.2 [pro] exactly. Where fal.ai beats BFL direct is FLUX.1 [schnell] access: BFL does not sell schnell through its API. The model is Apache 2.0 open source, and BFL offers it for self-hosting only. The commercial equivalent via BFL is FLUX.2 [klein] ($0.014), not schnell.
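A rough estimator for megapixel-based billing might look like the sketch below. It rests on several assumptions not confirmed by this article: that one megapixel means 1024×1024 pixels (matching the "1024×1024 (1 megapixel)" convention used in the pricing table), that images under one megapixel are still billed as one, and that "proportionally" means linear with no rounding. The function name is hypothetical; check the provider's pricing page before relying on the output.

```python
PIXELS_PER_MEGAPIXEL = 1024 * 1024  # assumption: follows the article's 1024x1024 = 1 MP convention


def estimate_megapixel_cost(width: int, height: int, rate_per_mp: float = 0.030) -> float:
    """Estimate the price of one image under linear megapixel billing.

    Assumes a one-megapixel minimum and no rounding of partial megapixels.
    """
    megapixels = (width * height) / PIXELS_PER_MEGAPIXEL
    billable = max(1.0, megapixels)
    return round(billable * rate_per_mp, 4)


print(estimate_megapixel_cost(1024, 1024))  # baseline size
print(estimate_megapixel_cost(2048, 1024))  # twice the pixels
```

Under these assumptions, doubling the pixel count doubles the price, so output resolution belongs in any volume estimate alongside image count.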

Replicate: per-output pricing for the major Flux models, no GPU-second billing. Clean pricing page, no surprises at the invoice.

Volume projection (Stable Image Core, $0.030):

Volume/month | Monthly cost
100 images | $3
1,000 images | $30
10,000 images | $300

At schnell pricing ($0.003): $0.30 / $3 / $30 for the same volumes. For how these per-image costs fit into a full AI product’s budget, see The real cost of running an AI agent team in 2026.
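These projections are plain multiplication, which makes them easy to rerun when prices change. A minimal sketch, using the list prices quoted in this article and ignoring volume discounts; the dictionary and function name are illustrative.

```python
# Per-image list prices at 1024x1024, as quoted in this article (2026-05-20)
PRICE_PER_IMAGE = {
    "FLUX.1 [schnell]": 0.003,
    "FLUX.2 [pro]": 0.030,
    "Stable Image Core": 0.030,
    "Stable Image Ultra": 0.080,
}


def monthly_cost(model: str, images_per_month: int) -> float:
    """Monthly spend in USD for a model at a given volume, before discounts."""
    return round(PRICE_PER_IMAGE[model] * images_per_month, 2)


for volume in (100, 1_000, 10_000):
    costs = {model: monthly_cost(model, volume) for model in PRICE_PER_IMAGE}
    print(volume, costs)
```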

Latency

Model | Provider | Typical latency
FLUX.1 [schnell] | Together.ai Turbo | ~315 ms
FLUX.1 [schnell] | fal.ai | ~0.8 s
Stable Image Core | Stability AI | 2–4 s
FLUX.1 [dev] | fal.ai / Replicate | ~3.5–4.5 s
FLUX.2 [pro] | BFL / fal.ai | ~5–6 s
Stable Image Ultra | Stability AI | 6–12 s
SD3.5 Large | Stability AI | 6–12 s

Source: provider documentation and artificialanalysis.ai multi-provider tracker. Latency is single-region and will vary.

The headline number is schnell: sub-second at fal.ai, 315 ms on Together.ai’s Turbo endpoint. If your app generates images inline with a user interaction, that gap between 0.8 s and 6 s is the difference between feeling instant and feeling like a loading spinner.

FLUX.2 [pro] became roughly 2× faster in March 2026 with no price increase. For quality-sensitive use cases it is now the most competitive API at the $0.030 price point.

Output quality

Output quality comparison is not in scope for this article — see the Caveats.

Developer experience

Black Forest Labs (api.bfl.ai)

No official Python or JavaScript SDK. All code examples in the docs are raw HTTP, using Python requests or curl. The integration pattern is async: POST a generation request, get back a polling_url, then poll until status is Ready. Image URLs expire after 10 minutes, so you cannot hand the URL directly to a client; you must proxy the image or re-serve it from your own infrastructure.
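The polling half of that pattern can be sketched without committing to any endpoint details. In the sketch below, `poll_until_ready` and its parameters are hypothetical names, the `"Pending"` status and `"url"` field in the demo are placeholders, and only the `Ready` status comes from the description above. Failure statuses are not handled because this article does not list them.

```python
import time
from typing import Callable


def poll_until_ready(
    fetch_status: Callable[[], dict],
    interval_s: float = 0.5,
    timeout_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Call fetch_status() repeatedly until it reports status 'Ready'.

    fetch_status is injected so that the HTTP details (URL, auth header)
    stay outside this function; it should return the decoded JSON body.
    """
    deadline = clock() + timeout_s
    while True:
        payload = fetch_status()
        if payload.get("status") == "Ready":
            return payload
        if clock() >= deadline:
            raise TimeoutError("generation did not reach Ready before the timeout")
        sleep(interval_s)


# Demo with canned responses instead of real HTTP calls (placeholder payloads)
responses = iter([
    {"status": "Pending"},
    {"status": "Pending"},
    {"status": "Ready", "url": "https://example.invalid/image.png"},
])
final = poll_until_ready(lambda: next(responses), sleep=lambda seconds: None)
print(final["status"])
```

In a real integration, `fetch_status` would wrap a GET on the returned polling_url with whatever authentication the BFL docs specify. Because the image URL is short-lived, the caller should download the result as soon as the function returns.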

Rate limit: 24 concurrent requests for standard endpoints, 6 concurrent for Kontext [max]. Hard ceiling — over-limit returns HTTP 429. Exponential backoff is the documented retry strategy.
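A retry wrapper for the 429 case could look like the following sketch. The base delay, cap, jitter fraction, and attempt count are assumptions chosen for illustration; the article only establishes that exponential backoff is the documented strategy. `call_with_backoff` is a hypothetical helper, and `send` stands in for whatever function performs the HTTP request.

```python
import random
import time
from typing import Any, Callable, Tuple


def call_with_backoff(
    send: Callable[[], Tuple[int, Any]],
    max_attempts: int = 5,
    base_delay_s: float = 1.0,
    max_delay_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, Any]:
    """Retry send() while it returns HTTP 429, doubling the wait each time.

    send() must return a (status_code, body) tuple.
    """
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            delay = min(max_delay_s, base_delay_s * (2 ** attempt))
            # Up to 10% jitter so parallel workers do not retry in lockstep
            sleep(delay + random.uniform(0, 0.1 * delay))
    raise RuntimeError(f"still rate limited after {max_attempts} attempts")


# Demo with canned responses: two 429s, then success
attempts = iter([(429, None), (429, None), (200, {"id": "demo"})])
waits = []
status, body = call_with_backoff(lambda: next(attempts), sleep=waits.append)
print(status, len(waits))
```

Backoff handles the occasional over-limit response; it does not replace capping in-flight requests on the client side so that the concurrency ceiling is rarely hit in the first place.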

Multi-region: api.bfl.ai (global, automatic failover), api.eu.bfl.ai (GDPR routing), api.us.bfl.ai. The regional option matters if you are handling EU user data and need to keep inference on that side of the Atlantic.

MCP server at mcp.bfl.ai for Claude Desktop / Claude Code / Cursor integration — useful if you are using one of the AI coding CLI tools that support MCP.

fal.ai

Official SDKs: fal-client (Python, PyPI, updated 2026-04-28) and @fal-ai/client (npm, 1.10.1, updated 2026-05-04). TypeScript types included. SDK quality is the strongest of any provider here.
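For comparison with the raw-HTTP flow, a call through the Python SDK might look like this sketch. It assumes the `fal-client` package is installed and that credentials are supplied through the environment as the SDK expects; `build_arguments` and `generate_schnell` are illustrative helper names, and only the `prompt` argument is set because other parameters are not covered in this article.

```python
def build_arguments(prompt: str, **extra) -> dict:
    """Assemble the arguments payload; extra keys are passed through unchanged."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    return {"prompt": prompt, **extra}


def generate_schnell(prompt: str, **extra) -> dict:
    """Run FLUX.1 [schnell] on fal.ai and return the SDK's result.

    fal_client is imported lazily so this module loads even where the
    package is not installed (for example in unit tests).
    """
    import fal_client  # pip install fal-client

    return fal_client.subscribe(
        "fal-ai/flux/schnell",
        arguments=build_arguments(prompt, **extra),
    )
```

Calling `generate_schnell("a lighthouse at dusk")` blocks until the job finishes. The response shape is not covered here, so inspect the returned dict rather than relying on a particular key.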

Access to 1,000+ models through a single API integration. If you plan to use more than Flux, fal.ai reduces the integration surface dramatically.

Stated 99.9% uptime SLA. Specific rate limits are not publicly documented — contact sales for production commitments.

Pricing parity with BFL direct on FLUX.2 [pro]. The main reason to use fal.ai over BFL for quality-tier generation: the SDK and the unified catalog. The main reason to use BFL direct: you get new model releases first and you have explicit regional routing guarantees.

Replicate

Good official Python and JavaScript clients. Per-model pricing pages show run counts and version history, which helps you assess stability before you build on a model. Rate limits are less transparent than BFL's: they are not publicly documented.

Per-output pricing on Flux models (not GPU-second) is a significant advantage: you know exactly what a call costs before you make it.

Stability AI

No official SDK for the v2beta REST API. The official Python SDK (stability-sdk) targets the legacy gRPC API, and its last release was in May 2024. For v2beta, the official docs show raw Python requests examples. Community Node.js and TypeScript wrappers exist but are not officially maintained.

Rate limit: 150 requests per 10 seconds per API key. Exceeding it returns HTTP 429 with a 60-second timeout. Up to 10 API keys can be used simultaneously — documented workaround for burst workloads.
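Staying under that limit on the client side is cheaper than absorbing a 60-second penalty. The sketch below is a simple sliding-window limiter; it assumes the 10-second window behaves as a rolling window, which the figures above do not confirm, and `SlidingWindowLimiter` is a hypothetical helper rather than anything from a Stability AI library.

```python
import time
from collections import deque
from typing import Callable


class SlidingWindowLimiter:
    """Track request timestamps and report how long to wait before the next one."""

    def __init__(
        self,
        max_requests: int = 150,
        window_s: float = 10.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._max_requests = max_requests
        self._window_s = window_s
        self._clock = clock
        self._sent = deque()  # timestamps of requests inside the current window

    def wait_time(self) -> float:
        """Seconds to wait before another request fits in the window (0.0 if none)."""
        now = self._clock()
        while self._sent and now - self._sent[0] >= self._window_s:
            self._sent.popleft()
        if len(self._sent) < self._max_requests:
            return 0.0
        return self._window_s - (now - self._sent[0])

    def record(self) -> None:
        """Register that a request was just sent."""
        self._sent.append(self._clock())


limiter = SlidingWindowLimiter()
delay = limiter.wait_time()
if delay > 0:
    time.sleep(delay)
limiter.record()  # call right before issuing the HTTP request
```

One limiter instance per API key keeps the accounting aligned with how the limit is applied.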

No formal SLA document found on primary sources.

AI image API verdict

Use case | Pick
Cost < $0.005/image, latency < 1 s, volume > 5,000/month | FLUX.1 [schnell] via fal.ai or Replicate
Production quality at $0.030, latest Flux models, GDPR EU routing | FLUX.2 [pro] via BFL direct
Production quality + best SDK + 1,000+ model catalog | FLUX.2 [pro] via fal.ai
Managed Stable Diffusion, no GPU infrastructure | Stable Image Core via Stability AI ($0.030)
Stable Diffusion ecosystem, $0.065–$0.080 price tier | SD3.5 Large or Stable Image Ultra via Stability AI
Volume > 50,000/month, full model control, zero per-image cost tolerance | FLUX.1 [schnell] (Apache 2.0) or SDXL self-hosted on your GPU infra
Midjourney-quality output | Not accessible without Enterprise plan ($120+/mo, manual approval)

Two picks for most teams starting out:

  1. Start with fal.ai + schnell for prototyping and cost-sensitive workloads. $0.003/image, SDK from day one, switch to FLUX.2 [pro] on the same provider when quality requirements go up.
  2. Go BFL direct for production quality if you want the newest Flux models first and need explicit EU data routing. Accept the polling architecture and the lack of an official SDK.

Caveats

Output quality: Side-by-side prompt testing across all APIs is not included. This article covers pricing, rate limits, latency, and SDK quality only.

Midjourney API access: Based on public documentation only. Enterprise pricing not confirmed beyond the $120/month floor. Access gating may change — verify current status before including in production planning.

fal.ai rate limits: Not documented publicly. Planned high-concurrency use requires a sales call.

Pricing volatility: All prices accessed 2026-05-20. Stability AI raised pricing on non-core services in August 2025; BFL pricing has been stable but FLUX.2 models are relatively new. Pin your vendor contract or monitor pricing pages before locking in a volume estimate.

Benchmarks are single-region: Latency numbers from artificialanalysis.ai and provider docs reflect specific infrastructure snapshots. Your region, time of day, and concurrent load all affect real performance.

No affiliate relationships: No provider in this comparison has an affiliate or referral program with toolchew. No affiliate links are included.

References