· llm-tools / openrouter / litellm

Best LLM Router in 2026: OpenRouter, LiteLLM, and Portkey

LiteLLM is the best self-hosted router at 8ms P95 and zero per-request cost. OpenRouter wins for instant multi-model access. Here is which one fits your stack.

By

1,925 words · 10 min read

LiteLLM is the best LLM router for teams who own their infrastructure — 8ms P95 at 1,000 RPS, zero per-request cost, and 100+ provider integrations in a single self-hosted proxy. If you’d rather not maintain a server, OpenRouter gets you 300+ models through one API key with nothing to deploy. Portkey fills the gap when compliance certifications are non-negotiable.

Who this is for

Backend engineers and AI teams choosing a routing layer in mid-2026. If you’re calling a single model from a single app, you don’t need a router. These tools pay off when you want to spread load across providers, govern team spending, or steer different query types to different models — the same logic Cursor applies internally when it routes your code edits between models depending on task complexity.

What we evaluated

Five tools: OpenRouter, LiteLLM (v1.85.0), Portkey, Not Diamond, and Martian. We reviewed each tool’s public documentation, pricing pages, GitHub release history, and community discussions on Hacker News, G2, and Gartner Peer Insights as of May 2026. Latency figures come from vendor documentation and named community benchmarks.

Quick-pick table

ToolBest forOpen source?Self-host?
OpenRouterWidest model catalog, zero setupNoNo
LiteLLMFull infra control, cost governanceYes (47.7k ⭐)Yes
PortkeyEnterprise compliance + observabilityPartial (gateway OSS)Yes / Managed
Not DiamondPer-query model selection on top of your existing stackNoNo
MartianEnterprise high-volume intelligent routing with tracingNoNo

OpenRouter

openrouter.ai — SaaS, nothing to deploy

OpenRouter aggregates 300+ models from OpenAI, Anthropic, Google, Meta, Mistral, Groq, and dozens of other providers behind a single OpenAI-compatible endpoint. One API key, one billing relationship, immediate access to everything on the list.

The free tier gives you 50 requests/day on free-tier models. Put $10 in credits and that jumps to 1,000 requests/day. OpenRouter passes through inference prices without markup — you pay the same as going directly to each provider. The platform adds a 5.5% fee on credit card purchases (5% for crypto).

Standout feature: Auto Router and Pareto Router (coding-optimized). Instead of hardcoding a model name, you point at a router slug and let OpenRouter pick based on task type and cost target. Automatic fallbacks kick in when a provider goes down. Model-level features — semantic caching, web search, PDF/audio/video multimodal support — are available without extra configuration.

Latency overhead: 50–70ms per request. That is a real cost in latency-sensitive paths and stacks with other middleware.

Gotchas: OpenRouter periodically renames model IDs. If you pin a specific model name in code, expect invalid model ID errors after a rename with no advance warning. Customer support is Discord-only; multiple users report account lockouts that go unresolved for days. Free-tier models are explicitly rate-limited and unsuitable for production traffic. Credits expire after one year of inactivity.

Best for: Indie developers and prototypers who want instant access to every major model without deploying or maintaining anything.


LiteLLM

github.com/BerriAI/litellm — open source, self-hosted

LiteLLM is a Python proxy that wraps 100+ LLM providers — Azure, AWS Bedrock, Vertex AI, Anthropic, Groq, Together AI, and more — behind a single OpenAI-compatible interface. At 47,700 GitHub stars and 1,326 releases (v1.85.0, May 17, 2026), it is the most actively maintained LLM gateway in open source.

P95 latency is 8ms at 1,000 RPS in a self-hosted deployment. There is no external round-trip — your proxy lives on your infrastructure, next to your application. The difference between 8ms and OpenRouter’s 50–70ms matters when you’re building latency-sensitive inference pipelines or real-time agent loops.

Standout feature: Virtual API keys with per-key budget limits. Issue a separate key to each team, project, or user; set daily or monthly spend caps; get granular cost attribution across your whole organization. Fallbacks, retries, load balancing, and context-window-aware routing are all configurable per-route. The net effect: the same multi-provider routing you get from OpenRouter, but on your own infrastructure with no per-request fee. The underlying cost logic — which task types benefit from cheaper models versus frontier models — is covered in LLM cost routing: when Haiku beats Opus and when it does not.

Pricing: The proxy is free. Enterprise tier adds SSO, RBAC, audit logs, and secret vault integrations — it requires 100+ users or 10+ production AI use cases and is priced by usage (contact sales). A private university deployed LiteLLM for multi-model cost governance across departments; it’s a common architecture for teams that need provider isolation alongside usage tracking.

Gotchas: Self-hosting a high-availability proxy in production requires real platform engineering. You own uptime, the upgrade cycle, and the observability layer. No managed tier exists. If you want the feature set without the operational overhead, Portkey is the closer fit.

Best for: Engineering teams with DevOps capacity who need open-source auditability, sub-10ms routing latency, and tight per-team cost governance across a multi-provider LLM stack.


Portkey

portkey.ai — managed SaaS, with self-hosted OSS option

Portkey is an AI gateway built for deployments where compliance is a hard requirement. It puts 250+ models (and 1,600+ including fine-tuned customer models) behind a managed endpoint and adds semantic caching, guardrails, full request/response logging, and cost attribution. The gateway OSS is available for self-hosting; the managed tier runs on Portkey’s global edge worker infrastructure.

ISO 27001, SOC 2 Type 2, GDPR, and HIPAA compliance ship on all managed plans. On G2, Portkey holds 4.8/5 — enterprise reviewers specifically call out the observability dashboard and the support team’s responsiveness during PoC evaluations. Gartner Peer Insights reviews highlight rapid model enablement as a practical differentiator for teams switching providers frequently.

Standout feature: Semantic caching that identifies semantically similar queries and serves cached responses. Portkey’s internal tests show approximately 20% cache hit rates for Q&A and RAG workloads, with RAG scenarios ranging from 18% to 60% depending on query distribution — each cached hit eliminates an LLM call entirely. Deterministic and AI-powered guardrails are available on the same endpoint.

Pricing:

  • Developer (managed): Free, 10,000 logged requests/month, 3-day log retention
  • Production: $49/month, 100,000 logs/month with overages at $9/100K, 30-day retention
  • Enterprise: Custom pricing, VPC deployment, 10M+ logs, SSO, custom guardrail hooks

Gotchas: The Production plan’s 100K log cap runs out quickly at scale — a busy microservice hits it in days and the overage billing adds up. MCP (Model Context Protocol) gateway support for agentic workflows is described as limited as of early 2026. You pay per log, not per inference token, which makes cost modeling harder.

Best for: Growth-stage and enterprise teams in regulated industries — healthcare, finance, legal — that need compliance certifications, deep observability, and a polished managed experience.


Not Diamond

notdiamond.ai — SaaS routing layer (not a standalone gateway)

Not Diamond does one thing: it reads each incoming query and decides which model should handle it. Simple lookups go to cheaper, faster models. Complex reasoning goes to frontier models. The claimed outcome is 50%+ cost savings with a 10%+ accuracy improvement versus routing everything to one model. A Rootly case study reports a 39% average accuracy increase across SRE benchmarks.

One critical clarification before you evaluate it: Not Diamond is not a standalone gateway. It sits on top of OpenRouter, HuggingFace, or your existing provider stack. It routes — it does not proxy. You still need a gateway underneath to actually call the models, and you pay both the routing cost and the inference cost separately.

Standout feature: Custom router training with as few as 3 data samples. You can train a router specific to your task distribution — useful if your workload has a distinctive mix, such as a customer service pipeline that alternates between short lookups and long-form summarization.

Pricing: 10,000 routing recommendations/month free, then $10/10,000 additional. Enterprise tier adds VPC deployment, bring-your-own-models, and custom zero-data-retention policies.

Routing latency: 10–100ms overhead per request depending on router complexity, stacked on top of whatever your underlying gateway adds.

Best for: Teams with a functioning gateway who want to cut inference costs by steering easy queries to cheaper models — particularly valuable for mixed-complexity agent pipelines.


Martian

route.withmartian.com — enterprise SaaS, contact sales

Martian is a purpose-built intelligent router from a San Francisco startup reportedly nearing a $1.3B valuation (April 2026). It analyzes each prompt in real time and routes to the model most likely to handle it correctly, with full tracing of routing decisions, latency, and cost attribution per request. Accenture uses it in enterprise AI workflows under the Project Spotlight validation program.

Vendor benchmarks claim 20–97% cost reduction and accuracy that “often beats GPT-4 on key benchmarks” — neither figure has been independently verified. Routing latency overhead is 20–50ms.

Pricing: Volume-based, contact sales. No self-service or public pricing page.

Best for: Large enterprises running high-volume, heterogeneous AI traffic where a purpose-built router with built-in tracing justifies the opaque pricing conversation.


LLM router decision guide

Use OpenRouter if you’re prototyping or building an indie app. Zero setup, every model, one API key. Accept the 50–70ms overhead and plan for occasional model ID renames.

Use LiteLLM if your team controls its infrastructure and needs cost governance with sub-10ms routing latency. Budget for the DevOps effort to run it reliably in production.

Use Portkey if you’re in a regulated industry or need SOC 2/HIPAA out of the box. Watch the log volume cap on the $49/month plan — it runs out faster than the number suggests.

Add Not Diamond if you already have a gateway and want per-query model selection to cut costs. Don’t use it as your only routing layer — it has no proxy capability.

Talk to Martian if you’re running enterprise-scale traffic and need routing traceability baked in from day one.

The underlying problem all these tools solve is the same one Cursor handles internally: the model that is fastest and cheapest for a simple completion is not the model you want on a hard architectural question. A routing layer lets you act on that difference systematically instead of picking one model and overpaying half the time. If you’re also choosing the AI coding assistant that sits above the routing layer, Best AI Coding CLI in 2026 covers how tools like Claude Code and Gemini CLI handle model selection internally.

Verdict

LiteLLM for teams who own their infra. OpenRouter for everyone who doesn’t. Portkey when compliance is a hard gate. Not Diamond as an optimization layer on top of a stack you already have. Martian only if you’re large enough that the opaque pricing is worth investigating.

Caveats

  • OpenRouter latency (50–70ms) comes from community reports and vendor documentation, not an independent lab measurement.
  • LiteLLM 8ms P95 is proxy overhead only, measured against a mock endpoint at 1,000 RPS (vendor benchmark); end-to-end latency including LLM inference is higher.
  • Not Diamond’s 50%+ cost savings and Rootly 39% accuracy gain are vendor-published case studies without independent verification.
  • Martian’s $1.3B valuation figure is from a single media report; treat as unverified.
  • This article contains affiliate links to Cursor.

References