The LLM Gateway Wars: Choosing Your AI Traffic Controller
LLM gateways have become critical infrastructure. We compare LiteLLM, Portkey, Kong, and Bifrost on performance, features, and production readiness.
Six months ago, calling an LLM meant picking a provider and hitting their API. Today, production systems route through gateways that handle failover, caching, rate limiting, and cost optimization.
LLM gateways have quietly become the most important infrastructure decision in your AI stack.
Why Gateways Matter
Without a gateway:
```mermaid
graph LR
    APP[Your App] --> OAI[OpenAI]
    APP --> ANT[Anthropic]
    APP --> GCP[Google AI]
    OAI --> |Rate Limited| FAIL1[❌]
    ANT --> |Outage| FAIL2[❌]
```
With a gateway:
```mermaid
graph LR
    APP[Your App] --> GW[Gateway]
    GW --> |Primary| OAI[OpenAI]
    GW --> |Fallback| ANT[Anthropic]
    GW --> |Fallback| GCP[Google AI]
    OAI --> |Rate Limited| GW
    GW --> |Auto-failover| ANT
```
The gateway handles the chaos so your application doesn’t have to.
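To make that concrete, here is a minimal sketch of both sides using the OpenAI Python SDK; the gateway URL, API keys, second provider endpoint, and model names are placeholders rather than any specific gateway's defaults. Without a gateway, the application owns the failover loop; with one, a single call against an OpenAI-compatible gateway endpoint is enough.

```python
from openai import OpenAI

# --- Without a gateway: the application owns failover (sketch) ---
providers = [
    # (client pointed at a provider's OpenAI-compatible endpoint, model name)
    (OpenAI(api_key="OPENAI_KEY"), "gpt-4o"),
    (OpenAI(base_url="https://api.example-provider.com/v1", api_key="OTHER_KEY"), "other-model"),
]

def ask_without_gateway(prompt: str) -> str:
    last_error = None
    for client, model in providers:
        try:
            resp = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}],
                timeout=30,
            )
            return resp.choices[0].message.content
        except Exception as err:  # rate limits, outages, timeouts...
            last_error = err
    raise RuntimeError("all providers failed") from last_error

# --- With a gateway: one call, the gateway handles retries and failover ---
gateway = OpenAI(base_url="http://localhost:8080/v1", api_key="GATEWAY_KEY")

def ask_with_gateway(prompt: str) -> str:
    resp = gateway.chat.completions.create(
        model="gpt-4o",  # the gateway maps this name and falls back behind it
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```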
The Contenders
We tested four production-grade gateways against the same workload: 5,000 concurrent requests across multiple models.
LiteLLM
The Python-native choice, supporting 100+ LLM providers through a unified API.
```python
from litellm import completion

# Same API, any provider
response = completion(
    model="gpt-4o",  # or "claude-3-opus", "gemini-pro"
    messages=[{"role": "user", "content": "Hello"}],
    fallbacks=["claude-3-sonnet", "gemini-pro"],
    timeout=30,
)
```

Strengths:
- Broadest model support
- Active open-source community
- Great for Python shops
Weaknesses:
- Python performance limitations
- Setup complexity for advanced features
Portkey
Purpose-built for production AI, with 1,600+ model integrations and 40+ guardrails.
```typescript
import Portkey from 'portkey-ai';

const portkey = new Portkey({
  apiKey: 'PORTKEY_API_KEY',
  config: {
    retry: { attempts: 3, onStatusCodes: [429, 503] },
    cache: { mode: 'semantic', maxAge: 3600 },
  },
});

const response = await portkey.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Hello' }],
});
```

Strengths:
- Rich guardrails (PII detection, toxicity, etc.)
- Excellent dashboard
- Enterprise features
Weaknesses:
- Hosted solution adds latency
- Pricing can escalate
Kong AI Gateway
The enterprise heavyweight, from the team behind Kong API Gateway.
```yaml
# Kong configuration
services:
  - name: ai-service
    url: http://ai-gateway:8000
    plugins:
      - name: ai-proxy
        config:
          route_type: llm/v1/chat
          model:
            provider: openai
            name: gpt-4o
          fallback_providers:
            - provider: anthropic
              model: claude-3-opus
```

Strengths:
- Enterprise-grade performance
- Existing Kong ecosystem
- Advanced rate limiting
Weaknesses:
- Complex setup
- Overkill for smaller deployments
Bifrost (by Maxim)
The performance champion, written in Go for minimal overhead.
```go
// Bifrost adds <100µs overhead at 5k RPS
client := bifrost.NewClient(bifrost.Config{
    Providers: []string{"openai", "anthropic"},
    Strategy:  "latency-optimized",
})

response, _ := client.Chat(context.Background(), bifrost.ChatRequest{
    Model:    "auto", // Routes to fastest available
    Messages: messages,
})
```

Strengths:
- Exceptional performance
- Minimal resource usage
- Open source
Weaknesses:
- Smaller ecosystem
- Fewer built-in features
Benchmark Results
We ran each gateway through identical workloads:
| Gateway | Latency Overhead | Throughput (RPS) | Memory Usage |
|---|---|---|---|
| Bifrost | Under 100µs | 5,000+ | 50MB |
| Kong AI | ~150µs | 4,200 | 200MB |
| Portkey | ~50ms | 2,800 | N/A (hosted) |
| LiteLLM | ~200ms | 1,200 | 400MB |
Tested on 12 CPU cores, same hardware for self-hosted solutions
Key finding: Bifrost delivered the highest sustained throughput, roughly 1.8× Portkey and more than 4× LiteLLM on the same workload, while also adding the least latency overhead.
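Latency overhead here means the time the gateway itself adds on top of the provider round trip. If you want a rough sanity check on your own stack, the sketch below (assuming both the provider and the gateway expose OpenAI-compatible endpoints; URLs and keys are placeholders) compares median latency through the gateway against a direct call.

```python
import time
import statistics
from openai import OpenAI

direct = OpenAI(api_key="OPENAI_KEY")
gateway = OpenAI(base_url="http://localhost:8080/v1", api_key="GATEWAY_KEY")

def median_latency(client: OpenAI, runs: int = 20) -> float:
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": "ping"}],
            max_tokens=1,  # keep generation time (noise) to a minimum
        )
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

overhead = median_latency(gateway) - median_latency(direct)
print(f"approximate gateway overhead: {overhead * 1000:.2f} ms")
```

Provider-side variance is usually far larger than microsecond-level gateway overhead, so treat the difference as a coarse upper bound rather than a precise figure.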
Feature Comparison
| Feature | LiteLLM | Portkey | Kong AI | Bifrost |
|---|---|---|---|---|
| Model support | 100+ | 1,600+ | 20+ | 10+ |
| Automatic failover | ✅ | ✅ | ✅ | ✅ |
| Semantic caching | ✅ | ✅ | ✅ | ❌ |
| Guardrails | Basic | 40+ | Enterprise | Basic |
| Rate limiting | ✅ | ✅ | Advanced | ✅ |
| Cost tracking | ✅ | ✅ | ✅ | ✅ |
| Self-hosted | ✅ | ✅ | ✅ | ✅ |
| License | MIT | Apache 2.0 | Commercial | MIT |
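One row worth unpacking is semantic caching: rather than requiring an exact prompt match, the cache embeds prompts and returns a stored response when a new prompt is similar enough. The sketch below illustrates the idea only; the embed() function is a toy placeholder for whatever embedding model a gateway actually uses.

```python
import math

def embed(text: str) -> list[float]:
    # Placeholder: a real gateway would call an embedding model here.
    # This toy version hashes character trigrams into a fixed-size vector.
    vec = [0.0] * 64
    for i in range(len(text) - 2):
        vec[hash(text[i:i + 3]) % 64] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class SemanticCache:
    def __init__(self, threshold: float = 0.9):
        self.threshold = threshold
        self.entries: list[tuple[list[float], str]] = []  # (embedding, response)

    def get(self, prompt: str) -> str | None:
        query = embed(prompt)
        for vec, response in self.entries:
            if cosine(query, vec) >= self.threshold:
                return response  # close enough: reuse the cached answer
        return None

    def put(self, prompt: str, response: str) -> None:
        self.entries.append((embed(prompt), response))
```

The similarity threshold is the main knob: higher values reduce the risk of answering a subtly different question with a cached response, at the cost of fewer cache hits.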
Choosing Your Gateway
Use LiteLLM if:
- You’re Python-native
- Need broadest model support
- Want simple, code-based configuration
- Building prototypes or small-scale production
Use Portkey if:
- Guardrails are critical (PII, compliance)
- You want managed infrastructure
- Team needs visual dashboard
- Willing to pay for convenience
Use Kong AI if:
- You’re already using Kong
- Enterprise compliance requirements
- Need maximum throughput
- Have DevOps resources for setup
Use Bifrost if:
- Performance is paramount
- Resources are constrained
- You want minimal dependencies
- Comfortable with Go ecosystem
The Integration Question
Here’s what most comparisons miss: your gateway choice affects your entire architecture.
```mermaid
graph TB
    subgraph "Application Layer"
        AGENT[Agent Orchestrator]
    end
    subgraph "Gateway Layer"
        GW[LLM Gateway]
        CACHE[Response Cache]
        GUARD[Guardrails]
    end
    subgraph "Provider Layer"
        OAI[OpenAI]
        ANT[Anthropic]
        LOCAL[Self-hosted]
    end
    AGENT --> GW
    GW --> CACHE
    GW --> GUARD
    GW --> OAI
    GW --> ANT
    GW --> LOCAL
```
A gateway is only useful if it integrates cleanly with your orchestration layer. When your agent workflow fails, you need visibility across both layers.
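One practical pattern for that cross-layer visibility is to tag every gateway request with the workflow's run ID, so gateway logs and orchestrator traces can be joined later. The sketch below assumes an OpenAI-compatible gateway endpoint and uses placeholder header names; check your gateway's documentation for how it actually accepts request metadata.

```python
from openai import OpenAI

gateway = OpenAI(base_url="http://localhost:8080/v1", api_key="GATEWAY_KEY")

def llm_call_with_trace(prompt: str, run_id: str, step: str) -> str:
    # Propagate workflow identity to the gateway so its logs and metrics
    # can be correlated with the orchestrator's execution trace.
    resp = gateway.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        extra_headers={
            "x-workflow-run-id": run_id,  # placeholder header names:
            "x-workflow-step": step,      # verify your gateway's metadata support
        },
    )
    return resp.choices[0].message.content
```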
DuraGraph Integration
DuraGraph works with any gateway, but we’ve optimized for common patterns:
```python
# DuraGraph workflow with gateway integration
from duragraph import workflow

@workflow
async def research_agent(query: str):
    # Gateway handles model selection and failover
    # DuraGraph handles execution durability
    response = await llm_call(
        prompt=query,
        # These map to gateway config
        fallback_models=["claude-3-sonnet", "gpt-4"],
        timeout=30,
    )

    # If we fail here, DuraGraph replays from last checkpoint
    # Gateway handles the retry logic to providers
    analysis = await analyze(response)

    return analysis
```

The key insight: gateways handle provider-level reliability (failover, retries, caching), while DuraGraph handles workflow-level reliability (state persistence, execution replay, checkpointing).
Both layers are essential for production AI.
Our Recommendation
For most teams:
- Start with LiteLLM for development and prototyping
- Move to Portkey when you need guardrails and dashboards
- Consider Kong/Bifrost when throughput becomes critical
And regardless of gateway choice, ensure your execution layer handles failures that gateways can’t—like workflows that span hours or days.