What is Scalix Router?
Scalix Router is an enterprise-grade LLM gateway that sits between your applications and 16+ AI providers. It provides a single OpenAI-compatible API to route requests intelligently across providers like OpenAI, Anthropic, Google Vertex AI, Groq, Mistral, and more — with automatic failover, content safety, cost tracking, and full audit logging.
Think of it as mission control for your AI infrastructure: one unified interface to manage routing, security, monitoring, and compliance across every LLM your organization uses.
Key Capabilities
- OpenAI-compatible API — drop-in replacement for any OpenAI SDK or library
- Intelligent routing across 16+ LLM providers with automatic failover
- Content safety with PII detection, topic filtering, and guardrails
- Enterprise knowledge (RAG) with 13+ data source connectors
- Per-request cost tracking with budget caps and alerts
- API key management with scoped permissions and rotation
- Multi-tenancy with per-tenant quotas and isolation
- GPU monitoring for NVIDIA, AMD, and Intel hardware
- Full audit logging for compliance (SOC 2, ISO 27001)
- Deployment via Docker, Docker Compose, or Kubernetes with Helm
Quick Start
Get Scalix Router running in under 5 minutes with Docker Compose.
1. Pull the Scalix Router image from our container registry and configure your environment variables with at least one provider API key.
2. Run docker-compose up -d to launch the Router, PostgreSQL, and Redis.
3. Send a chat completion request using curl or any OpenAI-compatible SDK.
# Pull the Scalix Router image
docker pull registry.scalix.world/scalix-router:latest

# Create your environment file
cat > .env << EOF
OPENAI_API_KEY=sk-your-openai-key
ANTHROPIC_API_KEY=sk-ant-your-anthropic-key
GOOGLE_API_KEY=your-google-key
EOF
docker-compose up -d
# Router available at http://localhost:8000
# Admin dashboard at http://localhost:8000/admin
curl http://localhost:8000/v1/chat/completions \
-H "Authorization: Bearer sk-your-api-key" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4-turbo",
"messages": [
{"role": "user", "content": "Hello, world!"}
]
}'
Configuration
Scalix Router is configured via environment variables. All settings can be set in a .env file or passed directly to Docker/Kubernetes.
Provider API Keys
Add API keys for the providers you want to use. Only providers with configured keys will be available for routing.
# Core providers
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GOOGLE_API_KEY=AIza...
GROQ_API_KEY=gsk_...

# Additional providers
TOGETHER_API_KEY=...
MISTRAL_API_KEY=...
PERPLEXITY_API_KEY=pplx-...
OPENROUTER_API_KEY=sk-or-...
XAI_API_KEY=xai-...
COHERE_API_KEY=...

# Azure OpenAI (requires endpoint)
AZURE_OPENAI_API_KEY=...
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com
AZURE_OPENAI_API_VERSION=2024-02-01
Security Settings
JWT_SECRET=your-secret-key   # Required in production
JWT_ALGORITHM=HS256
JWT_EXPIRY_HOURS=24
API_KEY_SALT=your-salt       # Required in production
Feature Flags
ENABLE_AUDIT_LOGGING=true   # Log every request for compliance
ENABLE_RATE_LIMITING=true   # Enforce per-key and per-tenant limits
ENABLE_HOT_RELOAD=true      # Auto-detect model config changes
Server Settings
HOST=0.0.0.0
PORT=8000
ENV=production                     # development or production
OLLAMA_URL=http://localhost:11434  # For local model inference
RATE_LIMIT_REQUESTS=100            # Requests per hour per key
OpenAI-Compatible API
Scalix Router implements the OpenAI Chat Completions API specification. Any application, SDK, or library that works with OpenAI will work with Scalix Router — just change the base URL.
Endpoint
POST /v1/chat/completions — Create a chat completion with automatic provider routing
Request
{
"model": "gpt-4-turbo",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Explain quantum computing in simple terms."}
],
"max_tokens": 500,
"temperature": 0.7,
"stream": false
}
Response
{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"created": 1677858242,
"model": "gpt-4-turbo",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Quantum computing uses quantum bits..."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 25,
"completion_tokens": 150,
"total_tokens": 175
}
}
Streaming
Set "stream": true to receive Server-Sent Events (SSE) for real-time token-by-token output.
curl http://localhost:8000/v1/chat/completions \
-H "Authorization: Bearer sk-..." \
-H "Content-Type: application/json" \
-d '{"model": "claude-3-opus", "messages": [{"role": "user", "content": "Hello"}], "stream": true}'
Using with OpenAI SDKs
from openai import OpenAI
client = OpenAI(
api_key="sk-your-scalix-key",
base_url="http://localhost:8000/v1"
)
response = client.chat.completions.create(
model="gpt-4-turbo",
messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)

import OpenAI from 'openai';
const client = new OpenAI({
apiKey: 'sk-your-scalix-key',
baseURL: 'http://localhost:8000/v1',
});
const response = await client.chat.completions.create({
model: 'claude-3-opus',
messages: [{ role: 'user', content: 'Hello!' }],
});
console.log(response.choices[0].message.content);
Intelligent Routing
Scalix Router analyzes each request and routes it to the optimal provider based on your chosen strategy. The content analysis engine evaluates request type, complexity, and language to make routing decisions.
Routing Strategies
- Cost-Optimized — routes to the cheapest provider that can handle the request
- Performance — routes to the fastest provider with lowest latency
- Balanced — weighs cost and performance equally
- Quality-First — routes to the most capable model regardless of cost
- Custom — define your own routing rules with conditional logic
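The cost-optimized strategy can be sketched as picking the cheapest candidate from a price table. This is an illustrative model only: the provider names and per-1K-token prices below are hypothetical, and the real engine also weighs capability, health, and request complexity.

```python
# Hypothetical per-1K-token prices; not Scalix Router's internal tables.
PRICES = {"openai": 0.01, "anthropic": 0.015, "groq": 0.0002, "together": 0.0009}

def pick_cost_optimized(candidates):
    """Return the cheapest provider among the candidates able to serve the request."""
    priced = [(PRICES[p], p) for p in candidates if p in PRICES]
    if not priced:
        raise ValueError("no candidate provider has pricing data")
    return min(priced)[1]

print(pick_cost_optimized(["openai", "groq", "anthropic"]))  # groq
```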
Automatic Failover
When a provider is unavailable or returns an error, the Router automatically fails over to the next candidate provider. The circuit breaker pattern tracks provider health and temporarily removes unhealthy providers from the rotation.
- Circuit breaker with configurable failure threshold (default: 5 failures)
- Automatic recovery after timeout period (default: 60 seconds)
- Health states: CLOSED (healthy) → OPEN (failed) → HALF_OPEN (testing recovery)
- Per-provider request success/failure tracking
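The failover behavior above can be modeled as a small state machine. This is a sketch of the pattern, not the Router's actual implementation; it uses the documented defaults (5 failures, 60-second recovery).

```python
import time

class CircuitBreaker:
    """Minimal sketch of the CLOSED -> OPEN -> HALF_OPEN cycle described above."""

    def __init__(self, threshold=5, recovery_timeout=60.0):
        self.threshold = threshold
        self.recovery_timeout = recovery_timeout
        self.failures = 0
        self.state = "CLOSED"
        self.opened_at = 0.0

    def allow_request(self):
        if self.state == "OPEN":
            if time.monotonic() - self.opened_at >= self.recovery_timeout:
                self.state = "HALF_OPEN"  # probe the provider again
                return True
            return False
        return True

    def record_success(self):
        self.failures = 0
        self.state = "CLOSED"

    def record_failure(self):
        self.failures += 1
        if self.state == "HALF_OPEN" or self.failures >= self.threshold:
            self.state = "OPEN"
            self.opened_at = time.monotonic()
```

A router holds one breaker per provider and skips any provider whose `allow_request()` returns False, which is what removes unhealthy providers from the rotation.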
Model-to-Provider Mapping
The Router automatically maps models to compatible providers. For example, GPT models route to OpenAI or Azure, Claude models route to Anthropic, and Llama models route to Groq or Together AI.
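A prefix lookup captures the idea. The table below is a hypothetical illustration of the mapping described above, with providers listed in an assumed failover order:

```python
# Hypothetical prefix-to-provider table; the real mapping is broader.
MODEL_PROVIDERS = {
    "gpt-": ["openai", "azure"],
    "claude-": ["anthropic", "bedrock"],
    "llama-": ["groq", "together"],
}

def candidate_providers(model):
    """Return providers able to serve the model, in failover order."""
    for prefix, providers in MODEL_PROVIDERS.items():
        if model.startswith(prefix):
            return providers
    return []

print(candidate_providers("claude-3-opus"))  # ['anthropic', 'bedrock']
```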
Multi-Provider Support
Scalix Router supports 16+ LLM providers out of the box. Each provider is configured via a single API key environment variable.
Supported Providers
- OpenAI — GPT-4, GPT-4 Turbo, GPT-3.5, o1, o3
- Anthropic — Claude 3 Opus, Sonnet, Haiku
- Google Vertex AI — Gemini Pro, Gemini Ultra
- Groq — Llama 3, Mixtral, Gemma (ultra-low latency)
- Together AI — Llama, Mixtral, open-source models
- Mistral — Mistral Large, Medium, Small, Codestral
- Cohere — Command R+, Command R
- Azure OpenAI — GPT models via Azure deployments
- AWS Bedrock — Claude, Llama, Titan via AWS
- OpenRouter — 100+ models via unified API
- Perplexity — Sonar models for search-augmented generation
- xAI — Grok models
- Hugging Face — Inference API for open-source models
- Ollama — Local models on your own hardware
- Alibaba Cloud — Qwen and Tongyi models
Adding a Provider
To enable a provider, add its API key to your environment configuration. The Router will automatically detect and load the provider on startup.
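Startup detection of this kind can be sketched as checking which key variables are set. The variable names come from the configuration section above; the detection logic itself is an illustrative assumption, not the Router's code.

```python
import os

# Env var names from the configuration section; detection logic is a sketch.
PROVIDER_KEYS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "groq": "GROQ_API_KEY",
    "mistral": "MISTRAL_API_KEY",
}

def detect_providers(env=None):
    """Return the providers whose API keys are present in the environment."""
    env = os.environ if env is None else env
    return sorted(name for name, var in PROVIDER_KEYS.items() if env.get(var))

print(detect_providers({"GROQ_API_KEY": "gsk_x", "OPENAI_API_KEY": "sk-x"}))  # ['groq', 'openai']
```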
Knowledge Base (RAG)
Scalix Router includes built-in Retrieval-Augmented Generation (RAG) that enriches every LLM request with relevant context from your enterprise data sources.
Supported Data Sources
- PostgreSQL
- MongoDB
- Amazon S3
- Google BigQuery
- Snowflake
- Databricks
- Elasticsearch
- And more — 13+ connectors available
How It Works
1. Configure your databases, data lakes, and document stores.
2. Documents are chunked, embedded using SentenceTransformer (all-MiniLM-L6-v2), and stored in the vector index.
3. When a request arrives, the Router performs semantic search to find relevant context and injects it into the prompt.
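At its core, semantic search ranks stored chunks by embedding similarity. The toy version below uses hand-made 3-dimensional vectors to show the mechanics; the real system uses 384-dimensional all-MiniLM-L6-v2 embeddings and a proper vector index.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

# Toy 3-dimensional "embeddings" standing in for real model output.
index = {
    "returns accepted within 30 days": [0.9, 0.1, 0.0],
    "shipping takes 3-5 business days": [0.1, 0.9, 0.2],
}

def search(query_vec, limit=1):
    """Return the documents most similar to the query vector."""
    ranked = sorted(index.items(), key=lambda kv: cosine(query_vec, kv[1]), reverse=True)
    return [doc for doc, _ in ranked[:limit]]

print(search([0.8, 0.2, 0.1]))  # ['returns accepted within 30 days']
```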
Search API
POST /api/knowledge-base/search — Search indexed documents with semantic similarity
curl -X POST http://localhost:8000/api/knowledge-base/search \
-H "Authorization: Bearer sk-..." \
-H "Content-Type: application/json" \
-d '{
"query": "What is our return policy?",
"limit": 5,
"access_level": "internal"
}'
Authentication
Scalix Router supports JWT token authentication for user sessions and API key authentication for programmatic access.
JWT Login
POST /auth/login — Authenticate with username and password
curl -X POST http://localhost:8000/auth/login \
-H "Content-Type: application/json" \
-d '{"username": "admin", "password": "your-password"}'

{
"access_token": "eyJhbGciOiJIUzI1NiIs...",
"token_type": "bearer",
"expires_in": 86400
}
API Key Management
API keys provide scoped, long-lived access for applications. Keys are hashed with PBKDF2-SHA256 (100,000 iterations) and never stored in plaintext.
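The hashing scheme described above maps directly onto Python's standard library. The snippet below is a sketch of hash-and-verify under that scheme; the salt handling and key names are illustrative, not the Router's storage format.

```python
import hashlib
import hmac
import os

def hash_api_key(key: str, salt: bytes) -> bytes:
    # PBKDF2-SHA256 with 100,000 iterations, as described above.
    return hashlib.pbkdf2_hmac("sha256", key.encode(), salt, 100_000)

def verify_api_key(key: str, salt: bytes, stored: bytes) -> bool:
    # Constant-time comparison avoids timing side channels.
    return hmac.compare_digest(hash_api_key(key, salt), stored)

salt = os.urandom(16)
digest = hash_api_key("sk-example", salt)
print(verify_api_key("sk-example", salt, digest))  # True
print(verify_api_key("sk-wrong", salt, digest))   # False
```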
POST /api/keys/create — Generate a new API key
GET /api/keys/list — List all API keys for the current user
DELETE /api/keys/{key_id} — Revoke an API key
POST /api/keys/{key_id}/regenerate — Rotate an API key
curl -X POST http://localhost:8000/api/keys/create \
-H "Authorization: Bearer <jwt-token>" \
-H "Content-Type: application/json" \
-d '{
"name": "production-key",
"permissions": ["read", "write"],
"rate_limit": 100,
"expires_in_days": 90
}'
Multi-Tenancy
Scalix Router supports full multi-tenancy, allowing you to isolate users, models, and quotas per organization or team.
Tenant Isolation
- Separate API keys per tenant
- Per-tenant rate limiting and quotas
- Independent model access controls
- Isolated usage tracking and billing
- Per-tenant cost reporting
Tenant Header
Include the X-Tenant-ID header in API requests to scope requests to a specific tenant.
curl http://localhost:8000/v1/chat/completions \
-H "Authorization: Bearer sk-..." \
-H "X-Tenant-ID: tenant-acme-corp" \
-H "Content-Type: application/json" \
-d '{"model": "gpt-4", "messages": [{"role": "user", "content": "Hello"}]}'
Rate Limiting
Scalix Router enforces rate limits using a sliding window algorithm. Limits can be set per API key and per tenant.
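A sliding window keeps timestamps of recent requests and drops those older than the window. The class below is a minimal sketch of the algorithm, not the Router's implementation (which also has to work across replicas, typically via Redis).

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Sketch: at most `limit` requests per `window` seconds, per key."""

    def __init__(self, limit=100, window=3600.0):
        self.limit, self.window = limit, window
        self.hits = {}  # key -> deque of request timestamps

    def allow(self, key, now=None):
        now = time.monotonic() if now is None else now
        q = self.hits.setdefault(key, deque())
        while q and now - q[0] >= self.window:
            q.popleft()  # drop requests that fell out of the window
        if len(q) >= self.limit:
            return False
        q.append(now)
        return True
```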
Configuration
- RATE_LIMIT_REQUESTS — requests per hour per API key (default: 100)
- ADMIN_RATE_LIMIT_REQUESTS — admin endpoint limit (default: 50)
- ENABLE_RATE_LIMITING — toggle rate limiting on/off (default: true)
Rate Limit Headers
Every response includes rate limit headers so clients can track their usage.
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 95
X-RateLimit-Reset: 1640995200
Retry-After: 60   # Only when limit is exceeded
429 Too Many Requests
When the rate limit is exceeded, the Router returns a 429 status with a Retry-After header indicating when the client can retry.
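On the client side, the headers above tell you exactly how long to wait. A small helper, assuming `Retry-After` is in seconds and `X-RateLimit-Reset` is a Unix timestamp (as the examples in this section suggest):

```python
def backoff_seconds(status, headers, now):
    """How long to wait before retrying, based on rate limit headers."""
    if status != 429:
        return 0.0
    if "Retry-After" in headers:
        return float(headers["Retry-After"])
    # Fall back to the window reset time when Retry-After is absent.
    return max(0.0, float(headers.get("X-RateLimit-Reset", now)) - now)

print(backoff_seconds(429, {"Retry-After": "60"}, now=1640995140))  # 60.0
```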
Content Safety
Scalix Router includes a built-in safety engine that checks every request for harmful or sensitive content before it reaches any LLM provider.
Content Types Detected
- Hate speech and discrimination
- Violence and harmful content
- Sexual content
- Harassment and bullying
- Misinformation
- PII data (SSNs, emails, phone numbers, credit cards)
- Malicious code
- Spam
- Political content
- Illegal activities
Safety Actions
- Allow — content passes all checks
- Warn — content flagged but request proceeds (logged)
- Block — request rejected with 403 status
- Quarantine — flagged for human review
- Escalate — sent to security team for review
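A toy PII scan shows the flavor of the `pii` check and how a finding maps to an action. The two regexes below are purely illustrative; the built-in detector covers many more PII types and uses more robust detection than pattern matching alone.

```python
import re

# Illustrative patterns only; real detection is far more thorough.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def scan_pii(text):
    """Return the PII categories found and the action they imply."""
    found = [name for name, pat in PII_PATTERNS.items() if pat.search(text)]
    return found, ("block" if found else "allow")

print(scan_pii("Please process this. SSN: 412-68-9103"))  # (['ssn'], 'block')
```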
Check Endpoint
POST /api/guardrails/check — Check content for safety violations
curl -X POST http://localhost:8000/api/guardrails/check \
-H "Authorization: Bearer sk-..." \
-H "Content-Type: application/json" \
-d '{
"content": "Please process this. SSN: 412-68-9103",
"checks": ["pii", "toxicity"]
}'
Audit Logging
Every request processed by Scalix Router is logged with full context for compliance and debugging. Audit logs include event type, severity, user identity, timestamps, and request/response metadata.
What's Logged
- All API requests with method, path, and status
- User identity (API key, tenant, IP address)
- Request and response metadata
- Content safety violations
- Authentication events (login, key creation, key revocation)
- Admin actions (config changes, user management)
Searching Logs
Audit logs can be searched and filtered by date range, user, action type, and severity. Logs can be exported in JSON format for integration with external SIEM tools.
Configuration
ENABLE_AUDIT_LOGGING=true # Enable/disable audit logging
Cost Management
Scalix Router tracks the cost of every LLM request at the per-token level. Costs are attributed to individual API keys, tenants, and models for full visibility into AI spending.
Features
- Per-request cost calculation by model and provider
- Cost aggregation by API key, tenant, model, and time period
- Budget caps with configurable alerts
- Usage reports and billing dashboards in the admin UI
- Cost-optimized routing to minimize spending
Cost-Optimized Routing
When using the cost-optimized routing strategy, the Router automatically selects the cheapest provider capable of handling each request. Combined with per-team budget caps, this gives organizations full control over AI spending.
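Per-request cost attribution follows directly from the `usage` block of a completion response (shown in the API section above) and a per-model price table. The prices below are hypothetical placeholders; real tables vary by provider and change over time.

```python
# Hypothetical USD prices per 1K tokens; check your providers' current pricing.
PRICING = {"gpt-4-turbo": {"prompt": 0.01, "completion": 0.03}}

def request_cost(model, usage):
    """Cost of one request from the `usage` block of a completion response."""
    p = PRICING[model]
    return (usage["prompt_tokens"] / 1000 * p["prompt"]
            + usage["completion_tokens"] / 1000 * p["completion"])

usage = {"prompt_tokens": 25, "completion_tokens": 150, "total_tokens": 175}
print(round(request_cost("gpt-4-turbo", usage), 6))  # 0.00475
```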
Deployment
Scalix Router can be deployed via Docker, Docker Compose, or Kubernetes with Helm charts. All deployment methods support horizontal scaling.
Docker
docker run -d \
  --name scalix-router \
  -p 8000:8000 \
  -e OPENAI_API_KEY=sk-... \
  -e JWT_SECRET=your-secret \
  registry.scalix.world/scalix-router:latest
Docker Compose
The recommended deployment for most teams. Includes the Router, PostgreSQL, and Redis.
services:
  scalix-router:
    image: registry.scalix.world/scalix-router:latest
    ports:
      - "8000:8000"
    env_file: .env
    depends_on:
      - postgres
      - redis
  postgres:
    image: postgres:15
    volumes:
      - postgres_data:/var/lib/postgresql/data
    environment:
      POSTGRES_DB: scalix
      POSTGRES_USER: scalix
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
  redis:
    image: redis:7-alpine
    volumes:
      - redis_data:/data

volumes:
  postgres_data:
  redis_data:

Kubernetes with Helm
helm repo add scalix https://charts.scalix.world
helm repo update
helm install scalix-router scalix/scalix-router \
  --namespace scalix \
  --create-namespace \
  --set replicaCount=3 \
  --set autoscaling.enabled=true \
  --set autoscaling.minReplicas=1 \
  --set autoscaling.maxReplicas=10
The Helm chart includes deployment, service, ingress (NGINX with TLS), HPA, PodDisruptionBudget, NetworkPolicy, and ServiceMonitor for Prometheus.
Air-Gapped Deployment
For environments with no internet access, Scalix Router can run entirely on local infrastructure using Ollama for model inference. No data leaves your network.
GPU Monitoring
Scalix Router provides real-time monitoring for GPU hardware across your infrastructure, supporting NVIDIA, AMD, and Intel GPUs.
Supported Hardware
- NVIDIA GPUs — via pynvml (NVIDIA ML Library) with full CUDA support
- AMD GPUs — via ROCm SMI command-based monitoring
- Intel GPUs — via intel_gpu_top and OneAPI integration
Monitored Metrics
- GPU utilization percentage
- Memory usage (allocated / total)
- Temperature (Celsius)
- Power consumption (Watts)
- Hardware availability status
Admin Endpoints
GET /admin/gpu/nodes — List all GPU nodes and their status
POST /admin/gpu/nodes — Register a new GPU node
POST /admin/gpu/allocate — Allocate GPUs to a tenant or model
Monitoring & Observability
Scalix Router integrates with the standard observability stack: Prometheus for metrics, Grafana for dashboards, and OpenTelemetry for distributed tracing.
Prometheus Metrics
The Router exposes a /metrics endpoint in Prometheus exposition format. Metrics include HTTP request counts, latency histograms, provider health status, and GPU utilization.
GET /metrics — Prometheus-compatible metrics endpoint
Grafana Dashboards
Pre-built Grafana dashboards are included with your Scalix Router deployment package. Docker Compose with the monitoring profile deploys Prometheus and Grafana automatically.
docker-compose --profile monitoring up -d
# Prometheus: http://localhost:9090
# Grafana: http://localhost:3000
Alert Rules
12+ pre-configured alert rules covering high error rates, provider failures, GPU temperature, memory usage, and response latency.
OpenTelemetry
The Router is instrumented with OpenTelemetry for distributed tracing. Traces flow through the full request lifecycle: routing → guardrails → provider selection → LLM call → response.
API Endpoints
Key API endpoints available in Scalix Router.
Chat Completions
POST /v1/chat/completions — OpenAI-compatible chat completion with provider routing
GET /v1/models — List available models
Authentication
POST /auth/login — JWT authentication
GET /auth/test-jwt — Verify JWT token
API Keys
POST /api/keys/create — Generate new API key
GET /api/keys/list — List API keys
DELETE /api/keys/{key_id} — Revoke API key
POST /api/keys/{key_id}/regenerate — Rotate API key
Content Safety
POST /api/guardrails/check — Check content for violations
POST /api/guardrails/analyze — Detailed content analysis
GET /api/guardrails/violations — View violation history
Knowledge Base
POST /api/knowledge-base/index — Index documents from data sources
POST /api/knowledge-base/search — Semantic search
GPU Management
GET /admin/gpu/nodes — List GPU nodes
POST /admin/gpu/nodes — Register GPU node
POST /admin/gpu/allocate — Allocate GPUs
System
GET /health — Health check
GET /metrics — Prometheus metrics
Error Handling
All errors follow a consistent JSON format with an error code, message, and optional details.
Error Response Format
{
"error": {
"code": "rate_limit_exceeded",
"message": "API rate limit exceeded",
"details": {
"retry_after": 60,
"limit": 100,
"reset_time": "2025-01-12T18:00:00Z"
}
}
}
Common Error Codes
- 400 Bad Request — invalid request parameters or malformed JSON
- 401 Unauthorized — missing or invalid API key / JWT token
- 403 Forbidden — content blocked by guardrails or insufficient permissions
- 429 Too Many Requests — rate limit exceeded (check Retry-After header)
- 503 Service Unavailable — all providers failed or are unavailable
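Client code can branch on these codes to decide what is worth retrying. A small sketch of that decision, based on the semantics in the list above (429 and 503 are transient, the 4xx client errors are not):

```python
# Transient statuses worth retrying, per the error code list above.
RETRYABLE = {429, 503}

def should_retry(status, attempt, max_attempts=3):
    """Retry rate limits and provider outages; never retry client errors."""
    return status in RETRYABLE and attempt < max_attempts

print(should_retry(429, attempt=1))  # True
print(should_retry(403, attempt=1))  # False
```

Combine this with the `Retry-After` header from the rate limiting section to choose the wait between attempts.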
FAQ
What is Scalix Router?
Scalix Router is an enterprise LLM gateway that provides a single API to access 16+ AI providers with intelligent routing, access control, content safety, and full audit logging.
Do I need to change my application code?
No. If your application uses the OpenAI SDK, you only need to change the base URL and API key. The Router implements the same API specification.
What are the system requirements?
Docker with 2+ CPU cores and 4GB+ RAM for basic deployment. For production, use Kubernetes with 3+ replicas. PostgreSQL 15+ and Redis 7+ are recommended for persistence and caching.
Can I run it without internet access?
Yes. Scalix Router supports air-gapped deployment using Ollama for local model inference. No data leaves your network.
What compliance standards does it support?
Scalix Router is designed for SOC 2 Type II and ISO 27001 compliance with features including full audit logging, PII detection, content filtering, and role-based access control.
How does failover work?
The Router uses a circuit breaker pattern. If a provider fails 5 consecutive requests, it is marked as unavailable for 60 seconds. Requests automatically route to the next healthy provider.
What data sources does the RAG system support?
PostgreSQL, MongoDB, Amazon S3, Google BigQuery, Snowflake, Databricks, Elasticsearch, and more — 13+ connectors available. Documents are indexed using SentenceTransformer embeddings for semantic search.
How is pricing structured?
Scalix Router is self-hosted software. You pay for your own infrastructure and LLM provider API costs. Contact team@scalix.world for enterprise licensing.
Support
Need help with Scalix Router? We are here for you.
Contact
- Enterprise support — team@scalix.world
- Technical documentation — this page
- Product website — scalix.world