Documentation

Everything you need to integrate, configure, and manage Scalix Router.

What is Scalix Router?

Scalix Router is an enterprise-grade LLM gateway that sits between your applications and 16+ AI providers. It provides a single OpenAI-compatible API to route requests intelligently across providers like OpenAI, Anthropic, Google Vertex AI, Groq, Mistral, and more — with automatic failover, content safety, cost tracking, and full audit logging.

Think of it as mission control for your AI infrastructure: one unified interface to manage routing, security, monitoring, and compliance across every LLM your organization uses.

Key Capabilities

  • OpenAI-compatible API — drop-in replacement for any OpenAI SDK or library
  • Intelligent routing across 16+ LLM providers with automatic failover
  • Content safety with PII detection, topic filtering, and guardrails
  • Enterprise knowledge (RAG) with 13+ data source connectors
  • Per-request cost tracking with budget caps and alerts
  • API key management with scoped permissions and rotation
  • Multi-tenancy with per-tenant quotas and isolation
  • GPU monitoring for NVIDIA, AMD, and Intel hardware
  • Full audit logging for compliance (SOC 2, ISO 27001)
  • Deployment via Docker, Docker Compose, or Kubernetes with Helm
Tip
Scalix Router is OpenAI-compatible. If your app already uses the OpenAI SDK, you only need to change the base URL and API key — no other code changes required.

Quick Start

Get Scalix Router running in under 5 minutes with Docker Compose.

1
Pull and configure

Pull the Scalix Router image from our container registry and configure your environment variables with at least one provider API key.

2
Start the services

Run docker-compose up -d to launch the Router, PostgreSQL, and Redis.

3
Make your first request

Send a chat completion request using curl or any OpenAI-compatible SDK.

1. Pull and configure
bash
# Pull the Scalix Router image
docker pull registry.scalix.world/scalix-router:latest

# Create your environment file
cat > .env << EOF
OPENAI_API_KEY=sk-your-openai-key
ANTHROPIC_API_KEY=sk-ant-your-anthropic-key
GOOGLE_API_KEY=your-google-key
EOF
2. Start the services
bash
docker-compose up -d
# Router available at http://localhost:8000
# Admin dashboard at http://localhost:8000/admin
3. Make your first request
bash
curl http://localhost:8000/v1/chat/completions \
  -H "Authorization: Bearer sk-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4-turbo",
    "messages": [
      {"role": "user", "content": "Hello, world!"}
    ]
  }'
Tip
Set model to "auto" to let the Router choose the best provider based on your routing strategy.

Configuration

Scalix Router is configured via environment variables. All settings can be set in a .env file or passed directly to Docker/Kubernetes.

Provider API Keys

Add API keys for the providers you want to use. Only providers with configured keys will be available for routing.

Provider API Keys
bash
# Core providers
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GOOGLE_API_KEY=AIza...
GROQ_API_KEY=gsk_...

# Additional providers
TOGETHER_API_KEY=...
MISTRAL_API_KEY=...
PERPLEXITY_API_KEY=pplx-...
OPENROUTER_API_KEY=sk-or-...
XAI_API_KEY=xai-...
COHERE_API_KEY=...

# Azure OpenAI (requires endpoint)
AZURE_OPENAI_API_KEY=...
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com
AZURE_OPENAI_API_VERSION=2024-02-01

Security Settings

Security
bash
JWT_SECRET=your-secret-key          # Required in production
JWT_ALGORITHM=HS256
JWT_EXPIRY_HOURS=24
API_KEY_SALT=your-salt              # Required in production

Feature Flags

Feature Flags
bash
ENABLE_AUDIT_LOGGING=true           # Log every request for compliance
ENABLE_RATE_LIMITING=true           # Enforce per-key and per-tenant limits
ENABLE_HOT_RELOAD=true              # Auto-detect model config changes

Server Settings

Server
bash
HOST=0.0.0.0
PORT=8000
ENV=production                      # development or production
OLLAMA_URL=http://localhost:11434   # For local model inference
RATE_LIMIT_REQUESTS=100             # Requests per hour per key
Production Security
In production, JWT_SECRET and API_KEY_SALT must be set to strong, unique values. The server will refuse to start without them.
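To generate strong values for these settings, you can use any cryptographic random source. A minimal sketch using Python's standard library (the lengths chosen here are illustrative; any sufficiently long random value works):

```python
import secrets

# Generate cryptographically strong random values for the required settings.
jwt_secret = secrets.token_urlsafe(48)   # 48 random bytes -> 64 URL-safe chars
api_key_salt = secrets.token_hex(16)     # 16 random bytes -> 32 hex chars

print(f"JWT_SECRET={jwt_secret}")
print(f"API_KEY_SALT={api_key_salt}")
```

Append the output to your .env file and keep it out of version control.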

OpenAI-Compatible API

Scalix Router implements the OpenAI Chat Completions API specification. Any application, SDK, or library that works with OpenAI will work with Scalix Router — just change the base URL.

Endpoint

POST/v1/chat/completions

Create a chat completion with automatic provider routing

Request

Chat Completion Request
json
{
  "model": "gpt-4-turbo",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain quantum computing in simple terms."}
  ],
  "max_tokens": 500,
  "temperature": 0.7,
  "stream": false
}

Response

Chat Completion Response
json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1677858242,
  "model": "gpt-4-turbo",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Quantum computing uses quantum bits..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 150,
    "total_tokens": 175
  }
}

Streaming

Set "stream": true to receive Server-Sent Events (SSE) for real-time token-by-token output.

Streaming Request
bash
curl http://localhost:8000/v1/chat/completions \
  -H "Authorization: Bearer sk-..." \
  -H "Content-Type: application/json" \
  -d '{"model": "claude-3-opus", "messages": [{"role": "user", "content": "Hello"}], "stream": true}'
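Each SSE event arrives as a `data:` line carrying a JSON chunk, and the stream ends with `data: [DONE]`. A minimal parser sketch — the sample payload below is illustrative, not captured from a live response, though the field shapes follow the OpenAI streaming format:

```python
import json

def parse_sse_chunks(raw: str):
    """Yield content deltas from an OpenAI-style SSE stream."""
    for line in raw.splitlines():
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            yield delta["content"]

# Illustrative stream: a role chunk, two content chunks, then the sentinel.
raw = "\n".join([
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
])
print("".join(parse_sse_chunks(raw)))  # Hello
```

In practice the OpenAI SDKs handle this parsing for you; the sketch is only useful when consuming the stream with a raw HTTP client.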

Using with OpenAI SDKs

Python SDK
python
from openai import OpenAI

client = OpenAI(
    api_key="sk-your-scalix-key",
    base_url="http://localhost:8000/v1"
)

response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)
JavaScript/TypeScript SDK
javascript
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'sk-your-scalix-key',
  baseURL: 'http://localhost:8000/v1',
});

const response = await client.chat.completions.create({
  model: 'claude-3-opus',
  messages: [{ role: 'user', content: 'Hello!' }],
});
console.log(response.choices[0].message.content);

Intelligent Routing

Scalix Router analyzes each request and routes it to the optimal provider based on your chosen strategy. The content analysis engine evaluates request type, complexity, and language to make routing decisions.

Routing Strategies

  • Cost-Optimized — routes to the cheapest provider that can handle the request
  • Performance — routes to the fastest provider with lowest latency
  • Balanced — weighs cost and performance equally
  • Quality-First — routes to the most capable model regardless of cost
  • Custom — define your own routing rules with conditional logic

Automatic Failover

When a provider is unavailable or returns an error, the Router automatically fails over to the next candidate provider. The circuit breaker pattern tracks provider health and temporarily removes unhealthy providers from the rotation.

  • Circuit breaker with configurable failure threshold (default: 5 failures)
  • Automatic recovery after timeout period (default: 60 seconds)
  • Health states: CLOSED (healthy) → OPEN (failed) → HALF_OPEN (testing recovery)
  • Per-provider request success/failure tracking
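The state cycle above can be sketched as a small class. The thresholds mirror the documented defaults; the class name and method names are illustrative, not the Router's internal interface:

```python
import time

class CircuitBreaker:
    """Sketch of the CLOSED -> OPEN -> HALF_OPEN cycle described above."""

    def __init__(self, failure_threshold=5, recovery_timeout=60.0):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.failures = 0
        self.opened_at = None  # timestamp when the breaker tripped

    @property
    def state(self):
        if self.opened_at is None:
            return "CLOSED"
        if time.monotonic() - self.opened_at >= self.recovery_timeout:
            return "HALF_OPEN"  # allow one probe request through
        return "OPEN"

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = time.monotonic()

breaker = CircuitBreaker()
for _ in range(5):
    breaker.record_failure()
print(breaker.state)  # OPEN
```

A probe request that succeeds in HALF_OPEN calls `record_success()`, closing the breaker; a failed probe trips it again.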

Model-to-Provider Mapping

The Router automatically maps models to compatible providers. For example, GPT models route to OpenAI or Azure, Claude models route to Anthropic, and Llama models route to Groq or Together AI.

Note
When multiple providers support the same model, the Router selects based on your routing strategy and provider health status.

Multi-Provider Support

Scalix Router supports 16+ LLM providers out of the box. Each provider is configured via a single API key environment variable.

Supported Providers

  • OpenAI — GPT-4, GPT-4 Turbo, GPT-3.5, o1, o3
  • Anthropic — Claude 3 Opus, Sonnet, Haiku
  • Google Vertex AI — Gemini Pro, Gemini Ultra
  • Groq — Llama 3, Mixtral, Gemma (ultra-low latency)
  • Together AI — Llama, Mixtral, open-source models
  • Mistral — Mistral Large, Medium, Small, Codestral
  • Cohere — Command R+, Command R
  • Azure OpenAI — GPT models via Azure deployments
  • AWS Bedrock — Claude, Llama, Titan via AWS
  • OpenRouter — 100+ models via unified API
  • Perplexity — Sonar models for search-augmented generation
  • xAI — Grok models
  • Hugging Face — Inference API for open-source models
  • Ollama — Local models on your own hardware
  • Alibaba Cloud — Qwen and Tongyi models

Adding a Provider

To enable a provider, add its API key to your environment configuration. The Router will automatically detect and load the provider on startup.

Tip
You can add providers at runtime by updating environment variables and restarting the service. No code changes needed.

Knowledge Base (RAG)

Scalix Router includes built-in Retrieval-Augmented Generation (RAG) that enriches every LLM request with relevant context from your enterprise data sources.

Supported Data Sources

  • PostgreSQL
  • MongoDB
  • Amazon S3
  • Google BigQuery
  • Snowflake
  • Databricks
  • Elasticsearch
  • And more — 13+ connectors available

How It Works

1
Connect data sources

Configure your databases, data lakes, and document stores.

2
Index content

Documents are chunked, embedded using SentenceTransformer (all-MiniLM-L6-v2), and stored in the vector index.

3
Automatic enrichment

When a request arrives, the Router performs semantic search to find relevant context and injects it into the prompt.
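Step 3 rests on vector similarity: the query is embedded and compared against the stored chunk embeddings, typically by cosine similarity. A minimal sketch with toy 3-dimensional vectors (real all-MiniLM-L6-v2 embeddings are 384-dimensional, and the document names are invented):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy embeddings standing in for the vector index.
index = {
    "returns_policy.md": [0.9, 0.1, 0.0],
    "shipping_rates.md": [0.1, 0.8, 0.2],
}
query_vec = [0.85, 0.15, 0.05]  # embedding of "What is our return policy?"

best = max(index, key=lambda doc: cosine(query_vec, index[doc]))
print(best)  # returns_policy.md
```

The top-scoring chunks are then prepended to the prompt as context before the request is routed to a provider.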

Search API

POST/api/knowledge-base/search

Search indexed documents with semantic similarity

Knowledge Base Search
bash
curl -X POST http://localhost:8000/api/knowledge-base/search \
  -H "Authorization: Bearer sk-..." \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What is our return policy?",
    "limit": 5,
    "access_level": "internal"
  }'
Note
Access-level filtering ensures users only see documents they are authorized to access (public, internal, confidential, restricted).

Authentication

Scalix Router supports JWT token authentication for user sessions and API key authentication for programmatic access.

JWT Login

POST/auth/login

Authenticate with username and password

Login Request
bash
curl -X POST http://localhost:8000/auth/login \
  -H "Content-Type: application/json" \
  -d '{"username": "admin", "password": "your-password"}'
Login Response
json
{
  "access_token": "eyJhbGciOiJIUzI1NiIs...",
  "token_type": "bearer",
  "expires_in": 86400
}

API Key Management

API keys provide scoped, long-lived access for applications. Keys are hashed with PBKDF2-SHA256 (100,000 iterations) and never stored in plaintext.
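The hashing scheme can be sketched with Python's standard library. PBKDF2-SHA256 at 100,000 iterations matches the parameters stated above; the function names and hex encoding are illustrative choices:

```python
import hashlib
import hmac
import secrets

def hash_api_key(key: str, salt: str) -> str:
    """Derive a PBKDF2-SHA256 hash (100,000 iterations) of an API key."""
    digest = hashlib.pbkdf2_hmac("sha256", key.encode(), salt.encode(), 100_000)
    return digest.hex()

def verify_api_key(key: str, salt: str, stored: str) -> bool:
    """Constant-time comparison to avoid timing side channels."""
    return hmac.compare_digest(hash_api_key(key, salt), stored)

salt = secrets.token_hex(16)
stored = hash_api_key("sk-example-key", salt)
print(verify_api_key("sk-example-key", salt, stored))  # True
print(verify_api_key("sk-wrong-key", salt, stored))    # False
```

Because only the derived hash is stored, a database leak does not reveal usable keys — which is also why a key can never be re-displayed after creation.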

POST/api/keys/create

Generate a new API key

GET/api/keys/list

List all API keys for the current user

DELETE/api/keys/{key_id}

Revoke an API key

POST/api/keys/{key_id}/regenerate

Rotate an API key

Create API Key
bash
curl -X POST http://localhost:8000/api/keys/create \
  -H "Authorization: Bearer <jwt-token>" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "production-key",
    "permissions": ["read", "write"],
    "rate_limit": 100,
    "expires_in_days": 90
  }'
Key Security
API keys are shown only once at creation. Store them securely — they cannot be retrieved after the initial response.

Multi-Tenancy

Scalix Router supports full multi-tenancy, allowing you to isolate users, models, and quotas per organization or team.

Tenant Isolation

  • Separate API keys per tenant
  • Per-tenant rate limiting and quotas
  • Independent model access controls
  • Isolated usage tracking and billing
  • Per-tenant cost reporting

Tenant Header

Include the X-Tenant-ID header in API requests to scope requests to a specific tenant.

Tenant-Scoped Request
bash
curl http://localhost:8000/v1/chat/completions \
  -H "Authorization: Bearer sk-..." \
  -H "X-Tenant-ID: tenant-acme-corp" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4", "messages": [{"role": "user", "content": "Hello"}]}'

Rate Limiting

Scalix Router enforces rate limits using a sliding window algorithm. Limits can be set per API key and per tenant.

Configuration

  • RATE_LIMIT_REQUESTS — requests per hour per API key (default: 100)
  • ADMIN_RATE_LIMIT_REQUESTS — admin endpoint limit (default: 50)
  • ENABLE_RATE_LIMITING — toggle rate limiting on/off (default: true)
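A sliding-window limiter keeps the timestamps of recent requests and evicts those older than the window before deciding. A minimal sketch (illustrative, not the Router's internal implementation):

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window_seconds` sliding window."""

    def __init__(self, limit=100, window_seconds=3600):
        self.limit = limit
        self.window = window_seconds
        self.hits = deque()  # timestamps of accepted requests, oldest first

    def allow(self, now=None) -> bool:
        now = time.monotonic() if now is None else now
        # Evict timestamps that have aged out of the window.
        while self.hits and now - self.hits[0] >= self.window:
            self.hits.popleft()
        if len(self.hits) < self.limit:
            self.hits.append(now)
            return True
        return False

limiter = SlidingWindowLimiter(limit=2, window_seconds=3600)
print(limiter.allow(now=0.0))     # True
print(limiter.allow(now=1.0))     # True
print(limiter.allow(now=2.0))     # False
print(limiter.allow(now=3601.0))  # True (first hit aged out)
```

Unlike a fixed-window counter, this never admits a burst of 2x the limit across a window boundary.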

Rate Limit Headers

Every response includes rate limit headers so clients can track their usage.

Response Headers
text
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 95
X-RateLimit-Reset: 1640995200
Retry-After: 60          # Only when limit is exceeded

429 Too Many Requests

When the rate limit is exceeded, the Router returns a 429 status with a Retry-After header indicating when the client can retry.

Tip
Implement exponential backoff in your client to handle rate limits gracefully.
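A minimal backoff wrapper might look like this. Here `call` is any zero-argument function, and failures are signalled by raising an exception carrying a `status` attribute — an illustrative convention, not part of any SDK:

```python
import random
import time

def with_backoff(call, max_attempts=5, base_delay=1.0):
    """Retry `call` on 429 (rate limited) and 503 (transient) errors."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as exc:
            status = getattr(exc, "status", None)
            if status not in (429, 503) or attempt == max_attempts - 1:
                raise  # non-retryable error, or attempts exhausted
            # Exponential backoff with jitter: base, 2x, 4x, ... plus noise.
            time.sleep(base_delay * 2 ** attempt + random.uniform(0, 0.1))
```

When the response carries a Retry-After header, prefer its value over the computed delay.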

Content Safety

Scalix Router includes a built-in safety engine that checks every request for harmful or sensitive content before it reaches any LLM provider.

Content Types Detected

  • Hate speech and discrimination
  • Violence and harmful content
  • Sexual content
  • Harassment and bullying
  • Misinformation
  • PII data (SSNs, emails, phone numbers, credit cards)
  • Malicious code
  • Spam
  • Political content
  • Illegal activities
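To illustrate the PII category, here is a toy regex-based detector. The patterns cover only a few formats and are far less thorough than the Router's guardrails engine; they are a sketch of the idea, not the product's detection logic:

```python
import re

# Illustrative patterns for three common PII formats.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "credit_card": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
}

def detect_pii(text: str):
    """Return the sorted list of PII types found in `text`."""
    return sorted(kind for kind, pat in PII_PATTERNS.items() if pat.search(text))

print(detect_pii("Please process this. SSN: 412-68-9103"))  # ['ssn']
print(detect_pii("Reach me at jane@example.com"))           # ['email']
```

A detected match can then be mapped to one of the safety actions below (warn, block, quarantine, escalate) according to policy.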

Safety Actions

  • Allow — content passes all checks
  • Warn — content flagged but request proceeds (logged)
  • Block — request rejected with 403 status
  • Quarantine — flagged for human review
  • Escalate — sent to security team for review

Check Endpoint

POST/api/guardrails/check

Check content for safety violations

Content Safety Check
bash
curl -X POST http://localhost:8000/api/guardrails/check \
  -H "Authorization: Bearer sk-..." \
  -H "Content-Type: application/json" \
  -d '{
    "content": "Please process this. SSN: 412-68-9103",
    "checks": ["pii", "toxicity"]
  }'
Note
PII detection runs automatically on all requests when guardrails are enabled. No additional configuration needed.

Audit Logging

Every request processed by Scalix Router is logged with full context for compliance and debugging. Audit logs include event type, severity, user identity, timestamps, and request/response metadata.

What's Logged

  • All API requests with method, path, and status
  • User identity (API key, tenant, IP address)
  • Request and response metadata
  • Content safety violations
  • Authentication events (login, key creation, key revocation)
  • Admin actions (config changes, user management)

Searching Logs

Audit logs can be searched and filtered by date range, user, action type, and severity. Logs can be exported in JSON format for integration with external SIEM tools.

Configuration

Audit Logging
bash
ENABLE_AUDIT_LOGGING=true    # Enable/disable audit logging
Note
Audit logging is enabled by default. Disabling it is not recommended for production environments.

Cost Management

Scalix Router tracks the cost of every LLM request at the per-token level. Costs are attributed to individual API keys, tenants, and models for full visibility into AI spending.

Features

  • Per-request cost calculation by model and provider
  • Cost aggregation by API key, tenant, model, and time period
  • Budget caps with configurable alerts
  • Usage reports and billing dashboards in the admin UI
  • Cost-optimized routing to minimize spending

Cost-Optimized Routing

When using the cost-optimized routing strategy, the Router automatically selects the cheapest provider capable of handling each request. Combined with per-team budget caps, this gives organizations full control over AI spending.
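At its core, the strategy is a minimum over per-provider prices among the candidates that can serve the request. A sketch with made-up per-1K-token prices (real routing uses live provider pricing and health status):

```python
# Illustrative per-1K-token prices; not actual provider rates.
PRICES = {
    "openai": 0.010,
    "anthropic": 0.015,
    "groq": 0.0006,
}

def cheapest_provider(candidates, prices=PRICES):
    """Pick the lowest-priced provider among those able to serve the request."""
    available = [p for p in candidates if p in prices]
    if not available:
        raise ValueError("no configured provider can serve this request")
    return min(available, key=prices.__getitem__)

print(cheapest_provider(["openai", "groq"]))  # groq
```

In the real Router, the candidate list is already filtered by model compatibility and circuit-breaker health before the price comparison runs.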

Deployment

Scalix Router can be deployed via Docker, Docker Compose, or Kubernetes with Helm charts. All deployment methods support horizontal scaling.

Docker

Docker
bash
docker run -d \
  --name scalix-router \
  -p 8000:8000 \
  -e OPENAI_API_KEY=sk-... \
  -e JWT_SECRET=your-secret \
  registry.scalix.world/scalix-router:latest

Docker Compose

The recommended deployment for most teams. Includes the Router, PostgreSQL, and Redis.

docker-compose.yml
yaml
services:
  scalix-router:
    image: registry.scalix.world/scalix-router:latest
    ports:
      - "8000:8000"
    env_file: .env
    depends_on:
      - postgres
      - redis

  postgres:
    image: postgres:15
    volumes:
      - postgres_data:/var/lib/postgresql/data
    environment:
      POSTGRES_DB: scalix
      POSTGRES_USER: scalix
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}

  redis:
    image: redis:7-alpine
    volumes:
      - redis_data:/data

volumes:
  postgres_data:
  redis_data:

Kubernetes with Helm

Helm Install
bash
helm repo add scalix https://charts.scalix.world
helm repo update

helm install scalix-router scalix/scalix-router \
  --namespace scalix \
  --create-namespace \
  --set replicaCount=3 \
  --set autoscaling.enabled=true \
  --set autoscaling.minReplicas=1 \
  --set autoscaling.maxReplicas=10

The Helm chart includes deployment, service, ingress (NGINX with TLS), HPA, PodDisruptionBudget, NetworkPolicy, and ServiceMonitor for Prometheus.

Air-Gapped Deployment

For environments with no internet access, Scalix Router can run entirely on local infrastructure using Ollama for model inference. No data leaves your network.

Tip
For production, use the Kubernetes deployment with at least 3 replicas and HPA enabled for automatic scaling.

GPU Monitoring

Scalix Router provides real-time monitoring for GPU hardware across your infrastructure, supporting NVIDIA, AMD, and Intel GPUs.

Supported Hardware

  • NVIDIA GPUs — via pynvml (NVIDIA Management Library bindings) with full CUDA support
  • AMD GPUs — via ROCm SMI command-based monitoring
  • Intel GPUs — via intel_gpu_top and oneAPI integration

Monitored Metrics

  • GPU utilization percentage
  • Memory usage (allocated / total)
  • Temperature (Celsius)
  • Power consumption (Watts)
  • Hardware availability status

Admin Endpoints

GET/admin/gpu/nodes

List all GPU nodes and their status

POST/admin/gpu/nodes

Register a new GPU node

POST/admin/gpu/allocate

Allocate GPUs to a tenant or model

Monitoring & Observability

Scalix Router integrates with the standard observability stack: Prometheus for metrics, Grafana for dashboards, and OpenTelemetry for distributed tracing.

Prometheus Metrics

The Router exposes a /metrics endpoint in Prometheus exposition format. Metrics include HTTP request counts, latency histograms, provider health status, and GPU utilization.

GET/metrics

Prometheus-compatible metrics endpoint

Grafana Dashboards

Pre-built Grafana dashboards are included with your Scalix Router deployment package. Docker Compose with the monitoring profile deploys Prometheus and Grafana automatically.

Enable Monitoring Stack
bash
docker-compose --profile monitoring up -d
# Prometheus: http://localhost:9090
# Grafana: http://localhost:3000

Alert Rules

12+ pre-configured alert rules covering high error rates, provider failures, GPU temperature, memory usage, and response latency.

OpenTelemetry

The Router is instrumented with OpenTelemetry for distributed tracing. Traces flow through the full request lifecycle: routing → guardrails → provider selection → LLM call → response.

API Endpoints

Key API endpoints available in Scalix Router.

Chat Completions

POST/v1/chat/completions

OpenAI-compatible chat completion with provider routing

GET/v1/models

List available models

Authentication

POST/auth/login

JWT authentication

GET/auth/test-jwt

Verify JWT token

API Keys

POST/api/keys/create

Generate new API key

GET/api/keys/list

List API keys

DELETE/api/keys/{key_id}

Revoke API key

POST/api/keys/{key_id}/regenerate

Rotate API key

Content Safety

POST/api/guardrails/check

Check content for violations

POST/api/guardrails/analyze

Detailed content analysis

GET/api/guardrails/violations

View violation history

Knowledge Base

POST/api/knowledge-base/index

Index documents from data sources

POST/api/knowledge-base/search

Semantic search

GPU Management

GET/admin/gpu/nodes

List GPU nodes

POST/admin/gpu/nodes

Register GPU node

POST/admin/gpu/allocate

Allocate GPUs

System

GET/health

Health check

GET/metrics

Prometheus metrics

Error Handling

All errors follow a consistent JSON format with an error code, message, and optional details.

Error Response Format

Error Response
json
{
  "error": {
    "code": "rate_limit_exceeded",
    "message": "API rate limit exceeded",
    "details": {
      "retry_after": 60,
      "limit": 100,
      "reset_time": "2025-01-12T18:00:00Z"
    }
  }
}

Common Error Codes

  • 400 Bad Request — invalid request parameters or malformed JSON
  • 401 Unauthorized — missing or invalid API key / JWT token
  • 403 Forbidden — content blocked by guardrails or insufficient permissions
  • 429 Too Many Requests — rate limit exceeded (check Retry-After header)
  • 503 Service Unavailable — all providers failed or are unavailable
Tip
Always check the Retry-After header on 429 responses and implement exponential backoff for transient failures (503).

FAQ

What is Scalix Router?

Scalix Router is an enterprise LLM gateway that provides a single API to access 16+ AI providers with intelligent routing, access control, content safety, and full audit logging.

Do I need to change my application code?

No. If your application uses the OpenAI SDK, you only need to change the base URL and API key. The Router implements the same API specification.

What are the system requirements?

Docker with 2+ CPU cores and 4GB+ RAM for basic deployment. For production, use Kubernetes with 3+ replicas. PostgreSQL 15+ and Redis 7+ are recommended for persistence and caching.

Can I run it without internet access?

Yes. Scalix Router supports air-gapped deployment using Ollama for local model inference. No data leaves your network.

What compliance standards does it support?

Scalix Router is designed for SOC 2 Type II and ISO 27001 compliance with features including full audit logging, PII detection, content filtering, and role-based access control.

How does failover work?

The Router uses a circuit breaker pattern. If a provider fails 5 consecutive requests, it is marked as unavailable for 60 seconds. Requests automatically route to the next healthy provider.

What data sources does the RAG system support?

PostgreSQL, MongoDB, Amazon S3, Google BigQuery, Snowflake, Databricks, Elasticsearch, and more — 13+ connectors available. Documents are indexed using SentenceTransformer embeddings for semantic search.

How is pricing structured?

Scalix Router is self-hosted software. You pay for your own infrastructure and LLM provider API costs. Contact team@scalix.world for enterprise licensing.

Support

Need help with Scalix Router? We are here for you.

Contact

  • Enterprise support — team@scalix.world
  • Technical documentation — this page
  • Product website — scalix.world
Tip
For fastest response, include your deployment method (Docker/K8s), Router version, and relevant error messages when contacting support.