What is Scalix Router?
Scalix Router is an enterprise-grade LLM gateway that sits between your applications and 16+ AI providers. It provides a single OpenAI-compatible API to route requests intelligently across providers like OpenAI, Anthropic, Google Vertex AI, Groq, Mistral, and more — with automatic failover, content safety, cost tracking, and full audit logging.
Think of it as mission control for your AI infrastructure: one unified interface to manage routing, security, monitoring, and compliance across every LLM your organization uses.
Key Capabilities
- OpenAI-compatible API — drop-in replacement for any OpenAI SDK or library
- Intelligent routing across 16+ LLM providers with automatic failover
- Content safety with PII detection, topic filtering, and guardrails
- Enterprise knowledge (RAG) with 13+ data source connectors
- Per-request cost tracking with budget caps and alerts
- API key management with scoped permissions and rotation
- Multi-tenancy with per-tenant quotas and isolation
- GPU monitoring for NVIDIA, AMD, and Intel hardware
- Full audit logging for compliance (SOC 2, ISO 27001)
- Deployment via Docker, Docker Compose, or Kubernetes with Helm
Quick Start
Get Scalix Router running in under 5 minutes with Docker Compose.
1. Pull the Scalix Router image from our container registry and configure your environment variables with at least one provider API key.
2. Run docker-compose up -d to launch the Router, PostgreSQL, and Redis.
3. Send a chat completion request using curl or any OpenAI-compatible SDK.
# Pull the Scalix Router image
docker pull registry.scalix.world/scalix-router:latest

# Create your environment file
cat > .env << EOF
OPENAI_API_KEY=sk-your-openai-key
ANTHROPIC_API_KEY=sk-ant-your-anthropic-key
GOOGLE_API_KEY=your-google-key
EOF
docker-compose up -d
# Router available at http://localhost:8000
# Admin dashboard at http://localhost:8000/admin
curl http://localhost:8000/v1/chat/completions \
-H "Authorization: Bearer sk-your-api-key" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4-turbo",
"messages": [
{"role": "user", "content": "Hello, world!"}
]
}'
Configuration
Scalix Router is configured via environment variables. All settings can be set in a .env file or passed directly to Docker/Kubernetes.
Provider API Keys
Add API keys for the providers you want to use. Only providers with configured keys will be available for routing.
# Core providers
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GOOGLE_API_KEY=AIza...
GROQ_API_KEY=gsk_...

# Additional providers
TOGETHER_API_KEY=...
MISTRAL_API_KEY=...
PERPLEXITY_API_KEY=pplx-...
OPENROUTER_API_KEY=sk-or-...
XAI_API_KEY=xai-...
COHERE_API_KEY=...

# Azure OpenAI (requires endpoint)
AZURE_OPENAI_API_KEY=...
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com
AZURE_OPENAI_API_VERSION=2024-02-01
Security Settings
JWT_SECRET=your-secret-key   # Required in production
JWT_ALGORITHM=HS256
JWT_EXPIRY_HOURS=24
API_KEY_SALT=your-salt       # Required in production
Feature Flags
ENABLE_AUDIT_LOGGING=true   # Log every request for compliance
ENABLE_RATE_LIMITING=true   # Enforce per-key and per-tenant limits
ENABLE_HOT_RELOAD=true      # Auto-detect model config changes
Server Settings
HOST=0.0.0.0
PORT=8000
ENV=production                     # development or production
OLLAMA_URL=http://localhost:11434  # For local model inference
RATE_LIMIT_REQUESTS=100            # Requests per hour per key
OpenAI-Compatible API
Scalix Router implements the OpenAI Chat Completions API specification. Any application, SDK, or library that works with OpenAI will work with Scalix Router — just change the base URL.
Endpoint
POST /v1/chat/completions — Create a chat completion with automatic provider routing
Request
{
"model": "gpt-4-turbo",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Explain quantum computing in simple terms."}
],
"max_tokens": 500,
"temperature": 0.7,
"stream": false
}
Response
{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"created": 1677858242,
"model": "gpt-4-turbo",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Quantum computing uses quantum bits..."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 25,
"completion_tokens": 150,
"total_tokens": 175
}
}
Streaming
Set "stream": true to receive Server-Sent Events (SSE) for real-time token-by-token output.
curl http://localhost:8000/v1/chat/completions \
-H "Authorization: Bearer sk-..." \
-H "Content-Type: application/json" \
-d '{"model": "claude-3-opus", "messages": [{"role": "user", "content": "Hello"}], "stream": true}'
Using with OpenAI SDKs
from openai import OpenAI
client = OpenAI(
api_key="sk-your-scalix-key",
base_url="http://localhost:8000/v1"
)
response = client.chat.completions.create(
model="gpt-4-turbo",
messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)

import OpenAI from 'openai';
const client = new OpenAI({
apiKey: 'sk-your-scalix-key',
baseURL: 'http://localhost:8000/v1',
});
const response = await client.chat.completions.create({
model: 'claude-3-opus',
messages: [{ role: 'user', content: 'Hello!' }],
});
console.log(response.choices[0].message.content);
Intelligent Routing
Scalix Router analyzes each request and routes it to the optimal provider based on your chosen strategy. The content analysis engine evaluates request type, complexity, and language to make routing decisions.
Routing Strategies
- Cost-Optimized — routes to the cheapest provider that can handle the request
- Performance — routes to the fastest provider with lowest latency
- Balanced — weighs cost and performance equally
- Quality-First — routes to the most capable model regardless of cost
- Custom — define your own routing rules with conditional logic
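The cost-optimized strategy can be sketched as picking the cheapest candidate from a price table. This is an illustrative model only: the provider names and per-1K-token prices below are hypothetical, and the real engine also weighs capability, health, and request complexity.

```python
# Hypothetical per-1K-token prices; not Scalix Router's internal tables.
PRICES = {"openai": 0.01, "anthropic": 0.015, "groq": 0.0002, "together": 0.0009}

def pick_cost_optimized(candidates):
    """Return the cheapest provider among the candidates able to serve the request."""
    priced = [(PRICES[p], p) for p in candidates if p in PRICES]
    if not priced:
        raise ValueError("no candidate provider has pricing data")
    return min(priced)[1]

print(pick_cost_optimized(["openai", "groq", "anthropic"]))  # groq
```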
Automatic Failover
When a provider is unavailable or returns an error, the Router automatically fails over to the next candidate provider. The circuit breaker pattern tracks provider health and temporarily removes unhealthy providers from the rotation.
- Circuit breaker with configurable failure threshold (default: 5 failures)
- Automatic recovery after timeout period (default: 60 seconds)
- Health states: CLOSED (healthy) → OPEN (failed) → HALF_OPEN (testing recovery)
- Per-provider request success/failure tracking
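The failover behavior above can be modeled as a small state machine. This is a sketch of the pattern, not the Router's actual implementation; it uses the documented defaults (5 failures, 60-second recovery).

```python
import time

class CircuitBreaker:
    """Minimal sketch of the CLOSED -> OPEN -> HALF_OPEN cycle described above."""

    def __init__(self, threshold=5, recovery_timeout=60.0):
        self.threshold = threshold
        self.recovery_timeout = recovery_timeout
        self.failures = 0
        self.state = "CLOSED"
        self.opened_at = 0.0

    def allow_request(self):
        if self.state == "OPEN":
            if time.monotonic() - self.opened_at >= self.recovery_timeout:
                self.state = "HALF_OPEN"  # probe the provider again
                return True
            return False
        return True

    def record_success(self):
        self.failures = 0
        self.state = "CLOSED"

    def record_failure(self):
        self.failures += 1
        if self.state == "HALF_OPEN" or self.failures >= self.threshold:
            self.state = "OPEN"
            self.opened_at = time.monotonic()
```

A router holds one breaker per provider and skips any provider whose `allow_request()` returns False, which is what removes unhealthy providers from the rotation.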
Model-to-Provider Mapping
The Router automatically maps models to compatible providers. For example, GPT models route to OpenAI or Azure, Claude models route to Anthropic, and Llama models route to Groq or Together AI.
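A prefix lookup captures the idea. The table below is a hypothetical illustration of the mapping described above, with providers listed in an assumed failover order:

```python
# Hypothetical prefix-to-provider table; the real mapping is broader.
MODEL_PROVIDERS = {
    "gpt-": ["openai", "azure"],
    "claude-": ["anthropic", "bedrock"],
    "llama-": ["groq", "together"],
}

def candidate_providers(model):
    """Return providers able to serve the model, in failover order."""
    for prefix, providers in MODEL_PROVIDERS.items():
        if model.startswith(prefix):
            return providers
    return []

print(candidate_providers("claude-3-opus"))  # ['anthropic', 'bedrock']
```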
Multi-Provider Support
Scalix Router supports 16+ LLM providers out of the box. Each provider is configured via a single API key environment variable.
Supported Providers
- OpenAI — GPT-4, GPT-4 Turbo, GPT-3.5, o1, o3
- Anthropic — Claude 3 Opus, Sonnet, Haiku
- Google Vertex AI — Gemini Pro, Gemini Ultra
- Groq — Llama 3, Mixtral, Gemma (ultra-low latency)
- Together AI — Llama, Mixtral, open-source models
- Mistral — Mistral Large, Medium, Small, Codestral
- Cohere — Command R+, Command R
- Azure OpenAI — GPT models via Azure deployments
- AWS Bedrock — Claude, Llama, Titan via AWS
- OpenRouter — 100+ models via unified API
- Perplexity — Sonar models for search-augmented generation
- xAI — Grok models
- Hugging Face — Inference API for open-source models
- Ollama — Local models on your own hardware
- Alibaba Cloud — Qwen and Tongyi models
Adding a Provider
To enable a provider, add its API key to your environment configuration. The Router will automatically detect and load the provider on startup.
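Startup detection of this kind can be sketched as checking which key variables are set. The variable names come from the configuration section above; the detection logic itself is an illustrative assumption, not the Router's code.

```python
import os

# Env var names from the configuration section; detection logic is a sketch.
PROVIDER_KEYS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "groq": "GROQ_API_KEY",
    "mistral": "MISTRAL_API_KEY",
}

def detect_providers(env=None):
    """Return the providers whose API keys are present in the environment."""
    env = os.environ if env is None else env
    return sorted(name for name, var in PROVIDER_KEYS.items() if env.get(var))

print(detect_providers({"GROQ_API_KEY": "gsk_x", "OPENAI_API_KEY": "sk-x"}))  # ['groq', 'openai']
```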
Knowledge Base (RAG)
Scalix Router includes built-in Retrieval-Augmented Generation (RAG) that enriches every LLM request with relevant context from your enterprise data sources.
Supported Data Sources
- PostgreSQL
- MongoDB
- Amazon S3
- Google BigQuery
- Snowflake
- Databricks
- Elasticsearch
- And more — 13+ connectors available
How It Works
1. Configure your databases, data lakes, and document stores.
2. Documents are chunked, embedded using SentenceTransformer (all-MiniLM-L6-v2), and stored in the vector index.
3. When a request arrives, the Router performs semantic search to find relevant context and injects it into the prompt.
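At its core, semantic search ranks stored chunks by embedding similarity. The toy version below uses hand-made 3-dimensional vectors to show the mechanics; the real system uses 384-dimensional all-MiniLM-L6-v2 embeddings and a proper vector index.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

# Toy 3-dimensional "embeddings" standing in for real model output.
index = {
    "returns accepted within 30 days": [0.9, 0.1, 0.0],
    "shipping takes 3-5 business days": [0.1, 0.9, 0.2],
}

def search(query_vec, limit=1):
    """Return the documents most similar to the query vector."""
    ranked = sorted(index.items(), key=lambda kv: cosine(query_vec, kv[1]), reverse=True)
    return [doc for doc, _ in ranked[:limit]]

print(search([0.8, 0.2, 0.1]))  # ['returns accepted within 30 days']
```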
Search API
POST /api/knowledge-base/search — Search indexed documents with semantic similarity
curl -X POST http://localhost:8000/api/knowledge-base/search \
-H "Authorization: Bearer sk-..." \
-H "Content-Type: application/json" \
-d '{
"query": "What is our return policy?",
"limit": 5,
"access_level": "internal"
}'
Authentication
Scalix Router supports JWT token authentication for user sessions and API key authentication for programmatic access.
JWT Login
POST /auth/login — Authenticate with username and password
curl -X POST http://localhost:8000/auth/login \
-H "Content-Type: application/json" \
-d '{"username": "admin", "password": "your-password"}'

{
"access_token": "eyJhbGciOiJIUzI1NiIs...",
"token_type": "bearer",
"expires_in": 86400
}
API Key Management
API keys provide scoped, long-lived access for applications. Keys are hashed with PBKDF2-SHA256 (100,000 iterations) and never stored in plaintext.
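The hashing scheme described above maps directly onto Python's standard library. The snippet below is a sketch of hash-and-verify under that scheme; the salt handling and key names are illustrative, not the Router's storage format.

```python
import hashlib
import hmac
import os

def hash_api_key(key: str, salt: bytes) -> bytes:
    # PBKDF2-SHA256 with 100,000 iterations, as described above.
    return hashlib.pbkdf2_hmac("sha256", key.encode(), salt, 100_000)

def verify_api_key(key: str, salt: bytes, stored: bytes) -> bool:
    # Constant-time comparison avoids timing side channels.
    return hmac.compare_digest(hash_api_key(key, salt), stored)

salt = os.urandom(16)
digest = hash_api_key("sk-example", salt)
print(verify_api_key("sk-example", salt, digest))  # True
print(verify_api_key("sk-wrong", salt, digest))   # False
```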
POST /api/keys/create — Generate a new API key
GET /api/keys/list — List all API keys for the current user
DELETE /api/keys/{key_id} — Revoke an API key
POST /api/keys/{key_id}/regenerate — Rotate an API key
curl -X POST http://localhost:8000/api/keys/create \
-H "Authorization: Bearer <jwt-token>" \
-H "Content-Type: application/json" \
-d '{
"name": "production-key",
"permissions": ["read", "write"],
"rate_limit": 100,
"expires_in_days": 90
}'
Multi-Tenancy
Scalix Router supports full multi-tenancy, allowing you to isolate users, models, and quotas per organization or team.
Tenant Isolation
- Separate API keys per tenant
- Per-tenant rate limiting and quotas
- Independent model access controls
- Isolated usage tracking and billing
- Per-tenant cost reporting
Tenant Header
Include the X-Tenant-ID header in API requests to scope requests to a specific tenant.
curl http://localhost:8000/v1/chat/completions \
-H "Authorization: Bearer sk-..." \
-H "X-Tenant-ID: tenant-acme-corp" \
-H "Content-Type: application/json" \
-d '{"model": "gpt-4", "messages": [{"role": "user", "content": "Hello"}]}'
Rate Limiting
Scalix Router enforces rate limits using a sliding window algorithm. Limits can be set per API key and per tenant.
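A sliding window keeps timestamps of recent requests and drops those older than the window. The class below is a minimal sketch of the algorithm, not the Router's implementation (which also has to work across replicas, typically via Redis).

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Sketch: at most `limit` requests per `window` seconds, per key."""

    def __init__(self, limit=100, window=3600.0):
        self.limit, self.window = limit, window
        self.hits = {}  # key -> deque of request timestamps

    def allow(self, key, now=None):
        now = time.monotonic() if now is None else now
        q = self.hits.setdefault(key, deque())
        while q and now - q[0] >= self.window:
            q.popleft()  # drop requests that fell out of the window
        if len(q) >= self.limit:
            return False
        q.append(now)
        return True
```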
Configuration
- RATE_LIMIT_REQUESTS — requests per hour per API key (default: 100)
- ADMIN_RATE_LIMIT_REQUESTS — admin endpoint limit (default: 50)
- ENABLE_RATE_LIMITING — toggle rate limiting on/off (default: true)
Rate Limit Headers
Every response includes rate limit headers so clients can track their usage.
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 95
X-RateLimit-Reset: 1640995200
Retry-After: 60   # Only when limit is exceeded
429 Too Many Requests
When the rate limit is exceeded, the Router returns a 429 status with a Retry-After header indicating when the client can retry.
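On the client side, the headers above tell you exactly how long to wait. A small helper, assuming `Retry-After` is in seconds and `X-RateLimit-Reset` is a Unix timestamp (as the examples in this section suggest):

```python
def backoff_seconds(status, headers, now):
    """How long to wait before retrying, based on rate limit headers."""
    if status != 429:
        return 0.0
    if "Retry-After" in headers:
        return float(headers["Retry-After"])
    # Fall back to the window reset time when Retry-After is absent.
    return max(0.0, float(headers.get("X-RateLimit-Reset", now)) - now)

print(backoff_seconds(429, {"Retry-After": "60"}, now=1640995140))  # 60.0
```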
Content Safety
Scalix Router includes a built-in safety engine that checks every request for harmful or sensitive content before it reaches any LLM provider.
Content Types Detected
- Hate speech and discrimination
- Violence and harmful content
- Sexual content
- Harassment and bullying
- Misinformation
- PII data (SSNs, emails, phone numbers, credit cards)
- Malicious code
- Spam
- Political content
- Illegal activities
Safety Actions
- Allow — content passes all checks
- Warn — content flagged but request proceeds (logged)
- Block — request rejected with 403 status
- Quarantine — flagged for human review
- Escalate — sent to security team for review
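A toy PII scan shows the flavor of the `pii` check and how a finding maps to an action. The two regexes below are purely illustrative; the built-in detector covers many more PII types and uses more robust detection than pattern matching alone.

```python
import re

# Illustrative patterns only; real detection is far more thorough.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def scan_pii(text):
    """Return the PII categories found and the action they imply."""
    found = [name for name, pat in PII_PATTERNS.items() if pat.search(text)]
    return found, ("block" if found else "allow")

print(scan_pii("Please process this. SSN: 412-68-9103"))  # (['ssn'], 'block')
```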
Check Endpoint
POST /api/guardrails/check — Check content for safety violations
curl -X POST http://localhost:8000/api/guardrails/check \
-H "Authorization: Bearer sk-..." \
-H "Content-Type: application/json" \
-d '{
"content": "Please process this. SSN: 412-68-9103",
"checks": ["pii", "toxicity"]
}'
Audit Logging
Every request processed by Scalix Router is logged with full context for compliance and debugging. Audit logs include event type, severity, user identity, timestamps, and request/response metadata.
What's Logged
- All API requests with method, path, and status
- User identity (API key, tenant, IP address)
- Request and response metadata
- Content safety violations
- Authentication events (login, key creation, key revocation)
- Admin actions (config changes, user management)
Searching Logs
Audit logs can be searched and filtered by date range, user, action type, and severity. Logs can be exported in JSON format for integration with external SIEM tools.
Configuration
ENABLE_AUDIT_LOGGING=true # Enable/disable audit logging
Cost Management
Scalix Router tracks the cost of every LLM request at the per-token level. Costs are attributed to individual API keys, tenants, and models for full visibility into AI spending.
Features
- Per-request cost calculation by model and provider
- Cost aggregation by API key, tenant, model, and time period
- Budget caps with configurable alerts
- Usage reports and billing dashboards in the admin UI
- Cost-optimized routing to minimize spending
Cost-Optimized Routing
When using the cost-optimized routing strategy, the Router automatically selects the cheapest provider capable of handling each request. Combined with per-team budget caps, this gives organizations full control over AI spending.
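Per-request cost attribution follows directly from the `usage` block of a completion response (shown in the API section above) and a per-model price table. The prices below are hypothetical placeholders; real tables vary by provider and change over time.

```python
# Hypothetical USD prices per 1K tokens; check your providers' current pricing.
PRICING = {"gpt-4-turbo": {"prompt": 0.01, "completion": 0.03}}

def request_cost(model, usage):
    """Cost of one request from the `usage` block of a completion response."""
    p = PRICING[model]
    return (usage["prompt_tokens"] / 1000 * p["prompt"]
            + usage["completion_tokens"] / 1000 * p["completion"])

usage = {"prompt_tokens": 25, "completion_tokens": 150, "total_tokens": 175}
print(round(request_cost("gpt-4-turbo", usage), 6))  # 0.00475
```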
Deployment
Scalix Router can be deployed via Docker, Docker Compose, or Kubernetes with Helm charts. All deployment methods support horizontal scaling.
Docker
docker run -d \
  --name scalix-router \
  -p 8000:8000 \
  -e OPENAI_API_KEY=sk-... \
  -e JWT_SECRET=your-secret \
  registry.scalix.world/scalix-router:latest
Docker Compose
The recommended deployment for most teams. Includes the Router, PostgreSQL, and Redis.
services:
  scalix-router:
    image: registry.scalix.world/scalix-router:latest
    ports:
      - "8000:8000"
    env_file: .env
    depends_on:
      - postgres
      - redis
  postgres:
    image: postgres:15
    volumes:
      - postgres_data:/var/lib/postgresql/data
    environment:
      POSTGRES_DB: scalix
      POSTGRES_USER: scalix
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
  redis:
    image: redis:7-alpine
    volumes:
      - redis_data:/data

volumes:
  postgres_data:
  redis_data:

Kubernetes with Helm
helm repo add scalix https://charts.scalix.world
helm repo update
helm install scalix-router scalix/scalix-router \
  --namespace scalix \
  --create-namespace \
  --set replicaCount=3 \
  --set autoscaling.enabled=true \
  --set autoscaling.minReplicas=1 \
  --set autoscaling.maxReplicas=10
The Helm chart includes deployment, service, ingress (NGINX with TLS), HPA, PodDisruptionBudget, NetworkPolicy, and ServiceMonitor for Prometheus.
Air-Gapped Deployment
For environments with no internet access, Scalix Router can run entirely on local infrastructure using Ollama for model inference. No data leaves your network.
GPU Monitoring
Scalix Router provides real-time monitoring for GPU hardware across your infrastructure, supporting NVIDIA, AMD, and Intel GPUs.
Supported Hardware
- NVIDIA GPUs — via pynvml (NVIDIA ML Library) with full CUDA support
- AMD GPUs — via ROCm SMI command-based monitoring
- Intel GPUs — via intel_gpu_top and OneAPI integration
Monitored Metrics
- GPU utilization percentage
- Memory usage (allocated / total)
- Temperature (Celsius)
- Power consumption (Watts)
- Hardware availability status
Admin Endpoints
GET /admin/gpu/nodes — List all GPU nodes and their status
POST /admin/gpu/nodes — Register a new GPU node
POST /admin/gpu/allocate — Allocate GPUs to a tenant or model
Monitoring & Observability
Scalix Router integrates with the standard observability stack: Prometheus for metrics, Grafana for dashboards, and OpenTelemetry for distributed tracing.
Prometheus Metrics
The Router exposes a /metrics endpoint in Prometheus exposition format. Metrics include HTTP request counts, latency histograms, provider health status, and GPU utilization.
GET /metrics — Prometheus-compatible metrics endpoint
Grafana Dashboards
Pre-built Grafana dashboards are included with your Scalix Router deployment package. Docker Compose with the monitoring profile deploys Prometheus and Grafana automatically.
docker-compose --profile monitoring up -d
# Prometheus: http://localhost:9090
# Grafana: http://localhost:3000
Alert Rules
12+ pre-configured alert rules covering high error rates, provider failures, GPU temperature, memory usage, and response latency.
OpenTelemetry
The Router is instrumented with OpenTelemetry for distributed tracing. Traces flow through the full request lifecycle: routing → guardrails → provider selection → LLM call → response.
API Endpoints
Key API endpoints available in Scalix Router.
Chat Completions
POST /v1/chat/completions — OpenAI-compatible chat completion with provider routing
GET /v1/models — List available models
Authentication
POST /auth/login — JWT authentication
GET /auth/test-jwt — Verify JWT token
API Keys
POST /api/keys/create — Generate new API key
GET /api/keys/list — List API keys
DELETE /api/keys/{key_id} — Revoke API key
POST /api/keys/{key_id}/regenerate — Rotate API key
Content Safety
POST /api/guardrails/check — Check content for violations
POST /api/guardrails/analyze — Detailed content analysis
GET /api/guardrails/violations — View violation history
Knowledge Base
POST /api/knowledge-base/index — Index documents from data sources
POST /api/knowledge-base/search — Semantic search
GPU Management
GET /admin/gpu/nodes — List GPU nodes
POST /admin/gpu/nodes — Register GPU node
POST /admin/gpu/allocate — Allocate GPUs
System
GET /health — Health check
GET /metrics — Prometheus metrics
Error Handling
All errors follow a consistent JSON format with an error code, message, and optional details.
Error Response Format
{
"error": {
"code": "rate_limit_exceeded",
"message": "API rate limit exceeded",
"details": {
"retry_after": 60,
"limit": 100,
"reset_time": "2025-01-12T18:00:00Z"
}
}
}
Common Error Codes
- 400 Bad Request — invalid request parameters or malformed JSON
- 401 Unauthorized — missing or invalid API key / JWT token
- 403 Forbidden — content blocked by guardrails or insufficient permissions
- 429 Too Many Requests — rate limit exceeded (check Retry-After header)
- 503 Service Unavailable — all providers failed or are unavailable
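Client code can branch on these codes to decide what is worth retrying. A small sketch of that decision, based on the semantics in the list above (429 and 503 are transient, the 4xx client errors are not):

```python
# Transient statuses worth retrying, per the error code list above.
RETRYABLE = {429, 503}

def should_retry(status, attempt, max_attempts=3):
    """Retry rate limits and provider outages; never retry client errors."""
    return status in RETRYABLE and attempt < max_attempts

print(should_retry(429, attempt=1))  # True
print(should_retry(403, attempt=1))  # False
```

Combine this with the `Retry-After` header from the rate limiting section to choose the wait between attempts.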
FAQ
What is Scalix Router?
Scalix Router is an enterprise LLM gateway that provides a single API to access 16+ AI providers with intelligent routing, access control, content safety, and full audit logging.
Do I need to change my application code?
No. If your application uses the OpenAI SDK, you only need to change the base URL and API key. The Router implements the same API specification.
What are the system requirements?
Docker with 2+ CPU cores and 4GB+ RAM for basic deployment. For production, use Kubernetes with 3+ replicas. PostgreSQL 15+ and Redis 7+ are recommended for persistence and caching.
Can I run it without internet access?
Yes. Scalix Router supports air-gapped deployment using Ollama for local model inference. No data leaves your network.
What compliance standards does it support?
Scalix Router is designed for SOC 2 Type II and ISO 27001 compliance with features including full audit logging, PII detection, content filtering, and role-based access control.
How does failover work?
The Router uses a circuit breaker pattern. If a provider fails 5 consecutive requests, it is marked as unavailable for 60 seconds. Requests automatically route to the next healthy provider.
What data sources does the RAG system support?
PostgreSQL, MongoDB, Amazon S3, Google BigQuery, Snowflake, Databricks, Elasticsearch, and more — 13+ connectors available. Documents are indexed using SentenceTransformer embeddings for semantic search.
How is pricing structured?
Scalix Router is self-hosted software. You pay for your own infrastructure and LLM provider API costs. Contact team@scalix.world for enterprise licensing.
Support
Need help with Scalix Router? We are here for you.
Contact
- Enterprise support — team@scalix.world
- Technical documentation — this page
- Product website — scalix.world