Simple LiteLLM AI API Proxy

This document describes a minimal, fully working LiteLLM-based AI proxy running on Kubernetes. It exposes multiple LLM providers (e.g. OpenAI, Anthropic) under a unified OpenAI-compatible API surface.

Original Author: Tuukka Peltomäki
Lastmod: 11.12.2025
Status: In progress

Notes

This version is intentionally simple:

  • Uses the official LiteLLM Docker image
  • Uses a ConfigMap for model configuration
  • Provides API access via a Kubernetes Service + Ingress
  • Uses a master API key for authentication
  • No Redis, no database, no fallback routing
  • Includes the built-in LiteLLM dashboard

1. Introduction

The Simple LiteLLM AI API Proxy provides a lightweight, unified interface to multiple LLM providers.

It allows you to:

  • Expose OpenAI-compatible /v1/chat/completions endpoints
  • Route requests to providers based on the "model" field of the request body ("model": "<your_model>")
  • Authenticate users using a single master API key
  • Support OpenAI, Anthropic, Mistral, Gemini, and many others
  • Use LiteLLM’s built-in dashboard for configuration, logs, and key management

2. Architecture

Client → Ingress → Service → LiteLLM Proxy Pod → External LLM Providers

Exposed endpoints

URL                        Description
https://<domain>/v1/...    API endpoint (OpenAI-compatible)
https://<domain>/ui        LiteLLM dashboard interface (the root path / serves the API's Swagger documentation)

  • The LiteLLM proxy validates API keys
  • It forwards requests to the configured models (OpenAI, Anthropic, etc.)

3. Configuration Overview

LiteLLM uses a single YAML configuration containing:

1. Model Definitions

Each entry consists of:

  • model_name: The ID your clients use
  • litellm_params.model: Full provider model ID
  • litellm_params.api_key: Provider-specific API key that LiteLLM uses when calling the provider

Example:

model_list:
  - model_name: gpt-4.1-mini
    litellm_params:
      model: gpt-4.1-mini
      api_key: <OPENAI_API_KEY>
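
The relationship between model_name and litellm_params.model can be pictured as a simple lookup. The sketch below is only an illustration of that mapping, using the two entries from the ConfigMap in section 4; it is not LiteLLM's actual routing code, and the function name resolve_model is hypothetical.

```python
# Illustrative only: mirrors the structure of model_list from config.yaml.
MODEL_LIST = [
    {"model_name": "gpt-4.1-mini",
     "litellm_params": {"model": "gpt-4.1-mini"}},
    {"model_name": "claude-3-5-haiku",
     "litellm_params": {"model": "claude-3-5-haiku-20241022"}},
]


def resolve_model(requested, model_list=MODEL_LIST):
    """Return the provider model ID for a client-facing model name."""
    for entry in model_list:
        if entry["model_name"] == requested:
            return entry["litellm_params"]["model"]
    raise KeyError(f"unknown model: {requested}")


print(resolve_model("claude-3-5-haiku"))  # claude-3-5-haiku-20241022
```

Clients only ever see the short model_name; the dated provider ID can change in the config without touching client code.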

2. General Settings

The proxy uses one master key:

general_settings:
  master_key: <MASTER-KEY>

All client requests must include:

Authorization: Bearer <MASTER-KEY>
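
As a rough picture of what checking this header involves, the following sketch parses a Bearer header and compares the token with the master key in constant time. It is an assumption-laden illustration, not LiteLLM's implementation; the function name is_authorized is hypothetical.

```python
import hmac


def is_authorized(authorization_header, master_key):
    """Check an 'Authorization: Bearer <key>' header value against the master key."""
    if not authorization_header:
        return False
    scheme, _, token = authorization_header.partition(" ")
    if scheme.lower() != "bearer" or not token:
        return False
    # compare_digest avoids leaking how many characters matched through timing.
    return hmac.compare_digest(token.strip().encode(), master_key.encode())


print(is_authorized("Bearer sk-master-1234", "sk-master-1234"))  # True
print(is_authorized("Bearer wrong-key", "sk-master-1234"))       # False
```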

4. Kubernetes Deployment Guide

This deployment consists of:

  • ConfigMap
  • LiteLLM Deployment
  • ClusterIP Service
  • Ingress (NGINX)

Below is a working configuration adapted from a live cluster.

# ConfigMap — litellm-config
apiVersion: v1
kind: ConfigMap
metadata:
  name: litellm-config
data:
  config.yaml: |
    model_list:
      - model_name: gpt-4.1-mini
        litellm_params:
          model: gpt-4.1-mini
          api_key: <OPENAI_API_KEY>

      - model_name: claude-3-5-haiku
        litellm_params:
          model: claude-3-5-haiku-20241022
          api_key: <ANTHROPIC_API_KEY>

    general_settings:
      master_key: sk-master-1234


# Deployment
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: litellm
spec:
  replicas: 1
  selector:
    matchLabels:
      app: litellm
  template:
    metadata:
      labels:
        app: litellm
    spec:
      containers:
        - name: litellm
          image: ghcr.io/berriai/litellm:main-stable
          args:
            - "--config"
            - "/app/config.yaml"
            - "--port"
            - "4000"
          ports:
            - containerPort: 4000
          volumeMounts:
            - name: litellm-config-volume
              mountPath: /app/config.yaml
              subPath: config.yaml
      volumes:
        - name: litellm-config-volume
          configMap:
            name: litellm-config


# Service
---
apiVersion: v1
kind: Service
metadata:
  name: litellm-service
spec:
  selector:
    app: litellm
  ports:
    - protocol: TCP
      port: 4000
      targetPort: 4000
  type: ClusterIP


# Ingress
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: litellm-ingress
spec:
  ingressClassName: nginx
  rules:
    - host: <domain>
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: litellm-service
                port:
                  number: 4000

Deployment from CLI:

[rocky@rocky-vm ~]$ microk8s kubectl apply -f litellm-deployment.yaml
configmap/litellm-config created
service/litellm-service created
deployment.apps/litellm created
ingress.networking.k8s.io/litellm-ingress created

[rocky@rocky-vm ~]$ microk8s kubectl get pods -w
NAME                      READY   STATUS    RESTARTS   AGE
litellm-784f6b955-jtvzx   1/1     Running   0          6s

This exposes both:

  • /v1/... (API)
  • /ui (LiteLLM dashboard)
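
After applying the manifests, it can be handy to wait until the proxy actually answers before sending traffic. The sketch below polls a health endpoint with the standard library. The path /health/liveliness is assumed to be available in your LiteLLM version; wait_until_ready and its parameters are hypothetical names.

```python
import time
import urllib.error
import urllib.request


def wait_until_ready(base_url, attempts=10, delay=2.0, fetch=None, sleep=time.sleep):
    """Poll the proxy until it answers with HTTP 200 or the attempts run out."""
    url = base_url.rstrip("/") + "/health/liveliness"  # assumed health path

    def default_fetch(target):
        with urllib.request.urlopen(target, timeout=5) as response:
            return response.status

    fetch = fetch or default_fetch
    for attempt in range(1, attempts + 1):
        try:
            if fetch(url) == 200:
                return True
        except (urllib.error.URLError, OSError):
            pass  # proxy not reachable yet; try again
        if attempt < attempts:
            sleep(delay)
    return False
```

Calling wait_until_ready("https://<domain>") returns True once the proxy responds and False if it never does within the configured attempts.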

5. Usage examples

These are simple examples tested from the CLI against a live environment.

Example 1 — Call OpenAI model via Proxy

curl -X POST http://fip-86-50-231-57.kaj.poutavm.fi/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-master-1234" \
  -d '{
    "model": "gpt-4.1-mini",
    "messages": [
      { "role": "user", "content": "Hello!" }
    ]
  }'

Example 2 — Call Claude via Proxy

curl -X POST http://fip-86-50-231-57.kaj.poutavm.fi/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-master-1234" \
  -d '{
    "model": "claude-3-5-haiku",
    "messages": [
      { "role": "user", "content": "Hello!" }
    ]
  }'
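
The same calls can be made from Python using only the standard library. This is a minimal sketch that mirrors the curl examples above; the helper names build_chat_request, extract_reply and chat are hypothetical, and the response layout is assumed to follow the OpenAI chat completion format.

```python
import json
import urllib.request


def build_chat_request(base_url, api_key, model, prompt):
    """Build (but do not send) a POST request to /v1/chat/completions."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def extract_reply(response_body):
    """Pull the assistant text out of an OpenAI-style chat completion."""
    return response_body["choices"][0]["message"]["content"]


def chat(base_url, api_key, model, prompt):
    """Send the request and return the assistant's reply text."""
    request = build_chat_request(base_url, api_key, model, prompt)
    with urllib.request.urlopen(request, timeout=60) as response:
        return extract_reply(json.load(response))
```

For example, chat("https://<domain>", "<MASTER-KEY>", "claude-3-5-haiku", "Hello!") sends the request from Example 2 and returns only the reply text.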

6. LiteLLM Dashboard

When the proxy is running, LiteLLM exposes a built-in management UI:

https://<your-domain>/ui

From the dashboard you can:

  • View request logs
  • Add/remove API keys
  • Enable rate limiting
  • Change config
  • View health and metrics

7. Production Recommendations

For safe production deployment:

1. Move API keys to Kubernetes Secrets

Never store API keys inside ConfigMaps.

Use:

kubectl create secret generic litellm-keys ...

kubectl create secret generic litellm-secrets \
  --from-literal=openai_key=xxx \
  --from-literal=anthropic_key=yyy \
  --from-literal=master_key=zzz
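
If you prefer a declarative manifest over the imperative command, the same Secret can be generated programmatically. The sketch below builds the manifest as JSON, which kubectl accepts alongside YAML. The helper name build_secret_manifest is hypothetical, and the values are the same placeholders as in the command above.

```python
import base64
import json


def build_secret_manifest(name, values):
    """Return a Kubernetes Secret manifest (as a dict) for the given key/value pairs."""
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name},
        "type": "Opaque",
        # The 'data' field of a Secret holds base64-encoded values.
        "data": {
            key: base64.b64encode(value.encode("utf-8")).decode("ascii")
            for key, value in values.items()
        },
    }


manifest = build_secret_manifest(
    "litellm-secrets",
    {"openai_key": "xxx", "anthropic_key": "yyy", "master_key": "zzz"},
)
print(json.dumps(manifest, indent=2))
```

The printed JSON can be piped to kubectl apply -f -. Note that base64 is an encoding, not encryption, so the generated file needs the same care as the raw keys.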

2. Enforce individual user API keys

LiteLLM supports:

  • user-specific keys
  • per-user rate limits
  • per-user model access control

This allows e.g.:

  • Developers use gpt-4o-mini only
  • Admins can use all models
  • External apps get separate rate limits
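
The idea behind per-key model access can be sketched as a small policy table. This is a conceptual illustration only, not LiteLLM's API: the key names, the KEY_POLICIES table and the can_use function are all hypothetical.

```python
# Hypothetical keys and policies, for illustration only.
KEY_POLICIES = {
    "sk-dev-example": {"models": {"gpt-4o-mini"}},
    "sk-admin-example": {"models": None},  # None means "all models"
}


def can_use(api_key, model, policies=KEY_POLICIES):
    """Return True if the key exists and is allowed to call the model."""
    policy = policies.get(api_key)
    if policy is None:
        return False  # unknown key
    allowed = policy["models"]
    return allowed is None or model in allowed


print(can_use("sk-dev-example", "gpt-4o-mini"))        # True
print(can_use("sk-dev-example", "claude-3-5-haiku"))   # False
print(can_use("sk-admin-example", "claude-3-5-haiku")) # True
```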

3. Enable rate limiting

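LiteLLM supports global or per-key limits:

general_settings:
  rate_limiter: "local"
  rate_limit_per_minute: 200

Treat these setting names as placeholders and verify them against the LiteLLM documentation for your version. LiteLLM also accepts per-model rpm and tpm values under litellm_params, and rpm_limit / tpm_limit on individual keys.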
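
To make the notion of a local per-minute limit concrete, here is a minimal fixed-window limiter. It is a generic sketch of the technique, not how LiteLLM implements rate limiting; the class name PerMinuteLimiter is hypothetical.

```python
import time


class PerMinuteLimiter:
    """Fixed-window limiter: at most `limit` requests per key per 60-second window."""

    def __init__(self, limit, clock=time.monotonic):
        self.limit = limit
        self.clock = clock
        self._windows = {}  # key -> (window index, request count)

    def allow(self, key):
        window = int(self.clock() // 60)
        seen_window, count = self._windows.get(key, (window, 0))
        if seen_window != window:
            count = 0  # a new minute has started; reset the counter
        if count >= self.limit:
            self._windows[key] = (window, count)
            return False
        self._windows[key] = (window, count + 1)
        return True


limiter = PerMinuteLimiter(limit=200)
print(limiter.allow("sk-master-1234"))  # True
```

Because the counters live in process memory, each replica would enforce its own limit. A shared store is needed for a limit that holds across several pods.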

4. Enable logging to persistent storage

With large workloads, save logs to:

  • Loki
  • Elastic
  • S3
  • PostgreSQL

5. Run multiple replicas behind HPA

You can scale this setup by either:

  • setting replicas: 2 (or higher) in the Deployment
  • or adding a HorizontalPodAutoscaler.
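
A HorizontalPodAutoscaler for the litellm Deployment could look like the manifest generated below. This is a sketch under assumptions: the replica bounds and the 70% CPU target are arbitrary example values, the name litellm-hpa is hypothetical, and CPU-based scaling only works if the container declares CPU requests and a metrics server is running in the cluster.

```python
import json


def build_hpa_manifest(deployment="litellm", min_replicas=2, max_replicas=5, cpu_percent=70):
    """Return an autoscaling/v2 HorizontalPodAutoscaler manifest as a dict."""
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": f"{deployment}-hpa"},
        "spec": {
            "scaleTargetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": deployment,
            },
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": [{
                "type": "Resource",
                "resource": {
                    "name": "cpu",
                    "target": {"type": "Utilization", "averageUtilization": cpu_percent},
                },
            }],
        },
    }


print(json.dumps(build_hpa_manifest(), indent=2))
```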

6. Enable TLS (if external traffic)

If using NGINX ingress:

  • Add cert-manager
  • Use Let’s Encrypt certificates
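
With cert-manager installed, TLS is typically enabled by adding an issuer annotation and a tls section to the Ingress. The sketch below applies both to an Ingress manifest held as a dict. The issuer name letsencrypt-prod, the secret name litellm-tls and the host proxy.example.com are hypothetical, and a matching ClusterIssuer is assumed to exist in the cluster.

```python
import copy
import json


def with_tls(ingress, host, secret_name, cluster_issuer):
    """Return a copy of an Ingress manifest with cert-manager TLS settings added."""
    patched = copy.deepcopy(ingress)
    annotations = patched["metadata"].setdefault("annotations", {})
    # cert-manager watches this annotation and requests a certificate.
    annotations["cert-manager.io/cluster-issuer"] = cluster_issuer
    # The issued certificate is stored in this Secret and served for the host.
    patched["spec"]["tls"] = [{"hosts": [host], "secretName": secret_name}]
    return patched


ingress = {
    "apiVersion": "networking.k8s.io/v1",
    "kind": "Ingress",
    "metadata": {"name": "litellm-ingress"},
    "spec": {"ingressClassName": "nginx", "rules": []},  # rules omitted for brevity
}
secured = with_tls(ingress, "proxy.example.com", "litellm-tls", "letsencrypt-prod")
print(json.dumps(secured, indent=2))
```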

8. Summary

The Simple LiteLLM AI API Proxy is an extremely lightweight but flexible layer on top of LLM providers.

It offers:

  • Unified OpenAI-compatible API
  • OpenAI + Anthropic support (can be expanded to other providers)
  • Built-in dashboard
  • Simple Kubernetes deployment
  • Easy testing (curl / Python / JS)
  • Secure access via master key

This setup is well-suited for:

  • Prototyping AI tools
  • Small internal services
  • Educational environments
  • Simple multi-model deployments

With production enhancements (Secrets, rate limiting, auditing), it can evolve into a robust enterprise-grade API gateway.