Simple LiteLLM AI API Proxy
This document describes a minimal, fully working LiteLLM-based AI proxy running on Kubernetes. It exposes multiple LLM providers (e.g. OpenAI, Anthropic) under a unified OpenAI-compatible API surface.
Notes#
This version is intentionally simple:
- Uses the official LiteLLM Docker image
- Uses a ConfigMap for model configuration
- Provides API access via a Kubernetes Service + Ingress
- Uses a master API key for authentication
- No Redis, no database, no fallback routing
- Includes the built-in LiteLLM dashboard
1. Introduction#
The Simple LiteLLM AI API Proxy provides a lightweight, unified interface to multiple LLM providers.
It allows you to:
- Expose OpenAI-compatible /v1/chat/completions endpoints
- Route requests to providers based on "model": "<your_model>"
- Authenticate users using a single master API key
- Support OpenAI, Anthropic, Mistral, Gemini, and many others
- Use LiteLLM’s built-in dashboard for configuration, logs, and key management
2. Architecture#
Exposed endpoints
| URL | Description |
|---|---|
| https://<domain>/v1/... | API endpoint (OpenAI-compatible) |
| https://<domain>/ | LiteLLM dashboard interface |
- The LiteLLM proxy validates API keys
- It forwards requests to the configured models (OpenAI, Anthropic, etc.)
3. Configuration Overview#
LiteLLM uses a single YAML configuration containing:
1. Model Definitions
Each entry consists of:
- model_name: The ID your clients use
- litellm_params.model: Full provider model ID
- api_key: Provider-specific key for LiteLLM
Example:
model_list:
  - model_name: gpt-4.1-mini
    litellm_params:
      model: gpt-4.1-mini
      api_key: <OPENAI_API_KEY>
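To make the mapping between model_name and litellm_params.model concrete, here is a minimal Python sketch. It mirrors the two models used in this document as a plain dictionary structure. The resolve_model helper is hypothetical and only illustrates the lookup idea; it is not LiteLLM's actual routing code.

```python
# Illustrative only: the model list from this document as a Python structure.
# resolve_model is a hypothetical helper, not part of LiteLLM.
MODEL_LIST = [
    {
        "model_name": "gpt-4.1-mini",
        "litellm_params": {"model": "gpt-4.1-mini", "api_key": "<OPENAI_API_KEY>"},
    },
    {
        "model_name": "claude-3-5-haiku",
        "litellm_params": {
            "model": "claude-3-5-haiku-20241022",
            "api_key": "<ANTHROPIC_API_KEY>",
        },
    },
]


def resolve_model(model_list, requested_name):
    """Return the litellm_params of the first entry whose model_name matches."""
    for entry in model_list:
        if entry["model_name"] == requested_name:
            return entry["litellm_params"]
    raise KeyError(f"unknown model: {requested_name}")


if __name__ == "__main__":
    # The client-facing name can differ from the provider model ID.
    print(resolve_model(MODEL_LIST, "claude-3-5-haiku")["model"])
```

The point of the indirection is that clients only ever see model_name, so the provider model ID behind it can change without touching client code.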
2. General Settings
The proxy uses one master key:
general_settings:
  master_key: sk-master-1234
All client requests must include:
Authorization: Bearer sk-master-1234
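As a sketch of what this means in practice, the snippet below builds the Authorization header in the Bearer form used by the curl examples in this document, and shows a constant-time comparison of the kind a proxy could use to check it. Both helpers are hypothetical. The check illustrates the idea only and is not LiteLLM's implementation, and the key value is the placeholder from this document.

```python
import hmac

MASTER_KEY = "sk-master-1234"  # placeholder value used throughout this document


def bearer_header(api_key):
    """Build the Authorization header a client sends with every request."""
    return {"Authorization": f"Bearer {api_key}"}


def is_authorized(headers, master_key):
    """Hypothetical server-side check of the presented key."""
    value = headers.get("Authorization", "")
    scheme, _, presented = value.partition(" ")
    if scheme != "Bearer":
        return False
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(presented.encode(), master_key.encode())


if __name__ == "__main__":
    print(is_authorized(bearer_header(MASTER_KEY), MASTER_KEY))
```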
4. Kubernetes deployment guide#
This deployment consists of:
- ConfigMap
- LiteLLM Deployment
- ClusterIP Service
- Ingress (NGINX)
Below is the working configuration, adapted from a running cluster.
# ConfigMap: litellm-config
apiVersion: v1
kind: ConfigMap
metadata:
  name: litellm-config
data:
  config.yaml: |
    model_list:
      - model_name: gpt-4.1-mini
        litellm_params:
          model: gpt-4.1-mini
          api_key: <OPENAI_API_KEY>
      - model_name: claude-3-5-haiku
        litellm_params:
          model: claude-3-5-haiku-20241022
          api_key: <ANTHROPIC_API_KEY>
    general_settings:
      master_key: sk-master-1234
# Deployment
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: litellm
spec:
  replicas: 1
  selector:
    matchLabels:
      app: litellm
  template:
    metadata:
      labels:
        app: litellm
    spec:
      containers:
        - name: litellm
          image: ghcr.io/berriai/litellm:main-stable
          args:
            - "--config"
            - "/app/config.yaml"
            - "--port"
            - "4000"
          ports:
            - containerPort: 4000
          volumeMounts:
            - name: litellm-config-volume
              mountPath: /app/config.yaml
              subPath: config.yaml
      volumes:
        - name: litellm-config-volume
          configMap:
            name: litellm-config
# Service
---
apiVersion: v1
kind: Service
metadata:
  name: litellm-service
spec:
  selector:
    app: litellm
  ports:
    - protocol: TCP
      port: 4000
      targetPort: 4000
  type: ClusterIP
# Ingress
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: litellm-ingress
spec:
  ingressClassName: nginx
  rules:
    - host: <domain>
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: litellm-service
                port:
                  number: 4000
Deployment from CLI:
[rocky@rocky-vm ~]$ microk8s kubectl apply -f litellm-deployment.yaml
configmap/litellm-config created
service/litellm-service created
deployment.apps/litellm created
ingress.networking.k8s.io/litellm-ingress created
[rocky@rocky-vm ~]$ microk8s kubectl get pods -w
NAME READY STATUS RESTARTS AGE
litellm-784f6b955-jtvzx 1/1 Running 0 6s
This exposes both:
- /v1/... (API)
- / (LiteLLM dashboard)
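The four resources only work together if their names, labels and ports line up. The Python sketch below checks those links. The dictionaries are hand-condensed copies of the relevant fields from the manifests above, not parsed YAML, and wiring_problems is a hypothetical helper written for this illustration.

```python
# Hand-condensed copies of the fields that connect the four manifests.
CONFIG_MAP = {"name": "litellm-config"}
DEPLOYMENT = {
    "pod_labels": {"app": "litellm"},
    "container_port": 4000,
    "config_map": "litellm-config",
}
SERVICE = {
    "name": "litellm-service",
    "selector": {"app": "litellm"},
    "port": 4000,
    "target_port": 4000,
}
INGRESS = {"backend_service": "litellm-service", "backend_port": 4000}


def wiring_problems(config_map, deployment, service, ingress):
    """Return a list of human-readable mismatches (empty when consistent)."""
    problems = []
    if deployment["config_map"] != config_map["name"]:
        problems.append("Deployment volume references an unknown ConfigMap")
    labels = deployment["pod_labels"]
    # A Service selects pods whose labels contain every selector pair.
    if any(labels.get(key) != value for key, value in service["selector"].items()):
        problems.append("Service selector does not match the pod labels")
    if service["target_port"] != deployment["container_port"]:
        problems.append("Service targetPort differs from the containerPort")
    if ingress["backend_service"] != service["name"]:
        problems.append("Ingress backend points at a different Service")
    if ingress["backend_port"] != service["port"]:
        problems.append("Ingress backend port differs from the Service port")
    return problems


if __name__ == "__main__":
    print(wiring_problems(CONFIG_MAP, DEPLOYMENT, SERVICE, INGRESS))
```

If a pod is Running but requests never reach it, a mismatch in one of these five links is a common place to look first.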

5. Usage examples#
These are simple examples, tested from the CLI against a live environment.
Example 1 — Call OpenAI model via Proxy
curl -X POST http://fip-86-50-231-57.kaj.poutavm.fi/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-master-1234" \
  -d '{
    "model": "gpt-4.1-mini",
    "messages": [
      { "role": "user", "content": "Hello!" }
    ]
  }'
Example 2 — Call Claude via Proxy
curl -X POST http://fip-86-50-231-57.kaj.poutavm.fi/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-master-1234" \
  -d '{
    "model": "claude-3-5-haiku",
    "messages": [
      { "role": "user", "content": "Hello!" }
    ]
  }'
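The same calls can be made from Python. The sketch below uses only the standard library and mirrors the curl examples above. The base URL is the placeholder from the endpoint table, the key is the placeholder master key from this document, and build_chat_request and send are illustrative helper names.

```python
import json
import urllib.request

BASE_URL = "https://<domain>"  # placeholder: replace with your ingress host
API_KEY = "sk-master-1234"     # placeholder master key from this document


def build_chat_request(base_url, api_key, model, prompt):
    """Build an OpenAI-style chat completion request without sending it."""
    body = json.dumps(
        {"model": model, "messages": [{"role": "user", "content": prompt}]}
    ).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Switching providers only changes the model argument: send(build_chat_request(BASE_URL, API_KEY, "claude-3-5-haiku", "Hello!")) targets Claude through the same endpoint.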
6. LiteLLM Dashboard#
When the proxy is running, LiteLLM exposes a built-in management UI at https://<domain>/.
From the dashboard you can:
- View request logs
- Add/remove API keys
- Enable rate limiting
- Change config
- View health and metrics

7. Production Recommendations#
For safe production deployment:
1. Move API keys to Kubernetes Secrets
Never store API keys inside ConfigMaps.
Use:
kubectl create secret generic litellm-keys ...
kubectl create secret generic litellm-secrets \
  --from-literal=openai_key=xxx \
  --from-literal=anthropic_key=yyy \
  --from-literal=master_key=zzz
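For setups that generate manifests from a script instead of running kubectl create secret by hand, the sketch below builds an equivalent Secret manifest in Python. It relies on the fact that values under a Secret's data field must be base64-encoded. The secret_manifest helper is hypothetical, and the key names and values are the placeholders from the command above.

```python
import base64
import json


def secret_manifest(name, values):
    """Return a Kubernetes Secret manifest as a dictionary.

    Values under `data` are base64-encoded, as the Secret API requires.
    """
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name},
        "type": "Opaque",
        "data": {
            key: base64.b64encode(value.encode("utf-8")).decode("ascii")
            for key, value in values.items()
        },
    }


if __name__ == "__main__":
    manifest = secret_manifest(
        "litellm-secrets",
        {"openai_key": "xxx", "anthropic_key": "yyy", "master_key": "zzz"},
    )
    # kubectl accepts JSON manifests, so this output can be piped to
    # `kubectl apply -f -`.
    print(json.dumps(manifest, indent=2))
```

The Deployment would then expose these values to the container as environment variables. LiteLLM's documentation describes an os.environ/<NAME> form for config values that reads a key from the environment instead of the ConfigMap; verify that syntax against the version you deploy.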
2. Enforce individual user API keys
LiteLLM supports:
- user-specific keys
- per-user rate limits
- per-user model access control
This allows, for example:
- Developers can use only gpt-4o-mini
- Admins can use all models
- External apps get separate rate limits
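As a rough sketch of the developer case, the snippet below builds, but does not send, a request that asks the proxy for a restricted key. The /key/generate path and the models and rpm_limit fields are taken from LiteLLM's virtual-key documentation and should be treated as assumptions to verify against the version you run. Virtual keys are also documented as requiring a database, which this simple setup deliberately omits. The helper name and the limit value are illustrative.

```python
import json
import urllib.request


def build_key_request(base_url, master_key, models, rpm_limit):
    """Build a key-generation request authenticated with the master key.

    Endpoint path and field names are assumptions based on LiteLLM's
    virtual-key documentation; check them before relying on this.
    """
    payload = {"models": models, "rpm_limit": rpm_limit}
    return urllib.request.Request(
        url=f"{base_url}/key/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {master_key}",
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_key_request(
        "https://<domain>", "sk-master-1234", ["gpt-4o-mini"], rpm_limit=60
    )
    print(request.full_url)
    print(request.data.decode("utf-8"))
```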
3. Enable rate limiting
LiteLLM supports global or per-key limits:
general_settings:
  rate_limiter: "local"
  rate_limit_per_minute: 200
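To illustrate what a per-key, per-minute limit means, here is a minimal fixed-window limiter in Python. It is a conceptual sketch only and does not reflect how LiteLLM implements rate limiting. The FixedWindowLimiter class and its parameters are hypothetical, and the clock is injectable so the behaviour can be tested without waiting.

```python
import time


class FixedWindowLimiter:
    """Allow at most `limit` requests per `window_seconds`, counted per key."""

    def __init__(self, limit, window_seconds=60.0, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock
        self._windows = {}  # key -> (window_start, request_count)

    def allow(self, key):
        """Return True and count the request if `key` is under its limit."""
        now = self.clock()
        start, count = self._windows.get(key, (now, 0))
        if now - start >= self.window_seconds:
            start, count = now, 0  # previous window expired, start a new one
        if count >= self.limit:
            self._windows[key] = (start, count)
            return False
        self._windows[key] = (start, count + 1)
        return True


if __name__ == "__main__":
    limiter = FixedWindowLimiter(limit=200, window_seconds=60.0)
    print(limiter.allow("sk-master-1234"))
```

A limiter like this keeps its counters in process memory, so with several replicas each pod would count separately. That is the usual reason multi-replica deployments move such counters into a shared store.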
4. Enable logging to persistent storage
With large workloads, save logs to:
- Loki
- Elastic
- S3
- PostgreSQL
5. Run multiple replicas behind HPA
You can scale this setup by:
- setting replicas: 2 in the Deployment
- or adding a HorizontalPodAutoscaler
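For intuition about what a HorizontalPodAutoscaler does, the sketch below implements the core scaling formula from the Kubernetes documentation: desired replicas equal the current replica count times the ratio of the current metric to the target metric, rounded up. The function name and the min/max defaults are illustrative, and the real controller adds a tolerance band and stabilization behaviour that are omitted here.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10):
    """Core HPA formula: ceil(current * currentMetric / targetMetric),
    clamped to the configured minimum and maximum replica counts."""
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # Two pods averaging 90% CPU against a 60% target.
    print(desired_replicas(2, current_metric=90, target_metric=60))
```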
6. Enable TLS (if external traffic)
If using NGINX ingress:
- Add cert-manager
- Use Let’s Encrypt certificates
8. Summary#
The Simple LiteLLM AI API Proxy is an extremely lightweight but flexible layer on top of LLM providers.
It offers:
- Unified OpenAI-compatible API
- OpenAI + Anthropic support (can be expanded to other providers)
- Built-in dashboard
- Simple Kubernetes deployment
- Easy testing (curl / Python / JS)
- Secure access via master key
This setup is well-suited for:
- Prototyping AI tools
- Small internal services
- Educational environments
- Simple multi-model deployments
With production enhancements (Secrets, rate limiting, auditing), it can evolve into a robust enterprise-grade API gateway.