This repository contains Kubernetes operators for managing AI workloads:
- KB Operator (ml-operator): Manages knowledge base resources with Kubeflow Pipelines
- Agent Operator: Manages AI agent deployments with foundation models
Install [uv](https://docs.astral.sh/uv/) to set up the project. `uv venv` creates the virtual environment. All dependencies should be managed exclusively through this tool. Also run `pre-commit install` once after checkout to ensure consistent formatting.
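On a fresh checkout, the full setup boils down to:

```bash
brew install uv      # or any other supported uv install method
uv venv              # create the virtual environment
pre-commit install   # set up the git hooks for formatting checks
```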
A few utility commands are set up using [poe](https://poethepoet.natn.io/). Outside the virtual environment, `poe` can be invoked using `uv run poe <utility>`.

- `test`: runs all tests
- `lint`: checks formatting
- `format`: fixes formatting
- `fix`: fixes formatting and other rules, e.g. import sorting
- `export-deps`: regenerates the `requirements.txt` in `dependencies/`
- `dev`: runs the operator locally
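For example, to run the test suite and the formatting checks from outside the virtual environment:

```bash
uv run poe test   # run all tests
uv run poe lint   # check formatting without changing files
```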
ai-operators
├── agent # Agent Helm chart
├── chart # Combined Helm chart for both operators
├── dependencies # Generated requirements.txt for image generation
├── src # Operator packages (kb_operator and agent_operator)
└── tests # pytest modules and resources
The `chart/` directory contains a unified Helm chart that can deploy both operators. You can enable or disable each operator independently using values:
```yaml
# Enable/disable operators
kbOperator:
  enabled: true          # Knowledge Base operator
  replicaCount: 1
  image:
    repository: linode/kb-operator
    tag: "0.1.0"
  # Additional KB operator config...

agentOperator:
  enabled: false         # Agent operator
  replicaCount: 1
  provider: linode       # linode or apl
  image:
    repository: linode/agent-operator
    tag: "0.1.0"
  # Additional agent operator config...
```
Deploy both operators:
helm install ai-operators ./chart
Deploy only KB operator:
helm install ai-operators ./chart --set agentOperator.enabled=false
Deploy only Agent operator:
helm install ai-operators ./chart --set kbOperator.enabled=false --set agentOperator.enabled=true
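To confirm which operators ended up running, you can filter pods on the component labels (the same labels the log-watching commands later in this README use):

```bash
kubectl get pods -l app.kubernetes.io/component=kb-operator
kubectl get pods -l app.kubernetes.io/component=agent-operator
```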
The agent-operator can be deployed using two different providers:

- Linode/K8s Provider (`PROVIDER=linode`): uses `kubectl` to deploy agents directly
- APL Provider (`PROVIDER=apl`): uses ArgoCD to deploy agents
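To try a provider without deploying to a cluster, the `dev` task mentioned above runs the operator locally; assuming it honors the same `PROVIDER` environment variable (an assumption, not confirmed here), that would look like:

```bash
# PROVIDER is assumed to be read by the locally running operator
PROVIDER=linode uv run poe dev
```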
1. Create Kind cluster
kind create cluster --name agent-operator-test
kubectl wait --for=condition=Ready nodes --all --timeout=300s
2. Create test namespace
kubectl create namespace team-demo
3. Build and deploy agent-operator
# Generate requirements.txt
uv run poe export-deps
# Build Docker image with agent chart included
docker build \
--build-arg OPERATOR_MODULE=agent_operator \
-t agent-operator:local .
# Load image into Kind cluster
kind load docker-image agent-operator:local --name agent-operator-test
# Deploy the agent-operator with Linode provider
helm install -n team-demo ai-operators ./chart \
--set agentOperator.enabled=true \
--set agentOperator.provider=linode \
--set agentOperator.image.repository=agent-operator \
--set agentOperator.image.tag=local \
--set agentOperator.image.pullPolicy=Never \
--set kbOperator.enabled=false \
--wait \
--timeout=5m
4. Create required secrets
# Create pgvector database secret (required for knowledge base tools)
kubectl create secret generic pgvector-app -n team-demo \
--from-literal=username=app \
--from-literal=password=your-password-here \
--from-literal=host=pgvector-cluster-rw.team-demo.svc.cluster.local \
--from-literal=port=5432
5. Test the operator
# Create a test foundation model service (required for agent deployment)
kubectl create service clusterip llama-service --tcp=8000:8000 -n team-demo
kubectl label service llama-service modelType=foundation modelName=llama -n team-demo
# Install the KB CRD, then create a test agent resource
kubectl apply -f tests/resources/kb-crd.yaml
kubectl apply -f tests/resources/agent-cr.yaml
# Check the agent resource
kubectl get akamaiagents -n team-demo
# Watch operator logs
kubectl logs -l app.kubernetes.io/component=agent-operator -n team-demo -f
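The foundation model service created in step 5 can also be expressed declaratively; this manifest is simply the two `kubectl` commands combined (note that `kubectl create service` additionally sets an `app: llama-service` selector, which is irrelevant for this backend-less test stub):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: llama-service
  namespace: team-demo
  labels:
    modelType: foundation  # marks the service as a foundation model endpoint
    modelName: llama
spec:
  type: ClusterIP
  ports:
    - port: 8000
      targetPort: 8000
```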
1. Create Kind cluster
kind create cluster --name agent-operator-test
kubectl wait --for=condition=Ready nodes --all --timeout=300s
2. Install ArgoCD
kubectl create namespace argocd
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml
kubectl wait --for=condition=Ready pods --all -n argocd --timeout=600s
3. Create test namespace
kubectl create namespace team-demo
4. Build and deploy agent-operator
# Generate requirements.txt
uv run poe export-deps
# Build Docker image with agent chart included
docker build \
--build-arg OPERATOR_MODULE=agent_operator \
--no-cache \
-t agent-operator:local .
# Load image into Kind cluster
kind load docker-image agent-operator:local --name agent-operator-test
# Deploy the agent-operator with APL provider
helm install -n team-demo ai-operators ./chart \
--set agentOperator.enabled=true \
--set agentOperator.provider=apl \
--set agentOperator.image.repository=agent-operator \
--set agentOperator.image.tag=local \
--set agentOperator.image.pullPolicy=Never \
--set agentOperator.env.AGENT_CHART_REPO_URL=https://github.com/linode/ai-operators.git \
--set agentOperator.env.AGENT_CHART_REPO_REVISION=main \
--set agentOperator.env.AGENT_CHART_PATH=agent \
--set kbOperator.enabled=false \
--wait \
--timeout=5m
5. Create required secrets
# Create pgvector database secret (required for knowledge base tools)
kubectl create secret generic pgvector-app -n team-demo \
--from-literal=username=app \
--from-literal=password=your-password-here \
--from-literal=host=pgvector-cluster-rw.team-demo.svc.cluster.local \
--from-literal=port=5432
6. Test the operator
# Create a test foundation model service (required for agent deployment)
kubectl create service clusterip llama-service --tcp=8000:8000 -n team-demo
kubectl label service llama-service modelType=foundation modelName=llama -n team-demo
# Install the KB CRD, then create test knowledge base and agent resources
kubectl apply -f tests/resources/kb-crd.yaml
kubectl apply -f tests/resources/kb-cr.yaml
kubectl apply -f tests/resources/agent-cr.yaml
# Check the agent resource
kubectl get akamaiagents -n team-demo
# Check ArgoCD application created
kubectl get applications -n argocd
# Watch operator logs
kubectl logs -l app.kubernetes.io/component=agent-operator -n team-demo -f
7. Cleanup
# Delete the Kind cluster
kind delete cluster --name agent-operator-test
For testing the ML-Operator locally, you can set up a Kind cluster with Kubeflow Pipelines using the following steps:
# Install Kind (if not already installed)
brew install kind
# Install uv (if not already installed)
brew install uv
# Install Helm (if not already installed)
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
Follow these steps to create your test environment:
1. Create Kind cluster
kind create cluster --name ml-operator-test
kubectl wait --for=condition=Ready nodes --all --timeout=300s
2. Add Helm repositories
helm repo add cnpg https://cloudnative-pg.github.io/charts
helm repo add minio https://charts.min.io/
helm repo update
3. Clone chart repositories
rm -rf /tmp/apl-core /tmp/apl-charts
git clone https://github.com/linode/apl-core.git /tmp/apl-core
git clone https://github.com/linode/apl-charts.git /tmp/apl-charts
4. Create namespaces
kubectl create namespace kfp
kubectl create namespace cnpg-system
kubectl create namespace team-kb
5. Install CloudNative-PG Operator
helm install cnpg cnpg/cloudnative-pg \
--namespace cnpg-system \
--wait \
--timeout=5m
6. Install PostgreSQL cluster with pgvector
helm install pgvector-cluster /tmp/apl-charts/pgvector-cluster \
--namespace team-kb \
--set imageName=ghcr.io/cloudnative-pg/postgresql:17.5 \
--set apl.networkpolicies.create=false \
--wait \
--timeout=10m
7. Install MinIO for artifact storage
helm install minio minio/minio \
--namespace kfp \
--set rootUser=otomi-admin \
--set rootPassword=supersecretkey \
--set defaultBuckets="kubeflow-pipelines" \
--set resources.requests.memory=256Mi \
--set resources.limits.memory=512Mi \
--set mode=standalone \
--set replicas=1 \
--wait \
--timeout=10m
8. Create Kubeflow Pipelines secrets
kubectl create secret generic mlpipeline-minio-artifact \
--from-literal=accesskey=otomi-admin \
--from-literal=secretkey=supersecretkey \
--namespace kfp
kubectl label secret mlpipeline-minio-artifact app=kubeflow-pipelines -n kfp
9. Install Kubeflow Pipelines
helm install kubeflow-pipelines /tmp/apl-core/charts/kubeflow-pipelines \
--namespace kfp \
--set objectStorage.endpoint=minio.kfp.svc.cluster.local \
--set objectStorage.bucket=kubeflow-pipelines \
--set objectStorage.region=us-east-1 \
--set objectStorage.port=9000 \
--set objectStorage.secure=false \
--set objectStorage.type=minio \
--wait \
--timeout=10m
10. Wait for all services to be ready
kubectl wait --for=condition=Ready pods --all -n kfp --timeout=600s
kubectl wait --for=condition=Ready pods --all -n cnpg-system --timeout=300s
kubectl wait --for=condition=Ready pods --all -n team-kb --timeout=600s
11. Upload test pipeline to Kubeflow
# Port-forward to access Kubeflow Pipelines API
kubectl port-forward -n kfp service/ml-pipeline 3000:80 &
# Upload the test pipeline
python tests/resources/upload-pipeline.py
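The upload script itself isn't reproduced here; as a rough sketch, such a script might use the `kfp` SDK against the port-forward, along these lines (the package path and pipeline name are placeholders, not the script's actual values):

```python
from kfp import Client

# Reach the Kubeflow Pipelines API through the port-forward set up above
client = Client(host="http://localhost:3000")

# Placeholder package and name; see tests/resources/upload-pipeline.py for the real ones
client.upload_pipeline(
    pipeline_package_path="pipeline.yaml",
    pipeline_name="test-pipeline",
)
```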
12. Set up test pipeline source config
First copy `.secrets.template` to `.secrets`, and follow the instructions to create and set a token:
kubectl create ns ml-operator
kubectl create configmap pipelines -n ml-operator --from-literal=default='{"url": "https://api.github.com/repos/linode/ml-pipelines/actions/artifacts/4055865221/zip", "authType": "bearer", "authSecretName": "pipelines", "authSecretKey": "gh-token"}' <!--- pragma: allowlist secret --->
kubectl create secret generic pipelines -n ml-operator --from-env-file=.secrets
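For readability, the JSON passed as the `default` key above is:

```json
{
  "url": "https://api.github.com/repos/linode/ml-pipelines/actions/artifacts/4055865221/zip",
  "authType": "bearer",
  "authSecretName": "pipelines",
  "authSecretKey": "gh-token"
}
```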
13. Build and deploy ML-Operator
# Generate requirements.txt
uv run poe export-deps
# Build Docker image
docker build -t ml-operator:local .
# Load image into Kind cluster
kind load docker-image ml-operator:local --name ml-operator-test
# Deploy the ML-Operator (KB Operator)
helm install -n ml-operator ai-operators ./chart \
--set kbOperator.enabled=true \
--set kbOperator.image.repository=ml-operator \
--set kbOperator.image.tag=local \
--set kbOperator.image.pullPolicy=Never \
--set kbOperator.env.KUBEFLOW_ENDPOINT=http://ml-pipeline-ui.kfp.svc.cluster.local \
--set agentOperator.enabled=false \
--wait \
--timeout=5m
Once the environment is set up, the operator is already running in the cluster, so you can test it directly:
# Create a test knowledge base resource
kubectl apply -f tests/resources/test-knowledge-base.yaml
# Check the knowledge base resource
kubectl get akamaiknowledgebases
# Watch operator logs
kubectl logs -l app.kubernetes.io/component=kb-operator -f
For iterative development:
# Make code changes, then rebuild and redeploy
uv run poe export-deps
docker build -t ml-operator:local .
kind load docker-image ml-operator:local --name ml-operator-test
# Restart the operator deployment
kubectl rollout restart deployment -l app.kubernetes.io/component=kb-operator
# Watch the updated logs
kubectl logs -l app.kubernetes.io/component=kb-operator -f
# Delete the Kind cluster
kind delete cluster --name ml-operator-test