Production Deployment Guide

This guide covers production hardening for CloudTaser deployments. It assumes you have completed the basic installation of the operator, eBPF agent, and optionally the S3 encryption proxy.

Multi-Cloud Support

CloudTaser is tested on the three major managed Kubernetes platforms. Each has specific requirements and considerations.

GKE Cluster Requirements

| Requirement | Value |
| --- | --- |
| Cluster type | GKE Standard (not Autopilot) |
| Kubernetes | 1.28+ |
| Node image | Container-Optimized OS (COS) or Ubuntu |
| Kernel | 5.15+ (COS and Ubuntu both qualify) |

GKE-Specific Configuration

Workload Identity -- If using Workload Identity, ensure the operator's ServiceAccount is bound to a GCP service account with no additional permissions. CloudTaser does not need GCP API access; it authenticates to vault using Kubernetes auth only.

Private clusters -- The vault endpoint must be reachable from the cluster's VPC. Use VPC peering or Cloud VPN to connect to your EU vault. Add the vault endpoint to the master authorized networks if using a private control plane.

Binary Authorization -- The operator and wrapper images are signed. Configure Binary Authorization to allow images from ghcr.io/skipopsltd/*.

helm install cloudtaser oci://ghcr.io/skipopsltd/cloudtaser-helm/cloudtaser \
  --namespace cloudtaser-system \
  --create-namespace \
  --set operator.vaultAddress=https://vault.eu.example.com

EKS Cluster Requirements

| Requirement | Value |
| --- | --- |
| Node groups | Managed or self-managed (not Fargate) |
| Kubernetes | 1.28+ |
| AMI | Amazon Linux 2023 or Ubuntu 22.04 |
| Kernel | 5.15+ (AL2023: 6.1+, Ubuntu 22.04: 5.15+) |

EKS-Specific Configuration

IRSA / Pod Identity -- Not required for the operator or eBPF agent. CloudTaser authenticates to vault using Kubernetes auth, not AWS IAM. IRSA or Pod Identity is only needed if the S3 proxy requires access to AWS S3 buckets.

VPC connectivity -- Ensure your EU vault is reachable from the EKS VPC via VPN, Transit Gateway, or a public endpoint with TLS.

Security Groups -- Allow outbound HTTPS (port 443) from worker nodes to the vault endpoint. Also allow intra-cluster traffic on port 8199 (gRPC between eBPF agent and operator).

helm install cloudtaser oci://ghcr.io/skipopsltd/cloudtaser-helm/cloudtaser \
  --namespace cloudtaser-system \
  --create-namespace \
  --set operator.vaultAddress=https://vault.eu.example.com

AKS Cluster Requirements

| Requirement | Value |
| --- | --- |
| Node pools | Regular (not Virtual Nodes / ACI) |
| Kubernetes | 1.28+ |
| Node image | Ubuntu 22.04 or Azure Linux (Mariner) |
| Kernel | 5.15+ |

AKS-Specific Configuration

Azure AD Pod Identity / Workload Identity -- Not required. CloudTaser uses Kubernetes auth to vault, not Azure AD. Only needed if the S3 proxy accesses Azure Blob Storage.

Private endpoint -- If using AKS private cluster, ensure vault is reachable from the VNet via VNet peering or Azure VPN Gateway.

NSG rules -- Allow outbound HTTPS to the vault endpoint from node pool subnets. Allow intra-cluster traffic on port 8199.

helm install cloudtaser oci://ghcr.io/skipopsltd/cloudtaser-helm/cloudtaser \
  --namespace cloudtaser-system \
  --create-namespace \
  --set operator.vaultAddress=https://vault.eu.example.com

Network Policies

CloudTaser requires specific network connectivity between its components and external services. Apply network policies to restrict traffic to only what is necessary.

Required Connectivity

| Source | Destination | Port | Protocol | Purpose |
| --- | --- | --- | --- | --- |
| Application pods | Vault endpoint | 443 | HTTPS | Secret fetching by wrapper |
| Operator pod | K8s API server | 443 | HTTPS | Webhook serving, pod watching |
| eBPF agent | Operator pod | 8199 | gRPC | PID registration for protected processes |
| Operator pod | Container registries | 443 | HTTPS | Entrypoint resolution |
| S3 proxy sidecar | Upstream S3 endpoint | 443 | HTTPS | Object storage access (if S3 proxy enabled) |
| S3 proxy sidecar | Vault endpoint | 443 | HTTPS | Transit encrypt/decrypt operations |

Auto-Applied Policies

The CloudTaser operator automatically applies egress NetworkPolicies to namespaces containing protected pods. These policies restrict protected pods to only reach:

  • The configured vault endpoint (HTTPS/443)
  • The Kubernetes API server (for service account token exchange)
  • DNS (UDP/TCP 53)

Auto-applied policies are created as cloudtaser-egress-<namespace> NetworkPolicy resources and are reconciled by the operator's NetworkPolicy controller.
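For reference, an auto-applied policy is roughly equivalent to the following sketch. This is illustrative only -- the exact rules are generated by the operator, and the pod selector, label, and CIDRs shown here are placeholder assumptions, not the operator's actual output:

```yaml
# Illustrative shape of an operator-generated egress policy.
# Selector label and CIDRs are placeholders, not real values.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: cloudtaser-egress-protected-workloads
  namespace: protected-workloads
spec:
  podSelector:
    matchLabels:
      cloudtaser.io/protected: "true"   # assumed label
  policyTypes:
    - Egress
  egress:
    # Configured vault endpoint (HTTPS/443)
    - to:
        - ipBlock:
            cidr: 203.0.113.10/32   # placeholder vault IP
      ports:
        - protocol: TCP
          port: 443
    # Kubernetes API server (service account token exchange)
    - to:
        - ipBlock:
            cidr: 10.0.0.1/32       # placeholder API server IP
      ports:
        - protocol: TCP
          port: 443
    # DNS
    - ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```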

Manual NetworkPolicy Example

For additional control, apply explicit network policies:

cloudtaser-network-policies.yaml
# Allow application pods to reach the vault
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-vault-egress
  namespace: default
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
        - ipBlock:
            cidr: <VAULT_IP>/32
      ports:
        - protocol: TCP
          port: 443
---
# Allow eBPF agent to reach the operator
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-ebpf-to-operator
  namespace: cloudtaser-system
spec:
  podSelector:
    matchLabels:
      app: cloudtaser-ebpf
  policyTypes:
    - Egress
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: cloudtaser-operator
      ports:
        - protocol: TCP
          port: 8199

Generate Policies with CLI

The CloudTaser CLI can generate network policies tailored to your environment:

cloudtaser netpol --vault-address https://vault.eu.example.com

Apply the generated policies:

cloudtaser netpol --vault-address https://vault.eu.example.com | kubectl apply -f -

RBAC Hardening

The Helm chart creates the necessary RBAC resources automatically. This section covers additional hardening for production environments.

Operator ClusterRole

The operator requires cluster-wide permissions:

| Resource | Verbs | Purpose |
| --- | --- | --- |
| api.cloudtaser.io CRDs | full CRUD | Manage CloudTaserConfigs and SecretMappings |
| secrets | get, list, watch, create, update | Webhook TLS certificates |
| pods | get, list, watch | Injection decisions |
| serviceaccounts | get, list, watch | Identity validation |
| mutatingwebhookconfigurations | get, patch | Self-managed webhook |
| apps/deployments | get, list, watch, patch | Workload management |

eBPF Agent ClusterRole

The eBPF agent requires minimal permissions:

| Resource | Verbs | Purpose |
| --- | --- | --- |
| pods | get, list, watch | Discover monitored PIDs |
| nodes | get | Identify the current node |

The agent runs as a privileged DaemonSet with hostPID: true and requires SYS_ADMIN, SYS_PTRACE, NET_ADMIN, and SYS_RESOURCE capabilities.
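A pod-spec fragment mirroring that description might look as follows. This is a sketch assembled from the requirements above, not the chart's actual manifest -- container name and structure are assumptions:

```yaml
# Illustrative DaemonSet pod-spec fragment; the real chart may
# structure these fields differently.
spec:
  hostPID: true
  containers:
    - name: cloudtaser-ebpf   # assumed container name
      securityContext:
        privileged: true      # grants all capabilities, including those below
        capabilities:
          add:
            - SYS_ADMIN
            - SYS_PTRACE
            - NET_ADMIN
            - SYS_RESOURCE
```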

Restrict Pod Read Access

CloudTaser stores configuration in pod annotations (vault paths, environment variable mappings, rotation strategy) but never stores secret values in Kubernetes. However, annotation metadata reveals your secret infrastructure -- vault paths, role names, and key mappings. Restricting pod read access is a defense-in-depth measure.

What annotations expose (and what they do not)

Annotations contain only configuration: vault endpoint URLs, auth role names, KV paths, and env-var mappings. They tell an observer where secrets live, but not what the secret values are. An attacker with only pod read access cannot retrieve actual credentials.

Audit existing RBAC:

# Check if the default service account can list pods
kubectl auth can-i list pods --as=system:serviceaccount:default:default

# Cluster-wide audit
kubectl auth can-i list pods --all-namespaces \
  --as=system:serviceaccount:default:default

If any non-operator service account returns yes, apply restrictive RBAC:

rbac-pod-reader.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: protected-workloads
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: pod-reader-binding
  namespace: protected-workloads
subjects:
  - kind: ServiceAccount
    name: cloudtaser-operator
    namespace: cloudtaser-system
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io

Use Separate Namespaces for Protected Workloads

Isolate CloudTaser-protected workloads in dedicated namespaces:

kubectl create namespace protected-workloads
kubectl label namespace protected-workloads cloudtaser.io/protected=true

Benefits:

  • RBAC Role and RoleBinding are namespace-scoped, simplifying access control
  • NetworkPolicies (auto-applied by the operator) are namespace-scoped
  • Audit logging can be filtered by namespace

Resource Limits

Configure appropriate resource requests and limits for all CloudTaser components in production.

Operator

| Resource | Request | Limit | Notes |
| --- | --- | --- | --- |
| CPU | 50m | 200m | Increases during high pod creation rates |
| Memory | 64Mi | 128Mi | Stable; cache size depends on watched resources |

eBPF Agent (per node)

| Resource | Request | Limit | Notes |
| --- | --- | --- | --- |
| CPU | 100m | 500m | Higher during initial BPF program loading |
| Memory | 128Mi | 512Mi | BPF maps consume memory proportional to monitored PIDs |

Wrapper (per injected pod)

The wrapper runs inside each protected workload container and adds minimal overhead:

| Resource | Overhead |
| --- | --- |
| Memory | ~5-10 MB additional RSS |
| CPU | Negligible (idle after initial vault fetch; wakes for lease renewal) |
| Startup latency | 50-200ms (depends on vault response time) |

S3 Proxy (per injected pod)

| Resource | Request | Limit | Notes |
| --- | --- | --- | --- |
| CPU | 50m | 200m | Higher during encryption-heavy workloads |
| Memory | 32Mi | 128Mi | Scales with concurrent request count |

Override defaults in your Helm values:

values.yaml
operator:
  resources:
    requests:
      cpu: 100m
      memory: 128Mi
    limits:
      cpu: 500m
      memory: 256Mi

ebpf:
  resources:
    requests:
      cpu: 200m
      memory: 256Mi
    limits:
      cpu: 1000m
      memory: 1Gi

High Availability

Operator HA

For production, run the operator with multiple replicas and leader election:

values.yaml
operator:
  ha: true
  replicaCount: 3
  leaderElect: true

In HA mode:

  • 3 replicas are deployed with pod anti-affinity across nodes
  • Leader election ensures only one replica serves the webhook at a time
  • Failover is automatic -- if the leader pod is evicted or crashes, another replica takes over within seconds
  • Replicas are spread across availability zones when possible

Pod Disruption Budget

The Helm chart creates a PodDisruptionBudget in HA mode that ensures at least 1 replica is always available during voluntary disruptions (node drains, upgrades).
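The generated PodDisruptionBudget is equivalent to something like the following sketch (illustrative; the resource name and selector label are assumptions, since the chart renders its own):

```yaml
# Illustrative equivalent of the chart-managed PDB in HA mode.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: cloudtaser-operator   # assumed name
  namespace: cloudtaser-system
spec:
  minAvailable: 1             # at least 1 replica survives voluntary disruptions
  selector:
    matchLabels:
      app: cloudtaser-operator
```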

eBPF Agent HA

The eBPF agent runs as a DaemonSet and is inherently HA -- one instance per node. It uses priorityClassName: system-node-critical to ensure scheduling even under resource pressure.

Node drain considerations

When draining a node for maintenance, the eBPF agent on that node will be evicted. Pods on that node lose runtime enforcement until the agent is rescheduled. Plan maintenance windows accordingly and drain nodes one at a time.

Vault HA

Vault HA is outside the scope of CloudTaser but is strongly recommended for production:

  • OpenBao / Vault Enterprise -- Use integrated Raft storage with 3+ nodes across availability zones
  • OpenBao OSS -- Use an external storage backend (Consul, PostgreSQL) with multiple vault instances behind a load balancer

Monitoring and Alerting

Operator Metrics

The operator exposes Prometheus metrics on port 8080:

| Metric | Type | Description |
| --- | --- | --- |
| controller_runtime_reconcile_total | Counter | Reconciliation counts by controller and result |
| controller_runtime_reconcile_errors_total | Counter | Failed reconciliations |
| cloudtaser_webhook_injection_total | Counter | Injection count by status (success, error, skipped) |

Example Prometheus scrape config:

prometheus-scrape.yaml
- job_name: cloudtaser-operator
  kubernetes_sd_configs:
    - role: pod
      namespaces:
        names:
          - cloudtaser-system
  relabel_configs:
    - source_labels: [__meta_kubernetes_pod_label_app]
      regex: cloudtaser-operator
      action: keep
    - source_labels: [__meta_kubernetes_pod_container_port_number]
      regex: "8080"
      action: keep

eBPF Agent Health

The eBPF agent exposes HTTP health endpoints on port 9090:

| Endpoint | Purpose |
| --- | --- |
| GET /healthz | Liveness probe -- agent process is running |
| GET /readyz | Readiness probe -- BPF programs are loaded and monitoring |
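These endpoints can be wired directly into the agent's container probes. A sketch, with probe timings chosen for illustration rather than taken from the chart:

```yaml
# Illustrative probe configuration for the agent container;
# timing values are assumptions, tune for your environment.
livenessProbe:
  httpGet:
    path: /healthz
    port: 9090
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /readyz
    port: 9090
  initialDelaySeconds: 5
  periodSeconds: 10
```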
Alerting Rules

Example alerting rules, packaged as a PrometheusRule resource (requires the Prometheus Operator):

cloudtaser-alerts.yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: cloudtaser-alerts
  namespace: cloudtaser-system
spec:
  groups:
    - name: cloudtaser
      rules:
        - alert: CloudTaserOperatorDown
          expr: |
            kube_deployment_status_replicas_available{
              deployment="cloudtaser-operator",
              namespace="cloudtaser-system"
            } == 0
          for: 1m
          labels:
            severity: critical
          annotations:
            summary: CloudTaser operator has no available replicas

        - alert: CloudTaserEbpfAgentMissing
          expr: |
            kube_daemonset_status_number_ready{
              daemonset="cloudtaser-ebpf",
              namespace="cloudtaser-system"
            }
            <
            kube_daemonset_status_desired_number_scheduled{
              daemonset="cloudtaser-ebpf",
              namespace="cloudtaser-system"
            }
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: eBPF agent not running on all nodes

        - alert: CloudTaserWebhookErrors
          expr: |
            rate(cloudtaser_webhook_injection_total{status="error"}[5m]) > 0
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: CloudTaser webhook injection errors detected

        - alert: CloudTaserLowProtectionScore
          expr: |
            cloudtaser_workload_protection_score < 50
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: Workload protection score below threshold

Grafana Dashboard

Key panels for a CloudTaser monitoring dashboard:

  1. Operator health -- Replica count, restart count, reconciliation rate
  2. Injection rate -- Successful vs. failed injections over time
  3. eBPF coverage -- Nodes with healthy agent / total nodes
  4. Protection scores -- Per-workload protection score heatmap
  5. Vault latency -- P50/P95/P99 secret fetch latency from wrapper metrics
  6. S3 proxy throughput -- Encrypted objects per second, encryption latency
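For example, the injection-rate panel (item 2) can be driven by queries over the cloudtaser_webhook_injection_total counter from the metrics table above. A sketch -- the success and error label values are taken from that table, but verify them against your scraped series:

```promql
# Successful injections per second, 5-minute window
sum(rate(cloudtaser_webhook_injection_total{status="success"}[5m]))

# Failed injections per second
sum(rate(cloudtaser_webhook_injection_total{status="error"}[5m]))
```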

TLS Certificate Management

Webhook TLS

The operator generates a self-signed CA and server certificate at startup. The CA bundle is injected into the MutatingWebhookConfiguration automatically. Certificates are stored in an emptyDir volume and regenerated on pod restart.

For production, provide your own certificates via a Kubernetes Secret:

values.yaml
operator:
  webhook:
    certSecret: cloudtaser-webhook-certs

The secret must contain tls.crt and tls.key:

kubectl create secret tls cloudtaser-webhook-certs \
  --cert=webhook.crt \
  --key=webhook.key \
  --namespace cloudtaser-system

cert-manager integration

If you use cert-manager, create a Certificate resource that targets the webhook service:

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: cloudtaser-webhook
  namespace: cloudtaser-system
spec:
  secretName: cloudtaser-webhook-certs
  dnsNames:
    - cloudtaser-operator.cloudtaser-system.svc
    - cloudtaser-operator.cloudtaser-system.svc.cluster.local
  issuerRef:
    name: cluster-issuer
    kind: ClusterIssuer

Vault TLS

The wrapper validates the vault server certificate on every connection. If your vault uses a private CA, mount the CA bundle into workload pods:

deployment-with-vault-ca.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  template:
    metadata:
      annotations:
        cloudtaser.io/inject: "true"
        cloudtaser.io/vault-address: "https://vault.eu.example.com"
        cloudtaser.io/vault-role: "cloudtaser"
        cloudtaser.io/secret-paths: "secret/data/myapp/config"
        cloudtaser.io/env-map: "password=PGPASSWORD"
    spec:
      containers:
        - name: myapp
          image: myapp:latest
          volumeMounts:
            - name: vault-ca
              mountPath: /etc/ssl/certs/vault-ca.crt
              subPath: ca.crt
      volumes:
        - name: vault-ca
          configMap:
            name: vault-ca-bundle

Create the ConfigMap containing the CA certificate:

kubectl create configmap vault-ca-bundle \
  --from-file=ca.crt=vault-ca.pem \
  --namespace default

Production Checklist

Use this checklist before going live with CloudTaser in production.

Pre-production checklist

Infrastructure

  • [ ] Vault hosted in EU region with TLS enabled
  • [ ] Vault HA configured (3+ nodes with Raft or external storage)
  • [ ] Network connectivity verified between cluster and vault
  • [ ] Kubernetes cluster running 1.28+ with kernel 5.15+

Operator

  • [ ] HA mode enabled (operator.ha: true, replicaCount: 3)
  • [ ] Leader election enabled (operator.leaderElect: true)
  • [ ] Resource limits configured appropriately for workload volume
  • [ ] Webhook failurePolicy: Fail (default; do not change to Ignore in production)
  • [ ] Webhook TLS certificates managed (self-signed or cert-manager)

eBPF Agent

  • [ ] DaemonSet running on all nodes (DESIRED == READY)
  • [ ] enforceMode: true (not audit-only)
  • [ ] reactiveKill: true for high-security workloads
  • [ ] All nodes running kernel 5.15+ for full feature set
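The enforcement toggles in the checklist above are typically set through Helm values. A sketch -- the ebpf.enforceMode and ebpf.reactiveKill key names are assumptions based on the chart's ebpf block, so check your chart's values reference:

```yaml
# values.yaml fragment (key names assumed, not confirmed)
ebpf:
  enforceMode: true    # enforce, not audit-only
  reactiveKill: true   # for high-security workloads
```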

Security

  • [ ] NetworkPolicies applied (auto or manual)
  • [ ] RBAC hardened -- pod read access restricted to operators only
  • [ ] Protected workloads in dedicated namespaces
  • [ ] Vault audit logging enabled

Monitoring

  • [ ] Prometheus scraping operator metrics (port 8080)
  • [ ] eBPF agent health endpoints monitored (port 9090)
  • [ ] Alerting rules configured for operator down, eBPF missing, webhook errors
  • [ ] Protection score monitoring active

Validation

  • [ ] cloudtaser validate passes all checks
  • [ ] cloudtaser audit shows expected coverage
  • [ ] Test secret injection with a sample workload before rolling out to production services

Next Steps