Kubernetes Operators: Extending the Platform

April 2, 2018

Kubernetes manages containerized applications well, but complex stateful applications need more than container orchestration. Databases require backups, schema migrations, and replica management. Message queues need partition rebalancing. Monitoring systems need configuration across clusters.

Operators extend Kubernetes to manage these complex applications automatically. They encode human operational knowledge into software.

What Operators Are

The Concept

An operator is a controller that:

  1. Watches for custom resources (your application-specific objects)
  2. Compares desired state to actual state
  3. Takes actions to reconcile differences

User → Custom Resource → Operator → Kubernetes Resources
        (desired state)             (actual state)

Custom Resources

Custom Resource Definitions (CRDs) extend the Kubernetes API:

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: databases.example.com
spec:
  group: example.com
  names:
    kind: Database
    plural: databases
    singular: database
  scope: Namespaced
  versions:
  - name: v1
    served: true
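    storage: true
    subresources:
      status: {}  # lets the controller write status through the /status endpoint
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              engine:
                type: string
                enum: ["postgres", "mysql"]
              version:
                type: string
              replicas:
                type: integer
          status:
            type: object
            properties:
              readyReplicas:
                type: integer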

Users create instances:

apiVersion: example.com/v1
kind: Database
metadata:
  name: orders-db
spec:
  engine: postgres
  version: "14"
  replicas: 3
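Controllers receive custom resources as JSON and decode them into Go structs whose json tags mirror the CRD schema. The following is a minimal stdlib-only sketch. It assumes hypothetical Database and DatabaseSpec types shaped like the schema above, and a real project would use generated API types and a client instead. It also shows why version is quoted in the manifest: an unquoted 14 is a YAML integer, and a number cannot fill a string field.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical Go types mirroring the Database CRD schema; the json tags
// must match the field names declared in openAPIV3Schema.
type DatabaseSpec struct {
	Engine   string `json:"engine"`
	Version  string `json:"version"`
	Replicas int32  `json:"replicas"`
}

type Database struct {
	APIVersion string `json:"apiVersion"`
	Kind       string `json:"kind"`
	Metadata   struct {
		Name string `json:"name"`
	} `json:"metadata"`
	Spec DatabaseSpec `json:"spec"`
}

// decodeDatabase parses the JSON form of a Database object.
func decodeDatabase(raw []byte) (Database, error) {
	var db Database
	err := json.Unmarshal(raw, &db)
	return db, err
}

func main() {
	// JSON equivalent of the orders-db manifest above.
	raw := []byte(`{
		"apiVersion": "example.com/v1",
		"kind": "Database",
		"metadata": {"name": "orders-db"},
		"spec": {"engine": "postgres", "version": "14", "replicas": 3}
	}`)

	db, err := decodeDatabase(raw)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s: %s %s x%d\n", db.Metadata.Name, db.Spec.Engine, db.Spec.Version, db.Spec.Replicas)

	// An unquoted version (a JSON number) does not fit the string field.
	_, err = decodeDatabase([]byte(`{"spec": {"version": 14}}`))
	fmt.Println("numeric version rejected:", err != nil)
}
```

In a real cluster the API server rejects the numeric version earlier, during schema validation, so the controller never sees it.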

Control Loop

Operators implement the reconciliation loop:

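func (r *DatabaseReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    // 1. Fetch the Database custom resource; NotFound means it was deleted
    var db examplev1.Database
    if err := r.Get(ctx, req.NamespacedName, &db); err != nil {
        return ctrl.Result{}, client.IgnoreNotFound(err)
    }

    // 2. Check current state (the get/scale helpers are application-specific)
    currentReplicas, err := r.getCurrentReplicaCount(ctx, &db)
    if err != nil {
        return ctrl.Result{}, err
    }

    // 3. Reconcile to desired state; a returned error triggers a retry with backoff
    if currentReplicas < db.Spec.Replicas {
        err = r.scaleUp(ctx, &db)
    } else if currentReplicas > db.Spec.Replicas {
        err = r.scaleDown(ctx, &db)
    }
    if err != nil {
        return ctrl.Result{}, err
    }

    // 4. Update status through the status subresource
    db.Status.ReadyReplicas = r.getReadyReplicas(ctx, &db)
    if err := r.Status().Update(ctx, &db); err != nil {
        return ctrl.Result{}, err
    }

    // Requeue to re-check periodically even when no new events arrive
    return ctrl.Result{RequeueAfter: time.Minute}, nil
}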

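The reconciler runs whenever a watched resource changes, and RequeueAfter schedules a periodic re-check. Together they keep actual state converging toward desired state, even if an event is missed.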
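Reconciliation compares states instead of reacting to individual events, so each pass only has to move actual state closer to desired state. If anything is left to do, the pass asks to be requeued. The stdlib-only simulation below shows the loop converging. Its hypothetical reconcileOnce changes one replica per pass, which is a choice made for this sketch and not something the code above requires.

```go
package main

import "fmt"

// reconcileOnce performs one bounded step toward the desired replica count
// and reports whether another pass (a requeue) is needed. It only looks at
// desired and current state, never at which event triggered it.
func reconcileOnce(desired, current int32) (next int32, requeue bool) {
	switch {
	case current < desired:
		current++ // scale up by one replica
	case current > desired:
		current-- // scale down by one replica
	}
	return current, current != desired
}

// runUntilConverged repeats reconcileOnce, as the controller's work queue
// would on requeue, and returns the final count and the number of passes.
func runUntilConverged(desired, current int32) (int32, int) {
	passes := 0
	for {
		passes++
		var requeue bool
		current, requeue = reconcileOnce(desired, current)
		if !requeue {
			return current, passes
		}
	}
}

func main() {
	final, passes := runUntilConverged(3, 1)
	fmt.Println(final, passes) // 3 2
}
```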

Why Operators Matter

Encoded Operational Knowledge

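Consider PostgreSQL. Running it in production means setting up replication, promoting a replica when the primary fails, scheduling backups, and performing version upgrades. Each of these traditionally relies on an experienced DBA.

A PostgreSQL operator encodes this knowledge behind a declarative resource: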

apiVersion: postgres.example.com/v1
kind: PostgresCluster
metadata:
  name: my-cluster
spec:
  version: "14"
  instances: 3
  backup:
    schedule: "0 * * * *"  # Hourly
    retention: 7d

The operator handles replication, failover, backups, and upgrades automatically.

Day 2 Operations

Day 1 (initial deployment) is often easy. Day 2 (ongoing operations such as upgrades, backups, scaling, and failure recovery) is hard.

Operators automate these Day 2 operations.

Self-Healing

Operators continuously reconcile:

Database replica crashes →
Operator detects missing replica →
Operator creates replacement →
Operator configures replication →
Cluster healthy again

Recovery happens automatically without human intervention.
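The self-healing flow above follows from the same desired-versus-actual comparison. The stdlib-only sketch below works out which replicas to recreate and which to remove from what it observes. It assumes StatefulSet-style stable names (name-0, name-1, and so on) and uses a hypothetical planPods helper. Configuring replication on the replacement is out of scope here.

```go
package main

import (
	"fmt"
	"sort"
)

// planPods compares the desired replica count with the pod names that
// currently exist and returns the names to create and to delete. Names
// follow the StatefulSet convention <name>-<ordinal>.
func planPods(name string, desired int, existing map[string]bool) (create, remove []string) {
	want := make(map[string]bool, desired)
	for i := 0; i < desired; i++ {
		pod := fmt.Sprintf("%s-%d", name, i)
		want[pod] = true
		if !existing[pod] {
			create = append(create, pod) // missing replica: recreate it
		}
	}
	for pod := range existing {
		if !want[pod] {
			remove = append(remove, pod) // surplus replica: remove it
		}
	}
	sort.Strings(remove) // map iteration order is random; sort for stable output
	return create, remove
}

func main() {
	// orders-db-1 crashed and is gone; orders-db-3 is left over from a scale-down.
	existing := map[string]bool{"orders-db-0": true, "orders-db-2": true, "orders-db-3": true}
	create, remove := planPods("orders-db", 3, existing)
	fmt.Println(create, remove) // [orders-db-1] [orders-db-3]
}
```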

Building Operators

Operator SDK

The Operator SDK simplifies operator development:

# Create new operator project
operator-sdk init --domain=example.com --repo=github.com/example/db-operator

# Create API and controller
operator-sdk create api --group=database --version=v1 --kind=Database

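This scaffolds a Go module with a manager entry point, the Go types for the Database API, and a controller stub. It also generates Kustomize manifests under config/ (CRDs, RBAC, and the manager Deployment) and a Makefile that regenerates manifests with controller-gen.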

Kubebuilder

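Kubebuilder, a Kubernetes SIG API Machinery subproject, provides the same style of scaffolding. The Operator SDK's Go support is built on top of it:

kubebuilder init --domain=example.com --repo=github.com/example/db-operator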
kubebuilder create api --group=database --version=v1 --kind=Database

Both generate Go-based operators. The Operator SDK can also build Ansible- and Helm-based operators. For other languages, consider Kopf (Python) or the Java Operator SDK.

Implementation Pattern

A typical operator:

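// types.go - Define the custom resource (kubebuilder markers drive CRD generation)
type DatabaseSpec struct {
    // +kubebuilder:validation:Enum=postgres;mysql
    Engine   string `json:"engine"`
    Version  string `json:"version"`
    Replicas int32  `json:"replicas"`
}

type DatabaseStatus struct {
    Phase         string             `json:"phase,omitempty"`
    ReadyReplicas int32              `json:"readyReplicas,omitempty"`
    Conditions    []metav1.Condition `json:"conditions,omitempty"`
}

// +kubebuilder:object:root=true
// +kubebuilder:subresource:status
type Database struct {
    metav1.TypeMeta   `json:",inline"`
    metav1.ObjectMeta `json:"metadata,omitempty"`

    Spec   DatabaseSpec   `json:"spec,omitempty"`
    Status DatabaseStatus `json:"status,omitempty"`
}

// controller.go - Implement reconciliation
func (r *DatabaseReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    // log is sigs.k8s.io/controller-runtime/pkg/log
    logger := log.FromContext(ctx).WithValues("database", req.NamespacedName)

    // Fetch the resource; ignore NotFound because it may have been deleted
    var db databasev1.Database
    if err := r.Get(ctx, req.NamespacedName, &db); err != nil {
        return ctrl.Result{}, client.IgnoreNotFound(err)
    }

    // Create or update the StatefulSet that runs the database pods
    if err := r.reconcileStatefulSet(ctx, &db); err != nil {
        logger.Error(err, "failed to reconcile StatefulSet")
        return ctrl.Result{}, err
    }

    // Create or update the Service that exposes them
    if err := r.reconcileService(ctx, &db); err != nil {
        logger.Error(err, "failed to reconcile Service")
        return ctrl.Result{}, err
    }

    // Update status from the observed state of the owned resources
    return r.updateStatus(ctx, &db)
}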

Best Practices

Idempotency: Reconciliation must be safe to run multiple times.
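In practice, idempotency means "ensure" rather than "create". The reconciler looks up the object, creates it if it is missing, updates it only if it has drifted, and otherwise does nothing. The stdlib sketch below uses a hypothetical in-memory Store in place of the API server. In a real operator, controller-runtime's controllerutil.CreateOrUpdate plays this role.

```go
package main

import "fmt"

// Spec is a stand-in for the fields an operator manages on a child object.
type Spec struct {
	Image    string
	Replicas int
}

// Store is a hypothetical in-memory replacement for the API server.
type Store map[string]Spec

// ensure makes the stored object match want and reports what it did.
// Calling it repeatedly with the same input is safe: after the first
// call it returns "unchanged".
func ensure(s Store, name string, want Spec) string {
	got, ok := s[name]
	switch {
	case !ok:
		s[name] = want
		return "created"
	case got != want:
		s[name] = want
		return "updated"
	default:
		return "unchanged"
	}
}

func main() {
	s := Store{}
	want := Spec{Image: "postgres:14", Replicas: 3}
	fmt.Println(ensure(s, "orders-db", want)) // created
	fmt.Println(ensure(s, "orders-db", want)) // unchanged
	s["orders-db"] = Spec{Image: "postgres:14", Replicas: 1} // someone edited it by hand
	fmt.Println(ensure(s, "orders-db", want)) // updated
}
```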

Owned Resources: Set owner references so dependent resources are garbage collected when the Database is deleted:

if err := ctrl.SetControllerReference(&db, &statefulSet, r.Scheme); err != nil {
    return ctrl.Result{}, err
}
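Owner references form a graph that the Kubernetes garbage collector walks. Deleting an owner cascades to every object that lists it as owner, and from those objects to their own dependents. The stdlib model below shows that cascade in simplified form. The Object type and collect function are hypothetical, and the real collector also handles multiple owners, blockOwnerDeletion, and propagation policies.

```go
package main

import (
	"fmt"
	"sort"
)

// Object is a minimal stand-in for a Kubernetes object: an identifier plus
// the identifier of its controlling owner ("" if none).
type Object struct {
	ID    string
	Owner string
}

// collect returns the IDs removed when root is deleted: root itself plus,
// transitively, every object whose owner is being removed.
func collect(objects []Object, root string) []string {
	deleted := map[string]bool{root: true}
	for changed := true; changed; {
		changed = false
		for _, o := range objects {
			if !deleted[o.ID] && deleted[o.Owner] {
				deleted[o.ID] = true
				changed = true
			}
		}
	}
	out := make([]string, 0, len(deleted))
	for id := range deleted {
		out = append(out, id)
	}
	sort.Strings(out)
	return out
}

func main() {
	objects := []Object{
		{ID: "database/orders-db"},
		{ID: "statefulset/orders-db", Owner: "database/orders-db"},
		{ID: "service/orders-db", Owner: "database/orders-db"},
		{ID: "pod/orders-db-0", Owner: "statefulset/orders-db"},
		{ID: "configmap/unrelated"},
	}
	fmt.Println(collect(objects, "database/orders-db"))
}
```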

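Status Conditions: Report status with the standard metav1.Condition type so users and tools can read it consistently:

meta.SetStatusCondition(&db.Status.Conditions, metav1.Condition{
    Type:               "Ready",
    Status:             metav1.ConditionTrue,
    Reason:             "AllReplicasReady",
    Message:            "All replicas are running and ready",
    ObservedGeneration: db.Generation, // the spec generation this status reflects
})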

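Finalizers: Register a finalizer so the operator gets a chance to clean up external resources before Kubernetes removes the object:

// controllerutil is sigs.k8s.io/controller-runtime/pkg/controller/controllerutil
const finalizerName = "database.example.com/finalizer"

if db.DeletionTimestamp.IsZero() {
    // Not being deleted: make sure the finalizer is registered
    if !controllerutil.ContainsFinalizer(&db, finalizerName) {
        controllerutil.AddFinalizer(&db, finalizerName)
        if err := r.Update(ctx, &db); err != nil {
            return ctrl.Result{}, err
        }
    }
} else {
    // Being deleted: clean up first, then release the object
    if controllerutil.ContainsFinalizer(&db, finalizerName) {
        // cleanupExternalResources is application-specific
        if err := r.cleanupExternalResources(ctx, &db); err != nil {
            return ctrl.Result{}, err // retry; the finalizer keeps the object around
        }
        controllerutil.RemoveFinalizer(&db, finalizerName)
        if err := r.Update(ctx, &db); err != nil {
            return ctrl.Result{}, err
        }
    }
    return ctrl.Result{}, nil
}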
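The API server enforces the protocol that makes finalizers work. Deleting an object that still has finalizers only sets its deletionTimestamp, and the object disappears once the finalizer list is empty. The stdlib model below shows that behavior. The apiServer and object types and their methods are hypothetical stand-ins, and removeString is one possible implementation of a string-slice helper.

```go
package main

import "fmt"

// object models only the fields that matter for finalizers.
type object struct {
	finalizers []string
	deleting   bool // stands in for a non-nil deletionTimestamp
}

// apiServer is a hypothetical in-memory model of the deletion protocol.
type apiServer struct {
	objects map[string]*object
}

// requestDelete marks the object as being deleted; it is removed right
// away only if no finalizers remain.
func (s *apiServer) requestDelete(name string) {
	if o, ok := s.objects[name]; ok {
		o.deleting = true
		s.gc(name)
	}
}

// setFinalizers stores a new finalizer list and completes a pending
// deletion once the list is empty.
func (s *apiServer) setFinalizers(name string, finalizers []string) {
	if o, ok := s.objects[name]; ok {
		o.finalizers = finalizers
		s.gc(name)
	}
}

func (s *apiServer) gc(name string) {
	if o := s.objects[name]; o.deleting && len(o.finalizers) == 0 {
		delete(s.objects, name)
	}
}

// removeString returns a copy of list without s.
func removeString(list []string, s string) []string {
	out := make([]string, 0, len(list))
	for _, v := range list {
		if v != s {
			out = append(out, v)
		}
	}
	return out
}

func main() {
	const fin = "database.example.com/finalizer"
	s := &apiServer{objects: map[string]*object{"orders-db": {finalizers: []string{fin}}}}

	s.requestDelete("orders-db")
	_, exists := s.objects["orders-db"]
	fmt.Println("after delete request, still exists:", exists) // true

	// The operator sees deleting == true, cleans up, then drops its finalizer.
	s.setFinalizers("orders-db", removeString(s.objects["orders-db"].finalizers, fin))
	_, exists = s.objects["orders-db"]
	fmt.Println("after finalizer removed, still exists:", exists) // false
}
```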

Operator Maturity Levels

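The Operator Framework defines five capability (maturity) levels, which OperatorHub.io displays for each listed operator:

Level 1: Basic Install. Automated application provisioning and configuration management.

Level 2: Seamless Upgrades. Patch and minor version upgrades.

Level 3: Full Lifecycle. Application and storage lifecycle, including backup and failure recovery.

Level 4: Deep Insights. Metrics, alerts, log processing, and workload analysis.

Level 5: Auto Pilot. Horizontal and vertical scaling, automatic configuration tuning, and abnormality detection.

Most operators start at Level 1 or 2. The higher levels require significant engineering investment.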

When to Build vs. Use

Use Existing Operators

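Mature operators already exist for many popular databases, message queues, and monitoring systems, and OperatorHub.io is a good place to start looking.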

Prefer mature operators unless you have specific requirements.

Build When

Buy When

Operational Considerations

Operator Deployment

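Deploy the operator like any other workload, with its own service account and resource limits. Run a single replica unless leader election is enabled, so that only one instance reconciles at a time:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: database-operator
spec:
  replicas: 1  # one active controller; add replicas only with leader election enabled
  selector:
    matchLabels:
      app: database-operator
  template:
    metadata:
      labels:
        app: database-operator  # must match spec.selector
    spec:
      serviceAccountName: database-operator
      containers:
      - name: operator
        image: example/database-operator:v1.0.0
        resources:
          limits:
            memory: 256Mi
            cpu: 500m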

RBAC

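Operators need permissions for every resource they read or write, including subresources such as status:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: database-operator
rules:
- apiGroups: ["database.example.com"]
  resources: ["databases"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
- apiGroups: ["database.example.com"]
  resources: ["databases/status"]  # required for r.Status().Update
  verbs: ["get", "update", "patch"]
- apiGroups: ["database.example.com"]
  resources: ["databases/finalizers"]  # needed where owner-reference permission enforcement is on
  verbs: ["update"]
- apiGroups: ["apps"]
  resources: ["statefulsets"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
- apiGroups: [""]
  resources: ["services"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]

Follow least privilege: grant only the resources and verbs the controller actually uses.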
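RBAC matches subresources separately. A rule for databases does not cover databases/status, so a controller that writes status needs its own rule. The stdlib model below shows this matching in simplified form. The rule type and allowed function are hypothetical, and the sketch omits wildcards, resourceNames, and non-resource URLs.

```go
package main

import "fmt"

// rule is a trimmed-down PolicyRule.
type rule struct {
	apiGroups []string
	resources []string
	verbs     []string
}

func contains(list []string, s string) bool {
	for _, v := range list {
		if v == s {
			return true
		}
	}
	return false
}

// allowed reports whether any rule grants verb on group/resource. The
// resource string includes the subresource, e.g. "databases/status", and
// is matched exactly.
func allowed(rules []rule, group, resource, verb string) bool {
	for _, r := range rules {
		if contains(r.apiGroups, group) && contains(r.resources, resource) && contains(r.verbs, verb) {
			return true
		}
	}
	return false
}

func main() {
	rules := []rule{{
		apiGroups: []string{"database.example.com"},
		resources: []string{"databases"},
		verbs:     []string{"get", "list", "watch", "update", "patch"},
	}}
	fmt.Println(allowed(rules, "database.example.com", "databases", "update"))        // true
	fmt.Println(allowed(rules, "database.example.com", "databases/status", "update")) // false
}
```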

Monitoring

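Monitor the operator itself as well as the application it manages. A rising reconcile error rate, growing reconcile latency, or a deep work queue signals a controller that is failing or falling behind.

Operators built with the Operator SDK or Kubebuilder expose these metrics in Prometheus format by default through controller-runtime. Examples include controller_runtime_reconcile_errors_total and workqueue_depth.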

Testing

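Test operators thoroughly. For controller integration tests, consider envtest from controller-runtime, which runs a real etcd and kube-apiserver locally:

testEnv = &envtest.Environment{
    CRDDirectoryPaths:     []string{filepath.Join("..", "config", "crd", "bases")},
    ErrorIfCRDPathMissing: true, // fail fast if the CRD manifests were not generated
}

// Starts only etcd and kube-apiserver: no kubelet or built-in controllers run,
// so a StatefulSet created in a test never produces pods
cfg, err = testEnv.Start()
// ... build a client from cfg, run the tests, then call testEnv.Stop()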

Key Takeaways

Operators represent a powerful pattern for extending Kubernetes. They’re especially valuable for stateful applications that need more than basic container management.