Placement Contract

Bloodraven uses node labels and taints to control where MySQL pods run and to evict application workloads during failover. This page describes the labeling contract cluster administrators must satisfy.

Why node labeling matters

When the operator performs a failover, it applies a NoExecute taint to the nodes at the old active site, which causes Kubernetes to evict application pods that do not tolerate the taint. Taints are scoped per failover group, so a failover in one group does not disrupt pods belonging to another group.

Required node labels

Each site declares the exact nodes it controls with spec.sites[].taintNodeSelector:

spec:
  sites:
    - name: iad
      zone: us-east-1a
      taintNodeSelector:
        shipstream.io/failover-group.orders: "true"
        shipstream.io/site.orders: iad

The selector is required. Bloodraven does not infer taint targets from the failover group name or site name.

Apply matching labels to every node that should receive the site's taint:

kubectl label node node-iad-1 \
  shipstream.io/failover-group.orders=true \
  shipstream.io/site.orders=iad

kubectl label node node-pdx-1 \
  shipstream.io/failover-group.orders=true \
  shipstream.io/site.orders=pdx

You can label multiple nodes per site. The operator taints or untaints every node matching the selector for that site.
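
The labels on node-pdx-1 only take effect once the pdx site declares a matching selector. A minimal sketch of that site entry, mirroring the iad example; the zone value here is a placeholder assumption, not taken from this page:

```yaml
spec:
  sites:
    - name: pdx
      zone: us-west-2a   # assumption: substitute the real zone of the pdx nodes
      taintNodeSelector:
        # Must match the labels applied to node-pdx-1 with kubectl label.
        shipstream.io/failover-group.orders: "true"
        shipstream.io/site.orders: pdx
```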

Taint behavior

When a site loses its primary role, the operator applies a per-group taint to all nodes selected by that site's taintNodeSelector:

shipstream.io/db-readonly-<group>=true:NoExecute

For example, for a failover group named orders:

shipstream.io/db-readonly-orders=true:NoExecute

When a site becomes the active primary, the operator removes this taint from its selected nodes.
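
As an illustration, after the orders group fails away from iad, a selected node would carry the taint in its spec roughly as follows. This is a trimmed sketch using the standard Kubernetes Node fields and the node name from the labeling example, not output captured from a real cluster:

```yaml
apiVersion: v1
kind: Node
metadata:
  name: node-iad-1
  labels:
    shipstream.io/failover-group.orders: "true"
    shipstream.io/site.orders: iad
spec:
  taints:
    # Applied by the operator when iad loses the primary role for "orders";
    # removed again when iad becomes the active primary.
    - key: shipstream.io/db-readonly-orders
      value: "true"
      effect: NoExecute
```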

Effect on pods

The NoExecute effect means:

Pods that do not tolerate the taint are immediately evicted
Pods that do tolerate the taint continue running

This gives you two categories of workloads:

| Workload type | Toleration | Behavior on failover |
| --- | --- | --- |
| Write-dependent apps | No toleration for `shipstream.io/db-readonly-<group>:NoExecute` | Evicted from the old site, rescheduled to the new active site |
| Read-only / stateless apps | Tolerates `shipstream.io/db-readonly-<group>:NoExecute` | Continues running at both sites |
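
A read-only workload opts in by declaring the toleration in its pod template. The following is a minimal sketch for the orders group; the Deployment name, labels, and image are hypothetical placeholders:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: orders-report-viewer        # hypothetical workload name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: orders-report-viewer
  template:
    metadata:
      labels:
        app: orders-report-viewer
    spec:
      tolerations:
        # Tolerate the per-group taint so a failover does not evict this pod.
        - key: shipstream.io/db-readonly-orders
          operator: Exists
          effect: NoExecute
      containers:
        - name: viewer
          image: registry.example.com/orders-report-viewer:1.0   # placeholder image
```

A write-dependent workload needs no extra configuration: omitting the toleration is what allows the taint to evict it.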

Shared-node support

Because taints and selectors are scoped per failover group, multiple groups can share the same physical nodes. Put one label pair per failover group on each shared node:

kubectl label node node-iad-1 \
  shipstream.io/failover-group.orders=true \
  shipstream.io/site.orders=iad \
  shipstream.io/failover-group.inventory=true \
  shipstream.io/site.inventory=iad

Then configure each group with its own selector:

# orders MysqlFailoverGroup
spec:
  sites:
    - name: iad
      taintNodeSelector:
        shipstream.io/failover-group.orders: "true"
        shipstream.io/site.orders: iad

# inventory MysqlFailoverGroup
spec:
  sites:
    - name: iad
      taintNodeSelector:
        shipstream.io/failover-group.inventory: "true"
        shipstream.io/site.inventory: iad

A failover in orders applies only shipstream.io/db-readonly-orders=true:NoExecute. Pods belonging to inventory keep running on the shared nodes as long as they tolerate that taint.

Cross-group tolerations

On shared nodes, a write-dependent application pod must tolerate the taint key of every other failover group that shares the node, but not the key of its own group. This ensures the pod is evicted only by its own group's failover:

# Pod for the "orders" group on nodes shared with "inventory"
spec:
  tolerations:
    - key: shipstream.io/db-readonly-inventory
      operator: Exists
      effect: NoExecute
    # Do not tolerate shipstream.io/db-readonly-orders.
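
The mirror-image case, sketched for an inventory pod on the same shared nodes, swaps the keys:

```yaml
# Pod for the "inventory" group on nodes shared with "orders"
spec:
  tolerations:
    - key: shipstream.io/db-readonly-orders
      operator: Exists
      effect: NoExecute
    # Do not tolerate shipstream.io/db-readonly-inventory.
```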

A mutating admission webhook or Helm template can automate generating these tolerations across groups.

Scheduling MySQL pods

The operator schedules MySQL pods using the site's zone field, matched against the topology.kubernetes.io/zone node label. The taintNodeSelector only controls which application nodes are tainted; it is not used to select nodes for MySQL pods.
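
For the iad site shown earlier, the resulting placement constraint is roughly equivalent to the pod-spec fragment below. This is a sketch under the assumption that the operator uses a plain nodeSelector; this page does not say whether it uses nodeSelector or node affinity:

```yaml
# Illustrative fragment of a MySQL pod spec for site "iad".
spec:
  nodeSelector:
    # Value comes from spec.sites[].zone.
    topology.kubernetes.io/zone: us-east-1a
```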
