Planned failover

Planned failover is the admin-triggered graceful switchover path. It is the operational-hygiene counterpart to emergency failover: use it for maintenance windows, rolling kernel upgrades on the active site's nodes, or any time you want to move the primary role to a specific replica without relying on the operator's automatic detection.

TL;DR

kubectl annotate mysqlfailovergroup orders \
  bloodraven.shipstream.io/planned-failover=pdx

The operator fences the current primary, waits for pdx to catch up on the fenced source's GTID set, promotes it, flips DNS, and clears the fence. When spec.dragonfly.enabled=true, the same request also waits for the target Dragonfly replica to catch up and promotes it before MySQL promotion. Status lands on .status.plannedFailover.phase: Succeeded with transactionsLost: 0.

Why not kubectl exec?

Manual promotion via three kubectl exec ... SET GLOBAL ... commands (see Operations) works but has four real problems:

  1. No atomicity — a dropped connection or typo mid-sequence leaves the cluster fenced-on-both-sides or promoting a replica that has not drained its relay logs.
  2. No lag gate — nothing mechanical checks that the target is caught up before you flip read_only=0.
  3. No audit trail: bloodraven_failovers_total does not increment, no Event fires, and no CR status reflects what you did.
  4. Bypasses the anti-flap cooldown — a switchover immediately before a real emergency exposes the cluster to a window where the operator refuses to take automatic action.

The annotation API closes all four.

Lifecycle

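Pending ──► Validating ──► Draining ──► WaitingForLag ──► WaitingForDragonflySync ──► PromotingDragonfly ──► Promoting ──► Resuming ──► Succeeded

Side path:
  Validating ──► Deferred (cooldown active, onCooldown=defer) ──► re-enters Validating

Exits to Failed:
  Validating ── reject ──► Failed
  Draining ── rollback ──► Failed
  WaitingForLag ── rollback ──► Failed
  WaitingForDragonflySync ── rollback/proceed ──► Failed on rollback, next phase on proceed
  PromotingDragonfly ── rollback/proceed ──► Failed on rollback, next phase on proceed
  Promoting ── fail ──► Failed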

WaitingForDragonflySync and PromotingDragonfly are skipped when Dragonfly is disabled.
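
The happy-path ordering, including the Dragonfly skip, can be sketched as a small lookup. This is an illustration only, not the operator's code: the function name next_phase and the dragonfly_enabled parameter are hypothetical, and failure exits are deliberately left out.

```python
from typing import List, Optional

# Happy-path phase order, copied from the lifecycle diagram.
PHASES: List[str] = [
    "Pending", "Validating", "Draining", "WaitingForLag",
    "WaitingForDragonflySync", "PromotingDragonfly",
    "Promoting", "Resuming", "Succeeded",
]
DRAGONFLY_PHASES = {"WaitingForDragonflySync", "PromotingDragonfly"}


def next_phase(current: str, dragonfly_enabled: bool) -> Optional[str]:
    """Return the next happy-path phase, or None if `current` is terminal."""
    if current in ("Succeeded", "Failed"):
        return None
    if current == "Deferred":
        # Deferred requests re-enter validation once the cooldown has passed.
        return "Validating"
    idx = PHASES.index(current)  # raises ValueError for an unknown phase
    for candidate in PHASES[idx + 1:]:
        # Skip the two Dragonfly phases when Dragonfly is disabled.
        if dragonfly_enabled or candidate not in DRAGONFLY_PHASES:
            return candidate
    return None
```

With Dragonfly disabled, next_phase("WaitingForLag", False) goes straight to "Promoting".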

| Phase | What the operator is doing |
|---|---|
| Pending | Annotation observed; status block initialised. |
| Validating | Checking target is primary-candidate, replicating, read-only, no concurrent restore/update, cooldown not active. |
| Deferred | spec.plannedFailover.onCooldown: defer is set and cooldown is still active. Annotation is retained; reconciler re-validates at status.plannedFailover.retryAfter. |
| Draining | Two steps, one per reconcile: (1) SET GLOBAL super_read_only=ON on source + record sourceGtidAtFence, strip role=primary pod label so the -primary Service sheds endpoints; (2) KillAppConnections in a loop until idle or drainTimeout elapses. |
| WaitingForLag | Polling the target's GTID_EXECUTED until it contains the source's fenced GTID set. |
| WaitingForDragonflySync | When Dragonfly is enabled, captures the source Dragonfly replication offset and polls the target until it catches up or spec.dragonfly.plannedFailover.maxSyncWait expires. |
| PromotingDragonfly | When Dragonfly is enabled, strips the source Dragonfly traffic label, promotes the target with REPLTAKEOVER, updates role/traffic labels, and best-effort kills old-master clients so they reconnect through the active Service. |
| Promoting | Running the same 8-step sequence as emergency failover: kill app connections, relay-log drain, STOP REPLICA, RESET REPLICA ALL, capture promotion GTID, clear super_read_only, SET GLOBAL read_only=0, DNS flip. |
| Resuming | Persisting status.activeSite, status.lastFailover, status.promotionGtidExecuted; releasing the topology-manager guard. |
| Succeeded | Terminal success. Status block retained until the next annotation replaces it. |
| Failed | Terminal failure. See rollback behaviour below. |

One phase transition per reconcile means operator restarts always land on a well-defined observable state — if the operator crashes during WaitingForLag, the next reconcile resumes the wait from the persisted sourceGtidAtFence.
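
Because the wait depends only on the persisted sourceGtidAtFence and the target's current GTID_EXECUTED, the lag gate can be thought of as a pure containment check. The sketch below shows that check in Python under stated assumptions: it handles only classic untagged GTID sets of the form uuid:1-5:7, and the function names are hypothetical. The operator itself may well delegate this to MySQL's GTID_SUBSET() function instead.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # inclusive [start, end]


def parse_gtid_set(text: str) -> Dict[str, List[Interval]]:
    """Parse 'uuid:1-5:7,uuid2:3-4' into {uuid: [(1, 5), (7, 7)], ...}."""
    result: Dict[str, List[Interval]] = {}
    for part in text.replace("\n", "").split(","):
        part = part.strip()
        if not part:
            continue
        uuid, *ranges = part.split(":")
        intervals = result.setdefault(uuid.lower(), [])
        for r in ranges:
            start, _, end = r.partition("-")
            intervals.append((int(start), int(end or start)))
    return result


def _merge(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge overlapping or adjacent ones."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def contains(target: str, source: str) -> bool:
    """True when every transaction in `source` is also in `target`."""
    have = {uuid: _merge(iv) for uuid, iv in parse_gtid_set(target).items()}
    for uuid, wanted in parse_gtid_set(source).items():
        got = have.get(uuid, [])
        for start, end in wanted:
            if not any(g0 <= start and end <= g1 for g0, g1 in got):
                return False
    return True
```

A reconcile in WaitingForLag would then amount to evaluating contains(target_gtid_executed, source_gtid_at_fence) and advancing only when it returns True.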

Spec-level defaults

Cluster-wide knobs live on the CR so common overrides don't have to be spelled into every kubectl annotate:

spec:
  plannedFailover:
    maxLagWait: 5m          # max time to wait for the target to catch up before rolling back
    drainTimeout: 30s       # upper bound on KillAppConnections polling after super_read_only=ON
    onCooldown: reject      # one of: "reject" (default) or "defer"
  dragonfly:
    plannedFailover:
      maxSyncWait: 30s        # Dragonfly target sync / REPLTAKEOVER budget
      onSyncTimeout: proceed  # one of: "proceed" (default) or "fail"

All fields are optional; omitting plannedFailover is equivalent to {maxLagWait: 5m, drainTimeout: 30s, onCooldown: reject}. Omitting dragonfly.plannedFailover is equivalent to {maxSyncWait: 30s, onSyncTimeout: proceed} when Dragonfly is enabled.
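
The defaulting rule above amounts to merging user-supplied fields over a fixed set of defaults. A minimal sketch, assuming the spec is available as a plain dictionary; the function name effective_settings and the output key dragonflyPlannedFailover are hypothetical and exist only for this illustration.

```python
from typing import Any, Dict, Mapping

# Defaults as documented for spec.plannedFailover and
# spec.dragonfly.plannedFailover.
PLANNED_FAILOVER_DEFAULTS = {
    "maxLagWait": "5m",
    "drainTimeout": "30s",
    "onCooldown": "reject",
}
DRAGONFLY_DEFAULTS = {"maxSyncWait": "30s", "onSyncTimeout": "proceed"}


def effective_settings(spec: Mapping[str, Any]) -> Dict[str, Any]:
    """Overlay user-supplied fields on the documented defaults."""
    planned = {**PLANNED_FAILOVER_DEFAULTS, **(spec.get("plannedFailover") or {})}
    result: Dict[str, Any] = {"plannedFailover": planned}
    dragonfly = spec.get("dragonfly") or {}
    # Dragonfly defaults only apply when Dragonfly is enabled.
    if dragonfly.get("enabled"):
        result["dragonflyPlannedFailover"] = {
            **DRAGONFLY_DEFAULTS,
            **(dragonfly.get("plannedFailover") or {}),
        }
    return result
```

Setting only onCooldown: defer therefore leaves maxLagWait at 5m and drainTimeout at 30s.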

  • maxLagWait — how long WaitingForLag polls the target before rolling back. Set shorter than the default when you know the cluster is caught up and want to fail fast.
  • drainTimeout — upper bound on the Draining phase's KillAppConnections loop. After super_read_only=ON, the reconciler kills stragglers every second until the count reaches zero or the budget expires; advancing to WaitingForLag happens either way so a stuck client cannot block the switchover.
  • onCooldown — what happens when Validating rejects the request because the anti-flap cooldown is active:
    • reject (default): stamp Failed{CooldownActive} and clear the annotation. The admin must re-annotate after the cooldown expires.
    • defer: stamp Deferred, keep the annotation in place, and re-try validation automatically at status.plannedFailover.retryAfter. Useful when you want to queue a switchover immediately after an emergency event without manually waiting out the cooldown.
  • dragonfly.plannedFailover.maxSyncWait — how long WaitingForDragonflySync waits for the target Dragonfly replica to reach the source offset. The same duration is used as the REPLTAKEOVER timeout.
  • dragonfly.plannedFailover.onSyncTimeout — what happens when Dragonfly sync or promotion cannot prove session preservation:
    • proceed (default): continue MySQL promotion and stamp status.plannedFailover.dragonfly.sessionsPreserved=false.
    • fail: roll back before MySQL promotion, release the MySQL fence, and leave status.activeSite unchanged.
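
The drainTimeout behaviour described in the list above (kill stragglers every second, stop at zero or when the budget expires, advance either way) can be sketched as a bounded loop. This is an illustration, not the reconciler's implementation: kill_app_connections is a hypothetical callable that kills application connections and returns how many remain, and the sleep and clock parameters exist only so the loop can be exercised without real waiting.

```python
import time
from typing import Callable


def drain(kill_app_connections: Callable[[], int],
          drain_timeout_s: float,
          interval_s: float = 1.0,
          sleep: Callable[[float], None] = time.sleep,
          clock: Callable[[], float] = time.monotonic) -> int:
    """Kill stragglers until none remain or the budget expires.

    Returns the number of connections still open. The caller advances to
    WaitingForLag regardless of the return value, so a stuck client cannot
    block the switchover.
    """
    deadline = clock() + drain_timeout_s
    while True:
        remaining = kill_app_connections()
        if remaining == 0 or clock() >= deadline:
            return remaining
        sleep(interval_s)
```

A non-zero return value corresponds to the case where Draining advances with a message noting the remaining connections.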

Per-request override

The annotation value may carry a maxLagWait override, mirroring the reclone-annotation key=value grammar:

kubectl annotate mysqlfailovergroup orders \
  bloodraven.shipstream.io/planned-failover=pdx:maxLagWait=30s

Useful when you know the target is caught up and want to fail fast rather than wait five minutes for the default timeout.
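
To make the value format concrete, here is a sketch of how a value such as pdx:maxLagWait=30s could be split into a target and an override. The exact grammar the operator accepts is not spelled out here, so treat this as an assumption: it accepts a site name optionally followed by a single maxLagWait=<duration> pair, with durations built from h, m and s units. The function names are hypothetical.

```python
import re
from typing import Dict, Tuple

_UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}
_DURATION_RE = re.compile(r"(\d+)([hms])")


def parse_duration(text: str) -> int:
    """Parse '30s', '5m' or '1m30s' into whole seconds."""
    pos, total = 0, 0
    for match in _DURATION_RE.finditer(text):
        if match.start() != pos:
            break  # gap between components, e.g. 'x30s'
        total += int(match.group(1)) * _UNIT_SECONDS[match.group(2)]
        pos = match.end()
    if not text or pos != len(text):
        raise ValueError(f"invalid duration: {text!r}")
    return total


def parse_annotation(value: str) -> Tuple[str, Dict[str, int]]:
    """Split 'pdx:maxLagWait=30s' into ('pdx', {'maxLagWait': 30})."""
    target, _, rest = value.partition(":")
    if not target:
        raise ValueError("missing target site")
    overrides: Dict[str, int] = {}
    if rest:
        key, sep, raw = rest.partition("=")
        # Only maxLagWait is documented as a per-request override.
        if key != "maxLagWait" or not sep:
            raise ValueError(f"unsupported override: {rest!r}")
        overrides[key] = parse_duration(raw)
    return target, overrides
```

A bare value such as pdx yields no overrides, so the spec-level maxLagWait applies.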

Status

Inspect progress:

kubectl get mysqlfailovergroup orders \
  -o jsonpath='{.status.plannedFailover}{"\n"}'

Example Succeeded block:

status:
  plannedFailover:
    phase: Succeeded
    target: pdx
    sourcePrimary: iad
    sourceGtidAtFence: "abc-...:1-9182731"
    targetGtidAtPromotion: "abc-...:1-9182731"
    startTime: "2026-04-20T14:32:00Z"
    completionTime: "2026-04-20T14:32:47Z"
    durationSeconds: 47
    transactionsLost: 0
    dragonfly:
      enabled: true
      sessionsPreserved: true
      promotionMethod: REPLTAKEOVER
      syncWaitSeconds: 3
    message: "promoted pdx, 0 transactions lost"
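
For scripted maintenance windows it can help to block until the status block reaches a terminal phase. The sketch below is one way to do that from Python, under stated assumptions: wait_for_terminal and kubectl_fetch are hypothetical helpers, the timeout and interval values are arbitrary, and the fetcher reads the whole object with kubectl get -o json rather than the jsonpath form shown above.

```python
import json
import subprocess
import time
from typing import Any, Callable, Dict

TERMINAL_PHASES = {"Succeeded", "Failed"}


def kubectl_fetch(name: str = "orders") -> Dict[str, Any]:
    """Return .status.plannedFailover for the named group (needs kubectl)."""
    out = subprocess.run(
        ["kubectl", "get", "mysqlfailovergroup", name, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out).get("status", {}).get("plannedFailover", {})


def wait_for_terminal(fetch_status: Callable[[], Dict[str, Any]],
                      timeout_s: float = 600.0,
                      interval_s: float = 2.0,
                      sleep: Callable[[float], None] = time.sleep,
                      clock: Callable[[], float] = time.monotonic) -> Dict[str, Any]:
    """Poll until phase is Succeeded or Failed, or raise TimeoutError."""
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        if status.get("phase") in TERMINAL_PHASES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"still in phase {status.get('phase')!r}")
        sleep(interval_s)
```

Typical use would be status = wait_for_terminal(kubectl_fetch), followed by a check that status["phase"] is Succeeded and status["transactionsLost"] is 0. Deferred is not treated as terminal, so a deferred request keeps being polled until the timeout.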

Rollback and failure modes

Every path out of Draining, WaitingForLag, or Promoting either completes a promotion or restores the old primary to writable. Below, rollback means the operator unfenced the source and left the cluster unchanged.

| Failure | Observable | Rollback? |
|---|---|---|
| Target is unknown / dr-only / not replicating | phase: Failed, reason: TargetUnhealthy or UnknownSite, event PlannedFailoverRejected | no fence applied |
| Anti-flap cooldown active + onCooldown: reject | phase: Failed, reason: CooldownActive, message includes retry after ... | no fence applied |
| Anti-flap cooldown active + onCooldown: defer | phase: Deferred, reason: CooldownActive, retryAfter populated; annotation retained | no fence applied; reconciler retries automatically |
| Admin removes the annotation while Deferred | phase: Failed, reason: Cancelled | no fence applied |
| Concurrent in-place restore or ordered update | phase: Failed, reason: ConcurrentOperation | no fence applied |
| Zero-lag wait times out | phase: Failed, reason: LagTimeout, source unfenced, role=primary label restored, no DNS flip | yes |
| drainTimeout elapses with clients still connected | Draining advances to WaitingForLag anyway with a message noting remaining connections | N/A: source is already fenced, clients get ER_OPTION_PREVENTS_STATEMENT on their next write |
| Source crashes during Draining | phase: Failed, reason: SourceCrashed | hand-off to emergency failover path |
| Dragonfly target fails to catch up and onSyncTimeout=proceed | phase advances through PromotingDragonfly, status.plannedFailover.dragonfly.sessionsPreserved=false | no MySQL rollback; MySQL promotion continues |
| Dragonfly target fails to catch up and onSyncTimeout=fail | phase: Failed, reason: DragonflySyncTimeout, source unfenced, no DNS flip | yes |
| Dragonfly REPLTAKEOVER fails and onSyncTimeout=proceed | status.plannedFailover.dragonfly.sessionsPreserved=false, reason: DragonflyPromotionFailed; MySQL promotion continues | no MySQL rollback |
| Dragonfly REPLTAKEOVER fails and onSyncTimeout=fail | phase: Failed, reason: DragonflyPromotionFailed, source unfenced, no DNS flip | yes |
| FailoverController.Execute fails mid-promotion | phase: Failed, reason: ExecuteFailed | no; same failure mode as emergency, manual recovery required |
| DNS flip fails | phase: Succeeded with a warning event | N/A: the Service label swap carries writes to the new primary while external DNS catches up |

The topology-manager's automatic cross-site evaluation is paused while a planned failover is in flight (via the plannedFailoverActive guard). When the planned path stamps Succeeded or Failed, the guard is released and the automatic path resumes.
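
The four Dragonfly rows in the failure table reduce to a small decision function keyed on the failure kind and the onSyncTimeout policy. The sketch below encodes only what the table states. The Outcome type, the kind strings sync_timeout and promotion_failed, and the function name are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

REASONS = {
    "sync_timeout": "DragonflySyncTimeout",
    "promotion_failed": "DragonflyPromotionFailed",
}


@dataclass(frozen=True)
class Outcome:
    mysql_promotion_continues: bool
    source_unfenced: bool  # True means the request rolled back
    reason: Optional[str]


def on_dragonfly_failure(kind: str, on_sync_timeout: str = "proceed") -> Outcome:
    """Map a Dragonfly failure and policy to the documented outcome."""
    if kind not in REASONS:
        raise ValueError(f"unknown failure kind: {kind!r}")
    if on_sync_timeout == "proceed":
        # The table lists a reason only for the REPLTAKEOVER failure row;
        # sessionsPreserved=false is stamped in both proceed cases.
        reason = REASONS[kind] if kind == "promotion_failed" else None
        return Outcome(True, False, reason)
    if on_sync_timeout == "fail":
        return Outcome(False, True, REASONS[kind])
    raise ValueError(f"unknown policy: {on_sync_timeout!r}")
```

Under the default policy, a sync timeout never blocks MySQL promotion; only onSyncTimeout=fail turns a Dragonfly problem into a rollback.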

Anti-flap cooldown

The planned path writes status.lastFailover on success, exactly like the emergency path. This means:

  • A planned failover within spec.failoverCooldown of any prior failover is rejected at Validating.
  • An emergency failover within spec.failoverCooldown of a planned failover is blocked by the same cooldown check in the topology manager.

The default cooldown is 5 minutes; production deployments should keep this, not lower it.
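
The cooldown check and the two onCooldown policies can be sketched as follows. This is a simplified model, not the operator's validation code: it covers only the cooldown portion of Validating, the function names are hypothetical, and the "ok" result simply means the cooldown does not block the request.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


def cooldown_remaining(last_failover: Optional[datetime],
                       cooldown: timedelta,
                       now: datetime) -> timedelta:
    """Time left until another failover is allowed (zero if none)."""
    if last_failover is None:
        return timedelta(0)
    return max(timedelta(0), last_failover + cooldown - now)


def validate_cooldown(last_failover: Optional[datetime],
                      cooldown: timedelta,
                      now: datetime,
                      on_cooldown: str = "reject") -> Tuple[str, Optional[datetime]]:
    """Return (decision, retry_after) for the cooldown part of Validating."""
    remaining = cooldown_remaining(last_failover, cooldown, now)
    if remaining == timedelta(0):
        return "ok", None
    retry_after = now + remaining
    if on_cooldown == "defer":
        return "Deferred", retry_after  # annotation kept, retried later
    return "Failed", retry_after        # reason CooldownActive, annotation cleared


if __name__ == "__main__":
    now = datetime(2026, 4, 20, 14, 32, tzinfo=timezone.utc)
    print(validate_cooldown(now - timedelta(minutes=2), timedelta(minutes=5), now))
```

Because both the planned and emergency paths write status.lastFailover, the same last_failover value feeds the check regardless of which path ran last.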

Observability

Metrics

| Metric | Type | Labels |
|---|---|---|
| bloodraven_planned_failovers_total | counter | target_site, result (success, rejected, failed_timeout, failed_other) |
| bloodraven_planned_failover_duration_seconds | histogram | target_site |
| bloodraven_planned_failover_lag_wait_seconds | histogram | target_site |
| bloodraven_dragonfly_promotions_total | counter | group, target_site, result (success, failed, skipped) |

bloodraven_failovers_total (automatic only) and bloodraven_dns_flips_total retain their existing semantics. Dashboards keyed on bloodraven_failovers_total see no change.

Events

PlannedFailoverStarted, PlannedFailoverSkipped (already-active target), PlannedFailoverRejected, PlannedFailoverDeferred (cooldown + defer policy), PlannedFailoverDraining, PlannedFailoverLagOK, PlannedFailoverCompleted, PlannedFailoverFailed, DragonflyPromotionStarted, DragonflyPromotionCompleted, DragonflyPromotionFailed, and DragonflySyncTimeout.

RPO

Planned failover is the one Bloodraven operation with an RPO of zero by construction. The zero-lag gate at WaitingForLag guarantees the target has replicated every committed transaction on the fenced source before promotion. Any scenario that would produce loss (target falls behind, source crashes mid-drain, catch-up times out) routes to the Failed rollback path instead of promoting a lagging replica. See Durability and RPO.