Complete reference for Bloodraven custom resources (shipstream.io/v1alpha1).
CRDs at a glance
| Kind | Short name | Scope | Description |
|---|---|---|---|
MysqlFailoverGroup | mfg | Namespaced | Core resource. Declares a multi-site MySQL HA cluster with automatic failover. |
MysqlBackup | mb | Namespaced | One backup run. Created ad-hoc or by a schedule CronJob. Tracks phase, location, GTID, and binlog coordinates. |
MysqlBackupVerification | mbv | Namespaced | Periodic restore-of-latest-backup into a throwaway mysqld to prove a backup loads cleanly. See Backup verification. |
MysqlStandbyCluster | msc | Namespaced | Phase 1 of WISHLIST #7. Passive DR relationship descriptor: lives on the DR cluster, continuously monitors the source archive bucket, and stamps readiness conditions. See Multi-cluster DR. |
MysqlFailoverGroup
Complete reference for the MysqlFailoverGroup custom resource (shipstream.io/v1alpha1).
Common fields quick reference
:::danger Destructive and high-risk fields
Review runbooks before changing spec.restoreInPlace, restore-related confirm tokens, reclone controls, and any field that changes spec.sites, spec.credentials, spec.tls, or MySQL images. Restore and reclone operations can replace data; image and credential changes can trigger restarts or failed reconnects.
:::
Spec
Top-level fields
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
image | string | No | mysql:9.6 | MySQL container image |
sidecarImage | string | No | ghcr.io/shipstream/bloodraven-sidecar:0.1.6 | Bloodraven sidecar container image |
sites | []SiteSpec | Yes | -- | Site definitions. At least two sites with role: primary-candidate; additional dr-only sites may be appended. MinItems=2, MaxItems=16. |
secretName | string | No | -- | (Legacy) Name of the Secret containing MySQL credentials (key: dsn). Mutually exclusive with credentials. |
credentials | CredentialsSpec | No | -- | Per-role MySQL credential management. The operator creates MySQL users with role-appropriate privileges and rotates passwords when secrets change. Mutually exclusive with secretName. |
dns | DNSSpec | Yes | -- | DNS configuration for traffic steering via external-dns |
tls | TLSSpec | No | -- | TLS configuration (cert-manager integration) |
pollInterval | duration | No | 2s | How often the operator polls each MySQL instance |
failureThreshold | int | No | 3 | Consecutive failed polls before a site is marked unreachable |
recoveryThreshold | int | No | 2 | Consecutive successful polls before a site is marked writable |
failoverCooldown | duration | No | 5m | Minimum time between automatic failovers |
splitBrainPolicy | SplitBrainPolicySpec | No | -- | Opt-in automated resolution when multiple sites are writable and the operator has no prior failover history. See SplitBrainPolicySpec. |
sidecar | SidecarSpec | No | -- | Sidecar-container knobs (lease timeout, peer check interval, Bloodraven aux endpoint). |
sidecarResources | ResourceRequirements | No | -- | Compute resources (requests/limits) for the sidecar container |
terminationGracePeriodSeconds | int | No | 60 | Grace period for MySQL container shutdown. |
mysqlConf | map[string]string | No | -- | MySQL configuration overrides merged into the generated my.cnf. Changes trigger an ordered rolling restart when updateStrategy: OrderedUpdate is set. |
replication | ReplicationSpec | No | -- | Replication health settings |
updateStrategy | string | No | OrderedUpdate | Update strategy (OrderedUpdate or Recreate). |
cloneTimeout | int | No | 3600 | Timeout in seconds for initial data clone operations |
podLabels | map[string]string | No | -- | Additional labels applied to every MySQL pod. Operator labels take precedence on conflict. |
podAnnotations | map[string]string | No | -- | Additional annotations applied to every MySQL pod. Operator annotations take precedence on conflict. |
serviceTemplate | ServiceTemplate | No | -- | Customizes the Services created by the operator |
extraContainers | []Container | No | -- | Additional containers injected into every MySQL pod (e.g. exporters) |
extraInitContainers | []Container | No | -- | Additional init containers injected after the operator's built-in init container |
backup | BackupSpec | No | -- | Backup configuration (profiles, schedules, retry, PITR, encryption, security contexts). See BackupSpec. |
initFromBackup | InitFromBackupSpec | No | -- | One-shot restore-on-first-boot before normal bootstrap completes. See InitFromBackupSpec. |
restoreInPlace | RestoreInPlaceSpec | No | -- | Re-triggerable destructive restore against the active primary. Bumping confirm to a newer RFC 3339 timestamp re-arms. See RestoreInPlaceSpec. |
plannedFailover | PlannedFailoverSpec | No | -- | Cluster-wide defaults for the graceful planned-failover API (triggered via bloodraven.shipstream.io/planned-failover=<site>). |
dragonfly | DragonflySpec | No | -- | Optional per-site Dragonfly cache/session co-management. When enabled, the operator creates one Dragonfly StatefulSet per site, an active Redis-compatible Service, replication wiring, promotion status, and optional snapshot-restore upgrade support. See DragonflySpec. |
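As a rough worked example of how the polling fields combine, the fragment below uses the documented defaults. The timing arithmetic in the comments is an estimate that assumes each poll completes or times out within one pollInterval, which this reference does not state.

```yaml
# Fragment of a MysqlFailoverGroup spec (documented defaults, shown for illustration).
spec:
  pollInterval: 2s        # each MySQL instance is polled every 2 seconds
  failureThreshold: 3     # 3 consecutive failed polls -> unreachable (roughly 3 x 2s = 6s)
  recoveryThreshold: 2    # 2 consecutive successful polls -> writable (roughly 2 x 2s = 4s)
  failoverCooldown: 5m    # minimum gap between automatic failovers
```

Lowering pollInterval or failureThreshold would shorten detection time at the cost of more sensitivity to transient failures.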
SiteSpec
Each entry in spec.sites defines one site in the failover group.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
name | string | Yes | -- | Unique site identifier (e.g. iad, pdx). Used in Service names and labels. MaxLength=253. |
role | string | No | primary-candidate | Promotion eligibility: primary-candidate (can be auto-promoted on failover) or dr-only (passive replica; never auto-promoted). At least two primary-candidate sites are required. |
zone | string | Yes | -- | Availability zone or region for pod scheduling |
taintNodeSelector | map[string]string | Yes | -- | Label selector used to find nodes that receive this group's db-readonly taint for this site. Use group-scoped labels such as shipstream.io/failover-group.orders=true and shipstream.io/site.orders=iad for shared nodes. |
lbIP | string | Yes | -- | Load balancer IP address. Used for DNS A-record steering on failover. |
storage | StorageSpec | Yes | -- | Persistent storage configuration for this site |
resources | ResourceRequirements | No | -- | Compute resources (requests/limits) for the MySQL container at this site |
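A minimal sketch of a dr-only site appended after the two required primary-candidate sites. The site name dfw, its zone, and its IP address are hypothetical values chosen for illustration.

```yaml
# Fragment of a MysqlFailoverGroup spec; only the appended dr-only entry is shown.
spec:
  sites:
    # ... at least two primary-candidate sites (for example iad and pdx) go here ...
    - name: dfw                    # hypothetical site name
      role: dr-only                # passive replica; never auto-promoted
      zone: us-south-1a            # hypothetical zone
      taintNodeSelector:
        shipstream.io/failover-group.orders: "true"
        shipstream.io/site.orders: dfw
      lbIP: 10.0.3.1               # hypothetical load balancer IP
      storage:
        storageClassName: fast-ssd
        size: 100Gi
```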
StorageSpec
| Field | Type | Required | Description |
|---|---|---|---|
storageClassName | string | Yes | Kubernetes StorageClass name |
size | string | Yes | Storage request size (e.g. 100Gi) |
DNSSpec
| Field | Type | Required | Description |
|---|---|---|---|
hostname | string | Yes | The DNS hostname managed by Bloodraven (e.g. orders.az.example.com). On failover, the operator creates/updates a DNSEndpoint CR with an A record pointing this hostname to the active site's lbIP. Apps should CNAME their DNS to this hostname. |
ttl | int | No | TTL in seconds for the DNS A record (default: 60) |
TLSSpec
| Field | Type | Required | Description |
|---|---|---|---|
issuerRef | IssuerRef | Yes | cert-manager issuer reference |
secretName | string | Yes | Name of the TLS Secret to create |
IssuerRef
| Field | Type | Required | Description |
|---|---|---|---|
name | string | Yes | Issuer or ClusterIssuer name |
kind | string | Yes | Issuer or ClusterIssuer |
ReplicationSpec
| Field | Type | Required | Description |
|---|---|---|---|
maxLagSeconds | int | No | Maximum acceptable replication lag in seconds before the replica is considered unhealthy |
SplitBrainPolicySpec
Opt-in configuration for how the operator should behave when more than
one site is simultaneously writable and the operator cannot infer a
winner from prior failover history. See
Split-brain resolution for a
full description of semantics and tradeoffs.
| Field | Type | Required | Description |
|---|---|---|---|
sitePriorities | []string | No | Ordered list of primary-candidate site names. The operator walks the list and promotes the first entry that is currently writable; every other writable site is fenced. Empty or unset falls back to manual resolution (alert only). MaxItems=16. Entries must reference sites with role primary-candidate. |
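A minimal sketch of an opt-in split-brain policy, assuming the two primary-candidate sites are named iad and pdx as in the examples on this page.

```yaml
# Fragment of a MysqlFailoverGroup spec.
spec:
  splitBrainPolicy:
    sitePriorities:
      - iad   # promoted if it is currently writable
      - pdx   # promoted only if iad is not writable
```

With this list, if both sites are writable and there is no prior failover history, iad is promoted and pdx is fenced. Leaving sitePriorities empty keeps the alert-only manual behavior.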
ServiceTemplate
| Field | Type | Required | Description |
|---|---|---|---|
type | string | No | Kubernetes Service type: ClusterIP (default), LoadBalancer, or NodePort |
annotations | map[string]string | No | Additional annotations applied to every Service managed by the operator |
CredentialsSpec
Exactly one of secretName or credentials must be set. When credentials is used, the operator manages MySQL users on the primary — creating them during first boot via an init script and updating passwords via ALTER USER when the referenced Secrets change.
| Field | Type | Required | Description |
|---|---|---|---|
operatorSecret | string | Yes | Secret for operator and sidecar connections. Required keys: username, password, MYSQL_ROOT_PASSWORD. Optional: MYSQL_REPLICATION_USER, MYSQL_REPLICATION_PASSWORD (default to operator credentials). Grants: ALL PRIVILEGES WITH GRANT OPTION. |
appSecret | string | No | Secret for application read-write connections. Keys: username, password. Grants: ALL PRIVILEGES (no GRANT OPTION, no SUPER). |
readOnlySecret | string | No | Secret for application read-only connections. Keys: username, password. Grants: SELECT, SHOW VIEW, SHOW DATABASES, PROCESS. |
monitorSecret | string | No | Secret for Prometheus exporter connections. Keys: username, password. Grants: PROCESS, REPLICATION CLIENT, SELECT on performance_schema. |
backupSecret | string | No | Secret for backup/restore connections. Keys: username, password. Grants: SELECT, LOCK TABLES, SHOW VIEW, EVENT, TRIGGER, RELOAD, BACKUP_ADMIN, REPLICATION CLIENT. |
BackupSpec
Top-level backup configuration, embedded as spec.backup. Omitting
the field disables backups for this failover group.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
image | string | No | container-registry.oracle.com/mysql/community-server:9.6 | Image containing mysqlsh used for dump / load. |
imagePullSecrets | []LocalObjectReference | No | -- | Pull secrets for the backup image. |
profiles | []BackupProfile | No | -- | Reusable named backup configurations referenced by schedules and MysqlBackup CRs. |
schedules | []BackupSchedule | No | -- | Cron-driven recurring backups; each references a profile by name. Each entry becomes a Kubernetes CronJob owned by this failover group. |
maxLagSecondsForSource | int | No | 300 | Threshold above which the backup reconciler falls back from replica-first source selection to the primary. |
resources | ResourceRequirements | No | -- | Resources for the backup Job's mysqlsh container. |
activeDeadlineSeconds | int | No | 7200 | Wall-clock cap on a single backup Job. |
backoffLimit | int | No | 2 | Job-level backoffLimit applied to backup Jobs. |
retry | BackupRetrySpec | No | -- | Operator-level retries for Failed scheduled MysqlBackup CRs. Independent of Job backoffLimit. |
pitr | PITRSpec | No | -- | Continuous binlog archival for point-in-time recovery. |
podSecurityContext | PodSecurityContext | No | -- | Overrides default hardened pod security context merged onto the defaults. |
containerSecurityContext | SecurityContext | No | -- | Overrides default hardened container security context. |
stagingVolumeSizeLimit | Quantity | No | -- | Cap on the plaintext-staging emptyDir used by backup / restore / verify Jobs. Unset falls through to the node's ephemeral-storage limit; set on shared clusters to keep a large dump from triggering DiskPressure (AUDIT H6). |
BackupProfile (spec.backup.profiles[])
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
name | string | Yes | -- | Profile identifier, [a-z0-9]([-a-z0-9]*[a-z0-9])?. |
storage | BackupStorage | Yes | -- | Tagged union: type: S3 + s3 block, or type: PVC + pvc block. |
dump | DumpOptions | No | -- | util.dumpInstance() tuning (threads, chunk size, compression, schema filters, consistent locking, OCIMDS checks). |
retention | int | No | 7 | Legacy shorthand: max number of successful MysqlBackup CRs to keep. Ignored when retentionPolicy is set. |
retentionPolicy | RetentionPolicy | No | -- | Structured retention: count, maxAgeDays, minKeep floor, maxFailedKeep cap. Replaces the retention shorthand. |
verification | VerificationSpec | No | -- | Opt-in periodic restore-of-this-profile's-latest-backup into a throwaway MySQL instance; renders a CronJob when enabled: true. |
encryption | BackupEncryptionSpec | No | -- | Enables client-side envelope encryption (AES-256-GCM + HKDF-SHA256) for every dump file and, when this profile is also used for PITR, every archived binlog. Passphrase is owned by the operator — rotating or deleting the Secret renders existing ciphertexts unrecoverable. See Backup encryption. |
BackupEncryptionSpec
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
algorithm | string | No | AES-256-GCM | Encryption algorithm. Only AES-256-GCM is currently supported. |
passphraseSecret | PassphraseSecretRef | Yes | -- | Reference to a Secret in the same namespace holding the passphrase. See PassphraseSecretRef. |
PassphraseSecretRef
Reused by BackupEncryptionSpec, BackupDecryptionSpec,
spec.initFromBackup.decryption, and spec.restoreInPlace.decryption.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
name | string | Yes | -- | Name of the Secret. |
key | string | No | passphrase | Key within the Secret that holds the passphrase bytes. Leading and trailing whitespace is stripped when the passphrase is read. |
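A minimal sketch of enabling encryption on a backup profile. The Secret name orders-backup-passphrase and the profile name nightly are hypothetical, and the profile's required storage block is left out because its sub-fields are not described in this section.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: orders-backup-passphrase     # hypothetical Secret name
stringData:
  passphrase: <backup-passphrase>    # "passphrase" is the default key
---
# Fragment of a MysqlFailoverGroup spec (not a complete manifest).
spec:
  backup:
    profiles:
      - name: nightly                # hypothetical profile name
        # storage: (required; omitted here)
        encryption:
          algorithm: AES-256-GCM     # the only supported value
          passphraseSecret:
            name: orders-backup-passphrase
            key: passphrase
```

The Secret must live in the same namespace as the failover group.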
BackupDecryptionSpec
Used on spec.initFromBackup.decryption and
spec.restoreInPlace.decryption to supply the passphrase that decrypts
an encrypted backup. When the source is a same-group mysqlBackupRef
and the field is omitted, the operator falls back to the profile's own
passphraseSecret.
| Field | Type | Required | Description |
|---|---|---|---|
passphraseSecret | PassphraseSecretRef | Yes | Secret holding the passphrase used at backup time. Rotating the Secret to a new value does not re-wrap existing ciphertext; backups encrypted with the old passphrase will fail to decrypt with the new one. |
RetentionPolicy
Fields are independently optional; a successful MysqlBackup is kept if and
only if at least one enabled check says to keep it. minKeep is the safety
floor that prevents a retention sweep from deleting the last good backup
after a long run of failed attempts.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
count | int | No | 0 | Max number of successful CRs to keep. 0 disables count-based pruning. |
maxAgeDays | int | No | 0 | Max age in days of a successful CR before it becomes eligible for pruning. 0 disables age-based pruning. |
minKeep | int | No | 1 | This many newest successful CRs are always kept, regardless of the other knobs. |
maxFailedKeep | int | No | 10 | Cap on the Failed bucket, independent of the success retention policy. |
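A minimal sketch of a structured retention policy. The profile name nightly and the specific numbers are illustrative, not recommendations, and the profile's required storage block is omitted.

```yaml
# Fragment of a MysqlFailoverGroup spec.
spec:
  backup:
    profiles:
      - name: nightly          # hypothetical profile name
        # storage: (required; omitted here)
        retentionPolicy:
          count: 14            # count-based check: newest 14 successful backups
          maxAgeDays: 30       # age-based check: successful backups up to 30 days old
          minKeep: 2           # always keep the 2 newest successful backups
          maxFailedKeep: 5     # keep at most 5 Failed backups
```

Following the keep rule stated above, a successful backup here would be pruned only when it is outside the newest 14, older than 30 days, and not among the 2 newest.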
PITRSpec (spec.backup.pitr)
Enables continuous binary-log archival so restores can target an
arbitrary stopDatetime on top of any retained full dump. See
Backup and restore → Point-in-time recovery
for the end-to-end story.
| Field | Type | Required | Description |
|---|---|---|---|
enabled | bool | No | Turns archival on. Omitting the whole block is equivalent to false. |
profileName | string | Yes when enabled | Name of a spec.backup.profiles[] entry. Binlog objects and per-site manifests live under a binlogs/ subprefix inside that profile's storage. |
maxBinlogSize | string | No | Passed to MySQL as max_binlog_size. Controls rotation cadence (and therefore the RPO gap). Default "100M". |
archivePollInterval | duration | No | Interval for a fallback scan that runs alongside inotify, as a second safety net for detecting binlogs to archive. Default "60s". |
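A minimal sketch of turning on binlog archival with the documented defaults written out explicitly. The profile name nightly is hypothetical and must match an entry in spec.backup.profiles[].

```yaml
# Fragment of a MysqlFailoverGroup spec.
spec:
  backup:
    pitr:
      enabled: true
      profileName: nightly         # hypothetical; must name an existing profile
      maxBinlogSize: "100M"        # passed to MySQL as max_binlog_size
      archivePollInterval: "60s"   # periodic scan alongside inotify
```

A smaller maxBinlogSize rotates binlogs more often, which narrows the RPO gap at the cost of more archived objects.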
InitFromBackupSpec (spec.initFromBackup)
One-shot restore-on-first-boot. The operator runs a restore Job against
the initial primary site before normal bootstrap is considered
complete. Once status.restore.phase == Succeeded, subsequent
reconciles skip the restore even if this field is still set.
| Field | Type | Required | Description |
|---|---|---|---|
source | InitFromBackupSource | Yes | Tagged reference: exactly one of mysqlBackupRef, s3, or pvc. |
loadOptions | LoadOptions | No | util.loadDump() tuning (threads, resetProgress, skipBinlog, loadIndexes, schema filters). |
pointInTime | PointInTimeSpec | No | Post-load binlog replay to a specific target timestamp. Requires the source group to have had spec.backup.pitr.enabled=true at backup time. |
decryption | BackupDecryptionSpec | No | Passphrase to decrypt the source backup and any replayed binlogs. For same-group mysqlBackupRef, the operator falls back to the profile's own passphrase when omitted. |
RestoreInPlaceSpec (spec.restoreInPlace)
Re-triggerable destructive restore against the currently-active
primary. Unlike initFromBackup (one-shot, greenfield), this field is
meant to be edited repeatedly: bumping confirm to a newer RFC 3339
timestamp re-arms another restore. See RestoreInPlaceSpec in the
source for full semantics (full-instance vs. per-schema restores).
| Field | Type | Required | Description |
|---|---|---|---|
confirm | string | Yes | Anti-fat-finger token. Must be an RFC 3339 timestamp strictly greater than status.restoreInPlace.confirmTokenUsed. |
source | InitFromBackupSource | Yes | Tagged reference: one of mysqlBackupRef, s3, or pvc. |
loadOptions | LoadOptions | No | util.loadDump() tuning. For per-schema restores, skipBinlog is force-set to false so the DROP+load replicates. |
pointInTime | PointInTimeSpec | No | Post-load binlog replay. |
decryption | BackupDecryptionSpec | No | Same semantics as InitFromBackupSpec.Decryption. |
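A minimal sketch of arming an in-place restore. The shape of mysqlBackupRef is not documented in this section, so the name sub-field and the backup name are assumptions made for illustration.

```yaml
# Fragment of a MysqlFailoverGroup spec. Destructive: review runbooks first.
spec:
  restoreInPlace:
    # Must be an RFC 3339 timestamp strictly greater than
    # status.restoreInPlace.confirmTokenUsed.
    confirm: "2026-04-15T10:00:00Z"
    source:
      mysqlBackupRef:
        name: <backup-name>        # assumed sub-field; placeholder value
```

To run another restore later, edit confirm to a newer timestamp; leaving it unchanged does not re-arm the restore.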
PointInTimeSpec (spec.initFromBackup.pointInTime, spec.restoreInPlace.pointInTime)
Requests binlog replay on top of the loaded dump. Requires the source
failover group to have had spec.backup.pitr.enabled=true at the
time of the full dump — otherwise there is no archive to replay.
| Field | Type | Required | Description |
|---|---|---|---|
stopDatetime | string | Yes | Replay target. Accepts RFC 3339 (2026-04-15T09:30:00Z) or MySQL's native form (2026-04-15 09:30:00). |
excludeGtids | string | No | Forwarded verbatim as mysqlbinlog --exclude-gtids=... to surgically skip known-bad transactions while replaying everything else. |
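A minimal sketch of a first-boot restore with binlog replay to a target time. The mysqlBackupRef name sub-field and the Secret name are assumptions, and the excludeGtids value is a placeholder in the form that mysqlbinlog --exclude-gtids accepts.

```yaml
# Fragment of a MysqlFailoverGroup spec.
spec:
  initFromBackup:
    source:
      mysqlBackupRef:
        name: <backup-name>                    # assumed sub-field; placeholder value
    pointInTime:
      stopDatetime: "2026-04-15T09:30:00Z"     # RFC 3339 form
      excludeGtids: "<source-uuid>:<txn-id>"   # optional; placeholder GTID set
    decryption:
      passphraseSecret:
        name: orders-backup-passphrase         # hypothetical Secret name
```

The replay only works if the source group had spec.backup.pitr.enabled=true when the full dump was taken.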
DragonflySpec (spec.dragonfly)
Optional cache/session sidekick management. Omitting this field or setting
enabled: false preserves MySQL-only behavior and removes previously
managed Dragonfly resources.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
enabled | bool | No | false | Creates and manages one Dragonfly StatefulSet per site when true. |
image | string | Yes when enabled | -- | Pinned Dragonfly image. :latest is rejected. Use Dragonfly v1.38.0+. |
port | int | No | 6379 | Redis-compatible client port. |
adminPort | int | No | 9999 | Admin/control port used by the operator. |
maxMemoryMb | int | No | -- | Renders Dragonfly --maxmemory=<value>mb when non-zero. |
proactorThreads | int | No | -- | Renders Dragonfly --proactor_threads=<value> when non-zero. |
args | []string | No | -- | Extra Dragonfly args. Operator-owned flags such as --port, --admin_port, --requirepass, --dir, S3 flags, and --break_replication_on_master_restart are filtered out. |
auth | DragonflyAuthSpec | No | -- | Secret reference for the Dragonfly password. The operator renders --requirepass and authenticates its own connections. |
resources | ResourceRequirements | No | -- | Compute resources for each Dragonfly container. |
plannedFailover | DragonflyPlannedFailoverSpec | No | {maxSyncWait: 30s, onSyncTimeout: proceed} | Planned-failover sync and timeout behavior. |
snapshot | DragonflySnapshotSpec | No | -- | Native Dragonfly snapshot directory and optional S3-compatible settings for planned snapshot-restore upgrades. |
DragonflyAuthSpec
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
secretName | string | Yes | -- | Secret containing the Dragonfly password. |
passwordKey | string | No | password | Secret key containing the password. |
DragonflyPlannedFailoverSpec
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
maxSyncWait | duration | No | 30s | Maximum time to wait for target Dragonfly replica catch-up. Also used as the REPLTAKEOVER timeout. |
onSyncTimeout | string | No | proceed | proceed continues MySQL promotion and records sessionsPreserved=false; fail rolls back before MySQL promotion. |
DragonflySnapshotSpec
| Field | Type | Required | Description |
|---|---|---|---|
dir | string | No | Passed to Dragonfly as --dir. Use s3://bucket[/prefix] for S3-compatible snapshot/restore support. |
serviceAccountName | string | No | ServiceAccount assigned to Dragonfly pods for cloud IAM access. |
credentialsSecretName | string | No | Secret projected as AWS credential environment variables for S3-compatible snapshot access. |
s3Endpoint | string | No | S3-compatible endpoint passed as --s3_endpoint. |
s3UseHTTPS | bool | No | Passed as --s3_use_https when set. |
s3SignPayload | bool | No | Passed as --s3_sign_payload when set. |
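A minimal sketch of a Dragonfly block that uses password auth and S3-backed snapshots. The Secret names, bucket, and memory size are hypothetical values chosen for illustration.

```yaml
# Fragment of a MysqlFailoverGroup spec.
spec:
  dragonfly:
    enabled: true
    image: docker.dragonflydb.io/dragonflydb/dragonfly:v1.38.0   # pinned tag; :latest is rejected
    maxMemoryMb: 4096                        # rendered as --maxmemory=4096mb
    auth:
      secretName: dragonfly-auth             # hypothetical Secret name
      passwordKey: password
    plannedFailover:
      maxSyncWait: 30s
      onSyncTimeout: fail                    # roll back instead of losing sessions
    snapshot:
      dir: s3://example-bucket/dragonfly     # hypothetical bucket and prefix
      credentialsSecretName: dragonfly-s3    # hypothetical Secret name
      s3UseHTTPS: true
```

Choosing onSyncTimeout: fail trades availability of the planned failover for session preservation; the default proceed does the opposite.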
Full example
apiVersion: shipstream.io/v1alpha1
kind: MysqlFailoverGroup
metadata:
  name: orders
spec:
  image: mysql:9.6
  sidecarImage: ghcr.io/shipstream/bloodraven-sidecar:latest
  sites:
    - name: iad
      zone: us-east-1a
      taintNodeSelector:
        shipstream.io/failover-group.orders: "true"
        shipstream.io/site.orders: iad
      lbIP: 10.0.1.1
      storage:
        storageClassName: fast-ssd
        size: 100Gi
      resources:
        requests:
          cpu: "2"
          memory: "16Gi"
        limits:
          cpu: "4"
          memory: "16Gi"
    - name: pdx
      zone: us-west-2a
      taintNodeSelector:
        shipstream.io/failover-group.orders: "true"
        shipstream.io/site.orders: pdx
      lbIP: 10.0.2.1
      storage:
        storageClassName: fast-ssd
        size: 100Gi
      resources:
        requests:
          cpu: "2"
          memory: "16Gi"
        limits:
          cpu: "4"
          memory: "16Gi"
  secretName: mysql-credentials
  sidecarResources:
    requests:
      cpu: "100m"
      memory: "64Mi"
    limits:
      cpu: "200m"
      memory: "128Mi"
  dns:
    hostname: orders.az.example.com
    ttl: 60
  tls:
    issuerRef:
      name: letsencrypt
      kind: ClusterIssuer
    secretName: mysql-tls
  pollInterval: 2s
  failureThreshold: 3
  recoveryThreshold: 2
  failoverCooldown: 5m
  mysqlConf:
    innodb_buffer_pool_size: "8G"
    max_connections: "500"
    innodb_flush_log_at_trx_commit: "1"
    sync_binlog: "1"
  replication:
    maxLagSeconds: 300
  updateStrategy: OrderedUpdate
  dragonfly:
    enabled: true
    image: docker.dragonflydb.io/dragonflydb/dragonfly:v1.38.0
    plannedFailover:
      maxSyncWait: 30s
      onSyncTimeout: proceed
  cloneTimeout: 3600
  podLabels:
    cost-center: platform
  podAnnotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "9104"
  serviceTemplate:
    type: ClusterIP
    annotations:
      service.beta.kubernetes.io/aws-load-balancer-internal: "true"
  extraContainers:
    - name: mysqld-exporter
      image: prom/mysqld-exporter:latest
      env:
        - name: DATA_SOURCE_NAME
          valueFrom:
            secretKeyRef:
              name: mysql-monitor-creds
              key: dsn
      ports:
        - containerPort: 9104
Full example (credentials mode)
apiVersion: shipstream.io/v1alpha1
kind: MysqlFailoverGroup
metadata:
  name: orders
spec:
  image: mysql:9.6
  sidecarImage: ghcr.io/shipstream/bloodraven-sidecar:latest
  sites:
    - name: iad
      zone: us-east-1a
      taintNodeSelector:
        shipstream.io/failover-group.orders: "true"
        shipstream.io/site.orders: iad
      lbIP: 10.0.1.1
      storage:
        storageClassName: fast-ssd
        size: 100Gi
    - name: pdx
      zone: us-west-2a
      taintNodeSelector:
        shipstream.io/failover-group.orders: "true"
        shipstream.io/site.orders: pdx
      lbIP: 10.0.2.1
      storage:
        storageClassName: fast-ssd
        size: 100Gi
  credentials:
    operatorSecret: mysql-operator-creds
    appSecret: mysql-app-creds
    monitorSecret: mysql-monitor-creds
    backupSecret: mysql-backup-creds
  dns:
    hostname: orders.az.example.com
    ttl: 60
Each referenced Secret must contain username and password keys. The operator secret additionally requires MYSQL_ROOT_PASSWORD:
apiVersion: v1
kind: Secret
metadata:
  name: mysql-operator-creds
stringData:
  username: bloodraven
  password: <operator-password>
  MYSQL_ROOT_PASSWORD: <root-password>
---
apiVersion: v1
kind: Secret
metadata:
  name: mysql-app-creds
stringData:
  username: app
  password: <app-password>
Status
The operator writes the following status fields.
Top-level status
| Field | Type | Description |
|---|---|---|
activeSite | string | Name of the site currently serving as primary |
sites | []SiteStatus | Per-site status |
conditions | []Condition | Standard Kubernetes conditions |
lastFailover | timestamp | When the last failover occurred |
lastFailoverTarget | string | Which site was promoted in the last failover |
promotionGtidExecuted | string | GTID set recorded on the candidate at the moment of the most recent promotion, before it began accepting writes. Used for data-loss accounting. |
updatePhase | string | Current phase of an ordered update; empty when not updating. |
restore | RestoreStatus | In-flight or completed initFromBackup restore (phase, JobName, targetSite, startTime, completionTime, message). |
restoreInPlace | RestoreInPlaceStatus | In-flight or completed in-place restore (phase, JobName, targetSite, scope, confirmTokenUsed, startTime, completionTime, message). Mirrors the state machine described on RestoreInPlaceSpec. |
backupSchedules | []BackupScheduleStatus | Per-schedule rollup (cronJobName, lastScheduleTime, lastSuccessfulTime, lastBackupName, lastBackupPhase, lastSuccessfulBackupName, lastRetryAttempt, nextRetryTime). |
lastBackupTime | timestamp | Completion time of the most recent successful MysqlBackup across all profiles, regardless of whether it was scheduled or on-demand. |
pitr | PITRStatus | Summary of the continuous binlog archive, populated by the operator from periodic polls of each sidecar's /archiver/status endpoint. Present when spec.backup.pitr.enabled=true. |
plannedFailover | PlannedFailoverStatus | Most recent planned (admin-triggered) failover attempt. Retained until a newer annotation replaces it, so kubectl describe can show how a switchover went after the fact. |
dragonfly | DragonflyStatus | Observed Dragonfly subsystem state when spec.dragonfly.enabled=true: active site, phase, per-site roles, reachability, replication health, last promotion, and snapshot-upgrade status. |
SiteStatus
| Field | Type | Description |
|---|---|---|
name | string | Site identifier |
state | string | Current state: writable, read-only, unreachable, or unknown |
lastSeen | timestamp | Last time the operator successfully polled this site |
replicating | bool | Whether this site is replicating from another site |
secondsBehindSource | int | Replication lag in seconds (only set when replicating) |
gtidExecuted | string | GTID set executed on this site |
recoveryState | string | Old-primary recovery state: empty (no recovery needed), RecoveryInProgress (the operator is reconfiguring or waiting for the site to stabilize as a replica), or RecoveryBlocked (divergent transactions detected — trigger a reclone to recover) |
divergentGtid | string | GTID set of transactions on this site that diverge from the current primary. Populated when recoveryState is RecoveryBlocked. |
divergentTransactionCount | int | Number of divergent transactions. Populated when recoveryState is RecoveryBlocked. |
Conditions
| Type | Meaning |
|---|---|
Ready | True when the failover group has one writable primary and all replicas are healthy |
Degraded | True when replication is broken, lag exceeds maxLagSeconds, or a site is unreachable |
RecoveryPending | True while old-primary recovery is running (RecoveryInProgress) or when an old primary has divergent transactions (DivergentTransactions). Use the reclone annotation only for DivergentTransactions. |
Bootstrapping | True when a clone operation is in progress (fresh-deploy, auto-clone, or reclone). Reason indicates the phase. |
DragonflyStatus
| Field | Type | Description |
|---|---|---|
enabled | bool | Mirrors spec.dragonfly.enabled at last observation. |
activeSite | string | Site currently acting as Dragonfly master. Usually matches status.activeSite; may briefly diverge during planned failover. |
phase | string | Disabled, Reconciling, ConfiguringReplication, Ready, Degraded, or Promoting. |
message | string | Human-readable status summary. |
lastPromotionTime | timestamp | Last successful Dragonfly promotion time. |
lastPromotionTarget | string | Site promoted in the last successful Dragonfly promotion. |
sites | []DragonflySiteStatus | Per-site Dragonfly observation. |
upgrade | DragonflyUpgradeStatus | Snapshot-restore upgrade progress when requested by annotation. |
DragonflySiteStatus
| Field | Type | Description |
|---|---|---|
name | string | Site identifier. |
role | string | master, replica, stale-master, unconfigured, unreachable, or unknown. |
reachable | bool | Whether the operator completed an INFO replication call on the latest poll. |
serviceName | string | Site-local Dragonfly Service name. |
podName | string | Dragonfly pod name when known. |
replicationState | string | Raw Dragonfly role value from INFO replication. |
linkStatus | string | Replica master_link_status; empty on a master. |
syncInProgress | bool | Whether Dragonfly reports a full sync in progress. |
lastIOSecondsAgo | int | Seconds since the replica last received data from its master; -1 means never. |
ready | bool | Operator-level readiness for the Dragonfly role. |
message | string | Human-readable per-site status. |
Annotations
| Annotation | Description |
|---|---|
bloodraven.shipstream.io/reclone-site | Triggers a CLONE INSTANCE from the current primary to the named site. Value format depends on the target's state: <site>:confirm=<group-name> for a cold reclone (no divergent GTID recorded), or <site>:<gtid-prefix> when status.sites[].divergentGtid is non-empty — the prefix must be 8+ characters and match the observed divergent GTID. Both forms act as a fat-finger interlock (cold path added by AUDIT L3 / WISHLIST #5). Invalid annotations are rejected with a RecloneRejected Warning Event and cleared. Valid ones emit RecloneRequested and report progress via the Bootstrapping condition. |
bloodraven.shipstream.io/planned-failover | Starts a planned failover to the named site. When Dragonfly is enabled, the planned-failover status includes a dragonfly substatus with session-preservation details. |
bloodraven.shipstream.io/dragonfly-snapshot-upgrade | Starts the Dragonfly snapshot-restore upgrade workflow to the target image. Requires spec.dragonfly.snapshot to be configured. |
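A sketch of how two of these annotations might look on a MysqlFailoverGroup named orders, assuming a site named pdx. The two fragments are alternatives for different operations, not meant to be applied together.

```yaml
# Fragment: request a planned failover to site pdx.
metadata:
  name: orders
  annotations:
    bloodraven.shipstream.io/planned-failover: pdx
---
# Fragment: request a cold reclone of site pdx (no divergent GTID recorded).
# The confirm value must equal the failover group name.
metadata:
  name: orders
  annotations:
    bloodraven.shipstream.io/reclone-site: "pdx:confirm=orders"
```

When status.sites[].divergentGtid is non-empty for the target, the reclone value takes the <site>:<gtid-prefix> form instead, with a prefix of at least 8 characters copied from the observed divergent GTID.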
Example status
status:
  activeSite: iad
  sites:
    - name: iad
      state: writable
      lastSeen: "2025-01-01T00:00:00Z"
      replicating: false
    - name: pdx
      state: read-only
      lastSeen: "2025-01-01T00:00:00Z"
      replicating: true
      secondsBehindSource: 0
      gtidExecuted: "uuid:1-100"
  conditions:
    - type: Ready
      status: "True"
    - type: Degraded
      status: "False"
  lastFailover: "2025-01-01T00:00:00Z"
  lastFailoverTarget: iad
  dragonfly:
    enabled: true
    activeSite: iad
    phase: Ready
    message: all sites ready
    sites:
      - name: iad
        role: master
        reachable: true
        ready: true
      - name: pdx
        role: replica
        reachable: true
        linkStatus: up
        ready: true
MysqlStandbyCluster
Phase 1 of WISHLIST #7. A passive DR relationship descriptor that
lives on the DR cluster and continuously monitors a source
MysqlFailoverGroup's backup archive in a shared object store. The
controller populates status.discovered and stamps readiness conditions
on a configurable cadence. No mysqld is started; no dump is loaded in
Phase 1.
See Multi-cluster DR for the end-to-end
recovery runbook and a full explanation of what Phase 1 does and does not
provide.
Top-level spec fields
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
transport | string | No | ObjectStore | Transport mode. Only ObjectStore is honored in v1alpha1; Network is reserved. |
source | StandbySource | Yes | -- | Identifies the source archive: bucket, prefix, profile name, optional decryption. |
template | StandbyFailoverGroupTemplate | Yes | -- | Embedded MysqlFailoverGroupSpec the controller will materialize on activation (Phase 3). Required (not optional) so the full activated topology is validated at standby-CR-create-time while the cluster is calm, rather than failing late mid-incident during a promote. Declare it now, not during an outage. |
freshness | StandbyFreshnessSpec | No | -- | Phase 1 bucket-discovery cadence (discoveryInterval only). |
StandbySource
| Field | Type | Required | Description |
|---|---|---|---|
failoverGroupName | string | Yes | Source MysqlFailoverGroup name. Informational; used in events and status. |
namespace | string | No | Source MFG namespace in its own cluster. Informational. |
cluster | string | No | Free-form source cluster identifier (e.g. us-west-prod). Informational. |
storage | BackupStorage | Yes | Object-store backend (same shape as spec.backup.profiles[].storage). The controller refuses to write to it. |
profileName | string | Yes | Backup profile name under which dumps and binlogs are stored. |
decryption | BackupDecryptionSpec | No | Passphrase Secret to decrypt source artifacts. Must exist in the DR namespace. |
StandbyFreshnessSpec
| Field | Type | Default | Description |
|---|---|---|---|
discoveryInterval | duration | 5m | How often the controller re-scans the bucket to refresh status.discovered. Minimum 30s. |
The Phase 2 verification cadence and staleness knobs (verifySchedule,
verifyTimeZone, maxStaleness, suspend, retentionFloorRefresh) and the
Phase 3 spec.activate block are not part of v1alpha1. They will be added
back (backward-compatibly) when the code that consumes them ships. Until then
they are absent from the structural schema, so they have no effect: the API
server prunes them (they are silently dropped and never persisted), and clients
using strict server-side field validation — the default for kubectl apply/create since Kubernetes 1.25 — reject them outright.
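A partial sketch of a Phase 1 MysqlStandbyCluster that shows only the scalar fields documented above. The resource name, namespace, profile name, and Secret name are hypothetical. The required source.storage and template blocks are deliberately left as comments because their sub-fields are not described in this section, so this fragment is not appliable as written.

```yaml
apiVersion: shipstream.io/v1alpha1
kind: MysqlStandbyCluster
metadata:
  name: orders-standby               # hypothetical name
spec:
  transport: ObjectStore             # only value honored in v1alpha1
  source:
    failoverGroupName: orders        # informational
    namespace: databases             # hypothetical; informational
    cluster: us-west-prod            # informational
    profileName: nightly             # hypothetical profile name
    # storage: (required; same shape as spec.backup.profiles[].storage)
    decryption:
      passphraseSecret:
        name: orders-backup-passphrase   # hypothetical; must exist in the DR namespace
  # template: (required; embedded MysqlFailoverGroupSpec, omitted here)
  freshness:
    discoveryInterval: 5m            # minimum 30s
```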
Status
| Field | Type | Description |
|---|---|---|
discovered | StandbyDiscovered | Most recent successful bucket scan: dump location, name, completion time, GTID set, size, and binlog window timestamps. Encryption detection (BRV1 header) is deferred to Phase 2. |
lastVerified | StandbyLastVerified | Most recent terminal MysqlBackupVerification owned by this CR (Phase 2). |
activation | StandbyActivationStatus | One-shot promote audit trail: phase, source/target GTIDs, PITR stop datetime, replayed binlog count, active site, reason, message (Phase 3). |
materializedFailoverGroup | string | Name of the MysqlFailoverGroup created during activation. Empty until Activated (Phase 3). |
conditions | []Condition | Standard Kubernetes conditions. See below. |
Conditions
| Type | Phase populated | Meaning |
|---|---|---|
BucketReadable | Phase 1 | Bucket scan succeeded within the last discovery interval. Reason: ListSucceeded (True), ListFailed (False), ScanIncomplete (False — the List call did not complete before the scan deadline; a context deadline/cancel during List, distinct from a genuine ListFailed. The previous status.discovered is preserved), AuthFailed (False — store construction failed, e.g. bad credentials), or ConfigError (False — missing/invalid spec fields). |
SourceConfigKnown | Phase 1 | At least one full dump and one binlog manifest were found. Reason: DumpFound (True), NoDumpFound (False), NoBinlogManifests (False — dump found, no PITR window yet: a dump exists but no binlog manifests are under <prefix>/binlogs/, so recovery is limited to the dump. This is the expected state for a dump-only source or a brand-new source, not a misconfiguration), MetadataUnreadable (False — dump @.json present but genuinely malformed/cannot be parsed), ScanIncomplete (False — the scan was cut short by a context deadline/cancel before reading every dump @.json or all manifests; the partial selection is not published, the previous status.discovered is preserved as last-known-good, and no BucketScanned event is emitted), or ConfigError (False — propagated from storage backend error). The default kubectl get printcolumn surfaces this as the neutral SourceKnown. |
Restorable | Phase 2 | Most recent owned MysqlBackupVerification is Succeeded and within a future staleness threshold (Phase 2). |
ActivationInProgress | Phase 3 | True while activation phase is non-terminal. |
Active | Phase 3 | True when status.activation.phase == Activated. |