
Backup and restore

[Figure: backup and restore infographic]

Bloodraven drives scheduled and on-demand MySQL backups using the mysqlsh util.dumpInstance() / util.loadDump() utilities against a configurable S3 or PVC destination. Restores come in two forms:

- A bootstrap restore (spec.initFromBackup) populates a brand-new MysqlFailoverGroup from a previous dump before cross-site replication starts.
- An in-place restore (spec.restoreInPlace) rolls an existing, live group back to a previous dump.

:::tip Quick setup guides
Use S3 Backups or PVC Backups for copy-paste setup. This page is the deep behavior reference for source selection, retention, verification, PITR, restore semantics, and failure modes.
:::

Backup decision table

| Need | Use |
| --- | --- |
| Durable off-cluster recovery | S3 profile |
| Local lab or short-lived staging copy | PVC profile |
| Lower RPO than the full-backup cadence | PITR binlog archival |
| Proof that backups load | MysqlBackupVerification |
| Compliance-grade application-level encryption | spec.backup.profiles[].encryption |

Restore workflow checklist

  • Confirm whether this is a new recovery group or a destructive in-place restore.
  • Confirm source artifact, namespace, bucket/PVC path, and decryption Secret.
  • Disable production traffic to the recovery DNS name until validation completes.
  • Verify MySQL starts, expected schemas exist, and application smoke tests pass.
  • Only then move traffic or update application configuration.

Recovery time and sizing considerations

Restore time depends on dump size, compression, object-store or PVC throughput, loadOptions.threads, MySQL startup time, and any PITR replay window. Size staging volumes and verification PVCs for at least the compressed artifact plus restore working space; production teams should measure this with a full-size backup before go-live.

Concepts

  • Backup profile — a named, reusable configuration living on MysqlFailoverGroup.spec.backup.profiles[]. Selects a storage target (S3 or PVC), dump options, and retention.
  • Backup schedule — a cron expression on MysqlFailoverGroup.spec.backup.schedules[] that references a profile by name. Each schedule becomes a Kubernetes CronJob owned by the operator.
  • MysqlBackup CRD — one CR per backup run. Created ad-hoc via kubectl create or by the schedule CronJob. Tracks phase, start / completion times, dump location, size, GTID, and binlog coordinates.
  • Restore: spec.initFromBackup on the failover group gates initial bootstrap on a one-shot util.loadDump() into a dynamically selected target site. To roll back a live group, use spec.restoreInPlace (see In-place restore).
  • Verification — a parallel MysqlBackupVerification CRD and spec.backup.profiles[].verification block that periodically restores the latest Succeeded backup into a throwaway MySQL instance to prove it's actually loadable. See Backup verification.
  • Encryption at rest — opt-in, per-profile spec.backup.profiles[].encryption block that turns on client-side envelope encryption (AES-256-GCM) for every dump artifact and archived PITR binlog. The passphrase lives in a Kubernetes Secret the operator controls, so keys and storage credentials can be kept on separate blast radii. See Backup encryption.

The backup image

The default image is pinned to container-registry.oracle.com/mysql/community-server:9.6. This bundles the mysqlsh binary; the community-shell repository that appears in some docs does not exist in the Oracle registry — a common stumbling block. Production deployments should always pin this explicitly via spec.backup.image and avoid floating tags like :9 or :latest, since mysqlsh dump/load compatibility across versions is not guaranteed.

Backup source selection

The reconciler prefers the replica site as the dump source so the primary is not loaded by long-running backups. It falls back to the primary when the replica is unreachable, not replicating, or lagging beyond spec.backup.maxLagSecondsForSource (default 300). Override per-backup with spec.sourceSiteOverride on a MysqlBackup CR.
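The selection rules above can be sketched as a small pure function. This is a hedged illustration, not the reconciler's code. SiteObservation and select_dump_source are hypothetical names, and treating unknown lag as "not usable" is an assumption. Only the ordering (override, then healthy replica, then primary) and the 300-second default come from this page.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_MAX_LAG_SECONDS = 300  # documented default for spec.backup.maxLagSecondsForSource


@dataclass
class SiteObservation:
    # Hypothetical snapshot of what the reconciler knows about one site.
    name: str
    reachable: bool
    replicating: bool
    lag_seconds: Optional[float]  # None when lag is unknown (assumption: treated as unusable)


def select_dump_source(
    primary: str,
    replica: Optional[SiteObservation],
    max_lag_seconds: int = DEFAULT_MAX_LAG_SECONDS,
    source_site_override: Optional[str] = None,
) -> str:
    """Return the site name to dump from, following the documented preference order."""
    # A per-backup spec.sourceSiteOverride always wins.
    if source_site_override:
        return source_site_override
    # Prefer a healthy replica so long-running dumps do not load the primary.
    if (
        replica is not None
        and replica.reachable
        and replica.replicating
        and replica.lag_seconds is not None
        and replica.lag_seconds <= max_lag_seconds  # "lagging beyond" the limit falls back
    ):
        return replica.name
    # Otherwise fall back to the primary.
    return primary
```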

When ActiveSite changes while a backup is in flight, the backup reconciler emits an InFlightFailover warning event on the CR. This is a soft signal: the artifact is still a valid point-in-time snapshot of the original source, but operators should know a failover happened mid-dump so they can correlate it with any replication-gap alerts.

Security context defaults

Every backup, restore, and cleanup Job pod runs with a hardened pod- and container-level SecurityContext matching the Restricted Pod Security Standard:

| Level | Field | Default |
| --- | --- | --- |
| pod | runAsNonRoot | true |
| pod | runAsUser / runAsGroup | 27 (mysql) |
| pod | fsGroup | 27 |
| pod | seccompProfile.type | RuntimeDefault |
| container | allowPrivilegeEscalation | false |
| container | readOnlyRootFilesystem | true |
| container | runAsNonRoot | true |
| container | capabilities.drop | [ALL] |
| container | seccompProfile.type | RuntimeDefault |

Because readOnlyRootFilesystem=true, the backup Job pod also mounts two ephemeral emptyDir volumes:

- mysqlsh-home at /home/mysqlsh, because mysqlsh needs a writable home for ~/.mysqlsh.
- tmp at /tmp.

HOME is set to /home/mysqlsh so that mysqlsh resolves ~/.mysqlsh to the writable volume.

Users can override individual fields via spec.backup.podSecurityContext and spec.backup.containerSecurityContext. Overrides are merged on top of the defaults: any unset field keeps its default, and any set field wins. You only specify the fields you want to change (e.g. a specific seLinuxOptions). Tightening one field therefore never requires restating the capability and non-root defaults, and cannot accidentally drop them.
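The merge rule can be illustrated with plain dictionaries standing in for the Kubernetes SecurityContext structs. This is a minimal sketch under two assumptions. First, a field counts as "unset" when it is missing or None. Second, the merge happens at top-level field granularity, so a nested struct such as capabilities is replaced whole if you set it. merge_security_context is a hypothetical helper, not the operator's API.

```python
from typing import Any, Dict

# Documented container-level defaults (Restricted Pod Security Standard).
DEFAULT_CONTAINER_SECURITY_CONTEXT: Dict[str, Any] = {
    "allowPrivilegeEscalation": False,
    "readOnlyRootFilesystem": True,
    "runAsNonRoot": True,
    "capabilities": {"drop": ["ALL"]},
    "seccompProfile": {"type": "RuntimeDefault"},
}


def merge_security_context(defaults: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Field-wise merge: unset (missing or None) override fields keep the default."""
    merged = dict(defaults)  # copy so the defaults table is never mutated
    for field, value in overrides.items():
        if value is not None:  # a set field wins outright
            merged[field] = value
    return merged
```

For example, passing {"seLinuxOptions": {"level": "s0:c123,c456"}} adds that one field and leaves the capability drop and non-root defaults intact.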

Credential layout

MySQL credentials are never injected into the backup pod via envFrom. Instead the reconciler derives a Secret per backup run (with MYSQL_USER / MYSQL_PASSWORD keys) and mounts it as files at /run/bloodraven/mysql-creds/ with mode 0400. The embedded Python dump script reads the files via the BLOODRAVEN_MYSQL_CREDS_DIR env var. This keeps plaintext passwords out of /proc/$PID/environ on the pod and makes accidental env-var leaks (e.g. in panic stack traces) much less dangerous.
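On the consuming side, reading file-mounted credentials takes only a few lines. The sketch below assumes each Secret key is projected as a file named after the key (MYSQL_USER, MYSQL_PASSWORD), which is the standard Kubernetes Secret volume layout. Falling back to /run/bloodraven/mysql-creds when BLOODRAVEN_MYSQL_CREDS_DIR is unset is an assumption. read_mysql_credentials is a hypothetical name and is not taken from the shipped dump script.

```python
import os
from pathlib import Path
from typing import Mapping, Tuple


def read_mysql_credentials(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Read MYSQL_USER / MYSQL_PASSWORD from mounted files instead of env vars."""
    creds_dir = Path(env.get("BLOODRAVEN_MYSQL_CREDS_DIR", "/run/bloodraven/mysql-creds"))
    # Strip a trailing newline in case the Secret value was created from a
    # file that ended with one; the password itself never enters the environment.
    user = (creds_dir / "MYSQL_USER").read_text().rstrip("\n")
    password = (creds_dir / "MYSQL_PASSWORD").read_text().rstrip("\n")
    return user, password
```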

In credentials mode (spec.credentials), the derived Secret reads username/password directly from spec.credentials.backupSecret (or falls back to operatorSecret if no dedicated backup secret is configured). In legacy mode (spec.secretName), the credentials are parsed from the DSN in the referenced Secret.

For S3 profiles the same pattern applies to AWS credentials: the profile's credentialsSecret is mounted at /run/bloodraven/aws-creds/ (mode 0400), pointed at by BLOODRAVEN_AWS_CREDS_DIR, and the dump script assembles a standard ~/.aws/credentials file on the fly so the AWS SDK inside mysqlsh picks it up natively.

S3 backup profile

```yaml
apiVersion: shipstream.io/v1alpha1
kind: MysqlFailoverGroup
metadata:
  name: orders
  namespace: orders
spec:
  image: mysql:9.6
  secretName: mysql-credentials
  # ... sites, dns, sidecar omitted ...
  backup:
    image: container-registry.oracle.com/mysql/community-server:9.6
    maxLagSecondsForSource: 120
    retry:
      maxAttempts: 3
      initialBackoffSeconds: 60
      maxBackoffSeconds: 1800
    profiles:
      - name: nightly-s3
        retentionPolicy:
          count: 14
          maxAgeDays: 30
          minKeep: 1
          maxFailedKeep: 10
        storage:
          type: S3
          s3:
            bucket: shipstream-backups
            prefix: orders
            region: us-east-1
            credentialsSecret: s3-backup-creds
        dump:
          threads: 8
          bytesPerChunk: "128M"
          compression: zstd
          consistent: true
    schedules:
      - name: nightly
        profileName: nightly-s3
        schedule: "0 2 * * *"
        timeZone: "America/Los_Angeles"
```

The referenced s3-backup-creds Secret must contain AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and optionally AWS_SESSION_TOKEN / AWS_REGION. The keys are each mounted as a separate file under /run/bloodraven/aws-creds/ and the dump script stitches them into a standard credentials file at startup.
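Stitching per-key files into the AWS shared files can be sketched with configparser. This is an illustration under the following assumptions:

- Everything is written to the [default] profile.
- AWS_REGION goes to ~/.aws/config, where the standard region setting lives, rather than into the credentials file.
- The credentials file is restricted to mode 0600.

write_aws_files is a hypothetical helper, not the actual dump script.

```python
import configparser
from pathlib import Path
from typing import Optional

# Secret keys (one mounted file each) mapped to their ~/.aws/credentials names.
CREDENTIAL_KEYS = {
    "AWS_ACCESS_KEY_ID": "aws_access_key_id",
    "AWS_SECRET_ACCESS_KEY": "aws_secret_access_key",
    "AWS_SESSION_TOKEN": "aws_session_token",  # optional
}


def _read_optional(path: Path) -> Optional[str]:
    return path.read_text().strip() if path.exists() else None


def write_aws_files(creds_dir: Path, home: Path) -> None:
    """Assemble ~/.aws/credentials (and ~/.aws/config for region) from mounted keys."""
    aws_dir = home / ".aws"
    aws_dir.mkdir(parents=True, exist_ok=True)

    creds = configparser.ConfigParser()
    creds["default"] = {}
    for secret_key, ini_key in CREDENTIAL_KEYS.items():
        value = _read_optional(creds_dir / secret_key)
        if value:
            creds["default"][ini_key] = value
    for required in ("aws_access_key_id", "aws_secret_access_key"):
        if required not in creds["default"]:
            raise ValueError(f"missing required credential {required}")
    credentials_path = aws_dir / "credentials"
    with open(credentials_path, "w") as fh:
        creds.write(fh)
    credentials_path.chmod(0o600)

    # Region is a config-file setting, not a credential.
    region = _read_optional(creds_dir / "AWS_REGION")
    if region:
        cfg = configparser.ConfigParser()
        cfg["default"] = {"region": region}
        with open(aws_dir / "config", "w") as fh:
            cfg.write(fh)
```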

TimeZone

schedules[].timeZone is an IANA zone name (default Etc/UTC) that is passed to the CronJob spec's .timeZone field. Without it, kube-controller-manager interprets the schedule in its own local time zone, which depends on the environment. Two clusters running the same manifest could then fire the same cron at different wall-clock times. Pinning the time zone per schedule makes backup scheduling reproducible no matter where the control plane runs.

MinIO or other S3-compatible stores

Set storage.s3.endpointURL to the target endpoint (e.g. https://minio.internal:9000). The reconciler maps this onto mysqlsh's s3EndpointOverride option.
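The mapping from an S3 profile onto mysqlsh dump options might look like the sketch below. It returns the (outputUrl, options) pair that MySQL Shell's Python-mode util.dump_instance() accepts. The option names s3BucketName, s3Region, s3EndpointOverride, threads, bytesPerChunk, compression, and consistent are real dump-utility options. The prefix/<backup-name> object layout and the build_dump_call helper are assumptions made for illustration.

```python
from typing import Any, Dict, Tuple


def build_dump_call(profile: Dict[str, Any], backup_name: str) -> Tuple[str, Dict[str, Any]]:
    """Translate a profile dict into (outputUrl, options) for util.dump_instance()."""
    s3 = profile["storage"]["s3"]
    dump = profile.get("dump", {})
    # With s3BucketName set, outputUrl is the object prefix inside the bucket.
    # Assumption: each backup lands under <prefix>/<backup-name>.
    output_url = f"{s3['prefix'].rstrip('/')}/{backup_name}"
    options: Dict[str, Any] = {"s3BucketName": s3["bucket"]}
    if "region" in s3:
        options["s3Region"] = s3["region"]
    if "endpointURL" in s3:
        # MinIO and other S3-compatible stores: the documented mapping.
        options["s3EndpointOverride"] = s3["endpointURL"]
    for key in ("threads", "bytesPerChunk", "compression", "consistent"):
        if key in dump:
            options[key] = dump[key]
    return output_url, options
```

Inside mysqlsh --py, the result would be passed as util.dump_instance(output_url, options).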

PVC-backed profile

```yaml
backup:
  profiles:
    - name: daily-local
      retention: 7
      storage:
        type: PVC
        pvc:
          storageClassName: fast
          size: 50Gi
      dump:
        compression: zstd
```

If pvc.claimName is empty the operator provisions a PVC named mysql-<fg>-backup-<profile>. Otherwise the user-managed claim is mounted read-write into the backup Job at /backups.

Structured retention

Each profile can use the structured retentionPolicy field instead of the shorthand retention: N int:

| Field | Meaning | Default |
| --- | --- | --- |
| count | Max successful CRs to keep. 0 disables count-based pruning. | 0 |
| maxAgeDays | Max age, in days, of a successful CR before it becomes eligible for pruning. | 0 |
| minKeep | Safety floor: this many newest successful CRs are always kept. | 1 |
| maxFailedKeep | Max Failed CRs to keep per profile. | 10 |

A successful CR is kept if any of the enabled checks says "keep":

- it is inside the count window,
- it is inside the maxAgeDays window, or
- it is within the minKeep floor.

minKeep is the critical safety knob. If every attempt during a long outage fails, every successful backup can age out of the maxAgeDays window. minKeep then stops a retention sweep from wiping the last good backup.

The legacy shorthand retention: 7 is still supported and maps to (count=7, minKeep=1, maxFailedKeep=10).
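The "keep if any enabled check says keep" rule, the minKeep floor, and the maxFailedKeep cap can be sketched as one pure function. BackupCR, RetentionPolicy, and prune_candidates are hypothetical names. The sketch also assumes that when both count and maxAgeDays are 0, successful CRs are never pruned; the page does not state this case explicitly.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class BackupCR:
    name: str
    completed: datetime
    succeeded: bool


@dataclass
class RetentionPolicy:
    count: int = 0          # 0 disables count-based pruning
    max_age_days: int = 0   # 0 treated as "age check disabled" (assumption)
    min_keep: int = 1
    max_failed_keep: int = 10


def prune_candidates(crs: List[BackupCR], policy: RetentionPolicy, now: datetime) -> List[str]:
    """Return the names of CRs a retention sweep would delete."""
    ok = sorted((c for c in crs if c.succeeded), key=lambda c: c.completed, reverse=True)
    failed = sorted((c for c in crs if not c.succeeded), key=lambda c: c.completed, reverse=True)
    doomed = [c.name for c in failed[policy.max_failed_keep:]]

    # Assumption: with both pruning checks disabled, successful CRs are never pruned.
    if policy.count == 0 and policy.max_age_days == 0:
        return doomed
    cutoff = now - timedelta(days=policy.max_age_days)
    for i, cr in enumerate(ok):  # i == 0 is the newest successful CR
        keep = (
            (policy.count > 0 and i < policy.count)
            or (policy.max_age_days > 0 and cr.completed >= cutoff)
            or i < policy.min_keep  # safety floor
        )
        if not keep:
            doomed.append(cr.name)
    return doomed
```

The legacy retention: 7 shorthand corresponds to RetentionPolicy(count=7) here. The second test case shows minKeep protecting a 90-day-old backup after a string of failures.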

Artifact cleanup

When a MysqlBackup CR is deleted (manually or by retention) the reconciler runs a short-lived cleanup Job via the shipstream.io/mysqlbackup finalizer. The cleanup Job uses the same mysqlsh image and credentials layout as the backup Job, but runs cleanup.py instead of dump.py. It dispatches on the BLOODRAVEN_STORAGE_TYPE env var:

  • S3 — invokes util.rmdump(prefix, {s3BucketName: ...}) to recursively delete the dump prefix. "Not found" / "no such key" responses are treated as success.
  • PVC — resolves the dump subdirectory under the mount and rmtrees it. Refuses to delete anything outside the mount root.
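The PVC branch's "refuse anything outside the mount root" guard can be sketched with pathlib. Resolving both paths first collapses ".." segments and symlinks before the containment check. remove_pvc_dump is a hypothetical name and is not taken from cleanup.py. Treating an already-missing directory as success mirrors the S3 branch's "not found" handling.

```python
import shutil
from pathlib import Path


def remove_pvc_dump(mount_root: str, dump_subdir: str) -> bool:
    """Delete a dump directory under the PVC mount; refuse anything outside it.

    Returns True if something was removed, False if it was already gone.
    """
    root = Path(mount_root).resolve()
    target = (root / dump_subdir).resolve()  # collapses ".." and follows symlinks
    # The target must sit strictly inside the mount root, never be the root itself.
    if target == root or root not in target.parents:
        raise ValueError(f"refusing to delete {target}: outside {root}")
    if not target.exists():
        return False  # already gone counts as success
    shutil.rmtree(target)
    return True
```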

Events:

  • ArtifactCleanupStarted — cleanup Job created.
  • ArtifactCleanupSucceeded — artifact removed or already gone.
  • ArtifactCleanupFailed — cleanup Job failed. The finalizer blocks the CR deletion until either the next attempt succeeds or an operator force-deletes by removing the shipstream.io/mysqlbackup finalizer by hand.
  • ArtifactCleanupSkipped — the referenced failover group or profile is gone, so cleanup cannot run.

Retries for scheduled backups

spec.backup.retry configures operator-level retries for scheduled CRs that land in Failed. This is independent of the Job-level backoffLimit: Job backoff retries the container inside a single CR, whereas this retries the whole CR and produces a fresh MysqlBackup object with its own Job and attempt counter.

```yaml
backup:
  retry:
    maxAttempts: 3 # total attempts including the original
    initialBackoffSeconds: 60
    maxBackoffSeconds: 1800
```

Backoff is exponential: the Nth retry waits initialBackoffSeconds * 2^(N-1) seconds from the failed CR's completion time, capped at maxBackoffSeconds. spec.backup.retry is ignored for ad-hoc CRs. Those are created by a user and do not participate in automatic retries.
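The backoff formula is easy to check numerically. In this sketch, retry_delay_seconds and retries_allowed are hypothetical names. The defaults match the example values above (initialBackoffSeconds: 60, maxBackoffSeconds: 1800, maxAttempts: 3).

```python
def retry_delay_seconds(n: int, initial: int = 60, cap: int = 1800) -> int:
    """Delay before the Nth retry (n >= 1), measured from the failed CR's completion."""
    if n < 1:
        raise ValueError("retry numbers start at 1")
    return min(initial * 2 ** (n - 1), cap)


def retries_allowed(max_attempts: int) -> int:
    """maxAttempts counts the original run, so retries = maxAttempts - 1."""
    return max(max_attempts - 1, 0)
```

With these values, a failed nightly run gets two retries: 60 s after the original fails, then 120 s after the first retry fails. The cap only matters from the sixth retry on, where 60 * 2^5 = 1920 s is clamped to 1800 s.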

One-off backups

```yaml
apiVersion: shipstream.io/v1alpha1
kind: MysqlBackup
metadata:
  name: orders-preupgrade
  namespace: orders
spec:
  failoverGroupRef:
    name: orders
  profileName: nightly-s3
  triggeredBy: manual
```

Inspect with kubectl get mysqlbackups. The Phase, Location, and Size columns reflect the current state. Failed runs leave a condition on .status.conditions explaining the reason.

Restore at bootstrap

To recover a lost failover group from a previous backup, create a new MysqlFailoverGroup that references an existing MysqlBackup CR in the same namespace:

```yaml
apiVersion: shipstream.io/v1alpha1
kind: MysqlFailoverGroup
metadata:
  name: orders
  namespace: orders
spec:
  image: mysql:9.6
  secretName: mysql-credentials
  # ... sites etc. ...
  backup:
    profiles:
      - name: nightly-s3
        storage:
          type: S3
          s3:
            bucket: shipstream-backups
            prefix: orders
            credentialsSecret: s3-backup-creds
  initFromBackup:
    source:
      mysqlBackupRef:
        name: orders-preupgrade
```

Dynamic target-site resolution

The restore reconciler does not hard-code spec.sites[0] as the restore target. Instead it resolves dynamically via restoreTargetSite:

  1. If status.activeSite is set and that site is observed writable (or has no observed state yet on fresh deploys), use it.
  2. If status.activeSite is set but that site is observed in any other state (read-only, unreachable), refuse — the reconciler emits a RestoreTargetUnavailable warning event and parks the restore in Pending. This prevents accidentally overwriting a recovering standby with a stale dump.
  3. Otherwise (true fresh deploy) fall back to spec.sites[0].Name.
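The three resolution rules above map onto a short function that returns either a site name or None ("park in Pending"). The "writable" state string and the restore_target_site signature are hypothetical stand-ins for whatever the reconciler actually observes.

```python
from typing import Dict, List, Optional

WRITABLE = "writable"  # hypothetical observed-state label


def restore_target_site(
    active_site: Optional[str],
    observed: Dict[str, str],  # site name -> observed state, e.g. "writable", "read-only"
    sites: List[str],          # spec.sites[].name, in declaration order
) -> Optional[str]:
    """Return the site to load into, or None to park the restore in Pending."""
    if active_site:
        state = observed.get(active_site)
        # Rule 1: writable, or nothing observed yet on a fresh deploy.
        if state is None or state == WRITABLE:
            return active_site
        # Rule 2: any other state -> refuse (caller emits RestoreTargetUnavailable).
        return None
    # Rule 3: true fresh deploy, fall back to the first declared site.
    return sites[0]
```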

Rolled-out Deployment gate

Before spawning the load Job, the reconciler waits until the target site's Deployment is fully rolled out:

  • generation <= 1 → accept as long as ReadyReplicas >= 1 (fresh deploy; ObservedGeneration may briefly lag).
  • Otherwise, require ObservedGeneration >= Generation, UpdatedReplicas >= 1, and ReadyReplicas >= 1.

This guards against firing a load against a Deployment in the middle of a rolling update, which would race the ready MySQL container against the terminating one.
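The rollout gate reads directly off standard Deployment fields (metadata.generation, status.observedGeneration, status.updatedReplicas, status.readyReplicas). The sketch below mirrors the two documented cases. DeploymentStatus and rolled_out are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class DeploymentStatus:
    generation: int           # metadata.generation
    observed_generation: int  # status.observedGeneration
    updated_replicas: int     # status.updatedReplicas
    ready_replicas: int       # status.readyReplicas


def rolled_out(d: DeploymentStatus) -> bool:
    """Mirror of the documented gate checked before the load Job is spawned."""
    if d.generation <= 1:
        # Fresh deploy: observedGeneration may briefly lag, so only require readiness.
        return d.ready_replicas >= 1
    return (
        d.observed_generation >= d.generation
        and d.updated_replicas >= 1
        and d.ready_replicas >= 1
    )
```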

Direct S3 / PVC sources

initFromBackup.source also accepts a direct S3 location (bucket and prefix) or a read-only PVC. This is useful when the source backup CR has been garbage-collected:

```yaml
initFromBackup:
  source:
    s3:
      bucket: shipstream-backups
      prefix: orders/orders-preupgrade
      credentialsSecret: s3-backup-creds
```

Restores are one-shot. If the Job fails, inspect the logs, delete the Job, and the reconciler will rebuild it on the next pass. Restoring into a non-empty data directory is not supported — util.loadDump() fails fast with a clear error.

Per-schema bootstrap (tenant migration pattern)

loadOptions.includeSchemas and loadOptions.excludeSchemas are forwarded to util.loadDump() and let you carve a subset of a full-instance dump into a brand-new failover group. This is useful for migrating a single tenant off a shared cluster onto its own dedicated one:

```yaml
spec:
  initFromBackup:
    source:
      mysqlBackupRef:
        name: shared-nightly
    loadOptions:
      includeSchemas: ["tenant_42"]
```

Operationally, the migration pattern is: put the tenant into application-level maintenance mode, take a backup of the shared cluster, deploy a new MysqlFailoverGroup with includeSchemas pointing at the tenant's schema, then flip the app's connection string to the new group. System users are not part of a util.dumpInstance dump, so pair the restore with whatever user-provisioning process the new group uses.

In-place restore (spec.restoreInPlace)

initFromBackup is one-shot and runs before the failover group is considered ready. For rolling an existing, live cluster back to a previous dump (or replaying to a point-in-time) without teardown or rename, use spec.restoreInPlace. It is re-triggerable and operates directly against the active primary.

Two modes are supported, selected by loadOptions.includeSchemas:

Full-instance in-place restore

When includeSchemas is empty, the restore wipes every user schema on the primary and reloads from the dump. Choreography:

  1. Preflight: validates that status.activeSite is writable and that the confirm timestamp is newer than the last consumed one.
  2. Fencing: the operator strips the primary role label on the active pod(s), so the -primary Service sheds endpoints and apps see connection errors. The topology manager is frozen, so no promotion, auto-clone, or recovery actions fire during the restore.
  3. Restoring: a Job first runs STOP REPLICA; RESET REPLICA ALL; and a DROP DATABASE for each user schema, then runs util.loadDump(). The load uses skipBinlog=true because the peer is about to be re-cloned, so there is no point burning binlog space. Optional PITR replay runs after the dump load, against the same primary.
  4. Resuming: the primary role label is restored, the topology manager is unfrozen, and the reclone-site annotation is set for the peer. The existing reclone machinery then CLONEs the peer from the freshly restored primary in the background.

```yaml
spec:
  restoreInPlace:
    confirm: "2026-04-17T14:32:00Z"
    source:
      mysqlBackupRef:
        name: orders-nightly-20260416
    pointInTime:
      stopDatetime: "2026-04-17T14:30:00Z"
```

Per-schema in-place restore

When includeSchemas contains exactly one entry, the restore drops and reloads only that schema. The primary Service stays up — other tenants keep writing — and replication carries the DROP + load through the primary's binlog to the peer. No reclone is scheduled.

```yaml
spec:
  restoreInPlace:
    confirm: "2026-04-17T14:32:00Z"
    source:
      mysqlBackupRef:
        name: shared-nightly
    loadOptions:
      includeSchemas: ["tenant_42"]
```

The caller is responsible for putting the affected tenant into application-level maintenance mode. The operator forces skipBinlog=false on the load so the propagation to the peer works.

:::warning Per-schema PITR has a caveat
PITR binlog replay on a per-schema restore adds mysqlbinlog --database=<schema> to the pipeline. What that filter matches depends on the event type:

- Statement events (DDL is always logged as a statement): the session's default database at log time, not the schemas the statement actually touches.
- Row events: the database of the changed table.

For well-isolated multi-tenant schemas this is fine. For apps that issue cross-schema statements (INSERT INTO a.t SELECT ... FROM b.t), use a full-instance restore instead, because the filter can silently drop or misapply events.
:::

The confirm timestamp

spec.restoreInPlace.confirm must be an RFC 3339 timestamp (e.g. 2026-04-17T14:32:00Z). It is the anti-fat-finger gate: the operator refuses to run the restore unless confirm parses and is strictly greater than status.restoreInPlace.confirmTokenUsed. On every new run, bump confirm to a newer timestamp. A common idiom in automation is to set confirm: $(date -u +%FT%TZ) at the moment the user authorizes the restore.

This gives programmatic callers a simple "just send now()" pattern while also protecting against replay: an older manifest applied accidentally does not re-trigger a restore.
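A minimal sketch of the confirm gate follows, assuming Python 3.7+. It rewrites a trailing "Z" because datetime.fromisoformat() only accepts it from Python 3.11. It also rejects values without a UTC offset, since RFC 3339 requires one. restore_armed and parse_rfc3339 are hypothetical helpers. The token_used argument stands in for status.restoreInPlace.confirmTokenUsed.

```python
from datetime import datetime
from typing import Optional


def parse_rfc3339(value: str) -> datetime:
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        raise ValueError("confirm must carry a UTC offset or Z")
    return parsed


def restore_armed(confirm: str, token_used: Optional[str]) -> bool:
    """True only when confirm parses and is strictly newer than the consumed token."""
    try:
        requested = parse_rfc3339(confirm)
    except ValueError:
        return False  # unparseable confirm never arms a restore
    if token_used is None:
        return True
    return requested > parse_rfc3339(token_used)
```

Re-applying an old manifest returns False because its confirm is not strictly greater than the consumed token.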

Observing the state machine

```bash
kubectl get mysqlfailovergroup orders -o jsonpath='{.status.restoreInPlace}' | jq .
```

Phase sequence: Preflight → Fencing → Restoring → Resuming → Succeeded (or Failed at any point). Failed is terminal. Inspect the Job logs, fix the underlying issue, then bump confirm to re-arm.
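The phase sequence behaves like a linear state machine with a single failure exit. The sketch below models the full-instance sequence only. next_phase and the step_ok flag are hypothetical, and the per-schema mode may not pass through every phase.

```python
# Documented phase order for spec.restoreInPlace; Failed can be entered from any phase.
PHASES = ["Preflight", "Fencing", "Restoring", "Resuming", "Succeeded"]
TERMINAL = {"Succeeded", "Failed"}


def next_phase(current: str, step_ok: bool) -> str:
    """Advance one step of the in-place restore state machine."""
    if current in TERMINAL:
        return current  # re-arming requires bumping spec.restoreInPlace.confirm
    if not step_ok:
        return "Failed"
    return PHASES[PHASES.index(current) + 1]
```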

Required MySQL privileges

Consistent dumps (dump.consistent: true, the default) need BACKUP_ADMIN on the backup user in addition to standard replication and dump grants:

```sql
GRANT RELOAD, LOCK TABLES, PROCESS, REPLICATION CLIENT,
  SELECT, SHOW VIEW, EVENT, TRIGGER, BACKUP_ADMIN
  ON *.* TO 'backup'@'%';
```

Without BACKUP_ADMIN, set dump.consistent: false on the profile.

Point-in-time recovery (PITR)

Bloodraven can continuously archive MySQL binary logs so restores can target an arbitrary --stop-datetime on top of any retained full dump, not just the dump's own capture point. For the interaction between PITR and emergency-failover RPO — specifically, when PITR does and does not narrow the data-loss window — see Durability and RPO → PITR and the RPO window.

How it works

Binlog archival runs inside the existing sidecar on every MySQL pod. The archiver goroutine:

  1. Uses inotify on mysql-bin.index in the data PVC to detect rotations. A poll-tick every archivePollInterval (default 60s) is a belt-and-suspenders safety net in case an inotify event is missed.
  2. Gates on @@read_only — only the primary uploads. After a failover the new primary takes over within one scan cycle; GTID dedup at restore time makes any brief overlap harmless.
  3. For each newly sealed binlog file, extracts timestamp + GTID metadata, uploads the file to the referenced backup profile's storage, and appends an entry to a per-site manifest.
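A single archiver scan, whether triggered by inotify or by the poll tick, starts by deciding which binlogs are sealed. mysql-bin.index lists binlog files oldest first, and the last entry is the file MySQL is still writing. Everything before it is therefore closed and safe to upload. sealed_binlogs is a hypothetical helper, and the @@read_only primary gate is left out for brevity.

```python
from pathlib import Path
from typing import List, Set


def sealed_binlogs(index_file: Path, already_archived: Set[str]) -> List[str]:
    """List binlog files that are closed and not yet uploaded.

    The last index entry is the active binlog, so it is never treated as sealed.
    """
    entries = [line.strip() for line in index_file.read_text().splitlines() if line.strip()]
    names = [Path(e).name for e in entries]  # entries may look like "./mysql-bin.000042"
    return [n for n in names[:-1] if n not in already_archived]
```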

Storage layout under the profile's prefix:

```text
<profile-prefix>/binlogs/
├── manifest-<site-a>.json
├── manifest-<site-b>.json   # post-failover (or different primary)
├── <site-a>/mysql-bin.000042
├── <site-a>/mysql-bin.000043
└── <site-b>/mysql-bin.000001
```

Per-site manifests prevent races between the current primary and a post-failover primary writing to the same object. The restore side reads every manifest in the prefix and merges their entries.

Enabling PITR

Set spec.backup.pitr on the failover group and point it at an existing backup profile:

```yaml
spec:
  backup:
    profiles:
      - name: primary
        storage:
          type: S3
          s3:
            bucket: lion-backups
            prefix: prod
            region: us-east-1
            credentialsSecret: aws-creds
        retentionPolicy:
          count: 14
          minKeep: 3
    pitr:
      enabled: true
      profileName: primary
      maxBinlogSize: "100M" # MySQL max_binlog_size; controls
                            # rotation cadence and therefore RPO
```

When enabled, the operator:

  • Injects max_binlog_size=<value> into the generated my.cnf (default 100M). Smaller values shorten the RPO gap (unarchived tail on a crashed primary) at the cost of more objects.
  • Mounts the MySQL data PVC read-only into the sidecar container so the archiver can read sealed binlog files.
  • Mounts the profile's storage credentials (for S3) or the backup PVC (for PVC) into the sidecar.
  • Rolls the MySQL pods to pick up the new config (the spec hash includes effective PITR values, with defaults normalized, so changing the default in a future release also rolls pods).

Retention

Archived binlogs are pruned to track full-backup retention: once the oldest retained MysqlBackup for a profile moves forward, binlogs with lastEventTime before that cutoff are no longer needed and get deleted.

The mechanism is pull-driven: the sidecar archiver polls the operator's auxiliary HTTP server at /pitr-cutoff?namespace=X&group=Y&profile=Z (default once per hour), which returns the minimum CompletionTime across retained backups. The archiver then prunes its own manifest and deletes the corresponding objects. The operator pod doesn't need storage credentials — the archiver already has them.
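The archiver's side of the pull can be sketched in two parts: building the documented /pitr-cutoff query, and splitting its own manifest at the returned cutoff. The following are all hypothetical:

- the operator base URL,
- the manifest entry shape ({"key", "lastEventTime"} with ISO 8601 offsets),
- the helper names cutoff_url and prune_manifest.

Only the endpoint path, its query parameters, and the "lastEventTime before the cutoff" rule come from this page.

```python
from datetime import datetime
from typing import Dict, List, Tuple
from urllib.parse import urlencode


def cutoff_url(base: str, namespace: str, group: str, profile: str) -> str:
    """Build the documented /pitr-cutoff query against the operator's auxiliary server."""
    query = urlencode({"namespace": namespace, "group": group, "profile": profile})
    return f"{base.rstrip('/')}/pitr-cutoff?{query}"


def prune_manifest(entries: List[Dict], cutoff: datetime) -> Tuple[List[Dict], List[str]]:
    """Split manifest entries into (entries to keep, object keys to delete).

    A binlog whose lastEventTime is strictly before the oldest retained
    backup's CompletionTime can no longer contribute to any restore.
    """
    kept: List[Dict] = []
    doomed: List[str] = []
    for entry in entries:
        last = datetime.fromisoformat(entry["lastEventTime"])
        if last < cutoff:
            doomed.append(entry["key"])
        else:
            kept.append(entry)
    return kept, doomed
```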

Restore to a target timestamp

Add a pointInTime block to spec.initFromBackup:

```yaml
spec:
  initFromBackup:
    source:
      mysqlBackupRef:
        name: nightly-2026-04-14
    pointInTime:
      stopDatetime: "2026-04-15T09:30:00Z"
      # excludeGtids: "server-uuid:42" # optional, skip a bad txn
```

stopDatetime accepts RFC 3339 (with or without trailing Z or timezone offset) or MySQL's native YYYY-MM-DD HH:MM:SS form.
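mysqlbinlog --stop-datetime expects YYYY-MM-DD HH:MM:SS and interprets it in the local time zone of the host running mysqlbinlog. Accepting both input forms therefore means normalizing them. The sketch below assumes two things: the restore container runs in UTC, and an RFC 3339 value without an offset means UTC. to_mysqlbinlog_datetime is a hypothetical helper.

```python
from datetime import datetime, timezone

MYSQL_FORMAT = "%Y-%m-%d %H:%M:%S"


def to_mysqlbinlog_datetime(value: str) -> str:
    """Normalize a stopDatetime value to the form mysqlbinlog --stop-datetime expects."""
    try:
        # MySQL's native form carries no zone; pass it through unchanged.
        return datetime.strptime(value, MYSQL_FORMAT).strftime(MYSQL_FORMAT)
    except ValueError:
        pass
    iso = value[:-1] + "+00:00" if value.endswith("Z") else value
    parsed = datetime.fromisoformat(iso)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # assumption: no offset means UTC
    # Assumption: mysqlbinlog runs with a UTC local time zone.
    return parsed.astimezone(timezone.utc).strftime(MYSQL_FORMAT)
```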

The restore Job runs in two phases, one per container:

  1. pitr-download init container (bloodraven pitr-download subcommand) — downloads every archived binlog whose firstEventTime ≤ stopDatetime from the PITR archive storage into a shared emptyDir (/pitr-binlogs/<site>/...). Uses the AWS SDK v2 for S3 (with paginator), so it doesn't depend on the aws CLI existing in the MySQL image.
  2. mysqlsh main container — runs the existing util.loadDump() as before, then feeds the downloaded binlogs through mysqlbinlog --stop-datetime=<target> | mysql --binary-mode. Server-side GTID dedup skips transactions already applied by the dump load.
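The download filter in the init container amounts to merging every per-site manifest and keeping files whose firstEventTime ≤ stopDatetime. The sketch below assumes the same hypothetical manifest entry shape ({"key", "firstEventTime"} with ISO 8601 offsets). Sorting the result by firstEventTime for replay order is also an assumption. select_binlogs is a hypothetical name.

```python
from datetime import datetime
from typing import Dict, Iterable, List


def select_binlogs(manifests: Iterable[List[Dict]], stop: datetime) -> List[Dict]:
    """Merge per-site manifests and keep files whose firstEventTime <= stop.

    Over-inclusion is acceptable: server-side GTID dedup skips transactions
    the dump load already applied.
    """
    merged = [entry for manifest in manifests for entry in manifest]
    chosen = [e for e in merged if datetime.fromisoformat(e["firstEventTime"]) <= stop]
    # Assumption: replay in event order across sites (a post-failover primary's files come later).
    return sorted(chosen, key=lambda e: datetime.fromisoformat(e["firstEventTime"]))
```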

Observability

  • Archiver status: GET /archiver/status on the sidecar (mysql-<fg>-<site>:8080) returns {enabled, primary, lastScanAt, filesArchived, lastError, storageType, manifestPrefix, site} — handy for kubectl exec poking or a Prometheus scraper.
  • Job logs: BLOODRAVEN_PITR_START, BLOODRAVEN_PITR_COMPLETE / BLOODRAVEN_PITR_FAILED / BLOODRAVEN_PITR_NOOP from the restore container; one JSON line per archived file from the init container.

Known limitations

  • A GTID-set-based file filter would let the restore download fewer binlogs; today it filters on firstEventTime only and relies on GTID dedup to cover over-inclusion.

Metrics

The operator emits five Prometheus metrics for backups. All are labelled by {group, profile} unless noted.

| Metric | Type | Labels | What it measures |
| --- | --- | --- | --- |
| bloodraven_backup_runs_total | counter | group, profile, result | Backups that reached a terminal phase, with result=success or result=failure. |
| bloodraven_backup_duration_seconds | histogram | group, profile | Wall-clock duration from Job StartTime to CompletionTime. |
| bloodraven_backup_last_success_timestamp_seconds | gauge | group, profile | Unix timestamp of the most recent successful backup. |
| bloodraven_backup_last_attempt_timestamp_seconds | gauge | group, profile | Unix timestamp of the most recent terminal attempt, any result. |
| bloodraven_backup_last_size_bytes | gauge | group, profile | Size in bytes of the last successful backup artifact. |

All five are updated exactly once, when a backup reaches a terminal phase. Two mechanisms guarantee this:

- The reconciler checks the computed next status for semantic equality with the current one.
- It uses a stable completion timestamp derived from the Job's terminal condition.

As a result, re-reconciling an already-terminal CR does not produce duplicate counter increments or churny gauge updates.

Typical alerts:

  • Stale backup: time() - bloodraven_backup_last_success_timestamp_seconds > 86400
  • Repeated failures: increase(bloodraven_backup_runs_total{result="failure"}[24h]) > 3
  • Runaway duration: histogram_quantile(0.95, rate(bloodraven_backup_duration_seconds_bucket[1h])) > 3600

Monitoring

  • kubectl get mysqlbackups -A — phase, location, size.
  • kubectl describe mysqlfailovergroup <name> — rollup under status.backupSchedules[] (including lastSuccessfulBackupName, lastRetryAttempt, nextRetryTime) and status.lastBackupTime.
  • kubectl logs job/mysqlbackup-<name> — live dump progress, plus the final BLOODRAVEN_DUMP_COMPLETE sentinel line with the structured location, sizeBytes, gtidExecuted, binlogFile, binlogPos fields.
  • Prometheus metrics above.
  • Events on the MysqlBackup CR: BackupStarted, BackupSucceeded, BackupFailed, InFlightFailover, ArtifactCleanup*.
  • Events on the MysqlFailoverGroup CR: BackupRetryScheduled, BackupPITRNotImplemented, RestoreTargetUnavailable.
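Scripts that consume kubectl logs job/mysqlbackup-<name> output can pick the structured fields out of the BLOODRAVEN_DUMP_COMPLETE line. The exact line format is not specified on this page. The sketch below assumes the sentinel is followed by a JSON object on the same line, carrying the documented location, sizeBytes, gtidExecuted, binlogFile, and binlogPos fields. parse_dump_complete is a hypothetical helper.

```python
import json
from typing import Dict, Iterable, Optional

SENTINEL = "BLOODRAVEN_DUMP_COMPLETE"


def parse_dump_complete(log_lines: Iterable[str]) -> Optional[Dict]:
    """Return the structured fields from the last completion sentinel, if any.

    Assumes the sentinel is followed by a JSON object on the same line.
    """
    result = None
    for line in log_lines:
        idx = line.find(SENTINEL)
        if idx == -1:
            continue
        payload = line[idx + len(SENTINEL):].strip()
        try:
            result = json.loads(payload)
        except json.JSONDecodeError:
            continue  # sentinel without a parseable payload; keep scanning
    return result
```

A typical use is piping kubectl logs into a small script that calls parse_dump_complete(sys.stdin).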