How to create your first MysqlFailoverGroup

Use this guide to install Bloodraven and create a two-site MysqlFailoverGroup in a Kubernetes cluster. It covers the shortest path to a working first group and is intended for non-production use.

For a complete local demo with a dashboard, counter app, and chaos scenarios, use the Playground. For production dependency ownership, image pinning, monitoring, and NetworkPolicy, use Production Install.

Prerequisites

  • Kubernetes 1.27 or newer.
  • Helm 3.
  • Two sets of nodes, each labeled for a different site.
  • A working StorageClass for MySQL persistent volumes.
  • external-dns configured for your DNS provider.
  • The externaldns.k8s.io/v1alpha1 DNSEndpoint custom resource definition (CRD) installed in the cluster.
  • Optional: cert-manager, if you want Bloodraven to request certificates through spec.tls.issuerRef.
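
The prerequisites above can be checked before you install anything. The following is a minimal preflight sketch, not part of Bloodraven. The function names `k8s_version_ok` and `preflight` are illustrative. It assumes `kubectl` and `helm` are on your PATH and point at the target cluster, and that the external-dns CRD is registered under its usual name, `dnsendpoints.externaldns.k8s.io`.

```shell
#!/usr/bin/env bash
# Preflight sketch for the prerequisites above.
# Function names are illustrative, not part of Bloodraven.

# Succeeds when a version string such as "v1.27.3" is Kubernetes 1.27 or newer.
k8s_version_ok() {
  local v="${1#v}"            # drop the leading "v"
  local major="${v%%.*}"      # text before the first dot
  local rest="${v#*.}"        # text after the first dot
  local minor="${rest%%.*}"   # minor version, possibly with a suffix such as "27+"
  minor="${minor%%[!0-9]*}"   # keep only the leading digits
  [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -ge 27 ]; }
}

# Runs the cluster-side checks. Nothing runs until you call: preflight
preflight() {
  local server
  # Pull serverVersion.gitVersion out of the YAML form of `kubectl version`.
  server="$(kubectl version -o yaml \
    | awk '/^serverVersion:/{s=1} s && /gitVersion:/{print $2; exit}')"
  k8s_version_ok "$server" || { echo "Kubernetes $server is older than 1.27" >&2; return 1; }
  helm version --short || return 1          # Helm 3 prints v3.x.y
  kubectl get storageclass || return 1      # confirm the class you plan to use is listed
  kubectl get crd dnsendpoints.externaldns.k8s.io || return 1
}
```

The script does not check site labels or external-dns provider configuration. Those depend on your environment.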

1. Install the operator

Add the Helm repository and install the operator:

helm repo add bloodraven https://raw.githubusercontent.com/shipstream/bloodraven/gh-pages
helm upgrade --install bloodraven bloodraven/bloodraven \
  --namespace bloodraven \
  --create-namespace

Verify that the operator is running:

kubectl rollout status deployment/bloodraven -n bloodraven
kubectl get crd mysqlfailovergroups.shipstream.io

You should see the bloodraven Deployment complete its rollout and the MysqlFailoverGroup CRD registered.

2. Create MySQL credentials

Create separate Secrets for operator and application credentials. Bloodraven uses the operator credentials to manage MySQL users and replication, and it uses the application credentials for app-facing access.

kubectl create secret generic mysql-operator-creds \
  --namespace default \
  --from-literal=username=bloodraven \
  --from-literal=password=OPERATOR_PASSWORD \
  --from-literal=MYSQL_ROOT_PASSWORD=ROOT_PASSWORD

kubectl create secret generic mysql-app-creds \
  --namespace default \
  --from-literal=username=app \
  --from-literal=password=APP_PASSWORD

Replace OPERATOR_PASSWORD, ROOT_PASSWORD, and APP_PASSWORD with real values before running the commands.
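
If you prefer not to type literal passwords on the command line, you can generate them. This is a minimal sketch under stated assumptions: `gen_password` is an illustrative helper, not something Bloodraven provides, and it assumes `/dev/urandom`, `base64`, `tr`, and `cut` are available in your shell.

```shell
#!/usr/bin/env bash
# Sketch: generate random passwords instead of typing literals.
# gen_password is an illustrative helper, not part of Bloodraven.

# Prints a 32-character alphanumeric password drawn from /dev/urandom.
# 48 random bytes encode to 64 base64 characters; "/", "+", "=" and
# newlines are dropped before the result is cut to 32 characters.
gen_password() {
  head -c 48 /dev/urandom | base64 | tr -d '/+=\n' | cut -c1-32
}

# Usage (run manually against your cluster):
#   kubectl create secret generic mysql-app-creds \
#     --namespace default \
#     --from-literal=username=app \
#     --from-literal=password="$(gen_password)"
#
# Read the generated value back when you need it, for example for the
# MySQL smoke test later in this guide:
#   kubectl get secret mysql-app-creds -n default \
#     -o jsonpath='{.data.password}' | base64 -d
```

The same helper can supply the `password` and `MYSQL_ROOT_PASSWORD` keys of `mysql-operator-creds`.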

:::note Legacy DSN secret
Bloodraven still accepts spec.secretName for older manifests that use one MySQL data source name (DSN) Secret. New installs should use spec.credentials.operatorSecret and spec.credentials.appSecret.
:::

3. Label nodes by site

Label at least one node for each site. The labels must match the selectors you put in spec.sites[].taintNodeSelector.

kubectl label node NODE_IAD_1 \
  shipstream.io/failover-group.orders=true \
  shipstream.io/site.orders=iad

kubectl label node NODE_PDX_1 \
  shipstream.io/failover-group.orders=true \
  shipstream.io/site.orders=pdx

Replace NODE_IAD_1 and NODE_PDX_1 with real node names.

See Placement Contract for the full labeling and scheduling contract.
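
Before you create the group, you may want to confirm that each site has at least one labeled node. The following sketch assumes the label keys used in this guide. `all_sites_labeled` is an illustrative helper, not part of Bloodraven.

```shell
#!/usr/bin/env bash
# Sketch: confirm every site has at least one labeled node.
# all_sites_labeled is an illustrative helper, not part of Bloodraven.

# Reads one site label value per line on stdin and succeeds only if each
# site name given as an argument appears at least once.
all_sites_labeled() {
  local seen site
  seen="$(cat)"
  for site in "$@"; do
    if ! printf '%s\n' "$seen" | grep -qxF -- "$site"; then
      echo "no node labeled for site: $site" >&2
      return 1
    fi
  done
}

# Usage (run manually; dots inside the label key are escaped for jsonpath):
#   kubectl get nodes -l shipstream.io/failover-group.orders=true \
#     -o jsonpath='{range .items[*]}{.metadata.labels.shipstream\.io/site\.orders}{"\n"}{end}' \
#     | all_sites_labeled iad pdx
#
# For a quick visual check instead:
#   kubectl get nodes -L shipstream.io/site.orders
```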

4. Create the failover group

Create orders-fg.yaml:

apiVersion: shipstream.io/v1alpha1
kind: MysqlFailoverGroup
metadata:
  name: orders
  namespace: default
spec:
  image: mysql:9.6
  sidecarImage: ghcr.io/shipstream/bloodraven-sidecar:latest

  credentials:
    operatorSecret: mysql-operator-creds
    appSecret: mysql-app-creds

  dns:
    hostname: orders.az.example.com
    ttl: 60

  sites:
    - name: iad
      zone: us-east-1a
      taintNodeSelector:
        shipstream.io/failover-group.orders: "true"
        shipstream.io/site.orders: iad
      lbIP: 10.0.1.1
      storage:
        storageClassName: fast-ssd
        size: 100Gi
    - name: pdx
      zone: us-west-2a
      taintNodeSelector:
        shipstream.io/failover-group.orders: "true"
        shipstream.io/site.orders: pdx
      lbIP: 10.0.2.1
      storage:
        storageClassName: fast-ssd
        size: 100Gi

  pollInterval: 2s
  failureThreshold: 3
  recoveryThreshold: 2
  failoverCooldown: 5m
  updateStrategy: OrderedUpdate

  replication:
    maxLagSeconds: 300

  cloneTimeout: 3600

Update these values before applying the manifest:

| Field | Replace with |
| --- | --- |
| spec.dns.hostname | The hostname that should point at the active primary. |
| spec.sites[].lbIP | The load balancer IP for each site. |
| spec.sites[].storage.storageClassName | A StorageClass that can create MySQL PVCs in that site. |
| spec.sites[].taintNodeSelector | Labels that select only nodes in that site. |
| spec.image and spec.sidecarImage | Pinned image tags for production. |

Apply the manifest:

kubectl apply -f orders-fg.yaml

5. Verify the group

Wait for the MySQL pods:

kubectl get pods -n default -l shipstream.io/failover-group=orders

Check the failover group status:

kubectl get mysqlfailovergroup orders -n default -o yaml

The status should show one active site, one writable site, one read-only site, and a Ready=True condition:

status:
  activeSite: iad
  sites:
    - name: iad
      state: writable
      replicating: false
    - name: pdx
      state: read-only
      replicating: true
      secondsBehindSource: 0
  conditions:
    - type: Ready
      status: "True"
    - type: Degraded
      status: "False"
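
Instead of reading the full YAML, you can script the same check. This sketch assumes the status fields shown in the example above (`status.sites[].name`, `status.sites[].state`) and that the resource exposes a standard `Ready` condition that `kubectl wait` can watch. `site_states_ok` is an illustrative helper, and the 10-minute timeout is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Sketch: scripted version of the status check.
# site_states_ok is an illustrative helper, not part of Bloodraven.

# Reads "name=state" lines on stdin and succeeds with exactly one writable
# site and exactly one read-only site, as in the two-site example.
site_states_ok() {
  local lines writable readonly_sites
  lines="$(cat)"
  # grep -c prints 0 and exits non-zero when nothing matches, hence "|| true".
  writable="$(printf '%s\n' "$lines" | grep -c '=writable$' || true)"
  readonly_sites="$(printf '%s\n' "$lines" | grep -c '=read-only$' || true)"
  [ "$writable" -eq 1 ] && [ "$readonly_sites" -eq 1 ]
}

# Usage (run manually):
#   kubectl wait --for=condition=Ready mysqlfailovergroup/orders \
#     -n default --timeout=10m
#   kubectl get mysqlfailovergroup orders -n default \
#     -o jsonpath='{range .status.sites[*]}{.name}={.state}{"\n"}{end}' \
#     | site_states_ok
```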

Check the operator status API. In one terminal, start a port forward:

kubectl port-forward -n bloodraven deploy/bloodraven 8082:8082

In another terminal, call the status endpoint:

curl http://localhost:8082/status

Smoke-test MySQL through the primary Service, replacing APP_PASSWORD with the application password you set earlier:

kubectl run mysql-client -n default --rm -it --restart=Never \
  --image=mysql:9.6 -- \
  mysql -h mysql-orders-primary.default.svc.cluster.local \
  -u app -pAPP_PASSWORD \
  -e 'SELECT @@hostname, @@read_only, @@super_read_only;'
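
A writable MySQL server reports 0 for both `@@read_only` and `@@super_read_only`, so that is the result to expect from the primary Service. The sketch below checks one result row automatically. It assumes you run the `mysql` client with `-N -B` so that it prints a single tab-separated row without a header. `row_is_writable` is an illustrative helper, not part of Bloodraven.

```shell
#!/usr/bin/env bash
# Sketch: interpret one row of the smoke-test query.
# row_is_writable is an illustrative helper, not part of Bloodraven.

# Takes a tab-separated row "hostname<TAB>read_only<TAB>super_read_only"
# and succeeds when both flags are 0.
row_is_writable() {
  local host ro sro
  IFS=$'\t' read -r host ro sro <<<"$1"
  [ -n "$host" ] && [ "$ro" = "0" ] && [ "$sro" = "0" ]
}

# Usage (run manually, with the row printed by the mysql client):
#   mysql -N -B -h mysql-orders-primary.default.svc.cluster.local \
#     -u app -p \
#     -e 'SELECT @@hostname, @@read_only, @@super_read_only;'
#   row_is_writable "<row printed above>" && echo "primary is writable"
```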

Verify DNS after the first active site is selected:

kubectl get dnsendpoint -A | grep orders.az.example.com
dig orders.az.example.com
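
To script the DNS check, you can compare the resolved address with the load balancer IP of the active site. This sketch assumes the hostname resolves to an A record equal to that site's `lbIP`, which this guide implies but does not state outright. `dns_points_to` is an illustrative helper. With `ttl: 60` in the example manifest, resolvers may serve a cached answer for up to a minute after a change.

```shell
#!/usr/bin/env bash
# Sketch: check that the failover hostname resolves to the expected IP.
# dns_points_to is an illustrative helper, not part of Bloodraven.

# Reads `dig +short` output on stdin and succeeds if any answer line is
# exactly the expected IP address.
dns_points_to() {
  grep -qxF -- "$1"
}

# Usage (run manually; 10.0.1.1 is the lbIP of site iad in the example manifest):
#   kubectl get mysqlfailovergroup orders -n default \
#     -o jsonpath='{.status.activeSite}'
#   dig +short orders.az.example.com | dns_points_to 10.0.1.1 \
#     && echo "DNS points at iad"
```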

Troubleshooting first runs

If the group does not become ready, check these common causes first:

| Symptom | Likely cause | Next step |
| --- | --- | --- |
| MySQL pod remains Pending | Node labels or StorageClass do not match the site. | Check Placement Contract. |
| Operator logs mention missing Secret keys | The credential Secret is missing a required key. | Recreate the Secret from Credentials and TLS. |
| DNS does not change | external-dns is not watching the namespace or DNSEndpoint CRD. | Check Troubleshooting. |
| Replica does not catch up | Replication credentials, network access, or clone bootstrap failed. | Check Troubleshooting. |

Next steps

  • App Integration - Connect applications to the primary and replica Services.
  • GitOps Guidance - Configure Argo CD or Flux ownership, sync order, and health checks.
  • Failover - Understand detection, promotion, DNS steering, and cooldown behavior.
  • Monitoring - Set up Prometheus metrics and alerting.
  • Production Install - Move from a first group to a production-ready install.