Why not Group Replication?

The single most common question about Bloodraven is: "Why run async replication with an external operator when MySQL ships both Group Replication (GR) and InnoDB Cluster, which is built on top of it?" This page gives the architectural answer, so readers can pick the right tool for their situation.

TL;DR

Bloodraven is optimized for the two-site, geographically separated deployment that can accept a non-zero RPO. Group Replication is optimized for the three-or-more-node, low-latency, zero-RPO deployment. They are different design points, and Bloodraven solves problems Group Replication doesn't. The most important one is staying writable when the cross-site link is slow, flappy, or down.

What Bloodraven trades away

RPO is not zero. A hard primary loss can lose every transaction that committed on the dying primary but hadn't yet shipped to the replica over async replication. Under healthy operation this window is typically sub-second, but it exists. If your data model cannot accept this, Bloodraven is the wrong tool — use Group Replication (or a higher-tier system like Spanner / CockroachDB).
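To put a rough number on that window, assume the primary commits at a steady rate λ and the replica is Δ seconds behind when the primary dies. The writes at risk are then roughly the product of the two. The symbols and figures below are illustrative assumptions, not Bloodraven metrics. A bursty workload can lose more than this steady-state estimate suggests.

```latex
N_{\text{lost}} \approx \lambda \cdot \Delta,
\qquad
\lambda = 500\ \text{tx/s},\ \Delta = 0.3\ \text{s}
\;\Rightarrow\;
N_{\text{lost}} \approx 500 \times 0.3 = 150\ \text{transactions}
```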
No in-database conflict resolution. Because only one site writes at a time, Bloodraven cannot merge concurrent writes from two sites. Conflicts are resolved by the operator's "one primary at a time" invariant; anything that breaks that invariant (split brain) is surfaced as a condition for a human, not silently merged.

What Bloodraven keeps

Zero added commit latency. A primary write is acknowledged as soon as it has been fsynced to the local binlog. That is the same latency profile as a standalone MySQL: no quorum round-trip and no cross-site ACK. For two sites separated by ≥ 20 ms of network latency, GR's certification quorum on every commit typically adds 40–80 ms to p50 write latency. Bloodraven's async model adds zero.
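The majority-wait effect can be modeled in a few lines. This is a simplified sketch, not GR's actual protocol. It assumes that a commit costs one round trip to each remote member whose acknowledgement is needed to reach a majority, that the local member counts as one vote, and that certification CPU time and any extra message rounds can be ignored. `commit_overhead_ms` and its `mode` values are hypothetical names.

```python
def commit_overhead_ms(peer_rtts_ms: list[float], mode: str) -> float:
    """Approximate replication overhead added to each commit (simplified model).

    peer_rtts_ms: round-trip times from the writing node to every other member.
    mode: "async" (primary never waits) or "group" (wait for a majority).
    """
    if mode == "async":
        return 0.0  # the primary acknowledges after its local fsync only
    if mode != "group":
        raise ValueError(f"unknown mode: {mode!r}")

    members = len(peer_rtts_ms) + 1       # remote peers plus the local node
    majority = members // 2 + 1
    remote_acks_needed = majority - 1     # the local node votes for itself
    if remote_acks_needed == 0:
        return 0.0                        # a single-member group waits on nobody
    # The commit waits for the slowest of the fastest peers needed for a majority.
    return float(sorted(peer_rtts_ms)[remote_acks_needed - 1])
```

With one peer 40 ms away by round trip (20 ms each way, assuming the figure above is one-way), `commit_overhead_ms([40.0], "group")` gives 40.0, the low end of the range quoted above. Extra protocol rounds would push real numbers higher. In this model, adding a third member 1 ms away (`[1.0, 40.0]`) drops the overhead to 1.0 because the nearby member completes the majority. This is also why the placement of a GR "witness" node matters so much.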

Single-node write availability. The primary keeps accepting writes even when the other site is unreachable. GR requires a majority of the group to be online before it accepts writes. A two-member group that loses one member leaves the survivor unable to commit writes, and a three-member group that loses two members is in the same position. Bloodraven promotes the surviving site to writable as soon as it detects that the primary is gone and the replica is healthy (see Failover sequence).
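The availability rules in the paragraph above fit in two small functions. `gr_can_commit` encodes GR's strict-majority requirement. `bloodraven_can_write` is a hypothetical, simplified reading of Bloodraven's rule that models node crashes only. It leaves out network partitions, where sidecar self-fencing also comes into play.

```python
def gr_can_commit(group_size: int, members_alive: int) -> bool:
    """GR only commits while a strict majority of the group is online."""
    return members_alive >= group_size // 2 + 1


def bloodraven_can_write(primary_up: bool, replica_up: bool,
                         replica_healthy: bool) -> bool:
    """Two-site crash-failure model of the rule described above (an assumption)."""
    if primary_up:
        return True  # the primary keeps writing without its peer
    # Primary is gone: the operator promotes the replica only if it is healthy.
    return replica_up and replica_healthy


if __name__ == "__main__":
    print(gr_can_commit(2, 1))                      # False: 1 of 2 is not a majority
    print(gr_can_commit(3, 1))                      # False: 1 of 3 is not a majority
    print(gr_can_commit(3, 2))                      # True
    print(bloodraven_can_write(False, True, True))  # True: the replica is promoted
```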

No quorum requirement. Two sites is a legitimate topology. GR with two members is a pathological configuration — any partition or node loss makes the group inquorate. Operators who want two-site HA with GR are forced to invent a "witness" third node somewhere, which introduces its own set of cross-region headaches (where does the witness live? What happens when the witness is isolated?). Bloodraven's sidecar self-fencing layer performs the "am I still authoritative?" check without a quorum.

Simpler mental model. There is always exactly one primary; the other site is a replica or is fenced. Topology never has to pick between N possible primaries, negotiate a view change, or resolve certification conflicts. The operator's state machine has four per-site states and a small cross-site truth table — a single developer can hold the whole thing in their head, which is load-bearing when you're debugging at 03:00.
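The cross-site truth table can be sketched as a pure function. This page does not name the operator's actual states, so the four state names and the condition strings below are hypothetical stand-ins. The point is that sixteen state pairs collapse into a handful of outcomes, and only one of them (two writable sites) breaks the single-writer invariant.

```python
from enum import Enum
from itertools import product


class SiteState(Enum):
    # Hypothetical names for the four per-site states.
    WRITABLE = "writable"
    REPLICA = "replica"
    FENCED = "fenced"
    UNREACHABLE = "unreachable"


def classify(a: SiteState, b: SiteState) -> str:
    """Map one observation of (site A, site B) to a hypothetical condition."""
    pair = (a, b)
    writable = pair.count(SiteState.WRITABLE)
    if writable == 2:
        return "SplitBrain"      # single-writer invariant broken: surface to a human
    if writable == 1:
        # One primary; fully protected only if the other site is replicating.
        return "Healthy" if SiteState.REPLICA in pair else "Degraded"
    if SiteState.REPLICA in pair:
        return "NeedsPromotion"  # no primary, but a replica could take over
    return "Down"                # nothing writable and nothing to promote


if __name__ == "__main__":
    # Sixteen state pairs collapse into five outcomes.
    for a, b in product(SiteState, repeat=2):
        print(f"{a.value:>11} / {b.value:<11} -> {classify(a, b)}")
```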

Works across regions with real latency. Group Replication's published performance numbers assume sub-millisecond inter-node latency, because every commit is ordered through the group's certification protocol. At 20–100 ms cross-region latency (the typical two-datacenter or two-region deployment), GR is functional but expensive on write throughput. Any network blip that outlasts its failure-detection timeout also expels a member and forces a view change. Async replication plus a supervisor is the standard answer at that latency tier for a reason.

Sidecar self-fencing. The sidecar on each MySQL pod fences its local instance against writes if it can reach neither the operator nor the peer for spec.sidecar.leaseTimeout (default 20 s). This closes the split-brain window that async replication alone would leave open. See the Sidecar description.
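The lease rule reduces to two timestamps and one comparison. This is a minimal sketch that assumes a monotonic clock and assumes every successful exchange with the operator or the peer renews the corresponding timestamp. `LeaseFence` and its methods are hypothetical. What fencing actually does to MySQL (for example, setting `super_read_only`) is outside its scope.

```python
import time
from typing import Callable


class LeaseFence:
    """Hypothetical sketch of the sidecar's lease rule described above."""

    def __init__(self, lease_timeout_s: float = 20.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.lease_timeout_s = lease_timeout_s  # mirrors spec.sidecar.leaseTimeout
        self.clock = clock
        # No contact yet: a freshly started sidecar stays fenced until it hears
        # from someone (a conservative assumption made for this sketch).
        self.last_operator_contact = float("-inf")
        self.last_peer_contact = float("-inf")

    def saw_operator(self) -> None:
        """Record a successful exchange with the operator."""
        self.last_operator_contact = self.clock()

    def saw_peer(self) -> None:
        """Record a successful exchange with the peer site's sidecar."""
        self.last_peer_contact = self.clock()

    def should_fence(self) -> bool:
        """Fence only when BOTH the operator and the peer have been silent
        for longer than the lease timeout."""
        now = self.clock()
        operator_lost = now - self.last_operator_contact > self.lease_timeout_s
        peer_lost = now - self.last_peer_contact > self.lease_timeout_s
        return operator_lost and peer_lost


if __name__ == "__main__":
    now = [0.0]
    fence = LeaseFence(clock=lambda: now[0])  # fake clock for the demo
    fence.saw_operator()
    now[0] = 19.0
    print(fence.should_fence())  # False: operator seen 19 s ago
    now[0] = 25.0
    print(fence.should_fence())  # True: both silent for more than 20 s
```

The rule requires both contacts to lapse. That is what lets a primary keep writing while only the cross-site link is down, as long as the operator is still reachable.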

When Group Replication is actually the right answer

Bloodraven is not always the right answer — be honest about when it isn't:

Zero-RPO is a product requirement. Financial ledgers, inventory-as-source-of-truth, anything where "we lost a second of writes" is a customer-visible failure. Group Replication is what you want.
Three or more nodes are already on the table. If your topology already has three MySQL nodes for HA (and the write path is willing to pay quorum latency), GR turns that into virtually synchronous replication with no external operator. Bloodraven's two-site architecture doesn't fit.
Low inter-node latency. Single-AZ, single-DC, and single-rack deployments barely pay GR's latency cost, because the per-commit round-trip to a majority is small when every member is that close.
You cannot tolerate split-brain resolution by a human. Bloodraven's response to a writable/writable split is to alert and wait for an operator (or, if opted in, to fence a pre-configured loser; see spec.splitBrainPolicy). If your runbook requires the cluster to auto-pick a winner in every case without data reconciliation, use GR.
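The response to a writable/writable observation can be sketched as a small decision function. This page only says that spec.splitBrainPolicy can opt in to fencing a pre-configured loser, and that the default is to alert and wait. The field names `fence_loser` and `loser_site`, the site names, and the action strings are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SplitBrainPolicy:
    # Hypothetical shape of spec.splitBrainPolicy; field names are assumptions.
    fence_loser: bool = False          # default: alert and wait for a human
    loser_site: Optional[str] = None   # pre-configured site to fence when opted in


def respond_to_split_brain(writable_sites: list[str],
                           policy: SplitBrainPolicy) -> list[str]:
    """Return the actions this sketch would take for the observed writable sites."""
    if len(writable_sites) < 2:
        return []  # at most one writer: the invariant holds, nothing to do
    actions = [f"alert: writable on {', '.join(sorted(writable_sites))}"]
    if policy.fence_loser and policy.loser_site in writable_sites:
        actions.append(f"fence: {policy.loser_site}")
    else:
        # Never pick a winner automatically unless explicitly configured to.
        actions.append("wait: a human picks the winner and reconciles data")
    return actions
```

Fencing the loser stops further divergence. However, any writes the loser accepted before it was fenced still need reconciling, which is the case the last point above warns about.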

The honest tradeoff

Bloodraven and Group Replication solve the same top-level problem ("keep MySQL writable when bad things happen") from two different vantage points:

| Concern | Group Replication | Bloodraven |
|---|---|---|
| RPO on hard primary loss | 0 | ≈ `secondsBehindSource` of the replica at failure |
| Commit latency | 1 cross-node round-trip | 1 local fsync |
| Minimum nodes to tolerate 1 failure | 3 | 2 |
| Write availability during a partition | Majority side only | The reachable side (operator arbitrates) |
| Conflict resolution | Certification (may abort commits) | Single-writer invariant (no conflicts possible) |
| Operational complexity | View changes, certification, group membership | Primary/replica + one external operator |
| Typical inter-node latency sweet spot | < 5 ms | Doesn't care; tested at 20–100+ ms |
| Supervisor required for DNS/traffic steering | Yes (MySQL Router / InnoDB Cluster) | Yes (Bloodraven itself) |

Bloodraven picks the right-hand column in every row. If the Group Replication column describes your situation better, run Group Replication.

Related reading

Architecture: how the operator, sidecars, and Services fit together.
Failover: state machine, failover sequence, anti-flap cooldown, split-brain handling.
Getting started: stand up a two-site failover group end-to-end.
