Cluster Replication

Cluster mode adds replicated control-plane state on top of the normal single-node Neuwerk model and keeps each node’s local enforcement in sync with that state.

It gives you:

  • leader-aware management behavior
  • replicated policy and other control-plane records
  • secure node enrollment
  • a path to keep each node’s local enforcement state aligned with the active cluster state

It does not replicate active dataplane flow or NAT state.

What Gets Replicated

Cluster mode manages control-plane records such as:

  • active policy and policy metadata
  • service accounts and auth material
  • integrations
  • cluster and node TLS material
  • other cluster-aware admin state

Each node then rebuilds its own local enforcement state from that replicated control-plane data.

Two Network Surfaces

Clustering uses two distinct listeners:

  1. the main cluster RPC listener
  2. the join listener

The main listener is for steady-state cluster traffic between enrolled nodes.

The join listener exists for bootstrap enrollment before a node has its normal cluster identity and TLS material.

Seed And Joiner Roles

At startup, a clustered node behaves as one of two roles:

  • seed node: starts cluster services without a --join target
  • joining node: enrolls through another node’s join listener before joining normal cluster traffic

That split matters operationally: during bootstrap, both the join listener and the main cluster RPC listener must be reachable.
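
The reachability requirement can be checked before starting a joining node. The following is a minimal sketch, not part of Neuwerk itself: the function names are hypothetical and the ports are supplied by the caller, because this page does not state any defaults. It only confirms that a TCP connection can be opened to each listener; it does not validate TLS or enrollment.

```python
import socket
from typing import Dict, Optional


def startup_role(join_target: Optional[str]) -> str:
    """Mirror the role split: no join target means seed, otherwise joiner."""
    return "seed" if join_target is None else "joiner"


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False


def check_bootstrap_paths(
    host: str, rpc_port: int, join_port: int, timeout: float = 2.0
) -> Dict[str, bool]:
    """Probe both listeners on an existing node (ports are caller-supplied)."""
    return {
        "cluster_rpc": is_reachable(host, rpc_port, timeout),
        "join": is_reachable(host, join_port, timeout),
    }
```

A joining node would call check_bootstrap_paths with the address and the two listener ports of the node it enrolls through, and refuse to proceed if either value is False.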

Local Enforcement Still Matters

Replicated state is not enough by itself. Each node still has to:

  • replay the active policy locally
  • rebuild its local policy store
  • apply that state to its own dataplane and service-plane runtimes

That is why HA readiness depends on both cluster health and policy replication.
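
As an illustration of that dependency, the sketch below models HA readiness as the conjunction of the two conditions. The class, its fields, and the idea of comparing generation counters are assumptions made for this example; this page does not describe how Neuwerk tracks whether a node has caught up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeStatus:
    # All three fields are hypothetical inputs, not Neuwerk API fields.
    cluster_healthy: bool        # outcome of the cluster health check
    replicated_generation: int   # version of the active policy in the replicated store
    applied_generation: int      # version this node has replayed and applied locally

    @property
    def policy_caught_up(self) -> bool:
        """The node has applied at least the currently replicated policy."""
        return self.applied_generation >= self.replicated_generation

    @property
    def ha_ready(self) -> bool:
        """Ready only when the cluster is healthy AND local policy is current."""
        return self.cluster_healthy and self.policy_caught_up
```

A node with a healthy cluster connection but a stale local policy is not ready under this model, which matches the point that replicated state alone is not enough.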

Failure Boundaries

When the cluster is unhealthy, the most common operator-visible effects are:

  • leader-aware API calls fail with 503
  • cluster readiness fails
  • policy_replication readiness fails on nodes that have not caught up

Those are control-plane failures. They do not imply that dataplane state is shared across nodes: flow and NAT state remain node-local, just as in a healthy cluster.
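
Automation that calls leader-aware APIs may want to treat a 503 as a possibly transient condition. The sketch below is one way to do that, assuming the caller wraps its own HTTP request in a function that returns a status code and body. The helper name, the retry count, and the backoff values are all hypothetical, and whether retrying is appropriate depends on the specific call.

```python
import time
from typing import Callable, Tuple


def call_leader_aware(
    request: Callable[[], Tuple[int, str]],
    attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call request(), retrying with exponential backoff while it returns 503.

    Returns the last (status, body) seen, which is still 503 if every
    attempt failed. Any non-503 status is returned immediately.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    status, body = request()
    for attempt in range(1, attempts):
        if status != 503:
            break
        # Wait base_delay, then 2x, 4x, ... before each retry.
        sleep(base_delay * 2 ** (attempt - 1))
        status, body = request()
    return status, body
```

The sleep parameter is injectable so the backoff can be exercised in tests without real delays.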

Recovery Mindset

For production planning, treat clustered state and per-node identity material together:

  • the replicated store is authoritative for shared control-plane state
  • node-local TLS, bootstrap-token, and node identity files still matter for recovery

Read “Back Up And Restore State” before designing disaster recovery around the cluster.