High Availability

Boot multiple appliance images, enable cluster mode, and replicate control-plane state while each node continues to enforce traffic locally.

Use this guide when you need more than one node to share control-plane state.

Read Requirements first if you need to understand the runtime assumptions behind the image, the bundled DPDK runtime, or a custom deployment outside the packaged image.

In Neuwerk, high availability means:

  • multiple booted Neuwerk appliance images
  • cluster mode enabled
  • replicated control-plane state
  • leader-aware management behavior

It does not mean a distributed dataplane. Each node still runs its own local dataplane and keeps its own local flow state.

Use the shipped appliance image for each node when possible. The image is the primary distribution artifact, and using it on every node keeps the cluster runtime contract consistent.

What Cluster Mode Adds

Compared with single-node mode, HA adds:

  • the cluster RPC listener
  • the join listener for enrollment
  • a replicated control-plane store
  • secure node join
  • leader-aware API handling
  • policy replay from cluster state to each node’s local enforcement state

Everything else remains familiar: each node still runs a management API, DNS proxy, service plane, and dataplane.

Enabling Cluster Mode

A node enters cluster mode when cluster-related flags are present. The key flags are:

  • --cluster-bind
  • --cluster-join-bind
  • --cluster-advertise
  • --join
  • --cluster-data-dir
  • --node-id-path
  • --bootstrap-token-path

The Two Startup Roles

There are two practical startup patterns:

  • seed node: start a node with cluster settings but without --join
  • joining node: start a node with --join <seed-join-address>

The seed node initializes cluster services. A joining node enrolls through the join listener before it begins steady-state cluster traffic.
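
The two patterns differ only in whether --join is passed. The following is a minimal Python sketch of assembling each command line. Only the flag names come from this guide: the binary name neuwerk, the example addresses, and the helper build_cluster_args are hypothetical, and the sketch assumes each flag takes a single value.

```python
from typing import List, Optional


def build_cluster_args(
    cluster_bind: str,
    advertise: str,
    join: Optional[str] = None,
    binary: str = "neuwerk",  # hypothetical binary name, not confirmed by this guide
) -> List[str]:
    """Assemble an argv list for a seed node (join=None) or a joining node."""
    args = [binary, "--cluster-bind", cluster_bind, "--cluster-advertise", advertise]
    if join is not None:
        # Only a joining node passes --join; a seed node omits it.
        args += ["--join", join]
    return args


# Placeholder addresses; the join address assumes the seed's join listener
# sits one port above its cluster RPC listener.
seed = build_cluster_args("10.0.0.1:9600", "10.0.0.1:9600")
joiner = build_cluster_args("10.0.0.2:9600", "10.0.0.2:9600", join="10.0.0.1:9601")
print(" ".join(seed))
print(" ".join(joiner))
```

Keeping the seed and joiner on one code path makes it harder for the two roles to drift apart in everything except --join.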

For first-admin bootstrap in cluster mode, prefer the cluster auth CLI path described in Get Admin Access rather than minting separate local tokens on each node.

Default Cluster Paths

Unless overridden, cluster mode uses:

  • 127.0.0.1:9600 for the main cluster RPC listener
  • the --cluster-bind address with its port incremented by one for the join listener (127.0.0.1:9601 with the default bind)
  • /var/lib/neuwerk/cluster for cluster data
  • /var/lib/neuwerk/node_id for node identity
  • /var/lib/neuwerk/bootstrap-token for the bootstrap token
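
As a worked example of the join listener default, the sketch below derives it from a cluster bind address. It assumes "cluster-bind + 1" means the same host with the port incremented by one; the helper name default_join_bind is hypothetical and not part of Neuwerk.

```python
def default_join_bind(cluster_bind: str) -> str:
    """Return cluster_bind with its port incremented by one (host:port form)."""
    host, sep, port = cluster_bind.rpartition(":")
    if not sep or not port.isdigit():
        raise ValueError(f"expected host:port, got {cluster_bind!r}")
    return f"{host}:{int(port) + 1}"


print(default_join_bind("127.0.0.1:9600"))  # 127.0.0.1:9601
```

When you override --cluster-bind without setting --cluster-join-bind, the same arithmetic tells you which port to open for enrollment traffic.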

Readiness Expectations

HA adds two readiness concerns that single-node mode does not have:

  • cluster
  • policy_replication

The cluster check fails when the node cannot determine leader health.

The policy_replication check fails when the node has not yet replayed the active cluster policy into its own local enforcement state.
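
A deployment script might gate traffic on these two checks. The sketch below shows one way to do that in Python. This guide does not specify the readiness response format, so the mapping of check name to boolean is an assumed shape, and failing_ha_checks is a hypothetical helper.

```python
from typing import Dict, List

# The two HA-specific readiness concerns named in this guide.
HA_CHECKS = ("cluster", "policy_replication")


def failing_ha_checks(checks: Dict[str, bool]) -> List[str]:
    """Return the HA checks that are missing or not passing."""
    return [name for name in HA_CHECKS if not checks.get(name, False)]


# Assumed snapshot: leader health is known, policy has not been replayed yet.
snapshot = {"cluster": True, "policy_replication": False}
print(failing_ha_checks(snapshot))  # ['policy_replication']
```

Treating a missing check as failing is the conservative choice: a node that does not report policy_replication at all should not be considered ready.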

Post-Deployment Checks

After bringing up a cluster, check these endpoints:

GET /ready
GET /api/v1/stats
POST /api/v1/support/sysdump/cluster

Look for:

  • a known leader
  • healthy followers
  • policy replication caught up on every node
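
The three checks can be prepared with Python's standard library, as in the sketch below. The methods and paths are those listed above. The base URL, port, token value, and use of a Bearer Authorization header are assumptions for illustration, and build_check_requests is a hypothetical helper. The requests are built but not sent.

```python
import urllib.request
from typing import List

# Methods and paths as listed in this guide.
CHECKS = [
    ("GET", "/ready"),
    ("GET", "/api/v1/stats"),
    ("POST", "/api/v1/support/sysdump/cluster"),
]


def build_check_requests(base_url: str, token: str) -> List[urllib.request.Request]:
    """Build one request per post-deployment check without sending it."""
    requests = []
    for method, path in CHECKS:
        req = urllib.request.Request(base_url.rstrip("/") + path, method=method)
        # Bearer auth is an assumption; use the scheme your deployment requires.
        req.add_header("Authorization", f"Bearer {token}")
        requests.append(req)
    return requests


# Placeholder host, port, and token.
for req in build_check_requests("https://node1.example:8443", "EXAMPLE_TOKEN"):
    print(req.get_method(), req.full_url)
```

To run the checks, pass each request to urllib.request.urlopen once TLS trust for the management API is configured, and repeat against every node so that follower health and policy replication are confirmed cluster-wide.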

Warning

A one-node cluster is valid for bootstrap or migration, but it is not highly available. You only gain failover value once replicated state exists on more than one node.