Duva - Distributed Cache Server

Reconnection on reboot

Duva reconnects to peers automatically after reboot — and it feels really good

One of the little things we added to Duva, a Rust-powered distributed key-value store, is that when a node reboots, it tries to reconnect to the same peers it was talking to before.

It just works.
No re-bootstrap, no manual config tweaking, no weird “am I part of the cluster?” delays.

The node loads a small file of known peers and tries to reconnect as soon as it boots. If the file is older than a set threshold (which should be configurable), the node skips it: there's no point trying to gossip with ghosts.
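
To make the staleness check concrete, here is a minimal sketch of the idea in Rust. It is not Duva's actual code: the function name `load_known_peers`, the one-address-per-line file format, and the use of the file's modification time as its age are all assumptions made for illustration. The current time is passed in as a parameter so the check is easy to exercise.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::time::{Duration, SystemTime};

/// Hypothetical helper: returns the peers recorded in `path`, or an empty
/// list if the file is missing or was last modified more than `max_age`
/// before `now`. Assumes one peer address per line.
fn load_known_peers(
    path: &Path,
    max_age: Duration,
    now: SystemTime,
) -> io::Result<Vec<String>> {
    let metadata = match fs::metadata(path) {
        Ok(m) => m,
        // No peer file yet: nothing to reconnect to.
        Err(e) if e.kind() == io::ErrorKind::NotFound => return Ok(Vec::new()),
        Err(e) => return Err(e),
    };

    // `duration_since` fails when the file looks newer than `now`
    // (for example after a clock adjustment); treat that as age zero.
    let age = now
        .duration_since(metadata.modified()?)
        .unwrap_or(Duration::ZERO);

    // Stale file: skip it rather than chase peers that are likely gone.
    if age > max_age {
        return Ok(Vec::new());
    }

    let contents = fs::read_to_string(path)?;
    Ok(contents
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty())
        .map(String::from)
        .collect())
}
```

A caller would pass something like `load_known_peers(path, Duration::from_secs(300), SystemTime::now())`, with the threshold read from configuration. An empty result means "start without reconnecting", which keeps the missing-file and stale-file cases on the same code path.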

It sounds simple, but I didn't realize how nice it would feel until I started rebooting nodes in dev/testing and seeing them instantly slide back into the cluster like nothing happened.

But it wasn’t completely painless

I ran into some interesting edge cases during this:

  • How does a failed node know if it was a leader? If the node crashed while it was the leader, and it loads back up with a stale view, it might falsely assume it still leads the cluster. I had to make sure the node always re-validates its role by trying to talk to others before assuming leadership.
  • Avoiding conflict with the replicaof command: Duva has a replicaof command that lets you point a node to follow a specific leader manually. But that creates tension: should the node obey replicaof, or reconnect to old peers from before the crash? I had to make sure the reboot reconnection logic respects replicaof if it’s set — basically, user intent wins over automation.
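
Both rules can be sketched as two small decision functions. This is a simplified illustration, not Duva's real implementation: the types `StartupPlan` and `Role`, the functions `plan_startup` and `revalidate_role`, and the idea that each reachable peer simply reports the leader it currently sees are all assumptions. A real consensus protocol would also compare terms or log positions.

```rust
/// Hypothetical: what a node decides to do at boot.
#[derive(Debug, PartialEq)]
enum StartupPlan {
    /// The operator set replicaof: follow that leader.
    FollowConfigured(String),
    /// No replicaof: try the peers saved before the reboot.
    ReconnectToPeers(Vec<String>),
    /// Nothing configured and nothing saved.
    StartFresh,
}

/// Hypothetical: the role a node holds after checking with its peers.
#[derive(Debug, Clone, PartialEq)]
enum Role {
    Leader,
    Follower { leader: String },
    /// No peer answered, so no role is confirmed yet.
    Undecided,
}

/// User intent wins over automation: replicaof is checked first.
fn plan_startup(replicaof: Option<String>, saved_peers: Vec<String>) -> StartupPlan {
    if let Some(leader) = replicaof {
        return StartupPlan::FollowConfigured(leader);
    }
    if saved_peers.is_empty() {
        return StartupPlan::StartFresh;
    }
    StartupPlan::ReconnectToPeers(saved_peers)
}

/// `peer_views` holds, for each peer that answered, the id of the leader
/// that peer currently reports. The role saved before the crash is
/// deliberately not an input, so it can never be trusted by accident.
fn revalidate_role(my_id: &str, peer_views: &[String]) -> Role {
    // Nobody answered: do not claim leadership on a stale view.
    if peer_views.is_empty() {
        return Role::Undecided;
    }
    // Any peer naming a different leader means this node must follow.
    if let Some(other) = peer_views.iter().find(|id| id.as_str() != my_id) {
        return Role::Follower { leader: other.clone() };
    }
    // Every peer that answered still names this node as leader.
    Role::Leader
}
```

For example, `plan_startup(Some("10.0.0.1:6000".to_string()), saved)` ignores `saved` entirely, and `revalidate_role("n1", &[])` returns `Role::Undecided` rather than `Role::Leader`. Checking replicaof before anything else is what keeps the manual command from being overridden by reconnection.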

These were fun challenges to debug, and we're pretty happy with how the final setup behaves. It keeps things simple but smart — nodes that go down come back up quickly, and fall back into their roles with minimal fuss.