Duva - Distributed Cache Server

Deferred Validation: Safer State-Dependent Writes in Distributed Systems

In distributed systems using consensus protocols like Raft, one of the trickiest challenges is handling operations that depend on the current state—like INCR—in a safe and deterministic way.

In a previous post, we explained how we transformed non-idempotent operations (like INCR) into idempotent ones by performing a read-before-write. While that worked for idempotency, we later realized it introduced subtle race conditions.

This post explains how we addressed those issues by deferring validation until the log entry is applied to the state machine.

The Problem with Pre-Log Validation

Previously, we validated commands before writing them to the Raft log. For example, when handling an INCR request, we would:

  1. Read the current value of the key
  2. Compute the incremented value
  3. Transform the request into a SET command
  4. Write the SET to the log
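Steps 1 to 3 of that pre-log transformation can be sketched as follows. This is a minimal sketch that assumes a plain in-memory `HashMap` as the leader's view of the state. The `Value`, `Command`, and `transform_incr` names are hypothetical and are not Duva's actual types.

```rust
use std::collections::HashMap;

#[derive(Clone, Debug, PartialEq)]
enum Value {
    Number(i64),
    Text(String),
}

#[derive(Clone, Debug, PartialEq)]
enum Command {
    Incr(String),
    Set(String, Value),
}

/// Old approach (hypothetical sketch): rewrite INCR into SET *before* appending
/// to the Raft log, using whatever the leader's state holds right now.
fn transform_incr(state: &HashMap<String, Value>, key: &str) -> Result<Command, String> {
    // Steps 1-2: read the current value and compute the incremented one.
    let next = match state.get(key) {
        Some(Value::Number(n)) => n + 1,
        Some(_) => return Err("Cannot increment non-numeric value".to_string()),
        None => 1,
    };
    // Step 3: the entry to be logged is a plain SET, so replaying it is harmless.
    Ok(Command::Set(key.to_string(), Value::Number(next)))
}
```

Retrying the resulting SET is safe, but the value it carries is fixed at the moment of the read.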

This made retries safe and the operation idempotent. However, it introduced correctness problems under concurrency.

Race Condition Example

Imagine three requests arriving around the same time:

  1. SET x 1
  2. INCR x (reads 1, plans to set 2)
  3. SET x "oops"

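If these commands are validated before logging, each one looks valid against the state the leader sees when it arrives. But the log may order them differently from their reads. If SET x "oops" is committed before the INCR's precomputed SET x 2, applying the entries overwrites "oops" with 2, even though an INCR applied to "oops" should have been rejected. Likewise, two INCRs that both read 1 would both log SET x 2, and one increment would be lost.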
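The interleaving can be replayed in a few lines. This hypothetical sketch applies each entry in log order. The `precompute_incr`, `apply_set`, and `replay_*` helpers are illustrative and are not Duva code. In `replay_race`, the stale SET x 2 silently overwrites "oops". In `replay_lost_update`, two INCRs that both read 1 leave x at 2 instead of 3.

```rust
use std::collections::HashMap;

#[derive(Clone, Debug, PartialEq)]
enum Value {
    Number(i64),
    Text(String),
}

/// Hypothetical sketch of the old scheme: an INCR becomes a SET computed from a
/// read taken when the request arrived, not when its entry is applied.
fn precompute_incr(snapshot: &HashMap<String, Value>, key: &str) -> Option<(String, Value)> {
    match snapshot.get(key) {
        Some(Value::Number(n)) => Some((key.to_string(), Value::Number(n + 1))),
        Some(_) => None, // rejected before logging
        None => Some((key.to_string(), Value::Number(1))),
    }
}

/// Applying a precomputed SET performs no check: it blindly overwrites.
fn apply_set(state: &mut HashMap<String, Value>, entry: (String, Value)) {
    state.insert(entry.0, entry.1);
}

/// The INCR reads x = 1, but SET x "oops" is ordered before its SET x 2 in the log.
fn replay_race() -> HashMap<String, Value> {
    let mut state = HashMap::new();
    apply_set(&mut state, ("x".to_string(), Value::Number(1)));

    // The INCR is validated against x = 1 and turned into SET x 2.
    let incr_as_set = precompute_incr(&state, "x").expect("x is numeric at read time");

    // SET x "oops" is applied first, then the stale SET x 2 clobbers it.
    apply_set(&mut state, ("x".to_string(), Value::Text("oops".to_string())));
    apply_set(&mut state, incr_as_set);
    state
}

/// Two INCRs that both read x = 1 both log SET x 2, so one increment is lost.
fn replay_lost_update() -> HashMap<String, Value> {
    let mut state = HashMap::new();
    apply_set(&mut state, ("x".to_string(), Value::Number(1)));
    let first = precompute_incr(&state, "x").expect("x is numeric");
    let second = precompute_incr(&state, "x").expect("x is numeric");
    apply_set(&mut state, first);
    apply_set(&mut state, second);
    state
}
```

Both replays end with x = 2. Every entry was "valid" when it was checked, yet neither outcome matches the log order of the original requests.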

The Fix: Validate at Apply-Time

We changed our system to record the original request in the Raft log without performing any pre-log validation. Then, validation happens only when the log entry is applied to the state machine.

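use std::collections::HashMap;

// Minimal illustrative types; Duva's real definitions may differ.
#[derive(Clone, Debug, PartialEq)]
enum Value { Number(i64), Text(String) }

enum Command { Incr(String), Set(String, Value) }

// Validation happens here, against the state produced by all earlier log entries.
fn apply_command(state: &mut HashMap<String, Value>, cmd: Command) -> Result<(), String> {
    match cmd {
        Command::Incr(key) => match state.get(&key) {
            Some(&Value::Number(n)) => {
                // Reject overflow instead of panicking inside the state machine.
                let next = n.checked_add(1).ok_or("increment would overflow")?;
                state.insert(key, Value::Number(next));
            }
            // Non-numeric value: reject and leave the state untouched.
            Some(_) => return Err("Cannot increment non-numeric value".into()),
            // Missing key: treat it as 0, so INCR creates it with the value 1.
            None => {
                state.insert(key, Value::Number(1));
            }
        },
        Command::Set(key, value) => {
            state.insert(key, value);
        }
    }
    Ok(())
}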

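Because validation runs at apply time, each command is checked against the state produced by every earlier entry in the log, not against a possibly stale read. Every replica applies the same entries in the same order, so every replica reaches the same verdict for each command.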
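The effect on the race example can be checked by replaying a log of raw intents. This is a hypothetical, self-contained sketch. It uses a variant of `apply_command` with the state passed in explicitly, and the `replay` helper is illustrative. With apply-time validation, the INCR that lands after SET x "oops" is rejected and x keeps its value. Two INCRs after SET x 1 yield 3 instead of losing an update.

```rust
use std::collections::HashMap;

#[derive(Clone, Debug, PartialEq)]
enum Value {
    Number(i64),
    Text(String),
}

#[derive(Clone, Debug)]
enum Command {
    Incr(String),
    Set(String, Value),
}

/// Apply-time validation: each check sees the state left by earlier entries.
fn apply_command(state: &mut HashMap<String, Value>, cmd: Command) -> Result<(), String> {
    match cmd {
        Command::Incr(key) => match state.get(&key) {
            Some(&Value::Number(n)) => {
                let next = n.checked_add(1).ok_or("increment would overflow")?;
                state.insert(key, Value::Number(next));
            }
            Some(_) => return Err("Cannot increment non-numeric value".into()),
            None => {
                state.insert(key, Value::Number(1));
            }
        },
        Command::Set(key, value) => {
            state.insert(key, value);
        }
    }
    Ok(())
}

/// Replays committed entries in log order and records each entry's outcome.
fn replay(log: Vec<Command>) -> (HashMap<String, Value>, Vec<Result<(), String>>) {
    let mut state = HashMap::new();
    let results: Vec<Result<(), String>> =
        log.into_iter().map(|cmd| apply_command(&mut state, cmd)).collect();
    (state, results)
}
```

Because `replay` is a pure function of the log, two replicas fed the same entries end with the same state and the same per-entry results.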

Why This Works Better

  • Eliminates race conditions: Operations are validated against the state as it exists at apply time, in log order.
  • Simplifies the Raft log: We log intent, not precomputed results.
  • Keeps clients simple: No need for clients to read or calculate anything.

It’s a cleaner model: logs record what the client wants to do, and the state machine determines whether it's valid.

Trade-Offs

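  • Delayed error reporting: Invalid operations (e.g., INCR on a string) are rejected only when applied, so the client learns of the failure only after the entry has been committed, and the rejected entry still occupies a slot in the log.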
  • More logic in the state machine: It now handles all validation.
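The delayed error path can be pictured with a hypothetical sketch. The `Leader` type, `propose`, and `apply_committed` are illustrative names, not Duva's API. Replication and commit tracking are reduced to a caller-supplied commit index. The leader appends the raw intent without validating it, hands the client a channel receiver, and sends the outcome only when the entry is applied.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical leader-side bookkeeping: each proposed entry gets a log index,
/// and the waiting client is answered only once that entry has been applied.
struct Leader<C> {
    log: Vec<C>,
    applied: usize,
    pending: HashMap<usize, Sender<Result<(), String>>>,
}

impl<C> Leader<C> {
    fn new() -> Self {
        Leader { log: Vec::new(), applied: 0, pending: HashMap::new() }
    }

    /// Appends the client's intent as-is and returns a receiver for its outcome.
    fn propose(&mut self, cmd: C) -> Receiver<Result<(), String>> {
        let (tx, rx) = channel();
        self.log.push(cmd);
        self.pending.insert(self.log.len() - 1, tx);
        rx
    }

    /// Applies entries below `commit_index` in order using the supplied state
    /// machine, then reports each result to the client waiting on that entry.
    fn apply_committed<F>(&mut self, commit_index: usize, mut apply: F)
    where
        F: FnMut(&C) -> Result<(), String>,
    {
        while self.applied < commit_index.min(self.log.len()) {
            let result = apply(&self.log[self.applied]);
            if let Some(tx) = self.pending.remove(&self.applied) {
                // The client may have disconnected; a failed send is ignored.
                let _ = tx.send(result);
            }
            self.applied += 1;
        }
    }
}
```

Until `apply_committed` reaches an entry, that client's receiver stays empty. An INCR on a string is therefore reported as an error only after it has been committed and applied.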

Still, the benefits in correctness and concurrency safety outweigh the downsides.

Conclusion

State-dependent operations like INCR can’t safely rely on pre-log validation. By moving validation to the state machine apply phase, we eliminate race conditions and make our distributed system more robust and predictable.

From now on, our Raft logs carry only the user’s original intent. The state machine is the sole authority on whether a command is valid—based on the real state at the moment of application.