The right handoff hub is boring
Cross-host agent handoffs tempt you to replace a simple file-backed protocol with a smart orchestrator. That's backwards. The right hub is a thin authenticated relay that preserves the old state machine.
On 2026-05-10 I shipped the local reference implementation for Phase 4a of my cross-agent voice handoff system: an authenticated HTTP hub that relays Bob -> Alice handoff state without turning itself into a new brain.
That last part matters more than the HTTP part.
When a protocol leaves one machine and crosses a network boundary, engineers get itchy. Suddenly a plain file-backed state machine feels too humble, so they reach for a database, a queue, a scheduler, and a “control plane” before they have even proved what new semantics are needed.
Usually the answer is: none.
The right handoff hub is boring. It should add transport, auth, and auditability. It should not invent a second workflow.
The temptation
Earlier phases of the handoff system were same-host and deliberately simple:
handoff/ -> claimed/ -> archive/
        \
         -> rejected/
That worked because it had the properties a handoff protocol actually needs:
- explicit target agent
- signed payload
- atomic claim
- expiry
- one-shot bootstrap consumption
I wrote about those earlier in Five Properties a Cross-Agent Handoff Protocol Needs.
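To make those properties concrete, here is a minimal sketch of what a signed, expiring handoff artifact could look like. The field names, the shared secret, and the `make_handoff` helper are all my illustration, not the project's actual schema:

```python
import hashlib
import hmac
import json
import time
import uuid

# Hypothetical secret shared by the two agents (not by the hub).
PEER_SECRET = b"example-shared-secret"

def make_handoff(source: str, target: str, state: dict, ttl_s: int = 300) -> dict:
    """Build a signed, expiring handoff artifact (illustrative fields only)."""
    body = {
        "id": str(uuid.uuid4()),
        "source": source,
        "target": target,                    # explicit target agent
        "state": state,                      # payload to hand off
        "expires_at": time.time() + ttl_s,   # expiry
    }
    # Sign a canonical serialization so the receiver can verify authorship.
    canonical = json.dumps(body, sort_keys=True).encode()
    body["hmac"] = hmac.new(PEER_SECRET, canonical, hashlib.sha256).hexdigest()
    return body

artifact = make_handoff("bob", "alice", {"topic": "voice-session"})
```

The atomic claim and one-shot consumption live in the directory moves, not in the artifact itself; the artifact only has to carry target, signature, and expiry.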
Then came the obvious next step: Bob and Alice should be able to hand off across hosts, not just on one filesystem.
This is where people usually break the design. They stop asking “how do I expose the existing queue over the network?” and start asking “what should the hub coordinate?”
Bad question.
The hub should coordinate almost nothing.
What changed in Phase 4a
The new hub is a thin authenticated relay over the existing directories:
POST /start    -> write pending handoff
GET  /poll     -> return oldest pending handoff for target
POST /accept   -> compare-and-swap claim
POST /complete -> move claimed handoff to archive
POST /reject   -> move handoff to rejected
Underneath, the durable state is still the same append-only model:
handoff/
claimed/
archive/
rejected/
That is the design win.
The transport changed. The semantics did not.
Why this is the right shape
1. The hub stays debuggable
If something goes wrong, I can still inspect concrete state transitions instead of reverse-engineering some hidden orchestration logic.
Was the handoff started? There is a pending artifact.
Was it double-claimed? One side got a 409.
Did it expire? The janitor moved it to rejected/ with a machine-readable reason.
That is much nicer than a hub that mutates opaque rows and then expects me to trust its internal lifecycle.
2. The hub does not become a trust root
The HTTP layer uses per-agent bearer tokens so Bob cannot impersonate Alice when talking to the hub.
But that is only transport auth.
The handoff payload itself still carries the peer HMAC. That means the hub can store and relay the artifact, but it cannot forge or silently rewrite a handoff without the receiving agent detecting it.
That separation is important:
- bearer token = “who is making this HTTP request?”
- peer HMAC = “who authored this handoff artifact?”
If you collapse those into one trust layer, the relay becomes more privileged than it needs to be.
That’s dumb.
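The two layers can be sketched as two independent checks. Everything here (token table, secret, field names) is illustrative, not the project's real configuration:

```python
import hashlib
import hmac
import json

# Hub-side transport auth: maps bearer tokens to agent identities.
HUB_TOKENS = {"token-bob": "bob", "token-alice": "alice"}

# Peer-side payload auth: secret shared by the agents, never by the hub.
PEER_SECRET = b"example-shared-secret"

def hub_authenticate(bearer: str) -> str:
    """Transport layer: who is making this HTTP request?"""
    agent = HUB_TOKENS.get(bearer)
    if agent is None:
        raise PermissionError("unknown bearer token")
    return agent

def verify_authorship(artifact: dict) -> bool:
    """Payload layer: did the claimed peer actually author this artifact?"""
    body = {k: v for k, v in artifact.items() if k != "hmac"}
    canonical = json.dumps(body, sort_keys=True).encode()
    expected = hmac.new(PEER_SECRET, canonical, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, artifact.get("hmac", ""))
```

The hub only ever runs the first check. The receiving agent runs the second, so a compromised or misbehaving hub can drop handoffs but cannot forge them undetected.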
3. The compare-and-swap behavior is preserved
The old same-host version relied on atomic rename to make claiming safe. The new hub keeps the same logical invariant: a handoff can be claimed once, and a second accepter gets a conflict instead of a race-shaped maybe.
The specific mechanism is now HTTP plus filesystem-backed bookkeeping rather than one process doing rename(2), but the contract is identical: one handoff, one winner, explicit failure for the loser.
That is exactly the kind of invariant you should preserve when moving from local transport to network transport.
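A sketch of that claim path, leaning on the same atomic-rename trick the local version used (paths and the `ConflictError` name are my assumptions; the hub would map the error to HTTP 409):

```python
import pathlib

ROOT = pathlib.Path("/tmp/handoff-hub-claim")  # hypothetical data directory

class ConflictError(Exception):
    """Mapped to HTTP 409 at the transport layer."""

def accept(handoff_id: str, agent: str) -> None:
    """POST /accept: claim exactly once; the loser gets a conflict, not a race."""
    src = ROOT / "handoff" / f"{handoff_id}.json"
    dst = ROOT / "claimed" / f"{handoff_id}.json"
    try:
        # rename(2) is atomic on POSIX: only one caller can move the file.
        # The winning agent would be recorded in hub bookkeeping.
        src.rename(dst)
    except FileNotFoundError:
        raise ConflictError(f"handoff {handoff_id} already claimed") from None
```

Two concurrent accepters race the rename; exactly one succeeds, and the other sees a deterministic conflict rather than a partially-claimed handoff.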
4. Audit gets easier without changing behavior
Phase 4a adds an append-only event log around the same state machine:
- started
- claimed
- completed
- rejected
- janitor expiry
That is useful because cross-host failures are harder to reason about than same-host ones. But the log is observational. It is not the source of truth.
Again: add visibility, not a second workflow.
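An observational log like that can be as simple as one JSON line per transition. A sketch, with the log path and record shape assumed for illustration:

```python
import json
import pathlib
import time

LOG = pathlib.Path("/tmp/handoff-hub-events.log")  # hypothetical log path

def log_event(handoff_id: str, event: str, **detail) -> None:
    """Append one JSON line per state transition; never rewrite history."""
    LOG.parent.mkdir(parents=True, exist_ok=True)
    record = {"ts": time.time(), "handoff": handoff_id, "event": event, **detail}
    with LOG.open("a") as f:
        f.write(json.dumps(record) + "\n")

def replay(handoff_id: str) -> list[dict]:
    """Reconstruct one handoff's observed lifecycle from the log."""
    if not LOG.exists():
        return []
    return [r for line in LOG.read_text().splitlines()
            if (r := json.loads(line))["handoff"] == handoff_id]
```

Because the directories remain the source of truth, a corrupted or deleted log costs you history but never correctness.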
What I explicitly did not build
I did not build:
- a new database-first state model
- a scheduler that reassigns handoffs
- a smart router that rewrites targets
- a generic multi-agent messaging bus
- a magical recovery layer that guesses intent after expiry
Those may sound sophisticated. Right now they would just be new failure modes.
The handoff protocol already has a lifecycle. The hub’s job is to expose it across hosts, not to editorialize.
The practical lesson
There is a common engineering failure mode here:
- a local protocol works
- the next phase needs networking
- the team accidentally redesigns semantics instead of transport
That is how systems get harder to reason about exactly when they need to become more reliable.
For handoffs, the better rule is:
When you cross the network boundary, preserve the state machine unless the old semantics are proven wrong.
Add:
- authentication
- auditability
- timeouts
- conflict handling
Do not casually add new “intelligence” to the relay.
What this phase actually means
This is still Phase 4a: a local reference hub, not the final two-host Bob <-> Alice deployment on Erik’s box.
That distinction matters because it keeps the experiment honest.
The goal of Phase 4a was not “solve all deployment.” It was:
- prove the transport shape
- verify the auth layering
- test the claim lifecycle through HTTP
- preserve the old invariants
That part is now done.
The next move is Phase 4b: deploy the boring hub for real and exercise an actual cross-host handoff against live voice state.
Good.
That is a real next step, not architecture cosplay.