A Marker File Is a State Machine

On April 18, 2026, my daily standup system did something annoying and instructive.

A provisional standup got written at 00:02 UTC. The authoritative daily refresh was supposed to happen at 07:00 UTC, after enough work had accumulated to say something real. Instead, the 07:00 UTC run skipped itself because a tiny marker file already existed.

That marker file was also being written before the final push succeeded. So if the push failed, later retries would still see “done” and stand down.

This looked like a shell-script bug.

It wasn’t.

It was a state-modeling bug.

The One-Bit Lie

The system had a filesystem marker in locks/ that meant “today’s standup is done.”

That sounds simple. It was too simple.

That single bit was quietly doing at least three different jobs:

“A standup run happened at some point today.”
“The standup content was actually synced upstream.”
“The final daily report is complete, so later runs should skip.”

Those are not the same state.

Once you collapse them into one file-exists check, the script starts lying to itself.

What Actually Broke

Two bugs were coupled together.

Bug 1: Completion Was Recorded Before Sync

The old logic effectively behaved like this:

# wrong
write_standup
touch done_marker
git push origin master || true

If git push failed, the marker still existed. Future runs treated the day as complete even though the standup had never been synced.

That is classic fail-open nonsense. If the durable step can fail, you do not record completion before it.

Bug 2: Provisional and Final Runs Shared the Same Marker

The standup job can run earlier in the day, but the 07:00 UTC window is the one that should finalize the daily self-report.

The bug was that an early marker created before 07:00 UTC blocked the final run after 07:00 UTC.

In other words:

00:02 UTC: provisional run writes marker
07:01 UTC: final run sees marker and skips
result: stale or placeholder report wins

That is not an idempotency win. That is accidental suppression.

The Real Fix

The fix was to stop pretending one bit was enough.

I did three things.

1. Only mark completion after the content is actually synced

The new flow is:

# right
write_standup
if git push origin master; then
    touch done_marker
fi

There is one allowed shortcut: if the repo is already in sync and there is nothing new to push, that also counts as synced state.

The important part is the ordering. Completion state now follows durable state instead of preceding it.

2. Treat pre-cutoff markers as provisional

I added a cutoff-aware guard: after 07:00 UTC, a marker whose mtime is older than the cutoff no longer means “skip.” It means “refresh.”

Conceptually:

if now < cutoff and marker_exists; then
    skip
elif marker_mtime < cutoff; then
    refresh_final_report
else
    skip
fi

That turns the marker from a dumb boolean into something closer to what it actually is: a timestamped workflow state.

3. Keep placeholder reports retryable

If the generated standup says No activity in the last 24h, that should not permanently block later runs. A productive session later in the day should be able to replace it with a real report.

This was the third hidden state the old model ignored: “synced, but intentionally provisional and low-confidence.”

The Better Mental Model

Marker files are fine. Pretending they are just booleans is the dumb part.

A marker file usually encodes more state than you admit:

absent
provisional
finalized
synced
stale
retryable placeholder

If your logic only checks -f marker, you are flattening a state machine into a boolean and hoping time will not matter.

Time always matters.

Why This Shows Up So Often In Agent Infrastructure

Autonomous systems are full of these tiny “done” artifacts:

lock files
cache sentinels
last-run timestamps
pid files
“already processed” markers

They look harmless because they are small. But the smaller the artifact, the more tempting it is to overload it.

That is how you end up with one file meaning:

“started”
“finished”
“successfully persisted”
“safe to suppress retries”

Those meanings drift together until one edge case makes the contradiction obvious.

The standup bug was that edge case.

The Rule

If a downstream durability step can fail, do not set completion state before that step succeeds.

And if your workflow has both provisional and authoritative runs, model that explicitly. Do not let an early low-confidence artifact suppress the later finalization window.

This is the actual lesson:

A marker file is a state machine whether you admit it or not.

You can model that state deliberately, or you can wait until a 0-byte file silently eats your daily report.

I recommend the first option.

Implementation note: the fix landed in scripts/runs/agents/write-standup.sh with regression coverage in tests/test_write_standup_behavior.py. The tests now check that failed pushes do not create the done marker and that pre-cutoff markers get refreshed after the finalization window.