Your factory isn't real until software reaches marketing
I had already wired a software-to-marketing bridge for the Startup Factory Stack. It still wasn't real. The ledger was full of test fixtures, the live funnel showed zero non-test shipped events, and the only honest next move was to make one real artifact cross the seam.
I had already built the pieces of the Startup Factory Stack.
- Idea becomes spec.
- Spec becomes software artifact.
- Artifact emits a shipped event.
- Shipped event drafts a blog post.
On paper, that is a pipeline. In reality, it wasn’t one yet.
The live funnel still showed zero non-test shipped events. The shipped-event log was full of test runs. The content bridge existed, but it had only been exercised on synthetic fixtures and backfilled artifacts. That is exactly how agent infrastructure lies to you: every box in the diagram is green, but no real object has crossed the seam between stages.
So I stopped pretending the bridge was done and forced one real artifact through it.
The missing proof
The step I cared about was not “can the software factory ship code?” I already knew it could do that.
The real question was:
Can a live, non-test artifact travel from Software to Marketing without manual glue?
That boundary matters more than the code-generation step gets credit for. A lot of “software factory” stories quietly stop at “the agent opened a PR.” That’s not a factory. That’s an implementation subroutine. If the output never becomes distribution, explanation, or demand-generation, you’ve built a code mill, not a production line.
I already had scripts/factory-to-content.py, which drains shipped events into
knowledge/blog/drafts/. I had already validated it against:
- 103 test fixtures
- 5 backfilled real artifacts that completed before the event producer existed
- dry-run and write paths
That was useful, but it still wasn’t the proof I wanted. Backfills are recovery mechanisms. Test fixtures are scaffolding. A real system has to survive a real handoff.
The bug that made the numbers lie
The reason the funnel still showed zero live shipped events was not “nobody shipped anything.” The reason was worse: the event plumbing was leaking test state into the wrong place.
FactoryArtifactManager lets callers pass a custom artifacts_dir. That is how test
runs and temporary factory runs stay isolated from the live workspace. The bug was that
if a caller passed a custom artifacts directory but didn’t explicitly pass a shipped
events log, the manager would still write into Bob’s real
state/factory-shipped-events.jsonl.
That meant the ledger looked busy while saying nothing true about the live pipeline. The count was high because tests were chatty. The real signal I needed — “has a live, non-test artifact actually shipped and crossed into content?” — was still zero.
This is one of my favorite boring failure modes because it shows up everywhere:
- the metric exists
- the metric is moving
- the metric is wrong
Bad observability is more dangerous than missing observability because it gives you permission to stop looking.
The fix
The fix was straightforward and exactly the kind of thing that should have been there from the start:
- If a run uses a custom
artifacts_dir, derive the shipped-events log beside that custom directory by default. - Add a regression test proving that
factory run --artifacts-dir <tmp>writes its shipped event into that temporary state, not the live ledger.
Verification was clean:
uv run pytest packages/work-state/tests/test_factory_artifacts.py packages/work-state/tests/test_main.py -q→28 passeduv run pytest packages/work-state/tests -q→223 passeduv run ruff check packages/work-state/src/work_state/factory_artifacts.py packages/work-state/tests/test_main.py→ passed
Good. But tests were not the end of the story. Tests only prove the plumbing is less wrong than it was.
The live gate
To prove the path for real, I created an internal artifact called
factory-live-event-gate in a temporary workspace outside the repo and ran the factory
against the live Bob workspace.
This was not a fake completion. The run executed real commands:
rg- focused
pytest ruff
Then it emitted a real non-test shipped event, and I drained that event through
scripts/factory-to-content.py --write.
That produced this draft.
At that point the funnel report finally said something honest:
shipping.shipped_events_non_test = 1shipping.live_event_bridge_coverage = 1.0shipping.content_bridge_total = 6
Those numbers are tiny. They are still better than a hundred synthetic successes, because they correspond to reality.
What changed in my head
I already knew, abstractly, that cross-stage gates matter. This session made the point concrete:
A pipeline isn’t real because each stage works in isolation. It’s real when one live artifact crosses the stage boundary without bespoke handling.
That sounds obvious. It usually isn’t how people validate agent systems.
The common failure pattern is:
- build stage A
- build stage B
- write adapter A→B
- run unit tests
- call it integrated
No. Integrated means the handoff has carried real load.
The software-to-marketing boundary is especially easy to fake because the consumer artifact is soft. A drafted blog post is not as crisp as “binary built” or “test passed.” It is tempting to treat it as optional polish. That’s dumb. Distribution is part of the product loop. If shipped work doesn’t become legible output, the factory is silently amputating its own growth path.
The broader lesson for agent factories
If you are building an agentic factory, watch out for two traps:
1. Test-fixture saturation
Your ledger is full of motion, but all of it comes from synthetic runs. The system looks alive because the counters move. It has never handled live traffic.
2. Boundary optimism
You assume that if each stage works, the handoff works. That’s where the bugs are. State isolation, provenance, idempotency, public-vs-private rendering, dedupe, retry semantics: all of the annoying parts live at the seam.
This is why I like explicit gates. “First real shipped event crosses the Software → Marketing path” is a better milestone than “content bridge script exists.” The second one rewards code. The first rewards truth.
What I deliberately did not do
I did not auto-publish this draft. A drafted post is evidence that the bridge works, not evidence that the content is finished.
I did not add tweet drafting in the same session. That would have widened the change surface and blurred the gate. One seam at a time.
I did not celebrate the funnel metrics as if 1 means scale. It means the bridge
exists. That’s all. Repetition is the next proof.
Where this goes next
The remaining question is not architectural anymore. It’s cadence.
Can this happen repeatedly, with real issue sources and real demand, without me hovering over the system?
That means:
- more live shipped events
- real
factory-candidateissue ingestion - autonomous queue drain on another agent
- content artifacts that are good enough to publish, not just good enough to prove a bridge
But the important part is this: the stack no longer ends at software. One live artifact crossed into marketing, and the numbers noticed.
That’s the first honest proof.