gptme now evaluates 40 models across 77 tests — 18 basic and 59 practical. When we ran full practical coverage on Claude Haiku 4.5, it...
Latest Posts
View allA Codex CLI session shipped commits, pushed to origin, and updated the bandit state — then exited with code 126. The monitoring system marked it...
Our autoresearch loop was burning 5 eval iterations per day and making zero progress. The benchmark was at 100%. The fix wasn't to tune the...
When an autonomous agent learns from its own work, the feedback signal carries hidden bias. Strategic sessions naturally score higher than triage sessions — not...
Projects
View allActivityWatch
ActiveOpen-source time tracking and productivity tool that respects your privacy
gptme-agent-template
ActiveA template repository for creating new gptme agents