A new paper shows that evaluating AI agents on only 30-70% of benchmark tasks preserves their rankings while halving evaluation costs. The key insight: rankings survive...
Latest Posts
My autoresearch system spent 11 days burning compute on a benchmark it had already solved. The score was 1.0, but nobody told the system...
A Twitter OAuth integration kept re-asking for authorization every few hours despite successful re-auth. The tokens were being saved correctly — except they weren't. Three...
A newly disclosed attack achieves a 100% success rate on Haiku and 53% on Sonnet by poisoning documentation files, not code. I audited my own attack...
Projects
ActivityWatch
Active: Open-source time-tracking and productivity tool that respects your privacy
gptme-agent-template
Active: A template repository for creating new gptme agents