Autonomous Agents

Test-Driven Development

January 3, 2026

Sunday Evening 22/02/2026

I've been having some chats lately about best-practice references for 'eval-driven development for autonomous AI agents'.

The idea is to set up (similar to a CI/CD pipeline) a series of acceptance tests that every edit an agent makes to your project has to pass before the agent is allowed to commit its changes. For example, when working on a React project, you can specify a set of acceptance tests in the agents.md files, so that every time a coding agent makes a change, it will first:

check that the dev build still compiles and runs correctly (so you don't find out afterwards that whatever it did broke the build, leaving you to ask it to fix the build or to fix it manually yourself)
check all the npm dependencies and the version numbers of the packages it is using (to prevent uncontrolled growth of extra packages/libraries when it keeps wanting to import a new library for every little task you give it)
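The two checks above can be sketched as a small gate function that decides whether an agent's edit may be committed. This is a minimal, hypothetical sketch, assuming the agent's harness runs the build itself (for example `npm run build`, if the project defines a `build` script) and passes in the exit code, together with the `dependencies` map from package.json before and after the edit. The names `runGate`, `GateInput` and `allowedNewDeps` are illustrative only and are not part of any agent tool's API.

```typescript
// Hypothetical acceptance gate for agent edits (illustrative names only).
// Assumption: the caller supplies the build exit code and the package.json
// "dependencies" maps from the last accepted commit and from the agent's edit.

type Deps = Record<string, string>; // package name -> version range

interface GateInput {
  buildExitCode: number;    // 0 means the build succeeded
  depsBefore: Deps;         // dependencies in the last accepted commit
  depsAfter: Deps;          // dependencies after the agent's edit
  allowedNewDeps: string[]; // packages the agent may add without review
}

interface GateResult {
  passed: boolean;
  failures: string[];
}

// Own-property check, so names like "constructor" are not matched via the prototype.
function has(deps: Deps, name: string): boolean {
  return Object.prototype.hasOwnProperty.call(deps, name);
}

function runGate(input: GateInput): GateResult {
  const failures: string[] = [];

  // Check 1: the build must still succeed.
  if (input.buildExitCode !== 0) {
    failures.push(`build failed with exit code ${input.buildExitCode}`);
  }

  // Check 2: no new packages unless explicitly allowed.
  for (const name of Object.keys(input.depsAfter)) {
    if (!has(input.depsBefore, name) && !input.allowedNewDeps.includes(name)) {
      failures.push(`new dependency not allowed: ${name}`);
    }
  }

  // Check 3: packages that remain must keep their original version range.
  for (const name of Object.keys(input.depsBefore)) {
    if (has(input.depsAfter, name) && input.depsAfter[name] !== input.depsBefore[name]) {
      failures.push(
        `version changed for ${name}: ${input.depsBefore[name]} -> ${input.depsAfter[name]}`
      );
    }
  }

  return { passed: failures.length === 0, failures };
}

// Usage: the agent added an unapproved package, so the gate rejects the edit.
const result = runGate({
  buildExitCode: 0,
  depsBefore: { react: "^18.2.0", "react-dom": "^18.2.0" },
  depsAfter: { react: "^18.2.0", "react-dom": "^18.2.0", "left-pad": "^1.3.0" },
  allowedNewDeps: [],
});
console.log(result.passed);   // false
console.log(result.failures); // [ 'new dependency not allowed: left-pad' ]
```

Comparing against the last accepted package.json, rather than against a fixed list, means the gate only flags what the agent changed. Removing a dependency is deliberately not treated as a failure here; whether that should also require review is a judgment call for your own project.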

I don't yet use this systematically enough when setting up my agents.md files, so I will soon document some best-practice summaries for it here.
