
Complete the Loop: Why Your AI Writes Bad Code (and How to Fix It)

9 min read · AI, Workflow, Developer Tools, Testing · Anmol Mahatpurkar

The biggest weakness in most AI coding workflows is not that the model writes bad first drafts.

It is that we as humans are still doing all the validation.

I describe the feature. The model writes some code. Then I open the browser, click around, read the console, notice the failed request, explain the failure back to the model, and repeat.

That is not really delegation.

It is manual QA wrapped around code generation.

Once I started seeing that clearly, a lot of the usual advice started to feel incomplete.

Yes, context matters. Yes, better prompts help. Yes, it is useful to explain the intent precisely.

But the real unlock is giving the agent a way to validate its own work.

If the agent can inspect the running app, run the checks, compare behavior against scenarios, and keep iterating until the evidence looks right, a lot of the "context problem" starts solving itself.

Now it can discover reality instead of guessing at it.

That is what I mean by completing the loop.


What a Human Developer Has That AI Usually Does Not

Think about how a normal development loop works for a human:

  1. you write code
  2. you run it
  3. you look at what happened
  4. you notice what broke
  5. you fix it
  6. you run it again
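That loop can be sketched as a trivial piece of control flow. Everything here is a hypothetical stub (`run_and_observe`, `apply_fix` are stand-ins, not a real API); the only point is the shape: act, observe, correct, repeat.

```python
# A minimal sketch of the human development loop: act, observe, correct, repeat.
# run_and_observe and apply_fix are hypothetical stand-ins, not a real API.

def run_and_observe(code: str) -> list[str]:
    """Run the code and return a list of observed failures (empty means clean)."""
    # Stand-in: flag a known bug marker instead of actually executing anything.
    return ["off-by-one in pagination"] if "BUG" in code else []

def apply_fix(code: str, failure: str) -> str:
    """Produce a revised version of the code that addresses one failure."""
    return code.replace("BUG", "FIXED")

def develop(code: str, max_iterations: int = 10) -> str:
    """Iterate until no failures are observed or the budget runs out."""
    for _ in range(max_iterations):
        failures = run_and_observe(code)
        if not failures:
            return code  # converged: the evidence looks right
        code = apply_fix(code, failures[0])
    raise RuntimeError("did not converge within the iteration budget")
```

Notice that `develop` never promises a correct first draft; it promises convergence, which is the property the rest of this post cares about.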

That loop sounds obvious because it is obvious. It is the core of engineering work.

But most people use AI like this instead:

  1. ask for code
  2. paste the code
  3. see that it is wrong
  4. tell the model "it does not work"
  5. hope it guesses correctly

That second loop is terrible.

The model is not debugging. It is guessing.

And if all you give it is "the button is broken" or "the page is blank" or "I got an error," you are forcing it to infer runtime reality from incomplete, second-hand descriptions.

That is why the loop matters more than the prompt.


Most Teams Already Describe Intent. They Do Not Describe Proof.

Most engineers are actually pretty good at describing what they want.

They say things like:

  • add a settings page for team billing
  • fix the broken onboarding modal
  • make the invite flow handle expired tokens gracefully

That is usually enough intent to start.

What is usually missing is the other half:

  • how the feature should be validated
  • what scenarios must pass
  • how local auth works
  • what browser state matters
  • what request should fire
  • what counts as "done"

So the model produces a draft. The human becomes the test harness. The human becomes the browser. The human becomes the console. The human becomes the network inspector.

And then people conclude that AI is unreliable.

The more precise conclusion is this:

the agent was never allowed to close its own loop.

That distinction matters a lot.

Because once you hand over both the intent and the validation, the workflow changes from "generate something plausible" to "keep working until the evidence matches the intent."


The Core Idea: Give the Agent the Same Feedback You Use

When I say "complete the loop," I mean something simple:

Give the agent access to the signals a competent developer would use to self-correct.

That usually means some combination of:

  • browser access
  • console output
  • network visibility
  • tests
  • linting
  • type checking
  • screenshots or DOM inspection

If the model can make a change, observe the result, and iterate against real feedback, the quality jumps.

If it cannot, you are stuck in an expensive game of telephone.

This is why I think people sometimes write off AI tools too early. They are evaluating them in a workflow where the model has no sensory input after it writes the first draft.

Of course the results are mediocre in that setup.

You removed the most important part of the engineering process.


The Loop Has Three Layers

Over time, I realized I rely on three different kinds of feedback, and the best AI workflows use all three.

1. Static feedback

This is the fast, mechanical layer:

  • linter
  • type checker
  • unit tests

These tools answer questions like:

  • is the code syntactically valid?
  • does it match the type contracts?
  • did it break something obvious?

This layer is non-negotiable. If your codebase does not have solid static feedback, AI work gets much riskier.

2. Logical feedback

This is where scenario tracing comes in.

Before I even run a feature, I often ask the agent to walk through the scenarios mentally:

"Trace what happens when the user clicks save. Which function runs? What request fires? How does the UI update? What happens if the request fails?"

This catches a surprising number of issues before the browser is even involved:

  • missing rollback logic
  • incorrect state transitions
  • unhandled empty states
  • stale cache invalidation
  • bad assumptions about control flow

This step is cheap and high leverage.

3. Runtime feedback

This is the layer most people skip, and it is the one that makes everything feel real.

Can the agent:

  • open the app?
  • click the button?
  • fill the form?
  • read the DOM?
  • inspect the console?
  • see the error?
  • retry after a fix?

If yes, it can actually behave like a developer.

If no, it is still just producing drafts.
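Stripped of any specific browser tooling, the runtime layer is just this loop. A sketch with hypothetical probe callables; in a real setup, `observe` would be backed by something like Playwright or DevTools rather than a plain function:

```python
from typing import Callable

# Hypothetical runtime probes. In a real setup these would be backed by a
# browser driver; here they are plain callables so the shape of the loop
# is visible without any browser machinery.

def runtime_loop(
    attempt_fix: Callable[[str], None],
    observe: Callable[[], list[str]],
    max_rounds: int = 5,
) -> bool:
    """Fix-and-observe until the running app shows no runtime errors."""
    for _ in range(max_rounds):
        errors = observe()          # e.g. console errors, failed requests
        if not errors:
            return True             # evidence matches intent
        attempt_fix(errors[0])      # hand the real error back, not a paraphrase
    return False
```

The key design choice is that `attempt_fix` receives the actual error string, not a human's second-hand summary of it.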


Browser Access Changes the Game

The single biggest upgrade to my AI workflow was giving the agent browser access.

In practice, that means tools like Playwright Skills and Chrome DevTools Skills.

I am not attached to those exact names. The point is the capability.

Once the model can interact with the running application and inspect what happened, the workflow stops being theoretical.

Now I can say:

"The app is running locally. Test the signup flow end to end. Use the scenarios we defined. If anything fails, debug it and tell me what you changed."

That is a very different request from:

"Write a signup form."

The first request creates a loop.

The second request creates a draft.

And that distinction is most of the game.


The Dry Run Is the Cheapest Feedback You Have

Before the browser, before the console, before the test run, there is one step I use constantly because it catches bugs with almost no cost:

the dry run.

For every meaningful feature, I define scenarios and then ask the model to trace them.

For example:

The scenarios for this feature are:
1. User creates a post
2. User edits a post
3. User deletes a post
4. User with expired auth sees the save request fail gracefully
5. Empty state appears when no posts remain

Then I ask:

"Trace each scenario step by step through the code and tell me where anything would fail."

This sounds basic, but it is one of the highest-return prompts I use.

It forces the model to:

  • check whether state updates are coherent
  • notice unhandled branches
  • follow the API flow explicitly
  • compare the code against the intended behavior

A lot of bugs die right there.

Not because the model got smarter, but because the loop got tighter.
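If you keep the scenarios as data rather than prose, the dry-run prompt can be assembled mechanically for every feature. A small sketch; the template wording is mine, not a canonical format:

```python
def dry_run_prompt(feature: str, scenarios: list[str]) -> str:
    """Build a scenario-tracing prompt from a plain list of scenarios."""
    numbered = "\n".join(f"{i}. {s}" for i, s in enumerate(scenarios, start=1))
    return (
        f"The scenarios for {feature} are:\n{numbered}\n\n"
        "Trace each scenario step by step through the code "
        "and tell me where anything would fail."
    )
```

Keeping the scenario list in one place also means the browser run later can test exactly the same list, so the dry run and the runtime check never drift apart.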


A Better Mental Model Than "Generate Code"

The wrong mental model is:

"I ask for code and judge the result."

The better mental model is:

"I am directing an implementation loop."

That loop looks more like this:

  1. define the behavior
  2. let the model implement
  3. let the model inspect
  4. let the model correct
  5. review the result

That subtle shift changes how you prompt.

You stop asking only for output.

You start asking for verification.

Not:

"Build the feature."

But:

"Build the feature, test it against these scenarios, inspect the console, fix what fails, and report back with anything still uncertain."

That is a much better instruction because it mirrors how good engineering work actually happens.


A Concrete Example

Suppose I want an AI agent to build a basic post editor.

A weak prompt would be:

"Build a CRUD UI for blog posts."

A stronger loop-based workflow would be:

  1. Define the scenarios.
  2. Ask the agent to inspect the current data flow first.
  3. Have it implement the feature.
  4. Ask for a dry run on each scenario.
  5. Run the browser flow.
  6. Inspect any console or network failures.
  7. Fix and retest.

That might sound like more work than the one-line prompt.

It is not.

It is less work than debugging vague AI output for an hour.

This is the broader pattern I have found over and over with AI coding:

shortcuts on the input side create work on the correction side.

The loop is how you pay more upfront and much less later.
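One way to make "pay more upfront" concrete is to treat the task hand-off as structured data instead of a one-line prompt. A sketch; the field names are illustrative, not a standard format:

```python
from dataclasses import dataclass, field

@dataclass
class TaskSpec:
    """Everything the agent needs to close its own loop, not just the intent.

    Field names are illustrative, not a standard format.
    """
    intent: str                                              # what to build
    scenarios: list[str] = field(default_factory=list)       # what must pass
    static_checks: list[str] = field(default_factory=list)   # lint/type/test commands
    runtime_checks: list[str] = field(default_factory=list)  # browser flows to exercise

    def is_loop_ready(self) -> bool:
        """A spec with no validation half is just a draft request."""
        return bool(self.scenarios and (self.static_checks or self.runtime_checks))
```

A bare `TaskSpec(intent="Build a CRUD UI for blog posts")` fails `is_loop_ready`; the same intent plus scenarios and checks passes. That check is the difference between the weak prompt and the loop-based workflow above.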


What Complete the Loop Does Not Mean

It does not mean AI becomes perfect.

It does not mean you can stop reviewing code.

It does not mean every feature becomes one-shot.

It means the model can now participate in the same correction cycle that a human engineer would use.

That is enough.

I do not need perfect first drafts.

I need fast convergence.

That is a much more realistic standard, and it is the one that actually creates leverage.


The Minimal Version You Should Start With

If you want the practical takeaway, it is this:

  1. Write down the scenarios before implementation.
  2. Ask the model to inspect the relevant code first.
  3. Make it dry-run the scenarios.
  4. Give it static feedback with lint, types, and tests.
  5. Give it runtime feedback with browser and console access.
  6. Review the diff after it converges.

That is enough to dramatically improve results.

You do not need a huge stack of tools on day one.

You just need to stop treating code generation as the whole task.

The real task is the loop around the generation.

Once you fix that, the model gets much better very quickly.


The New Standard

When I look at AI-generated work now, I am no longer asking:

"Did it write something plausible?"

I am asking:

"Did it have enough feedback to become correct?"

That is the standard that matters.

A model working blind can produce something elegant and still wrong.

A model working inside a complete loop can start rough and still arrive at something shippable.

That is why I think feedback infrastructure matters more than prompt tricks.

It is also why I think so many developers are underestimating these tools.

They are trying to evaluate the output without upgrading the loop.

That is like judging a developer after taking away their browser, tests, and console.

You would never do that to a human engineer.

Do not do it to the model either.

