
Complete the Loop: Why Your AI Writes Bad Code (and How to Fix It)

8 min read · AI, Workflow, Developer Tools, Testing · Anmol Mahatpurkar

The most common complaint about AI coding is not that the model is dumb.

It is this:

"It wrote something that looked plausible, but it did not actually work."

That complaint is real. I hit it constantly when I first started using AI for production work.

The mistake I made was assuming the problem lived in the model.

Most of the time, it did not.

It lived in the loop.

I was asking the agent to write code, but I was not giving it any meaningful way to verify whether the code worked. It could not see the browser. It could not click the UI. It could not inspect the console. It could not trace the network requests. It could not compare the actual result against the behavior I wanted.

I had effectively hired a developer and blindfolded them.

Once I understood that, my workflow changed. And so did the quality.


What a Human Developer Has That AI Usually Does Not

Think about how a normal development loop works for a human:

  1. you write code
  2. you run it
  3. you look at what happened
  4. you notice what broke
  5. you fix it
  6. you run it again

That loop sounds obvious because it is obvious. It is the core of engineering work.

But most people use AI like this instead:

  1. ask for code
  2. paste the code
  3. see that it is wrong
  4. tell the model "it does not work"
  5. hope it guesses correctly

That second loop is terrible.

The model is not debugging. It is guessing.

And if all you give it is "the button is broken" or "the page is blank" or "I got an error," you are forcing it to infer runtime reality from incomplete, second-hand descriptions.

That is why the loop matters more than the prompt.


The Core Idea: Give the Agent the Same Feedback You Use

When I say "complete the loop," I mean something simple:

Give the agent access to the signals a competent developer would use to self-correct.

That usually means some combination of:

  • browser access
  • console output
  • network visibility
  • tests
  • linting
  • type checking
  • screenshots or DOM inspection

If the model can make a change, observe the result, and iterate against real feedback, the quality jumps.

If it cannot, you are stuck in an expensive game of telephone.
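That change-observe-iterate cycle can be sketched in a few lines. Everything here is hypothetical scaffolding, not any specific tool's API: `generate` stands in for a model call, `verify` for whatever real feedback you have wired up.

```python
# Minimal sketch of a complete loop: propose, verify against real signals,
# feed the failure back, repeat. All names here are hypothetical.

def run_loop(generate, verify, max_rounds=5):
    """Iterate until verification passes or the round budget runs out."""
    feedback = None
    for _ in range(max_rounds):
        draft = generate(feedback)       # model proposes a change
        ok, feedback = verify(draft)     # tests, console, DOM -- real signals
        if ok:
            return draft                 # converged
    return None                          # did not converge; escalate to a human
```

The important property is that `verify` returns machine-readable failure detail, not a human paraphrase of it.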

This is why I think people sometimes write off AI tools too early. They are evaluating them in a workflow where the model has no sensory input after it writes the first draft.

Of course the results are mediocre in that setup.

You removed the most important part of the engineering process.


The Loop Has Three Layers

Over time, I realized I rely on three different kinds of feedback, and the best AI workflows use all three.

1. Static feedback

This is the fast, mechanical layer:

  • linter
  • type checker
  • unit tests

These tools answer questions like:

  • is the code syntactically valid?
  • does it match the type contracts?
  • did it break something obvious?

This layer is non-negotiable. If your codebase does not have solid static feedback, AI work gets much riskier.

2. Logical feedback

This is where scenario tracing comes in.

Before I even run a feature, I often ask the agent to walk through the scenarios mentally:

"Trace what happens when the user clicks save. Which function runs? What request fires? How does the UI update? What happens if the request fails?"

This catches a surprising number of issues before the browser is even involved:

  • missing rollback logic
  • incorrect state transitions
  • unhandled empty states
  • stale caches from missed invalidation
  • bad assumptions about control flow

This step is cheap and high leverage.

3. Runtime feedback

This is the layer most people skip, and it is the one that makes everything feel real.

Can the agent:

  • open the app?
  • click the button?
  • fill the form?
  • read the DOM?
  • inspect the console?
  • see the error?
  • retry after a fix?

If yes, it can actually behave like a developer.

If no, it is still just producing drafts.


Browser Access Changes the Game

The single biggest upgrade to my AI workflow was giving the agent browser access.

In practice, that means tools like Playwright MCP and Chrome DevTools MCP.

I am not attached to those exact names. The point is the capability.

Once the model can interact with the running application and inspect what happened, the workflow stops being theoretical.

Now I can say:

"The app is running locally. Test the signup flow end to end. Use the scenarios we defined. If anything fails, debug it and tell me what you changed."

That is a very different request from:

"Write a signup form."

The first request creates a loop.

The second request creates a draft.

And that distinction is most of the game.
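A browser tool gives you the full version of this, but there is also a cheap floor worth checking first. This stdlib sketch (the helper name is mine, not from any tool) only confirms the app is reachable and turns the result into feedback text; clicking, reading the DOM, and watching the console still require something like Playwright.

```python
# Cheapest runtime signal: is the app even up? Returns (ok, detail) so the
# detail can be passed straight back to the agent. Hypothetical helper.
from urllib.request import urlopen
from urllib.error import URLError

def app_reachable(url, timeout=5):
    try:
        with urlopen(url, timeout=timeout) as resp:
            return (resp.status == 200, f"HTTP {resp.status} from {url}")
    except URLError as exc:
        return (False, f"could not reach {url}: {exc.reason}")
```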


The Dry Run Is the Cheapest Feedback You Have

Before the browser, before the console, before the test run, there is one step I use constantly because it catches bugs with almost no cost:

the dry run.

For every meaningful feature, I define scenarios and then ask the model to trace them.

For example:

The scenarios for this feature are:
1. User creates a post
2. User edits a post
3. User deletes a post
4. User with expired auth sees the save request fail gracefully
5. Empty state appears when no posts remain

Then I ask:

"Walk through each scenario step by step through the code and tell me where anything would fail."

This sounds basic, but it is one of the highest-return prompts I use.

It forces the model to:

  • check whether state updates are coherent
  • notice unhandled branches
  • follow the API flow explicitly
  • compare the code against the intended behavior

A lot of bugs die right there.

Not because the model got smarter, but because the loop got tighter.
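That prompt can be produced mechanically from the scenario list, which makes it easy to reuse on every feature. A small sketch; the exact wording is just one phrasing that has worked for me.

```python
# Turn a scenario list into a dry-run prompt. Wording is illustrative.

def dry_run_prompt(feature, scenarios):
    steps = "\n".join(f"{i}. {s}" for i, s in enumerate(scenarios, 1))
    return (
        f"The scenarios for {feature} are:\n{steps}\n\n"
        "Walk through each scenario step by step through the code "
        "and tell me where anything would fail."
    )
```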


A Better Mental Model Than "Generate Code"

The wrong mental model is:

"I ask for code and judge the result."

The better mental model is:

"I am directing an implementation loop."

That loop looks more like this:

  1. define the behavior
  2. let the model implement
  3. let the model inspect
  4. let the model correct
  5. review the result

That subtle shift changes how you prompt.

You stop asking only for output.

You start asking for verification.

Not:

"Build the feature."

But:

"Build the feature, test it against these scenarios, inspect the console, fix what fails, and report back with anything still uncertain."

That is a much better instruction because it mirrors how good engineering work actually happens.


A Concrete Example

Suppose I want an AI agent to build a basic post editor.

A weak prompt would be:

"Build a CRUD UI for blog posts."

A stronger loop-based workflow would be:

  1. Define the scenarios.
  2. Ask the agent to inspect the current data flow first.
  3. Have it implement the feature.
  4. Ask for a dry run on each scenario.
  5. Run the browser flow.
  6. Inspect any console or network failures.
  7. Fix and retest.

That might sound like more work than the one-line prompt.

It is not.

It is less work than debugging vague AI output for an hour.

This is the broader pattern I have found over and over with AI coding:

shortcuts on the input side create work on the correction side.

The loop is how you pay more upfront and much less later.


What Complete the Loop Does Not Mean

It does not mean AI becomes perfect.

It does not mean you can stop reviewing code.

It does not mean every feature becomes one-shot.

It means the model can now participate in the same correction cycle that a human engineer would use.

That is enough.

I do not need perfect first drafts.

I need fast convergence.

That is a much more realistic standard, and it is the one that actually creates leverage.


The Minimal Version You Should Start With

If you want the practical takeaway, it is this:

  1. Write down the scenarios before implementation.
  2. Ask the model to inspect the relevant code first.
  3. Make it dry-run the scenarios.
  4. Give it static feedback with lint, types, and tests.
  5. Give it runtime feedback with browser and console access.
  6. Review the diff after it converges.

That is enough to dramatically improve results.
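Those six steps are really an ordered set of gates: each one must pass before the next is worth running, and the first failure is what you hand back to the model. A minimal sketch with hypothetical names:

```python
# Run gates in order; stop at the first failure and return it as feedback.

def run_gates(phases):
    """phases: list of (name, check) where check() -> (ok, detail)."""
    for name, check in phases:
        ok, detail = check()
        if not ok:
            return (name, detail)    # feed this back to the agent verbatim
    return (None, "all gates passed")
```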

You do not need a huge stack of tools on day one.

You just need to stop treating code generation as the whole task.

The real task is the loop around the generation.

Once you fix that, the model gets much better very quickly.


The New Standard

When I look at AI-generated work now, I am no longer asking:

"Did it write something plausible?"

I am asking:

"Did it have enough feedback to become correct?"

That is the standard that matters.

A model working blind can produce something elegant and still wrong.

A model working inside a complete loop can start rough and still arrive at something shippable.

That is why I think feedback infrastructure matters more than prompt tricks.

It is also why I think so many developers are underestimating these tools.

They are trying to evaluate the output without upgrading the loop.

That is like judging a developer after taking away their browser, tests, and console.

You would never do that to a human engineer.

Do not do it to the model either.

