My Exact AI Coding Workflow, Step by Step

The earlier posts in this series were the theory.

This is the actual operating procedure.

If I am building a real feature with AI today, this is roughly the sequence I follow. Not every task uses every step with the same intensity, but this is the baseline workflow I trust.

The point of the workflow is not to make AI look magical.

It is to make the output predictable enough that I can move quickly without pretending first drafts are perfect.

Step 1: Pick a Bounded Task

The workflow starts before the prompt.

I want a task that is clear enough to hand off and review.

That means I avoid starting with something like:

"make the task board easier to use"

Instead I break it down into something more like:

"show a 5-second Undo toast after someone deletes a task from the board"
"allow moving a task between columns by dragging from the task card itself instead of just the handle"
"add a keyboard shortcut to archive completed tasks from the board view"

and so on.

If the task is too broad, the model improvises too much and I lose control of the review surface.

Bounded tasks make everything else easier:

the prompt is clearer
the scenarios are clearer
the diff is clearer
the review is faster

This sounds mundane, but bad task definition poisons the whole chain.

Step 2: Dump Context Fast

Once the task is scoped, I front-load context.

Usually I dictate this rather than typing it, because dictation makes it much easier to include all the details I would otherwise trim.

I will mention things like:

where the relevant files probably are
what the current UX does
what I want changed
what edge cases I am already worried about
what should definitely not regress

The goal is not elegance.

The goal is transmission.

If I know something that would help the model make a better decision, I want it in the prompt instead of in my head.

That usually means the prompt is longer than people expect.

I am not trying to write a beautiful prompt. I am trying to offload as much of the real task context as possible in one pass.

Step 3: Ask for Understanding Before Changes

This is still one of the most important steps in the whole process.

Before I let the model write code, I want it to inspect the relevant part of the codebase and explain it back to me.

A typical first prompt sounds like:

I want you to inspect how task deletion currently works before you change anything. Read the board item actions, the deletion mutation, whatever optimistic update or cache logic exists, and the current toast or notification system. Then explain it back to me in plain English. I want to know how a task gets removed from the UI today, what files own that behavior, what happens immediately on the client versus what waits on the server, and where an undo flow should plug in if we want this to stay reliable. If there is already some pattern elsewhere in the app for reversible actions, call that out too. Do not implement yet. I want understanding first.

The behavior I want is very specific. When someone deletes a task from the board, it should disappear immediately so the UI still feels fast, and at the same time we should show a toast with an Undo action for 5 seconds. If they hit Undo in that window, restore the task to the same column and as close to the same position as we can reasonably get. If they do nothing, then after the 5 seconds we treat the deletion as final. I do not want weird state drift here, especially if someone deletes multiple tasks quickly, refreshes during the undo window, has a slow or failed network request, or if the board changes underneath us while the toast is still open. Please point out any edge cases or tradeoffs you see, and ask clarifying questions before implementation if anything about the current flow is ambiguous.

This is one reason I think dictated prompts work so well.

A short prompt sounds clean, but it usually hides the exact behavior, constraints, and edge cases that actually matter.

Asking for understanding first gives me two things:

the model builds context before acting
I can verify its understanding before trusting the implementation

If the explanation is wrong, I correct it early.

That is far cheaper than correcting a bad implementation later.

Step 4: Write a Thin Spec When the Task Deserves It

For small changes, I skip this.

For anything with moving parts, I usually write a short spec or structured note first.

Not a giant document.

Just enough to define:

files or modules
interfaces
key behaviors
non-negotiable constraints

For example:

## Task Delete Undo
 
### Scope
- Add a 5-second undo flow after deleting a task from the board
- Reuse the current deletion path and existing toast system
 
### Constraints
- Do not change the persisted task schema
- Keep the existing delete entry point and board interactions intact
- Avoid a full board refetch for every delete if the current flow already updates local state
 
### Scenarios
1. Deleting a task removes it immediately and shows an Undo toast
2. Clicking Undo within 5 seconds restores the task to the same column and roughly the same position
3. If no action is taken, the task remains deleted after the timeout expires
4. Multiple deletes in a row do not break the board state or restore the wrong task
5. If the delete fails, the task is restored and the user sees an error state

This is usually enough to keep the model inside the right boundaries.
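To make the scenarios above concrete, here is a minimal sketch of the client-side state they imply, written as pure TypeScript functions. Everything in it is hypothetical: the `Board` shape, the function names, and the choice to remember the preceding task's id for repositioning are assumptions for illustration, not code from a real board. It also leaves open when the server request is actually sent, since the spec only says to reuse the current deletion path.

```typescript
// Hypothetical client-side model of delete-with-undo. All names are illustrative.
type Task = { id: string; title: string };
type PendingDelete = {
  task: Task;
  column: string;
  index: number;          // position at delete time, used as a fallback
  afterId: string | null; // id of the task that sat just before it, if any
  expiresAt: number;      // timestamp (ms) when the undo window closes
};
type Board = { columns: Record<string, Task[]>; pending: PendingDelete[] };

const UNDO_WINDOW_MS = 5000; // the 5-second window from the spec

// Scenario 1: remove the task immediately and remember where it was.
function deleteTask(board: Board, column: string, taskId: string, now: number): Board {
  const tasks = board.columns[column] ?? [];
  const index = tasks.findIndex((t) => t.id === taskId);
  const task = tasks[index];
  if (task === undefined) return board; // unknown task: nothing to do
  const afterId = tasks[index - 1]?.id ?? null;
  return {
    columns: { ...board.columns, [column]: tasks.filter((t) => t.id !== taskId) },
    pending: [...board.pending, { task, column, index, afterId, expiresAt: now + UNDO_WINDOW_MS }],
  };
}

// Put a pending task back. Prefer the slot after its old neighbour; if that
// neighbour is gone, fall back to the old index, clamped to the column length.
function restore(board: Board, entry: PendingDelete): Board {
  const tasks = board.columns[entry.column] ?? [];
  const anchor = entry.afterId === null ? -1 : tasks.findIndex((t) => t.id === entry.afterId);
  const at = anchor >= 0 ? anchor + 1 : Math.min(entry.index, tasks.length);
  return {
    columns: { ...board.columns, [entry.column]: [...tasks.slice(0, at), entry.task, ...tasks.slice(at)] },
    pending: board.pending.filter((p) => p.task.id !== entry.task.id),
  };
}

// Scenarios 2 and 4: Undo restores exactly the requested task, and only
// while its own window is still open.
function undoDelete(board: Board, taskId: string, now: number): Board {
  const entry = board.pending.find((p) => p.task.id === taskId);
  if (entry === undefined || now >= entry.expiresAt) return board;
  return restore(board, entry);
}

// Scenario 5: a failed delete restores the task. This assumes the failure is
// reported while the entry is still pending; showing the error is left to the UI.
function deleteFailed(board: Board, taskId: string): Board {
  const entry = board.pending.find((p) => p.task.id === taskId);
  return entry === undefined ? board : restore(board, entry);
}

// Scenario 3: once a window closes, drop the entry so the delete is final.
function finalizeExpired(board: Board, now: number): Board {
  return { ...board, pending: board.pending.filter((p) => now < p.expiresAt) };
}
```

Passing `now` in as an argument instead of reading the clock keeps every scenario checkable without real timers, which also makes a model's dry run of the code easier to follow.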

Step 5: Define the Scenarios Explicitly

I treat scenarios as the real contract.

If the scenarios are vague, the implementation will be vague.

If the scenarios are sharp, the model has a concrete target and I have a concrete review standard.

This is also where I often ask the model to help:

"List any edge cases or scenarios I am missing."

It is usually good at finding a few that matter.
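One way to keep each scenario sharp is to force it into a starting state, an action, and an observable outcome. The sketch below is a hypothetical format, not an established one: the `Scenario` type, its field names, and `renderScenarios` are all assumptions for illustration. The sample entries restate the first three undo scenarios from the spec above.

```typescript
// Hypothetical scenario shape; the field names are an assumption.
type Scenario = {
  given: string;   // starting state
  when: string;    // the user or system action
  outcome: string; // the observable result to verify
};

// Render scenarios as a numbered checklist that can be pasted into a spec,
// a dry-run prompt, or a review note.
function renderScenarios(title: string, scenarios: Scenario[]): string {
  const lines = scenarios.map(
    (s, i) => `${i + 1}. Given ${s.given}, when ${s.when}, then ${s.outcome}.`
  );
  return [`Scenarios: ${title}`, ...lines].join("\n");
}

const undoScenarios: Scenario[] = [
  {
    given: "a task on the board",
    when: "the user deletes it",
    outcome: "it disappears immediately and an Undo toast appears",
  },
  {
    given: "an open Undo toast",
    when: "the user clicks Undo within 5 seconds",
    outcome: "the task returns to the same column and roughly the same position",
  },
  {
    given: "an open Undo toast",
    when: "5 seconds pass with no action",
    outcome: "the task stays deleted",
  },
];
```

A scenario that cannot be filled in this way, usually because the outcome is not observable, is a sign that it is still too vague to review against.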

By this point, the task has three layers of structure:

contextual prompt
current-state understanding
behavior contract

That is a much better starting point than "build feature X."

Step 6: Let It Implement

Only after the first five steps do I actually ask for code.

Then I get out of the way.

I do not like hovering while the model writes. That tends to make me overreact to implementation details before I have seen the result as a whole.

I would rather let it produce a coherent first pass and then review the output.

This is where a lot of the leverage comes from.

Once the model has enough context and enough structure, implementation becomes the comparatively easy part.

Step 7: Dry Run the Scenarios

Before runtime testing, I often ask for a dry run.

That means a prompt like:

Trace each scenario through the code step by step and tell me where anything breaks.

This catches things like:

missing rollback paths
stale state
bad focus handling
incomplete error handling
branch logic that never actually leads to the expected UI

It is cheap and it works.

The dry run is one of the best bug filters I have.

Step 8: Run Real Feedback Loops

After the dry run, I want actual feedback.

That means some combination of:

linter
type checker
tests
browser interaction
console inspection

This is where the workflow stops being about generation and becomes about convergence.

I will often give instructions like:

Run the relevant checks. Then test the scenarios in the browser. If anything fails, fix it and retest. Report back with what changed and anything you are still uncertain about.

This is the difference between "the model wrote code" and "the model participated in a development loop."

That second version is the one I care about.
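The fix-and-retest loop in that instruction can be sketched as a small function. This is only an illustration under stated assumptions: `Check`, `Failure`, `LoopReport`, and `converge` are hypothetical names, each check is modelled as a function that returns an error message or `null`, and `fix` stands in for whatever the model or the developer does between rounds. It is not the API of any real agent tool.

```typescript
// Hypothetical types for a check-fix-retest loop.
type Check = { name: string; run: () => string | null }; // null means the check passed
type Failure = { check: string; error: string };
type LoopReport = { converged: boolean; rounds: number; remaining: Failure[] };

// Run every check, hand the failures to `fix`, and retest, up to `maxRounds`
// times (expected to be at least 1). Stops early once all checks pass.
function converge(
  checks: Check[],
  fix: (failures: Failure[]) => void,
  maxRounds: number
): LoopReport {
  let failures: Failure[] = [];
  for (let round = 1; round <= maxRounds; round++) {
    failures = [];
    for (const check of checks) {
      const error = check.run();
      if (error !== null) failures.push({ check: check.name, error });
    }
    if (failures.length === 0) {
      return { converged: true, rounds: round, remaining: [] };
    }
    // Do not attempt a fix after the final round, because it would go unverified.
    if (round < maxRounds) fix(failures);
  }
  // Out of rounds: report what is still failing instead of claiming success.
  return { converged: false, rounds: maxRounds, remaining: failures };
}
```

The round cap and the `remaining` list mirror the last part of the instruction: when the loop cannot converge, the useful output is an explicit list of what is still uncertain, not a silent stop.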

Step 9: Review the Diff Like a Pull Request

Once the model has converged as much as it can, I review the result like I would review a PR from a teammate.

I am looking for:

architectural weirdness
missed edge cases
places where the behavior drifted from the spec
unnecessary scope expansion
anything that feels under-tested

I am not trying to prove I could have written it differently.

I am trying to decide whether this output is safe, correct, and aligned.

That is an important mindset shift.

You get much better results from AI when you treat review as review, not as a chance to reassert authorship over every line.

Step 10: Use a Second Model When It Matters

For higher-risk work, I sometimes ask another model to review the output.

That can be surprisingly effective.

Not because the second model is always better, but because it usually has different blind spots.

My prompt is usually direct:

Another model implemented this feature. Review the diff critically. Find bugs, edge cases, design issues, and anything that does not match the intended behavior.

Then I feed that review back into the original loop or handle the fixes myself.

I do not do this for every tiny change.

But for bigger features, it is a useful extra filter.

Step 11: Merge, Then Reset Context

After a feature is done, I prefer to reset cleanly.

Long-running sessions degrade.

Context drifts.

The model starts anchoring to earlier assumptions or irrelevant history.

So once a task is complete, I would usually rather:

merge it
start a fresh session
rebrief the next task cleanly

Fresh context with a good prompt beats stale context with accumulated noise surprisingly often.

That is especially true once you start running multiple agents in parallel.

What This Looks Like End to End

If I compress the workflow into one linear sequence, it is basically this:

scope the task tightly
dump the relevant context
ask the model to understand first
write a thin spec if needed
define scenarios
implement
dry run
run checks and runtime validation
review the diff
optionally cross-review with another model
merge and reset

That is the workflow.

It does not work because any single step is especially clever.

It works because each step reduces a different class of error.

Together, they create a system that is much more reliable than simple one-shot prompting.

Where the Leverage Really Comes From

The biggest misconception about this workflow is that the leverage comes entirely from "AI writes code fast."

The speed is real, but it is not the whole story.

The bigger leverage comes from the fact that once the workflow is stable:

direction gets faster
iteration gets faster
validation gets faster
review gets more focused

That is a much bigger shift than autocomplete.

It changes what part of the job dominates your day.

I spend much less time typing implementation and much more time:

defining behavior
shaping tasks
reviewing outcomes
making architecture decisions

That is why this workflow feels like a genuine change in how software gets built, not just a new productivity trick.

The Most Important Part

If I had to collapse everything into one sentence, it would be this:

AI becomes reliable when you stop treating it like a generator and start treating it like a participant in a full engineering loop.

That is the whole system.

And once you feel how much better that works, it is very hard to go back.
