Loop engineering is for rich builders. Do this instead.
Peter Steinberger posts a monthly reminder that you shouldn't be prompting coding agents anymore. You should be designing loops that prompt your agents.
That post has 8.2 million views.
He's right about the shift. Karpathy, Boris Cherny, and Steinberger all renamed the skill. It's loop engineering now.
He also posted a different screenshot.
$1.3 million in OpenAI tokens in 30 days. Just under $20,000 in a single day. 603 billion tokens across about 100 agents.
OpenAI pays that bill because he works there.
The reminder leaves that part out.
A loop makes sense when tokens cost you nothing and your time is the scarce thing. Steinberger has unlimited tokens. You don't.
I run on a budget. You probably do too.
I get most of what a loop gets me for a fraction of the bill. These are the five plays I run instead.
https://pbs.twimg.com/media/HLHHxJvW0AAB5hx.jpg
What you'll get from this:
→ how a loop quietly runs up the bill
→ 5 cheaper plays that land most of the result
→ the one case where a loop is worth paying for
What loop engineering is
A loop is a short program wrapped around an agent.
The agent runs. It checks its own work. If the work isn't done, it runs again. It keeps going until a test passes or a human stops it.
Geoffrey Huntley popularized the version most people copy. They call it the Ralph loop.
Hand it a task, walk away, come back to finished code. When it works, it feels like magic.
The bill is where it stops feeling like magic.
The agent re-sends your full context on every turn, plus every tool result from every turn before it. By turn ten, you're paying to process ten copies of your starting context and all the noise stacked on top.
Input tokens drive the cost, not the code the agent writes back.
A loop with no stop condition has no ceiling on spend. The agent finishes at turn 8, decides to double-check, wanders off at turn 12, and sits at turn 40 still going. The dashboard shows healthy retries the whole time. The invoice shows something else.
A 50-iteration run on a real codebase can pass $50 to $100 in credits. That's one task.
Steinberger ran the same pattern to $1.3 million in a month. He later clarified that figure used Codex Fast Mode, and that turning it off drops the raw cost to around $300,000. That's still a year of engineer salary, and it's still subsidized.
A loop trades your time for the model's tokens. For Steinberger, that's a free trade. For you, every turn is real money leaving your account.
The loop was built for the person who isn't paying.
https://pbs.twimg.com/media/HLHH0OeWIAAdyaT.jpg
Play 1: Spend the thinking before the tokens
Most of a loop's cost goes to recovery.
The agent starts from a vague instruction, guesses what you meant, drifts, and burns turns walking back the drift.
Tell an agent to "build me auth" and it loops for thirty turns deciding what you wanted. Session tokens or JWT? Refresh logic or none? It guesses, you correct, it guesses again, and you pay for every round.
Write the spec first, and the guessing stops. Name the auth flow, the token expiry, the error states, and the definition of done. Hand the agent that, and one pass reaches what thirty turns of looping was circling.
The spec costs you an hour. The loop costs you a thousand turns. One of those is cheaper.
Play 2: Plan cheap, execute expensive
Your flagship model shouldn't be reading files.
Planning, summarizing, looking up a function, renaming variables. A small model does all of it for a fraction of the price.
Anthropic's Haiku runs at $1 per million input tokens and $5 per million output. Opus runs at $5 and $25. On the work that doesn't need a frontier model, that's a 5x markup you're choosing to pay.
Pair a cheap planner with an expensive executor. One model drafts the implementation plan and reads the codebase. The strong agent only touches the part that's genuinely hard to get right.
Most of any task is the easy 90%. Stop routing it to the priciest model you own.
Play 3: Turn on caching
Caching is the fifteen-minute change that cuts your repeated input by 90%.
Anthropic charges a cache read at 10% of the standard input price. You add a cache_control marker to the stable part of your context, and every call after the first reads it at a tenth of the cost.
One published breakdown runs a 50,000-token system prompt across 500 requests a day. Without caching, it's $75 a day. With caching, it's about $7.69. That's roughly $24,500 saved in a year from a single prompt.
A loop re-sends the same context on every turn. That repeated context is the exact thing caching discounts. The habit that runs up the loop bill is the habit caching makes cheap.
For any job that doesn't need an answer right now, batch processing takes another 50% off the whole thing.
https://pbs.twimg.com/media/HLHH3kGWEAANf4l.jpg
Play 4: Engineer the context instead of iterating into it
A loop gets expensive when its context grows, and its context grows every time the agent reads something new.
Cognition, the team behind Devin, gave the rule a name. Delegate reads, centralize writes.
Send the bulky work to a cheap side-agent. Code search, doc lookup, reading forty files to find one function. That agent does the digging and hands back a short summary. Your main agent reads the summary, not the forty files.
The main thread stays a clean, short line of reasoning. It never carries the whole repository in its window, so you never pay to re-process the whole repository on the next turn.
Load the context right once. Stop paying to rediscover it.
Play 5: Be the loop yourself
The cheapest loop runs in your head.
An autonomous loop pays the model to supply judgment it doesn't reliably have. You supply that judgment for free.
Gate the work instead of automating it. The agent does one step. You read it, you correct it, you point it at the next step. Three passes you steered beat thirty the agent ran alone, and you know where every one of them went.
If you do automate, put a ceiling on it. Cap the run at a fixed number of steps. The Vercel AI SDK defaults to 20 for a reason. A loop with no cap is an open tab at the bar, and the agent keeps ordering.
When a loop is worth paying for
Some tasks earn the loop.
If the finish line is a passing test suite, the run is bounded, and an hour of your time is worth more than the tokens, let it run. Overnight refactors and large backlogs you can't hand-process fit that shape.
That's a narrow slice of the work. Most of what you do every day isn't an overnight refactor.
The reframe
Loops are powerful. For most people, they're also expensive in a way the hype skips over.
The people running $20,000 days aren't sharper than you. They just aren't paying.
You don't need to outspend them. You need to out-think them.
Spend the thinking before the tokens. Send cheap work to cheap models. Cache what repeats. Keep the context lean. Stay in the loop yourself.
That's the game.
The model doesn't think. You do.
P. S. Get my Claude skills bundle ������
https://linktr.ee/alex_prompter