# This paper completely changed how I think about when RAG should go fetch docu...
Canonical: https://social-archive.org/yena/mexo7jvmjT
Original URL: https://x.com/h100envy/status/2072344857255846024
Author: h100envy
Platform: x
## Content
This paper completely changed how I think about when RAG should go fetch documents:

Draft the next sentence -> Check token confidence -> If low, use it as a query -> Retrieve documents -> Regenerate the sentence

Here is the 5-step blueprint:

1. Forward-looking: instead of retrieving on past context, the model first drafts the upcoming sentence and uses it to decide what it is missing.
2. Confidence threshold: the draft is scanned for low-confidence tokens by probability; a confident sentence is kept as is, with no extra retrieval.
3. Query from the future: a low-confidence sentence is masked on its weak tokens or rewritten into a question and sent to the retriever.
4. Regeneration: the sentence is rewritten on the fetched documents, then the loop moves to the next sentence.
5. Active loop: retrieval fires not on a fixed interval but at the exact moment in generation where the model actually goes shaky.

Key insight: to know what to fetch, do not query the past context, query the draft of what the model is about to say next.

One generic training-free loop beats single-time and fixed-interval retrieval across all four long-form datasets.

Read this, then check the article below.
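To make the loop concrete, here is a minimal Python sketch of the draft -> check -> query -> retrieve -> regenerate cycle described above. It is not the paper's implementation. The names `Draft`, `is_shaky`, `masked_query`, and `active_generate`, the `threshold=0.6` default, and the `draft_fn` / `retrieve_fn` signatures are all assumptions made for illustration. The toy model and retriever at the bottom are hard-coded stand-ins for a real LLM that exposes token probabilities and a real search index. Only the masking variant of query building is shown, not the rewrite-into-a-question variant.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Draft:
    """A drafted sentence as (token, probability) pairs."""
    tokens: List[Tuple[str, float]]

    def text(self) -> str:
        return " ".join(tok for tok, _ in self.tokens)


# Hypothetical interfaces: draft_fn(prompt, written_so_far, docs) -> next sentence
# draft (or None when done); retrieve_fn(query) -> list of document strings.
DraftFn = Callable[[str, List[str], List[str]], Optional[Draft]]
RetrieveFn = Callable[[str], List[str]]


def is_shaky(draft: Draft, threshold: float) -> bool:
    """True if any token probability falls below the confidence threshold."""
    return any(p < threshold for _, p in draft.tokens)


def masked_query(draft: Draft, threshold: float) -> str:
    """Build a query by dropping the low-confidence tokens from the draft."""
    return " ".join(tok for tok, p in draft.tokens if p >= threshold)


def active_generate(prompt: str, draft_fn: DraftFn, retrieve_fn: RetrieveFn,
                    threshold: float = 0.6,
                    max_sentences: int = 8) -> Tuple[List[str], int]:
    """Generate sentence by sentence, retrieving only when a draft is shaky."""
    written: List[str] = []
    retrievals = 0
    for _ in range(max_sentences):
        draft = draft_fn(prompt, written, [])      # draft the next sentence
        if draft is None:                          # model has nothing more to say
            break
        if is_shaky(draft, threshold):             # check token confidence
            # Fall back to the full draft text if every token was masked out.
            query = masked_query(draft, threshold) or draft.text()
            docs = retrieve_fn(query)              # retrieve documents
            retrievals += 1
            regenerated = draft_fn(prompt, written, docs)  # regenerate the sentence
            if regenerated is not None:
                draft = regenerated
        written.append(draft.text())               # confident drafts are kept as is
    return written, retrievals


# Toy stand-ins (assumptions, not real model or index calls).
queries_seen: List[str] = []


def toy_draft(prompt: str, written: List[str], docs: List[str]) -> Optional[Draft]:
    if len(written) == 0:
        return Draft([("Paris", 0.95), ("is", 0.99), ("the", 0.99),
                      ("capital", 0.97), ("of", 0.99), ("France.", 0.96)])
    if len(written) == 1:
        # Without documents the toy model guesses the year with low confidence.
        year = ("1889.", 0.93) if docs else ("1875.", 0.20)
        return Draft([("The", 0.99), ("Eiffel", 0.90), ("Tower", 0.90),
                      ("opened", 0.90), ("in", 0.90), year])
    return None


def toy_retrieve(query: str) -> List[str]:
    queries_seen.append(query)
    return ["The Eiffel Tower opened to the public in 1889."]


sentences, retrievals = active_generate("Tell me about Paris.", toy_draft, toy_retrieve)
print(sentences, retrievals, queries_seen)
```

In this toy run the first sentence is confident and kept with no retrieval. The second draft has one weak token, so the retriever fires once with the masked draft as the query and the sentence is regenerated with the fetched document. With a real model, the threshold would need tuning, since it controls how often retrieval fires.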
