NVIDIA's Director just dropped an insane 11-page paper that breaks how everyone is building AI agents.
The shift: you stop trusting the model to be safe. You build a system around it that assumes it's already hijacked.
Plan → Approve → Act → Enforce → Repeat
if your agent reads email, the web, or a tool's output, a single hidden line of injected text can give it orders, and no prompt will save you.
so they wrap the model: an orchestrator plans, a policy approver checks every proposed action, and an enforcer blocks anything unapproved before it touches a real system.
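a minimal Python sketch of that loop, under my own assumptions: it is not the paper's code, and every name here (Action, POLICY, TOOLS, approve, Enforcer, run_agent) is hypothetical. the "plan" is a hard-coded list standing in for model output, with two injected steps mixed in. the point it shows: the enforcer is the only path to real tools and runs only actions the approver granted, so even a hijacked model calling a tool directly gets blocked.

```python
from dataclasses import dataclass, field
from typing import Callable


# Hypothetical: one step the (possibly hijacked) model wants to take.
@dataclass
class Action:
    tool: str
    args: dict = field(default_factory=dict)


# Hypothetical policy: allowlisted tools, each with an argument check.
# Any tool not listed here is denied by default.
POLICY: dict[str, Callable[[dict], bool]] = {
    "read_inbox": lambda a: True,
    "send_email": lambda a: str(a.get("to", "")).endswith("@example.com"),
}

# Hypothetical tools standing in for real side effects.
TOOLS: dict[str, Callable[[dict], str]] = {
    "read_inbox": lambda a: "3 new messages",
    "send_email": lambda a: f"sent to {a['to']}",
    "delete_all_files": lambda a: "deleted everything",
}


class PolicyViolation(Exception):
    pass


def _key(action: Action) -> tuple:
    # Hashable identity of an action: tool name plus sorted arguments.
    return (action.tool, tuple(sorted(action.args.items())))


def approve(action: Action) -> bool:
    """Policy approver: checks a planned action against the allowlist."""
    check = POLICY.get(action.tool)
    return check is not None and check(action.args)


class Enforcer:
    """Sole gateway to real tools; runs only actions that were granted."""

    def __init__(self) -> None:
        self._granted: set[tuple] = set()

    def grant(self, action: Action) -> None:
        self._granted.add(_key(action))

    def run(self, action: Action) -> str:
        key = _key(action)
        if key not in self._granted:
            raise PolicyViolation(f"blocked {action.tool} {action.args}")
        self._granted.discard(key)  # one-shot: a grant cannot be replayed
        return TOOLS[action.tool](action.args)


def run_agent(plan: list[Action], enforcer: Enforcer) -> list[str]:
    """Plan -> Approve -> Act -> Enforce, repeated for every step."""
    log = []
    for action in plan:                    # Repeat over each planned step
        if not approve(action):            # Approve
            log.append(f"denied: {action.tool}")
            continue
        enforcer.grant(action)
        log.append(enforcer.run(action))   # Act, gated by Enforce
    return log


if __name__ == "__main__":
    plan = [
        Action("read_inbox"),
        Action("send_email", {"to": "boss@example.com"}),
        Action("send_email", {"to": "attacker@evil.test"}),  # injected
        Action("delete_all_files"),                           # injected
    ]
    enforcer = Enforcer()
    for line in run_agent(plan, enforcer):
        print(line)
```

the one-shot grant is a design choice in this sketch: an approved action can't be re-run later just because injected text asks for it again.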
stop asking if the model is safe.
build it so a hijacked one still can't do damage.
read the paper first, then the article below.