yena shared this post · 2h ago
Mnimiy

NVIDIA's Director just dropped an insane 11-page paper that breaks how everyone is building AI agents.

The shift: you stop trusting the model to be safe. You build a system around it that assumes it's already hijacked.

Plan → Approve → Act → Enforce → Repeat

if your agent reads email, the web, or a tool's output, a single hidden line can give it orders and no prompt will save you.

so they wrap the model: an orchestrator plans, a policy approver checks every action, an enforcer kills it before it touches anything real.

stop asking if the model is safe.

build it so a hijacked one still can't do damage.

read the paper first, then the article below.

47