TL;DR: Prompt caching is a great way to save cost + latency when using Claude. Input tokens that use the prompt cache are 10% the cost of non-cached tokens. Auto-caching was just added to the API, which makes it easier to cache your prompt with a single cache_control parameter in the API request (docs here). Also, check out @trq212's deep dive on Claude Code's use of prompt caching and useful tips for cache-friendly prompt design.

{
"cache_control":  {"type": "ephemeral"} 
"messages": [
  { "role": "user", "content": "A" },
  { "role": "assistant", "content": "B" },
  {  "role": "user", "content": "C", }
]
}

The case for caching

Many AI applications ingest the same context across turns. For example, agents perform actions in a loop. Each action produces new context. Claude’s messages API is stateless, which means it doesn’t remember past actions. The agent harness needs to package new context with past actions, tool descriptions, and general instructions at each turn.

Lance Martin

Prompt auto-caching with Claude

The case for caching

Author posts

Prompt auto-caching with Claude

The case for caching