烟花老师 on X

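Following recommendations from experts, I put together a high-quality list of learning resources for AI Engineers. Worth bookmarking and studying!

All substance, no filler!

🥳🥳🥳

There are 11 parts in total, too long to fit here, so the remaining 6 parts are in the comments.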

Harness engineering, not just prompt engineering

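Article | Martin Fowler: Harness Engineering for Coding Agent Users. Understand "agent = model + harness": everything outside the model, including context assembly, tool interfaces, state, the execution loop, error handling, evaluation, and the observability layer. Link: https://martinfowler.com/articles/harness-engineering.html

Article | Anthropic: Building Effective AI Agents. Learn engineering principles such as agentic workflows, tool use, the agent-computer interface, transparent planning, and keeping the design simple. Link: https://www.anthropic.com/research/building-effective-agents

Article | OpenAI: Unrolling the Codex Agent Loop. See how a real coding agent harness organizes the model, tools, prompts, the execution loop, and performance design. Link: https://openai.com/index/unrolling-the-codex-agent-loop/

YouTube | How We Build Effective Agents: Barry Zhang, Anthropic. A video companion to Anthropic's agent architecture article. Link: https://www.youtube.com/watch?v=D7_ipDqhtwk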
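
To make "agent = model + harness" concrete, here is a minimal Python sketch of the harness side: the tool registry, the state, the execution loop, and the error handling. It is an illustration only. `stub_model`, the `TOOL` / `FINAL` text protocol, and the `add` tool are all hypothetical and are not taken from any of the articles above.

```python
from typing import Callable, Dict, List

# Hypothetical tool registry; the tool name and behavior are illustrative only.
TOOLS: Dict[str, Callable[[str], str]] = {
    "add": lambda arg: str(sum(int(x) for x in arg.split("+"))),
}


def stub_model(context: List[str]) -> str:
    """Stand-in for a real model call. Replies 'TOOL <name> <arg>' or 'FINAL <text>'."""
    last = context[-1]
    if last.startswith("OBSERVATION "):
        return "FINAL " + last[len("OBSERVATION "):]
    return "TOOL add 2+3"


def run_agent(task: str, model: Callable[[List[str]], str] = stub_model,
              max_steps: int = 5) -> str:
    context = [f"TASK {task}"]            # state owned by the harness, not the model
    for _ in range(max_steps):            # execution loop with a step budget
        action = model(context)
        if action.startswith("FINAL "):
            return action[len("FINAL "):]
        try:
            _, name, arg = action.split(" ", 2)
            result = TOOLS[name](arg)     # tool interface
        except Exception as exc:          # error handling: feed the failure back
            result = f"error: {exc!r}"
        context.append(f"OBSERVATION {result}")
    return "gave up: step budget exhausted"
```

Calling `run_agent("what is 2+3")` goes through one tool call and then returns the model's final answer. A model that keeps asking for an unknown tool runs into the step budget instead of looping forever.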

Prompt caching vs. semantic caching tradeoffs

Official docs | OpenAI Prompt Caching. Learn the provider-side mechanism of prompt caching, when it applies, and how cached tokens are reported. Link: https://developers.openai.com/api/docs/guides/prompt-caching

Official docs | Anthropic Prompt Caching. Learn the difference between automatic caching and explicit cache breakpoints. Link: https://platform.claude.com/docs/en/build-with-claude/prompt-caching

Article | Redis: Prompt caching vs semantic caching. Build a picture of the tradeoff: prompt caching suits reusing a fixed, long context; semantic caching suits reusing answers to "semantically similar" questions; production systems often combine the two. Link: https://redis.io/blog/prompt-caching-vs-semantic-caching/

PDF | GPTCache: An Open-Source Semantic Cache for LLM Applications. A paper-level implementation of a semantic cache: embeddings, similarity search, cache hits, the risk of false hits, and cost/latency gains. Link: https://aclanthology.org/2023.nlposs-1.24.pdf
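
The difference between the two cache types can be sketched in a few lines of Python. This is a toy under loud assumptions: `prefix_key` only imitates the idea of exact-prefix matching, `toy_embed` is a bag-of-words stand-in for a real embedding model, and the 0.8 threshold is an arbitrary illustrative value. None of this is the GPTCache, OpenAI, or Anthropic API.

```python
import hashlib
import math
from collections import Counter
from typing import List, Optional, Tuple


def prefix_key(prompt_prefix: str) -> str:
    """Exact-match key: any change in the prefix gives a different key."""
    return hashlib.sha256(prompt_prefix.encode("utf-8")).hexdigest()


def toy_embed(text: str) -> Counter:
    """Bag-of-words stand-in for a real embedding model (assumption)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


class SemanticCache:
    """Returns a stored answer when a new question is similar enough."""

    def __init__(self, threshold: float = 0.8) -> None:
        self.threshold = threshold        # lower = more hits, more false hits
        self.entries: List[Tuple[Counter, str]] = []

    def put(self, question: str, answer: str) -> None:
        self.entries.append((toy_embed(question), answer))

    def get(self, question: str) -> Optional[str]:
        query = toy_embed(question)
        best_score, best_answer = 0.0, None
        for emb, answer in self.entries:  # linear scan instead of a vector index
            score = cosine(query, emb)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None
```

The threshold is where the false-hit risk lives: two questions can share most of their words and still need different answers, so a cache tuned for a high hit rate will sometimes return the wrong stored answer.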

KV cache management at scale

PDF | vLLM / PagedAttention: Efficient Memory Management for LLM Serving. The core paper; focus on how PagedAttention splits the KV cache into blocks to reduce fragmentation and raise serving throughput. Link: https://arxiv.org/pdf/2309.06180

Docs | vLLM Automatic Prefix Caching Implementation. See the engineering implementation: the KV cache is split into KV blocks, which may be stored in non-contiguous physical memory. Link: https://docs.vllm.ai/en/v0.6.1/automatic_prefix_caching/details.html

PDF | LMCache: An Efficient KV Cache Layer for Enterprise-Scale LLM Inference. Learn KV cache reuse across requests and across engines, offloading, and prefill-decode disaggregation. Link: https://lmcache.ai/tech_report.pdf

YouTube | Fast LLM Serving with vLLM and PagedAttention. Pair with the paper to build intuition for vLLM serving, the KV cache, and PagedAttention. Link: https://www.youtube.com/watch?v=5ZlavKF_98U
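
The block idea is easier to see in code. Below is a toy Python sketch of a block table that maps a sequence's logical KV blocks to physical block ids and lets two sequences share the blocks of a common prefix. It is not vLLM's implementation: `BlockPool`, `block_table`, and the block size of 4 are hypothetical, it stores ids only (no tensors), and it models neither reference counting nor eviction.

```python
from typing import Dict, List, Tuple

BLOCK_SIZE = 4  # tokens per KV block (illustrative value)


class BlockPool:
    """Toy pool of physical KV blocks with prefix sharing for full blocks."""

    def __init__(self, num_blocks: int) -> None:
        self.free: List[int] = list(range(num_blocks))   # free physical block ids
        self.cached: Dict[Tuple[int, ...], int] = {}     # token prefix -> block id

    def block_table(self, tokens: List[int]) -> List[int]:
        """Map a sequence's logical blocks to physical block ids."""
        table = []
        for start in range(0, len(tokens), BLOCK_SIZE):
            end = start + BLOCK_SIZE
            # A full block is identified by every token up to its end, because
            # its KV values depend on the whole prefix, not only its own tokens.
            key = tuple(tokens[:end]) if end <= len(tokens) else None
            if key is not None and key in self.cached:
                table.append(self.cached[key])           # reuse, no new memory
                continue
            if not self.free:
                raise MemoryError("no free KV blocks (eviction is not modeled)")
            block = self.free.pop()                      # any free block will do
            if key is not None:
                self.cached[key] = block
            table.append(block)
        return table
```

With a pool of 8 blocks, a 10-token sequence and a 9-token sequence that share their first 8 tokens end up with the same two leading physical blocks and differ only in the last, partially filled one. The physical ids do not need to be adjacent, which is the point of paging.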

Speculative decoding vs. quantization

PDF | Fast Inference from Transformers via Speculative Decoding. The classic speculative decoding paper; understand the mechanism in which a draft model guesses tokens first and the target model verifies them in parallel. Link: https://arxiv.org/pdf/2211.17192

PDF | QLoRA: Efficient Finetuning of Quantized LLMs. Understand quantization basics: 4-bit quantized models, LoRA adapters, NF4, double quantization, and paged optimizers. Link: https://openreview.net/pdf?id=OUIFPHEgJU

PDF | QSPEC: Speculative Decoding with Complementary Quantization Schemes. Specifically studies how speculative decoding and quantization can be combined. Link: https://aclanthology.org/2025.emnlp-main.240.pdf

Article | Google Cloud: Five techniques to reach the efficient frontier of LLM inference. Places continuous batching, paged attention, routing, speculative decoding, and quantization within a single inference optimization framework. Link: https://cloud.google.com/blog/topics/developers-practitioners/five-techniques-to-reach-the-efficient-frontier-of-llm-inference

YouTube | Faster LLMs: Accelerate Inference with Speculative Decoding. An introductory video on speculative decoding. Link: https://www.youtube.com/watch?v=VkWlLSTdHs8
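
The "draft guesses, target verifies" mechanism can be sketched in Python. Note the simplification: this is the greedy variant, where a drafted token is accepted only if it equals the target's own greedy choice. The paper uses a rejection-sampling rule over probabilities instead. The toy models `toy_target` and `toy_draft` are made up for illustration, and verification here is a plain loop rather than one batched forward pass.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]   # greedy next-token function


def speculative_step(prefix: List[int], draft: NextToken, target: NextToken,
                     k: int = 4) -> List[int]:
    """Return the tokens produced by one draft-then-verify round."""
    # 1) The cheap draft model proposes k tokens autoregressively.
    proposal: List[int] = []
    for _ in range(k):
        proposal.append(draft(prefix + proposal))
    # 2) The target checks each position. A real system scores all k
    #    positions in a single forward pass; this loop only shows the logic.
    accepted: List[int] = []
    for tok in proposal:
        expected = target(prefix + accepted)
        if tok != expected:
            accepted.append(expected)    # first mismatch: keep the target's token
            return accepted
        accepted.append(tok)
    # 3) Every guess matched, so the target contributes one extra token.
    accepted.append(target(prefix + accepted))
    return accepted


def generate(prefix: List[int], draft: NextToken, target: NextToken,
             n_tokens: int, k: int = 4) -> List[int]:
    out = list(prefix)
    while len(out) - len(prefix) < n_tokens:
        out.extend(speculative_step(out, draft, target, k))
    return out[: len(prefix) + n_tokens]


def toy_target(ctx: List[int]) -> int:   # hypothetical "large" model
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[int]) -> int:    # hypothetical "small" model, wrong after a 3
    return 0 if ctx[-1] == 3 else (ctx[-1] + 1) % 10
```

In this greedy setup the output is identical to decoding with the target alone; the draft only changes how many target calls are needed per accepted token. Quantization is a different lever: it makes each forward pass cheaper at some cost in precision.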

Structured output failures & fallback chains

Official docs | OpenAI Structured Outputs. Learn the basic capabilities and limits of JSON Schema, strict schemas, and structured responses. Link: https://developers.openai.com/api/docs/guides/structured-outputs

Article | OpenAI: Introducing Structured Outputs in the API. Understand the difference between JSON mode and Structured Outputs: JSON mode does not guarantee that the output matches the schema. Link: https://openai.com/index/introducing-structured-outputs-in-the-api/

Docs | Instructor: Structured LLM Outputs + Validation / Reasking. Learn the pattern of Pydantic schemas plus automatic retry / re-ask after a validation failure. Link: https://python.useinstructor.com/

Docs | Pydantic AI Output Validation. Learn why, beyond the model's native structured output, you still need application-layer validation and a retry budget. Link: https://pydantic.dev/docs/ai/core-concepts/output/

Docs | Guardrails AI. Learn how to treat raw output, validated output, and validation success/failure as system state. Link: https://guardrailsai.com/guardrails/docs/concepts/guard

YouTube | Validate & Standardize LLM Output with Guardrails-AI. A hands-on video on output validation and standardization. Link: https://www.youtube.com/watch?v=r3JdQxtxVuM
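
The validate, re-ask, then fall back pattern can be sketched with only the Python standard library. This is not the Instructor, Pydantic AI, or Guardrails API: the `REQUIRED` schema, `validate`, `call_with_fallback`, and the two stub models are hypothetical stand-ins, and a real project would likely use a Pydantic model for the schema.

```python
import json
from typing import Callable, Dict, List, Optional

REQUIRED = {"name": str, "age": int}     # hypothetical schema: field -> type


def validate(raw: str) -> Dict:
    """Parse and check raw model output; raise ValueError on any problem."""
    data = json.loads(raw)               # JSONDecodeError is a ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for key, typ in REQUIRED.items():
        if not isinstance(data.get(key), typ):
            raise ValueError(f"field {key!r} must be {typ.__name__}")
    return data


def call_with_fallback(prompt: str, models: List[Callable[[str], str]],
                       retries_per_model: int = 2) -> Optional[Dict]:
    """Try each model in order; re-ask with the error until the budget runs out."""
    for model in models:                 # fallback chain
        ask = prompt
        for _ in range(retries_per_model):   # retry budget per model
            raw = model(ask)
            try:
                return validate(raw)
            except ValueError as exc:
                # Re-ask: tell the model what was wrong with its last output.
                ask = f"{prompt}\nPrevious output was invalid: {exc}. Return JSON only."
    return None                          # every model exhausted its budget


def primary_stub(prompt: str) -> str:    # stand-in model that omits "age"
    return '{"name": "Ada"}'


def fallback_stub(prompt: str) -> str:   # stand-in model that answers correctly
    return '{"name": "Ada", "age": 36}'
```

Here `call_with_fallback("extract the person", [primary_stub, fallback_stub])` fails validation twice on the primary and then succeeds on the fallback. Returning `None` rather than raising keeps "validation failed" as an explicit state the caller has to handle.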