根据大佬的推荐我梳理了一份高质量 AI Engineer 的学习资料清单,值得收藏学习!
太干了太干了!
🥳🥳🥳
一共 11 部分太长了放不下,剩下6部分放评论区。
- Harness engineering,不只是 prompt engineering
文章|Martin Fowler:Harness Engineering for Coding Agent Users — 理解“agent = model + harness”,也就是模型之外的上下文组装、工具接口、状态、执行循环、错误处理、评测与观测层。链接:https://martinfowler.com/articles/harness-engineering.html
文章|Anthropic:Building Effective AI Agents — 学习 agentic workflow、tool use、agent-computer interface、透明规划、简化设计等工程原则。链接:https://www.anthropic.com/research/building-effective-agents
文章|OpenAI:Unrolling the Codex Agent Loop — 看真实 coding agent harness 如何组织模型、工具、prompt、执行循环和性能设计。链接:https://openai.com/index/unrolling-the-codex-agent-loop/
YouTube|How We Build Effective Agents: Barry Zhang, Anthropic — Anthropic agent 架构文章的视频版补充。链接:https://www.youtube.com/watch?v=D7_ipDqhtwk
- Prompt caching vs. semantic caching tradeoffs
官方文档|OpenAI Prompt Caching — 学 prompt caching 的 provider-side 机制、适用条件和 cached token 统计。链接:https://developers.openai.com/api/docs/guides/prompt-caching
官方文档|Anthropic Prompt Caching — 学 automatic caching 与 explicit cache breakpoints 的区别。链接:https://platform.claude.com/docs/en/build-with-claude/prompt-caching
文章|Redis:Prompt caching vs semantic caching — 建立 tradeoff:prompt caching 适合复用固定长上下文;semantic caching 适合复用“语义相近”的问题答案;生产系统常常两者结合。链接:https://redis.io/blog/prompt-caching-vs-semantic-caching/
PDF|GPTCache: An Open-Source Semantic Cache for LLM Applications — 学 semantic cache 的论文级实现:embedding、similarity search、cache hit、错误命中风险、成本/延迟收益。链接:https://aclanthology.org/2023.nlposs-1.24.pdf
- KV cache management at scale
PDF|vLLM / PagedAttention:Efficient Memory Management for LLM Serving — 核心论文,重点看 PagedAttention 如何把 KV cache 分成 block,减少碎片并提升 serving throughput。链接:https://arxiv.org/pdf/2309.06180
文档|vLLM Automatic Prefix Caching Implementation — 看工程实现:KV cache 被分成 KV blocks,并允许非连续物理内存存储。链接:https://docs.vllm.ai/en/v0.6.1/automatic_prefix_caching/details.html
PDF|LMCache: An Efficient KV Cache Layer for Enterprise-Scale LLM Inference — 学跨请求、跨 engine 的 KV cache 复用、offloading、prefill-decode disaggregation。链接:https://lmcache.ai/tech_report.pdf
YouTube|Fast LLM Serving with vLLM and PagedAttention — 配合论文理解 vLLM serving、KV cache、PagedAttention 的直觉。链接:https://www.youtube.com/watch?v=5ZlavKF_98U
- Speculative decoding vs. quantization
PDF|Fast Inference from Transformers via Speculative Decoding — speculative decoding 经典论文,理解 draft model 先猜 token、target model 并行验证的机制。链接:https://arxiv.org/pdf/2211.17192
PDF|QLoRA: Efficient Finetuning of Quantized LLMs — 理解 quantization 基础:4-bit quantized model、LoRA adapter、NF4、double quantization、paged optimizer。链接:https://openreview.net/pdf?id=OUIFPHEgJU
PDF|QSPEC: Speculative Decoding with Complementary Quantization Schemes — 专门研究 speculative decoding 与 quantization 如何结合。链接:https://aclanthology.org/2025.emnlp-main.240.pdf
文章|Google Cloud:Five techniques to reach the efficient frontier of LLM inference — 把 continuous batching、paged attention、routing、speculative decoding、quantization 放在一个 inference optimization 框架里看。链接:https://cloud.google.com/blog/topics/developers-practitioners/five-techniques-to-reach-the-efficient-frontier-of-llm-inference
YouTube|Faster LLMs: Accelerate Inference with Speculative Decoding — speculative decoding 的入门视频。链接:https://www.youtube.com/watch?v=VkWlLSTdHs8
- Structured output failures & fallback chains
官方文档|OpenAI Structured Outputs — 学 JSON Schema、strict schema、structured response 的基本能力和限制。链接:https://developers.openai.com/api/docs/guides/structured-outputs
文章|OpenAI:Introducing Structured Outputs in the API — 理解 JSON mode 与 Structured Outputs 的差别:JSON mode 不等于 schema 一定正确。链接:https://openai.com/index/introducing-structured-outputs-in-the-api/
文档|Instructor:Structured LLM Outputs + Validation / Reasking — 学 Pydantic schema、validation failure 后自动 retry / re-ask 的模式。链接:https://python.useinstructor.com/
文档|Pydantic AI Output Validation — 学模型原生 structured output 之外,为什么还需要应用层 validation 与 retry budget。链接:https://pydantic.dev/docs/ai/core-concepts/output/
文档|Guardrails AI — 学如何把 raw output、validated output、validation success/failure 作为系统状态处理。链接:https://guardrailsai.com/guardrails/docs/concepts/guard
YouTube|Validate & Standardize LLM Output with Guardrails-AI — 输出验证和标准化的实操视频。链接:https://www.youtube.com/watch?v=r3JdQxtxVuM