Wanna replace Anthropic/OpenAI? START WITH THIS
The bible for running LLMs locally is now available online to read for free
Covers what software to use on each setup:
Laptop / edge / odd hardware: llama.cpp
Mac-first workflows: MLX / MLX-LM
Single RTX GPUs: ExLlamaV2
2-4+ NVIDIA / CUDA GPUs: ExLlamaV3
General production serving: vLLM
Long-context / MoE / routing: SGLang
NVIDIA max performance: TensorRT-LLM
Cluster orchestration: NVIDIA Dynamo
You should read this, and if you can't right now, you most definitely wanna bookmark it for later
Open-source & Local AI FTW