Hello everyone, leopardracer here.

I moved everything to a headless Mac Mini M4, the base $599 model with 16GB of RAM. Started with smaller Qwen models for message classification and context compression. Then I found a way to run a 35 billion parameter model on 16 gigs. 17 tokens per second. Zero swap. Everyone said you need 32GB minimum. I’ve been running it for about a week now.
Google dropped Gemma 4 under Apache 2.0. Within hours I benchmarked it against my Qwen setup. Classification went from 8.5 seconds to 1.9 seconds. By evening the swap was done. Five files changed.
This is all of it. How I fit a 35B model on a $600 machine, what local models actually do when you’re running an AI agent 24/7 (spoiler: it’s way more than classification), how model routing works across three tiers, and why Gemma 4 made me swap the brain overnight.
https://pbs.twimg.com/media/HFtDh1RXkAA3zqy.png
My Mac Mini runs as a headless AI automation server.