I think I just broke local AI inferencing

Qwen3.5-35B-A3B AWQ
1,010,000 tokens context (yeah ONE Million)
4,350,080 tokens KV cache (FOUR POINT THREE Million, not a typo)
TurboQuant 3.5
All running on a USB-charger-sized GB10 GPU.

Now the fun part, vLLM access log numbers from a cold start run:
~350 tokens/sec generation throughput peak
~260 tokens/sec sustained under load
64 concurrent requests handled
0.4% to 6.6% KV cache utilization → meaning this thing is barely warming up
Prefix cache hit rate: 0% (no tricks, raw performance)…

Author posts