# I think I just broke local AI inferencing Qwen3.5-35B-A3B AWQ 1,010,000 token...
Canonical: https://social-archive.org/nbluemer/1ZNSyN0TCO
Original URL: https://www.linkedin.com/posts/ownyourai_i-think-i-just-broke-local-ai-inferencing-share-7442864550783123456-bHUd/
Author: Mitko Vasilev
Platform: linkedin
## Content
I think I just broke local AI inferencing.

- Qwen3.5-35B-A3B AWQ
- 1,010,000 tokens context (yeah, ONE Million)
- 4,350,080 tokens KV cache (FOUR POINT THREE Million, not a typo)
- TurboQuant 3.5

All running on a USB-charger-sized GB10 GPU.

Now the fun part, vLLM access log numbers from a cold-start run:

- ~350 tokens/sec generation throughput at peak
- ~260 tokens/sec sustained under load
- 64 concurrent requests handled
- 0.4% to 6.6% KV cache utilization → meaning this thing is barely warming up
- Prefix cache hit rate: 0% (no tricks, raw performance)

My system is not optimized. It's heavily underutilized.

Make sure you own your AI. AI in the cloud is not aligned with you; it's aligned with the company that owns it.
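To put the quoted figures in perspective, here is a rough back-of-the-envelope sketch in Python. It uses only the numbers from the post. The helper names are hypothetical, and two things are assumptions rather than anything the log states: that the utilization percentage maps linearly onto cached tokens, and that the sustained throughput was split evenly across all 64 requests.

```python
# Figures quoted in the post.
MAX_CONTEXT_TOKENS = 1_010_000
KV_CACHE_TOKENS = 4_350_080
PEAK_TOK_PER_SEC = 350
SUSTAINED_TOK_PER_SEC = 260
CONCURRENT_REQUESTS = 64
KV_UTILIZATION_RANGE = (0.004, 0.066)  # 0.4% to 6.6%


def full_context_sequences(kv_tokens: int, context_tokens: int) -> float:
    """How many maximum-length sequences would fit in the KV cache at once."""
    return kv_tokens / context_tokens


def tokens_in_use(kv_tokens: int, utilization: float) -> int:
    """Approximate cached tokens at a given utilization fraction.

    Assumption: utilization maps linearly to tokens; block granularity is ignored.
    """
    return round(kv_tokens * utilization)


def per_request_rate(total_tok_per_sec: float, requests: int) -> float:
    """Average tokens/sec per request.

    Assumption: throughput is shared evenly and all requests decode at once.
    """
    return total_tok_per_sec / requests


if __name__ == "__main__":
    seqs = full_context_sequences(KV_CACHE_TOKENS, MAX_CONTEXT_TOKENS)
    low, high = (tokens_in_use(KV_CACHE_TOKENS, u) for u in KV_UTILIZATION_RANGE)
    rate = per_request_rate(SUSTAINED_TOK_PER_SEC, CONCURRENT_REQUESTS)
    print(f"Full-length sequences that fit in cache: {seqs:.2f}")
    print(f"Tokens cached at 0.4% and 6.6% utilization: {low:,} and {high:,}")
    print(f"Average sustained rate per request: {rate:.2f} tokens/sec")
```

Under those assumptions, the cache holds a little over four full million-token sequences, the run touched roughly 17 thousand to 287 thousand cached tokens, and each of the 64 requests averaged about 4 tokens/sec at the sustained rate.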
