Volpeon (Bonus) @volpi@icy.wyvern.rip
12mo
I'm now trying Qwen 3 30B A3B, which came out today. I'm using a lightly quantized version because I heard it still runs fast even when mostly offloaded to regular RAM, and it's true.
The Chinese labs, at least, are working on making LLMs more efficient. OpenAI's whole game plan is "send more money and GPUs, please" all the time.