AI giant vows more transparency amid national security concerns

Anthropic will now disclose when requests are downgraded or rejected after criticism over hidden restrictions

US artificial intelligence giant Anthropic said on Wednesday it would make the safeguards governing its most advanced AI models more transparent, including by disclosing when user requests are downgraded or rejected. The move follows criticism over restrictions that were previously not visible to users.

Previously, Anthropic could silently route requests involving areas such as cybersecurity, biology, and advanced AI development from its Fable 5 model to the less capable Opus 4.8. Under the new policy, users will be notified when a request is flagged, while developers using the company's application programming interface (API) will receive an explanation for any rejection or fallback to another model.

The approach of routing some requests related to frontier AI development to a less capable model had drawn criticism from researchers, who argued that the restrictions could slow progress in the field. Responding to the backlash, Anthropic agreed to make the safeguards visible.

“Starting this week, flagged requests will visibly fall back to Opus 4.8 – the same as our safeguards for cyber and bio. You will see this every time it happens. On the API, any flagged requests will return a reason for their refusal,” Anthropic said.

Fable 5 is a publicly released model from Anthropic’s Mythos class, which the company unveiled in April but initially withheld, saying models in the family were too adept at bypassing cybersecurity safeguards and too dangerous for broad deployment. Anthropic released Fable 5 this week, saying its capabilities “exceed those of every model we’ve previously made generally available.”

In its latest statement, Anthropic said it would continue downgrading some requests under policies banning use of its models to build competing AI systems, adding that such restrictions are standard in the industry and do not affect most coding and machine learning work.

The company also cited national security as a reason for rejecting or downgrading some requests, saying it wanted to prevent foreign adversaries from using its technology to strengthen their AI capabilities.

“The US and its allies hold an edge in frontier chips and the highly optimized software that runs them at full potential,” a company spokesperson told Fortune. “These safeguards ensure Claude [Anthropic’s family of AI models] isn’t used to erode that advantage – by optimizing chips developed by those adversaries, for example.”
