Cybersecurity researchers aren’t happy about the guardrails on Anthropic’s Fable

Published by AIDaily Editorial Staff
Original author: Lorenzo Franceschi-Bicchierai

Cybersecurity researchers are complaining that Anthropic's new model Fable has guardrails that are too strict for any cybersecurity work.

Anthropic released its latest model Fable on Tuesday, billing it as a public and limited version of its powerful and much-hyped cybersecurity model Mythos.

But not everyone is happy with the restrictions, and a number of cybersecurity researchers and professionals have aired complaints online.

“[Fable] rejects any request that could be tangentially cyber related. Even innocuous tasks like reading a blog post,” said Valentina “Chompie” Palmiotti, a well-known security researcher who works at IBM X-Force.

When a prompt triggers its guardrails, Fable pauses the chat and says that its “safety measures flagged this message for cybersecurity or biology topics.”

The guardrails were put in place to limit the risk that Fable could be used to develop malware or compromise software, a long-standing concern within Anthropic. The restrictions on biology come from a similar concern around developing biological weapons.

When the AI giant released Mythos in April, it restricted the model to a limited number of companies and organizations in what it called Project Glasswing, an effort to deploy the model to secure critical software and infrastructure. Last week, Anthropic expanded access to Mythos to hundreds of organizations in 15 countries.

But despite the good intentions, many cybersecurity experts are still put off by the haphazard nature of the restrictions. Matt Suiche, a cybersecurity veteran, told TechCrunch that “if you ask it to write secure code, it assumes it is cybersecurity related work instead of software engineering best practices, and you get downgraded.” Fable is programmed to fall back to Claude Opus 4.8 if it hits a guardrail. “It seems to be keyword based, so anything in the lexical field of ‘cybersecurity’ triggers the guardrails.”
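
To illustrate why a keyword-based trigger of the kind Suiche describes would catch ordinary engineering requests, here is a minimal sketch. It is not Anthropic's implementation, which has not been published: the term list, the function names, and the model labels are all assumptions made for illustration.

```python
import re

# Hypothetical term list. The article does not say which words, if any,
# Fable actually matches on.
FLAGGED_TERMS = {"secure", "security", "exploit", "vulnerability", "malware"}

# Illustrative labels only, not real API model identifiers.
PRIMARY_MODEL = "fable"
FALLBACK_MODEL = "claude-opus-4.8"


def is_flagged(prompt: str) -> bool:
    """Return True if any flagged term appears as a whole word in the prompt."""
    words = set(re.findall(r"[a-z]+", prompt.lower()))
    return not words.isdisjoint(FLAGGED_TERMS)


def route(prompt: str) -> str:
    """Send flagged prompts to the fallback model, everything else to the primary."""
    return FALLBACK_MODEL if is_flagged(prompt) else PRIMARY_MODEL
```

Under these assumptions, `route("Please write secure code for a login form")` returns the fallback label even though the request is routine software engineering, because the filter looks at vocabulary rather than intent.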

“But it is understandable as we are still in the early days and they are still adapting their guardrails. I am sure they are going to evolve over time as Anthropic and other frontier model companies will collaborate more with the current new generation of cybersecurity companies,” said Suiche, who is a member of the technical staff at Tolmo, an AI cybersecurity startup. “It’s better to catch more people than not enough when you do such a release and to relax the guardrails over time.”

Another researcher griped on X that “even asking for a code review” triggers Fable’s guardrails.

Anthropic did not immediately respond to a request for comment.

Apart from guardrails inside its models, Anthropic requires cybersecurity professionals to apply to the Cyber Verification Program. Approved applicants face fewer limitations on using Claude for cybersecurity work. OpenAI has a similar program called Trusted Access for Cyber.

Lorenzo Franceschi-Bicchierai is a Senior Writer at TechCrunch, where he covers hacking, cybersecurity, surveillance, and privacy.

You can contact or verify outreach from Lorenzo by emailing lorenzo@techcrunch.com, via encrypted message at +1 917 257 1382 on Signal, and @lorenzofb on Keybase/Telegram.

Key points

  • Fable's restrictions may limit the effectiveness of AI tools in cybersecurity.
  • Collaboration between technology companies and security specialists is essential for developing effective solutions in Brazil.
  • How the guardrails evolve will be crucial to ensuring that AI can be used effectively without compromising security.

Editorial analysis

Cybersecurity researchers' dissatisfaction with the restrictions on Anthropic's Fable model highlights a crucial dilemma at the intersection of artificial intelligence and digital security. In Brazil, where cybersecurity is a growing concern, especially as cyberattacks increase, the ability to use AI tools to strengthen defenses is vital. However, rigid guardrails can limit the effectiveness of those tools in a context where adaptability and innovation are essential.

The criticism of Fable reveals a challenge common to AI developers: balancing safety and usefulness. Although the intentions behind the restrictions are understandable, applying them too broadly can breed frustration and distrust among practitioners. This is particularly relevant in Brazil, where collaboration between technology companies and security specialists is fundamental to developing robust, effective solutions.

The future of cybersecurity in an increasingly digitized world will depend on the ability of companies like Anthropic to adjust their approaches. The evolution of the guardrails, as experts suggest, will be crucial to ensuring that AI can be used effectively without compromising security. The interaction between AI companies and cybersecurity startups in Brazil may be an important indicator of how these technologies will develop and integrate into the local ecosystem.

Finally, it is worth watching how Anthropic and other AI companies respond to feedback from the cybersecurity community. A willingness to adapt their technologies may not only improve market acceptance but also contribute to a safer, more resilient digital environment in Brazil and beyond.

Original source:

TechCrunch AI

About this article

This article was curated and published by AIDaily as part of our editorial coverage of developments in artificial intelligence. The content is based on the original source cited above, enriched with context and editorial analysis. Automated tools may assist with translation and initial structuring, but the decision to publish, the fact-checking, and the contextual framing remain an editorial responsibility.
