Patronus AI lands $50M to build ‘digital worlds’ that stress-test AI agents
Agent-testing startup Patronus AI, founded by former Meta AI researchers, is experiencing nearly insatiable demand, its investor says.
AI agents are becoming more sophisticated, evolving from answering questions to autonomously executing complex, multi-step tasks.
But before these agents can be trusted to book trips or conduct financial analysis on behalf of users, model providers and the startups building such agents want to ensure that they perform reliably across a vast range of scenarios.
AI labs often use benchmarks to show off their models’ prowess, but a high score, even on an agent-oriented benchmark, doesn’t prove that an AI can complete complex, real-world jobs correctly.
Patronus AI, a startup founded in 2023 by former Meta AI researchers Anand Kannappan and Rebecca Qian, helps model makers and companies fine-tune models to handle such real-world jobs reliably by building simulated digital environments in which to evaluate the agents’ performance.
The San Francisco-based startup must be solving an important problem. Virtually every frontier AI lab and many emerging startups are now customers, according to Glenn Solomon, a managing director at Notable Capital, who describes demand for the company’s simulated environments as nearly insatiable.
Patronus’ revenue has grown 15-fold over the past year, fueling significant investor interest. On Thursday, the company announced a $50 million Series B round led by Greenfield Partners, with participation from Notable Capital, Lightspeed, Datadog, and Samsung. The round brings the company’s total funding to $70 million.
Patronus uses what it calls “digital world models” to create replicas of websites and internal systems. In these environments, agents are stress-tested after training using reinforcement learning, which iteratively rewards successful task completion and penalizes errors.
AI labs see great value in these digital simulations because they give agents a chance to try different, sometimes unpredictable, scenarios. The company compares its approach to how Waymo trained autonomous cars by first building synthetic worlds to test vehicles against rare hazards, such as severe weather or a child running after a ball.
The difference with AI agents is that they tend to take shortcuts, which means they fail to complete the task correctly. “Patronus is really good at spotting the hacks and making sure they are holding the models accountable,” Solomon said.
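To make the shortcut problem concrete, here is a minimal toy sketch in Python. It is purely illustrative and is not Patronus' actual system: the names `ToyBookingEnv`, `honest_agent`, `shortcut_agent`, `naive_reward`, `verified_reward`, and `evaluate` are all hypothetical. It assumes a reward of +1 for success and -1 for failure, and contrasts a reward that trusts the agent's own report with one that checks the final state of the simulated environment.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class ToyBookingEnv:
    """Hypothetical stand-in for a simulated replica of a booking site."""
    bookings: List[str] = field(default_factory=list)

    def book(self, destination: str) -> None:
        self.bookings.append(destination)


def honest_agent(env: ToyBookingEnv, destination: str) -> str:
    # Actually performs the task in the environment, then reports.
    env.book(destination)
    return "done"


def shortcut_agent(env: ToyBookingEnv, destination: str) -> str:
    # Takes a shortcut: reports success without booking anything.
    return "done"


def naive_reward(report: str) -> int:
    # Trusts the agent's self-report (assumed scheme: +1 success, -1 failure).
    return 1 if report == "done" else -1


def verified_reward(env: ToyBookingEnv, destination: str) -> int:
    # Checks the environment's final state instead of the agent's claim.
    return 1 if destination in env.bookings else -1


def evaluate(agent: Callable[[ToyBookingEnv, str], str],
             destination: str) -> Tuple[int, int]:
    # Runs the agent in a fresh environment and returns (naive, verified) rewards.
    env = ToyBookingEnv()
    report = agent(env, destination)
    return naive_reward(report), verified_reward(env, destination)


if __name__ == "__main__":
    print("honest:  ", evaluate(honest_agent, "Lisbon"))
    print("shortcut:", evaluate(shortcut_agent, "Lisbon"))
```

In this toy setup, both agents look identical under the naive reward, while the state-checking reward penalizes the one that skipped the work. That gap is the kind of "hack" an evaluation environment would need to catch.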
Patronus is currently providing its simulated digital worlds for software engineering and finance, but these are just the start, according to Kannappan.
“Today we’re very focused on the problems that are verifiable, so the problems that you can immediately check and verify, but there are a ton more areas that are very non-verifiable or very hard to verify,” he said.
Just because these processes are verifiable doesn’t mean they are simple. “We want to be able to actually create the environment in which you can operate an agent that can run for 10 hours or 10 days or 10 weeks,” Kannappan said.
As for rivals, Patronus believes it is primarily competing against the internal teams AI labs have already built to evaluate agent behavior. While human-data firms like Mercor and Surge help model makers with reinforcement learning, Patronus operates differently by evaluating how agents behave without any human involvement.
Marina Temkin is a venture capital and startups reporter at TechCrunch. Prior to joining TechCrunch, she wrote about VC for PitchBook and Venture Capital Journal. Earlier in her career, Marina was a financial analyst and earned a CFA charterholder designation.
Key takeaways
- Patronus AI highlights the growing demand for test environments for AI agents, reflecting a critical need in the development of reliable technology.
- Patronus’ rapid revenue growth could inspire Brazilian startups to invest in similar solutions, especially in sectors such as agribusiness and finance.
- The approach of simulating digital worlds could become a trend in AI development, improving the quality and safety of models.
Editorial analysis
Patronus AI’s $50 million raise highlights the growing demand for solutions that ensure AI agents are reliable in complex scenarios. For the Brazilian technology sector, this is an opportunity to learn and adapt, especially given that Brazil is home to an expanding ecosystem of AI startups. The need for simulated test environments like those Patronus offers could inspire local initiatives to develop similar solutions focused on specific sectors of the Brazilian economy, such as agribusiness and finance.
In addition, Patronus’ approach, which uses digital world models to stress-test AI agents, may point to a new trend in AI development: building test environments that simulate the complexity of the real world. That could raise the quality of the models being developed, which in turn could accelerate AI adoption in critical sectors of the economy. Brazilian startups working in areas such as healthcare and logistics could benefit greatly from such innovations.
Patronus’ revenue growth, up 15-fold in a year, is a clear sign that the market recognizes the importance of ensuring that AI agents are effective and safe. For Brazil, this could mean more investment in research and development, along with closer collaboration between universities and companies to create solutions that meet local demands. What to watch next is how Brazilian startups respond to this demand for test environments and whether they manage to build their own versions of these solutions.
Finally, the participation of well-known investors such as Samsung and Lightspeed could open doors to international partnerships and collaborations that benefit the Brazilian ecosystem. The exchange of knowledge and technology among companies in different countries could speed up the development of AI solutions that meet the specific needs of the Brazilian market, fostering a more robust and innovative environment for the country’s technology sector.
Original source: TechCrunch AI
About this article
This article was curated and published by AIDaily as part of our editorial coverage of developments in artificial intelligence. The content is based on the cited original source, enriched with context and editorial analysis. Automated tools may assist with translation and initial structuring, but the decision to publish, the fact-checking, and the contextual framing remain an editorial responsibility.