Encyclopedia Britannica is suing OpenAI for allegedly ‘memorizing’ its content with ChatGPT

Published by AIDaily Editorial Team
Original source author: Stevie Bonifield

The lawsuit accuses OpenAI of outputting near-identical copies of Britannica and Merriam-Webster’s content.

On Friday, Encyclopedia Britannica and dictionary publisher Merriam-Webster filed a lawsuit against OpenAI alleging that it used their copyrighted content to train its AI, then generated responses that were “substantially similar” to their content, as previously reported by Reuters.

According to Britannica, OpenAI repeatedly copied its content without permission, stating, “GPT-4 itself has ‘memorized’ much of Britannica’s copyrighted content and will output near-verbatim copies of significant portions on demand. The memorized examples are unauthorized copies that [OpenAI] used to train their models, including GPT-4.”

The lawsuit goes on to include examples of responses from OpenAI’s models side-by-side with Britannica’s text, in which entire passages appear to match word-for-word. Britannica also claims that OpenAI has been “cannibalizing” its web traffic by generating responses that “substitute, or directly compete” with Britannica’s content, rather than directing users to its website the way a traditional search engine would.

It’s the latest in a growing series of copyright lawsuits that publishers have brought against AI companies over the past several years. The New York Times has made similar claims in its ongoing lawsuit against OpenAI, including accusing the AI company of copying large amounts of its copyrighted content. In September, Anthropic settled a class action lawsuit over its use of copyrighted books to train its AI models, resulting in a $1.5 billion payout to the books’ authors.

Key takeaways

  • The lawsuit highlights growing concern over the unauthorized use of copyrighted content to train AI models.
  • AI-generated answers that substitute for the original content may cannibalize publishers’ traffic, threatening the financial sustainability of media outlets.
  • The outcome could set important precedents for future interactions between AI companies and content creators.

Editorial analysis

The lawsuit filed by Encyclopedia Britannica and Merriam-Webster against OpenAI is a significant development in the debate over copyright and the use of protected content to train AI models. It reflects growing concern among publishers and content creators about how their works are being used, and it raises questions about both the ethics and the legality of the practice. For the Brazilian tech sector, which is also developing AI solutions, the case is a reminder of the need to respect copyright and to be transparent about the data used to train models.

Moreover, the allegation that OpenAI is “cannibalizing” Britannica’s web traffic, by generating responses that directly compete with the original content, highlights a critical point: AI systems are changing how people consume information. If users get their answers directly from a chatbot instead of visiting the original sites, those sites may lose traffic, which in turn may threaten their financial sustainability. In Brazil, where many media outlets are still struggling to adapt to the digital environment, this issue is particularly relevant, as it could undermine business models based on advertising and subscriptions.

The outcome of this case could set important precedents for future interactions between AI companies and content creators. As more publications and publishers voice concerns about the unauthorized use of their material, further legal action is likely. For startups and tech companies in Brazil, this may mean reviewing how they collect and use data, to ensure compliance with copyright law and avoid litigation. More broadly, the way information is accessed and used appears to be shifting, which may require the sector to reassess its current practices.

Finally, the case may also stimulate a broader debate about AI regulation and the need for a legal framework that protects both content creators and technology companies. As AI becomes increasingly integrated into daily life, striking a balance between innovation and copyright protection will be crucial for the sustainable development of the tech sector, both in Brazil and globally.

Original source:

The Verge AI

About this article

This article was curated and published by AIDaily as part of our editorial coverage of artificial intelligence developments. The content is based on the original source cited above, enriched with editorial context and analysis. Automated tools may assist with translation and initial structuring, but publication decisions, factual review, and contextual framing remain editorial responsibilities.