Author group sues Salesforce for building XGen AI models on pirated books library

Source: Cryptopolitan

Salesforce, a software giant, has been sued by a group of authors in federal court in San Francisco for building its XGen AI models on a pirated library of books. According to the lawsuit, the company scrubbed references to those sources once questions arose.

The lawsuit was filed on Wednesday by authors E. Molly Tanzer and Jennifer Gilmore under the Copyright Act. It alleges ongoing infringement, saying Salesforce “continues to do so by continuing to store, copy, use, and process the datasets containing copies of Plaintiffs’[…] copyrighted books.”

The complaint cites statements from Salesforce CEO Marc Benioff, who told a Bloomberg interviewer in January 2024 that AI companies ripped off training data and that all the training data has been stolen.

The authors seek class certification for all US copyright holders whose works have been used since October 2022. They are seeking statutory damages, the destruction of infringing copies, the return of profits, a declaration of willful infringement, and attorneys’ fees.

Salesforce faces a strong case; AI companies escaped similar claims

According to the complaint, Salesforce pirated hundreds of thousands of copyrighted books to develop its XGen series of large language models. It allegedly did so by using the “notorious RedPajama and The Pile datasets,” which include a book corpus called Books3 containing more than 196,000 books copied from the private tracker Bibliotik.

The filing states that Salesforce first mentioned “RedPajama-Books” as one of its training sources when it launched XGen in June 2023. An engineer for the company then linked GitHub users directly to both datasets.

However, by September, those mentions were taken down from Salesforce’s website and replaced with vague descriptions of “natural language data” from “publicly available sources.” The next month, Hugging Face, the site that hosted Books3, removed the dataset due to copyright concerns.

Additionally, the lawsuit alleges that Salesforce trained its CodeGen models on The Pile in 2022. The company then brought the technology to market through its Agentforce AI platform, releasing the XGen-Sales model in October 2024.

However, according to experts, authors must prove real financial harm, not just that their books were used for training. Recently, Judge Vince Chhabria dismissed similar claims against Meta, ruling that “simply claiming ‘our work was used’ isn’t enough.” In that case, the judge found Meta’s use of copyrighted books to train AI to be fair use.

Additionally, as reported by Cryptopolitan, recent rulings have favored OpenAI and Anthropic in similar cases, with judges finding that authors failed to prove market harm. However, one judge criticized Anthropic for maintaining a permanent library of pirated books.

Salesforce taps Google’s Gemini AI to power Agentforce 360

In other news, Salesforce has extended its partnership with Google to include deeper integration of Gemini AI models with its Agentforce 360 platform.

Gemini’s multimodal intelligence will be integrated into the Salesforce ecosystem as a result of the partnership. This will help support tasks such as hybrid reasoning and multi-step process automation across enterprise sales and IT services.

The expanded integration enables the Atlas Reasoning Engine, central to Agentforce 360, to leverage Gemini models. This gives enterprise workflows additional model options.

Additionally, the hybrid reasoning capability enables users to set up AI agents within Salesforce that produce consistent and accurate outputs. The collaboration also extends the reach of Salesforce’s Gemini integration, previously limited to Gmail, to other Google Workspace applications, including Sheets, Docs, Drive, Slides, and Meet.

Agentforce 360 now supports native interoperability with Google Workspace, allowing users to initiate sales engagements, qualify leads, and schedule meetings from within applications like Gmail and Google Calendar. It also provides direct access to Salesforce Customer 360 apps within Google tools, streamlining data access and workflow continuity for sales and service teams.

Salesforce chief scientist Silvio Savarese said, “In the enterprise environment, it’s imperative for AI agents to be highly capable and highly consistent, especially for critical use cases […] Together, we are setting a new standard for building the future of what’s possible in the Agentic Enterprise down to the model level.”

