Meta Faces Lawsuit for Using Pirated Books to Train AI Models

Credit: Dado Ruvic / Illustration

Meta Platforms, which owns Facebook, is facing a lawsuit in which a group of authors accuses the tech giant of using pirated copyrighted books to train its AI systems. The suit, filed in California federal court, alleges that Meta knowingly used the works without permission to improve its AI models, in particular the popular Llama series.

The complaint, whose plaintiffs include authors such as Ta-Nehisi Coates and Sarah Silverman, accuses Meta of relying on LibGen, a well-known online repository of pirated books. It further alleges that Meta knew the books it drew on to build its training corpus were pirated. The authors claim that Meta’s actions were a clear violation of copyright law, and they are suing for damages on grounds of infringement.

The lawsuit has drawn wide attention because it reflects growing concerns over how AI companies source training data, especially for large language models. According to the lawsuit, Meta had been informed that the Books3 dataset, a collection of over 195,000 titles extracted from pirated content libraries, was pirated, and it did not obtain permission from the original authors. The case is one flashpoint in the ongoing, heated debate over AI and intellectual property rights.

Meta has responded to the lawsuit primarily by arguing that its use of copyrighted works qualifies as “fair use.” In court filings, the company argues that there is no requirement to obtain permission or pay before using copyrighted works to train AI models. Meta has denied that it infringed the plaintiffs’ copyrights, maintaining that its actions were justified under the fair use doctrine, which allows limited use of copyrighted works without permission in specific circumstances, such as research or commentary.

The authors involved in the lawsuit, however, show no sign of backing down. They argue that using their works for AI training is an illegal appropriation of their intellectual property. The plaintiffs are seeking leave to amend their complaint with additional copyright infringement claims.

The lawsuit also underscores the broader issues facing the AI industry. The question of how companies obtain and use data has emerged as a significant flashpoint in legal and ethical debates as AI technology advances rapidly. “While Meta and other tech giants continue to push the boundaries of artificial intelligence,” the plaintiffs argue, “they also have a responsibility to respect the intellectual property rights of creators.”

For Meta, this lawsuit is just one of several it faces over the use of copyrighted material to train AI systems. Other major AI companies, including OpenAI, have faced similar lawsuits as authors and content creators increasingly push back against what they see as the unauthorized use of their works to power AI models. The outcome of this case may significantly shape the future of AI development and copyright law.

As the case progresses, attention will focus on how the courts define fair use when copyrighted works are used to train AI. The legal clash between the authors and tech giants could have further consequences for Meta and for the rest of the artificial intelligence sector. Eventually, the case could reach the U.S. Supreme Court and shape how copyright applies to emerging technologies.

The plaintiffs’ legal team is confident that the case will bring to light the true extent of Meta’s knowledge and actions in using pirated works. They claim that if the case succeeds, companies such as Meta would be forced to reconsider their data-sourcing practices, especially with regard to respecting the rights of creators. As AI continues to develop, the legal fight over data usage is likely to remain one of the industry's defining battles.