This post has been updated as of February 15, 2024
Sarah Silverman has joined the ranks of authors and artists alleging that training AI on their work is copyright infringement. Generative AI makers like OpenAI disagree, claiming fair use. Is AI training fair use, or are these companies liable for infringement? Silverman also alleges that the AI apps’ outputs, in the form of summaries of the plaintiff authors’ books, are infringing derivative works.
On February 12, 2024, the District Court for the Northern District of California issued a decision on the defendants’ motion to dismiss this case and a nearly identical companion one, Tremblay v. OpenAI, Inc. The court largely sided with the defendants, including OpenAI, and dismissed five of six claims. The court dismissed the creatives’ claims of vicarious copyright infringement, negligence, and unjust enrichment, as well as three claims arising under the Digital Millennium Copyright Act (DMCA), all related to defendants’ removing copyright management information (information such as the title, the author, the copyright owner, and other identifying information). Of the causes of action challenged in the motion, only the plaintiffs’ claim of unfair competition was not dismissed.
In short, the court stated that the plaintiffs have until March 13, 2024, to amend their complaint to allege sufficient facts to support the allegations of vicarious infringement (relating to the summaries generated by ChatGPT), the DMCA claims (relating to “copyright management information” including authorship and metadata), and the claim for negligence. It is not clear whether the plaintiffs will be able to amend the complaint to withstand another motion to dismiss. The claim for direct infringement (ingesting plaintiffs’ works to train the AI application), which was not subject to the motion to dismiss, and the claim for unfair competition under California law remain.
The liability of generative AI companies for copyright infringement remains an open question for the courts, and the class action lawsuit filed by creatives against AI companies is still ongoing, at least as to the claims against OpenAI for direct infringement and unfair competition. This post outlines the creatives’ claims and the defenses to them.
WHAT ARE THE LAWSUITS ABOUT?
Comedian Sarah Silverman and other authors claim that ChatGPT and LLaMA (Meta’s AI generator) were “trained” on their copyrighted books without consent or compensation. Large Language Models (LLMs), the kind of generative AI model behind ChatGPT, are designed to mimic the data they are fed. LLMs produce full sentences and paragraphs that are similar to human language because they are trained to continuously adjust their outputs to resemble sequences of words copied from a training dataset. Silverman’s complaint alleges that the companies behind these LLMs are committing infringement by feeding copies of her works into the AI application for such “training.”
The LLMs are trained using books because books are a great source of long-form, high-quality written language. Silverman’s lawsuit alleges that OpenAI, which generally refuses to reveal its training datasets, has scraped databases of torrented books to train its LLMs. The authors allege that these companies have copied their content without permission or compensation. This, they maintain, is theft.
IS AI TRAINING FAIR USE?
The unauthorized ingestion of copyrighted material into the LLMs by the AI companies likely constitutes copyright infringement. However, these companies may argue that their conduct is fair use. Fair use is a defense to copyright infringement. Section 107 of the Copyright Act directs courts to consider at least four factors when evaluating a fair use defense:
- the purpose and character of the use, including whether the use is of a commercial nature or is for nonprofit educational purposes;
- the nature of the copyrighted work;
- the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and
- the effect of the use upon the potential market for or value of the copyrighted work.
The AI companies have plausible but rebuttable arguments on each of these factors. First, although the companies are using the work in a transformative way by ingesting it as a series of data points to inform unrelated output, the use is still commercial since AI products are sold for profit. Second, AI treats the input works as factual bits of data, but many of these works are creative in nature regardless of how AI treats them. Third, it will be hard for AI companies to argue they have used only a small portion of each work, as AI can often generate book summaries or accurate writings in the style of a particular author, which would require the digestion of an entire body of work. Fourth, since AI rarely reproduces a work exactly, it is unlikely that AI outputs compete directly with a copyrighted work, but creators may argue that AI is a substitute for their creative efforts in the markets in which their copyrighted work is sold.
WHAT SHOULD CREATORS DO NOW?
If your work has already been used to train AI, that cannot be undone. It is impossible to disentangle a single work from the neural network of an LLM. Furthermore, if your work is available online in any form, it is likely hard to protect it from being scraped and used for AI training. With this understanding, some artists are seeking ways to be compensated for their work being used by AI. From musician Grimes to the New York Times, some writers and creators have accepted that AI will inevitably make use of their work and are exploring ways to license or sell the use of their content and likeness.
Regardless of the results of the lawsuits, always register copyrights for your work. Registration is a relatively simple and inexpensive process. Although your work is automatically copyrighted once it is fixed in a tangible form, registration ensures that all remedies for infringement are available to you. It also provides notice to others that you own the work.
CONCLUSION
Creators, including comedians, should consult with an attorney experienced in intellectual property law and comedy law to understand how emerging technologies like AI affect their proprietary rights. Contact us to speak with a member of our team.
Contributions to this blog by Gabriella Epley