If AI Can Pirate the Internet, Why Can’t You?

By David Atkinson

David Atkinson is a Postdoctoral Fellow at Georgetown University Law Center and a 2026-2027 IPPI Edison Fellow.

The legal argument that could either protect copyright forever — or blow it up entirely.

Here’s a thought experiment for you: If courts let AI companies download and train on virtually any copyrighted work they please, what’s the legal argument stopping you from doing the same?

You can’t make one. That’s the point.

Yet this is precisely the argument GenAI companies make: their models simply "learn" from copyrighted works, much as humans do. And because that learning is supposedly fair use, just as humans are free to learn from whatever we read, the companies conclude they may collect and train on any copyrighted works they want, without authorization or payment.

In Unfair Learning: GenAI Exceptionalism and Copyright Law, published in the Chicago-Kent Journal of Intellectual Property, I methodically dismantle the case that generative AI deserves a blanket free pass under copyright’s “fair use” doctrine, not by arguing that AI companies are wrong, but by showing that their logic proves too much.

The Setup: Billions of Stolen Words

Here’s what’s not in dispute: Training a capable AI model requires trillions of tokens, overwhelmingly formed from copyrighted books, articles, songs, and web pages. GenAI companies collect this material by scraping the web at an industrial scale, usually without asking permission or paying anyone.

They defend this under fair use, copyright law’s safety valve that permits certain unauthorized uses of copyrighted works. Their core argument is always along the lines of: We’re not reproducing these works for their own sake. We’re transforming them into something new. The input is a novel, but the output is a chatbot. That’s transformative. That’s fair use.

The stakes are astronomical. One lawsuit against OpenAI alone involves more than 10 million works, with potential damages of up to $150,000 per infringement, theoretically totaling $1.5 trillion. In reality, the court would probably “only” order them to pay perhaps a few billion dollars at the high end, but for GenAI companies without a ton of cash on hand, that threat is still existential. If plaintiffs from different lawsuits each win billions, even OpenAI and Anthropic could be in dire straits.

The Reductio: What About Me?

The paper’s core move is elegant and a little devious. I take every argument GenAI companies make for fair use and ask: Does this apply to humans, too?

The answer, perhaps surprisingly, is yes — and often more so. Consider the following:

“The use is transformative.”

GenAI companies turn copyrighted inputs into novel outputs. So do humans who read a book and then write something. In fact, humans are better at transformation: We’re incapable of memorizing most of what we read, making our outputs inherently more original.

“We only extract underlying ideas, not expression.”

GenAI companies claim it’s just “learning” facts and patterns, not copying creative expression. Humans do roughly the same thing when we learn. And we’re less likely to accidentally quote a paragraph verbatim, which makes us less likely to infringe.

“It doesn’t harm the market.”

GenAI companies argue that their outputs don’t compete with the original works. Human outputs compete even less, since we can’t generate hundreds of novels a minute.

If the legal logic works for the machine, it works better for the person.

Eight Arguments, Eight Counterarguments

Unlike most law review articles, the paper doesn't stop at the four-factor fair use test. It addresses eight substantive arguments that AI companies and their academic allies have advanced, giving each the "steel man" treatment before taking it apart.

A few highlights:

“Publicly available means fair game.”

Not so fast. Creators post content publicly for reasons including ad revenue, reputation, subscriptions, and fan communities. “Publicly available” has never meant “public domain.” And since anything can end up on the public internet (including pirated books and films), training on “publicly accessible” content is effectively the same as training on anything AI employees can access.

“AI lacks consciousness, so it can’t ‘enjoy’ copyrighted works the way humans do.”

The argument essentially claims AI deserves more legal latitude because it's less like a human, which is precisely backward. Laws exist for humans. The companies whose employees intentionally collected this data, intentionally weighted certain sources over others, and intentionally built products that can quote trained works verbatim are very much conscious actors. They also enjoy the works through the fat paychecks and equity grants premised on their models' capabilities.

“Restricting AI training will stifle innovation.”

Maybe. But GenAI companies have yet to demonstrate that their general-purpose models produce breakthroughs, such as new scientific theories, new art forms, or new mathematical proofs, that would justify dismantling copyright protections. Every breakthrough of that kind has come from humans. The specialized models that have shown genuine scientific utility (in biology, pharmacology, and geometry) are a different beast entirely from the chatbots at the center of the copyright lawsuits. Those chatbots have produced no new FDA-approved drugs or vaccines, no validated theories of physics, and no new fields of study. And while they have improved at math, they have yet to develop and pose meaningful mathematical questions of their own; they remain relegated to helping solve problems posed by mathematicians.

The Only Three Options

Society faces exactly three choices:

  1. Grant both AI and humans a near-universal fair use exemption to copyrighted works.
  2. Grant only humans such an exemption (which is the logical implication of the AI companies’ arguments).
  3. Grant neither an exemption. Instead, exercise common sense and recognize that mass unauthorized copying isn’t fair use under almost any scenario.

Option 1 collapses copyright. Nobody would ever have to pay for a movie, book, or song again, gutting the creative economy in the process.

Option 2 is the implicit conclusion of every “GenAI is just like human learning” argument, which no one actually wants to say out loud.

Option 3 is where the law has always been and where it should stay.

Why This Matters

The lawsuits currently working their way through the courts will likely set the rules of the road for AI and copyright for a generation. Courts have already begun ruling, and doctrinal inconsistencies are piling up.

The paper cuts through technical mystification and asks a simple question the law has always cared about: Does this serve the purpose of copyright? That purpose, written into the Constitution, is to promote the progress of science and the useful arts.

Humans, I argue, have a stronger claim to that promotion than any AI model. We’re the ones far more likely to invent new things of significance.

If copyright law is going to bend for anyone, it should bend for us.