The reverse Napster manoeuvre of Big AI

From:

Sergio Visinoni from Sudo Make Me a CTO <makemeacto@substack.com>

To:

Hidden Recipient <hidden@emailshot.io>

Date:

1/21/2026, 6:01 AM

͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏

Forwarded this email? Subscribe here for more

Hi, 👋 Sergio here!

Welcome to another free post from the Sudo Make Me a CTO newsletter.

If you prefer to read this post online, just click the article title.

As this is a free newsletter, I do immensely appreciate likes, shares and comments. That's what helps other readers discover it!

The reverse Napster manoeuvre of Big AI

Tech companies and copyright haven't always been friends. But today, the balance of power is significantly different to what it used to be around 1999.

Sergio Visinoni

Jan 21

READ IN APP

Do you know the history of Napster?

Are you old enough to remember using their service and following their rise and fall?

Whether you remember or you have no idea what I’m talking about, here is a brief recap of this pivotal case in the history of the Internet and copyright.

The history of Napster

Napster was the first major proprietary peer-to-peer service launched in June 1999. It quickly gained massive popularity across the Internet as it became the de facto standard for MP3 file sharing. In other words, Napster played a key role in democratising online music piracy. And yes, that was the last century.

For the sake of today’s article and what’s currently happening in the dumpster fire we call the tech industry, two aspects of Napster's history are particularly interesting.

The first one is that this was a fairly small startup in an era when Internet investments were starting to become a thing (the dot-com bubble occurred around that time). There was still absolutely nothing fancy about the computer technology. Not yet. Smartphones didn’t exist yet. Social Networks didn’t exist yet. And most people were not connected to the internet. Those were the good days.

Not only was Napster a small company even for 1999 standards, but it was also offering the service entirely for free. They were not making a dime out of it.

The second aspect is that, although Napster was in clear violation of copyright by promoting the exchange of licensed music, something illegal and unethical, it was doing that with what you could define as a Robin Hood-esque approach. They were stealing from the “rich”, in this case, the music labels, to give to the “poor”, in this case, the end users who found that the price they had to pay for music was too high.

In the discourse that opposed the supporters of the freedom to share music and those who consider an atrocious felony to be punished, the music labels often played the role of the greedy villains.

Many considered that artists were, in fact, exploited by the labels, who generally kept the big slice of the sales revenue pie while passing over peanuts to those responsible for creating the actual asset being sold. Somewhat similar to what's happening today with artists and streaming platforms.

In that context, Napster had emerged as a way to make music more accessible. But by doing so, it had a direct negative impact on the interests of music labels first, and secondarily on the interests of the actual artists. Some artists vocally opposed online file sharing, while others actively promoted it. Some even claimed that the broad exposure helped drive more sales and concert tickets.¹

Despite the material benefits for the end users, who got to listen to music essentially for free, and the supposed and questionable benefit for some artists, who claimed increased exposure through making their music available for free, this was a clear case of copyright infringement.

One that had been perpetrated by a small actor going up against big corporations with deep pockets and an army of lawyers.

I’m not trying to make a point in defence of what Napster did, but merely to outline the whole context and the powers at play back in the early days of the Internet.

As you can imagine, it didn’t take long for lawsuits to start flocking in, and Napster, in its original Robin Hood-esque incarnation, had to shut operations a couple of years after launching, in 2002.²

End result:

Big Corporation: 1;
Disruptive Tech Company: 0;
End users: meh;
Artists: calling victory, all the while ignoring what would come next, i.e., the next stage of exploitation coming through the yet-to-be-invented streaming platform.

But why am I telling you about this minor anecdote from the Stone Age of The Internet Economy, as people liked to say around the turn of the Millennium?

For two reasons.

The first one is that today we’re facing another major case of large-scale copyright violation, though with some key differences that make it a lot less obvious to large parts of the population.

The second one is that I’ve noticed online media outlets have recently talked about the possibility that the “AI industry might be facing its Napster moment”³, and I think they are getting the whole thing backwards.

Who said large-scale copyright violation?

At this point in the article, the IANAL⁴ disclaimer becomes necessary.

Please treat what follows as my personal opinions, not as sound legal advice on the matter of copyright.

X: So, what is this large-scale copyright violation thing you’re talking about?⁵

In case you have had the luxury of living on a remote island with no internet, no phones, and no contact with the outside civilization for about 3,5 years, let me tell you how much I envy your situation. The rest of us had to endure the second major pandemic humanity has suffered from in this decade, shortly after Covid-19: GenAI-induced mass euphoria.

Besides large swathes of GPUs and embarrassing amounts of electricity, to train a Large Language Model, two key ingredients are required.

The actual algorithms and processes used to train what is called a Large Language Model. Arguably, the proprietary component of such artifacts.
A huge amount of training data, which is harvested through all sorts of sources. Without the training data, the artefact from point 1 would be about as useful as the most expensive Tesla on a remote island with no access to electricity.⁶

Now, big tech, and particularly the broligarchs running them, seem to be obsessed with two things⁷. Keeping score about who is the richest man⁸, and speed.

Every time regulations or ethical considerations try to slow them down, they dismiss those attempts by accusing governments of stifling innovation and similar handwavy bullshit.⁹

The best way to go fast in this bizarre Silicon Valley version of keeping up with the Joneses¹⁰, when it comes to collecting the vast amount of training data needed to train AI models, is to just grab everything you can find on the internet without asking for permission from the respective authors, and shovel it into the figurative voracious stomach of their shiny training pipelines. You then finally ship the result, call it a revolutionary product, and have people pay for access as if you had all the rights to do so.

And this is exactly what has been happening for longer than Napster managed to operate as a business some twenty-five years ago.

Despite the massive amount of funding they all raised, AI companies have been systematically stealing (in the worst cases) or borrowing without consent (in the mildest cases) content from writers, professors, programmers, and artists to train models that they’re charging users to get access to. Content that is essential for such models to have any (largely overstated) form of usefulness, without which they’d be little more than lab rats with expensive GPU implants.

Of course, there are a whole bunch of lawsuits that have either been settled or are ongoing. This issue is far from being solved, but this is where I think the media calling it the Napster moment for AI companies is getting it wrong.

They are not considering the major shift in power balance that has happened in the past 25 years.

In the case of Napster, the plaintiff (music labels) was the Goliath capable of squeezing the powerless David/Robin Hood with the simple deployment of an army of highly paid lawyers and lobbyists. Napster didn’t have the lobbying power, the government support, or the legal resources to stand the fight. And that’s why they quickly capitulated.

In the case of authors, artists, or programmers facing the large-scale theft from AI companies, the balance is reversed. The plaintiff, or dare I say the victim, is the proverbial David. Those with the lobbying power, government support, and financial and legal resources to drag on lawsuits indefinitely, and accept some fines that are little more than rounding errors on their PNLs are the actual perpetrators¹¹.

That’s why I call this a reverse Napster maneuver.

Where the pioneers of file-sharing were taking from the few to redistribute to the masses, AI companies are doing the exact opposite. They are stealing from a vast amount of individuals and concentrating the wealth resulting from the theft in the hands of the few.

You might have heard the common line of defence against this accusation, that LLMs do not copy the original content. They repurpose and remix it so that the derivative work it generates is “sufficiently differentiated” from the originals used in its training.

The problem with that line is that it’s not true, and the fact that training data and models are all considered trade secrets - hilarious when you consider that AI companies do not literally own the training data, and that a lot of it is distributed with copyleft licenses that explicitly require them to be redistributed in the same way - is only making it easier for AI companies to conceal the truth.

And Marc Andreesen saying “Imposing the cost of actual or potential copyright liability on the creators of AI models will either kill or significantly hamper their development,” is not a reasonable request for exemption, but a blatant admission of guilt.

Claiming this is fair use is forgetting one important aspect: fair for whom?

The fallacy of the analogy with the origins of the Internet

The most common rebuttal I hear from optimists and boosters when they’re faced with the ethical implications of GenAI creation and usage is an embarrassing strawman. They usually come up with one form or another of analogy with the origins of the Internet. They say that people back then wrongly blamed the underlying technology for being responsible for the illegal uses a few people made of it, commonly citing piracy, drug trafficking, or child pornography. That we should not “ban” the Internet on the premise of its usage by a minority of bad actors.

In my view, that analogy doesn't stand for two main reasons¹²:

First of all, I’m not attacking the underlying technology or blaming it for the way it’s being used. Science and experimentation should exist. The technical definition of an LLM is not more or less harmful than a peer-to-peer network of computing devices connected through a common protocol that spans the entire globe. I'm attacking the industry that benefits from such technology. There is a clear ontological difference between the Internet, a piece of shared infrastructure and largely decentralised in its original intentions, and a handful of powerful and rich companies that are deploying technologies with a very clear purpose. It’s that purpose and the means used to achieve it that I’m criticising here, not the OS they might use on their personal devices.
Secondly, illegal and unethical practices only accounted for a marginal portion of Internet usage¹³, while most of its usage was arguably beneficial for plenty of actors. In the case of the current GenAI-based industry, the whole industry is founded on illegal and unethical activities. To go back to the original analogy, it’s as if drug dealers, arms traffickers, scammers, and organised crime cartels operated the Internet and benefited from anyone using it, even for seemingly benevolent activities. Would the benefit of such good actions be significant enough to outweigh the drawbacks of the industry they'd be indirectly fueling?
Pablo Escobar did build schools and create jobs for people in Colombia¹⁴, but that’s not enough of a good deed to justify the means by which he financed his philanthropic endeavours. The recent announcement from Google AI Studio to sponsor Tailwind is just another example of the Silicon Valley-washed version of Escobar’s charity deeds.¹⁵

I’d like to offer a more fitting analogy instead.

What if AI companies didn’t steal all the content they did to train their models, but instead stole all the compute capacity for it? What if they literally stole thousands of NVIDIA GPUs from the pockets of Jensen Huang’s leather jacket instead of paying him what they’re supposedly worth?

How different would we look at the case? Would we consider it far worse, as it involves physical goods? As if hardware had more intrinsic value than software, as in intangible artifacts?

Or would we consider it fair use, as it would just be billionaires stealing from each other, leaving them to solve their issues once they’ve all finally migrated to Mars?

Would we just dismiss the issue, paraphrasing Marc Andreessen’s words by saying that the value of AI companies would be significantly lower if they had to follow the current laws on private property applied to a special category of expensive hardware they’re in dire need of to operate?

Similarly, wouldn’t drug trafficking be significantly less lucrative if it were regulated and dealers had to pay taxes on their revenues, only sell to adults, and regularly inspect the quality of the product they sell? Would that be an argument in favour of accepting trafficking in its criminal form because that makes it more lucrative for shareholders and investors?

This isn’t an argument in favour of drug legalization.

It’s an argument against the perverse capitalistic idea that we should allow companies to deliberately disrespect and not follow existing laws solely on the premise that doing so would negatively affect the company’s valuation.

If this isn’t peak-level arrogance and insanity, I don’t know what is.¹⁶

Where do we go from here?

I guess the first step is to ask ourselves whether we want to actively support this industry. As I wrote about it in a previous post, that’s hard, and it doesn’t come without sacrifices.

It's not a principle until it costs you money

Sergio Visinoni

November 26, 2025

Read full story

I’m particularly worried about radical purist and punitive takes on the topic.

While I do understand the original intention behind the “open slopware” repository¹⁷ born with the intention of providing transparency on FLOSS projects “tainted” by AI-generated code, I believe it tragically missed the mark.

It’s not by shaming open source contributors using AI code assistants that we’ll get the discourse to advance. I found it akin to shaming drug addicts for being responsible for the crime related to drug dealing.

We should inform rather than shame.

We should lobby with regulators to ensure those in positions of power are held accountable.

We should be positive examples through our actions, as directly attacking peers is a comfortable act, but one that rarely leads to changing people’s minds.

While I very rarely open up chatbots (my most common use case is for finding my way around the abysmal user unfriendliness of Excel/Google Spreadsheet formulas and error messages), I do use Windsurf on a pseudo-regular basis for personal needs or to help me make open source contributions by helping me reason about a codebase I am not familiar with, or a language I’ve not used for a long time.

If anything, I do believe that using such tools to make FLOSS contributions is an OK way to give back what has been taken away. Certainly, it doesn’t solve the underlying problem, but if everybody did it consistently, it might become a good act of active resistance.

In a creative but not so extreme take, you could argue that all LLMs have been trained on code distributed under various copyleft licenses that impose obligations on its redistribution. The evidence of memorization and plagiarism could support the case for considering trained models as derivative work of that original copyleft code. As such, the entire model and all its output should be treated accordingly and forced to be distributed under an open source license.

What Baldur Bjarnasson in The Intelligence Illusion mentioned as one of LLMs’ dangers for business, namely the risk of accidentally introducing GPL code into a proprietary codebase with the consequences that that would entail, might turn out to be a feature.

Certainly, enforcing an open license on all commercial models and all the output generated could have massive effects, ones that our friend Marc Andreessen might fear would negatively impact companies’ valuations. At least the valuation of those companies that are ignoring the rules of copyright and code licensing, until they provide substantial evidence that either they’re not using any copyleft-licensed material in their training set, or that they can guarantee 100% that there is no chance of plagiarism in LLM output.

I’m willing to bet that they’d either be unwilling or unable to do so, or both.

An ever more interesting approach would require regulations to be introduced that would force providers of LLM models to adhere to the following set of rules:

Transparency on what data is included in their training set. The list should be publicly accessible and browsable. Its content should be auditable by government authorities at any time.
Enforce opt-in requirements from authors to allow their artefacts to be included in a training dataset. Without explicit opt-in, a piece of content should be assumed not usable for AI model training by default.
Opt-out mechanisms similar to those enforced by GDPR, de facto treating copyrighted material as a modern sibling of Personal Identifiable Information. This would mean the right to data deletion, which should require companies to retrain their models every time a piece of content is the subject of a removal request. This one is going to be fun to implement.
Transparent instrumentation and observability to track every time each specific piece of content is referenced or used as input material for generating a response, and establish a royalty redistribution mechanism similar to the one used by the music streaming platforms. Before you scream that this would introduce an insane amount of complexity, let me stop you. I know. But aren’t these the companies attracting the best and highest-paid talent, and raising billions in funding? If tracking events and lineage is too hard a problem for them, then I don’t know how they’ll ever get to AGI. They can always use Claude Code to solve the problem.
Cherry on top: all the models and their generated output should be subject to the most viral license used among the different artifacts present in the training dataset. Meaning that if there is a single piece of GPL-3 code in there, everything should be redistributed under the same license, following the principle of derivative work.

This might undermine the wet dreams of big fortunes and power that are driving a lot of the investors in the space, but might actually make this a technology that serves as a means of fairly redistributing the wealth generated by collective work.

How about that approach for the betterment of humanity?

Want to talk about this?

In a previous article, I mentioned the intention of kicking off a Luddites in Tech group, where I’ve invited actors in the tech space who share a certain sensibility to the topic of ethics and human decency across this industry.

I’ll soon organise the first meetup, and everyone is welcome to join in.

I do expect that the first chat will touch on defining a clear purpose for the group, and maybe talk about the topic of how to kick off a distributed form of lobbying with lawmakers and push for regulation against this large-scale copyright theft. I’m particularly interested in exploring ways to connect with EU lawmakers in this space, as I’m convinced the first push will come from there.

Happy to be proven wrong, but I wouldn’t bet my money on that.

Even if that specific conversation doesn’t end up anywhere concrete, it will help us advance our collective understanding of what we can or cannot do to influence the industry towards a better direction.

If you’re interested in joining the conversation, just reply to this email or send me a DM here, on LinkedIn, or on Bluesky, and I’ll make sure to include you in the first invitation.

The more, the merrier.

WIT Promo for Q1 2026

I’ve recently decided to resume offering Quarterly promos for people who are willing to benefit from my services.

I’m happy to announce that I’ve opened up the Q1 promo that will run until the end of March 2026.

I’m making it easier for Women in Tech to level up their engineering leadership skills by offering an exclusive discount to the Sudo Make Me a CTO: 30% off for the first 12 months.

You can find out all the details at the official promo page, or by clicking the button below.

WIT Promo 2026

Feel free to share this opportunity with people you know, and do not hesitate to reach out if you’d like to learn more about it.

You can always schedule a free 30-minute session to get all your questions addressed.

Looking forward to seeing the community grow with more diversity.

Some claim that Radiohead's Kid A's explosive success on the Billboard had been directly helped by their music being available in the Napster database, which contributed to promoting its popularity. More details on Napster's Wikipedia page.

Find out more details about legal battles and their outcomes here.

Some examples can be found here, here, or here.

If you’re old enough to remember the Napster case, you should be old enough to know what the IANAL acronym stands for, as it was popular among nerdy circles around the same time. If you have no idea what it’s about, the answer is waiting you here.

Hat tip to a style of simulated dialogues with an imaginary counterparty, widely used by Simon Wardley in his online posts.

I don’t know why I keep talking about islands today. I might have heard that somewhere in the world, a tyrant is threatening to unilaterally invade, buy, steal, or annex an island just to distract us from his ties with convicted pedophiles.

I noticed I also keep coming up with lists of “two” items in this article. But I can’t find a funny way to explain that. Try again later.

Because they’re all men, of course. But hey, there’s diversity as they’re not all native americans. Actually, technically speaking, none of them is a native american. As such, they should all be oppressed by ICE, but I’m digressing.

Seriously, if you feel bad about yourself and you want something to cheer you up, have a read of the techno fascist manifesto from the folks over at a16z. You will feel a much better human being after realising the dip of darkness some seem to have fallen into.

Some would call it “Winning the AI race against China”. Same handwavy bullshit.

This article is already becoming a collection of footnotes. But if you want to have a clear example of what I'm referring to, check out this great article by Brian Merchant.

Blood in the Machine

Don't forget what Silicon Valley tried to do

Greetings all…

7 months ago · 126 likes · 16 comments · Brian Merchant

Thanks for trying again, but I still do not have a reasonable explanation to offer.

Well, unless you’re somewhat including porn in the second category, then it’s a completely different story. Let’s not go there, definitely OT.

Pablo Escobar was definitely a complex figure, but undeniably a criminal.

This is a whole other topic in itself, but for now, you can just read about it here.

OK, I know it’s enough to look at Musk to find plenty of other examples, but let’s not go there, please.

Interestingly enough, the repository has been taken offline. It was originally available here, but it’s gone, and I can’t any longer find the original Bluesky post announcing it. You can still find some traces of the whole affair via this search query https://bsky.app/search?q=open+slopware

Sudo Make Me a CTO is a free newsletter edited by Sergio Visinoni. If you found this post insightful, please share it with your network using the link below.

If you or your company need help with one of the topics I talk about in my newsletter, feel free to visit my website where you can schedule a free 30 minutes discovery call. I'd be delighted to investigate opportunities for collaboration!

Comment

Restack

Similar newsletters

There are other similar shared emails that you might be interested in: