Reddit's Legal Battle Against Perplexity and Data Scrapers: The Fight for Content Rights in the AI Era

Reddit has filed a landmark lawsuit against Perplexity and three other companies over unauthorized data scraping practices, highlighting the growing tension between content platforms and AI companies. The case centers on content ownership rights and proper compensation for data used to train AI systems, potentially setting precedent for how digital content is monetized in the age of artificial intelligence.

Why Reddit Filed a Lawsuit Against Perplexity and Data Scrapers

Reddit and a group of AI and data-scraping companies are locked in a legal battle over data usage rights. Reddit has initiated legal proceedings against four companies: Perplexity, SerpApi, Oxylabs, and AWMProxy, alleging unauthorized scraping of its content.

Reddit, which reports 416 million weekly users, serves as a hub for discussions on topics ranging from cosmetics and dog breeds to gaming and travel destinations. The lawsuit, filed in the US District Court for the Southern District of New York, primarily targets Perplexity. It follows Reddit's effort two years ago to monetize its data by asking companies such as OpenAI to pay for access. Rather than pay, the suit alleges, the defendants extracted Reddit content from Google search results.

All defendants have rejected the allegations. Perplexity maintains that its approach is both responsible and principled, while Oxylabs representative Denas Grybauskas argued that public information should remain freely accessible and that no company should claim ownership of publicly available data.

To substantiate its claims against the San Francisco-based AI search engine, Reddit devised a strategic "test post" exclusively visible through Google search. Within hours, this content appeared in Perplexity's search results, suggesting the company continued collecting Reddit data despite receiving a cease-and-desist directive.
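The general idea behind such a test post can be sketched in a few lines of Python. This is only an illustration of the "canary" concept, not Reddit's actual method: the function names `make_canary` and `canary_leaked`, the token format, and the sample texts are all hypothetical.

```python
import secrets


def make_canary(prefix: str = "canary") -> str:
    """Return a unique marker string that is very unlikely to occur anywhere else."""
    # 8 random bytes rendered as 16 hex characters (format is an assumption).
    return f"{prefix}-{secrets.token_hex(8)}"


def canary_leaked(canary: str, observed_text: str) -> bool:
    """Return True if the marker shows up in text obtained from a third party."""
    return canary.lower() in observed_text.lower()


# Hypothetical usage: plant the token in a post, then inspect another service's output.
token = make_canary()
engine_answer = f"A recent post mentions the phrase {token} in a travel thread."
unrelated_answer = "A recent post discusses travel destinations."

leak_found = canary_leaked(token, engine_answer)
no_leak = canary_leaked(token, unrelated_answer)
```

Because the marker is random, finding it in another service's output is strong evidence that the service obtained the post's content, directly or through an intermediary.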

Reddit's investigative technique drew praise from prominent technology figures. Ed Newton-Rex, a composer and the CEO of Fairly Trained, commented on X: "Absolutely brilliant detail from the new Reddit AI copyright lawsuit vs. Perplexity."

AI engineer Rohan Paul observed that the case will establish whether "harvesting through Google results and reseller feeds still counts as circumvention of Reddit's protections and terms rather than fair public indexing."

Data scraping has historical precedent in the internet's development. Google initially built its search engine by scraping web pages, collecting information from numerous sites, organizing it, and presenting it to users in search results.

Historically, scraping was less contentious due to a mutually beneficial monetization system. Websites received traffic, scrapers commercialized data, and Google organized content—creating value for all parties involved.

Google spokesman Jose Castaneda stated: "Google has always actively respected the choices websites make through robots.txt, but sadly, there's a bunch of stealthy scrapers that do not."
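For readers unfamiliar with robots.txt, the following minimal sketch shows how a well-behaved crawler might check a site's rules before fetching a page, using Python's standard `urllib.robotparser` module. The robots.txt content, the bot names, and the example.com URLs are invented for illustration and are not Reddit's actual rules. Compliance with robots.txt is voluntary, which is why a scraper can simply ignore it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named bot is blocked entirely,
# all other crawlers are blocked only from /private/.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A compliant crawler asks before fetching each URL.
ai_allowed = parser.can_fetch("ExampleAIBot", "https://example.com/r/travel/")
generic_allowed = parser.can_fetch("SomeOtherBot", "https://example.com/r/travel/")
private_allowed = parser.can_fetch("SomeOtherBot", "https://example.com/private/page")
```

In this sketch the named bot is refused everywhere, while other crawlers are refused only under `/private/`.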

Over time, various companies began extracting data from Google search results across different categories and selling their findings to businesses aiming for higher search rankings. This ecosystem benefited both Google and website owners by driving traffic through proper indexing.

Doug Leeds, co-founder of Really Simple Licensing, a nonprofit advocating for creator compensation when AI utilizes their work, explained: "It was all the original ecosystem of the web. It wasn't necessarily a problem back then, because there was a monetization method for all the companies involved."

Today's landscape differs significantly as AI companies reportedly scrape data covertly at massive scale to train chatbots without compensating content creators. Reddit has implemented access restrictions to prevent AI companies from freely utilizing its content.

Reddit seeks compensation arrangements similar to those of The New York Times and Simon & Schuster, which licensed their content to AI companies for millions of dollars. The lawsuit follows Reddit's earlier legal action against AI company Anthropic in June over allegedly unlawful data usage.

Source: https://www.ndtv.com/world-news/why-reddit-filed-lawsuit-against-perplexity-and-data-scrapers-9514343