Reddit Sues Perplexity AI and Data Scrapers for Alleged User Content Theft in AI Training

Reddit has filed a lawsuit against Perplexity AI and three other entities, accusing them of unauthorized scraping of user comments for AI training purposes. The lawsuit targets not only the AI company but also data-scraping services that collect content from Google Search results to circumvent Reddit's protective measures. This legal action follows Reddit's previous lawsuit against Anthropic and highlights the growing tensions between content platforms and AI companies over data usage rights.

Reddit Alleges User Data Theft, Sues Perplexity AI And 'Scraping' Partners

Reddit has initiated its second major lawsuit against AI companies, filing a claim against Perplexity AI and three additional entities on Wednesday. The social media platform alleges these companies engaged in "industrial-scale, unlawful" scraping of millions of user comments for commercial purposes.

The lawsuit, filed in a New York federal court, targets San Francisco-based Perplexity AI, creator of an AI chatbot and "answer engine" that competes with Google and ChatGPT in online search. Additional defendants include Lithuanian data-scraping firm Oxylabs UAB, a web domain called AWMProxy (described by Reddit as a "former Russian botnet"), and Texas-based startup SerpApi.

This legal action follows Reddit's previous lawsuit against AI company Anthropic in June. The new case distinguishes itself by confronting not only an AI company but also the lesser-known services that the AI industry depends on to acquire online content for training chatbots.

"Scrapers bypass technological protections to steal data, then sell it to clients hungry for training material. Reddit is a prime target because it's one of the largest and most dynamic collections of human conversation ever created," stated Ben Lee, Reddit's chief legal officer.

Perplexity responded that they had not yet received the lawsuit but "will always fight vigorously for users' rights to freely and fairly access public knowledge. Our approach remains principled and responsible as we provide factual answers with accurate AI, and we will not tolerate threats against openness and the public interest."

Oxylabs and SerpAPI did not immediately respond to comment requests, and AWMProxy could not be reached.

Reddit compares the defendants to "would-be bank robbers" who, unable to breach the bank vault, break into the armored truck instead. The lawsuit claims they are circumventing Reddit's anti-scraping measures while also "circumventing Google's controls and scraping Reddit content directly from Google's search engine results."

Lee explained that because these entities cannot scrape Reddit directly, "they mask their identities, hide their locations, and disguise their web scrapers to steal Reddit content from Google Search. Perplexity is a willing customer of at least one of these scrapers, choosing to buy stolen data rather than enter into a lawful agreement with Reddit itself."

Similar to its lawsuit against Anthropic (maker of the chatbot Claude), Reddit claims Perplexity accessed its content despite being asked not to do so. The Anthropic case was transferred from California Superior Court to federal court, with a hearing scheduled for January.

Alongside digitized books and news articles, websites like Wikipedia and Reddit provide valuable written content that helps train AI assistants to understand human language patterns.

Reddit has previously established licensing agreements with Google, OpenAI, and other companies that pay to train their AI systems on public commentary from Reddit's more than 100 million daily users. These licensing deals helped the 20-year-old platform raise funds before its public offering last year.

Source: https://www.ndtv.com/world-news/reddit-sues-perplexity-ai-and-scraping-partners-over-user-data-theft-9500094