With TikTok Ban Looming Large, Parent ByteDance's Web Scraping Bot Draws Attention For Being More Aggressive Than The One ChatGPT Uses: Report

ByteDance, the company behind TikTok, has introduced a powerful web scraper named “Bytespider.” Launched in April, Bytespider is reported to be one of the most aggressive data collectors online, gathering data far faster than the scrapers operated by other major tech firms.

What Happened: Research conducted by Kasada, a bot management company, and Dark Visitors, a group monitoring scraper bots, confirmed Bytespider’s activity. According to Kasada CEO Sam Crowther, Bytespider collects data 25 times faster than GPTBot, which OpenAI uses for ChatGPT, and 3,000 times faster than ClaudeBot from Anthropic, Fortune reported on Friday.

Despite the looming threat of a U.S. ban on TikTok, ByteDance continues its aggressive data collection strategy. President Joe Biden has demanded the sale or shutdown of TikTok due to national security concerns. Bytespider’s disregard for robots.txt, a file that websites publish to tell crawlers which pages they should not access and that crawlers follow only voluntarily, adds to the controversy.
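
For readers unfamiliar with the protocol, the following is a minimal sketch of how a well-behaved crawler might consult robots.txt before fetching a page, using Python's standard `urllib.robotparser` module. The rules in `ROBOTS_TXT`, the `example.com` URLs and the `is_allowed` helper are hypothetical and chosen only for illustration; they do not describe any real site's policy or any company's crawler code.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (an assumption for illustration only).
# It asks one named crawler to stay away entirely and asks all other
# crawlers to avoid a single directory.
ROBOTS_TXT = """\
User-agent: Bytespider
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines; a real crawler would typically
    # download them from the site's /robots.txt first.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("Bytespider", "GPTBot"):
        for url in ("https://example.com/articles/1",
                    "https://example.com/private/report"):
            print(agent, url, is_allowed(agent, url))
```

Nothing in this check is enforced by the website: a crawler that never runs it, or ignores the result, can still request the page, which is why compliance with robots.txt is described as voluntary.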

The increase in web scraping is linked to ByteDance’s development of a new large language model (LLM) to improve TikTok’s search capabilities. A recent update to TikTok’s search function allows real-time keyword searches for ads, potentially enhancing ad visibility.

ByteDance has yet to respond to Benzinga’s queries.

Why It Matters: The aggressive web scraping by ByteDance follows a trend among major tech companies. In June, OpenAI and Anthropic were reported to have ignored web scraping rules, bypassing the robots.txt protocol to gather free data for AI model training. This practice has sparked controversy, highlighting the tension between AI development and data privacy.

In August, NVIDIA faced scrutiny for scraping videos from platforms like YouTube to train its AI models. This revelation raised concerns about content creators’ rights and the ethical implications of using publicly available data without explicit consent.

Similarly, in September, Microsoft-owned LinkedIn was criticized for using user data for AI training without updating its terms of service, particularly affecting users in the U.S.

Photo by XanderSt on Shutterstock

This story was generated using Benzinga Neuro and edited by Pooja Rajkumari
