The Wikimedia Foundation, the non-profit organization behind Wikipedia, is facing significant operational challenges due to the increasing activity of AI crawlers. These automated bots scrape vast amounts of text and multimedia content from the online encyclopedia to train generative artificial intelligence models. The surge in automated traffic has placed considerable strain on Wikipedia's servers, driving up infrastructure costs and, in some instances, slowing page loads for human users. The foundation has said that the network strain caused by these scrapers is becoming unsustainable, highlighting a critical tension between its mission of providing free access to knowledge and the practical cost of maintaining the infrastructure behind it.

The scale of the issue is substantial. Since January 2024, Wikimedia has observed a 50% rise in bandwidth used for downloading multimedia content, a spike driven not by increased human interest but by automated scraping programs. These bots access data differently than human readers, frequently targeting less popular or older content that is not efficiently served by caching systems. That pattern forces greater reliance on Wikimedia's core data centers, which are significantly more expensive to operate. While the foundation is equipped to handle traffic surges from human readers during major events, the constant, high-volume demand from AI crawlers presents a persistent and growing burden on resources.

In response to this escalating pressure, the Wikimedia Foundation is taking proactive steps. Site managers have already implemented measures like case-by-case rate limiting and have banned particularly aggressive AI crawlers. Recognizing the need for a more sustainable, long-term solution, the foundation is also developing a "Responsible Use of Infrastructure" plan.
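The rate limiting mentioned above is commonly implemented as a per-client token bucket: each client earns request "tokens" at a fixed rate up to a burst cap, and requests without a token are refused. Wikimedia has not published the details of its implementation, so the client identifier, rates, and limits below are purely illustrative:

```python
import time
from dataclasses import dataclass, field

@dataclass
class TokenBucket:
    """Illustrative token bucket: refills at `rate` tokens/sec, bursts up to `capacity`."""
    rate: float                     # tokens added per second
    capacity: float                 # maximum burst size
    tokens: float = 0.0             # current balance
    last: float = field(default_factory=time.monotonic)

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

# One bucket per client identifier (user agent, IP, etc.) -- a hypothetical policy.
buckets: dict[str, TokenBucket] = {}

def check_request(client_id: str, rate: float = 10.0, burst: float = 20.0) -> bool:
    """Return True if this client's request should be served, False if throttled."""
    bucket = buckets.setdefault(client_id, TokenBucket(rate, burst, tokens=burst))
    return bucket.allow()
```

A "case-by-case" policy, as the article describes, would simply vary `rate` and `burst` per client, with aggressive crawlers assigned a rate of zero (an outright ban).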
A key part of this strategy involves offering AI developers an alternative to scraping the live public website. Wikimedia has partnered with Kaggle, Google's data science platform, to release a beta version of a structured dataset derived from Wikipedia content, available initially in English and French. The dataset is formatted specifically for machine learning applications, aiming to provide a more efficient resource for AI training and development while diverting traffic away from the main Wikipedia servers.

According to Wikimedia Enterprise, the division focused on making Wikipedia data accessible via APIs, the dataset includes:

- Abstracts
- Short descriptions
- Infobox-style key-value data
- Image links
- Clearly segmented article sections

Notably absent are references and other non-prose elements such as video clips. While the content originates from Wikipedia and is freely licensed under Creative Commons or public-domain terms, the exclusion of references may create ambiguity around the attribution and verification of information used in AI models trained on this data. Wikimedia emphasizes that while its content is free, its infrastructure is not, underscoring the need for responsible usage by large-scale data consumers.

The structured dataset represents a significant effort by Wikimedia to restore balance and ensure the long-term health of its infrastructure. By providing a dedicated resource for AI training, the foundation hopes to reduce the disruptive impact of web crawlers on the public-facing Wikipedia site, preserving performance for its global community of human readers and editors. The initiative also opens a dialogue about the responsibilities of AI companies that leverage freely available online resources.
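The appeal of such a dataset is that developers consume pre-segmented records instead of scraping and parsing live HTML. The sketch below shows what processing one record with the elements listed above might look like; the field names (`abstract`, `infobox`, `image`, `sections`) are assumptions for illustration, not the dataset's documented schema:

```python
import json

# Hypothetical record shaped like the elements Wikimedia Enterprise describes:
# an abstract, a short description, infobox-style key-value data, an image
# link, and segmented article sections. Field names are illustrative only.
record_json = """
{
  "name": "Example Article",
  "abstract": "An example article used for illustration.",
  "description": "illustrative topic",
  "infobox": {"type": "concept"},
  "image": {"content_url": "https://upload.wikimedia.org/example.png"},
  "sections": [
    {"title": "Overview", "text": "Opening section text."},
    {"title": "History", "text": "Later section text."}
  ]
}
"""

def to_training_text(record: dict) -> str:
    """Flatten a structured record into plain text suitable for model training."""
    parts = [record.get("abstract", "")]
    # Sections arrive already segmented, so no HTML parsing is needed.
    parts += [f"{s['title']}\n{s['text']}" for s in record.get("sections", [])]
    return "\n\n".join(p for p in parts if p)

record = json.loads(record_json)
text = to_training_text(record)
```

Because references are excluded from the records, a pipeline like this has no citation data to carry along, which is the attribution gap the article notes.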
Finding a sustainable path forward will likely involve ongoing adjustments, potentially including clearer guidelines for bot operators, authentication requirements for high-volume usage, and continued exploration of partnerships that acknowledge the value and cost of maintaining one of the world's most vital knowledge repositories.