NAIROBI, Kenya – The rapid growth of artificial intelligence, particularly systems like ChatGPT, is at risk of slowing down due to an impending shortage of publicly available text data.
A recent study by Epoch AI projects that this supply could be exhausted between 2026 and 2032, posing a significant challenge to the continued advancement of AI technologies.
AI’s remarkable progress has been fueled by vast quantities of human-generated text data.
Companies such as OpenAI and Google have relied on purchasing high-quality data sources, including content from platforms like Reddit and various news outlets, to train their AI models.
However, the finite nature of these resources means they are gradually being exhausted. The scarcity of fresh data could compel these companies to consider more controversial alternatives, such as using sensitive private data or less reliable synthetic data.
The Epoch AI study highlights a critical issue: the scaling of AI models requires enormous computational power and extensive data sets.
As these data sources dwindle, it becomes questionable whether current growth rates can be maintained.
Although some new techniques have alleviated the pressure, the need for high-quality human-generated data remains fundamental.
To address this bottleneck, some experts advocate developing more specialized AI models rather than continually expanding the size of existing ones.
In response to these looming challenges, AI developers are exploring alternative methods, such as generating synthetic data.
However, the quality and effectiveness of synthetic data are still under scrutiny, raising concerns about its ability to sustain AI advancements.