Soumya Prakash Pradhan

Generative AI has surged in popularity recently, especially after OpenAI introduced ChatGPT in November 2022.

This technology has caught the attention of many people who were previously unfamiliar with it.

Some are excited about these advancements, while others worry about their impact on jobs.

However, a recent study highlights an important caveat: AI models depend on human-created data to keep working well.

AI's Dependency on Humans

Researchers from the University of Cambridge, the University of Oxford, the University of Toronto, and Imperial College London conducted a study titled 'The Curse of Recursion: Training on Generated Data Makes Models Forget.'

Their paper highlights a major threat to the future of large language models (LLMs).

The study reveals that training these models on AI-generated content instead of human-created content poses significant risks.

Tools like ChatGPT, Bing, and Bard currently rely on existing data to provide answers and information.

These tools have been trained using data that was originally created by humans.

For instance, when someone uses Bing to search for information about different sea creatures, it uses articles written by human authors as its primary source of reference.

The data utilised to train LLMs, such as articles, photos, and research papers, was generated by humans.

This content was created before AI-powered tools like ChatGPT, Bing, and Bard emerged; since then, those tools have become widely used for producing content.

However, as AI-generated text spreads across the web, future LLMs will increasingly be trained on it, and that feedback loop may harm the models themselves.

Threat of Model Collapse

The research paper mentioned earlier highlights a phenomenon called 'model collapse.'

This phenomenon occurs when LLMs are trained on AI-generated content, causing them to deviate significantly from reality and become corrupted.

According to VentureBeat, the researchers argue that "learning from data generated by other models leads to model collapse, a deteriorative process in which models gradually forget the true underlying data distribution. This process is unavoidable, even under nearly ideal conditions for long-term learning."

Ilia Shumailov, one of the main authors of the research paper, explained that, over time, errors in AI-generated data accumulate, so models trained on such data develop increasingly distorted views of reality.
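The mechanism is easiest to see in a toy simulation. The sketch below is an illustration of the general idea, not the researchers' actual experimental setup: each "generation" fits a simple Gaussian model to data sampled from the previous generation's fitted model rather than from the original human data. Because every fit only sees a finite sample, estimation errors compound across generations and the distribution drifts.

```python
import numpy as np

# Toy illustration of model collapse (not the paper's experiments):
# each generation trains only on data generated by the previous model.
rng = np.random.default_rng(seed=42)

n_samples = 100        # finite training set per generation
mean, std = 0.0, 1.0   # generation 0: the true human-data distribution N(0, 1)

for generation in range(1, 21):
    # Data produced by the previous generation's model.
    synthetic = rng.normal(mean, std, n_samples)
    # Fit the next model to that synthetic data.
    mean, std = synthetic.mean(), synthetic.std()
    print(f"generation {generation:2d}: mean = {mean:+.3f}, std = {std:.3f}")

# Sampling error compounds each round: the fitted std tends to drift
# downward, so later models progressively lose the tails of the
# original distribution - a miniature version of "forgetting".
```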

Shumailov also raises concerns about the potentially serious consequences of model collapse, such as the perpetuation of discrimination based on sensitive attributes like gender or ethnicity.

Human-Generated Content

The research paper stresses the importance of preserving original, human-created data to mitigate the risks associated with model collapse.

Currently, there is no effective way to distinguish between human-generated and AI-generated data.

Additionally, the paper predicts that human-created content will become even more valuable in the future, chiefly as a source of high-quality training data for AI.
