Machine Learning Times
Machine Learning Times
EXCLUSIVE HIGHLIGHTS
AI Success Depends On How You Choose This One Number
 Originally published in Forbes, March 25, 2024. To do...
Elon Musk Predicts Artificial General Intelligence In 2 Years. Here’s Why That’s Hype
 Originally published in Forbes, April 10, 2024 When OpenAI’s...
Survey: Machine Learning Projects Still Routinely Fail to Deploy
 Originally published in KDnuggets. Eric Siegel highlights the chronic...
Three Best Practices for Unilever’s Global Analytics Initiatives
    This article from Morgan Vawter, Global Vice...
SHARE THIS:

1 month ago
Model Collapse: An Experiment – What happens when AI is trained on its own output?

 

Originally published in O’Reilly, October 24, 2023.

Ever since the current craze for AI-generated everything took hold, I’ve wondered: what will happen when the world is so full of AI-generated stuff (text, software, pictures, music) that our training sets for AI are dominated by content created by AI. We already see hints of that on GitHub: in February 2023, GitHub said that 46% of all the code checked in was written by Copilot. That’s good for the business, but what does that mean for future generations of Copilot? At some point in the near future, new models will be trained on code that they have written. The same is true for every other generative AI application: DALL-E 4 will be trained on data that includes images generated by DALL-E 3, Stable Diffusion, Midjourney, and others; GPT-5 will be trained on a set of texts that includes text generated by GPT-4; and so on. This is unavoidable. What does this mean for the quality of the output they generate? Will that quality improve or will it suffer?

I’m not the only person wondering about this. At least one research group has experimented with training a generative model on content generated by generative AI, and has found that the output, over successive generations, was more tightly constrained, and less likely to be original or unique. Generative AI output became more like itself over time, with less variation. They reported their results in “The Curse of Recursion,” a paper that’s well worth reading. (Andrew Ng’s newsletter has an excellent summary of this result.)

To continue reading this article, click here.

Comments are closed.