To summarize this article,
A.I. models trained on their own output start degenerating:
– an L.L.M. trained on its own sentences first hallucinated and after a few rounds started to print long lists and repeat phrases
– A.I. image models trained on their own output create distorted images
This happens because
what A.I. does is “assemble a statistical distribution” for each word or pixel to generate, then pick a likely value from it.
When A.I. is trained on its own output, the distributions get narrower, meaning a smaller range of possible values to choose from or to do statistics on.
If this process goes on, the distribution “would eventually become a spike” and the model will collapse.
– For example, it will generate the same blurry image for all the digits from 0 to 9.
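
The narrowing can be seen in a toy simulation. This is my own sketch, not something from the article, and it is far simpler than a real model: the "model" is just a normal distribution described by a mean and a standard deviation, each generation draws a small dataset from the previous fit, and the next fit uses only that dataset. The function name, the sample size of 10, and the 200 generations are all assumptions chosen for illustration. It only captures the effect of refitting on a small sample of your own output, not everything that goes on in an L.L.M. or image model.

```python
import random
import statistics


def simulate_self_training(generations=200, sample_size=10, seed=0):
    """Repeatedly fit a normal distribution to samples drawn from the previous fit.

    Returns the fitted standard deviation after each generation
    (index 0 is the starting, "real data" distribution).
    """
    rng = random.Random(seed)
    mean, std = 0.0, 1.0  # generation 0: stands in for the real data
    history = [std]
    for _ in range(generations):
        # The current model generates a small synthetic dataset...
        samples = [rng.gauss(mean, std) for _ in range(sample_size)]
        # ...and the next model is fitted to that dataset alone.
        mean = statistics.fmean(samples)
        std = statistics.pstdev(samples)
        history.append(std)
    return history


if __name__ == "__main__":
    history = simulate_self_training()
    for generation in (0, 10, 50, 100, 200):
        print(f"generation {generation:3d}: std = {history[generation]:.3g}")
```

The standard deviation wobbles from one generation to the next, but over many generations it drifts toward zero: the distribution turns into the "spike" described above, and every sample the model produces ends up nearly identical.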
A “hidden danger,” less obvious than a blatant model failure, is that A.I. models trained on their own output start to produce output with less and less diversity.
This problem will likely get worse because it’s hard to detect A.I.-generated data, and so hard to keep it out of training sets.
“A.I. Is Homogenizing Our Thoughts” – the New Yorker
https://www.newyorker.com/culture/infinite-scroll/ai-is-homogenizing-our-thoughts
And this comes as no surprise.
I’ve always found A.I.-generated texts to be very boring and dry, so I hardly ever turn to them for anything. This made me think about what sets human-written texts apart from an L.L.M.’s output, what makes them so interesting and sometimes fun to read.
I think it can be explained partly by the human writer’s ability to make effective use of unusual combinations of words or fresh phrases that still make sense, and to place seemingly unrelated ideas side by side so that together they bring an element of surprise or joy to the text.
One example I can think of right now (because I read this today):
“Earth is drenched in God’s affectionate satisfaction”
Psalm 33:5 (The Message)
I don’t know if this translation is entirely faithful to the original Hebrew text, but the phrase “affectionate satisfaction” is not just unique and pleasing; I think it also potently conveys a major theme that runs through the Bible.
If all we read is A.I.-generated text, which is increasingly bland (not to mention full of erroneous claims), languages will lose their richness and we’ll forget how powerful a piece of writing can be.