Above: Image By Gemini.
When we talk about the collapse of language models, we mean the phenomenon where artificial intelligence starts to recycle its own data. In many models, when an AI uses certain data sources, those sources rise in page rank in the web index. That makes it possible for the same data to start circulating in a loop among the most widely used LLMs. This causes an effect called data degeneration. Just as DNA is not always copied perfectly in living organisms, data in networks does not keep its exact shape.
There are always some errors and some turbulence in networks. Sometimes a network loses a byte or two, and the information traveling through it is lost. At the beginning of this process, outsiders notice nothing. But once enough bytes or bits have been lost, large parts of the data are missing. In that sense, data degenerates in the network the same way genomes accumulate errors over generations.
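The accumulation of copy errors described above can be sketched as a toy simulation. It assumes, purely for illustration, that every copy loses each character independently with a fixed probability and that a lost character is never recovered. Real network protocols such as TCP detect and retransmit corrupted packets, so this models the post's argument rather than measured network behavior. The names `copy_with_errors`, `simulate`, and `expected_intact` are hypothetical.

```python
import random

LOST = "?"  # hypothetical marker for a character lost during copying


def copy_with_errors(text: str, error_rate: float, rng: random.Random) -> str:
    """Copy text once; each surviving character is lost with probability error_rate.

    Assumption for illustration: lost characters are never restored,
    so damage accumulates from one copy to the next.
    """
    return "".join(
        LOST if ch == LOST or rng.random() < error_rate else ch for ch in text
    )


def intact_fraction(original: str, copy: str) -> float:
    """Share of positions that still match the original (original must not contain LOST)."""
    return sum(a == b for a, b in zip(original, copy)) / len(original)


def simulate(original: str, error_rate: float, generations: int, seed: int = 0) -> list[float]:
    """Copy the text repeatedly and record how much survives after each generation."""
    rng = random.Random(seed)
    current = original
    history = []
    for _ in range(generations):
        current = copy_with_errors(current, error_rate, rng)
        history.append(intact_fraction(original, current))
    return history


def expected_intact(error_rate: float, generations: int) -> float:
    """Expected surviving share: a character must survive every copy, so (1 - p)^n."""
    return (1 - error_rate) ** generations


if __name__ == "__main__":
    text = "data degenerates a little with every imperfect copy " * 20
    history = simulate(text, error_rate=0.01, generations=50)
    print(f"simulated: {history[-1]:.2f}, expected: {expected_intact(0.01, 50):.2f}")
```

With a 1% loss per copy, only about 60% of the text is expected to survive 50 copies, since 0.99^50 ≈ 0.605, while the first few copies look almost untouched. That matches the point that outsiders notice nothing at first.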
If AIs or LLMs use or recycle only certain data sources, it becomes harder to get new data into the system. New data is always published on new home pages that are not yet ranked, so the AIs can bypass the newest data and use older data that is already page-ranked. One fix for this problem is to give the AI an algorithm that helps it select the right data sources.
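The feedback loop between use and ranking can be illustrated with a toy model. It assumes, hypothetically, that an AI always uses the k highest-ranked pages and that each use raises a page's rank by a fixed amount. Real ranking systems such as PageRank are based on link structure and are far more complex; the function name and page names here are made up for the example.

```python
def simulate_rank_feedback(
    ranks: dict[str, float], new_page: str, k: int, rounds: int, boost: float = 1.0
) -> dict[str, int]:
    """Count how often each page is used when the AI always picks the top-k ranked pages.

    Hypothetical rule: every time a page is used, its rank grows by `boost`.
    """
    ranks = dict(ranks)              # work on a copy of the caller's ranks
    ranks.setdefault(new_page, 0.0)  # a newly published page starts unranked
    uses = {page: 0 for page in ranks}
    for _ in range(rounds):
        # Highest rank first; ties broken by name so the result is deterministic.
        chosen = sorted(ranks, key=lambda p: (-ranks[p], p))[:k]
        for page in chosen:
            uses[page] += 1
            ranks[page] += boost     # being used makes the page even more likely to be used again


    return uses


print(simulate_rank_feedback({"old-a": 10, "old-b": 8, "old-c": 5}, "new-study", k=2, rounds=100))
# {'old-a': 100, 'old-b': 100, 'old-c': 0, 'new-study': 0}
```

The two pages that were already popular absorb every use, and the new page is never picked, however relevant it might be.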
That algorithm must be able to select data by publication date, and that date should be visible in search results. The system must also use trusted data sources, so it should check whether a text was written for a scientific purpose or for entertainment.
This helps the AI select trusted sources such as university databases and reputable publishers, which in turn helps the user find data that can be trusted.
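One way to sketch such a selection algorithm is a scoring function that combines the three criteria above: publication date, whether the source is trusted, and whether its purpose is scientific or entertainment. Everything here is an assumption for illustration: the `Source` fields, the trusted-domain lists, the weights, and the one-year half-life are hypothetical, and the purpose label is assumed to be known in advance. Page rank gets only a small weight so that a new, unranked page can still come first.

```python
from dataclasses import dataclass
from datetime import date
from urllib.parse import urlparse


@dataclass
class Source:
    url: str
    published: date    # publication date, assumed visible in search results
    purpose: str       # hypothetical label: "scientific" or "entertainment"
    page_rank: float   # 0.0 for a brand-new, unranked page


# Hypothetical allow-lists: university domains and a trusted publisher.
TRUSTED_SUFFIXES = (".edu", ".ac.uk")
TRUSTED_PUBLISHERS = {"publisher.example"}


def is_trusted(url: str) -> bool:
    host = urlparse(url).hostname or ""
    return host.endswith(TRUSTED_SUFFIXES) or host in TRUSTED_PUBLISHERS


def score(source: Source, today: date, half_life_days: float = 365.0) -> float:
    age_days = max((today - source.published).days, 0)
    recency = 0.5 ** (age_days / half_life_days)  # 1.0 when new, 0.5 after one half-life
    trust = 1.0 if is_trusted(source.url) else 0.0
    scientific = 1.0 if source.purpose == "scientific" else 0.0
    # Hypothetical weights; page rank is capped at 1.0 and weighted lightly on purpose.
    return 2.0 * trust + 1.0 * scientific + 1.0 * recency + 0.1 * min(source.page_rank, 1.0)


def select_sources(sources: list[Source], today: date, k: int) -> list[Source]:
    return sorted(sources, key=lambda s: score(s, today), reverse=True)[:k]


sources = [
    Source("https://popular-blog.example/post", date(2019, 1, 1), "entertainment", 0.9),
    Source("https://cs.university.edu/new-paper", date(2024, 10, 20), "scientific", 0.0),
    Source("https://publisher.example/article", date(2023, 5, 1), "scientific", 0.4),
]
print([s.url for s in select_sources(sources, date(2024, 11, 1), k=2)])
# ['https://cs.university.edu/new-paper', 'https://publisher.example/article']
```

The old entertainment post drops out even though it has the highest page rank, while the new, unranked university paper comes first.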
An AI must handle much more data than a traditional web server, which means it requires a lot of energy. With many users, that mass of data can clog the network. In some models, an AI that searches the web sends search engines so many requests that their servers go down. That is one scenario in which AI could accidentally break the web.
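The scaling argument can be made concrete with back-of-the-envelope arithmetic. The sketch assumes, hypothetically, that a traditional search fetches one results page per query while an AI search agent fetches many pages per query to compose its answer. All numbers below are illustrative placeholders, not measurements.

```python
def requests_per_second(users: int, queries_per_user_per_hour: float, fetches_per_query: int) -> float:
    """Average request rate: users * queries * fetches, spread over one hour (3600 s)."""
    return users * queries_per_user_per_hour * fetches_per_query / 3600


def is_overloaded(load_rps: float, capacity_rps: float) -> bool:
    """A server falls behind once the incoming rate exceeds what it can serve."""
    return load_rps > capacity_rps


# Hypothetical scenario: one million users, six queries each per hour,
# and a server that can answer 10,000 requests per second.
USERS, QUERIES, CAPACITY = 1_000_000, 6, 10_000
traditional = requests_per_second(USERS, QUERIES, fetches_per_query=1)
ai_search = requests_per_second(USERS, QUERIES, fetches_per_query=20)
print(round(traditional, 1), is_overloaded(traditional, CAPACITY))  # 1666.7 False
print(round(ai_search, 1), is_overloaded(ai_search, CAPACITY))      # 33333.3 True
```

Because the load grows linearly in each factor, 20 fetches per query multiply the load by 20, which in this made-up scenario is enough to push the same server past its capacity.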
https://www.technologyreview.com/2024/10/31/1106504/ai-search-could-break-the-web/