AIs need quality texts to train. An opportunity for the media

Selling AI companies access to their content could become a new source of revenue for news organizations. The case of Reddit, and the negotiations with some publishers revealed by the Financial Times.

In recent days Reddit, the site for sharing content and news, has been hit by a large protest led by millions of users against CEO Steve Huffman's decision to monetize some aspects of the social network that had so far been little exploited. At the heart of the dispute is the API (application programming interface), the system that allows an external app to access Reddit data. Huffman wants to offer it only for a fee, and at a price high enough to destroy the business of some of the services that have grown up around the site over the years.

Huffman's change of course seems to have been influenced by the debate on artificial intelligence and its "training": today we know that OpenAI, the developer of ChatGPT, also used Reddit's archive of posts to "teach" its neural networks to express themselves correctly, all without paying the social network a penny. This was enough to convince the head of Reddit to move to protect and monetize his site's content. Huffman's reaction was deemed too drastic, but he is not alone in heading in this direction. In recent weeks, direct contacts have increased between companies operating in the AI sector and newspapers, the custodians of a resource that has suddenly become very important: texts that are well written and verified by qualified human beings. The Financial Times exclusively revealed the recent discussions between OpenAI, Microsoft, Google and Adobe and some of the most prominent Western publishers and newspapers, including News Corp (Rupert Murdoch's group), Axel Springer (longtime publisher of Germany's Bild and owner of Politico), the New York Times and the Guardian.

According to the London newspaper, these negotiations are still at an early stage but could come to include the payment of a fixed fee, a sort of "subscription" giving AI companies access to publishers' content in order to develop the technology behind chatbots such as OpenAI's ChatGPT and Google's Bard. It is too early to tell whether selling AI companies access to content could become a new source of revenue for news organizations, partly because companies in the sector such as Stability AI and OpenAI have already been accused of copyright infringement by various artists, precisely because their works had ended up "inspiring" the images generated by these systems.

Last May, the chief executive officer of News Corp, Robert Thomson, declared that “the collective intellectual properties of the media are under attack” and that it was therefore necessary to “loudly demand compensation” from AI companies. In all of this, the media and journalism sector may have a chance to assert itself thanks to a little-known weakness of generative AIs, which, as mentioned, must be trained on large amounts of content. The quality of this archive is a crucial factor on which much of the future of the technology will depend: according to experts in the sector, it is important to verify that the materials used to train the AIs include no content that was itself generated by AI. The risk is a “feedback loop”, a vicious circle that would quickly lead to the collapse of the entire system, with disastrous results.

A recent study by British and Canadian researchers noted how “errors in the generated data accumulate over time” and, when that data is used to train artificial intelligences, “they lead the models to misunderstand reality more and more”. One of the authors told the website VentureBeat that he was “surprised at how quickly a similar collapse occurs”. The only way to avoid this is therefore to make sure that generated texts do not find their way into the source material used for training, which gives new and unexpected hope to the journalism sector.
