A few dollars are enough to sabotage artificial intelligence

“Garbage in, garbage out”: the saying popular among computer scientists perfectly sums up the crucial role that datasets play in training machine learning algorithms. Simply put, the quality of the data (text, images, video and more) used to teach an algorithm a specific task – for example, recognizing cats in images – is of paramount importance.

If the data is mislabelled or contains inaccurate information, these inaccuracies will inevitably be absorbed by the AI, compromising its training.

But where is this data collected from? In many cases, vast publicly accessible databases are exploited, such as Laion (which collects 400 million text-image pairs, the kind of information used to train systems like MidJourney) or Coyo (which contains 700 million). In turn, these databases are built by automated tools that regularly scour thousands of websites, harvesting the information they contain. In the case of ChatGPT, for example, the entire English-language Wikipedia is known to have been used for its training, through the public CommonCrawl archive.

And this is where the pitfall lies: what would happen if someone willfully tampered with the data on the sites used to build these databases (sites that are often listed in the technical documentation), inserting incorrect information or labeling every photo containing an orange with the word “apple”?
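In code, the orange-for-apple attack described above amounts to flipping the labels of a fraction of the examples before training. The following is a minimal illustrative sketch, not taken from the study; the function name, the dataset shape (a list of URL-label pairs) and the parameters are all hypothetical.

```python
import random

def poison_labels(dataset, target="orange", wrong="apple", fraction=0.01, seed=0):
    """Relabel a fraction of the `target` examples as `wrong`.

    `dataset` is a list of (image_url, label) pairs. All names here are
    hypothetical, chosen only to illustrate label-flipping poisoning.
    """
    rng = random.Random(seed)  # fixed seed so the corruption is reproducible
    poisoned = []
    for url, label in dataset:
        if label == target and rng.random() < fraction:
            label = wrong  # inject the incorrect label
        poisoned.append((url, label))
    return poisoned

# With fraction=1.0 every "orange" example is mislabeled as "apple"
sample = [("img1.jpg", "orange"), ("img2.jpg", "cat")]
print(poison_labels(sample, fraction=1.0))
```

A model trained on the poisoned pairs would learn to associate images of oranges with the word “apple”, which is exactly the kind of quiet corruption the article describes.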

According to a study published by a team of researchers at ETH Zurich, an operation of this kind is not only possible: corrupting even a small fraction of the data used to train an algorithm is enough to compromise its ability to complete the task it was designed for.

In practice, it is enough to buy, for a few euros, the domains of the many abandoned websites that are still regularly visited by the data-collecting bots, and fill them with incorrect information. According to the researchers, $10,000 is enough to alter up to 1% of the data contained in archives such as the aforementioned Laion or Coyo, and just $60 to alter 0.01%.
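The figures above are easy to put in absolute terms: even the cheapest attack covers a surprisingly large number of examples. A quick back-of-the-envelope calculation, using only the numbers quoted in the article (400 million pairs in Laion, 700 million in Coyo):

```python
# Rough arithmetic on the article's figures; treating cost as linear
# in coverage is an assumption for illustration, not a claim of the study.
LAION_PAIRS = 400_000_000  # text-image pairs in Laion (from the article)
COYO_PAIRS = 700_000_000   # text-image pairs in Coyo (from the article)

def pairs_altered(total_pairs, fraction):
    """Number of pairs covered when `fraction` of the archive is altered."""
    return int(total_pairs * fraction)

# $60 buys control over 0.01% of Laion: still 40,000 poisoned pairs
print(pairs_altered(LAION_PAIRS, 0.0001))
# $10,000 buys up to 1% of Coyo: 7 million pairs
print(pairs_altered(COYO_PAIRS, 0.01))
```

In other words, “0.01%” sounds negligible, but on an archive of hundreds of millions of examples it translates into tens of thousands of corrupted entries.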

These are very small percentages, but if exploited in a targeted way they can still compromise the training of an algorithm (as demonstrated in a second study, also from ETH Zurich). “We’ve known for years that, in principle, if it were possible to insert arbitrary information into datasets, then all the training would go wrong,” Florian Tramèr, one of the authors of the study, told Fast Company. “Nevertheless, this still doesn’t seem to have happened, and we’re wondering why.”

Is it possible that no entity – rival nations or companies – is interested in compromising the functioning of some artificial intelligence algorithms, even by investing far larger sums than the small amounts mentioned above?

A possible explanation came from New York University researcher Julian Togelius, who confirmed that a few tens of dollars can indeed be enough to compromise a database, but above all explained that the amount of work required would likely outweigh the potential benefits. More than the money needed, Togelius explains, “the question is linked to the effort required to locate the right websites, buy their domains and prepare your material in the correct format.”

However, considering how strategically important many machine learning algorithms already are today, and how much that importance will continue to grow, the fact that it is possible – even if only in theory – to pollute the very sources on which artificial intelligence is built still opens up worrying scenarios.
