The data economy needs responsible AI, correct datasets and authoritative sources

We all knew that data was the new oil. How many times have we read it? It still is today, yet it seems that everything has changed. We hardly talk about big data anymore; it no longer seems fashionable. Artificial intelligence in its most pop form, generative AI, has occupied all the spaces dedicated to innovation. In reality, nothing has replaced anything. The data economy has not disappeared to make way for an economy of words or questions. The order has simply been reversed: the second descends from the first. Big data is at the root of the artificial intelligence boom.

Let's take a step back. The first artificial intelligence (AI) system was a robotic mouse that could find its way out of a maze, built by Claude Shannon in 1950. The first neural networks followed soon after, and then little happened for at least twenty years. Later, the miniaturization of chips, which brought an exponential increase in computing power, converged with the introduction of network technologies and changed the pace of AI research. The ability to train algorithms on large amounts of data was the factor that made the difference. So much so that today the question we most often ask ourselves when we query a chatbot like ChatGPT is: who told you that?

Yes, who told you that? How authoritative and accurate is the information from which Large Language Models (LLMs) learned? A team of MIT data scientists examining ten of the most widely used datasets for testing machine learning algorithms found that about 3.4% of the data was inaccurate or mislabeled, which, they concluded, could cause problems for AI systems that use these datasets. We have noticed this ourselves when using these systems. Especially at the beginning, they were prone to what are technically called hallucinations. In other words, they gave wrong answers in a very assertive tone.

Today we have to ask ourselves how, and how quickly, we will be able to correct these systems, and how the quality of datasets can be improved. For the first six decades, training computation increased in line with Moore's Law, doubling roughly every 20 months. Since about 2010 this exponential growth has accelerated further, to a doubling time of about 6 months. Today more than ever, the data economy needs correct and verified data.
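To get a feel for what those two doubling times mean, here is a minimal Python sketch. It only converts a doubling time into a growth factor over a given period. The function name `growth_factor` and the choice of one-year and ten-year windows are assumptions made for illustration; the 20-month and 6-month figures are the ones quoted above.

```python
def growth_factor(doubling_months: float, period_months: float = 12.0) -> float:
    """Growth over period_months for a quantity that doubles every doubling_months.

    Illustrative helper (not from the article): factor = 2 ** (period / doubling time).
    """
    return 2.0 ** (period_months / doubling_months)


# Doubling times quoted in the text above.
MOORE_DOUBLING_MONTHS = 20   # roughly Moore's Law pace
RECENT_DOUBLING_MONTHS = 6   # approximate pace since about 2010

moore_per_year = growth_factor(MOORE_DOUBLING_MONTHS)            # 2 ** 0.6
recent_per_year = growth_factor(RECENT_DOUBLING_MONTHS)          # 2 ** 2
moore_per_decade = growth_factor(MOORE_DOUBLING_MONTHS, 120)     # 2 ** 6
recent_per_decade = growth_factor(RECENT_DOUBLING_MONTHS, 120)   # 2 ** 20

print(f"20-month doubling: x{moore_per_year:.2f} per year, x{moore_per_decade:,.0f} per decade")
print(f" 6-month doubling: x{recent_per_year:.2f} per year, x{recent_per_decade:,.0f} per decade")
```

Under these assumptions, a 20-month doubling time works out to roughly 1.5 times more computation each year, while a 6-month doubling time means 4 times more each year. Over a decade the gap is far larger: a factor of 64 against a factor of about one million.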
