Photos, videos, audio: artificial intelligence is learning to replicate reality

The period between September and October is Fashion Week season, especially in Milan and Paris. This year, immediately after the two most anticipated events of the season, another kind of fashion week began, entirely digital: AI Fashion Week, promoted by an Instagram account called @dailydall.ee that works in collaboration with Open AI, the artificial intelligence company co-founded by Sam Altman and Elon Musk. Every day it publishes a series of 4 photographs, each showing an outfit in the style of the most famous designers, from Paco Rabanne to Givenchy.

The point is that no one designed those clothes: an artificial intelligence created them, starting from a text. In this case Dall-E 2 was used, the Open AI system able to transform words into images, but there are also Stable Diffusion, Midjourney and Google's Imagen. And the new frontiers are the generation of video and sound: a few words, a simple description, are enough to start seeing animations or hearing sounds.

We are in a phase in which the potential of artificial intelligence is exploding: a first point of arrival in a journey more than a decade long, one that has accelerated sharply in the last two years.


From text to video in two years: the path of AI

June 2020
Open AI unveils GPT-3, the most advanced AI text generation system in the world: it is able to predict and sequence words and phrases based on a request or a context offered by a human being. GPT-3 kicks off the era of chatbots and virtual assistants: there is Google's, which Blake Lemoine, later fired by Big G, believed to be sentient, and Meta's, whose first words were insults aimed at Mark Zuckerberg.
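
What follows is a deliberately tiny, hedged sketch of the "predict the next word" idea, written in Python for illustration only: the NEXT_WORDS table and the generate function are invented stand-ins for a neural network trained on enormous amounts of text, and have nothing to do with OpenAI's actual code.

```python
# A toy illustration of autoregressive text generation: extend a prompt one word
# at a time by choosing a plausible next word. NOT OpenAI's code; the tiny
# hand-written table stands in for a trained language model's probabilities.
import random

# Hypothetical "learned" statistics: for each word, the plausible words that follow.
NEXT_WORDS = {
    "the": ["machine", "model", "future"],
    "machine": ["writes", "predicts"],
    "model": ["writes", "predicts"],
    "writes": ["the"],
    "predicts": ["the"],
    "future": ["<end>"],
}

def generate(prompt: str, max_words: int = 10, seed: int = 0) -> str:
    """Extend the prompt one word at a time, as an autoregressive model would."""
    rng = random.Random(seed)
    words = prompt.lower().split()
    for _ in range(max_words):
        candidates = NEXT_WORDS.get(words[-1], ["<end>"])
        word = rng.choice(candidates)     # a real model samples from probabilities
        if word == "<end>":
            break
        words.append(word)
    return " ".join(words)

print(generate("the"))  # e.g. "the machine predicts the future"
```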

April 2022
Two years later, Open AI is again the first to bring an image generator to market: it is called Dall-E 2 and it has learned to recognize the relationship between text and images; on the basis of this ability, it can create new ones starting from written requests. Dall-E 2 is only the first: a few months later, among others, Stable Diffusion and Midjourney arrive, bringing this kind of artificial intelligence to practically everyone.

September 2022
What if images could also be created in motion? The first to act on this idea is Meta, which creates Make-A-Video, a system, for now still closed to the public, capable of creating images and animating them. It works no differently from Dall-E 2: it learned to generate images by receiving photos and illustrations from the Internet as input, each accompanied by a description. But along with these it also received videos, which allowed the system to understand how things move and how frames follow one another. Meta is not the only one with this technology: in the days following the debut of Make-A-Video, Google arrives too, with Imagen Video and Phenaki, another artificial intelligence able to generate videos, with the option of starting from an image and a text describing how it should be animated.

Investments drive the evolution of AI

To summarize, in just over two years we have gone from generating simple lines of text to generating videos. And that is not all: on September 30th, Felix Krause, a Meta researcher, announced the development of a sound generator, a system that, just like the others, is able to create audio starting from a textual request.

This rapid and perhaps unexpected growth has been driven by advances in hardware and by Big Tech's enormous investments. According to an article in the Wall Street Journal, the research and development divisions of Meta and Alphabet spent over 60 billion dollars in this field in 2021 alone. These investments have consolidated an evolution that now allows machines to learn faster, with less data and, above all, to relate words to images or sounds more effectively.

These technological advances have opened up new paths. Just think of diffusion, the technique used to create static or moving images. The artificial intelligence receives millions of images from across the Internet as input, each labeled with a description. During training it learns to break them down into noise, pixel by pixel, and then, starting from a user's request, to reverse the process and build the image back up. There is no simple overlay of existing photos and illustrations: it is a generative process that starts from a set of random pixels and refines them to create something new.
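
As an illustration of that loop, here is a minimal sketch in Python, assuming a hypothetical predict_noise network and a commonly used noise schedule; it is not the code of Dall-E 2, Stable Diffusion or any real system.

```python
# A toy sketch of the reverse-diffusion loop described above. `predict_noise` is
# a hypothetical placeholder for a trained, text-conditioned denoising network.
import numpy as np

T = 1000                                  # number of diffusion steps
betas = np.linspace(1e-4, 0.02, T)        # how much noise each step adds
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def predict_noise(x, t, prompt):
    """Placeholder: a real model estimates the noise present in x at step t."""
    return np.zeros_like(x)

def generate(prompt, shape=(64, 64, 3), seed=0):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)        # start from pure random pixels (noise)
    for t in reversed(range(T)):          # progressively remove the predicted noise
        eps = predict_noise(x, t, prompt)
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:                         # keep a little randomness until the last step
            x += np.sqrt(betas[t]) * rng.standard_normal(shape)
    return x                              # with a trained model: a brand-new image

image = generate("an outfit in the style of a famous designer")
```

Real systems add many refinements, such as working in a compressed "latent" space and using far larger networks, but the core idea is the one above: start from noise and refine it, step by step, toward the text prompt.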


Risks and Benefits: What Will Happen?

Along with the technical evolution there are also commercial and political choices. While Open AI chose a slow release for Dall-E 2, with a very long waiting list, this summer Stability AI changed the game with Stable Diffusion: the image generation model was made available to everyone, code included. This is one of the factors driving the democratization of these systems.

A democratization that will undoubtedly raise a series of questions to be resolved. There is, for example, the controversy over the ownership of the generated images. At the moment everything depends on the service: images generated on Midjourney, for example, are entirely available to the user, while those created with Dall-E 2 remain the property of Open AI.

Another point concerns the copyright of the images used to feed those systems. In other words, any artwork uploaded to the Internet in recent years could have been used to train these AIs, and its human creators now find themselves competing with machines capable of replicating their style.

And that is not all: the dangers of generating images or videos that portray, for example, people in realistic situations are also worrying. The risk is fake news: the availability of images, as researchers at Penn State University have highlighted, greatly affects the credibility of false stories. While many systems have filters that block the creation of faces or situations considered inappropriate, Stable Diffusion by Stability AI, as pointed out in an article on The Verge, would allow more experienced users to generate any type of image, including pornography.


