With 3 seconds of our voice, AI will make us say things we never said

In the not too distant future, our words will no longer belong to us. They will be generated by an artificial intelligence like the revolutionary one just announced by Microsoft's researchers. It is called VALL-E and promises to replicate a person's voice starting from a sample of just three seconds, all while preserving (miraculously) the timbre, tone and emotion of the speaker.

Microsoft says VALL-E will have a significant impact on all the applications that can already turn a sentence of text into spoken speech. The impact grows when you consider that VALL-E can be paired with tools such as OpenAI's ChatGPT, i.e. an artificial intelligence that can generate credible and interesting text starting from a simple written question.

The difference from the text-to-speech methods we know is huge. Unlike other models, which synthesize a voice by manipulating waveform characteristics, VALL-E generates discrete audio codec codes directly from the text and the sound sample.

What also changes is the time needed to produce the sound file. To synthesize a voice, software generally needs to listen to several hours of the speech it is meant to replicate. With VALL-E, the training time for a new voice is practically zero: only three seconds of recording are submitted and, starting from these, it is possible to obtain high-quality personalized speech containing words that the person has never said.

The 'training' of VALL-E was actually done in advance. Microsoft trained the new speech synthesizer on an audio library provided by Meta. LibriLight is a popular reference for training automatic speech recognition (ASR) systems: it contains 60,000 hours of spoken English, primarily readings from public domain audiobooks available on LibriVox.

Microsoft, however, has also pointed out a possible Achilles heel of VALL-E: for best results, the voice in the three-second sample should be similar to one in the Meta library. But since there are around 7,000 speakers in LibriLight, there is a good chance that the voice you intend to replicate will find an adequate match.

The extraordinary thing is that VALL-E can also faithfully reproduce the background noise and acoustic interference mixed with the voice in the original three-second sample. This means that VALL-E, if requested, can clone the voice of a person who is speaking in a restaurant or on the telephone, reconstructing the environment in which the voice was recorded.


On its research page, Microsoft has published some striking examples of VALL-E's work. Each example presents the sentence the synthetic voice is asked to read, the sample of the original voice (Speaker Prompt), the same sentence read by the original speaker (Ground Truth), and finally the sentence as read by VALL-E.

Interestingly, VALL-E is also capable of reproducing the emotional state of the original voice, as highlighted in the section of examples dedicated to Speaker's Emotion Maintenance: tones associated with anger, boredom or disgust are replicated faithfully using entirely different words from those in the original three-second voice sample.

As we have already pointed out in recent months, when a popular app made it possible to replicate the voices of famous people such as Giorgia Meloni and Silvio Berlusconi, software like VALL-E that makes someone say things they have never said involves risks that go beyond the right to report or to satire.

It cannot be ruled out that a miraculous tool capable of saving our voices, and perhaps of generating fantastic podcasts from text alone, could turn into a deepfake factory.

Microsoft has decided not to release the VALL-E code for public testing at this time. The researchers are well aware of the potential social harm that this technology could cause. Indeed, they are calling for responsible development and use of the tool.

The euphoria over the progress of artificial intelligence, meanwhile, is accompanied by possible new investments at a time when Big Tech seemed to be cutting costs and jobs.

Microsoft itself appears to be investing $10 billion in OpenAI, the company that was once a non-profit and is now pursuing the monetization of its most powerful tool: ChatGPT, the artificial intelligence capable of conversing naturally with users and answering their questions in a surprisingly 'human' way.

The CEO of OpenAI is Sam Altman, former president of Y Combinator, one of the best-known startup accelerators in the world. The co-founders of OpenAI also include Elon Musk.
