AI Voice vs Human Voice - An Expert Analysis

Amanda de Andrade, audio engineer, Brazilian voice-over studio and voice talent - spectrogram of a synthetic AI voice

It was a Thursday in August 2023 when somebody called me from Italy. I couldn't pick up the phone immediately, so an email followed right away: a multinational company was considering launching a new service and wanted to consult me on its viability in Brazilian Portuguese.

The names of people and companies, along with anything that could identify them or that cannot be disclosed, have been omitted.

We jumped on a meeting 30 minutes later and had a warm, exciting discussion about the new service they were considering: AI voice production and post-production, also known as artificial intelligence voice synthesis and editing.

I was recommended by someone on the inside (whom I hold very dear) as the sound and Brazilian Portuguese language specialist who would:

  • advise them on this new endeavor,
  • test the tools and the AI voice generator they had chosen,
  • write a report,
  • and compose a step-by-step guide to control the quality of the output.

In summary, I was asked to combine my experience and knowledge as a linguist and as a sound engineer specializing in voice and voice-over in Brazilian Portuguese to assess the viability and output quality of this new service.

It was an interesting business proposal from my perspective as an audio engineer and a linguist. I was going to:

  • test and manipulate many AI voices in Brazilian Portuguese (different versions of female and male voices),
  • see how they were generated by one of the major players in the AI synthetic voice industry,
  • discover the strengths and weaknesses of synthetic voices,
  • exchange impressions with sound engineers working in other languages,
  • see what audio editing could - or couldn't - do to improve an AI voiceover.

So at this point, you may be convinced that this is not an article by a voice actor demonizing and criticizing the era of AI synthetic voices. It's an article by a sound engineer who specializes in voice, with an open mind and a genuine desire to understand the state of the industry and the real viability, risks, pros, and cons of an AI voice service.

Only by being open, technical, and honest could I arrive at a lucid, unbiased conclusion (at the end of the article).

AI voiceover projects

In September 2023, the company prepared a testing project - just like a real voiceover project, with the client's briefing, requirements, and script ready to be recorded.

In late September and throughout October and November 2023, we had our first real project, and this time the client required the AI voiceover to be synchronized with the videos in English.

A snippet of the Report on AI voiceover technology

Big glitch with a wide range of frequencies in a synthetic speech - Artificial Intelligence voiceover
Undesired tremolo in a vowel of a synthetic speech - Artificial Intelligence voiceover
Extraneous sounds in a synthetic speech - Artificial Intelligence voiceover
Poorly pronounced transients in Ms and Ns from a synthetic speech - Artificial Intelligence voiceover
Inconsistent volume in a synthesized speech from an Artificial Intelligence voiceover
Fixing pronunciation of a synthetic speech - Artificial Intelligence voiceover
Report's preview #1 on synthetic speech - Artificial Intelligence voiceover
Report's preview #2 on synthetic speech - Artificial Intelligence voiceover
Report's preview #3 on synthetic speech - Artificial Intelligence voiceover
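
One of the defects captured above, inconsistent volume, can be put into numbers. The following is a minimal Python sketch, not the procedure used in the report: it assumes the audio is already loaded as floating-point samples between -1.0 and 1.0, and the function names, the frame size, and the silence threshold are hypothetical choices made for illustration. It measures the RMS level of short frames and reports how far apart the loudest and quietest non-silent frames are.

```python
import math


def frame_levels_db(samples, frame_size=1024, floor_db=-120.0):
    """Return the RMS level (in dBFS) of each non-overlapping frame.

    `samples` is a sequence of floats in [-1.0, 1.0]. A trailing partial
    frame is ignored. Digital silence is reported as `floor_db`.
    """
    levels = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        rms = math.sqrt(sum(s * s for s in frame) / frame_size)
        levels.append(20 * math.log10(rms) if rms > 0 else floor_db)
    return levels


def level_spread_db(levels, silence_db=-50.0):
    """Gap in dB between the loudest and quietest non-silent frame."""
    voiced = [lv for lv in levels if lv > silence_db]
    return max(voiced) - min(voiced) if voiced else 0.0


# Synthetic stand-in for speech (an assumption, not real report data):
# one second of a 220 Hz tone whose second half has half the amplitude.
rate = 16000
tone = [math.sin(2 * math.pi * 220 * n / rate) for n in range(rate)]
uneven = [s * (0.8 if n < rate // 2 else 0.4) for n, s in enumerate(tone)]

spread = level_spread_db(frame_levels_db(uneven))
print(f"level spread: {spread:.1f} dB")  # halving the amplitude is a drop of about 6 dB
```

A large spread is not automatically a defect, since natural speech also varies in level. The number is only useful for comparing takes or flagging segments to check by ear.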

Audio examples of AI voiceover cons

AI's poorly pronounced transients - Ms & Ns
AI's undesired tremolo in a vowel
AI's wobbling phrase with glitches

Why did they choose AI over humans?

Based on the experience from both the testing phase and the real project, I wonder why they chose AI voice for such a project.

  • You might guess: "Maybe to have a neutral, distant, robotic descriptive voice...?"

That may have been their initial aspiration, but it's not what they received at the end of the project. Currently, an AI voiceover sounds more like a weird, unbalanced voice actor than a genuine robot voice.

Additionally, if the goal was a Voice of God narration (neutral but not robotic, of course), a professional voice actor would have been the shortest route - or rather, the only one.

By the way, the AI voice's timbre in Brazilian Portuguese didn't sound distant at all. In some instances (when there weren't glitches, tremolos, or weird cadences), a segment could have passed for a human voice. It sounded like the voice of a real person from São Paulo (neutral Paulista accent) who lacked communication skills and whose recording had been messed up and truncated.

So you see, the timbre of an AI voice is no longer a technical issue for AI companies. They have reached a stage of development where an AI voice's timbre is quite similar to a human's.

The problem now is with pitch, consistency, quality of vowels and consonants, and many other aspects of the human voice that make it so suitable, versatile, and desirable.

And by the way, AI companies are investing more and more heavily in a conversational, natural tone for their synthetic voices. So I don't think the final client was looking for a distant, robotic voice. I can't see how it could have benefited them in this project. And a robotic voice was not what they got as output, anyway.

  • "To cut down the costs?"

No, because the math doesn't make sense. If the AI voice had been generated by the final client itself, the costs would surely have been lower for them. However, because of the middle company and the human QA, they were paying large amounts to a renowned company and to skilled professionals in each language so that we could fix the mistakes and improve the perceived quality of something that was poor from the start.

  • "To deal with fewer people?"

Absolutely not. The middle company had to deal with about the same number of people: us, the audio engineers and linguists.

  • "To experiment?"

Possibly.

  • "Hype?"

Possibly.

  • "To sound different in the market?"

That's possible too. However, since AI voices are most commonly used in low-budget and amateur projects, there's no competitive advantage for serious businesses in being associated with this vocal aesthetic of low quality and rough speech.

Conclusion

  • Time: even done at its best, AI voiceover can take much more time than human voice-over, especially because humans adapt fast - AI is rigid and sounds less natural when synchronization is needed, for example.
    • It was much more time-consuming.
    • It took me dozens of hours of analysis, manipulation, tests, editing, and mastering.
    • A human voice-over would have taken me a couple of hours of work.
  • Cost: lower, but still high, with a notable waste of time and resources.
    • In terms of price, I didn't see much difference from a human voiceover project.
    • I spent much more time on this AI voiceover than I would on any human VO. As a consequence, the company paid for many more hours of my participation than it would have if it had hired a professional voiceover actor or voice talent to do it.
    • I know the price at which they are selling the service, and it's not cheap. After all, they're committed to quality and the addition of a human touch (human supervision and QA).
  • Quality: great overall clarity of speech - suitable for informational use.
    • The output quality falls short of the quality of a recording done by a professional voice-over artist.
    • From the spectrogram, I can tell it lacks richness, frequency content, and clear transients.
    • By ear, I miss the natural cadence, the variation, and the natural pauses and breaths.
    • The core message is still easily understood with AI speech.
    • The quality sits between a very simple mic recording (without background noise) and a message heard over the telephone.

I finish this article knowing that I have much more to report and share with you about Artificial Intelligence, Generative AI, Voice Cloning, and Speech Synthesis.

This subject doesn't end here. Send me an email to subscribe to my newsletter and become an insider. Get original and fresh information and discover what nobody is saying about AI & voice.

See you soon,

Contact Amanda