
Riding the wave of AI audio evolution

Tags:

Ad Tech AI Audio and Voice

This content was created by an IAB UK member

Members of IAB UK can contribute to the Member Vault.

According to a recent study published in eMarketer, a fifth of all time spent with digital media in the US will be on digital audio in 2024, writes AudioStack's Co-Founder & CEO, Timo Kunz.

The audio sector & why it's flourishing right now

According to a recent study published in eMarketer, a fifth of all time spent with digital media in the US will go to digital audio in 2024. Although growth is not as explosive as it was 10 years ago, the audio sector is still expanding. For some time now, this growth has drawn increasing interest in audio as a channel, raising its media share and budgets. Because of the way audio is produced and broadcast, however, these budgets have traditionally been small compared with those of other channels such as video.

Thanks to digital audio, addressability (the ability to reach specific target audiences, especially with dynamic content) has been solved, and audiences can be identified far more precisely than with traditional over-the-air radio broadcasting.

However, that still leaves the linear process of creating audio: a script has to be written for a product or service, a speaker then reads the script in a studio, and an engineer mixes and polishes the recording until it sounds perfect. Enter synthetic media production. Thanks to major advances in machine learning and AI in recent years, the process of creating audio can now be accelerated significantly, to the point where audio can be produced faster than real time, i.e. in less time than it would take to read the script aloud.

AI has also revolutionised the creative process itself, fitting seamlessly into traditional workflows, for example by helping to write scripts or by combining recorded human voices and music with synthetically generated assets. This is where AI's real strength lies: working hand in hand with humans.

On top of that, audio has become modular and dynamic. A speaker, music track or script can be changed at any point in the process, instead of going back to the beginning to cast another speaker or add a new music track in the studio. This comes in handy when you want to turn a Christmas campaign into a summer one or swap product offerings on the fly.
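To make "modular" concrete, here is a minimal sketch that models an ad as independent components (script, voice, music), so swapping one produces a new version without redoing the others. The `AdSpec` class, its field names and the voice and music identifiers are hypothetical, chosen only for illustration; this is not AudioStack's API.

```python
from dataclasses import dataclass, replace

# Hypothetical model of a modular audio ad: each component is an
# independent field, so any one can be swapped without touching the others.
@dataclass(frozen=True)
class AdSpec:
    script: str
    voice: str   # hypothetical voice identifier
    music: str   # hypothetical music track identifier

def swap(spec: AdSpec, **changes: str) -> AdSpec:
    """Return a new version of the ad with only the named components changed."""
    return replace(spec, **changes)

christmas = AdSpec(
    script="Unwrap our festive offers this Christmas!",
    voice="warm_female_uk",
    music="jingle_bells_lofi",
)

# Turn the Christmas spot into a summer one: new script and music, same voice.
summer = swap(christmas,
              script="Soak up our summer offers today!",
              music="beach_acoustic")

print(summer)
```

Because the spec is immutable, the original Christmas version stays intact and both can be kept side by side.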
 

The biggest opportunities in AI audio 

For the first time, audio creation can be accelerated to the point where use cases that have been out of reach for over a century become possible. Examples include real-time audio generation, podcasts and news generated from personal preferences, and dynamic advertising driven by real-time data such as weather, local product stock or music. The same capabilities can also easily be applied to voice-overs in video.

In a nutshell, audio is becoming so addressable, flexible and scalable that media creators, publishers and advertisers are rethinking how they produce their content.

This is exactly what AudioStack does: it offers enterprises audio production workflows to create audio ads from scratch in seconds or to build thousands of versions of an audio track programmatically. These workflows can also be combined into highly customised processes and integrated into any media creation system.
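At its core, building thousands of versions programmatically is combinatorics: a handful of interchangeable options per component multiplies quickly. The sketch below enumerates every combination with `itertools.product`. The option lists and region names are invented for illustration. In a real workflow each combination would be passed to a rendering step, which is not shown here.

```python
from itertools import product

# Hypothetical interchangeable options for each component of one ad.
scripts = [f"Offer {i}: visit your local store today." for i in range(1, 11)]  # 10 scripts
voices = ["voice_a", "voice_b", "voice_c", "voice_d", "voice_e"]               # 5 voices
music_beds = ["upbeat", "calm", "acoustic", "electronic"]                      # 4 music beds
regions = ["London", "Manchester", "Leeds", "Glasgow", "Cardiff"]              # 5 regions

def build_versions():
    """Yield one version description per combination of components."""
    for script, voice, music, region in product(scripts, voices, music_beds, regions):
        yield {
            "script": f"{region}: {script}",  # localise the copy per region
            "voice": voice,
            "music": music,
        }

versions = list(build_versions())
print(len(versions))  # 10 * 5 * 4 * 5 = 1000
```

Adding a single extra voice here would add 200 more versions. This is why generating the audio automatically matters once the number of combinations grows.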

As an example, Australian agency Creative Fix used AudioStack to build real-time news ads for News Corp. In a groundbreaking campaign, headlines and sub-headlines taken from articles on news.com.au were used to programmatically build 30-second ads that were seamlessly inserted into matching podcasts, across categories such as breaking news, finance, entertainment, lifestyle and tech. These ads had a maximum lifespan of 12 hours, so synthetic media was the only feasible way to produce the audio for each spot.
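The News Corp campaign combines two ideas: ad copy assembled from live headlines, and a hard expiry. Below is a minimal sketch of that selection logic. It assumes a simple headline record, category matching against the podcast's category, and the 12-hour lifespan from the campaign description. The `Headline` type, the script template and the sample feed are hypothetical. The text-to-speech and podcast insertion steps are out of scope.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

AD_LIFESPAN = timedelta(hours=12)  # maximum ad lifespan stated for the campaign

@dataclass
class Headline:
    category: str        # e.g. "finance", "tech"
    title: str
    subtitle: str
    published: datetime

def build_script(h: Headline) -> str:
    """Assemble ad copy from a headline (hypothetical template)."""
    return f"{h.title}. {h.subtitle}. Read the full story on news.com.au."

def live_ads(headlines: list[Headline], category: str, now: datetime) -> list[str]:
    """Return scripts for one podcast category, skipping headlines older than 12 hours."""
    return [
        build_script(h)
        for h in headlines
        if h.category == category and now - h.published < AD_LIFESPAN
    ]

# Invented sample feed for illustration.
now = datetime(2024, 9, 10, 12, 0, tzinfo=timezone.utc)
feed = [
    Headline("finance", "Rates hold steady", "Markets react", now - timedelta(hours=2)),
    Headline("finance", "Old story", "Already expired", now - timedelta(hours=13)),
    Headline("tech", "New phone launches", "First impressions", now - timedelta(hours=1)),
]
print(live_ads(feed, "finance", now))
```

With spots this short-lived, a script is regenerated whenever the feed changes rather than recorded once. This is the constraint that ruled out studio production.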
 

The challenges in the market

Over the last five years, voice quality has been the biggest showstopper for synthetic media. Lack of emotion, mispronunciations and limited dynamic range (how varied words and sentences sound) ruled out any use case in which conveying information was not the key purpose of the production. These limitations are being overcome by ever better text-to-speech and speech-to-speech technology and by growing investment in the industry, which together create a flywheel effect. Using AudioStack, large corporations such as McDonald’s and Porsche have started to air 100% synthetic audio ads or even hybrid TV commercials.

Another challenge specific to synthetic media has been the lack of accountability of companies and their users when creating media assets, especially with respect to privacy, identity and copyright. However, the industry has matured, and solid regulations are being put in place by the EU and the US government. As a result, AI media companies are becoming more transparent about how they use training data, or are deleting it automatically. Contracts now also include more specific and restrictive clauses about voice identity and model usage, ensuring a more responsible and ethical use of AI in audio.

AudioStack decided to become SOC 2 compliant and built auditability into its systems, creating a robust, audited permission management system that protects customers' IP.

The last challenge is a certain lack of deliverability for dynamic audio assets. Because it has never before been possible to create audio content quickly and dynamically, ad tech platforms haven't felt the need to build tooling that lets users automatically create dynamic audio content and serve it to a specific audience at scale. AudioStack is actively developing these technologies together with leading ad tech players such as AdsWizz and Acast. The goal is to make it easy for advertisers to air the right content to the right audience, at the right moment or in the right context, no matter how diverse that content might be. Sounds good, doesn’t it?
 

About AudioStack

AudioStack is the world’s leading end-to-end enterprise solution for AI audio production. Its proprietary technology connects AI-powered media creation tools such as AI script generation, text-to-speech, speech-to-speech, generative music and dynamic versioning. This allows enterprises to build complex audio production workflows that run faster than real time. AudioStack unlocks cost- and time-efficient audio that is addressable at scale, without compromising on quality.

https://audiostack.ai/

https://www.linkedin.com/company/audiostack-ai


By Timo Kunz, Co-founder & CEO

AudioStack

AudioStack is a software company based in London, Barcelona and New York. AudioStack.ai is the world's leading infrastructure for fully automated, scalable AI audio production. By connecting cutting-edge technologies such as text-to-speech, music, AI-based post-production and versioning, AudioStack lets audio brands and agencies build complex audio production workflows with ease. For the first time, audio assets can be created in real time, unlocking entirely new use cases that are faster, addressable at scale and more cost-effective than ever before.

Posted on: Tuesday 10 September 2024