“No-one’s using it... Writing’s quicker... It won’t catch on... It doesn’t work...” My partner tells me all of these things were said, dismissively, when email was introduced at his workplace. Fast forward 46 years from its conception, and around 269 billion emails are now sent daily across the world.
Voice recognition actually predates email, harking back to IBM’s Shoebox, which could recognise 16 words, in 1962. But it’s only recently, through Google, Apple and Amazon, that it has become widely available and usable. So what’s holding us back from using our voices instead of our hands?
The Recent Evolution of Voice Recognition
As voice recognition tech has developed, there’s been one big issue... it hasn’t really worked. After the first commercial product, the super-expensive Dragon Dictate, arrived in 1990, accuracy in the early 2000s still topped out at about 80%. That’s only 4 in every 5 words recognised. Often the misunderstood word was the key one in the sentence, which meant that even later on, when voice recognition was available on Vista and OS X in ‘07, people didn’t use it.
Google’s Voice Search app for iPhone was the big breakthrough in understanding. It was the first to really leverage cloud computing and machine learning to process and analyse inputs. A 2010 addition, in which users’ voices were recorded to help distinguish verbal links and nuances in accents, further improved accuracy and recognition across a wider audience.
This cloud computing approach has been the way forward ever since: Siri followed with the iPhone 4S in 2011, and most recently Amazon Echo has used natural language understanding and automatic speech recognition technology. Accuracy now sits at around the 95% mark, and the tech can also handle more complex commands than before.
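To give a rough feel for why the jump from about 80% to about 95% accuracy matters, here is a small Python sketch. It assumes, purely for illustration, that every word in a command is recognised independently at the same rate (real recognisers don’t behave this way), and the function name is hypothetical.

```python
# Illustrative sketch only: assumes each word is recognised independently
# with the same per-word accuracy, which real systems don't strictly do.

def command_success_rate(word_accuracy: float, words: int) -> float:
    """Chance that every word in a command is recognised correctly (hypothetical helper)."""
    return word_accuracy ** words

for accuracy in (0.80, 0.95):
    rate = command_success_rate(accuracy, words=5)
    print(f"{accuracy:.0%} per word -> {rate:.0%} of 5-word commands fully correct")
```

Under these assumptions, 80% per-word accuracy gets only about a third of five-word commands through untouched, while 95% gets roughly three in four, which is the difference between a novelty and something you might actually rely on.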
But now that voice-activated devices are in the majority of hands and homes, what is still holding the world back from the era of speech?
Ever been on public transport and said ‘Siri/Google, [insert command]’? Maybe once? You can feel the stares, can’t you? That’s the thing: it’s embarrassing to do. You know people are listening, assessing, judging.
Whilst 90% of people have at least tried the voice commands that came with their devices, only 6% have done so in public.
Psychologists Janz & Becker said in 1984 that ‘of all constructs, perceived barriers are the most significant in determining change.’ It’s suggested here that this expected embarrassment is what needs to be overcome for voice control to become a consistent part of our lives, and that the key to unlocking this is a combination of speed and usefulness.
The greater the embarrassment a user expects from a particular action, the more positive the potential outcome needs to be for them to overcome the anxiety it causes. Because each vocal interaction has a different projected outcome, each has to be judged on its own merits. Embarrassment is a personal attitude (some would be mortified by stumbling over their words in public, whereas others wouldn’t bat an eyelid), but broadly speaking, the key variable that makes a voice interaction embarrassing is its intimacy.
For example, asking Siri ‘What’s the weather like in Spain?’ returns a result much more quickly than opening a weather app, searching for Spain and waiting for it to load. It’s also not very personal, so the perceived embarrassment is low and easily overcome. Conversely, the voice command ‘Add diarrhoea medicine to my shopping list’ is similarly quicker than using a to-do list app manually; however, the embarrassment factor is much greater. Its highly personal nature makes it a far more embarrassing command to be overheard in public, and the speed and usefulness benefits aren’t great enough to overcome this.
Whilst this tech has been around for a long time, it’s only now becoming good enough to really use, so the behaviour still feels alien rather than habitual. It’s also suggested that this is why uptake is still so slow: because the behaviour is unfamiliar, the speed and usefulness need to be significantly greater than the perceived embarrassment of undertaking the action.
A Catalytic Echo of Change?
Being able to control technology well with our voices is still new, and its legacy of inaccuracy has left a taste of distrust in our mouths. To lessen the perceived embarrassment and build knowledge and expectation of the positive outcomes, people need a safe environment in which to experiment and form habits.
This is where Amazon Echo may have started the change.
Because it’s used within your home, you’re generally either alone or with trusted family and friends around. This naturally reduces embarrassment compared with experimenting in public, by providing a safer personal or shared experience. The vastly improved tech also delivers more accurate and positive outcomes for the user and starts to build trust. With lower embarrassment and more fast, useful experiences, habits can start to form more quickly. The first time I asked Echo to set an alarm it felt weird, and I didn’t trust that it would work. But because it was faster than setting one on my phone and worked just as well, it quickly became a daily habit. Now, even when strangers are around, I’ll happily chat away to the automaton free of embarrassment.
Whilst millions have been sold, global penetration of these specific devices isn’t high. However, just as with letters to email, black-and-white to colour TV and landlines to smartphones (pretty much any widespread movement), this is about reaching a critical mass that triggers a mass behavioural shift. Google getting involved with its Home product, and Siri being available on the MacBook (and so usable in a similarly safe environment), both help. With a wave of voice-controlled tech being built in preparation for our Smart future, a market set to be worth $122 billion in just five years’ time, this seismic movement is needed.
As more devices are sold and the tech continues to develop, more people will be exposed to positive voice experiences. We may soon start unashamedly bringing these personal experiences out of the safety of our homes and into the world.
As said by almost anyone who’s ever had a quote written down: ‘History repeats itself.’ Whilst our scepticism has so far kept the uptake of voice activation and interaction sluggish, is the Echo of the present the tool to help us echo the way we overcame our apprehensions about email in the past? Could inviting Amazon into our homes be the catalyst that helps us ‘safely’ overcome the barriers of perceived embarrassment, poor quality and slow speed, and unlock the door to exponentially Smarter lives?
Want the answer? Just ask your nearest robot.
Quoted under the topic of Perceived Barriers in Introduction to Health Behavior Theory, J. Aboyoun Hayden, W. Paterson, 2013. N.B. At the time of the original observation, this was applied to health rather than technology.
It is suggested that this is why more functional voice interactions, such as setting an alarm or finding out the time, are performed more commonly. Because the embarrassment is easy to overcome, people do them more often, experience the benefits quickly and gain confidence in having that interaction.
Amazon are notoriously secretive about their sales stats, but various sources now put sales in the multiple millions.