Designing Voice Interactions

Conversational UIs will be the fifth interaction paradigm. After punch-cards, consoles, WIMP and touchscreens, voice and chat are destined to be the next digital frontier. The major tech players are at war with each other for your living room, and the rate of change is staggering.

In this session, Craig Pugsley covered the entire conversational UI landscape, from the predominantly voice-first platforms of Amazon and Google to Facebook’s push into messaging and chatbots. He also helped participants build their first working conversational assistant.

A conversation is an informal exchange of ideas between two entities. In a computer conversation, you are replacing one side with a massive cloud supercomputer. This matters because estimated smart speaker sales are enormous. 45% of these users are in the US and 20% are in China; the UK also makes up around 20% of the market. 42% of those who have a device say it is essential to their life, and people use them for everything from asking a question or setting a timer to using a favourite third-party assistant app (which is rapidly growing in popularity).

When we talk about designing conversational experiences, we mean voice and chatbots. Google Assistant is a hybrid of these.

Craig took participants through the steps to build their own Google Assistant action using dialogflow.com. Participants created a new ‘agent’ (an app), named it, created an ‘intent’ and added a response.

An ‘intent’ is like a verb. You need to provide a list of things that people might say to trigger this intent. Craig explained the anatomy of a skill, which includes a wake word (Alexa, ask), an invocation name (JustEat), an intent (to order) and a slot (dim sum). He also discussed how to use SSML (Speech Synthesis Markup Language) to mark up responses.
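The anatomy above can be sketched in a few lines of Python. This is only an illustration, not how Alexa or Google Assistant parse speech: the `SkillRequest` class, the regular expression and the `to_ssml` helper are all hypothetical names, and the pattern assumes the fixed phrasing "<wake word>, ask <invocation name> to <intent> <slot>". The SSML part uses only the standard `<speak>` and `<break>` elements.

```python
import re
from dataclasses import dataclass
from typing import Optional
from xml.sax.saxutils import escape


@dataclass
class SkillRequest:
    wake_word: str
    invocation_name: str
    intent: str
    slot: str


# Hypothetical pattern for "<wake word>, ask <invocation name> to <intent> <slot>".
PATTERN = re.compile(
    r"^(?P<wake>\w+),\s*ask\s+(?P<invocation>\w+)\s+to\s+(?P<intent>\w+)\s+(?P<slot>.+)$",
    re.IGNORECASE,
)


def parse_utterance(text: str) -> Optional[SkillRequest]:
    """Split an utterance into its four parts, or return None if it does not fit."""
    match = PATTERN.match(text.strip())
    if match is None:
        return None
    return SkillRequest(
        wake_word=match["wake"],
        invocation_name=match["invocation"],
        intent=match["intent"],
        slot=match["slot"],
    )


def to_ssml(sentence: str, pause_ms: int = 300) -> str:
    """Wrap a reply in SSML and add a short pause after it."""
    return f'<speak>{escape(sentence)}<break time="{pause_ms}ms"/></speak>'
```

Calling `parse_utterance("Alexa, ask JustEat to order dim sum")` yields the wake word, invocation name, intent and slot as separate fields. Real platforms do this with trained models rather than a fixed pattern.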

He took participants through how to add intent details and how to test the app. He highlighted where the fuzziness in the system occurs by demonstrating where Google can handle differences in language (it knows that “Where’s my order” is the same as “Where is my order”) and where it can’t, to show how much training may be required. You need to match what people say with your intent. You do this with lots and lots of user testing.
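To see why some variations are easy and others need more training, here is a toy matcher. It is a sketch under stated assumptions, not Dialogflow's actual matching: the contraction table, the training phrases and the function names are all made up for illustration. It treats “Where’s my order” and “Where is my order” as the same, but returns nothing for a phrasing it has never been given.

```python
import re
from typing import Optional

# Hypothetical contraction table and training phrases, for illustration only.
CONTRACTIONS = {"where's": "where is", "what's": "what is", "i'm": "i am"}

TRAINING_PHRASES = {
    "track_order": ["where is my order", "track my order"],
    "place_order": ["i want to order food"],
}


def normalise(utterance: str) -> str:
    """Lower-case, expand known contractions and drop punctuation."""
    text = utterance.lower().replace("’", "'")
    for short, expanded in CONTRACTIONS.items():
        text = text.replace(short, expanded)
    text = re.sub(r"[^a-z0-9' ]", " ", text)
    return " ".join(text.split())


def match_intent(utterance: str) -> Optional[str]:
    """Return the intent whose training phrases contain the normalised utterance."""
    text = normalise(utterance)
    for intent, phrases in TRAINING_PHRASES.items():
        if text in phrases:
            return intent
    return None  # No match: a sign that more training phrases are needed.
```

An utterance such as "Has my food left yet?" means the same thing to a person but returns `None` here, which is the kind of gap that user testing exposes.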

Craig gave a quick summary of how Alexa works: it sends the audio to an ASR (automatic speech recognition) service to transcribe it into text, then sends the text to an NLP (natural language processor) to identify the intent and the slots in a statement, then uses a TTS (text-to-speech) server to create an audio file with the response. All this has to happen within 0.5 seconds, otherwise people will not engage.
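The three stages and the time budget can be laid out as a small pipeline. Every function here is a stand-in: the stub ASR simply decodes bytes as text, the stub NLP looks for the word "order", and the stub TTS encodes the reply back to bytes. Only the ordering of the stages and the 0.5 second budget come from the talk.

```python
import time
from dataclasses import dataclass
from typing import Dict, Tuple

LATENCY_BUDGET_SECONDS = 0.5  # Budget quoted in the session.


@dataclass
class Understanding:
    intent: str
    slots: Dict[str, str]


def recognise_speech(audio: bytes) -> str:
    """Stub ASR: pretend the audio bytes are already UTF-8 text."""
    return audio.decode("utf-8")


def understand(text: str) -> Understanding:
    """Stub NLP: pick out an 'order' intent and treat the rest as the dish slot."""
    words = text.lower().split()
    if "order" in words:
        dish = " ".join(words[words.index("order") + 1:])
        return Understanding("order", {"dish": dish})
    return Understanding("fallback", {})


def synthesise_speech(text: str) -> bytes:
    """Stub TTS: return the reply as bytes instead of real audio."""
    return text.encode("utf-8")


def handle_request(audio: bytes) -> Tuple[bytes, bool]:
    """Run ASR, NLP and TTS in order and report whether the budget was met."""
    started = time.perf_counter()
    text = recognise_speech(audio)
    understanding = understand(text)
    if understanding.intent == "order":
        reply = f"Ordering {understanding.slots['dish']}."
    else:
        reply = "Sorry, I didn't catch that."
    speech = synthesise_speech(reply)
    elapsed = time.perf_counter() - started
    return speech, elapsed <= LATENCY_BUDGET_SECONDS
```

In a real assistant each stage is a network call, so measuring the whole round trip like this is what tells you whether the budget is being met.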

The magic of these interfaces is that they can hold a conversation with you. For this you need to create a flow in which the assistant asks you questions. You can offer suggestion chips that users can tap, but you should also allow the user to answer the question using speech.
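A question flow with suggestion chips might be modelled like this. It is a minimal sketch with invented names (`Question`, `FLOW`, `next_question`, `record_answer`) and invented questions. A reply counts whether it was tapped as a chip or arrived as transcribed speech, as long as the text matches one of the chips.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Question:
    slot: str
    prompt: str
    chips: List[str]  # Suggestions shown as tappable chips.


# Hypothetical two-step ordering flow.
FLOW = [
    Question("dish", "What would you like to order?", ["Dim sum", "Pizza"]),
    Question("size", "What size?", ["Small", "Large"]),
]


def next_question(answers: Dict[str, str]) -> Optional[Question]:
    """Return the first question whose slot is still empty, or None when done."""
    for question in FLOW:
        if question.slot not in answers:
            return question
    return None


def record_answer(answers: Dict[str, str], question: Question, reply: str) -> bool:
    """Store the reply if it matches a chip (tapped or spoken); report success."""
    for chip in question.chips:
        if reply.strip().lower() == chip.lower():
            answers[question.slot] = chip
            return True
    return False
```

The assistant keeps calling `next_question` until it returns `None`, repeating the prompt whenever `record_answer` returns `False`.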

Craig warned that when you build a voice assistant app, you need to be mindful that public expectation is very high: people expect to have a proper conversation. This means you need to think about how to move people on if they say something the device doesn’t understand. You also need to make sure that your app has a character, a backstory, a motivation, a tone, a mood and a style, because everything you have ever held a conversation with has a personality, so people will project this onto their assistant. However, you need a balance: a little too much personality can be bad! Copywriters will become your best friend, because they will make your micro-copy magical. This is a key skill for a conversation designer.
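One way to move people on when the assistant doesn’t understand is an escalating fallback: reprompt, then offer examples, then hand over gracefully. The wording and the `fallback_reply` function below are hypothetical, and the number of attempts before giving up is an arbitrary choice for the sketch.

```python
# Hypothetical reprompts, ordered from gentle to more explicit.
REPROMPTS = [
    "Sorry, I didn't catch that. What would you like to order?",
    "You can say something like 'dim sum' or 'pizza'.",
]
HANDOVER = "Let's leave that for now. You can finish your order in the app."


def fallback_reply(failed_attempts: int) -> str:
    """Pick a reply based on how many times in a row the user was not understood."""
    attempts = max(failed_attempts, 0)  # Guard against negative counts.
    if attempts < len(REPROMPTS):
        return REPROMPTS[attempts]
    return HANDOVER  # Stop looping and move the user on.
```

This is also where the copywriting shows: each of these lines is micro-copy that should sound like the assistant's character.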

Craig shared the voice design principles he has developed through working on voice assistants:

1. Be fast
2. Be malleable
3. Repair quickly
4. Have character
5. Keep context
6. Don’t make me think
7. Be intimate
8. Set expectations

In a practical exercise, Craig asked participants to extract some traits from a sample set of brand values to help develop a personality for their assistant, with a prize for the best top traits.

About Craig

Craig loves making things and making things make sense. He’s the Director and Creative Lead at StudioFlow – a digital studio specialising in emerging technologies, voice apps and chatbots – pushing the frontiers of how people engage with next-gen tech.

A hopeless technophile, Craig has been obsessed with everything bleepy since he plugged in his first Spectrum, back in the darkest 80s. After studying AI and machine learning at uni, he helped found a tech startup, worked for two of the biggest tech companies in the world, and was creative lead in a FTSE 100 company’s tech innovation product research lab. Alongside running StudioFlow, he hosts the South West’s leading community of voice app and chatbot designers and builders: VCSW – Voice apps and Chatbots South West.