Human interface design considerations contribute significantly to a customer’s sense of privacy and understanding. These considerations involve the customer interacting with or receiving information from the device itself, not from a voice agent.
Products that support more than one simultaneous voice agent face unique challenges, both in designing usable controls, such as buttons, and in representing their current state with attention system displays and audio cues.
The following best practices are designed to ensure that your customer always knows when a device is active and detecting wake words. These recommendations are vitally important to maintaining customer trust.
Buttons and other controls that interact with agents, such as Play and Pause buttons, should behave consistently across agents. Devices should not implement separate sets of similar controls for different agents. This requires the device to be able to route each button press to the appropriate agent.
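The sketch below illustrates one way a device might route a single shared Play/Pause control to whichever agent owns the current media session. It assumes the device records which agent started playback; `MediaSession` and `ButtonRouter` are hypothetical names used only for illustration and are not part of any SDK.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MediaSession:
    """Hypothetical record of the agent that currently owns media playback."""
    agent: str
    playing: bool = True


class ButtonRouter:
    """Routes one shared set of controls to the agent that owns the session.

    Assumption: the device tracks which agent started the active media session.
    """

    def __init__(self) -> None:
        self.session: Optional[MediaSession] = None

    def start_media(self, agent: str) -> None:
        # Whichever agent starts playback becomes the target for media buttons.
        self.session = MediaSession(agent=agent)

    def press_play_pause(self) -> str:
        if self.session is None:
            return "no active media"
        # Toggle playback state on the device, then direct the command
        # to the owning agent rather than to a per-agent button.
        self.session.playing = not self.session.playing
        action = "play" if self.session.playing else "pause"
        return f"{action} -> {self.session.agent}"
```

For example, after `router.start_media("AgentA")`, the first `router.press_play_pause()` returns `"pause -> AgentA"`. The same physical button works for every agent; only the routing target changes.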
These best practices apply whether the buttons are physical or virtual, and they also inform the decisions about which commands map to Universal Device Commands.
Overloaded buttons, or other controls, may have more than one function or be able to invoke more than one agent. They may behave differently based on a mode the device is in, or on the length or pattern of the button press. Overloaded buttons are intrinsically difficult for customers to use, because customers must remember both the extra functions of the control and more than one method of interaction.
With multiple agents on a device, there is a risk of even more complicated interactions. Overloaded buttons should be avoided when possible. If you must overload a button, you should:
The microphone on/off button must not be overloaded. Amazon recommends that the Action button not be overloaded.
Universal Device Commands (UDCs) are those commands and controls that a customer may use with any compatible agent to control certain device functions, even if that agent was not used to initiate the experience. UDCs can broadly be classified into two categories:
UDCs are a necessary feature for devices with multiple active agents. They aim to meet customer expectations and to address the most common points of frustration.
For example: Imagine one person sets an alarm and then leaves the room. Then another person enters the room, hears the alarm sound, and wants to turn it off. The interaction should be possible using any compatible agent, and should not result in an agent telling the person that there are no alarms set. Similarly, customers should be able to use any active agent to control the device’s volume, much like using the volume control buttons on the device.
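A minimal sketch of the alarm scenario, assuming the device (not any single agent) stores alarm state so that any compatible agent can act on it. `Alarm`, `Device`, `set_alarm`, and `handle_stop` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Alarm:
    """Hypothetical device-level alarm record."""
    label: str
    set_by: str            # agent that created the alarm
    sounding: bool = False


@dataclass
class Device:
    """Keeps alarms on the device rather than inside one agent.

    Assumption: because the device owns this state, any compatible agent
    can act on it, regardless of which agent created the alarm.
    """
    alarms: List[Alarm] = field(default_factory=list)

    def set_alarm(self, agent: str, label: str) -> Alarm:
        alarm = Alarm(label=label, set_by=agent)
        self.alarms.append(alarm)
        return alarm

    def handle_stop(self, invoking_agent: str) -> str:
        # Universal "Stop": silence every sounding alarm, whoever set it.
        sounding = [a for a in self.alarms if a.sounding]
        if not sounding:
            return f"{invoking_agent}: nothing to stop"
        for alarm in sounding:
            alarm.sounding = False
        return f"{invoking_agent}: stopped {len(sounding)} alarm(s)"
```

Here an alarm set through `AgentA` can be silenced with `device.handle_stop("AgentB")`, and `AgentB` never has to report that no alarms are set.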
A recommended set of UDCs is listed below. Your product may include other UDCs depending upon the experience and agent capabilities. When considering implementing additional commands, keep in mind:
In this version of the Design Guide, we include the following categories of Universal Device Commands that multi-agent devices should consistently support:
As a user, I want to invoke any compatible agent on device and say ‘Stop’ to stop any active media playback including music, radio, long-form audio (ebooks, podcasts, news, etc.) and videos.
As a user, I want to invoke any compatible agent on device and say ‘Stop’ (and variants like ‘End’) to stop any active streaming smart home camera feeds.
As a user, I want to invoke any compatible agent on device and say ‘Reject’ (and variants like ‘Stop’, ‘Cancel’) to reject an incoming phone, audio, or video call, regardless of which agent provides the service.
As a user, I want to invoke any compatible agent on device and say ‘Stop’ (and supported variants like ‘End’, ‘Cancel’, etc.) to stop any ongoing agent speech activity, regardless of which agent is speaking.
As a user, I want to invoke any compatible agent on device and say ‘Stop’ (and supported variants like ‘End’, ‘Shut up’, etc.) to stop the intended foreground activity when there is more than one active session (e.g. Timer over Music, Weather TTS over Music), regardless of which agent initiated the activities.
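The foreground-stop behavior can be sketched as a stack of concurrent activities. This is a minimal illustration assuming that the most recently started activity is the foreground one; a real device may instead rank activities by type (for example, alerts above speech above media). `Activity` and `ActivityStack` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Activity:
    """Hypothetical record of one active session on the device."""
    kind: str    # e.g. "music", "timer", "tts"
    agent: str   # agent that initiated the activity


class ActivityStack:
    """Tracks concurrent sessions; the most recent one is in the foreground."""

    def __init__(self) -> None:
        self._stack: List[Activity] = []

    def start(self, activity: Activity) -> None:
        self._stack.append(activity)

    def foreground(self) -> Optional[Activity]:
        return self._stack[-1] if self._stack else None

    def stop(self, invoking_agent: str) -> Optional[Activity]:
        """Universal "Stop": end only the foreground activity.

        The invoking agent is deliberately not used to pick the target, so
        any compatible agent can stop an activity another agent started.
        """
        if not self._stack:
            return None
        return self._stack.pop()
```

In the Timer over Music case, a "Stop" spoken to either agent ends the timer and leaves the music, now in the foreground, playing.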
As a user, I want to invoke any compatible agent on device and say ‘Set volume up/down’ (and variants like ‘turn it up/down’) to change the global volume setting, regardless of which agent set it last.
As a user, I want to invoke any compatible agent on device and say ‘Set volume to N’ (where N is a value from 0 to 10) to change the global volume setting, regardless of which agent set it last.
As a user, I want to invoke any compatible agent on device and say ‘Mute’ to mute the global volume setting, regardless of which agent set it last.
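The three volume commands can share one device-wide setting that every agent reads and writes. The sketch below assumes the 0 to 10 scale from ‘Set volume to N’, a step of 1 for up/down, and that any volume change clears mute; `GlobalVolume` and its methods are hypothetical.

```python
from typing import Optional


class GlobalVolume:
    """A single device-wide volume setting shared by every agent.

    Assumptions: 0-10 scale, a step of 1 for "up"/"down", and any explicit
    volume change clears mute.
    """

    MIN, MAX = 0, 10

    def __init__(self, level: int = 5) -> None:
        self.level = level
        self.muted = False
        # Recorded for logging only; it never restricts who may change volume.
        self.last_set_by: Optional[str] = None

    def _clamp(self, value: int) -> int:
        return max(self.MIN, min(self.MAX, value))

    def set_to(self, agent: str, n: int) -> int:
        # "Set volume to N": absolute change within the valid range.
        if not self.MIN <= n <= self.MAX:
            raise ValueError(f"volume must be between {self.MIN} and {self.MAX}")
        self.level, self.muted, self.last_set_by = n, False, agent
        return self.level

    def step(self, agent: str, delta: int) -> int:
        # "Turn it up/down": relative change, clamped to the valid range.
        self.level = self._clamp(self.level + delta)
        self.muted, self.last_set_by = False, agent
        return self.level

    def mute(self, agent: str) -> None:
        # Mute keeps `level`, so the next volume change resumes from it.
        self.muted, self.last_set_by = True, agent

    @property
    def effective(self) -> int:
        return 0 if self.muted else self.level
```

Because the setting lives on the device, ‘turn it up’ through `AgentB` starts from whatever level `AgentA` last set.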