Devices with multiple simultaneously available voice agents should provide transparent, predictable behaviors and experiences to customers.
The attention system on a device is an important factor in building and maintaining customer trust in your device. Just as with single-agent products, multi-agent products should clearly communicate the current attention state to customers. Customers should be able to easily understand the state of the device, or of any active agent on it, and recognize when that state changes. This section describes recommendations for attention system behaviors in multi-agent interactions.
All coexisting agents should convey at least the three core attention states:

Listening: The agent has been invoked, either by voice or touch, and is recording a customer utterance.

Speaking: The agent is playing a voice reply, or otherwise delivering a response to the customer. (Optional for non-voice responses or for devices replying using visuals on a screen.)

Thinking: The agent or device is processing a request or waiting for a reply from the voice service. (This may not apply when, for example, local agents have no perceived latency between Listening and Speaking.)
Visual and sound cues for the three core attention states should be clear and easy for customers to understand for all active agents, even if some cues are unique to a particular agent or device.
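As an illustration only, the attention states and per-agent cues described above could be modeled as a small state tracker. This is a minimal sketch, not part of any specification; the `AttentionState` and `AgentAttention` names, and the cue strings, are hypothetical.

```python
from enum import Enum, auto

class AttentionState(Enum):
    """The three core attention states, plus an idle default."""
    IDLE = auto()       # not one of the three core states; included for completeness
    LISTENING = auto()  # invoked by voice or touch, recording a customer utterance
    THINKING = auto()   # processing a request or awaiting the voice service
    SPEAKING = auto()   # delivering a voice (or other) response

class AgentAttention:
    """Tracks one agent's attention state and the cues it surfaces."""

    def __init__(self, name, cues):
        self.name = name
        self.state = AttentionState.IDLE
        # cues maps each state to a (visual cue, sound cue) pair; cues may be
        # unique per agent as long as they remain clear to the customer.
        self.cues = cues

    def transition(self, new_state):
        """Enter a new attention state and return the cues to present."""
        self.state = new_state
        return self.cues.get(new_state, ("no visual cue", "no sound cue"))
```

A device might instantiate one `AgentAttention` per coexisting agent, e.g. `agent.transition(AttentionState.LISTENING)` returning that agent's listening cues, so that each state change is always paired with a customer-visible cue.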
Securing a device with multiple simultaneous voice agents requires a multifaceted approach at each step of the development process and beyond. Device and agent makers should evaluate potential threat scenarios by performing threat modeling for all features and use cases of their device. The following list presents general security guidelines.