Synchronize Spoken Text with Text on the Screen

Note: To improve your skills with APL, see "Build visually rich experiences using APL" at the Alexa Learning Lab.

Your skill response can associate speech with an APL Text component and issue a command that highlights lines of text as the speech audio plays. This creates a "karaoke" effect that shows which lines of a block of text are in focus.

To use this feature, you must provide speech data as plain text or as text marked up with Speech Synthesis Markup Language (SSML). Before an Alexa-enabled device can consume this data, it must be transformed into speech. To enable this transformation, use the ssmlToSpeech transformer to convert the text to speech, and the ssmlToText transformer to strip SSML tags from an SSML expression. These transformers cannot be used with the audio tag.
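One way to catch the audio tag restriction early is to check the SSML in skill code before attaching a transformer to it. The following is a minimal sketch, not part of any Alexa SDK: the helper name `contains_audio_tag` is hypothetical, and the regular expression is a rough check rather than a full SSML parser.

```python
import re

# Matches an opening <audio ...> tag, ignoring case and stray whitespace.
_AUDIO_TAG = re.compile(r"<\s*audio\b", re.IGNORECASE)


def contains_audio_tag(ssml: str) -> bool:
    """Return True if the SSML string appears to contain an audio tag."""
    return _AUDIO_TAG.search(ssml) is not None


plain = "<speak>Not all cats like <emphasis level='strong'>catnip</emphasis>.</speak>"
with_audio = "<speak><audio src='https://.../meow.mp3'/> Meow.</speak>"

print(contains_audio_tag(plain))
print(contains_audio_tag(with_audio))
```

A skill could run a check like this and fall back to a response without the transformers when it returns True.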

ssmlToSpeech and ssmlToText transformers

| Property | Type | Required | Description |
| --- | --- | --- | --- |
| `transformer` | enum: `ssmlToSpeech` \| `ssmlToText` | Yes | The type of transformation required. Initially, two transformers will be available: 1) `ssmlToSpeech` converts a data source value to a text-to-speech URL, and 2) `ssmlToText` converts an SSML expression to plain text by stripping out any SSML tags. |
| `inputPath` | string | Yes | The path of the data source value that needs to be transformed. |
| `outputName` | string | No | The name of the data source property where the transformed output will be stored. This output property will always be a sibling of the input property. If an `outputName` isn't provided, the value in the `inputPath` will be replaced with the output of the transformer. |
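The sibling and in-place replacement behavior of `outputName` can be sketched locally. The code below is only an illustration under stated assumptions, not how Alexa implements the transformers: `strip_ssml` and `apply_text_transformers` are hypothetical names, the tag stripping is a crude regular expression, `inputPath` is assumed to be a plain top-level property name, and `ssmlToSpeech` is skipped because it depends on Alexa's text-to-speech service.

```python
import copy
import re


def strip_ssml(ssml: str) -> str:
    """Rough local stand-in for ssmlToText: drop anything that looks like a tag."""
    return re.sub(r"<[^>]+>", "", ssml).strip()


def apply_text_transformers(data_source: dict) -> dict:
    """Return a copy of an object data source with ssmlToText transformers applied."""
    result = copy.deepcopy(data_source)
    props = result["properties"]
    for transformer in result.pop("transformers", []):
        if transformer["transformer"] != "ssmlToText":
            continue  # ssmlToSpeech cannot be simulated locally
        value = props[transformer["inputPath"]]
        # With no outputName, the output replaces the input value in place;
        # otherwise it is stored as a sibling of the input property.
        target = transformer.get("outputName", transformer["inputPath"])
        props[target] = strip_ssml(value)
    return result


source = {
    "type": "object",
    "properties": {
        "catFactSsml": "<speak>Not all cats like <emphasis level='strong'>catnip</emphasis>.</speak>"
    },
    "transformers": [
        {"inputPath": "catFactSsml", "outputName": "catFact", "transformer": "ssmlToText"}
    ],
}

print(apply_text_transformers(source)["properties"]["catFact"])
# prints: Not all cats like catnip.
```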

The following sample APL document shows a version of a "Cat Facts" skill that associates speech with a Text component bound to a cat fact. The Text component is wrapped in a ScrollView component. This means the device will automatically scroll to the parts of the cat fact that aren't visible on screen as they are spoken.

Part of an APL document that shows a Text component that binds to speech

{
    "type": "ScrollView",
    "item": {
        "type": "Text",
        "id": "catFactText",
        "text": "${catFactData.properties.catFact}",
        "speech": "${catFactData.properties.catFactSpeech}"
    }
}

The following sample shows the corresponding object data source and transformers sent by skill developers.

Object data source and transformer bound to the APL document

{
    "datasources": {
        "catFactData": {
            "type": "object",
            "properties": {
                "backgroundImage": "https://.../catfacts.png",
                "title": "Cat Fact #9",
                "logoUrl": "https://.../logo.png",
                "image": "https://.../catfact9.png",
                "catFactSsml": "<speak>Not all cats like <emphasis level='strong'>catnip</emphasis>.</speak>"
            },
            "transformers": [
                {
                    "inputPath": "catFactSsml",
                    "outputName": "catFactSpeech",
                    "transformer": "ssmlToSpeech"
                },
                {
                    "inputPath": "catFactSsml",
                    "outputName": "catFact",
                    "transformer": "ssmlToText"
                }
            ]
        }
    }
}

The following snippet shows the transformed data source as it is sent to the device.

Transformed data source received by the device

{
    "datasources": {
        "catFactData": {
            "type": "object",
            "properties": {
                "backgroundImage": "https://.../catfacts.png",
                "title": "Cat Fact #9",
                "logoUrl": "https://.../logo.png",
                "image": "https://.../catfact9.png",
                "catFactSsml": "<speak>Not all cats like <emphasis level='strong'>catnip</emphasis>.</speak>",
                "catFactSpeech": "https://tinyurl.amazon.com/aaaaaa/catfact.mp3",
                "catFact": "Not all cats like catnip."
            }
        }
    }
}

To read the cat fact aloud, use the Alexa.Presentation.APL.ExecuteCommands directive with the SpeakItem command, as shown in the next snippet. The token supplied in the ExecuteCommands directive is required and must match the token that the skill provided in the RenderDocument directive used to render the APL document.

An Alexa.Presentation.APL.ExecuteCommands skill directive with a SpeakItem command

{
    "type": "Alexa.Presentation.APL.ExecuteCommands",
    "token": "[SkillProvidedToken]",
    "commands": [{
        "type": "SpeakItem",
        "componentId": "catFactText"
    }]
}
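Because the two tokens must match, it can help to build both directives from a single token value. The following is a minimal sketch that assembles the directives as plain Python dictionaries; the function name `build_speak_directives`, its parameters, and the token value `catFactToken` are hypothetical, and the APL document and data sources are passed in as opaque placeholders. The optional `highlightMode` property of SpeakItem (`line` or `block`) comes from the APL command reference and does not appear in the sample above, so it is only set when requested.

```python
def build_speak_directives(token, document, datasources, component_id, highlight_mode=None):
    """Build RenderDocument and ExecuteCommands directives that share one token."""
    render = {
        "type": "Alexa.Presentation.APL.RenderDocument",
        "token": token,
        "document": document,
        "datasources": datasources,
    }
    command = {"type": "SpeakItem", "componentId": component_id}
    if highlight_mode is not None:
        # Assumption: "line" requests line-by-line highlighting.
        command["highlightMode"] = highlight_mode
    speak = {
        "type": "Alexa.Presentation.APL.ExecuteCommands",
        "token": token,  # must match the RenderDocument token
        "commands": [command],
    }
    return [render, speak]


directives = build_speak_directives(
    token="catFactToken",
    document={"type": "APL"},        # placeholder for the full APL document
    datasources={"catFactData": {}}, # placeholder for the data source shown earlier
    component_id="catFactText",
    highlight_mode="line",
)
```

Generating both directives in one place means the token cannot drift between them.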

Last updated: Nov 28, 2023