Assist Pipelines
The Assist pipeline integration runs the common steps of a voice assistant:
- Speech to text
- Intent recognition
- Text to speech
Pipelines are run via a WebSocket API:
{
  "type": "assist_pipeline/run",
  "start_stage": "stt",
  "end_stage": "tts",
  "input": {
    "sample_rate": 16000
  }
}
The following input fields are available:
Name | Type | Description |
---|---|---|
start_stage | enum | Required. The first stage to run. One of stt, intent, tts. |
end_stage | enum | Required. The last stage to run. One of stt, intent, tts. |
input | dict | Depends on start_stage. For STT, the dictionary should contain a key sample_rate with an integer value. For intent and TTS, the key text should contain the input text. |
pipeline | string | Optional. ID of the pipeline (use assist_pipeline/pipeline/list to get names). |
conversation_id | string | Optional. Unique id for conversation. |
timeout | number | Optional. Number of seconds before pipeline times out (default: 30). |
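As a rough illustration of how the fields above fit together, the following sketch assembles a run message as a plain dictionary. The helper name `build_run_message` and its keyword arguments are hypothetical and not part of the integration; any additional envelope fields the WebSocket connection itself may require are not covered here.

```python
import json

# Stage names listed in the table above.
STAGES = ("stt", "intent", "tts")


def build_run_message(start_stage, end_stage, *, sample_rate=None, text=None,
                      pipeline=None, conversation_id=None, timeout=None):
    """Build an assist_pipeline/run message (hypothetical helper)."""
    if start_stage not in STAGES or end_stage not in STAGES:
        raise ValueError(f"stages must be one of {STAGES}")

    # The shape of "input" depends on the first stage of the run.
    if start_stage == "stt":
        if sample_rate is None:
            raise ValueError("sample_rate is required when start_stage is 'stt'")
        input_data = {"sample_rate": int(sample_rate)}
    else:
        if text is None:
            raise ValueError("text is required when start_stage is 'intent' or 'tts'")
        input_data = {"text": text}

    message = {
        "type": "assist_pipeline/run",
        "start_stage": start_stage,
        "end_stage": end_stage,
        "input": input_data,
    }

    # Optional fields are only included when explicitly provided.
    optional = {
        "pipeline": pipeline,
        "conversation_id": conversation_id,
        "timeout": timeout,
    }
    message.update({k: v for k, v in optional.items() if v is not None})
    return message


if __name__ == "__main__":
    print(json.dumps(build_run_message("stt", "tts", sample_rate=16000), indent=2))
```

Leaving optional keys out entirely, rather than sending them as null, keeps the message identical to the example above when no options are set.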
Events
As the pipeline runs, it emits events back over the WebSocket connection. The following events can be emitted:
Name | Description | Emitted | Attributes |
---|---|---|---|
run-start | Start of pipeline run | always | pipeline - ID of the pipeline; language - Language used for pipeline; runner_data - Extra WebSocket data |
run-end | End of pipeline run | always | |
stt-start | Start of speech to text | audio only | engine - STT engine used; metadata - Incoming audio metadata |
stt-end | End of speech to text | audio only | stt_output - Object with text, the detected text |
intent-start | Start of intent recognition | always | engine - Agent engine used; language - Processing language; intent_input - Input text to agent |
intent-end | End of intent recognition | always | intent_output - Conversation response |
tts-start | Start of text to speech | audio only | engine - TTS engine used; language - Output language; voice - Output voice; tts_input - Text to speak |
tts-end | End of text to speech | audio only | media_id - Media Source ID of the generated audio; url - URL to the generated audio; mime_type - MIME type of the generated audio |
error | Error in pipeline | on error | code - Error code; message - Error message |
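One way a client might consume these events is sketched below. It assumes the events have already been unpacked into `(event_type, data)` pairs; how events are wrapped inside WebSocket messages is not described in this section, and the function name `collect_outputs` and the keys of its result are hypothetical.

```python
def collect_outputs(events):
    """Collect stage outputs from (event_type, data) pairs until run-end.

    Assumption: `events` is an iterable of already-decoded pairs; the
    WebSocket envelope around each event is outside this sketch.
    """
    outputs = {}
    for event_type, data in events:
        if event_type == "error":
            # The error event carries a code and a message.
            raise RuntimeError(f"{data.get('code')}: {data.get('message')}")
        if event_type == "stt-end":
            # stt_output is an object whose text field holds the detected text.
            outputs["text"] = data["stt_output"]["text"]
        elif event_type == "intent-end":
            outputs["intent_output"] = data["intent_output"]
        elif event_type == "tts-end":
            # Keep the whole attribute dict rather than assuming its layout.
            outputs["tts"] = data
        elif event_type == "run-end":
            # run-end is always emitted and marks the end of the run.
            break
    return outputs


if __name__ == "__main__":
    sample = [
        ("run-start", {"pipeline": "example", "language": "en"}),
        ("intent-start", {"engine": "example", "intent_input": "hello"}),
        ("intent-end", {"intent_output": {"note": "placeholder"}}),
        ("run-end", {}),
    ]
    print(collect_outputs(sample))
```

The start events are ignored here for brevity; a real client would likely use them for progress reporting.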
Sending speech data
After starting a pipeline with stt as the first stage of the run and receiving an stt-start event, speech data can be sent over the WebSocket connection as binary data. Audio should be sent as soon as it is available, with each chunk prefixed with a single byte containing the stt_binary_handler_id.
For example, if stt_binary_handler_id is 1 and the audio chunk is a1b2c3, the message would be (in hex):
01a1b2c3
To indicate the end of sending speech data, send a binary message containing a single byte with the stt_binary_handler_id.
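The framing described above can be sketched in a few lines. The function names and the `chunk_size` parameter are hypothetical, and the sketch only produces the binary payloads; actually sending them over the WebSocket connection is left to whichever client library is in use.

```python
def frame_chunk(handler_id: int, chunk: bytes) -> bytes:
    """Prefix one audio chunk with the single-byte stt_binary_handler_id."""
    if not 0 <= handler_id <= 255:
        raise ValueError("handler_id must fit in a single byte")
    if not chunk:
        # An empty chunk would look exactly like the end-of-speech message.
        raise ValueError("chunk must not be empty")
    return bytes([handler_id]) + chunk


def iter_binary_messages(handler_id: int, audio: bytes, chunk_size: int = 1024):
    """Yield framed audio messages, then the end-of-speech message.

    chunk_size is an arbitrary value chosen for illustration.
    """
    for start in range(0, len(audio), chunk_size):
        yield frame_chunk(handler_id, audio[start:start + chunk_size])
    # A lone handler-id byte signals that no more speech data follows.
    yield bytes([handler_id])


if __name__ == "__main__":
    # Reproduces the example above: handler id 1, chunk a1b2c3.
    print(frame_chunk(1, bytes.fromhex("a1b2c3")).hex())
```

Rejecting empty chunks is a design choice of this sketch: a message consisting of only the handler byte is reserved for marking the end of the speech data.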