Assist Pipelines

The Assist pipeline integration runs the common steps of a voice assistant:

  1. Speech to text
  2. Intent recognition
  3. Text to speech

Pipelines are run via a WebSocket API:

{
  "type": "assist_pipeline/run",
  "start_stage": "stt",
  "end_stage": "tts",
  "input": {
    "sample_rate": 16000
  }
}

The following input fields are available:

| Name | Type | Description |
| --- | --- | --- |
| start_stage | enum | Required. The first stage to run. One of stt, intent, tts. |
| end_stage | enum | Required. The last stage to run. One of stt, intent, tts. |
| input | dict | Depends on start_stage. For STT, the dictionary should contain a key sample_rate with an integer value. For intent and TTS, the key text should contain the input text. |
| pipeline | string | Optional. ID of the pipeline (use assist_pipeline/pipeline/list to get names). |
| conversation_id | string | Optional. Unique id for conversation. |
| timeout | number | Optional. Number of seconds before pipeline times out (default: 30). |
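
As a minimal sketch of how a client might assemble this message, the hypothetical helper below (`build_run_message` is not part of Home Assistant) checks the stage names and the shape of `input` described above, and only includes optional fields when they are set. Any envelope fields the WebSocket layer itself needs, such as a message id, are assumed to be added by the caller.

```python
import json
from typing import Any, Optional

# Stage names as listed for start_stage and end_stage.
STAGES = ("stt", "intent", "tts")


def build_run_message(
    start_stage: str,
    end_stage: str,
    input_data: dict[str, Any],
    pipeline: Optional[str] = None,
    conversation_id: Optional[str] = None,
    timeout: Optional[float] = None,
) -> dict[str, Any]:
    """Build an assist_pipeline/run payload (illustrative helper only)."""
    if start_stage not in STAGES or end_stage not in STAGES:
        raise ValueError(f"stages must be one of {STAGES}")

    # STT input carries the sample rate; intent and TTS input carry text.
    if start_stage == "stt":
        if not isinstance(input_data.get("sample_rate"), int):
            raise ValueError("STT input needs an integer 'sample_rate'")
    elif not isinstance(input_data.get("text"), str):
        raise ValueError("intent/TTS input needs a 'text' string")

    message: dict[str, Any] = {
        "type": "assist_pipeline/run",
        "start_stage": start_stage,
        "end_stage": end_stage,
        "input": input_data,
    }
    # Optional fields are omitted unless the caller provides them.
    if pipeline is not None:
        message["pipeline"] = pipeline
    if conversation_id is not None:
        message["conversation_id"] = conversation_id
    if timeout is not None:
        message["timeout"] = timeout
    return message


print(json.dumps(build_run_message("stt", "tts", {"sample_rate": 16000}), indent=2))
```

Calling `build_run_message("intent", "tts", {"text": "turn on the lights"})` would skip speech to text and start from the given text instead.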

Events

As the pipeline runs, it emits events back over the WebSocket connection. The following events can be emitted:

| Name | Description | Emitted | Attributes |
| --- | --- | --- | --- |
| run-start | Start of pipeline run | always | pipeline - ID of the pipeline; language - Language used for pipeline; runner_data - Extra WebSocket data: stt_binary_handler_id is the prefix to send speech data over, timeout is the max run time for the whole pipeline |
| run-end | End of pipeline run | always | |
| stt-start | Start of speech to text | audio only | engine - STT engine used; metadata - Incoming audio metadata |
| stt-end | End of speech to text | audio only | stt_output - Object with text, the detected text |
| intent-start | Start of intent recognition | always | engine - Agent engine used; language - Processing language; intent_input - Input text to agent |
| intent-end | End of intent recognition | always | intent_output - Conversation response |
| tts-start | Start of text to speech | audio only | engine - TTS engine used; language - Output language; voice - Output voice; tts_input - Text to speak |
| tts-end | End of text to speech | audio only | media_id - Media Source ID of the generated audio; url - URL to the generated audio; mime_type - MIME type of the generated audio |
| error | Error in pipeline | on error | code - Error code; message - Error message |
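
To illustrate how a client might consume these events, here is a rough sketch that folds a sequence of events into a small summary. It assumes each event has already been decoded into a dict with a `type` key (the event name) and a `data` key (the attributes listed above); that envelope shape and the `summarize_run` helper are assumptions for this example, not something this page specifies.

```python
from typing import Any


def summarize_run(events: list[dict[str, Any]]) -> dict[str, Any]:
    """Collect the main results of one pipeline run (illustrative only)."""
    summary: dict[str, Any] = {
        "handler_id": None,  # stt_binary_handler_id from run-start
        "text": None,        # detected text from stt-end
        "error": None,       # "code: message" from an error event
        "finished": False,   # True once run-end has been seen
    }
    for event in events:
        kind = event["type"]
        data = event.get("data") or {}
        if kind == "run-start":
            runner_data = data.get("runner_data", {})
            summary["handler_id"] = runner_data.get("stt_binary_handler_id")
        elif kind == "stt-end":
            summary["text"] = data["stt_output"]["text"]
        elif kind == "error":
            summary["error"] = f"{data['code']}: {data['message']}"
        elif kind == "run-end":
            summary["finished"] = True
            break  # nothing after run-end belongs to this run
    return summary


# Hand-written sample events; the values are made up for illustration.
sample_events = [
    {
        "type": "run-start",
        "data": {
            "pipeline": "example-pipeline-id",
            "language": "en",
            "runner_data": {"stt_binary_handler_id": 1, "timeout": 30},
        },
    },
    {"type": "stt-end", "data": {"stt_output": {"text": "turn on the lights"}}},
    {"type": "run-end", "data": None},
]
print(summarize_run(sample_events))
```

Other events, such as intent-end and tts-end, can be handled with further branches in the same loop.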

Sending speech data

After starting a pipeline with stt as the first stage and receiving an stt-start event, speech data can be sent over the WebSocket connection as binary data. Send audio as soon as it is available, prefixing each chunk with a single byte containing the stt_binary_handler_id.

For example, if stt_binary_handler_id is 1 and the audio chunk is a1b2c3, the message would be (in hex):

01a1b2c3

To indicate the end of sending speech data, send a binary message containing a single byte with the stt_binary_handler_id.
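
The framing described in this section can be sketched in a few lines. The helper names `frame_chunk` and `framed_stream` are hypothetical, and actually sending the resulting bytes over the WebSocket connection is left to whichever client library is in use.

```python
from typing import Iterable, Iterator


def frame_chunk(handler_id: int, chunk: bytes) -> bytes:
    """Prefix one audio chunk with the single-byte stt_binary_handler_id."""
    # bytes([n]) raises ValueError if n does not fit in one byte (0-255).
    return bytes([handler_id]) + chunk


def framed_stream(handler_id: int, chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Yield framed audio messages, then the end-of-speech message."""
    for chunk in chunks:
        # An empty chunk would frame to a lone handler byte, which is
        # exactly the end-of-speech message, so skip empty chunks.
        if chunk:
            yield frame_chunk(handler_id, chunk)
    # A message holding only the handler byte signals the end of speech data.
    yield bytes([handler_id])


# The example from the text: handler ID 1 and audio chunk a1b2c3.
print(frame_chunk(1, bytes.fromhex("a1b2c3")).hex())  # → 01a1b2c3
```

Each value yielded by `framed_stream` would be sent as one binary WebSocket message, in order.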