---
title: Assist Pipelines
---

The Assist pipeline integration runs the common steps of a voice assistant:

- Speech to text
- Intent recognition
- Text to speech

Pipelines are run via a WebSocket API:

{
  "type": "assist_pipeline/run",
  "start_stage": "stt",
  "end_stage": "tts",
  "input": {
    "sample_rate": 16000
  }
}

The following input fields are available:

| Name | Type | Description |
| --- | --- | --- |
| `start_stage` | enum | Required. The first stage to run. One of `stt`, `intent`, `tts`. |
| `end_stage` | enum | Required. The last stage to run. One of `stt`, `intent`, `tts`. |
| `input` | dict | Depends on `start_stage`. For `stt`, the dictionary must contain the key `sample_rate` with an integer value. For `intent` and `tts`, the key `text` must contain the input text. |
| `pipeline` | string | Optional. ID of the pipeline to run (use `assist_pipeline/pipeline/list` to list the available pipelines). |
| `conversation_id` | string | Optional. Unique ID of the conversation. |
| `timeout` | number | Optional. Number of seconds before the pipeline times out (default: 30). |
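A minimal Python sketch of assembling and sanity-checking an `assist_pipeline/run` command from the fields above. The helper name `build_run_message` is hypothetical, and two rules it enforces are assumptions not stated in this table: that every Home Assistant WebSocket command carries an integer `id`, and that `end_stage` may not come before `start_stage` in the order `stt`, `intent`, `tts`.

```python
import json
from typing import Any, Optional

# Assumed stage order, taken from the pipeline description: stt, then intent, then tts.
STAGES = ("stt", "intent", "tts")


def build_run_message(
    msg_id: int,
    start_stage: str,
    end_stage: str,
    pipeline_input: dict[str, Any],
    pipeline: Optional[str] = None,
    conversation_id: Optional[str] = None,
    timeout: Optional[float] = None,
) -> dict[str, Any]:
    """Build an assist_pipeline/run command (hypothetical helper)."""
    if start_stage not in STAGES or end_stage not in STAGES:
        raise ValueError(f"stages must be one of {STAGES}")
    # Assumption: end_stage may not come before start_stage.
    if STAGES.index(end_stage) < STAGES.index(start_stage):
        raise ValueError("end_stage must not come before start_stage")

    # The required input key depends on the first stage.
    if start_stage == "stt":
        if not isinstance(pipeline_input.get("sample_rate"), int):
            raise ValueError("stt input needs an integer 'sample_rate'")
    elif not isinstance(pipeline_input.get("text"), str):
        raise ValueError(f"{start_stage} input needs a string 'text'")

    msg: dict[str, Any] = {
        "id": msg_id,  # assumption: every WebSocket command carries an integer id
        "type": "assist_pipeline/run",
        "start_stage": start_stage,
        "end_stage": end_stage,
        "input": pipeline_input,
    }
    # Optional fields are only sent when they are set.
    if pipeline is not None:
        msg["pipeline"] = pipeline
    if conversation_id is not None:
        msg["conversation_id"] = conversation_id
    if timeout is not None:
        msg["timeout"] = timeout
    return msg


# Equivalent to the example request above, plus the assumed id.
print(json.dumps(build_run_message(1, "stt", "tts", {"sample_rate": 16000})))
```

Leaving unset optional fields out of the message, rather than sending them as `null`, lets the server apply its own defaults (for example the 30-second `timeout`).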

Events

As the pipeline runs, it emits events back over the WebSocket connection. The following events can be emitted:

| Name | Description | Emitted | Attributes |
| --- | --- | --- | --- |
| `run-start` | Start of pipeline run | Always | `pipeline` - ID of the pipeline; `language` - Language used for the pipeline; `runner_data` - Extra WebSocket data: `stt_binary_handler_id` is the prefix byte used when sending speech data, and `timeout` is the maximum run time for the whole pipeline |
| `run-end` | End of pipeline run | Always | |
| `stt-start` | Start of speech to text | Audio only | `engine` - STT engine used; `metadata` - Incoming audio metadata |
| `stt-end` | End of speech to text | Audio only | `stt_output` - Object with `text`, the detected text |
| `intent-start` | Start of intent recognition | Always | `engine` - Agent engine used; `language` - Processing language; `intent_input` - Input text to the agent |
| `intent-end` | End of intent recognition | Always | `intent_output` - Conversation response |
| `tts-start` | Start of text to speech | Audio only | `engine` - TTS engine used; `language` - Output language; `voice` - Output voice; `tts_input` - Text to speak |
| `tts-end` | End of text to speech | Audio only | `media_id` - Media Source ID of the generated audio; `url` - URL of the generated audio; `mime_type` - MIME type of the generated audio |
| `error` | Error in pipeline | On error | `code` - Error code; `message` - Error message |
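The Python sketch below (hypothetical helper `summarize_events`) shows one way a client might fold the events in the table into a single result. It assumes each event arrives as a dictionary with a `type` key and a `data` key holding the attributes listed above, nested exactly as the table shows; check the actual message envelope sent by your Home Assistant version before relying on it.

```python
from typing import Any


def summarize_events(events: list[dict[str, Any]]) -> dict[str, Any]:
    """Collect the useful fields from a sequence of pipeline events.

    Assumption: each event looks like {"type": "...", "data": {...}}.
    """
    summary: dict[str, Any] = {
        "handler_id": None,  # run-start: runner_data.stt_binary_handler_id
        "stt_text": None,    # stt-end: stt_output.text
        "tts_url": None,     # tts-end: url
        "error": None,       # error: "code: message"
        "finished": False,   # set once run-end is seen
    }
    for event in events:
        etype = event.get("type")
        data = event.get("data") or {}  # tolerate events without attributes
        if etype == "run-start":
            runner_data = data.get("runner_data") or {}
            summary["handler_id"] = runner_data.get("stt_binary_handler_id")
        elif etype == "stt-end":
            summary["stt_text"] = data["stt_output"]["text"]
        elif etype == "tts-end":
            summary["tts_url"] = data["url"]
        elif etype == "error":
            summary["error"] = f"{data.get('code')}: {data.get('message')}"
        elif etype == "run-end":
            summary["finished"] = True
    return summary
```

Events that carry nothing the client needs (such as `stt-start` or `intent-start`) simply fall through, so the same loop works whichever `start_stage` and `end_stage` were requested.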

Sending speech data

After starting a pipeline with `stt` as the first stage and receiving the `stt-start` event, send speech data over the WebSocket connection as binary messages. Send audio as soon as it is available, prefixing each chunk with a single byte containing the `stt_binary_handler_id` (from the `runner_data` of the `run-start` event).

For example, if `stt_binary_handler_id` is `1` and the audio chunk is `a1b2c3`, the message would be (in hex):

01a1b2c3
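As a minimal Python sketch of that framing (the function name `frame_audio_chunk` is hypothetical), the prefix is one byte prepended to the raw audio chunk:

```python
def frame_audio_chunk(handler_id: int, chunk: bytes) -> bytes:
    """Prefix an audio chunk with the one-byte stt_binary_handler_id."""
    if not 0 <= handler_id <= 255:
        raise ValueError("stt_binary_handler_id must fit in a single byte")
    return bytes([handler_id]) + chunk


# Reproduces the hex example above.
print(frame_audio_chunk(1, bytes.fromhex("a1b2c3")).hex())  # 01a1b2c3
```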

To indicate the end of the speech data, send a binary message containing only the single `stt_binary_handler_id` byte.
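Putting the framing and the end marker together, a client could generate the full sequence of binary messages for one utterance as sketched below. `speech_messages` is a hypothetical name, sending each message over the WebSocket is left to whichever client library you use, and skipping empty chunks is a design choice of this example rather than documented API behavior.

```python
from typing import Iterable, Iterator


def speech_messages(handler_id: int, chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Yield the binary WebSocket messages for one utterance.

    Each audio chunk is prefixed with the handler ID byte; a final message
    containing only that byte marks the end of the speech data.
    """
    prefix = bytes([handler_id])  # bytes() rejects values outside 0..255
    for chunk in chunks:
        # Design choice: skip empty chunks, since a framed empty chunk would
        # be a lone prefix byte, indistinguishable from the end marker.
        if chunk:
            yield prefix + chunk
    yield prefix  # end-of-speech marker


print([m.hex() for m in speech_messages(1, [b"\xa1\xb2", b"", b"\xc3"])])
# ['01a1b2', '01c3', '01']
```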
