
Conversation API

Intents can be recognized from text and fired using the conversation integration.

An API endpoint is available which receives an input sentence and produces a conversation response. A "conversation" is tracked across multiple inputs and responses by passing a conversation id generated by Home Assistant.

The API is available through both the REST API and the WebSocket API.

A sentence can be sent with a POST request to /api/conversation/process:

{
  "text": "turn on the lights in the living room",
  "language": "en"
}
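The following is a minimal Python sketch of that POST request, using only the standard library. The base URL, the token value, and the helper names `build_process_request` and `send` are hypothetical. The sketch assumes the usual Home Assistant REST API authentication, which is a long-lived access token passed in an `Authorization: Bearer` header. Building the request is kept separate from sending it, so the request can be inspected first.

```python
import json
import urllib.request


def build_process_request(base_url: str, token: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST to /api/conversation/process."""
    return urllib.request.Request(
        url=f"{base_url}/api/conversation/process",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            # Assumed: a long-lived access token created in the user profile.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON conversation response."""
    with urllib.request.urlopen(request, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `send(build_process_request("http://homeassistant.local:8123", token, {"text": "turn on the lights in the living room", "language": "en"}))` would return the decoded response object. The host name in that call is only a placeholder.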

Or it can be sent over the WebSocket API:

{
  "type": "conversation/process",
  "text": "turn on the lights in the living room",
  "language": "en"
}
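The WebSocket message is the same input plus a `type` field. The sketch below assumes the standard Home Assistant WebSocket handshake: the client answers the server's `auth_required` message with an `auth` message, and every later command carries an increasing integer `id`. The class name `CommandBuilder` is hypothetical. Opening the socket at `/api/websocket` is left to whichever WebSocket client library you use.

```python
import itertools
import json
from typing import Optional


class CommandBuilder:
    """Builds JSON text frames for Home Assistant's WebSocket API (sending them is up to your client)."""

    def __init__(self) -> None:
        # Every command sent after authentication carries an integer id that must increase.
        self._ids = itertools.count(1)

    @staticmethod
    def auth(access_token: str) -> str:
        # Reply to the server's initial "auth_required" message.
        return json.dumps({"type": "auth", "access_token": access_token})

    def process(self, text: str, language: Optional[str] = None) -> str:
        message = {"id": next(self._ids), "type": "conversation/process", "text": text}
        if language is not None:
            message["language"] = language
        return json.dumps(message)
```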

The following input fields are available:

Name	Type	Description
`text`	string	Input sentence.
`language`	string	Optional. Language of the input sentence (defaults to configured language).
`conversation_id`	string	Optional. Unique id used to track the conversation. Generated by Home Assistant.
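Because only `text` is required, a client can leave the optional fields out entirely rather than sending `null`. The helper below is a hypothetical sketch of that approach. Rejecting empty text is a client-side choice and is not something this API is documented to require.

```python
from typing import Optional


def build_conversation_input(
    text: str,
    language: Optional[str] = None,
    conversation_id: Optional[str] = None,
) -> dict:
    """Return the input object, leaving out optional fields that were not given."""
    if not text:
        raise ValueError("text is required")
    payload = {"text": text}
    # When language is omitted, Home Assistant uses the configured language.
    if language is not None:
        payload["language"] = language
    # Only pass an id that was previously returned by Home Assistant.
    if conversation_id is not None:
        payload["conversation_id"] = conversation_id
    return payload
```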

Conversation response

The JSON response from /api/conversation/process contains information about the effect of the fired intent, for example:

{
  "response": {
    "response_type": "action_done",
    "language": "en",
    "data": {
      "targets": [
        {
          "type": "area",
          "name": "Living Room",
          "id": "living_room"
        },
        {
          "type": "domain",
          "name": "light",
          "id": "light"
        }
      ],
      "success": [
        {
          "type": "entity",
          "name": "My Light",
          "id": "light.my_light"
        }
      ],
      "failed": []
    },
    "speech": {
      "plain": {
        "speech": "Turned Living Room lights on"
      }
    }
  },
  "conversation_id": "<generated-id-from-ha>"
}

The following properties are available in the "response" object:

Name	Type	Description
`response_type`	string	One of `action_done`, `query_answer`, or `error` (see response types).
`data`	dictionary	Relevant data for each response type.
`language`	string	The language of the intent and response.
`speech`	dictionary	Optional. Response text to speak to the user (see speech).

The conversation id is returned alongside the conversation response.
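A client can read these properties into a small typed structure. The following is a sketch with hypothetical names. Raising on an unknown `response_type` is a defensive choice made here, not documented behavior. Note that `conversation_id` is read from the top level of the body, beside `response`, and may be absent, as in the error example later on this page.

```python
import json
from dataclasses import dataclass, field
from typing import Optional

RESPONSE_TYPES = {"action_done", "query_answer", "error"}


@dataclass
class ConversationResult:
    response_type: str
    language: Optional[str] = None
    data: dict = field(default_factory=dict)
    speech: Optional[dict] = None
    conversation_id: Optional[str] = None


def parse_result(raw: str) -> ConversationResult:
    """Parse the JSON body returned by /api/conversation/process."""
    body = json.loads(raw)
    response = body["response"]
    response_type = response["response_type"]
    if response_type not in RESPONSE_TYPES:
        raise ValueError(f"unexpected response_type: {response_type!r}")
    return ConversationResult(
        response_type=response_type,
        language=response.get("language"),
        data=response.get("data", {}),
        speech=response.get("speech"),  # optional property
        # The conversation id sits next to "response", not inside it.
        conversation_id=body.get("conversation_id"),
    )
```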

Response types

Action done

The intent produced an action in Home Assistant, such as turning on a light. The data property of the response contains a targets list, where each target looks like:

Name	Type	Description
`type`	string	Target type. One of `area`, `domain`, `device_class`, `device`, `entity`, or `custom`.
`name`	string	Name of the affected target.
`id`	string	Optional. Id of the target.

Two additional target lists, success and failed, contain the devices or entities for which the action succeeded or failed:

{
  "response": {
    "response_type": "action_done",
    "data": {
      "targets": [
        (area or domain)
      ],
      "success": [
        (entities/devices that succeeded)
      ],
      "failed": [
        (entities/devices that failed)
      ]
    }
  }
}
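To report the outcome of an action, a client might collect the ids from the success and failed lists. This hypothetical helper takes the inner "response" object. Because a target's `id` is optional, it falls back to the target's `name`.

```python
from typing import Dict, List


def summarize_action(response: dict) -> Dict[str, List[str]]:
    """Collect target ids from the success and failed lists of an action_done response."""
    if response.get("response_type") != "action_done":
        raise ValueError("not an action_done response")
    data = response.get("data", {})
    return {
        # "id" is optional on a target, so fall back to its name.
        key: [t.get("id", t["name"]) for t in data.get(key, [])]
        for key in ("success", "failed")
    }
```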

An intent can have multiple targets which are applied on top of each other. The targets must be ordered from general to specific:

area
- A registered area
domain
- Home Assistant integration domain, such as "light"
device_class
- Device class for a domain, such as "garage_door" for the "cover" domain
device
- A registered device
entity
- A Home Assistant entity
custom
- A custom target

Most intents end up with 0, 1, or 2 targets; 3 targets currently occur only when device classes are involved. Examples of target combinations:

"Turn off all lights"
- 1 target: domain:light
"Turn on the kitchen lights"
- 2 targets: area:kitchen, domain:light
"Open the kitchen blinds"
- 3 targets: area:kitchen, domain:cover, device_class:blind
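The general-to-specific rule can be checked mechanically. This sketch assumes two things: that each type's rank is its position in the list above, with `custom` last, and that two targets of the same type may appear next to each other. The documentation does not state either of these. The function name is hypothetical.

```python
# Target types from most general to most specific, in the order listed above.
TARGET_ORDER = ["area", "domain", "device_class", "device", "entity", "custom"]


def is_general_to_specific(targets: list) -> bool:
    """Return True if no target is more general than the one before it.

    Raises ValueError for a target type that is not in TARGET_ORDER.
    """
    ranks = [TARGET_ORDER.index(t["type"]) for t in targets]
    return all(a <= b for a, b in zip(ranks, ranks[1:]))
```

With this check, the "Open the kitchen blinds" combination (area, domain, device_class) passes, while a domain listed before an area would not.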

Query answer

The response is an answer to a question, such as "what is the temperature?". See the speech property for the answer text.

{
  "response": {
    "response_type": "query_answer",
    "language": "en",
    "speech": {
      "plain": {
        "speech": "It is 65 degrees"
      }
    },
    "data": {
      "targets": [
        {
          "type": "domain",
          "name": "climate",
          "id": "climate"
        }
      ],
      "success": [
        {
          "type": "entity",
          "name": "Ecobee",
          "id": "climate.ecobee"
        }
      ],
      "failed": []
    }
  },
  "conversation_id": "<generated-id-from-ha>"
}

Error

An error occurred either during intent recognition or handling. See data.code for the specific type of error, and the speech property for the error message.

{
  "response": {
    "response_type": "error",
    "language": "en",
    "data": {
      "code": "no_intent_match"
    },
    "speech": {
      "plain": {
        "speech": "Sorry, I didn't understand that"
      }
    }
  }
}

data.code is a string that can be one of:

no_intent_match - The input text did not match any intents.
no_valid_targets - The targeted area, device, or entity does not exist.
failed_to_handle - An unexpected error occurred while handling the intent.
unknown - An error occurred outside the scope of intent processing.
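A client can map these codes to an enum. In the following sketch, the class and function names are hypothetical. Treating an unrecognized or missing code as `unknown` is a client-side assumption that guards against codes added later.

```python
from enum import Enum


class ConversationErrorCode(str, Enum):
    NO_INTENT_MATCH = "no_intent_match"
    NO_VALID_TARGETS = "no_valid_targets"
    FAILED_TO_HANDLE = "failed_to_handle"
    UNKNOWN = "unknown"


def error_code(response: dict) -> ConversationErrorCode:
    """Read data.code from an error response object."""
    code = response.get("data", {}).get("code", "unknown")
    try:
        return ConversationErrorCode(code)
    except ValueError:
        # Codes this client does not recognize are treated as "unknown".
        return ConversationErrorCode.UNKNOWN
```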

Speech

The spoken response to the user is provided in the speech property of the response. It can be either plain text (the default) or SSML.

For plain text speech, the response will look like:

{
  "response": {
    "response_type": "...",
    "speech": {
      "plain": {
        "speech": "...",
        "extra_data": null
      }
    }
  },
  "conversation_id": "<generated-id-from-ha>"
}

If the speech is SSML, it will instead be:

{
  "response": {
    "response_type": "...",
    "speech": {
      "ssml": {
        "speech": "...",
        "extra_data": null
      }
    }
  },
  "conversation_id": "<generated-id-from-ha>"
}
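A client can check which key is present to tell the two speech formats apart. This hypothetical helper takes the inner "response" object and returns the format together with the text. It assumes only one of `plain` or `ssml` is set, and it checks `plain` first. If the optional speech property is missing, it returns `None`.

```python
from typing import Optional, Tuple


def get_speech(response: dict) -> Optional[Tuple[str, str]]:
    """Return (format, text) where format is "plain" or "ssml", or None if there is no speech."""
    speech = response.get("speech") or {}
    for fmt in ("plain", "ssml"):
        if fmt in speech:
            return fmt, speech[fmt]["speech"]
    return None
```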

Conversation Id

If the answering conversation agent supports it, a conversation can be tracked by a unique id generated within Home Assistant. To continue a conversation, read the conversation_id from the HTTP API response (it is returned alongside the conversation response) and include it with the next input sentence:

Initial input sentence:

{
  "text": "Initial input sentence."
}

The JSON response contains the conversation id:

{
  "conversation_id": "<generated-id-from-ha>",
  "response": {
    (conversation response)
  }
}

POST the next input sentence together with that conversation id:

{
  "text": "Related input sentence.",
  "conversation_id": "<generated-id-from-ha>"
}
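The round trip above can be wrapped in a small session object. The following is a sketch with hypothetical names. To keep it testable without a running server, it accepts any "transport" callable. In practice, that callable would POST the payload to /api/conversation/process and return the decoded JSON. If an agent returns no id, the session resets to `None`, so the next sentence starts a fresh conversation. That behavior is a design choice made here.

```python
from typing import Callable, Optional

# A transport takes an input payload and returns the decoded JSON response.
Transport = Callable[[dict], dict]


class ConversationSession:
    """Carries conversation_id from each response into the next input sentence."""

    def __init__(self, transport: Transport, language: Optional[str] = None) -> None:
        self._transport = transport
        self._language = language
        self.conversation_id: Optional[str] = None

    def ask(self, text: str) -> dict:
        payload = {"text": text}
        if self._language is not None:
            payload["language"] = self._language
        if self.conversation_id is not None:
            payload["conversation_id"] = self.conversation_id
        result = self._transport(payload)
        # None if the agent returned no id, so the next input starts fresh.
        self.conversation_id = result.get("conversation_id")
        return result
```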

Pre-loading sentences

Sentences for a language can be pre-loaded using the WebSocket API:

{
  "type": "conversation/prepare",
  "language": "en"
}