Conversation API

Intents can be recognized from text and fired using the conversation integration.

An API endpoint is available that receives an input sentence and produces a conversation response. A "conversation" is tracked across multiple inputs and responses by passing a conversation id generated by Home Assistant.

The API is available via the REST API and the WebSocket API.

A sentence may be POSTed to /api/conversation/process like:

{
  "text": "turn on the lights in the living room",
  "language": "en"
}

Or sent via the WebSocket API like:

{
  "type": "conversation/process",
  "text": "turn on the lights in the living room",
  "language": "en"
}

The following input fields are available:

| Name | Type | Description |
|------|------|-------------|
| text | string | Input sentence. |
| language | string | Optional. Language of the input sentence (defaults to configured language). |
| conversation_id | string | Optional. Unique id to track conversation. Generated by Home Assistant. |
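
As a minimal sketch of how the input fields above might be assembled on the client side, the following Python builds the JSON body and prepares (without sending) a REST request. The helper names, the base URL, and the token value are hypothetical, and the bearer-token `Authorization` header is an assumption about how the Home Assistant REST API is authenticated; it is not described in this section.

```python
import json
import urllib.request
from typing import Optional


def build_process_payload(
    text: str,
    language: Optional[str] = None,
    conversation_id: Optional[str] = None,
) -> dict:
    """Build the input fields, leaving out optional fields that are not set."""
    if not text:
        raise ValueError("text is required")
    payload = {"text": text}
    if language is not None:
        payload["language"] = language
    if conversation_id is not None:
        payload["conversation_id"] = conversation_id
    return payload


def build_rest_request(base_url: str, token: str, payload: dict) -> urllib.request.Request:
    """Prepare, but do not send, a POST to /api/conversation/process."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/conversation/process",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # Assumption: authentication uses a bearer token.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical host and token, for illustration only.
payload = build_process_payload("turn on the lights in the living room", language="en")
request = build_rest_request("http://homeassistant.local:8123", "YOUR_TOKEN", payload)
```

Passing `request` to `urllib.request.urlopen` would send it. Omitting `language` and `conversation_id` when they are not set lets the documented defaults apply.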

Conversation response

The JSON response from /api/conversation/process contains information about the effect of the fired intent, for example:

{
  "response": {
    "response_type": "action_done",
    "language": "en",
    "data": {
      "targets": [
        {
          "type": "area",
          "name": "Living Room",
          "id": "living_room"
        },
        {
          "type": "domain",
          "name": "light",
          "id": "light"
        }
      ],
      "success": [
        {
          "type": "entity",
          "name": "My Light",
          "id": "light.my_light"
        }
      ],
      "failed": []
    },
    "speech": {
      "plain": {
        "speech": "Turned Living Room lights on"
      }
    }
  },
  "conversation_id": "<generated-id-from-ha>"
}

The following properties are available in the "response" object:

| Name | Type | Description |
|------|------|-------------|
| response_type | string | One of action_done, query_answer, or error (see response types). |
| data | dictionary | Relevant data for each response type. |
| language | string | The language of the intent and response. |
| speech | dictionary | Optional. Response text to speak to the user (see speech). |

The conversation id is returned alongside the conversation response.
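
One way a client might read these properties is sketched below. The `ConversationResult` class and `parse_result` function are hypothetical names introduced for illustration. Treating `language`, `data`, and `speech` as possibly absent is a defensive assumption, since only `speech` is documented as optional.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConversationResult:
    response_type: str
    language: Optional[str]
    data: dict
    speech: dict
    conversation_id: Optional[str]


def parse_result(body: str) -> ConversationResult:
    """Split a JSON response body into the "response" properties and the conversation id."""
    doc = json.loads(body)
    response = doc["response"]
    return ConversationResult(
        response_type=response["response_type"],
        language=response.get("language"),
        data=response.get("data", {}),
        speech=response.get("speech", {}),
        # The conversation id sits next to "response", not inside it.
        conversation_id=doc.get("conversation_id"),
    )


sample = """
{
  "response": {
    "response_type": "action_done",
    "language": "en",
    "data": {"targets": [], "success": [], "failed": []},
    "speech": {"plain": {"speech": "Turned Living Room lights on"}}
  },
  "conversation_id": "<generated-id-from-ha>"
}
"""
result = parse_result(sample)
```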

Response types

Action done

The intent produced an action in Home Assistant, such as turning on a light. The data property of the response contains a targets list, where each target looks like:

| Name | Type | Description |
|------|------|-------------|
| type | string | Target type. One of area, domain, device_class, device, entity, or custom. |
| name | string | Name of the affected target. |
| id | string | Optional. Id of the target. |

Two additional target lists are included, containing the devices or entities that succeeded or failed:

{
  "response": {
    "response_type": "action_done",
    "data": {
      "targets": [
        (area or domain)
      ],
      "success": [
        (entities/devices that succeeded)
      ],
      "failed": [
        (entities/devices that failed)
      ]
    }
  }
}
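
The three lists can be condensed into a short report. The sketch below assumes the `data` dictionary has the shape shown above; `summarize_action` is a hypothetical helper, not part of Home Assistant.

```python
def summarize_action(data: dict) -> dict:
    """Collect target names from the targets, success, and failed lists."""

    def names(key: str) -> list:
        # A missing list is treated as empty.
        return [target["name"] for target in data.get(key, [])]

    failed = names("failed")
    return {
        "targets": names("targets"),
        "succeeded": names("success"),
        "failed": failed,
        "all_succeeded": not failed,
    }


data = {
    "targets": [
        {"type": "area", "name": "Living Room", "id": "living_room"},
        {"type": "domain", "name": "light", "id": "light"},
    ],
    "success": [{"type": "entity", "name": "My Light", "id": "light.my_light"}],
    "failed": [],
}
summary = summarize_action(data)
```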

An intent can have multiple targets, which are applied on top of each other. The targets must be ordered from general to specific, for example an area before a domain, and a domain before a device class.

Most intents end up with 0, 1, or 2 targets. Three targets currently occur only when device classes are involved. Examples of target combinations:

  • "Turn off all lights"
    • 1 target: domain:light
  • "Turn on the kitchen lights"
    • 2 targets: area:kitchen, domain:light
  • "Open the kitchen blinds"
    • 3 targets: area:kitchen, domain:cover, device_class:blind
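
The general-to-specific ordering can be checked mechanically. In the sketch below, the relative rank of area, domain, and device_class follows the examples above. Placing device and entity after them is an assumption, and custom targets are skipped because no position is given for them. `TARGET_RANK` and `is_general_to_specific` are hypothetical names.

```python
# Lower rank means more general. The positions of "device" and "entity" are assumed.
TARGET_RANK = {"area": 0, "domain": 1, "device_class": 2, "device": 3, "entity": 4}


def is_general_to_specific(targets: list) -> bool:
    """Return True if the ranked targets never move from specific back to general."""
    ranks = [TARGET_RANK[t["type"]] for t in targets if t["type"] in TARGET_RANK]
    return all(a <= b for a, b in zip(ranks, ranks[1:]))


kitchen_blinds = [
    {"type": "area", "name": "kitchen"},
    {"type": "domain", "name": "cover"},
    {"type": "device_class", "name": "blind"},
]
ordered = is_general_to_specific(kitchen_blinds)
reversed_order = is_general_to_specific(list(reversed(kitchen_blinds)))
```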

Query answer

The response is an answer to a question, such as "what is the temperature?". See the speech property for the answer text.

{
  "response": {
    "response_type": "query_answer",
    "language": "en",
    "speech": {
      "plain": {
        "speech": "It is 65 degrees"
      }
    },
    "data": {
      "targets": [
        {
          "type": "domain",
          "name": "climate",
          "id": "climate"
        }
      ],
      "success": [
        {
          "type": "entity",
          "name": "Ecobee",
          "id": "climate.ecobee"
        }
      ],
      "failed": []
    }
  },
  "conversation_id": "<generated-id-from-ha>"
}

Error

An error occurred either during intent recognition or handling. See data.code for the specific type of error, and the speech property for the error message.

{
  "response": {
    "response_type": "error",
    "language": "en",
    "data": {
      "code": "no_intent_match"
    },
    "speech": {
      "plain": {
        "speech": "Sorry, I didn't understand that"
      }
    }
  }
}

data.code is a string that can be one of:

  • no_intent_match - The input text did not match any intents.
  • no_valid_targets - The targeted area, device, or entity does not exist.
  • failed_to_handle - An unexpected error occurred while handling the intent.
  • unknown - An error occurred outside the scope of intent processing.
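
A client might branch on `data.code` roughly as follows. The lookup table repeats the descriptions listed above. `describe_error` is a hypothetical helper, and falling back to `unknown` when no code is present is a design choice of this sketch rather than documented behavior.

```python
ERROR_CODES = {
    "no_intent_match": "The input text did not match any intents.",
    "no_valid_targets": "The targeted area, device, or entity does not exist.",
    "failed_to_handle": "An unexpected error occurred while handling the intent.",
    "unknown": "An error occurred outside the scope of intent processing.",
}


def describe_error(response: dict) -> str:
    """Return a description for the error code in a response of type "error"."""
    if response.get("response_type") != "error":
        raise ValueError("not an error response")
    # Sketch-level choice: a missing code is treated like "unknown".
    code = response.get("data", {}).get("code", "unknown")
    return ERROR_CODES.get(code, f"Unrecognized error code: {code}")


response = {
    "response_type": "error",
    "language": "en",
    "data": {"code": "no_intent_match"},
    "speech": {"plain": {"speech": "Sorry, I didn't understand that"}},
}
description = describe_error(response)
```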

Speech

The spoken response to the user is provided in the speech property of the response. It can be either plain text (the default) or SSML.

For plain text speech, the response will look like:

{
  "response": {
    "response_type": "...",
    "speech": {
      "plain": {
        "speech": "...",
        "extra_data": null
      }
    }
  },
  "conversation_id": "<generated-id-from-ha>"
}

If the speech is SSML, it will instead be:

{
  "response": {
    "response_type": "...",
    "speech": {
      "ssml": {
        "speech": "...",
        "extra_data": null
      }
    }
  },
  "conversation_id": "<generated-id-from-ha>"
}
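
Because the text sits under either a plain or an ssml key, a client has to check both. The following sketch uses a hypothetical helper, `extract_speech`, and prefers plain text when both keys are present; that preference is an assumption of the sketch.

```python
from typing import Optional, Tuple


def extract_speech(response: dict) -> Optional[Tuple[str, Optional[str]]]:
    """Return (kind, text) where kind is "plain" or "ssml", or None if there is no speech."""
    speech = response.get("speech") or {}
    for kind in ("plain", "ssml"):
        if kind in speech:
            return kind, speech[kind].get("speech")
    return None


plain = extract_speech(
    {"response_type": "query_answer", "speech": {"plain": {"speech": "It is 65 degrees"}}}
)
ssml = extract_speech(
    {"response_type": "query_answer", "speech": {"ssml": {"speech": "<speak>It is 65 degrees</speak>"}}}
)
```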

Conversation Id

Conversations can be tracked by a unique id generated from within Home Assistant if supported by the answering conversation agent. To continue a conversation, retrieve the conversation_id from the HTTP API response (alongside the conversation response) and add it to the next input sentence:

Initial input sentence:

{
  "text": "Initial input sentence."
}

JSON response contains conversation id:

{
  "conversation_id": "<generated-id-from-ha>",
  "response": {
    (conversation response)
  }
}

POST with the next input sentence:

{
  "text": "Related input sentence.",
  "conversation_id": "<generated-id-from-ha>"
}
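
The exchange above can be wrapped in a small client-side object that remembers the id between turns. In this sketch, `Conversation` and `fake_send` are hypothetical: `send` stands in for whatever function posts the payload and returns the decoded JSON response, and the fake version returns a canned reply so the example runs without a Home Assistant instance.

```python
from typing import Callable, Optional


class Conversation:
    """Carry the conversation id from each response into the next request."""

    def __init__(self, send: Callable[[dict], dict]) -> None:
        self._send = send
        self.conversation_id: Optional[str] = None

    def say(self, text: str) -> dict:
        payload = {"text": text}
        if self.conversation_id is not None:
            payload["conversation_id"] = self.conversation_id
        result = self._send(payload)
        # Keep the previous id if the agent does not return one.
        self.conversation_id = result.get("conversation_id") or self.conversation_id
        return result["response"]


sent = []


def fake_send(payload: dict) -> dict:
    """Stand-in transport: record the payload and return a canned reply."""
    sent.append(payload)
    return {"conversation_id": "abc123", "response": {"response_type": "action_done"}}


conversation = Conversation(fake_send)
conversation.say("Initial input sentence.")
conversation.say("Related input sentence.")
```

After the two calls, `sent` holds the first payload without an id and the second payload with the id returned by the first reply.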

Pre-loading sentences

Sentences for a language can be pre-loaded using the WebSocket API:

{
  "type": "conversation/prepare",
  "language": "en"
}

The following input fields are available:

| Name | Type | Description |
|------|------|-------------|
| language | string | Optional. Language of the sentences to load (defaults to configured language). |