# MSC0000: Encrypted event indexing
While Matrix rooms are a great place to store data, it is very hard to retrieve this
data in a structured way.
In encrypted rooms the only approach is to download all `m.room.encrypted` events
and parse them locally, so none of the optimizations an indexed server-side database
provides are available.
It would be phenomenal, and would enable new ways to use Matrix, if one could do
the following in an encrypted room:
- Get all events of a specific type.
- Fetch all events that are tagged with a specific use case.
- Get all events that fulfill simple conditions, e.g. `event.value` lies in an interval:
  - get all events describing users in a specific age interval,
  - get all videos within a specific length interval,
  - get all images within a specific interval of x and y geographic coordinates,
  - get all calendar events with specific participants,
  - get all calendar events within a specific date range,
  - ...
The idea proposed here leaks only a limited amount of metadata while still allowing
the homeserver to select the matching events and send them to the client.
## Proposal
To achieve this, clients can add indexed fields to their events.
These fields are readable by the homeserver because they are stored
outside the encrypted payload (like relations). Each field maps a UUID
to a floating-point value:
```json
"index": {
  "001234567890UUID0987654321": 16.0,
  "101234567890UUID0987654321": 812324567653452.1,
  ...
}
```
The goal is for the index to be useless on its own: the UUIDs have no immediate
pattern or structure. Instead, the meaning of each entry in an event's `index`
section is defined by separate encrypted `m.index-descriptor` events:
```json
{
  "type": "m.index-descriptor",
  "target_value": ["content", "path_to", "target_property"],
  "index": "1234index-uuid4321",
  "data_type": "enum",
  "enum_table": {
    "value1": [100, 200],
    "value2": [200, 2000],
    ...
  },
  "transform": {
    "function": "sin(x)",
    "offset": [103.2, 100]
  }
}
```

`data_type` is one of `date`, `integer`, `float`, `enum` or `custom`.
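Before a client can compute an index value, it has to locate the property named by `target_value` in the decrypted event. A minimal sketch in Python, assuming `target_value` is a list of keys walked from the top level of the decrypted event; the helper name `resolve_target_value` and the sample event type are hypothetical:

```python
from typing import Any


def resolve_target_value(event: dict, path: list[str]) -> Any:
    """Walk a descriptor's `target_value` key path into a decrypted event.

    Hypothetical helper. Returns None when the event does not contain the
    property, in which case the client would not add this index to the event.
    """
    node: Any = event
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


# Hypothetical decrypted event carrying the property from the example descriptor.
event = {"type": "org.example.item", "content": {"path_to": {"target_property": 42}}}
print(resolve_target_value(event, ["content", "path_to", "target_property"]))  # 42
```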
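The client uses `transform.function` (written $f$ below) and `transform.offset`
($o_0$, $o_1$) to map each plaintext value $x$ to its index value $v$
(the integration constant is absorbed into $o_1$):

$$
v = \int_{0}^{x + o_0} \lvert f(s) \rvert \, ds + o_1
$$

Because $\lvert f \rvert \ge 0$, this mapping is monotonically non-decreasing in $x$
(strictly increasing if $f$ does not vanish on any interval), so a range of plaintext
values corresponds to a range of index values.
For each event the client computes $v$ and adds it to the event outside the
encrypted content.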
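For the example descriptor's `sin(x)` the integral has a closed form: every half-period of $\lvert \sin \rvert$ contributes exactly 2, and a partial half-period of length $r$ contributes $1 - \cos r$. The sketch below assumes the integral is taken from 0 (any other lower bound only shifts the result by a constant) and that `offset[0]` and `offset[1]` are applied as described in the text; `integral_abs_sin` and `index_value` are hypothetical helper names:

```python
import math


def integral_abs_sin(t: float) -> float:
    """F(t) = integral of |sin s| ds from 0 to t, for any real t."""
    if t < 0:
        # |sin| is even, so F is odd: F(-t) = -F(t).
        return -integral_abs_sin(-t)
    n = math.floor(t / math.pi)  # number of complete half-periods
    r = t - n * math.pi          # remainder within the current half-period
    return 2.0 * n + (1.0 - math.cos(r))


def index_value(x: float, offset: tuple[float, float]) -> float:
    """Hypothetical helper: index value for plaintext x when f = sin."""
    o0, o1 = offset
    return integral_abs_sin(x + o0) + o1


# Offsets taken from the example descriptor: [103.2, 100].
ages = [18, 30, 65]
print([round(index_value(a, (103.2, 100.0)), 3) for a in ages])
```

Since $\sin$ only has isolated zeros, this particular transform is strictly increasing, so distinct plaintext values never collide in the index.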
To fetch events in a specific range, the client applies the transform to both bounds
of the range and asks the homeserver for all events whose index value lies between
the transformed bounds.
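A sketch of the client side of a range query in Python. The transform is passed in as any monotonically non-decreasing callable; the helper name and the linear stand-in transform in the usage line are hypothetical, not part of the proposal:

```python
from typing import Callable


def build_range_request(index_id: str, lo: float, hi: float,
                        transform: Callable[[float], float],
                        page_size: int = 50) -> dict:
    """Hypothetical helper: map a plaintext interval [lo, hi] to a request body."""
    # A non-decreasing transform preserves order, so every plaintext value in
    # [lo, hi] has an index value between the two transformed bounds.
    return {
        "index": index_id,
        "from": transform(lo),
        "to": transform(hi),
        "page_size": page_size,
    }


# Hypothetical stand-in transform; a real client would use the descriptor's.
request = build_range_request("1234index-uuid4321", 18, 30, lambda x: 2 * x + 100)
print(request)
```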
Request:
```json
{
  "index": "1234index-uuid4321",
  "from": 0,
  "to": 1,
  "closest_to": 100,
  "page_size": 50,
  "page_token": "token_for_next_page_acquired_by_last_request"
}
```
Since page tokens are unique, it is enough to send only the page token:
the homeserver reuses the query and the page size from the previous request.
The order is defined by `from` and `to`: results run from `from` towards `to`,
so swapping the two values reverses the order.
`closest_to` cannot be combined with `from`/`to`; if both are provided,
the homeserver must ignore `closest_to`.
With `closest_to`, the homeserver returns events in ascending order of
`abs(index_value - closest_to)`.
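To make the ordering and paging rules concrete, here is an in-memory sketch of how a homeserver might evaluate such requests. It assumes inclusive bounds, returns event IDs instead of full events, and stores a snapshot of the result list behind each single-use page token; the class and method names are hypothetical:

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class IndexStore:
    """Hypothetical in-memory stand-in for a homeserver's index tables."""
    # index UUID -> [(event_id, index_value), ...]
    entries: dict = field(default_factory=dict)
    # page token -> (ordered event IDs, next offset, page size)
    pages: dict = field(default_factory=dict)

    def add(self, index_id: str, event_id: str, value: float) -> None:
        self.entries.setdefault(index_id, []).append((event_id, value))

    def query(self, request: dict) -> dict:
        if "page_token" in request:
            # Only the token is needed: the query and page size are remembered.
            ordered, offset, size = self.pages.pop(request["page_token"])
        else:
            rows = self.entries.get(request["index"], [])
            size = request.get("page_size", 50)
            offset = 0
            if "from" in request and "to" in request:
                # from/to takes precedence; closest_to is ignored if also present.
                lo, hi = sorted((request["from"], request["to"]))
                hits = [r for r in rows if lo <= r[1] <= hi]
                # Results run from `from` towards `to`; swapping reverses them.
                hits.sort(key=lambda r: r[1], reverse=request["from"] > request["to"])
            elif "closest_to" in request:
                target = request["closest_to"]
                hits = sorted(rows, key=lambda r: abs(r[1] - target))
            else:
                raise ValueError("request needs from/to or closest_to")
            ordered = [event_id for event_id, _ in hits]
        response = {"events": ordered[offset:offset + size]}
        if offset + size < len(ordered):
            token = secrets.token_urlsafe(16)
            self.pages[token] = (ordered, offset + size, size)
            response["next_page_token"] = token
        return response


store = IndexStore()
for i, v in enumerate([5.0, 1.0, 3.0, 9.0, 7.0]):
    store.add("1234index-uuid4321", f"$event{i}", v)
first = store.query({"index": "1234index-uuid4321", "from": 0, "to": 8, "page_size": 2})
second = store.query({"page_token": first["next_page_token"]})
print(first["events"], second["events"])
```

Snapshotting the ordered results keeps pagination stable if new events are indexed between requests; a real homeserver would more likely store a cursor into a database index instead.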
Response:
```json
{
  "events": [
    { "event_id": "...", "type": "m.room.encrypted", "content": { "...": "..." } }
  ],
  "next_page_token": "token_for_next_page"
}
```
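On the client side, collecting a full result set means following `next_page_token` until a response no longer contains one. A sketch, assuming a hypothetical `send` callable that posts a request body to the homeserver and returns the decoded JSON response:

```python
from typing import Callable, Iterator


def iter_events(send: Callable[[dict], dict], request: dict) -> Iterator[dict]:
    """Yield every event matched by `request`, following next_page_token."""
    response = send(request)
    yield from response["events"]
    while "next_page_token" in response:
        # Only the token is sent; the homeserver remembers the rest of the query.
        response = send({"page_token": response["next_page_token"]})
        yield from response["events"]
```

Because it is a generator, a client can stop early (for example after the first few `closest_to` matches) without requesting further pages.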
Each `m.index-descriptor` event is listed in the room state through an
`m.index-entry` state event:
```json
{
  "type": "m.index-entry",
  "index_event_id": "1234event-uuid4321"
}
```
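A client can combine the two event types to learn which index UUID belongs to which property. A sketch assuming the field layout of the examples above (with `index_event_id` and the descriptor fields at the top level of the decrypted events) and a hypothetical `fetch_decrypted` callable that fetches and decrypts an event by ID:

```python
from typing import Callable


def index_ids_by_path(state_events: list[dict],
                      fetch_decrypted: Callable[[str], dict]) -> dict:
    """Hypothetical helper: map each target_value path (as a tuple) to its index UUID."""
    mapping = {}
    for event in state_events:
        if event.get("type") != "m.index-entry":
            continue
        descriptor = fetch_decrypted(event["index_event_id"])
        # Ignore entries that do not point at a descriptor.
        if descriptor.get("type") == "m.index-descriptor":
            mapping[tuple(descriptor["target_value"])] = descriptor["index"]
    return mapping
```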
## Potential issues
## Alternatives
Using unencrypted rooms for this kind of application.
## Security considerations