MSC3618: Simplify federation /send response
Overview
Currently we specify that the federation /send endpoint returns a body of pdus: { string: PDU Processing Result }. In theory a homeserver can return information here on an event-by-event basis as to whether there was a problem processing events in the transaction or not.
However, this does not make much difference in practice: soft-fails are silent and rejected events may be too, and server implementations do not "cherry-pick" which events in a transaction to retry later. Since the presence of a txnId in the request implies that a transaction should be treated as idempotent for a given txnId, the sender should either accept that the entire transaction was accepted successfully by the remote side or retry the entire transaction.
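To make the all-or-nothing reasoning concrete, here is a minimal sketch of a receiver that treats a transaction as idempotent per origin and txnId. The class name `TransactionReceiver`, the `process` callback and the in-memory set are hypothetical and are not taken from any homeserver implementation. A real server would need to persist the set of accepted transactions.

```python
from typing import Callable, List, Set, Tuple


class TransactionReceiver:
    """Hypothetical receiver that accepts each (origin, txnId) at most once."""

    def __init__(self, process: Callable[[List[dict]], None]) -> None:
        self._process = process
        # Assumption: an in-memory set stands in for durable storage.
        self._accepted: Set[Tuple[str, str]] = set()

    def handle_send(self, origin: str, txn_id: str, pdus: List[dict]) -> int:
        """Return the HTTP status code for a /send request."""
        key = (origin, txn_id)
        if key in self._accepted:
            # A retry of an already accepted transaction is a no-op.
            return 200
        try:
            self._process(pdus)
        except Exception:
            # The transaction could not be handled at all, so the sender
            # is expected to retry the whole transaction later.
            return 500
        self._accepted.add(key)
        return 200
```

Because a failed transaction is never recorded as accepted, a later retry with the same txnId is processed again in full, while a retry of an accepted transaction does no further work.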
The worst case is that the homeserver is not able to process the transaction at all for some reason, e.g. because the database is down, in which case the server should return an HTTP 500 status code, which signals to the sender to retry later.
Proposal
This MSC proposes that we remove the pdus section from the response body, so that we return only one of two conditions:
- An HTTP 200 with a {} body to signal that the transaction was accepted;
- An HTTP 500 to signal that there was a problem with the transaction and to retry sending later.
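From the sending side, the two conditions could be handled roughly as in the sketch below. The names `Outcome` and `classify_send_response` are illustrative only. Treating every status other than 200 as "retry later" is an assumption of this sketch, since the proposal only describes the 200 and 500 cases.

```python
from enum import Enum


class Outcome(Enum):
    ACCEPTED = "accepted"
    RETRY_LATER = "retry_later"


def classify_send_response(status_code: int) -> Outcome:
    """Decide what to do with a transaction from the status code alone."""
    if status_code == 200:
        # The whole transaction was accepted; the body is not inspected.
        return Outcome.ACCEPTED
    # HTTP 500 per the proposal. Assumption: any other non-200 status is
    # also handled by retrying the entire transaction later.
    return Outcome.RETRY_LATER
```

The response body is never read here, so the sender needs no JSON parsing to decide whether to move on to the next txnId.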
Benefits
A significant benefit is that the receiving homeserver no longer needs to block the /send request in order to wait for the events to be processed for their PDU Processing Results.
Given that it is possible for a transaction to contain events from multiple rooms, or EDUs for unrelated purposes, it is bad that a single busy room can lengthen the amount of time to return the /send response to the caller. This means that new events for other rooms may be held back unnecessarily by processing events for a single busy room, as per the spec:
The sending server must wait and retry for a 200 OK response before sending a transaction with a different txnId to the receiving server.
With this proposal, the receiving server no longer needs to wait for PDU Processing Results, as this MSC does away with them. Receiving servers that do not want to durably persist transactions before processing them can continue to perform all work in-memory by continuing to block the /send request until all processing is completed, as may be done today. Additionally, a receiving server that is receiving too many transactions from a remote homeserver may wish to block for an arbitrary period of time for rate-limiting purposes, but this is an implementation-specific detail and not strictly required.
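A receiving server that does choose to persist transactions first might look something like the following sketch. The class `QueueingReceiver`, its `drain` method and the `process_pdu` callback are hypothetical, and the in-memory deque is only a stand-in for durable storage. A real implementation would write the transaction to disk before responding.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple


class QueueingReceiver:
    """Hypothetical receiver that responds before events are processed."""

    def __init__(self, process_pdu: Callable[[dict], None]) -> None:
        self._process_pdu = process_pdu
        # Assumption: this deque stands in for a durable transaction store.
        self._pending: Deque[Tuple[str, str, List[dict]]] = deque()

    def handle_send(
        self, origin: str, txn_id: str, pdus: List[dict]
    ) -> Tuple[int, dict]:
        """Store the transaction and answer 200 with an empty body."""
        self._pending.append((origin, txn_id, list(pdus)))
        return 200, {}

    def drain(self) -> int:
        """Process stored transactions; return how many PDUs succeeded."""
        processed = 0
        while self._pending:
            _origin, _txn_id, pdus = self._pending.popleft()
            for pdu in pdus:
                try:
                    self._process_pdu(pdu)
                except Exception:
                    # Per-event failures are no longer reported to the
                    # sender, so they are simply skipped here.
                    continue
                processed += 1
        return processed
```

With this shape, the time taken to answer /send no longer depends on how long a busy room takes to process, because `drain` runs separately from the request.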
Another benefit is that sending homeservers no longer need to parse the response body at all and can instead just determine whether the transaction was accepted successfully by observing the HTTP status code.
Potential issues
Synapse appears to use the "pdus" key for logging (see here). Conduit does the same and treats the response as an empty list if it is not present. Dendrite ignores the response body altogether.
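For senders that still want to log per-event results from servers that have not adopted this proposal, a tolerant reader along the following lines would cope with both response shapes. This is an illustrative sketch and is not the code used by Synapse, Conduit or Dendrite. The function name `pdu_results_for_logging` is hypothetical.

```python
import json
from typing import Dict


def pdu_results_for_logging(body: bytes) -> Dict[str, dict]:
    """Return the "pdus" map from a /send response, or {} if absent."""
    try:
        parsed = json.loads(body or b"{}")
    except ValueError:
        # Malformed or non-UTF-8 bodies are treated as having no results.
        return {}
    if not isinstance(parsed, dict):
        return {}
    pdus = parsed.get("pdus", {})
    return pdus if isinstance(pdus, dict) else {}
```

The result is only suitable for logging. Whether the transaction was accepted is still decided by the HTTP status code.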