MSC4120: Allow HEAD on /download
Most servers have a media upload size limit in place which gets applied to remote downloads as well, ideally preventing "excessively" large media from transiting through the server. Unfortunately, the best way to prevent large media from being downloaded is to try downloading it.
Many HTTP client libraries support reading headers before the remaining response body, though this is time-consuming and prone to issues. Some libraries do not offer the functionality at all, and require the body to be processed by the caller. Other libraries buffer the response body while the caller determines whether it should continue with the request, though this buffer is typically minimal.
To prevent this exact issue, HTTP has the HEAD request method which acts like a GET request, except the response body is omitted. This proposal introduces HEAD as a legal method for download requests, allowing a requesting server to make decisions about whether to download the entire file in a subsequent request.
Proposal
HEAD becomes a legal request method on the following endpoints:
/_matrix/media/v3/download/:serverName/:mediaId
/_matrix/media/v3/download/:serverName/:mediaId/:fileName
HEAD behaves as described by the HTTP specification.
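As an illustration only, a requesting server could construct the HEAD request for the first endpoint roughly as follows. This is a minimal sketch using Python's standard library; the helper name `build_head_request` and the `base_url` parameter are hypothetical and not part of this proposal, and nothing is sent over the network here.

```python
from urllib.parse import quote
from urllib.request import Request

# Path taken from the endpoint list above.
DOWNLOAD_PATH = "/_matrix/media/v3/download"


def build_head_request(base_url: str, server_name: str, media_id: str) -> Request:
    """Build (but do not send) a HEAD request for a piece of media.

    `base_url` is the already-resolved base URL of the remote media
    server; how it is resolved is outside the scope of this sketch.
    """
    # Percent-encode each path segment so unusual characters cannot
    # change the shape of the path.
    path = f"{DOWNLOAD_PATH}/{quote(server_name, safe='')}/{quote(media_id, safe='')}"
    return Request(base_url.rstrip("/") + path, method="HEAD")


req = build_head_request("https://example.org", "example.com", "abc123")
print(req.get_method(), req.full_url)
```

Passing the returned object to `urllib.request.urlopen` would perform the request; any authentication a real deployment requires is deliberately left out.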
Servers which do not support the HEAD method on the endpoints would respond with a 405 M_UNRECOGNIZED error code, as per the common error codes spec. In this case, requesting servers will likely have to take a risk and call the GET endpoint without knowing how much data there is to download.
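The decision a requesting server makes after its HEAD request might look something like the sketch below. The names `Decision` and `decide_after_head` are hypothetical, as is the policy of skipping the download on any status other than 200 or 405; the sketch also assumes the HTTP client has already followed any redirects.

```python
from enum import Enum
from typing import Mapping


class Decision(Enum):
    DOWNLOAD = "download"                            # size known and within limit
    DOWNLOAD_UNKNOWN_SIZE = "download_unknown_size"  # take the risk and GET
    SKIP = "skip"                                    # do not issue the GET


def decide_after_head(status: int, headers: Mapping[str, str], max_bytes: int) -> Decision:
    """Decide whether to follow a HEAD response with a GET request."""
    if status == 405:
        # Remote server does not support HEAD on this endpoint.
        return Decision.DOWNLOAD_UNKNOWN_SIZE
    if status != 200:
        # Assumption for this sketch: any other status means "don't download".
        return Decision.SKIP

    # Header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    raw = lowered.get("content-length", "").strip()
    if not (raw.isascii() and raw.isdigit()):
        # Missing or malformed Content-Length: size is unknown.
        return Decision.DOWNLOAD_UNKNOWN_SIZE

    return Decision.DOWNLOAD if int(raw) <= max_bytes else Decision.SKIP
```

Both `DOWNLOAD` and `DOWNLOAD_UNKNOWN_SIZE` still lead to a GET request; the distinction only tells the caller whether it has an advertised size to work with.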
In the future, when the media download endpoint is split into client and federation versions, like in MSC3916, it is suggested that both APIs get the same HEAD method support. This will allow clients to check cache headers, and still provide servers with information about file size.
Note: HEAD is not supported on /thumbnail as the thumbnail may be generated at the time of request and have unknown size. /download does not typically have this issue, unless a form of streaming file transfer is used, like MSC4016.
Potential issues
Adding a round trip to the already-expensive download sequence isn't great and may be an over-optimization for what is usually a rare problem.
Alternatives
As mentioned in the introduction, requesting servers could abort their request after receiving headers and possibly part of the body. This may be difficult to do with some libraries/languages, and can still result in higher-than-ideal bandwidth usage.
Security considerations
Servers should note that while the HTTP spec suggests that a HEAD response have the same headers as a GET response, the HEAD response is notably capable of lacking useful headers like Content-Length. Additionally, a malicious server could lie about the download size on HEAD and return a larger file on GET. Servers should continue to limit GET requests as best they can to stay within their size limits and bandwidth requirements, particularly when the HEAD response doesn't contain a Content-Length header.
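One way to keep enforcing the limit on GET, regardless of what the HEAD response advertised, is to cap the number of bytes read from the response body. The following is a minimal sketch, assuming the body is exposed as a file-like object with a `read(n)` method; `read_capped` and `MediaTooLargeError` are hypothetical names, not part of this proposal.

```python
import io
from typing import BinaryIO


class MediaTooLargeError(Exception):
    """Raised when a response body exceeds the configured size limit."""


def read_capped(stream: BinaryIO, max_bytes: int, chunk_size: int = 64 * 1024) -> bytes:
    """Read a body in chunks, aborting as soon as it exceeds max_bytes."""
    received = bytearray()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break  # end of body
        if len(received) + len(chunk) > max_bytes:
            # Stop reading immediately; the caller should close the connection.
            raise MediaTooLargeError(f"body exceeds {max_bytes} bytes")
        received.extend(chunk)
    return bytes(received)


# Stand-in for a real response body.
body = read_capped(io.BytesIO(b"x" * 10), max_bytes=10)
print(len(body))
```

A real implementation would more likely stream to disk than buffer in memory, but the cut-off logic would be the same.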
Unstable prefix
This proposal could have an unstable prefix by versioning the endpoints themselves. However, as the HTTP feature is well defined and no servers appear to be using HEAD requests currently, this proposal does not include an unstable prefix. Servers should implement HEAD as described by the HTTP specification, but only call other servers with HEAD if in an experimental or unstable mode of operation. For example, if the Synapse configuration has the HEAD feature flag disabled then no HEAD request should be generated by that Synapse instance.
Dependencies
This proposal doesn't work very well without MSC4138.