MSC4120: Allow HEAD on /download

Most servers have a media upload size limit in place which gets applied to remote downloads as well, ideally preventing "excessively" large media from transiting through the server. Unfortunately, the best way to prevent large media from being downloaded is to try downloading it.

Many HTTP client libraries support reading headers before the remaining response body, though this is time consuming and prone to issues. Some libraries do not offer the functionality at all, and require the body to be processed by the caller. Other libraries buffer the response body while the caller determines if it should continue with the request, though this buffer is typically minimal.

To prevent this exact issue, HTTP has the HEAD request method which acts like a GET request, except the response body is omitted. This proposal introduces HEAD as a legal method for download requests, allowing a requesting server to make decisions about whether to download the entire file in a subsequent request.

Proposal

HEAD becomes a legal request method on the media /download endpoints.

HEAD behaves as described by the HTTP specification.

Servers which do not support the HEAD method on the endpoints respond with a 405 status code and the M_UNRECOGNIZED error code, as per the common error codes spec. In this case, requesting servers will likely have to take a risk and call the GET endpoint without knowing how much data there is to download.
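The decision a requesting server makes after a HEAD request could be sketched as follows. This is a minimal illustration, not part of the proposal: `plan_download`, `Plan`, `parse_content_length`, and `max_size` are hypothetical names, and the caller is assumed to have already issued the HEAD request and collected the status code and response headers. Treating any status other than 200 or 405 as "skip" is also an assumption made for brevity.

```python
from enum import Enum
from typing import Mapping, Optional


class Plan(Enum):
    SKIP = "skip"             # known to be too large (or request failed); do not GET
    DOWNLOAD = "download"     # advertised size is within the limit
    DOWNLOAD_BLIND = "blind"  # size unknown; GET anyway, with a hard cap on bytes read


def parse_content_length(headers: Mapping[str, str]) -> Optional[int]:
    """Return Content-Length as a non-negative int, or None if absent or invalid."""
    for name, value in headers.items():
        if name.lower() == "content-length":
            value = value.strip()
            if value.isascii() and value.isdigit():
                return int(value)
            return None
    return None


def plan_download(status: int, headers: Mapping[str, str], max_size: int) -> Plan:
    """Decide what to do after a HEAD request (hypothetical helper)."""
    if status == 405:
        # Remote server does not support HEAD on this endpoint.
        return Plan.DOWNLOAD_BLIND
    if status != 200:
        # Assumption: any other failure (for example 404) means no GET is attempted.
        return Plan.SKIP
    size = parse_content_length(headers)
    if size is None:
        # HEAD succeeded but gave no usable size.
        return Plan.DOWNLOAD_BLIND
    return Plan.DOWNLOAD if size <= max_size else Plan.SKIP
```

Even when the result is `Plan.DOWNLOAD`, the advertised size is only a hint, so the subsequent GET should still be limited.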

In the future, when the media download endpoint is split into client and federation versions, like in MSC3916, it is suggested that both APIs get the same HEAD method support. This will allow clients to check cache headers, and still provide servers with information about file size.

Note: HEAD is not supported on /thumbnail as the thumbnail may be generated at the time of request and have unknown size. /download does not typically have this issue, unless a form of streaming file transfer is used, like MSC4016.

Potential issues

Adding a round trip to the already-expensive download sequence isn't great and may be an over-optimization for what is usually a rare problem.

Alternatives

As mentioned in the introduction, requesting servers could abort their request after receiving headers and possibly part of the body. This may be difficult to do with some libraries/languages, and can still result in higher-than-ideal bandwidth usage.

Security considerations

Servers should note that while the HTTP spec suggests that a HEAD response have the same headers as the corresponding GET response, a HEAD response is notably capable of lacking useful headers like Content-Length. Additionally, a malicious server could lie about the download size on HEAD and return a larger file on GET. Servers should continue to limit GET requests as best they can to stay within their size limits and bandwidth requirements, particularly when the HEAD response doesn't contain a Content-Length header.
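One way to keep limiting GET requests regardless of what HEAD reported is to cap the number of bytes read from the response body. The following is a minimal sketch, assuming the body is exposed as a file-like object; `read_capped` and `MediaTooLargeError` are hypothetical names and not taken from any existing server implementation.

```python
import io
from typing import BinaryIO


class MediaTooLargeError(Exception):
    """Raised when a response body exceeds the configured limit."""


def read_capped(body: BinaryIO, max_size: int, chunk_size: int = 64 * 1024) -> bytes:
    """Read the body in chunks, aborting as soon as more than max_size bytes arrive."""
    chunks = []
    total = 0
    while True:
        chunk = body.read(chunk_size)
        if not chunk:
            break  # end of stream
        total += len(chunk)
        if total > max_size:
            # The remote sent more than allowed, whatever HEAD claimed earlier.
            raise MediaTooLargeError(f"body exceeded {max_size} bytes")
        chunks.append(chunk)
    return b"".join(chunks)


# Usage with an in-memory stream standing in for a GET response body.
data = read_capped(io.BytesIO(b"hello"), max_size=10)
```

Because the check runs per chunk, at most one chunk beyond the limit is read before the transfer is abandoned.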

Unstable prefix

This proposal could have an unstable prefix by versioning the endpoints themselves. However, as the HTTP feature is well defined and no servers appear to be using HEAD requests currently, this proposal does not include an unstable prefix. Servers should implement HEAD as described by the HTTP specification, but only call other servers with HEAD if in an experimental or unstable mode of operation. For example, if the Synapse configuration has the HEAD feature flag disabled then no HEAD request should be generated by that Synapse instance.

Dependencies

This proposal doesn't work very well without MSC4138.