Add X-Robots-Tag header to stop crawlers from indexing media (#8887)

Fixes / related to: https://github.com/matrix-org/synapse/issues/6533

This should do essentially the same thing as a robots.txt file telling robots to not index the media repo.

https://developers.google.com/search/reference/robots_meta_tag

Signed-off-by: Aaron Raimist <aaron@raim.ist>
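As a rough illustration of what the change does, the sketch below sets the same X-Robots-Tag value on a stand-in request object and splits it back into its individual directives. FakeRequest, add_robots_header and parse_directives are hypothetical names made up for this example; they are not part of Synapse or Twisted, and the real change lives in add_file_headers.

```python
from typing import Dict, List


class FakeRequest:
    """Hypothetical stand-in for a web request; it only records headers."""

    def __init__(self) -> None:
        self.headers: Dict[bytes, bytes] = {}

    def setHeader(self, name: bytes, value: bytes) -> None:
        # Header names are case-insensitive, so store them lowercased.
        self.headers[name.lower()] = value


# The directive list used in this commit.
ROBOTS_DIRECTIVES = b"noindex, nofollow, noarchive, noimageindex"


def add_robots_header(request: FakeRequest) -> None:
    """Ask crawlers not to index, follow links in, or archive the response."""
    request.setHeader(b"X-Robots-Tag", ROBOTS_DIRECTIVES)


def parse_directives(value: bytes) -> List[str]:
    """Split a comma-separated X-Robots-Tag value into its directives."""
    return [part.strip() for part in value.decode("ascii").split(",")]


if __name__ == "__main__":
    req = FakeRequest()
    add_robots_header(req)
    print(parse_directives(req.headers[b"x-robots-tag"]))
```

Because the directives travel in a response header rather than in an HTML meta tag, they can apply to non-HTML responses such as images and other uploaded files, which is what the media repo serves.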
2025-08-04 02:44:12 -04:00 · 2020-12-08 16:51:03 -06:00 · 2020-12-08 16:51:03 -06:00 · cd9e72b185
commit cd9e72b185
parent ab7a24cc6b
3 changed files with 19 additions and 0 deletions
--- a/synapse/rest/media/v1/_base.py
+++ b/synapse/rest/media/v1/_base.py
@@ -155,6 +155,11 @@ def add_file_headers(request, media_type, file_size, upload_name):
    request.setHeader(b"Cache-Control", b"public,max-age=86400,s-maxage=86400")
    request.setHeader(b"Content-Length", b"%d" % (file_size,))

+    # Tell web crawlers to not index, archive, or follow links in media. This
+    # should help to prevent things in the media repo from showing up in web
+    # search results.
+    request.setHeader(b"X-Robots-Tag", b"noindex, nofollow, noarchive, noimageindex")
+

 # separators as defined in RFC2616. SP and HT are handled separately.
 # see _can_encode_filename_as_token.