Commit graph

206 commits

Author SHA1 Message Date
Tulir Asokan
ca9515d2c7 Merge remote-tracking branch 'upstream/release-v1.67' 2022-09-06 08:22:38 -04:00
Patrick Cloke
303b40b988
Do not wait for background updates to complete do expire URL cache. (#13657)
Media downloaded as part of a URL preview is normally deleted after two days.
However, while a background database migration is running, the process is
stopped. A long-running database migration can therefore cause the media
store to fill up with old preview files.

This logic was added in #2697 to make sure that we didn't try to run the expiry
without an index on `local_media_repository.created_ts`; the original logic that
needs that index was added in #2478 (in `get_url_cache_media_before`, as
amended by 93247a424a), and is still present.

Given that the background update was added before Synapse v1.0.0, just drop
this check and assume the index exists.
2022-08-30 07:15:54 -04:00
Tulir Asokan
b0f213fd3d Merge remote-tracking branch 'upstream/release-v1.64' 2022-07-28 10:49:41 +03:00
Patrick Cloke
1381563988
Inline URL preview documentation. (#13261)
Inline URL preview documentation near the implementation.
2022-07-12 15:01:58 -04:00
Tulir Asokan
1543a3643e Merge remote-tracking branch 'upstream/release-v1.61' 2022-06-14 13:54:26 +03:00
Patrick Cloke
148fe58a24
Do not break URL previews if an image is unreachable. (#12950)
Avoid breaking a URL preview completely if the chosen image 404s
or is unreachable for some other reason (e.g. DNS).
2022-06-06 07:46:04 -04:00
Tulir Asokan
8975980844 Merge remote-tracking branch 'upstream/release-v1.60' 2022-05-24 14:19:02 +03:00
Andrew Morgan
57f6c496d0
URL preview cache expiry logs: INFO -> DEBUG, text clarifications (#12720) 2022-05-12 18:16:32 +01:00
Tulir Asokan
b2fa6ec9f6 Merge remote-tracking branch 'upstream/release-v1.57' 2022-04-21 13:53:47 +03:00
Brendan Abolivier
f96b85eca8
Ensure the type of URL attributes is always str when matching against preview blacklist (#12333) 2022-03-31 11:49:49 +02:00
Tulir Asokan
ef6a9c8a70 Merge remote-tracking branch 'upstream/release-v1.56' 2022-03-29 13:30:30 +03:00
Patrick Cloke
4587b35929
Clean-up logic for rebasing URLs during URL preview. (#12219)
By using urljoin from the standard library and reducing the number
of places URLs are rebased.
2022-03-16 07:21:36 -04:00
Tulir Asokan
0b2b774c33 Merge remote-tracking branch 'upstream/release-v1.54' 2022-03-02 13:31:59 +02:00
Richard van der Hoff
e24ff8ebe3
Remove HomeServer.get_datastore() (#12031)
The presence of this method was confusing, and mostly present for backwards
compatibility. Let's get rid of it.

Part of #11733
2022-02-23 11:04:02 +00:00
AndrewRyanChama
066171643b
Fetch images when previewing Twitter URLs. (#11985)
By including "bot" in the User-Agent, which some sites use
to decide whether to include additional Open Graph information.
2022-02-22 07:11:39 -05:00
Tulir Asokan
d12ff25f13 Merge remote-tracking branch 'upstream/release-v1.53' 2022-02-15 13:32:02 +02:00
Denis Kasak
337f38cac3
Implement a content type allow list for URL previews (#11936)
This implements an allow list for content types for which Synapse will attempt URL preview. If a URL resolves to a resource with a content type which isn't in the list, the download will terminate immediately.

This makes sense given that Synapse would never successfully generate a URL preview for such files in the first place, and helps prevent issues with streaming media servers, such as #8302.

Signed-off-by: Denis Kasak dkasak@termina.org.uk
2022-02-10 15:43:01 +00:00
Tulir Asokan
9870604741 Merge remote-tracking branch 'upstream/release-v1.52' 2022-02-01 14:22:34 +02:00
Patrick Cloke
807efd26ae
Support rendering previews with data: URLs in them (#11767)
Images which are data URLs will no longer break URL
previews and will properly be "downloaded" and
thumbnailed.
2022-01-24 08:58:18 -05:00
Tulir Asokan
50ad805804 Merge remote-tracking branch 'upstream/release-v1.51' 2022-01-21 15:58:03 +02:00
Philippe Daouadi
15ffc4143c
Fix preview of imgur and Tenor URLs. (#11669)
By scraping Open Graph information from the HTML even
when an autodiscovery endpoint is found. The results are
then combined to capture as much information as possible
from the page.
2022-01-18 13:20:24 -05:00
Tulir Asokan
e9caf56ca0 Merge remote-tracking branch 'upstream/release-v1.50' 2022-01-07 14:21:32 +02:00
Patrick Cloke
eb39da6782
Move HTML parsing to a separate file for URL previews. (#11566)
* Splits the logic for parsing HTML from the resource handling code.
* Fix a circular import in the oEmbed code (which uses the HTML parsing code).
* Renames some of the HTML parsing methods to:
  * Make it clear which methods are "internal" to the module.
  * Clarify what the methods do.
2021-12-13 17:55:07 +00:00
Tulir Asokan
9f4fa40b64 Merge remote-tracking branch 'upstream/release-v1.48' 2021-11-25 18:33:37 +02:00
Patrick Cloke
9b90b9454b
Add type hints to media repository storage module (#11311) 2021-11-12 11:05:26 -05:00
Tulir Asokan
8d54d3bbbf Merge remote-tracking branch 'upstream/release-v1.46' 2021-11-02 15:35:09 +02:00
Patrick Cloke
b3e843be88
Fix URL preview errors when previewing XML documents. (#11196) 2021-10-27 14:48:02 +00:00
Tulir Asokan
cf45cfd314 Merge remote-tracking branch 'upstream/release-v1.46' 2021-10-27 15:42:34 +03:00
Patrick Cloke
efd0074ab7
Ensure each charset is attempted only once during media preview. (#11089)
There's no point in trying more than once since it is guaranteed to
continually fail.
2021-10-14 18:51:44 +00:00
Patrick Cloke
e2f0b49b3f
Attempt different character encodings when previewing a URL. (#11077)
This follows similar logic to BeautifulSoup where we attempt different
character encodings until we find one which works.
2021-10-14 10:17:20 -04:00
Patrick Cloke
732bbf6737
Be more lenient when parsing the version for oEmbed responses. (#11065) 2021-10-13 07:00:07 -04:00
Tulir Asokan
80adb0a6ca Merge remote-tracking branch 'upstream/release-v1.45' 2021-10-12 13:54:46 +03:00
Patrick Cloke
1b112840d2
Autodiscover oEmbed endpoint from returned HTML (#10822)
Searches the returned HTML for an oEmbed endpoint using the
autodiscovery mechanism (`<link rel=...>`), and will request it
to generate the preview.
2021-10-08 14:14:42 -04:00
Tulir Asokan
8631aaeb5a Merge remote-tracking branch 'upstream/release-v1.44' 2021-09-29 16:08:42 +03:00
Sean Quah
2be0fde3d6
Fix empty url_cache_thumbnails/yyyy-mm-dd/ directories being left behind (#10924) 2021-09-29 10:24:37 +01:00
Sean Quah
f7768f62cb
Avoid storing URL cache files in storage providers (#10911)
URL cache files are short-lived and it does not make sense to offload
them (eg. to the cloud) or back them up.
2021-09-27 12:55:27 +01:00
Patrick Cloke
bb7fdd821b
Use direct references for configuration variables (part 5). (#10897) 2021-09-24 07:25:21 -04:00
Erik Johnston
50022cff96
Add reactor to SynapseRequest and fix up types. (#10868) 2021-09-24 11:01:25 +01:00
Patrick Cloke
6fc8be9a1b
Include more information in oEmbed previews. (#10819)
* Improved titles (fall back to the author name if there's not title) and include the site name.
* Handle photo/video payloads.
* Include the original URL in the Open Graph response.
* Fix the expiration time (by properly converting from seconds to milliseconds).
2021-09-22 09:45:20 -04:00
Patrick Cloke
ba7a91aea5
Refactor oEmbed previews (#10814)
The major change is moving the decision of whether to use oEmbed
further up the call-stack. This reverts the _download_url method to
being a "dumb" functionwhich takes a single URL and downloads it
(as it was before #7920).

This also makes more minor refactorings:

* Renames internal variables for clarity.
* Factors out shared code between the HTML and rich oEmbed
  previews.
* Fixes tests to preview an oEmbed image.
2021-09-21 16:09:57 +00:00
Patrick Cloke
b93259082c
Add missing type hints to non-client REST servlets. (#10817)
Including admin, consent, key, synapse, and media. All REST servlets
(the synapse.rest module) now require typed method definitions.
2021-09-15 08:45:32 -04:00
Tulir Asokan
1e11b73441 Merge remote-tracking branch 'upstream/release-v1.43' 2021-09-14 10:00:40 -04:00
Patrick Cloke
89ba834818
Use attrs internally for the URL preview code & add documentation. (#10753) 2021-09-07 13:10:34 +00:00
Patrick Cloke
e2481dbe93
Allow configuration of the oEmbed URLs. (#10714)
This adds configuration options (under an `oembed` section) to
configure which URLs are matched to use oEmbed for URL
previews.
2021-08-31 18:37:07 -04:00
Tulir Asokan
7359964d9f Merge remote-tracking branch 'upstream/release-v1.40' 2021-08-03 14:38:58 +03:00
sri-vidyut
8e1febc6a1
Support underscores (in addition to hyphens) for charset detection. (#10410) 2021-07-27 17:29:42 +00:00
Patrick Cloke
5db118626b
Add a return type to parse_string. (#10438)
And set the required attribute in a few places which will error if
a parameter is not provided.
2021-07-21 09:47:56 -04:00
Tulir Asokan
9754df5623 Merge remote-tracking branch 'upstream/release-v1.39' 2021-07-20 16:33:01 +03:00
Jonathan de Jong
98aec1cc9d
Use inline type hints in handlers/ and rest/. (#10382) 2021-07-16 18:22:36 +01:00
Tulir Asokan
c9cee6c534 Merge remote-tracking branch 'upstream/release-v1.33.0' 2021-04-28 14:15:25 +03:00