Commit Graph

414 Commits

Author SHA1 Message Date
Denis Kasak
337f38cac3
Implement a content type allow list for URL previews (#11936)
This implements an allow list for content types for which Synapse will attempt URL preview. If a URL resolves to a resource with a content type which isn't in the list, the download will terminate immediately.

This makes sense given that Synapse would never successfully generate a URL preview for such files in the first place, and helps prevent issues with streaming media servers, such as #8302.

Signed-off-by: Denis Kasak dkasak@termina.org.uk
2022-02-10 15:43:01 +00:00
Patrick Cloke
314ca4c86d
Pass the proper type when uploading files. (#11927)
The Content-Length header should be treated as an int, not
a string. This shouldn't have any user-facing change.
2022-02-07 10:06:52 -05:00
Patrick Cloke
807efd26ae
Support rendering previews with data: URLs in them (#11767)
Images which are data URLs will no longer break URL
previews and will properly be "downloaded" and
thumbnailed.
2022-01-24 08:58:18 -05:00
Philippe Daouadi
15ffc4143c
Fix preview of imgur and Tenor URLs. (#11669)
By scraping Open Graph information from the HTML even
when an autodiscovery endpoint is found. The results are
then combined to capture as much information as possible
from the page.
2022-01-18 13:20:24 -05:00
Patrick Cloke
10a88ba91c
Use auto_attribs/native type hints for attrs classes. (#11692) 2022-01-13 13:49:28 +00:00
Patrick Cloke
cbd82d0b2d
Convert all namedtuples to attrs. (#11665)
To improve type hints throughout the code.
2021-12-30 18:47:12 +00:00
Patrick Cloke
eb39da6782
Move HTML parsing to a separate file for URL previews. (#11566)
* Splits the logic for parsing HTML from the resource handling code.
* Fix a circular import in the oEmbed code (which uses the HTML parsing code).
* Renames some of the HTML parsing methods to:
  * Make it clear which methods are "internal" to the module.
  * Clarify what the methods do.
2021-12-13 17:55:07 +00:00
Sean Quah
858d80bf0f
Fix media repository failing when media store path contains symlinks (#11446) 2021-12-02 16:05:24 +00:00
Sean Quah
454c3d7694 Merge branch 'master' into develop 2021-11-23 13:06:56 +00:00
Sean Quah
91f2bd0907 Prevent the media store from writing outside of the configured directory
Also tighten validation of server names by forbidding invalid characters
in IPv6 addresses and empty domain labels.
2021-11-19 13:39:15 +00:00
Patrick Cloke
9b90b9454b
Add type hints to media repository storage module (#11311) 2021-11-12 11:05:26 -05:00
Neeeflix
6ce19b94e8
Fix error in thumbnail generation (#11288)
Signed-off-by: Jonas Zeunert <jonas@zeunert.org>
2021-11-10 20:49:43 +00:00
Erik Johnston
237f7eb87a Merge remote-tracking branch 'origin/master' into develop 2021-11-02 14:28:27 +00:00
Shay
f5c6a80886
Handle missing Content-Type header when accessing remote media (#11200)
* add code to handle missing content-type header and a test to verify that it works

* add handling for missing content-type in the /upload endpoint as well

* slightly refactor test code to put private method in approriate place

* handle possible null value for content-type when pulling from the local db

* add changelog

* refactor test and add code to handle missing content-type in cached remote media

* requested changes

* Update changelog.d/11200.bugfix

Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>

Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
2021-11-01 10:26:02 -07:00
Patrick Cloke
b3e843be88
Fix URL preview errors when previewing XML documents. (#11196) 2021-10-27 14:48:02 +00:00
Patrick Cloke
efd0074ab7
Ensure each charset is attempted only once during media preview. (#11089)
There's no point in trying more than once since it is guaranteed to
continually fail.
2021-10-14 18:51:44 +00:00
Patrick Cloke
e2f0b49b3f
Attempt different character encodings when previewing a URL. (#11077)
This follows similar logic to BeautifulSoup where we attempt different
character encodings until we find one which works.
2021-10-14 10:17:20 -04:00
Sean Quah
b59f3281d5
Remove dead code from MediaFilePaths (#11056) 2021-10-13 13:41:24 +01:00
Patrick Cloke
732bbf6737
Be more lenient when parsing the version for oEmbed responses. (#11065) 2021-10-13 07:00:07 -04:00
Patrick Cloke
9abc5f2a05 Merge remote-tracking branch 'origin/release-v1.45' into develop 2021-10-12 14:21:05 -04:00
Sean Quah
8eaffe013c
Update _wrap_in_base_path type hints to preserve function arguments (#11055) 2021-10-12 18:19:21 +01:00
Patrick Cloke
1db9282dfa
Fix formatting string when oEmbed errors occur. (#11061) 2021-10-12 17:15:42 +00:00
Patrick Cloke
1b112840d2
Autodiscover oEmbed endpoint from returned HTML (#10822)
Searches the returned HTML for an oEmbed endpoint using the
autodiscovery mechanism (`<link rel=...>`), and will request it
to generate the preview.
2021-10-08 14:14:42 -04:00
David Robertson
797ee7812d
Relax ignore-missing-imports for modules that have stubs now and update mypy (#11006)
Updating mypy past version 0.9 means that third-party stubs are no-longer distributed with typeshed. See http://mypy-lang.blogspot.com/2021/06/mypy-0900-released.html for details.
We therefore pull in stub packages in setup.py

Additionally, some modules that we were previously ignoring import failures for now have stubs. So let's use them.

The rest of this change consists of fixups to make the newer mypy + stubs pass CI.

Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
2021-10-08 14:49:41 +01:00
Sean Quah
2be0fde3d6
Fix empty url_cache_thumbnails/yyyy-mm-dd/ directories being left behind (#10924) 2021-09-29 10:24:37 +01:00
Sean Quah
f7768f62cb
Avoid storing URL cache files in storage providers (#10911)
URL cache files are short-lived and it does not make sense to offload
them (eg. to the cloud) or back them up.
2021-09-27 12:55:27 +01:00
Sean Quah
6c83c27107
Fix race conditions when creating media store and config directories (#10913) 2021-09-27 11:29:23 +01:00
Patrick Cloke
bb7fdd821b
Use direct references for configuration variables (part 5). (#10897) 2021-09-24 07:25:21 -04:00
Erik Johnston
50022cff96
Add reactor to SynapseRequest and fix up types. (#10868) 2021-09-24 11:01:25 +01:00
Patrick Cloke
47854c71e9
Use direct references for configuration variables (part 4). (#10893) 2021-09-23 12:03:01 -04:00
Patrick Cloke
6fc8be9a1b
Include more information in oEmbed previews. (#10819)
* Improved titles (fall back to the author name if there's not title) and include the site name.
* Handle photo/video payloads.
* Include the original URL in the Open Graph response.
* Fix the expiration time (by properly converting from seconds to milliseconds).
2021-09-22 09:45:20 -04:00
Patrick Cloke
ba7a91aea5
Refactor oEmbed previews (#10814)
The major change is moving the decision of whether to use oEmbed
further up the call-stack. This reverts the _download_url method to
being a "dumb" functionwhich takes a single URL and downloads it
(as it was before #7920).

This also makes more minor refactorings:

* Renames internal variables for clarity.
* Factors out shared code between the HTML and rich oEmbed
  previews.
* Fixes tests to preview an oEmbed image.
2021-09-21 16:09:57 +00:00
Patrick Cloke
b93259082c
Add missing type hints to non-client REST servlets. (#10817)
Including admin, consent, key, synapse, and media. All REST servlets
(the synapse.rest module) now require typed method definitions.
2021-09-15 08:45:32 -04:00
Patrick Cloke
b996782df5
Convert media repo's FileInfo to attrs. (#10785)
This is mostly an internal change, but improves type hints in the
media code.
2021-09-14 07:09:38 -04:00
Patrick Cloke
580a15e039
Request JSON for oEmbed requests (and ignore XML only providers). (#10759)
This adds the format to the request arguments / URL to
ensure that JSON data is returned (which is all that
Synapse supports).

This also adds additional error checking / filtering to the
configuration file to ignore XML-only providers.
2021-09-08 07:17:52 -04:00
Patrick Cloke
89ba834818
Use attrs internally for the URL preview code & add documentation. (#10753) 2021-09-07 13:10:34 +00:00
Patrick Cloke
e2481dbe93
Allow configuration of the oEmbed URLs. (#10714)
This adds configuration options (under an `oembed` section) to
configure which URLs are matched to use oEmbed for URL
previews.
2021-08-31 18:37:07 -04:00
Sean
7367473f96
Fix error when selecting between thumbnails with the same quality (#10684)
Fixes #10318
2021-08-25 09:51:08 +00:00
Dirk Klimpel
915b37e5ef
Admin API to delete media for a specific user (#10558) 2021-08-11 19:29:59 +00:00
sri-vidyut
8e1febc6a1
Support underscores (in addition to hyphens) for charset detection. (#10410) 2021-07-27 17:29:42 +00:00
Denis Kasak
2476d5373c
Mitigate media repo XSSs on IE11. (#10468)
IE11 doesn't support Content-Security-Policy but it has support for
a non-standard X-Content-Security-Policy header, which only supports the
sandbox directive. This prevents script execution, so it at least offers
some protection against media repo-based attacks.

Signed-off-by: Denis Kasak <dkasak@termina.org.uk>
2021-07-27 13:45:10 +02:00
Patrick Cloke
5db118626b
Add a return type to parse_string. (#10438)
And set the required attribute in a few places which will error if
a parameter is not provided.
2021-07-21 09:47:56 -04:00
Jonathan de Jong
95e47b2e78
[pyupgrade] synapse/ (#10348)
This PR is tantamount to running 
```
pyupgrade --py36-plus --keep-percent-format `find synapse/ -type f -name "*.py"`
```

Part of #9744
2021-07-19 15:28:05 +01:00
Jonathan de Jong
98aec1cc9d
Use inline type hints in handlers/ and rest/. (#10382) 2021-07-16 18:22:36 +01:00
Patrick Cloke
9e4610cc27
Correct type hints for parse_string(s)_from_args. (#10137) 2021-06-08 08:30:48 -04:00
Michael Telatynski
e8ac9ac8ca
Fix /upload 500'ing when presented a very large image (#10029)
* Fix /upload 500'ing when presented a very large image

Catch DecompressionBombError and re-raise as ThumbnailErrors

* Set PIL's MAX_IMAGE_PIXELS to match homeserver.yaml

to get it to bomb out quicker, to load less into memory
in the case of super large images

* Add changelog entry for 10029
2021-05-21 18:31:59 +02:00
Andrew Morgan
fe604a022a
Remove various bits of compatibility code for Python <3.6 (#9879)
I went through and removed a bunch of cruft that was lying around for compatibility with old Python versions. This PR also will now prevent Synapse from starting unless you're running Python 3.6+.
2021-04-27 13:13:07 +01:00
Richard van der Hoff
3ff2251754
Improved validation for received requests (#9817)
* Simplify `start_listening` callpath

* Correctly check the size of uploaded files
2021-04-23 19:20:44 +01:00
Richard van der Hoff
5a153772c1
remove HomeServer.get_config (#9815)
Every single time I want to access the config object, I have to remember
whether or not we use `get_config`. Let's just get rid of it.
2021-04-14 19:09:08 +01:00
rkfg
c9a2b5d402
More robust handling of the Content-Type header for thumbnail generation (#9788)
Signed-off-by: Sergey Shpikin <rkfg@rkfg.me>
2021-04-14 16:30:59 +01:00