138 Commits

Author SHA1 Message Date
Patrick Cloke
0963d39ea6
Handle additional errors when previewing URLs. (#9333)
* Handle the case of lxml not finding a document tree.
* Parse the document encoding from the XML tag.
2021-02-08 12:33:30 -05:00
Patrick Cloke
4937fe3d6b
Try to recover from unknown encodings when previewing media. (#9164)
Treat unknown encodings (according to lxml) as UTF-8
when generating a preview for HTML documents. This
isn't fully accurate, but will hopefully give a reasonable
title and summary.
2021-01-26 07:32:17 -05:00
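
The fallback described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the code from #9164: it checks the declared encoding against Python's codec registry rather than asking lxml, and `pick_encoding` and `decode_body` are hypothetical names.

```python
import codecs


def pick_encoding(declared):
    """Return the declared encoding if Python recognises it, else UTF-8.

    Assumption: Python's codec registry stands in for lxml's notion of a
    "known" encoding.
    """
    if not declared:
        return "utf-8"
    try:
        codecs.lookup(declared)
    except LookupError:
        # Unknown encoding name: treat the document as UTF-8.
        return "utf-8"
    return declared


def decode_body(body, declared):
    """Decode raw bytes, replacing anything that cannot be decoded."""
    return body.decode(pick_encoding(declared), errors="replace")
```

As the commit message notes, the result may not be fully accurate, but a title and summary can usually still be extracted from the decoded text.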
Patrick Cloke
d34c6e1279
Add type hints to media rest resources. (#9093) 2021-01-15 10:57:37 -05:00
Patrick Cloke
1f3748f033
Do not raise a 500 exception when previewing empty media. (#8883) 2020-12-07 10:00:08 -05:00
Richard van der Hoff
11c9e17738
Add type annotations to SimpleHttpClient (#8372) 2020-09-24 15:47:20 +01:00
Patrick Cloke
aec294ee0d
Use slots in attrs classes where possible (#8296)
slots use less memory (and attribute access is faster) while slightly
limiting the flexibility of the class attributes. This focuses on objects
which are instantiated "often" and for short periods of time.
2020-09-14 12:50:06 -04:00
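
The trade-off mentioned in the commit above can be illustrated with plain `__slots__`, which is what the `slots=True` argument of attrs builds on. This sketch uses only the standard library so that it is self-contained; the class names and the `can_add_attribute` helper are hypothetical.

```python
class PreviewWithDict:
    """Ordinary class: every instance carries its own __dict__."""

    def __init__(self, url, size):
        self.url = url
        self.size = size


class PreviewWithSlots:
    """Slotted class: fixed attribute set, no per-instance __dict__."""

    __slots__ = ("url", "size")

    def __init__(self, url, size):
        self.url = url
        self.size = size


def can_add_attribute(obj):
    """Report whether an attribute not declared up front can be set."""
    try:
        obj.extra = 1
    except AttributeError:
        return False
    return True
```

Here `can_add_attribute` succeeds for the ordinary class and fails for the slotted one, which is the reduced flexibility the commit message refers to.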
Patrick Cloke
4e874ed593
Remove unnecessary maybeDeferred calls (#8044) 2020-08-07 09:44:48 -04:00
David Vo
4dd27e6d11
Reduce unnecessary whitespace in JSON. (#7372) 2020-08-07 08:02:55 -04:00
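
One common way to drop the whitespace that Python's `json` module emits by default is the `separators` argument. Whether #7372 does exactly this is an assumption, and the payload below is made up for illustration.

```python
import json

# Hypothetical payload, loosely shaped like URL preview data.
payload = {"og:title": "Example", "og:image:width": 640}

# Default separators are ", " and ": ", so a space follows each delimiter.
default = json.dumps(payload)

# Compact separators remove those spaces without changing the data.
compact = json.dumps(payload, separators=(",", ":"))
```

Both strings parse back to the same object; only the encoded size differs.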
Erik Johnston
a7bdf98d01
Rename database classes to make some sense (#8033) 2020-08-05 21:38:57 +01:00
Patrick Cloke
68626ff8e9
Convert the remaining media repo code to async / await. (#7947) 2020-07-27 14:40:11 -04:00
Patrick Cloke
3fc8fdd150
Support oEmbed for media previews. (#7920)
Fixes previews of Twitter URLs by using their oEmbed endpoint to grab content.
2020-07-27 07:50:44 -04:00
Erik Johnston
5cdca53aa0
Merge different Resource implementation classes (#7732) 2020-07-03 19:02:19 +01:00
Erik Johnston
b44bdd7f7b
Support running multiple media repos. (#7706)
This requires a new config option to specify which media repo should be
responsible for running background jobs to e.g. clear out expired URL
preview caches.
2020-06-17 14:13:30 +01:00
Dagfinn Ilmari Mannsåker
a3f11567d9
Replace all remaining six usage with native Python 3 equivalents (#7704) 2020-06-16 08:51:47 -04:00
Michael Kaye
5308239d5d
Reduce logging verbosity of URL cache cleanup. (#7295) 2020-04-22 07:45:16 -04:00
Andrew Morgan
a48138784e
Allow specifying the value of Accept-Language header for URL previews (#7265) 2020-04-15 13:35:29 +01:00
Patrick Cloke
caec7d4fa0
Convert some of the media REST code to async/await (#7110) 2020-03-20 07:20:02 -04:00
Erik Johnston
b0a66ab83c
Fixup synapse.rest to pass mypy (#6732) 2020-01-20 17:38:21 +00:00
Erik Johnston
4a33a6dd19 Move background update handling out of store 2019-12-05 11:11:26 +00:00
Richard van der Hoff
ef1a85e773
Fix startup error when http proxy is defined. (#6421)
Guess I only tested this on python 2 :/

Fixes #6419.
2019-11-26 18:10:50 +00:00
Andrew Morgan
3916e1b97a
Clean up newline quote marks around the codebase (#6362) 2019-11-21 12:00:14 +00:00
Richard van der Hoff
5570d1c93f
Merge pull request #6334 from matrix-org/rav/url_preview_limit_title_2
Fix exception when OpenGraph tag values are ints
2019-11-05 17:28:11 +00:00
Richard van der Hoff
81d49cbb07 Fix exception when OpenGraph tag values are ints 2019-11-05 17:22:58 +00:00
Richard van der Hoff
55a7da247a
Merge branch 'develop' into rav/url_preview_limit_title 2019-11-05 17:08:07 +00:00
Richard van der Hoff
e78167c94b
Apply suggestions from code review
Co-Authored-By: Brendan Abolivier <babolivier@matrix.org>
Co-Authored-By: Erik Johnston <erik@matrix.org>
2019-11-05 16:46:39 +00:00
Richard van der Hoff
e9bfe719ba Strip overlong OpenGraph data from url preview
... to stop people causing DoSes with malicious web pages
2019-11-05 15:51:18 +00:00
Richard van der Hoff
1cb84c6486
Support for routing outbound HTTP requests via a proxy (#6239)
The `http_proxy` and `HTTPS_PROXY` env vars can be set to a `host[:port]` value which should point to a proxy.

The address of the proxy should be excluded from IP blacklists such as the `url_preview_ip_range_blacklist`.

The proxy will then be used for
 * push
 * url previews
 * phone-home stats
 * recaptcha validation
 * CAS auth validation

It will *not* be used for:
 * Application Services
 * Identity servers
 * Outbound federation
 * In worker configurations, connections from workers to masters

Fixes #4198.
2019-11-01 14:07:44 +00:00
Andrew Morgan
54fef094b3
Remove usage of deprecated logger.warn method from codebase (#6271)
Replace every instance of `logger.warn` with `logger.warning` as the former is deprecated.
2019-10-31 10:23:24 +00:00
Michael Kaye
e4d98188da Address codestyle concerns 2019-10-24 18:43:13 +01:00
Michael Kaye
8f4a808d9d Delay printf until logging is required.
Using % will cause the string to be generated even if debugging
is off.
2019-10-24 18:31:53 +01:00
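
The difference described in the commit above can be demonstrated with the standard `logging` module. The `CountingValue` class and the logger name are hypothetical, and exist only to count how often the value is turned into a string.

```python
import logging


class CountingValue:
    """Counts how many times it is converted to a string."""

    def __init__(self):
        self.calls = 0

    def __str__(self):
        self.calls += 1
        return "expensive"


logger = logging.getLogger("lazy_logging_demo")
logger.setLevel(logging.INFO)  # DEBUG messages are disabled
logger.addHandler(logging.NullHandler())

eager = CountingValue()
lazy = CountingValue()

# The % operator formats the string before debug() is even called.
logger.debug("value: %s" % eager)

# Passing the value as an argument defers formatting until a record is
# actually emitted, which never happens while DEBUG is disabled.
logger.debug("value: %s", lazy)
```

After both calls, `eager.calls` is 1 and `lazy.calls` is 0.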
Erik Johnston
ca3e01e50d Fix store_url_cache using bytes 2019-10-10 14:52:29 +01:00
Andrew Morgan
2a44782666
Remove double return statements (#5962)
Remove all the "double return" statements which were a result of us removing all the instances of

```
defer.returnValue(...)
return
```

statements when we switched fully to Python 3.
2019-09-03 11:42:45 +01:00
Amber Brown
4806651744
Replace returnValue with return (#5736) 2019-07-23 23:00:55 +10:00
Amber Brown
463b072b12
Move logging utilities out of the side drawer of util/ and into logging/ (#5606) 2019-07-04 00:07:04 +10:00
Amber Brown
0ee9076ffe Fix media repo breaking (#5593) 2019-07-02 19:01:28 +01:00
Amber Brown
f40a7dc41f
Make the http server handle coroutine-making REST servlets (#5475) 2019-06-29 17:06:55 +10:00
Amber Brown
32e7c9e7f2
Run Black. (#5482) 2019-06-20 19:32:02 +10:00
Andrew Morgan
2f48c4e1ae
URL preview blacklisting fixes (#5155)
Prevents a SynapseError being raised inside an IResolutionReceiver and instead just returns 0 results. This means a failed lookup and a blacklisted lookup have to share the same error message, but the substitute should be generic enough to cover both cases.
2019-05-10 10:32:44 -07:00
Amber Brown
ea6abf6724
Fix IP URL previews on Python 3 (#4215) 2018-12-22 01:56:13 +11:00
Amber Brown
8b1affe7d5
Fix Content-Disposition in media repository (#4176) 2018-11-15 15:55:58 -06:00
Amber Brown
df758e155d
Use <meta> tags to discover the per-page encoding of html previews (#4183) 2018-11-15 11:05:08 -06:00
Amber Brown
b3708830b8
Fix URL preview bugs (type error when loading cache from db, content-type including quotes) (#4157) 2018-11-08 01:37:43 +11:00
Richard van der Hoff
ef771cc4c2 Fix a number of flake8 errors
Broadly three things here:

* disable W504 which seems a bit whacko
* remove a bunch of `as e` expressions from exception handlers that don't use
  them
* use `r""` for strings which include backslashes

Also, we don't use pep8 any more, so we can get rid of the duplicate config
there.
2018-10-24 10:39:03 +01:00
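
Two of the patterns named in the commit above look roughly like this in practice. The function and pattern names are hypothetical and are not taken from the codebase.

```python
import re

# A raw string keeps the backslash literal, so "\d" is not read as an
# (invalid) string escape sequence.
DIGITS = re.compile(r"\d+")


def parse_port(text):
    """Return the port as an int, or None if the text is not a number."""
    try:
        return int(text)
    except ValueError:
        # No "as e" here: the exception object is never used.
        return None


def first_number(text):
    """Return the first run of digits in the text, or None."""
    match = DIGITS.search(text)
    return match.group(0) if match else None
```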
Erik Johnston
f6a0a02a62 Fix bug where we raised StopIteration in a generator
This made python 3.7 unhappy
2018-10-17 16:10:52 +01:00
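
The behaviour behind the commit above comes from PEP 479: from Python 3.7 onwards, a StopIteration that escapes a generator body is converted into a RuntimeError. The sketch below shows the failing pattern and one possible fix; the function names are hypothetical and the actual change in the codebase may differ.

```python
def broken_first_items(iterables):
    """Yield the first item of each iterable.

    next() raises StopIteration on an empty iterable; inside a generator
    that becomes a RuntimeError on Python 3.7+.
    """
    for iterable in iterables:
        yield next(iter(iterable))


def fixed_first_items(iterables):
    """Same idea, but stop cleanly at the first empty iterable."""
    for iterable in iterables:
        try:
            yield next(iter(iterable))
        except StopIteration:
            # A plain return is the supported way to end a generator.
            return
```

Calling `list(fixed_first_items([[1], [], [3]]))` ends the generator at the empty list, while the broken variant raises RuntimeError at the same point.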
Erik Johnston
8601c24287 Fix some instances of ExpiringCache not expiring cache items
ExpiringCache required that `start()` be called before it would actually
start expiring entries. A number of places didn't do that.

This PR removes `start` from ExpiringCache, and automatically starts the
background reaping process on creation instead.
2018-09-21 14:19:46 +01:00
Amber Brown
02aa41809b
Port rest/ to Python 3 (#3823) 2018-09-12 20:41:31 +10:00
Amber Brown
b37c472419
Rename async to async_helpers because async is a keyword on Python 3.7 (#3678) 2018-08-10 23:50:21 +10:00
Richard van der Hoff
03751a6420 Fix some looping_call calls which were broken in #3604
It turns out that looping_call does check the deferred returned by its
callback, and (at least in the case of client_ips), we were relying on this,
and I broke it in #3604.

Update run_as_background_process to return the deferred, and make sure we
return it to clock.looping_call.
2018-07-26 11:48:08 +01:00
Richard van der Hoff
371da42ae4 Wrap a number of things that run in the background
This will reduce the number of "Starting db connection from sentinel context"
warnings, and will help with our metrics.
2018-07-25 09:41:12 +01:00
Krombel
32fd6910d0 Use parse_{int,str} and assert from http.servlet
parse_integer and parse_string can take a request and raise errors
in case we have wrong or missing params.
This PR tries to use them more to deduplicate some code and make it
more readable.
2018-07-13 21:40:14 +02:00