Previously if an autodiscovered oEmbed request failed (e.g. the
oEmbed endpoint is down or does not exist) then the entire URL
preview would fail. Instead we now return everything we can, even
if this additional request fails.
Ideally we would replace this with parsing of the Accept header
or something else, but for now just make Synapse spec compliant
by ignoring the unspecced parameter.
It does not seem that this is ever sent by a client, and even if it is
there's a reasonable fallback.
* Update mypy and mypy-zope
* Remove unused ignores
These used to suppress
```
synapse/storage/engines/__init__.py:28: error: "__new__" must return a
class instance (got "NoReturn") [misc]
```
and
```
synapse/http/matrixfederationclient.py:1270: error: "BaseException" has no attribute "reasons" [attr-defined]
```
(note that we check `hasattr(e, "reasons")` above)
* Avoid empty body warnings, sometimes by marking methods as abstract
E.g.
```
tests/handlers/test_register.py:58: error: Missing return statement [empty-body]
tests/handlers/test_register.py:108: error: Missing return statement [empty-body]
```
* Suppress false positive about `JaegerConfig`
Complaint was
```
synapse/logging/opentracing.py:450: error: Function "Type[Config]" could always be true in boolean context [truthy-function]
```
* Fix not calling `is_state()`
Oops!
```
tests/rest/client/test_third_party_rules.py:428: error: Function "Callable[[], bool]" could always be true in boolean context [truthy-function]
```
* Suppress false positives from ParamSpecs
````
synapse/logging/opentracing.py:971: error: Argument 2 to "_custom_sync_async_decorator" has incompatible type "Callable[[Arg(Callable[P, R], 'func'), **P], _GeneratorContextManager[None]]"; expected "Callable[[Callable[P, R], **P], _GeneratorContextManager[None]]" [arg-type]
synapse/logging/opentracing.py:1017: error: Argument 2 to "_custom_sync_async_decorator" has incompatible type "Callable[[Arg(Callable[P, R], 'func'), **P], _GeneratorContextManager[None]]"; expected "Callable[[Callable[P, R], **P], _GeneratorContextManager[None]]" [arg-type]
````
* Drive-by improvement to `wrapping_logic` annotation
* Workaround false "unreachable" positives
See https://github.com/Shoobx/mypy-zope/issues/91
```
tests/http/test_proxyagent.py:626: error: Statement is unreachable [unreachable]
tests/http/test_proxyagent.py:762: error: Statement is unreachable [unreachable]
tests/http/test_proxyagent.py:826: error: Statement is unreachable [unreachable]
tests/http/test_proxyagent.py:838: error: Statement is unreachable [unreachable]
tests/http/test_proxyagent.py:845: error: Statement is unreachable [unreachable]
tests/http/federation/test_matrix_federation_agent.py:151: error: Statement is unreachable [unreachable]
tests/http/federation/test_matrix_federation_agent.py:452: error: Statement is unreachable [unreachable]
tests/logging/test_remote_handler.py:60: error: Statement is unreachable [unreachable]
tests/logging/test_remote_handler.py:93: error: Statement is unreachable [unreachable]
tests/logging/test_remote_handler.py:127: error: Statement is unreachable [unreachable]
tests/logging/test_remote_handler.py:152: error: Statement is unreachable [unreachable]
```
* Changelog
* Tweak DBAPI2 Protocol to be accepted by mypy 1.0
Some extra context in:
- https://github.com/matrix-org/python-canonicaljson/pull/57
- https://github.com/python/mypy/issues/6002
- https://mypy.readthedocs.io/en/latest/common_issues.html#covariant-subtyping-of-mutable-protocol-members-is-rejected
* Pull in updated canonicaljson lib
so the protocol check just works
* Improve comments in opentracing
I tried to workaround the ignores but found it too much trouble.
I think the corresponding issue is
https://github.com/python/mypy/issues/12909. The mypy repo has a PR
claiming to fix this (https://github.com/python/mypy/pull/14677) which
might mean this gets resolved soon?
* Better annotation for INTERACTIVE_AUTH_CHECKERS
* Drive-by AUTH_TYPE annotation, to remove an ignore
* Fix MediaStorage type hint
* Typecheck tests.rest.media.v1.test_media_storage
* Changelog
* Remove assert and make the comment succinct
* Fix syntax for olddeps
* Perfer `type(x) is int` to `isinstance(x, int)`
This covered all additional instances I could see where `x` was
user-controlled.
The remaining cases are
```
$ rg -s 'isinstance.*[^_]int'
tests/replication/_base.py
576: if isinstance(obj, int):
synapse/util/caches/stream_change_cache.py
136: assert isinstance(stream_pos, int)
214: assert isinstance(stream_pos, int)
246: assert isinstance(stream_pos, int)
267: assert isinstance(stream_pos, int)
synapse/replication/tcp/external_cache.py
133: if isinstance(result, int):
synapse/metrics/__init__.py
100: if isinstance(calls, (int, float)):
synapse/handlers/appservice.py
262: assert isinstance(new_token, int)
synapse/config/_util.py
62: if isinstance(p, int):
```
which cover metrics, logic related to `jsonschema`, and replication and
data streams. AFAICS these are all internal to Synapse
* Changelog
It doesn't seem valid that HTML entities should appear in
the title field of oEmbed responses, but a popular WordPress
plug-in seems to do it.
There should not be harm in unescaping these.
Remove type hints from comments which have been added
as Python type hints. This helps avoid drift between comments
and reality, as well as removing redundant information.
Also adds some missing type hints which were simple to fill in.
Attempt to parse any valid information from an oEmbed response
(instead of bailing at the first unexpected data). This should allow
for more partial oEmbed data to be returned, resulting in better /
more URL previews, even if those URL previews are only partial.
We incorrectly didn't use the returned `Responder` if the client had
disconnected, which meant that the resource used by the Responder
wasn't correctly released.
In particular, this exhausted the thread pools so that *all* requests
timed out.
Media downloaded as part of a URL preview is normally deleted after two days.
However, while a background database migration is running, the process is
stopped. A long-running database migration can therefore cause the media
store to fill up with old preview files.
This logic was added in #2697 to make sure that we didn't try to run the expiry
without an index on `local_media_repository.created_ts`; the original logic that
needs that index was added in #2478 (in `get_url_cache_media_before`, as
amended by 93247a424a), and is still present.
Given that the background update was added before Synapse v1.0.0, just drop
this check and assume the index exists.
Fix https://github.com/matrix-org/synapse/issues/13016
## New error code and status
### Before
Previously, we returned a `404` for `/thumbnail` which isn't even in the spec.
```json
{
"errcode": "M_NOT_FOUND",
"error": "Not found [b'hs1', b'tefQeZhmVxoiBfuFQUKRzJxc']"
}
```
### After
What does the spec say?
> 400: The request does not make sense to the server, or the server cannot thumbnail the content. For example, the client requested non-integer dimensions or asked for negatively-sized images.
>
> *-- https://spec.matrix.org/v1.1/client-server-api/#get_matrixmediav3thumbnailservernamemediaid*
Now with this PR, we respond with a `400` when we don't have thumbnails to serve and we explain why we might not have any thumbnails.
```json
{
"errcode": "M_UNKNOWN",
"error": "Cannot find any thumbnails for the requested media ([b'example.com', b'12345']). This might mean the media is not a supported_media_format=(image/jpeg, image/jpg, image/webp, image/gif, image/png) or that thumbnailing failed for some other reason. (Dynamic thumbnails are disabled on this server.)",
}
```
> Cannot find any thumbnails for the requested media ([b'example.com', b'12345']). This might mean the media is not a supported_media_format=(image/jpeg, image/jpg, image/webp, image/gif, image/png) or that thumbnailing failed for some other reason. (Dynamic thumbnails are disabled on this server.)
---
We still respond with a 404 in many other places. But we can iterate on those later and maybe keep some in some specific places after spec updates/clarification: https://github.com/matrix-org/matrix-spec/issues/1122
We can also iterate on the bugs where Synapse doesn't thumbnail when it should in other issues/PRs.
* Make _iterate_over_text easier to read by using simple data structures
* Prefer a set of tags to ignore
In my tests, it's 4x faster to check for containment in a set of this size
* Add a stack size limit to _iterate_over_text
* Continue accepting the case where there is no body element
* Use an early return instead for None
Co-authored-by: Richard van der Hoff <richard@matrix.org>
Pull out `twitter:` meta tags when generating a preview and
use it to augment any `og:` meta tags.
Prefers Open Graph information over Twitter card information.
==============================
This release of Synapse adds a unique index to the `state_group_edges` table, in
order to prevent accidentally introducing duplicate information (for example,
because a database backup was restored multiple times). If your Synapse database
already has duplicate rows in this table, this could fail with an error and
require manual remediation.
Additionally, the signature of the `check_event_for_spam` module callback has changed.
The previous signature has been deprecated and remains working for now. Module authors
should update their modules to use the new signature where possible.
See [the upgrade notes](https://github.com/matrix-org/synapse/blob/develop/docs/upgrade.md#upgrading-to-v1600)
for more details.
Features
--------
- Add an option allowing users to use their password to reauthenticate for privileged actions even though password login is disabled. ([\#12883](https://github.com/matrix-org/synapse/issues/12883))
Bugfixes
--------
- Explicitly close `ijson` coroutines once we are done with them, instead of leaving the garbage collector to close them. ([\#12875](https://github.com/matrix-org/synapse/issues/12875))
Internal Changes
----------------
- Improve URL previews by not including the content of media tags in the generated description. ([\#12887](https://github.com/matrix-org/synapse/issues/12887))
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEEWMTnW8Z8khaaf90R+84KzgcyGG8FAmKQqMcACgkQ+84Kzgcy
GG9Z2Av+N+b/fvaB3D56UkFqTW/xLmCEyri65njcXU8625bWiLSPM6hssmyJB1FA
xlc2RBKr8QxlnHRS/v31wDtONC8YZ2O3fyzYPFfY1fF5Ul7Kg3XCzLeUH4/j1/Ar
5bqriDqaN9FQ/6QJybShXlA4l7lY1Fs0C4P23jDBgqfKjnlToeVLqhVA70dDaFu/
ir+vVprKCkQI1iqnYXwIxGRmgBzLWGoVqQFGbSI6hugGwXpGIyX7+2I+0v8tI6vA
SZ99vLFWcvnd6DJTyBhIeV22Ff4qA7eQsyPvSrMETdsaZmrxGlG+t332HNCgplv8
gv2gUpJL0br++5WTAX+nRc7HpfKo/74vKeTktqPmlvFP8kUOg+PbzmoJFUu21PhA
rnq5TzgsPHK0dqBhM1RC2vtOiJ5v3ZBqzJJzSRXl6lsFpWxxFmwesEcIDAYS0Nmh
QoJb7/L8cPCHksHvZM76bzB465tSH9NhuFYZQoLGHcpxa0kYekrdlYasP8U0FU7L
nF3C0Pgw
=D3F+
-----END PGP SIGNATURE-----
Merge tag 'v1.60.0rc2' into develop
Synapse 1.60.0rc2 (2022-05-27)
==============================
This release of Synapse adds a unique index to the `state_group_edges` table, in
order to prevent accidentally introducing duplicate information (for example,
because a database backup was restored multiple times). If your Synapse database
already has duplicate rows in this table, this could fail with an error and
require manual remediation.
Additionally, the signature of the `check_event_for_spam` module callback has changed.
The previous signature has been deprecated and remains working for now. Module authors
should update their modules to use the new signature where possible.
See [the upgrade notes](https://github.com/matrix-org/synapse/blob/develop/docs/upgrade.md#upgrading-to-v1600)
for more details.
Features
--------
- Add an option allowing users to use their password to reauthenticate for privileged actions even though password login is disabled. ([\#12883](https://github.com/matrix-org/synapse/issues/12883))
Bugfixes
--------
- Explicitly close `ijson` coroutines once we are done with them, instead of leaving the garbage collector to close them. ([\#12875](https://github.com/matrix-org/synapse/issues/12875))
Internal Changes
----------------
- Improve URL previews by not including the content of media tags in the generated description. ([\#12887](https://github.com/matrix-org/synapse/issues/12887))
Refactor and convert `Linearizer` to async. This makes a `Linearizer`
cancellation bug easier to fix.
Also refactor to use an async context manager, which eliminates an
unlikely footgun where code that doesn't immediately use the context
manager could forget to release the lock.
Signed-off-by: Sean Quah <seanq@element.io>
This implements an allow list for content types for which Synapse will attempt URL preview. If a URL resolves to a resource with a content type which isn't in the list, the download will terminate immediately.
This makes sense given that Synapse would never successfully generate a URL preview for such files in the first place, and helps prevent issues with streaming media servers, such as #8302.
Signed-off-by: Denis Kasak dkasak@termina.org.uk
By scraping Open Graph information from the HTML even
when an autodiscovery endpoint is found. The results are
then combined to capture as much information as possible
from the page.
* Splits the logic for parsing HTML from the resource handling code.
* Fix a circular import in the oEmbed code (which uses the HTML parsing code).
* Renames some of the HTML parsing methods to:
* Make it clear which methods are "internal" to the module.
* Clarify what the methods do.