Spawned from missing messages we were seeing on `matrix.org` from a
federated Gtiter bridged room, https://gitlab.com/gitterHQ/webapp/-/issues/2770.
The underlying issue in Synapse is tracked by https://github.com/matrix-org/synapse/issues/10066
where the message and join event race and the message is `soft_failed` before the
`join` event reaches the remote federated server.
Less soft_failed events = better and usually this should only trigger for events
where people are doing bad things and trying to fuzz and fake everything.
This PR updates the build tags that we perform Complement runs with to match our [buildkite pipeline](618b3e90bc/synapse/pipeline.yml (L570)), as well as adding `msc2403` (as it will be required once #9359 is merged). Build tags are what we use to determine which tests to run in Complement (really it determines which test files are compiled into the final binary).
I haven't put in a comment about updating the buildkite side here, as we've decided to migrate fully to GitHub Actions anyhow.
With the prior format, 1.33.0 / 1.33.1 / 1.33.2 got separate branches:
release-v1.33.0
release-v1.33.1
release-v1.33.2
Under the new model, all three would share a common branch:
release-v1.33
As before, RCs and actual releases exist as tags on these branches.
This better reflects our support model, e.g., that the "1.33" series had
a formal release followed by two patches / updates.
Signed-off-by: Dan Callahan <danc@element.io>
Fixes#1834.
`get_new_events_for_appservice` internally calls `get_events_as_list`, which will filter out any rejected events. If all returned events are filtered out, `_notify_interested_services` will return without updating the last handled stream position. If there are 100 consecutive such events, processing will halt altogether.
Breaking the loop is now done by checking whether we're up-to-date with `current_max` in the loop condition, instead of relying on an empty `events` list.
Signed-off-by: Willem Mulder <14mRh4X0r@gmail.com>
If backfilling is slow then the client may time out and retry, causing
Synapse to start a new `/backfill` before the existing backfill has
finished, duplicating work.
This adds quite a lot of OpenTracing decoration for database activity. Specifically it adds tracing at four different levels:
* emit a span for each "interaction" - ie, the top level database function that we tend to call "transaction", but isn't really, because it can end up as multiple transactions.
* emit a span while we hold a database connection open
* emit a span for each database transaction - actual actual transaction.
* emit a span for each database query.
I'm aware this might be quite a lot of overhead, but even just running it on a local Synapse it looks really interesting, and I hope the overhead can be offset just by turning down the sampling frequency and finding other ways of tracing requests of interest (eg, the `force_tracing_for_users` setting).
The existing tracing reports an error each time there is a timeout, which isn't
really representative.
Additionally, we log things about the way `wait_for_events` works
(eg, the result of the callback) to the *parent* span, which is confusing.
So that they render nicely in mdbook (see #10086), and so that we no longer have a mix of structured text languages in our documentation (excluding files outside of `docs/`).
Empirically, this helped my server considerably when handling gaps in Matrix HQ. The problem was that we would repeatedly call have_seen_events for the same set of (50K or so) auth_events, each of which would take many minutes to complete, even though it's only an index scan.
* Make `invalidate` and `invalidate_many` do the same thing
... so that we can do either over the invalidation replication stream, and also
because they always confused me a bit.
* Kill off `invalidate_many`
* changelog
`keylen` seems to be a thing that is frequently incorrectly set, and we don't really need it.
The only time it was used was to figure out if we had removed a subtree in `del_multi`, which we can do better by changing `TreeCache.pop` to return a different type (`TreeCacheNode`).
Commits should be independently reviewable.
* Fix /upload 500'ing when presented a very large image
Catch DecompressionBombError and re-raise as ThumbnailErrors
* Set PIL's MAX_IMAGE_PIXELS to match homeserver.yaml
to get it to bomb out quicker, to load less into memory
in the case of super large images
* Add changelog entry for 10029
https://github.com/matrix-org/synapse/issues/9962 uncovered that we accidentally removed all but one of the presence updates that we store in the database when persisting multiple updates. This could cause users' presence state to be stale.
The bug was fixed in #10014, and this PR just adds a test that failed on the old code, and was used to initially verify the bug.
The test attempts to insert some presence into the database in a batch using `PresenceStore.update_presence`, and then simply pulls it out again.
Fixes: https://github.com/matrix-org/synapse/issues/9962
This is a fix for above problem.
I fixed it by swaping the order of insertion of new records and deletion of old ones. This ensures that we don't delete fresh database records as we do deletes before inserts.
Signed-off-by: Marek Matys <themarcq@gmail.com>
Also add support for giving a callback to generate the JSON object to
verify. This should reduce memory usage, as we no longer have the event
in memory in dict form (which has a large memory footprint) for extend
periods of time.
Instead of parsing the full response to `/send_join` into Python objects (which can be huge for large rooms) and *then* parsing that into events, we instead use ijson to stream parse the response directly into `EventBase` objects.
To be more consistent with similar code. The check now automatically
raises an AuthError instead of passing back a boolean. It also absorbs
some shared logic between callers.
- use a tuple rather than a list for the iterable that is passed into the
wrapped function, for performance
- test that we can pass an iterable and that keys are correctly deduped.