This PR:
- creates an OpenMetrics server that enables collecting performance data from this process by e.g. a Prometheus server;
- exposes as metrics the performance of http requests with MatrixBot.
Further metrics may of course be added.
This started out as just a way to find out why mjolnir was syncing with lists several times for each update to a policy list.
The main changes are
- Verbosity was irrelevant to the sync command but for some reason was an option.
Unfortunately all this did was suppress whether to tell you when it had finished, meaning it wouldn't
when verbose logging was disabled. Historically this was probably a parameter that got passed through
to applyServerAcl/applyUserBans, which can be horribly verbose, but they access the config directly.
- Stop emitting `'PolicyList.update'` when there are no changes.
- Include a revision ID for the `'PolicyList.update'`method and event.
- Use the revision ID in the `ProtectedRoomsSet` so that we don't unnecessarily resynchronize all rooms when the `'PolicyList.update'` event is received. Though not when the `sync` command is used. Since this is supposed to `sync` in the case when there is a state reset or otherwise or the user has changed some room settings.
- insert an await lock around the `PolicyList.update` method to avoid a race condition where a call can be started and finished within the extent of an existing call (via another task, this can happen if the server is slow with handling one request). `PolicyList.udpate` now has a helper that is synchronous to be called directly after requesting the room state. The reason for this is to enforce that no one `await`s while updating the policy list's cache of rules. Which is important because it is one of the biggest methods that I tolerate and visually checking for `await` is impossible.
- The revision ID uses a ULID, but this is unnecessary and could have just been a "dumb counter".
closes https://github.com/matrix-org/mjolnir/issues/447
The Sentry package is very useful for monitoring runtime errors. With this PR,
we simply add the necessary mechanism to:
- log to sentry any uncaught error that reaches the toplevel, including startup errors.
* upgrade to matrix-appservice-bridge 8.0.0
this is so we can use their new logger
* Configure and use matrix-appservice-bridge's `Logger`
https://github.com/matrix-org/mjolnir/issues/422
Haven't changed all of the mjolnir components to use this,
just the appservice.
The fact that we've configured this properly means we get
logging from matrix-appservice-bridge components too (we didn't before).
* use try/catch instead
Mjolnir can now be run as an application service,
meaning it will host multiple independent mjolnirs that can be requested by users.
If the user is on the same homeserver as the appservice is deployed on,
then they can provision a mjolnir via a widget https://github.com/matrix-org/mjolnir-widget.
Otherwise they can invite the appservice bot to a room they want to protect.
This will create them a mjolnir, a management room and a policy list.
The appservice shares the same docker image as the bot,
but is started slightly differently by specifying "appservice"
as the first argument to docker run (this s managed by `mjolnir-entrypoint.sh`.
We could have used another Dockerfile for the appservice,
extending the existing one but we decided not to because there
would have been lots of fiddling around the entrypoint
and logistics involved around adding a tag for it via github actions.
Not to mention that this would be duplicating the image
just to run it with a different binary.
A list of followup issues can be found here https://github.com/issues?q=is%3Aopen+is%3Aissue+author%3AGnuxie+archived%3Afalse+label%3AA-Appservice.
Somewhat relevant and squashed commit messages(regrettably squashing because frankly these won't make sense in isolation):
* draft widget backend
* add `managementRoomId` to `provisionNewMjolnir`
* remove ratelimits from appservice mjolnirs
* add /join endpoint to api backend
* tighter guard around room type in PolicyList
matrix-bot-sdk imporved the types for this
* enable esModuleInterop
* launch and use postgres in a container whilst using mx-tester
* limited access control
policy list used for access control
* Redesign initialization API of many mjolnir.
It's much harder to forget to initialize the components now that you have to in order to construct them in the first place.
* Ammend config not to clash with existing CI
this means that the appsrvice bot is now called 'mjolnir-bot' by default
which was easier than going through old code base and renaming
* Change entrypoint in Dockerfile so that we can start the appservice.
We could have used another Dockerfile for the appservice,
extending the exising one but we decided not to because there
would have been lots of fiddling around the entrypoint
and logistics involved around adding a tag for it via github actions.
Not to mention that this would be duplicating the image
just to run it with a different binary.
This solution is much simpler, backwards compatible, and conscious about the future.
Co-authored-by: gnuxie <gnuxie@element.io>
The combination of `resyncJoinedRooms`, `unprotectedWatchedListRooms`,
`explicitlyProtectedRoomIds`, `protectedJoinedRoomIds` was incomprehensible.
https://github.com/matrix-org/mjolnir/issues/370
Separating out the management of `explicitlyProtectedRoomIds`, then
making sure all policy lists have to be explicitly protected
(in either setting of `config.protectAllJoinedRooms`) will make
this code much much simpler.
We will later change the `status` command to explicitly show
which lists are watched and which are watched and protected.
* Remove debug leftovers from a test.
This is really terrible and has meant whenever anyone has run `yarn test:integration` they have only been running this test.
💀💀💀https://www.youtube.com/watch?v=jmX-tzSOFE0
* Set a default timeout for integration tests that is 5 minutes long.
Seriously, I don't think there is much to gain by making people guess
a reasnoble time for a test to complete in all the time, especially
with how much Synapse changes in response time and all of the machines
involved in running these tests.
* Warn when giving up on being throttled
* For some reason it takes longer for events to appear in /state
no i am not going to track down why yet.
* Rate limiting got a lot more aggresive.
https://github.com/matrix-org/synapse/pull/13018
Rate limiting in Synapse used to reset the burst count and remove
the backoff when you were spamming continuously, now it doesn't.
Ideally we'd rewrite the rate limiting logic to back off for longer
than suggested so we could get burst again, but for now
lets just unblock CI by reducing the number of events we send in these
tests.
The reason we want this is so that people do not forget to change the version number in the synapse module.
The precedent is that the version number has been 0.1.0 since the begging until now.
While this solution does mean that there may be new version of the module where
nothing has actually changed, this is still better than not changing the version at all.
Another version scheme for the module would be inconsistent to
the git repository tags and that has the potential to cause much more confusion
than "blank" version bumps.
If this is a problem, then antispam must be extracted to another repository.
In order to test this, run `yarn version --patch` observe the changes with `git log` and `git diff HEAD~`,
then YOU MUST delete the tag with `git tag --delete vd.d.d` when you are finished.
* more robust