We start all pushers on start up and immediately start a background
process to fetch push to send. This makes start up incredibly painful
when dealing with many pushers.
Instead, let's do a quick fast DB check to see if there *may* be push to
send and only start the background processes for those pushers. We also
stagger starting up and doing those checks so that we don't try and
handle all pushers at once.
This brings it into line with on_new_notifications and on_new_receipts. It
requires a little bit of hoop-jumping in EmailPusher to load the throttle
params before the first loop.
`on_new_notifications` and `on_new_receipts` in `HttpPusher` and `EmailPusher`
now always return synchronously, so we can remove the `defer.gatherResults` on
their results, and the `run_as_background_process` wrappers can be removed too
because the PusherPool methods will now complete quickly enough.
Each pusher has its own loop which runs for as long as it has work to do. This
should run in its own background thread with its own logcontext, as other
similar loops elsewhere in the system do - which means that CPU usage is
consistently attributed to that loop, rather than to whatever request happened
to start the loop.
There were a bunch of places where we fire off a process to happen in the
background, but don't have any exception handling on it - instead relying on
the unhandled error being logged when the relevent deferred gets
garbage-collected.
This is unsatisfactory for a number of reasons:
- logging on garbage collection is best-effort and may happen some time after
the error, if at all
- it can be hard to figure out where the error actually happened.
- it is logged as a scary CRITICAL error which (a) I always forget to grep for
and (b) it's not really CRITICAL if a background process we don't care about
fails.
So this is an attempt to add exception handling to everything we fire off into
the background.
Update the last stream ordering if the
`get_unread_push_actions_for_user_in_range_for_email` returns no new
push actions. This reduces the range that it needs to check next
iteration.
for the email and http pushers rather than trying to make a single
method that will work with their conflicting requirements.
The http pusher needs to get the messages in ascending stream order, and
doesn't want to miss a message.
The email pusher needs to get the messages in descending timestamp order,
and doesn't mind if it misses messages.
Were it not for that fact that you can't use the base handler in the pusher because it pulls in the world. Comitting while I fix that on a different branch.
* After initial 10 minute window, only alert every 24h for room notifs
* Reset room state after 6h of idleness
* Synchronise throttles for messages sent in the same notif, so the 24 hourly notifs 'line up'
* Fix the email subjects to say what triggered the notification
* Order the rooms in reverse activity order in the email, so the 'reason' room should always come first