Commit Graph

3692 Commits

Author SHA1 Message Date
Richard van der Hoff
f01e2ca039
Use symbolic names for replication stream names (#7768)
This makes it much easier to find where streams are referenced.
2020-07-01 16:35:40 +01:00
Richard van der Hoff
244dbb04f7
Fix incorrect error message when database CTYPE was set incorrectly. (#7760) 2020-07-01 13:56:16 +01:00
Brendan Abolivier
74d3e177f0
Back out MSC2625 implementation (#7761) 2020-07-01 11:08:25 +01:00
Patrick Cloke
95e41f368b
Allow local media to be marked as safe from being quarantined. (#7718) 2020-06-22 08:04:14 -04:00
Brendan Abolivier
5a5cf6460e
Fix unread counts in sync
* Always return an unread_count in get_unread_event_push_actions_by_room_for_user
* Don't always expect unread_count to be there so we don't take out sync entirely if something goes wrong
2020-06-17 15:10:44 +01:00
Brendan Abolivier
46613aaf79
Implement unread counter (MSC2625) (#7673)
Implementation of https://github.com/matrix-org/matrix-doc/pull/2625
2020-06-17 10:58:32 +01:00
Erik Johnston
f6f7511a4c
Refactor getting replication updates from database. (#7636)
The aim here is to make it easier to reason about when streams are limited and when they're not, by moving the logic into the database functions themselves. This should mean we can kill of `db_query_to_update_function` function.
2020-06-16 17:10:28 +01:00
Patrick Cloke
231252516c
Fix "argument of type 'ObservableDeferred' is not iterable" error (#7708) 2020-06-16 12:01:18 -04:00
Dagfinn Ilmari Mannsåker
a3f11567d9
Replace all remaining six usage with native Python 3 equivalents (#7704) 2020-06-16 08:51:47 -04:00
Brendan Abolivier
6efb2b0ad4
Merge branch 'develop' into babolivier/mark_unread 2020-06-15 16:37:52 +01:00
Patrick Cloke
bd6dc17221
Replace iteritems/itervalues/iterkeys with native versions. (#7692) 2020-06-15 07:03:36 -04:00
Brendan Abolivier
fed493c5fd
Incorporate review 2020-06-15 09:58:55 +01:00
Patrick Cloke
2d11ea385c
Fix warnings about losing log context during UI auth. (#7688) 2020-06-12 15:01:00 -04:00
Brendan Abolivier
e186c660b1
Lint 2020-06-12 15:31:59 +01:00
Brendan Abolivier
e47e5a2dcd
Incorporate review bits 2020-06-12 15:13:12 +01:00
Brendan Abolivier
1e5a50302f
Pre-populate the unread_count column 2020-06-12 15:05:47 +01:00
Brendan Abolivier
9549d557ea
Don't update the schema version 2020-06-12 15:03:26 +01:00
Brendan Abolivier
cf92fbb8aa
Use attr instead of a dict 2020-06-12 15:02:15 +01:00
Brendan Abolivier
3cc7f43e8d
Fix summary rotation 2020-06-12 11:07:26 +01:00
Brendan Abolivier
cb6d4d07b1
Log for invalid values of notif 2020-06-11 18:30:31 +01:00
Brendan Abolivier
803291728c
Fix SQL 2020-06-11 18:25:25 +01:00
Brendan Abolivier
34fd1f7ab5
Fix schema update 2020-06-11 18:12:12 +01:00
Brendan Abolivier
d0f095625c
Lint 2020-06-11 18:04:43 +01:00
Brendan Abolivier
ce74a6685d
Save the count of unread messages to event_push_summary 2020-06-11 17:58:26 +01:00
Brendan Abolivier
df3323a7cf
Use temporary prefixes as per the MSC 2020-06-10 20:32:01 +01:00
Brendan Abolivier
c7b99a1180
Use a more efficient way of calculating counters 2020-06-10 17:54:33 +01:00
Brendan Abolivier
ef345c5a7b
Add a new unread_counter to sync responses 2020-06-10 16:21:16 +01:00
Brendan Abolivier
6f6a4bfc07
Rename dont_push into mark_unread 2020-06-10 14:24:01 +01:00
Brendan Abolivier
ec0a7b9034 Merge branch 'develop' into babolivier/mark_unread 2020-06-10 11:42:30 +01:00
Erik Johnston
664409b169
Fix bug in account data replication stream. (#7656)
* Ensure account data stream IDs are unique.

The account data stream is shared between three tables, and the maximum
allocated ID was tracked in a dedicated table. Updating the max ID
happened outside the transaction that allocated the ID, leading to a
race where if the server was restarted then the same ID could be
allocated but the max ID failed to be updated, leading it to be reused.

The ID generators have support for tracking across multiple tables, so
we may as well use that instead of a dedicated table.

* Fix bug in account data replication stream.

If the same stream ID was used in both global and room account data then
the getting updates for the replication stream would fail due to
`heapq.merge(..)` trying to compare a `str` with a `None`. (This is
because you'd have two rows like `(534, '!room')` and `(534, None)` from
the room and global account data tables).

Fix is just to order by stream ID, since we don't rely on the ordering
beyond that. The bug where stream IDs can be reused should be fixed now,
so this case shouldn't happen going forward.

Fixes #7617
2020-06-09 16:28:57 +01:00
Andrew Morgan
e91abfd291
async/await get_user_id_by_threepid (#7620)
Based on #7619 

async's `get_user_id_by_threepid` and its call stack.
2020-06-03 17:15:57 +01:00
Dagfinn Ilmari Mannsåker
df8a3cef6b
Improve performance of _get_state_groups_from_groups_txn (#7567)
The query keeps showing up in my slow query log.

This changes the plan under the top-level Sort node from

```
    WindowAgg  (cost=280335.88..292963.15 rows=561212 width=80) (actual time=138.651..160.562 rows=27112 loops=1)
      ->  Sort  (cost=280335.88..281738.91 rows=561212 width=84) (actual time=138.597..140.622 rows=27112 loops=1)
            Sort Key: state_groups_state.type, state_groups_state.state_key, state_groups_state.state_group
            Sort Method: quicksort  Memory: 4581kB
            ->  Nested Loop  (cost=2.83..226745.22 rows=561212 width=84) (actual time=21.548..47.657 rows=27112 loops=1)
                  ->  HashAggregate  (cost=2.27..3.28 rows=101 width=8) (actual time=21.526..21.535 rows=20 loops=1)
                        Group Key: state.state_group
                        ->  CTE Scan on state  (cost=0.00..2.02 rows=101 width=8) (actual time=21.280..21.493 rows=20 loops=1)
                  ->  Index Scan using state_groups_state_type_idx on state_groups_state  (cost=0.56..2189.40 rows=5557 width=84) (actual time=0.005..0.991 rows=1356 loops=20)
                        Index Cond: (state_group = state.state_group)
```

to

```
    Nested Loop  (cost=2.83..226745.22 rows=561212 width=84) (actual time=24.194..52.834 rows=27112 loops=1)
      ->  HashAggregate  (cost=2.27..3.28 rows=101 width=8) (actual time=24.130..24.138 rows=20 loops=1)
            Group Key: state.state_group
            ->  CTE Scan on state  (cost=0.00..2.02 rows=101 width=8) (actual time=23.887..24.113 rows=20 loops=1)
      ->  Index Scan using state_groups_state_type_idx on state_groups_state  (cost=0.56..2189.40 rows=5557 width=84) (actual time=0.016..1.159 rows=1356 loops=20)
            Index Cond: (state_group = state.state_group)
```

This cuts the execution time from ~190ms to ~130ms, i.e. a reduction
of ~30%.

The full plans are visualised at https://explain.depesz.com/s/WpbT and
https://explain.depesz.com/s/KlEk

Signed-off-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
2020-06-01 15:23:43 +01:00
Dagfinn Ilmari Mannsåker
2dc430d36e
Use upsert when inserting read receipts (#7607)
Fixes #7469

Signed-off-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
2020-06-01 10:53:06 +01:00
Andrew Morgan
0a6e837aaa
Fix incorrect placeholder syntax in database prepartion code (#7575)
We were using `logger` syntax which isn't supported by `Exception`s.
2020-05-27 16:26:59 +01:00
Richard van der Hoff
edd9a7214c
Replace device_27_unique_idx bg update with a fg one (#7562)
The bg update never managed to complete, because it kept being interrupted by
transactions which want to take a lock.

Just doing it in the foreground isn't that bad, and is a good deal simpler.
2020-05-26 11:43:17 +01:00
Richard van der Hoff
d14c4d6b6d
Simplify reap_monthly_active_users (#7558)
we can use `make_in_list_sql_clause` rather than doing our own half-baked
equivalent, which has the benefit of working just fine with empty lists.

(This has quite a lot of tests, so I think it's pretty safe)
2020-05-23 01:20:10 +01:00
Richard van der Hoff
f4269694ce
Optimise some references to hs.config (#7546)
These are surprisingly expensive, and we only really need to do them at startup.
2020-05-22 21:47:07 +01:00
Erik Johnston
e5c67d04db
Add option to move event persistence off master (#7517) 2020-05-22 16:11:35 +01:00
Erik Johnston
1531b214fc
Add ability to wait for replication streams (#7542)
The idea here is that if an instance persists an event via the replication HTTP API it can return before we receive that event over replication, which can lead to races where code assumes that persisting an event immediately updates various caches (e.g. current state of the room).

Most of Synapse doesn't hit such races, so we don't do the waiting automagically, instead we do so where necessary to avoid unnecessary delays. We may decide to change our minds here if it turns out there are a lot of subtle races going on.

People probably want to look at this commit by commit.
2020-05-22 14:21:54 +01:00
Brendan Abolivier
d1ae1015ec
Retry to sync out of sync device lists (#7453)
When a call to `user_device_resync` fails, we don't currently mark the remote user's device list as out of sync, nor do we retry to sync it.

https://github.com/matrix-org/synapse/pull/6776 introduced some code infrastructure to mark device lists as stale/out of sync.

This commit uses that code infrastructure to mark device lists as out of sync if processing an incoming device list update makes the device handler realise that the device list is out of sync, but we can't resync right now.

It also adds a looping call to retry all failed resync every 30s. This shouldn't cause too much spam in the logs as this commit also removes the "Failed to handle device list update for..." warning logs when catching `NotRetryingDestination`.

Fixes #7418
2020-05-21 17:41:12 +02:00
Erik Johnston
f6f92845f8
Fix bug in persist events when dealing with non member types. (#7548)
`_is_server_still_joined` will throw if it is given state updates with non-user ID state keys with local user leaves. This is actually rarely a problem since local leaves almost always get persisted by themselves.

(I discovered this on a branch that was otherwise broken, so I haven't seen this in the wild)
2020-05-21 13:20:10 +01:00
Richard van der Hoff
4d1afb1dfe
Merge pull request #7519 from matrix-org/rav/kill_py2_code
Kill off some old python 2 code
2020-05-18 10:45:30 +01:00
Richard van der Hoff
e6027562e2 remove builtins.buffer code from storage code
this is no longer needed on python 3
2020-05-15 19:37:41 +01:00
Richard van der Hoff
65902e08c3 remove to_ascii
this is a no-op on python 3.
2020-05-15 19:12:03 +01:00
Richard van der Hoff
08fa96f030 Remove exception_to_unicode
this is a no-op on python 3.
2020-05-15 19:07:24 +01:00
Richard van der Hoff
6c1f7c722f
Fix limit logic for AccountDataStream (#7384)
Make sure that the AccountDataStream presents complete updates, in the right
order.

This is much the same fix as #7337 and #7358, but applied to a different stream.
2020-05-15 19:03:25 +01:00
Andrew Morgan
34a43f0084 Fix a couple of small typos 2020-05-15 18:54:32 +01:00
Erik Johnston
03aff4c75e
Add a worker store for search insertion. (#7516)
This is required as both event persistence and the background update needs access to this function. It should be perfectly safe for two workers to write to that table at the same time.
2020-05-15 17:22:47 +01:00
Andrew Morgan
16090a077f
Prevent 0-member/null room_version rooms from appearing in group room queries (#7465) 2020-05-15 17:17:42 +01:00
Erik Johnston
1f36ff69e8
Move event stream handling out of slave store. (#7491)
This allows us to have the logic on both master and workers, which is necessary to move event persistence off master.

We also combine the instantiation of ID generators from DataStore and slave stores to the base worker stores. This allows us to select which process writes events independently of the master/worker splits.
2020-05-15 16:43:59 +01:00