PostgreSQL may underestimate the number of distinct `room_id`s in
`event_search`, which can cause it to use table scans for queries for
multiple rooms.
Fix this by setting `n_distinct` on the column.
Resolves#14402.
Signed-off-by: Sean Quah <seanq@matrix.org>
Support a unified search query syntax which leverages more of the full-text
search of each database supported by Synapse.
Supports, with the same syntax across Postgresql 11+ and Sqlite:
- quoted "search terms"
- `AND`, `OR`, `-` (negation) operators
- Matching words based on their stem, e.g. searches for "dog" matches
documents containing "dogs".
This is achieved by
- If on postgresql 11+, pass the user input to `websearch_to_tsquery`
- If on sqlite, manually parse the query and transform it into the sqlite-specific
query syntax.
Note that postgresql 10, which is close to end-of-life, falls back to using
`phraseto_tsquery`, which only supports a subset of the features.
Multiple terms separated by a space are implicitly ANDed.
Note that:
1. There is no escaping of full-text syntax that might be supported by the database;
e.g. `NOT`, `NEAR`, `*` in sqlite. This runs the risk that people might discover this
as accidental functionality and depend on something we don't guarantee.
2. English text is assumed for stemming. To support other languages, either the target
language needs to be known at the time of indexing the message (via room metadata,
or otherwise), or a separate index for each language supported could be created.
Sqlite docs: https://www.sqlite.org/fts3.html#full_text_index_queries
Postgres docs: https://www.postgresql.org/docs/11/textsearch-controls.html
Implement the /threads endpoint from MSC3856.
This is currently unstable and behind an experimental configuration
flag.
It includes a background update to backfill data, results from
the /threads endpoint will be partial until that finishes.
Applies the proper logic for unthreaded and threaded receipts to either
apply to all events in the room or only events in the same thread, respectively.
When retrieving counts of notifications segment the results based on the
thread ID, but choose whether to return them as individual threads or as
a single summed field by letting the client opt-in via a sync flag.
The summarization code is also updated to be per thread, instead of per
room.
c.f. #12993 (comment), point 3
This stores all device list updates that we receive while partial joins are ongoing, and processes them once we have the full state.
Note: We don't actually process the device lists in the same ways as if we weren't partially joined. Instead of updating the device list remote cache, we simply notify local users that a change in the remote user's devices has happened. I think this is safe as if the local user requests the keys for the remote user and we don't have them we'll simply fetch them as normal.