Commit Graph

8844 Commits

Author SHA1 Message Date
Richard van der Hoff
51d33d5178
Merge pull request #3961 from matrix-org/neilj/lock_mau_upserts
fix #3854 MAU transaction errors
2018-09-27 12:31:37 +01:00
Richard van der Hoff
333bee27f5 Include event when resolving state for missing prevs
If we have a forward extremity for a room as `E`, and you receive `A`, `B`,
s.t. `A -> B -> E`, and `B` also points to an unknown event `X`, then we need
to do state res between `X` and `E`.

When that happens, we need to make sure we include `X` in the state that goes
into the state res alg.

Fixes #3934.
2018-09-27 11:37:39 +01:00
Richard van der Hoff
bd61c82bdf Include state from remote servers in pdu handling
If we've fetched state events from remote servers in order to resolve the state
for a new event, we need to actually pass those events into
resolve_events_with_factory (so that it can do the state res) and then persist
the ones we need - otherwise other bits of the codebase get confused about why
we have state groups pointing to non-existent events.
2018-09-27 11:37:39 +01:00
Richard van der Hoff
a215b698c4 Fix "unhashable type: 'list'" exception in federation handling
get_state_groups returns a map from state_group_id to a list of FrozenEvents,
so was very much the wrong thing to be putting as one of the entries in the
list passed to resolve_events_with_factory (which expects maps from
(event_type, state_key) to event id).

We actually want get_state_groups_ids().values() rather than
get_state_groups().

This fixes the main problem in #3923, but there are other problems with this
bit of code which get discovered once you do so.
2018-09-27 11:37:39 +01:00
Richard van der Hoff
28223841e0 more comments 2018-09-27 11:31:51 +01:00
Richard van der Hoff
e3c159863d Clarifications in FederationHandler
* add some comments on things that look a bit bogus
* rename this `state` variable to avoid confusion with the `state` used
  elsewhere in this function. (There was no actual conflict, but it was
  a confusing bit of spaghetti.)
2018-09-27 11:31:51 +01:00
Richard van der Hoff
92abd3d6d6
Merge pull request #3966 from matrix-org/rav/rx_txn_logging_2
Logging improvements
2018-09-27 11:27:46 +01:00
Richard van der Hoff
4a15a3e4d5
Include eventid in log lines when processing incoming federation transactions (#3959)
when processing incoming transactions, it can be hard to see what's going on,
because we process a bunch of stuff in parallel, and because we may end up
recursively working our way through a chain of three or four events.

This commit creates a way to use logcontexts to add the relevant event ids to
the log lines.
2018-09-27 11:25:34 +01:00
Richard van der Hoff
ae6ad4cf41
docstrings and unittests for storage.state (#3958)
I spent ages trying to figure out how I was going mad...
2018-09-27 11:22:25 +01:00
Richard van der Hoff
e70b4ce069 Logging improvements
Some logging tweaks to help with debugging incoming federation transactions
2018-09-26 17:36:14 +01:00
Richard van der Hoff
a87d419a85 Run notify_app_services as a bg process
This ensures that its resource usage metrics get recorded somewhere rather than
getting lost.

(It also fixes an error when called from a nested logging context which
completes before the bg process)
2018-09-26 17:00:40 +01:00
Richard van der Hoff
9453c65948 remove spurious federation checks on localhost
There's really no point in checking for destinations called "localhost" because
there is nothing stopping people creating other DNS entries which point to
127.0.0.1. The right fix for this is
https://github.com/matrix-org/synapse/issues/3953.

Blocking localhost, on the other hand, means that you get a surprise when
trying to connect a test server on localhost to an existing server (with a
'normal' server_name).
2018-09-26 16:53:52 +01:00
Richard van der Hoff
607eec0456 fix docstring for FederationClient.get_state_for_room
trivial fixes for docstring
2018-09-26 16:52:24 +01:00
Neil Johnson
28781b65e7 fix #3854 2018-09-26 16:16:41 +01:00
Richard van der Hoff
8afddf7afe Fix error handling for missing auth_event
When we were authorizing an event, if there was no `m.room.create` in its
auth_events, we would raise a SynapseError with a cryptic message, which then
meant that we would bail out of processing any incoming events, rather than
storing a rejection for the faulty event and moving on.

We should treat the absent event the same as any other auth failure, by
raising an AuthError, so that the event is marked as rejected.
2018-09-26 14:40:16 +01:00
Richard van der Hoff
4e8276a34a
Merge pull request #3956 from matrix-org/rav/fix_expiring_cache_len
Fix ExpiringCache.__len__ to be accurate
2018-09-26 13:10:13 +01:00
Richard van der Hoff
5b4028fa78 Merge branch 'rav/fix_expiring_cache_len' into erikj/destination_retry_cache 2018-09-26 12:55:53 +01:00
Richard van der Hoff
7ee94fc1ba Log which cache is throwing exceptions 2018-09-26 12:43:08 +01:00
Amber Brown
66a1d57adb
Merge pull request #3948 from matrix-org/rav/no_symlink_synctl
Move synctl into top dir to avoid a symlink
2018-09-26 21:41:58 +10:00
Amber Brown
c2185f14d7
Merge pull request #3924 from matrix-org/rav/clean_up_on_receive_pdu
Comments and interface cleanup for on_receive_pdu
2018-09-26 21:41:26 +10:00
Erik Johnston
3baf6e1667 Fix ExpiringCache.__len__ to be accurate
It used to try and produce an estimate, which was sometimes negative.
This caused metrics to be sad, so lets always just calculate it from
scratch.

(This appears to have been a longstanding bug, but one which has been made more
of a problem by #3932 and #3933).

(This was originally done by Erik as part of #3933. I'm cherry-picking it
because really it's a fix in its own right)
2018-09-26 12:32:29 +01:00
Richard van der Hoff
a1cd37390f Merge remote-tracking branch 'origin/develop' into erikj/destination_retry_cache 2018-09-25 12:03:54 +01:00
Richard van der Hoff
4c3e7eeec5
Merge pull request #3932 from matrix-org/erikj/auto_start_expiring_caches
Fix some instances of ExpiringCache not expiring cache items
2018-09-25 12:02:57 +01:00
Jérémy Farnaud
6cf261930a added "media-src: 'self'" to CSP for resources (#3578)
Synapse doesn’t allow for media resources to be played directly from
Chrome. It is a problem for users on other networks (e.g. IRC)
communicating with Matrix users through a gateway. The gateway sends
them the raw URL for the resource when a Matrix user uploads a video
and the video cannot be played directly in Chrome using that URL.

Chrome argues it is not authorized to play the video because of the
Content Security Policy. Chrome checks for the "media-src" policy which
is missing, and defauts to the "default-src" policy which is "none".

As Synapse already sends "object-src: 'self'" I thought it wouldn’t be
a problem to add "media-src: 'self'" to the CSP to fix this problem.
2018-09-25 11:55:02 +01:00
Richard van der Hoff
94f7befc31
Merge pull request #3925 from matrix-org/erikj/fix_producers_unregistered
Fix spurious exceptions when client closes conncetion
2018-09-25 11:52:06 +01:00
Richard van der Hoff
c53336986d Move synctl into top dir to avoid a symlink
symlinks apparently break setuptools on python3 and alpine
(https://bugs.python.org/issue31940), so let's stop using a symlink and just
use the file directly.
2018-09-25 11:19:27 +01:00
Richard van der Hoff
a9d84f4e44 We require attrs 16.0.0
Ref: https://github.com/matrix-org/synapse/issues/3945
2018-09-25 10:43:39 +01:00
Matthew Hodgson
787d22ed6c
Only lazy load self-members on initial sync
Given we have disabled lazy loading for incr syncs in #3840, we can make self-LL more efficient by only doing it on initial sync.  Also adds a bounds check for if/when we change our mind, so that we don't try to include LL members on sync responses with no timeline.
2018-09-25 00:49:26 +01:00
Amber Brown
fbe5ba25f6 Merge branch 'master' into develop 2018-09-25 03:10:01 +10:00
Amber Brown
6b6cb32297 bump version 2018-09-25 02:54:34 +10:00
Amber Brown
04eed80a73 Merge branch 'master' into develop 2018-09-24 23:42:25 +10:00
Amber Brown
e302f40e20 update version 2018-09-24 23:40:05 +10:00
Erik Johnston
19dc676d1a Fix ExpiringCache.__len__ to be accurate
It used to try and produce an estimate, which was sometimes negative.
This caused metrics to be sad, so lets always just calculate it from
scratch.
2018-09-21 16:25:42 +01:00
Erik Johnston
fdd1a62e8d Add a five minute cache to get_destination_retry_timings
Hopefully helps with #3931
2018-09-21 14:56:12 +01:00
Erik Johnston
79eded1ae4 Make ExpiringCache slightly more performant 2018-09-21 14:52:21 +01:00
Erik Johnston
8601c24287 Fix some instances of ExpiringCache not expiring cache items
ExpiringCache required that `start()` be called before it would actually
start expiring entries. A number of places didn't do that.

This PR removes `start` from ExpiringCache, and automatically starts
backround reaping process on creation instead.
2018-09-21 14:19:46 +01:00
Erik Johnston
ad53a5497d
Merge pull request #3927 from matrix-org/erikj/handle_background_errors
Handle exceptions thrown by background tasks
2018-09-21 09:26:30 +01:00
Matthew Hodgson
a2ddaa90f2
Always LL ourselves if we're in a room to simplify clients (#3916)
Should fix https://github.com/vector-im/riot-web/issues/7209
2018-09-20 21:21:54 +01:00
Erik Johnston
94ae1dea3c Add missing logger 2018-09-20 17:05:34 +01:00
Erik Johnston
9ea408441f Handle exceptions thrown by background tasks
Fixes #3921
2018-09-20 16:15:21 +01:00
Erik Johnston
b28a7ed503 Fix spurious exceptions when client closes conncetion
If a HTTP handler throws an exception while processing a request we
automatically write a JSON error response. If the handler had already
started writing a response twisted throws an exception.

We should check for this case and simple abort the connection if there
was an error after the response had started being written.
2018-09-20 13:44:20 +01:00
Neil Johnson
23b53b4ef8
Merge pull request #3868 from matrix-org/neilj/fix_room_invite_mail_links
Neilj/fix room invite mail links
2018-09-20 13:32:38 +01:00
Richard van der Hoff
703de4ec13 Comments and interface cleanup for on_receive_pdu
Add some informative comments about what's going on here.

Also, `sent_to_us_directly` and `get_missing` were doing the same thing (apart
from in `_handle_queued_pdus`, which looks like a bug), so let's get rid of
`get_missing` and use `sent_to_us_directly` consistently.
2018-09-20 13:06:55 +01:00
Amber Brown
1f3f5fcf52
Fix client IPs being broken on Python 3 (#3908) 2018-09-20 20:14:34 +10:00
Erik Johnston
3fd68d533b
Merge pull request #3914 from matrix-org/erikj/remove_retry_cache
Remove get_destination_retry_timings cache
2018-09-20 10:54:49 +01:00
Amber Brown
aeca5a5ed5
Add a regression test for logging on failed connections (#3912) 2018-09-20 16:28:18 +10:00
Richard van der Hoff
642199570c
Improve the logging when handling a federation transaction (#3904)
Let's try to rationalise the logging that happens when we are processing an
incoming transaction, to make it easier to figure out what is going wrong when
they take ages. In particular:

- make everything start with a [room_id event_id] prefix
- make sure we log a warning when catching exceptions rather than just turning
  them into other, more cryptic, exceptions.
2018-09-19 17:28:18 +01:00
Erik Johnston
ce846bb620 Merge branch 'develop' of github.com:matrix-org/synapse into erikj/faster_typing 2018-09-19 15:08:36 +01:00
Erik Johnston
bbab6ebfd9 Fix up changelog and remove spurious comment 2018-09-19 14:45:14 +01:00
Erik Johnston
392a54128c pep8 2018-09-19 14:37:49 +01:00
Erik Johnston
b9158ac2bf Remove get_destination_retry_timings cache
Currently we rely on the master to invalidate this cache promptly.
However, after having moved most federation endpoints off of master this
no longer happens, causing outbound fedeariont to get blackholed.

Fixes #3798
2018-09-19 14:22:57 +01:00
Erik Johnston
80d2d50f47 Fixup 2018-09-19 11:19:47 +01:00
Erik Johnston
9407bcf37a Replace custom DeferredTimeoutError with defer.TimeoutError 2018-09-19 11:07:29 +01:00
Erik Johnston
6c48aa0256 Run canceller first to allow it to generate correct error 2018-09-19 11:07:27 +01:00
Erik Johnston
a334e1cace Update to use new timeout function everywhere.
The existing deferred timeout helper function (and the one into twisted)
suffer from a bug when a deferred's canceller throws an exception, #3842.

The new helper function doesn't suffer from this problem.
2018-09-19 10:39:40 +01:00
Amber Brown
47c02e6332
Merge pull request #3909 from turt2live/travis/fix-logging-1
Fix matrixfederationclient.py logging: Destination is a string
2018-09-19 18:14:47 +10:00
Amber Brown
3d6b24fb1b
Merge pull request #3907 from matrix-org/rav/set_sni_to_server_name
Set SNI to the server_name, not whatever was in the SRV record
2018-09-19 17:59:33 +10:00
Amber Brown
f773ecbd61
Merge pull request #3903 from matrix-org/rav/increase_get_missing_events_timeout
Bump timeout on get_missing_events request
2018-09-19 17:57:48 +10:00
Travis Ralston
35aec19f0a Destination is a string 2018-09-18 15:29:30 -06:00
Richard van der Hoff
38ead946a9 Merge remote-tracking branch 'origin/develop' into neilj/fix_room_invite_mail_links 2018-09-18 19:02:45 +01:00
Richard van der Hoff
a219ce8726
Use directory server for room joins (#3899)
When we do a join, always try the server we used for the alias lookup first.

Fixes #2418
2018-09-18 18:27:37 +01:00
Richard van der Hoff
31c15dcb80
Refactor matrixfederationclient to fix logging (#3906)
We want to wait until we have read the response body before we log the request
as complete, otherwise a confusing thing happens where the request appears to
have completed, but we later fail it.

To do this, we factor the salient details of a request out to a separate
object, which can then keep track of the txn_id, so that it can be logged.
2018-09-18 18:17:15 +01:00
Amber Brown
c600886d47
Merge pull request #3894 from matrix-org/hs/phone_home_py_version
Add python_version phone home stat
2018-09-19 02:40:04 +10:00
Richard van der Hoff
b3097396e7 Set SNI to the server_name, not whatever was in the SRV record
Fixes #3843
2018-09-18 17:01:12 +01:00
Richard van der Hoff
550007cb0e Bump timeout on get_missing_events request 2018-09-18 15:02:51 +01:00
Richard van der Hoff
286d6930b7
Merge pull request #3879 from matrix-org/matthew/fix-autojoin
don't ratelimit autojoins
2018-09-18 13:07:01 +01:00
Richard van der Hoff
1e09a1d48a
Merge pull request #3889 from matrix-org/rav/404_on_remove_unknown_alias
Return a 404 when deleting unknown room alias
2018-09-18 12:59:30 +01:00
Will Hunt
5baa087312
typo 2018-09-17 17:37:56 +01:00
Will Hunt
b58714789f
make pip happy? 2018-09-17 17:35:54 +01:00
Richard van der Hoff
ac80cb08fe Fix more b'abcd' noise in metrics 2018-09-17 17:16:50 +01:00
Will Hunt
9a1cceeca9
Use a string for versions 2018-09-17 17:09:06 +01:00
Richard van der Hoff
f75b9961c6 Reinstate missing null check 2018-09-17 16:52:02 +01:00
Will Hunt
2b39494cd5
Add python_version phone home stat 2018-09-17 16:35:18 +01:00
Richard van der Hoff
f00a9d2636 Fix some b'abcd' noise in logs and metrics
Python 3 compatibility: make sure that we decode some byte sequences before we
use them to create log lines and metrics labels.
2018-09-17 16:15:42 +01:00
Amber Brown
fe88907d04 version 2018-09-17 22:33:22 +10:00
Richard van der Hoff
85a43f4167 Return a 404 when deleting unknown room alias
As per https://github.com/matrix-org/matrix-doc/issues/1675

Fixes https://github.com/matrix-org/synapse/issues/2782
2018-09-17 13:19:00 +01:00
Matthew Hodgson
d42d79e3c3 don't ratelimit autojoins 2018-09-15 22:27:41 +01:00
Erik Johnston
24efb2a70d Fix timeout function
Turns out deferred.cancel sometimes throws, so we do that last to ensure
that we always do resolve the new deferred.
2018-09-15 11:38:39 +01:00
Erik Johnston
fcfe7a850d Add an awful secondary timeout to fix wedged requests
This is an attempt to mitigate #3842 by adding yet-another-timeout
2018-09-14 19:23:07 +01:00
Matthew Hodgson
024be6cf18
don't filter membership events based on history visibility (#3874)
don't filter membership events based on history visibility
as we will already have filtered the messages in the timeline, and state events
are always visible.

and because @erikjohnston said so.
2018-09-14 18:12:52 +01:00
Erik Johnston
3e6e94fe9f
Merge pull request #3872 from matrix-org/hawkowl/timeouts-2
timeouts 2: electric boogaloo
2018-09-14 16:58:44 +01:00
Amber Brown
bc9af88a2d fix 2018-09-15 00:26:00 +10:00
Erik Johnston
d0f6c1ce21 Remove spurious comment 2018-09-14 15:12:36 +01:00
Erik Johnston
9e2f9a7b57 Measure outbound requests 2018-09-14 15:11:26 +01:00
Erik Johnston
0a81038ea0 Add in flight real time metrics for Measure blocks 2018-09-14 15:08:37 +01:00
Neil Johnson
7de1989ea2 fix link for case that config.email_riot_base_url is set 2018-09-13 22:43:50 +01:00
Amber Brown
c971aa7b9d fix 2018-09-14 03:57:02 +10:00
Amber Brown
8f08d848f5 fix 2018-09-14 03:53:56 +10:00
Travis Ralston
f1a7264663
Fix minor typo in exception 2018-09-13 11:51:12 -06:00
Amber Brown
7c33ab76da redact better 2018-09-14 03:45:34 +10:00
Amber Brown
63755fa4c2 we do that higher up 2018-09-14 03:21:47 +10:00
Amber Brown
73884ebac5 Merge remote-tracking branch 'origin/develop' into hawkowl/timeouts-2 2018-09-14 03:11:25 +10:00
Amber Brown
7c27c4d51c
merge (#3576) 2018-09-14 03:11:11 +10:00
Amber Brown
1c3f4d9ca5 buffer? 2018-09-14 03:09:13 +10:00
Erik Johnston
6c0f8d9d50
Merge pull request #3856 from matrix-org/erikj/speed_up_purge
Make purge history slightly faster
2018-09-13 16:14:46 +01:00
Erik Johnston
ed5331a627 comment 2018-09-13 16:10:56 +01:00
Erik Johnston
89a76d1889 Fix handling of redacted events from federation
If we receive an event that doesn't pass their content hash check (e.g.
due to already being redacted) then we hit a bug which causes an
exception to be raised, which then promplty stops the event (and
request) from being processed.

This effects all sorts of federation APIs, including joining rooms with
a redacted state event.
2018-09-13 15:44:12 +01:00
Amber Brown
bfa0b759e0
Attempt to figure out what's going on with timeouts (#3857) 2018-09-14 00:15:51 +10:00
Erik Johnston
9cbd0094f0 pep8 2018-09-13 15:15:35 +01:00
Erik Johnston
9dbe38ea7d Create indices after insertion 2018-09-13 15:05:52 +01:00