* Correctly retry and back off if we get a HTTPerror response
* Refactor request sending to have better excpetions
MatrixFederationHttpClient blindly reraised exceptions to the caller
without differentiating "expected" failures (e.g. connection timeouts
etc) versus more severe problems (e.g. programming errors).
This commit adds a RequestSendFailed exception that is raised when
"expected" failures happen, allowing the TransactionQueue to log them as
warnings while allowing us to log other exceptions as actual exceptions.
prometheus_client 0.5 has a named-tuple Sample type with more member
than the old plain tuple had. This commit makes sure the unit test
detects this and changes the way it reads the sample.
Signed-off-by: Maarten de Vries <maarten@de-vri.es>
Allow for the creation of a support user.
A support user can access the server, join rooms, interact with other users, but does not appear in the user directory nor does it contribute to monthly active user limits.
This implements both a SAML2 metadata endpoint (at
`/_matrix/saml2/metadata.xml`), and a SAML2 response receiver (at
`/_matrix/saml2/authn_response`). If the SAML2 response matches what's been
configured, we complete the SSO login flow by redirecting to the client url
(aka `RelayState` in SAML2 jargon) with a login token.
What we don't yet have is anything to build a SAML2 request and redirect the
user to the identity provider. That is left as an exercise for the reader.
This is mostly factoring out the post-CAS-login code to somewhere we can reuse
it for other SSO flows, but it also fixes the userid mapping while we're at it.
* Rip out half-implemented m.login.saml2 support
This was implemented in an odd way that left most of the work to the client, in
a way that I really didn't understand. It's going to be a pain to maintain, so
let's start by ripping it out.
* drop undocumented dependency on dateutil
It turns out we were relying on dateutil being pulled in transitively by
pysaml2. There's no need for that bloat.
* Add better diagnostics to flakey keyring test
* fix interpolation fail
* Check logcontexts before and after each test
* update changelog
* update changelog
* Some words about garbage collections and logcontexts
* Do a GC after each test to fix logcontext leaks
This feels like an awful hack, but...
* changelog
Currently when fetching state groups from the data store we make two
hits two the database: once for members and once for non-members (unless
request is filtered to one or the other). This adds needless load to the
datbase, so this PR refactors the lookup to make only a single database
hit.
Debug tests
Try printing the channel
fix
Import and use six
Remove debugging
Disable captcha
Add some mocks
Define the URL
Fix the clock?
Less rendering?
use the other render
Complete the dummy auth stage
Fix last stage of the test
Remove mocks we don't need
Fixes a bug introduced in https://github.com/matrix-org/synapse/pull/1783 which
meant that single backslashes were not allowed in event field filters.
The intention here is to allow single-backslashes, but disallow
double-backslashes.
Broadly three things here:
* disable W504 which seems a bit whacko
* remove a bunch of `as e` expressions from exception handlers that don't use
them
* use `r""` for strings which include backslashes
Also, we don't use pep8 any more, so we can get rid of the duplicate config
there.
This test didn't do what it claimed to do, and what it claimed to do was the
same as test_cant_hide_direct_ancestors anyway.
This stuff is tested by sytest anyway.
when processing incoming transactions, it can be hard to see what's going on,
because we process a bunch of stuff in parallel, and because we may end up
recursively working our way through a chain of three or four events.
This commit creates a way to use logcontexts to add the relevant event ids to
the log lines.
Older Twisted (18.4.0) returns TimeoutError instead of
ConnectingCancelledError when connection times out.
This change allows tests to be compatible with this behaviour.
Signed-off-by: Oleg Girko <ol@infoserver.lv>
ExpiringCache required that `start()` be called before it would actually
start expiring entries. A number of places didn't do that.
This PR removes `start` from ExpiringCache, and automatically starts
backround reaping process on creation instead.
We want to wait until we have read the response body before we log the request
as complete, otherwise a confusing thing happens where the request appears to
have completed, but we later fail it.
To do this, we factor the salient details of a request out to a separate
object, which can then keep track of the txn_id, so that it can be logged.
Splits the state_group_cache in two.
One half contains normal state events; the other contains member events.
The idea is that the lazyloading common case of: "I want a subset of member events plus all of the other state" can be accomplished efficiently by splitting the cache into two, and asking for "all events" from the non-members cache, and "just these keys" from the members cache. This means we can avoid having to make DictionaryCache aware of these sort of complicated queries, whilst letting LL requests benefit from the caching.
Previously we were unable to sensibly use the caching and had to pull all state from the DB irrespective of the filtering, which made things slow. Hopefully fixes https://github.com/matrix-org/synapse/issues/3720.
This is the first tranche of support for room versioning. It includes:
* setting the default room version in the config file
* new room_version param on the createRoom API
* storing the version of newly-created rooms in the m.room.create event
* fishing the version of existing rooms out of the m.room.create event
on_notifier_poke no longer runs synchonously, so we have to do a different hack
to make sure that the replication data has been sent. Let's actually listen for
its arrival.
This is more involved than it might otherwise be, because the current
implementation just drops its logcontexts and runs everything in the sentinel
context.
It turns out that we aren't actually using a bunch of the functionality here
(notably suppress_failures and the fact that Distributor.fire returns a
deferred), so the easiest way to fix this is actually by simplifying a bunch of
code.
Newer syntax
attr.ib(factory=dict)
is just a syntactic sugar for
attr.ib(default=attr.Factory(dict))
It was introduced in newest version of attrs package (18.1.0)
and doesn't work with older versions.
We should either require minimum version of attrs to be 18.1.0,
or use older (slightly more verbose) syntax.
Requiring newest version is not a good solution because
Linux distributions may have older version of attrs (17.4.0 in Fedora 28),
and requiring to build (and package)
newer version just to use newer syntactic sugar in only one test
is just too much.
It's much better to fix that test to use older syntax.
Signed-off-by: Oleg Girko <ol@infoserver.lv>
We need to do a bit more validation when we get a server name, but don't want
to be re-doing it all over the shop, so factor out a separate
parse_and_validate_server_name, and do the extra validation.
Also, use it to verify the server name in the config file.
a61738b removed a call to run_on_reactor from a unit test, but that call was
doing something useful, in making the function in question asynchronous.
Reinstate the call and add a check that we are testing what we wanted to be
testing.
Make sure that server_names used in auth headers are sane, and reject them with
a sensible error code, before they disappear off into the depths of the system.
When _get_state_for_groups is given a wildcard filter, just do a complete
lookup. Hopefully this will give us the best of both worlds by not filling up
the ram if we only need one or two keys, but also making the cache still work
for the federation reader usecase.
This is only used by filter_events_for_client, so we can simplify the whole
thing by just doing one user at a time, and removing a dead storage function to
boot.
The transaction cache has some code which tries to stop it caching failures,
but if the callback function failed straight away, then things would happen
backwards and we'd end up with the failure stuck in the cache.
This simplifies things as it is, but will also allow us to change the
way we traverse topologically without having to update the way push
actions work.
So, it turns out that if you have a first `Deferred` `D1`, you can add a
callback which returns another `Deferred` `D2`, and `D2` must then complete
before any further callbacks on `D1` will execute (and later callbacks on `D1`
get the *result* of `D2` rather than `D2` itself).
So, `D1` might have `called=True` (as in, it has started running its
callbacks), but any new callbacks added to `D1` won't get run until `D2`
completes - so if you `yield D1` in an `inlineCallbacks` function, your `yield`
will 'block'.
In conclusion: some of our assumptions in `logcontext` were invalid. We need to
make sure that we don't optimise out the logcontext juggling when this
situation happens. Fortunately, it is easy to detect by checking `D1.paused`.
This closes#2602
v1auth was created to account for the differences in status code between
the v1 and v2_alpha revisions of the protocol (401 vs 403 for invalid
tokens). However since those protocols were merged, this makes the r0
version/endpoint internally inconsistent, and violates the
specification for the r0 endpoint.
This might break clients that rely on this inconsistency with the
specification. This is said to affect the legacy angular reference
client. However, I feel that restoring parity with the spec is more
important. Either way, it is critical to inform developers about this
change, in case they rely on the illegal behaviour.
Signed-off-by: Adrian Tschira <nota@notafile.com>
In most cases, we limit the number of prev_events for a given event to 10
events. This fixes a particular code path which created events with huge
numbers of prev_events.
This is a mixed commit that fixes various small issues
* print parentheses
* 01 is invalid syntax (it was octal in py2)
* [x for i in 1, 2] is invalid syntax
* six moves
Signed-off-by: Adrian Tschira <nota@notafile.com>
These worked accidentally before (python2 doesn't complain if you
compare incompatible types) but under py3 this blows up spectacularly
Signed-off-by: Adrian Tschira <nota@notafile.com>
Doing this I learned e.message was pretty shortlived, added in 2.6,
they realized it was a bad idea and deprecated it in 2.7
Signed-off-by: Adrian Tschira <nota@notafile.com>
Currently the handling of auto_join_rooms only works when a user
registers itself via public register api. Registrations via
registration_shared_secret and ModuleApi do not work
This auto_joins the users in the registration handler which enables
the auto join feature for all 3 registration paths.
This is related to issue #2725
Signed-Off-by: Matthias Kesler <krombel@krombel.de>
* Split state group persist into seperate storage func
* Add per database engine code for state group id gen
* Move store_state_group to StateReadStore
This allows other workers to use it, and so resolve state.
* Hook up store_state_group
* Fix tests
* Rename _store_mult_state_groups_txn
* Rename StateGroupReadStore
* Remove redundant _have_persisted_state_group_txn
* Update comments
* Comment compute_event_context
* Set start val for state_group_id_seq
... otherwise we try to recreate old state groups
* Update comments
* Don't store state for outliers
* Update comment
* Update docstring as state groups are ints
We extract the storage-independent bits of the state group resolution out to a
separate functiom, and stick it in a new handler, in preparation for its use
from the storage layer.
... instead of creating our own special SQLiteMemoryDbPool, whose purpose was a
bit of a mystery.
For some reason this makes one of the tests run slightly slower, so bump the
sleep(). Sorry.
Add federation_domain_whitelist
gives a way to restrict which domains your HS is allowed to federate with.
useful mainly for gracefully preventing a private but internet-connected HS from trying to federate to the wider public Matrix network
Twisted core doesn't have a general purpose one, so we need to write one
ourselves.
Features:
- All writing happens in background thread
- Supports both push and pull producers
- Push producers get paused if the consumer falls behind
It turns out that the only thing we use the __dict__ of LoggingContext for is
`request`, and given we create lots of LoggingContexts and then copy them every
time we do a db transaction or log line, using the __dict__ seems a bit
redundant. Let's try to optimise things by making the request attribute
explicit.
I'm still unclear on what the intended behaviour for
`verify_json_objects_for_server` is, but at least I now understand the
behaviour of most of the things it calls...
Most of the time was spent copying a dict to filter out sentinel values
that indicated that keys did not exist in the dict. The sentinel values
were added to ensure that we cached the non-existence of keys.
By updating DictionaryCache to keep track of which keys were known to
not exist itself we can remove a dictionary copy.
The cache wrappers had a habit of leaking the logcontext into the reactor while
the lookup function was running, and then not restoring it correctly when the
lookup function had completed. It's all the fault of
`preserve_context_over_{fn,deferred}` which are basically a bit broken.
Due to a failure to instantiate DeferredTimedOutError, time_bound_deferred
would throw a CancelledError when the deferred timed out, which was rather
confusing.
The `@cached` decorator on `KeyStore._get_server_verify_key` was missing
its `num_args` parameter, which meant that it was returning the wrong key for
any server which had more than one recorded key.
By way of a fix, change the default for `num_args` to be *all* arguments. To
implement that, factor out a common base class for `CacheDescriptor` and `CacheListDescriptor`.
Fix a bug in ``logcontext.preserve_fn`` which made it leak context into the
reactor, and add a test for it.
Also, get rid of ``logcontext.reset_context_after_deferred``, which tried to do
the same thing but had its own, different, set of bugs.
This was broken when device list updates were implemented, as Mailer
could no longer instantiate an AuthHandler due to a dependency on
federation sending.
Instead of calculating the size of the cache repeatedly, which can take
a long time now that it can use a callback, instead cache the size and
update that on insertion and deletion.
This requires changing the cache descriptors to have two caches, one for
pending deferreds and the other for the actual values. There's no reason
to evict from the pending deferreds as they won't take up any more
memory.
The old test expected an incorrect wrapping due to the preview function
not using unicode properly, so it got the wrong length.
Signed-off-by: Johannes Löthberg <johannes@kyriasis.com>
We might as well treat all refresh_tokens as invalid. Just return a 403 from
/tokenrefresh, so that we don't have a load of dead, untestable code hanging
around.
Still TODO: removing the table from the schema.
The 'time' caveat on the access tokens was something of a lie, since we weren't
enforcing it; more pertinently its presence stops us ever adding useful time
caveats.
Let's move in the right direction by not lying in our caveats.
Since we're not doing refresh tokens any more, we should start killing off the
dead code paths. /tokenrefresh itself is a bit of a thornier subject, since
there might be apps out there using it, but we can at least not generate
refresh tokens on new logins.
Allows delegating the password auth to an external module. This also
moves the LDAP auth to using this system, allowing it to be removed from
the synapse tree entirely in the future.
Some streams will occaisonally advance their positions without actually
having any new rows to send over federation. Currently this means that
the token will not advance on the workers, leading to them repeatedly
sending a slightly out of date token. This in turns requires the master
to hit the DB to check if there are any new rows, rather than hitting
the no op logic where we check if the given token matches the current
token.
This commit changes the API to always return an entry if the position
for a stream has changed, allowing workers to advance their tokens
correctly.
This is for two reasons:
1. Suppresses duplicates correctly, as the notifier doesn't do any
duplicate suppression.
2. Makes it easier to connect the AppserviceHandler to the replication
stream.
This includes:
- Splitting out methods of a class into stand alone functions, to make
them easier to test.
- Adding unit tests to split out functions, testing HTML -> preview.
- Handle the fact that elements in lxml may have tail text.
In the situation where all of a user's devices get deleted, we want to
indicate this to a client, so we want to return an empty dictionary, rather
than nothing at all.
for the email and http pushers rather than trying to make a single
method that will work with their conflicting requirements.
The http pusher needs to get the messages in ascending stream order, and
doesn't want to miss a message.
The email pusher needs to get the messages in descending timestamp order,
and doesn't mind if it misses messages.
1. Give the handler used for logging in unit tests a formatter, so that the
output is slightly more meaningful
2. Log some synapse.storage stuff, because it's useful.
A bit of a cleanup for background_updates, and make sure that the real
background updates have run before we start the unit tests, so that they don't
interfere with the tests.
implement a GET /devices endpoint which lists all of the user's devices.
It also returns the last IP where we saw that device, so there is some dancing
to fish that out of the user_ips table.
Record the device_id when we add a client ip; it's somewhat redundant as we
could get it via the access_token, but it will make querying rather easier.
This doesn't cover *all* of the registration flows, but it does cover the most
common ones: in particular: shared_secret registration, appservice
registration, and normal user/pass registration.
Pull device_id from the registration parameters. Register the device in the
devices table. Associate the device with the returned access and refresh
tokens. Profit.
* `RegistrationHandler.appservice_register` no longer issues an access token:
instead it is left for the caller to do it. (There are two of these, one in
`synapse/rest/client/v1/register.py`, which now simply calls
`AuthHandler.issue_access_token`, and the other in
`synapse/rest/client/v2_alpha/register.py`, which is covered below).
* In `synapse/rest/client/v2_alpha/register.py`, move the generation of
access_tokens into `_create_registration_details`. This means that the normal
flow no longer needs to call `AuthHandler.issue_access_token`; the
shared-secret flow can tell `RegistrationHandler.register` not to generate a
token; and the appservice flow continues to work despite the above change.
This is meant to be an *almost* non-functional change, with the exception that
it fixes what looks a lot like a bug in that it only calls
`auth_handler.add_threepid` and `add_pusher` once instead of three times.
The idea is to move the generation of the `access_token` out of
`registration_handler.register`, because `access_token`s now require a
device_id, and we only want to generate a device_id once registration has been
successful.
Add a 'devices' table to the storage, as well as a 'device_id' column to
refresh_tokens.
Allow the client to pass a device_id, and initial_device_display_name, to
/login. If login is successful, then register the device in the devices table
if it wasn't known already. If no device_id was supplied, make one up.
Associate the device_id with the access token and refresh token, so that we can
get at it again later. Ensure that the device_id is copied from the refresh
token to the access_token when the token is refreshed.
Use the pure-python ldap3 library, which eliminates the need for a
system dependency.
Offer both a `search` and `simple_bind` mode, for more sophisticated
ldap scenarios.
- `search` tries to find a matching DN within the `user_base` while
employing the `user_filter`, then tries the bind when a single
matching DN was found.
- `simple_bind` tries the bind against a specific DN by combining the
localpart and `user_base`
Offer support for STARTTLS on a plain connection.
The configuration was changed to reflect these new possibilities.
Signed-off-by: Martin Weinelt <hexa@darmstadt.ccc.de>
Renames ``load_config`` to ``load_or_generate_config``
Adds a method called ``load_config`` that just loads the
config.
The main synapse.app.homeserver will continue to use
``load_or_generate_config`` to retain backwards compat.
However new worker processes can use ``load_config`` to
load the config avoiding some of the cruft needed to generate
the config.
As the new ``load_config`` method is expected to be used by new
configs it removes support for the legacy commandline overrides
that ``load_or_generate_config`` supports
We change it so that each cache has an individual CacheMetric, instead
of having one global CacheMetric. This means that when a cache tries to
increment a counter it does not need to go through so many indirections.
* Add infrastructure to the presence handler to track sync requests in external processes
* Expire stale entries for dead external processes
* Add an http endpoint for making users as syncing
Add some docstrings and comments.
* Fixes
Access it directly from the homeserver itself. It already wasn't
inheriting from BaseHandler storing it on the Handlers object was
already somewhat dubious.