Instead of parsing the full response to `/send_join` into Python objects (which can be huge for large rooms) and *then* parsing that into events, we instead use ijson to stream parse the response directly into `EventBase` objects.
Part of #9744
Removes all redundant `# -*- coding: utf-8 -*-` lines from files, as python 3 automatically reads source code as utf-8 now.
`Signed-off-by: Jonathan de Jong <jonathan@automatia.nl>`
### Changes proposed in this PR
- Add support for the `no_proxy` and `NO_PROXY` environment variables
- Internally rely on urllib's [`proxy_bypass_environment`](bdb941be42/Lib/urllib/request.py (L2519))
- Extract env variables using urllib's `getproxies`/[`getproxies_environment`](bdb941be42/Lib/urllib/request.py (L2488)) which supports lowercase + uppercase, preferring lowercase, except for `HTTP_PROXY` in a CGI environment
This does contain behaviour changes for consumers so making sure these are called out:
- `no_proxy`/`NO_PROXY` is now respected
- lowercase `https_proxy` is now allowed and taken over `HTTPS_PROXY`
Related to #9306 which also uses `ProxyAgent`
Signed-off-by: Timothy Leung tim95@hotmail.co.uk
This reduces the memory usage of previewing media files which
end up larger than the `max_spider_size` by avoiding buffering
content internally in treq.
It also checks the `Content-Length` header in additional places
instead of streaming the content to check the body length.
- Update black version to the latest
- Run black auto formatting over the codebase
- Run autoformatting according to [`docs/code_style.md
`](80d6dc9783/docs/code_style.md)
- Update `code_style.md` docs around installing black to use the correct version
There are going to be a couple of paths to get to the final step of SSO reg, and I want the URL in the browser to consistent. So, let's move the final step onto a separate path, which we redirect to.
A reactor was being passed instead of a whitelist for the BlacklistingAgentWrapper
used by the WellyKnownResolver. This coulld cause exceptions when attempting to
connect to IP addresses that are blacklisted, but in reality this did not have any
observable affect since this code is not used for IP literals.
SynapseRequest is in danger of becoming a bit of a dumping-ground for "useful stuff relating to Requests",
which isn't really its intention (its purpose is to override render, finished and connectionLost to set up the
LoggingContext and write the right entries to the request log).
Putting utility functions inside SynapseRequest means that lots of our code ends up requiring a
SynapseRequest when there is nothing synapse-specific about the Request at all, and any old
twisted.web.iweb.IRequest will do. This increases code coupling and makes testing more difficult.
In short: move get_user_agent out to a utility function.
Replaces the `federation_ip_range_blacklist` configuration setting with an
`ip_range_blacklist` setting with wider scope. It now applies to:
* Federation
* Identity servers
* Push notifications
* Checking key validitity for third-party invite events
The old `federation_ip_range_blacklist` setting is still honored if present, but
with reduced scope (it only applies to federation and identity servers).
We do it this way round so that only the "owner" can delete the access token (i.e. `/logout/all` by the "owner" also deletes that token, but `/logout/all` by the "target user" doesn't).
A future PR will add an API for creating such a token.
When the target user and authenticated entity are different the `Processed request` log line will be logged with a: `{@admin:server as @bob:server} ...`. I'm not convinced by that format (especially since it adds spaces in there, making it harder to use `cut -d ' '` to chop off the start of log lines). Suggestions welcome.
Not being able to serialise `frozendicts` is fragile, and it's annoying to have
to think about which serialiser you want. There's no real downside to
supporting frozendicts, so let's just have one json encoder.
This allows trailing commas in multi-line arg lists.
Minor, but we might as well keep our formatting current with regard to
our minimum supported Python version.
Signed-off-by: Dan Callahan <danc@element.io>
* Remove `on_timeout_cancel` from `timeout_deferred`
The `on_timeout_cancel` param to `timeout_deferred` wasn't always called on a
timeout (in particular if the canceller raised an exception), so it was
unreliable. It was also only used in one place, and to be honest it's easier to
do what it does a different way.
* Fix handling of connection timeouts in outgoing http requests
Turns out that if we get a timeout during connection, then a different
exception is raised, which wasn't always handled correctly.
To fix it, catch the exception in SimpleHttpClient and turn it into a
RequestTimedOutError (which is already a documented exception).
Also add a description to RequestTimedOutError so that we can see which stage
it failed at.
* Fix incorrect handling of timeouts reading federation responses
This was trapping the wrong sort of TimeoutError, so was never being hit.
The effect was relatively minor, but we should fix this so that it does the
expected thing.
* Fix inconsistent handling of `timeout` param between methods
`get_json`, `put_json` and `delete_json` were applying a different timeout to
the response body to `post_json`; bring them in line and test.
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
Co-authored-by: Erik Johnston <erik@matrix.org>
slots use less memory (and attribute access is faster) while slightly
limiting the flexibility of the class attributes. This focuses on objects
which are instantiated "often" and for short periods of time.
c.f. #8021
A lot of the code here is to change the `Completed 200 OK` logging to include the request URI so that we can drop the `Sending request...` log line.
Some notes:
1. We won't log retries, which may be confusing considering the time taken log line includes retries and sleeps.
2. The `_send_request_with_optional_trailing_slash` will always be logged *without* the forward slash, even if it succeeded only with the forward slash.
This ended up being a bit more invasive than I'd hoped for (not helped by
generic_worker duplicating some of the code from homeserver), but hopefully
it's an improvement.
The idea is that, rather than storing unstructured `dict`s in the config for
the listener configurations, we instead parse it into a structured
`ListenerConfig` object.
* Expose `return_html_error`, and allow it to take a Jinja2 template instead of a raw string
* Clean up exception handling in SAML2ResponseResource
* use the existing code in `return_html_error` instead of re-implementing it
(giving it a jinja2 template rather than inventing a new form of template)
* do the exception-catching in the REST layer rather than in the handler
layer, to make sure we catch all exceptions.
Splitting based on the response code means we can avoid double logging here and identical information from line 164 while still logging at info if we don't get a good response and need to retry.
* Pull Sentinel out of LoggingContext
... and drop a few unnecessary references to it
* Factor out LoggingContext.current_context
move `current_context` and `set_context` out to top-level functions.
Mostly this means that I can more easily trace what's actually referring to
LoggingContext, but I think it's generally neater.
* move copy-to-parent into `stop`
this really just makes `start` and `stop` more symetric. It also means that it
behaves correctly if you manually `set_log_context` rather than using the
context manager.
* Replace `LoggingContext.alive` with `finished`
Turn `alive` into `finished` and make it a bit better defined.
A lot of the things we log at INFO are now a bit superfluous, so lets
make them DEBUG logs to reduce the amount we log by default.
Co-Authored-By: Brendan Abolivier <babolivier@matrix.org>
Co-authored-by: Brendan Abolivier <github@brendanabolivier.com>
The `http_proxy` and `HTTPS_PROXY` env vars can be set to a `host[:port]` value which should point to a proxy.
The address of the proxy should be excluded from IP blacklists such as the `url_preview_ip_range_blacklist`.
The proxy will then be used for
* push
* url previews
* phone-home stats
* recaptcha validation
* CAS auth validation
It will *not* be used for:
* Application Services
* Identity servers
* Outbound federation
* In worker configurations, connections from workers to masters
Fixes#4198.
Replace `client_secret` query parameter values with `<redacted>` in the logs. Prevents a scenario where a MITM of server traffic can horde 3pids on their account.
Python will return a tuple whether there are parentheses around the returned values or not.
I'm just sick of my editor complaining about this all over the place :)
Propagate opentracing contexts across workers
Also includes some Convenience modifications to opentracing for servlets, notably:
- Add boolean to skip the whitelisting check on inject
extract methods. - useful when injecting into carriers
locally. Otherwise we'd always have to include our
own servername and whitelist our servername
- start_active_span_from_request instead of header
- Add boolean to decide whether to extract context
from a request to a servlet
Add authenticated_entity and servlet_names tags.
Functionally:
- Add a tag for authenticated_entity
- Add a tag for servlet_names
Stylistically:
Moved to importing methods directly from opentracing.
This refactors MatrixFederationAgent to move the SRV lookup into the
endpoint code, this has two benefits:
1. Its easier to retry different host/ports in the same way as
HostnameEndpoint.
2. We avoid SRV lookups if we have a free connection in the pool
If we have recently seen a valid well-known for a domain we want to
retry on (non-final) errors a few times, to handle temporary blips in
networking/etc.
This gives a bit of a grace period where we can attempt to refetch a
remote `well-known`, while still using the cached result if that fails.
Hopefully this will make the well-known resolution a bit more torelant
of failures, rather than it immediately treating failures as "no result"
and caching that for an hour.
It costs both us and the remote server for us to fetch the well known
for every single request we send, so we add a minimum cache period. This
is set to 5m so that we still honour the basic premise of "refetch
frequently".
--------
- Fix a regression introduced in v1.2.0rc1 which led to incorrect labels on some prometheus metrics. ([\#5734](https://github.com/matrix-org/synapse/issues/5734))
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEgQG31Z317NrSMt0QiISIDS7+X/QFAl04Ur0THGFuZHJld0Bh
bW9yZ2FuLnh5egAKCRCIhIgNLv5f9F4oD/0TY6S/SEd2uAmzor64ojmbX5BOwPzf
j/wzUTrfvuf40EvkNPDpnejNZSvy/ysbaGQaQusv0SQKlV3xrvdn4RuMvnOWVWck
kBsO+lvzOaUTR0KHDxN4y9F5eI2NdPbub4847PPVzyqSIHAd+kolxXS8kSBBhwpL
yfaICWV/AOy5L7xN+JZ9IQpnegVAvUj5DmgXzDHd6VdeiHDVJuARaBgrR5uCkwVS
ZoLRqZ95XV/qiguMAUvPOwyEqht2mwO64989MswP16YYm8oMkB5QA6I5nYnACsTP
qk9YcN/oNvEfQXUhttku6MxK1/4yUMPUhEoDBDH7ebc0440QDtWN+IHTdA6oPVZB
IuStL9YGY16m7Ltx37ZUA4URfNMiSeLHo3zKc/mCAcwxN4HyOjJewtxbG5zKQAOZ
SMs8UcDwGR4zL1hnt8ZDNYtWwfzJBQIdGjoHvjXJEY7/1csTv2lmAwewFTXiqSAr
30GW5ews94kotqBK53zZT6V0F5gHNqgGHniOz1ZpqLLxYLqO3LSAGe97CrqlWUdX
GkhA9tZyweknociD9fyyBmKdcFJ4mL4a+oGI5CMnSMph8UvCY8Y5XMb1T+iYEABI
tA9G3mBvgkLPj+5V+8QggNkBafSigW2Q4FX7enGsDmiiskZOtfeKrAcVkapD4ooi
3I7IW5aetZr2IQ==
=+JBn
-----END PGP SIGNATURE-----
Merge tag 'v1.2.0rc2' into develop
Bugfixes
--------
- Fix a regression introduced in v1.2.0rc1 which led to incorrect labels on some prometheus metrics. ([\#5734](https://github.com/matrix-org/synapse/issues/5734))
* Fix servlet metric names
Co-Authored-By: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
* Remove redundant check
* Cover all return paths
* Configure and initialise tracer
Includes config options for the tracer and sets up JaegerClient.
* Scope manager using LogContexts
We piggy-back our tracer scopes by using log context.
The current log context gives us the current scope. If new scope is
created we create a stack of scopes in the context.
* jaeger is a dependency now
* Carrier inject and extraction for Twisted Headers
* Trace federation requests on the way in and out.
The span is created in _started_processing and closed in
_finished_processing because we need a meaningful log context.
* Create logcontext for new scope.
Instead of having a stack of scopes in a logcontext we create a new
context for a new scope if the current logcontext already has a scope.
* Remove scope from logcontext if logcontext is top level
* Disable tracer if not configured
* typo
* Remove dependence on jaeger internals
* bools
* Set service name
* :Explicitely state that the tracer is disabled
* Black is the new black
* Newsfile
* Code style
* Use the new config setup.
* Generate config.
* Copyright
* Rename config to opentracing
* Remove user whitelisting
* Empty whitelist by default
* User ConfigError instead of RuntimeError
* Use isinstance
* Use tag constants for opentracing.
* Remove debug comment and no need to explicitely record error
* Two errors a "s(c)entry"
* Docstrings!
* Remove debugging brainslip
* Homeserver Whitlisting
* Better opentracing config comment
* linting
* Inclue worker name in service_name
* Make opentracing an optional dependency
* Neater config retreival
* Clean up dummy tags
* Instantiate tracing as object instead of global class
* Inlcude opentracing as a homeserver member.
* Thread opentracing to the request level
* Reference opetnracing through hs
* Instantiate dummy opentracin g for tests.
* About to revert, just keeping the unfinished changes just in case
* Revert back to global state, commit number:
9ce4a3d9067bf9889b86c360c05ac88618b85c4f
* Use class level methods in tracerutils
* Start and stop requests spans in a place where we
have access to the authenticated entity
* Seen it, isort it
* Make sure to close the active span.
* I'm getting black and blue from this.
* Logger formatting
Co-Authored-By: Erik Johnston <erik@matrix.org>
* Outdated comment
* Import opentracing at the top
* Return a contextmanager
* Start tracing client requests from the servlet
* Return noop context manager if not tracing
* Explicitely say that these are federation requests
* Include servlet name in client requests
* Use context manager
* Move opentracing to logging/
* Seen it, isort it again!
* Ignore twisted return exceptions on context exit
* Escape the scope
* Scopes should be entered to make them useful.
* Nicer decorator names
* Just one init, init?
* Don't need to close something that isn't open
* Docs make you smarter
Prevents a SynapseError being raised inside of a IResolutionReceiver and instead opts to just return 0 results. This thus means that we have to lump a failed lookup and a blacklisted lookup together with the same error message, but the substitute should be generic enough to cover both cases.
Firstly, we always logged that the request was being handled via
`JsonResource._async_render`, so we change that to use the servlet name
we add to the request.
Secondly, we pass the exception information to the logger rather than
formatting it manually. This makes it consistent with other exception
logging, allwoing logging hooks and formatters to access the exception
information.