Support generating structured logs in addition to standard logs. (#8607)

This modifies the configuration of structured logging to be usable from
the standard Python logging configuration.

This also separates the formatting of logs from the transport allowing
JSON logs to files or standard logs to sockets.
Patrick Cloke 2020-10-29 07:27:37 -04:00 committed by GitHub
parent 9a7e0d2ea6
commit 00b24aa545
19 changed files with 715 additions and 1020 deletions

@ -75,6 +75,22 @@ for example:
wget https://packages.matrix.org/debian/pool/main/m/matrix-synapse-py3/matrix-synapse-py3_1.3.0+stretch1_amd64.deb
dpkg -i matrix-synapse-py3_1.3.0+stretch1_amd64.deb
Upgrading to v1.23.0
====================
Structured logging configuration breaking changes
-------------------------------------------------
This release deprecates use of the ``structured: true`` logging configuration for
structured logging. If your logging configuration contains ``structured: true``
then it should be modified based on the `structured logging documentation
<https://github.com/matrix-org/synapse/blob/master/docs/structured_logging.md>`_.
The ``structured`` and ``drains`` logging options are now deprecated and should
be replaced by standard logging configuration of ``handlers`` and ``formatters``.
A future release of Synapse will make using ``structured: true`` an error.

Upgrading to v1.22.0
====================

changelog.d/8607.misc Normal file

@ -0,0 +1 @@
Re-organize the structured logging code to separate the TCP transport handling from the JSON formatting.


@ -3,7 +3,11 @@
# This is a YAML file containing a standard Python logging configuration
# dictionary. See [1] for details on the valid settings.
#
# Synapse also supports structured logging for machine readable logs which can
# be ingested by ELK stacks. See [2] for details.
#
# [1]: https://docs.python.org/3.7/library/logging.config.html#configuration-dictionary-schema
# [2]: https://github.com/matrix-org/synapse/blob/master/docs/structured_logging.md

version: 1


@ -1,11 +1,116 @@
# Structured Logging

A structured logging system can be useful when your logs are destined for a
machine to parse and process. By maintaining its machine-readable characteristics,
it enables more efficient searching and aggregations when consumed by software
such as the "ELK stack".

Synapse's structured logging system is configured via the file that Synapse's
`log_config` config option points to. The file should include a formatter which
uses the `synapse.logging.TerseJsonFormatter` class included with Synapse and a
handler which uses the above formatter.

There is also a `synapse.logging.JsonFormatter` option which does not include
a timestamp in the resulting JSON. This is useful if the log ingester adds its
own timestamp.

A structured logging configuration looks similar to the following:
```yaml
version: 1

formatters:
    structured:
        class: synapse.logging.TerseJsonFormatter

handlers:
    file:
        class: logging.handlers.TimedRotatingFileHandler
        formatter: structured
        filename: /path/to/my/logs/homeserver.log
        when: midnight
        backupCount: 3  # Does not include the current log file.
        encoding: utf8

loggers:
    synapse:
        level: INFO
        handlers: [file]
    synapse.storage.SQL:
        level: WARNING
```
The above logging config will set Synapse as 'INFO' logging level by default,
with the SQL layer at 'WARNING', and will log to a file, stored as JSON.
It is also possible to configure Synapse to log to a remote endpoint by using the
`synapse.logging.RemoteHandler` class included with Synapse. It takes the
following arguments:
- `host`: Hostname or IP address of the log aggregator.
- `port`: Numerical port to contact on the host.
- `maximum_buffer`: (Optional, defaults to 1000) The maximum buffer size to allow.
A remote structured logging configuration looks similar to the following:
```yaml
version: 1

formatters:
    structured:
        class: synapse.logging.TerseJsonFormatter

handlers:
    remote:
        class: synapse.logging.RemoteHandler
        formatter: structured
        host: 10.1.2.3
        port: 9999

loggers:
    synapse:
        level: INFO
        handlers: [remote]
    synapse.storage.SQL:
        level: WARNING
```
The above logging config will set Synapse as 'INFO' logging level by default,
with the SQL layer at 'WARNING', and will log JSON formatted messages to a
remote endpoint at 10.1.2.3:9999.
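
For a sense of the line format these handlers emit, here is a minimal sketch (assuming Synapse is importable) that formats a single record with `TerseJsonFormatter` directly; real records will also carry any extra attributes added by logging filters, such as `server_name`:

```python
import logging

from synapse.logging import TerseJsonFormatter

# Build a standalone record and format it the way a handler would.
record = logging.makeLogRecord(
    {
        "name": "synapse.storage.SQL",
        "levelname": "WARNING",
        "msg": "Query took %d ms",
        "args": (250,),
    }
)
print(TerseJsonFormatter().format(record))
# Prints something like:
# {"log":"Query took 250 ms","namespace":"synapse.storage.SQL","level":"WARNING","time":1603900000.12}
```
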
## Upgrading from legacy structured logging configuration
Versions of Synapse prior to v1.23.0 included a custom structured logging
configuration which is deprecated. It used a `structured: true` flag and
configured `drains` instead of `handlers` and `formatters`.
Synapse currently automatically converts the old configuration to the new
configuration, but this will be removed in a future version of Synapse. The
following reference can be used to update your configuration. Based on the drain
`type`, we can pick a new handler:
1. For a type of `console`, `console_json`, or `console_json_terse`: a handler
with a class of `logging.StreamHandler` and a `stream` of `ext://sys.stdout`
or `ext://sys.stderr` should be used.
2. For a type of `file` or `file_json`: a handler of `logging.FileHandler` with
a location of the file path should be used.
3. For a type of `network_json_terse`: a handler of `synapse.logging.RemoteHandler`
with the host and port should be used.
Then based on the drain `type` we can pick a new formatter:
1. For a type of `console` or `file` no formatter is necessary.
2. For a type of `console_json` or `file_json`: a formatter of
`synapse.logging.JsonFormatter` should be used.
3. For a type of `console_json_terse` or `network_json_terse`: a formatter of
`synapse.logging.TerseJsonFormatter` should be used.
Each new handler and formatter should be added to the logging configuration
and then assigned to either a logger or the root logger.
An example legacy configuration:
```yaml
structured: true
@ -24,60 +129,33 @@ drains:
location: homeserver.log
```
The above logging config will set Synapse as 'INFO' logging level by default, with the SQL layer at 'WARNING', and will have two logging drains (to the console and to a file, stored as JSON).

Would be converted into a new configuration:

```yaml
version: 1

formatters:
    json:
        class: synapse.logging.JsonFormatter

handlers:
    console:
        class: logging.StreamHandler
        stream: ext://sys.stdout
    file:
        class: logging.FileHandler
        formatter: json
        filename: homeserver.log

loggers:
    synapse:
        level: INFO
        handlers: [console, file]
    synapse.storage.SQL:
        level: WARNING
```

The new logging configuration is a bit more verbose, but significantly more
flexible. It allows for configurations that were not previously possible, such as
sending plain logs over the network, or using different handlers for different
modules.

## Drain Types

Drain types can be specified by the `type` key.

### `console`

Outputs human-readable logs to the console.

Arguments:

- `location`: Either `stdout` or `stderr`.

### `console_json`

Outputs machine-readable JSON logs to the console.

Arguments:

- `location`: Either `stdout` or `stderr`.

### `console_json_terse`

Outputs machine-readable JSON logs to the console, separated by newlines. This
format is not designed to be read and re-formatted into human-readable text, but
is optimal for a logging aggregation system.

Arguments:

- `location`: Either `stdout` or `stderr`.

### `file`

Outputs human-readable logs to a file.

Arguments:

- `location`: An absolute path to the file to log to.

### `file_json`

Outputs machine-readable logs to a file.

Arguments:

- `location`: An absolute path to the file to log to.

### `network_json_terse`

Delivers machine-readable JSON logs to a log aggregator over TCP. This is
compatible with LogStash's TCP input with the codec set to `json_lines`.

Arguments:

- `host`: Hostname or IP address of the log aggregator.
- `port`: Numerical port to contact on the host.


@ -80,7 +80,7 @@ else
# then lint everything! # then lint everything!
if [[ -z ${files+x} ]]; then if [[ -z ${files+x} ]]; then
# Lint all source code files and directories # Lint all source code files and directories
files=("synapse" "tests" "scripts-dev" "scripts" "contrib" "synctl" "setup.py") files=("synapse" "tests" "scripts-dev" "scripts" "contrib" "synctl" "setup.py" "synmark")
fi fi
fi fi


@ -23,7 +23,6 @@ from string import Template
import yaml import yaml
from twisted.logger import ( from twisted.logger import (
ILogObserver,
LogBeginner, LogBeginner,
STDLibLogObserver, STDLibLogObserver,
eventAsText, eventAsText,
@ -32,11 +31,9 @@ from twisted.logger import (
import synapse import synapse
from synapse.app import _base as appbase from synapse.app import _base as appbase
from synapse.logging._structured import ( from synapse.logging._structured import setup_structured_logging
reload_structured_logging,
setup_structured_logging,
)
from synapse.logging.context import LoggingContextFilter from synapse.logging.context import LoggingContextFilter
from synapse.logging.filter import MetadataFilter
from synapse.util.versionstring import get_version_string from synapse.util.versionstring import get_version_string
from ._base import Config, ConfigError from ._base import Config, ConfigError
@ -48,7 +45,11 @@ DEFAULT_LOG_CONFIG = Template(
# This is a YAML file containing a standard Python logging configuration
# dictionary. See [1] for details on the valid settings.
#
# Synapse also supports structured logging for machine readable logs which can
# be ingested by ELK stacks. See [2] for details.
#
# [1]: https://docs.python.org/3.7/library/logging.config.html#configuration-dictionary-schema
# [2]: https://github.com/matrix-org/synapse/blob/master/docs/structured_logging.md

version: 1
@ -176,11 +177,11 @@ class LoggingConfig(Config):
log_config_file.write(DEFAULT_LOG_CONFIG.substitute(log_file=log_file)) log_config_file.write(DEFAULT_LOG_CONFIG.substitute(log_file=log_file))
def _setup_stdlib_logging(config, log_config, logBeginner: LogBeginner): def _setup_stdlib_logging(config, log_config_path, logBeginner: LogBeginner) -> None:
""" """
Set up Python stdlib logging. Set up Python standard library logging.
""" """
if log_config is None: if log_config_path is None:
log_format = ( log_format = (
"%(asctime)s - %(name)s - %(lineno)d - %(levelname)s - %(request)s" "%(asctime)s - %(name)s - %(lineno)d - %(levelname)s - %(request)s"
" - %(message)s" " - %(message)s"
@ -196,7 +197,8 @@ def _setup_stdlib_logging(config, log_config, logBeginner: LogBeginner):
handler.setFormatter(formatter) handler.setFormatter(formatter)
logger.addHandler(handler) logger.addHandler(handler)
else: else:
logging.config.dictConfig(log_config) # Load the logging configuration.
_load_logging_config(log_config_path)
# We add a log record factory that runs all messages through the # We add a log record factory that runs all messages through the
# LoggingContextFilter so that we get the context *at the time we log* # LoggingContextFilter so that we get the context *at the time we log*
@ -204,12 +206,14 @@ def _setup_stdlib_logging(config, log_config, logBeginner: LogBeginner):
# filter options, but care must when using e.g. MemoryHandler to buffer # filter options, but care must when using e.g. MemoryHandler to buffer
# writes. # writes.
log_filter = LoggingContextFilter(request="") log_context_filter = LoggingContextFilter(request="")
log_metadata_filter = MetadataFilter({"server_name": config.server_name})
old_factory = logging.getLogRecordFactory() old_factory = logging.getLogRecordFactory()
def factory(*args, **kwargs): def factory(*args, **kwargs):
record = old_factory(*args, **kwargs) record = old_factory(*args, **kwargs)
log_filter.filter(record) log_context_filter.filter(record)
log_metadata_filter.filter(record)
return record return record
logging.setLogRecordFactory(factory) logging.setLogRecordFactory(factory)
@ -255,21 +259,40 @@ def _setup_stdlib_logging(config, log_config, logBeginner: LogBeginner):
if not config.no_redirect_stdio: if not config.no_redirect_stdio:
print("Redirected stdout/stderr to logs") print("Redirected stdout/stderr to logs")
return observer
def _load_logging_config(log_config_path: str) -> None:
def _reload_stdlib_logging(*args, log_config=None): """
logger = logging.getLogger("") Configure logging from a log config path.
"""
with open(log_config_path, "rb") as f:
log_config = yaml.safe_load(f.read())
if not log_config: if not log_config:
logger.warning("Reloaded a blank config?") logging.warning("Loaded a blank logging config?")
# If the old structured logging configuration is being used, convert it to
# the new style configuration.
if "structured" in log_config and log_config.get("structured"):
log_config = setup_structured_logging(log_config)
logging.config.dictConfig(log_config) logging.config.dictConfig(log_config)
def _reload_logging_config(log_config_path):
"""
Reload the log configuration from the file and apply it.
"""
# If no log config path was given, it cannot be reloaded.
if log_config_path is None:
return
_load_logging_config(log_config_path)
logging.info("Reloaded log config from %s due to SIGHUP", log_config_path)
def setup_logging( def setup_logging(
hs, config, use_worker_options=False, logBeginner: LogBeginner = globalLogBeginner hs, config, use_worker_options=False, logBeginner: LogBeginner = globalLogBeginner
) -> ILogObserver: ) -> None:
""" """
Set up the logging subsystem. Set up the logging subsystem.
@ -282,41 +305,18 @@ def setup_logging(
logBeginner: The Twisted logBeginner to use. logBeginner: The Twisted logBeginner to use.
Returns:
The "root" Twisted Logger observer, suitable for sending logs to from a
Logger instance.
""" """
log_config = config.worker_log_config if use_worker_options else config.log_config log_config_path = (
config.worker_log_config if use_worker_options else config.log_config
)
def read_config(*args, callback=None): # Perform one-time logging configuration.
if log_config is None: _setup_stdlib_logging(config, log_config_path, logBeginner=logBeginner)
return None # Add a SIGHUP handler to reload the logging configuration, if one is available.
appbase.register_sighup(_reload_logging_config, log_config_path)
with open(log_config, "rb") as f: # Log immediately so we can grep backwards.
log_config_body = yaml.safe_load(f.read())
if callback:
callback(log_config=log_config_body)
logging.info("Reloaded log config from %s due to SIGHUP", log_config)
return log_config_body
log_config_body = read_config()
if log_config_body and log_config_body.get("structured") is True:
logger = setup_structured_logging(
hs, config, log_config_body, logBeginner=logBeginner
)
appbase.register_sighup(read_config, callback=reload_structured_logging)
else:
logger = _setup_stdlib_logging(config, log_config_body, logBeginner=logBeginner)
appbase.register_sighup(read_config, callback=_reload_stdlib_logging)
# make sure that the first thing we log is a thing we can grep backwards
# for
logging.warning("***** STARTING SERVER *****") logging.warning("***** STARTING SERVER *****")
logging.warning("Server %s version %s", sys.argv[0], get_version_string(synapse)) logging.warning("Server %s version %s", sys.argv[0], get_version_string(synapse))
logging.info("Server hostname: %s", config.server_name) logging.info("Server hostname: %s", config.server_name)
logging.info("Instance name: %s", hs.get_instance_name()) logging.info("Instance name: %s", hs.get_instance_name())
return logger


@ -0,0 +1,20 @@
# -*- coding: utf-8 -*-
# Copyright 2020 The Matrix.org Foundation C.I.C.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# These are imported to allow for nicer logging configuration files.
from synapse.logging._remote import RemoteHandler
from synapse.logging._terse_json import JsonFormatter, TerseJsonFormatter
__all__ = ["RemoteHandler", "JsonFormatter", "TerseJsonFormatter"]


@ -13,6 +13,7 @@
# See the License for the specific language governing permissions and # See the License for the specific language governing permissions and
# limitations under the License. # limitations under the License.
import logging
import sys import sys
import traceback import traceback
from collections import deque from collections import deque
@ -21,6 +22,7 @@ from math import floor
from typing import Callable, Optional from typing import Callable, Optional
import attr import attr
from typing_extensions import Deque
from zope.interface import implementer from zope.interface import implementer
from twisted.application.internet import ClientService from twisted.application.internet import ClientService
@ -32,7 +34,8 @@ from twisted.internet.endpoints import (
) )
from twisted.internet.interfaces import IPushProducer, ITransport from twisted.internet.interfaces import IPushProducer, ITransport
from twisted.internet.protocol import Factory, Protocol from twisted.internet.protocol import Factory, Protocol
from twisted.logger import ILogObserver, Logger, LogLevel
logger = logging.getLogger(__name__)
@attr.s @attr.s
@ -45,11 +48,11 @@ class LogProducer:
Args: Args:
buffer: Log buffer to read logs from. buffer: Log buffer to read logs from.
transport: Transport to write to. transport: Transport to write to.
format_event: A callable to format the log entry to a string. format: A callable to format the log record to a string.
""" """
transport = attr.ib(type=ITransport) transport = attr.ib(type=ITransport)
format_event = attr.ib(type=Callable[[dict], str]) _format = attr.ib(type=Callable[[logging.LogRecord], str])
_buffer = attr.ib(type=deque) _buffer = attr.ib(type=deque)
_paused = attr.ib(default=False, type=bool, init=False) _paused = attr.ib(default=False, type=bool, init=False)
@ -61,16 +64,19 @@ class LogProducer:
self._buffer = deque() self._buffer = deque()
def resumeProducing(self): def resumeProducing(self):
# If we're already producing, nothing to do.
self._paused = False self._paused = False
# Loop until paused.
while self._paused is False and (self._buffer and self.transport.connected): while self._paused is False and (self._buffer and self.transport.connected):
try: try:
# Request the next event and format it. # Request the next record and format it.
event = self._buffer.popleft() record = self._buffer.popleft()
msg = self.format_event(event) msg = self._format(record)
# Send it as a new line over the transport. # Send it as a new line over the transport.
self.transport.write(msg.encode("utf8")) self.transport.write(msg.encode("utf8"))
self.transport.write(b"\n")
except Exception: except Exception:
# Something has gone wrong writing to the transport -- log it # Something has gone wrong writing to the transport -- log it
# and break out of the while. # and break out of the while.
@ -78,60 +84,63 @@ class LogProducer:
break break
@attr.s class RemoteHandler(logging.Handler):
@implementer(ILogObserver)
class TCPLogObserver:
""" """
An IObserver that writes JSON logs to a TCP target. An logging handler that writes logs to a TCP target.
Args: Args:
hs (HomeServer): The homeserver that is being logged for.
host: The host of the logging target. host: The host of the logging target.
port: The logging target's port. port: The logging target's port.
format_event: A callable to format the log entry to a string.
maximum_buffer: The maximum buffer size. maximum_buffer: The maximum buffer size.
""" """
hs = attr.ib() def __init__(
host = attr.ib(type=str) self,
port = attr.ib(type=int) host: str,
format_event = attr.ib(type=Callable[[dict], str]) port: int,
maximum_buffer = attr.ib(type=int) maximum_buffer: int = 1000,
_buffer = attr.ib(default=attr.Factory(deque), type=deque) level=logging.NOTSET,
_connection_waiter = attr.ib(default=None, type=Optional[Deferred]) _reactor=None,
_logger = attr.ib(default=attr.Factory(Logger)) ):
_producer = attr.ib(default=None, type=Optional[LogProducer]) super().__init__(level=level)
self.host = host
self.port = port
self.maximum_buffer = maximum_buffer
def start(self) -> None: self._buffer = deque() # type: Deque[logging.LogRecord]
self._connection_waiter = None # type: Optional[Deferred]
self._producer = None # type: Optional[LogProducer]
# Connect without DNS lookups if it's a direct IP. # Connect without DNS lookups if it's a direct IP.
if _reactor is None:
from twisted.internet import reactor
_reactor = reactor
try: try:
ip = ip_address(self.host) ip = ip_address(self.host)
if isinstance(ip, IPv4Address): if isinstance(ip, IPv4Address):
endpoint = TCP4ClientEndpoint( endpoint = TCP4ClientEndpoint(_reactor, self.host, self.port)
self.hs.get_reactor(), self.host, self.port
)
elif isinstance(ip, IPv6Address): elif isinstance(ip, IPv6Address):
endpoint = TCP6ClientEndpoint( endpoint = TCP6ClientEndpoint(_reactor, self.host, self.port)
self.hs.get_reactor(), self.host, self.port
)
else: else:
raise ValueError("Unknown IP address provided: %s" % (self.host,)) raise ValueError("Unknown IP address provided: %s" % (self.host,))
except ValueError: except ValueError:
endpoint = HostnameEndpoint(self.hs.get_reactor(), self.host, self.port) endpoint = HostnameEndpoint(_reactor, self.host, self.port)
factory = Factory.forProtocol(Protocol) factory = Factory.forProtocol(Protocol)
self._service = ClientService(endpoint, factory, clock=self.hs.get_reactor()) self._service = ClientService(endpoint, factory, clock=_reactor)
self._service.startService() self._service.startService()
self._connect() self._connect()
def stop(self): def close(self):
self._service.stopService() self._service.stopService()
def _connect(self) -> None: def _connect(self) -> None:
""" """
Triggers an attempt to connect then write to the remote if not already writing. Triggers an attempt to connect then write to the remote if not already writing.
""" """
# Do not attempt to open multiple connections.
if self._connection_waiter: if self._connection_waiter:
return return
@ -158,9 +167,7 @@ class TCPLogObserver:
# Make a new producer and start it. # Make a new producer and start it.
self._producer = LogProducer( self._producer = LogProducer(
buffer=self._buffer, buffer=self._buffer, transport=r.transport, format=self.format,
transport=r.transport,
format_event=self.format_event,
) )
r.transport.registerProducer(self._producer, True) r.transport.registerProducer(self._producer, True)
self._producer.resumeProducing() self._producer.resumeProducing()
@ -168,19 +175,19 @@ class TCPLogObserver:
def _handle_pressure(self) -> None: def _handle_pressure(self) -> None:
""" """
Handle backpressure by shedding events. Handle backpressure by shedding records.
The buffer will, in this order, until the buffer is below the maximum: The buffer will, in this order, until the buffer is below the maximum:
- Shed DEBUG events - Shed DEBUG records.
- Shed INFO events - Shed INFO records.
- Shed the middle 50% of the events. - Shed the middle 50% of the records.
""" """
if len(self._buffer) <= self.maximum_buffer: if len(self._buffer) <= self.maximum_buffer:
return return
# Strip out DEBUGs # Strip out DEBUGs
self._buffer = deque( self._buffer = deque(
filter(lambda event: event["log_level"] != LogLevel.debug, self._buffer) filter(lambda record: record.levelno > logging.DEBUG, self._buffer)
) )
if len(self._buffer) <= self.maximum_buffer: if len(self._buffer) <= self.maximum_buffer:
@ -188,7 +195,7 @@ class TCPLogObserver:
# Strip out INFOs # Strip out INFOs
self._buffer = deque( self._buffer = deque(
filter(lambda event: event["log_level"] != LogLevel.info, self._buffer) filter(lambda record: record.levelno > logging.INFO, self._buffer)
) )
if len(self._buffer) <= self.maximum_buffer: if len(self._buffer) <= self.maximum_buffer:
@ -209,17 +216,17 @@ class TCPLogObserver:
self._buffer.extend(reversed(end_buffer)) self._buffer.extend(reversed(end_buffer))
def __call__(self, event: dict) -> None: def emit(self, record: logging.LogRecord) -> None:
self._buffer.append(event) self._buffer.append(record)
# Handle backpressure, if it exists. # Handle backpressure, if it exists.
try: try:
self._handle_pressure() self._handle_pressure()
except Exception: except Exception:
# If handling backpressure fails,clear the buffer and log the # If handling backpressure fails, clear the buffer and log the
# exception. # exception.
self._buffer.clear() self._buffer.clear()
self._logger.failure("Failed clearing backpressure") logger.warning("Failed clearing backpressure")
# Try and write immediately. # Try and write immediately.
self._connect() self._connect()
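
As a rough illustration of how the new handler can be wired up outside of `dictConfig`, the sketch below (assuming Synapse is importable; the host and port are illustrative) attaches a `RemoteHandler` directly to a logger:

```python
import logging

from synapse.logging import RemoteHandler, TerseJsonFormatter

# Point this at a real TCP log aggregator; 127.0.0.1:9999 is just a placeholder.
handler = RemoteHandler("127.0.0.1", 9999, maximum_buffer=1000)
handler.setFormatter(TerseJsonFormatter())

logger = logging.getLogger("synapse")
logger.setLevel(logging.INFO)
logger.addHandler(handler)

# Records are buffered and flushed over TCP once the Twisted reactor is running
# and the connection is established; DEBUG and INFO records are shed first if
# the buffer grows past maximum_buffer.
logger.info("hello from a remote-logging sketch")
```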


@ -12,138 +12,12 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and # See the License for the specific language governing permissions and
# limitations under the License. # limitations under the License.
import logging
import os.path import os.path
import sys from typing import Any, Dict, Generator, Optional, Tuple
import typing
import warnings
from typing import List
import attr from constantly import NamedConstant, Names
from constantly import NamedConstant, Names, ValueConstant, Values
from zope.interface import implementer
from twisted.logger import (
FileLogObserver,
FilteringLogObserver,
ILogObserver,
LogBeginner,
Logger,
LogLevel,
LogLevelFilterPredicate,
LogPublisher,
eventAsText,
jsonFileLogObserver,
)
from synapse.config._base import ConfigError from synapse.config._base import ConfigError
from synapse.logging._terse_json import (
TerseJSONToConsoleLogObserver,
TerseJSONToTCPLogObserver,
)
from synapse.logging.context import current_context
def stdlib_log_level_to_twisted(level: str) -> LogLevel:
"""
Convert a stdlib log level to Twisted's log level.
"""
lvl = level.lower().replace("warning", "warn")
return LogLevel.levelWithName(lvl)
@attr.s
@implementer(ILogObserver)
class LogContextObserver:
"""
An ILogObserver which adds Synapse-specific log context information.
Attributes:
observer (ILogObserver): The target parent observer.
"""
observer = attr.ib()
def __call__(self, event: dict) -> None:
"""
Consume a log event and emit it to the parent observer after filtering
and adding log context information.
Args:
event (dict)
"""
# Filter out some useless events that Twisted outputs
if "log_text" in event:
if event["log_text"].startswith("DNSDatagramProtocol starting on "):
return
if event["log_text"].startswith("(UDP Port "):
return
if event["log_text"].startswith("Timing out client") or event[
"log_format"
].startswith("Timing out client"):
return
context = current_context()
# Copy the context information to the log event.
context.copy_to_twisted_log_entry(event)
self.observer(event)
class PythonStdlibToTwistedLogger(logging.Handler):
"""
Transform a Python stdlib log message into a Twisted one.
"""
def __init__(self, observer, *args, **kwargs):
"""
Args:
observer (ILogObserver): A Twisted logging observer.
*args, **kwargs: Args/kwargs to be passed to logging.Handler.
"""
self.observer = observer
super().__init__(*args, **kwargs)
def emit(self, record: logging.LogRecord) -> None:
"""
Emit a record to Twisted's observer.
Args:
record (logging.LogRecord)
"""
self.observer(
{
"log_time": record.created,
"log_text": record.getMessage(),
"log_format": "{log_text}",
"log_namespace": record.name,
"log_level": stdlib_log_level_to_twisted(record.levelname),
}
)
def SynapseFileLogObserver(outFile: typing.IO[str]) -> FileLogObserver:
"""
A log observer that formats events like the traditional log formatter and
sends them to `outFile`.
Args:
outFile (file object): The file object to write to.
"""
def formatEvent(_event: dict) -> str:
event = dict(_event)
event["log_level"] = event["log_level"].name.upper()
event["log_format"] = "- {log_namespace} - {log_level} - {request} - " + (
event.get("log_format", "{log_text}") or "{log_text}"
)
return eventAsText(event, includeSystem=False) + "\n"
return FileLogObserver(outFile, formatEvent)
class DrainType(Names): class DrainType(Names):
@ -155,30 +29,12 @@ class DrainType(Names):
NETWORK_JSON_TERSE = NamedConstant() NETWORK_JSON_TERSE = NamedConstant()
class OutputPipeType(Values): DEFAULT_LOGGERS = {"synapse": {"level": "info"}}
stdout = ValueConstant(sys.__stdout__)
stderr = ValueConstant(sys.__stderr__)
@attr.s
class DrainConfiguration:
name = attr.ib()
type = attr.ib()
location = attr.ib()
options = attr.ib(default=None)
@attr.s
class NetworkJSONTerseOptions:
maximum_buffer = attr.ib(type=int)
DEFAULT_LOGGERS = {"synapse": {"level": "INFO"}}
def parse_drain_configs( def parse_drain_configs(
drains: dict, drains: dict,
) -> typing.Generator[DrainConfiguration, None, None]: ) -> Generator[Tuple[str, Dict[str, Any]], None, None]:
""" """
Parse the drain configurations. Parse the drain configurations.
@ -186,11 +42,12 @@ def parse_drain_configs(
drains (dict): A list of drain configurations. drains (dict): A list of drain configurations.
Yields: Yields:
DrainConfiguration instances. dict instances representing a logging handler.
Raises: Raises:
ConfigError: If any of the drain configuration items are invalid. ConfigError: If any of the drain configuration items are invalid.
""" """
for name, config in drains.items(): for name, config in drains.items():
if "type" not in config: if "type" not in config:
raise ConfigError("Logging drains require a 'type' key.") raise ConfigError("Logging drains require a 'type' key.")
@ -202,6 +59,18 @@ def parse_drain_configs(
"%s is not a known logging drain type." % (config["type"],) "%s is not a known logging drain type." % (config["type"],)
) )
# Either use the default formatter or the tersejson one.
if logging_type in (DrainType.CONSOLE_JSON, DrainType.FILE_JSON,):
formatter = "json" # type: Optional[str]
elif logging_type in (
DrainType.CONSOLE_JSON_TERSE,
DrainType.NETWORK_JSON_TERSE,
):
formatter = "tersejson"
else:
# A formatter of None implies using the default formatter.
formatter = None
if logging_type in [ if logging_type in [
DrainType.CONSOLE, DrainType.CONSOLE,
DrainType.CONSOLE_JSON, DrainType.CONSOLE_JSON,
@ -217,9 +86,11 @@ def parse_drain_configs(
% (logging_type,) % (logging_type,)
) )
pipe = OutputPipeType.lookupByName(location).value yield name, {
"class": "logging.StreamHandler",
yield DrainConfiguration(name=name, type=logging_type, location=pipe) "formatter": formatter,
"stream": "ext://sys." + location,
}
elif logging_type in [DrainType.FILE, DrainType.FILE_JSON]: elif logging_type in [DrainType.FILE, DrainType.FILE_JSON]:
if "location" not in config: if "location" not in config:
@ -233,18 +104,25 @@ def parse_drain_configs(
"File paths need to be absolute, '%s' is a relative path" "File paths need to be absolute, '%s' is a relative path"
% (location,) % (location,)
) )
yield DrainConfiguration(name=name, type=logging_type, location=location)
yield name, {
"class": "logging.FileHandler",
"formatter": formatter,
"filename": location,
}
elif logging_type in [DrainType.NETWORK_JSON_TERSE]: elif logging_type in [DrainType.NETWORK_JSON_TERSE]:
host = config.get("host") host = config.get("host")
port = config.get("port") port = config.get("port")
maximum_buffer = config.get("maximum_buffer", 1000) maximum_buffer = config.get("maximum_buffer", 1000)
yield DrainConfiguration(
name=name, yield name, {
type=logging_type, "class": "synapse.logging.RemoteHandler",
location=(host, port), "formatter": formatter,
options=NetworkJSONTerseOptions(maximum_buffer=maximum_buffer), "host": host,
) "port": port,
"maximum_buffer": maximum_buffer,
}
else: else:
raise ConfigError( raise ConfigError(
@ -253,126 +131,29 @@ def parse_drain_configs(
) )
class StoppableLogPublisher(LogPublisher): def setup_structured_logging(log_config: dict,) -> dict:
""" """
A log publisher that can tell its observers to shut down any external Convert a legacy structured logging configuration (from Synapse < v1.23.0)
communications. to one compatible with the new standard library handlers.
""" """
def stop(self):
for obs in self._observers:
if hasattr(obs, "stop"):
obs.stop()
def setup_structured_logging(
hs,
config,
log_config: dict,
logBeginner: LogBeginner,
redirect_stdlib_logging: bool = True,
) -> LogPublisher:
"""
Set up Twisted's structured logging system.
Args:
hs: The homeserver to use.
config (HomeserverConfig): The configuration of the Synapse homeserver.
log_config (dict): The log configuration to use.
"""
if config.no_redirect_stdio:
raise ConfigError(
"no_redirect_stdio cannot be defined using structured logging."
)
logger = Logger()
if "drains" not in log_config: if "drains" not in log_config:
raise ConfigError("The logging configuration requires a list of drains.") raise ConfigError("The logging configuration requires a list of drains.")
observers = [] # type: List[ILogObserver] new_config = {
"version": 1,
"formatters": {
"json": {"class": "synapse.logging.JsonFormatter"},
"tersejson": {"class": "synapse.logging.TerseJsonFormatter"},
},
"handlers": {},
"loggers": log_config.get("loggers", DEFAULT_LOGGERS),
"root": {"handlers": []},
}
for observer in parse_drain_configs(log_config["drains"]): for handler_name, handler in parse_drain_configs(log_config["drains"]):
# Pipe drains new_config["handlers"][handler_name] = handler
if observer.type == DrainType.CONSOLE:
logger.debug(
"Starting up the {name} console logger drain", name=observer.name
)
observers.append(SynapseFileLogObserver(observer.location))
elif observer.type == DrainType.CONSOLE_JSON:
logger.debug(
"Starting up the {name} JSON console logger drain", name=observer.name
)
observers.append(jsonFileLogObserver(observer.location))
elif observer.type == DrainType.CONSOLE_JSON_TERSE:
logger.debug(
"Starting up the {name} terse JSON console logger drain",
name=observer.name,
)
observers.append(
TerseJSONToConsoleLogObserver(observer.location, metadata={})
)
# File drains # Add each handler to the root logger.
elif observer.type == DrainType.FILE: new_config["root"]["handlers"].append(handler_name)
logger.debug("Starting up the {name} file logger drain", name=observer.name)
log_file = open(observer.location, "at", buffering=1, encoding="utf8")
observers.append(SynapseFileLogObserver(log_file))
elif observer.type == DrainType.FILE_JSON:
logger.debug(
"Starting up the {name} JSON file logger drain", name=observer.name
)
log_file = open(observer.location, "at", buffering=1, encoding="utf8")
observers.append(jsonFileLogObserver(log_file))
elif observer.type == DrainType.NETWORK_JSON_TERSE: return new_config
metadata = {"server_name": hs.config.server_name}
log_observer = TerseJSONToTCPLogObserver(
hs=hs,
host=observer.location[0],
port=observer.location[1],
metadata=metadata,
maximum_buffer=observer.options.maximum_buffer,
)
log_observer.start()
observers.append(log_observer)
else:
# We should never get here, but, just in case, throw an error.
raise ConfigError("%s drain type cannot be configured" % (observer.type,))
publisher = StoppableLogPublisher(*observers)
log_filter = LogLevelFilterPredicate()
for namespace, namespace_config in log_config.get(
"loggers", DEFAULT_LOGGERS
).items():
# Set the log level for twisted.logger.Logger namespaces
log_filter.setLogLevelForNamespace(
namespace,
stdlib_log_level_to_twisted(namespace_config.get("level", "INFO")),
)
# Also set the log levels for the stdlib logger namespaces, to prevent
# them getting to PythonStdlibToTwistedLogger and having to be formatted
if "level" in namespace_config:
logging.getLogger(namespace).setLevel(namespace_config.get("level"))
f = FilteringLogObserver(publisher, [log_filter])
lco = LogContextObserver(f)
if redirect_stdlib_logging:
stuff_into_twisted = PythonStdlibToTwistedLogger(lco)
stdliblogger = logging.getLogger()
stdliblogger.addHandler(stuff_into_twisted)
# Always redirect standard I/O, otherwise other logging outputs might miss
# it.
logBeginner.beginLoggingTo([lco], redirectStandardIO=True)
return publisher
def reload_structured_logging(*args, log_config=None) -> None:
warnings.warn(
"Currently the structured logging system can not be reloaded, doing nothing"
)


@ -16,141 +16,65 @@
""" """
Log formatters that output terse JSON. Log formatters that output terse JSON.
""" """
import json import json
from typing import IO import logging
from twisted.logger import FileLogObserver
from synapse.logging._remote import TCPLogObserver
_encoder = json.JSONEncoder(ensure_ascii=False, separators=(",", ":")) _encoder = json.JSONEncoder(ensure_ascii=False, separators=(",", ":"))
# The properties of a standard LogRecord.
def flatten_event(event: dict, metadata: dict, include_time: bool = False): _LOG_RECORD_ATTRIBUTES = {
""" "args",
Flatten a Twisted logging event to an dictionary capable of being sent "asctime",
as a log event to a logging aggregation system. "created",
"exc_info",
The format is vastly simplified and is not designed to be a "human readable # exc_text isn't a public attribute, but is used to cache the result of formatException.
string" in the sense that traditional logs are. Instead, the structure is "exc_text",
optimised for searchability and filtering, with human-understandable log "filename",
keys. "funcName",
"levelname",
Args: "levelno",
event (dict): The Twisted logging event we are flattening. "lineno",
metadata (dict): Additional data to include with each log message. This "message",
can be information like the server name. Since the target log "module",
consumer does not know who we are other than by host IP, this "msecs",
allows us to forward through static information. "msg",
include_time (bool): Should we include the `time` key? If False, the "name",
event time is stripped from the event. "pathname",
""" "process",
new_event = {} "processName",
"relativeCreated",
# If it's a failure, make the new event's log_failure be the traceback text. "stack_info",
if "log_failure" in event: "thread",
new_event["log_failure"] = event["log_failure"].getTraceback() "threadName",
}
# If it's a warning, copy over a string representation of the warning.
if "warning" in event:
new_event["warning"] = str(event["warning"])
# Stdlib logging events have "log_text" as their human-readable portion,
# Twisted ones have "log_format". For now, include the log_format, so that
# context only given in the log format (e.g. what is being logged) is
# available.
if "log_text" in event:
new_event["log"] = event["log_text"]
else:
new_event["log"] = event["log_format"]
# We want to include the timestamp when forwarding over the network, but
# exclude it when we are writing to stdout. This is because the log ingester
# (e.g. logstash, fluentd) can add its own timestamp.
if include_time:
new_event["time"] = round(event["log_time"], 2)
# Convert the log level to a textual representation.
new_event["level"] = event["log_level"].name.upper()
# Ignore these keys, and do not transfer them over to the new log object.
# They are either useless (isError), transferred manually above (log_time,
# log_level, etc), or contain Python objects which are not useful for output
# (log_logger, log_source).
keys_to_delete = [
"isError",
"log_failure",
"log_format",
"log_level",
"log_logger",
"log_source",
"log_system",
"log_time",
"log_text",
"observer",
"warning",
]
# If it's from the Twisted legacy logger (twisted.python.log), it adds some
# more keys we want to purge.
if event.get("log_namespace") == "log_legacy":
keys_to_delete.extend(["message", "system", "time"])
# Rather than modify the dictionary in place, construct a new one with only
# the content we want. The original event should be considered 'frozen'.
for key in event.keys():
if key in keys_to_delete:
continue
if isinstance(event[key], (str, int, bool, float)) or event[key] is None:
# If it's a plain type, include it as is.
new_event[key] = event[key]
else:
# If it's not one of those basic types, write out a string
# representation. This should probably be a warning in development,
# so that we are sure we are only outputting useful data.
new_event[key] = str(event[key])
# Add the metadata information to the event (e.g. the server_name).
new_event.update(metadata)
return new_event
def TerseJSONToConsoleLogObserver(outFile: IO[str], metadata: dict) -> FileLogObserver: class JsonFormatter(logging.Formatter):
""" def format(self, record: logging.LogRecord) -> str:
A log observer that formats events to a flattened JSON representation. event = {
"log": record.getMessage(),
"namespace": record.name,
"level": record.levelname,
}
Args: return self._format(record, event)
outFile: The file object to write to.
metadata: Metadata to be added to each log object.
"""
def formatEvent(_event: dict) -> str: def _format(self, record: logging.LogRecord, event: dict) -> str:
flattened = flatten_event(_event, metadata) # Add any extra attributes to the event.
return _encoder.encode(flattened) + "\n" for key, value in record.__dict__.items():
if key not in _LOG_RECORD_ATTRIBUTES:
event[key] = value
return FileLogObserver(outFile, formatEvent) return _encoder.encode(event)
def TerseJSONToTCPLogObserver( class TerseJsonFormatter(JsonFormatter):
hs, host: str, port: int, metadata: dict, maximum_buffer: int def format(self, record: logging.LogRecord) -> str:
) -> FileLogObserver: event = {
""" "log": record.getMessage(),
A log observer that formats events to a flattened JSON representation. "namespace": record.name,
"level": record.levelname,
"time": round(record.created, 2),
}
Args: return self._format(record, event)
hs (HomeServer): The homeserver that is being logged for.
host: The host of the logging target.
port: The logging target's port.
metadata: Metadata to be added to each log object.
maximum_buffer: The maximum buffer size.
"""
def formatEvent(_event: dict) -> str:
flattened = flatten_event(_event, metadata, include_time=True)
return _encoder.encode(flattened) + "\n"
return TCPLogObserver(hs, host, port, formatEvent, maximum_buffer)
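
A small sketch of how the two new formatters differ, assuming Synapse is importable: `JsonFormatter` omits the `time` key, and both copy any non-standard record attributes (for example values supplied via `extra=` or added by a filter) into the JSON object:

```python
import logging

from synapse.logging import JsonFormatter, TerseJsonFormatter

record = logging.makeLogRecord(
    {
        "name": "synapse.access.http",
        "levelname": "INFO",
        "msg": "handled request",
        # A non-standard attribute, as would be set via ``extra=`` or a filter.
        "request": "GET-123",
    }
)

print(JsonFormatter().format(record))       # No "time" key.
print(TerseJsonFormatter().format(record))  # Includes "time".
```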

synapse/logging/filter.py Normal file

@ -0,0 +1,33 @@
# -*- coding: utf-8 -*-
# Copyright 2020 The Matrix.org Foundation C.I.C.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import logging
from typing_extensions import Literal
class MetadataFilter(logging.Filter):
"""Logging filter that adds constant values to each record.
Args:
metadata: Key-value pairs to add to each record.
"""
def __init__(self, metadata: dict):
self._metadata = metadata
def filter(self, record: logging.LogRecord) -> Literal[True]:
for key, value in self._metadata.items():
setattr(record, key, value)
return True
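
As a rough usage sketch (the values are illustrative), `MetadataFilter` can be attached to a handler so that constant fields such as `server_name` appear on every record, and therefore in the JSON emitted by the formatters above:

```python
import logging

from synapse.logging import TerseJsonFormatter
from synapse.logging.filter import MetadataFilter

handler = logging.StreamHandler()
handler.setFormatter(TerseJsonFormatter())
# Every record passing through this handler gains a server_name attribute,
# which the JSON formatters then emit as an extra key.
handler.addFilter(MetadataFilter({"server_name": "example.com"}))

logger = logging.getLogger("synapse")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.info("hello")  # -> {...,"server_name":"example.com"}
```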


@ -21,45 +21,6 @@ except ImportError:
from twisted.internet.pollreactor import PollReactor as Reactor from twisted.internet.pollreactor import PollReactor as Reactor
from twisted.internet.main import installReactor from twisted.internet.main import installReactor
from synapse.config.homeserver import HomeServerConfig
from synapse.util import Clock
from tests.utils import default_config, setup_test_homeserver
async def make_homeserver(reactor, config=None):
"""
Make a Homeserver suitable for running benchmarks against.
Args:
reactor: A Twisted reactor to run under.
config: A HomeServerConfig to use, or None.
"""
cleanup_tasks = []
clock = Clock(reactor)
if not config:
config = default_config("test")
config_obj = HomeServerConfig()
config_obj.parse_config_dict(config, "", "")
hs = setup_test_homeserver(
cleanup_tasks.append, config=config_obj, reactor=reactor, clock=clock
)
stor = hs.get_datastore()
# Run the database background updates.
if hasattr(stor.db_pool.updates, "do_next_background_update"):
while not await stor.db_pool.updates.has_completed_background_updates():
await stor.db_pool.updates.do_next_background_update(1)
def cleanup():
for i in cleanup_tasks:
i()
return hs, clock.sleep, cleanup
def make_reactor(): def make_reactor():
""" """


@ -12,20 +12,20 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and # See the License for the specific language governing permissions and
# limitations under the License. # limitations under the License.
import sys import sys
from argparse import REMAINDER from argparse import REMAINDER
from contextlib import redirect_stderr from contextlib import redirect_stderr
from io import StringIO from io import StringIO
import pyperf import pyperf
from synmark import make_reactor
from synmark.suites import SUITES
from twisted.internet.defer import Deferred, ensureDeferred from twisted.internet.defer import Deferred, ensureDeferred
from twisted.logger import globalLogBeginner, textFileLogObserver from twisted.logger import globalLogBeginner, textFileLogObserver
from twisted.python.failure import Failure from twisted.python.failure import Failure
from synmark import make_reactor
from synmark.suites import SUITES
from tests.utils import setupdb from tests.utils import setupdb


@ -13,20 +13,22 @@
# See the License for the specific language governing permissions and # See the License for the specific language governing permissions and
# limitations under the License. # limitations under the License.
import logging
import warnings import warnings
from io import StringIO from io import StringIO
from mock import Mock from mock import Mock
from pyperf import perf_counter from pyperf import perf_counter
from synmark import make_homeserver
from twisted.internet.defer import Deferred from twisted.internet.defer import Deferred
from twisted.internet.protocol import ServerFactory from twisted.internet.protocol import ServerFactory
from twisted.logger import LogBeginner, Logger, LogPublisher from twisted.logger import LogBeginner, LogPublisher
from twisted.protocols.basic import LineOnlyReceiver from twisted.protocols.basic import LineOnlyReceiver
from synapse.logging._structured import setup_structured_logging from synapse.config.logger import _setup_stdlib_logging
from synapse.logging import RemoteHandler
from synapse.util import Clock
class LineCounter(LineOnlyReceiver): class LineCounter(LineOnlyReceiver):
@ -62,7 +64,15 @@ async def main(reactor, loops):
logger_factory.on_done = Deferred() logger_factory.on_done = Deferred()
port = reactor.listenTCP(0, logger_factory, interface="127.0.0.1") port = reactor.listenTCP(0, logger_factory, interface="127.0.0.1")
hs, wait, cleanup = await make_homeserver(reactor) # A fake homeserver config.
class Config:
server_name = "synmark-" + str(loops)
no_redirect_stdio = True
hs_config = Config()
# To be able to sleep.
clock = Clock(reactor)
errors = StringIO() errors = StringIO()
publisher = LogPublisher() publisher = LogPublisher()
@ -72,47 +82,49 @@ async def main(reactor, loops):
) )
log_config = { log_config = {
"loggers": {"synapse": {"level": "DEBUG"}}, "version": 1,
"drains": { "loggers": {"synapse": {"level": "DEBUG", "handlers": ["tersejson"]}},
"formatters": {"tersejson": {"class": "synapse.logging.TerseJsonFormatter"}},
"handlers": {
"tersejson": { "tersejson": {
"type": "network_json_terse", "class": "synapse.logging.RemoteHandler",
"host": "127.0.0.1", "host": "127.0.0.1",
"port": port.getHost().port, "port": port.getHost().port,
"maximum_buffer": 100, "maximum_buffer": 100,
"_reactor": reactor,
} }
}, },
} }
logger = Logger(namespace="synapse.logging.test_terse_json", observer=publisher) logger = logging.getLogger("synapse.logging.test_terse_json")
logging_system = setup_structured_logging( _setup_stdlib_logging(
hs, hs.config, log_config, logBeginner=beginner, redirect_stdlib_logging=False hs_config, log_config, logBeginner=beginner,
) )
# Wait for it to connect... # Wait for it to connect...
await logging_system._observers[0]._service.whenConnected() for handler in logging.getLogger("synapse").handlers:
if isinstance(handler, RemoteHandler):
break
else:
raise RuntimeError("Improperly configured: no RemoteHandler found.")
await handler._service.whenConnected()
start = perf_counter() start = perf_counter()
# Send a bunch of useful messages # Send a bunch of useful messages
for i in range(0, loops): for i in range(0, loops):
logger.info("test message %s" % (i,)) logger.info("test message %s", i)
if ( if len(handler._buffer) == handler.maximum_buffer:
len(logging_system._observers[0]._buffer) while len(handler._buffer) > handler.maximum_buffer / 2:
== logging_system._observers[0].maximum_buffer await clock.sleep(0.01)
):
while (
len(logging_system._observers[0]._buffer)
> logging_system._observers[0].maximum_buffer / 2
):
await wait(0.01)
await logger_factory.on_done await logger_factory.on_done
end = perf_counter() - start end = perf_counter() - start
logging_system.stop() handler.close()
port.stopListening() port.stopListening()
cleanup()
return end return end


@ -0,0 +1,34 @@
# -*- coding: utf-8 -*-
# Copyright 2019 The Matrix.org Foundation C.I.C.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import logging
class LoggerCleanupMixin:
def get_logger(self, handler):
"""
Attach a handler to a logger and add clean-ups to revert this.
"""
# Create a logger and add the handler to it.
logger = logging.getLogger(__name__)
logger.addHandler(handler)
# Ensure the logger actually logs something.
logger.setLevel(logging.INFO)
# Ensure the logger gets cleaned-up appropriately.
self.addCleanup(logger.removeHandler, handler)
self.addCleanup(logger.setLevel, logging.NOTSET)
return logger


@ -0,0 +1,153 @@
# -*- coding: utf-8 -*-
# Copyright 2019 The Matrix.org Foundation C.I.C.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from twisted.test.proto_helpers import AccumulatingProtocol
from synapse.logging import RemoteHandler
from tests.logging import LoggerCleanupMixin
from tests.server import FakeTransport, get_clock
from tests.unittest import TestCase
def connect_logging_client(reactor, client_id):
# This is essentially tests.server.connect_client, but disabling autoflush on
# the client transport. This is necessary to avoid an infinite loop due to
# sending of data via the logging transport causing additional logs to be
# written.
factory = reactor.tcpClients.pop(client_id)[2]
client = factory.buildProtocol(None)
server = AccumulatingProtocol()
server.makeConnection(FakeTransport(client, reactor))
client.makeConnection(FakeTransport(server, reactor, autoflush=False))
return client, server
class RemoteHandlerTestCase(LoggerCleanupMixin, TestCase):
def setUp(self):
self.reactor, _ = get_clock()
def test_log_output(self):
"""
The remote handler delivers logs over TCP.
"""
handler = RemoteHandler("127.0.0.1", 9000, _reactor=self.reactor)
logger = self.get_logger(handler)
logger.info("Hello there, %s!", "wally")
# Trigger the connection
client, server = connect_logging_client(self.reactor, 0)
# Trigger data being sent
client.transport.flush()
# One log message, with a single trailing newline
logs = server.data.decode("utf8").splitlines()
self.assertEqual(len(logs), 1)
self.assertEqual(server.data.count(b"\n"), 1)
# Ensure the data passed through properly.
self.assertEqual(logs[0], "Hello there, wally!")
def test_log_backpressure_debug(self):
"""
When backpressure is hit, DEBUG logs will be shed.
"""
handler = RemoteHandler(
"127.0.0.1", 9000, maximum_buffer=10, _reactor=self.reactor
)
logger = self.get_logger(handler)
# Send some debug messages
for i in range(0, 3):
logger.debug("debug %s" % (i,))
# Send a bunch of useful messages
for i in range(0, 7):
logger.info("info %s" % (i,))
# The last debug message pushes it past the maximum buffer
logger.debug("too much debug")
# Allow the reconnection
client, server = connect_logging_client(self.reactor, 0)
client.transport.flush()
# Only the 7 infos made it through, the debugs were elided
logs = server.data.splitlines()
self.assertEqual(len(logs), 7)
self.assertNotIn(b"debug", server.data)
def test_log_backpressure_info(self):
"""
When backpressure is hit, DEBUG and INFO logs will be shed.
"""
handler = RemoteHandler(
"127.0.0.1", 9000, maximum_buffer=10, _reactor=self.reactor
)
logger = self.get_logger(handler)
# Send some debug messages
for i in range(0, 3):
logger.debug("debug %s" % (i,))
# Send a bunch of useful messages
for i in range(0, 10):
logger.warning("warn %s" % (i,))
# Send a bunch of info messages
for i in range(0, 3):
logger.info("info %s" % (i,))
# The last debug message pushes it past the maximum buffer
logger.debug("too much debug")
# Allow the reconnection
client, server = connect_logging_client(self.reactor, 0)
client.transport.flush()
# The 10 warnings made it through, the debugs and infos were elided
logs = server.data.splitlines()
self.assertEqual(len(logs), 10)
self.assertNotIn(b"debug", server.data)
self.assertNotIn(b"info", server.data)
def test_log_backpressure_cut_middle(self):
"""
When backpressure is hit and no more DEBUGs and INFOs can be culled,
it will cut the middle messages out.
"""
handler = RemoteHandler(
"127.0.0.1", 9000, maximum_buffer=10, _reactor=self.reactor
)
logger = self.get_logger(handler)
# Send a bunch of useful messages
for i in range(0, 20):
logger.warning("warn %s" % (i,))
# Allow the reconnection
client, server = connect_logging_client(self.reactor, 0)
client.transport.flush()
# The first five and last five warnings made it through; the middle
# messages were elided
logs = server.data.decode("utf8").splitlines()
self.assertEqual(
["warn %s" % (i,) for i in range(5)]
+ ["warn %s" % (i,) for i in range(15, 20)],
logs,
)


@ -1,214 +0,0 @@
# -*- coding: utf-8 -*-
# Copyright 2019 The Matrix.org Foundation C.I.C.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import logging
import os
import os.path
import shutil
import sys
import textwrap
from twisted.logger import Logger, eventAsText, eventsFromJSONLogFile
from synapse.config.logger import setup_logging
from synapse.logging._structured import setup_structured_logging
from synapse.logging.context import LoggingContext
from tests.unittest import DEBUG, HomeserverTestCase
class FakeBeginner:
def beginLoggingTo(self, observers, **kwargs):
self.observers = observers
class StructuredLoggingTestBase:
"""
Test base that registers a cleanup handler to reset the stdlib log handler
to 'unset'.
"""
def prepare(self, reactor, clock, hs):
def _cleanup():
logging.getLogger("synapse").setLevel(logging.NOTSET)
self.addCleanup(_cleanup)
class StructuredLoggingTestCase(StructuredLoggingTestBase, HomeserverTestCase):
"""
Tests for Synapse's structured logging support.
"""
def test_output_to_json_round_trip(self):
"""
Synapse logs can be outputted to JSON and then read back again.
"""
temp_dir = self.mktemp()
os.mkdir(temp_dir)
self.addCleanup(shutil.rmtree, temp_dir)
json_log_file = os.path.abspath(os.path.join(temp_dir, "out.json"))
log_config = {
"drains": {"jsonfile": {"type": "file_json", "location": json_log_file}}
}
# Begin the logger with our config
beginner = FakeBeginner()
setup_structured_logging(
self.hs, self.hs.config, log_config, logBeginner=beginner
)
# Make a logger and send an event
logger = Logger(
namespace="tests.logging.test_structured", observer=beginner.observers[0]
)
logger.info("Hello there, {name}!", name="wally")
# Read the log file and check it has the event we sent
with open(json_log_file, "r") as f:
logged_events = list(eventsFromJSONLogFile(f))
self.assertEqual(len(logged_events), 1)
# The event pulled from the file should render fine
self.assertEqual(
eventAsText(logged_events[0], includeTimestamp=False),
"[tests.logging.test_structured#info] Hello there, wally!",
)
def test_output_to_text(self):
"""
Synapse logs can be outputted to text.
"""
temp_dir = self.mktemp()
os.mkdir(temp_dir)
self.addCleanup(shutil.rmtree, temp_dir)
log_file = os.path.abspath(os.path.join(temp_dir, "out.log"))
log_config = {"drains": {"file": {"type": "file", "location": log_file}}}
# Begin the logger with our config
beginner = FakeBeginner()
setup_structured_logging(
self.hs, self.hs.config, log_config, logBeginner=beginner
)
# Make a logger and send an event
logger = Logger(
namespace="tests.logging.test_structured", observer=beginner.observers[0]
)
logger.info("Hello there, {name}!", name="wally")
# Read the log file and check it has the event we sent
with open(log_file, "r") as f:
logged_events = f.read().strip().split("\n")
self.assertEqual(len(logged_events), 1)
# The event pulled from the file should render fine
self.assertTrue(
logged_events[0].endswith(
" - tests.logging.test_structured - INFO - None - Hello there, wally!"
)
)
def test_collects_logcontext(self):
"""
Test that log outputs have the attached logging context.
"""
log_config = {"drains": {}}
# Begin the logger with our config
beginner = FakeBeginner()
publisher = setup_structured_logging(
self.hs, self.hs.config, log_config, logBeginner=beginner
)
logs = []
publisher.addObserver(logs.append)
# Make a logger and send an event
logger = Logger(
namespace="tests.logging.test_structured", observer=beginner.observers[0]
)
with LoggingContext("testcontext", request="somereq"):
logger.info("Hello there, {name}!", name="steve")
self.assertEqual(len(logs), 1)
self.assertEqual(logs[0]["request"], "somereq")
class StructuredLoggingConfigurationFileTestCase(
StructuredLoggingTestBase, HomeserverTestCase
):
def make_homeserver(self, reactor, clock):
tempdir = self.mktemp()
os.mkdir(tempdir)
log_config_file = os.path.abspath(os.path.join(tempdir, "log.config.yaml"))
self.homeserver_log = os.path.abspath(os.path.join(tempdir, "homeserver.log"))
config = self.default_config()
config["log_config"] = log_config_file
with open(log_config_file, "w") as f:
f.write(
textwrap.dedent(
"""\
structured: true
drains:
file:
type: file_json
location: %s
"""
% (self.homeserver_log,)
)
)
self.addCleanup(self._sys_cleanup)
return self.setup_test_homeserver(config=config)
def _sys_cleanup(self):
sys.stdout = sys.__stdout__
sys.stderr = sys.__stderr__
# Do not remove! We need the logging system to be set other than WARNING.
@DEBUG
def test_log_output(self):
"""
When a structured logging config is given, Synapse will use it.
"""
beginner = FakeBeginner()
publisher = setup_logging(self.hs, self.hs.config, logBeginner=beginner)
# Make a logger and send an event
logger = Logger(namespace="tests.logging.test_structured", observer=publisher)
with LoggingContext("testcontext", request="somereq"):
logger.info("Hello there, {name}!", name="steve")
with open(self.homeserver_log, "r") as f:
logged_events = [
eventAsText(x, includeTimestamp=False) for x in eventsFromJSONLogFile(f)
]
logs = "\n".join(logged_events)
self.assertTrue("***** STARTING SERVER *****" in logs)
self.assertTrue("Hello there, steve!" in logs)


@@ -14,57 +14,33 @@
 # limitations under the License.
 import json
-from collections import Counter
-
-from twisted.logger import Logger
-
-from synapse.logging._structured import setup_structured_logging
-
-from tests.server import connect_client
-from tests.unittest import HomeserverTestCase
-
-from .test_structured import FakeBeginner, StructuredLoggingTestBase
+import logging
+from io import StringIO
+
+from synapse.logging._terse_json import JsonFormatter, TerseJsonFormatter
+
+from tests.logging import LoggerCleanupMixin
+from tests.unittest import TestCase
 
 
-class TerseJSONTCPTestCase(StructuredLoggingTestBase, HomeserverTestCase):
-    def test_log_output(self):
+class TerseJsonTestCase(LoggerCleanupMixin, TestCase):
+    def test_terse_json_output(self):
         """
-        The Terse JSON outputter delivers simplified structured logs over TCP.
+        The Terse JSON formatter converts log messages to JSON.
         """
-        log_config = {
-            "drains": {
-                "tersejson": {
-                    "type": "network_json_terse",
-                    "host": "127.0.0.1",
-                    "port": 8000,
-                }
-            }
-        }
-
-        # Begin the logger with our config
-        beginner = FakeBeginner()
-        setup_structured_logging(
-            self.hs, self.hs.config, log_config, logBeginner=beginner
-        )
-
-        logger = Logger(
-            namespace="tests.logging.test_terse_json", observer=beginner.observers[0]
-        )
-        logger.info("Hello there, {name}!", name="wally")
-
-        # Trigger the connection
-        self.pump()
-
-        _, server = connect_client(self.reactor, 0)
-
-        # Trigger data being sent
-        self.pump()
-
-        # One log message, with a single trailing newline
-        logs = server.data.decode("utf8").splitlines()
+        output = StringIO()
+
+        handler = logging.StreamHandler(output)
+        handler.setFormatter(TerseJsonFormatter())
+        logger = self.get_logger(handler)
+
+        logger.info("Hello there, %s!", "wally")
+
+        # One log message, with a single trailing newline.
+        data = output.getvalue()
+        logs = data.splitlines()
         self.assertEqual(len(logs), 1)
-        self.assertEqual(server.data.count(b"\n"), 1)
+        self.assertEqual(data.count("\n"), 1)
         log = json.loads(logs[0])
 
         # The terse logger should give us these keys.
@@ -72,163 +48,74 @@ class TerseJSONTCPTestCase(StructuredLoggingTestBase, HomeserverTestCase):
             "log",
             "time",
             "level",
-            "log_namespace",
-            "request",
-            "scope",
-            "server_name",
-            "name",
+            "namespace",
         ]
         self.assertCountEqual(log.keys(), expected_log_keys)
-
-        # It contains the data we expect.
-        self.assertEqual(log["name"], "wally")
-
-    def test_log_backpressure_debug(self):
-        """
-        When backpressure is hit, DEBUG logs will be shed.
-        """
-        log_config = {
-            "loggers": {"synapse": {"level": "DEBUG"}},
-            "drains": {
-                "tersejson": {
-                    "type": "network_json_terse",
-                    "host": "127.0.0.1",
-                    "port": 8000,
-                    "maximum_buffer": 10,
-                }
-            },
-        }
-
-        # Begin the logger with our config
-        beginner = FakeBeginner()
-        setup_structured_logging(
-            self.hs,
-            self.hs.config,
-            log_config,
-            logBeginner=beginner,
-            redirect_stdlib_logging=False,
-        )
-
-        logger = Logger(
-            namespace="synapse.logging.test_terse_json", observer=beginner.observers[0]
-        )
-
-        # Send some debug messages
-        for i in range(0, 3):
-            logger.debug("debug %s" % (i,))
-
-        # Send a bunch of useful messages
-        for i in range(0, 7):
-            logger.info("test message %s" % (i,))
-
-        # The last debug message pushes it past the maximum buffer
-        logger.debug("too much debug")
-
-        # Allow the reconnection
-        _, server = connect_client(self.reactor, 0)
-        self.pump()
-
-        # Only the 7 infos made it through, the debugs were elided
-        logs = server.data.splitlines()
-        self.assertEqual(len(logs), 7)
-
-    def test_log_backpressure_info(self):
-        """
-        When backpressure is hit, DEBUG and INFO logs will be shed.
-        """
-        log_config = {
-            "loggers": {"synapse": {"level": "DEBUG"}},
-            "drains": {
-                "tersejson": {
-                    "type": "network_json_terse",
-                    "host": "127.0.0.1",
-                    "port": 8000,
-                    "maximum_buffer": 10,
-                }
-            },
-        }
-
-        # Begin the logger with our config
-        beginner = FakeBeginner()
-        setup_structured_logging(
-            self.hs,
-            self.hs.config,
-            log_config,
-            logBeginner=beginner,
-            redirect_stdlib_logging=False,
-        )
-
-        logger = Logger(
-            namespace="synapse.logging.test_terse_json", observer=beginner.observers[0]
-        )
-
-        # Send some debug messages
-        for i in range(0, 3):
-            logger.debug("debug %s" % (i,))
-
-        # Send a bunch of useful messages
-        for i in range(0, 10):
-            logger.warn("test warn %s" % (i,))
-
-        # Send a bunch of info messages
-        for i in range(0, 3):
-            logger.info("test message %s" % (i,))
-
-        # The last debug message pushes it past the maximum buffer
-        logger.debug("too much debug")
-
-        # Allow the reconnection
-        client, server = connect_client(self.reactor, 0)
-        self.pump()
-
-        # The 10 warnings made it through, the debugs and infos were elided
-        logs = list(map(json.loads, server.data.decode("utf8").splitlines()))
-        self.assertEqual(len(logs), 10)
-        self.assertEqual(Counter([x["level"] for x in logs]), {"WARN": 10})
-
-    def test_log_backpressure_cut_middle(self):
-        """
-        When backpressure is hit, and no more DEBUG and INFOs cannot be culled,
-        it will cut the middle messages out.
-        """
-        log_config = {
-            "loggers": {"synapse": {"level": "DEBUG"}},
-            "drains": {
-                "tersejson": {
-                    "type": "network_json_terse",
-                    "host": "127.0.0.1",
-                    "port": 8000,
-                    "maximum_buffer": 10,
-                }
-            },
-        }
-
-        # Begin the logger with our config
-        beginner = FakeBeginner()
-        setup_structured_logging(
-            self.hs,
-            self.hs.config,
-            log_config,
-            logBeginner=beginner,
-            redirect_stdlib_logging=False,
-        )
-
-        logger = Logger(
-            namespace="synapse.logging.test_terse_json", observer=beginner.observers[0]
-        )
-
-        # Send a bunch of useful messages
-        for i in range(0, 20):
-            logger.warn("test warn", num=i)
-
-        # Allow the reconnection
-        client, server = connect_client(self.reactor, 0)
-        self.pump()
-
-        # The first five and last five warnings made it through, the debugs and
-        # infos were elided
-        logs = list(map(json.loads, server.data.decode("utf8").splitlines()))
-        self.assertEqual(len(logs), 10)
-        self.assertEqual(Counter([x["level"] for x in logs]), {"WARN": 10})
-        self.assertEqual([0, 1, 2, 3, 4, 15, 16, 17, 18, 19], [x["num"] for x in logs])
+        self.assertEqual(log["log"], "Hello there, wally!")
+
+    def test_extra_data(self):
+        """
+        Additional information can be included in the structured logging.
+        """
+        output = StringIO()
+
+        handler = logging.StreamHandler(output)
+        handler.setFormatter(TerseJsonFormatter())
+        logger = self.get_logger(handler)
+
+        logger.info(
+            "Hello there, %s!", "wally", extra={"foo": "bar", "int": 3, "bool": True}
+        )
+
+        # One log message, with a single trailing newline.
+        data = output.getvalue()
+        logs = data.splitlines()
+        self.assertEqual(len(logs), 1)
+        self.assertEqual(data.count("\n"), 1)
+        log = json.loads(logs[0])
+
+        # The terse logger should give us these keys.
+        expected_log_keys = [
+            "log",
+            "time",
+            "level",
+            "namespace",
+            # The additional keys given via extra.
+            "foo",
+            "int",
+            "bool",
+        ]
+        self.assertCountEqual(log.keys(), expected_log_keys)
+
+        # Check the values of the extra fields.
+        self.assertEqual(log["foo"], "bar")
+        self.assertEqual(log["int"], 3)
+        self.assertIs(log["bool"], True)
+
+    def test_json_output(self):
+        """
+        The Terse JSON formatter converts log messages to JSON.
+        """
+        output = StringIO()
+
+        handler = logging.StreamHandler(output)
+        handler.setFormatter(JsonFormatter())
+        logger = self.get_logger(handler)
+
+        logger.info("Hello there, %s!", "wally")
+
+        # One log message, with a single trailing newline.
+        data = output.getvalue()
+        logs = data.splitlines()
+        self.assertEqual(len(logs), 1)
+        self.assertEqual(data.count("\n"), 1)
+        log = json.loads(logs[0])
+
+        # The terse logger should give us these keys.
+        expected_log_keys = [
+            "log",
+            "level",
+            "namespace",
+        ]
+        self.assertCountEqual(log.keys(), expected_log_keys)
+        self.assertEqual(log["log"], "Hello there, wally!")
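For comparison with the rewritten tests above, the formatters can be driven entirely from the standard library, with no Synapse-specific transport. A small self-contained sketch (the logger name is arbitrary; the import path follows the test's own import):

import io
import json
import logging

from synapse.logging._terse_json import TerseJsonFormatter

# Route a single record through the terse JSON formatter and parse it back.
output = io.StringIO()
handler = logging.StreamHandler(output)
handler.setFormatter(TerseJsonFormatter())

logger = logging.getLogger("example.terse_json")  # arbitrary namespace
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.info("Hello there, %s!", "wally", extra={"foo": "bar"})

record = json.loads(output.getvalue().splitlines()[0])
# Per the assertions above: "log", "time", "level", "namespace", plus extras.
assert record["log"] == "Hello there, wally!"
assert record["foo"] == "bar"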


@@ -571,12 +571,10 @@ def connect_client(reactor: IReactorTCP, client_id: int) -> AccumulatingProtocol
         reactor
         factory: The connecting factory to build.
     """
-    factory = reactor.tcpClients[client_id][2]
+    factory = reactor.tcpClients.pop(client_id)[2]
     client = factory.buildProtocol(None)
     server = AccumulatingProtocol()
     server.makeConnection(FakeTransport(client, reactor))
     client.makeConnection(FakeTransport(server, reactor))
-    reactor.tcpClients.pop(client_id)
 
     return client, server