How to monitor Synapse metrics using Prometheus
===============================================

1. Install Prometheus:

   Follow the instructions at http://prometheus.io/docs/introduction/install/

2. Enable Synapse metrics:

   There are two methods of enabling metrics in Synapse.

   The first serves the metrics as part of the usual web server, and can be
   enabled by adding the "metrics" resource to an existing listener as
   follows::

     resources:
       - names:
         - client
         - metrics

   This provides a simple way of adding metrics to your Synapse installation,
   and serves them under ``/_synapse/metrics``. If you do not want your metrics
   to be publicly exposed, you will need to either filter them out at your load
   balancer, or use the second method.

   The second method runs the metrics server on a different port, in a
   different thread from Synapse. This makes it more likely that metrics can
   still be retrieved while Synapse is under heavy load, and makes it easier to
   expose them only to internal networks. The metrics are served over HTTP
   only, at ``/``.

   Add a new listener to homeserver.yaml::

     listeners:
       - type: metrics
         port: 9000
         bind_addresses:
           - '0.0.0.0'

   For both options, you will need to ensure that ``enable_metrics`` is set to
   ``True``.

   Restart Synapse.

3. Add a Prometheus target for Synapse.

   The target needs ``metrics_path`` set to a non-default value (under
   ``scrape_configs``), since Prometheus scrapes ``/metrics`` by default::

     - job_name: "synapse"
       metrics_path: "/_synapse/metrics"
       static_configs:
         - targets: ["my.server.here:9092"]

   Replace ``my.server.here:9092`` with the host and port on which your Synapse
   serves metrics. If your Prometheus is older than 1.5.2, you will need to
   replace ``static_configs`` in the above with ``target_groups``.

   Restart Prometheus.
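
Before restarting Synapse, it can help to check that ``homeserver.yaml``
actually enables one of the two methods from step 2. The following Python
sketch is not part of Synapse: it assumes the file has already been parsed into
plain dictionaries and lists (for example with a YAML loader, not shown), the
function name ``metrics_exposure`` is hypothetical, and the port numbers in the
example are illustrative.

.. code-block:: python

   from typing import Any, List, Mapping


   def metrics_exposure(config: Mapping[str, Any]) -> List[str]:
       """Describe where a parsed homeserver.yaml would serve metrics.

       Hypothetical helper: ``config`` is assumed to be the YAML file
       already loaded into plain dicts and lists.
       """
       # Neither method serves metrics unless enable_metrics is true.
       if not config.get("enable_metrics", False):
           return []

       found = []
       for listener in config.get("listeners", []):
           port = listener.get("port")
           if listener.get("type") == "metrics":
               # Second method: a dedicated metrics listener, served at "/".
               found.append(f"metrics listener on port {port}")
               continue
           for resource in listener.get("resources", []):
               # First method: "metrics" added to an existing listener.
               if "metrics" in resource.get("names", []):
                   found.append(f"/_synapse/metrics on port {port}")
       return found


   if __name__ == "__main__":
       example = {
           "enable_metrics": True,
           "listeners": [
               {"port": 8008, "type": "http",
                "resources": [{"names": ["client", "metrics"]}]},
               {"port": 9000, "type": "metrics",
                "bind_addresses": ["0.0.0.0"]},
           ],
       }
       # ['/_synapse/metrics on port 8008', 'metrics listener on port 9000']
       print(metrics_exposure(example))

An empty list means that neither method is active, either because
``enable_metrics`` is not ``True`` or because no listener serves metrics.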
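
For each target, Prometheus requests ``metrics_path`` (``/metrics`` unless
overridden) over plain HTTP by default, which is why step 3 sets
``/_synapse/metrics``. Below is a minimal Python sketch of that URL
construction, ignoring relabelling and other scrape options, plus a
standard-library helper for fetching the same URL by hand to confirm that
Synapse answers. Both function names are hypothetical.

.. code-block:: python

   import urllib.request
   from typing import Any, List, Mapping


   def scrape_urls(job: Mapping[str, Any]) -> List[str]:
       """URLs Prometheus would request for one ``scrape_configs`` entry.

       Simplified sketch: only ``scheme``, ``metrics_path`` and
       ``static_configs`` are considered; relabelling is ignored.
       """
       # Prometheus defaults when these keys are absent.
       scheme = job.get("scheme", "http")
       path = job.get("metrics_path", "/metrics")
       return [
           f"{scheme}://{target}{path}"
           for group in job.get("static_configs", [])
           for target in group.get("targets", [])
       ]


   def fetch_metrics(url: str, timeout: float = 10.0) -> str:
       """Fetch the text served at ``url`` (needs a running Synapse)."""
       with urllib.request.urlopen(url, timeout=timeout) as response:
           return response.read().decode("utf-8")


   if __name__ == "__main__":
       job = {
           "job_name": "synapse",
           "metrics_path": "/_synapse/metrics",
           "static_configs": [{"targets": ["my.server.here:9092"]}],
       }
       for url in scrape_urls(job):
           print(url)  # http://my.server.here:9092/_synapse/metrics

Fetching one of these URLs with ``fetch_metrics`` should return Prometheus'
text exposition format: ``# HELP`` and ``# TYPE`` comment lines followed by
samples. A connection error or timeout points at the listener, a firewall, or
the host and port in the target.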

Removal of deprecated metrics & time-based counters becoming histograms in 0.31.0
---------------------------------------------------------------------------------

The duplicated metrics deprecated in Synapse 0.27.0 have been removed.

All time duration-based metrics are now reported in seconds rather than
milliseconds. This affects:

+----------------------------------+
| msec -> sec metrics              |
+==================================+
| python_gc_time                   |
+----------------------------------+
| python_twisted_reactor_tick_time |
+----------------------------------+
| synapse_storage_query_time       |
+----------------------------------+
| synapse_storage_schedule_time    |
+----------------------------------+
| synapse_storage_transaction_time |
+----------------------------------+

Several metrics have been changed to histograms, which sort observations into
buckets and allow better analysis. The following metrics are now histograms:

+-------------------------------------------+
| Altered metrics                           |
+===========================================+
| python_gc_time                            |
+-------------------------------------------+
| python_twisted_reactor_pending_calls      |
+-------------------------------------------+
| python_twisted_reactor_tick_time          |
+-------------------------------------------+
| synapse_http_server_response_time_seconds |
+-------------------------------------------+
| synapse_storage_query_time                |
+-------------------------------------------+
| synapse_storage_schedule_time             |
+-------------------------------------------+
| synapse_storage_transaction_time          |
+-------------------------------------------+
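
A Prometheus histogram is exposed as cumulative ``_bucket`` series, each
labelled with an upper bound ``le``, plus ``_sum`` and ``_count``. The
following minimal Python sketch (not Synapse or ``prometheus_client`` code; the
durations and bucket bounds are made up) shows how observations are counted,
and a simplified form of the linear interpolation that Prometheus'
``histogram_quantile()`` uses to estimate a quantile from the buckets.

.. code-block:: python

   import math
   from typing import Iterable, List, Sequence, Tuple

   Bucket = Tuple[float, int]  # (upper bound "le", cumulative count)


   def observe_all(values: Iterable[float],
                   bounds: Sequence[float]) -> Tuple[List[Bucket], float, int]:
       """Count observations the way a Prometheus histogram does.

       Buckets are cumulative: each value is counted in every bucket whose
       upper bound is >= the value, and a final +Inf bucket counts every
       value. The result mirrors the _bucket, _sum and _count series.
       """
       uppers = sorted(bounds) + [math.inf]
       counts = [0] * len(uppers)
       total, n = 0.0, 0
       for value in values:
           total += value
           n += 1
           for i, upper in enumerate(uppers):
               if value <= upper:
                   counts[i] += 1
       return list(zip(uppers, counts)), total, n


   def estimate_quantile(q: float, buckets: Sequence[Bucket]) -> float:
       """Simplified version of histogram_quantile()'s interpolation.

       Assumes non-negative bucket bounds, so the first bucket starts at 0.
       """
       if not 0 < q <= 1:
           raise ValueError("q must be in (0, 1]")
       total = buckets[-1][1]
       if total == 0:
           return math.nan
       rank = q * total
       prev_upper, prev_count = 0.0, 0
       for upper, count in buckets:
           if count >= rank:
               if math.isinf(upper):
                   # Rank lies beyond the last finite bound: report that bound.
                   return prev_upper
               # Interpolate linearly inside the bucket containing the rank.
               fraction = (rank - prev_count) / (count - prev_count)
               return prev_upper + (upper - prev_upper) * fraction
           prev_upper, prev_count = upper, count
       return math.nan  # not reached when the last bucket is +Inf


   if __name__ == "__main__":
       buckets, total, n = observe_all([0.03, 0.2, 0.2, 0.7, 3.0],
                                       [0.05, 0.25, 1.0])
       print(buckets)  # [(0.05, 1), (0.25, 3), (1.0, 4), (inf, 5)]
       print(n, estimate_quantile(0.9, buckets))  # 5 1.0

The estimate is only as fine-grained as the bucket bounds: here the 3.0 s
observation lands in the ``+Inf`` bucket, so the 0.9 quantile is reported as
the highest finite bound, 1.0.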

Block and response metrics renamed for 0.27.0
---------------------------------------------

Synapse 0.27.0 begins the process of rationalising the duplicate ``*:count``
metrics reported for the resource tracking for code blocks and HTTP requests.

At the same time, the corresponding ``*:total`` metrics are being renamed, as
the ``:total`` suffix no longer makes sense in the absence of a corresponding
``:count`` metric.

To enable a graceful migration path, this release just adds new names for the
metrics being renamed. A future release will remove the old ones.

The following table shows the new metrics, and the old metrics which they are
replacing.

==================================================== ===================================================
New name                                             Old name
==================================================== ===================================================
synapse_util_metrics_block_count                     synapse_util_metrics_block_timer:count
synapse_util_metrics_block_count                     synapse_util_metrics_block_ru_utime:count
synapse_util_metrics_block_count                     synapse_util_metrics_block_ru_stime:count
synapse_util_metrics_block_count                     synapse_util_metrics_block_db_txn_count:count
synapse_util_metrics_block_count                     synapse_util_metrics_block_db_txn_duration:count

synapse_util_metrics_block_time_seconds              synapse_util_metrics_block_timer:total
synapse_util_metrics_block_ru_utime_seconds          synapse_util_metrics_block_ru_utime:total
synapse_util_metrics_block_ru_stime_seconds          synapse_util_metrics_block_ru_stime:total
synapse_util_metrics_block_db_txn_count              synapse_util_metrics_block_db_txn_count:total
synapse_util_metrics_block_db_txn_duration_seconds   synapse_util_metrics_block_db_txn_duration:total

synapse_http_server_response_count                   synapse_http_server_requests
synapse_http_server_response_count                   synapse_http_server_response_time:count
synapse_http_server_response_count                   synapse_http_server_response_ru_utime:count
synapse_http_server_response_count                   synapse_http_server_response_ru_stime:count
synapse_http_server_response_count                   synapse_http_server_response_db_txn_count:count
synapse_http_server_response_count                   synapse_http_server_response_db_txn_duration:count

synapse_http_server_response_time_seconds            synapse_http_server_response_time:total
synapse_http_server_response_ru_utime_seconds        synapse_http_server_response_ru_utime:total
synapse_http_server_response_ru_stime_seconds        synapse_http_server_response_ru_stime:total
synapse_http_server_response_db_txn_count            synapse_http_server_response_db_txn_count:total
synapse_http_server_response_db_txn_duration_seconds synapse_http_server_response_db_txn_duration:total
==================================================== ===================================================
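
Dashboards and alerting rules written against the old names need their queries
updated. A minimal Python sketch that rewrites metric names in a PromQL
expression using the table above; the dictionary and function name are
hypothetical, and the sketch only renames metrics, so check units separately
for the new ``_seconds`` names.

.. code-block:: python

   import re
   from typing import Dict

   # Old name -> new name, transcribed from the 0.27.0 table above.
   RENAMED_IN_0_27: Dict[str, str] = {
       "synapse_util_metrics_block_timer:count": "synapse_util_metrics_block_count",
       "synapse_util_metrics_block_ru_utime:count": "synapse_util_metrics_block_count",
       "synapse_util_metrics_block_ru_stime:count": "synapse_util_metrics_block_count",
       "synapse_util_metrics_block_db_txn_count:count": "synapse_util_metrics_block_count",
       "synapse_util_metrics_block_db_txn_duration:count": "synapse_util_metrics_block_count",
       "synapse_util_metrics_block_timer:total": "synapse_util_metrics_block_time_seconds",
       "synapse_util_metrics_block_ru_utime:total": "synapse_util_metrics_block_ru_utime_seconds",
       "synapse_util_metrics_block_ru_stime:total": "synapse_util_metrics_block_ru_stime_seconds",
       "synapse_util_metrics_block_db_txn_count:total": "synapse_util_metrics_block_db_txn_count",
       "synapse_util_metrics_block_db_txn_duration:total": "synapse_util_metrics_block_db_txn_duration_seconds",
       "synapse_http_server_requests": "synapse_http_server_response_count",
       "synapse_http_server_response_time:count": "synapse_http_server_response_count",
       "synapse_http_server_response_ru_utime:count": "synapse_http_server_response_count",
       "synapse_http_server_response_ru_stime:count": "synapse_http_server_response_count",
       "synapse_http_server_response_db_txn_count:count": "synapse_http_server_response_count",
       "synapse_http_server_response_db_txn_duration:count": "synapse_http_server_response_count",
       "synapse_http_server_response_time:total": "synapse_http_server_response_time_seconds",
       "synapse_http_server_response_ru_utime:total": "synapse_http_server_response_ru_utime_seconds",
       "synapse_http_server_response_ru_stime:total": "synapse_http_server_response_ru_stime_seconds",
       "synapse_http_server_response_db_txn_count:total": "synapse_http_server_response_db_txn_count",
       "synapse_http_server_response_db_txn_duration:total": "synapse_http_server_response_db_txn_duration_seconds",
   }

   # Prometheus metric names match [a-zA-Z_:][a-zA-Z0-9_:]*
   _NAME = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")


   def rewrite_query(query: str) -> str:
       """Replace old metric names in a PromQL string with the new names.

       Whole names are matched, so a name that is a prefix of another is not
       rewritten by mistake. Quoted label values are not skipped.
       """
       return _NAME.sub(lambda m: RENAMED_IN_0_27.get(m.group(0), m.group(0)),
                        query)


   if __name__ == "__main__":
       # rate(synapse_http_server_response_count[5m])
       print(rewrite_query("rate(synapse_http_server_requests[5m])"))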

Standard Metric Names
---------------------

As of Synapse version 0.18.2, the format of the process-wide metrics has been
changed to fit Prometheus standard naming conventions. Additionally, their
units have been changed from milliseconds to seconds.

================================== =============================
New name                           Old name
================================== =============================
process_cpu_user_seconds_total     process_resource_utime / 1000
process_cpu_system_seconds_total   process_resource_stime / 1000
process_open_fds (no 'type' label) process_fds
================================== =============================

The Python-specific counts of garbage collector performance have been renamed.

=========================== ======================
New name                    Old name
=========================== ======================
python_gc_time              reactor_gc_time
python_gc_unreachable_total reactor_gc_unreachable
python_gc_counts            reactor_gc_counts
=========================== ======================

The Twisted-specific reactor metrics have been renamed.

==================================== =====================
New name                             Old name
==================================== =====================
python_twisted_reactor_pending_calls reactor_pending_calls
python_twisted_reactor_tick_time     reactor_tick_time
==================================== =====================
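
When comparing data recorded before and after 0.18.2, old samples need the new
name and, for the CPU time metrics, a division by 1000. A minimal Python sketch
of that mapping; the dictionary and function name are hypothetical. Judging by
the tables above and the 0.31.0 notes, it leaves ``python_gc_time`` and
``python_twisted_reactor_tick_time`` unscaled, since those were converted to
seconds only in 0.31.0.

.. code-block:: python

   from typing import Dict, Tuple

   # Old name -> (new name, divisor applied to old values), transcribed from
   # the three tables above. Only the CPU time metrics change units here.
   RENAMED_IN_0_18_2: Dict[str, Tuple[str, int]] = {
       "process_resource_utime": ("process_cpu_user_seconds_total", 1000),
       "process_resource_stime": ("process_cpu_system_seconds_total", 1000),
       "process_fds": ("process_open_fds", 1),
       "reactor_gc_time": ("python_gc_time", 1),
       "reactor_gc_unreachable": ("python_gc_unreachable_total", 1),
       "reactor_gc_counts": ("python_gc_counts", 1),
       "reactor_pending_calls": ("python_twisted_reactor_pending_calls", 1),
       "reactor_tick_time": ("python_twisted_reactor_tick_time", 1),
   }


   def convert_legacy_sample(name: str, value: float) -> Tuple[str, float]:
       """Map a pre-0.18.2 sample onto its post-0.18.2 name and unit.

       Names not in the table are returned unchanged. Labels are not
       handled, so the old ``type`` label on process_fds is not dropped.
       """
       new_name, divisor = RENAMED_IN_0_18_2.get(name, (name, 1))
       return new_name, value / divisor


   if __name__ == "__main__":
       # ('process_cpu_user_seconds_total', 2.5)
       print(convert_legacy_sample("process_resource_utime", 2500.0))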