Commit Graph

169 Commits

Author SHA1 Message Date
Erik Johnston
9ea408441f Handle exceptions thrown by background tasks
Fixes 
2018-09-20 16:15:21 +01:00
Erik Johnston
d0f6c1ce21 Remove spurious comment 2018-09-14 15:12:36 +01:00
Erik Johnston
0a81038ea0 Add in flight real time metrics for Measure blocks 2018-09-14 15:08:37 +01:00
Erik Johnston
3f6762f0bb isort 2018-08-21 09:38:38 +01:00
Erik Johnston
1058d14127 Make the in flight background process metrics thread safe 2018-08-20 17:27:24 +01:00
Richard van der Hoff
bab94da79c fix metric name 2018-08-07 22:11:45 +01:00
Richard van der Hoff
53bca4690b more metrics for the federation and appservice senders 2018-08-07 19:09:48 +01:00
Richard van der Hoff
03751a6420 Fix some looping_call calls which were broken in
It turns out that looping_call does check the deferred returned by its
callback, and (at least in the case of client_ips), we were relying on this,
and I broke it in .

Update run_as_background_process to return the deferred, and make sure we
return it to clock.looping_call.
2018-07-26 11:48:08 +01:00
Richard van der Hoff
6e3fc657b4 Resource tracking for background processes
This introduces a mechanism for tracking resource usage by background
processes, along with an example of how it will be used.

This will help address , but more importantly will give us better insights
into things which are happening but not being shown up by the request metrics.

We *could* do this with Measure blocks, but:
 - I think having them pulled out as a completely separate metric class will
   make it easier to distinguish top-level processes from those which are
   nested.

 - I want to be able to report on in-flight background processes, and I don't
   think we want to do this for *all* Measure blocks.
2018-07-18 10:50:33 +01:00
Amber Brown
49af402019 run isort 2018-07-09 16:09:20 +10:00
Amber Brown
6350bf925e
Attempt to be more performant on PyPy () 2018-06-28 14:49:57 +01:00
Richard van der Hoff
cbbfaa4be8 Fix description of "python_gc_time" metric 2018-06-21 10:02:42 +01:00
Matthew Hodgson
ccfdaf68be spell gauge correctly 2018-06-16 07:10:34 +01:00
Amber Brown
f116f32ace
add a last seen metric () 2018-06-14 20:26:59 +10:00
Richard van der Hoff
694968fa81 Hopefully, fix LaterGuage error handling 2018-06-04 15:59:14 +01:00
Amber Brown
febe0ec8fd
Run Prometheus on a different port, optionally. () 2018-05-31 19:04:50 +10:00
Matthew Hodgson
ff1bc0a279 pep8 2018-05-29 02:32:15 +01:00
Matthew Hodgson
0a240ad36e disable CPUMetrics if no /proc/self/stat
fixes build on macOS again
2018-05-29 02:23:30 +01:00
Amber Brown
5c40ce3777 invalid syntax :( 2018-05-28 19:16:09 +10:00
Amber Brown
a2eb5db4a0 update metrics to be in seconds 2018-05-28 19:10:27 +10:00
Amber Brown
389dac2c15 pepeightttt 2018-05-23 13:08:59 -05:00
Amber Brown
472a5ec4e2 add back CPU metrics 2018-05-23 13:03:56 -05:00
Amber Brown
b6063631c3 more cleanup 2018-05-22 17:36:20 -05:00
Amber Brown
53cc2cde1f cleanup 2018-05-22 17:32:57 -05:00
Amber Brown
85ba83eb51 fixes 2018-05-22 16:28:23 -05:00
Amber Brown
a8990fa2ec Merge remote-tracking branch 'origin/develop' into 3218-official-prom 2018-05-22 10:50:26 -05:00
Amber Brown
df9f72d9e5 replacing portions 2018-05-21 19:47:37 -05:00
Amber Brown
c60e0d5e02 don't need the resource portion 2018-05-21 17:03:20 -05:00
Amber Brown
f258deffcb remove old metrics libs 2018-05-21 17:01:15 -05:00
Erik Johnston
6d8ec3462d Note that label values can be anything 2018-05-03 16:25:05 +01:00
Erik Johnston
95b6912045 Fix metrics that have integer value labels 2018-05-03 15:51:04 +01:00
Erik Johnston
a41117c63b Make _escape_character take MatchObject 2018-05-02 17:27:27 +01:00
Erik Johnston
32015e1109 Escape label values in prometheus metrics 2018-05-02 16:52:42 +01:00
Erik Johnston
d7bf3a68f0 s/list/tuple 2018-04-12 11:19:04 +01:00
Erik Johnston
4dae4a97ed Track last processed event received_ts 2018-04-11 14:27:09 +01:00
Erik Johnston
92e34615c5 Track where event stream processing have gotten up to 2018-04-11 12:13:40 +01:00
Erik Johnston
ab825aa328 Add GaugeMetric 2018-04-11 12:13:40 +01:00
Vincent Breitmoser
6d7f0f8dd3 Don't disable GC when running on PyPy
PyPy's incminimark GC can't be triggered manually. From what I observed
there are no obvious issues with just letting it run normally. And
unlike CPython, it actually returns unused RAM to the system.

Signed-off-by: Vincent Breitmoser <look@my.amazin.horse>
2018-04-10 11:35:34 +02:00
Richard van der Hoff
88541f9009 Add a metric which increments when a request is received
It's useful to know when there are peaks in incoming requests - which isn't
quite the same as there being peaks in outgoing responses, due to the time
taken to handle requests.
2018-03-09 16:30:26 +00:00
Richard van der Hoff
bc496df192 report metrics on number of cache evictions 2018-02-05 15:34:01 +00:00
Richard van der Hoff
87b7d72760 Add some comments about the reactor tick time metric 2018-01-19 23:51:04 +00:00
Richard van der Hoff
ce236f8ac8 better exception logging in callbackmetrics
when we fail to render a metric, give a clue as to which metric it was
2018-01-18 11:30:49 +00:00
Richard van der Hoff
992018d1c0 mechanism to render metrics with alternative names 2018-01-15 17:04:39 +00:00
Richard van der Hoff
80fa610f9c Add some comments to metrics classes 2018-01-15 16:52:52 +00:00
Richard van der Hoff
19d274085f Make Counter render floats
Prometheus handles all metrics as floats, and sometimes we store non-integer
values in them (notably, durations in seconds), so let's render them as floats
too.

(Note that the standard client libraries also treat Counters as floats.)
2018-01-12 23:49:44 +00:00
Paul "LeoNerd" Evans
2938a00825 Rename the python-specific metrics now the docs claim that we have done 2016-11-03 17:03:52 +00:00
Paul "LeoNerd" Evans
5219f7e060 Since we don't export per-filetype fd counts any more, delete all the code related to that too 2016-11-03 16:41:32 +00:00
Paul "LeoNerd" Evans
93ebeb2aa8 Remove now-unused 'resource' import 2016-11-03 16:37:09 +00:00
Paul "LeoNerd" Evans
c1b077cd19 Now we have new-style metrics don't bother exporting legacy-named process ones 2016-11-03 16:34:16 +00:00
Paul "LeoNerd" Evans
1cc22da600 Set up the process collector during metrics __init__; that way all split-process workers have it 2016-10-27 18:09:34 +01:00
Paul "LeoNerd" Evans
aac13b1f9a Pass the Metrics group into the process collector instead of having it find its own one; this avoids it needing to import from synapse.metrics 2016-10-27 18:08:15 +01:00
Paul "LeoNerd" Evans
ccc1a3d54d Allow creation of a 'subspace' within a Metrics object, returning another one 2016-10-27 18:07:34 +01:00
Paul "LeoNerd" Evans
b01aaadd48 Split callback metric lambda functions down onto their own lines to keep line lengths under 90 2016-10-19 18:26:13 +01:00
Paul "LeoNerd" Evans
1071c7d963 Adjust code for <100 char line limit 2016-10-19 18:23:25 +01:00
Paul "LeoNerd" Evans
6453d03edd Cut the raw /proc/self/stat line up into named fields at collection time 2016-10-19 18:21:40 +01:00
Paul "LeoNerd" Evans
3ae48a1f99 Move the process metrics collector code into its own file 2016-10-19 18:10:24 +01:00
Paul "LeoNerd" Evans
4cedd53224 A slightly neater way to manage metric collector functions 2016-10-19 17:54:09 +01:00
Paul "LeoNerd" Evans
5663137e03 appease pep8 2016-10-19 16:09:42 +01:00
Paul "LeoNerd" Evans
b202531be6 Also guard /proc/self/fds-related code with a suitable psuedoconstant 2016-10-19 15:37:41 +01:00
Paul "LeoNerd" Evans
1b179455fc Guard registration of process-wide metrics by existence of the requisite /proc entries 2016-10-19 15:34:38 +01:00
Paul "LeoNerd" Evans
981f852d54 Add standard process_start_time_seconds metric 2016-10-19 15:05:22 +01:00
Paul "LeoNerd" Evans
def63649df Add standard process_max_fds metric 2016-10-19 15:05:21 +01:00
Paul "LeoNerd" Evans
06f1ad1625 Add standard process_open_fds metric 2016-10-19 15:05:21 +01:00
Paul "LeoNerd" Evans
95fc70216d Add standard process_*_memory_bytes metrics 2016-10-19 15:05:21 +01:00
Paul "LeoNerd" Evans
9b0316c75a Use /proc/self/stat to generate the new process_cpu_*_seconds_total metrics 2016-10-19 15:05:21 +01:00
Paul "LeoNerd" Evans
03c2720940 Export CPU usage metrics also under prometheus-standard metric name 2016-10-19 15:05:21 +01:00
Paul "LeoNerd" Evans
b21b9dbc37 Callback metric values might not just be integers - allow floats 2016-10-19 15:05:15 +01:00
Erik Johnston
7c1a92274c Make psutil optional 2016-08-08 11:12:21 +01:00
Erik Johnston
d36b1d849d Don't explode if we have no snapshots yet 2016-07-20 16:59:52 +01:00
Erik Johnston
66868119dc Add metrics for psutil derived memory usage 2016-07-20 16:00:21 +01:00
Erik Johnston
0f2165ccf4 Don't track total objects as its too expensive to calculate 2016-06-07 17:00:45 +01:00
Erik Johnston
18f0cc7d99 Record some more GC metrics 2016-06-07 16:55:49 +01:00
Erik Johnston
48e65099b5 Also record number of unreachable objects 2016-06-07 13:40:22 +01:00
Erik Johnston
75331c5fca Change the way we do stats 2016-06-07 13:33:13 +01:00
Erik Johnston
8c966fbd51 Merge pull request from matrix-org/erikj/gc_tick
Manually run GC on reactor tick.
2016-06-07 13:18:36 +01:00
Erik Johnston
73c7112433 Change CacheMetrics to be quicker
We change it so that each cache has an individual CacheMetric, instead
of having one global CacheMetric. This means that when a cache tries to
increment a counter it does not need to go through so many indirections.
2016-06-03 11:26:52 +01:00
Erik Johnston
60d53f9e95 Count number of GC collects 2016-05-16 09:34:42 +01:00
Erik Johnston
7d6e89ed22 Add a comment 2016-05-13 16:31:08 +01:00
Erik Johnston
1f1dee94f6 Manually run GC on reactor tick.
This also adds a metric for amount of time spent in GC.
2016-05-09 10:13:25 +01:00
Matthew Hodgson
6c28ac260c copyrights 2016-01-07 04:26:29 +00:00
Mark Haines
709ba99afd Check that /proc/self/fd exists before listing it 2015-09-07 16:45:55 +01:00
Mark Haines
9e4dacd5e7 The maxrss reported by getrusage is in kilobytes, not pages 2015-09-07 16:45:48 +01:00
Erik Johnston
6e7d36a72c Also check for presence of 'threadCallQueue' in reactor 2015-08-18 11:51:08 +01:00
Erik Johnston
d3da63f766 Use more helpful variable names 2015-08-18 11:47:00 +01:00
Erik Johnston
891dfd90bd Fix pending_calls metric to not lie 2015-08-14 15:43:11 +01:00
Erik Johnston
a6c27de1aa Don't time getDelayedCalls 2015-08-13 11:41:57 +01:00
Erik Johnston
ba5d34a832 Add some metrics about the reactor 2015-08-13 11:38:59 +01:00
Paul "LeoNerd" Evans
ef1e019840 Appease pep8 2015-04-01 19:17:38 +01:00
Paul "LeoNerd" Evans
5583e29513 Report process open filehandles in metrics 2015-04-01 19:15:23 +01:00
Paul "LeoNerd" Evans
05a056a409 Appease pyflakes 2015-03-12 16:45:05 +00:00
Paul "LeoNerd" Evans
0eb7e6b9a8 Delete unused import of NOT_READY_YET 2015-03-12 16:39:52 +00:00
Paul "LeoNerd" Evans
128cf2daf7 Appease pep8 2015-03-12 16:24:51 +00:00
Paul "LeoNerd" Evans
2e4f0b2bd7 Replace the @metrics.counted annotations in federation with specifically-written counters and distributions 2015-03-12 16:24:51 +00:00
Paul "LeoNerd" Evans
c1cdd7954d Add an .inc_by() method to CounterMetric; implement DistributionMetric a neater way 2015-03-12 16:24:51 +00:00
Paul "LeoNerd" Evans
493e3fa0ca Don't forbid '_' in metric basenames any more, to allow things like foo_time 2015-03-12 16:24:51 +00:00
Paul "LeoNerd" Evans
f1fbe3e09f Rename TimerMetric to DistributionMetric; as it could count more than just time 2015-03-12 16:24:51 +00:00
Paul "LeoNerd" Evans
cbc0406be8 Export CacheMetric as hits+total, rather than hits+misses, as it's easier to derive hit ratio from that 2015-03-12 16:24:51 +00:00
Paul "LeoNerd" Evans
4d661ec0f3 Remember to emit final linefeed from /metrics page, or Prometheus gets upset 2015-03-12 16:24:51 +00:00
Paul "LeoNerd" Evans
0e847540c3 Prometheus needs "escaped" label values 2015-03-12 16:24:51 +00:00
Paul "LeoNerd" Evans
22b37b75db Kill unused CounterMetric.fetch() method 2015-03-12 16:24:51 +00:00