2014-04-24 16:59:45 -04:00
---
2021-03-13 13:06:18 -05:00
lang: en
2015-04-10 16:17:45 -04:00
layout: doc
2021-06-16 22:56:25 -04:00
permalink: /doc/profiling/
2015-10-11 03:04:59 -04:00
redirect_from:
2015-10-28 18:14:40 -04:00
- /en/doc/profiling/
2015-10-11 03:04:59 -04:00
- /doc/Profiling/
- /wiki/Profiling/
2021-03-13 13:06:18 -05:00
ref: 48
2021-06-17 07:23:57 -04:00
title: Python Profiling
2014-04-24 16:59:45 -04:00
---
2020-06-02 19:32:55 -04:00
This is a python profiling primer.
2014-04-24 16:59:45 -04:00
For the purpose of this document, `qubes-dev` is name of the domain used for postprocessing profiling stats.
2021-03-13 12:03:23 -05:00
## Requirements
2014-04-24 16:59:45 -04:00
2015-09-26 19:00:33 -04:00
~~~
2014-04-24 16:59:45 -04:00
yum install gprof2dot graphviz
git clone http://git.woju.eu/qubes/profiling.git
2015-09-26 19:00:33 -04:00
~~~
2014-04-24 16:59:45 -04:00
2020-06-02 19:32:55 -04:00
If you profile something in dom0, move `Upload.sh` from the repository to dom0:
2014-04-24 16:59:45 -04:00
2015-09-26 19:00:33 -04:00
~~~
2014-04-24 16:59:45 -04:00
mkdir -p ~/profiling
qvm-run -p qubes-dev 'cat ~/profiling/Upload.sh' > ~/profiling/Upload.sh
2015-09-26 19:00:33 -04:00
~~~
2014-04-24 16:59:45 -04:00
2021-03-13 12:03:23 -05:00
- WARNING: this will obviously be running third-party code which is not signed by ITL nor Fedora. You have been warned.
2014-04-25 10:16:45 -04:00
2021-03-13 12:03:23 -05:00
## Workflow
2014-04-24 16:59:45 -04:00
### Identify function responsible for some slow action
2020-06-02 19:32:55 -04:00
You have to select the area in which you suspect less than optimal performance. If you do not narrow the area, graphs may be unreadable.
2014-04-24 16:59:45 -04:00
### Replace suspect function with probe
Replace
2021-03-13 12:03:23 -05:00
```python
2021-03-28 14:58:39 -04:00
def foo(self, bar):
# function content
2021-03-13 12:03:23 -05:00
```
2014-04-24 16:59:45 -04:00
with
2021-03-13 12:03:23 -05:00
```python
2021-03-28 14:58:39 -04:00
def foo(self, *args, * *kwargs):
profile.runctx('self.real_foo(*args, **kwargs)', globals(), locals(),
time.strftime('/home/user/profiling/foo-%Y%m%d-%H%M%S.pstats'))
2014-04-24 16:59:45 -04:00
2021-03-28 14:58:39 -04:00
def real_foo(self, bar):
# function content
2021-03-13 12:03:23 -05:00
```
2014-04-24 16:59:45 -04:00
### Run application
Beware that some functions may be called often. For example `qubesmanager/main.py:update_table` gets run once per second. This will produce one pstat file per second.
2020-06-02 19:32:55 -04:00
Remember to revert your changes to the application afterwards.
2014-04-24 16:59:45 -04:00
### Upload statistics
If you are in dom0:
2015-09-26 19:00:33 -04:00
~~~
2014-04-24 16:59:45 -04:00
cd ~/profiling
./Upload.sh
2015-09-26 19:00:33 -04:00
~~~
2014-04-24 16:59:45 -04:00
2016-01-13 15:41:39 -05:00
### Analyse
2014-04-24 16:59:45 -04:00
2015-09-26 19:00:33 -04:00
~~~
2014-04-24 16:59:45 -04:00
make
2015-09-26 19:00:33 -04:00
~~~
2014-04-24 16:59:45 -04:00
2020-06-02 19:32:55 -04:00
For every `${basename}.pstats` this will produce `${basename}.txt` and `${basename}.svg` . SVG files contain call graphs. Text files contain lists of all functions, sorted by cumulative execution time. You may also try `make all-png` .
2014-04-24 16:59:45 -04:00
2015-09-26 19:00:33 -04:00
~~~
2014-04-24 16:59:45 -04:00
make index.html
2015-09-26 19:00:33 -04:00
~~~
2014-04-24 16:59:45 -04:00
2020-06-02 19:32:55 -04:00
This creates `index.html` with all SVG graphics linked to TXT files, ready for upload.
2014-04-24 16:59:45 -04:00
2015-09-26 19:00:33 -04:00
~~~
2014-04-24 16:59:45 -04:00
make REMOTE=example.com:public_html/qubes/profiling/ upload
2015-09-26 19:00:33 -04:00
~~~
2014-04-24 17:28:12 -04:00
2021-03-13 12:03:23 -05:00
## Example
2014-04-24 17:28:12 -04:00
This example is from `qubes-manager` (`qubesmanager/main.py`).
2015-03-02 00:09:37 -05:00
!["update\_table-20140424-170010.svg" ](//attachment/wiki/Profiling/update_table-20140424-170010.svg )
2014-04-24 17:28:12 -04:00
2020-06-02 19:32:55 -04:00
It is apparent that the problem is around `get_disk_usage` , which calls something via `subprocess.call` . It does this 15 times, probably once per VM.