==============================
Qubes memory manager (qmemman)
==============================

Rationale
---------

Traditionally, Xen VMs are assigned a fixed amount of memory. This is not
optimal, as some VMs may require more memory than initially assigned,
while others underutilize it. Thus, there is a need for a solution capable
of shifting free memory from one VM to another.

The `tmem <https://oss.oracle.com/projects/tmem/>`__ project provides a
“pseudo-RAM” that is assigned on a per-need basis. However, this solution
has some disadvantages:

- It does not provide real RAM, just an interface to copy memory
  to/from fast, RAM-based storage. It is perfect for swap, good for
  file cache, but not ideal for many tasks.

- It is deeply integrated with the Linux kernel. If Qubes were to
  support Windows guests natively, *tmem* would have to be ported to
  Windows, which may be challenging.

Therefore, Qubes uses another solution: the *qmemman* dom0 daemon. All
VMs report their memory usage (via xenstore) to *qmemman*, and it decides
whether to balance memory across domains. The actual mechanism to
add/remove memory from a domain (*xc.domain_set_target_mem*) is already
supported by both PV Linux guests and Windows guests (the latter via PV
drivers).

Similarly, when free Xen memory is needed (for instance, in order to
create a new VM), traditionally the memory is obtained from dom0 only.
When *qmemman* is running, it offers an interface to obtain memory from
all domains.

To sum up, *qmemman* has the following pros and cons.

Pros:

- provides automatic balancing of memory across participating PV and
  HVM domains, based on their memory demand

- works well in practice, with less than 1% CPU consumption in the idle
  case

- simple, concise implementation

Cons:

- the algorithm to calculate the memory requirement for a domain is
  necessarily simple, and may not closely reflect reality

- a VM notifies *qmemman* about memory usage changes no more often than
  10 times per second (to limit CPU overhead in the VM). Thus, there can
  be up to a 0.1s delay until *qmemman* starts to react to the new
  memory requirements

- it takes more time to obtain free Xen memory, as all participating
  domains need to be instructed to yield memory

Interface
---------

*qmemman* listens for the following events:

- writes to the ``/local/domain/<domid>/memory/meminfo`` xenstore key by
  the *meminfo-writer* process in a VM. The content of this key is taken
  from the VM's ``/proc/meminfo`` pseudofile; *meminfo-writer* just
  strips some unused lines from it. Note that *meminfo-writer* updates
  its xenstore key only if the VM memory usage has changed significantly
  since the last update (by default by at least 30MB), to prevent
  flooding with almost identical data.

- commands issued over the Unix socket ``/var/run/qubes/qmemman.sock``.
  Currently, the only recognized command is to free the specified amount
  of memory. The ``QMemmanClient`` class implements the protocol; an
  illustrative client is sketched below.

- if the ``/var/run/qubes/do-not-membalance`` file exists, *qmemman*
  suspends memory balancing. It is used primarily when allocating memory
  for a to-be-created domain, to prevent the balancing algorithm from
  using up the free Xen memory before domain creation is completed.
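
For illustration only, a minimal client in the spirit of ``QMemmanClient``
might look like the sketch below. The exact wire format is defined by
``QMemmanClient``; the request and reply shown here (sending the amount as
text and reading back a status string) are assumptions, not the documented
protocol.

.. code:: python

   import socket

   QMEMMAN_SOCK = "/var/run/qubes/qmemman.sock"

   def request_memory(amount):
       """Ask qmemman to free `amount` bytes of Xen memory (illustration only)."""
       sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
       sock.connect(QMEMMAN_SOCK)
       try:
           sock.sendall(str(amount).encode())   # assumed request format
           reply = sock.recv(1024).decode()     # assumed "OK"/"FAIL" style reply
           return reply.startswith("OK")
       finally:
           sock.close()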

Algorithm basics
----------------

The core per-VM property is ``prefmem``. It denotes the amount of memory
that should be enough for a domain to run efficiently in the near future.
The *qmemman* algorithms never shrink a domain's memory below
``prefmem``. Currently, ``prefmem`` is simply 130% of the current memory
usage in a domain (excluding buffers and cache, but including swap).
Naturally, ``prefmem`` is calculated by *qmemman* based on the
information passed by *meminfo-writer*.
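
As a rough illustration of this rule, and assuming the usual
``/proc/meminfo`` fields (the real qmemman code may use a slightly
different set), ``prefmem`` could be computed like this:

.. code:: python

   def calc_prefmem(meminfo):
       """Sketch of the prefmem rule; meminfo values in kB, as in /proc/meminfo."""
       used = (meminfo["MemTotal"] - meminfo["MemFree"]
               - meminfo["Buffers"] - meminfo["Cached"])
       swap_used = meminfo["SwapTotal"] - meminfo["SwapFree"]
       # 130% of current usage: no buffers/cache, but swap included
       return int(1.3 * (used + swap_used))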

Whenever *meminfo-writer* running in domain A provides new data on memory
usage to *qmemman*, the ``prefmem`` value for A is updated and the
following balance algorithm (*qmemman_algo.balance*) is triggered. Its
output is a list of (domain_id, new_memory_target_to_be_set) pairs (a
simplified sketch in Python follows the list):

1. TOTAL_PREFMEM = sum of ``prefmem`` of all participating domains

2. TOTAL_MEMORY = sum of all memory assigned to participating domains,
   plus Xen free memory

3. if TOTAL_MEMORY > TOTAL_PREFMEM, then redistribute TOTAL_MEMORY
   across all domains proportionally to their ``prefmem``

4. if TOTAL_MEMORY < TOTAL_PREFMEM, then

   1. for all domains whose ``prefmem`` is less than their actual
      memory, shrink them to their ``prefmem``

   2. redistribute the memory reclaimed in the previous step among the
      remaining domains, proportionally to their ``prefmem``
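
The sketch below restates these steps in Python. It is a simplified
illustration of the description above, not the actual
*qmemman_algo.balance* code; the dictionary layout and MB units are
assumptions made for readability.

.. code:: python

   def balance(xen_free, domains):
       """domains: dict domain_id -> {'memory': int, 'prefmem': int}, in MB."""
       total_prefmem = sum(d["prefmem"] for d in domains.values())
       total_memory = xen_free + sum(d["memory"] for d in domains.values())

       if total_memory > total_prefmem:
           # surplus: hand out all memory proportionally to prefmem
           return [(dom, total_memory * d["prefmem"] // total_prefmem)
                   for dom, d in domains.items()]

       # shortage: shrink domains holding more than their prefmem down to it...
       donors = {dom: d for dom, d in domains.items()
                 if d["memory"] > d["prefmem"]}
       needy = {dom: d for dom, d in domains.items() if dom not in donors}
       reclaimed = sum(d["memory"] - d["prefmem"] for d in donors.values())
       needy_prefmem = sum(d["prefmem"] for d in needy.values())

       targets = [(dom, d["prefmem"]) for dom, d in donors.items()]
       # ...and share the reclaimed memory among the remaining domains,
       # proportionally to their prefmem
       targets += [(dom, d["memory"] + reclaimed * d["prefmem"] // needy_prefmem)
                   for dom, d in needy.items()]
       return targets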

In order to avoid too-frequent memory redistribution, it is actually
executed only if one of the conditions below holds:

- the sum of memory size changes for all domains is more than
  MIN_TOTAL_MEMORY_TRANSFER (150MB)

- one of the domains is below its ``prefmem``, and more than
  MIN_MEM_CHANGE_WHEN_UNDER_PREF (15MB) would be added to it

Additionally, the balance algorithm is tuned so that XEN_FREE_MEM_LEFT
(50MB) is always left as Xen free memory, to make coherent memory
allocations in driver domains work.
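
For illustration, the decision could be sketched as follows; the constant
names follow the text above, while the data layout is the same
hypothetical one used in the balance sketch.

.. code:: python

   MIN_TOTAL_MEMORY_TRANSFER = 150        # MB
   MIN_MEM_CHANGE_WHEN_UNDER_PREF = 15    # MB

   def balance_is_worth_it(domains, targets):
       """targets: (domain_id, new_target) pairs produced by the balance step."""
       total_change = sum(abs(new - domains[dom]["memory"])
                          for dom, new in targets)
       if total_change > MIN_TOTAL_MEMORY_TRANSFER:
           return True
       # also act when a domain below its prefmem would gain a meaningful amount
       return any(domains[dom]["memory"] < domains[dom]["prefmem"]
                  and new - domains[dom]["memory"] > MIN_MEM_CHANGE_WHEN_UNDER_PREF
                  for dom, new in targets)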

Whenever *qmemman* is asked to return X megabytes of memory to the Xen
free pool, the following algorithm (*qmemman_algo.balloon*) is executed
(a rough sketch follows the list):

1. find all domains (“donors”) whose actual memory is greater than their
   ``prefmem``

2. calculate how much memory can be reclaimed by shrinking the donors to
   their ``prefmem``. If it is less than X, return an error.

3. shrink the donors, proportionally to their ``prefmem``, so that X MB
   should become free

4. wait BALOON_DELAY (0.1s)

5. if some domains have not given back any memory, remove them from the
   donors list and go to step 2, unless we have already done MAX_TRIES
   (20) iterations (in that case, return an error)
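
The sketch below only illustrates the steps listed above and is not the
real implementation; ``get_memory``, ``set_target`` and ``get_xen_free``
are hypothetical stand-ins for the actual Xen calls, and all values are
assumed to be in MB.

.. code:: python

   import time

   BALOON_DELAY = 0.1   # seconds (constant name as used above)
   MAX_TRIES = 20

   def balloon(needed, prefmem, get_memory, set_target, get_xen_free):
       """prefmem: dict domain_id -> prefmem; helpers are illustrative stubs."""
       for _ in range(MAX_TRIES):
           still_needed = needed - get_xen_free()
           if still_needed <= 0:
               return True
           # 1. donors: domains currently holding more than their prefmem
           donors = {dom: pref for dom, pref in prefmem.items()
                     if get_memory(dom) > pref}
           # 2. give up if shrinking all donors to prefmem cannot cover the request
           if sum(get_memory(dom) - pref
                  for dom, pref in donors.items()) < still_needed:
               return False
           # 3. shrink donors proportionally to their prefmem
           total_pref = sum(donors.values())
           before = {dom: get_memory(dom) for dom in donors}
           for dom, pref in donors.items():
               set_target(dom, before[dom] - still_needed * pref // total_pref)
           # 4. give the guests a moment to balloon down
           time.sleep(BALOON_DELAY)
           # 5. domains that gave nothing back are dropped from the next round
           prefmem = {dom: pref for dom, pref in prefmem.items()
                      if dom not in donors or get_memory(dom) < before[dom]}
       return False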

Notes
-----

Conventional means of viewing the memory available to Qubes will give
incorrect values for ``dom0``, since commands such as ``free`` only show
the memory allocated to ``dom0``. Run the ``xl info`` command in ``dom0``
and read the ``total_memory`` field to see the total memory available to
Qubes.