==============================
Qubes memory manager (qmemman)
==============================

Rationale
---------

Traditionally, Xen VMs are assigned a fixed amount of memory. This is
not optimal, as some VMs may require more memory than they were
initially assigned, while others underutilize it. Thus, there is a need
for a solution capable of shifting free memory from one VM to another.

The `tmem <https://oss.oracle.com/projects/tmem/>`__ project provides a
“pseudo-RAM” that is assigned on a per-need basis. However, this
solution has some disadvantages:

- It does not provide real RAM, just an interface to copy memory
  to/from fast, RAM-based storage. It is perfect for swap, good for
  the file cache, but not ideal for many other tasks.

- It is deeply integrated with the Linux kernel. When Qubes supports
  Windows guests natively, *tmem* would have to be ported to Windows,
  which may be challenging.

Therefore, Qubes uses another solution: the *qmemman* dom0 daemon. All
VMs report their memory usage (via xenstore) to *qmemman*, which
decides whether to rebalance memory across domains. The actual
mechanism to add/remove memory from a domain
(*xc.domain_set_target_mem*) is already supported by both PV Linux
guests and Windows guests (the latter via PV drivers).

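For illustration, here is a minimal sketch of how a new memory target
might be applied from dom0, assuming the ``xen.lowlevel.xc`` Python
bindings and a kB-based *domain_set_target_mem* signature (both are
assumptions; the qmemman source is authoritative):

.. code:: python

   # Sketch only: assumes xen.lowlevel.xc is importable in dom0 and that
   # domain_set_target_mem(domid, target_kb) expects the target in kB.
   import xen.lowlevel.xc

   xc = xen.lowlevel.xc.xc()

   def set_memory_target(domid, target_bytes):
       """Ask Xen to grow or shrink a domain toward target_bytes."""
       xc.domain_set_target_mem(domid, target_bytes // 1024)
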
Similarly, when there is a need for free Xen memory (for instance, in
order to create a new VM), traditionally the memory is obtained from
dom0 only. When *qmemman* is running, it offers an interface to obtain
memory from all domains.

To sum up, here are the pros and cons of *qmemman*. Pros:

- provides automatic balancing of memory across participating PV and
  HVM domains, based on their memory demand

- works well in practice, with less than 1% CPU consumption in the
  idle case

- simple, concise implementation

Cons:

- the algorithm to calculate the memory requirement for a domain is
  necessarily simple, and may not closely reflect reality

- *qmemman* is notified by a VM about memory usage changes no more than
  10 times per second (to limit CPU overhead in the VM). Thus, there
  can be up to a 0.1s delay before *qmemman* starts to react to new
  memory requirements

- it takes more time to obtain free Xen memory, as all participating
  domains need to be instructed to yield memory

Interface
---------

*qmemman* listens for the following events:

- writes to the ``/local/domain/<domid>/memory/meminfo`` xenstore key
  by the *meminfo-writer* process in a VM. The content of this key is
  taken from the VM’s ``/proc/meminfo`` pseudofile; *meminfo-writer*
  just strips some unused lines from it. Note that *meminfo-writer*
  writes its xenstore key only if the VM memory usage has changed
  significantly enough since the last update (by default 30MB), to
  avoid flooding *qmemman* with almost identical data (see the first
  sketch after this list)

- commands issued over the Unix socket ``/var/run/qubes/qmemman.sock``.
  Currently, the only command recognized is to free the specified
  amount of memory. The QMemmanClient class implements the protocol
  (see the second sketch after this list).

- if the ``/var/run/qubes/do-not-membalance`` file exists, *qmemman*
  suspends memory balancing. It is primarily used when allocating
  memory for a to-be-created domain, to prevent the balancing algorithm
  from using up the free Xen memory before the domain creation is
  completed.

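The first sketch below illustrates, in Python, what *meminfo-writer*
does conceptually: strip ``/proc/meminfo`` down to the interesting
fields and publish them to xenstore only when usage has moved by more
than the 30MB threshold. The field list, the relative xenstore path,
and the ``xen.lowlevel.xs`` call are assumptions made for illustration;
the real *meminfo-writer* is a separate small program shipped in the
VM.

.. code:: python

   # Illustration only: not the real meminfo-writer.  Assumes the
   # xen.lowlevel.xs bindings with xs().write(transaction, path, value) and
   # that a relative path resolves inside the writing VM's own xenstore tree.
   import xen.lowlevel.xs

   UPDATE_THRESHOLD_KB = 30 * 1024      # "significant change" threshold (~30MB)
   FIELDS = ('MemTotal', 'MemFree', 'Buffers', 'Cached',
             'SwapTotal', 'SwapFree')

   def read_meminfo():
       """Return the subset of /proc/meminfo fields qmemman cares about (kB)."""
       values = {}
       with open('/proc/meminfo') as f:
           for line in f:
               name, rest = line.split(':', 1)
               if name in FIELDS:
                   values[name] = int(rest.split()[0])
       return values

   def used_kb(mi):
       return (mi['MemTotal'] - mi['MemFree'] - mi['Buffers'] - mi['Cached']
               + mi['SwapTotal'] - mi['SwapFree'])

   def publish_if_changed(last, current, xs=None):
       """Write the stripped meminfo to xenstore only on a significant change."""
       if last is not None and abs(used_kb(current) - used_kb(last)) < UPDATE_THRESHOLD_KB:
           return last
       xs = xs or xen.lowlevel.xs.xs()
       data = '\n'.join('%s: %d kB' % (k, current[k]) for k in FIELDS)
       xs.write('', 'memory/meminfo', data)   # /local/domain/<domid>/memory/meminfo
       return current
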
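The second sketch shows the client side of the socket interface. The
socket path comes from the text above; the ``FREE:<bytes>`` wire format
and the ``OK`` reply are assumptions made for illustration only, as the
authoritative protocol lives in the QMemmanClient class:

.. code:: python

   # Sketch of a qmemman socket client.  The wire format used here is an
   # assumption for illustration; see the QMemmanClient class for the real one.
   import socket

   QMEMMAN_SOCK = '/var/run/qubes/qmemman.sock'

   def request_free_memory(bytes_needed):
       """Ask qmemman to free `bytes_needed` of Xen memory; True on success."""
       s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
       try:
           s.connect(QMEMMAN_SOCK)
           s.sendall(('FREE:%d\n' % bytes_needed).encode())
           return s.recv(16).strip() == b'OK'
       finally:
           s.close()
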
Algorithms basics
-----------------

The core VM property is ``prefmem``. It denotes the amount of memory
that should be enough for a domain to run efficiently in the near
future. No *qmemman* algorithm will ever shrink a domain's memory below
``prefmem``. Currently, ``prefmem`` is simply 130% of the current
memory usage in a domain (excluding buffers and cache, but including
swap). Naturally, ``prefmem`` is calculated by *qmemman* based on the
information passed by *meminfo-writer*.

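As a sketch of this definition, assuming the usual ``/proc/meminfo``
field names (the exact bookkeeping in qmemman may differ):

.. code:: python

   # Sketch: prefmem = 130% of current usage, where usage excludes buffers
   # and cache but includes swap.  Field names follow /proc/meminfo.
   PREF_FACTOR = 1.3

   def calc_prefmem(meminfo_kb):
       """meminfo_kb: dict of /proc/meminfo fields (in kB) reported by the VM."""
       used = (meminfo_kb['MemTotal'] - meminfo_kb['MemFree']
               - meminfo_kb['Buffers'] - meminfo_kb['Cached'])
       swap_used = meminfo_kb['SwapTotal'] - meminfo_kb['SwapFree']
       return int(PREF_FACTOR * (used + swap_used))
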
Whenever *meminfo-writer* running in domain A provides new data on
memory usage to *qmemman*, the ``prefmem`` value for A is updated and
the following balance algorithm (*qmemman_algo.balance*) is triggered.
Its output is a list of (domain_id, new_memory_target_to_be_set) pairs:

1. TOTAL_PREFMEM = sum of ``prefmem`` of all participating domains

2. TOTAL_MEMORY = sum of all memory assigned to participating domains,
   plus Xen free memory

3. if TOTAL_MEMORY > TOTAL_PREFMEM, then redistribute TOTAL_MEMORY
   across all domains proportionally to their ``prefmem``

4. if TOTAL_MEMORY < TOTAL_PREFMEM, then

   1. for all domains whose ``prefmem`` is less than their actual
      memory, shrink them to their ``prefmem``

   2. redistribute the memory reclaimed in the previous step among the
      rest of the domains, proportionally to their ``prefmem``

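A condensed sketch of these steps (the data structures and names are
illustrative, not the actual *qmemman_algo* code, and the low-memory
branch is simplified):

.. code:: python

   # Condensed sketch of the balance step; `domains` maps domain_id to a
   # record with .memory (currently assigned bytes) and .prefmem.
   def balance(xen_free_memory, domains):
       total_prefmem = sum(d.prefmem for d in domains.values())
       total_memory = xen_free_memory + sum(d.memory for d in domains.values())

       if total_memory > total_prefmem:
           # Enough for everyone: hand out all memory proportionally to prefmem.
           return [(domid, int(total_memory * d.prefmem / total_prefmem))
                   for domid, d in domains.items()]

       # Not enough: shrink domains sitting above their prefmem down to it,
       # and share the reclaimed memory among the remaining domains,
       # proportionally to their prefmem.
       donors = {i: d for i, d in domains.items() if d.memory > d.prefmem}
       others = {i: d for i, d in domains.items() if i not in donors}
       reclaimed = sum(d.memory - d.prefmem for d in donors.values())
       others_prefmem = sum(d.prefmem for d in others.values())

       targets = [(i, d.prefmem) for i, d in donors.items()]
       targets += [(i, d.memory + int(reclaimed * d.prefmem / others_prefmem))
                   for i, d in others.items()]
       return targets
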
In order to avoid too frequent memory redistribution, it is actually
executed only if one of the conditions below holds:

- the sum of memory size changes for all domains is more than
  MIN_TOTAL_MEMORY_TRANSFER (150MB)

- one of the domains is below its ``prefmem``, and more than
  MIN_MEM_CHANGE_WHEN_UNDER_PREF (15MB) would be added to it

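A small sketch of that check (the helper name and the exact accounting
are hypothetical; only the constants mirror the values above):

.. code:: python

   # Hypothetical helper mirroring the two conditions above; byte values
   # match the constants named in the text.
   MIN_TOTAL_MEMORY_TRANSFER = 150 * 1024 * 1024
   MIN_MEM_CHANGE_WHEN_UNDER_PREF = 15 * 1024 * 1024

   def worth_redistributing(domains, targets):
       """domains: id -> record with .memory/.prefmem; targets: (id, new_target)."""
       total_change = sum(abs(new - domains[i].memory) for i, new in targets)
       if total_change > MIN_TOTAL_MEMORY_TRANSFER:
           return True
       # Also act when a domain below its prefmem would gain enough memory.
       return any(domains[i].memory < domains[i].prefmem and
                  new - domains[i].memory > MIN_MEM_CHANGE_WHEN_UNDER_PREF
                  for i, new in targets)
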
Additionally, the balance algorithm is tuned so that XEN_FREE_MEM_LEFT
(50MB) is always left as Xen free memory, to make coherent memory
allocations in driver domains work.

Whenever *qmemman* is asked to return X megabytes of memory to the Xen
free pool, the following algorithm (*qmemman_algo.balloon*) is
executed:

1. find all domains (“donors”) whose actual memory is greater than
   their ``prefmem``

2. calculate how much memory can be reclaimed by shrinking the donors
   to their ``prefmem``. If it is less than X, return an error.

3. shrink the donors, proportionally to their ``prefmem``, so that X MB
   should become free

4. wait BALOON_DELAY (0.1s)

5. if some domain has not given back any memory, remove it from the
   donors list and go to step 2, unless we have already done MAX_TRIES
   (20) iterations (in which case return an error)

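A condensed sketch of this loop, with stand-in callables for the actual
Xen calls (the names and the success check are illustrative, not the
real *qmemman_algo.balloon* code):

.. code:: python

   import time

   BALOON_DELAY = 0.1    # seconds; spelling follows the constant named above
   MAX_TRIES = 20

   def balloon(needed, domains, set_target, current_mem, xen_free_memory):
       """Try to make `needed` bytes of Xen memory free; True on success.

       domains: id -> record with .prefmem; set_target(id, bytes),
       current_mem(id) and xen_free_memory() are illustrative stand-ins
       for the real Xen/xenstore operations.
       """
       donors = [i for i in domains if current_mem(i) > domains[i].prefmem]
       for _ in range(MAX_TRIES):
           surplus = {i: current_mem(i) - domains[i].prefmem for i in donors}
           if sum(surplus.values()) < needed:
               return False   # shrinking every donor to prefmem is not enough
           total_prefmem = sum(domains[i].prefmem for i in donors)
           before = {i: current_mem(i) for i in donors}
           for i in donors:
               # Shrink proportionally to prefmem, but never below prefmem.
               share = min(surplus[i],
                           needed * domains[i].prefmem // total_prefmem)
               set_target(i, before[i] - share)
           time.sleep(BALOON_DELAY)
           if xen_free_memory() >= needed:
               return True
           # Drop donors that did not give anything back, then retry.
           donors = [i for i in donors if current_mem(i) < before[i]]
       return False
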
Notes
-----

Conventional means of viewing the memory available to Qubes will give
incorrect values for ``dom0``, since commands such as ``free`` only
show the memory allocated to ``dom0`` itself. To see the total memory
available to Qubes, run ``xl info`` in ``dom0`` and read the
``total_memory`` field.