Move all developer documentation to main index

QubesOS/qubes-issues#2070
This commit is contained in:
Andrew David Wong 2016-06-20 13:56:20 -07:00
parent 254f5e71a0
commit aadb35d62f
No known key found for this signature in database
GPG key ID: 8CE137352A019A17
39 changed files with 46 additions and 3 deletions

36
system/architecture.md Normal file
View file

@ -0,0 +1,36 @@
---
layout: doc
title: Architecture
permalink: /doc/architecture/
redirect_from:
- /doc/qubes-architecture/
- /en/doc/qubes-architecture/
- /doc/QubesArchitecture/
- /wiki/QubesArchitecture/
---
Qubes Architecture Overview
===========================
Qubes implements a Security by Isolation approach. To do this, Qubes utilizes virtualization technology in order to isolate various programs from each other and even to sandbox many system-level components, such as networking and storage subsystems, so that the compromise of any of these programs or components does not affect the integrity of the rest of the system.
Qubes lets the user define many security domains, which are implemented as lightweight Virtual Machines (VMs), or “AppVMs.” For example, the user can have “personal,” “work,” “shopping,” “bank,” and “random” AppVMs and can use the applications within those VMs just as if they were executing on the local machine. At the same time, however, these applications are well isolated from each other. Qubes also supports secure copy-and-paste and file sharing between the AppVMs, of course.
[![qubes-arch-diagram-1.png](/attachment/wiki/QubesArchitecture/qubes-arch-diagram-1.png)](/attachment/wiki/QubesArchitecture/qubes-arch-diagram-1.png)
(Note: In the diagram above, "Storage domain" is actually a USB domain.)
Key Architecture features
-------------------------
- Based on a secure bare-metal hypervisor (Xen)
- Networking code sand-boxed in an unprivileged VM (using IOMMU/VT-d)
- USB stacks and drivers sand-boxed in an unprivileged VM (currently experimental feature)
- No networking code in the privileged domain (dom0)
- All user applications run in “AppVMs,” lightweight VMs based on Linux
- Centralized updates of all AppVMs based on the same template
- Qubes GUI virtualization presents applications as if they were running locally
- Qubes GUI provides isolation between apps sharing the same desktop
- Secure system boot based (optional)
[Architecture Spec v0.3 [PDF]](/attachment/wiki/QubesArchitecture/arch-spec-0.3.pdf) (The original 2009 document that started this all...)

400
system/gui.md Normal file
View file

@ -0,0 +1,400 @@
---
layout: doc
title: GUI
permalink: /doc/gui/
redirect_from:
- /en/doc/gui/
- /en/doc/gui-docs/
- /doc/GUIdocs/
- /wiki/GUIdocs/
---
Qubes GUI protocol
==================
qubes_gui and qubes_guid processes
------------------------------------
All AppVM X applications connect to local (running in AppVM) Xorg server, that uses the following "hardware" drivers:
- *dummy_drv* - video driver, that paints onto a framebuffer located in RAM, not connected to real hardware
- *qubes_drv* - it provides a virtual keyboard and mouse (in fact, more, see below)
For each AppVM, there is a pair of *qubes_gui* (running in AppVM) and *qubes_guid* (running in dom0) processes, connected over vchan. Main responsibilities of *qubes_gui* are:
- call XCompositeRedirectSubwindows on the root window, so that each window has its own composition buffer
- instruct the local Xorg server to notify it about window creation, configuration and damage events; pass information on these events to dom0
- receive information about keyboard and mouse events from dom0, tell *qubes_drv* to fake appropriate events
- receive information about window size/position change, apply them to the local window
Main responsibilities of *qubes_guid* are:
- create a window in dom0 whenever an information on window creation in AppVM is received from *qubes_gui*
- whenever the local window receives XEvent, pass information on it to AppVM (particularly, mouse and keyboard data)
- whenever AppVM signals damage event, tell local Xorg server to repaint a given window fragment
- receive information about window size/position change, apply them to the local window
Note that keyboard and mouse events are passed to AppVM only if a window belonging to this AppVM has focus. AppVM has no way to get information on keystrokes fed to other AppVMs (e.g. XTEST extension will report the status of local AppVM keyboard only), nor synthesize and pass events to other AppVMs.
Window content updates implementation
-------------------------------------
Typical remote desktop applications, like *vnc*, pass the information on all changed window content in-band (say, over tcp). As the channel has limited throughput, this impacts video performance. In case of Qubes, *qubes_gui* does not transfer all changed pixels via vchan. Instead, for each window, upon its creation or size change, *qubes_gui*
- asks *qubes_drv* driver for the list of physical memory frames that hold the composition buffer of a window
- pass this information via `MFNDUMP` message to *qubes_guid* in dom0
Now, *qubes_guid* has to tell dom0 Xorg server about the location of the buffer. There is no supported way (e.g. Xorg extension) to do this zero-copy style. The following method is used in Qubes:
- in dom0, the Xorg server is started with *LD_PRELOAD*-ed library named *shmoverride.so*. This library hooks all function calls related to shared memory.
- *qubes_guid* creates a shared memory segment, and then tells Xorg to attach it via *MIT-SHM* extension
- when Xorg tries to attach the segment (via glibc *shmat*) *shmoverride.so* intercepts this call and instead maps AppVM memory via *xc_map_foreign_range*
- since then, we can use MIT-SHM functions, e.g. *XShmPutImage* to draw onto a dom0 window. *XShmPutImage* will paint with DRAM speed; actually, many drivers use DMA for this.
The important detail is that *xc_map_foreign_range* verifies that a given mfn range actually belongs to a given domain id (and the latter is provided by trusted *qubes_guid*). Therefore, rogue AppVM cannot gain anything by passing crafted mnfs in the `MFNDUMP` message.
To sum up, this solution has the following benefits:
- window updates at DRAM speed
- no changes to Xorg code
- minimal size of the supporting code
![gui.png](/attachment/wiki/GUIdocs/gui.png)
Security markers on dom0 windows
--------------------------------
It is important that user knows which AppVM a given window belongs to. This prevents an attack when a rogue AppVM paints a window pretending to belong to other AppVM or dom0, and tries to steal e.g. passwords.
In Qubes, the custom window decorator is used, that paints a colourful frame (the colour is determined during AppVM creation) around decorated windows. Additionally, window title always starts with **[name of the AppVM]**. If a window has a *override_redirect* attribute, meaning that it should not be treated by a window manager (typical case is menu windows), *qubes_guid* draws a two-pixel colourful frame around it manually.
Clipboard sharing implementation
--------------------------------
Certainly, it would be insecure to allow AppVM to read/write clipboard of other AppVMs unconditionally. Therefore, the following mechanism is used:
- there is a "qubes clipboard" in dom0 - its contents is stored in a regular file in dom0.
- if user wants to copy local AppVM clipboard to qubes clipboard, she must focus on any window belonging to this AppVM, and press **Ctrl-Shift-C**. This combination is trapped by *qubes-guid*, and `CLIPBOARD_REQ` message is sent to AppVM. *qubes-gui* responds with *CLIPBOARD_DATA* message followed by clipboard contents.
- user focuses on other AppVM window, presses **Ctrl-Shift-V**. This combination is trapped by *qubes-guid*, and `CLIPBOARD_DATA` message followed by qubes clipboard contents is sent to AppVM; *qubes_gui* copies data to the local clipboard, and then user can paste its contents to local applications normally.
This way, user can quickly copy clipboards between AppVMs. This action is fully controlled by the user, it cannot be triggered/forced by any AppVM.
*qubes_gui* and *qubes_guid* code notes
-----------------------------------------
Both applications are structures similarly. They use *select* function to wait for any of the two event sources
- messages from the local X server
- messages from the vchan connecting to the remote party
The XEvents are handled by *handle_xevent_eventname* function, messages are handled by *handle_messagename* function. One should be very careful when altering the actual *select* loop - e.g. both XEvents and vchan messages are buffered, meaning that *select* will not wake for each message.
If one changes the number/order/signature of messages, one should increase the *QUBES_GUID_PROTOCOL_VERSION* constant in *messages.h* include file.
*qubes_guid* writes debugging information to */var/log/qubes/qubes.domain_id.log* file; *qubes_gui* writes debugging information to */var/log/qubes/gui_agent.log*. Include these files when reporting a bug.
AppVM -> dom0 messages
-----------------------
Proper handling of the below messages is security-critical. Observe that beside two messages (`CLIPBOARD` and `MFNDUMP`) the rest have fixed size, so the parsing code can be small.
The *override_redirect* window attribute is explained at [Override Redirect Flag](http://tronche.com/gui/x/xlib/window/attributes/override-redirect.html). The *transient_for* attribute is explained at [Transient_for attribute](http://tronche.com/gui/x/icccm/sec-4.html#WM_TRANSIENT_FOR).
Window manager hints and flags are described at [http://standards.freedesktop.org/wm-spec/latest/](http://standards.freedesktop.org/wm-spec/latest/), especially part about `_NET_WM_STATE`.
Each message starts with the following header
~~~
struct msghdr {
uint32_t type;
uint32_t window;
/* This field is intended for use by gui_agents to skip unknown
* messages from the (trusted) guid. Guid, on the other hand,
* should never rely on this field to calculate the actual len of
* message to be read, as the (untrusted) agent can put here
* whatever it wants! */
uint32_t untrusted_len;
};
~~~
The header is followed by message-specific data.
<table class="table">
<tr>
<th>Message name</th>
<th>Structure after header</th>
<th>Action</th>
</tr>
<tr>
<td>MSG_CLIPBOARD_DATA</td>
<td>amorphic blob (length determined by the "window" field)</td>
<td>Store the received clipboard content (not parsing in any way)</td>
</tr>
<tr>
<td>MSG_CREATE</td>
<td><pre>
struct msg_create {
uint32_t x;
uint32_t y;
uint32_t width;
uint32_t height;
uint32_t parent;
uint32_t override_redirect;
};
</pre>
</td>
<td>Create a window with given parameters</td>
</tr>
<tr>
<td>MSG_DESTROY</td>
<td>None</td>
<td>Destroy a window</td>
</tr>
<tr>
<td>MSG_MAP</td>
<td><pre>
struct msg_map_info {
uint32_t transient_for;
uint32_t override_redirect;
};
</pre></td>
<td>Map a window with given parameters</td>
</tr>
<tr>
<td>MSG_UNMAP</td>
<td>None</td>
<td>Unmap a window</td>
</tr>
<tr>
<td>MSG_CONFIGURE</td>
<td><pre>
struct msg_configure {
uint32_t x;
uint32_t y;
uint32_t width;
uint32_t height;
uint32_t override_redirect;
};
</pre></td>
<td>Change window position/size/type</td>
</tr>
<tr>
<td>MSG_MFNDUMP</td>
<td><pre>
struct shm_cmd {
uint32_t shmid;
uint32_t width;
uint32_t height;
uint32_t bpp;
uint32_t off;
uint32_t num_mfn;
uint32_t domid;
uint32_t mfns[0];
};
</pre></td>
<td>Retrieve the array of mfns that constitute the composition buffer of a remote window.
The "num_mfn" 32bit integers follow the shm_cmd structure; "off" is the offset of the composite buffer start in the first frame; "shmid" and "domid" parameters are just placeholders (to be filled by *qubes_guid*), so that we can use the same structure when talking to *shmoverride.so*|
</td>
</tr>
<tr>
<td>MSG_SHMIMAGE</td>
<td><pre>
struct msg_shmimage {
uint32_t x;
uint32_t y;
uint32_t width;
uint32_t height;
};
</pre> </td>
<td>Repaint the given window fragment</td>
</tr>
<tr>
<td>MSG_WMNAME</td>
<td><pre>
struct msg_wmname {
char data[128];
};
</pre></td>
<td>Set the window name; only printable characters are allowed</td>
</tr>
<tr>
<td>MSG_DOCK</td>
<td>None</td>
<td>Dock the window in the tray</td>
</tr>
<tr>
<td>MSG_WINDOW_HINTS</td>
<td><pre>
struct msg_window_hints {
uint32_t flags;
uint32_t min_width;
uint32_t min_height;
uint32_t max_width;
uint32_t max_height;
uint32_t width_inc;
uint32_t height_inc;
uint32_t base_width;
uint32_t base_height;
};
</pre> </td>
<td>Size hints for window manager</td>
</tr>
<tr>
<td>MSG_WINDOW_FLAGS</td>
<td><pre>
struct msg_window_flags {
uint32_t flags_set;
uint32_t flags_unset;
};
</pre> </td>
<td>Change window state request; fields contains bitmask which flags request to be set and which unset</td>
</tr>
</table>
Dom0 -> AppVM messages
-----------------------
Proper handling of the below messages is NOT security-critical.
Each message starts with the following header
~~~
struct msghdr {
uint32_t type;
uint32_t window;
};
~~~
The header is followed by message-specific data.
`KEYPRESS`, `BUTTON`, `MOTION`, `FOCUS` messages pass information extracted from dom0 XEvent; see appropriate event documentation.
<table class="table">
<tr>
<th>Message name</th>
<th>Structure after header</th>
<th>Action</th>
</tr>
<tr>
<td>MSG_KEYPRESS</td>
<td><pre>
struct msg_keypress {
uint32_t type;
uint32_t x;
uint32_t y;
uint32_t state;
uint32_t keycode;
};
</pre> </td>
<td>Tell *qubes_drv* driver to generate a keypress</td>
</tr>
<tr>
<td>MSG_BUTTON</td>
<td><pre>
struct msg_button {
uint32_t type;
uint32_t x;
uint32_t y;
uint32_t state;
uint32_t button;
};
</pre> </td>
<td>Tell *qubes_drv* driver to generate mouseclick</td>
</tr>
<tr>
<td>MSG_MOTION</td>
<td><pre>
struct msg_motion {
uint32_t x;
uint32_t y;
uint32_t state;
uint32_t is_hint;
};
</pre> </td>
<td>Tell *qubes_drv* driver to generate motion event</td>
</tr>
<tr>
<td>MSG_CONFIGURE</td>
<td><pre>
struct msg_configure {
uint32_t x;
uint32_t y;
uint32_t width;
uint32_t height;
uint32_t override_redirect;
};
</pre> </td>
<td>Change window position/size/type</td>
</tr>
<tr>
<td>MSG_MAP</td>
<td><pre>
struct msg_map_info {
uint32_t transient_for;
uint32_t override_redirect;
};
</pre> </td>
<td>Map a window with given parameters</td>
</tr>
<tr>
<td>MSG_CLOSE</td>
<td>None</td>
<td>send wmDeleteMessage to the window</td>
</tr>
<tr>
<td>MSG_CROSSING</td>
<td><pre>
struct msg_crossing {
uint32_t type;
uint32_t x;
uint32_t y;
uint32_t state;
uint32_t mode;
uint32_t detail;
uint32_t focus;
};
</pre> </td>
<td>Notify window about enter/leave event</td>
</tr>
<tr>
<td>MSG_FOCUS</td>
<td><pre>
struct msg_focus {
uint32_t type;
uint32_t mode;
uint32_t detail;
};
</pre> </td>
<td>Raise a window, XSetInputFocus</td>
</tr>
<tr>
<td>MSG_CLIPBOARD_REQ</td>
<td>None</td>
<td>Retrieve the local clipboard, pass contents to gui-daemon</td>
</tr>
<tr>
<td>MSG_CLIPBOARD_DATA</td>
<td>amorphic blob</td>
<td>Insert the received data into local clipboard</td>
</tr>
<tr>
<td>MSG_EXECUTE</td>
<td>Obsolete</td>
<td>Obsolete, unused</td>
</tr>
<tr>
<td>MSG_KEYMAP_NOTIFY</td>
<td> unsigned char remote_keys[32]; </td>
<td>Synchronize the keyboard state (key pressed/released) with dom0</td>
</tr>
<tr>
<td>MSG_WINDOW_FLAGS</td>
<td><pre>
struct msg_window_flags {
uint32_t flags_set;
uint32_t flags_unset;
};
</pre> </td>
<td>Window state change confirmation</td>
</tr>
</table>

59
system/networking.md Normal file
View file

@ -0,0 +1,59 @@
---
layout: doc
title: Networking
permalink: /doc/networking/
redirect_from:
- /doc/qubes-net/
- /en/doc/qubes-net/
- /doc/QubesNet/
- /wiki/QubesNet/
---
VM network in Qubes
===================
Overall description
-------------------
In Qubes, the standard Xen networking is used, based on backend driver in the driver domain and frontend drivers in VMs. In order to eliminate layer 2 attacks originating from a compromised VM, routed networking is used instead of the default bridging of `vif` devices. The default *vif-route* script had some deficiencies (requires `eth0` device to be up, and sets some redundant iptables rules), therefore the custom *vif-route-qubes* script is used.
The IP address of `eth0` interface in AppVM, as well as two IP addresses to be used as nameservers (`DNS1` and `DNS2`), are passed via xenstore to AppVM during its boot (thus, there is no need for DHCP daemon in the network driver domain). `DNS1` and `DNS2` are private addresses; whenever an interface is brought up in the network driver domain, the */usr/lib/qubes/qubes\_setup\_dnat\_to\_ns* script sets up the DNAT iptables rules translating `DNS1` and `DNS2` to the newly learned real dns servers. This way AppVM networking configuration does not need to be changed when configuration in the network driver domain changes (e.g. user switches to a different WLAN). Moreover, in the network driver domain, there is no DNS server either, and consequently there are no ports open to the VMs.
Routing tables examples
-----------------------
VM routing table is simple:
||
|Destination|Gateway|Genmask|Flags|Metric|Ref|Use|Iface|
|0.0.0.0|0.0.0.0|0.0.0.0|U|0|0|0|eth0|
Network driver domain routing table is a bit longer:
||
|Destination|Gateway|Genmask|Flags|Metric|Ref|Use|Iface|
|10.2.0.16|0.0.0.0|255.255.255.255|UH|0|0|0|vif4.0|
|10.2.0.7|0.0.0.0|255.255.255.255|UH|0|0|0|vif10.0|
|10.2.0.9|0.0.0.0|255.255.255.255|UH|0|0|0|vif9.0|
|10.2.0.8|0.0.0.0|255.255.255.255|UH|0|0|0|vif8.0|
|10.2.0.12|0.0.0.0|255.255.255.255|UH|0|0|0|vif3.0|
|192.168.0.0|0.0.0.0|255.255.255.0|U|1|0|0|eth0|
|0.0.0.0|192.168.0.1|0.0.0.0|UG|0|0|0|eth0|
Location of the network driver domain
-------------------------------------
Traditionally, the network driver domain is dom0. This design means that a lot of code (networking stack, drivers) running in the all-powerful domain is exposed to potential attack. Although it is supported (one can execute *qvm-set-default-netvm dom0*), it is strongly discouraged.
Instead, a dedicated domain called `netvm` should be used. In order to activate it, one needs to install the `qubes-servicevm-netvm` rpm package, and enable it via command *qvm-set-default-netvm netvm*. This domain will be assigned all PCI devices that are network cards. One can interact with the *Networkmanager* daemon running in `netvm` in the same way as with any other VM GUI application (with one detail that *nm-applet* requires a system tray, thus one needs to start it via "KDEMenu-\>Applications-\>Netvm-\>Show Tray").
Note that in order to isolate `netvm` properly, the platform must support VTd and it must be activated. Otherwise, compromised `netvm` can use DMA to get control over dom0 and even the hypervisor.
When using `netvm`, there is no network connectivity in dom0. This is the desired configuration - it eliminates all network-bourne attacks. Observe that dom0 is meant to be used for administrative tasks only, and (with one exception) they do not need network. Anything not related to system administration should be done in one of AppVMs.
The above-mentioned exception is the system packages upgrade. Again, one must not install random applications in dom0, but there is a need to e.g. upgrade existing packages. While one may argue that the new packages could be downloaded on a separate machine and copied to dom0 via a pendrive, this solution has its own problems. Therefore, the advised method to temporarily grant network connectivity to dom0 is to use *qvm-dom0-network-via-netvm up* command. It will pause all running VMs (so that they can do no harm to dom0) and connect dom0 to netvm network just like another AppVM. Having completed package upgrade, execute *qvm-dom0-network-via-netvm down* to revert to the normal state.
Firewall and Proxy VMs
----------------------
TODO

View file

@ -0,0 +1,58 @@
---
layout: doc
title: Security-critical Code
permalink: /doc/security-critical-code/
redirect_from:
- /en/doc/security-critical-code/
- /doc/SecurityCriticalCode/
- /wiki/SecurityCriticalCode/
- /trac/wiki/SecurityCriticalCode/
---
Security-Critical Code in Qubes OS
==================================
Below is a list of security-critical (AKA trusted) code in Qubes OS. A successful attack against any of those might allow to compromise the Qubes OS security. This code can be thought of as of a Trusted Computing Base (TCB) of Qubes OS. The goal of the project has been to minimize the amount of this trusted code to an absolute minimum. The size of the current TCB is of an order of hundreds thousands of lines of C code, which is several orders of magnitude less than in other OSes, such as Windows, Linux or Mac, where it is of orders of tens of millions of lines of C code.
For more information about the security goals of Qubes OS, see [this page](/doc/security-goals/).
Security-Critical Qubes-Specific Components
-------------------------------------------
Below is a code produced by the Qubes project that is security-critical.
- Dom0-side of the libvchan library
- Dom0-side of the GUI virtualization code (*qubes-guid*)
- Dom0-side of the sound virtualization code (*pacat-simple-vchan*)
- Dom0-side in qrexec-related code (*qrexec\_daemon*)
- VM memory manager (*qmemman*) that runs in Dom0
- select Qubes RPC servers that run in Dom0: qubes.ReceiveUpdates and qubes.SyncAppMenus
- The qubes.Filecopy RPC server that runs in a VM -- this one is critical because it might allow one VM to compromise another one if user allows file copy operation to be performed between them
Security-Critical 3rd-Party Components
--------------------------------------
These are the components that we haven't written or designed ourselves, yet we still rely on them. At the current project stage, we cannot afford to spend time to thoroughly review and audit them, so we just more or less "blindly" trust they are secure.
- Xen hypervisor
- The Xen's xenstore backend running in Dom0
- The Xen's block backend running in Dom0 kernel
- The rpm program used in Dom0 for verifying Dom0 updates' signatures
- Somehow optional: log viewing software in dom0 that parses VM-influenced logs
Additionally when we consider attacks that originate through a compromised network domain and target the VMs connected to it. Those attacks do not apply to domains connected to other network domains (Qubes allows more than one network domains), or those with networking disabled (Dom0 is not connected to any network by default).
- Xen network PV frontends
- VM's core networking stacks (core TCP/IP code)
Buggy Code vs. Back-doored Code Distinction
-------------------------------------------
There is an important distinction between the buggy code and maliciously trojaned code. We could have the most secure architecture and the most bulletproof TCB that perfectly isolates all domains from each other, but it still would be pretty useless if all the code used within domains, e.g. the actual email clients, word processors, etc, was somehow trojaned. In that case only network-isolated domains could be somehow trusted, while all others could be not.
The above means that we must trust at least some of the vendors (not all, of course, but at least those few that provide the apps that we use in the most critical domains). In practice in Qubes OS we trust the software provided by Fedora project. This software is signed by Fedora distribution keys and so it is also critical that the tools used in domains for software updates (yum and rpm) be trusted.
Cooperative Covert Channels Between Domains
-------------------------------------------
Qubes does not attempt to eliminate all possible *cooperative* covert channels between domains, i.e. such channels that could be established between two *compromised* domains. We don't believe this is possible to achieve on x86 hardware, and we also doubt it makes any sense in practice for most users -- after all if the two domains are compromised, then it's already (almost) all lost anyway.

View file

@ -0,0 +1,113 @@
---
layout: doc
title: Template Implementation
permalink: /doc/template-implementation/
redirect_from:
- /en/doc/template-implementation/
- /doc/TemplateImplementation/
- /wiki/TemplateImplementation/
---
Overview of VM block devices
============================
Every VM has 4 block devices connected:
- **xvda** base root device (/) details described below
- **xvdb** private.img place where VM always can write.
- **xvdc** volatile.img, discarded at each VM restart here is placed swap and temporal "/" modifications (see below)
- **xvdd** modules.img kernel modules and firmware
private.img (xvdb)
------------------
This is mounted as /rw and here is placed all VM private data. This includes:
- */home* which is bind mounted to /rw/home
- */usr/local* which is symlink to /rw/usrlocal
- some config files (/rw/config) called by qubes core scripts (ex /rw/config/rc.local)
**Note:** Whenever a TemplateBasedVM is created, the contents of the `/home` directory of its parent TemplateVM are copied to the child TemplateBasedVM's `/home`. From that point onward, the child TemplateBasedVM's `/home` is independent from its parent TemplateVM's `/home`, which means that any subsequent changes to the parent TemplateVM's `/home` will no longer affect the child TemplateBasedVM's `/home`. Once a TemplateBasedVM has been created, any changes in its `/home`, `/usr/local`, or `/rw/config` directories will be persistent across reboots, which means that any files stored there will still be available after restarting the TemplateBasedVM. No changes in any other directories in TemplateBasedVMs persist in this manner. If you would like to make changes in other directories which *do* persist in this manner, you must make those changes in the parent TemplateVM.
modules.img (xvdd)
------------------
As the kernel is chosen in dom0, there must be some way to provide matching kernel modules to VM. Qubes kernel directory consists of 3 files:
- *vmlinuz* actual kernel
- *initramfs* initial ramdisk containing script to setup snapshot devices (see below) and mount /lib/modules
- *modules.img* filesystem image of /lib/modules with matching kernel modules and firmware (/lib/firmware/updates is symlinked to /lib/modules/firmware)
Normally kernel "package" is common for many VMs (can be set using qvm-prefs). One of them can be set as default (qvm-set-default-kernel) to simplify kernel updates (by default all VMs use the default kernel). All installed kernels are placed in /var/lib/qubes/vm-kernels as separate subdirs. In this case, modules.img is attached to the VM as R/O device.
There is a special case when the VM can have a custom kernel when it is updateable (StandaloneVM or TemplateVM) and the kernel is set to "none" (by qvm-prefs). In this case the VM uses the kernel from the "kernels" VM subdir and modules.img is attached as R/W device. FIXME: "none" should be renamed to "custom".
Qubes TemplateVM implementation
===============================
TemplateVM has a shared root.img across all AppVMs that are based on it. This mechanism has some advantages over a simple common device connected to multiple VMs:
- root.img can be modified while there are AppVMs running without corrupting the filesystem
- multiple AppVMs that are using different versions of root.img (from various points in time) can be running concurrently
There are two layers of the device-mapper snapshot device; the first one enables modifying root.img without stopping the AppVMs and the second one, which is contained in the AppVM, enables temporal modifications to its filesystem. These modifications will be discarded after a restart of the AppVM.
![TemplateSharing2.png](/attachment/wiki/TemplateImplementation/TemplateSharing2.png)
Snapshot device in Dom0
-----------------------
This device consists of:
- root.img real template filesystem
- root-cow.img differences between the device as seen by AppVM and the current root.img
The above is achieved through creating device-mapper snapshots for each version of root.img. When an AppVM is started, a xen hotplug script (/etc/xen/scripts/block-snapshot) reads the inode numbers of root.img and root-cow.img; these numbers are used as the snapshot device's name. When a device with the same name exists the new AppVM will use it therefore, AppVMs based on the same version of root.img will use the same device. Of course, the device-mapper cannot use the files directly it must be connected through /dev/loop\*. The same mechanism detects if there is a loop device associated with a file determined by the device and inode numbers or if creating a new loop device is necessary.
When an AppVM is stopped the xen hotplug script checks whether the device is still in use if it is not, the script removes the snapshot and frees the loop device.
### Changes to template filesystem
In order for the full potential of the snapshot device to be realized, every change in root.img must save the original version of the modified block in root-cow.img. This is achieved by a snapshot-origin device.
When TemplateVM is started, it receives the snapshot-origin device connected as a root device (in read-write mode). Therefore, every change to this device is immediately saved in root.img but remains invisible to the AppVM, which uses the snapshot.
When TemplateVM is stopped, the xen script moves root-cow.img to root-cow.img.old and creates a new one (using the `qvm-template-commit` tool). The snapshot device will remain untouched due to the loop device, which uses an actual file on the disk (by inode, not by name). Linux kernel frees the old root-cow.img files as soon as they are unused by all snapshot devices (to be exact, loop devices). The new root-cow.img file will get a new inode number, and so new AppVMs will get new snapshot devices (with different names).
### Rollback template changes
There is possibility to rollback last template changes. Saved root-cow.img.old contains all changes made during last TemplateVM run. Rolling back changes is done by reverting this "binary patch".
This is done using snapshot-merge device-mapper target (available from 2.6.34 kernel). It requires that no other snapshot device uses underlying block devices (root.img, root-cow.img via loop device). Because of this all AppVMs based on this template must be halted during this operation.
Steps performed by **qvm-revert-template-changes**:
1. Ensure that no other VMs uses this template.
2. Prepare snapshot device with ***root-cow.img.old*** instead of *root-cow.img* (*/etc/xen/scripts/block-snapshot prepare*).
3. Replace *snapshot* device-mapper target with *snapshot-merge*, other parameters (chunk size etc) remains untouched. Now kernel starts merging changes stored in *root-cow.img.old* into *root.img*. d-m device can be used normally (if needed).
4. Waits for merge completed: *dmsetup status* shows used snapshot blocks it should be equal to metadata size when completed.
5. Replace *snapshot-merge* d-m target back to *snapshot*.
6. Cleanup snapshot device (if nobody uses it at the moment).
7. Move *root-cow.img.old* to *root-cow.img* (overriding existing file).
Snapshot device in AppVM
------------------------
Root device is exposed to AppVM in read-only mode. AppVM can write only in:
- private.img persistent storage (mounted in /rw) used for /home, /usr/local in future versions, its use may be extended
- volatile.img temporary storage, which is discarded after an AppVM restart
volatile.img is divided into two partitions:
1. changes to root device
2. swap partition
Inside of an AppVM, the root device is wrapped by the snapshot in the first partition of volatile.img. Therefore, the AppVM can write anything to its filesystem however, such changes will be discarded after a restart.
StandaloneVM
------------
Standalone VM enables user to modify root filesystem persistently. It can be created using *--standalone* switch to *qvm-create*.
It is implemented just like TemplateVM (has own root.img connected as R/W device), but no other VMs can be based on it.