From 27e8604a9bd0595603ad625b8f042454c2f2db8c Mon Sep 17 00:00:00 2001 From: Thomas Tendyck Date: Wed, 28 Sep 2022 16:31:47 +0200 Subject: [PATCH] docs: publish to 2.0 --- .../getting-started/first-steps.md | 2 +- .../version-2.0/workflows/create.md | 28 ++++-------- .../version-2.0/workflows/recovery.md | 43 +++++++++---------- .../version-2.0/workflows/scale.md | 2 +- .../version-2.0/workflows/ssh.md | 16 +++---- .../version-2.0/workflows/storage.md | 22 +++++----- .../version-2.0/workflows/upgrade.md | 14 +++--- .../version-2.0/workflows/verify-cli.md | 22 +++++----- .../version-2.0/workflows/verify-cluster.md | 4 +- 9 files changed, 69 insertions(+), 84 deletions(-) diff --git a/docs/versioned_docs/version-2.0/getting-started/first-steps.md b/docs/versioned_docs/version-2.0/getting-started/first-steps.md index 0c6803e4f..b3e58b074 100644 --- a/docs/versioned_docs/version-2.0/getting-started/first-steps.md +++ b/docs/versioned_docs/version-2.0/getting-started/first-steps.md @@ -180,7 +180,7 @@ The following steps guide you through the process of creating a cluster and depl :::tip - On Azure, you may need to wait 15+ min. at this point for role assignments to propagate. + On Azure, you may need to wait 15+ minutes at this point for role assignments to propagate. ::: diff --git a/docs/versioned_docs/version-2.0/workflows/create.md b/docs/versioned_docs/version-2.0/workflows/create.md index 47cbb2139..357ab6703 100644 --- a/docs/versioned_docs/version-2.0/workflows/create.md +++ b/docs/versioned_docs/version-2.0/workflows/create.md @@ -11,19 +11,9 @@ See the [architecture](../architecture/orchestration.md) section for details on This step creates the necessary resources for your cluster in your cloud environment. 
-### Prerequisites - -Before creating your cluster you need to decide on - -* the initial size of your cluster (the number of control-plane and worker nodes) -* the machine type of your nodes (depending on the availability in your cloud environment) -* whether to enable autoscaling for your cluster (automatically adding and removing nodes depending on resource demands) - -You can find the currently supported machine types for your cloud environment in the [installation guide](../architecture/orchestration.md). - ### Configuration -Constellation can generate a configuration file for your cloud provider: +Generate a configuration file for your cloud service provider (CSP): @@ -42,27 +32,28 @@ constellation config generate gcp -This creates the file `constellation-conf.yaml` in the current directory. You must edit it before you can execute the next steps. +This creates the file `constellation-conf.yaml` in the current directory. [Fill in your CSP-specific information](../getting-started/first-steps.md#create-a-cluster) before you continue. -Next, download the latest trusted measurements for your configured image. +Next, download the trusted measurements for your configured image. ```bash constellation config fetch-measurements ``` -For more details, see the [verification section](../workflows/verify-cluster.md). +For details, see the [verification section](../workflows/verify-cluster.md). ### Create +Choose the initial size of your cluster. The following command creates a cluster with one control-plane and two worker nodes: ```bash -constellation create --control-plane-nodes 1 --worker-nodes 2 -y +constellation create --control-plane-nodes 1 --worker-nodes 2 ``` -For details on the flags and a list of supported instance types, consult the command help via `constellation create -h`. +For details on the flags, consult the command help via `constellation create -h`. 
-*create* will store your cluster's configuration to a file named [`constellation-state.json`](../architecture/orchestration.md#installation-process) in your current directory. +*create* stores your cluster's configuration to a file named [`constellation-state.json`](../architecture/orchestration.md#installation-process) in your current directory. ## The *init* step @@ -78,11 +69,10 @@ To enable autoscaling in your cluster, add the `--autoscale` flag: constellation init --autoscale ``` -Next, configure `kubectl` for your Constellation cluster: +Next, configure `kubectl` for your cluster: ```bash export KUBECONFIG="$PWD/constellation-admin.conf" -kubectl get nodes -o wide ``` 🏁 That's it. You've successfully created a Constellation cluster. diff --git a/docs/versioned_docs/version-2.0/workflows/recovery.md b/docs/versioned_docs/version-2.0/workflows/recovery.md index 8698b9d87..4c6010d98 100644 --- a/docs/versioned_docs/version-2.0/workflows/recovery.md +++ b/docs/versioned_docs/version-2.0/workflows/recovery.md @@ -1,8 +1,8 @@ # Recover your cluster -Recovery of a Constellation cluster means getting a cluster back into a healthy state after too many concurrent node failures in the control plane. +Recovery of a Constellation cluster means getting it back into a healthy state after too many concurrent node failures in the control plane. Reasons for an unhealthy cluster can vary from a power outage, or planned reboot, to migration of nodes and regions. -Recovery events are rare, because Constellation is built for high availability and automatically and securely replaces failed nodes. When a node is replaced, Constellation's control plane first verifies the new node before it sends the node the cryptographic keys required to decrypt its [stateful disk](../architecture/images.md#stateful-disk). +Recovery events are rare, because Constellation is built for high availability and automatically and securely replaces failed nodes. 
When a node is replaced, Constellation's control plane first verifies the new node before it sends the node the cryptographic keys required to decrypt its [state disk](../architecture/images.md#state-disk). Constellation provides a recovery mechanism for cases where the control plane has failed and is unable to replace nodes. The `constellation recover` command connects to a node, establishes a secure connection using [attested TLS](../architecture/attestation.md#attested-tls-atls), and provides that node with the key to decrypt its stateful disk and continue booting. @@ -13,23 +13,22 @@ This process has to be repeated until enough nodes are back running for establis The first step to recovery is identifying when a cluster becomes unhealthy. Usually, this can be first observed when the Kubernetes API server becomes unresponsive. -The health status of the Constellation nodes can be checked and monitored via the cloud service provider (CSP). +You can check the health status of the nodes via the cloud service provider (CSP). Constellation provides logging information on the boot process and status via [cloud logging](troubleshooting.md#cloud-logging). -In the following, you'll find detailed descriptions for identifying clusters stuck in recovery for each cloud environment. +In the following, you'll find detailed descriptions for identifying clusters stuck in recovery for each CSP. -In the Azure cloud portal find the cluster's resource group `-` -Inside the resource group check that the control plane *Virtual machine scale set* `constellation-scale-set-controlplanes-` has enough members in a *Running* state. -Open the scale set details page, on the left go to `Settings -> Instances` and check the *Status* field. +In the Azure portal, find the cluster's resource group. +Inside the resource group, open the control plane *Virtual machine scale set* `constellation-scale-set-controlplanes-`. 
+On the left, go to **Settings** > **Instances** and check that enough members are in a *Running* state. Second, check the boot logs of these *Instances*. In the scale set's *Instances* view, open the details page of the desired instance. -Check the serial console output of that instance. -On the left open the *"Support + troubleshooting" -> "Serial console"* page: +On the left, go to **Support + troubleshooting** > **Serial console**. -In the serial console output search for `Waiting for decryption key`. +In the serial console output, search for `Waiting for decryption key`. Similar output to the following means your node was restarted and needs to decrypt the [state disk](../architecture/images.md#state-disk): ```json @@ -40,7 +39,7 @@ Similar output to the following means your node was restarted and needs to decry ``` The node will then try to connect to the [*JoinService*](../architecture/components.md#joinservice) and obtain the decryption key. -If that fails, because the control plane is unhealthy, you will see log messages similar to the following: +If this fails due to an unhealthy control plane, you will see log messages similar to the following: ```json {"level":"INFO","ts":"2022-09-08T09:56:43Z","logger":"rejoinClient","caller":"rejoinclient/client.go:77","msg":"Received list with JoinService endpoints","endpoints":["10.9.0.5:30090","10.9.0.6:30090"]} @@ -51,21 +50,21 @@ If that fails, because the control plane is unhealthy, you will see log messages {"level":"ERROR","ts":"2022-09-08T09:57:23Z","logger":"rejoinClient","caller":"rejoinclient/client.go:110","msg":"Failed to rejoin on all endpoints"} ``` -This means that you have to recover the node manually. For this, you need its IP address, which can be obtained from the *Overview* page under *Private IP address*. +This means that you have to recover the node manually. For this, you need its IP address, which you can obtain from the *Overview* page under *Private IP address*. 
First, check that the control plane *Instance Group* has enough members in a *Ready* state. -Go to *Instance Groups* and check the group for the cluster's control plane `-control-plane-`. +In the GCP Console, go to **Instance Groups** and check the group for the cluster's control plane `-control-plane-`. Second, check the status of the *VM Instances*. -Go to *VM Instances* and open the details of the desired instance. -Check the serial console output of that instance by opening the *logs -> "Serial port 1 (console)"* page: +Go to **VM Instances** and open the details of the desired instance. +Check the serial console output of that instance by opening the **Logs** > **Serial port 1 (console)** page: ![GCP portal serial console link](../_media/recovery-gcp-serial-console-link.png) -In the serial console output search for `Waiting for decryption key`. +In the serial console output, search for `Waiting for decryption key`. Similar output to the following means your node was restarted and needs to decrypt the [state disk](../architecture/images.md#state-disk): ```json @@ -73,11 +72,10 @@ Similar output to the following means your node was restarted and needs to decry {"level":"INFO","ts":"2022-09-08T10:21:53Z","logger":"setupManager","caller":"setup/setup.go:72","msg":"Preparing existing state disk"} {"level":"INFO","ts":"2022-09-08T10:21:53Z","logger":"rejoinClient","caller":"rejoinclient/client.go:65","msg":"Starting RejoinClient"} {"level":"INFO","ts":"2022-09-08T10:21:53Z","logger":"recoveryServer","caller":"recoveryserver/server.go:59","msg":"Starting RecoveryServer"} - ``` The node will then try to connect to the [*JoinService*](../architecture/components.md#joinservice) and obtain the decryption key. 
-If that fails, because the control plane is unhealthy, you will see log messages similar to the following: +If this fails due to an unhealthy control plane, you will see log messages similar to the following: ```json {"level":"INFO","ts":"2022-09-08T10:21:53Z","logger":"rejoinClient","caller":"rejoinclient/client.go:77","msg":"Received list with JoinService endpoints","endpoints":["192.168.178.4:30090","192.168.178.2:30090"]} @@ -88,12 +86,12 @@ If that fails, because the control plane is unhealthy, you will see log messages {"level":"ERROR","ts":"2022-09-08T10:22:13Z","logger":"rejoinClient","caller":"rejoinclient/client.go:110","msg":"Failed to rejoin on all endpoints"} ``` -This means that you have to recover the node manually. For this, you need its IP address, which can be obtained from the *"VM Instance" -> "network interfaces"* page under *"Primary internal IP address."* +This means that you have to recover the node manually. For this, you need its IP address, which you can obtain from the **VM Instance** > **network interfaces** table under *Primary internal IP address*. -## Recover your cluster +## Recover a cluster The following process needs to be repeated until a [member quorum for etcd](https://etcd.io/docs/v3.5/faq/#what-is-failure-tolerance) is established. For example, assume you have 5 control-plane nodes in your cluster and 4 of them have been rebooted due to a maintenance downtime in the cloud environment. @@ -102,10 +100,9 @@ From there, your cluster will auto heal the remaining 2 control-plane nodes and Recovering a node requires the following parameters: -* The node's IP address -* Access to the master secret of the cluster +* The node's IP address (see [Identify unhealthy clusters](#identify-unhealthy-clusters) on how to obtain it) +* The master secret of the cluster -See the [Identify unhealthy clusters](#identify-unhealthy-clusters) description of how to obtain the node's IP address. 
Note that the recovery command needs to connect to the recovering nodes. Nodes only have private IP addresses in the VPC of the cluster, hence, the command needs to be issued from within the VPC network of the cluster. The easiest approach is to set up a jump host connected to the VPC network and perform the recovery from there. diff --git a/docs/versioned_docs/version-2.0/workflows/scale.md b/docs/versioned_docs/version-2.0/workflows/scale.md index 469a157fb..3318d8aee 100644 --- a/docs/versioned_docs/version-2.0/workflows/scale.md +++ b/docs/versioned_docs/version-2.0/workflows/scale.md @@ -4,7 +4,7 @@ Constellation provides all features of a Kubernetes cluster including scaling an ## Worker node scaling -[During cluster initialization](create.md#init) you can choose to deploy the [cluster autoscaler](https://github.com/kubernetes/autoscaler). It automatically provisions additional worker nodes so that all pods have a place to run. Alternatively, you can choose to manually scale your cluster up or down: +[During cluster initialization](create.md#the-init-step) you can choose to deploy the [cluster autoscaler](https://github.com/kubernetes/autoscaler). It automatically provisions additional worker nodes so that all pods have a place to run. Alternatively, you can choose to manually scale your cluster up or down: diff --git a/docs/versioned_docs/version-2.0/workflows/ssh.md b/docs/versioned_docs/version-2.0/workflows/ssh.md index 3c437eb70..0871973f7 100644 --- a/docs/versioned_docs/version-2.0/workflows/ssh.md +++ b/docs/versioned_docs/version-2.0/workflows/ssh.md @@ -1,19 +1,19 @@ -# Manage SSH Keys +# Manage SSH keys -Constellation gives you the capability to create UNIX users which can connect to the cluster nodes over SSH, allowing you to access both control-plane and worker nodes. While the nodes' data partitions are persistent, the system partitions are read-only. Consequently, users need to be re-created upon each restart of a node. 
This is where the Access Manager comes into effect, ensuring the automatic (re-)creation of all users whenever a node is restarted. +Constellation allows you to create UNIX users that can connect to both control-plane and worker nodes over SSH. As the system partitions are read-only, users need to be re-created upon each restart of a node. This is automated by the *Access Manager*. -During the initial creation of the cluster, all users defined in the `ssh-users` section of the Constellation configuration file are automatically created during the initialization process. For persistence, the users are stored in a ConfigMap called `ssh-users`, residing in the `kube-system` namespace. For a running cluster, users can be added and removed by modifying the entries of the ConfigMap and performing a restart of a node. +On cluster initialization, users defined in the `ssh-users` section of the Constellation configuration file are created and stored in the `ssh-users` ConfigMap in the `kube-system` namespace. For a running cluster, you can add or remove users by modifying the ConfigMap and restarting a node. ## Access Manager -The Access Manager supports all OpenSSH key types. These are RSA, ECDSA (using the `nistp256`, `nistp384`, `nistp521` curves) and Ed25519. +The Access Manager supports all OpenSSH key types. These are RSA, ECDSA (using the `nistp256`, `nistp384`, `nistp521` curves) and Ed25519. :::note All users are automatically created with `sudo` capabilities. ::: -The Access Manager is deployed as a DaemonSet called `constellation-access-manager`, running as an `initContainer` and afterward running a `pause` container to avoid automatic restarts. While technically killing the Pod and letting it restart works for the (re-)creation of users, it doesn't automatically remove users. Thus, a complete node restart is required after making changes to the ConfigMap. 
+The Access Manager is deployed as a DaemonSet called `constellation-access-manager`, running as an `initContainer` and afterward running a `pause` container to avoid automatic restarts. While technically killing the Pod and letting it restart works for the (re-)creation of users, it doesn't automatically remove users. Thus, a node restart is required after making changes to the ConfigMap. -When a user is deleted from the ConfigMap, it won't be re-created after the next restart of a node. The home directories of the affected users will be moved to `/var/evicted`, with the owner of each directory and its content being modified to `root`. +When a user is deleted from the ConfigMap, it won't be re-created after the next restart of a node. The home directories of the affected users will be moved to `/var/evicted`. You can update the ConfigMap by: ```bash @@ -23,7 +23,7 @@ kubectl edit configmap -n kube-system ssh-users Or alternatively, by modifying and re-applying it with the definition listed in the examples. ## Examples -An example to create an user called `myuser` as part of the `constellation-config.yaml` looks like this: +You can add a user `myuser` in `constellation-conf.yaml` like this: ```yaml # Create SSH users on Constellation nodes upon the first initialization of the cluster. @@ -43,7 +43,7 @@ data: myuser: "ssh-rsa AAAA...mgNJd9jc=" ``` -Entries can be added simply by adding `data` entries: +You can add users by adding `data` entries: ```yaml apiVersion: v1 diff --git a/docs/versioned_docs/version-2.0/workflows/storage.md b/docs/versioned_docs/version-2.0/workflows/storage.md index 29af4fffd..b961f894b 100644 --- a/docs/versioned_docs/version-2.0/workflows/storage.md +++ b/docs/versioned_docs/version-2.0/workflows/storage.md @@ -17,26 +17,24 @@ To address this, Constellation provides CSI drivers for Azure Disk and GCE PD, o For more details see [encrypted persistent storage](../architecture/encrypted-storage.md). 
-## CSI Drivers +## CSI drivers Constellation supports the following drivers, which offer node-level encryption and optional integrity protection. -1. [Azure Disk Storage](https://github.com/edgelesssys/constellation-azuredisk-csi-driver) - - Mount Azure [Disk Storage](https://azure.microsoft.com/en-us/services/storage/disks/#overview) into your Constellation cluster. See the example below on how to install the modified Azure Disk CSI driver or check out the [repository](https://github.com/edgelesssys/constellation-azuredisk-csi-driver) for installation and more information about the Constellation-managed version of the driver. Since Azure Disks are mounted as ReadWriteOnce, they're only available to a single pod. +**Constellation CSI driver for Azure Disk**: +Mount Azure [Disk Storage](https://azure.microsoft.com/en-us/services/storage/disks/#overview) into your Constellation cluster. See the instructions on how to [install the Constellation CSI driver](#installation) or check out the [repository](https://github.com/edgelesssys/constellation-azuredisk-csi-driver) for more information. Since Azure Disks are mounted as ReadWriteOnce, they're only available to a single pod. -1. [Persistent Disk](https://github.com/edgelesssys/constellation-gcp-compute-persistent-disk-csi-driver): - - Mount GCP [Persistent Disk](https://cloud.google.com/persistent-disk) block storage into your Constellation cluster. - This includes support for [volume snapshots](https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/volume-snapshots), which let you create copies of your volume at a specific point in time. - You can use them to bring a volume back to a prior state or provision new volumes. - Follow the examples listed below to setup the modified GCP PD CSI driver, or check out the [repository](https://github.com/edgelesssys/constellation-gcp-compute-persistent-disk-csi-driver) for information about the configuration. 
+**Constellation CSI driver for GCP Persistent Disk**: +Mount [Persistent Disk](https://cloud.google.com/persistent-disk) block storage into your Constellation cluster. +This includes support for [volume snapshots](https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/volume-snapshots), which let you create copies of your volume at a specific point in time. +You can use them to bring a volume back to a prior state or provision new volumes. +Follow the instructions on how to [install the Constellation CSI driver](#installation) or check out the [repository](https://github.com/edgelesssys/constellation-gcp-compute-persistent-disk-csi-driver) for information about the configuration. @@ -63,7 +61,7 @@ The following installation guide gives an overview of how to securely use CSI-ba A storage class configures the driver responsible for provisioning storage for persistent volume claims. A storage class only needs to be created once and can then be used by multiple volumes. - The following snippet creates a simple storage class using a [Standard SSD](https://docs.microsoft.com/en-us/azure/virtual-machines/disks-types#standard-ssds) as the backing storage device when the first Pod claiming the volume is created. + The following snippet creates a simple storage class using [Standard SSDs](https://docs.microsoft.com/en-us/azure/virtual-machines/disks-types#standard-ssds) as the backing storage device when the first Pod claiming the volume is created. ```bash cat <