Re-wording in docs/workflows (#135)

* Quick pass over create.md

* pass over verify.md

* Re-arrange workflows

* Quick polish of scale.md and upgrade.md

* Quick polish of terminate.md

* Cut recovery.md down

* Brush over ssh

* storage

* Brush over trusted launch VMs

* Update docs/docs/workflows/verify-cluster.md

Co-authored-by: Thomas Tendyck <51411342+thomasten@users.noreply.github.com>

* Update docs/docs/workflows/verify-cluster.md

Co-authored-by: Thomas Tendyck <51411342+thomasten@users.noreply.github.com>

* Update docs/docs/workflows/verify-cluster.md

Co-authored-by: Thomas Tendyck <51411342+thomasten@users.noreply.github.com>

* Add Azure back to title

* Update docs/docs/workflows/verify-cluster.md

Co-authored-by: Thomas Tendyck <51411342+thomasten@users.noreply.github.com>

* fix lint errors

* publish to 2.0

Co-authored-by: Thomas Tendyck <51411342+thomasten@users.noreply.github.com>
Co-authored-by: Thomas Tendyck <tt@edgeless.systems>
Felix Schuster 2022-09-13 15:12:05 +02:00 committed by GitHub
parent c7f39388e4
commit eb213878a2
23 changed files with 183 additions and 237 deletions

View File

@ -67,7 +67,7 @@ For creating Kubernetes clusters in GCP a local copy of the service account secr
1. [Create a new service account key](https://console.cloud.google.com/iam-admin/serviceaccounts/details/112741463528383500960/keys?authuser=0&project=constellation-331613&supportedpurview=project)
2. Create a compact (one-line) JSON representation of the file using `jq -c`
3. Store in [GitHub Action Secret](https://github.com/edgelesssys/constellation/settings/secrets/actions) or create a local secret file for act to consume:
3. Store in GitHub Action Secret or create a local secret file for act to consume:
```bash
$ cat secrets.env
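# Hypothetical example of what such a file could contain: a single variable holding the
# one-line JSON produced with `jq -c` (the variable name is an assumption, not prescribed here)
GCP_SERVICE_ACCOUNT_JSON={"type":"service_account","project_id":"constellation-331613", ... }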

View File

@ -56,7 +56,7 @@ You can do this by utilizing our terraform setup.
Instructions on how to set it up can be found in its [README](/terraform/libvirt/README.md).
# Verification
In order to verify your cluster we describe a [verification workflow](https://constellation-docs.edgeless.systems/constellation/workflows/verify-cluster) in our official docs.
In order to verify your cluster we describe a [verification workflow](https://docs.edgeless.systems/constellation/workflows/verify-cluster) in our official docs.
Apart from that you can also reproduce some of the measurements described in the [docs](https://docs.edgeless.systems/constellation/architecture/attestation#runtime-measurements) locally.
To do so we built a tool that creates a VM, collects the PCR values and reports them to you.
To run the tool execute the following command in `/hack/image-measurement`:

View File

@ -33,7 +33,7 @@ This checklist will prepare `v1.3.0` from `v1.2.0`. Adjust your version numbers
# Alternative from CLI
gh workflow run build-operator-manual.yml --ref release/v1.3 -F imageTag=v1.3.0
```
4. Review and update changelog with all changes since last release. [GitHub's diff view](https://github.com/edgelesssys/constellation/compare/v1.2.0...main) helps a lot!
4. Review and update changelog with all changes since last release. [GitHub's diff view](https://github.com/edgelesssys/constellation/compare/v2.0.0...main) helps a lot!
5. Update project version in [CMakeLists.txt](/CMakeLists.txt) to `1.3.0` (without v).
6. Update versions [versions.go](../../internal/versions/versions.go#L33-L39) to `v1.3.0` and **push your changes**.
7. Create a [production coreOS image](/.github/workflows/build-coreos.yml)

View File

@ -15,7 +15,7 @@ This step creates the necessary resources for your cluster in your cloud environ
Before creating your cluster you need to decide on
* the size of your cluster (the number of control-plane and worker nodes)
* the initial size of your cluster (the number of control-plane and worker nodes)
* the machine type of your nodes (depending on the availability in your cloud environment)
* whether to enable autoscaling for your cluster (automatically adding and removing nodes depending on resource demands)
@ -66,10 +66,6 @@ For details on the flags and a list of supported instance types, consult the com
## The *init* step
This step bootstraps your cluster and configures your Kubernetes client.
### Init
The following command initializes and bootstraps your cluster:
```bash
@ -82,11 +78,11 @@ To enable autoscaling in your cluster, add the `--autoscale` flag:
constellation init --autoscale
```
Next, configure kubectl for your Constellation cluster:
Next, configure `kubectl` for your Constellation cluster:
```bash
export KUBECONFIG="$PWD/constellation-admin.conf"
kubectl get nodes -o wide
```
That's it. You've successfully created a Constellation cluster.
🏁 That's it. You've successfully created a Constellation cluster.
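As an optional smoke test (not part of the documented workflow; the deployment name and image are arbitrary examples), you can check that the cluster schedules workloads:
```bash
# create a throwaway deployment, wait for it to come up, then remove it again
kubectl create deployment hello-constellation --image=nginx
kubectl rollout status deployment/hello-constellation
kubectl delete deployment hello-constellation
```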

View File

@ -1,32 +1,21 @@
# Recovery
# Recover your cluster
Recovery of a Constellation cluster means getting a cluster back into a healthy state after it became unhealthy due to the underlying infrastructure.
Recovery of a Constellation cluster means getting a cluster back into a healthy state after too many concurrent node failures in the control plane.
Reasons for an unhealthy cluster can vary from a power outage, or planned reboot, to migration of nodes and regions.
Constellation keeps all stateful data protected and encrypted in a [stateful disk](../architecture/images.md#stateful-disk) attached to each node.
The stateful disk will be persisted across reboots.
The data restored from that disk contains the entire Kubernetes state including the application deployments.
Meaning after a successful recovery procedure the applications can continue operating without redeploying everything from scratch.
Recovery events are rare, because Constellation is built for high availability and automatically and securely replaces failed nodes. When a node is replaced, Constellation's control plane first verifies the new node before it sends the node the cryptographic keys required to decrypt its [stateful disk](../architecture/images.md#stateful-disk).
Recovery events are rare because Constellation is built for high availability and contains mechanisms to automatically replace and join nodes to the cluster.
Once a node reboots, the [*Bootstrapper*](../architecture/components.md#bootstrapper) will try to authenticate to the cluster's [*JoinService*](../architecture/components.md#joinservice) using remote attestation.
If successful the *JoinService* will return the encryption key for the stateful disk as part of the initialization response.
This process ensures that Constellation nodes can securely recover and rejoin a cluster autonomously.
In case of a disaster, where the control plane itself becomes unhealthy, Constellation provides a mechanism to recover that cluster and bring it back into a healthy state.
Constellation provides a recovery mechanism for cases where the control plane has failed and is unable to replace nodes.
The `constellation recover` command connects to a node, establishes a secure connection using [attested TLS](../architecture/attestation.md#attested-tls-atls), and provides that node with the key to decrypt its stateful disk and continue booting.
This process has to be repeated until enough nodes are back running for establishing a [member quorum for etcd](https://etcd.io/docs/v3.5/faq/#what-is-failure-tolerance) and the Kubernetes state can be recovered.
## Identify unhealthy clusters
The first step to recovery is identifying when a cluster becomes unhealthy.
Usually, that's first observed when the Kubernetes API server becomes unresponsive.
The causes can vary but are often related to issues in the underlying infrastructure.
Recovery in Constellation becomes necessary if not enough control-plane nodes are in a healthy state to keep the control plane operational.
Usually, this can be first observed when the Kubernetes API server becomes unresponsive.
The health status of the Constellation nodes can be checked and monitored via the cloud service provider.
The health status of the Constellation nodes can be checked and monitored via the cloud service provider (CSP).
Constellation provides logging information on the boot process and status via [cloud logging](troubleshooting.md#cloud-logging).
In the following, you'll find detailed descriptions for identifying clusters stuck in recovery for each cloud environment.
Once you've identified that your cluster is in an unhealthy state you can use the [recovery](recovery.md#recover-your-cluster) command of the Constellation CLI to restore it.
<tabs groupId="csp">
<tabItem value="azure" label="Azure" default>
@ -62,9 +51,7 @@ If that fails, because the control plane is unhealthy, you will see log messages
{"level":"ERROR","ts":"2022-09-08T09:57:23Z","logger":"rejoinClient","caller":"rejoinclient/client.go:110","msg":"Failed to rejoin on all endpoints"}
```
That means you have to recover that node manually.
Before you continue with the [recovery process](#recover-your-cluster) you need to know the node's IP address.
For the IP address, return to the instances *Overview* page and find the *Private IP address*.
This means that you have to recover the node manually. For this, you need its IP address, which can be obtained from the *Overview* page under *Private IP address*.
</tabItem>
<tabItem value="gcp" label="GCP" default>
@ -101,16 +88,14 @@ If that fails, because the control plane is unhealthy, you will see log messages
{"level":"ERROR","ts":"2022-09-08T10:22:13Z","logger":"rejoinClient","caller":"rejoinclient/client.go:110","msg":"Failed to rejoin on all endpoints"}
```
That means you have to recover that node manually.
Before you continue with the [recovery process](#recover-your-cluster) you need to know the node's IP address.
For the IP address go to the *"VM Instance" -> "network interfaces"* page and take the address from *"Primary internal IP address."*
This means that you have to recover the node manually. For this, you need its IP address, which can be obtained from the *"VM Instance" -> "network interfaces"* page under *"Primary internal IP address."*
</tabItem>
</tabs>
## Recover your cluster
Depending on the size of your cluster and the number of unhealthy control plane nodes the following process needs to be repeated until a [member quorum for etcd](https://etcd.io/docs/v3.5/faq/#what-is-failure-tolerance) is established.
The following process needs to be repeated until a [member quorum for etcd](https://etcd.io/docs/v3.5/faq/#what-is-failure-tolerance) is established.
For example, assume you have 5 control-plane nodes in your cluster and 4 of them have been rebooted due to a maintenance downtime in the cloud environment.
You have to run through the following process for 2 of these nodes and recover them manually to recover the quorum.
From there, your cluster will auto heal the remaining 2 control-plane nodes and the rest of your cluster.
@ -140,5 +125,3 @@ In the serial console output of the node you'll see a similar output to the foll
{"level":"INFO","ts":"2022-09-08T10:26:59Z","logger":"recoveryServer.gRPC","caller":"zap/server_interceptors.go:61","msg":"finished streaming call with code OK","grpc.start_time":"2022-09-08T10:26:59Z","system":"grpc","span.kind":"server","grpc.service":"recoverproto.API","grpc.method":"Recover","peer.address":"192.0.2.3:41752","grpc.code":"OK","grpc.time_ms":15.701}
{"level":"INFO","ts":"2022-09-08T10:27:13Z","logger":"rejoinClient","caller":"rejoinclient/client.go:87","msg":"RejoinClient stopped"}
```
After enough control plane nodes have been recovered and the Kubernetes cluster becomes healthy again, the rest of the cluster will start auto healing using the mechanism described above.
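As a sketch of the manual step (the endpoint is the node IP identified above; additional flags such as the path to the master secret may be required depending on the CLI version and are assumptions here):
```bash
# recover one unhealthy control-plane node; repeat for further nodes until etcd regains quorum
constellation recover -e 192.0.2.3 --master-secret constellation-mastersecret.json
```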

View File

@ -4,9 +4,7 @@ Constellation provides all features of a Kubernetes cluster including scaling an
## Worker node scaling
[During cluster initialization](create.md#init) you can choose to deploy the [cluster autoscaler](https://github.com/kubernetes/autoscaler). It automatically provisions additional worker nodes so that all pods have a place to run.
Alternatively, you can choose to manually scale your cluster:
[During cluster initialization](create.md#init) you can choose to deploy the [cluster autoscaler](https://github.com/kubernetes/autoscaler). It automatically provisions additional worker nodes so that all pods have a place to run. Alternatively, you can choose to manually scale your cluster up or down:
<tabs groupId="csp">
<tabItem value="azure" label="Azure" default>
@ -26,8 +24,6 @@ Alternatively, you can choose to manually scale your cluster:
</tabItem>
</tabs>
This works for scaling your worker nodes up and down.
## Control-plane node scaling
Control-plane nodes can **only be scaled manually and only scaled up**!
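At the infrastructure level, manual scaling boils down to resizing the node groups that Constellation created; the resource and group names below are placeholders, not the actual names in your deployment:
```bash
# Azure: resize a scale set (placeholder names)
az vmss scale --resource-group my-constellation-rg --name my-worker-scale-set --new-capacity 3

# GCP: resize a managed instance group (placeholder names)
gcloud compute instance-groups managed resize my-worker-group --size 3 --zone europe-west3-b
```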

View File

@ -1,17 +1,17 @@
# Managing SSH Keys
# Manage SSH Keys
Constellation gives you the capability to create UNIX users which can connect to the cluster nodes over SSH, allowing you to access both control-plane as well as worker nodes. While the data partition is persistent, the system partition is read-only, meaning that users need to be re-created upon each restart of a node. This is where the Access Manager comes into effect, ensuring the automatic (re-)creation of all users whenever a node is restarted.
Constellation gives you the capability to create UNIX users which can connect to the cluster nodes over SSH, allowing you to access both control-plane and worker nodes. While the nodes' data partitions are persistent, the system partitions are read-only. Consequently, users need to be re-created upon each restart of a node. This is where the Access Manager comes into effect, ensuring the automatic (re-)creation of all users whenever a node is restarted.
During the initial creation of the cluster, all users defined in the `ssh-users` section of the Constellation configuration (see the [reference section](../reference/config.md) for details) are automatically created during the initialization process.
For persistence, they're transferred into a ConfigMap called `ssh-users`, residing in the `kube-system` namespace. When no users are initially defined, the ConfigMap will still be created with no entries. After the initial definition in the Constellation configuration, users can be added and removed by modifying the entries of the ConfigMap and performing a restart of a node.
During the initial creation of the cluster, all users defined in the `ssh-users` section of the Constellation [configuration file](../reference/config.md) are automatically created during the initialization process. For persistence, the users are stored in a ConfigMap called `ssh-users`, residing in the `kube-system` namespace. For a running cluster, users can be added and removed by modifying the entries of the ConfigMap and performing a restart of a node.
## Access Manager
The Access Manager doesn't restrict users on the use of certain key formats, meaning that all underlying formats the OpenSSH server supports are accepted. These are RSA, ECDSA (using the `nistp256`, `nistp384`, `nistp521` curves) and Ed25519. No validation is performed on the side of the Access Manager too, passing them directly to the authorized key lists as defined.
The Access Manager supports all OpenSSH key types. These are RSA, ECDSA (using the `nistp256`, `nistp384`, `nistp521` curves) and Ed25519.
Note that all users are automatically created with `sudo` capabilities, so make sure no one unintended has permissions to modify the `ssh-users` ConfigMap.
:::note
All users are automatically created with `sudo` capabilities.
:::
The Access Manager is deployed as a DaemonSet called `constellation-access-manager`, running as an `initContainer` and afterward running a `pause` container to avoid automatic restarts. While technically killing the Pod and letting it restart works for the (re-)creation of users, it doesn't automatically remove users. Therefore, a complete node restart is important to ensure the correct modification of users on the system and needs to be executed manually after making changes to the ConfigMap.
The Access Manager is deployed as a DaemonSet called `constellation-access-manager`, running as an `initContainer` and afterward running a `pause` container to avoid automatic restarts. While technically killing the Pod and letting it restart works for the (re-)creation of users, it doesn't automatically remove users. Thus, a complete node restart is required after making changes to the ConfigMap.
When a user is deleted from the ConfigMap, it won't be re-created after the next restart of a node. The home directories of the affected users will be moved to `/var/evicted`, with the owner of each directory and its content being modified to `root`.
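For example, to add or remove a user on a running cluster, edit the ConfigMap and then restart a node (the exact key/value layout of the ConfigMap isn't shown here and follows the entries created from your configuration):
```bash
# edit the entries mapping user names to SSH public keys
kubectl -n kube-system edit configmap ssh-users
# afterwards, reboot the node(s) so the Access Manager applies the changes
```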

View File

@ -1,26 +1,25 @@
# Use persistent storage
Persistent storage in Kubernetes requires configuration based on your cloud provider of choice.
Persistent storage in Kubernetes requires cloud-specific configuration.
For abstraction of container storage, Kubernetes offers [volumes](https://kubernetes.io/docs/concepts/storage/volumes/),
allowing users to mount storage solutions directly into containers.
The [Container Storage Interface (CSI)](https://kubernetes-csi.github.io/docs/) is the standard interface for exposing arbitrary block and file storage systems into containers in Kubernetes.
Cloud providers offer their own CSI-based solutions for cloud storage.
Cloud service providers (CSPs) offer their own CSI-based solutions for cloud storage.
### Confidential storage
Most cloud storage solutions support encryption, such as [GCE Persistent Disks (PD)](https://cloud.google.com/kubernetes-engine/docs/how-to/using-cmek).
Constellation supports the available CSI-based storage options for Kubernetes engines in Azure and GCP.
However, their encryption takes place in the storage backend and is managed by the cloud provider.
This mode of storage encryption doesn't provide confidential storage.
Using the default CSI drivers for these storage types means trusting the CSP with your persistent data.
However, their encryption takes place in the storage backend and is managed by the CSP.
Thus, using the default CSI drivers for these storage types means trusting the CSP with your persistent data.
Constellation provides CSI drivers for Azure Disk and GCE PD, offering [encryption on the node level](../architecture/keys.md#storage-encryption). They enable transparent encryption for persistent volumes without needing to trust the cloud backend. Plaintext data never leaves the confidential VM context, offering you confidential storage.
To address this, Constellation provides CSI drivers for Azure Disk and GCE PD, offering [encryption on the node level](../architecture/keys.md#storage-encryption). They enable transparent encryption for persistent volumes without needing to trust the cloud backend. Plaintext data never leaves the confidential VM context, offering you confidential storage.
For more details see [encrypted persistent storage](../architecture/encrypted-storage.md).
## CSI Drivers
Constellation can use the following drivers which offer node level encryption and optional integrity protection.
Constellation supports the following drivers, which offer node-level encryption and optional integrity protection.
<tabs groupId="csp">
<tabItem value="azure" label="Azure" default>
@ -46,7 +45,7 @@ Note that in case the options above aren't a suitable solution for you, Constell
## Installation
The following installation guide gives a brief overview of using CSI-based confidential cloud storage for persistent volumes in Constellation.
The following installation guide gives an overview of how to securely use CSI-based cloud storage for persistent volumes in Constellation.
<tabs groupId="csp">
<tabItem value="azure" label="Azure" default>
@ -82,12 +81,6 @@ The following installation guide gives a brief overview of using CSI-based confi
EOF
```
:::info
By default, integrity protection is disabled for performance reasons. If you want to enable integrity protection, add `csi.storage.k8s.io/fstype: ext4-integrity` to `parameters`. Alternatively, you can use another filesystem by specifying another file system type with the suffix `-integrity`. Note that volume expansion isn't supported for integrity-protected disks.
:::
</tabItem>
<tabItem value="gcp" label="GCP" default>
@ -119,15 +112,15 @@ By default, integrity protection is disabled for performance reasons. If you wan
EOF
```
</tabItem>
</tabs>
:::info
By default, integrity protection is disabled for performance reasons. If you want to enable integrity protection, add `csi.storage.k8s.io/fstype: ext4-integrity` to `parameters`. Alternatively, you can use another filesystem by specifying another file system type with the suffix `-integrity`. Note that volume expansion isn't supported for integrity-protected disks.
:::
</tabItem>
</tabs>
3. Create a [persistent volume](https://kubernetes.io/docs/concepts/storage/persistent-volumes/)
A [persistent volume claim](https://kubernetes.io/docs/concepts/storage/persistent-volumes/#persistentvolumeclaims) is a request for storage with certain properties.
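A claim against the encrypted storage class might look as follows; this is a sketch, and the storage class name is a placeholder for the one created in the previous step:
```bash
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-example
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: encrypted-storage # placeholder: use the class created above
  resources:
    requests:
      storage: 20Gi
EOF
```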

View File

@ -1,7 +1,6 @@
# Terminate your cluster
You can terminate your cluster using the CLI.
You need the state file of your running cluster named `constellation-state.json` in the current directory.
You can terminate your cluster using the CLI. For this, you need the state file of your running cluster named `constellation-state.json` in the current directory.
:::danger
@ -16,7 +15,7 @@ constellation terminate
```
This deletes all resources created by Constellation in your cloud environment.
All local files created by the `create` and `init` commands are deleted as well, except the *master secret* `constellation-mastersecret.json` and the configuration file.
All local files created by the `create` and `init` commands are deleted as well, except for `constellation-mastersecret.json` and the configuration file.
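For example, after termination only the master secret and the configuration file should remain in your workspace (the configuration file name is an assumption):
```bash
ls
# constellation-conf.yaml  constellation-mastersecret.json
```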
:::caution

View File

@ -1,24 +1,24 @@
# Azure trusted launch VMs
# Use Azure trusted launch VMs
Constellation supports Azure trusted launch VMs. These are VMs with instance type `Standard_D*_v4` and `Standard_E*_v4`.
Constellation also supports [trusted launch VMs](https://docs.microsoft.com/en-us/azure/virtual-machines/trusted-launch) on Microsoft Azure. Trusted launch VMs don't offer the same level of security as CVMs, but are available in more regions and in larger quantities. The main difference between trusted launch VMs and normal VMs is that the former offer vTPM-based remote attestation. When used with trusted launch VMs, Constellation relies on vTPM-based remote attestation to verify nodes.
:::caution
Trusted launch VMs don't provide [runtime encryption](../overview/confidential-kubernetes.md).
For highest security, use Confidential VMs.
Trusted launch VMs don't provide runtime encryption and don't keep the cloud service provider (CSP) out of your trusted computing base.
:::
Run `constellation config instance-types` to show all supported instance types.
Constellation supports trusted launch VMs with instance types `Standard_D*_v4` and `Standard_E*_v4`. Run `constellation config instance-types` for a list of all supported instance types.
## VM images
Azure currently doesn't support [community galleries for trusted launch VMs](https://docs.microsoft.com/en-us/azure/virtual-machines/share-gallery-community). So you need to import the VM image into your cloud subscription.
Azure currently doesn't support [community galleries for trusted launch VMs](https://docs.microsoft.com/en-us/azure/virtual-machines/share-gallery-community). Thus, you need to manually import the Constellation node image into your cloud subscription.
The latest image is available at [https://public-edgeless-constellation.s3.us-east-2.amazonaws.com/azure_image_exports/2.0.0](https://public-edgeless-constellation.s3.us-east-2.amazonaws.com/azure_image_exports/2.0.0). Simply adjust the last three numbers if you want to download an image for a different version.
The latest image is available at [https://public-edgeless-constellation.s3.us-east-2.amazonaws.com/azure_image_exports/2.0.0](https://public-edgeless-constellation.s3.us-east-2.amazonaws.com/azure_image_exports/2.0.0). Simply adjust the version number at the end of the URL to download a different version.
After you've downloaded the image, create a resource group `constellation-images` in your Azure subscription and import the image.
You can use a script to do this:
```bash
wget https://github.com/edgelesssys/constellation/blob/main/hack/importAzure.sh
chmod +x importAzure.sh
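# next, run the import script; check its header or usage output for the parameters it expects
# (for example, the target resource group and the downloaded image path — not shown here)
./importAzure.sh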

View File

@ -1,4 +1,4 @@
# Upgrade you cluster
# Upgrade your cluster
Constellation provides an easy way to upgrade from one release to the next.
This involves choosing a new VM image to use for all nodes in the cluster and updating the cluster's expected measurements.

View File

@ -1,42 +1,48 @@
# Verify your cluster
# Manually verify your cluster
Constellation's [attestation feature](../architecture/attestation.md) allows you, or a third party, to verify the confidentiality and integrity of your Constellation.
Constellation's [attestation feature](../architecture/attestation.md) allows you, or a third party, to explicitly verify the integrity and confidentiality of your Constellation cluster.
:::note
The steps below are purely optional. They're automatically executed by `constellation init` when you initialize your cluster. The `constellation verify` command mostly has an illustrative purpose.
:::
## Fetch measurements
To verify the integrity of Constellation you need trusted measurements to verify against. For each of the released images there are signed measurements, which you can download using the CLI:
To verify the integrity of Constellation you need trusted measurements to verify against. For each node image released by Edgeless Systems, there are signed measurements, which you can download using the CLI:
```bash
constellation config fetch-measurements
```
This command performs the following steps:
1. Look up the signed measurements for the configured image.
2. Download the measurements.
3. Verify the signature.
4. Write measurements into configuration file.
1. Download the signed measurements for the configured image. By default, this will use Edgeless Systems' public measurement registry.
2. Verify the signature of the measurements. This will use Edgeless Systems' [public key](https://edgeless.systems/es.pub).
3. Write measurements into configuration file.
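For example (the configuration file name is an assumption):
```bash
constellation config fetch-measurements
# inspect the measurements that were written to the configuration file
grep -A 24 'measurements:' constellation-conf.yaml
```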
## The *verify* command
Once measurements are configured, this command verifies an attestation statement issued by a Constellation, thereby verifying the integrity and confidentiality of the whole cluster.
The following command performs attestation on the Constellation in your current workspace:
The `verify` command obtains and verifies an attestation statement from a running Constellation cluster.
```bash
constellation verify
constellation verify [--cluster-id ...]
```
The command makes sure the value passed to `-cluster-id` matches the *clusterID* presented in the attestation statement.
This allows you to verify that you are connecting to a specific Constellation instance
Additionally, the confidential computing capabilities, as well as the VM image, are verified to match the expected configurations.
From the attestation statement, the command verifies the following properties:
* The cluster is using the correct Confidential VM (CVM) type.
* Inside the CVMs, the correct node images are running. The node images are identified through the measurements obtained in the previous step.
* The unique ID of the cluster matches the one passed in via `--cluster-id`.
Once the above properties are verified, you know that you're talking to the right Constellation cluster and that it's in a trustworthy state.
### Custom arguments
You can provide additional arguments for `verify` to verify any Constellation you have network access to. This requires you to provide:
The `verify` command also allows you to verify any Constellation deployment that you have network access to. For this, you need the following:
* The IP address of a running Constellation's [VerificationService](../architecture/components.md#verification-service). The *VerificationService* is exposed via a NodePort service using the external IP address of your cluster. Run `kubectl get nodes -o wide` and look for `EXTERNAL-IP`.
* The Constellation's *clusterID*. See [cluster identity](../architecture/keys.md#cluster-identity) for more details.
* The IP address of a running Constellation cluster's [VerificationService](../architecture/components.md#verification-service). The `VerificationService` is exposed via a `NodePort` service using the external IP address of your cluster. Run `kubectl get nodes -o wide` and look for `EXTERNAL-IP`.
* The cluster's *clusterID*. See [cluster identity](../architecture/keys.md#cluster-identity) for more details.
```bash
For example:
```shell-session
constellation verify -e 192.0.2.1 --cluster-id Q29uc3RlbGxhdGlvbkRvY3VtZW50YXRpb25TZWNyZXQ=
```

View File

@ -123,11 +123,6 @@ const sidebars = {
label: 'Create your cluster',
id: 'workflows/create',
},
{
type: 'doc',
label: 'Verify your cluster',
id: 'workflows/verify-cluster',
},
{
type: 'doc',
label: 'Scale your cluster',
@ -138,26 +133,6 @@ const sidebars = {
label: 'Upgrade your cluster',
id: 'workflows/upgrade',
},
{
type: 'doc',
label: 'Use persistent storage',
id: 'workflows/storage',
},
{
type: 'doc',
label: 'Azure trusted launch VMs',
id: 'workflows/trusted-launch',
},
{
type: 'doc',
label: 'Managing SSH keys',
id: 'workflows/ssh',
},
{
type: 'doc',
label: 'Troubleshooting',
id: 'workflows/troubleshooting',
},
{
type: 'doc',
label: 'Terminate your cluster',
@ -168,6 +143,31 @@ const sidebars = {
label: 'Recover your cluster',
id: 'workflows/recovery',
},
{
type: 'doc',
label: 'Verify your cluster',
id: 'workflows/verify-cluster',
},
{
type: 'doc',
label: 'Manage SSH keys',
id: 'workflows/ssh',
},
{
type: 'doc',
label: 'Use persistent storage',
id: 'workflows/storage',
},
{
type: 'doc',
label: 'Use Azure trusted launch VMs',
id: 'workflows/trusted-launch',
},
{
type: 'doc',
label: 'Troubleshooting',
id: 'workflows/troubleshooting',
},
],
},
{

View File

@ -15,7 +15,7 @@ This step creates the necessary resources for your cluster in your cloud environ
Before creating your cluster you need to decide on
* the size of your cluster (the number of control-plane and worker nodes)
* the initial size of your cluster (the number of control-plane and worker nodes)
* the machine type of your nodes (depending on the availability in your cloud environment)
* whether to enable autoscaling for your cluster (automatically adding and removing nodes depending on resource demands)
@ -66,10 +66,6 @@ For details on the flags and a list of supported instance types, consult the com
## The *init* step
This step bootstraps your cluster and configures your Kubernetes client.
### Init
The following command initializes and bootstraps your cluster:
```bash
@ -82,11 +78,11 @@ To enable autoscaling in your cluster, add the `--autoscale` flag:
constellation init --autoscale
```
Next, configure kubectl for your Constellation cluster:
Next, configure `kubectl` for your Constellation cluster:
```bash
export KUBECONFIG="$PWD/constellation-admin.conf"
kubectl get nodes -o wide
```
That's it. You've successfully created a Constellation cluster.
🏁 That's it. You've successfully created a Constellation cluster.

View File

@ -1,32 +1,21 @@
# Recovery
# Recover your cluster
Recovery of a Constellation cluster means getting a cluster back into a healthy state after it became unhealthy due to the underlying infrastructure.
Recovery of a Constellation cluster means getting a cluster back into a healthy state after too many concurrent node failures in the control plane.
Reasons for an unhealthy cluster can vary from a power outage, or planned reboot, to migration of nodes and regions.
Constellation keeps all stateful data protected and encrypted in a [stateful disk](../architecture/images.md#stateful-disk) attached to each node.
The stateful disk will be persisted across reboots.
The data restored from that disk contains the entire Kubernetes state including the application deployments.
Meaning after a successful recovery procedure the applications can continue operating without redeploying everything from scratch.
Recovery events are rare, because Constellation is built for high availability and automatically and securely replaces failed nodes. When a node is replaced, Constellation's control plane first verifies the new node before it sends the node the cryptographic keys required to decrypt its [stateful disk](../architecture/images.md#stateful-disk).
Recovery events are rare because Constellation is built for high availability and contains mechanisms to automatically replace and join nodes to the cluster.
Once a node reboots, the [*Bootstrapper*](../architecture/components.md#bootstrapper) will try to authenticate to the cluster's [*JoinService*](../architecture/components.md#joinservice) using remote attestation.
If successful the *JoinService* will return the encryption key for the stateful disk as part of the initialization response.
This process ensures that Constellation nodes can securely recover and rejoin a cluster autonomously.
In case of a disaster, where the control plane itself becomes unhealthy, Constellation provides a mechanism to recover that cluster and bring it back into a healthy state.
Constellation provides a recovery mechanism for cases where the control plane has failed and is unable to replace nodes.
The `constellation recover` command connects to a node, establishes a secure connection using [attested TLS](../architecture/attestation.md#attested-tls-atls), and provides that node with the key to decrypt its stateful disk and continue booting.
This process has to be repeated until enough nodes are back running for establishing a [member quorum for etcd](https://etcd.io/docs/v3.5/faq/#what-is-failure-tolerance) and the Kubernetes state can be recovered.
## Identify unhealthy clusters
The first step to recovery is identifying when a cluster becomes unhealthy.
Usually, that's first observed when the Kubernetes API server becomes unresponsive.
The causes can vary but are often related to issues in the underlying infrastructure.
Recovery in Constellation becomes necessary if not enough control-plane nodes are in a healthy state to keep the control plane operational.
Usually, this can be first observed when the Kubernetes API server becomes unresponsive.
The health status of the Constellation nodes can be checked and monitored via the cloud service provider.
The health status of the Constellation nodes can be checked and monitored via the cloud service provider (CSP).
Constellation provides logging information on the boot process and status via [cloud logging](troubleshooting.md#cloud-logging).
In the following, you'll find detailed descriptions for identifying clusters stuck in recovery for each cloud environment.
Once you've identified that your cluster is in an unhealthy state you can use the [recovery](recovery.md#recover-your-cluster) command of the Constellation CLI to restore it.
<tabs groupId="csp">
<tabItem value="azure" label="Azure" default>
@ -62,9 +51,7 @@ If that fails, because the control plane is unhealthy, you will see log messages
{"level":"ERROR","ts":"2022-09-08T09:57:23Z","logger":"rejoinClient","caller":"rejoinclient/client.go:110","msg":"Failed to rejoin on all endpoints"}
```
That means you have to recover that node manually.
Before you continue with the [recovery process](#recover-your-cluster) you need to know the node's IP address.
For the IP address, return to the instances *Overview* page and find the *Private IP address*.
This means that you have to recover the node manually. For this, you need its IP address, which can be obtained from the *Overview* page under *Private IP address*.
</tabItem>
<tabItem value="gcp" label="GCP" default>
@ -101,16 +88,14 @@ If that fails, because the control plane is unhealthy, you will see log messages
{"level":"ERROR","ts":"2022-09-08T10:22:13Z","logger":"rejoinClient","caller":"rejoinclient/client.go:110","msg":"Failed to rejoin on all endpoints"}
```
That means you have to recover that node manually.
Before you continue with the [recovery process](#recover-your-cluster) you need to know the node's IP address.
For the IP address go to the *"VM Instance" -> "network interfaces"* page and take the address from *"Primary internal IP address."*
This means that you have to recover the node manually. For this, you need its IP address, which can be obtained from the *"VM Instance" -> "network interfaces"* page under *"Primary internal IP address."*
</tabItem>
</tabs>
## Recover your cluster
Depending on the size of your cluster and the number of unhealthy control plane nodes the following process needs to be repeated until a [member quorum for etcd](https://etcd.io/docs/v3.5/faq/#what-is-failure-tolerance) is established.
The following process needs to be repeated until a [member quorum for etcd](https://etcd.io/docs/v3.5/faq/#what-is-failure-tolerance) is established.
For example, assume you have 5 control-plane nodes in your cluster and 4 of them have been rebooted due to a maintenance downtime in the cloud environment.
You have to run through the following process for 2 of these nodes and recover them manually to recover the quorum.
From there, your cluster will auto heal the remaining 2 control-plane nodes and the rest of your cluster.
@ -140,5 +125,3 @@ In the serial console output of the node you'll see a similar output to the foll
{"level":"INFO","ts":"2022-09-08T10:26:59Z","logger":"recoveryServer.gRPC","caller":"zap/server_interceptors.go:61","msg":"finished streaming call with code OK","grpc.start_time":"2022-09-08T10:26:59Z","system":"grpc","span.kind":"server","grpc.service":"recoverproto.API","grpc.method":"Recover","peer.address":"192.0.2.3:41752","grpc.code":"OK","grpc.time_ms":15.701}
{"level":"INFO","ts":"2022-09-08T10:27:13Z","logger":"rejoinClient","caller":"rejoinclient/client.go:87","msg":"RejoinClient stopped"}
```
After enough control plane nodes have been recovered and the Kubernetes cluster becomes healthy again, the rest of the cluster will start auto healing using the mechanism described above.

View File

@ -4,9 +4,7 @@ Constellation provides all features of a Kubernetes cluster including scaling an
## Worker node scaling
[During cluster initialization](create.md#init) you can choose to deploy the [cluster autoscaler](https://github.com/kubernetes/autoscaler). It automatically provisions additional worker nodes so that all pods have a place to run.
Alternatively, you can choose to manually scale your cluster:
[During cluster initialization](create.md#init) you can choose to deploy the [cluster autoscaler](https://github.com/kubernetes/autoscaler). It automatically provisions additional worker nodes so that all pods have a place to run. Alternatively, you can choose to manually scale your cluster up or down:
<tabs groupId="csp">
<tabItem value="azure" label="Azure" default>
@ -26,8 +24,6 @@ Alternatively, you can choose to manually scale your cluster:
</tabItem>
</tabs>
This works for scaling your worker nodes up and down.
## Control-plane node scaling
Control-plane nodes can **only be scaled manually and only scaled up**!

View File

@ -1,17 +1,17 @@
# Managing SSH Keys
# Manage SSH Keys
Constellation gives you the capability to create UNIX users which can connect to the cluster nodes over SSH, allowing you to access both control-plane as well as worker nodes. While the data partition is persistent, the system partition is read-only, meaning that users need to be re-created upon each restart of a node. This is where the Access Manager comes into effect, ensuring the automatic (re-)creation of all users whenever a node is restarted.
Constellation gives you the capability to create UNIX users which can connect to the cluster nodes over SSH, allowing you to access both control-plane and worker nodes. While the nodes' data partitions are persistent, the system partitions are read-only. Consequently, users need to be re-created upon each restart of a node. This is where the Access Manager comes into effect, ensuring the automatic (re-)creation of all users whenever a node is restarted.
During the initial creation of the cluster, all users defined in the `ssh-users` section of the Constellation configuration (see the [reference section](../reference/config.md) for details) are automatically created during the initialization process.
For persistence, they're transferred into a ConfigMap called `ssh-users`, residing in the `kube-system` namespace. When no users are initially defined, the ConfigMap will still be created with no entries. After the initial definition in the Constellation configuration, users can be added and removed by modifying the entries of the ConfigMap and performing a restart of a node.
During the initial creation of the cluster, all users defined in the `ssh-users` section of the Constellation [configuration file](../reference/config.md) are automatically created during the initialization process. For persistence, the users are stored in a ConfigMap called `ssh-users`, residing in the `kube-system` namespace. For a running cluster, users can be added and removed by modifying the entries of the ConfigMap and performing a restart of a node.
## Access Manager
The Access Manager doesn't restrict users on the use of certain key formats, meaning that all underlying formats the OpenSSH server supports are accepted. These are RSA, ECDSA (using the `nistp256`, `nistp384`, `nistp521` curves) and Ed25519. No validation is performed on the side of the Access Manager too, passing them directly to the authorized key lists as defined.
The Access Manager supports all OpenSSH key types. These are RSA, ECDSA (using the `nistp256`, `nistp384`, `nistp521` curves) and Ed25519.
Note that all users are automatically created with `sudo` capabilities, so make sure no one unintended has permissions to modify the `ssh-users` ConfigMap.
:::note
All users are automatically created with `sudo` capabilities.
:::
The Access Manager is deployed as a DaemonSet called `constellation-access-manager`, running as an `initContainer` and afterward running a `pause` container to avoid automatic restarts. While technically killing the Pod and letting it restart works for the (re-)creation of users, it doesn't automatically remove users. Therefore, a complete node restart is important to ensure the correct modification of users on the system and needs to be executed manually after making changes to the ConfigMap.
The Access Manager is deployed as a DaemonSet called `constellation-access-manager`, running as an `initContainer` and afterward running a `pause` container to avoid automatic restarts. While technically killing the Pod and letting it restart works for the (re-)creation of users, it doesn't automatically remove users. Thus, a complete node restart is required after making changes to the ConfigMap.
When a user is deleted from the ConfigMap, it won't be re-created after the next restart of a node. The home directories of the affected users will be moved to `/var/evicted`, with the owner of each directory and its content being modified to `root`.

View File

@ -1,26 +1,25 @@
# Use persistent storage
Persistent storage in Kubernetes requires configuration based on your cloud provider of choice.
Persistent storage in Kubernetes requires cloud-specific configuration.
For abstraction of container storage, Kubernetes offers [volumes](https://kubernetes.io/docs/concepts/storage/volumes/),
allowing users to mount storage solutions directly into containers.
The [Container Storage Interface (CSI)](https://kubernetes-csi.github.io/docs/) is the standard interface for exposing arbitrary block and file storage systems into containers in Kubernetes.
Cloud providers offer their own CSI-based solutions for cloud storage.
Cloud service providers (CSPs) offer their own CSI-based solutions for cloud storage.
### Confidential storage
Most cloud storage solutions support encryption, such as [GCE Persistent Disks (PD)](https://cloud.google.com/kubernetes-engine/docs/how-to/using-cmek).
Constellation supports the available CSI-based storage options for Kubernetes engines in Azure and GCP.
However, their encryption takes place in the storage backend and is managed by the cloud provider.
This mode of storage encryption doesn't provide confidential storage.
Using the default CSI drivers for these storage types means trusting the CSP with your persistent data.
However, their encryption takes place in the storage backend and is managed by the CSP.
Thus, using the default CSI drivers for these storage types means trusting the CSP with your persistent data.
Constellation provides CSI drivers for Azure Disk and GCE PD, offering [encryption on the node level](../architecture/keys.md#storage-encryption). They enable transparent encryption for persistent volumes without needing to trust the cloud backend. Plaintext data never leaves the confidential VM context, offering you confidential storage.
To address this, Constellation provides CSI drivers for Azure Disk and GCE PD, offering [encryption on the node level](../architecture/keys.md#storage-encryption). They enable transparent encryption for persistent volumes without needing to trust the cloud backend. Plaintext data never leaves the confidential VM context, offering you confidential storage.
For more details see [encrypted persistent storage](../architecture/encrypted-storage.md).
## CSI Drivers
Constellation can use the following drivers which offer node level encryption and optional integrity protection.
Constellation supports the following drivers, which offer node-level encryption and optional integrity protection.
<tabs groupId="csp">
<tabItem value="azure" label="Azure" default>
@ -46,7 +45,7 @@ Note that in case the options above aren't a suitable solution for you, Constell
## Installation
The following installation guide gives a brief overview of using CSI-based confidential cloud storage for persistent volumes in Constellation.
The following installation guide gives an overview of how to securely use CSI-based cloud storage for persistent volumes in Constellation.
<tabs groupId="csp">
<tabItem value="azure" label="Azure" default>
@ -82,12 +81,6 @@ The following installation guide gives a brief overview of using CSI-based confi
EOF
```
:::info
By default, integrity protection is disabled for performance reasons. If you want to enable integrity protection, add `csi.storage.k8s.io/fstype: ext4-integrity` to `parameters`. Alternatively, you can use another filesystem by specifying another file system type with the suffix `-integrity`. Note that volume expansion isn't supported for integrity-protected disks.
:::
</tabItem>
<tabItem value="gcp" label="GCP" default>
@ -119,15 +112,15 @@ By default, integrity protection is disabled for performance reasons. If you wan
EOF
```
</tabItem>
</tabs>
:::info
By default, integrity protection is disabled for performance reasons. If you want to enable integrity protection, add `csi.storage.k8s.io/fstype: ext4-integrity` to `parameters`. Alternatively, you can use another filesystem by specifying another file system type with the suffix `-integrity`. Note that volume expansion isn't supported for integrity-protected disks.
:::
</tabItem>
</tabs>
3. Create a [persistent volume](https://kubernetes.io/docs/concepts/storage/persistent-volumes/)
A [persistent volume claim](https://kubernetes.io/docs/concepts/storage/persistent-volumes/#persistentvolumeclaims) is a request for storage with certain properties.

View File

@ -1,7 +1,6 @@
# Terminate your cluster
You can terminate your cluster using the CLI.
You need the state file of your running cluster named `constellation-state.json` in the current directory.
You can terminate your cluster using the CLI. For this, you need the state file of your running cluster named `constellation-state.json` in the current directory.
:::danger
@ -16,7 +15,7 @@ constellation terminate
```
This deletes all resources created by Constellation in your cloud environment.
All local files created by the `create` and `init` commands are deleted as well, except the *master secret* `constellation-mastersecret.json` and the configuration file.
All local files created by the `create` and `init` commands are deleted as well, except for `constellation-mastersecret.json` and the configuration file.
:::caution

View File

@ -1,24 +1,24 @@
# Azure trusted launch VMs
# Use Azure trusted launch VMs
Constellation supports Azure trusted launch VMs. These are VMs with instance type `Standard_D*_v4` and `Standard_E*_v4`.
Constellation also supports [trusted launch VMs](https://docs.microsoft.com/en-us/azure/virtual-machines/trusted-launch) on Microsoft Azure. Trusted launch VMs don't offer the same level of security as CVMs, but are available in more regions and in larger quantities. The main difference between trusted launch VMs and normal VMs is that the former offer vTPM-based remote attestation. When used with trusted launch VMs, Constellation relies on vTPM-based remote attestation to verify nodes.
:::caution
Trusted launch VMs don't provide [runtime encryption](../overview/confidential-kubernetes.md).
For highest security, use Confidential VMs.
Trusted launch VMs don't provide runtime encryption and don't keep the cloud service provider (CSP) out of your trusted computing base.
:::
Run `constellation config instance-types` to show all supported instance types.
Constellation supports trusted launch VMs with instance types `Standard_D*_v4` and `Standard_E*_v4`. Run `constellation config instance-types` for a list of all supported instance types.
## VM images
Azure currently doesn't support [community galleries for trusted launch VMs](https://docs.microsoft.com/en-us/azure/virtual-machines/share-gallery-community). So you need to import the VM image into your cloud subscription.
Azure currently doesn't support [community galleries for trusted launch VMs](https://docs.microsoft.com/en-us/azure/virtual-machines/share-gallery-community). Thus, you need to manually import the Constellation node image into your cloud subscription.
The latest image is available at [https://public-edgeless-constellation.s3.us-east-2.amazonaws.com/azure_image_exports/2.0.0](https://public-edgeless-constellation.s3.us-east-2.amazonaws.com/azure_image_exports/2.0.0). Simply adjust the last three numbers if you want to download an image for a different version.
The latest image is available at [https://public-edgeless-constellation.s3.us-east-2.amazonaws.com/azure_image_exports/2.0.0](https://public-edgeless-constellation.s3.us-east-2.amazonaws.com/azure_image_exports/2.0.0). Simply adjust the version number at the end of the URL to download a different version.
After you've downloaded the image, create a resource group `constellation-images` in your Azure subscription and import the image.
You can use a script to do this:
```bash
wget https://github.com/edgelesssys/constellation/blob/main/hack/importAzure.sh
chmod +x importAzure.sh

View File

@ -1,4 +1,4 @@
# Upgrade you cluster
# Upgrade your cluster
Constellation provides an easy way to upgrade from one release to the next.
This involves choosing a new VM image to use for all nodes in the cluster and updating the cluster's expected measurements.

View File

@ -1,42 +1,48 @@
# Verify your cluster
# Manually verify your cluster
Constellation's [attestation feature](../architecture/attestation.md) allows you, or a third party, to verify the confidentiality and integrity of your Constellation.
Constellation's [attestation feature](../architecture/attestation.md) allows you, or a third party, to explicitly verify the integrity and confidentiality of your Constellation cluster.
:::note
The steps below are purely optional. They're automatically executed by `constellation init` when you initialize your cluster. The `constellation verify` command mostly has an illustrative purpose.
:::
## Fetch measurements
To verify the integrity of Constellation you need trusted measurements to verify against. For each of the released images there are signed measurements, which you can download using the CLI:
To verify the integrity of Constellation you need trusted measurements to verify against. For each node image released by Edgeless Systems, there are signed measurements, which you can download using the CLI:
```bash
constellation config fetch-measurements
```
This command performs the following steps:
1. Look up the signed measurements for the configured image.
2. Download the measurements.
3. Verify the signature.
4. Write measurements into configuration file.
1. Download the signed measurements for the configured image. By default, this will use Edgeless Systems' public measurement registry.
2. Verify the signature of the measurements. This will use Edgeless Systems' [public key](https://edgeless.systems/es.pub).
3. Write measurements into configuration file.
## The *verify* command
Once measurements are configured, this command verifies an attestation statement issued by a Constellation, thereby verifying the integrity and confidentiality of the whole cluster.
The following command performs attestation on the Constellation in your current workspace:
The `verify` command obtains and verifies an attestation statement from a running Constellation cluster.
```bash
constellation verify
constellation verify [--cluster-id ...]
```
The command makes sure the value passed to `-cluster-id` matches the *clusterID* presented in the attestation statement.
This allows you to verify that you are connecting to a specific Constellation instance
Additionally, the confidential computing capabilities, as well as the VM image, are verified to match the expected configurations.
From the attestation statement, the command verifies the following properties:
* The cluster is using the correct Confidential VM (CVM) type.
* Inside the CVMs, the correct node images are running. The node images are identified through the measurements obtained in the previous step.
* The unique ID of the cluster matches the one passed in via `--cluster-id`.
Once the above properties are verified, you know that you're talking to the right Constellation cluster and that it's in a trustworthy state.
### Custom arguments
You can provide additional arguments for `verify` to verify any Constellation you have network access to. This requires you to provide:
The `verify` command also allows you to verify any Constellation deployment that you have network access to. For this, you need the following:
* The IP address of a running Constellation's [VerificationService](../architecture/components.md#verification-service). The *VerificationService* is exposed via a NodePort service using the external IP address of your cluster. Run `kubectl get nodes -o wide` and look for `EXTERNAL-IP`.
* The Constellation's *clusterID*. See [cluster identity](../architecture/keys.md#cluster-identity) for more details.
* The IP address of a running Constellation cluster's [VerificationService](../architecture/components.md#verification-service). The `VerificationService` is exposed via a `NodePort` service using the external IP address of your cluster. Run `kubectl get nodes -o wide` and look for `EXTERNAL-IP`.
* The cluster's *clusterID*. See [cluster identity](../architecture/keys.md#cluster-identity) for more details.
```bash
For example:
```shell-session
constellation verify -e 192.0.2.1 --cluster-id Q29uc3RlbGxhdGlvbkRvY3VtZW50YXRpb25TZWNyZXQ=
```

View File

@ -105,11 +105,6 @@
"label": "Create your cluster",
"id": "workflows/create"
},
{
"type": "doc",
"label": "Verify your cluster",
"id": "workflows/verify-cluster"
},
{
"type": "doc",
"label": "Scale your cluster",
@ -120,26 +115,6 @@
"label": "Upgrade your cluster",
"id": "workflows/upgrade"
},
{
"type": "doc",
"label": "Use persistent storage",
"id": "workflows/storage"
},
{
"type": "doc",
"label": "Azure trusted launch VMs",
"id": "workflows/trusted-launch"
},
{
"type": "doc",
"label": "Managing SSH keys",
"id": "workflows/ssh"
},
{
"type": "doc",
"label": "Troubleshooting",
"id": "workflows/troubleshooting"
},
{
"type": "doc",
"label": "Terminate your cluster",
@ -149,6 +124,31 @@
"type": "doc",
"label": "Recover your cluster",
"id": "workflows/recovery"
},
{
"type": "doc",
"label": "Verify your cluster",
"id": "workflows/verify-cluster"
},
{
"type": "doc",
"label": "Manage SSH keys",
"id": "workflows/ssh"
},
{
"type": "doc",
"label": "Use persistent storage",
"id": "workflows/storage"
},
{
"type": "doc",
"label": "Use Azure trusted launch VMs",
"id": "workflows/trusted-launch"
},
{
"type": "doc",
"label": "Troubleshooting",
"id": "workflows/troubleshooting"
}
]
},