constellation/operators/constellation-node-operator
Moritz Sanft c15e4efef6
terraform: Azure Marketplace image support (#2651)
Signed-off-by: Moritz Sanft <58110325+msanft@users.noreply.github.com>
Co-authored-by: Moritz Eckert <m1gh7ym0@gmail.com>
Co-authored-by: Thomas Tendyck <51411342+thomasten@users.noreply.github.com>
2023-12-08 14:40:31 +01:00
api deps: update go to 1.21.1 (#2389) 2023-09-28 22:29:14 +02:00
config deps: update gcr.io/kubebuilder/kube-rbac-proxy Docker tag to v0.14.1 (#2063) 2023-08-04 13:49:38 +02:00
controllers Revert "operator: always delete terminated pending nodes (#2545)" (#2596) 2023-11-13 20:25:34 +01:00
external/github.com/medik8s/node-maintenance-operator/config/crd/bases [node operator] Add nodemaintenance CRD 2022-08-09 10:29:04 +02:00
hack [node operator] Initial commit 2022-08-09 10:29:04 +02:00
internal terraform: Azure Marketplace image support (#2651) 2023-12-08 14:40:31 +01:00
sgreconciler deps: update golangci/golangci-lint to v1.55.1 (#2517) 2023-11-02 11:16:17 +01:00
.dockerignore [node operator] Initial commit 2022-08-09 10:29:04 +02:00
.gitignore constellation-lib: add Helm wrapper (#2680) 2023-12-06 10:01:39 +01:00
BUILD.bazel operators: use bazel to run operator envtests 2023-08-17 10:46:45 +02:00
bundle.Dockerfile join: synchronize control plane joining (#776) 2022-12-09 18:30:20 +01:00
go.mod deps: update module github.com/hashicorp/* (#2626) 2023-11-22 09:35:00 +01:00
go.sum deps: update module github.com/hashicorp/* (#2626) 2023-11-22 09:35:00 +01:00
main.go operators: infrastructure autodiscovery (#1958) 2023-07-05 07:27:34 +02:00
Makefile operators: use bazel to run operator envtests 2023-08-17 10:46:45 +02:00
PROJECT upgrade: support Kubernetes components (#839) 2023-01-03 12:09:53 +01:00
README.md constellation-lib: add Helm wrapper (#2680) 2023-12-06 10:01:39 +01:00

constellation-node-operator

The Constellation node operator manages the lifecycle of Constellation nodes after cluster initialization. In particular, it updates the OS images of nodes by replacing nodes that run old images with new nodes.

High level goals

  • An admin or constellation apply can create custom resources for node-related components
  • The operator manages nodes in the cluster by trying to ensure that every node runs the specified image
  • If a node uses an outdated image, it is replaced by a new node
  • An admin can update the specified image at any time, which triggers a rolling upgrade through the cluster
  • Nodes are replaced safely (cordon, drain, preservation of node labels)
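
The goals above boil down to comparing each node's image with the desired one. The following is a minimal sketch of that comparison, not the operator's actual code: the node type, the outdatedNodes function, and the image names are hypothetical and exist only for illustration.

```go
package main

import "fmt"

// node is a hypothetical, simplified view of a cluster node.
type node struct {
	Name  string
	Image string
}

// outdatedNodes returns the nodes whose image differs from the desired image.
// These are the candidates for replacement during a rolling upgrade.
func outdatedNodes(nodes []node, desiredImage string) []node {
	var outdated []node
	for _, n := range nodes {
		if n.Image != desiredImage {
			outdated = append(outdated, n)
		}
	}
	return outdated
}

func main() {
	nodes := []node{
		{Name: "worker-0", Image: "image-v1"},
		{Name: "worker-1", Image: "image-v2"},
	}
	for _, n := range outdatedNodes(nodes, "image-v2") {
		fmt.Printf("replace %s (cordon, drain, then remove)\n", n.Name)
	}
}
```

A real implementation would additionally create the replacement node first and carry over node labels, as listed above.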

Description

The operator runs multiple controllers, each with a corresponding custom resource definition (CRD). They are responsible for the following high-level tasks:

NodeVersion

NodeVersion is the only user-controlled CRD. Its spec allows an administrator to update the desired image, which triggers a rolling update.

Example for GCP:

apiVersion: update.edgeless.systems/v1alpha1
kind: NodeVersion
metadata:
  name: constellation-version
spec:
  image: "projects/constellation-images/global/images/<image-name>"

Example for Azure:

apiVersion: update.edgeless.systems/v1alpha1
kind: NodeVersion
metadata:
  name: constellation-version
spec:
  image: "/subscriptions/<subscription-id>/resourceGroups/CONSTELLATION-IMAGES/providers/Microsoft.Compute/galleries/Constellation/images/<image-definition-name>/versions/<image-version>"
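
To make the structure of the Azure image reference explicit, here is a small sketch that splits a gallery image ID of the form shown above into its components. It is an illustration only: galleryImage and parseGalleryImage are hypothetical names, the operator's own parsing may differ, and other reference formats (for example marketplace images) are not handled.

```go
package main

import (
	"fmt"
	"strings"
)

// galleryImage holds the components of an Azure compute gallery image ID.
// The type is hypothetical and only used for this sketch.
type galleryImage struct {
	Subscription, ResourceGroup, Gallery, Definition, Version string
}

// parseGalleryImage splits a reference of the form
// /subscriptions/<id>/resourceGroups/<rg>/providers/Microsoft.Compute/galleries/<gallery>/images/<definition>/versions/<version>
// into its parts. Segment names are compared case-insensitively.
func parseGalleryImage(ref string) (galleryImage, error) {
	parts := strings.Split(strings.TrimPrefix(ref, "/"), "/")
	keys := []string{"subscriptions", "resourceGroups", "providers", "galleries", "images", "versions"}
	if len(parts) != 2*len(keys) {
		return galleryImage{}, fmt.Errorf("unexpected number of segments in %q", ref)
	}
	for i, key := range keys {
		if !strings.EqualFold(parts[2*i], key) {
			return galleryImage{}, fmt.Errorf("expected segment %q, got %q", key, parts[2*i])
		}
	}
	return galleryImage{
		Subscription:  parts[1],
		ResourceGroup: parts[3],
		Gallery:       parts[7],
		Definition:    parts[9],
		Version:       parts[11],
	}, nil
}

func main() {
	img, err := parseGalleryImage("/subscriptions/sub-id/resourceGroups/CONSTELLATION-IMAGES/providers/Microsoft.Compute/galleries/Constellation/images/my-definition/versions/1.2.3")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", img)
}
```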

AutoscalingStrategy

AutoscalingStrategy is used and modified by the NodeVersion controller to pause the cluster-autoscaler while an image update is in progress.

Example:

apiVersion: update.edgeless.systems/v1alpha1
kind: AutoscalingStrategy
metadata:
  name: autoscalingstrategy
spec:
  enabled: true
  deploymentName: "cluster-autoscaler"
  deploymentNamespace: "kube-system"
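
One way to picture the pause mechanism is sketched below. It assumes that pausing means scaling the referenced deployment to zero replicas and that the strategy is enabled again once no update is running. Both are assumptions made for illustration, and the type and function names are hypothetical.

```go
package main

import "fmt"

// autoscalingStrategy mirrors the spec fields from the example above.
type autoscalingStrategy struct {
	Enabled             bool
	DeploymentName      string
	DeploymentNamespace string
}

// desiredReplicas returns the replica count for the autoscaler deployment.
// Assumption: "paused" is modeled as zero replicas, "running" as one.
func desiredReplicas(s autoscalingStrategy) int32 {
	if s.Enabled {
		return 1
	}
	return 0
}

// withUpdateState returns a copy of s that is disabled while an image update
// is in progress and enabled otherwise.
func withUpdateState(s autoscalingStrategy, updateInProgress bool) autoscalingStrategy {
	s.Enabled = !updateInProgress
	return s
}

func main() {
	s := autoscalingStrategy{
		Enabled:             true,
		DeploymentName:      "cluster-autoscaler",
		DeploymentNamespace: "kube-system",
	}
	s = withUpdateState(s, true) // an image update has started
	fmt.Printf("%s/%s replicas=%d\n", s.DeploymentNamespace, s.DeploymentName, desiredReplicas(s))
}
```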

ScalingGroup

ScalingGroup represents one scaling group at the CSP. Constellation uses one scaling group for worker nodes and one for control-plane nodes. The scaling group controller automatically sets the image used for newly created nodes to the image set in the NodeVersion spec. On cluster creation, one ScalingGroup resource is created for each scaling group at the CSP. These resources do not need to be updated manually.

Example for GCP:

apiVersion: update.edgeless.systems/v1alpha1
kind: ScalingGroup
metadata:
  name: scalinggroup-worker
spec:
  nodeImage: "constellation-version"
  groupId: "projects/<project-id>/zones/<zone>/instanceGroupManagers/<instance-group-name>"
  autoscaling: true

Example for Azure:

apiVersion: update.edgeless.systems/v1alpha1
kind: ScalingGroup
metadata:
  name: scalinggroup-worker
spec:
  nodeImage: "constellation-version"
  groupId: "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Compute/virtualMachineScaleSets/<scale-set-name>"
  autoscaling: true
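
The image synchronization described for the scaling group controller can be sketched as follows. The groupClient interface and the in-memory fakeClient are hypothetical stand-ins for a CSP API. The real controller talks to the cloud provider and reads the desired image from the NodeVersion resource.

```go
package main

import "fmt"

// groupClient is a hypothetical abstraction over the CSP's scaling group API.
type groupClient interface {
	Image(groupID string) (string, error)
	SetImage(groupID, image string) error
}

// fakeClient is an in-memory stand-in that maps group IDs to images.
type fakeClient map[string]string

func (c fakeClient) Image(groupID string) (string, error) {
	img, ok := c[groupID]
	if !ok {
		return "", fmt.Errorf("scaling group %q not found", groupID)
	}
	return img, nil
}

func (c fakeClient) SetImage(groupID, image string) error {
	if _, ok := c[groupID]; !ok {
		return fmt.Errorf("scaling group %q not found", groupID)
	}
	c[groupID] = image
	return nil
}

// syncImage sets the group's image to desired if it differs and reports
// whether a change was made.
func syncImage(c groupClient, groupID, desired string) (bool, error) {
	current, err := c.Image(groupID)
	if err != nil {
		return false, err
	}
	if current == desired {
		return false, nil
	}
	if err := c.SetImage(groupID, desired); err != nil {
		return false, err
	}
	return true, nil
}

func main() {
	c := fakeClient{"worker-group": "image-v1"}
	changed, err := syncImage(c, "worker-group", "image-v2")
	fmt.Println(changed, err, c["worker-group"])
}
```

Because syncImage only writes when the images differ, calling it repeatedly is harmless, which fits the reconcile-until-converged style of a controller.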

PendingNode

PendingNode represents a node that is either joining or leaving the cluster. Such nodes are not part of the cluster (they do not have a corresponding node object). The resource is used to track the creation and deletion of nodes and is managed automatically by the operator. For joining nodes, the pending node is deleted if it fails to join before the deadline passes.

Example for GCP:

apiVersion: update.edgeless.systems/v1alpha1
kind: PendingNode
metadata:
  name: pendingnode-sample
spec:
  providerID: "gce://<project-id>/<zone>/<instance-name>"
  groupID: "projects/<project-id>/zones/<zone>/instanceGroupManagers/<instance-group-name>"
  nodeName: "<kubernetes-node-name>"
  goal: Join
  deadline: "2022-07-04T08:33:18+00:00"

Example for Azure:

apiVersion: update.edgeless.systems/v1alpha1
kind: PendingNode
metadata:
  name: pendingnode-sample
spec:
  providerID: "azure:///subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Compute/virtualMachineScaleSets/<scale-set-name>/virtualMachines/<instance-id>"
  groupID: "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Compute/virtualMachineScaleSets/<scale-set-name>"
  nodeName: "<kubernetes-node-name>"
  goal: Join
  deadline: "2022-07-04T08:33:18+00:00"
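
The deadline check for joining nodes can be illustrated with a short sketch. It assumes the deadline is an RFC 3339 timestamp, as in the examples above. joinDeadlinePassed and mustParse are hypothetical helper names, not part of the operator.

```go
package main

import (
	"fmt"
	"time"
)

// mustParse parses an RFC 3339 timestamp and panics on malformed input.
// It is a convenience helper for the example only.
func mustParse(s string) time.Time {
	t, err := time.Parse(time.RFC3339, s)
	if err != nil {
		panic(err)
	}
	return t
}

// joinDeadlinePassed reports whether now is later than the given deadline.
// A pending node with goal Join would be cleaned up once this returns true.
func joinDeadlinePassed(deadline string, now time.Time) (bool, error) {
	d, err := time.Parse(time.RFC3339, deadline)
	if err != nil {
		return false, fmt.Errorf("parsing deadline: %w", err)
	}
	return now.After(d), nil
}

func main() {
	passed, err := joinDeadlinePassed("2022-07-04T08:33:18+00:00", time.Now())
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("deadline passed:", passed)
}
```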

Getting Started

You'll need a Kubernetes cluster to run against. You can use KIND to get a local cluster for testing, or run against a remote cluster. Note: Your controller will automatically use the current context in your kubeconfig file (i.e. whatever cluster kubectl cluster-info shows).

Running on the cluster

  1. Install Instances of Custom Resources:
kubectl apply -f config/samples/
  2. Build and push your image to the location specified by IMG:
make docker-build docker-push IMG=<some-registry>/constellation/node-operator:tag
  3. Deploy the controller to the cluster with the image specified by IMG:
make deploy IMG=<some-registry>/constellation/node-operator:tag

Uninstall CRDs

To delete the CRDs from the cluster:

make uninstall

Undeploy controller

To undeploy the controller from the cluster:

make undeploy

How it works

This project aims to follow the Kubernetes Operator pattern.

It uses controllers, each of which provides a reconcile function responsible for synchronizing resources until the desired state is reached on the cluster.
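
The reconcile idea can be shown without any Kubernetes libraries. The sketch below is a deliberately simplified model, not the operator's controller code: the cluster type and reconcile function are hypothetical, and the desired state is taken to be zero outdated nodes.

```go
package main

import "fmt"

// cluster is a hypothetical, minimal model of observed state.
type cluster struct {
	OutdatedNodes int
}

// reconcile moves the cluster one step toward the desired state (no outdated
// nodes). It returns true if it made a change and should be called again.
func reconcile(c *cluster) bool {
	if c.OutdatedNodes == 0 {
		return false // desired state reached, nothing to do
	}
	c.OutdatedNodes-- // replace one outdated node per pass
	return true
}

func main() {
	c := &cluster{OutdatedNodes: 3}
	passes := 0
	for reconcile(c) {
		passes++
	}
	fmt.Printf("converged after %d passes, outdated nodes: %d\n", passes, c.OutdatedNodes)
}
```

Each pass makes a small, safe change and then re-evaluates, so the loop converges regardless of the starting state and is a no-op once the desired state holds.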

Test It Out

  1. Install the CRDs into the cluster:
make install
  2. Run your controller (this will run in the foreground, so switch to a new terminal if you want to leave it running):
make run

NOTE: You can also run this in one step by running: make install run

Modifying the API definitions

If you are editing the API definitions, generate the manifests such as CRs or CRDs using:

make manifests

NOTE: Run make --help for more information on all potential make targets

More information can be found via the Kubebuilder Documentation

Production deployment

The operator is deployed automatically during constellation-init. This requires cert-manager, which is also installed during constellation-init. To deploy the operator, you can use the Helm chart at /internal/constellation/helm/charts/edgeless/operators/constellation-operator.