constellation-node-operator

The constellation node operator manages the lifecycle of constellation nodes after cluster initialization. In particular, it is responsible for updating the OS images of nodes by replacing nodes running old images with new nodes.

High-level goals

  • Admin or constellation init can create custom resources for node-related components
  • The operator manages nodes in the cluster and tries to ensure that every node runs the specified image
  • If a node uses an outdated image, it is replaced by a new node
  • Admin can update the specified image at any point in time, which triggers a rolling upgrade through the cluster
  • Nodes are replaced safely (cordon, drain, preservation of node labels)

Description

The operator has multiple controllers with corresponding custom resource definitions (CRDs) that are responsible for the following high-level tasks:

NodeImage

NodeImage is the only user-controlled CRD. The spec allows an administrator to update the desired image and trigger a rolling update.

Example for GCP:

apiVersion: update.edgeless.systems/v1alpha1
kind: NodeImage
metadata:
  name: constellation-coreos
spec:
  image: "projects/constellation-images/global/images/<image-name>"

Example for Azure:

apiVersion: update.edgeless.systems/v1alpha1
kind: NodeImage
metadata:
  name: constellation-coreos
spec:
  image: "/subscriptions/<subscription-id>/resourceGroups/CONSTELLATION-IMAGES/providers/Microsoft.Compute/galleries/Constellation/images/<image-definition-name>/versions/<image-version>"

AutoscalingStrategy

AutoscalingStrategy is used and modified by the NodeImage controller to pause the cluster-autoscaler while an image update is in progress.

Example:

apiVersion: update.edgeless.systems/v1alpha1
kind: AutoscalingStrategy
metadata:
  name: autoscalingstrategy
spec:
  enabled: true
  deploymentName: "cluster-autoscaler"
  deploymentNamespace: "kube-system"
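
Whether autoscaling is currently enabled or paused can be read from the resource itself. A sketch, assuming the CRD registers the plural resource name autoscalingstrategies and the object name from the example above:

kubectl get autoscalingstrategies autoscalingstrategy -o jsonpath='{.spec.enabled}'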

ScalingGroup

ScalingGroup represents one scaling group at the CSP. Constellation uses one scaling group for worker nodes and one for control-plane nodes. The scaling group controller automatically sets the image used for newly created nodes to the image specified in the NodeImage spec. On cluster creation, one ScalingGroup resource is created per scaling group at the CSP. It does not need to be updated manually.

Example for GCP:

apiVersion: update.edgeless.systems/v1alpha1
kind: ScalingGroup
metadata:
  name: scalinggroup-worker
spec:
  nodeImage: "constellation-coreos"
  groupId: "projects/<project-id>/zones/<zone>/instanceGroupManagers/<instance-group-name>"
  autoscaling: true

Example for Azure:

apiVersion: update.edgeless.systems/v1alpha1
kind: ScalingGroup
metadata:
  name: scalinggroup-worker
spec:
  nodeImage: "constellation-coreos"
  groupId: "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Compute/virtualMachineScaleSets/<scale-set-name>"
  autoscaling: true
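
To check that the image from the NodeImage spec has been propagated to the scaling groups, the resources can be listed and inspected. A sketch, assuming the CRD registers the plural resource name scalinggroups:

kubectl get scalinggroups
kubectl get scalinggroups scalinggroup-worker -o yaml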

PendingNode

PendingNode represents a node that is either joining or leaving the cluster. These nodes are not part of the cluster and do not have a corresponding node object; the resource is used to track their creation and deletion. It is automatically managed by the operator. For joining nodes, the deadline is used to delete the pending node if it fails to join before the deadline expires.

Example for GCP:

apiVersion: update.edgeless.systems/v1alpha1
kind: PendingNode
metadata:
  name: pendingnode-sample
spec:
  providerID: "gce://<project-id>/<zone>/<instance-name>"
  groupID: "projects/<project-id>/zones/<zone>/instanceGroupManagers/<instance-group-name>"
  nodeName: "<kubernetes-node-name>"
  goal: Join
  deadline: "2022-07-04T08:33:18+00:00"

Example for Azure:

apiVersion: update.edgeless.systems/v1alpha1
kind: PendingNode
metadata:
  name: pendingnode-sample
spec:
  providerID: "azure:///subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Compute/virtualMachineScaleSets/<scale-set-name>/virtualMachines/<instance-id>"
  groupID: "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Compute/virtualMachineScaleSets/<scale-set-name>"
  nodeName: "<kubernetes-node-name>"
  goal: Join
  deadline: "2022-07-04T08:33:18+00:00"

Getting Started

You'll need a Kubernetes cluster to run against. You can use KIND to get a local cluster for testing, or run against a remote cluster. Note: Your controller will automatically use the current context in your kubeconfig file (i.e. whatever cluster kubectl cluster-info shows).
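
For local testing, a throwaway cluster can be created with KIND, for example (the cluster name is arbitrary):

kind create cluster --name operator-dev
kubectl cluster-info --context kind-operator-dev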

Running on the cluster

  1. Install Instances of Custom Resources:

     kubectl apply -f config/samples/

  2. Build and push your image to the location specified by IMG:

     make docker-build docker-push IMG=<some-registry>/constellation/node-operator:tag

  3. Deploy the controller to the cluster with the image specified by IMG:

     make deploy IMG=<some-registry>/constellation/node-operator:tag
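
After deployment, you can check that the controller manager pod is running. A sketch, assuming the manager pod carries the kubebuilder default label control-plane=controller-manager:

kubectl get pods -A -l control-plane=controller-manager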

Uninstall CRDs

To delete the CRDs from the cluster:

make uninstall

Undeploy controller

Undeploy the controller from the cluster:

make undeploy

How it works

This project aims to follow the Kubernetes Operator pattern.

It uses Controllers, which provide a reconcile function responsible for synchronizing resources until the desired state is reached on the cluster.

Test It Out

  1. Install the CRDs into the cluster:

     make install

  2. Run your controller (this will run in the foreground, so switch to a new terminal if you want to leave it running):

     make run

NOTE: You can also run this in one step by running: make install run

Modifying the API definitions

If you are editing the API definitions, generate the manifests such as CRs or CRDs using:

make manifests

NOTE: Run make --help for more information on all potential make targets

More information can be found via the Kubebuilder Documentation

Production deployment

In production, it is recommended to deploy the operator using the Operator Lifecycle Manager (OLM).

  1. Deploy OLM

    operator-sdk olm install
    
  2. Deploy Node Maintenance Operator

    operator-sdk run bundle quay.io/medik8s/node-maintenance-operator-bundle:latest
    
  3. Deploy node operator

    apiVersion: operators.coreos.com/v1alpha1
    kind: CatalogSource
    metadata:
      name: constellation-node-operator-catalog
      namespace: olm
    spec:
      sourceType: grpc
      # TODO: user: set desired operator catalog version here
      image: ghcr.io/edgelesssys/constellation/node-operator-catalog:v0.0.1
      displayName: Constellation Node Operator
      publisher: Edgeless Systems
      updateStrategy:
        registryPoll:
          interval: 10m
    ---
    apiVersion: operators.coreos.com/v1
    kind: OperatorGroup
    metadata:
      name: constellation-og
      namespace: kube-system
    spec:
      upgradeStrategy: Default
    ---
    apiVersion: operators.coreos.com/v1alpha1
    kind: Subscription
    metadata:
      name: constellation-node-operator-sub
      namespace: kube-system
    spec:
      channel: alpha
      name: constellation-node-operator
      source: constellation-node-operator-catalog
      sourceNamespace: olm
      installPlanApproval: Automatic
      # TODO: user: set desired operator version here
      startingCSV: node-operator.v0.0.1
      config:
        env:
        # TODO: user: set correct CSP here ("azure" or "gcp")
        - name: CONSTEL_CSP
          value: "gcp"