constellation/.github/actions/constellation_create/action.yml
Markus Rudy 85b44f7f57
ci: make waiting for nodes more robust (#2981)
* ci: make waiting for nodes more robust

After initializing the cluster, a lot of things happen in parallel and
are potentially getting in each others' way: nodes are joining,
daemonsets are proliferating, the network is being set up. During this
period, it's not unusual that the Kubernetes API server is unavailable
for a short time, e.g. due to etcd loosing quorum or load balancing
changes.

This period of instability has the potential to affect all kubectl
commands negatively, leading to problems especially for tests, where
command failures often lead to test failures. On the other hand, we'd
expect everything to be quite stable after the initial dust settles.

Therefore, this commit changes how we wait after initializing a cluster.
Until we have a reasonable expectation of readiness, we ignore command
failures and wait for things to stabilize. The cluster is considered
stable once all configured nodes and all API servers report ready.
2024-03-13 09:42:18 +01:00

277 lines
9.7 KiB
YAML

name: Constellation create
description: Create a new Constellation cluster using the latest OS image.
inputs:
workerNodesCount:
description: "Number of worker nodes to spawn."
required: true
controlNodesCount:
description: "Number of control-plane nodes to spawn."
required: true
cloudProvider:
description: "Either 'gcp', 'aws' or 'azure'."
required: true
attestationVariant:
description: "Attestation variant to use."
required: true
machineType:
description: "Machine type of VM to spawn."
required: false
cliVersion:
description: "Version of the CLI"
required: true
osImage:
description: "OS image to use."
required: true
isDebugImage:
description: "Is OS img a debug img?"
required: true
kubernetesVersion:
description: "Kubernetes version to create the cluster from."
required: false
artifactNameSuffix:
description: "Suffix for artifact naming."
required: true
fetchMeasurements:
default: "false"
description: "Update measurements via the 'constellation config fetch-measurements' command."
azureSNPEnforcementPolicy:
required: false
description: "Azure SNP enforcement policy."
test:
description: "The e2e test payload."
required: true
azureClusterCreateCredentials:
description: "Azure credentials authorized to create a Constellation cluster."
required: true
azureIAMCreateCredentials:
description: "Azure credentials authorized to create an IAM configuration."
required: true
refStream:
description: "Reference and stream of the image in use"
required: false
internalLoadBalancer:
description: "Whether to use an internal load balancer for the control plane"
required: false
clusterCreation:
description: "How to create infrastructure for the e2e test. One of [cli, terraform]."
default: "cli"
marketplaceImageVersion:
description: "Marketplace OS image version. Used instead of osImage."
required: false
force:
description: "Set the force-flag on apply to ignore version mismatches."
required: false
encryptionSecret:
description: "The secret to use for encrypting the artifact."
required: true
outputs:
kubeconfig:
description: "The kubeconfig for the cluster."
value: ${{ steps.get-kubeconfig.outputs.KUBECONFIG }}
osImageUsed:
description: "The OS image used in the cluster."
value: ${{ steps.setImage.outputs.image }}
runs:
using: "composite"
steps:
- name: Set constellation name
shell: bash
run: |
yq eval -i "(.name) = \"e2e-test\"" constellation-conf.yaml
- name: Set Azure SNP enforcement policy
if: inputs.azureSNPEnforcementPolicy != ''
shell: bash
run: |
if [[ ${{ inputs.attestationVariant }} != 'azure-sev-snp' ]]; then
echo "SNP enforcement policy is only supported for Azure"
exit 1
fi
yq eval -i "(.attestation.azureSEVSNP.firmwareSignerConfig.enforcementPolicy) \
= \"${{ inputs.azureSNPEnforcementPolicy }}\"" constellation-conf.yaml
- name: Set image
id: setImage
shell: bash
env:
imageInput: ${{ inputs.osImage }}
run: |
if [[ -z "${imageInput}" ]]; then
echo "No image specified. Using default image from config."
image=$(yq eval ".image" constellation-conf.yaml)
echo "image=${image}" | tee -a "$GITHUB_OUTPUT"
exit 0
fi
yq eval -i "(.image) = \"${imageInput}\"" constellation-conf.yaml
echo "image=${imageInput}" | tee -a "$GITHUB_OUTPUT"
- name: Set marketplace image flag (AWS)
if: inputs.marketplaceImageVersion != '' && inputs.cloudProvider == 'aws'
shell: bash
run: |
yq eval -i "(.provider.aws.useMarketplaceImage) = true" constellation-conf.yaml
yq eval -i "(.image) = \"${{ inputs.marketplaceImageVersion }}\"" constellation-conf.yaml
- name: Set marketplace image flag (Azure)
if: inputs.marketplaceImageVersion != '' && inputs.cloudProvider == 'azure'
shell: bash
run: |
yq eval -i "(.provider.azure.useMarketplaceImage) = true" constellation-conf.yaml
yq eval -i "(.image) = \"${{ inputs.marketplaceImageVersion }}\"" constellation-conf.yaml
- name: Set marketplace image flag (GCP)
if: inputs.marketplaceImageVersion != '' && inputs.cloudProvider == 'gcp'
shell: bash
run: |
yq eval -i "(.provider.gcp.useMarketplaceImage) = true" constellation-conf.yaml
yq eval -i "(.image) = \"${{ inputs.marketplaceImageVersion }}\"" constellation-conf.yaml
- name: Update measurements for non-stable images
if: inputs.fetchMeasurements
shell: bash
run: |
constellation config fetch-measurements --debug --insecure
- name: Set instanceType
if: inputs.machineType && inputs.machineType != 'default'
shell: bash
run: |
yq eval -i "(.nodeGroups[] | .instanceType) = \"${{ inputs.machineType }}\"" constellation-conf.yaml
- name: Set node count
shell: bash
run: |
yq eval -i "(.nodeGroups[] | select(.role == \"control-plane\") | .initialCount) = ${{ inputs.controlNodesCount }}" constellation-conf.yaml
yq eval -i "(.nodeGroups[] | select(.role == \"worker\") | .initialCount) = ${{ inputs.workerNodesCount }}" constellation-conf.yaml
- name: Enable debugCluster flag
if: inputs.isDebugImage == 'true'
shell: bash
run: |
yq eval -i '(.debugCluster) = true' constellation-conf.yaml
- name: Enable internalLoadBalancer flag
if: inputs.internalLoadBalancer == 'true'
shell: bash
run: |
yq eval -i '(.internalLoadBalancer) = true' constellation-conf.yaml
- name: Show Cluster Configuration
shell: bash
run: |
echo "Creating cluster using config:"
cat constellation-conf.yaml
sudo sh -c 'echo "127.0.0.1 license.confidential.cloud" >> /etc/hosts' || true
- name: Constellation create (CLI)
shell: bash
run: |
constellation apply --skip-phases=init,attestationconfig,certsans,helm,image,k8s -y --debug --tf-log=DEBUG
- name: Cdbg deploy
if: inputs.isDebugImage == 'true'
uses: ./.github/actions/cdbg_deploy
with:
cloudProvider: ${{ inputs.cloudProvider }}
attestationVariant: ${{ inputs.attestationVariant }}
test: ${{ inputs.test }}
azureClusterCreateCredentials: ${{ inputs.azureClusterCreateCredentials }}
azureIAMCreateCredentials: ${{ inputs.azureIAMCreateCredentials }}
refStream: ${{ inputs.refStream }}
kubernetesVersion: ${{ inputs.kubernetesVersion }}
clusterCreation: ${{ inputs.clusterCreation }}
- name: Set force flag
id: set-force-flag
if: inputs.force == 'true'
shell: bash
run: |
echo "flag=--force" | tee -a $GITHUB_OUTPUT
- name: Constellation apply (Terraform)
id: constellation-apply-terraform
if: inputs.clusterCreation == 'terraform'
uses: ./.github/actions/terraform_apply
with:
cloudProvider: ${{ inputs.cloudProvider }}
- name: Constellation apply
id: constellation-apply-cli
if: inputs.clusterCreation != 'terraform'
shell: bash
run: |
constellation apply --skip-phases=infrastructure --debug ${{ steps.set-force-flag.outputs.flag }}
- name: Get kubeconfig
id: get-kubeconfig
shell: bash
run: |
echo "KUBECONFIG=$(pwd)/constellation-admin.conf" | tee -a $GITHUB_OUTPUT
- name: Wait for nodes to join and become ready
shell: bash
env:
KUBECONFIG: "${{ steps.get-kubeconfig.outputs.KUBECONFIG }}"
JOINTIMEOUT: "1200" # 20 minutes timeout for all nodes to join
CONTROL_NODES_COUNT: "${{ inputs.controlNodesCount }}"
WORKER_NODES_COUNT: "${{ inputs.workerNodesCount }}"
run: ./.github/actions/constellation_create/wait-for-nodes.sh
- name: Download boot logs
if: always()
continue-on-error: true
shell: bash
env:
CSP: ${{ inputs.cloudProvider }}
run: |
echo "::group::Download boot logs"
CONSTELL_UID=$(yq '.infrastructure.uid' constellation-state.yaml)
case $CSP in
azure)
AZURE_RESOURCE_GROUP=$(yq eval ".provider.azure.resourceGroup" constellation-conf.yaml)
./.github/actions/constellation_create/az-logs.sh ${AZURE_RESOURCE_GROUP}
;;
gcp)
GCP_ZONE=$(yq eval ".provider.gcp.zone" constellation-conf.yaml)
./.github/actions/constellation_create/gcp-logs.sh ${GCP_ZONE} ${CONSTELL_UID}
;;
aws)
./.github/actions/constellation_create/aws-logs.sh us-east-2 ${CONSTELL_UID}
;;
esac
echo "::endgroup::"
- name: Upload boot logs
if: always() && !env.ACT
continue-on-error: true
uses: ./.github/actions/artifact_upload
with:
name: serial-logs-${{ inputs.artifactNameSuffix }}
path: >
!(terraform).log
encryptionSecret: ${{ inputs.encryptionSecret }}
- name: Prepare terraform state folders
if: always()
shell: bash
run: |
mkdir to-zip
cp -r constellation-terraform to-zip
cp -r constellation-iam-terraform to-zip
rm to-zip/constellation-terraform/plan.zip
rm -rf to-zip/constellation-terraform/.terraform to-zip/constellation-iam-terraform/.terraform
- name: Upload terraform state
if: always()
uses: ./.github/actions/artifact_upload
with:
name: terraform-state-${{ inputs.artifactNameSuffix }}
path: >
to-zip/constellation-terraform
to-zip/constellation-iam-terraform
encryptionSecret: ${{ inputs.encryptionSecret }}