* cli: add "--skip-helm-wait" flag
This flag can be used to disable the atomic and wait flags during helm install.
This is useful when debugging a failing constellation init, since the user gains access to
the cluster even when one of the deployments is never in a ready state.
* helm: configure GCP cloud controller manager to search in all zones of a region
See also: d716fdd452/providers/gce/gce.go (L376-L380)
* operators: add nodeGroupName to ScalingGroup CRD
NodeGroupName is the human friendly name of the node group that will be exposed to customers via the Constellation config in the future.
* operators: support simple executor / scheduler to reconcile on non-k8s resources
* operators: add new return type for ListScalingGroups to support arbitrary node groups
* operators: ListScalingGroups should return additionally created node groups on AWS
* operators: ListScalingGroups should return additionally created node groups on Azure
* operators: ListScalingGroups should return additionally created node groups on GCP
* operators: ListScalingGroups should return additionally created node groups on unsupported CSPs
* operators: implement external scaling group reconciler
This controller scans the cloud provider infrastructure and changes k8s resources accordingly.
It creates ScaleSet resources when new node groups are created and deletes them if the node groups are removed.
* operators: no longer create scale sets when the operator starts
In the future, scale sets are created dynamically.
* operators: watch for node join/leave events using a controller
* operators: deploy new controllers
* docs: update auto scaling documentation with support for node groups
* init
* make zone flag mandatory again
* add info about zone update + refactor
* add comment in docs about zone update
* Update cli/internal/cmd/iamcreate_test.go
Co-authored-by: Thomas Tendyck <51411342+thomasten@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Thomas Tendyck <51411342+thomasten@users.noreply.github.com>
* thomas feedback
* add format check to config validation
* remove TODO
* Update docs/docs/workflows/config.md
Co-authored-by: Thomas Tendyck <51411342+thomasten@users.noreply.github.com>
* thomas nit
---------
Co-authored-by: Thomas Tendyck <51411342+thomasten@users.noreply.github.com>
* Use CLI to fetch measurements in e2e test
* Abort helm service upgrade early if user confirmation is missing
* Add container push to CLI build action
---------
Signed-off-by: Daniel Weiße <dw@edgeless.systems>
This change is required to ensure we have not tls handshake errors when connecting to the kubernetes api.
Currently, the certificates used by kube-apiserver pods contain a SAN field with the (single) public ip of the loadbalancer.
If we would allow multiple loadbalancer frontend ips, we could encounter cases where the certificate is only valid for one public ip,
while we try to connect to a different ip.
To prevent this, we consciously disable support for the multi-zone loadbalancer frontend on AWS for now.
This will be re-enabled in the future.
* Fix usage of errors.As in upgrade command implementation
* Use struct pointers when working with custom errors
---------
Signed-off-by: Daniel Weiße <dw@edgeless.systems>
* Delete helm chart on failure before retrying installation
* Add chart name to debug output
* Remove now unused wait flag from helm Release struct
---------
Signed-off-by: Daniel Weiße <dw@edgeless.systems>
Normalize naming for the "instance_count" / "initial_count" int terraform to always use "initial_count".
This is required, since there is a naming confusion on AWS.
"initial_count" is more precise, since it reflects the fact that this value is ignored when applying the terraform template
after the scaling groups already exist.
This commit is designed to be reverted in the future (AB#3248).
Terraform does not implement moved blocks with dynamic targets: https://github.com/hashicorp/terraform/issues/31335 so we have to migrate the terraform state ourselves.
* init
add variables
add amount to instance_group again
fix tf validate
rollback old names
make fields optional
fix image ref mini
daniel comments
use latest
* Update cli/internal/terraform/terraform/qemu/main.tf
Co-authored-by: Malte Poll <1780588+malt3@users.noreply.github.com>
* add uid to resource name
* make machine a global variable again
* fix tf
---------
Co-authored-by: Malte Poll <1780588+malt3@users.noreply.github.com>
* fail on version mismatch
* rename to validateCLIandConstellationVersionAreEqual
* fix test
* image version must only be major,minor patch equal (ignore suffix)
* add version support doc
* fix: do not check patch version equality for image and cli
* skip validate on force
* init
* migration working
* make tf variables with default value optional in go through ptr type
* fix CI build
* pr feedback
* add azure targets tf
* skip migration for empty targets
* make instance_count optional
* change role naming to dashed + add validation
* make node_group.zones optional
* Update cli/internal/terraform/terraform/azure/main.tf
Co-authored-by: Malte Poll <1780588+malt3@users.noreply.github.com>
* malte feedback
---------
Co-authored-by: Malte Poll <1780588+malt3@users.noreply.github.com>
* use force to skip compatibility and upgrade in progress check
* update doc
* fix tests
* add force check for helm and k8s
* add no-op check
* fix errors as
* config: move AMD root key to global constant
* attestation: add SNP based attestation for aws
* Always enable SNP, regardless of attestation type.
* Make AWSNitroTPM default again
There exists a bug in AWS SNP implementation where sometimes
a host might not be able to produce valid SNP reports.
Since we have to wait for AWS to fix this we are merging SNP
attestation as opt-in feature.
* terraform: GCP node groups
* cli: marshal GCP node groups to terraform variables
This does not have any side effects for users.
We still strictly create one control-plane and one worker group.
This is a preparation for enabling customizable node groups in the future.
* deps: update Terraform google to v4.69.1
* deps: tidy all modules
* add delay for service account
* deps: tidy all modules
* add delay for service account
---------
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: edgelessci <edgelessci@users.noreply.github.com>
Co-authored-by: Adrian Stobbe <stobbe.adrian@gmail.com>
* invalidate app client id field for azure and provide info
* remove TestNewWithDefaultOptions case
* fix test
* remove appClientID field
* remove client secret + rename err
* remove from docs
* otto feedback
* update docs
* delete env test in cfg since no envs set anymore
* Update dev-docs/workflows/github-actions.md
Co-authored-by: Otto Bittner <cobittner@posteo.net>
* WARNING to stderr
* fix check
---------
Co-authored-by: Otto Bittner <cobittner@posteo.net>
* aws: stop using the imds api for tags
* aws: disable tags in imds api
* aws: only tag instances with non-lecagy tag
* bootstrapper: always let coredns run before cilium
* debugd: make debugd less noisy
* fixup fix aws imds test
* fixup unsued context
* move getting instance id to readInstanceTag
* variant: move into internal/attestation
* attesation: move aws attesation into subfolder nitrotpm
* config: add aws-sev-snp variant
* cli: add tf option to enable AWS SNP
For now the implementations in aws/nitrotpm and aws/snp
are identical. They both contain the aws/nitrotpm impl.
A separate commit will add the actual attestation logic.
* fetch latest version when older than 2 weeks
* extend hack upload tool to pass an upload date
* Revert "config: disable user-facing version Azure SEV SNP fetch for v2.8 (#1882)"
This reverts commit c7b22d314a.
* fix tests
* use NewAzureSEVSNPVersionList for type guarantees
* Revert "use NewAzureSEVSNPVersionList for type guarantees"
This reverts commit 942566453f4b4a2b6dc16f8689248abf1dc47db4.
* assure list is sorted
* improve root.go style
* daniel feedback
* rename to attestationconfigapi + put client and fetcher inside pkg
* rename api/version to versionsapi and put fetcher + client inside pkg
* rename AttestationConfigAPIFetcher to Fetcher
The implementation would recreate the gcp instance template (including all instances and state disks) whenever the image tfvar changes.
Fixed by ignoring lifecycle changes on the instance templates.
Fixes 8c3b963
* client supports delete version
* rename to new attestation / fetcher naming
* add delete command to upload tool
* test client delete
* bazel update
* use general client in attestation client
* Update hack/configapi/cmd/delete.go
Co-authored-by: Daniel Weiße <66256922+daniel-weisse@users.noreply.github.com>
* daniel feedback
* unit test azure sev upload
* Update hack/configapi/cmd/delete.go
Co-authored-by: Daniel Weiße <66256922+daniel-weisse@users.noreply.github.com>
* add client integration test
* new client cmds use apiObject
---------
Co-authored-by: Daniel Weiße <66256922+daniel-weisse@users.noreply.github.com>
* sign version with release key and remove version from fetcher interface
* extend azure-reporter GH action to upload updated version values to the Attestation API
* api: rename AttestationVersionRepo to Client
* api: move client into separate subpkg for
clearer import paths.
* api: rename configapi -> attestationconfig
* api: rename versionsapi -> versions
* api: rename sut to client
* api: split versionsapi client and make it public
* api: split versionapi fetcher and make it public
* config: move attestationversion type to config
* api: fix attestationconfig client test
Co-authored-by: Adrian Stobbe <stobbe.adrian@gmail.com>
* add SignContent() + integrate into configAPI
* use static client for upload versions tool; fix staticupload calleeReference bug
* use version to get proper cosign pub key.
* mock fetcher in CLI tests
* only provide config.New constructor with fetcher
Co-authored-by: Otto Bittner <cobittner@posteo.net>
Co-authored-by: Daniel Weiße <66256922+daniel-weisse@users.noreply.github.com>
During upgrade all custom resources are backed up to files on the
local file system. Since old versions are also backed up, we need to
reflect the version in the name.
* Add attestation options to config
* Add join-config migration path for clusters with old measurement format
* Always create MAA provider for Azure SNP clusters
* Remove confidential VM option from provider in favor of attestation options
* cli: add config migrate command to handle config migration (#1678)
---------
Signed-off-by: Daniel Weiße <dw@edgeless.systems>
Resource names are only unique per kind+ns. Without this patch it
might happen that there are two resources with the same name
in different namespaces. Upgrade might fail in that case.
* The check would previously fail if e.g. `apply` did not upgrade the
image, but a new image was specified in the config. This could
happen if the specified image was too new, but a valid Kuberentes
upgrade was specified.
* ci: fix variable expansion in e2e-upgrade call
* e2e: do not verify measurement signature
* cli: upgrade check show cli upgrades
* only check compatibility for valid upgrades
* use semver.Sort
* extend unit tests
* add unit test for new compatible cli versions
* adapt to feedback
* fix rebase
* rework output
* minor -> major
Co-authored-by: Otto Bittner <cobittner@posteo.net>
* minor -> major
Co-authored-by: Otto Bittner <cobittner@posteo.net>
* dynamic major version
Co-authored-by: Otto Bittner <cobittner@posteo.net>
* remove currentK8sVer argument
* bazel gen & tidy
* bazel update
---------
Co-authored-by: Otto Bittner <cobittner@posteo.net>
Previously the content of files status and upgrade within the
cloudcmd pkg did not fit cloudcmd's pkg description.
This patch introduces a separate pkg to fix that.
The new command allows checking the status of an upgrade
and which versions are installed.
Also remove the unused restclient.
And make GetConstellationVersion a function.
* Remove deprecated fields
* Remove warning for not setting attestationVariant
* Dont write attestationVariant to config
---------
Signed-off-by: Daniel Weiße <dw@edgeless.systems>
* Remove using measurements from the initial control-plane node for the cluster's initial measurements
* Add using measurements from the user's config for the cluster's initial measurements to align behavior with upgrade command
---------
Signed-off-by: Daniel Weiße <dw@edgeless.systems>
The test is implemented as a go test.
It can be executed as a bazel target.
The general workflow is to setup a cluster,
point the test to the workspace in which to
find the kubeconfig and the constellation config
and specify a target image, k8s and
service version. The test will succeed
if it detects all target versions in the cluster
within the configured timeout.
The CI automates the above steps.
A separate workflow is introduced as there
are multiple input fields to the test.
Adding all of these to the manual e2e test
seemed confusing.
Co-authored-by: Fabian Kammel <fk@edgeless.systems>
This fixes a bug where `upgrade apply` fails if only the image is
upgraded, due to mishandling of an empty configmap.
Making stubStableClient more complex is needed since it is called
with multiple configMaps now.
* Convert enforceIDKeyDigest setting to enum
* Use MAA fallback in Azure SNP attestation
* Only create MAA provider if MAA fallback is enabled
---------
Signed-off-by: Daniel Weiße <dw@edgeless.systems>
Co-authored-by: Thomas Tendyck <tt@edgeless.systems>
Temporarily add the attestationVariant key to the service
values during upgrade. Normally this should not be
modified during upgrade. However, since the field is introduced
in v2.7, we need to add the field manually.
This is the first step for deprecating app registrations on Azure.
The user-assigned managed identity (uami) should first gain all permissions that are currently held by the app registration.
* cli: give Azure uami all permissions previously given to app registratio
* docs: document required owner role for user-assigned managed identity on Azure