Commit Graph

585 Commits

Author SHA1 Message Date
Malte Poll
4283601433
operators: infrastructure autodiscovery (#1958)
* helm: configure GCP cloud controller manager to search in all zones of a region

See also: d716fdd452/providers/gce/gce.go (L376-L380)

* operators: add nodeGroupName to ScalingGroup CRD

NodeGroupName is the human friendly name of the node group that will be exposed to customers via the Constellation config in the future.

* operators: support simple executor / scheduler to reconcile on non-k8s resources

* operators: add new return type for ListScalingGroups to support arbitrary node groups

* operators: ListScalingGroups should return additionally created node groups on AWS

* operators: ListScalingGroups should return additionally created node groups on Azure

* operators: ListScalingGroups should return additionally created node groups on GCP

* operators: ListScalingGroups should return additionally created node groups on unsupported CSPs

* operators: implement external scaling group reconciler

This controller scans the cloud provider infrastructure and changes k8s resources accordingly.
It creates ScaleSet resources when new node groups are created and deletes them if the node groups are removed.

* operators: no longer create scale sets when the operator starts

In the future, scale sets are created dynamically.

* operators: watch for node join/leave events using a controller

* operators: deploy new controllers

* docs: update auto scaling documentation with support for node groups
2023-07-05 07:27:34 +02:00
Adrian Stobbe
e72ec60d13
config: iam create aws check zone contains availability zone (#1913)
* init

* make zone flag mandatory again

* add info about zone update + refactor

* add comment in docs about zone update

* Update cli/internal/cmd/iamcreate_test.go

Co-authored-by: Thomas Tendyck <51411342+thomasten@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Thomas Tendyck <51411342+thomasten@users.noreply.github.com>

* thomas feedback

* add format check to config validation

* remove TODO

* Update docs/docs/workflows/config.md

Co-authored-by: Thomas Tendyck <51411342+thomasten@users.noreply.github.com>

* thomas nit

---------

Co-authored-by: Thomas Tendyck <51411342+thomasten@users.noreply.github.com>
2023-07-04 13:55:52 +02:00
Adrian Stobbe
c39df2f7da
terraform: openstack node groups (#1966)
* openstack

* rename to base_name

* fix openstack boot vtpm

* add docs for accessing bootstrapper logs

* rename to initial count
2023-07-03 16:33:00 +02:00
Malte Poll
d43242a55f
deps: upgrade AWS CSI driver to v1.1.1 (#1998) 2023-07-03 16:26:42 +02:00
Daniel Weiße
90dbeae16b
cli: fix duplicate backup creation during upgrade apply (#1997)
* Use CLI to fetch measurements in e2e test

* Abort helm service upgrade early if user confirmation is missing

* Add container push to CLI build action

---------

Signed-off-by: Daniel Weiße <dw@edgeless.systems>
2023-07-03 15:13:36 +02:00
Malte Poll
66f1333c31
terraform: use single zone loadbalancer frontend on AWS (#1983)
This change is required to ensure we have not tls handshake errors when connecting to the kubernetes api.
Currently, the certificates used by kube-apiserver pods contain a SAN field with the (single) public ip of the loadbalancer.
If we would allow multiple loadbalancer frontend ips, we could encounter cases where the certificate is only valid for one public ip,
while we try to connect to a different ip.
To prevent this, we consciously disable support for the multi-zone loadbalancer frontend on AWS for now.
This will be re-enabled in the future.
2023-06-30 16:56:31 +02:00
Daniel Weiße
d95ddd01d3
helm: fix upgrade command unintentionally skipping all service upgrades (#1992)
* Fix usage of errors.As in upgrade command implementation

* Use struct pointers when working with custom errors

---------

Signed-off-by: Daniel Weiße <dw@edgeless.systems>
2023-06-30 16:46:05 +02:00
Daniel Weiße
5a9f9c0a52
bootstraper: delete helm chart on installation failure before retrying installation (#1977)
* Delete helm chart on failure before retrying installation

* Add chart name to debug output

* Remove now unused wait flag from helm Release struct

---------

Signed-off-by: Daniel Weiße <dw@edgeless.systems>
2023-06-30 15:13:29 +02:00
Malte Poll
5f8ea1348a
terraform: instance_count => initial_count (#1989)
Normalize naming for the "instance_count" / "initial_count" int terraform to always use "initial_count".
This is required, since there is a naming confusion on AWS.
"initial_count" is more precise, since it reflects the fact that this value is ignored when applying the terraform template
after the scaling groups already exist.
2023-06-30 10:53:00 +02:00
Moritz Sanft
7ad284d672
cli: deploy aws csi driver per default (#1981)
* add aws csi driver helm chart

* update chart

* add CSI driver to Constellation default deployment

* generate config doc

* update buildfiles

* use upstream chart

* update buildfile

* set `DeployCSIDriver` in default config

* fix helm test

* whitespace
2023-06-30 08:46:32 +02:00
Malte Poll
f64e44a438 aws: support LBs in multiple zones when retrieving metadata 2023-06-28 18:13:01 +02:00
Malte Poll
3edc1c3ebb cli: manual AWS terraform state transitions
This commit is designed to be reverted in the future (AB#3248).
Terraform does not implement moved blocks with dynamic targets: https://github.com/hashicorp/terraform/issues/31335 so we have to migrate the terraform state ourselves.
2023-06-28 18:13:01 +02:00
Malte Poll
22ebdace43 terraform: aws node groups 2023-06-28 18:13:01 +02:00
miampf
77b28cb5e7
cli: change generate-config flag to update-config flag (#1897) 2023-06-28 12:47:44 +00:00
Adrian Stobbe
9bb91ca447
terraform: QEMU node groups (#1961)
* init

add variables

add amount to instance_group again

fix tf validate

rollback old names

make fields optional

fix image ref mini

daniel comments

use latest

* Update cli/internal/terraform/terraform/qemu/main.tf

Co-authored-by: Malte Poll <1780588+malt3@users.noreply.github.com>

* add uid to resource name

* make machine a global variable again

* fix tf

---------

Co-authored-by: Malte Poll <1780588+malt3@users.noreply.github.com>
2023-06-28 14:42:34 +02:00
Adrian Stobbe
161bb37cba
config: improve usage and meaning of validate (#1975)
* discuss miniup config.Default() usage + discourage usage for Default() in comment

* Update internal/config/config_test.go

Co-authored-by: Daniel Weiße <66256922+daniel-weisse@users.noreply.github.com>

* add enterprise version check for config.Default

* split config comment lines

* daniel feedback

* featureset.CanUseEmbeddedMeasurmentsAndImage

---------

Co-authored-by: Daniel Weiße <66256922+daniel-weisse@users.noreply.github.com>
2023-06-28 10:28:48 +02:00
Adrian Stobbe
1edbe962c1
cli: fail fast when CLI and Constellation versions don't match (#1972)
* fail on version mismatch

* rename to validateCLIandConstellationVersionAreEqual

* fix test

* image version must only be major,minor patch equal (ignore suffix)

* add version support doc

* fix: do not check patch version equality for image and cli

* skip validate on force
2023-06-27 18:24:35 +02:00
Moritz Sanft
fe0b8c1e5b
remove Terraform targets (#1970) 2023-06-27 11:27:50 +02:00
Otto Bittner
0a36ce6171
config: validate instance type for aws SNP based on attestation variant (#1963)
* config: validate instance type for aws SNP

* apply suggestions
2023-06-26 17:05:12 +02:00
Thomas Tendyck
46e144d19b Use term "attestation variant" consistently 2023-06-26 08:54:11 +02:00
Daniel Weiße
e139eff552
fix: small formating/spelling issues (#1965)
Signed-off-by: Daniel Weiße <dw@edgeless.systems>
2023-06-26 08:34:37 +02:00
Malte Poll
92cd9c1dac
terraform: always use uniform role names (#1960) 2023-06-23 12:08:30 +02:00
Otto Bittner
7388240943
Revert "attestation: add SNP-based attestation for aws-sev-snp (#1916)" (#1957)
This reverts commit c7d12055d1.
2023-06-22 17:08:44 +02:00
Adrian Stobbe
487fa1e397
terraform: azure node groups (#1955)
* init

* migration working

* make tf variables with default value optional in go through ptr type

* fix CI build

* pr feedback

* add azure targets tf

* skip migration for empty targets

* make instance_count optional

* change role naming to dashed + add validation

* make node_group.zones optional

* Update cli/internal/terraform/terraform/azure/main.tf

Co-authored-by: Malte Poll <1780588+malt3@users.noreply.github.com>

* malte feedback

---------

Co-authored-by: Malte Poll <1780588+malt3@users.noreply.github.com>
2023-06-22 16:53:40 +02:00
Moritz Sanft
224c74f883
csi: aws csi driver policies (#1945)
* add required disk permissions

* update worker node policy for ebs

* Revert "update worker node policy for ebs"

This reverts commit 9c24d374e0b30bc8970e00978462fb36ee6acd4f.

* attach aws managed role instead

* add TODO comment

* remove duplicate role attachment

* Update cli/internal/terraform/terraform/iam/aws/main.tf

Co-authored-by: Daniel Weiße <66256922+daniel-weisse@users.noreply.github.com>

---------

Co-authored-by: Daniel Weiße <66256922+daniel-weisse@users.noreply.github.com>
2023-06-22 14:15:05 +02:00
Adrian Stobbe
4546912f11
cli: upgrade apply --force skips all compatibility checks (#1940)
* use force to skip compatibility and upgrade in progress check

* update doc

* fix tests

* add force check for helm and k8s

* add no-op check

* fix errors as
2023-06-21 15:49:42 +02:00
Otto Bittner
c7d12055d1
attestation: add SNP-based attestation for aws-sev-snp (#1916)
* config: move AMD root key to global constant
* attestation: add SNP based attestation for aws
* Always enable SNP, regardless of attestation type.
* Make AWSNitroTPM default again

There exists a bug in AWS SNP implementation where sometimes
a host might not be able to produce valid SNP reports.
Since we have to wait for AWS to fix this we are merging SNP
attestation as opt-in feature.
2023-06-21 14:19:55 +02:00
Moritz Sanft
b25228d175
cli: store upgrade files in versioned folders (#1929)
* upgrade versioning

* dont pass upgrade kind as boolean

* whitespace

* fix godot lint check

* clarify upgrade check directory suffix

* cli: dry-run Terraform migrations on `upgrade check` (#1942)

* dry-run Terraform migrations on upgrade check

* clean whole upgrade dir

* clean up check workspace after planning

* fix parsing

* extend upgrade check test

* rename unused parameters

* exclude false positives in test
2023-06-21 09:22:32 +02:00
Adrian Stobbe
be4a636361
cli: improve user warning / information (#1933)
* print success

* warn when debug img but !debugCluster

* malte feedback

* rename to IsNamedLikeDebugImage
2023-06-19 16:51:39 +02:00
Malte Poll
2808012c9c
terraform: gcp node groups (#1941)
* terraform: GCP node groups

* cli: marshal GCP node groups to terraform variables

This does not have any side effects for users.
We still strictly create one control-plane and one worker group.
This is a preparation for enabling customizable node groups in the future.
2023-06-19 13:02:01 +02:00
renovate[bot]
ab52e6d4c5
fix: GCP service account creation fails sometimes (#1935)
* deps: update Terraform google to v4.69.1

* deps: tidy all modules

* add delay for service account

* deps: tidy all modules

* add delay for service account

---------

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: edgelessci <edgelessci@users.noreply.github.com>
Co-authored-by: Adrian Stobbe <stobbe.adrian@gmail.com>
2023-06-16 09:37:31 +02:00
Adrian Stobbe
07de6482b2
config: drop support for deprecated Azure's service principal authentication (#1906)
* invalidate app client id field for azure and provide info

* remove TestNewWithDefaultOptions case

* fix test

* remove appClientID field

* remove client secret + rename err

* remove from docs

* otto feedback

* update docs

* delete env test in cfg since no envs set anymore

* Update dev-docs/workflows/github-actions.md

Co-authored-by: Otto Bittner <cobittner@posteo.net>

* WARNING to stderr

* fix check

---------

Co-authored-by: Otto Bittner <cobittner@posteo.net>
2023-06-14 17:50:57 +02:00
3u13r
a2c98eb1d5
Correctly deploy the AWS CCM (#1853)
* aws: stop using the imds api for tags

* aws: disable tags in imds api

* aws: only tag instances with non-lecagy tag

* bootstrapper: always let coredns run before cilium

* debugd: make debugd less noisy

* fixup fix aws imds test

* fixup unsued context

* move getting instance id to readInstanceTag
2023-06-13 09:58:39 +02:00
Adrian Stobbe
e738f15f0f
cdbg: make endpoint deployment failure more transparent (#1883)
* add retry + timeout + intercept grpc logs

* LogStateChanges inside grplog pkg

* remove retry and tj/assert

* rename nit

* Update debugd/internal/cdbg/cmd/deploy.go

Co-authored-by: Paul Meyer <49727155+katexochen@users.noreply.github.com>

* Update debugd/internal/cdbg/cmd/deploy.go

Co-authored-by: Paul Meyer <49727155+katexochen@users.noreply.github.com>

* paul feedback

* return waitFn instead of WaitGroup

* Revert "return waitFn instead of WaitGroup"

This reverts commit 45700f30e341ce3af509b687febbc0125f7ddb38.

* log routine inside debugd constructor

* test doubles names

* Update debugd/internal/cdbg/cmd/deploy.go

Co-authored-by: Paul Meyer <49727155+katexochen@users.noreply.github.com>

* fix newDebugClient closeFn

---------

Co-authored-by: Paul Meyer <49727155+katexochen@users.noreply.github.com>
2023-06-12 13:45:34 +02:00
Otto Bittner
8f21972aec
attestation: add awsSEVSNP as new variant (#1900)
* variant: move into internal/attestation
* attesation: move aws attesation into subfolder nitrotpm
* config: add aws-sev-snp variant
* cli: add tf option to enable AWS SNP

For now the implementations in aws/nitrotpm and aws/snp
are identical. They both contain the aws/nitrotpm impl.
A separate commit will add the actual attestation logic.
2023-06-09 15:41:02 +02:00
Thomas Tendyck
947d0cb20a cli: hide --insecure of config fetch-measurements 2023-06-09 15:07:31 +02:00
Adrian Stobbe
3fde118b33
config: enable azure snp version fetcher again + minimum age for latest version (#1899)
* fetch latest version when older than 2 weeks

* extend hack upload tool to pass an upload date

* Revert "config: disable user-facing version Azure SEV SNP fetch for v2.8  (#1882)"

This reverts commit c7b22d314a.

* fix tests

* use NewAzureSEVSNPVersionList for type guarantees

* Revert "use NewAzureSEVSNPVersionList for type guarantees"

This reverts commit 942566453f4b4a2b6dc16f8689248abf1dc47db4.

* assure list is sorted

* improve root.go style

* daniel feedback
2023-06-09 12:48:12 +02:00
Adrian Stobbe
d9c604ed2c
terraform: update aws to v5.1.0 (#1891) 2023-06-09 10:37:25 +02:00
Adrian Stobbe
4284f892ce
api: rename /api/versions to versionsapi and /api/attestationcfig to attestationconfigapi (#1876)
* rename to attestationconfigapi + put client and fetcher inside pkg

* rename api/version to versionsapi and put fetcher + client inside pkg

* rename AttestationConfigAPIFetcher to Fetcher
2023-06-07 16:16:32 +02:00
Malte Poll
b3c052e299
operators: cleanup placeholder nodeversion (#1881)
* operators: cleanup placeholder nodeversion
* e2e: improve upgrade test portability
2023-06-06 15:22:06 +02:00
3u13r
7c07e3be18
Add --insecure to config fetch-measurement (#1879)
* cli: add --insecure to fetch-measurements

* cli: rename fake to stub

* ci: upload measurements for debug images

* fix cli docs
2023-06-06 10:32:22 +02:00
Malte Poll
439359ffbc
cli: prevent terraform apply drift when patching and re-applying existing terraform deployment (#1873)
The implementation would recreate the gcp instance template (including all instances and state disks) whenever the image tfvar changes.
Fixed by ignoring lifecycle changes on the instance templates.
Fixes 8c3b963
2023-06-05 14:52:39 +02:00
Adrian Stobbe
c446f36b0f
config: Azure SNP tool can delete specific version from attestation API (#1863)
* client supports delete version

* rename to new attestation / fetcher naming

* add delete command to upload tool

* test client delete

* bazel update

* use general client in attestation client

* Update hack/configapi/cmd/delete.go

Co-authored-by: Daniel Weiße <66256922+daniel-weisse@users.noreply.github.com>

* daniel feedback

* unit test azure sev upload

* Update hack/configapi/cmd/delete.go

Co-authored-by: Daniel Weiße <66256922+daniel-weisse@users.noreply.github.com>

* add client integration test

* new client cmds use apiObject

---------

Co-authored-by: Daniel Weiße <66256922+daniel-weisse@users.noreply.github.com>
2023-06-05 12:33:22 +02:00
Otto Bittner
6bda62d397
cli: skip k8s upgrade in case of outdated version (#1864)
If an unsupported, outdated k8s patch version is used,
the user should still be able to run upgrade apply.
2023-06-05 09:13:02 +02:00
Malte Poll
7c34aef263
cli: write target k8s version to config if new version is found on upgrade check (#1862) 2023-06-02 17:19:41 +02:00
Adrian Stobbe
a813760f96
config: automatically upload new Azure SNP versions to API + sign version with release key (#1854)
* sign version with release key and remove version from fetcher interface
* extend azure-reporter GH action to upload updated version values to the Attestation API
2023-06-02 12:10:22 +02:00
Moritz Sanft
8c3b963a3f
cli: Terraform upgrades maa patching (#1821)
* patch maa after upgrade

* buildfiles

* reword comment

* remove whitespace

* temp: log measurements URL

* temp: update import

* ignore changes to attestation policies

* add issue URL

* separate output in e2e upgrade test

* use enterprise CLI for e2e test

* remove measurements print

* add license headers
2023-06-02 10:47:44 +02:00
Otto Bittner
30f2b332b3
api: restructure api pkg (#1851)
* api: rename AttestationVersionRepo to Client
* api: move client into separate subpkg for
clearer import paths.
* api: rename configapi -> attestationconfig
* api: rename versionsapi -> versions
* api: rename sut to client
* api: split versionsapi client and make it public
* api: split versionapi fetcher and make it public
* config: move attestationversion type to config
* api: fix attestationconfig client test

Co-authored-by: Adrian Stobbe <stobbe.adrian@gmail.com>
2023-06-02 09:19:23 +02:00
Adrian Stobbe
b51cc52945
config: sign Azure versions on upload & verify on fetch (#1836)
* add SignContent() + integrate into configAPI

* use static client for upload versions tool; fix staticupload calleeReference bug

* use version to get proper cosign pub key.

* mock fetcher in CLI tests

* only provide config.New constructor with fetcher

Co-authored-by: Otto Bittner <cobittner@posteo.net>
Co-authored-by: Daniel Weiße <66256922+daniel-weisse@users.noreply.github.com>
2023-06-01 13:55:46 +02:00
3u13r
e0285c122e
todo responsibilities and cleanup (#1837)
* chore: add TODO responsibilities

* chore: remove not needed TODOs

* chore: remove outdated migrations

* chore: remove resolved goleak exception

* chore: remove not needed cosign env

* config: add link to our Azure snp docs
2023-06-01 12:33:06 +02:00