Disabling SMT dynamically inside the image creates problems on AWS.
The problem should be fixed by disabling smt through the VMM.
By recommendation from AWS: add idle=poll.
This should improve our launch success rate while they investigate some
upstream issues.
* Move IAM migration client to cloudcmd package
* Move Terraform Cluster upgrade client to cloudcmd package
* Use hcl for creating Terraform IAM variables files
* Unify terraform upgrade code
* Rename some cloudcmd files for better clarity
---------
Signed-off-by: Daniel Weiße <dw@edgeless.systems>
* Clean up Terraform pkg
* Add note to Terraform migration functions expecting to be run on initialized workspace
---------
Signed-off-by: Daniel Weiße <dw@edgeless.systems>
* deps: limit Terraform version to FOSS releases
* fix: enforce upper version constraint
---------
Co-authored-by: Leonard Cohnen <lc@edgeless.systems>
* Remove `--config` and `--master-secret` falgs
* Add `--workspace` flag
* In CLI, only work on files with paths created from `cli/internal/cmd`
* Properly print values for GCP on IAM create when not directly updating the config
---------
Signed-off-by: Daniel Weiße <dw@edgeless.systems>
Also update Azure terraform:
ignore snp policy changes on resource
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: edgelessci <edgelessci@users.noreply.github.com>
Co-authored-by: Otto Bittner <cobittner@posteo.net>
* add current chart
add current helm chart
* disable service controller for aws ccm
* add new iam roles
* doc AWS internet LB + add to LB test
* pass clusterName to helm for AWS LB
* fix update-aws-lb chart to also include .helmignore
* move chart outside services
* working state
* add subnet tags for AWS subnet discovery
* fix .helmignore load rule with file in subdirectory
* upgrade iam profile
* revert new loader impl since cilium is not correctly loaded
* install chart if not already present during `upgrade apply`
* cleanup PR + fix build + add todos
cleanup PR + add todos
* shared helm pkg for cli install and bootstrapper
* add link to eks docs
* refactor iamMigrationCmd
* delete unused helm.symwallk
* move iammigrate to upgrade pkg
* fixup! delete unused helm.symwallk
* add to upgradecheck
* remove nodeSelector from go code (Otto)
* update iam docs and sort permission + remove duplicate roles
* fix bug in `upgrade check`
* better upgrade check output when svc version upgrade not possible
* pr feedback
* remove force flag in upgrade_test
* use upgrader.GetUpgradeID instead of extra type
* remove todos + fix check
* update doc lb (leo)
* remove bootstrapper helm package
* Update cli/internal/cmd/upgradecheck.go
Co-authored-by: Daniel Weiße <66256922+daniel-weisse@users.noreply.github.com>
* final nits
* add docs for e2e upgrade test setup
* Apply suggestions from code review
Co-authored-by: Daniel Weiße <66256922+daniel-weisse@users.noreply.github.com>
* Update cli/internal/helm/loader.go
Co-authored-by: Daniel Weiße <66256922+daniel-weisse@users.noreply.github.com>
* Update cli/internal/cmd/tfmigrationclient.go
Co-authored-by: Daniel Weiße <66256922+daniel-weisse@users.noreply.github.com>
* fix daniel review
* link to the iam permissions instead of manually updating them (agreed with leo)
* disable iam upgrade in upgrade apply
---------
Co-authored-by: Daniel Weiße <66256922+daniel-weisse@users.noreply.github.com>
Co-authored-by: Malte Poll
terraform: collect apiserver cert SANs and support custom endpoint
constants: add new constants for cluster configuration and custom endpoint
cloud: support apiserver cert sans and prepare for endpoint migration on AWS
config: add customEndpoint field
bootstrapper: use per-CSP apiserver cert SANs
cli: route customEndpoint to terraform and add migration for apiserver cert SANs
bootstrapper: change interface of GetLoadBalancerEndpoint to return host and port separately
This change is required to ensure we have not tls handshake errors when connecting to the kubernetes api.
Currently, the certificates used by kube-apiserver pods contain a SAN field with the (single) public ip of the loadbalancer.
If we would allow multiple loadbalancer frontend ips, we could encounter cases where the certificate is only valid for one public ip,
while we try to connect to a different ip.
To prevent this, we consciously disable support for the multi-zone loadbalancer frontend on AWS for now.
This will be re-enabled in the future.
Normalize naming for the "instance_count" / "initial_count" int terraform to always use "initial_count".
This is required, since there is a naming confusion on AWS.
"initial_count" is more precise, since it reflects the fact that this value is ignored when applying the terraform template
after the scaling groups already exist.
This commit is designed to be reverted in the future (AB#3248).
Terraform does not implement moved blocks with dynamic targets: https://github.com/hashicorp/terraform/issues/31335 so we have to migrate the terraform state ourselves.
* init
add variables
add amount to instance_group again
fix tf validate
rollback old names
make fields optional
fix image ref mini
daniel comments
use latest
* Update cli/internal/terraform/terraform/qemu/main.tf
Co-authored-by: Malte Poll <1780588+malt3@users.noreply.github.com>
* add uid to resource name
* make machine a global variable again
* fix tf
---------
Co-authored-by: Malte Poll <1780588+malt3@users.noreply.github.com>
* init
* migration working
* make tf variables with default value optional in go through ptr type
* fix CI build
* pr feedback
* add azure targets tf
* skip migration for empty targets
* make instance_count optional
* change role naming to dashed + add validation
* make node_group.zones optional
* Update cli/internal/terraform/terraform/azure/main.tf
Co-authored-by: Malte Poll <1780588+malt3@users.noreply.github.com>
* malte feedback
---------
Co-authored-by: Malte Poll <1780588+malt3@users.noreply.github.com>
* terraform: GCP node groups
* cli: marshal GCP node groups to terraform variables
This does not have any side effects for users.
We still strictly create one control-plane and one worker group.
This is a preparation for enabling customizable node groups in the future.
* deps: update Terraform google to v4.69.1
* deps: tidy all modules
* add delay for service account
* deps: tidy all modules
* add delay for service account
---------
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: edgelessci <edgelessci@users.noreply.github.com>
Co-authored-by: Adrian Stobbe <stobbe.adrian@gmail.com>
* invalidate app client id field for azure and provide info
* remove TestNewWithDefaultOptions case
* fix test
* remove appClientID field
* remove client secret + rename err
* remove from docs
* otto feedback
* update docs
* delete env test in cfg since no envs set anymore
* Update dev-docs/workflows/github-actions.md
Co-authored-by: Otto Bittner <cobittner@posteo.net>
* WARNING to stderr
* fix check
---------
Co-authored-by: Otto Bittner <cobittner@posteo.net>
* aws: stop using the imds api for tags
* aws: disable tags in imds api
* aws: only tag instances with non-lecagy tag
* bootstrapper: always let coredns run before cilium
* debugd: make debugd less noisy
* fixup fix aws imds test
* fixup unsued context
* move getting instance id to readInstanceTag
* variant: move into internal/attestation
* attesation: move aws attesation into subfolder nitrotpm
* config: add aws-sev-snp variant
* cli: add tf option to enable AWS SNP
For now the implementations in aws/nitrotpm and aws/snp
are identical. They both contain the aws/nitrotpm impl.
A separate commit will add the actual attestation logic.
The implementation would recreate the gcp instance template (including all instances and state disks) whenever the image tfvar changes.
Fixed by ignoring lifecycle changes on the instance templates.
Fixes 8c3b963