Commit Graph

195 Commits

Author SHA1 Message Date
Daniel Weiße
1077b7a48e
bootstrapper: wipe disk and reboot on non-recoverable error (#2971)
* Let JoinClient return fatal errors
* Mark disk for wiping if JoinClient or InitServer return errors
* Reboot system if bootstrapper detects an error
* Refactor joinClient start/stop implementation
* Fix joining nodes retrying kubeadm 3 times in all cases
* Write non-recoverable failures to syslog before rebooting

---------

Signed-off-by: Daniel Weiße <dw@edgeless.systems>
2024-03-12 11:43:38 +01:00
Markus Rudy
03fbcafe68
bootstrapper: bounded retry of k8s join (#2968) 2024-03-05 09:14:01 +01:00
Malte Poll
65903459a0 chore: fix unused parameter lint in new golangcilint version 2024-02-21 17:54:07 +01:00
miampf
54cce77bab
deps: convert zap to slog (#2825) 2024-02-08 14:20:01 +00:00
Malte Poll
3a5753045e goleak: ignore rules_go SIGTERM handler
rules_go added a SIGTERM handler that has a goroutine that survives the scope of the goleak check.
Currently, the best known workaround is to ignore this goroutine.

https://github.com/uber-go/goleak/issues/119
https://github.com/bazelbuild/rules_go/pull/3749
https://github.com/bazelbuild/rules_go/pull/3827#issuecomment-1894002120
2024-01-22 13:11:58 +01:00
Markus Rudy
ce9e25c150 bootstrapper: pass patches to kubeadm 2023-12-18 14:17:35 +01:00
Markus Rudy
a1dbd13f95 versions: consolidate various types of Components
There used to be three definitions of a Component type, and conversion
routines between the three. Since the use case is always the same, and
the Component semantics are defined by versions.go and the installer, it
seems appropriate to define the Component type there and import it in
the necessary places.
2023-12-11 14:26:54 +01:00
3u13r
63cdd03d09
Make Kubernetes serviceCIDR configurable in config (#2660)
* config: pass serviceCIDR to kubeadm init

* terraform: add serviceCIDR
2023-12-01 14:39:05 +01:00
Leonard Cohnen
cb88c7a5f3 kubernetes: remove unused struct 2023-11-15 19:27:33 +01:00
Leonard Cohnen
cfcc0898b2 helm: remove konnectivity from control-planes
This is the first step in our migration off of
konnectivity. Before node-to-node encryption
we used konnectivity to route some KubeAPI
to kubelet traffic over the pod network which then
would be encrypted.

Since we enabled node-to-node encryption this has no
security upsides anymore. Note that we still deploy
the konnectivity agents via helm and still have the
load balancer for konnectivity.

In the following releases we will remove both.
2023-11-15 19:27:33 +01:00
Leonard Cohnen
79f562374a bootstrapper: remove cilium restart fix
Tests concluded that restating the Cilium agent after the
first boot is not needed anymore to regain connectivity for
pods.
2023-11-15 19:27:33 +01:00
Leonard Cohnen
aae85f0c3c kubernetes: always use lb for joining
The token given out by control-planes contains the node IP
as an endpoint. Since during this stage the joining node is
not connected to the WireGuard network, we cannot
communicate node-to-node. Therefore, we need to hop over the
load balancer again to have a src IP outside of the strict
range.
2023-11-15 19:27:33 +01:00
Malte Poll
bf06a014a4 bootstrapper: ignore "journald" not in $PATH in constructor
In unit tests, NewCollector may be called on systems that do not have
"journalctl" in $PATH.
We can defer checking if the command can work by not checking cmd.Err in
the constructor.
2023-11-10 18:15:59 +01:00
3u13r
6c0a3b8efa
fix joining over lb (#2478) 2023-10-19 16:28:07 +02:00
3u13r
2776e40df7
join: join over lb if available (#2348)
* join: join over lb if available
2023-09-25 10:23:35 +02:00
Daniel Weiße
afa7fd0edb
cli: refactor kubernetes package (#2232)
* Clean up CLI kubernetes package

* Rename CLI kubernetes pkg to kubecmd

* Unify kubernetes clients

* Refactor attestation config upgrade

* Update CODEOWNERS file

* Remove outdated GetMeasurementSalt

---------

Signed-off-by: Daniel Weiße <dw@edgeless.systems>
2023-08-21 16:15:32 +02:00
Malte Poll
1f12541a36 bazel: allow "bazel test" to work without cgo dependencies 2023-08-18 16:36:13 +02:00
Daniel Weiße
8dbe79500f
cli: fix incorrect usage of masterSecret salt for clusterID generation (#2169)
* Fix incorrect use of masterSecret salt for clusterID generation

Signed-off-by: Daniel Weiße <dw@edgeless.systems>

---------

Signed-off-by: Daniel Weiße <dw@edgeless.systems>
Co-authored-by: Leonard Cohnen <lc@edgeless.systems>
2023-08-07 15:24:46 +02:00
3u13r
ee0adfe8c7
kubernetes: document total log size (#2164) 2023-08-04 18:17:36 +02:00
Adrian Stobbe
13eea1ca31
cli: install cilium in cli instead of bootstrapper (#2146)
* add wait and restartDS

* cilium working (tested on azure + gcp)

* clean helm code from bootstrapper

* fixup! clean helm code from bootstrapper

* fixup! clean helm code from bootstrapper

* fixup! clean helm code from bootstrapper

* add patchnode for gcp

* fix gcp

* patch node inside bootstrapper

* apply renaming of client

* fixup! apply renaming of client

* otto feedback
2023-08-02 15:49:40 +02:00
Adrian Stobbe
26305e8f80
cli: install helm charts in cli instead of bootstrapper (#2136)
* init

* fixup! init

* gcp working?

* fixup! fixup! init

* azure cfg for microService installation

* fixup! azure cfg for microService installation

* fixup! azure cfg for microService installation

* cleanup bootstrapper code

* cleanup helminstall code

* fixup! cleanup helminstall code

* Update internal/deploy/helm/install.go

Co-authored-by: Daniel Weiße <66256922+daniel-weisse@users.noreply.github.com>

* daniel feedback

* TODO add provider (also to CreateCluster) so we can ensure that provider specific output

* fixup! daniel feedback

* use debugLog in helm installer

* placeholderHelmInstaller

* rename to stub

---------

Co-authored-by: Daniel Weiße <66256922+daniel-weisse@users.noreply.github.com>
2023-07-31 10:53:05 +02:00
Daniel Weiße
7152633255
bootstrapper: refactor coredns and cilium setup (#2129)
* Decouple CoreDNS installation from Cilium

* Align cilium helm installation with other charts

* Remove unused functions
---------

Signed-off-by: Daniel Weiße <dw@edgeless.systems>
2023-07-25 09:57:35 +02:00
Adrian Stobbe
a87b7894db
aws: use new LB controller to fix SecurityGroup cleanup on K8s service deletion (#2090)
* add current chart

add current helm chart

* disable service controller for aws ccm

* add new iam roles

* doc AWS internet LB + add to LB test

* pass clusterName to helm for AWS LB

* fix update-aws-lb chart to also include .helmignore

* move chart outside services

* working state

* add subnet tags for AWS subnet discovery

* fix .helmignore load rule with file in subdirectory

* upgrade iam profile

* revert new loader impl since cilium is not correctly loaded

* install chart if not already present during `upgrade apply`

* cleanup PR + fix build + add todos

cleanup PR + add todos

* shared helm pkg for cli install and bootstrapper

* add link to eks docs

* refactor iamMigrationCmd

* delete unused helm.symwallk

* move iammigrate to upgrade pkg

* fixup! delete unused helm.symwallk

* add to upgradecheck

* remove nodeSelector from go code (Otto)

* update iam docs and sort permission + remove duplicate roles

* fix bug in `upgrade check`

* better upgrade check output when svc version upgrade not possible

* pr feedback

* remove force flag in upgrade_test

* use upgrader.GetUpgradeID instead of extra type

* remove todos + fix check

* update doc lb (leo)

* remove bootstrapper helm package

* Update cli/internal/cmd/upgradecheck.go

Co-authored-by: Daniel Weiße <66256922+daniel-weisse@users.noreply.github.com>

* final nits

* add docs for e2e upgrade test setup

* Apply suggestions from code review

Co-authored-by: Daniel Weiße <66256922+daniel-weisse@users.noreply.github.com>

* Update cli/internal/helm/loader.go

Co-authored-by: Daniel Weiße <66256922+daniel-weisse@users.noreply.github.com>

* Update cli/internal/cmd/tfmigrationclient.go

Co-authored-by: Daniel Weiße <66256922+daniel-weisse@users.noreply.github.com>

* fix daniel review

* link to the iam permissions instead of manually updating them (agreed with leo)

* disable iam upgrade in upgrade apply

---------

Co-authored-by: Daniel Weiße <66256922+daniel-weisse@users.noreply.github.com>
Co-authored-by: Malte Poll
2023-07-24 10:30:53 +02:00
Malte Poll
8da6a23aa5
bootstrapper: add fallback endpoint and custom endpoint to SAN field (#2108)
terraform: collect apiserver cert SANs and support custom endpoint

constants: add new constants for cluster configuration and custom endpoint

cloud: support apiserver cert sans and prepare for endpoint migration on AWS

config: add customEndpoint field

bootstrapper: use per-CSP apiserver cert SANs

cli: route customEndpoint to terraform and add migration for apiserver cert SANs

bootstrapper: change interface of GetLoadBalancerEndpoint to return host and port separately
2023-07-21 16:43:51 +02:00
Daniel Weiße
ea5c83587c Move CSI charts to separate chart and cleanup loader code
Signed-off-by: Daniel Weiße <dw@edgeless.systems>
2023-07-20 15:47:12 +02:00
Daniel Weiße
8619a90149 Deploy CSI snapshotter on init
Signed-off-by: Daniel Weiße <dw@edgeless.systems>
2023-07-20 15:47:12 +02:00
Daniel Weiße
6a40c73ff7
disk-mapper: set LUKS2 token to allow reusing unintialized state disks (#2083)
Signed-off-by: Daniel Weiße <dw@edgeless.systems>
2023-07-18 16:20:03 +02:00
Daniel Weiße
ac1128d07f
cryptsetup: unify code (#2043)
* Add common backend for interacting with cryptsetup

* Use common cryptsetup backend in bootstrapper

* Use common cryptsetup backend in disk-mapper

* Use common cryptsetup backend in csi lib

---------

Signed-off-by: Daniel Weiße <dw@edgeless.systems>
2023-07-17 13:55:31 +02:00
Malte Poll
1ff40533f1
cli: add "--skip-helm-wait" flag (#2061)
* cli: add "--skip-helm-wait" flag

This flag can be used to disable the atomic and wait flags during helm install.
This is useful when debugging a failing constellation init, since the user gains access to
the cluster even when one of the deployments is never in a ready state.
2023-07-07 17:09:45 +02:00
Malte Poll
8ba0179137
bootstrapper: use atomics in nodelock (#2001) 2023-07-04 16:26:37 +02:00
Malte Poll
3942cf27f3
bootstrapper: install internal-config cm before constellation-services (#1995) 2023-07-03 10:19:27 +02:00
Daniel Weiße
5a9f9c0a52
bootstraper: delete helm chart on installation failure before retrying installation (#1977)
* Delete helm chart on failure before retrying installation

* Add chart name to debug output

* Remove now unused wait flag from helm Release struct

---------

Signed-off-by: Daniel Weiße <dw@edgeless.systems>
2023-06-30 15:13:29 +02:00
3u13r
a2c98eb1d5
Correctly deploy the AWS CCM (#1853)
* aws: stop using the imds api for tags

* aws: disable tags in imds api

* aws: only tag instances with non-lecagy tag

* bootstrapper: always let coredns run before cilium

* debugd: make debugd less noisy

* fixup fix aws imds test

* fixup unsued context

* move getting instance id to readInstanceTag
2023-06-13 09:58:39 +02:00
Otto Bittner
8f21972aec
attestation: add awsSEVSNP as new variant (#1900)
* variant: move into internal/attestation
* attesation: move aws attesation into subfolder nitrotpm
* config: add aws-sev-snp variant
* cli: add tf option to enable AWS SNP

For now the implementations in aws/nitrotpm and aws/snp
are identical. They both contain the aws/nitrotpm impl.
A separate commit will add the actual attestation logic.
2023-06-09 15:41:02 +02:00
Adrian Stobbe
e0fe8e6ca0
local: fix mac issues in bazel (#1893) 2023-06-09 10:35:52 +02:00
3u13r
e0285c122e
todo responsibilities and cleanup (#1837)
* chore: add TODO responsibilities

* chore: remove not needed TODOs

* chore: remove outdated migrations

* chore: remove resolved goleak exception

* chore: remove not needed cosign env

* config: add link to our Azure snp docs
2023-06-01 12:33:06 +02:00
miampf
8686c5e7e2
bootstrapper: collect journald logs on failure (#1618) 2023-05-30 11:47:36 +00:00
renovate[bot]
2afddcb0f8
deps: update K8s dependencies (#1599)
* deps: update K8s dependencies

* deps: bump controller runtime

* chore: tidy

* bump helm and migrate controller runtime

* fix helm deprecation

---------

Signed-off-by: Paul Meyer <49727155+katexochen@users.noreply.github.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: Paul Meyer <49727155+katexochen@users.noreply.github.com>
Co-authored-by: Leonard Cohnen <lc@edgeless.systems>
2023-05-24 18:57:45 +02:00
Malte Poll
94758bc392 bootstrapper: allow building without cgo dependencies for linting 2023-05-23 13:44:56 +02:00
3u13r
964775c4c2
Add autoscaling and cluster upgrade support for AWS (#1758)
* aws: autoscaling and upgrades

* docs: update scaling and upgrades for AWS

* deps: pin vuln check against release
2023-05-19 13:57:31 +02:00
Moritz Eckert
6252193879 cli: deploy cinder as OpenStack CSI plugin 2023-05-17 15:20:39 +02:00
Daniel Weiße
dd2da25ebe attestation: tdx issuer/validator (#1265)
* Add TDX validator

* Add TDX issuer

---------

Signed-off-by: Daniel Weiße <dw@edgeless.systems>
2023-05-17 11:37:26 +02:00
Malte Poll
56635c3993 cli: deploy yawol as OpenStack loadbalancer 2023-05-03 21:45:59 +02:00
Malte Poll
d15968bed7
bootstrapper: make Azure auth method configurable on cluster init (#1346)
* bootstrapper: make Azure auth method configurable on cluster init
* azure: convert uami resource ID to clientID


Co-authored-by: 3u13r <lc@edgeless.systems>
2023-04-03 15:01:25 +02:00
miampf
8964e3e90c
bootstrapper: fix journald ram usage (#1553) 2023-04-03 13:58:34 +02:00
miampf
c1dbf0561a
feat: added journald collection package to bootstrapper internal packages 2023-03-29 14:45:13 +02:00
Daniel Weiße
b57413cfa7
cli: set cluster's initial measurements from user's config using Helm (#1540)
* Remove using measurements from the initial control-plane node for the cluster's initial measurements

* Add using measurements from the user's config for the cluster's initial measurements to align behavior with upgrade command

---------

Signed-off-by: Daniel Weiße <dw@edgeless.systems>
2023-03-29 11:16:56 +02:00
Daniel Weiße
99b12e4035
internal: refactor oid package to variant package (#1538)
Signed-off-by: Daniel Weiße <dw@edgeless.systems>
2023-03-29 09:30:13 +02:00
Nils Hanke
8e8124345b
bootstrapper: return error on context cancel in WaitForCilium (#1534) 2023-03-28 14:41:17 +02:00
Daniel Weiße
5a0234b3f2
attestation: add option for MAA fallback to verify azure's snp-sev id key digest (#1257)
* Convert enforceIDKeyDigest setting to enum

* Use MAA fallback in Azure SNP attestation

* Only create MAA provider if MAA fallback is enabled

---------

Signed-off-by: Daniel Weiße <dw@edgeless.systems>
Co-authored-by: Thomas Tendyck <tt@edgeless.systems>
2023-03-21 12:46:49 +01:00