Commit Graph

236 Commits

Author SHA1 Message Date
Malte Poll
d960121cba bazel: update BUILD files for rules_go bzlmod migration 2024-05-23 09:48:04 +02:00
Moritz Sanft
9c100a542c
bootstrapper: prioritize etcd disk I/O (#3114) 2024-05-22 16:12:53 +02:00
Daniel Weiße
9def35ed06
deps: update all Go dependencies (#3071)
* Upgrade Go dependencies

Signed-off-by: Daniel Weiße <dw@edgeless.systems>

* Group Go dependency upgrades

Signed-off-by: Daniel Weiße <dw@edgeless.systems>

* Remove usage of deprecated docker types

Signed-off-by: Daniel Weiße <dw@edgeless.systems>

* Fix usage of invalid validation tags

Signed-off-by: Daniel Weiße <dw@edgeless.systems>

* Regenerate bazel files

Signed-off-by: Daniel Weiße <dw@edgeless.systems>

* Keep github.com/bazelbuild/buildtools at old version to not break other dependencies

Signed-off-by: Daniel Weiße <dw@edgeless.systems>

---------

Signed-off-by: Daniel Weiße <dw@edgeless.systems>
2024-05-08 17:31:47 +02:00
Daniel Weiße
1077b7a48e
bootstrapper: wipe disk and reboot on non-recoverable error (#2971)
* Let JoinClient return fatal errors
* Mark disk for wiping if JoinClient or InitServer return errors
* Reboot system if bootstrapper detects an error
* Refactor joinClient start/stop implementation
* Fix joining nodes retrying kubeadm 3 times in all cases
* Write non-recoverable failures to syslog before rebooting

---------

Signed-off-by: Daniel Weiße <dw@edgeless.systems>
2024-03-12 11:43:38 +01:00
Malte Poll
281c7c320c deps: update protobuf to v1.33.0 2024-03-06 14:50:01 +01:00
Markus Rudy
03fbcafe68
bootstrapper: bounded retry of k8s join (#2968) 2024-03-05 09:14:01 +01:00
Malte Poll
522f2858c6 proto: update generated protobuf sources 2024-02-21 18:40:16 +01:00
Malte Poll
65903459a0 chore: fix unused parameter lint in new golangcilint version 2024-02-21 17:54:07 +01:00
miampf
54cce77bab
deps: convert zap to slog (#2825) 2024-02-08 14:20:01 +00:00
Moritz Sanft
901edd420b
terraform: remove cloud loggers (#2892)
* terraform: remove cloud logging apps

* internal/cloud: remove loggers

* bootstrapper: remove logging

* qemu-metadata-api: remove logging endpoint

* docs: add instructions on how to get boot logs

* bazel: tidy

* docs: fix typo

* cloud: remove unused types

* Update go.mod

Co-authored-by: Daniel Weiße <66256922+daniel-weisse@users.noreply.github.com>

* bazel: tidy

* Update docs/docs/workflows/troubleshooting.md

Co-authored-by: Thomas Tendyck <51411342+thomasten@users.noreply.github.com>

* Update docs/docs/workflows/troubleshooting.md

Co-authored-by: Thomas Tendyck <51411342+thomasten@users.noreply.github.com>

* Update docs/docs/workflows/troubleshooting.md

Co-authored-by: Thomas Tendyck <51411342+thomasten@users.noreply.github.com>

* docs: elaborate on how to get boot logs

* bazel: tidy

---------

Co-authored-by: Daniel Weiße <66256922+daniel-weisse@users.noreply.github.com>
Co-authored-by: Thomas Tendyck <51411342+thomasten@users.noreply.github.com>
2024-02-06 14:27:30 +01:00
Malte Poll
3a5753045e goleak: ignore rules_go SIGTERM handler
rules_go added a SIGTERM handler that has a goroutine that survives the scope of the goleak check.
Currently, the best known workaround is to ignore this goroutine.

https://github.com/uber-go/goleak/issues/119
https://github.com/bazelbuild/rules_go/pull/3749
https://github.com/bazelbuild/rules_go/pull/3827#issuecomment-1894002120
2024-01-22 13:11:58 +01:00
Markus Rudy
ef6f63dc48
Fix various small things throughout the codebase (#2800)
* bootstrapper: remove obsolete log statement

* ci: simplify variable usage

Co-authored-by: Daniel Weiße <daniel-weisse@users.noreply.github.com>

* cli: add missing formatting directive

* helm: fix rm invocation

* ci: document reproducible-builds workflow

* constants: use variables for measurement files

* constants: use variables for CDN distribution ID

* ci: make Helm version explicit

* api: prettify versionsapi-list output

* ci: remove obsolete docstring

---------

Co-authored-by: Daniel Weiße <daniel-weisse@users.noreply.github.com>
2024-01-09 19:37:56 +01:00
Markus Rudy
ce9e25c150 bootstrapper: pass patches to kubeadm 2023-12-18 14:17:35 +01:00
Markus Rudy
a1dbd13f95 versions: consolidate various types of Components
There used to be three definitions of a Component type, and conversion
routines between the three. Since the use case is always the same, and
the Component semantics are defined by versions.go and the installer, it
seems appropriate to define the Component type there and import it in
the necessary places.
2023-12-11 14:26:54 +01:00
3u13r
63cdd03d09
Make Kubernetes serviceCIDR configurable in config (#2660)
* config: pass serviceCIDR to kubeadm init

* terraform: add serviceCIDR
2023-12-01 14:39:05 +01:00
Malte Poll
ee3ff9ac01 bazel: use patched RPATH in bootstrapper and disk-mapper binaries 2023-12-01 09:35:33 +01:00
Leonard Cohnen
cb88c7a5f3 kubernetes: remove unused struct 2023-11-15 19:27:33 +01:00
Leonard Cohnen
cfcc0898b2 helm: remove konnectivity from control-planes
This is the first step in our migration off of
konnectivity. Before node-to-node encryption
we used konnectivity to route some KubeAPI
to kubelet traffic over the pod network which then
would be encrypted.

Since we enabled node-to-node encryption this has no
security upsides anymore. Note that we still deploy
the konnectivity agents via helm and still have the
load balancer for konnectivity.

In the following releases we will remove both.
2023-11-15 19:27:33 +01:00
Leonard Cohnen
79f562374a bootstrapper: remove cilium restart fix
Tests concluded that restating the Cilium agent after the
first boot is not needed anymore to regain connectivity for
pods.
2023-11-15 19:27:33 +01:00
Leonard Cohnen
aae85f0c3c kubernetes: always use lb for joining
The token given out by control-planes contains the node IP
as an endpoint. Since during this stage the joining node is
not connected to the WireGuard network, we cannot
communicate node-to-node. Therefore, we need to hop over the
load balancer again to have a src IP outside of the strict
range.
2023-11-15 19:27:33 +01:00
Malte Poll
bf06a014a4 bootstrapper: ignore "journald" not in $PATH in constructor
In unit tests, NewCollector may be called on systems that do not have
"journalctl" in $PATH.
We can defer checking if the command can work by not checking cmd.Err in
the constructor.
2023-11-10 18:15:59 +01:00
3u13r
6c0a3b8efa
fix joining over lb (#2478) 2023-10-19 16:28:07 +02:00
3u13r
0c89f57ac5
Support internal load balancers (#2388)
* arch: support internal lb on Azure

* arch: support internal lb on GCP

* helm: remove lb svc from verify deployment

* arch: support internal lb on AWS

* terraform: add jump hosts for internal lb

* cli: expose internalLoadBalancer in config

* ci: add e2e-manual-internal

* add in-cluster endpoint to terraform output
2023-10-17 15:46:15 +02:00
Malte Poll
e4ed24ee4f image: fix bootstrapper install path 2023-10-10 10:33:54 +02:00
Malte Poll
200fc79e0c bootstrapper: package as tar 2023-09-27 17:58:19 +02:00
3u13r
2776e40df7
join: join over lb if available (#2348)
* join: join over lb if available
2023-09-25 10:23:35 +02:00
Daniel Weiße
afa7fd0edb
cli: refactor kubernetes package (#2232)
* Clean up CLI kubernetes package

* Rename CLI kubernetes pkg to kubecmd

* Unify kubernetes clients

* Refactor attestation config upgrade

* Update CODEOWNERS file

* Remove outdated GetMeasurementSalt

---------

Signed-off-by: Daniel Weiße <dw@edgeless.systems>
2023-08-21 16:15:32 +02:00
Malte Poll
1f12541a36 bazel: allow "bazel test" to work without cgo dependencies 2023-08-18 16:36:13 +02:00
Adrian Stobbe
656cdbb4bb
remove unused CloudServiceAccountUri from init request (#2182) 2023-08-09 14:16:45 +02:00
Daniel Weiße
8dbe79500f
cli: fix incorrect usage of masterSecret salt for clusterID generation (#2169)
* Fix incorrect use of masterSecret salt for clusterID generation

Signed-off-by: Daniel Weiße <dw@edgeless.systems>

---------

Signed-off-by: Daniel Weiße <dw@edgeless.systems>
Co-authored-by: Leonard Cohnen <lc@edgeless.systems>
2023-08-07 15:24:46 +02:00
3u13r
ee0adfe8c7
kubernetes: document total log size (#2164) 2023-08-04 18:17:36 +02:00
Adrian Stobbe
13eea1ca31
cli: install cilium in cli instead of bootstrapper (#2146)
* add wait and restartDS

* cilium working (tested on azure + gcp)

* clean helm code from bootstrapper

* fixup! clean helm code from bootstrapper

* fixup! clean helm code from bootstrapper

* fixup! clean helm code from bootstrapper

* add patchnode for gcp

* fix gcp

* patch node inside bootstrapper

* apply renaming of client

* fixup! apply renaming of client

* otto feedback
2023-08-02 15:49:40 +02:00
Adrian Stobbe
26305e8f80
cli: install helm charts in cli instead of bootstrapper (#2136)
* init

* fixup! init

* gcp working?

* fixup! fixup! init

* azure cfg for microService installation

* fixup! azure cfg for microService installation

* fixup! azure cfg for microService installation

* cleanup bootstrapper code

* cleanup helminstall code

* fixup! cleanup helminstall code

* Update internal/deploy/helm/install.go

Co-authored-by: Daniel Weiße <66256922+daniel-weisse@users.noreply.github.com>

* daniel feedback

* TODO add provider (also to CreateCluster) so we can ensure that provider specific output

* fixup! daniel feedback

* use debugLog in helm installer

* placeholderHelmInstaller

* rename to stub

---------

Co-authored-by: Daniel Weiße <66256922+daniel-weisse@users.noreply.github.com>
2023-07-31 10:53:05 +02:00
Otto Bittner
1d5a8283e0
cli: use Semver type to represent microservice versions (#2125)
Previously we used strings to pass microservice versions. This invited
bugs due to missing input validation.
2023-07-25 14:20:25 +02:00
Daniel Weiße
7152633255
bootstrapper: refactor coredns and cilium setup (#2129)
* Decouple CoreDNS installation from Cilium

* Align cilium helm installation with other charts

* Remove unused functions
---------

Signed-off-by: Daniel Weiße <dw@edgeless.systems>
2023-07-25 09:57:35 +02:00
Adrian Stobbe
a87b7894db
aws: use new LB controller to fix SecurityGroup cleanup on K8s service deletion (#2090)
* add current chart

add current helm chart

* disable service controller for aws ccm

* add new iam roles

* doc AWS internet LB + add to LB test

* pass clusterName to helm for AWS LB

* fix update-aws-lb chart to also include .helmignore

* move chart outside services

* working state

* add subnet tags for AWS subnet discovery

* fix .helmignore load rule with file in subdirectory

* upgrade iam profile

* revert new loader impl since cilium is not correctly loaded

* install chart if not already present during `upgrade apply`

* cleanup PR + fix build + add todos

cleanup PR + add todos

* shared helm pkg for cli install and bootstrapper

* add link to eks docs

* refactor iamMigrationCmd

* delete unused helm.symwallk

* move iammigrate to upgrade pkg

* fixup! delete unused helm.symwallk

* add to upgradecheck

* remove nodeSelector from go code (Otto)

* update iam docs and sort permission + remove duplicate roles

* fix bug in `upgrade check`

* better upgrade check output when svc version upgrade not possible

* pr feedback

* remove force flag in upgrade_test

* use upgrader.GetUpgradeID instead of extra type

* remove todos + fix check

* update doc lb (leo)

* remove bootstrapper helm package

* Update cli/internal/cmd/upgradecheck.go

Co-authored-by: Daniel Weiße <66256922+daniel-weisse@users.noreply.github.com>

* final nits

* add docs for e2e upgrade test setup

* Apply suggestions from code review

Co-authored-by: Daniel Weiße <66256922+daniel-weisse@users.noreply.github.com>

* Update cli/internal/helm/loader.go

Co-authored-by: Daniel Weiße <66256922+daniel-weisse@users.noreply.github.com>

* Update cli/internal/cmd/tfmigrationclient.go

Co-authored-by: Daniel Weiße <66256922+daniel-weisse@users.noreply.github.com>

* fix daniel review

* link to the iam permissions instead of manually updating them (agreed with leo)

* disable iam upgrade in upgrade apply

---------

Co-authored-by: Daniel Weiße <66256922+daniel-weisse@users.noreply.github.com>
Co-authored-by: Malte Poll
2023-07-24 10:30:53 +02:00
Malte Poll
8da6a23aa5
bootstrapper: add fallback endpoint and custom endpoint to SAN field (#2108)
terraform: collect apiserver cert SANs and support custom endpoint

constants: add new constants for cluster configuration and custom endpoint

cloud: support apiserver cert sans and prepare for endpoint migration on AWS

config: add customEndpoint field

bootstrapper: use per-CSP apiserver cert SANs

cli: route customEndpoint to terraform and add migration for apiserver cert SANs

bootstrapper: change interface of GetLoadBalancerEndpoint to return host and port separately
2023-07-21 16:43:51 +02:00
Daniel Weiße
ea5c83587c Move CSI charts to separate chart and cleanup loader code
Signed-off-by: Daniel Weiße <dw@edgeless.systems>
2023-07-20 15:47:12 +02:00
Daniel Weiße
8619a90149 Deploy CSI snapshotter on init
Signed-off-by: Daniel Weiße <dw@edgeless.systems>
2023-07-20 15:47:12 +02:00
Daniel Weiße
6a40c73ff7
disk-mapper: set LUKS2 token to allow reusing unintialized state disks (#2083)
Signed-off-by: Daniel Weiße <dw@edgeless.systems>
2023-07-18 16:20:03 +02:00
Daniel Weiße
ac1128d07f
cryptsetup: unify code (#2043)
* Add common backend for interacting with cryptsetup

* Use common cryptsetup backend in bootstrapper

* Use common cryptsetup backend in disk-mapper

* Use common cryptsetup backend in csi lib

---------

Signed-off-by: Daniel Weiße <dw@edgeless.systems>
2023-07-17 13:55:31 +02:00
Malte Poll
1ff40533f1
cli: add "--skip-helm-wait" flag (#2061)
* cli: add "--skip-helm-wait" flag

This flag can be used to disable the atomic and wait flags during helm install.
This is useful when debugging a failing constellation init, since the user gains access to
the cluster even when one of the deployments is never in a ready state.
2023-07-07 17:09:45 +02:00
renovate[bot]
49cff0aabb
deps: update module github.com/sigstore/rekor to v1.2.2 (#2033)
Co-authored-by: Paul Meyer <49727155+katexochen@users.noreply.github.com>
2023-07-06 15:41:14 +02:00
Malte Poll
8ba0179137
bootstrapper: use atomics in nodelock (#2001) 2023-07-04 16:26:37 +02:00
Adrian Stobbe
c39df2f7da
terraform: openstack node groups (#1966)
* openstack

* rename to base_name

* fix openstack boot vtpm

* add docs for accessing bootstrapper logs

* rename to initial count
2023-07-03 16:33:00 +02:00
Malte Poll
3942cf27f3
bootstrapper: install internal-config cm before constellation-services (#1995) 2023-07-03 10:19:27 +02:00
Daniel Weiße
5a9f9c0a52
bootstraper: delete helm chart on installation failure before retrying installation (#1977)
* Delete helm chart on failure before retrying installation

* Add chart name to debug output

* Remove now unused wait flag from helm Release struct

---------

Signed-off-by: Daniel Weiße <dw@edgeless.systems>
2023-06-30 15:13:29 +02:00
Malte Poll
537cdbcfad bazel: trim path to *.pb.go files embedded in go libraries
See https://github.com/bazelbuild/rules_go/issues/3581 for context.
2023-06-16 16:30:47 +02:00
3u13r
a2c98eb1d5
Correctly deploy the AWS CCM (#1853)
* aws: stop using the imds api for tags

* aws: disable tags in imds api

* aws: only tag instances with non-lecagy tag

* bootstrapper: always let coredns run before cilium

* debugd: make debugd less noisy

* fixup fix aws imds test

* fixup unsued context

* move getting instance id to readInstanceTag
2023-06-13 09:58:39 +02:00
renovate[bot]
167052d443
deps: update dependency hermetic_cc_toolchain to v2.0.0 (#1860)
* deps: update dependency hermetic_cc_toolchain to v2.0.0
* deps: tidy all modules
* bazel: target glibc 2.23 to enable rbe

---------

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: edgelessci <edgelessci@users.noreply.github.com>
Co-authored-by: Malte Poll <mp@edgeless.systems>
2023-06-09 17:39:30 +02:00