Commit Graph

246 Commits

Author SHA1 Message Date
Markus Rudy
850b460002
helm: revert parts of CoreDNS Helm chart packaging (#3366)
* Revert "helm: fix kubeadm bugs caused by CoreDNS installation (#3353)"

This reverts commit 8ef5ea2efe.

* Revert "helm: manage CoreDNS addon as Helm chart (#3236)"

This reverts commit 97c77e2a78.

* upgrade-agent: ignore CoreDNS preflight errors
2024-09-19 10:55:21 +02:00
Daniel Weiße
c11631ec11
logging: reduce grpc logging noise (#3329)
* Normalize gRPC logs to print at warn level only
* Fix grpcLogger level enablement

---------

Signed-off-by: Daniel Weiße <dw@edgeless.systems>
2024-08-29 10:44:22 +02:00
renovate[bot]
fe96153507
deps: update bazel (modules) (#3304)
* deps: update bazel (modules)
* Set std=c++14
* deps: tidy all modules

---------

Signed-off-by: Daniel Weiße <dw@edgeless.systems>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: Daniel Weiße <dw@edgeless.systems>
Co-authored-by: edgelessci <edgelessci@users.noreply.github.com>
Co-authored-by: Markus Rudy <mr@edgeless.systems>
2024-08-09 11:00:22 +02:00
Markus Rudy
97c77e2a78 helm: manage CoreDNS addon as Helm chart (#3236)
* helm: generate CoreDNS Helm chart
* helm: load CoreDNS Helm chart
* bootstrapper: don't install coredns addon
2024-07-12 12:01:49 +02:00
Adrian Stobbe
f4a3ae7d27
ci: fix IDE setup on mac (#3226) 2024-07-09 09:27:32 +02:00
Moritz Sanft
50dcfd7905
bootstrapper: remove unnecessary stat (#3202) 2024-06-25 11:51:23 +02:00
Moritz Sanft
dcb8cca268
bootstrapper: remove static pod manifests before cluster init/join 2024-06-25 10:43:23 +02:00
renovate[bot]
e71819eb62
deps: update Go dependencies (#3185)
* deps: update Go dependencies
* deps: tidy all modules
* Replace deprecated `grpc.DialContext` with `grpc.NewClient`

---------

Signed-off-by: Daniel Weiße <dw@edgeless.systems>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: edgelessci <edgelessci@users.noreply.github.com>
Co-authored-by: Daniel Weiße <dw@edgeless.systems>
2024-06-21 10:05:57 +02:00
Moritz Sanft
1989bce0a5
bootstrapper: disable gRPC logging (#3134)
* bootstrapper: disable gRPC logging

* bootstrapper: remove debug flag

* upgrade-agent: remove gRPC logging

Signed-off-by: Moritz Sanft <58110325+msanft@users.noreply.github.com>

---------

Signed-off-by: Moritz Sanft <58110325+msanft@users.noreply.github.com>
2024-06-05 09:24:08 +02:00
Malte Poll
2c8a16294e bazel: migrate rules_proto to bzlmod 2024-05-23 09:48:04 +02:00
Malte Poll
d960121cba bazel: update BUILD files for rules_go bzlmod migration 2024-05-23 09:48:04 +02:00
Moritz Sanft
9c100a542c
bootstrapper: prioritize etcd disk I/O (#3114) 2024-05-22 16:12:53 +02:00
Daniel Weiße
9def35ed06
deps: update all Go dependencies (#3071)
* Upgrade Go dependencies

Signed-off-by: Daniel Weiße <dw@edgeless.systems>

* Group Go dependency upgrades

Signed-off-by: Daniel Weiße <dw@edgeless.systems>

* Remove usage of deprecated docker types

Signed-off-by: Daniel Weiße <dw@edgeless.systems>

* Fix usage of invalid validation tags

Signed-off-by: Daniel Weiße <dw@edgeless.systems>

* Regenerate bazel files

Signed-off-by: Daniel Weiße <dw@edgeless.systems>

* Keep github.com/bazelbuild/buildtools at old version to not break other dependencies

Signed-off-by: Daniel Weiße <dw@edgeless.systems>

---------

Signed-off-by: Daniel Weiße <dw@edgeless.systems>
2024-05-08 17:31:47 +02:00
Daniel Weiße
1077b7a48e
bootstrapper: wipe disk and reboot on non-recoverable error (#2971)
* Let JoinClient return fatal errors
* Mark disk for wiping if JoinClient or InitServer return errors
* Reboot system if bootstrapper detects an error
* Refactor joinClient start/stop implementation
* Fix joining nodes retrying kubeadm 3 times in all cases
* Write non-recoverable failures to syslog before rebooting

---------

Signed-off-by: Daniel Weiße <dw@edgeless.systems>
2024-03-12 11:43:38 +01:00
Malte Poll
281c7c320c deps: update protobuf to v1.33.0 2024-03-06 14:50:01 +01:00
Markus Rudy
03fbcafe68
bootstrapper: bounded retry of k8s join (#2968) 2024-03-05 09:14:01 +01:00
Malte Poll
522f2858c6 proto: update generated protobuf sources 2024-02-21 18:40:16 +01:00
Malte Poll
65903459a0 chore: fix unused parameter lint in new golangcilint version 2024-02-21 17:54:07 +01:00
miampf
54cce77bab
deps: convert zap to slog (#2825) 2024-02-08 14:20:01 +00:00
Moritz Sanft
901edd420b
terraform: remove cloud loggers (#2892)
* terraform: remove cloud logging apps

* internal/cloud: remove loggers

* bootstrapper: remove logging

* qemu-metadata-api: remove logging endpoint

* docs: add instructions on how to get boot logs

* bazel: tidy

* docs: fix typo

* cloud: remove unused types

* Update go.mod

Co-authored-by: Daniel Weiße <66256922+daniel-weisse@users.noreply.github.com>

* bazel: tidy

* Update docs/docs/workflows/troubleshooting.md

Co-authored-by: Thomas Tendyck <51411342+thomasten@users.noreply.github.com>

* Update docs/docs/workflows/troubleshooting.md

Co-authored-by: Thomas Tendyck <51411342+thomasten@users.noreply.github.com>

* Update docs/docs/workflows/troubleshooting.md

Co-authored-by: Thomas Tendyck <51411342+thomasten@users.noreply.github.com>

* docs: elaborate on how to get boot logs

* bazel: tidy

---------

Co-authored-by: Daniel Weiße <66256922+daniel-weisse@users.noreply.github.com>
Co-authored-by: Thomas Tendyck <51411342+thomasten@users.noreply.github.com>
2024-02-06 14:27:30 +01:00
Malte Poll
3a5753045e goleak: ignore rules_go SIGTERM handler
rules_go added a SIGTERM handler that has a goroutine that survives the scope of the goleak check.
Currently, the best known workaround is to ignore this goroutine.

https://github.com/uber-go/goleak/issues/119
https://github.com/bazelbuild/rules_go/pull/3749
https://github.com/bazelbuild/rules_go/pull/3827#issuecomment-1894002120
2024-01-22 13:11:58 +01:00
Markus Rudy
ef6f63dc48
Fix various small things throughout the codebase (#2800)
* bootstrapper: remove obsolete log statement

* ci: simplify variable usage

Co-authored-by: Daniel Weiße <daniel-weisse@users.noreply.github.com>

* cli: add missing formatting directive

* helm: fix rm invocation

* ci: document reproducible-builds workflow

* constants: use variables for measurement files

* constants: use variables for CDN distribution ID

* ci: make Helm version explicit

* api: prettify versionsapi-list output

* ci: remove obsolete docstring

---------

Co-authored-by: Daniel Weiße <daniel-weisse@users.noreply.github.com>
2024-01-09 19:37:56 +01:00
Markus Rudy
ce9e25c150 bootstrapper: pass patches to kubeadm 2023-12-18 14:17:35 +01:00
Markus Rudy
a1dbd13f95 versions: consolidate various types of Components
There used to be three definitions of a Component type, and conversion
routines between the three. Since the use case is always the same, and
the Component semantics are defined by versions.go and the installer, it
seems appropriate to define the Component type there and import it in
the necessary places.
2023-12-11 14:26:54 +01:00
3u13r
63cdd03d09
Make Kubernetes serviceCIDR configurable in config (#2660)
* config: pass serviceCIDR to kubeadm init

* terraform: add serviceCIDR
2023-12-01 14:39:05 +01:00
Malte Poll
ee3ff9ac01 bazel: use patched RPATH in bootstrapper and disk-mapper binaries 2023-12-01 09:35:33 +01:00
Leonard Cohnen
cb88c7a5f3 kubernetes: remove unused struct 2023-11-15 19:27:33 +01:00
Leonard Cohnen
cfcc0898b2 helm: remove konnectivity from control-planes
This is the first step in our migration off of
konnectivity. Before node-to-node encryption
we used konnectivity to route some KubeAPI
to kubelet traffic over the pod network which then
would be encrypted.

Since we enabled node-to-node encryption this has no
security upsides anymore. Note that we still deploy
the konnectivity agents via helm and still have the
load balancer for konnectivity.

In the following releases we will remove both.
2023-11-15 19:27:33 +01:00
Leonard Cohnen
79f562374a bootstrapper: remove cilium restart fix
Tests concluded that restarting the Cilium agent after the
first boot is not needed anymore to regain connectivity for
pods.
2023-11-15 19:27:33 +01:00
Leonard Cohnen
aae85f0c3c kubernetes: always use lb for joining
The token given out by control-planes contains the node IP
as an endpoint. Since during this stage the joining node is
not connected to the WireGuard network, we cannot
communicate node-to-node. Therefore, we need to hop over the
load balancer again to have a src IP outside of the strict
range.
2023-11-15 19:27:33 +01:00
Malte Poll
bf06a014a4 bootstrapper: ignore "journalctl" not in $PATH in constructor
In unit tests, NewCollector may be called on systems that do not have
"journalctl" in $PATH.
We can defer checking if the command can work by not checking cmd.Err in
the constructor.
2023-11-10 18:15:59 +01:00
3u13r
6c0a3b8efa
fix joining over lb (#2478) 2023-10-19 16:28:07 +02:00
3u13r
0c89f57ac5
Support internal load balancers (#2388)
* arch: support internal lb on Azure

* arch: support internal lb on GCP

* helm: remove lb svc from verify deployment

* arch: support internal lb on AWS

* terraform: add jump hosts for internal lb

* cli: expose internalLoadBalancer in config

* ci: add e2e-manual-internal

* add in-cluster endpoint to terraform output
2023-10-17 15:46:15 +02:00
Malte Poll
e4ed24ee4f image: fix bootstrapper install path 2023-10-10 10:33:54 +02:00
Malte Poll
200fc79e0c bootstrapper: package as tar 2023-09-27 17:58:19 +02:00
3u13r
2776e40df7
join: join over lb if available (#2348)
* join: join over lb if available
2023-09-25 10:23:35 +02:00
Daniel Weiße
afa7fd0edb
cli: refactor kubernetes package (#2232)
* Clean up CLI kubernetes package

* Rename CLI kubernetes pkg to kubecmd

* Unify kubernetes clients

* Refactor attestation config upgrade

* Update CODEOWNERS file

* Remove outdated GetMeasurementSalt

---------

Signed-off-by: Daniel Weiße <dw@edgeless.systems>
2023-08-21 16:15:32 +02:00
Malte Poll
1f12541a36 bazel: allow "bazel test" to work without cgo dependencies 2023-08-18 16:36:13 +02:00
Adrian Stobbe
656cdbb4bb
remove unused CloudServiceAccountUri from init request (#2182) 2023-08-09 14:16:45 +02:00
Daniel Weiße
8dbe79500f
cli: fix incorrect usage of masterSecret salt for clusterID generation (#2169)
* Fix incorrect use of masterSecret salt for clusterID generation

Signed-off-by: Daniel Weiße <dw@edgeless.systems>

---------

Signed-off-by: Daniel Weiße <dw@edgeless.systems>
Co-authored-by: Leonard Cohnen <lc@edgeless.systems>
2023-08-07 15:24:46 +02:00
3u13r
ee0adfe8c7
kubernetes: document total log size (#2164) 2023-08-04 18:17:36 +02:00
Adrian Stobbe
13eea1ca31
cli: install cilium in cli instead of bootstrapper (#2146)
* add wait and restartDS

* cilium working (tested on azure + gcp)

* clean helm code from bootstrapper

* fixup! clean helm code from bootstrapper

* fixup! clean helm code from bootstrapper

* fixup! clean helm code from bootstrapper

* add patchnode for gcp

* fix gcp

* patch node inside bootstrapper

* apply renaming of client

* fixup! apply renaming of client

* otto feedback
2023-08-02 15:49:40 +02:00
Adrian Stobbe
26305e8f80
cli: install helm charts in cli instead of bootstrapper (#2136)
* init

* fixup! init

* gcp working?

* fixup! fixup! init

* azure cfg for microService installation

* fixup! azure cfg for microService installation

* fixup! azure cfg for microService installation

* cleanup bootstrapper code

* cleanup helminstall code

* fixup! cleanup helminstall code

* Update internal/deploy/helm/install.go

Co-authored-by: Daniel Weiße <66256922+daniel-weisse@users.noreply.github.com>

* daniel feedback

* TODO add provider (also to CreateCluster) so we can ensure that provider specific output

* fixup! daniel feedback

* use debugLog in helm installer

* placeholderHelmInstaller

* rename to stub

---------

Co-authored-by: Daniel Weiße <66256922+daniel-weisse@users.noreply.github.com>
2023-07-31 10:53:05 +02:00
Otto Bittner
1d5a8283e0
cli: use Semver type to represent microservice versions (#2125)
Previously we used strings to pass microservice versions. This invited
bugs due to missing input validation.
2023-07-25 14:20:25 +02:00
Daniel Weiße
7152633255
bootstrapper: refactor coredns and cilium setup (#2129)
* Decouple CoreDNS installation from Cilium

* Align cilium helm installation with other charts

* Remove unused functions
---------

Signed-off-by: Daniel Weiße <dw@edgeless.systems>
2023-07-25 09:57:35 +02:00
Adrian Stobbe
a87b7894db
aws: use new LB controller to fix SecurityGroup cleanup on K8s service deletion (#2090)
* add current chart

add current helm chart

* disable service controller for aws ccm

* add new iam roles

* doc AWS internet LB + add to LB test

* pass clusterName to helm for AWS LB

* fix update-aws-lb chart to also include .helmignore

* move chart outside services

* working state

* add subnet tags for AWS subnet discovery

* fix .helmignore load rule with file in subdirectory

* upgrade iam profile

* revert new loader impl since cilium is not correctly loaded

* install chart if not already present during `upgrade apply`

* cleanup PR + fix build + add todos

cleanup PR + add todos

* shared helm pkg for cli install and bootstrapper

* add link to eks docs

* refactor iamMigrationCmd

* delete unused helm.symwallk

* move iammigrate to upgrade pkg

* fixup! delete unused helm.symwallk

* add to upgradecheck

* remove nodeSelector from go code (Otto)

* update iam docs and sort permission + remove duplicate roles

* fix bug in `upgrade check`

* better upgrade check output when svc version upgrade not possible

* pr feedback

* remove force flag in upgrade_test

* use upgrader.GetUpgradeID instead of extra type

* remove todos + fix check

* update doc lb (leo)

* remove bootstrapper helm package

* Update cli/internal/cmd/upgradecheck.go

Co-authored-by: Daniel Weiße <66256922+daniel-weisse@users.noreply.github.com>

* final nits

* add docs for e2e upgrade test setup

* Apply suggestions from code review

Co-authored-by: Daniel Weiße <66256922+daniel-weisse@users.noreply.github.com>

* Update cli/internal/helm/loader.go

Co-authored-by: Daniel Weiße <66256922+daniel-weisse@users.noreply.github.com>

* Update cli/internal/cmd/tfmigrationclient.go

Co-authored-by: Daniel Weiße <66256922+daniel-weisse@users.noreply.github.com>

* fix daniel review

* link to the iam permissions instead of manually updating them (agreed with leo)

* disable iam upgrade in upgrade apply

---------

Co-authored-by: Daniel Weiße <66256922+daniel-weisse@users.noreply.github.com>
Co-authored-by: Malte Poll
2023-07-24 10:30:53 +02:00
Malte Poll
8da6a23aa5
bootstrapper: add fallback endpoint and custom endpoint to SAN field (#2108)
terraform: collect apiserver cert SANs and support custom endpoint

constants: add new constants for cluster configuration and custom endpoint

cloud: support apiserver cert sans and prepare for endpoint migration on AWS

config: add customEndpoint field

bootstrapper: use per-CSP apiserver cert SANs

cli: route customEndpoint to terraform and add migration for apiserver cert SANs

bootstrapper: change interface of GetLoadBalancerEndpoint to return host and port separately
2023-07-21 16:43:51 +02:00
Daniel Weiße
ea5c83587c Move CSI charts to separate chart and cleanup loader code
Signed-off-by: Daniel Weiße <dw@edgeless.systems>
2023-07-20 15:47:12 +02:00
Daniel Weiße
8619a90149 Deploy CSI snapshotter on init
Signed-off-by: Daniel Weiße <dw@edgeless.systems>
2023-07-20 15:47:12 +02:00
Daniel Weiße
6a40c73ff7
disk-mapper: set LUKS2 token to allow reusing uninitialized state disks (#2083)
Signed-off-by: Daniel Weiße <dw@edgeless.systems>
2023-07-18 16:20:03 +02:00