* workflows: add error handling to e2e windows liveness probe
* update retry condition in last iteration
* Update liveness probe to check for correct number of nodes
* ci: fix Windows e2e test not pushing required container images (#3021)
* More output when waiting for nodes to get ready
* Create unique resource group name for Windows e2e test
* Push container images on windows CLI build to fix e2e test
---------
Signed-off-by: Daniel Weiße <dw@edgeless.systems>
* Fix resource group naming
Signed-off-by: Daniel Weiße <dw@edgeless.systems>
---------
Signed-off-by: Daniel Weiße <dw@edgeless.systems>
Co-authored-by: Daniel Weiße <66256922+daniel-weisse@users.noreply.github.com>
Co-authored-by: Daniel Weiße <dw@edgeless.systems>
* Only delete namespace if its deletable
* For "default" namespace, delete all resources in that namespace
* For "kube-system" namespace, delete all PVCs in that namespace
* Don't abort terminate action if PVC deletion fails
---------
Signed-off-by: Daniel Weiße <dw@edgeless.systems>
* ci: make waiting for nodes more robust
After initializing the cluster, a lot of things happen in parallel and
are potentially getting in each others' way: nodes are joining,
daemonsets are proliferating, the network is being set up. During this
period, it's not unusual that the Kubernetes API server is unavailable
for a short time, e.g. due to etcd loosing quorum or load balancing
changes.
This period of instability has the potential to affect all kubectl
commands negatively, leading to problems especially for tests, where
command failures often lead to test failures. On the other hand, we'd
expect everything to be quite stable after the initial dust settles.
Therefore, this commit changes how we wait after initializing a cluster.
Until we have a reasonable expectation of readiness, we ignore command
failures and wait for things to stabilize. The cluster is considered
stable once all configured nodes and all API servers report ready.
* Let JoinClient return fatal errors
* Mark disk for wiping if JoinClient or InitServer return errors
* Reboot system if bootstrapper detects an error
* Refactor joinClient start/stop implementation
* Fix joining nodes retrying kubeadm 3 times in all cases
* Write non-recoverable failures to syslog before rebooting
---------
Signed-off-by: Daniel Weiße <dw@edgeless.systems>
The OpenStack credentials (username and password) can now be retrieved
from the "clouds.yaml" by the Constellation CLI and terraform code.
This simplifies the configuration for end-users.