mirror of
https://github.com/edgelesssys/constellation.git
synced 2025-01-11 07:29:29 -05:00
docs: explain recovery steps on STACKIT
This commit is contained in:
parent
220f292181
commit
63b9761962
@ -118,6 +118,37 @@ If this fails due to an unhealthy control plane, you will see log messages simil
|
||||
|
||||
This means that you have to recover the node manually.
|
||||
|
||||
</tabItem>
|
||||
<tabItem value="stackit" label="STACKIT">
|
||||
|
||||
First, open the STACKIT portal to view all servers in your project. Select individual control plane nodes `<cluster-name>-<UID>-control-plane-<UID>-<index>` and check that enough members are in a *Running* state.
|
||||
|
||||
Second, check the boot logs of these servers. Click on a server name and select **Overview**. Find the **Machine Setup** section and click on **Web console** > **Open console**.
|
||||
|
||||
In the serial console output, search for `Waiting for decryption key`.
|
||||
Similar output to the following means your node was restarted and needs to decrypt the [state disk](../architecture/images.md#state-disk):
|
||||
|
||||
```json
|
||||
{"level":"INFO","ts":"2022-09-08T10:21:53Z","caller":"cmd/main.go:55","msg":"Starting disk-mapper","version":"2.0.0","cloudProvider":"gcp"}
|
||||
{"level":"INFO","ts":"2022-09-08T10:21:53Z","logger":"setupManager","caller":"setup/setup.go:72","msg":"Preparing existing state disk"}
|
||||
{"level":"INFO","ts":"2022-09-08T10:21:53Z","logger":"rejoinClient","caller":"rejoinclient/client.go:65","msg":"Starting RejoinClient"}
|
||||
{"level":"INFO","ts":"2022-09-08T10:21:53Z","logger":"recoveryServer","caller":"recoveryserver/server.go:59","msg":"Starting RecoveryServer"}
|
||||
```
|
||||
|
||||
The node will then try to connect to the [*JoinService*](../architecture/microservices.md#joinservice) and obtain the decryption key.
|
||||
If this fails due to an unhealthy control plane, you will see log messages similar to the following:
|
||||
|
||||
```json
|
||||
{"level":"INFO","ts":"2022-09-08T10:21:53Z","logger":"rejoinClient","caller":"rejoinclient/client.go:77","msg":"Received list with JoinService endpoints","endpoints":["192.168.178.4:30090","192.168.178.2:30090"]}
|
||||
{"level":"INFO","ts":"2022-09-08T10:21:53Z","logger":"rejoinClient","caller":"rejoinclient/client.go:96","msg":"Requesting rejoin ticket","endpoint":"192.168.178.4:30090"}
|
||||
{"level":"WARN","ts":"2022-09-08T10:21:53Z","logger":"rejoinClient","caller":"rejoinclient/client.go:101","msg":"Failed to rejoin on endpoint","error":"rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 192.168.178.4:30090: connect: connection refused\"","endpoint":"192.168.178.4:30090"}
|
||||
{"level":"INFO","ts":"2022-09-08T10:21:53Z","logger":"rejoinClient","caller":"rejoinclient/client.go:96","msg":"Requesting rejoin ticket","endpoint":"192.168.178.2:30090"}
|
||||
{"level":"WARN","ts":"2022-09-08T10:22:13Z","logger":"rejoinClient","caller":"rejoinclient/client.go:101","msg":"Failed to rejoin on endpoint","error":"rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 192.168.178.2:30090: i/o timeout\"","endpoint":"192.168.178.2:30090"}
|
||||
{"level":"ERROR","ts":"2022-09-08T10:22:13Z","logger":"rejoinClient","caller":"rejoinclient/client.go:110","msg":"Failed to rejoin on all endpoints"}
|
||||
```
|
||||
|
||||
This means that you have to recover the node manually.
|
||||
|
||||
</tabItem>
|
||||
</tabs>
|
||||
|
||||
|
@ -118,6 +118,37 @@ If this fails due to an unhealthy control plane, you will see log messages simil
|
||||
|
||||
This means that you have to recover the node manually.
|
||||
|
||||
</tabItem>
|
||||
<tabItem value="stackit" label="STACKIT">
|
||||
|
||||
First, open the STACKIT portal to view all servers in your project. Select individual control plane nodes `<cluster-name>-<UID>-control-plane-<UID>-<index>` and check that enough members are in a *Running* state.
|
||||
|
||||
Second, check the boot logs of these *Servers*. Click on a server name and select **Overview**. Find the **Machine Setup** section and click on **Web console** > **Open console**.
|
||||
|
||||
In the serial console output, search for `Waiting for decryption key`.
|
||||
Similar output to the following means your node was restarted and needs to decrypt the [state disk](../architecture/images.md#state-disk):
|
||||
|
||||
```json
|
||||
{"level":"INFO","ts":"2022-09-08T10:21:53Z","caller":"cmd/main.go:55","msg":"Starting disk-mapper","version":"2.0.0","cloudProvider":"gcp"}
|
||||
{"level":"INFO","ts":"2022-09-08T10:21:53Z","logger":"setupManager","caller":"setup/setup.go:72","msg":"Preparing existing state disk"}
|
||||
{"level":"INFO","ts":"2022-09-08T10:21:53Z","logger":"rejoinClient","caller":"rejoinclient/client.go:65","msg":"Starting RejoinClient"}
|
||||
{"level":"INFO","ts":"2022-09-08T10:21:53Z","logger":"recoveryServer","caller":"recoveryserver/server.go:59","msg":"Starting RecoveryServer"}
|
||||
```
|
||||
|
||||
The node will then try to connect to the [*JoinService*](../architecture/microservices.md#joinservice) and obtain the decryption key.
|
||||
If this fails due to an unhealthy control plane, you will see log messages similar to the following:
|
||||
|
||||
```json
|
||||
{"level":"INFO","ts":"2022-09-08T10:21:53Z","logger":"rejoinClient","caller":"rejoinclient/client.go:77","msg":"Received list with JoinService endpoints","endpoints":["192.168.178.4:30090","192.168.178.2:30090"]}
|
||||
{"level":"INFO","ts":"2022-09-08T10:21:53Z","logger":"rejoinClient","caller":"rejoinclient/client.go:96","msg":"Requesting rejoin ticket","endpoint":"192.168.178.4:30090"}
|
||||
{"level":"WARN","ts":"2022-09-08T10:21:53Z","logger":"rejoinClient","caller":"rejoinclient/client.go:101","msg":"Failed to rejoin on endpoint","error":"rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 192.168.178.4:30090: connect: connection refused\"","endpoint":"192.168.178.4:30090"}
|
||||
{"level":"INFO","ts":"2022-09-08T10:21:53Z","logger":"rejoinClient","caller":"rejoinclient/client.go:96","msg":"Requesting rejoin ticket","endpoint":"192.168.178.2:30090"}
|
||||
{"level":"WARN","ts":"2022-09-08T10:22:13Z","logger":"rejoinClient","caller":"rejoinclient/client.go:101","msg":"Failed to rejoin on endpoint","error":"rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 192.168.178.2:30090: i/o timeout\"","endpoint":"192.168.178.2:30090"}
|
||||
{"level":"ERROR","ts":"2022-09-08T10:22:13Z","logger":"rejoinClient","caller":"rejoinclient/client.go:110","msg":"Failed to rejoin on all endpoints"}
|
||||
```
|
||||
|
||||
This means that you have to recover the node manually.
|
||||
|
||||
</tabItem>
|
||||
</tabs>
|
||||
|
||||
|
Loading…
Reference in New Issue
Block a user