constellation/docs/versioned_docs/version-2.9/workflows/troubleshooting.md
2024-08-23 22:45:37 +02:00

Troubleshooting

This section helps you diagnose problems when working with Constellation.

Common issues

Issues with creating new clusters

When you create a new cluster, you should always use the latest release. If something doesn't work, check out the known issues.

Azure: Resource Providers can't be registered

On Azure, you may receive the following error when running create or terminate with limited IAM permissions:

Error: Error ensuring Resource Providers are registered.

Terraform automatically attempts to register the Resource Providers it supports to
ensure it's able to provision resources.

If you don't have permission to register Resource Providers you may wish to use the
"skip_provider_registration" flag in the Provider block to disable this functionality.

[...]

To continue, please ensure that the required resource providers have been registered in your subscription by your administrator.

Afterward, set ARM_SKIP_PROVIDER_REGISTRATION=true as an environment variable and run create or terminate again. For example:

ARM_SKIP_PROVIDER_REGISTRATION=true constellation create --control-plane-nodes 1 --worker-nodes 2 -y

Or alternatively, for terminate:

ARM_SKIP_PROVIDER_REGISTRATION=true constellation terminate
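
If you expect to run several commands in the same session, exporting the variable once avoids repeating the prefix. A minimal sketch, assuming a POSIX-compatible shell:

```shell
# Export the variable for the current shell session so that every
# subsequent constellation create or terminate call inherits it.
export ARM_SKIP_PROVIDER_REGISTRATION=true

# Confirm that child processes will see the setting.
env | grep '^ARM_SKIP_PROVIDER_REGISTRATION='
```

The setting lasts until the shell exits or until you run unset ARM_SKIP_PROVIDER_REGISTRATION.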

Nodes fail to join with error untrusted PCR value

This error indicates that a node's attestation statement contains measurements that don't match the trusted values expected by the JoinService. This can happen, for example, when the cloud provider updates the VM's firmware in a way that changes the runtime measurements unexpectedly. You can change the expected measurements to resolve the failure.

:::caution

Attestation and trusted measurements are crucial for the security of your cluster. Be extra careful when manually changing these settings. When in doubt, check if the encountered issue is known or contact support.

:::

You can use the upgrade apply command to change measurements of a running cluster:

  1. Modify the measurements key in your local constellation-conf.yaml to the expected values.
  2. Run constellation upgrade apply.

Keep in mind that running upgrade apply also applies any version changes from your config to the cluster.
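
Because upgrade apply picks up every change in the config, it can help to review exactly what differs before applying. The following is a minimal sketch under stated assumptions: backup_config and review_and_apply are hypothetical helper names, not part of the Constellation CLI, and the .bak suffix is an arbitrary choice.

```shell
# Hypothetical helper, not part of the Constellation CLI.
# Saves a copy of the config before you edit the measurements key.
backup_config() {
  conf="${1:-constellation-conf.yaml}"
  cp "$conf" "$conf.bak" && echo "Saved $conf.bak"
}

# Hypothetical helper: shows what changed since the backup, then applies it.
review_and_apply() {
  conf="${1:-constellation-conf.yaml}"
  # diff exits non-zero when the files differ, which is the expected case here.
  diff -u "$conf.bak" "$conf" || true
  constellation upgrade apply
}
```

Run backup_config first, edit the measurements key, then run review_and_apply. If the diff shows version fields you didn't intend to change, abort before confirming the upgrade.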

You can run these commands to learn about the versions currently configured in the cluster:

  • Kubernetes API server version: kubectl get nodeversion constellation-version -o json -n kube-system | jq .spec.kubernetesClusterVersion
  • image version: kubectl get nodeversion constellation-version -o json -n kube-system | jq .spec.imageVersion
  • microservices versions: helm list --filter 'constellation-services' -n kube-system
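
The three lookups above can be combined into one helper. This is a sketch, assuming kubectl and helm are installed and point at the cluster; show_versions is a hypothetical name, and it uses kubectl's built-in jsonpath output in place of jq so that no extra tool is needed.

```shell
# Hypothetical helper that prints all configured versions at once.
# Uses kubectl's jsonpath output instead of piping JSON through jq.
show_versions() {
  printf 'Kubernetes: %s\n' "$(kubectl get nodeversion constellation-version \
    -n kube-system -o jsonpath='{.spec.kubernetesClusterVersion}')"
  printf 'Image: %s\n' "$(kubectl get nodeversion constellation-version \
    -n kube-system -o jsonpath='{.spec.imageVersion}')"
  printf 'Microservices:\n'
  helm list --filter 'constellation-services' -n kube-system
}
```

Call show_versions before and after upgrade apply to confirm that only the intended values changed.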

Diagnosing issues

Cloud logging

To provide information during early stages of a node's boot process, Constellation logs messages to the log systems of the cloud providers. Since these offerings aren't confidential, only generic information without any sensitive values is stored. This provides administrators with a high-level understanding of the current state of a node.

You can view this information in the following places:

Azure

  1. In your Azure subscription, find the Constellation resource group.
  2. Inside the resource group, find the Application Insights resource called constellation-insights-*.
  3. On the left-hand side, go to Logs, which is located in the section Monitoring.
    • Close the Queries page if it pops up.
  4. In the query text field, type in traces and click Run.

To find the disk UUIDs use the following query: traces | where message contains "Disk UUID"
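
The same query can likely be run from a terminal. The sketch below assumes the Azure CLI is installed with its application-insights extension and that you are logged in; azure_disk_uuids is a hypothetical helper name, and you need to pass the full name of your constellation-insights-* resource yourself.

```shell
# Hypothetical helper.
# $1: resource group of the cluster
# $2: full name of the Application Insights resource (constellation-insights-*)
azure_disk_uuids() {
  az monitor app-insights query \
    --resource-group "$1" \
    --app "$2" \
    --analytics-query 'traces | where message contains "Disk UUID"'
}
```

The query string is passed in single quotes so that the shell leaves the inner double quotes untouched.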

GCP

  1. Select the project that hosts Constellation.
  2. Go to the Compute Engine service.
  3. On the right-hand side of a VM entry, select More Actions (a stacked ellipsis).
    • Select View logs.

To find the disk UUIDs use the following query: resource.type="gce_instance" text_payload=~"Disk UUID:.*\n" logName=~".*/constellation-boot-log"

:::info

Constellation uses the default bucket to store logs. Its default retention period is 30 days.

:::
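
The disk UUID query for GCP can also be issued with the gcloud CLI. This is a sketch, assuming gcloud is installed and authenticated; gcp_disk_uuids is a hypothetical helper name and the limit of 20 entries is an arbitrary choice.

```shell
# Hypothetical helper. $1: ID of the project that hosts Constellation.
# The filter is the same one used in the Logs Explorer query above.
gcp_disk_uuids() {
  gcloud logging read \
    'resource.type="gce_instance" text_payload=~"Disk UUID:.*\n" logName=~".*/constellation-boot-log"' \
    --project "$1" \
    --limit 20
}
```

Note that gcloud logging read only looks back a limited time window by default; if a node booted a while ago, you may need to widen it with the --freshness flag.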

AWS

  1. Open AWS CloudWatch.
  2. Select Log Groups.
  3. Select the log group that matches the name of your cluster.
  4. Select the log stream for control or worker type nodes.
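
The CloudWatch steps can be approximated with the AWS CLI. This is a sketch, assuming the aws CLI is installed and configured for the right account and region; aws_list_streams and aws_show_stream are hypothetical helper names.

```shell
# Hypothetical helper. $1: name of the cluster, which is also the log group name.
# Lists the names of all log streams in that group.
aws_list_streams() {
  aws logs describe-log-streams \
    --log-group-name "$1" \
    --query 'logStreams[].logStreamName' \
    --output text
}

# Hypothetical helper. $1: log group (cluster name), $2: log stream name.
# Prints the messages stored in one stream.
aws_show_stream() {
  aws logs get-log-events \
    --log-group-name "$1" \
    --log-stream-name "$2" \
    --query 'events[].message' \
    --output text
}
```

Run aws_list_streams with your cluster name first, then pass one of the returned stream names to aws_show_stream.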

Node shell access

Debugging via a shell on a node is directly supported by Kubernetes.

  1. Figure out which node to connect to:

    kubectl get nodes
    # or to see more information, such as IPs:
    kubectl get nodes -o wide
    
  2. Connect to the node:

    kubectl debug node/constell-worker-xksa0-000000 -it --image=busybox
    

    You will be presented with a prompt.

    The node's file system is mounted at /host.

  3. Once finished, clean up the debug pod:

    kubectl delete pod node-debugger-constell-worker-xksa0-000000-bjthj
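
The debug pod's name ends in a random suffix, so it can be convenient to look it up by node name instead of typing it. The following is a sketch, assuming the debug pod was created in your current namespace; cleanup_debug_pods is a hypothetical helper name.

```shell
# Hypothetical helper. $1: name of the node you debugged.
# Finds every pod named node-debugger-<node>-<suffix> and deletes it.
cleanup_debug_pods() {
  node="$1"
  kubectl get pods -o name |
    grep "^pod/node-debugger-$node-" |
    while read -r pod; do
      kubectl delete "$pod"
    done
}
```

For the example above, cleanup_debug_pods constell-worker-xksa0-000000 removes the debug pod created for that node and leaves all other pods alone.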