docs: add observability page (#2384)

Co-authored-by: Moritz Sanft <58110325+msanft@users.noreply.github.com>
Co-authored-by: 3u13r <lc@edgeless.systems>
Co-authored-by: Thomas Tendyck <51411342+thomasten@users.noreply.github.com>
2025-08-13 17:25:32 -04:00 · 2023-10-04 09:37:46 +02:00 · 2023-10-04 09:37:46 +02:00 · 7c76592a08
commit 7c76592a08
parent e938cc5e63
7 changed files with 187 additions and 1 deletions
--- a/docs/docs/architecture/observability.md
+++ b/docs/docs/architecture/observability.md
@@ -0,0 +1,78 @@
 # Observability
 In Kubernetes, observability is the ability to gain insight into the behavior and performance of applications.
 It helps identify and resolve issues more effectively, ensuring stability and performance of Kubernetes workloads, reducing downtime and outages, and improving efficiency.
 The "three pillars of observability" are logs, metrics, and traces.
 In the context of Confidential Computing, observability is a delicate subject and needs to be applied such that it doesn't leak any sensitive information.
 The following gives an overview of where and how you can apply standard observability tools in Constellation.
 ## Cloud resource monitoring
 Although they're inaccessible from the outside, Constellation's nodes are still visible to the hypervisor as black-box VMs.
 Resource consumption, such as memory and CPU utilization, can be monitored from the outside and observed via the cloud platforms directly.
 Similarly, metrics for other resources, such as storage and network, are visible via the cloud platform.
 ## Metrics
 Metrics are numeric representations of data measured over intervals of time. They're essential for understanding system health and gaining insights using telemetry signals.
 By default, Constellation exposes the [metrics for Kubernetes system components](https://kubernetes.io/docs/concepts/cluster-administration/system-metrics/) inside the cluster.
 Similarly, the [etcd metrics](https://etcd.io/docs/v3.5/metrics/) endpoints are exposed inside the cluster.
 These [metrics endpoints can be disabled](https://kubernetes.io/docs/concepts/cluster-administration/system-metrics/#disabling-metrics).
 You can collect these cluster-internal metrics via tools such as [Prometheus](https://prometheus.io/) or the [Elastic Stack](https://www.elastic.co/de/elastic-stack/).
 Constellation's CNI Cilium also supports [metrics via Prometheus endpoints](https://docs.cilium.io/en/latest/observability/metrics/).
 However, in Constellation, they're disabled by default and must be enabled first.
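Kubernetes system components emit their metrics in the Prometheus text format. As a rough illustration of what a collector does with a scraped response, the following sketch parses a few sample lines into name, labels, and value. The sample text and its values are made up for this example, and the parser is deliberately simplified: it ignores timestamps and doesn't handle escaped characters or commas inside label values.

```python
# Illustrative sample only; the values below are not taken from a real cluster.
SAMPLE = """\
# HELP apiserver_request_total Illustrative help text.
# TYPE apiserver_request_total counter
apiserver_request_total{code="200",verb="GET"} 1027
process_cpu_seconds_total 12.5
"""


def parse_metrics(text):
    """Parse simple Prometheus text-format samples into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and HELP/TYPE comment lines.
        if not line or line.startswith("#"):
            continue
        # The value is the last space-separated field in this simplified form.
        series, _, value = line.rpartition(" ")
        name, labels = series, {}
        if "{" in series:
            name, _, rest = series.partition("{")
            for pair in rest.rstrip("}").split(","):
                if pair:
                    key, _, val = pair.partition("=")
                    labels[key.strip()] = val.strip().strip('"')
        samples.append((name, labels, float(value)))
    return samples


for name, labels, value in parse_metrics(SAMPLE):
    print(name, labels, value)
```

In practice, a tool such as Prometheus handles scraping and parsing for you; the sketch only shows the shape of the data that leaves the endpoint.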
 ## Logs
 Logs represent discrete events that usually describe what's happening with your service.
 The payload is an actual message emitted from your system along with a metadata section containing a timestamp, labels, and tracking identifiers.
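To make the structure described above concrete, here is a minimal sketch of a log record with a message payload and a metadata section. The field names (`message`, `metadata`, `trace_id`) and the example values are assumptions chosen for illustration, not a format that Constellation or Kubernetes prescribes.

```python
import json
from datetime import datetime, timezone


def make_log_record(message, labels, trace_id, now=None):
    """Build a log record: the message payload plus a metadata section."""
    now = now or datetime.now(timezone.utc)
    return {
        "message": message,
        "metadata": {
            "timestamp": now.isoformat(),
            "labels": dict(labels),
            "trace_id": trace_id,  # hypothetical tracking identifier
        },
    }


# Hypothetical event; a fixed timestamp keeps the example reproducible.
record = make_log_record(
    "volume mounted",
    {"namespace": "kube-system", "pod": "example-pod"},
    trace_id="4bf92f3577b34da6a3ce929d0e0e4736",
    now=datetime(2023, 10, 4, 9, 37, 46, tzinfo=timezone.utc),
)
print(json.dumps(record, indent=2))
```

When reviewing what leaves the cluster, the labels and identifiers in the metadata section deserve the same scrutiny as the message itself.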
 ### System logs
 Constellation uses cloud logging for events occurring during the early stages of a node's boot process.
 These logs include [Bootstrapper](./microservices.md#bootstrapper) events and [state disk UUIDs](../architecture/images.md#state-disk).
 You can access the cloud logging [directly via the cloud provider endpoints](../workflows/troubleshooting.md#cloud-logging).
 More detailed system-level logs are accessible via `/var/log` and [journald](https://www.freedesktop.org/software/systemd/man/systemd-journald.service.html) on the nodes directly.
 They can be collected from there, for example, via [Filebeat and Logstash](https://www.elastic.co/guide/en/beats/filebeat/current/logstash-output.html), which are tools of the [Elastic Stack](https://www.elastic.co/de/elastic-stack/).
 If an error occurs during initialization, the CLI automatically collects the [Bootstrapper](./microservices.md#bootstrapper) logs and returns them as a file for [troubleshooting](../workflows/troubleshooting.md). Here is an example of such an event:
 ```shell-session
 Cluster initialization failed. This error is not recoverable.
 Terminate your cluster and try again.
 Fetched bootstrapper logs are stored in "constellation-cluster.log"
 ```
 ### Kubernetes logs
 Constellation supports the [Kubernetes logging architecture](https://kubernetes.io/docs/concepts/cluster-administration/logging/).
 By default, logs are written to the nodes' encrypted state disks.
 These include the Pod and container logs and the [system component logs](https://kubernetes.io/docs/concepts/cluster-administration/logging/#system-component-logs).
 [Constellation services](microservices.md) run as Pods inside the `kube-system` namespace and use the standard container logging mechanism.
 The same applies to the [Cilium Pods](https://docs.cilium.io/en/latest/operations/troubleshooting/#logs).
 You can collect logs from within the cluster via tools such as [Fluentd](https://github.com/fluent/fluentd), [Loki](https://github.com/grafana/loki), or the [Elastic Stack](https://www.elastic.co/de/elastic-stack/).
 ## Traces
 Modern systems are often built as complex, interconnected, distributed microservices. Understanding request flows and communication between services is challenging, mainly because every system in a chain needs to be modified to propagate tracing information. Distributed tracing is an approach to increasing observability and understanding performance bottlenecks. A trace represents the consecutive events that make up an end-to-end request path in a distributed system.
 Constellation supports [traces for Kubernetes system components](https://kubernetes.io/docs/concepts/cluster-administration/system-traces/).
 By default, they're disabled and need to be enabled first.
 Similarly, Cilium can be configured to [export traces](https://cilium.io/use-cases/metrics-export/).
 You can collect these traces via tools such as [Jaeger](https://www.jaegertracing.io/) or [Zipkin](https://zipkin.io/).
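The need for every service in a chain to propagate tracing information can be sketched as follows. The example assumes the W3C Trace Context `traceparent` header format (`version-traceid-spanid-flags`), which this page doesn't mention; the helper names are hypothetical. Each hop keeps the trace ID and generates a new span ID, which is what lets a tracing backend stitch the hops into one end-to-end trace.

```python
import secrets


def new_traceparent():
    """Start a new trace: random 16-byte trace ID and 8-byte span ID, hex-encoded."""
    trace_id = secrets.token_hex(16)
    span_id = secrets.token_hex(8)
    return f"00-{trace_id}-{span_id}-01"  # version 00, sampled flag set


def child_traceparent(parent):
    """Derive the header a service passes downstream: same trace ID, new span ID."""
    version, trace_id, _parent_span_id, flags = parent.split("-")
    return f"{version}-{trace_id}-{secrets.token_hex(8)}-{flags}"


incoming = new_traceparent()            # created by the first service
outgoing = child_traceparent(incoming)  # forwarded by the next service
print(incoming)
print(outgoing)
```

If one service in the chain drops the header, the trace breaks at that hop, which is why tracing has to be enabled consistently across components.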
 ## Integrations
 Platforms such as Datadog, logz.io, Dynatrace, or New Relic address the observability challenge for Kubernetes with all-in-one SaaS solutions.
 They install agents into the cluster that collect metrics, logs, and tracing information and upload them into the data lake of the platform.
 Technically, the agent-based approach is compatible with Constellation, and attaching these platforms is straightforward.
 However, you need to evaluate whether uploading the exported data to a third-party platform might violate Constellation's compliance and privacy guarantees.
--- a/docs/docs/architecture/overview.md
+++ b/docs/docs/architecture/overview.md
@@ -22,3 +22,9 @@ You can learn more about [the images](images.md) and how verified boot ensures t
 ## About key management and cryptographic primitives
 Encryption of data at-rest, in-transit, and in-use is the fundamental building block for confidential computing and Constellation. Learn more about the [keys and cryptographic primitives](keys.md) used in Constellation, [encrypted persistent storage](encrypted-storage.md), and [network encryption](networking.md).
 ## About observability
 Observability in Kubernetes refers to the capability to swiftly troubleshoot issues using telemetry signals such as logs, metrics, and traces.
 In the realm of Confidential Computing, it's crucial that observability aligns with confidentiality, necessitating careful implementation.
 Learn more about the [observability capabilities in Constellation](./observability.md).
--- a/docs/sidebars.js
+++ b/docs/sidebars.js
@@ -254,6 +254,11 @@ const sidebars = {
          label: 'Networking',
          id: 'architecture/networking',
        },
        {
          type: 'doc',
          label: 'Observability',
          id: 'architecture/observability',
        },
      ],
    },
    {
--- a/docs/styles/Vocab/constellation/accept.txt
+++ b/docs/styles/Vocab/constellation/accept.txt
@@ -15,11 +15,15 @@ Bootstrapper
 config
 cyber
 datacenter
 Datadog
 deallocate
 Dockerfile
 Dynatrace
 [Ee]mojivoto
 etcd
 Filebeat
 Filestore
 Fluentd
 Fulcio
 Mbps
 Gbps
@@ -30,12 +34,14 @@ iam
 IAM
 iodepth
 initramfs
 journald
 [Kk]3s
 Kata
 kubeadm
 kubectl
 kubelet
 libcryptsetup
 Logstash
 MicroK8s
 [Mm]inikube
 namespace
@@ -45,11 +51,13 @@ Rekor
 resizable
 rollout
 sigstore
 [Ss]uperset
 Syft
 systemd
 [Uu]nencrypted
 unspoofable
 updatable
 UUID
 proxied
 QEMU
 virsh
@@ -58,4 +66,4 @@ whitepaper
 WireGuard
 Xeon
 xsltproc
-[Ss]uperset
+Zipkin
--- a/docs/versioned_docs/version-2.11/architecture/observability.md
+++ b/docs/versioned_docs/version-2.11/architecture/observability.md
@@ -0,0 +1,78 @@
 # Observability
 In Kubernetes, observability is the ability to gain insight into the behavior and performance of applications.
 It helps identify and resolve issues more effectively, ensuring stability and performance of Kubernetes workloads, reducing downtime and outages, and improving efficiency.
 The "three pillars of observability" are logs, metrics, and traces.
 In the context of Confidential Computing, observability is a delicate subject and needs to be applied such that it doesn't leak any sensitive information.
 The following gives an overview of where and how you can apply standard observability tools in Constellation.
 ## Cloud resource monitoring
 Although they're inaccessible from the outside, Constellation's nodes are still visible to the hypervisor as black-box VMs.
 Resource consumption, such as memory and CPU utilization, can be monitored from the outside and observed via the cloud platforms directly.
 Similarly, metrics for other resources, such as storage and network, are visible via the cloud platform.
 ## Metrics
 Metrics are numeric representations of data measured over intervals of time. They're essential for understanding system health and gaining insights using telemetry signals.
 By default, Constellation exposes the [metrics for Kubernetes system components](https://kubernetes.io/docs/concepts/cluster-administration/system-metrics/) inside the cluster.
 Similarly, the [etcd metrics](https://etcd.io/docs/v3.5/metrics/) endpoints are exposed inside the cluster.
 These [metrics endpoints can be disabled](https://kubernetes.io/docs/concepts/cluster-administration/system-metrics/#disabling-metrics).
 You can collect these cluster-internal metrics via tools such as [Prometheus](https://prometheus.io/) or the [Elastic Stack](https://www.elastic.co/de/elastic-stack/).
 Constellation's CNI Cilium also supports [metrics via Prometheus endpoints](https://docs.cilium.io/en/latest/observability/metrics/).
 However, in Constellation, they're disabled by default and must be enabled first.
 ## Logs
 Logs represent discrete events that usually describe what's happening with your service.
 The payload is an actual message emitted from your system along with a metadata section containing a timestamp, labels, and tracking identifiers.
 ### System logs
 Constellation uses cloud logging for events occurring during the early stages of a node's boot process.
 These logs include [Bootstrapper](./microservices.md#bootstrapper) events and [state disk UUIDs](../architecture/images.md#state-disk).
 You can access the cloud logging [directly via the cloud provider endpoints](../workflows/troubleshooting.md#cloud-logging).
 More detailed system-level logs are accessible via `/var/log` and [journald](https://www.freedesktop.org/software/systemd/man/systemd-journald.service.html) on the nodes directly.
 They can be collected from there, for example, via [Filebeat and Logstash](https://www.elastic.co/guide/en/beats/filebeat/current/logstash-output.html), which are tools of the [Elastic Stack](https://www.elastic.co/de/elastic-stack/).
 If an error occurs during initialization, the CLI automatically collects the [Bootstrapper](./microservices.md#bootstrapper) logs and returns them as a file for [troubleshooting](../workflows/troubleshooting.md). Here is an example of such an event:
 ```shell-session
 Cluster initialization failed. This error is not recoverable.
 Terminate your cluster and try again.
 Fetched bootstrapper logs are stored in "constellation-cluster.log"
 ```
 ### Kubernetes logs
 Constellation supports the [Kubernetes logging architecture](https://kubernetes.io/docs/concepts/cluster-administration/logging/).
 By default, logs are written to the nodes' encrypted state disks.
 These include the Pod and container logs and the [system component logs](https://kubernetes.io/docs/concepts/cluster-administration/logging/#system-component-logs).
 [Constellation services](microservices.md) run as Pods inside the `kube-system` namespace and use the standard container logging mechanism.
 The same applies to the [Cilium Pods](https://docs.cilium.io/en/latest/operations/troubleshooting/#logs).
 You can collect logs from within the cluster via tools such as [Fluentd](https://github.com/fluent/fluentd), [Loki](https://github.com/grafana/loki), or the [Elastic Stack](https://www.elastic.co/de/elastic-stack/).
 ## Traces
 Modern systems are often built as complex, interconnected, distributed microservices. Understanding request flows and communication between services is challenging, mainly because every system in a chain needs to be modified to propagate tracing information. Distributed tracing is an approach to increasing observability and understanding performance bottlenecks. A trace represents the consecutive events that make up an end-to-end request path in a distributed system.
 Constellation supports [traces for Kubernetes system components](https://kubernetes.io/docs/concepts/cluster-administration/system-traces/).
 By default, they're disabled and need to be enabled first.
 Similarly, Cilium can be configured to [export traces](https://cilium.io/use-cases/metrics-export/).
 You can collect these traces via tools such as [Jaeger](https://www.jaegertracing.io/) or [Zipkin](https://zipkin.io/).
 ## Integrations
 Platforms such as Datadog, logz.io, Dynatrace, or New Relic address the observability challenge for Kubernetes with all-in-one SaaS solutions.
 They install agents into the cluster that collect metrics, logs, and tracing information and upload them into the data lake of the platform.
 Technically, the agent-based approach is compatible with Constellation, and attaching these platforms is straightforward.
 However, you need to evaluate whether uploading the exported data to a third-party platform might violate Constellation's compliance and privacy guarantees.
--- a/docs/versioned_docs/version-2.11/architecture/overview.md
+++ b/docs/versioned_docs/version-2.11/architecture/overview.md
@@ -22,3 +22,9 @@ You can learn more about [the images](images.md) and how verified boot ensures t
 ## About key management and cryptographic primitives
 Encryption of data at-rest, in-transit, and in-use is the fundamental building block for confidential computing and Constellation. Learn more about the [keys and cryptographic primitives](keys.md) used in Constellation, [encrypted persistent storage](encrypted-storage.md), and [network encryption](networking.md).
 ## About observability
 Observability in Kubernetes refers to the capability to swiftly troubleshoot issues using telemetry signals such as logs, metrics, and traces.
 In the realm of Confidential Computing, it's crucial that observability aligns with confidentiality, necessitating careful implementation.
 Learn more about the [observability capabilities in Constellation](./observability.md).
--- a/docs/versioned_sidebars/version-2.11-sidebars.json
+++ b/docs/versioned_sidebars/version-2.11-sidebars.json
@@ -233,6 +233,11 @@
          "type": "doc",
          "label": "Networking",
          "id": "architecture/networking"
        },
        {
          "type": "doc",
          "label": "Observability",
          "id": "architecture/observability"
        }
      ]
    },