mirror of
https://github.com/edgelesssys/constellation.git
synced 2024-10-01 01:36:09 -04:00
docs: add observability page (#2384)

Co-authored-by: Moritz Sanft <58110325+msanft@users.noreply.github.com>
Co-authored-by: 3u13r <lc@edgeless.systems>
Co-authored-by: Thomas Tendyck <51411342+thomasten@users.noreply.github.com>

parent e938cc5e63
commit 7c76592a08
78
docs/docs/architecture/observability.md
Normal file
@@ -0,0 +1,78 @@
# Observability

In Kubernetes, observability is the ability to gain insight into the behavior and performance of applications.
It helps identify and resolve issues more effectively, ensuring stability and performance of Kubernetes workloads, reducing downtime and outages, and improving efficiency.
The "three pillars of observability" are logs, metrics, and traces.

In the context of Confidential Computing, observability is a delicate subject and must be applied in a way that doesn't leak sensitive information.
The following sections give an overview of where and how you can apply standard observability tools in Constellation.

## Cloud resource monitoring

Although their contents are inaccessible, Constellation's nodes are still visible to the hypervisor as black-box VMs.
Resource consumption, such as memory and CPU utilization, can be monitored from the outside and observed directly via the cloud platform.
Similarly, other resources, such as storage and network, and their respective metrics are visible via the cloud platform.

## Metrics

Metrics are numeric representations of data measured over intervals of time. They're essential for understanding system health and gaining insights using telemetry signals.

By default, Constellation exposes the [metrics for Kubernetes system components](https://kubernetes.io/docs/concepts/cluster-administration/system-metrics/) inside the cluster.
Similarly, the [etcd metrics](https://etcd.io/docs/v3.5/metrics/) endpoints are exposed inside the cluster.
These [metrics endpoints can be disabled](https://kubernetes.io/docs/concepts/cluster-administration/system-metrics/#disabling-metrics).

You can collect these cluster-internal metrics via tools such as [Prometheus](https://prometheus.io/) or the [Elastic Stack](https://www.elastic.co/de/elastic-stack/).

Constellation's CNI, Cilium, also supports [metrics via Prometheus endpoints](https://docs.cilium.io/en/latest/observability/metrics/).
However, in Constellation, they're disabled by default and must be enabled first.

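The metrics endpoints mentioned above serve samples in the Prometheus text exposition format. The following is a minimal sketch of how a collector might read such a payload. The sample text and the `parse_metrics` helper are illustrative assumptions, not part of Constellation, and the parser ignores details such as escaped characters inside label values.

```python
import re

# Hypothetical payload in the Prometheus text exposition format.
SAMPLE = '''\
# HELP apiserver_request_total Counter of apiserver requests.
# TYPE apiserver_request_total counter
apiserver_request_total{verb="GET",code="200"} 1027
apiserver_request_total{verb="POST",code="201"} 3
process_cpu_seconds_total 12.5
'''

# metric name, optional {labels}, value, optional timestamp
_LINE = re.compile(
    r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+-?\d+)?$'
)
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Yield (name, labels, value) for every sample line in the payload."""
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and HELP/TYPE comments.
        if not line or line.startswith('#'):
            continue
        match = _LINE.match(line)
        if match is None:
            continue  # Not a sample line this simplified parser understands.
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL.findall(raw_labels or ''))
        yield name, labels, float(value)


if __name__ == '__main__':
    for name, labels, value in parse_metrics(SAMPLE):
        print(name, labels, value)
```

In practice, a Prometheus server or agent does this scraping for you. The sketch only illustrates what kind of data leaves the endpoint, which is useful when deciding what may be exported from a confidential cluster.
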
## Logs

Logs represent discrete events that usually describe what's happening with your service.
The payload is an actual message emitted from your system along with a metadata section containing a timestamp, labels, and tracking identifiers.

### System logs

Constellation uses cloud logging for events occurring during the early stages of a node's boot process.
These logs include [Bootstrapper](./microservices.md#bootstrapper) events and [state disk UUIDs](../architecture/images.md#state-disk).
You can access the cloud logging [directly via the cloud provider endpoints](../workflows/troubleshooting.md#cloud-logging).

More detailed system-level logs are accessible via `/var/log` and [journald](https://www.freedesktop.org/software/systemd/man/systemd-journald.service.html) on the nodes directly.
They can be collected from there, for example, via [Filebeat and Logstash](https://www.elastic.co/guide/en/beats/filebeat/current/logstash-output.html), which are tools of the [Elastic Stack](https://www.elastic.co/de/elastic-stack/).

If an error occurs during initialization, the CLI automatically collects the [Bootstrapper](./microservices.md#bootstrapper) logs and returns them as a file for [troubleshooting](../workflows/troubleshooting.md). Here is an example of such an event:

```shell-session
Cluster initialization failed. This error is not recoverable.
Terminate your cluster and try again.
Fetched bootstrapper logs are stored in "constellation-cluster.log"
```

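As an illustration of what a collector sees when it reads journald, the following sketch filters entries in the JSON format that `journalctl -o json` emits (one JSON object per line). The sample entries, the unit name, and the `errors_for_unit` helper are assumptions made for this example. Real entries carry many more fields.

```python
import json
from datetime import datetime, timezone

# Hypothetical entries, shaped like the output of `journalctl -o json`.
SAMPLE_LINES = [
    '{"__REALTIME_TIMESTAMP": "1700000000000000", "PRIORITY": "6", '
    '"_SYSTEMD_UNIT": "kubelet.service", "MESSAGE": "Started kubelet"}',
    '{"__REALTIME_TIMESTAMP": "1700000001500000", "PRIORITY": "3", '
    '"_SYSTEMD_UNIT": "kubelet.service", "MESSAGE": "Failed to pull image"}',
    '{"__REALTIME_TIMESTAMP": "1700000002000000", "PRIORITY": "6", '
    '"_SYSTEMD_UNIT": "containerd.service", "MESSAGE": "Container started"}',
]


def errors_for_unit(lines, unit, max_priority=3):
    """Return (timestamp, message) pairs for one unit at error level or worse.

    Syslog priorities are numeric: lower numbers are more severe, 3 is "err".
    """
    result = []
    for line in lines:
        entry = json.loads(line)
        if entry.get("_SYSTEMD_UNIT") != unit:
            continue
        if int(entry.get("PRIORITY", "6")) > max_priority:
            continue
        # journald reports wall-clock time in microseconds since the epoch.
        micros = int(entry["__REALTIME_TIMESTAMP"])
        timestamp = datetime.fromtimestamp(micros / 1_000_000, tz=timezone.utc)
        result.append((timestamp, entry["MESSAGE"]))
    return result


if __name__ == "__main__":
    for timestamp, message in errors_for_unit(SAMPLE_LINES, "kubelet.service"):
        print(timestamp.isoformat(), message)
```

Filtering by unit and priority on the node, before anything is shipped, is one way to limit how much log data leaves the cluster.
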
### Kubernetes logs

Constellation supports the [Kubernetes logging architecture](https://kubernetes.io/docs/concepts/cluster-administration/logging/).
By default, logs are written to the nodes' encrypted state disks.
These include the Pod and container logs and the [system component logs](https://kubernetes.io/docs/concepts/cluster-administration/logging/#system-component-logs).

[Constellation services](microservices.md) run as Pods inside the `kube-system` namespace and use the standard container logging mechanism.
The same applies to the [Cilium Pods](https://docs.cilium.io/en/latest/operations/troubleshooting/#logs).

You can collect logs from within the cluster via tools such as [Fluentd](https://github.com/fluent/fluentd), [Loki](https://github.com/grafana/loki), or the [Elastic Stack](https://www.elastic.co/de/elastic-stack/).

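Log collectors that read container logs from the node typically have to parse the CRI logging format, in which each line consists of a timestamp, the stream name, a tag (`P` for a partial line, `F` for a full line), and the message. The following sketch shows this parsing step under that assumption. The sample lines and the `parse_cri_lines` helper are hypothetical and not taken from Constellation.

```python
from typing import Iterable, Iterator, NamedTuple


class LogEntry(NamedTuple):
    timestamp: str  # RFC 3339 timestamp of the first chunk of the entry
    stream: str     # "stdout" or "stderr"
    message: str


def parse_cri_lines(lines: Iterable[str]) -> Iterator[LogEntry]:
    """Parse CRI-formatted log lines and join partial (P) chunks per stream."""
    pending = {}  # stream -> (timestamp of first chunk, list of chunks)
    for raw in lines:
        parts = raw.rstrip("\n").split(" ", 3)
        if len(parts) < 3:
            continue  # Not a well-formed CRI log line.
        timestamp, stream, tag = parts[:3]
        message = parts[3] if len(parts) == 4 else ""
        first_timestamp, chunks = pending.pop(stream, (timestamp, []))
        chunks.append(message)
        if tag == "P":
            # Partial line: keep buffering until the closing F chunk arrives.
            pending[stream] = (first_timestamp, chunks)
        else:
            yield LogEntry(first_timestamp, stream, "".join(chunks))


# Hypothetical sample lines.
SAMPLE = [
    "2023-10-06T00:17:09.669794202Z stdout P first half, ",
    "2023-10-06T00:17:09.669794300Z stdout F second half",
    "2023-10-06T00:17:10.113242941Z stderr F something failed",
]

if __name__ == "__main__":
    for entry in parse_cri_lines(SAMPLE):
        print(entry)
```

Tools such as Fluentd ship with parsers for this format, so the sketch is only meant to show what the collected records contain.
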
## Traces

Modern systems are implemented as interconnected, complex, distributed microservices. Understanding request flows and system communications is challenging, mainly because all systems in a chain need to be modified to propagate tracing information. Distributed tracing is a new approach to increasing observability and understanding performance bottlenecks. A trace represents consecutive events that reflect an end-to-end request path in a distributed system.

Constellation supports [traces for Kubernetes system components](https://kubernetes.io/docs/concepts/cluster-administration/system-traces/).
By default, they're disabled and need to be enabled first.

Similarly, Cilium can be configured to [export traces](https://cilium.io/use-cases/metrics-export/).

You can collect these traces via tools such as [Jaeger](https://www.jaegertracing.io/) or [Zipkin](https://zipkin.io/).

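To make "propagating tracing information" concrete, the following sketch handles a `traceparent` header as defined by the W3C Trace Context format (`version-traceid-parentid-flags`). It assumes the services in the chain use this header. The helper names are hypothetical, and real applications would normally rely on a tracing library such as OpenTelemetry rather than hand-written code.

```python
import re
import secrets

# version (2 hex) - trace ID (32 hex) - parent span ID (16 hex) - flags (2 hex)
_TRACEPARENT = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def parse_traceparent(header):
    """Return the trace context as a dict, or None if the header is invalid."""
    match = _TRACEPARENT.match(header.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # Version ff and all-zero IDs are invalid according to the format.
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return {
        "trace_id": trace_id,
        "parent_id": parent_id,
        "sampled": bool(int(flags, 16) & 0x01),
    }


def child_traceparent(header):
    """Build the header for an outgoing request.

    The trace ID is kept so that all spans of a request share one trace,
    while a fresh span ID identifies the current service as the new parent.
    """
    context = parse_traceparent(header)
    if context is None:
        # No valid incoming context: start a new, unsampled trace.
        return "00-{}-{}-00".format(secrets.token_hex(16), secrets.token_hex(8))
    flags = "01" if context["sampled"] else "00"
    return "00-{}-{}-{}".format(context["trace_id"], secrets.token_hex(8), flags)


if __name__ == "__main__":
    incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
    print(parse_traceparent(incoming))
    print(child_traceparent(incoming))
```

Every service that forwards a request has to perform this step, which is why tracing requires changes across the whole chain.
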
## Integrations

Platforms such as Datadog, logz.io, Dynatrace, or New Relic simplify observability for Kubernetes by providing all-in-one SaaS solutions.
They install agents into the cluster that collect metrics, logs, and tracing information and upload them into the data lake of the platform.
Technically, the agent-based approach is compatible with Constellation, and attaching these platforms is straightforward.
However, you need to evaluate whether uploading the exported data to a third-party platform might violate Constellation's compliance and privacy guarantees.
@@ -22,3 +22,9 @@ You can learn more about [the images](images.md) and how verified boot ensures t
## About key management and cryptographic primitives

Encryption of data at-rest, in-transit, and in-use is the fundamental building block for confidential computing and Constellation. Learn more about the [keys and cryptographic primitives](keys.md) used in Constellation, [encrypted persistent storage](encrypted-storage.md), and [network encryption](networking.md).

## About observability

Observability in Kubernetes refers to the capability to swiftly troubleshoot issues using telemetry signals such as logs, metrics, and traces.
In the realm of Confidential Computing, it's crucial that observability aligns with confidentiality, necessitating careful implementation.
Learn more about the [observability capabilities in Constellation](./observability.md).
@@ -254,6 +254,11 @@ const sidebars = {
label: 'Networking',
id: 'architecture/networking',
},
{
type: 'doc',
label: 'Observability',
id: 'architecture/observability',
},
],
},
{
@@ -15,11 +15,15 @@ Bootstrapper
config
cyber
datacenter
Datadog
deallocate
Dockerfile
Dynatrace
[Ee]mojivoto
etcd
Filebeat
Filestore
Fluentd
Fulcio
Mbps
Gbps
@@ -30,12 +34,14 @@ iam
IAM
iodepth
initramfs
journald
[Kk]3s
Kata
kubeadm
kubectl
kubelet
libcryptsetup
Logstash
MicroK8s
[Mm]inikube
namespace
@@ -45,11 +51,13 @@ Rekor
resizable
rollout
sigstore
[Ss]uperset
Syft
systemd
[Uu]nencrypted
unspoofable
updatable
UUID
proxied
QEMU
virsh
@@ -58,4 +66,4 @@ whitepaper
WireGuard
Xeon
xsltproc
[Ss]uperset
Zipkin
@@ -233,6 +233,11 @@
"type": "doc",
"label": "Networking",
"id": "architecture/networking"
},
{
"type": "doc",
"label": "Observability",
"id": "architecture/observability"
}
]
},