mirror of https://github.com/edgelesssys/constellation.git (synced 2024-10-01 01:36:09 -04:00)

docs: add observability page (#2384)

Co-authored-by: Moritz Sanft <58110325+msanft@users.noreply.github.com>
Co-authored-by: 3u13r <lc@edgeless.systems>
Co-authored-by: Thomas Tendyck <51411342+thomasten@users.noreply.github.com>

parent e938cc5e63
commit 7c76592a08

78 docs/docs/architecture/observability.md Normal file
@ -0,0 +1,78 @@
# Observability

In Kubernetes, observability is the ability to gain insight into the behavior and performance of applications.
It helps identify and resolve issues more effectively, ensuring stability and performance of Kubernetes workloads, reducing downtime and outages, and improving efficiency.
The "three pillars of observability" are logs, metrics, and traces.

In the context of Confidential Computing, observability is a delicate subject and needs to be applied such that it doesn't leak any sensitive information.
The following gives an overview of where and how you can apply standard observability tools in Constellation.

## Cloud resource monitoring

While inaccessible, Constellation's nodes are still visible to the hypervisor as black-box VMs.
Resource consumption, such as memory and CPU utilization, can be monitored from the outside and observed via the cloud platforms directly.
Similarly, other resources, such as storage and network, and their respective metrics are visible via the cloud platform.

## Metrics

Metrics are numeric representations of data measured over intervals of time. They're essential for understanding system health and gaining insights using telemetry signals.

By default, Constellation exposes the [metrics for Kubernetes system components](https://kubernetes.io/docs/concepts/cluster-administration/system-metrics/) inside the cluster.
Similarly, the [etcd metrics](https://etcd.io/docs/v3.5/metrics/) endpoints are exposed inside the cluster.
These [metrics endpoints can be disabled](https://kubernetes.io/docs/concepts/cluster-administration/system-metrics/#disabling-metrics).

You can collect these cluster-internal metrics via tools such as [Prometheus](https://prometheus.io/) or the [Elastic Stack](https://www.elastic.co/de/elastic-stack/).

Constellation's CNI, Cilium, also supports [metrics via Prometheus endpoints](https://docs.cilium.io/en/latest/observability/metrics/).
However, in Constellation, these metrics are disabled by default and must be enabled first.

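To make the shape of these metrics more concrete, the following is a minimal sketch of how a collector might read the Prometheus text exposition format that such endpoints typically serve. The function name `parse_metrics` and the sample values are assumptions made for illustration. It isn't how Prometheus itself parses metrics, and it doesn't unescape label values.

```python
import re

# A sample line has a metric name, optional {labels}, and a value.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Parse Prometheus text exposition format into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        # Label values are kept as written; escape sequences aren't decoded.
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


# Made-up sample values for illustration only.
EXAMPLE = """\
# HELP apiserver_request_total Counter of apiserver requests.
# TYPE apiserver_request_total counter
apiserver_request_total{code="200",verb="GET"} 1027
etcd_server_has_leader 1
"""

for name, labels, value in parse_metrics(EXAMPLE):
    print(name, labels, value)
```

In practice, a tool such as Prometheus handles scraping and parsing for you. The sketch only illustrates that each sample is a name, a set of labels, and a numeric value.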
## Logs

Logs represent discrete events that usually describe what's happening with your service.
The payload is an actual message emitted from your system along with a metadata section containing a timestamp, labels, and tracking identifiers.

### System logs

Constellation uses cloud logging for events occurring during the early stages of a node's boot process.
These logs include [Bootstrapper](./microservices.md#bootstrapper) events and [state disk UUIDs](../architecture/images.md#state-disk).
You can access these logs [directly via the cloud provider endpoints](../workflows/troubleshooting.md#cloud-logging).

More detailed system-level logs are accessible via `/var/log` and [journald](https://www.freedesktop.org/software/systemd/man/systemd-journald.service.html) on the nodes directly.
They can be collected from there, for example, via [Filebeat and Logstash](https://www.elastic.co/guide/en/beats/filebeat/current/logstash-output.html), which are tools of the [Elastic Stack](https://www.elastic.co/de/elastic-stack/).

If an error occurs during initialization, the CLI automatically collects the [Bootstrapper](./microservices.md#bootstrapper) logs and returns them as a file for [troubleshooting](../workflows/troubleshooting.md). Here is an example of such an event:

```shell-session
Cluster initialization failed. This error is not recoverable.
Terminate your cluster and try again.
Fetched bootstrapper logs are stored in "constellation-cluster.log"
```

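As an illustration of what a collector does with journald entries, the following sketch filters entries in the JSON format produced by `journalctl -o json`, which emits one JSON object per line. The function name, the unit name `example.service`, and the sample entries are assumptions made for this example. It isn't part of Constellation.

```python
import json
from datetime import datetime, timezone


def parse_journal_export(lines, max_priority=4):
    """Turn `journalctl -o json` lines into small event dicts.

    Keeps entries whose syslog priority is <= max_priority
    (0 = emergency ... 7 = debug), so 4 keeps warnings and worse.
    """
    events = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        # journald serializes field values as strings.
        priority = int(entry.get("PRIORITY", "6"))
        if priority > max_priority:
            continue
        # __REALTIME_TIMESTAMP is microseconds since the Unix epoch.
        micros = int(entry["__REALTIME_TIMESTAMP"])
        timestamp = datetime.fromtimestamp(micros / 1_000_000, tz=timezone.utc)
        events.append({
            "time": timestamp.isoformat(),
            "unit": entry.get("_SYSTEMD_UNIT", "unknown"),
            "priority": priority,
            "message": entry.get("MESSAGE", ""),
        })
    return events


# Made-up entries for illustration; the unit name is hypothetical.
EXAMPLE_LINES = [
    '{"__REALTIME_TIMESTAMP": "1700000000000000", "PRIORITY": "3", '
    '"_SYSTEMD_UNIT": "example.service", "MESSAGE": "disk setup failed"}',
    '{"__REALTIME_TIMESTAMP": "1700000001000000", "PRIORITY": "6", '
    '"_SYSTEMD_UNIT": "example.service", "MESSAGE": "started"}',
]

for event in parse_journal_export(EXAMPLE_LINES):
    print(event)
```

Filtering by priority before forwarding is one way to reduce how much log data leaves a node. Which entries are safe to forward depends on your own confidentiality requirements.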
### Kubernetes logs

Constellation supports the [Kubernetes logging architecture](https://kubernetes.io/docs/concepts/cluster-administration/logging/).
By default, logs are written to the nodes' encrypted state disks.
These include the Pod and container logs and the [system component logs](https://kubernetes.io/docs/concepts/cluster-administration/logging/#system-component-logs).

[Constellation services](microservices.md) run as Pods inside the `kube-system` namespace and use the standard container logging mechanism.
The same applies to the [Cilium Pods](https://docs.cilium.io/en/latest/operations/troubleshooting/#logs).

You can collect logs from within the cluster via tools such as [Fluentd](https://github.com/fluent/fluentd), [Loki](https://github.com/grafana/loki), or the [Elastic Stack](https://www.elastic.co/de/elastic-stack/).

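Log collectors typically read container log files on the node and parse them line by line. The following sketch assumes the container runtime writes the CRI log format, where each line consists of a timestamp, the stream, a tag (`P` for a partial line, `F` for a full line), and the message. The function name and the sample lines are assumptions made for illustration, and the sketch ignores that partial lines from different streams could interleave.

```python
def parse_cri_lines(lines):
    """Parse CRI-format container log lines into records.

    Chunks tagged "P" (partial) are buffered and joined with the next
    chunk tagged "F" (full). The record carries the timestamp and stream
    of the final chunk.
    """
    records = []
    pending = []
    for line in lines:
        line = line.rstrip("\n")
        parts = line.split(" ", 3)
        if len(parts) < 3:
            continue  # not a CRI-format line
        timestamp, stream, tag = parts[:3]
        message = parts[3] if len(parts) == 4 else ""
        pending.append(message)
        if tag == "F":
            records.append({
                "timestamp": timestamp,
                "stream": stream,
                "message": "".join(pending),
            })
            pending = []
    return records


# Made-up log lines for illustration only.
EXAMPLE_LINES = [
    "2023-10-06T00:17:09.669794202Z stdout P first half, ",
    "2023-10-06T00:17:09.669794203Z stdout F second half",
    "2023-10-06T00:17:10.113242941Z stderr F something failed",
]

for record in parse_cri_lines(EXAMPLE_LINES):
    print(record)
```

Tools such as Fluentd ship with parsers for this format, so you rarely need to write one yourself. The sketch only shows which fields are available for filtering and labeling.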
## Traces

Modern systems are implemented as complex, interconnected, distributed microservices. Understanding request flows and system communications is challenging, mainly because all systems in a chain need to be modified to propagate tracing information. Distributed tracing is a new approach to increasing observability and understanding performance bottlenecks. A trace represents consecutive events that reflect an end-to-end request path in a distributed system.

Constellation supports [traces for Kubernetes system components](https://kubernetes.io/docs/concepts/cluster-administration/system-traces/).
By default, they're disabled and need to be enabled first.

Similarly, Cilium can be configured to [export traces](https://cilium.io/use-cases/metrics-export/).

You can collect these traces via tools such as [Jaeger](https://www.jaegertracing.io/) or [Zipkin](https://zipkin.io/).

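To illustrate what "propagating tracing information" means in practice, the following sketch assumes the W3C Trace Context `traceparent` header, which OpenTelemetry-based tooling commonly uses. Each service keeps the trace ID it received and generates a new span ID before calling the next service. The function names are hypothetical, and only version `00` of the header is handled.

```python
import re
import secrets

# version - trace-id (32 hex) - parent-id (16 hex) - flags (2 hex)
TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def parse_traceparent(header):
    """Return the fields of a traceparent header, or None if it is invalid."""
    match = TRACEPARENT_RE.match(header.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # All-zero IDs and version "ff" are invalid per the specification.
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return {
        "version": version,
        "trace_id": trace_id,
        "parent_id": parent_id,
        "flags": flags,
    }


def child_traceparent(header):
    """Build the header a service sends downstream.

    Keeps the incoming trace ID and flags, and sets a fresh span ID as the
    new parent. Starts a new, unsampled trace if the incoming header is
    invalid.
    """
    context = parse_traceparent(header)
    if context is None:
        trace_id, flags = secrets.token_hex(16), "00"
    else:
        trace_id, flags = context["trace_id"], context["flags"]
    span_id = secrets.token_hex(8)
    return f"00-{trace_id}-{span_id}-{flags}"


# Example header taken from the W3C Trace Context specification.
incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
print(child_traceparent(incoming))
```

Because every hop forwards the same trace ID, a backend such as Jaeger or Zipkin can reassemble the spans into one end-to-end request path.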
## Integrations

Platforms such as Datadog, logz.io, Dynatrace, or New Relic simplify observability for Kubernetes by providing all-in-one SaaS solutions.
They install agents into the cluster that collect metrics, logs, and tracing information and upload them into the data lake of the platform.
Technically, the agent-based approach is compatible with Constellation, and attaching these platforms is straightforward.
However, you need to evaluate whether uploading the exported data to a third-party platform might violate Constellation's compliance and privacy guarantees.

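One way to reason about what leaves the cluster is to filter records before an agent uploads them. The following is a minimal sketch of that idea. The record layout, the key names in `SENSITIVE_KEYS`, and the function `redact_record` are all hypothetical and don't correspond to any specific platform's agent API.

```python
import copy

# Hypothetical label keys treated as sensitive; adjust to your own policy.
SENSITIVE_KEYS = {"user_id", "email", "authorization"}


def redact_record(record, sensitive_keys=SENSITIVE_KEYS):
    """Return a copy of a telemetry record with sensitive labels masked.

    The input record is left unchanged so the original can still be
    stored inside the cluster.
    """
    cleaned = copy.deepcopy(record)
    labels = cleaned.get("labels", {})
    for key in labels:
        if key.lower() in sensitive_keys:
            labels[key] = "[REDACTED]"
    return cleaned


# Made-up record for illustration only.
record = {
    "message": "request handled",
    "labels": {"namespace": "default", "user_id": "1234", "Email": "a@example.com"},
}
print(redact_record(record))
```

Masking labels doesn't cover sensitive data inside free-text messages, so this kind of filter can only be one part of the evaluation.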
@ -22,3 +22,9 @@ You can learn more about [the images](images.md) and how verified boot ensures t
## About key management and cryptographic primitives

Encryption of data at-rest, in-transit, and in-use is the fundamental building block for confidential computing and Constellation. Learn more about the [keys and cryptographic primitives](keys.md) used in Constellation, [encrypted persistent storage](encrypted-storage.md), and [network encryption](networking.md).

## About observability

Observability in Kubernetes refers to the capability to swiftly troubleshoot issues using telemetry signals such as logs, metrics, and traces.
In the realm of Confidential Computing, it's crucial that observability aligns with confidentiality, necessitating careful implementation.
Learn more about the [observability capabilities in Constellation](./observability.md).

@ -254,6 +254,11 @@ const sidebars = {
         label: 'Networking',
         id: 'architecture/networking',
       },
+      {
+        type: 'doc',
+        label: 'Observability',
+        id: 'architecture/observability',
+      },
     ],
   },
   {
@ -15,11 +15,15 @@ Bootstrapper
 config
 cyber
 datacenter
+Datadog
 deallocate
 Dockerfile
+Dynatrace
 [Ee]mojivoto
 etcd
+Filebeat
 Filestore
+Fluentd
 Fulcio
 Mbps
 Gbps
@ -30,12 +34,14 @@ iam
 IAM
 iodepth
 initramfs
+journald
 [Kk]3s
 Kata
 kubeadm
 kubectl
 kubelet
 libcryptsetup
+Logstash
 MicroK8s
 [Mm]inikube
 namespace
@ -45,11 +51,13 @@ Rekor
 resizable
 rollout
 sigstore
+[Ss]uperset
 Syft
 systemd
 [Uu]nencrypted
 unspoofable
 updatable
+UUID
 proxied
 QEMU
 virsh
@ -58,4 +66,4 @@ whitepaper
 WireGuard
 Xeon
 xsltproc
-[Ss]uperset
+Zipkin
@ -233,6 +233,11 @@
           "type": "doc",
           "label": "Networking",
           "id": "architecture/networking"
+        },
+        {
+          "type": "doc",
+          "label": "Observability",
+          "id": "architecture/observability"
         }
       ]
     },