docs: add vault benchmark (#2271)
* Refactor benchmark structure
* Add vault-benchmark section
* update 2.10 docs

Co-authored-by: Otto Bittner <cobittner@posteo.net>
Co-authored-by: Thomas Tendyck <tt@edgeless.systems>
BIN
docs/docs/_media/benchmark_vault/5replicas/max_latency.png
Normal file
After Width: | Height: | Size: 21 KiB |
BIN
docs/docs/_media/benchmark_vault/5replicas/mean_latency.png
Normal file
After Width: | Height: | Size: 18 KiB |
BIN
docs/docs/_media/benchmark_vault/5replicas/min_latency.png
Normal file
After Width: | Height: | Size: 21 KiB |
BIN
docs/docs/_media/benchmark_vault/5replicas/p99_latency.png
Normal file
After Width: | Height: | Size: 24 KiB |
102
docs/docs/overview/performance/application.md
Normal file
@ -0,0 +1,102 @@
|
||||
# Application benchmarks
|
||||
|
||||
## HashiCorp Vault
|
||||
|
||||
[HashiCorp Vault](https://www.vaultproject.io/) is distributed secrets management software that can be deployed to Kubernetes.
|
||||
HashiCorp maintains a benchmarking tool for Vault, [vault-benchmark](https://github.com/hashicorp/vault-benchmark/).
|
||||
Vault-benchmark generates load on a Vault deployment and measures response times.
|
||||
|
||||
This article describes the results from running vault-benchmark on Constellation, AKS, and GKE.
|
||||
You can find the setup for producing the data discussed in this article in the [vault-benchmarks](https://github.com/edgelesssys/vault-benchmarks) repository.
|
||||
|
||||
The Vault API used during benchmarking is the [transit secrets engine](https://developer.hashicorp.com/vault/docs/secrets/transit).
|
||||
This allows services to send data to Vault for encryption, decryption, signing, and verification.
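
To make the benchmarked API more concrete, the following is a minimal sketch of how a client could build an encryption request for the transit secrets engine. It assumes the engine is mounted at the default `transit/` path. The address, token, and key name `benchmark-key` are hypothetical placeholders, and the request is only constructed, not sent.

```python
import base64
import json
import urllib.request


def build_transit_encrypt_request(
    vault_addr: str, token: str, key_name: str, plaintext: bytes
) -> urllib.request.Request:
    """Build (but don't send) an encrypt request for Vault's transit engine."""
    # The transit engine expects the plaintext to be base64-encoded.
    payload = {"plaintext": base64.b64encode(plaintext).decode("ascii")}
    return urllib.request.Request(
        # Assumption: the engine is mounted at the default path "transit/".
        url=f"{vault_addr}/v1/transit/encrypt/{key_name}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-Vault-Token": token, "Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical address, token, and key name, for illustration only.
req = build_transit_encrypt_request(
    "http://127.0.0.1:8200", "example-token", "benchmark-key", b"hello"
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen` against a running Vault would return a JSON body whose `data.ciphertext` field holds the encrypted value.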
|
||||
|
||||
## Results
|
||||
|
||||
On each run, vault-benchmark sends requests and measures the latencies.
|
||||
The measured latencies are aggregated through various statistical features.
|
||||
After running the benchmark n times, the arithmetic mean over a subset of the reported statistics is calculated.
|
||||
The selected features are arithmetic mean, 99th percentile, minimum, and maximum.
|
||||
|
||||
Arithmetic mean gives a general sense of the latency on each target.
|
||||
The 99th percentile shows tail latency: 99% of requests completed faster than this value, so it most likely reflects erroneous states.
|
||||
Minimum and maximum mark the range within which latency varies each run.
|
||||
|
||||
The benchmark was configured with 1300 workers and 10 seconds per run.
|
||||
Those numbers were chosen empirically.
|
||||
Latency stabilized at a runtime of 10 seconds and didn't change with longer runs.
|
||||
Increasing the number of workers beyond 1300 led to request failures, marking the limit Vault was able to handle in this setup.
|
||||
All results are based on 100 runs.
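
The aggregation described above can be sketched as follows. This is an illustration rather than the code from the vault-benchmarks repository: the per-run dictionary layout and the function name `aggregate_runs` are assumptions, and population variance is used because the article doesn't state which variance estimator was applied.

```python
from statistics import mean, pvariance

# The per-run statistics selected in this article.
FEATURES = ("mean", "p99", "max", "min")


def aggregate_runs(runs):
    """Average each per-run statistic over all runs and report its variance."""
    summary = {}
    for feature in FEATURES:
        values = [run[feature] for run in runs]
        summary[feature] = {
            "mean": mean(values),
            # Assumption: population variance; the source doesn't specify.
            "variance": pvariance(values),
        }
    return summary


# Two made-up runs (latencies in seconds), for illustration only.
runs = [
    {"mean": 1.6, "p99": 5.0, "max": 6.0, "min": 0.010},
    {"mean": 1.7, "p99": 6.0, "max": 7.0, "min": 0.012},
]
summary = aggregate_runs(runs)
for feature in FEATURES:
    stats = summary[feature]
    print(f"{feature}: mean: {stats['mean']:.6f}, variance: {stats['variance']:.6f}")
```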
|
||||
|
||||
The following data was generated with five Vault replicas: one primary and four standby nodes.
|
||||
All numbers are in seconds unless indicated otherwise.
|
||||
```
========== Results AKS ==========
Mean: mean: 1.632200, variance: 0.002057
P99: mean: 5.480679, variance: 2.263700
Max: mean: 6.651001, variance: 2.808401
Min: mean: 0.011415, variance: 0.000133
========== Results GKE ==========
Mean: mean: 1.656435, variance: 0.003615
P99: mean: 6.030807, variance: 3.955051
Max: mean: 7.164843, variance: 3.300004
Min: mean: 0.010233, variance: 0.000111
========== Results C11n ==========
Mean: mean: 1.651549, variance: 0.001610
P99: mean: 5.780422, variance: 3.016106
Max: mean: 6.942997, variance: 3.075796
Min: mean: 0.013774, variance: 0.000228
========== AKS vs C11n ==========
Mean: +1.171577 % (AKS is faster)
P99: +5.185495 % (AKS is faster)
Max: +4.205618 % (AKS is faster)
Min: +17.128781 % (AKS is faster)
========== GKE vs C11n ==========
Mean: -0.295851 % (GKE is slower)
P99: -4.331603 % (GKE is slower)
Max: -3.195248 % (GKE is slower)
Min: +25.710886 % (GKE is faster)
```
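
The comparison percentages in the listing above appear to be relative differences with Constellation as the baseline. The following sketch shows that calculation under this assumption: the formula isn't stated in the article, but it reproduces the reported values up to the rounding of the displayed means. The function name is hypothetical.

```python
def relative_difference_percent(baseline: float, other: float) -> float:
    """Return how much lower `other` is than `baseline`, in percent of `baseline`.

    Positive values mean `other` has lower latency (is faster) than the baseline.
    """
    return (baseline - other) / baseline * 100.0


# Mean values copied from the results listing (latencies in seconds).
c11n = {"mean": 1.651549, "p99": 5.780422, "max": 6.942997, "min": 0.013774}
aks = {"mean": 1.632200, "p99": 5.480679, "max": 6.651001, "min": 0.011415}
gke = {"mean": 1.656435, "p99": 6.030807, "max": 7.164843, "min": 0.010233}

for name, target in (("AKS", aks), ("GKE", gke)):
    for feature in ("mean", "p99", "max", "min"):
        diff = relative_difference_percent(c11n[feature], target[feature])
        verdict = "faster" if diff > 0 else "slower"
        print(f"{name} vs C11n {feature}: {diff:+.2f} % ({name} is {verdict})")
```

Using Constellation as the baseline means that a positive percentage reads as "the managed offering is faster than Constellation by this share of Constellation's latency".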
|
||||
|
||||
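**Interpretation**: Mean, 99th percentile, and maximum latencies on AKS and GKE are within ~5% of those on Constellation. Minimum latencies differ more in relative terms (17% to 26%), but by less than 4 ms in absolute terms.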
|
||||
AKS performs slightly better than GKE and Constellation (C11n) in all cases except minimum latency.
|
||||
Minimum latency is the lowest for GKE.
|
||||
Compared to GKE, Constellation had slightly lower peak latencies (99th percentile and maximum), indicating that Constellation could have handled slightly more concurrent accesses than GKE.
|
||||
Overall, performance is at comparable levels across all three distributions.
|
||||
Based on these numbers, you can use a similarly sized Constellation cluster to run your existing Vault deployment.
|
||||
|
||||
### Visualization
|
||||
|
||||
The following plots visualize the data presented above as [box plots](https://en.wikipedia.org/wiki/Box_plot).
|
||||
The whiskers denote the minimum and maximum, excluding outliers.
|
||||
The box stretches from the 25th to the 75th percentile, with the dividing bar marking the 50th percentile.
|
||||
The circles outside the whiskers denote outliers.
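
As a rough illustration of how the elements of such a box plot are derived, the following sketch computes the quartiles, whiskers, and outliers for a list of samples. The 1.5 × IQR outlier rule is an assumption (it's a common convention and the default in many plotting libraries); the article doesn't state which rule the plots use. The function name and sample data are made up.

```python
from statistics import quantiles


def box_plot_stats(samples, whisker_factor=1.5):
    """Compute the values a box plot draws for `samples`."""
    # 25th, 50th, and 75th percentiles: box edges and dividing bar.
    q1, median, q3 = quantiles(samples, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumption: points beyond 1.5 * IQR from the box count as outliers.
    low_fence = q1 - whisker_factor * iqr
    high_fence = q3 + whisker_factor * iqr
    inliers = [x for x in samples if low_fence <= x <= high_fence]
    outliers = [x for x in samples if x < low_fence or x > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Whiskers reach the most extreme samples that aren't outliers.
        "whisker_low": min(inliers),
        "whisker_high": max(inliers),
        "outliers": outliers,
    }


# Made-up latency samples with one obvious outlier.
stats = box_plot_stats([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 30.0])
print(stats)
```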
|
||||
|
||||
<details>
|
||||
<summary>Mean Latency</summary>
|
||||
|
||||
![Mean Latency](../../_media/benchmark_vault/5replicas/mean_latency.png)
|
||||
|
||||
</details>
|
||||
|
||||
<details>
|
||||
<summary>99th Percentile Latency</summary>
|
||||
|
||||
![99th Percentile Latency](../../_media/benchmark_vault/5replicas/p99_latency.png)
|
||||
|
||||
</details>
|
||||
|
||||
<details>
|
||||
<summary>Maximum Latency</summary>
|
||||
|
||||
![Maximum Latency](../../_media/benchmark_vault/5replicas/max_latency.png)
|
||||
|
||||
</details>
|
||||
|
||||
<details>
|
||||
<summary>Minimum Latency</summary>
|
||||
|
||||
![Minimum Latency](../../_media/benchmark_vault/5replicas/min_latency.png)
|
||||
|
||||
</details>
|
@ -1,24 +1,12 @@
|
||||
# Performance
|
||||
|
||||
This section analyzes the performance of Constellation.
|
||||
|
||||
## Performance impact from runtime encryption
|
||||
|
||||
All nodes in a Constellation cluster run inside Confidential VMs (CVMs). Thus, Constellation's performance is directly affected by the performance of CVMs.
|
||||
|
||||
AMD and Azure jointly released a [performance benchmark](https://community.amd.com/t5/business/microsoft-azure-confidential-computing-powered-by-3rd-gen-epyc/ba-p/497796) for CVMs based on 3rd Gen AMD EPYC processors (Milan) with SEV-SNP. With a range of mostly compute-intensive benchmarks like SPEC CPU 2017 and CoreMark, they found that CVMs only have a small (2%--8%) performance degradation compared to standard VMs. You can expect to see similar performance for compute-intensive workloads running with Constellation on Azure.
|
||||
|
||||
Similarly, AMD and Google jointly released a [performance benchmark](https://www.amd.com/system/files/documents/3rd-gen-epyc-gcp-c2d-conf-compute-perf-brief.pdf) for CVMs based on 3rd Gen AMD EPYC processors (Milan) with SEV-SNP. With high-performance computing workloads like WRF, NAMD, Ansys CFS, and Ansys LS_DYNA, they found similar results with only small (2%--4%) performance degradation compared to standard VMs. You can expect to see similar performance for compute-intensive workloads running with Constellation on GCP.
|
||||
|
||||
## Performance impact from storage and network
|
||||
# I/O performance benchmarks
|
||||
|
||||
To assess the overall performance of Constellation, this benchmark evaluates Constellation v2.6.0 in terms of storage I/O using [`fio`](https://fio.readthedocs.io/en/latest/fio_doc.html) and network performance using the [Kubernetes Network Benchmark](https://github.com/InfraBuilder/k8s-bench-suite#knb--kubernetes-network-be).
|
||||
|
||||
This benchmark tested Constellation on Azure and GCP and compared the results against the managed Kubernetes offerings AKS and GKE.
|
||||
|
||||
### Configurations
|
||||
## Configurations
|
||||
|
||||
#### Constellation
|
||||
### Constellation
|
||||
|
||||
The benchmark was conducted with Constellation v2.6.0, Kubernetes v1.25.7, and Cilium v1.12.
|
||||
It ran on the following infrastructure configurations.
|
||||
@ -38,7 +26,7 @@ Constellation on GCP:
|
||||
- CVM: `true`
|
||||
- Zone: `europe-west3-b`
|
||||
|
||||
#### AKS
|
||||
### AKS
|
||||
|
||||
On AKS, the benchmark used Kubernetes `v1.24.9` and nodes with version `AKSUbuntu-1804gen2containerd-2023.02.15`.
|
||||
AKS ran with the [`kubenet`](https://learn.microsoft.com/en-us/azure/aks/concepts-network#kubenet-basic-networking) CNI and the [default CSI driver](https://learn.microsoft.com/en-us/azure/aks/azure-disk-csi) for Azure Disk.
|
||||
@ -51,7 +39,7 @@ The following infrastructure configurations was used:
|
||||
- Region: `West US`
|
||||
- Zone: `2`
|
||||
|
||||
#### GKE
|
||||
### GKE
|
||||
|
||||
On GKE, the benchmark used Kubernetes `v1.24.9` and nodes with version `1.24.9-gke.3200`.
|
||||
GKE ran with the [`kubenet`](https://cloud.google.com/kubernetes-engine/docs/concepts/network-overview) CNI and the [default CSI driver](https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/gce-pd-csi-driver) for Compute Engine persistent disk.
|
||||
@ -63,9 +51,9 @@ The following infrastructure configurations was used:
|
||||
- CVM: `false`
|
||||
- Zone: `europe-west3-b`
|
||||
|
||||
### Results
|
||||
## Results
|
||||
|
||||
#### Network
|
||||
### Network
|
||||
|
||||
This section gives a thorough analysis of the network performance of Constellation, specifically focusing on measuring TCP and UDP bandwidth.
|
||||
The benchmark measured the bandwidth of pod-to-pod and pod-to-service connections between two different nodes using [`iperf`](https://iperf.fr/).
|
||||
@ -80,10 +68,10 @@ Constellation on GCP used an MTU of 8896. GKE used an MTU of 1450.
|
||||
|
||||
The difference in network bandwidth can largely be attributed to two factors.
|
||||
|
||||
- Constellation's [network encryption](../architecture/networking.md) via Cilium and WireGuard, which protects data in-transit.
|
||||
- Constellation's [network encryption](../../architecture/networking.md) via Cilium and WireGuard, which protects data in-transit.
|
||||
- [AMD SEV using SWIOTLB bounce buffers](https://lore.kernel.org/all/20200204193500.GA15564@ashkalra_ubuntu_server/T/) for all DMA including network I/O.
|
||||
|
||||
##### Pod-to-Pod
|
||||
#### Pod-to-Pod
|
||||
|
||||
In this scenario, the client Pod connects directly to the server Pod via its IP address.
|
||||
|
||||
@ -100,13 +88,13 @@ flowchart LR
|
||||
|
||||
The results for "Pod-to-Pod" on Azure are as follows:
|
||||
|
||||
![Network Pod2Pod Azure benchmark graph](../_media/benchmark_net_p2p_azure.png)
|
||||
![Network Pod2Pod Azure benchmark graph](../../_media/benchmark_net_p2p_azure.png)
|
||||
|
||||
The results for "Pod-to-Pod" on GCP are as follows:
|
||||
|
||||
![Network Pod2Pod GCP benchmark graph](../_media/benchmark_net_p2p_gcp.png)
|
||||
![Network Pod2Pod GCP benchmark graph](../../_media/benchmark_net_p2p_gcp.png)
|
||||
|
||||
##### Pod-to-Service
|
||||
#### Pod-to-Service
|
||||
|
||||
In this scenario, the client Pod connects to the server Pod via a ClusterIP service. This is more relevant to real-world use cases.
|
||||
|
||||
@ -123,21 +111,21 @@ flowchart LR
|
||||
|
||||
The results for "Pod-to-Service" on Azure are as follows:
|
||||
|
||||
![Network Pod2SVC Azure benchmark graph](../_media/benchmark_net_p2svc_azure.png)
|
||||
![Network Pod2SVC Azure benchmark graph](../../_media/benchmark_net_p2svc_azure.png)
|
||||
|
||||
The results for "Pod-to-Service" on GCP are as follows:
|
||||
|
||||
![Network Pod2SVC GCP benchmark graph](../_media/benchmark_net_p2svc_gcp.png)
|
||||
![Network Pod2SVC GCP benchmark graph](../../_media/benchmark_net_p2svc_gcp.png)
|
||||
|
||||
In our recent comparison of Constellation on GCP with GKE, Constellation had 58% less TCP bandwidth. However, UDP bandwidth was slightly better with Constellation, thanks to its higher MTU.
|
||||
|
||||
Similarly, when comparing Constellation on Azure with AKS using CVMs, Constellation achieved approximately 10% less TCP and 40% less UDP bandwidth.
|
||||
|
||||
#### Storage I/O
|
||||
### Storage I/O
|
||||
|
||||
Azure and GCP offer persistent storage for their Kubernetes services AKS and GKE via the Container Storage Interface (CSI). CSI storage in Kubernetes is available via `PersistentVolumes` (PV) and consumed via `PersistentVolumeClaims` (PVC).
|
||||
Upon requesting persistent storage through a PVC, GKE and AKS will provision a PV as defined by a default [storage class](https://kubernetes.io/docs/concepts/storage/storage-classes/).
|
||||
Constellation provides persistent storage on Azure and GCP [that's encrypted on the CSI layer](../architecture/encrypted-storage.md).
|
||||
Constellation provides persistent storage on Azure and GCP [that's encrypted on the CSI layer](../../architecture/encrypted-storage.md).
|
||||
Similarly, upon a PVC request, Constellation will provision a PV via a default storage class.
|
||||
|
||||
For Constellation on Azure and AKS, the benchmark ran with Azure Disk storage [Standard SSD](https://learn.microsoft.com/en-us/azure/virtual-machines/disks-types#standard-ssds) of 400 GiB size.
|
||||
@ -185,19 +173,19 @@ For more details, see the [`fio` test configuration](https://github.com/edgeless
|
||||
|
||||
The results for IOPS on Azure are as follows:
|
||||
|
||||
![I/O IOPS Azure benchmark graph](../_media/benchmark_fio_azure_iops.png)
|
||||
![I/O IOPS Azure benchmark graph](../../_media/benchmark_fio_azure_iops.png)
|
||||
|
||||
The results for IOPS on GCP are as follows:
|
||||
|
||||
![I/O IOPS GCP benchmark graph](../_media/benchmark_fio_gcp_iops.png)
|
||||
![I/O IOPS GCP benchmark graph](../../_media/benchmark_fio_gcp_iops.png)
|
||||
|
||||
The results for bandwidth on Azure are as follows:
|
||||
|
||||
![I/O bandwidth Azure benchmark graph](../_media/benchmark_fio_azure_bw.png)
|
||||
![I/O bandwidth Azure benchmark graph](../../_media/benchmark_fio_azure_bw.png)
|
||||
|
||||
The results for bandwidth on GCP are as follows:
|
||||
|
||||
![I/O bandwidth GCP benchmark graph](../_media/benchmark_fio_gcp_bw.png)
|
||||
![I/O bandwidth GCP benchmark graph](../../_media/benchmark_fio_gcp_bw.png)
|
||||
|
||||
On GCP, the results exceed the maximum performance guarantees of the chosen disk type. There are two possible explanations for this. The first is that there may be cloud caching in place that isn't configurable. Alternatively, the underlying provisioned disk size may be larger than what was requested, resulting in higher performance boundaries.
|
||||
|
||||
@ -205,8 +193,12 @@ When comparing Constellation on GCP with GKE, Constellation has similar bandwidt
|
||||
|
||||
## Conclusion
|
||||
|
||||
Despite the added [security benefits](./security-benefits.md) that Constellation provides, it only incurs a slight performance overhead when compared to managed Kubernetes offerings such as AKS and GKE. In most compute benchmarks, Constellation is on par, and while it may be slightly slower in certain I/O scenarios due to network and storage encryption, we're confident that we can reduce this overhead to single digits.
|
||||
Despite the added [security benefits](../security-benefits.md) that Constellation provides, it only incurs a slight performance overhead when compared to managed Kubernetes offerings such as AKS and GKE. In most compute benchmarks, Constellation is on par with its alternatives.
|
||||
While it may be slightly slower in certain I/O scenarios due to network and storage encryption, there is ongoing work to reduce this overhead to single digits.
|
||||
|
||||
For instance, storage encryption only adds between 10% to 15% overhead in terms of bandwidth and IOPS. Meanwhile, the biggest performance impact that Constellation currently faces is network encryption, which can incur up to 58% overhead on a 10 Gbps network. However, the Cilium team has conducted [benchmarks with Cilium using WireGuard encryption](https://docs.cilium.io/en/latest/operations/performance/benchmark/#encryption-wireguard-ipsec) on a 100 Gbps network that yielded over 15 Gbps, and we're confident that we can provide a similar level of performance with Constellation in our upcoming releases.
|
||||
For instance, storage encryption only adds between 10% and 15% overhead in terms of bandwidth and IOPS.
|
||||
Meanwhile, the biggest performance impact that Constellation currently faces is network encryption, which can incur up to 58% overhead on a 10 Gbps network.
|
||||
However, the Cilium team has conducted [benchmarks with Cilium using WireGuard encryption](https://docs.cilium.io/en/latest/operations/performance/benchmark/#encryption-wireguard-ipsec) on a 100 Gbps network that yielded over 15 Gbps.
|
||||
We're confident that Constellation will provide a similar level of performance with an upcoming release.
|
||||
|
||||
Overall, Constellation strikes a great balance between security and performance, and we're continuously working to improve its performance capabilities while maintaining its high level of security.
|
25
docs/docs/overview/performance/performance.md
Normal file
@ -0,0 +1,25 @@
|
||||
# Performance analysis of Constellation
|
||||
|
||||
This section provides a comprehensive examination of the performance characteristics of Constellation, encompassing various aspects, including runtime encryption, I/O benchmarks, and real-world applications.
|
||||
|
||||
## Impact of runtime encryption on performance
|
||||
|
||||
All nodes in a Constellation cluster are executed inside Confidential VMs (CVMs). Consequently, the performance of Constellation is inherently linked to the performance of these CVMs.
|
||||
|
||||
### AMD and Azure benchmarking
|
||||
|
||||
AMD and Azure have collectively released a [performance benchmark](https://community.amd.com/t5/business/microsoft-azure-confidential-computing-powered-by-3rd-gen-epyc/ba-p/497796) for CVMs that utilize 3rd Gen AMD EPYC processors (Milan) with SEV-SNP. This benchmark, which included a variety of mostly compute-intensive tests such as SPEC CPU 2017 and CoreMark, demonstrated that CVMs experience only minor performance degradation (ranging from 2% to 8%) when compared to standard VMs. Such results are indicative of the performance that can be expected from compute-intensive workloads running with Constellation on Azure.
|
||||
|
||||
### AMD and Google benchmarking
|
||||
|
||||
Similarly, AMD and Google have jointly released a [performance benchmark](https://www.amd.com/system/files/documents/3rd-gen-epyc-gcp-c2d-conf-compute-perf-brief.pdf) for CVMs employing 3rd Gen AMD EPYC processors (Milan) with SEV-SNP. With high-performance computing workloads such as WRF, NAMD, Ansys CFS, and Ansys LS_DYNA, they observed analogous findings, with only minor performance degradation (between 2% and 4%) compared to standard VMs. These outcomes are reflective of the performance that can be expected for compute-intensive workloads running with Constellation on GCP.
|
||||
|
||||
## I/O performance benchmarks
|
||||
|
||||
We evaluated the [I/O performance](io.md) of Constellation, utilizing a collection of synthetic benchmarks targeting networking and storage.
|
||||
We further compared this performance to native managed Kubernetes offerings from various cloud providers, to better understand how Constellation stands in relation to standard practices.
|
||||
|
||||
## Application benchmarking
|
||||
|
||||
To gauge Constellation's applicability to well-known applications, we performed a [benchmark of HashiCorp Vault](application.md) running on Constellation.
|
||||
The results were then compared to deployments on the managed Kubernetes offerings from different cloud providers, providing a tangible perspective on Constellation's performance in actual deployment scenarios.
|
@ -51,9 +51,21 @@ const sidebars = {
|
||||
id: 'overview/clouds',
|
||||
},
|
||||
{
|
||||
type: 'doc',
|
||||
type: 'category',
|
||||
label: 'Performance',
|
||||
id: 'overview/performance',
|
||||
link: { type: 'doc', id: 'overview/performance/performance' },
|
||||
items: [
|
||||
{
|
||||
type: 'doc',
|
||||
label: 'I/O benchmarks',
|
||||
id: 'overview/performance/io',
|
||||
},
|
||||
{
|
||||
type: 'doc',
|
||||
label: 'Application benchmarks',
|
||||
id: 'overview/performance/application',
|
||||
},
|
||||
]
|
||||
},
|
||||
{
|
||||
type: 'doc',
|
||||
|
@ -0,0 +1,102 @@
|
||||
# Application benchmarks
|
||||
|
||||
## HashiCorp Vault
|
||||
|
||||
[HashiCorp Vault](https://www.vaultproject.io/) is distributed secrets management software that can be deployed to Kubernetes.
|
||||
HashiCorp maintains a benchmarking tool for Vault, [vault-benchmark](https://github.com/hashicorp/vault-benchmark/).
|
||||
Vault-benchmark generates load on a Vault deployment and measures response times.
|
||||
|
||||
This article describes the results from running vault-benchmark on Constellation, AKS, and GKE.
|
||||
You can find the setup for producing the data discussed in this article in the [vault-benchmarks](https://github.com/edgelesssys/vault-benchmarks) repository.
|
||||
|
||||
The Vault API used during benchmarking is the [transit secrets engine](https://developer.hashicorp.com/vault/docs/secrets/transit).
|
||||
This allows services to send data to Vault for encryption, decryption, signing, and verification.
|
||||
|
||||
## Results
|
||||
|
||||
On each run, vault-benchmark sends requests and measures the latencies.
|
||||
The measured latencies are aggregated through various statistical features.
|
||||
After running the benchmark n times, the arithmetic mean over a subset of the reported statistics is calculated.
|
||||
The selected features are arithmetic mean, 99th percentile, minimum, and maximum.
|
||||
|
||||
Arithmetic mean gives a general sense of the latency on each target.
|
||||
The 99th percentile shows tail latency: 99% of requests completed faster than this value, so it most likely reflects erroneous states.
|
||||
Minimum and maximum mark the range within which latency varies each run.
|
||||
|
||||
The benchmark was configured with 1300 workers and 10 seconds per run.
|
||||
Those numbers were chosen empirically.
|
||||
Latency stabilized at a runtime of 10 seconds and didn't change with longer runs.
|
||||
Increasing the number of workers beyond 1300 led to request failures, marking the limit Vault was able to handle in our setup.
|
||||
All results are based on 100 runs.
|
||||
|
||||
The following data was generated with five Vault replicas: one primary and four standby nodes.
|
||||
All numbers are in seconds unless indicated otherwise.
|
||||
```
========== Results AKS ==========
Mean: mean: 1.632200, variance: 0.002057
P99: mean: 5.480679, variance: 2.263700
Max: mean: 6.651001, variance: 2.808401
Min: mean: 0.011415, variance: 0.000133
========== Results GKE ==========
Mean: mean: 1.656435, variance: 0.003615
P99: mean: 6.030807, variance: 3.955051
Max: mean: 7.164843, variance: 3.300004
Min: mean: 0.010233, variance: 0.000111
========== Results C11n ==========
Mean: mean: 1.651549, variance: 0.001610
P99: mean: 5.780422, variance: 3.016106
Max: mean: 6.942997, variance: 3.075796
Min: mean: 0.013774, variance: 0.000228
========== AKS vs C11n ==========
Mean: +1.171577 % (AKS is faster)
P99: +5.185495 % (AKS is faster)
Max: +4.205618 % (AKS is faster)
Min: +17.128781 % (AKS is faster)
========== GKE vs C11n ==========
Mean: -0.295851 % (GKE is slower)
P99: -4.331603 % (GKE is slower)
Max: -3.195248 % (GKE is slower)
Min: +25.710886 % (GKE is faster)
```
|
||||
|
||||
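**Interpretation**: Mean, 99th percentile, and maximum latencies on AKS and GKE are within ~5% of those on Constellation. Minimum latencies differ more in relative terms (17% to 26%), but by less than 4 ms in absolute terms.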
|
||||
AKS performs slightly better than GKE and Constellation (C11n) in all cases except minimum latency.
|
||||
Minimum latency is the lowest for GKE.
|
||||
Compared to GKE, Constellation had slightly lower peak latencies (99th percentile and maximum), indicating that Constellation could have handled slightly more concurrent accesses than GKE.
|
||||
Overall, performance is at comparable levels across all three distributions.
|
||||
Based on these numbers, you can use a similarly sized Constellation cluster to run your existing Vault deployment.
|
||||
|
||||
### Visualization
|
||||
|
||||
The following plots visualize the data presented above as [box plots](https://en.wikipedia.org/wiki/Box_plot).
|
||||
The whiskers denote the minimum and maximum, excluding outliers.
|
||||
The box stretches from the 25th to the 75th percentile, with the dividing bar marking the 50th percentile.
|
||||
The circles outside the whiskers denote outliers.
|
||||
|
||||
<details>
|
||||
<summary>Mean Latency</summary>
|
||||
|
||||
![Mean Latency](../../_media/benchmark_vault/5replicas/mean_latency.png)
|
||||
|
||||
</details>
|
||||
|
||||
<details>
|
||||
<summary>99th Percentile Latency</summary>
|
||||
|
||||
![99th Percentile Latency](../../_media/benchmark_vault/5replicas/p99_latency.png)
|
||||
|
||||
</details>
|
||||
|
||||
<details>
|
||||
<summary>Maximum Latency</summary>
|
||||
|
||||
![Maximum Latency](../../_media/benchmark_vault/5replicas/max_latency.png)
|
||||
|
||||
</details>
|
||||
|
||||
<details>
|
||||
<summary>Minimum Latency</summary>
|
||||
|
||||
![Minimum Latency](../../_media/benchmark_vault/5replicas/min_latency.png)
|
||||
|
||||
</details>
|
@ -1,24 +1,12 @@
|
||||
# Performance
|
||||
|
||||
This section analyzes the performance of Constellation.
|
||||
|
||||
## Performance impact from runtime encryption
|
||||
|
||||
All nodes in a Constellation cluster run inside Confidential VMs (CVMs). Thus, Constellation's performance is directly affected by the performance of CVMs.
|
||||
|
||||
AMD and Azure jointly released a [performance benchmark](https://community.amd.com/t5/business/microsoft-azure-confidential-computing-powered-by-3rd-gen-epyc/ba-p/497796) for CVMs based on 3rd Gen AMD EPYC processors (Milan) with SEV-SNP. With a range of mostly compute-intensive benchmarks like SPEC CPU 2017 and CoreMark, they found that CVMs only have a small (2%--8%) performance degradation compared to standard VMs. You can expect to see similar performance for compute-intensive workloads running with Constellation on Azure.
|
||||
|
||||
Similarly, AMD and Google jointly released a [performance benchmark](https://www.amd.com/system/files/documents/3rd-gen-epyc-gcp-c2d-conf-compute-perf-brief.pdf) for CVMs based on 3rd Gen AMD EPYC processors (Milan) with SEV-SNP. With high-performance computing workloads like WRF, NAMD, Ansys CFS, and Ansys LS_DYNA, they found similar results with only small (2%--4%) performance degradation compared to standard VMs. You can expect to see similar performance for compute-intensive workloads running with Constellation on GCP.
|
||||
|
||||
## Performance impact from storage and network
|
||||
# I/O performance benchmarks
|
||||
|
||||
To assess the overall performance of Constellation, this benchmark evaluates Constellation v2.6.0 in terms of storage I/O using [`fio`](https://fio.readthedocs.io/en/latest/fio_doc.html) and network performance using the [Kubernetes Network Benchmark](https://github.com/InfraBuilder/k8s-bench-suite#knb--kubernetes-network-be).
|
||||
|
||||
This benchmark tested Constellation on Azure and GCP and compared the results against the managed Kubernetes offerings AKS and GKE.
|
||||
|
||||
### Configurations
|
||||
## Configurations
|
||||
|
||||
#### Constellation
|
||||
### Constellation
|
||||
|
||||
The benchmark was conducted with Constellation v2.6.0, Kubernetes v1.25.7, and Cilium v1.12.
|
||||
It ran on the following infrastructure configurations.
|
||||
@ -38,7 +26,7 @@ Constellation on GCP:
|
||||
- CVM: `true`
|
||||
- Zone: `europe-west3-b`
|
||||
|
||||
#### AKS
|
||||
### AKS
|
||||
|
||||
On AKS, the benchmark used Kubernetes `v1.24.9` and nodes with version `AKSUbuntu-1804gen2containerd-2023.02.15`.
|
||||
AKS ran with the [`kubenet`](https://learn.microsoft.com/en-us/azure/aks/concepts-network#kubenet-basic-networking) CNI and the [default CSI driver](https://learn.microsoft.com/en-us/azure/aks/azure-disk-csi) for Azure Disk.
|
||||
@ -51,7 +39,7 @@ The following infrastructure configurations was used:
|
||||
- Region: `West US`
|
||||
- Zone: `2`
|
||||
|
||||
#### GKE
|
||||
### GKE
|
||||
|
||||
On GKE, the benchmark used Kubernetes `v1.24.9` and nodes with version `1.24.9-gke.3200`.
|
||||
GKE ran with the [`kubenet`](https://cloud.google.com/kubernetes-engine/docs/concepts/network-overview) CNI and the [default CSI driver](https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/gce-pd-csi-driver) for Compute Engine persistent disk.
|
||||
@ -63,9 +51,9 @@ The following infrastructure configurations was used:
|
||||
- CVM: `false`
|
||||
- Zone: `europe-west3-b`
|
||||
|
||||
### Results
|
||||
## Results
|
||||
|
||||
#### Network
|
||||
### Network
|
||||
|
||||
This section gives a thorough analysis of the network performance of Constellation, specifically focusing on measuring TCP and UDP bandwidth.
|
||||
The benchmark measured the bandwidth of pod-to-pod and pod-to-service connections between two different nodes using [`iperf`](https://iperf.fr/).
|
||||
@ -80,10 +68,10 @@ Constellation on GCP used an MTU of 8896. GKE used an MTU of 1450.
|
||||
|
||||
The difference in network bandwidth can largely be attributed to two factors.
|
||||
|
||||
- Constellation's [network encryption](../architecture/networking.md) via Cilium and WireGuard, which protects data in-transit.
|
||||
- Constellation's [network encryption](../../architecture/networking.md) via Cilium and WireGuard, which protects data in-transit.
|
||||
- [AMD SEV using SWIOTLB bounce buffers](https://lore.kernel.org/all/20200204193500.GA15564@ashkalra_ubuntu_server/T/) for all DMA including network I/O.
|
||||
|
||||
##### Pod-to-Pod
|
||||
#### Pod-to-Pod
|
||||
|
||||
In this scenario, the client Pod connects directly to the server Pod via its IP address.
|
||||
|
||||
@ -100,13 +88,13 @@ flowchart LR
|
||||
|
||||
The results for "Pod-to-Pod" on Azure are as follows:
|
||||
|
||||
![Network Pod2Pod Azure benchmark graph](../_media/benchmark_net_p2p_azure.png)
|
||||
![Network Pod2Pod Azure benchmark graph](../../_media/benchmark_net_p2p_azure.png)
|
||||
|
||||
The results for "Pod-to-Pod" on GCP are as follows:
|
||||
|
||||
![Network Pod2Pod GCP benchmark graph](../_media/benchmark_net_p2p_gcp.png)
|
||||
![Network Pod2Pod GCP benchmark graph](../../_media/benchmark_net_p2p_gcp.png)
|
||||
|
||||
##### Pod-to-Service
|
||||
#### Pod-to-Service
|
||||
|
||||
In this scenario, the client Pod connects to the server Pod via a ClusterIP service. This is more relevant to real-world use cases.
|
||||
|
||||
@ -123,21 +111,21 @@ flowchart LR
|
||||
|
||||
The results for "Pod-to-Service" on Azure are as follows:
|
||||
|
||||
![Network Pod2SVC Azure benchmark graph](../_media/benchmark_net_p2svc_azure.png)
|
||||
![Network Pod2SVC Azure benchmark graph](../../_media/benchmark_net_p2svc_azure.png)
|
||||
|
||||
The results for "Pod-to-Service" on GCP are as follows:
|
||||
|
||||
![Network Pod2SVC GCP benchmark graph](../_media/benchmark_net_p2svc_gcp.png)
|
||||
![Network Pod2SVC GCP benchmark graph](../../_media/benchmark_net_p2svc_gcp.png)
|
||||
|
||||
In our recent comparison of Constellation on GCP with GKE, Constellation had 58% less TCP bandwidth. However, UDP bandwidth was slightly better with Constellation, thanks to its higher MTU.
|
||||
|
||||
Similarly, when comparing Constellation on Azure with AKS using CVMs, Constellation achieved approximately 10% less TCP and 40% less UDP bandwidth.
|
||||
|
||||
#### Storage I/O
|
||||
### Storage I/O
|
||||
|
||||
Azure and GCP offer persistent storage for their Kubernetes services AKS and GKE via the Container Storage Interface (CSI). CSI storage in Kubernetes is available via `PersistentVolumes` (PV) and consumed via `PersistentVolumeClaims` (PVC).
|
||||
Upon requesting persistent storage through a PVC, GKE and AKS will provision a PV as defined by a default [storage class](https://kubernetes.io/docs/concepts/storage/storage-classes/).
|
||||
Constellation provides persistent storage on Azure and GCP [that's encrypted on the CSI layer](../architecture/encrypted-storage.md).
|
||||
Constellation provides persistent storage on Azure and GCP [that's encrypted on the CSI layer](../../architecture/encrypted-storage.md).
|
||||
Similarly, upon a PVC request, Constellation will provision a PV via a default storage class.
|
||||
|
||||
For Constellation on Azure and AKS, the benchmark ran with Azure Disk storage [Standard SSD](https://learn.microsoft.com/en-us/azure/virtual-machines/disks-types#standard-ssds) of 400 GiB size.
|
||||
@@ -185,19 +173,19 @@ For more details, see the [`fio` test configuration](https://github.com/edgeless

 The results for IOPS on Azure are as follows:

-![I/O IOPS Azure benchmark graph](../_media/benchmark_fio_azure_iops.png)
+![I/O IOPS Azure benchmark graph](../../_media/benchmark_fio_azure_iops.png)

 The results for IOPS on GCP are as follows:

-![I/O IOPS GCP benchmark graph](../_media/benchmark_fio_gcp_iops.png)
+![I/O IOPS GCP benchmark graph](../../_media/benchmark_fio_gcp_iops.png)

 The results for bandwidth on Azure are as follows:

-![I/O bandwidth Azure benchmark graph](../_media/benchmark_fio_azure_bw.png)
+![I/O bandwidth Azure benchmark graph](../../_media/benchmark_fio_azure_bw.png)

 The results for bandwidth on GCP are as follows:

-![I/O bandwidth GCP benchmark graph](../_media/benchmark_fio_gcp_bw.png)
+![I/O bandwidth GCP benchmark graph](../../_media/benchmark_fio_gcp_bw.png)

 On GCP, the results exceed the maximum performance guarantees of the chosen disk type. There are two possible explanations for this. The first is that there may be cloud caching in place that isn't configurable. Alternatively, the underlying provisioned disk size may be larger than what was requested, resulting in higher performance boundaries.

@@ -205,7 +193,7 @@ When comparing Constellation on GCP with GKE, Constellation has similar bandwidt

 ## Conclusion

-Despite the added [security benefits](./security-benefits.md) that Constellation provides, it only incurs a slight performance overhead when compared to managed Kubernetes offerings such as AKS and GKE. In most compute benchmarks, Constellation is on par, and while it may be slightly slower in certain I/O scenarios due to network and storage encryption, we're confident that we can reduce this overhead to single digits.
+Despite the added [security benefits](../security-benefits.md) that Constellation provides, it only incurs a slight performance overhead when compared to managed Kubernetes offerings such as AKS and GKE. In most compute benchmarks, Constellation is on par, and while it may be slightly slower in certain I/O scenarios due to network and storage encryption, we're confident that we can reduce this overhead to single digits.

 For instance, storage encryption only adds between 10% and 15% overhead in terms of bandwidth and IOPS. Meanwhile, the biggest performance impact that Constellation currently faces is network encryption, which can incur up to 58% overhead on a 10 Gbps network. However, the Cilium team has conducted [benchmarks with Cilium using WireGuard encryption](https://docs.cilium.io/en/latest/operations/performance/benchmark/#encryption-wireguard-ipsec) on a 100 Gbps network that yielded over 15 Gbps, and we're confident that we can provide a similar level of performance with Constellation in our upcoming releases.
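
To make the overhead percentages quoted above concrete, the following minimal Python sketch converts between a relative overhead and the remaining throughput. The function names are hypothetical and not part of the benchmark tooling, and the example assumes the unencrypted baseline reaches the full 10 Gbps line rate, which the text doesn't state.

```python
def overhead_percent(baseline: float, measured: float) -> float:
    """Relative throughput loss of `measured` versus `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - measured) / baseline * 100.0


def throughput_after_overhead(baseline: float, overhead_pct: float) -> float:
    """Throughput that remains after losing `overhead_pct` percent of `baseline`."""
    return baseline * (1.0 - overhead_pct / 100.0)


# Assumption: the baseline saturates a 10 Gbps link; 58% is the figure quoted above.
remaining_gbps = throughput_after_overhead(10.0, 58.0)
print(f"{remaining_gbps:.1f} Gbps")                             # 4.2 Gbps
print(f"{overhead_percent(10.0, remaining_gbps):.0f}% overhead")  # 58% overhead
```

Under that assumption, a 58% overhead leaves roughly 4.2 Gbps of TCP throughput, while the 10% to 15% storage overhead would leave 85% to 90% of the baseline bandwidth or IOPS.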

@@ -0,0 +1,23 @@
+# Performance analysis of Constellation
+
+This section analyzes the performance characteristics of Constellation, covering runtime encryption, I/O benchmarks, and real-world applications.
+
+## Impact of runtime encryption on performance
+
+All nodes in a Constellation cluster run inside Confidential VMs (CVMs). Consequently, the performance of Constellation is inherently linked to the performance of these CVMs.
+
+### AMD and Azure benchmarking
+
+AMD and Azure have jointly released a [performance benchmark](https://community.amd.com/t5/business/microsoft-azure-confidential-computing-powered-by-3rd-gen-epyc/ba-p/497796) for CVMs that use 3rd Gen AMD EPYC processors (Milan) with SEV-SNP. This benchmark, which included a variety of mostly compute-intensive tests such as SPEC CPU 2017 and CoreMark, showed that CVMs experience only minor performance degradation (ranging from 2% to 8%) when compared to standard VMs. These results indicate the performance that can be expected from compute-intensive workloads running with Constellation on Azure.
+
+### AMD and Google benchmarking
+
+Similarly, AMD and Google have jointly released a [performance benchmark](https://www.amd.com/system/files/documents/3rd-gen-epyc-gcp-c2d-conf-compute-perf-brief.pdf) for CVMs that use 3rd Gen AMD EPYC processors (Milan) with SEV-SNP. With high-performance computing workloads such as WRF, NAMD, Ansys CFX, and Ansys LS-DYNA, they observed similar findings, with only minor performance degradation (between 2% and 4%) compared to standard VMs. These results indicate the performance that can be expected from compute-intensive workloads running with Constellation on GCP.
+
+## I/O performance benchmarks
+
+We evaluated the [I/O performance](io.md) of Constellation using a collection of synthetic benchmarks targeting networking and storage. We then compared this performance to native managed Kubernetes offerings from various cloud providers to better understand how Constellation compares to standard practices.
+
+## Real-world application benchmarking
+
+To gauge Constellation's real-world applicability, we benchmarked [HashiCorp Vault](application.md) running on Constellation. We then compared the results to deployments on the managed Kubernetes offerings from different cloud providers, providing a tangible perspective on Constellation's performance in actual deployment scenarios.
@@ -33,9 +33,24 @@
       "id": "overview/clouds"
     },
     {
-      "type": "doc",
+      "type": "category",
       "label": "Performance",
-      "id": "overview/performance"
+      "link": {
+        "type": "doc",
+        "id": "overview/performance/performance"
+      },
+      "items": [
+        {
+          "type": "doc",
+          "label": "I/O benchmarks",
+          "id": "overview/performance/io"
+        },
+        {
+          "type": "doc",
+          "label": "Application benchmarks",
+          "id": "overview/performance/application"
+        }
+      ]
     },
     {
       "type": "doc",