ci: Add tooling to create benchmark figures
.github/actions/e2e_benchmark/README.md
@@ -41,18 +41,19 @@ Example table:
</details>
### Drawing Performance Charts

The action also draws graphs as used in the [Constellation docs](https://docs.edgeless.systems/constellation/next/overview/performance). The graphs compare the performance of Constellation to the performance of managed Kubernetes clusters.

Graphs are created with every run of the benchmarking action. The action attaches them to the `benchmark` artifact of the workflow run.

## Updating Stored Records

### Managed Kubernetes

The stored benchmark records of managed Kubernetes must be updated manually:

### AKS

Follow the [Azure documentation](https://learn.microsoft.com/en-us/azure/aks/learn/quick-kubernetes-deploy-portal?tabs=azure-cli) to create an AKS cluster with the desired benchmark settings (region, instance types). If comparing against Constellation clusters with CVM instances, make sure to select the matching instance type on AKS as well.

For example:

```bash
az aks create -g moritz-constellation -n benchmark --node-count 2
az aks get-credentials -g moritz-constellation -n benchmark
```

Once the cluster is ready, set up management access via `kubectl` and take the benchmark:

```bash
@@ -63,8 +64,9 @@ install knb /usr/local/bin
cd ..

# Setup kubestr
case "$(go env GOOS)" in "darwin") HOSTOS="MacOS";; *) HOSTOS="$(go env GOOS)";; esac
HOSTARCH="$(go env GOARCH)"
KUBESTR_VER=0.4.37
curl -fsSLO https://github.com/kastenhq/kubestr/releases/download/v${KUBESTR_VER}/kubestr_${KUBESTR_VER}_${HOSTOS}_${HOSTARCH}.tar.gz
tar -xzf kubestr_${KUBESTR_VER}_${HOSTOS}_${HOSTARCH}.tar.gz
install kubestr /usr/local/bin
@@ -72,13 +74,13 @@ install kubestr /usr/local/bin

# Run kubestr
mkdir -p out
kubestr fio -e "out/fio-AKS.json" -o json -s default -z 400Gi

# Run knb
workers="$(kubectl get nodes | grep nodepool)"
server="$(echo "$workers" | head -1 | tail -1 | cut -d ' ' -f1 | tr '\n' ' ')"
client="$(echo "$workers" | head -2 | tail -1 | cut -d ' ' -f1 | tr '\n' ' ')"
knb -f "out/knb-AKS.json" -o json --server-node "$server" --client-node "$client"

# Benchmarks done, do processing.
@@ -86,9 +88,10 @@ knb -f "out/knb-constellation-aks.json" -o json --server-node $server --client-n

# Parse
git clone https://github.com/edgelesssys/constellation.git
mkdir -p benchmarks
export BDIR=benchmarks
export CSP=azure
export EXT_NAME=AKS
export BENCH_RESULTS=out/

python constellation/.github/actions/e2e_benchmark/evaluate/parse.py

@@ -98,7 +101,29 @@ aws s3 cp benchmarks/AKS.json ${S3_PATH}/AKS.json
```
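The parser reads its configuration from the environment variables exported above. As a minimal sketch (a hypothetical helper for illustration, not the actual `parse.py` code), reading and validating them could look like this:

```python
import os

def read_benchmark_env():
    """Collect the benchmark settings from the environment.

    BDIR, CSP, EXT_NAME, and BENCH_RESULTS mirror the variables
    exported in the snippet above; raises if any is missing.
    """
    required = ['BDIR', 'CSP', 'EXT_NAME', 'BENCH_RESULTS']
    missing = [name for name in required if not os.environ.get(name)]
    if missing:
        raise RuntimeError(f'missing required ENV variables: {missing}')
    return {name: os.environ[name] for name in required}
```

Failing fast on missing variables gives a clearer error than a `KeyError` deep inside the parsing code.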

### GKE

Create a GKE cluster with the desired benchmark settings (region, instance types). If comparing against Constellation clusters with CVM instances, make sure to select the matching instance type on GKE.
For example:

```bash
gcloud container clusters create benchmark \
    --zone europe-west3-b \
    --node-locations europe-west3-b \
    --machine-type n2d-standard-4 \
    --num-nodes 2
gcloud container clusters get-credentials benchmark --region europe-west3-b
# Create a storage class for pd-standard
cat <<EOF | kubectl apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: pd-standard
provisioner: pd.csi.storage.gke.io
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
parameters:
  type: pd-standard
EOF
```

Once the cluster is ready, set up management access via `kubectl` and take the benchmark:

```bash
@@ -109,29 +134,31 @@ install knb /usr/local/bin
cd ..

# Setup kubestr
case "$(go env GOOS)" in "darwin") HOSTOS="MacOS";; *) HOSTOS="$(go env GOOS)";; esac
HOSTARCH="$(go env GOARCH)"
KUBESTR_VER=0.4.37
curl -fsSLO https://github.com/kastenhq/kubestr/releases/download/v${KUBESTR_VER}/kubestr_${KUBESTR_VER}_${HOSTOS}_${HOSTARCH}.tar.gz
tar -xzf kubestr_${KUBESTR_VER}_${HOSTOS}_${HOSTARCH}.tar.gz
install kubestr /usr/local/bin

# Run kubestr
mkdir -p out
kubestr fio -e "out/fio-GKE.json" -o json -s pd-standard -z 400Gi

# Run knb
workers="$(kubectl get nodes | grep default-pool)"
server="$(echo "$workers" | head -1 | tail -1 | cut -d ' ' -f1 | tr '\n' ' ')"
client="$(echo "$workers" | head -2 | tail -1 | cut -d ' ' -f1 | tr '\n' ' ')"
knb -f "out/knb-GKE.json" -o json --server-node "$server" --client-node "$client"

# Parse
git clone https://github.com/edgelesssys/constellation.git
mkdir -p benchmarks
export BDIR=benchmarks
export CSP=gcp
export EXT_NAME=GKE
export BENCH_RESULTS=out/

python constellation/.github/actions/e2e_benchmark/evaluate/parse.py

@@ -142,3 +169,15 @@ aws s3 cp benchmarks/GKE.json ${S3_PATH}/GKE.json
```

### Constellation

The action updates the stored Constellation records for the selected cloud provider when running on the main branch.

## Drawing Performance Charts

The action also contains the code to draw graphs as used in the [Constellation docs](https://docs.edgeless.systems/constellation/next/overview/performance).
The graphs compare the performance of Constellation to the performance of managed Kubernetes clusters.
It expects the results `AKS.json`, `GKE.json`, `constellation-azure.json`, and `constellation-gcp.json` to be present in the `BDIR` folder.

Graphs can then be created using the `graph.py` script:

```bash
export BDIR=benchmarks
python ./graph.py
```
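Before invoking `graph.py`, you can sanity-check that all four result files are in place. A small sketch (a hypothetical helper, not part of the action):

```python
import os

# Result files graph.py expects to find in the BDIR folder
EXPECTED = ['AKS.json', 'GKE.json', 'constellation-azure.json', 'constellation-gcp.json']

def missing_results(bdir):
    """Return the expected result files that are absent from bdir."""
    return [name for name in EXPECTED
            if not os.path.exists(os.path.join(bdir, name))]
```

An empty return value means the folder is complete and the script can run.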

.github/actions/e2e_benchmark/evaluate/graph.py
@@ -0,0 +1,229 @@

"""Generate graphs comparing knb and fio benchmarks across cloud providers and Constellation."""
import json
import os
import tempfile
from collections import defaultdict
from pathlib import Path
from urllib import request

import numpy as np
from matplotlib import pyplot as plt
from matplotlib import font_manager as fm


SUBJECTS = [
    'constellation-azure',
    'AKS',
    'constellation-gcp',
    'GKE',
]

LEGEND_NAMES = [
    'Constellation on Azure',
    'AKS',
    'Constellation on GCP',
    'GKE',
]

BAR_COLORS = ['#90FF99', '#929292', '#8B04DD', '#000000']

FONT_URL = "https://github.com/google/fonts/raw/main/apache/roboto/static/Roboto-Regular.ttf"
FONT_NAME = "Roboto-Regular.ttf"

# Rotate bar labels by X degrees
LABEL_ROTATE_BY = 30
LABEL_FONTSIZE = 9

# Units used for the x-axis labels
fio_iops_unit = 'IOPS'
fio_bw_unit = 'KiB/s'

net_unit = 'Mbit/s'
def configure() -> str:
    """Read the benchmark data paths.

    Expects the ENV var (required):
    - BDIR=benchmarks

    Raises TypeError if it is missing.

    Returns: out_dir
    """
    out_dir = os.environ.get('BDIR', None)
    if not out_dir:
        raise TypeError('ENV variable BDIR is required.')
    return out_dir


def bar_chart(data, title='', unit='', x_label=''):
    """Draw a bar chart with one horizontal bar per subject.

    Args:
        data (dict[str, float]): Benchmark data dictionary: subject -> value
        title (str, optional): The title for the chart. Defaults to "".
        unit (str, optional): The unit for values, e.g. "MiB/s". Defaults to "".
        x_label (str, optional): The label for the x-axis. Defaults to "".
    Returns:
        fig (matplotlib.pyplot.figure): The pyplot figure
    """
    # Create plot and set configs
    plt.rcdefaults()
    plt.rc('font', family=FONT_NAME)
    fig, ax = plt.subplots(figsize=(10, 5))

    # Calculate y positions
    y_pos = np.arange(len(data))

    bars = ax.barh(y_pos, list(data.values()), align='center', color=BAR_COLORS)

    # Axis formatting
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)
    ax.spines['left'].set_visible(False)
    ax.spines['bottom'].set_color('#DDDDDD')
    ax.tick_params(bottom=False, left=False)
    ax.set_axisbelow(True)
    ax.xaxis.grid(True, color='#EEEEEE')
    ax.yaxis.grid(False)

    # Bar annotations
    for bar in bars:
        ax.text(
            1.03 * bar.get_width(),
            bar.get_y() + bar.get_height() / 2,
            f'{bar.get_width():.0f}',
            verticalalignment='center',
        )

    # Set labels and titles
    ax.set_yticks(y_pos, labels=data.keys())
    ax.invert_yaxis()  # labels read top-to-bottom
    ax.set_xlabel(x_label, fontdict={"fontsize": 12})
    if unit != '':
        unit = f"({unit})"
    ax.set_title(f'{title} {unit}', fontdict={"fontsize": 20, 'weight': 'bold'})

    plt.tight_layout()
    return fig
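The drawing logic of `bar_chart` can be exercised standalone with a toy dictionary. A self-contained sketch (uses matplotlib's default font instead of Roboto so it runs without the download step; the values are made up):

```python
import os
import tempfile

import matplotlib
matplotlib.use('Agg')  # render off-screen, no display needed
import numpy as np
from matplotlib import pyplot as plt

# Toy example values; the real charts use the parsed benchmark JSON.
data = {'Subject A': 816, 'Subject B': 577}

fig, ax = plt.subplots(figsize=(10, 5))
y_pos = np.arange(len(data))
ax.barh(y_pos, list(data.values()), align='center')
ax.set_yticks(y_pos, labels=list(data.keys()))
ax.invert_yaxis()  # first entry on top, matching bar_chart above
out_path = os.path.join(tempfile.mkdtemp(), 'example_chart.png')
fig.savefig(out_path)
```

Passing `list(data.values())` rather than the raw `dict_values` view avoids surprises when matplotlib converts the input to an array.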


def main():
    # Download and set up the chart font.
    path = Path(tempfile.mkdtemp())
    font_path = path / FONT_NAME
    request.urlretrieve(FONT_URL, font_path)

    font = fm.FontEntry(fname=str(font_path), name=FONT_NAME)
    fm.fontManager.ttflist.append(font)

    # Read the result files and create the diagrams.
    out_dir = configure()
    combined_results = defaultdict(dict)

    for test in SUBJECTS:
        # Read the previous results
        read_path = os.path.join(
            out_dir, '{subject}.json'.format(subject=test))
        try:
            with open(read_path, 'r') as res_file:
                combined_results[test].update(json.load(res_file))
        except OSError as e:
            raise ValueError(
                'Failed reading {subject} benchmark records: {e}'.format(subject=test, e=e))

    # Network charts
    # P2P TCP
    net_data = {}
    for s, l in zip(SUBJECTS, LEGEND_NAMES):
        net_data[l] = int(combined_results[s]['knb']['pod2pod']['tcp_bw_mbit'])
    bar_chart(data=net_data,
              title='K8S CNI Benchmark - Pod to Pod - TCP - Bandwidth',
              unit=net_unit,
              x_label=f"TCP Bandwidth in {net_unit} - Higher is better")
    save_name = os.path.join(out_dir, 'benchmark_net_p2p_tcp.png')
    plt.savefig(save_name)

    # P2P UDP
    net_data = {}
    for s, l in zip(SUBJECTS, LEGEND_NAMES):
        net_data[l] = int(combined_results[s]['knb']['pod2pod']['udp_bw_mbit'])
    bar_chart(data=net_data,
              title='K8S CNI Benchmark - Pod to Pod - UDP - Bandwidth',
              unit=net_unit,
              x_label=f"UDP Bandwidth in {net_unit} - Higher is better")
    save_name = os.path.join(out_dir, 'benchmark_net_p2p_udp.png')
    plt.savefig(save_name)

    # P2SVC TCP
    net_data = {}
    for s, l in zip(SUBJECTS, LEGEND_NAMES):
        net_data[l] = int(combined_results[s]['knb']['pod2svc']['tcp_bw_mbit'])
    bar_chart(data=net_data,
              title='K8S CNI Benchmark - Pod to Service - TCP - Bandwidth',
              unit=net_unit,
              x_label=f"TCP Bandwidth in {net_unit} - Higher is better")
    save_name = os.path.join(out_dir, 'benchmark_net_p2svc_tcp.png')
    plt.savefig(save_name)

    # P2SVC UDP
    net_data = {}
    for s, l in zip(SUBJECTS, LEGEND_NAMES):
        net_data[l] = int(combined_results[s]['knb']['pod2svc']['udp_bw_mbit'])
    bar_chart(data=net_data,
              title='K8S CNI Benchmark - Pod to Service - UDP - Bandwidth',
              unit=net_unit,
              x_label=f"UDP Bandwidth in {net_unit} - Higher is better")
    save_name = os.path.join(out_dir, 'benchmark_net_p2svc_udp.png')
    plt.savefig(save_name)

    # FIO charts
    # Read IOPS
    fio_data = {}
    for s, l in zip(SUBJECTS, LEGEND_NAMES):
        fio_data[l] = int(combined_results[s]['fio']['read_iops']['iops'])
    bar_chart(data=fio_data,
              title='FIO Benchmark - Read - IOPS',
              x_label=f"Read {fio_iops_unit} - Higher is better")
    save_name = os.path.join(out_dir, 'benchmark_fio_read_iops.png')
    plt.savefig(save_name)

    # Write IOPS
    fio_data = {}
    for s, l in zip(SUBJECTS, LEGEND_NAMES):
        fio_data[l] = int(combined_results[s]['fio']['write_iops']['iops'])
    bar_chart(data=fio_data,
              title='FIO Benchmark - Write - IOPS',
              x_label=f"Write {fio_iops_unit} - Higher is better")
    save_name = os.path.join(out_dir, 'benchmark_fio_write_iops.png')
    plt.savefig(save_name)

    # Read Bandwidth
    fio_data = {}
    for s, l in zip(SUBJECTS, LEGEND_NAMES):
        fio_data[l] = int(combined_results[s]['fio']['read_bw']['bw_kbytes'])
    bar_chart(data=fio_data,
              title='FIO Benchmark - Read - Bandwidth',
              unit=fio_bw_unit,
              x_label=f"Read Bandwidth in {fio_bw_unit} - Higher is better")
    save_name = os.path.join(out_dir, 'benchmark_fio_read_bw.png')
    plt.savefig(save_name)

    # Write Bandwidth
    fio_data = {}
    for s, l in zip(SUBJECTS, LEGEND_NAMES):
        fio_data[l] = int(combined_results[s]['fio']['write_bw']['bw_kbytes'])
    bar_chart(data=fio_data,
              title='FIO Benchmark - Write - Bandwidth',
              unit=fio_bw_unit,
              x_label=f"Write Bandwidth in {fio_bw_unit} - Higher is better")
    save_name = os.path.join(out_dir, 'benchmark_fio_write_bw.png')
    plt.savefig(save_name)


if __name__ == '__main__':
    main()
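The dictionary lookups above imply a per-subject result schema. A minimal sketch with made-up numbers (for illustration only) showing the layout and how one chart's data dictionary is assembled:

```python
# Hypothetical result file content; the key layout mirrors the lookups in main().
sample_result = {
    'knb': {
        'pod2pod': {'tcp_bw_mbit': 850, 'udp_bw_mbit': 700},
        'pod2svc': {'tcp_bw_mbit': 840, 'udp_bw_mbit': 690},
    },
    'fio': {
        'read_iops': {'iops': 550}, 'write_iops': {'iops': 500},
        'read_bw': {'bw_kbytes': 60000}, 'write_bw': {'bw_kbytes': 58000},
    },
}

combined_results = {'AKS': sample_result}
legend = {'AKS': 'AKS'}

# Same extraction step as the P2P TCP chart in main()
net_data = {legend[s]: int(combined_results[s]['knb']['pod2pod']['tcp_bw_mbit'])
            for s in combined_results}
print(net_data)  # {'AKS': 850}
```

Any result file missing one of these keys raises a `KeyError` during chart creation, so `parse.py` must emit all of them.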

.github/actions/e2e_benchmark/evaluate/parse.py

@@ -37,6 +37,7 @@ def configure() -> Tuple[str, str, str, str | None, str, str, str, str]:
    workflow = os.environ.get('GITHUB_WORKFLOW', 'N/A')
    return base_path, csp, out_dir, ext_provider_name, commit_hash, commit_ref, actor, workflow


class BenchmarkParser:
    def __init__(self, base_path, csp, out_dir, ext_provider_name=None, commit_hash="N/A", commit_ref="N/A", actor="N/A", workflow="N/A"):
        self.base_path = base_path
@@ -50,7 +51,6 @@ class BenchmarkParser:
        self.actor = actor
        self.workflow = workflow

    def parse(self) -> None:
        """Read and parse the K-Bench tests.

@@ -102,8 +102,10 @@ class BenchmarkParser:

def main():
    base_path, csp, out_dir, ext_provider_name, commit_hash, commit_ref, actor, workflow = configure()
    p = BenchmarkParser(base_path, csp, out_dir, ext_provider_name,
                        commit_hash, commit_ref, actor, workflow)
    p.parse()


if __name__ == '__main__':
    main()
.github/actions/e2e_benchmark/evaluate/requirements.txt
@@ -0,0 +1,3 @@
numpy == 1.24.2
matplotlib == 3.7.0
Pillow == 9.4.0

Binary files added:
- docs/docs/_media/benchmark_fio_read_bw.png
- docs/docs/_media/benchmark_fio_read_iops.png
- docs/docs/_media/benchmark_fio_write_bw.png
- docs/docs/_media/benchmark_fio_write_iops.png
- docs/docs/_media/benchmark_net_p2p_tcp.png
- docs/docs/_media/benchmark_net_p2p_udp.png
- docs/docs/_media/benchmark_net_p2svc_tcp.png
- docs/docs/_media/benchmark_net_p2svc_udp.png
- docs/docs/_media/benchmark_p2p_concept.webp
- docs/docs/_media/benchmark_p2svc_concept.webp

@@ -6,84 +6,98 @@ This section analyzes the performance of Constellation.

All nodes in a Constellation cluster run inside Confidential VMs (CVMs). Thus, Constellation's performance is directly affected by the performance of CVMs.

AMD and Azure jointly released a [performance benchmark](https://community.amd.com/t5/business/microsoft-azure-confidential-computing-powered-by-3rd-gen-epyc/ba-p/497796) for CVMs based on 3rd Gen AMD EPYC processors (Milan) with SEV-SNP. With a range of mostly compute-intensive benchmarks like SPEC CPU 2017 and CoreMark, they found that CVMs only have a small (2%--8%) performance degradation compared to standard VMs. You can expect to see similar performance for compute-intensive workloads running with Constellation on Azure.

Similarly, AMD and Google jointly released a [performance benchmark](https://www.amd.com/system/files/documents/3rd-gen-epyc-gcp-c2d-conf-compute-perf-brief.pdf) for CVMs based on 3rd Gen AMD EPYC processors (Milan) with SEV-SNP. With high performance computing workloads like WRF, NAMD, Ansys CFS, and Ansys LS_DYNA, they found similar results with only small (2%--4%) performance degradation compared to standard VMs. You can expect to see similar performance for compute-intensive workloads running with Constellation on GCP.

## Performance analysis of I/O and network

To assess the overall performance of Constellation, we benchmarked Constellation v2.6.0 in terms of storage I/O using [FIO via Kubestr](https://github.com/kastenhq/kubestr), and network performance using the [Kubernetes Network Benchmark](https://github.com/InfraBuilder/k8s-bench-suite#knb--kubernetes-network-be).

As a baseline, we compare Constellation with the non-confidential managed Kubernetes offerings on Microsoft Azure and Google Cloud Platform (GCP). These are AKS on Azure and GKE on GCP.

### Configurations

We ran the benchmark with Constellation v2.6.0 and Kubernetes v1.25.7.
Cilium v1.12 was used for encrypted networking via eBPF and WireGuard.
For storage we utilized Constellation's [Azure Disk CSI driver with encryption](https://github.com/edgelesssys/constellation-azuredisk-csi-driver) v1.1.2 on Azure and Constellation's [GCP Persistent Disk CSI Driver with encryption](https://github.com/edgelesssys/constellation-gcp-compute-persistent-disk-csi-driver) v1.1.2 on GCP.

We ran the benchmark on AKS with Kubernetes `v1.24.9` and nodes with version `AKSUbuntu-1804gen2containerd-2023.02.15`.
On GKE we used Kubernetes `v1.24.9` and nodes with version `1.24.9-gke.3200`.

We used the following infrastructure configurations for the benchmarks.

#### Constellation Azure

- Nodes: 3 (1 Control-plane, 2 Worker)
- Machines: `DC4as_v5`: 3rd Generation AMD EPYC 7763v (Milan) processor with 4 Cores, 16 GiB memory
- CVM: `true`
- Region: `West US`
- Zone: `2`

#### Constellation GCP

- Nodes: 3 (1 Control-plane, 2 Worker)
- Machines: `n2d-standard-4`: 2nd Generation AMD EPYC (Rome) processor with 4 Cores, 16 GiB of memory
- CVM: `true`
- Zone: `europe-west3-b`

#### AKS

- Nodes: 2 (2 Worker)
- Machines: `D4as_v5`: 3rd Generation AMD EPYC 7763v (Milan) processor with 4 Cores, 16 GiB memory
- CVM: `false`
- Region: `West US`
- Zone: `2`

#### GKE

- Nodes: 2 (2 Worker)
- Machines: `n2d-standard-4`: 2nd Generation AMD EPYC (Rome) processor with 4 Cores, 16 GiB of memory
- CVM: `false`
- Zone: `europe-west3-b`

### Results

#### Network

We conducted a thorough analysis of the network performance of Constellation, specifically focusing on measuring the bandwidth of TCP and UDP over a 10Gbit/s network.
The benchmark measured the bandwidth of pod-to-pod as well as pod-to-service connections between two different nodes.
The tests use [`iperf`](https://iperf.fr/) to measure the bandwidth.

Constellation on Azure and AKS used an MTU of 1500.
Constellation GCP and GKE used an MTU of 8896.

The difference in network throughput can largely be attributed to two factors.

1. Constellation's [network encryption](../architecture/networking.md) via Cilium and WireGuard that protects data in-transit.
2. [AMD SEV using SWIOTLB bounce buffers](https://lore.kernel.org/all/20200204193500.GA15564@ashkalra_ubuntu_server/T/) for all DMA including network I/O.

##### Pod-to-Pod

In this scenario, the client pod connects directly to the server pod via its IP address.

![Pod2Pod concept](../_media/benchmark_p2p_concept.webp)

The results for "Pod-to-Pod" TCP are as follows:

![Network Pod2Pod TCP benchmark graph](../_media/benchmark_net_p2p_tcp.png)

The results for "Pod-to-Pod" UDP are as follows:

![Network Pod2Pod UDP benchmark graph](../_media/benchmark_net_p2p_udp.png)

##### Pod-to-Service

In this scenario, the client pod connects to the server pod via a ClusterIP service. This is more relevant to real-world use cases.

The results for "Pod-to-Service" TCP are as follows:

![Network Pod2SVC TCP benchmark graph](../_media/benchmark_net_p2svc_tcp.png)

The results for "Pod-to-Service" UDP are as follows:

![Network Pod2SVC UDP benchmark graph](../_media/benchmark_net_p2svc_udp.png)

#### Storage I/O

@@ -92,12 +106,37 @@ Upon requesting persistent storage through a PVC, GKE and AKS will provision a P
Constellation provides persistent storage on Azure and GCP [that's encrypted on the CSI layer](../architecture/encrypted-storage.md).
Similarly, Constellation will provision a PV via a default storage class upon a PVC request.

For Constellation on Azure and AKS we ran the benchmark with Azure Disk Storage [Standard SSD](https://learn.microsoft.com/en-us/azure/virtual-machines/disks-types#standard-ssds) of 400GB size.
With our DC4as machine type with 4 cores, standard-ssd provides the following maximum performance:

- 500 (600 burst) IOPS
- 60 MB/s (150 MB/s burst) throughput

For Constellation on GCP and GKE we ran the benchmark with Google Persistent Disk Storage [pd-standard](https://cloud.google.com/compute/docs/disks) of 400GB size.
With our N2D machine type with 4 cores, pd-standard provides the following [maximum performance](https://cloud.google.com/compute/docs/disks/performance#n2d_vms):

- 15,000 write IOPS
- 3,000 read IOPS
- 240 MB/s write throughput
- 240 MB/s read throughput

The [`fio`](https://fio.readthedocs.io/en/latest/fio_doc.html) benchmark consists of several tests.
We selected a test that performs asynchronous access patterns because we believe it most accurately depicts real-world I/O access for most applications.
We measured IOPS as well as read and write bandwidth.

The results for "Async Read" IOPS are as follows:

![I/O read IOPS benchmark graph](../_media/benchmark_fio_read_iops.png)

The results for "Async Write" IOPS are as follows:

![I/O write IOPS benchmark graph](../_media/benchmark_fio_write_iops.png)

The results for "Async Read" bandwidth are as follows:

![I/O read bandwidth benchmark graph](../_media/benchmark_fio_read_bw.png)

The results for "Async Write" bandwidth are as follows:

![I/O write bandwidth benchmark graph](../_media/benchmark_fio_write_bw.png)

Comparing Constellation on GCP with GKE, you see that Constellation offers similar read/write speeds in all scenarios.
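As a back-of-the-envelope check, the relation between the IOPS and throughput limits listed above depends on the I/O block size. A small sketch (the 4 KiB and 1 MiB block sizes are illustrative assumptions, not the benchmark's actual fio configuration):

```python
def throughput_mb_s(iops: int, block_size_bytes: int) -> float:
    """Throughput in MB/s implied by an IOPS limit at a given block size."""
    return iops * block_size_bytes / 1_000_000

# Azure standard-ssd limit from above: 500 IOPS
print(throughput_mb_s(500, 4096))       # 2.048 MB/s at 4 KiB blocks: IOPS-bound
print(throughput_mb_s(500, 1_048_576))  # 524.288 MB/s at 1 MiB blocks: the 60 MB/s cap binds first
```

This is why IOPS and bandwidth are reported as separate charts: small-block workloads hit the IOPS ceiling long before the throughput ceiling, and vice versa for large blocks.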