# Constellation with GPUs on Azure
This document describes the general setup flow to run Constellation on Azure's GPU CVM preview.
## Prerequisites
- A Constellation CLI built on a branch with GPU support.
- A Constellation OS image built on a branch with GPU support.
- Access to the Azure H100 preview, and the capability to create the resources for Constellation in the `eastus2` region.
## Step-by-Step Guide
- Using the CLI, create a configuration file:

  ```sh
  constellation config generate azure
  ```

- Create the IAM configuration as per your requirements. The region needs to be set to `eastus2`:

  ```sh
  constellation iam create azure --update-config --region eastus2 --resourceGroup msanft-gpu-rg --servicePrincipal msanft-gpu-sp
  ```
- Then, a few patches to the configuration are necessary (a sketch of the resulting config follows this list):
  - Replace the `.image` field's contents with the canonical URL for the node image (e.g. `ref/feat-gpu-h100-support/stream/debug/v2.17.0-pre.0.20240405174758-ec33feb41903`).
  - Replace the `.microserviceVersion` field's contents with the CLI version used (i.e. the output of `constellation version`).
  - Set the `nodeGroups.*.instanceType` fields' contents to `Standard_NCC40ads_H100_v5`, which is the GPU-enabled CVM type.
  - Set `attestation.azureSEVSNP.bootloaderVersion` to `7` and `attestation.azureSEVSNP.microcodeVersion` to `62`. This is necessary because the GPU-enabled CVMs use the "Genoa" CPU generation, which is not yet used by any other VMs on Azure and is thus not reflected by our API when resolving the `latest` versions.
- Fetch the measurements for your image:

  ```sh
  constellation config fetch-measurements --insecure
  ```

  The `--insecure` flag is necessary, as the image (most likely) won't be a stable image and is thus not signed by us.
- Then, continue creating a cluster as per usual:

  ```sh
  constellation apply
  ```
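For orientation, here is a minimal, abbreviated sketch of the patched fields in `constellation-conf.yaml`. The node group names and the version strings are illustrative assumptions based on a generated config and the values above; the exact layout may differ between CLI versions:

```yaml
# Abbreviated sketch of the patched fields in constellation-conf.yaml.
# Surrounding fields are omitted; node group names and version strings
# are illustrative and may differ for your build.
image: ref/feat-gpu-h100-support/stream/debug/v2.17.0-pre.0.20240405174758-ec33feb41903
microserviceVersion: v2.17.0-pre.0.20240405174758-ec33feb41903 # output of `constellation version`
nodeGroups:
  control_plane_default:
    instanceType: Standard_NCC40ads_H100_v5
  worker_default:
    instanceType: Standard_NCC40ads_H100_v5
attestation:
  azureSEVSNP:
    bootloaderVersion: 7
    microcodeVersion: 62
```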
## Deploying a sample workload
To deploy an exemplary AI workload, we can use this Llama-2-7B deployment:

```sh
helm repo add ialacol https://chenhunghan.github.io/ialacol
helm repo update
helm install llama-2-7b-chat ialacol/ialacol
kubectl port-forward svc/llama-2-7b-chat 8000:8000
curl -X POST \
  -H 'Content-Type: application/json' \
  -d '{ "messages": [{"role": "user", "content": "How are you?"}], "model": "llama-2-7b-chat.ggmlv3.q4_0.bin", "stream": false}' \
  http://localhost:8000/v1/chat/completions
```
Of course, any other AI workload can be deployed as well.
## Known Limitations
- Control-plane nodes currently also need to run in GPU-enabled CVMs, as `eastus2` doesn't offer any other SEV-SNP VM type, and mixing regions or attestation types is currently not supported in Constellation. If you want to use the GPU power of the control-plane nodes, you should allow workloads to be scheduled on them (see the sketch below).
- We currently use the ARK returned by Azure's compatibility layer (THIM), which would not be the case in production. We would normally use the ARK provided by AMD and hardcode it in `constellation-conf.yaml`, but since the Genoa CPUs are not present anywhere else, this has not yet been done.
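As a minimal sketch of the scheduling tweak, a pod can tolerate the standard Kubernetes control-plane taint. The container image and the `nvidia.com/gpu` resource name (exposed by the NVIDIA device plugin, which this sketch assumes is deployed) are illustrative assumptions, not part of this guide:

```yaml
# Hypothetical example: allow a GPU workload to be scheduled onto the
# GPU-enabled control-plane CVMs by tolerating the control-plane taint.
# Image and nvidia.com/gpu resource are illustrative assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  tolerations:
    - key: node-role.kubernetes.io/control-plane
      operator: Exists
      effect: NoSchedule
  containers:
    - name: cuda
      image: nvcr.io/nvidia/cuda:12.3.2-base-ubuntu22.04
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1
  restartPolicy: Never
```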