dev-docs: describe GPU support

2025-06-27 07:30:45 -04:00 · 2024-06-18 09:48:39 +02:00 · 2024-06-18 09:48:39 +02:00 · 3462989117
commit 3462989117
parent 1e7d21e071
1 changed files with 57 additions and 0 deletions
--- a/dev-docs/howto/azure-gpu-preview.md
+++ b/dev-docs/howto/azure-gpu-preview.md
@ -0,0 +1,57 @@
 # Constellation with GPUs on Azure
 This document describes the general setup flow to run Constellation on Azure's GPU CVM preview.
 ## Prerequisites
 - A Constellation CLI built on a branch with GPU support.
 - A Constellation OS Image built on a branch with GPU support.
 - Access to the Azure H100 preview, and the capability to create the resources for Constellation in the `eastus2` region.
 ## Step-by-Step Guide
 1. Using the CLI, create a configuration file:
    ```shell-session
    constellation config generate azure
    ```
 2. Create IAM configuration as per your requirements. The region needs to be set to `eastus2`.
    ```shell-session
    constellation iam create azure --update-config --region eastus2 --resourceGroup msanft-gpu-rg --servicePrincipal msanft-gpu-sp
    ```
 3. Then, a few patches to the configuration are necessary.
    1. Replace the `.image` field's contents with the canonical URL for the node image (e.g. `ref/feat-gpu-h100-support/stream/debug/v2.17.0-pre.0.20240405174758-ec33feb41903`)
    2. Replace the `.microserviceVersion` field's contents with the CLI version used (i.e. the output of `constellation version`)
    3. Set the `nodeGroups.*.instanceTypes` fields' contents to `Standard_NCC40ads_H100_v5`, which is the GPU-enabled CVM type.
    4. Set the `attestation.azureSEVSNP.bootloaderVersion` to `7` and the `attestation.azureSEVSNP.microcodeVersion` to 62. This is due to the GPU-enabled CVMs using the "Genoa" CPU generation, which is not yet used in any other VMs on Azure and thus not reflected by our API resolving the `latest` versions.
    5. Fetch the measurements for your image:
        ```shell-session
        constellation config fetch-measurements --insecure
        ```
        The  `--insecure` flag is necessary, as the image (most likely) won't be a stable image, and is thus not signed by us.
 4. Then, continue creating a cluster as per usual:
    ```shell-session
    constellation apply
    ```
 ## Deploying a sample workload
 To deploy an examplary AI workload, we can use [this Llama-2-7B deployment](https://github.com/chenhunghan/ialacol):
 ```shell-session
 helm repo add ialacol https://chenhunghan.github.io/ialacol
 helm repo update
 helm install llama-2-7b-chat ialacol/ialacol
 kubectl port-forward svc/llama-2-7b-chat 8000:8000
 curl -X POST \
     -H 'Content-Type: application/json' \
     -d '{ "messages": [{"role": "user", "content": "How are you?"}], "model": "llama-2-7b-chat.ggmlv3.q4_0.bin", "stream": false}' \
     http://localhost:8000/v1/chat/completions
 ```
 But of course, also any other AI workload can be deployed.
 ## Known Limitations
 - Control Plane nodes currently also need to run in GPU-enabled CVMs, as `eastus2` doesn't offer any other SEV-SNP VM type, and mixing regions or attestation types is currently not supported in Constellation. If you want to use the GPU power of the control plane nodes, you should enable workloads to schedule on them.
 - We currently use the ARK returned by Azure's compatibility layer (THIM), which would not be the case in production. We would normally use the ARK given by AMD and hardcode it in `constellation-conf.yaml`, but due to the Genoa CPUs not being present anywhere else, this has not yet been done.