Update README.md

James Ravenscroft 2023-08-26 17:11:12 +01:00 committed by GitHub
parent 86f07745bb
commit 30c437700a
GPG Key ID: 4AEE18F83AFDEB23


@@ -94,11 +94,14 @@ docker run --gpus=all --rm -it \
-e THREADS=6 \
-e MODEL_TYPE=starcoder \
-e MODEL="/models/santacoder-q4_0.bin" \
-e GPU_LAYERS=32 \
-p 18080:18080 \
ghcr.io/ravenscroftj/turbopilot:v0.1.0-cuda11
ghcr.io/ravenscroftj/turbopilot:v0.2.0-cuda11-7
```
Swap `ghcr.io/ravenscroftj/turbopilot:v0.1.0-cuda11` for `ghcr.io/ravenscroftj/turbopilot:v0.1.0-cuda12` if you are using CUDA 12 or later.
If your GPU has enough memory, setting `GPU_LAYERS` allows turbopilot to fully offload computation onto the GPU rather than copying data back and forth, dramatically speeding up inference.
Swap `ghcr.io/ravenscroftj/turbopilot:v0.2.0-cuda11-7` for `ghcr.io/ravenscroftj/turbopilot:v0.2.0-cuda12-0` or `ghcr.io/ravenscroftj/turbopilot:v0.2.0-cuda12-2` if you are using CUDA 12.0 or 12.2 respectively.
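The image choice can be scripted so the right tag is picked from your CUDA version. This is a minimal bash sketch, not part of turbopilot: `turbopilot_image` is a hypothetical helper name, and mapping every CUDA 11.x version to the `v0.2.0-cuda11-7` image is an assumption. Only the three v0.2.0 tags named in this README are covered; any other version is rejected rather than guessed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a CUDA version string (e.g. "12.2") to one of the
# v0.2.0 turbopilot images listed in this README.
# Assumption: any CUDA 11.x release can use the cuda11-7 image.
turbopilot_image() {
  local cuda_version="$1"
  local repo="ghcr.io/ravenscroftj/turbopilot"
  case "$cuda_version" in
    11|11.*)     echo "${repo}:v0.2.0-cuda11-7" ;;
    12.0|12.0.*) echo "${repo}:v0.2.0-cuda12-0" ;;
    12.2|12.2.*) echo "${repo}:v0.2.0-cuda12-2" ;;
    *)
      # Unknown version: report on stderr and fail instead of guessing a tag.
      echo "no v0.2.0 turbopilot image listed for CUDA ${cuda_version}" >&2
      return 1
      ;;
  esac
}

# Usage: put "$(turbopilot_image 12.2)" in place of the image name
# on the last line of the docker run command.
```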
You will need CUDA 11, or CUDA 12 or later, to run this container. While the container is running, you should be able to see `/app/turbopilot` listed when you run `nvidia-smi`.
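The `nvidia-smi` check can be narrowed to the compute-process list. This is a minimal sketch that assumes an NVIDIA driver whose `nvidia-smi` supports `--query-compute-apps`. `turbopilot_on_gpu` is a hypothetical helper. It reads the listing from stdin, so the matching logic can be checked without a GPU, and it works equally on the plain `nvidia-smi` table.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if /app/turbopilot appears in a GPU process
# listing read from stdin (e.g. "pid, process_name" CSV lines).
turbopilot_on_gpu() {
  grep -q '/app/turbopilot'
}

# Usage on a machine with an NVIDIA GPU, while the container is running:
#   nvidia-smi --query-compute-apps=pid,process_name --format=csv,noheader \
#     | turbopilot_on_gpu && echo "turbopilot is using the GPU"
```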
@@ -107,6 +110,8 @@ You will need CUDA 11 or CUDA 12 later to run this container. You should be able
As of v0.0.5, a CUDA version of the Linux executable is available. It requires libcublas 11 to be installed on the machine. I might build Ubuntu debs at some point, but for now running in Docker may be more convenient if you want to use a CUDA GPU.
You can use GPU offloading via the `--ngl` option.
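Before running the CUDA executable with `--ngl`, you can check that libcublas 11 is visible to the dynamic linker. This is a minimal sketch that assumes a glibc-based Linux system where `ldconfig -p` prints the linker cache. Libraries found only through `LD_LIBRARY_PATH` will not appear there. `has_libcublas11` is a hypothetical helper that reads the listing from stdin.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a linker-cache listing on stdin
# (as printed by `ldconfig -p`) contains libcublas.so.11.
has_libcublas11() {
  grep -q 'libcublas\.so\.11'
}

# Usage on the target machine:
#   ldconfig -p | has_libcublas11 && echo "libcublas 11 found; --ngl should be usable"
```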
### 🌐 Using the API
#### Support for the official Copilot Plugin
@@ -177,12 +182,7 @@ Should get you something like this:
## 👉 Known Limitations
Again, I want to set expectations: this is a proof-of-concept project. With that in mind, here are some current known limitations.
As of **v0.0.2**:
- The models can be quite slow, especially the 6B ones. It can take ~30-40 s to make suggestions across 4 CPU cores.
- I've only tested the system on Ubuntu 22.04, but I am now supplying ARM Docker images, and soon I'll be providing ARM binary releases.
- Sometimes suggestions get truncated in nonsensical places, e.g. partway through a variable name or string. This is due to a hard limit of 2048 tokens on the context length (prompt + suggestion).
- Currently Turbopilot only supports one GPU device at a time (it will not try to make use of multiple devices).
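Because the prompt and the suggestion share one 2048-token window, a long prompt leaves little room for the suggestion. The arithmetic is sketched below in shell. It assumes the whole prompt is placed in the window. `suggestion_budget` is a hypothetical helper, and real token counts depend on the model's tokenizer, not on characters.

```shell
#!/usr/bin/env bash
# Hypothetical helper: tokens left for a suggestion once the prompt
# occupies part of the 2048-token context window.
CONTEXT_LIMIT=2048

suggestion_budget() {
  prompt_tokens="$1"
  remaining=$(( CONTEXT_LIMIT - prompt_tokens ))
  # A prompt at or over the limit leaves no room for a suggestion at all.
  if [ "$remaining" -lt 0 ]; then
    remaining=0
  fi
  echo "$remaining"
}

# Example: a 1900-token prompt leaves 2048 - 1900 = 148 tokens, so the
# suggestion can stop after 148 tokens, even in the middle of an identifier.
```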
## 👏 Acknowledgements