From 6ee2d3dc6688a8eef1f4d91e60d43e5abcf4c504 Mon Sep 17 00:00:00 2001
From: James Ravenscroft
Date: Thu, 10 Aug 2023 08:57:19 +0100
Subject: [PATCH] update readme

---
 README.md | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 6ca590a..572224e 100644
--- a/README.md
+++ b/README.md
@@ -66,7 +66,7 @@ If you have a multi-core system you can control how many CPUs are used with the
 
 To run the legacy codegen models. Just change the model type flag `-m` to `codegen` instead.
 
-**NOTE: the latest version of GGML requires that you re-quantize your codegen models. Old models downloaded from here will no longer work. I am working on providing updated quantized codegen models**
+**NOTE: Turbopilot 0.1.0 and newer require you to re-quantize your codegen models; old models from v0.0.5 and earlier will no longer work. I am working on providing updated quantized codegen models.**
 
 ### 📦 Running From Docker
 
@@ -92,7 +92,8 @@ As of release v0.0.5 turbocode now supports CUDA inference. In order to run the
 docker run --gpus=all --rm -it \
   -v ./models:/models \
   -e THREADS=6 \
-  -e MODEL="/models/codegen-2B-multi-ggml-4bit-quant.bin" \
+  -e MODEL_TYPE=starcoder \
+  -e MODEL="/models/santacoder-q4_0.bin" \
   -p 18080:18080 \
   ghcr.io/ravenscroftj/turbopilot:v0.1.0-cuda11
 ```
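
For anyone trying out the updated invocation, a minimal smoke test might look like the sketch below. It assumes the container started with the patched `docker run` command above is running with port 18080 mapped as shown, and that the server exposes a Copilot-style completion endpoint at `/v1/engines/codegen/completions`; the endpoint path and payload fields are assumptions based on typical Copilot-compatible servers, not something this patch confirms.

```bash
# Smoke test for the patched docker command above (a sketch, not confirmed by this patch).
# Assumptions: the container is up, port 18080 is mapped as in the README,
# and completions are served at a Copilot-style /v1/engines/codegen/completions path.
curl --request POST \
  --url http://localhost:18080/v1/engines/codegen/completions \
  --header 'Content-Type: application/json' \
  --data '{"prompt": "def fibonacci(n):", "max_tokens": 32}'
```

If the model loaded correctly, the response should contain a JSON completion for the prompt; an empty reply or connection error usually means the `MODEL` path or `MODEL_TYPE` value does not match the model file mounted into `/models`.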