update docs

James Ravenscroft 2023-08-05 09:21:06 +01:00
parent 430733c7b8
commit 57ab06d457
2 changed files with 81 additions and 3 deletions

MODELS.md Normal file

@@ -0,0 +1,70 @@
# Models Directory
## "Coder" family models
WizardCoder, StarCoder, and SantaCoder are the current "state-of-the-art" autocomplete models.
### SantaCoder (Best Small model)
[SantaCoder](https://huggingface.co/bigcode/santacoder) is a smaller member of the StarCoder and WizardCoder family, with only 1.1 billion parameters. The model is trained with a fill-in-the-middle objective, which allows it to auto-complete function parameters.
This model is primarily trained on Python, Java, and JavaScript.
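To illustrate the fill-in-the-middle idea, the prompt is laid out as prefix, suffix, then a middle marker, and the model generates the missing middle. The sketch below is illustrative only: the exact special-token spellings vary between checkpoints, and Turbopilot assembles this prompt for you.
```
<fim-prefix>def add(<fim-suffix>):
    return a + b<fim-middle>
```
Here the model would be expected to generate the missing parameters (`a, b`), conditioned on the code both before and after the cursor.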
| Model Name | RAM Requirement | Direct Download | HF Project Link |
|---------------------|-----------------|-----------------|-----------------|
| SantaCoder | ~2GiB | [:arrow_down:](https://huggingface.co/mike-ravkine/gpt_bigcode-santacoder-GGML/resolve/main/santacoder-q4_0.bin) | [:hugs:](https://huggingface.co/mike-ravkine/gpt_bigcode-santacoder-GGML/) |
To run it in Turbopilot, set the model type flag to `-m starcoder`.
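For example, a minimal end-to-end setup might look like the following sketch (the `codegen-serve` binary name and the `-t`/`-m`/`-f` flags are taken from the README's examples; adjust paths for your system):
```
# download the pre-quantized SantaCoder weights into ./models
wget -P ./models https://huggingface.co/mike-ravkine/gpt_bigcode-santacoder-GGML/resolve/main/santacoder-q4_0.bin

# serve it with the starcoder model type on 4 threads
./codegen-serve -t 4 -m starcoder -f ./models/santacoder-q4_0.bin
```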
### WizardCoder (Best Autocomplete Performance, Compute-Hungry)
[WizardCoder](https://github.com/nlpxucan/WizardLM/tree/main/WizardCoder) is the current SOTA autocomplete model. It is an updated version of StarCoder that achieves 57.1 pass@1 on the HumanEval benchmark, meaning that it correctly solves roughly 57% of the challenges on the first attempt (read more about how this metric works in the paper [here](https://arxiv.org/pdf/2107.03374.pdf)).
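For context, the metric from the paper linked above works by sampling $n$ completions per problem, counting the $c$ that pass the unit tests, and averaging this unbiased estimator over all problems:

$$
\text{pass@}k = \mathbb{E}_{\text{problems}}\left[ 1 - \frac{\binom{n-c}{k}}{\binom{n}{k}} \right]
$$

With $k=1$ this reduces to the average fraction of sampled solutions that pass, which is why a pass@1 of 57.1 reads as "solves roughly 57% of challenges on the first try".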
Even when quantized, WizardCoder is a large model that takes up a significant amount of RAM.
| Model Name | RAM Requirement | Direct Download | HF Project Link |
|---------------------|-----------------|-----------------|-----------------|
| WizardCoder | ~12GiB | [:arrow_down:](https://huggingface.co/TheBloke/WizardCoder-15B-1.0-GGML/resolve/main/WizardCoder-15B-1.0.ggmlv3.q4_0.bin) | [:hugs:](https://huggingface.co/TheBloke/WizardCoder-15B-1.0-GGML/) |
To run it in Turbopilot, set the model type flag to `-m starcoder`.
### StarCoder
[StarCoder](https://huggingface.co/blog/starcoder) was the state-of-the-art coding model back in May 2023. It is still a reasonably good model, but it is similar in size and has similar RAM and compute requirements to WizardCoder, so you may be better off just running that. Links are provided below for posterity.
| Model Name | RAM Requirement | Direct Download | HF Project Link |
|---------------------|-----------------|------------------|-----------------|
| StarCoder | ~12GiB | [:arrow_down:](https://huggingface.co/NeoDim/starcoder-GGML/resolve/main/starcoder-ggml-q4_0.bin) | [:hugs:](https://huggingface.co/NeoDim/starcoder-GGML/) |
| StarCoder Plus | ~12GiB | [:arrow_down:](https://huggingface.co/TheBloke/starcoderplus-GGML/resolve/main/starcoderplus.ggmlv3.q4_0.bin) | [:hugs:](https://huggingface.co/TheBloke/starcoderplus-GGML/) |
To run these in Turbopilot, set the model type flag to `-m starcoder`.
## CodeGen 1.0
The CodeGen models were the first models supported by Turbopilot. They perform less well than the newer WizardCoder, StarCoder, and SantaCoder models.
The `multi` flavour models can provide auto-complete suggestions for `C`, `C++`, `Go`, `Java`, `JavaScript`, and `Python`.
The `mono` flavour models can provide auto-complete suggestions for `Python` only (but the quality of Python-specific suggestions may be higher).
Pre-converted and pre-quantized models are available for download below:
| Model Name | RAM Requirement | Supported Languages | Direct Download | HF Project Link |
|---------------------|-----------------|---------------------------|-----------------|-----------------|
| CodeGen 350M multi | ~800MiB | `C`, `C++`, `Go`, `Java`, `JavaScript`, `Python` | [:arrow_down:](https://huggingface.co/ravenscroftj/CodeGen-350M-multi-ggml-quant/resolve/main/codegen-350M-multi-ggml-4bit-quant.bin) | [:hugs:](https://huggingface.co/ravenscroftj/CodeGen-350M-multi-ggml-quant) |
| CodeGen 350M mono | ~800MiB | `Python` | [:arrow_down:](https://huggingface.co/Guglielmo/CodeGen-350M-mono-ggml-quant/resolve/main/ggml-model-quant.bin) | [:hugs:](https://huggingface.co/Guglielmo/CodeGen-350M-mono-ggml-quant) |
| CodeGen 2B multi | ~4GiB | `C`, `C++`, `Go`, `Java`, `JavaScript`, `Python` | [:arrow_down:](https://huggingface.co/ravenscroftj/CodeGen-2B-multi-ggml-quant/resolve/main/codegen-2B-multi-ggml-4bit-quant.bin) | [:hugs:](https://huggingface.co/ravenscroftj/CodeGen-2B-multi-ggml-quant) |
| CodeGen 2B mono | ~4GiB | `Python` | [:arrow_down:](https://huggingface.co/Guglielmo/CodeGen-2B-mono-ggml-quant/resolve/main/ggml-model-quant.bin) | [:hugs:](https://huggingface.co/Guglielmo/CodeGen-2B-mono-ggml-quant/) |
| CodeGen 6B multi | ~8GiB | `C`, `C++`, `Go`, `Java`, `JavaScript`, `Python` | [:arrow_down:](https://huggingface.co/ravenscroftj/CodeGen-6B-multi-ggml-quant/resolve/main/codegen-6B-multi-ggml-4bit-quant.bin) | [:hugs:](https://huggingface.co/ravenscroftj/CodeGen-6B-multi-ggml-quant) |
| CodeGen 6B mono | ~8GiB | `Python` | [:arrow_down:](https://huggingface.co/Guglielmo/CodeGen-6B-mono-ggml-quant/resolve/main/ggml-model-quant.bin) | [:hugs:](https://huggingface.co/Guglielmo/CodeGen-6B-mono-ggml-quant/) |
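As a sketch, running one of these is the same two-step process as the newer models, swapping the model type to `codegen` (the `codegen-serve` binary name is again assumed from the README's examples):
```
# download the 4-bit quantized CodeGen 350M multi weights into ./models
wget -P ./models https://huggingface.co/ravenscroftj/CodeGen-350M-multi-ggml-quant/resolve/main/codegen-350M-multi-ggml-4bit-quant.bin

# serve it with the codegen model type
./codegen-serve -t 4 -m codegen -f ./models/codegen-350M-multi-ggml-4bit-quant.bin
```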


@@ -34,6 +34,12 @@ You have 2 options for getting the model
You can download the pre-converted, pre-quantized models from Huggingface.
For low-RAM users (4-8 GiB), I recommend [SantaCoder](https://huggingface.co/mike-ravkine/gpt_bigcode-santacoder-GGML/resolve/main/santacoder-q4_0.bin), and for high-power users (16+ GiB RAM, a discrete GPU, or Apple Silicon) I recommend [WizardCoder](https://huggingface.co/TheBloke/WizardCoder-15B-1.0-GGML/resolve/main/WizardCoder-15B-1.0.ggmlv3.q4_0.bin).
Turbopilot still supports the first-generation codegen models from `v0.0.5` and earlier builds, although old models do need to be re-quantized.
You can find a full catalogue of models in [MODELS.md](MODELS.md).
#### Option B: Convert The Models Yourself - Hard, More Flexible
@@ -57,7 +63,7 @@ If you have a multi-core system you can control how many CPUs are used with the
./codegen-serve -t 6 -m starcoder -f ./models/santacoder-q4_0.bin
```
To run the legacy codegen models, just change the model type flag `-m` to `codegen` instead.
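For example, mirroring the command above (the model file name is taken from the catalogue in [MODELS.md](MODELS.md)):
```
./codegen-serve -t 6 -m codegen -f ./models/codegen-2B-multi-ggml-4bit-quant.bin
```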
**NOTE: the latest version of GGML requires that you re-quantize your codegen models. Old models downloaded from here will no longer work. I am working on providing updated quantized codegen models**
@@ -87,10 +93,12 @@ docker run --gpus=all --rm -it \
-e THREADS=6 \
-e MODEL="/models/codegen-2B-multi-ggml-4bit-quant.bin" \
-p 18080:18080 \
ghcr.io/ravenscroftj/turbopilot:v1.0.0-cuda11
```
Swap `ghcr.io/ravenscroftj/turbopilot:v1.0.0-cuda11` for `ghcr.io/ravenscroftj/turbopilot:v1.0.0-cuda12` if you are using CUDA 12 or later.
You will need CUDA 11 or CUDA 12 to run this container. You should be able to see `/app/turbopilot` listed when you run `nvidia-smi`.
#### Executable and CUDA