TurboPilot
TurboPilot is a self-hosted Copilot clone which uses the library behind llama.cpp to run the 6 billion parameter Salesforce Codegen model in 4GiB of RAM. It is heavily based on and inspired by the fauxpilot project.
NB: This is a proof of concept right now rather than a stable tool. Autocompletion is quite slow in this version of the project. Feel free to play with it, but your mileage may vary.
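As a rough sanity check on the 4GiB figure, here is a back-of-the-envelope sketch of how much space the weights take at different precisions. It assumes a nominal 6,000,000,000 parameters and a 4-bit block format that stores one 32-bit scale per 32 weights (about 5 bits per weight on average). Both numbers are assumptions for illustration, not measurements from this project, and the estimate ignores activations and other runtime buffers.

```python
# Rough estimate of weight storage for a model at different precisions.
# Assumption: the 4-bit format packs 32 weights plus one 32-bit scale
# into 20 bytes, i.e. 5 bits per weight on average.

GIB = 1024 ** 3


def estimate_weight_bytes(n_params: int, bits_per_weight: float) -> float:
    """Return the approximate number of bytes needed to store the weights."""
    return n_params * bits_per_weight / 8


def to_gib(n_bytes: float) -> float:
    """Convert a byte count to GiB."""
    return n_bytes / GIB


if __name__ == "__main__":
    n_params = 6_000_000_000  # nominal "6B"; the real parameter count may differ
    for label, bits in [("f32", 32.0), ("f16", 16.0), ("4-bit + scales", 5.0)]:
        size = to_gib(estimate_weight_bytes(n_params, bits))
        print(f"{label:>15}: {size:.1f} GiB")
```

Under these assumptions only the 4-bit variant comes in below 4GiB, which is why the quantization step further down matters.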
Getting Started
git clone https://github.com/ravenscroftj/turbopilot
cd turbopilot
git submodule update --init
cd ggml
mkdir build
cd build
cmake ..
make codegen codegen-quantize
Getting The Models
Direct Download
You can download the pre-converted, pre-quantized models from Google Drive. I've made the multi flavour models with 2B and 6B parameters available - these models are pre-trained on C, C++, Go, Java, JavaScript, and Python.
Convert The Models Yourself
Start by downloading either the 2B or 6B GPT-J versions of CodeGen.
You could also experiment with the other model sizes, such as 16B, or try the mono models (2B, 6B, 16B), which are fine-tuned on Python only but outperform the multi models in some cases (see the original paper for details).
You will also need to place vocab.json and added_tokens.json in the directory along with the model to make the conversion script work. This is a temporary limitation that I'll remove at some point.
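Since the conversion script depends on these extra files, a small pre-flight check can save a failed run. The sketch below is not part of the project: the function name is hypothetical, and the assumption that the weights appear as at least one *.bin file directly inside the model directory is inferred from the rest of this README.

```python
from pathlib import Path

# File names taken from the text above.
REQUIRED_FILES = ("vocab.json", "added_tokens.json")


def missing_conversion_inputs(model_dir: str) -> list[str]:
    """Return the names (or patterns) of inputs that are absent from model_dir."""
    root = Path(model_dir)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    # Assumption: the model weights are stored as one or more *.bin files.
    if not any(root.glob("*.bin")):
        missing.append("*.bin")
    return missing


if __name__ == "__main__":
    problems = missing_conversion_inputs("./codegen-6B-multi-gptj")
    if problems:
        print("Missing before conversion:", ", ".join(problems))
    else:
        print("All conversion inputs found.")
```

An empty list means the directory has everything this README says the conversion needs.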
You can git clone directly from the Hugging Face URLs above. To save time you can disable LFS on first checkout and selectively pull the files you need (you only need the .bin files for conversion; the large .zst files are not needed). Here is an example:
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/moyix/codegen-16B-multi-gptj
cd codegen-16B-multi-gptj
git config lfs.fetchexclude "*.zst"
git lfs fetch
git lfs checkout *.bin
Install Python Dependencies
The convert-codegen-to-ggml.py script requires Python 3 - I used 3.10. Install the dependencies with pip install -r requirements.txt.
Convert The Model
python convert-codegen-to-ggml.py ./codegen-6B-multi-gptj 0
Quantize the Model
./bin/codegen-quantize ./codegen-6B-multi-gptj/ggml-model-f32.bin ./codegen-6B-multi-gptj/ggml-model-quant.bin 2
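The two steps above can be chained from a small script. This is only a sketch: the function names and the ggml_build_dir parameter are hypothetical, the relative paths assume you adjust them to wherever you run it from, and the trailing 0 and 2 are copied verbatim from the commands above. Judging by the output file names, 0 appears to select f32 output and 2 a quantized type, but that reading is an assumption.

```python
import subprocess
from pathlib import Path


def build_pipeline(model_dir: str, ggml_build_dir: str = ".") -> list[list[str]]:
    """Return the convert and quantize commands from this README as argv lists."""
    model = Path(model_dir)
    f32_path = model / "ggml-model-f32.bin"
    quant_path = model / "ggml-model-quant.bin"
    # Assumption: the quantize binary lives in <ggml_build_dir>/bin.
    quantize_bin = Path(ggml_build_dir) / "bin" / "codegen-quantize"
    return [
        ["python", "convert-codegen-to-ggml.py", str(model), "0"],
        [str(quantize_bin), str(f32_path), str(quant_path), "2"],
    ]


def run_pipeline(model_dir: str, ggml_build_dir: str = ".") -> None:
    """Run each step in order, stopping at the first failure."""
    for argv in build_pipeline(model_dir, ggml_build_dir):
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be inspected first.
    for argv in build_pipeline("./codegen-6B-multi-gptj"):
        print(" ".join(argv))
```

Keeping command construction separate from execution makes it easy to check the paths before starting a long conversion.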
Acknowledgements
- This project would not have been possible without Georgi Gerganov's work on GGML and llama.cpp.
- It was completely inspired by fauxpilot, which I experimented with for a little while before deciding to try to make the models work without a GPU.
- The frontend of the project is powered by Venthe's vscode-fauxpilot plugin.
- The project uses the Salesforce Codegen models.
- Thanks to Moyix for his work on converting the Salesforce models to run in a GPT-J architecture. Not only does this confer some speed benefits, it also made it much easier for me to port the models to GGML using the existing gpt-j example code.
- The model server uses CrowCPP to serve suggestions.
- Check out the original scientific paper for CodeGen for more info.