TurboPilot

TurboPilot is a self-hosted copilot clone which uses the library behind llama.cpp to run the 6 Billion Parameter Salesforce Codegen model in 4GiB of RAM. It is heavily based and inspired by on the fauxpilot project.

NB: This is a proof of concept right now rather than a stable tool. Autocompletion is quite slow in this version of the project. Feel free to play with it, but your mileage may vary.

Getting Started

git clone https://github.com/ravenscroftj/turbopilot
git submodule init
cd ggml
mkdir build
cd build
cmake ..
make codegen codegen-quantize

Getting The Models

Direct Download

You can download the pre-converted, pre-quantized models from Google Drive. I've made the multi flavour models with 2B and 6B parameters available - these models are pre-trained on C, C++, Go, Java, JavaScript, and Python

Convert The Models Yourself

Start by downloading either the 2B or 6B GPT-J versions of CodeGen.

You could also experiment with the other sizes of model such as 16B if you want or try the mono models (2B, 6B, 16B) which are fine-tuned on python only but which outperform the multi models in some cases (see the original paper for details).

You will also need to place vocab.json and added_tokens.json in the directory along with the model to make the conversion script work. This is a temporary limitation that I'll remove at some point.

You can directly git clone from huggingface URLS above. To save time you can disable LFS on first checkout and selectively pull the files you need (you only need the .bin files for conversion. The large .zst files are not needed). Here is an example:

GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/moyix/codegen-16B-multi-gptj
git config lfs.fetchexclude "*.zst"
git lfs fetch
git lfs checkout *.bin

Install Python Dependencies

The convert-codegen-to-ggml.py requires Python 3 - I used 3.10. Install the dependencies with pip install -r requirements.txt.

Convert The Model

python convert-codegen-to-ggml.py ./codegen-6B-multi-gptj 0

Quantize the Model

./bin/codegen-quantize ./codegen-6B-multi-gptj/ggml-model-f32.bin ./codegen-6B-multi-gptj/ggml-model-quant.bin 2

Acknowledgements

This project would not have been possible without Georgi Gerganov's work on GGML and llama.cpp
It was completely inspired by fauxpilot which I did experiment with for a little while but wanted to try to make the models work without a GPU
The frontend of the project is powered by Venthe's vscode-fauxpilot plugin
The project uses the Salesforce Codegen models.
Thanks to Moyix for his work on converting the Salesforce models to run in a GPT-J architecture. Not only does this confer some speed benefits but it also made it much easier for me to port the models to GGML using the existing gpt-j example code
The model server uses CrowCPP to serve suggestions.
Check out the original scientific paper for CodeGen for more info.

4.2 KiB Raw Blame History