mirror of
https://github.com/ravenscroftj/turbopilot.git
synced 2024-10-01 01:06:01 -04:00
2.6 KiB
2.6 KiB
TurboPilot
TurboPilot is a self-hosted copilot clone which uses the library behind llama.cpp to run the 6 Billion Parameter Salesforce Codegen model in 4GiB of RAM. It is heavily based and inspired by on the fauxpilot project.
NB: This is a proof of concept right now rather than a stable tool. Autocompletion is quite slow in this version of the project. Feel free to play with it, but your mileage may vary.
Getting Started
git clone https://github.com/ravenscroftj/turbopilot
git submodule init
cd ggml
mkdir build
cd build
cmake ..
make codegen codegen-quantize
Getting The Models
Direct Access
You can download the pre-converted, pre-quantized models from Google Drive
Convert The Models Yourself
Start by downloading either the 2B or 6B GPT-J versions of CodeGen.
Convert The Model
python convert-codegen-to-ggml.py ./codegen-6B-multi-gptj 0
Quantize the model
./bin/codegen-quantize ../../codegen-6B-multi-gptj/ggml-model-f32.bin ../../codegen-6B-multi-gptj/ggml-model-quant.bin 2
Run the model
./bin/codegen -t 6 -m ../../codegen-6B-multi-gptj/ggml-model-quant.bin -p "def main("
Acknowledgements
- This project would not have been possible without Georgi Gerganov's work on GGML and llama.cpp
- It was completely inspired by fauxpilot which I did experiment with for a little while but wanted to try to make the models work without a GPU
- The frontend of the project is powered by Venthe's vscode-fauxpilot plugin
- The project uses the Salesforce Codegen models.
- Thanks to Moyix for his work on converting the Salesforce models to run in a GPT-J architecture. Not only does this confer some speed benefits but it also made it much easier for me to port the models to GGML using the existing gpt-j example code
- The model server uses CrowCPP to serve suggestions.