TurboPilot
TurboPilot is a self-hosted copilot clone which uses the library behind llama.cpp to run the 6 Billion Parameter Salesforce Codegen model in 4GiB of RAM. It is heavily based on and inspired by the fauxpilot project.
NB: This is a proof of concept right now rather than a stable tool. Autocompletion is quite slow in this version of the project. Feel free to play with it, but your mileage may vary.
Getting Started
The easiest way to try the project out is to grab the pre-processed models and then run the server in docker.
Getting The Models
Direct Download
You can download the pre-converted, pre-quantized models from Google Drive. I've made the multi flavour models with 2B and 6B parameters available - these models are pre-trained on C, C++, Go, Java, JavaScript, and Python.
Convert The Models Yourself
Follow this guide if you want to experiment with quantizing the models yourself.
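To give a feel for what quantization does, here is a minimal sketch of symmetric 4-bit quantization of a weight block. This is an illustration of the general idea only, with made-up function names; it is not GGML's actual on-disk format, which groups weights into blocks with per-block scales.

```python
import numpy as np

def quantize_4bit(weights: np.ndarray):
    """Map float weights to integers in [-7, 7] with a single shared scale.

    This is a simplified, hypothetical scheme for illustration, not the
    quantization used by the conversion guide above.
    """
    scale = np.abs(weights).max() / 7.0
    q = np.clip(np.round(weights / scale), -7, 7).astype(np.int8)
    return q, scale

def dequantize_4bit(q: np.ndarray, scale: float) -> np.ndarray:
    # Recover approximate float weights from the 4-bit integers.
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=16).astype(np.float32)
q, scale = quantize_4bit(w)
w_hat = dequantize_4bit(q, scale)
max_err = np.abs(w - w_hat).max()
```

The payoff is storage: each weight shrinks from 32 bits to 4 (plus a small per-block scale), which is how a 6B-parameter model can fit in a few GiB of RAM, at the cost of a bounded rounding error per weight.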
Running TurboPilot Server
Download the latest binary and extract it to the root project folder. If a binary is not provided for your OS, or you'd prefer to build it yourself, follow the build instructions.
Run:
./codegen-serve -m ./models/codegen-6B-multi-ggml-4bit-quant.bin
The application should start a server on port 18080.
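Once the server is up, an editor plugin or a plain HTTP client can request completions from it. The sketch below assumes a fauxpilot/OpenAI-style completion endpoint and payload (the convention the vscode-fauxpilot plugin speaks); the endpoint path and field names are assumptions, so check the project docs if they differ.

```python
import json
from urllib import request

# Hypothetical completion request against a locally running TurboPilot
# server. Endpoint path and payload fields mirror the OpenAI-style API
# used by fauxpilot; they are assumptions, not confirmed by this README.
payload = {
    "prompt": "def fibonacci(n):\n    ",
    "max_tokens": 32,
    "temperature": 0.2,
}
body = json.dumps(payload).encode("utf-8")

req = request.Request(
    "http://localhost:18080/v1/engines/codegen/completions",
    data=body,
    headers={"Content-Type": "application/json"},
)

# Uncomment once the server is actually running:
# with request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["text"])
```

Any HTTP client works the same way; the server side is just CrowCPP answering JSON over the port above.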
Acknowledgements
- This project would not have been possible without Georgi Gerganov's work on GGML and llama.cpp.
- It was completely inspired by fauxpilot, which I experimented with for a little while before deciding to try to make the models work without a GPU.
- The frontend of the project is powered by Venthe's vscode-fauxpilot plugin.
- The project uses the Salesforce Codegen models.
- Thanks to Moyix for his work on converting the Salesforce models to run in a GPT-J architecture. Not only does this confer some speed benefits, but it also made it much easier for me to port the models to GGML using the existing gpt-j example code.
- The model server uses CrowCPP to serve suggestions.
- Check out the original scientific paper for CodeGen for more info.