TurboPilot
TurboPilot is a self-hosted copilot clone which uses the library behind llama.cpp to run the 6 Billion Parameter Salesforce Codegen model in 4GiB of RAM. It is heavily based on and inspired by the fauxpilot project.
NB: This is a proof of concept right now rather than a stable tool. Autocompletion is quite slow in this version of the project. Feel free to play with it, but your mileage may vary.
Getting Started
The easiest way to try the project out is to grab the pre-processed models and then run the server in docker.
Getting The Models
Direct Download
You can download the pre-converted, pre-quantized models from Google Drive. I've made the multi flavour models with 2B and 6B parameters available - these models are pre-trained on C, C++, Go, Java, JavaScript, and Python.
Convert The Models Yourself
Follow this guide if you want to experiment with quantizing the models yourself.
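To give a feel for what quantization does, here is a minimal sketch of symmetric 4-bit quantization of a weight block. This is an illustration of the general idea only, with made-up function names; it is not GGML's actual on-disk format, which groups weights into blocks with per-block scales.

```python
import numpy as np

def quantize_4bit(weights: np.ndarray):
    """Map float weights to integers in [-7, 7] with a single shared scale.

    This is a simplified, hypothetical scheme for illustration, not the
    quantization used by the conversion guide above.
    """
    scale = np.abs(weights).max() / 7.0
    q = np.clip(np.round(weights / scale), -7, 7).astype(np.int8)
    return q, scale

def dequantize_4bit(q: np.ndarray, scale: float) -> np.ndarray:
    # Recover approximate float weights from the 4-bit integers.
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=16).astype(np.float32)
q, scale = quantize_4bit(w)
w_hat = dequantize_4bit(q, scale)
max_err = np.abs(w - w_hat).max()
```

The payoff is storage: each weight shrinks from 32 bits to 4 (plus a small per-block scale), which is how a 6B-parameter model can fit in a few GiB of RAM, at the cost of a bounded rounding error per weight.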
Running TurboPilot Server
Download the latest binary and extract it to the root project folder. If a binary is not provided for your OS, or you'd prefer to build it yourself, follow the build instructions.
Run:
./codegen-serve -m ./models/codegen-6B-multi-ggml-4bit-quant.bin
The application should start a server on port 18080.
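Once the server is up, an editor plugin or a plain HTTP client can request completions from it. The sketch below assumes a fauxpilot/OpenAI-style completion endpoint and payload (the convention the vscode-fauxpilot plugin speaks); the endpoint path and field names are assumptions, so check the project docs if they differ.

```python
import json
from urllib import request

# Hypothetical completion request against a locally running TurboPilot
# server. Endpoint path and payload fields mirror the OpenAI-style API
# used by fauxpilot; they are assumptions, not confirmed by this README.
payload = {
    "prompt": "def fibonacci(n):\n    ",
    "max_tokens": 32,
    "temperature": 0.2,
}
body = json.dumps(payload).encode("utf-8")

req = request.Request(
    "http://localhost:18080/v1/engines/codegen/completions",
    data=body,
    headers={"Content-Type": "application/json"},
)

# Uncomment once the server is actually running:
# with request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["text"])
```

Any HTTP client works the same way; the server side is just CrowCPP answering JSON over the port above.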
Acknowledgements
- This project would not have been possible without Georgi Gerganov's work on GGML and llama.cpp.
- It was completely inspired by fauxpilot, which I experimented with for a little while before deciding to try to make the models work without a GPU.
- The frontend of the project is powered by Venthe's vscode-fauxpilot plugin.
- The project uses the Salesforce Codegen models.
- Thanks to Moyix for his work on converting the Salesforce models to run in a GPT-J architecture. Not only does this confer some speed benefits, but it also made it much easier for me to port the models to GGML using the existing gpt-j example code.
- The model server uses CrowCPP to serve suggestions.
- Check out the original scientific paper for CodeGen for more info.