
TurboPilot

TurboPilot is a self-hosted Copilot clone that uses the library behind llama.cpp to run the 6-billion-parameter Salesforce CodeGen model in 4GiB of RAM. It is heavily based on and inspired by the fauxpilot project.

NB: This is a proof of concept right now rather than a stable tool. Autocompletion is quite slow in this version of the project. Feel free to play with it, but your mileage may vary.

[Screen recording of TurboPilot running through the fauxpilot plugin]

Getting Started

The easiest way to try the project out is to grab the pre-processed models and then run the server in docker.

Getting The Models

Direct Download

You can download the pre-converted, pre-quantized models from Google Drive. I've made the "multi" flavour models with 2B and 6B parameters available - these models are pre-trained on C, C++, Go, Java, JavaScript, and Python.

Convert The Models Yourself

Follow this guide if you want to experiment with quantizing the models yourself.

Running TurboPilot Server

Download the latest binary and extract it to the root project folder. If a binary is not provided for your OS, or you'd prefer to build it yourself, follow the build instructions.

Run:

./codegen-serve -m ./models/codegen-6B-multi-ggml-4bit-quant.bin

The application should start a server on port 18080.
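Once the server is up you can exercise it from any HTTP client. The sketch below shows one way to build a completion request in Python; the `/v1/engines/codegen/completions` path and the payload fields follow the fauxpilot/OpenAI-style completions API that this project is modelled on, but treat the exact endpoint and field names as assumptions and verify them against your build.

```python
import json
import urllib.request

# Assumed endpoint: TurboPilot is a fauxpilot clone, so an
# OpenAI-style completions route is a reasonable guess - check
# the server's logs/docs if this path differs in your build.
SERVER = "http://localhost:18080"
ENDPOINT = "/v1/engines/codegen/completions"


def build_request(prompt: str, max_tokens: int = 64) -> urllib.request.Request:
    """Build a POST request carrying an OpenAI-style completion payload."""
    payload = {
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": 0.1,
    }
    return urllib.request.Request(
        SERVER + ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_request("def fibonacci(n):")
print(req.full_url)
# Send it with: urllib.request.urlopen(req) once the server is running.
```

The request is only constructed here; pass it to `urllib.request.urlopen` (or use any other HTTP client) against a running server to get a completion back.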

Acknowledgements