# Changelog
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
## [Unreleased]
- Fixed an issue where the static build in Docker did not work.
- Integrated CUDA functionality from llama.cpp upstream, which accelerates inference for long prompts.
## [0.0.4] - 2023-04-14
- Added multi-threaded server support, which should prevent health checks aimed at `GET /` from failing during prediction.
- Separated the autocomplete lambda into a separate C++ function so that it can be bound to `/v1/completions`, `/v1/engines/copilot-codex/completions`, and `/v1/engines/codegen/completions`.
- Removed `model` from the completion input as a required parameter, which stops the official Copilot plugin from freaking out.
- Integrated the latest changes from upstream ggml, including some fixes for ARM NEON processors.
- Added Mac builds as part of CI
- Support for a fork of vscode-fauxpilot with a progress indicator is now available (a PR is open upstream; please react/vote for it).
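The 0.0.4 endpoints above can be exercised with a plain HTTP POST. The sketch below builds a request body for them; the `prompt`/`max_tokens` field names are an assumption based on the OpenAI-style completions schema that Copilot-compatible plugins send, not something this changelog specifies. Note that `model` is deliberately omitted, since 0.0.4 no longer requires it.

```python
import json

# The three routes the autocomplete function is bound to as of 0.0.4.
ENDPOINTS = [
    "/v1/completions",
    "/v1/engines/copilot-codex/completions",
    "/v1/engines/codegen/completions",
]


def build_completion_request(prompt: str, max_tokens: int = 64) -> str:
    """Build a JSON request body for a completion call.

    `model` is intentionally absent: 0.0.4 removed it as a required
    parameter. Field names here are illustrative assumptions.
    """
    return json.dumps({"prompt": prompt, "max_tokens": max_tokens})


body = build_completion_request("def fibonacci(n):")
```

The resulting body could then be POSTed (e.g. with `curl -d`) to any of the three routes, which all reach the same C++ completion function.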
## [0.0.3] - 2023-04-13
- Added the 350M parameter CodeGen model to the Google Drive folder.
- Added multi-arch Docker images so that users can now run directly on Apple silicon and even Raspberry Pi.
- Now supports pre-tokenized inputs passed into the API from a Python tokenizer (thanks to @thakkarparth007 for their PR: https://github.com/ravenscroftj/ggml/pull/2).
## [0.0.2] - 2023-04-12
- Project now builds on macOS (thanks to @Dimitrije-V for their PR https://github.com/ravenscroftj/ggml/pull/1 and @dabdine for contributing clearer Mac build instructions).
- Fixed the inability to load `vocab.json` when converting the 16B model, caused by the file's encoding not being set, by @swanserquack in #5.
- Improved model performance by incorporating changes to the GGML library from @ggerganov.
## [0.0.1] - 2023-04-10
- Turbopilot is born!