# ExLlama
## About
ExLlama is a highly optimized GPTQ backend ("loader") for LLaMA models. It uses much less VRAM and runs much faster because it does not rely on the unoptimized transformers code.
## Installation
1) Clone the ExLlama repository into your `text-generation-webui/repositories` folder:
```
mkdir repositories
cd repositories
git clone https://github.com/turboderp/exllama
```
2) Follow the remaining setup instructions in the official README: https://github.com/turboderp/exllama#exllama
3) Configure text-generation-webui to use ExLlama, either in the UI or on the command line:
- In the "Model" tab, set "Loader" to "exllama"
- Specify `--loader exllama` on the command line
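
Before launching, it can help to confirm that the clone from step 1 landed in the expected folder. The following is a minimal sketch, not part of text-generation-webui: `exllama_status` is a hypothetical helper name, and the launch line in the trailing comment assumes the usual `server.py` entry point and `--model` flag, with a placeholder model folder that you would replace with your own.

```shell
# Hypothetical helper (an assumption, not shipped with text-generation-webui):
# report whether repositories/exllama exists under the given webui folder.
exllama_status() {
    webui_dir="${1:-.}"   # defaults to the current directory
    if [ -d "$webui_dir/repositories/exllama" ]; then
        echo "found"
    else
        echo "missing"
    fi
}

# Example, run from the text-generation-webui folder:
#   exllama_status .
# If the clone is found, launch with the loader flag from step 3
# (the model folder name below is a placeholder):
#   python server.py --model <your-model-folder> --loader exllama
```

If the helper reports that the clone is missing, repeat step 1 from inside the `text-generation-webui` folder before trying the loader.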