mirror of
https://github.com/oobabooga/text-generation-webui.git
synced 2024-10-01 01:26:03 -04:00
1.4 KiB
1.4 KiB
Performance optimizations
In order to get the highest possible performance for your hardware, you can try compiling the following 3 backends manually instead of relying on the pre-compiled binaries that are part of requirements.txt
:
- AutoGPTQ (the default GPTQ loader)
- GPTQ-for-LLaMa (secondary GPTQ loader)
- llama-cpp-python
If you go this route, you should update the Python requirements for the webui in the future with
pip install -r requirements-minimal.txt --upgrade
and then install the up-to-date backends using the commands below. The file requirements-minimal.txt
contains the all requirements except for the pre-compiled wheels for GPTQ and llama-cpp-python.
AutoGPTQ
conda activate textgen
pip uninstall auto-gptq -i
git clone https://github.com/PanQiWei/AutoGPTQ.git && cd AutoGPTQ
pip install .
GPTQ-for-LLaMa
conda activate textgen
pip uninstall quant-cuda -y
cd text-generation-webui/repositories
rm -r GPTQ-for-LLaMa
git clone https://github.com/oobabooga/GPTQ-for-LLaMa
cd GPTQ-for-LLaMa
python setup_cuda.py install
llama-cpp-python
If you do not have a GPU:
conda activate textgen
pip uninstall -y llama-cpp-python
pip install llama-cpp-python
If you have a GPU, use the commands here instead: llama.cpp-models.md#gpu-acceleration