LoRA (Low-Rank Adaptation) is an extremely powerful method for customizing a base model by training only a small number of parameters. LoRAs can be attached to models at runtime.
For instance, a 50 MB LoRA can teach LLaMA an entirely new language or writing style, or give it instruction-following or chat abilities.
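For reference, this is roughly what "attaching a LoRA at runtime" means under the hood. The sketch below uses the Hugging Face `transformers` and `peft` libraries directly; the model and adapter paths are placeholders, not files shipped with the web UI.

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the frozen base model (path is a placeholder).
base_model = AutoModelForCausalLM.from_pretrained("path/to/base-model")

# Attach the low-rank adapter weights on top of the base model.
# Only these adapter weights were trained; the base weights are untouched.
model = PeftModel.from_pretrained(base_model, "path/to/my-lora")
```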
This is the current state of LoRA integration in the web UI:
| Loader | Status |
|--------|--------|
| Transformers | Full support in 16-bit, `--load-in-8bit`, `--load-in-4bit`, and CPU modes. |
| ExLlama | Single LoRA support. The LoRA can be removed quickly afterwards. |
| AutoGPTQ | Single LoRA support. Removing the LoRA requires reloading the entire model. |
| GPTQ-for-LLaMa | Full support with the [monkey patch](https://github.com/oobabooga/text-generation-webui/blob/main/docs/GPTQ-models-(4-bit-mode).md#using-loras-with-gptq-for-llama). |
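As a usage sketch, a LoRA can be selected at launch with the `--lora` flag, optionally combined with one of the quantized modes from the table above (the model and LoRA names here are placeholders):

```
python server.py --model llama-7b --lora my-lora --load-in-8bit
```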