## 🦙🌲🤏 Alpaca-LoRA: Low-Rank LLaMA Instruct-Tuning
This repository contains code for reproducing the [Stanford Alpaca](https://github.com/tatsu-lab/stanford_alpaca) results using [low-rank adaptation (LoRA)](https://arxiv.org/pdf/2106.09685.pdf).
The fine-tuning runs within five hours on a consumer GPU,
and the LoRA weights are made available on the Hugging Face model hub.
With Hugging Face's out-of-the-box 8-bit quantization,
we aim to provide an Instruct model of similar quality to `text-davinci-003` that can run [on a Raspberry Pi](https://twitter.com/miolini/status/1634982361757790209). (For research.)
Until Jason Phang's [LLaMA implementation](https://github.com/huggingface/transformers/pull/21955)
is merged, users will need to replace their local copy of `transformers` with his fork, as described below.

For fine-tuning LoRAs we use Hugging Face's [PEFT](https://github.com/huggingface/peft).
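
As background, here is a minimal sketch of how a LoRA is attached with PEFT. It assumes zphang's fork is installed (the `LLaMAForCausalLM` class name comes from that fork), and the hyperparameter values are illustrative rather than the ones used in this repo:

```python
from peft import LoraConfig, get_peft_model
from transformers import LLaMAForCausalLM

# load the base model in 8-bit so it fits on a single consumer GPU
model = LLaMAForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf",
    load_in_8bit=True,
    device_map="auto",
)

# wrap it so that only small low-rank adapter matrices are trainable
config = LoraConfig(
    r=8,                                  # rank of the update matrices
    lora_alpha=16,                        # scaling factor for the updates
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # a tiny fraction of the full model
```

From there the model trains like any other `transformers` model; gradients flow only through the adapter matrices.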
Also included is code to download the LLaMA foundation model from the Hugging Face model hub. (For research.)

Once I've finished running the fine-tuning code myself, I'll put the LoRA weights on the Hub as well, at which point the code in `generate.py` should work as expected.
### Setup
1. Install dependencies (**including zphang's `transformers` fork**):
```
# core Python dependencies
pip install -q datasets loralib sentencepiece

# swap the stock transformers for zphang's LLaMA branch
pip uninstall transformers
pip install -q git+https://github.com/zphang/transformers@c3dc391

# PEFT, used for LoRA fine-tuning
pip install -q git+https://github.com/huggingface/peft.git
```
2. [Install bitsandbytes from source.](https://github.com/TimDettmers/bitsandbytes/blob/main/compile_from_source.md)
### Inference (`generate.py`)
See `generate.py`. This script reads the `decapoda-research/llama-7b-hf` model from the Hugging Face model hub along with the LoRA weights from `tloen/alpaca-lora-7b`, then runs inference on a specified input. Users should treat this as example code for the use of the model and modify it as needed.
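
In outline, the flow is roughly the following. This is a hedged sketch rather than a copy of `generate.py`; the class names assume zphang's fork, and the prompt template and generation parameters are illustrative:

```python
import torch
from peft import PeftModel
from transformers import GenerationConfig, LLaMAForCausalLM, LLaMATokenizer

tokenizer = LLaMATokenizer.from_pretrained("decapoda-research/llama-7b-hf")

# base model in 8-bit, with the LoRA weights layered on top via PEFT
model = LLaMAForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf",
    load_in_8bit=True,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, "tloen/alpaca-lora-7b")
model.eval()

prompt = "Below is an instruction that describes a task. ..."  # template elided
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output = model.generate(
        input_ids=inputs["input_ids"].to(model.device),
        generation_config=GenerationConfig(temperature=0.1, top_p=0.75),
        max_new_tokens=128,
    )
print(tokenizer.decode(output[0], skip_special_tokens=True))
```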
### Training (`finetune.py`)
Under construction. If you're impatient, note that this file contains a set of hardcoded hyperparameters that you should feel free to modify; the sketch after this paragraph shows the rough shape of the loop.
PRs adapting this code to multi-GPU setups and larger models are always welcome.
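
For orientation, the loop amounts to something like the following, assuming the LoRA-wrapped `model` from the PEFT sketch above, a `tokenizer`, and a tokenized instruction dataset `data` (both hypothetical names here); the hyperparameter values are illustrative, not necessarily those hardcoded in `finetune.py`:

```python
import transformers

trainer = transformers.Trainer(
    model=model,
    train_dataset=data["train"],
    args=transformers.TrainingArguments(
        per_device_train_batch_size=4,
        gradient_accumulation_steps=32,  # effective batch size of 128
        warmup_steps=100,
        num_train_epochs=3,
        learning_rate=3e-4,
        fp16=True,
        logging_steps=20,
        output_dir="lora-alpaca",
    ),
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()

# saves only the small adapter matrices, not the full LLaMA weights
model.save_pretrained("lora-alpaca")
```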
### To do
- [ ] Merge LoRA weights into LLaMA weights to remove inference dependency on PEFT
- [ ] Train/val/test split
- [ ] Hyperparameter tuning code
- [ ] Documentation for notebook
- [ ] Support for `13b`, `30b`, `65b`
- [ ] Train a version that doesn't waste tokens on the prompt header
- [ ] Inference CLI and evaluation
- [ ] Better disclaimers about why using LLaMA without permission is very bad!