<p align="center" width="100%">
<a href="https://crfm.stanford.edu/alpaca/" target="_blank"><img src="assets/logo.png" alt="Stanford-Alpaca" style="width: 50%; min-width: 300px; display: block; margin: auto;"></a>
</p>

# Stanford Alpaca: An Instruction-following LLaMA model 
[![License](https://img.shields.io/badge/License-Apache_2.0-green.svg)](https://github.com/tatsu-lab/stanford_alpaca/blob/main/LICENSE) 
[![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/release/python-390/) 
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black) 

This is the repo for the Stanford Alpaca project, which aims to build and share an instruction-following LLaMA model. The repo contains:
- A [**web demo**](https://crfm.stanford.edu/alpaca/) to interact with our Alpaca model
- The [52K data](#data-release) used for fine-tuning the model
- The code for [generating the data](#data-generation-process)

## Overview

The current Alpaca model is fine-tuned from a 7B LLaMA model [1] on 52K instruction-following examples generated with the techniques in the Self-Instruct [2] paper, with some modifications that we discuss in the [Data Generation Process](#data-generation-process) section.
In a preliminary human evaluation, we found that the Alpaca 7B model behaves similarly to the `text-davinci-003` model on the Self-Instruct instruction-following evaluation suite [2].

Alpaca is still under development, and there are many limitations that have to be addressed.
Importantly, we have not yet fine-tuned the Alpaca model to be safe and harmless.
We thus encourage users to be cautious when interacting with Alpaca, and to report any concerning behavior to help improve the safety and ethical considerations of the model.

Our initial release contains the data generation procedure, dataset, and training recipe. We intend to release the model weights if we are given permission to do so by the creators of LLaMA. For now, we have chosen to host a live demo to help readers better understand the capabilities and limits of Alpaca, and to help us better evaluate Alpaca's performance with a broader audience.

**Please read our release [blog post](https://crfm.stanford.edu/2023/03/13/alpaca.html) for more details about the model, our discussion of the potential harm and limitations of Alpaca models, and our thought process for releasing a reproducible model.**


[1]: LLaMA: Open and Efficient Foundation Language Models. Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample. https://arxiv.org/abs/2302.13971v1

[2]: Self-Instruct: Aligning Language Model with Self Generated Instructions. Yizhong Wang, Yeganeh Kordi, Swaroop Mishra, Alisa Liu, Noah A. Smith, Daniel Khashabi, Hannaneh Hajishirzi. https://arxiv.org/abs/2212.10560


## Data Release
[`alpaca_data.json`](./alpaca_data.json) contains the 52K instruction-following examples we used for fine-tuning the Alpaca model.
This JSON file is a list of dictionaries. Each dictionary contains the following fields:
- `instruction`: `str`, describes the task the model should perform. Each of the 52K instructions is unique.
- `input`: `str`, optional context or input for the task. For example, when the instruction is "Summarize the following article", the input is the article. Around 40% of the examples have an input.
- `output`: `str`, the answer to the instruction as generated by `text-davinci-003`.
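
As a minimal sketch of how these fields can be inspected, the snippet below summarizes a list of records in this format. The helper name `summarize_records` and the two toy records are illustrative assumptions and are not part of this repo; with the real data, the list would come from calling `json.load` on `alpaca_data.json`.

```python
import json


def summarize_records(records):
    """Return basic statistics for a list of Alpaca-style records."""
    total = len(records)
    # An example "has an input" when its `input` field is a non-empty string.
    with_input = sum(1 for r in records if r["input"].strip())
    # Count distinct instruction strings to check uniqueness.
    unique_instructions = len({r["instruction"] for r in records})
    return {
        "total": total,
        "with_input": with_input,
        "unique_instructions": unique_instructions,
    }


# Toy records (hypothetical, for illustration only).
sample = json.loads("""
[
  {"instruction": "Summarize the following article",
   "input": "Some article text.",
   "output": "A summary."},
  {"instruction": "Name three primary colors",
   "input": "",
   "output": "Red, blue, and yellow."}
]
""")

stats = summarize_records(sample)
print(stats)
```

On the released file, `with_input / total` should come out near the 40% figure quoted above, and `unique_instructions` should equal `total`.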

We used the following prompts for fine-tuning the Alpaca model:
- for examples with a non-empty input field:
 ```
 Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
 
 ### Instruction:
 {instruction}
 
 ### Input:
 {input}
 
 ### Response:
 ```
- for examples with an empty input field:
 ```
 Below is an instruction that describes a task. Write a response that appropriately completes the request.
 
 ### Instruction:
 {instruction}
 
 ### Response:
 ```
 
During inference (e.g., for the web demo), we use the user instruction with an empty input field (the second prompt).
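
The choice between the two prompts can be sketched as a small helper. This is an illustration under assumptions, not the repo's training code: the names `PROMPT_WITH_INPUT`, `PROMPT_NO_INPUT`, and `build_prompt` are hypothetical, and the exact whitespace around `### Response:` is assumed from the templates shown above.

```python
# Templates transcribed from the two prompts above (whitespace is an assumption).
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:"
)


def build_prompt(example):
    """Pick the template based on whether `input` is non-empty, then fill it in."""
    if example.get("input", "").strip():
        return PROMPT_WITH_INPUT.format(
            instruction=example["instruction"], input=example["input"]
        )
    return PROMPT_NO_INPUT.format(instruction=example["instruction"])


# Inference-style usage: a user instruction with an empty input field.
demo_prompt = build_prompt({"instruction": "Name three primary colors", "input": ""})
print(demo_prompt)
```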

## Data Generation Process

<details>
<summary> <strong> Running the code </strong> </summary>

1. Set the environment variable `OPENAI_API_KEY` to your OpenAI API key.
2. Install the dependencies with `pip install -r requirements.txt`.
3. Run `python -m generate_instruction generate_instruction_following_data` to generate the data.

</details>

We built on the data generation pipeline from [self-instruct](https://github.com/yizhongw/self-instruct) and made the following modifications:
- We used `text-davinci-003` to generate the instruction data instead of `davinci`.
- We wrote a new prompt (`prompt.txt`) that explicitly states the requirements of instruction generation to `text-davinci-003`. Note: there is a slight error in the prompt we used, and future users should incorporate the edit in https://github.com/tatsu-lab/stanford_alpaca/pull/24
- We adopted much more aggressive batch decoding, i.e., generating 20 instructions at once, which significantly reduced the cost of data generation.
- We simplified the data generation pipeline by discarding the difference between classification and non-classification instructions.
- We only generated a single instance for each instruction, instead of 2 to 3 instances as in [2].

This produced an instruction-following dataset with 52K examples obtained at a much lower cost (less than $500). 
In a preliminary study, we also found our 52K generated data to be much more diverse than the data released by [self-instruct](https://github.com/yizhongw/self-instruct/blob/main/data/seed_tasks.jsonl).
The figure below (in the style of Figure 2 in the [self-instruct paper](https://arxiv.org/abs/2212.10560)) demonstrates the diversity of our data.
The inner circle of the plot represents the root verb of the instructions, and the outer circle represents the direct objects.

[<img src="assets/parse_analysis.png" width="750" />](./assets/parse_analysis.png)

## Fine-tuning
We fine-tune our models using standard Hugging Face training code with the following hyperparameters:

| Hyperparameter | Value |
|----------------|-------|
| Batch size     | 128   |
| Learning rate  | 2e-5  |
| Epochs         | 3     |
| Max length     | 512   |
| Weight decay   | 0     |

Because Hugging Face does not yet officially support the LLaMA models, we fine-tuned LLaMA with Hugging Face's transformers library by installing it from a particular fork (i.e., this [PR](https://github.com/huggingface/transformers/pull/21955), which has yet to be merged).
The hash of the specific commit we installed was `68d640f7c368bcaaaecfc678f11908ebbd3d6176`.

To reproduce our fine-tuning runs for LLaMA, first install the requirements:
```bash
pip install -r requirements.txt
```
Then, install the particular fork of Hugging Face's transformers library.

Below is a command that fine-tunes LLaMA-7B with our dataset on a machine with 4 A100 80G GPUs in FSDP `full_shard` mode. 
Replace `<your_random_port>` with a port of your own, `<your_path_to_hf_converted_llama_ckpt_and_tokenizer>` with the 
path to your converted checkpoint and tokenizer (following instructions in the PR), and `<your_output_dir>` with where you want to store your outputs.

```bash
torchrun --nproc_per_node=4 --master_port=<your_random_port> train.py \
    --model_name_or_path <your_path_to_hf_converted_llama_ckpt_and_tokenizer> \
    --data_path ./alpaca_data.json \
    --bf16 True \
    --output_dir <your_output_dir> \
    --num_train_epochs 3 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 8 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 2000 \
    --save_total_limit 1 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --fsdp "full_shard auto_wrap" \
    --fsdp_transformer_layer_cls_to_wrap 'LLaMADecoderLayer' \
    --tf32 True
```
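
The batch size of 128 in the hyperparameter table follows from the flags in this command: with data-parallel training, the effective batch size is the number of processes times the per-device batch size times the gradient accumulation steps. The sketch below spells out that arithmetic. The helper names are hypothetical, and the second helper only illustrates how one might keep the same effective batch size on a different GPU count, which is an assumption rather than a setup we describe here.

```python
def effective_batch_size(num_processes, per_device_batch_size, grad_accum_steps):
    """Examples consumed per optimizer step across all data-parallel processes."""
    return num_processes * per_device_batch_size * grad_accum_steps


def accumulation_steps_for(target_batch_size, num_processes, per_device_batch_size):
    """Gradient accumulation steps needed to reach a target effective batch size."""
    per_step = num_processes * per_device_batch_size
    if target_batch_size % per_step != 0:
        raise ValueError("target batch size is not divisible by processes * per-device batch")
    return target_batch_size // per_step


# Values from the command above: 4 GPUs, per-device batch 4, accumulation 8.
global_batch = effective_batch_size(4, 4, 8)
print(global_batch)  # 128

# Hypothetical 8-GPU machine with the same per-device batch size.
print(accumulation_steps_for(128, 8, 4))  # 4
```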

The same script also works for OPT fine-tuning. Here's an example for fine-tuning OPT-6.7B:

```bash
torchrun --nproc_per_node=4 --master_port=<your_random_port> train.py \
    --model_name_or_path "facebook/opt-6.7b" \
    --data_path ./alpaca_data.json \
    --bf16 True \
    --output_dir <your_output_dir> \
    --num_train_epochs 3 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 8 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 2000 \
    --save_total_limit 1 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --fsdp "full_shard auto_wrap" \
    --fsdp_transformer_layer_cls_to_wrap 'OPTDecoderLayer' \
    --tf32 True
```

### Authors
All grad students below contributed equally and the order is determined by random draw.

- [Rohan Taori](https://www.rohantaori.com/)
- [Ishaan Gulrajani](https://ishaan.io/)
- [Tianyi Zhang](https://tiiiger.github.io/)
- [Yann Dubois](https://yanndubs.github.io/)
- [Xuechen Li](https://www.lxuechen.com/)

All advised by [Tatsunori B. Hashimoto](https://thashim.github.io/). Yann is also advised by [Percy Liang](https://cs.stanford.edu/~pliang/) and Xuechen is also advised by [Carlos Guestrin](https://guestrin.su.domains/).

### Citation

Please cite the repo if you use the data or code in this repo.
```
@misc{alpaca,
  author = {Rohan Taori and Ishaan Gulrajani and Tianyi Zhang and Yann Dubois and Xuechen Li and Carlos Guestrin and Percy Liang and Tatsunori B. Hashimoto},
  title = {Stanford Alpaca: An Instruction-following LLaMA model},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/tatsu-lab/stanford_alpaca}},
}
```

Naturally, you should also cite the original LLaMA paper [1] and the Self-Instruct paper [2].

### Acknowledgements

We thank Yizhong Wang for his help in explaining the data generation pipeline in Self-Instruct and providing the code for the parse analysis plot.