---
layout: blog
title: "Alpaca: A Strong Open-Source Instruction-Following Model"
authors:
  - name: Rohan Taori*
    url: https://www.rohantaori.com/
  - name: Ishaan Gulrajani*
    url: https://ishaan.io/
  - name: Tianyi Zhang*
    url: https://tiiiger.github.io/
  - name: Yann Dubois*
    url: https://yanndubs.github.io/
  - name: Xuechen Li*
    url: https://www.lxuechen.com/
  - name: Carlos Guestrin
    url: https://guestrin.su.domains/
  - name: Percy Liang
    url: https://cs.stanford.edu/~pliang/
  - name: Tatsunori B. Hashimoto
    url: https://thashim.github.io/
display: True
---

<style>
img.block-img {
  width: 60%;
  display: block;
  margin-left: auto;
  margin-right: auto;
  max-width: 100%;
}
img.block-half-img {
  width: 30%;
  display: block;
  margin-left: auto;
  margin-right: auto;
  max-width: 100%;
}
</style>

<div class="blog-tagline">
  <em>
    We introduce <b><a href="https://crfm.stanford.edu/alpaca/">Alpaca 7B</a></b>, a model fine-tuned from the LLaMA 7B model on 52K
    instruction-following demonstrations. Alpaca behaves similarly to OpenAI’s text-davinci-003, while being surprisingly small and easy/cheap to reproduce (<$600).
  </em> <br>
  <a href="https://crfm.stanford.edu/alpaca/" style="text-decoration: underline">Web Demo</a>&nbsp;&nbsp;
  <a href="https://github.com/tatsu-lab/stanford_alpaca" style="text-decoration: underline">GitHub</a>
</div>

# Overview

Instruction-following models such as GPT-3.5 (text-davinci-003), ChatGPT, Claude, and Bing Chat have become increasingly powerful.
Many users now interact with these models regularly and even use them for work.
However, despite their widespread deployment, instruction-following models still have many deficiencies:
they can generate false information, propagate social stereotypes, and produce toxic language.

To make maximum progress on addressing these pressing problems,
it is important for the academic community to engage.
Unfortunately, doing research on instruction-following models in academia has been difficult,
as there is no open-source model that comes close in capabilities to closed-source models such as OpenAI’s text-davinci-003.

We are releasing our findings about an instruction-following language model, dubbed **Alpaca**,
which is fine-tuned from Meta’s [LLaMA](https://ai.facebook.com/blog/large-language-model-llama-meta-ai/) 7B model.
We train the Alpaca model on 52K instruction-following demonstrations generated in the style of [self-instruct](https://arxiv.org/abs/2212.10560) using text-davinci-003.
Alpaca shows many behaviors similar to OpenAI’s text-davinci-003, but is also surprisingly small and easy/cheap to reproduce.

We are releasing our training recipe and data, and intend to release the model weights in the future.
We are also hosting an [interactive demo](https://crfm.stanford.edu/alpaca/) to enable the research community to better understand the behavior of Alpaca.
Interaction can expose unexpected capabilities and failures, which will guide us in the future evaluation of these models.
We also encourage users to report any concerning behaviors in our web demo so that we can better understand and mitigate these behaviors.
As any release carries risks, we discuss our thought process for this open release later in this blog post.

We emphasize that Alpaca is intended **only for academic research** and any **commercial use is prohibited**.
There are three factors in this decision:
First, Alpaca is based on LLaMA, which has a non-commercial [license](https://docs.google.com/forms/d/e/1FAIpQLSfqNECQnMkycAp2jP4Z9TFX0cGR4uf7b_fBxjY_OjhJILlKGA/viewform), so we necessarily inherit this decision.
Second, the instruction data is based on OpenAI's text-davinci-003,
whose [terms of use](https://openai.com/policies/terms-of-use) prohibit developing models that compete with OpenAI.
Finally, we have not designed adequate safety measures, so Alpaca is not ready to be deployed for general use.

## Training recipe

There are two important challenges to training a high-quality instruction-following model under an academic budget:
obtaining a strong pretrained language model and high-quality instruction-following data.
The first challenge is addressed with the recent release of Meta’s new LLaMA models.
For the second challenge, the [self-instruct](https://arxiv.org/abs/2212.10560) paper suggests using an existing strong language model to automatically generate instruction data.
In particular, Alpaca is a language model fine-tuned using supervised learning from a LLaMA 7B model on 52K instruction-following demonstrations generated from OpenAI’s text-davinci-003.

The figure below illustrates how we obtained the Alpaca model.
For the data, we generated instruction-following demonstrations by building upon the self-instruct method.
We started with the 175 human-written instruction-output pairs from the [self-instruct seed set](https://github.com/yizhongw/self-instruct).
We then prompted text-davinci-003 to generate more instructions using the seed set as in-context examples.
We improved over the self-instruct method by simplifying the generation pipeline (see details in [GitHub](https://github.com/tatsu-lab/stanford_alpaca#data-generation-process)) and significantly reduced the cost.
Our data generation process results in 52K unique instructions and the corresponding outputs, which cost less than $500 using the OpenAI API.

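To give a flavor of this step, the sketch below shows the core idea — prompting text-davinci-003 with a handful of seed demonstrations as in-context examples and asking it to continue the list — using the OpenAI completions API of the time. The file name, field names, and prompt wording are illustrative; the actual batched pipeline lives in the GitHub repository linked above.

```python
# Minimal sketch of the generation step (illustrative; the actual, batched pipeline is in the repo):
# show text-davinci-003 a few seed instruction-output pairs as in-context examples
# and ask it to continue the list with new tasks.
import json
import random

import openai  # legacy openai-python 0.x SDK, current at the time of writing

# Hypothetical file of seed demonstrations, one {"instruction": ..., "output": ...} per line.
seed_tasks = [json.loads(line) for line in open("seed_tasks.jsonl")]

def generate_new_tasks(num_demos: int = 3) -> str:
    demos = random.sample(seed_tasks, num_demos)
    prompt = "You are asked to come up with a diverse list of task instructions and their outputs.\n\n"
    for i, task in enumerate(demos, start=1):
        prompt += f"{i}. Instruction: {task['instruction']}\n{i}. Output: {task['output']}\n\n"
    prompt += f"{num_demos + 1}. Instruction:"
    response = openai.Completion.create(
        model="text-davinci-003",
        prompt=prompt,
        max_tokens=1024,
        temperature=1.0,
        top_p=1.0,
    )
    # The continuation is parsed into new instruction-output pairs and deduplicated downstream.
    return response["choices"][0]["text"]
```
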
<p align="center" width="100%">
  <img src="/static/img/posts/2023-03-13-alpaca/alpaca_main.jpg"
       alt="Alpaca pipeline" style="width: 55em; display: block; margin: auto;">
</p>

Equipped with this instruction-following dataset, we then fine-tuned the LLaMA models using Hugging Face’s training framework, taking advantage of techniques like Fully Sharded Data Parallel and mixed precision training. Fine-tuning a 7B LLaMA model took 3 hours on eight 80GB A100s, which costs less than $100 on most cloud compute providers.

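To make the recipe concrete, here is a rough sketch of supervised fine-tuning with the Hugging Face Trainer under this setup. The checkpoint path, prompt template, and hyperparameters are placeholders, and a full recipe would typically also mask prompt tokens out of the loss and launch with a distributed runner; treat this as an illustration, not our exact training command.

```python
# Rough sketch of supervised fine-tuning with the Hugging Face Trainer.
# The checkpoint path, prompt format, and hyperparameters below are placeholders.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments

model_path = "path/to/converted-llama-7b"  # hypothetical local LLaMA checkpoint in HF format
tokenizer = AutoTokenizer.from_pretrained(model_path)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_path)

def to_features(example):
    # Simplified prompt format; a full recipe also handles an optional input field
    # and excludes the prompt tokens from the loss.
    text = f"### Instruction:\n{example['instruction']}\n\n### Response:\n{example['output']}"
    features = tokenizer(text, truncation=True, max_length=512, padding="max_length")
    features["labels"] = features["input_ids"].copy()
    return features

raw = Dataset.from_json("alpaca_data.json")  # assumes instruction/output fields per example
train_dataset = raw.map(to_features, remove_columns=raw.column_names)

args = TrainingArguments(
    output_dir="alpaca-7b",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    learning_rate=2e-5,
    bf16=True,              # mixed precision training
    fsdp="full_shard",      # Fully Sharded Data Parallel (auto-wrapping configured separately)
    logging_steps=10,
    save_strategy="epoch",
)

Trainer(model=model, args=args, train_dataset=train_dataset, tokenizer=tokenizer).train()
```
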
## Preliminary evaluation

To evaluate Alpaca, we conducted a human evaluation (by the 5 student authors) on the inputs from the [self-instruct evaluation set](https://github.com/yizhongw/self-instruct/blob/main/human_eval/user_oriented_instructions.jsonl).
This evaluation set was collected by the self-instruct authors and covers a diverse list of user-oriented instructions including email writing, social media, and productivity tools.
We performed a blind pairwise comparison between text-davinci-003 and Alpaca 7B, and we found that these two models have very similar performance:
Alpaca wins 90 versus 89 comparisons against text-davinci-003.

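For concreteness, a blind pairwise comparison of this form boils down to showing each pair of outputs in a random order and tallying preferences. The sketch below is illustrative only; the field names and annotation interface are assumptions, not our evaluation tooling.

```python
# Minimal sketch of a blind pairwise comparison (illustrative, not our evaluation code):
# present the two models' outputs in random order and tally which one the annotator prefers.
import random

def blind_pairwise_round(examples, annotate):
    """examples: list of dicts with 'instruction', 'alpaca', and 'davinci' outputs (assumed schema).
    annotate: callable that shows (instruction, output_a, output_b) and returns 'A' or 'B'."""
    wins = {"alpaca": 0, "davinci": 0}
    for ex in examples:
        pair = [("alpaca", ex["alpaca"]), ("davinci", ex["davinci"])]
        random.shuffle(pair)  # hide which model produced which output
        choice = annotate(ex["instruction"], pair[0][1], pair[1][1])
        winner = pair[0][0] if choice == "A" else pair[1][0]
        wins[winner] += 1
    return wins
```
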
We were quite surprised by this result given the small model size and the modest amount of instruction-following data.
Besides leveraging this static evaluation set, we have also been testing the Alpaca model interactively and found that Alpaca often behaves similarly to text-davinci-003 on a diverse set of inputs.
We are releasing an [interactive demo](https://crfm.stanford.edu/alpaca/) of Alpaca, and encourage readers to evaluate Alpaca themselves and give us feedback.

In the rest of this section, we include several interaction examples to showcase the capabilities and limitations of Alpaca.

<p align="center" width="100%">
  <img src="/static/img/posts/2023-03-13-alpaca/alpaca_right_llama.png"
       alt="Alpaca about llamas" style="width: 55em; display: block; margin: auto;">
</p>

<p align="center" width="100%">
  <img src="/static/img/posts/2023-03-13-alpaca/alpaca_right_email.png"
       alt="Alpaca about Stanford admits" style="width: 55em; display: block; margin: auto;">
</p>

The above examples show that the outputs of Alpaca are generally well-written. We note that Alpaca reflects the general style of the instruction-following dataset. As a result, Alpaca’s answers are typically shorter than ChatGPT’s, reflecting text-davinci-003’s shorter outputs.

### Known limitations

Alpaca also exhibits several common deficiencies of language models, including hallucination, toxicity, and stereotypes.
Hallucination in particular seems to be a common failure mode for Alpaca, even compared to text-davinci-003.

For example, in the following figure, Alpaca wrongly says that the capital of Tanzania is Dar es Salaam, which is the largest city in Tanzania.
(It was the capital until 1974, when it was replaced by Dodoma.)

<p align="center" width="100%">
  <img src="/static/img/posts/2023-03-13-alpaca/alpaca_wrong_capital.png"
       alt="Alpaca about Tanzania's capital" style="width: 55em; display: block; margin: auto;">
</p>

Furthermore, Alpaca can be used to generate well-written outputs that spread misinformation, as seen in the following example.

<p align="center" width="100%">
  <img src="/static/img/posts/2023-03-13-alpaca/alpaca_wrong_42.png"
       alt="Alpaca about random seeds" style="width: 55em; display: block; margin: auto;">
</p>

Alpaca likely contains many other limitations associated with both the underlying language model and the instruction tuning data. However, we believe that the artifact will still be useful to the community, as it provides a relatively lightweight model that serves as a basis to study important deficiencies. We encourage users to help us identify new kinds of failures by flagging them in the web demo.
Overall, we hope that the release of Alpaca can facilitate further research into instruction-following models and their alignment with human values.

## Assets released

We are releasing the following assets today:
- **Demo**: An [interactive demo](https://crfm.stanford.edu/alpaca/) for everyone to try out Alpaca.
- **Data**: [52K demonstrations](https://github.com/tatsu-lab/stanford_alpaca#data-release) used to fine-tune Alpaca.
- **Data generation process**: The code for [generating the data](https://github.com/tatsu-lab/stanford_alpaca#data-generation-process).
- **Hyperparameters**: Settings for [fine-tuning](https://github.com/tatsu-lab/stanford_alpaca#fine-tuning) the model using the Hugging Face API.

We intend to release the following assets in the near future:
- **Model weights**: We have reached out to Meta to obtain guidance on releasing the Alpaca model weights, both for the 7B Alpaca and for fine-tuned versions of the larger LLaMA models.
- **Training code**: Our code uses the [Hugging Face interface to LLaMA](https://github.com/huggingface/transformers/pull/21955).
  As of now, the effort to support LLaMA is still ongoing and not stable.
  We will give the exact training commands once Hugging Face supports LLaMA officially.

## Release decision

We believe that releasing the above assets will enable the academic community to
perform controlled scientific studies on instruction-following language models,
resulting in better science and ultimately new techniques to address the existing deficiencies with these models.

At the same time, any release carries some risk.
First, we recognize that releasing our training recipe reveals the feasibility of certain capabilities.
On one hand, this enables more people (including bad actors)
to create models that could cause harm (either intentionally or not).
On the other hand, this awareness might incentivize swift defensive action,
especially from the academic community, now empowered by the means to perform deeper safety research on such models.
Overall, we believe that the benefits for the research community outweigh the risks of this particular release.

Given that we are releasing the training recipe,
we believe that releasing the data, model weights, and training code
incurs minimal further risk, given the simplicity of the recipe.
At the same time, releasing these assets has enormous benefits for reproducible science,
so that the academic community can use standard datasets, models, and code
to perform controlled comparisons and to explore extensions.

Deploying an interactive demo for Alpaca also poses potential risks, such as more widely
disseminating harmful content and lowering the barrier for spam, fraud, or disinformation.
We have put into place two risk mitigation strategies. First, we have implemented a content filter
using [OpenAI's content moderation API](https://platform.openai.com/docs/api-reference/moderations),
which filters out harmful content as defined by OpenAI's
usage policies. Second, we watermark all the model outputs using the method described in
[Kirchenbauer et al. 2023](https://arxiv.org/abs/2301.10226),
so that others can detect (with some probability) whether an output comes from Alpaca 7B.
Finally, we have strict terms and conditions for using the demo;
it is restricted to non-commercial uses and to uses that follow [LLaMA’s license agreement](https://docs.google.com/forms/d/e/1FAIpQLSfqNECQnMkycAp2jP4Z9TFX0cGR4uf7b_fBxjY_OjhJILlKGA/viewform).

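For reference, the content filter is conceptually a single moderation check on the user’s prompt and on the model’s output before anything is displayed. The sketch below uses the legacy OpenAI Python SDK; the refusal message and wiring are illustrative rather than our demo code.

```python
# Minimal sketch of a moderation gate around the demo (illustrative, not our demo code):
# check both the user's prompt and the model's generation against OpenAI's moderation API
# and refuse to display anything that is flagged.
import openai  # legacy openai-python 0.x SDK

REFUSAL = "Sorry, this request was flagged by the content filter."

def is_flagged(text: str) -> bool:
    response = openai.Moderation.create(input=text)
    return response["results"][0]["flagged"]

def moderated_generate(prompt: str, generate) -> str:
    """`generate` is whatever function produces Alpaca's (watermarked) output for a prompt."""
    if is_flagged(prompt):
        return REFUSAL
    output = generate(prompt)
    return REFUSAL if is_flagged(output) else output
```
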
We understand that these mitigation measures can be circumvented once we release the model weights or if users train their own instruction-following models.
However, by installing these mitigations, we hope to advance the best practices and ultimately develop [community norms](https://crfm.stanford.edu/2022/05/17/community-norms.html) for the responsible deployment of foundation models.

## Future directions

We are excited by the research opportunities that Alpaca unlocks. There are many exciting future directions:
- Evaluation: We need to evaluate Alpaca more rigorously.
  We will start with [HELM](https://crfm.stanford.edu/helm/latest/) (Holistic Evaluation of Language Models),
  which hopefully will evolve to capture more generative, instruction-following scenarios.
- Safety: We would like to further study the risks of Alpaca and improve its safety using methods such as automatic red teaming, auditing, and adaptive testing.
- Understanding: We hope to better understand how capabilities arise from the training recipe.
  What properties of a base model do you need? What happens when you scale up?
  What properties of instruction data are needed? What are alternatives to using self-instruct on text-davinci-003?

## Acknowledgments

Alpaca depends directly and critically on existing works.
We would like to thank Meta AI Research for training and releasing the LLaMA models,
the self-instruct team for giving us a basis for the data generation pipeline,
Hugging Face for the training code,
and OpenAI for paving the path and showing what can be achieved.

We would also like to highlight that there are many other open-source efforts for instruction-following LLMs and chat models, including [OpenChatKit](https://www.together.xyz/blog/openchatkit), [Open Assistant](https://open-assistant.io/), and [Carper AI](https://carper.ai/instruct-gpt-announcement/).
|
||||
Notwithstanding the above, nothing herein shall supersede or modify
|
||||
the terms of any separate license agreement you may have executed
|
||||
with Licensor regarding such Contributions.
|
||||
|
||||
6. Trademarks. This License does not grant permission to use the trade
|
||||
names, trademarks, service marks, or product names of the Licensor,
|
||||
except as required for reasonable and customary use in describing the
|
||||
origin of the Work and reproducing the content of the NOTICE file.
|
||||
|
||||
7. Disclaimer of Warranty. Unless required by applicable law or
|
||||
agreed to in writing, Licensor provides the Work (and each
|
||||
Contributor provides its Contributions) on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
|
||||
implied, including, without limitation, any warranties or conditions
|
||||
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
|
||||
PARTICULAR PURPOSE. You are solely responsible for determining the
|
||||
appropriateness of using or redistributing the Work and assume any
|
||||
risks associated with Your exercise of permissions under this License.
|
||||
|
||||
8. Limitation of Liability. In no event and under no legal theory,
|
||||
whether in tort (including negligence), contract, or otherwise,
|
||||
unless required by applicable law (such as deliberate and grossly
|
||||
negligent acts) or agreed to in writing, shall any Contributor be
|
||||
liable to You for damages, including any direct, indirect, special,
|
||||
incidental, or consequential damages of any character arising as a
|
||||
result of this License or out of the use or inability to use the
|
||||
Work (including but not limited to damages for loss of goodwill,
|
||||
work stoppage, computer failure or malfunction, or any and all
|
||||
other commercial damages or losses), even if such Contributor
|
||||
has been advised of the possibility of such damages.
|
||||
|
||||
9. Accepting Warranty or Additional Liability. While redistributing
|
||||
the Work or Derivative Works thereof, You may choose to offer,
|
||||
and charge a fee for, acceptance of support, warranty, indemnity,
|
||||
or other liability obligations and/or rights consistent with this
|
||||
License. However, in accepting such obligations, You may act only
|
||||
on Your own behalf and on Your sole responsibility, not on behalf
|
||||
of any other Contributor, and only if You agree to indemnify,
|
||||
defend, and hold each Contributor harmless for any liability
|
||||
incurred by, or claims asserted against, such Contributor by reason
|
||||
of your accepting any such warranty or additional liability.
|
||||
|
||||
END OF TERMS AND CONDITIONS
|
||||
|
||||
APPENDIX: How to apply the Apache License to your work.
|
||||
|
||||
To apply the Apache License to your work, attach the following
|
||||
boilerplate notice, with the fields enclosed by brackets "[]"
|
||||
replaced with your own identifying information. (Don't include
|
||||
the brackets!) The text should be enclosed in the appropriate
|
||||
comment syntax for the file format. We also recommend that a
|
||||
file or class name and description of purpose be included on the
|
||||
same "printed page" as the copyright notice for easier
|
||||
identification within third-party archives.
|
||||
|
||||
Copyright 2023 Rohan Taori, Ishaan Gulrajani, Tianyi Zhang, Yann Dubois, Xuechen Li
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License");
|
||||
you may not use this file except in compliance with the License.
|
||||
You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
133
README.md
Normal file
@ -0,0 +1,133 @@
<p align="center" width="100%">
<a href="https://crfm.stanford.edu/alpaca/" target="_blank"><img src="assets/logo.png" alt="Stanford-Alpaca" style="width: 50%; min-width: 300px; display: block; margin: auto;"></a>
</p>

# Stanford Alpaca: An Instruction-following LLaMA model
[![License](https://img.shields.io/badge/License-Apache_2.0-green.svg)](https://github.com/tatsu-lab/stanford_alpaca/blob/main/LICENSE)
[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/release/python-380/)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

This is the repo for the Stanford Alpaca project, which aims to build and share an instruction-following LLaMA model. The repo contains:
- A [**web demo**](https://crfm.stanford.edu/alpaca/) to interact with our Alpaca model
- The [52K data](#data-release) used for fine-tuning the model
- The code for [generating the data](#data-generation-process)

## Overview

The current Alpaca model is fine-tuned from a 7B LLaMA model [1] on 52K instruction-following demonstrations generated by the techniques in the Self-Instruct [2] paper, with some modifications that we discuss in the next section.
In a preliminary human evaluation, we found that the Alpaca 7B model behaves similarly to the `text-davinci-003` model on the Self-Instruct instruction-following evaluation suite [2].

Alpaca is still under development, and there are many limitations that have to be addressed.
Importantly, we have not yet fine-tuned the Alpaca model to be safe and harmless.
We thus encourage users to be cautious when interacting with Alpaca, and to report any concerning behavior to help improve the safety and ethical considerations of the model.

Our initial release contains the data generation procedure, dataset, and training recipe. We intend to release the model weights if we are given permission to do so by the creators of LLaMA. For now, we have chosen to host a live demo to help readers better understand the capabilities and limits of Alpaca, as well as to help us better evaluate Alpaca's performance with a broader audience.

**Please read our release [blog post](https://crfm.stanford.edu/2023/03/13/alpaca.html) for more details about the model, our discussion of the potential harm and limitations of Alpaca models, and our thought process behind an open-source release.**


[1]: LLaMA: Open and Efficient Foundation Language Models. Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample. https://arxiv.org/abs/2302.13971v1

[2]: Self-Instruct: Aligning Language Model with Self Generated Instructions. Yizhong Wang, Yeganeh Kordi, Swaroop Mishra, Alisa Liu, Noah A. Smith, Daniel Khashabi, Hannaneh Hajishirzi. https://arxiv.org/abs/2212.10560


## Data Release
[`alpaca_data.json`](./alpaca_data.json) contains the 52K instruction-following examples we used for fine-tuning the Alpaca model.
This JSON file is a list of dictionaries; each dictionary contains the following fields:
- `instruction`: `str`, describes the task the model should perform. Each of the 52K instructions is unique.
- `input`: `str`, optional context or input for the task. For example, when the instruction is "Summarize the following article", the input is the article. Around 40% of the examples have an input.
- `output`: `str`, the answer to the instruction as generated by `text-davinci-003`.

We used the following prompts for fine-tuning the Alpaca model:
- for examples with a non-empty input field:
```
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Input:
{input}

### Response:
```
- for examples with an empty input field:
```
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Response:
```

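To make the format concrete, here is a minimal sketch (not part of the released code; the helper name `format_example` is ours, for illustration only) that loads `alpaca_data.json` and fills in the matching template for one example:

```
import json

PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that provides further context. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:"
)


def format_example(example):
    """Return (prompt, target) for one entry of alpaca_data.json."""
    template = PROMPT_WITH_INPUT if example["input"] else PROMPT_NO_INPUT
    return template.format(**example), example["output"]


with open("alpaca_data.json") as f:
    data = json.load(f)

prompt, target = format_example(data[0])
print(prompt + " " + target)
```
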
## Data Generation Process

<details>
<summary> <strong> Running the code </strong> </summary>

1. Set the environment variable `OPENAI_API_KEY` to your OpenAI API key.
2. Install the dependencies with `pip install -r requirements.txt`.
3. Run `python -m generate_instruction generate_instruction_following_data` to generate the data.

</details>

We built on the data generation pipeline from [self-instruct](https://github.com/yizhongw/self-instruct) and made the following modifications:
- We used `text-davinci-003` to generate the instruction data instead of `davinci`.
- We wrote a new prompt (`prompt.txt`) that explicitly gives the requirements of instruction generation to `text-davinci-003`.
- We adopted much more aggressive batch decoding, i.e., generating 20 instructions at once, which significantly reduced the cost of data generation.
- We simplified the data generation pipeline by discarding the difference between classification and non-classification instructions.
- We only generated a single instance for each instruction, instead of 2 to 3 instances as in [1].

This produced an instruction-following dataset with 52K examples obtained at a much lower cost (less than $500).
In a preliminary study, we also found our 52K generated data to be much more diverse than the data released by [self-instruct](https://github.com/yizhongw/self-instruct/blob/main/data/seed_tasks.jsonl).
We plot the figure below (in the style of Figure 2 in the [self-instruct paper](https://arxiv.org/abs/2212.10560)) to demonstrate the diversity of our data.
The inner circle of the plot represents the root verb of the instructions, and the outer circle represents the direct objects.

[//]: # (![parse_analysis](assert/parse_analysis.png | width=100))
[<img src="assets/parse_analysis.png" width="750" />](./assets/parse_analysis.png)

## Fine-tuning
We fine-tune our model using standard Hugging Face training code with the following hyperparameters:

| Hyperparameter | Value |
|----------------|-------|
| Batch size     | 128   |
| Learning rate  | 2e-5  |
| Epochs         | 3     |
| Max length     | 512   |
| Weight decay   | 1     |

We are waiting for Hugging Face to officially support the LLaMA models (i.e., for this [PR](https://github.com/huggingface/transformers/pull/21955) to be merged) before we release a stable version of the fine-tuning code.

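In the meantime, a rough sketch of how these hyperparameters might map onto Hugging Face `TrainingArguments` is shown below. This is not our released training code; in particular, the split of the 128 global batch size into per-device batch size, number of GPUs, and gradient accumulation is an assumption for illustration, and the max length of 512 is applied when tokenizing the data rather than through `TrainingArguments`.

```
from transformers import TrainingArguments

# Sketch only: 128 = 4 (per-device) x 8 (GPUs) x 4 (grad accumulation) is one possible split.
training_args = TrainingArguments(
    output_dir="./alpaca-7b",
    num_train_epochs=3,
    learning_rate=2e-5,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    weight_decay=1.0,
)
```
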
### Authors
All grad students below contributed equally and the order is determined by random draw.

- [Rohan Taori](https://www.rohantaori.com/)
- [Ishaan Gulrajani](https://ishaan.io/)
- [Tianyi Zhang](https://tiiiger.github.io/)
- [Yann Dubois](https://yanndubs.github.io/)
- [Xuechen Li](https://www.lxuechen.com/)

All advised by [Tatsunori B. Hashimoto](https://thashim.github.io/). Yann is also advised by [Percy Liang](https://cs.stanford.edu/~pliang/) and Xuechen is also advised by [Carlos Guestrin](https://guestrin.su.domains/).

### Citation

Please cite the repo if you use the data or code in this repo.
```
@misc{alpaca,
  author = {Rohan Taori and Ishaan Gulrajani and Tianyi Zhang and Yann Dubois and Xuechen Li and Carlos Guestrin and Percy Liang and Tatsunori B. Hashimoto},
  title = {Stanford Alpaca: An Instruction-following LLaMA model},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/tatsu-lab/stanford_alpaca}},
}
```

Naturally, you should also cite the original LLaMA paper [1] and the Self-Instruct paper [2].

### Acknowledgements

We thank Yizhong Wang for his help in explaining the data generation pipeline in Self-Instruct and providing the code for the parse analysis plot.
260012
alpaca_data.json
Normal file
BIN
assets/alpaca_main.jpg
Normal file
After Width: | Height: | Size: 345 KiB |
BIN
assets/alpaca_right_email.png
Normal file
After Width: | Height: | Size: 352 KiB |
BIN
assets/alpaca_right_llama.png
Normal file
After Width: | Height: | Size: 217 KiB |
BIN
assets/alpaca_wrong_42.png
Normal file
After Width: | Height: | Size: 339 KiB |
BIN
assets/alpaca_wrong_capital.png
Normal file
After Width: | Height: | Size: 135 KiB |
BIN
assets/logo.png
Normal file
After Width: | Height: | Size: 373 KiB |
BIN
assets/parse_analysis.png
Normal file
After Width: | Height: | Size: 822 KiB |
102
datasheet.md
Normal file
@ -0,0 +1,102 @@
# Alpaca Instruction Following Dataset

## Motivation
### For what purpose was the dataset created?
To enable more open-source research on instruction-following large language models, we generate 52K instruction-following demonstrations using OpenAI's text-davinci-003 model.

### Who created the dataset?
- [Rohan Taori](https://www.rohantaori.com/)
- [Ishaan Gulrajani](https://ishaan.io/)
- [Tianyi Zhang](https://tiiiger.github.io/)
- [Yann Dubois](https://yanndubs.github.io/)
- [Xuechen Li](https://www.lxuechen.com/)
- [Carlos Guestrin](https://guestrin.su.domains/)
- [Percy Liang](https://cs.stanford.edu/~pliang/)
- [Tatsunori B. Hashimoto](https://thashim.github.io/)

## Composition

### What do the instances that comprise the dataset represent (e.g., documents, photos, people, countries)?
The instruction-following demonstrations are bootstrapped by following the [seed set](https://github.com/yizhongw/self-instruct/blob/main/data/seed_tasks.jsonl) released by the self-instruct project.
Given that the dataset is generated, it is difficult to pinpoint who/what the instances represent.

### How many instances are there in total?
In total, there are 52,002 instances in the dataset.

### Does the dataset contain all possible instances or is it a sample (not necessarily random) of instances from a larger set?
Not applicable.

### What data does each instance consist of?

- `instruction`: `str`, describes the task the model should perform. Each of the 52K instructions is unique.
- `input`: `str`, optional context or input for the task. For example, when the instruction is "Summarize the following article", the input is the article. Around 40% of the examples have an input.
- `output`: `str`, the answer to the instruction as generated by `text-davinci-003`.

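For illustration only, a hypothetical instance (made up for this datasheet, not taken from the dataset) with all three fields filled in might look like:

```
{
    "instruction": "Rewrite the sentence in the past tense.",
    "input": "She walks to school every day.",
    "output": "She walked to school every day."
}
```
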
### Is any information missing from individual instances?
No.

### Are relationships between individual instances made explicit (e.g., users’ movie ratings, social network links)?
Not applicable.

### Is there a label or target associated with each instance?
The fine-tuning target is the response generated by `text-davinci-003`.

### Are there recommended data splits (e.g., training, development/validation, testing)?
The Alpaca models (both the demo model and the ones that will be released) are trained on all 52K examples.
There is no recommended data split for the dataset.

### Are there any errors, sources of noise, or redundancies in the dataset?
All 52K instructions are unique. However, some generated instructions may not be sensible, i.e., there may not exist any good response to the instruction.

### Is the dataset self-contained, or does it link to or otherwise rely on external resources (e.g., websites, tweets, other datasets)?
The dataset is self-contained.

### Does the dataset contain data that might be considered confidential (e.g., data that is protected by legal privilege or by doctor-patient confidentiality, data that includes the content of individuals’ non-public communications)?
No.

### Does the dataset contain data that, if viewed directly, might be offensive, insulting, threatening, or might otherwise cause anxiety?
The generated data may contain a few inappropriate responses. In our preliminary testing, we have not encountered any offensive responses.

## Collection process
The [GitHub repository](https://github.com/tatsu-lab/stanford_alpaca) contains the code used to generate the dataset.

## Uses

### Has the dataset been used for any tasks already?
The dataset is used to train the Alpaca models, both the demo model and the models that will be released.

### Is there a repository that links to any or all papers or systems that use the dataset?
Please see https://github.com/tatsu-lab/stanford_alpaca

### Is there anything about the composition of the dataset or the way it was collected and preprocessed/cleaned/labeled that might impact future uses?
This dataset is generated using OpenAI's API. Therefore, this dataset cannot be used for commercial purposes that compete with OpenAI.

### Are there tasks for which the dataset should not be used?
The dataset should not be used for commercial purposes that compete with OpenAI.

## Distribution
### Will the dataset be distributed to third parties outside of the entity (e.g., company, institution, organization) on behalf of which the dataset was created?
The dataset can be freely downloaded.

### How will the dataset be distributed (e.g., tarball on website, API, GitHub)?
The dataset can be downloaded from the [GitHub repository](https://github.com/tatsu-lab/stanford_alpaca) as a JSON file.

### Will the dataset be distributed under a copyright or other intellectual property (IP) license, and/or under applicable terms of use (ToU)?
This dataset is distributed under [the ODC-By license](https://opendatacommons.org/licenses/by/1-0/).

### Have any third parties imposed IP-based or other restrictions on the data associated with the instances?
No.

### Do any export controls or other regulatory restrictions apply to the dataset or to individual instances?
No.

## Maintenance

### Who is supporting/hosting/maintaining the dataset?
The dataset is hosted on GitHub, and the repository is maintained by Rohan Taori, Ishaan Gulrajani, Tianyi Zhang, Yann Dubois, and Xuechen Li.

### How can the owner/curator/manager of the dataset be contacted (e.g., email address)?
Please open an issue in the [GitHub repository](https://github.com/tatsu-lab/stanford_alpaca).

### Will the dataset be updated (e.g., to correct labeling errors, add new instances, delete instances)?
We do not have plans to update the dataset.
217
generate_instruction.py
Normal file
@ -0,0 +1,217 @@
"""
batch_selfinstruct_generate.py

run:
python -m generate_instruction generate_instruction_following_data \
  --output_dir ./ \
  --num_instructions_to_generate 10 \
  --model_name="text-davinci-003" \
"""
import time
import json
import os
import random
import re
import string
from functools import partial
from multiprocessing import Pool

import numpy as np
import tqdm
from rouge_score import rouge_scorer
import utils

import fire


def encode_prompt(prompt_instructions):
    """Encode multiple prompt instructions into a single string."""
    prompt = open("./prompt.txt").read() + "\n"

    for idx, task_dict in enumerate(prompt_instructions):
        (instruction, input, output) = task_dict["instruction"], task_dict["input"], task_dict["output"]
        instruction = re.sub(r"\s+", " ", instruction).strip().rstrip(":")
        input = "<noinput>" if input.lower() == "" else input
        prompt += f"###\n"
        prompt += f"{idx + 1}. Instruction: {instruction}\n"
        prompt += f"{idx + 1}. Input:\n{input}\n"
        prompt += f"{idx + 1}. Output:\n{output}\n"
    prompt += f"###\n"
    prompt += f"{idx + 2}. Instruction:"
    return prompt


def post_process_gpt3_response(num_prompt_instructions, response):
    if response is None:
        return []
    raw_instructions = f"{num_prompt_instructions+1}. Instruction:" + response["text"]
    raw_instructions = re.split("###", raw_instructions)
    instructions = []
    for idx, inst in enumerate(raw_instructions):
        # if the decoding stops due to length, the last example is likely truncated so we discard it
        if idx == len(raw_instructions) - 1 and response["finish_reason"] == "length":
            continue
        idx += num_prompt_instructions + 1
        splitted_data = re.split(rf"{idx}\.\s+(Instruction|Input|Output):", inst)
        if len(splitted_data) != 7:
            continue
        else:
            inst = splitted_data[2].strip()
            input = splitted_data[4].strip()
            input = "" if input.lower() == "<noinput>" else input
            output = splitted_data[6].strip()
        # filter out too short or too long instructions
        if len(inst.split()) <= 3 or len(inst.split()) > 150:
            continue
        # filter based on keywords that are not suitable for language models.
        blacklist = [
            "image",
            "images",
            "graph",
            "graphs",
            "picture",
            "pictures",
            "file",
            "files",
            "map",
            "maps",
            "draw",
            "plot",
            "go to",
            "video",
            "audio",
            "music",
            "flowchart",
            "diagram",
        ]
        blacklist += []
        if any(find_word_in_string(word, inst) for word in blacklist):
            continue
        # We found that the model tends to add "write a program" to some existing instructions, which leads to a lot of such instructions.
        # And it's a bit confusing whether the model needs to write a program or directly output the result.
        # Here we filter them out.
        # Note this is not a comprehensive filtering for all programming instructions.
        if inst.startswith("Write a program"):
            continue
        # filter those starting with punctuation
        if inst[0] in string.punctuation:
            continue
        # filter those starting with non-english character
        if not inst[0].isascii():
            continue
        instructions.append({"instruction": inst, "input": input, "output": output})
    return instructions


def find_word_in_string(w, s):
    return re.compile(r"\b({0})\b".format(w), flags=re.IGNORECASE).search(s)


def generate_instruction_following_data(
    output_dir="./",
    seed_tasks_path="./seed_tasks.jsonl",
    num_instructions_to_generate=100,
    model_name="text-davinci-003",
    num_prompt_instructions=3,
    request_batch_size=5,
    temperature=1.0,
    top_p=1.0,
    num_cpus=16,
):
    seed_tasks = [json.loads(l) for l in open(seed_tasks_path, "r")]
    seed_instruction_data = [
        {"instruction": t["instruction"], "input": t["instances"][0]["input"], "output": t["instances"][0]["output"]}
        for t in seed_tasks
    ]
    print(f"Loaded {len(seed_instruction_data)} human-written seed instructions")

    os.makedirs(output_dir, exist_ok=True)
    request_idx = 0
    # load the LM-generated instructions
    machine_instruction_data = []
    if os.path.exists(os.path.join(output_dir, "regen.json")):
        machine_instruction_data = utils.jload(os.path.join(output_dir, "regen.json"))
        print(f"Loaded {len(machine_instruction_data)} machine-generated instructions")

    # similarities = {}
    scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=False)

    # now let's generate new instructions!
    progress_bar = tqdm.tqdm(total=num_instructions_to_generate)
    if machine_instruction_data:
        progress_bar.update(len(machine_instruction_data))

    # first we tokenize all the seed instructions and generated machine instructions
    all_instructions = [d["instruction"] for d in seed_instruction_data] + [
        d["instruction"] for d in machine_instruction_data
    ]
    all_instruction_tokens = [scorer._tokenizer.tokenize(inst) for inst in all_instructions]

    while len(machine_instruction_data) < num_instructions_to_generate:
        request_idx += 1

        batch_inputs = []
        for _ in range(request_batch_size):
            # only sampling from the seed tasks
            prompt_instructions = random.sample(seed_instruction_data, num_prompt_instructions)
            prompt = encode_prompt(prompt_instructions)
            batch_inputs.append(prompt)
        decoding_args = utils.OpenAIDecodingArguments(
            temperature=temperature,
            n=1,
            max_tokens=3072,  # hard-code to maximize the length. the requests will be automatically adjusted
            top_p=top_p,
            stop=["\n20", "20.", "20."],
        )
        request_start = time.time()
        results = utils.openai_completion(
            prompts=batch_inputs,
            model_name=model_name,
            batch_size=request_batch_size,
            decoding_args=decoding_args,
            logit_bias={"50256": -100},  # prevent the <|endoftext|> token from being generated
        )
        request_duration = time.time() - request_start

        process_start = time.time()
        instruction_data = []
        for result in results:
            new_instructions = post_process_gpt3_response(num_prompt_instructions, result)
            instruction_data += new_instructions

        total = len(instruction_data)
        keep = 0
        for instruction_data_entry in instruction_data:
            # computing similarity with the pre-tokenized instructions
            new_instruction_tokens = scorer._tokenizer.tokenize(instruction_data_entry["instruction"])
            with Pool(num_cpus) as p:
                rouge_scores = p.map(
                    partial(rouge_scorer._score_lcs, new_instruction_tokens),
                    all_instruction_tokens,
                )
            rouge_scores = [score.fmeasure for score in rouge_scores]
            most_similar_instructions = {
                all_instructions[i]: rouge_scores[i] for i in np.argsort(rouge_scores)[-10:][::-1]
            }
            if max(rouge_scores) > 0.7:
                continue
            else:
                keep += 1
            instruction_data_entry["most_similar_instructions"] = most_similar_instructions
            instruction_data_entry["avg_similarity_score"] = float(np.mean(rouge_scores))
            machine_instruction_data.append(instruction_data_entry)
            all_instructions.append(instruction_data_entry["instruction"])
            all_instruction_tokens.append(new_instruction_tokens)
            progress_bar.update(1)
        process_duration = time.time() - process_start
        print(f"Request {request_idx} took {request_duration:.2f}s, processing took {process_duration:.2f}s")
        print(f"Generated {total} instructions, kept {keep} instructions")
        utils.jdump(machine_instruction_data, os.path.join(output_dir, "regen.json"))


def main(task, **kwargs):
    globals()[task](**kwargs)


if __name__ == "__main__":
    fire.Fire(main)
52
model_card.md
Normal file
@ -0,0 +1,52 @@
---
# Alpaca Model Card

## Model details
**Organization developing the model**
Stanford Hashimoto Group

**Model date**
Alpaca was trained in March 2023.

**Model version**
This is version 1 of the model.

**Model type**
Alpaca models are instruction-following models fine-tuned from LLaMA models.

**More information**
Please see our blog post at `link` for more information.

**Citation details**
Please cite the [GitHub repo](https://github.com/tatsu-lab/stanford_alpaca) if you use the data or code in this repo.

**License**
Code and data are licensed under the Apache 2.0 license.

**Where to send questions or comments about the model**
Questions and comments about Alpaca can be sent via the [GitHub repository](https://github.com/tatsu-lab/stanford_alpaca) of the project, by opening an issue.

## Intended use
**Primary intended uses**
The primary use of Alpaca is research on instruction-following large language models.

**Primary intended users**
The primary intended users of the model are researchers in natural language processing, machine learning, and artificial intelligence.

**Out-of-scope use cases**
Alpaca models are not fine-tuned with human feedback and are not intended for use in production systems.
Alpaca models are trained from data generated using the OpenAI API, and thus any usage must not compete with the OpenAI API.

## Metrics
**Model performance measures**
The Alpaca 7B model has been evaluated using a blinded pairwise comparison with OpenAI's text-davinci-003 on the self-instruct evaluation set.
Our student authors have judged the Alpaca 7B model to be on par with text-davinci-003, with a win rate around 50%.

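As a rough illustration (this is not our evaluation script, and the judgment list below is made up), the win rate is simply the fraction of blinded pairwise comparisons in which the annotator preferred Alpaca's output:

```
# Hypothetical blinded judgments: "alpaca" if the annotator preferred Alpaca's
# output over text-davinci-003's, "davinci" otherwise.
judgments = ["alpaca", "davinci", "alpaca", "alpaca", "davinci"]

win_rate = sum(j == "alpaca" for j in judgments) / len(judgments)
print(f"Alpaca win rate: {win_rate:.0%}")  # 60% on this toy list
```
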
**Approaches to uncertainty and variability**
We have only fine-tuned a single Alpaca model at each model size, and thus we do not have a good sense of the variability of the model.

## Evaluation datasets
The model was evaluated on the self-instruct evaluation set.

## Training dataset
The model was trained on 52K instruction-following examples, which are released in the [GitHub repository](https://github.com/tatsu-lab/stanford_alpaca).
14
prompt.txt
Normal file
@ -0,0 +1,14 @@
You are asked to come up with a set of 20 diverse task instructions. These task instructions will be given to a GPT model and we will evaluate the GPT model for completing the instructions.

Here are the requirements:
1. Try not to repeat the verb for each instruction to maximize diversity.
2. The language used for the instruction also should be diverse. For example, you should combine questions with imperative instructions.
3. The type of instructions should be diverse. The list should include diverse types of tasks like open-ended generation, classification, editing, etc.
4. A GPT language model should be able to complete the instruction. For example, do not ask the assistant to create any visual or audio output. For another example, do not ask the assistant to wake you up at 5pm or set a reminder because it cannot perform any action.
5. The instructions should be in English.
6. The instructions should be 1 to 2 sentences long. Either an imperative sentence or a question is permitted.
7. You should generate an appropriate input to the instruction. The input field should contain a specific example provided for the instruction. It should involve realistic data and should not contain simple placeholders. The input should provide substantial content to make the instruction challenging but should ideally not exceed 100 words.
8. Not all instructions require input. For example, when an instruction asks about some general information, "what is the highest peak in the world", it is not necessary to provide a specific context. In this case, we simply put "<noinput>" in the input field.
9. The output should be an appropriate response to the instruction and the input. Make sure the output is less than 100 words.

List of 20 tasks:
4
requirements.txt
Normal file
@ -0,0 +1,4 @@
numpy
rouge_score
fire
openai
175
seed_tasks.jsonl
Normal file
173
utils.py
Normal file
@ -0,0 +1,173 @@
import dataclasses
import logging
import math
import os
import io
import sys
import time
import json
from typing import Optional, Sequence, Union

import openai
import tqdm
from openai import openai_object
import copy

StrOrOpenAIObject = Union[str, openai_object.OpenAIObject]

openai_org = os.getenv("OPENAI_ORG")
if openai_org is not None:
    openai.organization = openai_org
    logging.warning(f"Switching to organization: {openai_org} for OAI API key.")


@dataclasses.dataclass
class OpenAIDecodingArguments(object):
    max_tokens: int = 1800
    temperature: float = 0.2
    top_p: float = 1.0
    n: int = 1
    stream: bool = False
    stop: Optional[Sequence[str]] = None
    presence_penalty: float = 0.0
    frequency_penalty: float = 0.0
    suffix: Optional[str] = None
    logprobs: Optional[int] = None
    echo: bool = False


def openai_completion(
    prompts: Union[str, Sequence[str], Sequence[dict[str, str]], dict[str, str]],
    decoding_args: OpenAIDecodingArguments,
    model_name="text-davinci-003",
    sleep_time=2,
    batch_size=1,
    max_instances=sys.maxsize,
    max_batches=sys.maxsize,
    return_text=False,
    **decoding_kwargs,
) -> Union[Union[StrOrOpenAIObject], Sequence[StrOrOpenAIObject], Sequence[Sequence[StrOrOpenAIObject]],]:
    """Decode with OpenAI API.

    Args:
        prompts: A string or a list of strings to complete. If it is a chat model the strings should be formatted
            as explained here: https://github.com/openai/openai-python/blob/main/chatml.md. If it is a chat model
            it can also be a dictionary (or list thereof) as explained here:
            https://github.com/openai/openai-cookbook/blob/main/examples/How_to_format_inputs_to_ChatGPT_models.ipynb
        decoding_args: Decoding arguments.
        model_name: Model name. Can be either in the format of "org/model" or just "model".
        sleep_time: Time to sleep once the rate-limit is hit.
        batch_size: Number of prompts to send in a single request. Only for non chat model.
        max_instances: Maximum number of prompts to decode.
        max_batches: Maximum number of batches to decode. This argument will be deprecated in the future.
        return_text: If True, return text instead of full completion object (which contains things like logprob).
        decoding_kwargs: Additional decoding arguments. Pass in `best_of` and `logit_bias` if you need them.

    Returns:
        A completion or a list of completions.
        Depending on return_text, return_openai_object, and decoding_args.n, the completion type can be one of
            - a string (if return_text is True)
            - an openai_object.OpenAIObject object (if return_text is False)
            - a list of objects of the above types (if decoding_args.n > 1)
    """
    is_single_prompt = isinstance(prompts, (str, dict))
    if is_single_prompt:
        prompts = [prompts]

    if max_batches < sys.maxsize:
        logging.warning(
            "`max_batches` will be deprecated in the future, please use `max_instances` instead."
            "Setting `max_instances` to `max_batches * batch_size` for now."
        )
        max_instances = max_batches * batch_size

    prompts = prompts[:max_instances]
    num_prompts = len(prompts)
    prompt_batches = [
        prompts[batch_id * batch_size : (batch_id + 1) * batch_size]
        for batch_id in range(int(math.ceil(num_prompts / batch_size)))
    ]

    completions = []
    for batch_id, prompt_batch in tqdm.tqdm(
        enumerate(prompt_batches),
        desc="prompt_batches",
        total=len(prompt_batches),
    ):
        batch_decoding_args = copy.deepcopy(decoding_args)  # cloning the decoding_args

        while True:
            try:
                shared_kwargs = dict(
                    model=model_name,
                    **batch_decoding_args.__dict__,
                    **decoding_kwargs,
                )
                completion_batch = openai.Completion.create(prompt=prompt_batch, **shared_kwargs)
                choices = completion_batch.choices

                for choice in choices:
                    choice["total_tokens"] = completion_batch.usage.total_tokens
                completions.extend(choices)
                break
            except openai.error.OpenAIError as e:
                logging.warning(f"OpenAIError: {e}.")
                if "Please reduce your prompt" in str(e):
                    batch_decoding_args.max_tokens = int(batch_decoding_args.max_tokens * 0.8)
                    logging.warning(f"Reducing target length to {batch_decoding_args.max_tokens}, Retrying...")
                else:
                    logging.warning("Hit request rate limit; retrying...")
                    time.sleep(sleep_time)  # Annoying rate limit on requests.

    if return_text:
        completions = [completion.text for completion in completions]
    if decoding_args.n > 1:
        # make completions a nested list, where each entry is a consecutive decoding_args.n of original entries.
        completions = [completions[i : i + decoding_args.n] for i in range(0, len(completions), decoding_args.n)]
    if is_single_prompt:
        # Return non-tuple if only 1 input and 1 generation.
        (completions,) = completions
    return completions


def _make_w_io_base(f, mode: str):
    if not isinstance(f, io.IOBase):
        f_dirname = os.path.dirname(f)
        if f_dirname != "":
            os.makedirs(f_dirname, exist_ok=True)
        f = open(f, mode=mode)
    return f


def _make_r_io_base(f, mode: str):
    if not isinstance(f, io.IOBase):
        f = open(f, mode=mode)
    return f


def jdump(obj, f, mode="w", indent=4, default=str):
    """Dump a str or dictionary to a file in json format.

    Args:
        obj: An object to be written.
        f: A string path to the location on disk.
        mode: Mode for opening the file.
        indent: Indent for storing json dictionaries.
        default: A function to handle non-serializable entries; defaults to `str`.
    """
    f = _make_w_io_base(f, mode)
    if isinstance(obj, (dict, list)):
        json.dump(obj, f, indent=indent, default=default)
    elif isinstance(obj, str):
        f.write(obj)
    else:
        raise ValueError(f"Unexpected type: {type(obj)}")
    f.close()


def jload(f, mode="r"):
    """Load a .json file into a dictionary."""
    f = _make_r_io_base(f, mode)
    jdict = json.load(f)
    f.close()
    return jdict