**The code in this repo is not yet fully tested. I'm still in the process of retraining the model with the outputs included, and I make no guarantees about the results of running `generate.py`.**
This repository contains code for reproducing the [Stanford Alpaca](https://github.com/tatsu-lab/stanford_alpaca) results using [low-rank adaptation (LoRA)](https://arxiv.org/pdf/2106.09685.pdf).
The goal is to provide an open Instruct model of similar quality to `text-davinci-003` that can run on most consumer GPUs with 8-bit quantization.
Until Jason Phang's [LLaMA implementation](https://github.com/huggingface/transformers/pull/21955) is merged upstream, users will need to install Huggingface `transformers` from a fork or branch that includes it.
For fine-tuning LoRAs, we use Huggingface's [PEFT](https://github.com/huggingface/peft).
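As a rough sketch, wrapping the base model with a LoRA adapter via PEFT looks something like the following. The hyperparameter values are illustrative placeholders, not necessarily the settings used to train the published weights, and the class names assume the merged `transformers` API (the PR branch above may capitalize them differently, e.g. `LLaMAForCausalLM`):

```python
from transformers import LlamaForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

# Load the base model (the PR branch may name this class LLaMAForCausalLM).
base_model = LlamaForCausalLM.from_pretrained("decapoda-research/llama-7b-hf")

# Illustrative LoRA settings, not necessarily those used for tloen/alpaca-lora-7b.
lora_config = LoraConfig(
    r=8,                                  # rank of the low-rank update matrices
    lora_alpha=16,                        # scaling factor applied to the update
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.CAUSAL_LM,
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```

Adapting only the attention projections keeps the trainable parameters to a small fraction of the 7B base model, which is what makes fine-tuning feasible on a single consumer GPU.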
See `generate.py`. This script loads the `decapoda-research/llama-7b-hf` base model from the Huggingface model hub along with the LoRA weights from `tloen/alpaca-lora-7b`, and runs inference on a specified input. Treat it as example code for using the model, and modify it as needed.
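In outline (and with the same caveat that the PR branch may use different class names, e.g. `LLaMATokenizer`), the inference path looks roughly like this; the prompt and generation length are arbitrary examples, and `load_in_8bit=True` requires `bitsandbytes`:

```python
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer
from peft import PeftModel

tokenizer = LlamaTokenizer.from_pretrained("decapoda-research/llama-7b-hf")
model = LlamaForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf",
    load_in_8bit=True,   # 8-bit quantization for consumer-GPU inference
    device_map="auto",
)
# Apply the LoRA weights on top of the frozen base model.
model = PeftModel.from_pretrained(model, "tloen/alpaca-lora-7b")

prompt = "Tell me about alpacas."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```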