Mirror of https://github.com/tloen/alpaca-lora.git (synced 2024-10-01 01:05:56 -04:00)

Added Dockerfile and docker-compose.yml (#207)

* Added Dockerfile for inference
* Added instructions for Dockerfile
* Update README.md
* Pass env through Dockerfile
* Added docker compose setup and instructions
* Added more environment options
* Set a safer default mount point
* add docker-compose changes
* add to gitignore, update to new generate.py
* add docker ignore, simplify docker compose file
* add back missing requirements
* Adjustments to compose and generate.py, added Docker to README.md
* Linting adjust to Black
* Adjusting import linting
* Removed comment by original Dockerfile creator. Comment not necessary.
* cleanup README

Co-authored-by: Francesco Saverio Zuppichini <zuppif@usi.ch>
Co-authored-by: Chris Alexiuk <c.s.alexiuk@gmail.com>
Co-authored-by: ElRoberto538 <>
Co-authored-by: Sam Sipe <samsipe@gmail.com>
Co-authored-by: Eric J. Wang <eric.james.wang@gmail.com>

This commit is contained in:
parent 216e785d9c
commit 4367a43fcb
.dockerignore (new file, 4 lines)

@@ -0,0 +1,4 @@
+.venv
+.github
+.vscode
+.docker-compose.yml
.gitignore (vendored, 3 changes)

@@ -11,4 +11,5 @@ wandb
 evaluate.py
 test_data.json
 todo.txt
-.vscode/
+.venv
+.vscode
Dockerfile (new file, 18 lines)

@@ -0,0 +1,18 @@
+FROM nvidia/cuda:11.8.0-devel-ubuntu22.04
+
+ARG DEBIAN_FRONTEND=noninteractive
+
+RUN apt-get update && apt-get install -y \
+    git \
+    curl \
+    software-properties-common \
+    && add-apt-repository ppa:deadsnakes/ppa \
+    && apt install -y python3.10 \
+    && rm -rf /var/lib/apt/lists/*
+WORKDIR /workspace
+COPY requirements.txt requirements.txt
+RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.10 \
+    && python3.10 -m pip install -r requirements.txt \
+    && python3.10 -m pip install numpy --pre torch --force-reinstall --index-url https://download.pytorch.org/whl/nightly/cu118
+COPY . .
+ENTRYPOINT [ "python3.10"]
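For reference: because the image's `ENTRYPOINT` is `python3.10`, whatever follows the image name on a `docker run` command line is treated as the script (plus its arguments) to execute from `/workspace`. A minimal sketch of that mapping; the `--help` flag is assumed to come from python-fire (which the project's scripts dispatch through), not from the scripts themselves:

```bash
# Build the image from the repository root.
docker build -t alpaca-lora .

# Runs `python3.10 generate.py --help` inside the container's /workspace.
# (--help is assumed to be provided by python-fire's argument parsing.)
docker run --rm alpaca-lora generate.py --help
```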
README.md (68 changes)

@@ -15,25 +15,13 @@ as well as Tim Dettmers' [bitsandbytes](https://github.com/TimDettmers/bitsandby
 
 Without hyperparameter tuning, the LoRA model produces outputs comparable to the Stanford Alpaca model. (Please see the outputs included below.) Further tuning might be able to achieve better performance; I invite interested users to give it a try and report their results.
 
-## Setup
+### Local Setup
 
 1. Install dependencies
 
    ```bash
   pip install -r requirements.txt
   ```
 
-1. Set environment variables, or modify the files referencing `BASE_MODEL`:
-
-   ```bash
-   # Files referencing `BASE_MODEL`
-   # export_hf_checkpoint.py
-   # export_state_dict_checkpoint.py
-
-   export BASE_MODEL=decapoda-research/llama-7b-hf
-   ```
-
-   Both `finetune.py` and `generate.py` use `--base_model` flag as shown further below.
-
 1. If bitsandbytes doesn't work, [install it from source.](https://github.com/TimDettmers/bitsandbytes/blob/main/compile_from_source.md) Windows users can follow [these instructions](https://github.com/tloen/alpaca-lora/issues/17).
 
@@ -94,6 +82,49 @@ They should help users
 who want to run inference in projects like [llama.cpp](https://github.com/ggerganov/llama.cpp)
 or [alpaca.cpp](https://github.com/antimatter15/alpaca.cpp).
 
+### Docker Setup & Inference
+
+1. Build the container image:
+
+   ```bash
+   docker build -t alpaca-lora .
+   ```
+
+2. Run the container (you can also use `finetune.py` and all of its parameters as shown above for training):
+
+   ```bash
+   docker run --gpus=all --shm-size 64g -p 7860:7860 -v ${HOME}/.cache:/root/.cache --rm alpaca-lora generate.py \
+       --load_8bit \
+       --base_model 'decapoda-research/llama-7b-hf' \
+       --lora_weights 'tloen/alpaca-lora-7b'
+   ```
+
+3. Open `http://localhost:7860` in the browser
+
+### Docker Compose Setup & Inference
+
+1. (optional) Change the desired model and weights under `environment` in the `docker-compose.yml`
+
+2. Build and run the container
+
+   ```bash
+   docker-compose up -d --build
+   ```
+
+3. Open `http://localhost:7860` in the browser
+
+4. See logs:
+
+   ```bash
+   docker-compose logs -f
+   ```
+
+5. Clean everything up:
+
+   ```bash
+   docker-compose down --volumes --rmi all
+   ```
+
 ### Notes
 
 - We can likely improve our model performance significantly if we had a better dataset. Consider supporting the [LAION Open Assistant](https://open-assistant.io/) effort to produce a high-quality dataset for supervised fine-tuning (or bugging them to release their data).
@@ -110,9 +141,7 @@ or [alpaca.cpp](https://github.com/antimatter15/alpaca.cpp).
 - 7B:
   - <https://huggingface.co/tloen/alpaca-lora-7b>
   - <https://huggingface.co/samwit/alpaca7B-lora>
-  - 🤖 <https://huggingface.co/nomic-ai/gpt4all-lora>
   - 🇧🇷 <https://huggingface.co/22h/cabrita-lora-v0-1>
-  - 🇨🇳 <https://huggingface.co/ziqingyang/chinese-alpaca-lora-7b>
   - 🇨🇳 <https://huggingface.co/qychen/luotuo-lora-7b-0.1>
   - 🇯🇵 <https://huggingface.co/kunishou/Japanese-Alapaca-LoRA-7b-v0>
   - 🇫🇷 <https://huggingface.co/bofenghuang/vigogne-lora-7b>
@@ -131,9 +160,6 @@ or [alpaca.cpp](https://github.com/antimatter15/alpaca.cpp).
   - <https://huggingface.co/baseten/alpaca-30b>
   - <https://huggingface.co/chansung/alpaca-lora-30b>
   - 🇯🇵 <https://huggingface.co/kunishou/Japanese-Alapaca-LoRA-30b-v0>
-  - 🇰🇷 <https://huggingface.co/beomi/KoAlpaca-30B-LoRA>
-- 65B:
-  - 🇰🇷 <https://huggingface.co/beomi/KoAlpaca-65B-LoRA>
 - [alpaca-native](https://huggingface.co/chavinlo/alpaca-native), a replication using the original Alpaca code
 
 ### Example outputs
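Step 2 of the Docker setup above notes that `finetune.py` can be run through the same image. A hedged sketch of such a training run; the `--base_model`, `--data_path`, and `--output_dir` flags come from the repository's training documentation (not part of this diff), and the dataset name, output path, and host mount points are placeholders rather than values from this commit:

```bash
# Hypothetical training run through the same image: reuse the HF cache and
# mount a host directory so the LoRA checkpoints survive the container.
docker run --gpus=all --shm-size 64g --rm \
    -v ${HOME}/.cache:/root/.cache \
    -v ${PWD}/lora-alpaca:/workspace/lora-alpaca \
    alpaca-lora finetune.py \
    --base_model 'decapoda-research/llama-7b-hf' \
    --data_path 'yahma/alpaca-cleaned' \
    --output_dir './lora-alpaca'
```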
docker-compose.yml (new file, 28 lines)

@@ -0,0 +1,28 @@
+version: '3'
+
+services:
+  alpaca-lora:
+    build:
+      context: ./
+      dockerfile: Dockerfile
+      args:
+        BUILDKIT_INLINE_CACHE: "0"
+    image: alpaca-lora
+    shm_size: '64gb'
+    command: generate.py --load_8bit --base_model $BASE_MODEL --lora_weights 'tloen/alpaca-lora-7b'
+    restart: unless-stopped
+    volumes:
+      - alpaca-lora:/root/.cache # Location downloaded weights will be stored
+    ports:
+      - 7860:7860
+    deploy:
+      resources:
+        reservations:
+          devices:
+            - driver: nvidia
+              count: all
+              capabilities: [ gpu ]
+
+volumes:
+  alpaca-lora:
+    name: alpaca-lora
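The `command` above interpolates `$BASE_MODEL` from the environment in which Compose is invoked, which is what the Compose step about changing the desired model relies on. A small sketch, assuming the standard Compose behaviour of reading the variable from the shell or from a `.env` file placed next to `docker-compose.yml`:

```bash
# Option 1: export the variable in the shell before starting the stack.
export BASE_MODEL=decapoda-research/llama-7b-hf
docker-compose up -d --build

# Option 2: keep it in a .env file that docker-compose reads automatically.
echo "BASE_MODEL=decapoda-research/llama-7b-hf" > .env
docker-compose up -d --build
```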
generate.py

@@ -1,3 +1,4 @@
+import os
 import sys
 
 import fire
@@ -29,6 +30,7 @@ def main(
     server_name: str = "127.0.0.1", # Allows to listen on all interfaces by providing '0.0.0.0'
     share_gradio: bool = False,
 ):
+    base_model = base_model or os.environ.get("BASE_MODEL", "")
     assert (
         base_model
     ), "Please specify a --base_model, e.g. --base_model='decapoda-research/llama-7b-hf'"
@@ -146,7 +148,7 @@ def main(
         ],
         title="🦙🌲 Alpaca-LoRA",
         description="Alpaca-LoRA is a 7B-parameter LLaMA model finetuned to follow instructions. It is trained on the [Stanford Alpaca](https://github.com/tatsu-lab/stanford_alpaca) dataset and makes use of the Huggingface LLaMA implementation. For more information, please visit [the project's website](https://github.com/tloen/alpaca-lora).", # noqa: E501
-    ).launch(server_name=server_name, share=share_gradio)
+    ).launch(server_name="0.0.0.0", share=share_gradio)
     # Old testing code follows.
 
     """
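Taken together, the changes above let `generate.py` pick up the base model from the `BASE_MODEL` environment variable when no `--base_model` flag is given (which is how the Docker and Compose setups pass it in), and the hard-coded `server_name="0.0.0.0"` makes the Gradio server reachable through the container's published port. A sketch of the two now-equivalent invocations, using the model name from the README as a placeholder:

```bash
# Explicit flag:
python3.10 generate.py --load_8bit --base_model 'decapoda-research/llama-7b-hf'

# Environment-variable fallback added in this commit:
BASE_MODEL='decapoda-research/llama-7b-hf' python3.10 generate.py --load_8bit
```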
requirements.txt

@@ -1,5 +1,6 @@
 accelerate
 appdirs
+loralib
 bitsandbytes
 black
 black[jupyter]
@@ -7,5 +8,6 @@ datasets
 fire
 git+https://github.com/huggingface/peft.git
 git+https://github.com/huggingface/transformers.git
+sentencepiece
 gradio
 sentencepiece