From 7f0853214d2d4cd732d6f33622566a10e317a903 Mon Sep 17 00:00:00 2001
From: Xuechen Li
Date: Thu, 16 Mar 2023 00:43:24 -0700
Subject: [PATCH] document how training may slow down.

---
 README.md | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/README.md b/README.md
index 627481e..550ce51 100644
--- a/README.md
+++ b/README.md
@@ -12,6 +12,7 @@ This is the repo for the Stanford Alpaca project, which aims to build and share
 - A [**web demo**](https://crfm.stanford.edu/alpaca/) to interact with our Alpaca model
 - The [52K data](#data-release) used for fine-tuning the model
 - The code for [generating the data](#data-generation-process)
+- The code for [fine-tuning the model](#fine-tuning)
 
 ## Overview
 
@@ -139,6 +140,15 @@ torchrun --nproc_per_node=4 --master_port=<your_random_port> train.py \
     --tf32 True
 ```
 
+### Warning
+`fsdp_transformer_layer_cls_to_wrap` must be set to the name of the specific decoder layer class.
+The LLaMA Hugging Face PR is not yet stable.
+Earlier commits used the name `LLaMADecoderLayer` for the decoder layer (the commit our code is based on uses this name).
+More recent commits use `LlamaDecoderLayer` (note the difference in casing).
+Not setting `fsdp_transformer_layer_cls_to_wrap` to the correct name will lead to drastic slowdowns in training.
+
+### Side notes
+
 The same script also works for OPT fine-tuning. Here's an example for fine-tuning OPT-6.7B
 
 ```bash