From 9a14edbc84db956b094cffde453caf696a58bf69 Mon Sep 17 00:00:00 2001
From: Rohan Taori
Date: Wed, 15 Mar 2023 09:43:23 -0700
Subject: [PATCH 1/2] Update gpu scaling batch size instructions

---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index f5b8c52..c582830 100644
--- a/README.md
+++ b/README.md
@@ -166,6 +166,7 @@ torchrun --nproc_per_node=4 --master_port= train.py \
 ```
 
 Note the given training script is meant to be simple and easy to use, and is not particularly optimized.
+To run on more gpus, you may prefer to turn down `gradient_accumulation_steps` to keep a global batch size of 128. Batch size has not been tested for optimality.
 
 ### Authors
 All grad students below contributed equally and the order is determined by random draw.

From 3a50f614fcd03710ea709d7422f08591157d9ff2 Mon Sep 17 00:00:00 2001
From: Rohan Taori
Date: Wed, 15 Mar 2023 11:03:26 -0700
Subject: [PATCH 2/2] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index c582830..9b5da0e 100644
--- a/README.md
+++ b/README.md
@@ -166,7 +166,7 @@ torchrun --nproc_per_node=4 --master_port= train.py \
 ```
 
 Note the given training script is meant to be simple and easy to use, and is not particularly optimized.
-To run on more gpus, you may prefer to turn down `gradient_accumulation_steps` to keep a global batch size of 128. Batch size has not been tested for optimality.
+To run on more gpus, you may prefer to turn down `gradient_accumulation_steps` to keep a global batch size of 128. Global batch size has not been tested for optimality.
 
 ### Authors
 All grad students below contributed equally and the order is determined by random draw.
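
The note added by these patches rests on a simple relationship: the global batch size is `nproc_per_node` × `per_device_train_batch_size` × `gradient_accumulation_steps`, so adding GPUs without lowering the accumulation steps would push the global batch above 128. A minimal sketch of that arithmetic is below; the helper function and the per-device batch size of 4 are illustrative assumptions, not values taken from the patch.

```python
# Hypothetical sketch (not part of the repo): pick gradient_accumulation_steps
# so the global batch size stays at 128 when the GPU count changes.
# global_batch = num_gpus * per_device_batch_size * gradient_accumulation_steps

def grad_accum_steps(num_gpus: int, per_device_batch_size: int, global_batch_size: int = 128) -> int:
    """Return the gradient_accumulation_steps that keeps the global batch size fixed."""
    per_step = num_gpus * per_device_batch_size
    if global_batch_size % per_step != 0:
        raise ValueError("global batch size must be divisible by num_gpus * per_device_batch_size")
    return global_batch_size // per_step

# Assuming a per-device batch size of 4 (an assumed value, not from the patch):
print(grad_accum_steps(num_gpus=4, per_device_batch_size=4))  # 8 accumulation steps on 4 GPUs
print(grad_accum_steps(num_gpus=8, per_device_batch_size=4))  # 4 accumulation steps on 8 GPUs
```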