From 9a14edbc84db956b094cffde453caf696a58bf69 Mon Sep 17 00:00:00 2001
From: Rohan Taori
Date: Wed, 15 Mar 2023 09:43:23 -0700
Subject: [PATCH 1/2] Update gpu scaling batch size instructions

---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index f5b8c52..c582830 100644
--- a/README.md
+++ b/README.md
@@ -166,6 +166,7 @@ torchrun --nproc_per_node=4 --master_port= train.py \
 ```
 
 Note the given training script is meant to be simple and easy to use, and is not particularly optimized.
+To run on more gpus, you may prefer to turn down `gradient_accumulation_steps` to keep a global batch size of 128. Batch size has not been tested for optimality.
 
 ### Authors
 All grad students below contributed equally and the order is determined by random draw.

From 3a50f614fcd03710ea709d7422f08591157d9ff2 Mon Sep 17 00:00:00 2001
From: Rohan Taori
Date: Wed, 15 Mar 2023 11:03:26 -0700
Subject: [PATCH 2/2] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index c582830..9b5da0e 100644
--- a/README.md
+++ b/README.md
@@ -166,7 +166,7 @@ torchrun --nproc_per_node=4 --master_port= train.py \
 ```
 
 Note the given training script is meant to be simple and easy to use, and is not particularly optimized.
-To run on more gpus, you may prefer to turn down `gradient_accumulation_steps` to keep a global batch size of 128. Batch size has not been tested for optimality.
+To run on more gpus, you may prefer to turn down `gradient_accumulation_steps` to keep a global batch size of 128. Global batch size has not been tested for optimality.
 
 ### Authors
 All grad students below contributed equally and the order is determined by random draw.
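
The note added by these patches rests on a simple relationship: the global batch size is `nproc_per_node` × `per_device_train_batch_size` × `gradient_accumulation_steps`, so adding GPUs without lowering the accumulation steps would push the global batch above 128. A minimal sketch of that arithmetic is below; the helper function and the per-device batch size of 4 are illustrative assumptions, not values taken from the patch.

```python
# Hypothetical sketch (not part of the repo): pick gradient_accumulation_steps
# so the global batch size stays at 128 when the GPU count changes.
# global_batch = num_gpus * per_device_batch_size * gradient_accumulation_steps

def grad_accum_steps(num_gpus: int, per_device_batch_size: int, global_batch_size: int = 128) -> int:
    """Return the gradient_accumulation_steps that keeps the global batch size fixed."""
    per_step = num_gpus * per_device_batch_size
    if global_batch_size % per_step != 0:
        raise ValueError("global batch size must be divisible by num_gpus * per_device_batch_size")
    return global_batch_size // per_step

# Assuming a per-device batch size of 4 (an assumed value, not from the patch):
print(grad_accum_steps(num_gpus=4, per_device_batch_size=4))  # 8 accumulation steps on 4 GPUs
print(grad_accum_steps(num_gpus=8, per_device_batch_size=4))  # 4 accumulation steps on 8 GPUs
```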