Mirror of https://github.com/nomic-ai/gpt4all.git
Update TRAINING_LOG.md
parent d55df6b254
commit f7b6263749
@@ -230,4 +230,8 @@ We additionally train a full model
| Weight decay | 0 |
| Warmup Steps | 100 |
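
These two rows are the tail of the hyperparameter table. As a rough sketch of how they might be wired into a Hugging Face `TrainingArguments` object (illustrative only; the output directory is a placeholder and this is not the repo's actual training config):

```python
from transformers import TrainingArguments

# Illustrative sketch: output_dir is a hypothetical placeholder.
args = TrainingArguments(
    output_dir="out/full-model",  # hypothetical path, not from the repo
    weight_decay=0.0,             # Weight decay = 0, per the table above
    warmup_steps=100,             # Warmup Steps = 100, per the table above
)
```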

Taking inspiration from [the Alpaca Repo](https://github.com/tatsu-lab/stanford_alpaca), we roughly scale the learning rate by `sqrt(k)`, where `k` is the increase in batch size relative to Alpaca, which used a batch size of 128 and a learning rate of 2e-5.
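
For concreteness, a minimal sketch of this scaling rule (the batch size of 512 below is a hypothetical example, not the batch size used for training):

```python
import math

# Reference hyperparameters from the Alpaca repo.
ALPACA_BATCH_SIZE = 128
ALPACA_LR = 2e-5

def scaled_lr(batch_size: int) -> float:
    """Scale the Alpaca learning rate by sqrt(k), where k is the increase in batch size."""
    k = batch_size / ALPACA_BATCH_SIZE
    return ALPACA_LR * math.sqrt(k)

# Hypothetical example: an effective batch size 4x larger than Alpaca's.
print(scaled_lr(512))  # 4e-05
```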

Comparing our LoRa model to the [Alpaca LoRa](https://huggingface.co/tloen/alpaca-lora-7b), ours has lower perplexity. Training for 3 epochs performed best, both on perplexity and on qualitative examples.

We tried training a full model using the parameters above, but found that during the second epoch the model overfit.