Update TRAINING_LOG.md
parent a44a98f445
commit 97c389231b
@@ -230,4 +230,8 @@ We additionally train a full model
| Weight decay | 0 |
| Warmup Steps | 100 |
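
As a minimal sketch only, assuming a Hugging Face `Trainer`-style setup (the actual training scripts in this repo may wire these up differently, and `output_dir` below is a placeholder), the two values above would map to:

```python
# Minimal sketch, not the repo's actual config: mapping the table's optimizer
# settings onto Hugging Face TrainingArguments. output_dir is a placeholder.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="checkpoints",  # placeholder path
    weight_decay=0.0,          # Weight decay = 0, as in the table above
    warmup_steps=100,          # Warmup Steps = 100, as in the table above
)
```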
Taking inspiration from [the Alpaca Repo](https://github.com/tatsu-lab/stanford_alpaca), we roughly scale the learning rate by `sqrt(k)`, where `k` is the increase in batch size relative to Alpaca's setup (batch size 128, learning rate 2e-5).
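
As a small worked sketch of this rule (the batch size of 512 below is purely illustrative, not necessarily the value used for training):

```python
import math

# Alpaca's reference setup, as stated above.
ALPACA_BATCH_SIZE = 128
ALPACA_LR = 2e-5

def scaled_lr(batch_size: int) -> float:
    """Scale Alpaca's learning rate by sqrt(k), where k is the batch-size increase."""
    k = batch_size / ALPACA_BATCH_SIZE
    return ALPACA_LR * math.sqrt(k)

# Illustrative only: a 4x larger batch (512) scales the learning rate by 2x, to 4e-5.
print(scaled_lr(512))  # 4e-05
```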
Comparing our LoRA model to the [Alpaca LoRA](https://huggingface.co/tloen/alpaca-lora-7b), our model has lower perplexity. Training for 3 epochs performed best, both in perplexity and in qualitative examples.
We tried training a full model using the parameters above, but found that during the second epoch the model overfit.