Commit Graph

17 Commits

Author SHA1 Message Date
oobabooga
3d6cb5ed63 Minor rewrite 2023-04-05 01:21:40 -03:00
oobabooga
f3a2e0b8a9 Disable pre_layer when the model type is not llama 2023-04-05 01:19:26 -03:00
catalpaaa
4ab679480e
allow quantized model to be loaded from model dir (#760) 2023-04-04 23:19:38 -03:00
OWKenobi
ee4547cd34
Detect "vicuna" as llama model type (#772) 2023-04-04 13:23:27 -03:00
oobabooga
1cb9246160 Adapt to the new model names 2023-03-29 21:47:36 -03:00
oobabooga
010b259dde Update documentation 2023-03-28 17:46:00 -03:00
oobabooga
0bec15ebcd Reorder imports 2023-03-28 17:34:15 -03:00
Maya Eary
41ec682834 Disable kernel threshold for gpt-j 2023-03-28 22:45:38 +03:00
Maya Eary
1c075d8d21 Fix typo 2023-03-28 20:43:50 +03:00
Maya Eary
c8207d474f Generalized load_quantized 2023-03-28 20:38:55 +03:00
oobabooga
49c10c5570
Add support for the latest GPTQ models with group-size (#530)
**Warning: old 4-bit weights will not work anymore!**

See here how to get up to date weights: https://github.com/oobabooga/text-generation-webui/wiki/LLaMA-model#step-2-get-the-pre-converted-weights
2023-03-26 00:11:33 -03:00
EyeDeck
dcfd866402 Allow loading of .safetensors through GPTQ-for-LLaMa 2023-03-23 21:31:34 -04:00
oobabooga
db4219a340
Update comments 2023-03-20 16:40:08 -03:00
oobabooga
7618f3fe8c
Add -gptq-preload for 4-bit offloading (#460)
This works in a 4GB card now:

```
python server.py --model llama-7b-hf --gptq-bits 4 --gptq-pre-layer 20
```
2023-03-20 16:30:56 -03:00
oobabooga
9a3bed50c3
Attempt at fixing 4-bit with CPU offload 2023-03-20 15:11:56 -03:00
askmyteapot
53b6a66beb
Update GPTQ_Loader.py
Correcting decoder layer for renamed class.
2023-03-17 18:34:13 +10:00
oobabooga
265ba384b7 Rename a file, add deprecation warning for --load-in-4bit 2023-03-14 07:56:31 -03:00