oobabooga
|
9d5025f531
|
Improve error handling while loading GPTQ models
|
2023-05-19 11:20:08 -03:00 |
|
oobabooga
|
b667ffa51d
|
Simplify GPTQ_loader.py
|
2023-05-17 16:22:56 -03:00 |
|
oobabooga
|
fb91c07191
|
Minor bug fix
|
2023-05-17 11:16:37 -03:00 |
|
Alex "mcmonkey" Goodwin
|
1f50dbe352
|
Experimental jank multiGPU inference that's 2x faster than native somehow (#2100)
|
2023-05-17 10:41:09 -03:00 |
|
oobabooga
|
2eeb27659d
|
Fix bug in --cpu-memory
|
2023-05-12 06:17:07 -03:00 |
|
oobabooga
|
3316e33d14
|
Remove unused code
|
2023-05-10 11:59:59 -03:00 |
|
oobabooga
|
dfd9ba3e90
|
Remove duplicate code
|
2023-05-10 02:07:22 -03:00 |
|
minipasila
|
334486f527
|
Added instruct-following template for Metharme (#1679)
|
2023-05-09 22:29:22 -03:00 |
|
Carl Kenner
|
814f754451
|
Support for MPT, INCITE, WizardLM, StableLM, Galactica, Vicuna, Guanaco, and Baize instruction following (#1596)
|
2023-05-09 20:37:31 -03:00 |
|
IJumpAround
|
020fe7b50b
|
Remove mutable defaults from function signature. (#1663)
|
2023-05-08 22:55:41 -03:00 |
|
Matthew McAllister
|
d78b04f0b4
|
Add error message when GPTQ-for-LLaMa import fails (#1871)
---------
Co-authored-by: oobabooga <112222186+oobabooga@users.noreply.github.com>
|
2023-05-08 22:29:09 -03:00 |
|
camenduru
|
ba65a48ec8
|
trust_remote_code=shared.args.trust_remote_code (#1891)
|
2023-05-07 17:42:44 -03:00 |
|
oobabooga
|
b6ff138084
|
Add --checkpoint argument for GPTQ
|
2023-05-04 15:17:20 -03:00 |
|
oobabooga
|
95d04d6a8d
|
Better warning messages
|
2023-05-03 21:43:17 -03:00 |
|
Wojtab
|
12212cf6be
|
LLaVA support (#1487)
|
2023-04-23 20:32:22 -03:00 |
|
oobabooga
|
7438f4f6ba
|
Change GPTQ triton default settings
|
2023-04-22 12:27:30 -03:00 |
|
USBhost
|
e1aa9d5173
|
Support upstream GPTQ once again. (#1451)
|
2023-04-21 12:43:56 -03:00 |
|
sgsdxzy
|
b57ffc2ec9
|
Update to support GPTQ triton commit c90adef (#1229)
|
2023-04-17 01:11:18 -03:00 |
|
oobabooga
|
39099663a0
|
Add 4-bit LoRA support (#1200)
|
2023-04-16 23:26:52 -03:00 |
|
oobabooga
|
a75e02de4d
|
Simplify GPTQ_loader.py
|
2023-04-13 12:13:07 -03:00 |
|
oobabooga
|
ca293bb713
|
Show a warning if two quantized models are found
|
2023-04-13 12:04:27 -03:00 |
|
oobabooga
|
fde6d06167
|
Prioritize names with the groupsize in them
|
2023-04-13 11:27:03 -03:00 |
|
oobabooga
|
f2bf1a2c9e
|
Add some comments, remove obsolete code
|
2023-04-13 11:17:32 -03:00 |
|
Light
|
da74cd7c44
|
Generalized weight search path.
|
2023-04-13 21:43:32 +08:00 |
|
Light
|
cf58058c33
|
Change warmup_autotune to a negative switch.
|
2023-04-13 20:59:49 +08:00 |
|
Light
|
a405064ceb
|
Better dispatch.
|
2023-04-13 01:48:17 +08:00 |
|
Light
|
f3591ccfa1
|
Keep minimal change.
|
2023-04-12 23:26:06 +08:00 |
|
oobabooga
|
8c6155251a
|
More robust 4-bit model loading
|
2023-04-09 23:19:28 -03:00 |
|
oobabooga
|
ea6e77df72
|
Make the code more like PEP8 for readability (#862)
|
2023-04-07 00:15:45 -03:00 |
|
EyeDeck
|
39f3fec913
|
Broaden GPTQ-for-LLaMA branch support (#820)
|
2023-04-06 12:16:48 -03:00 |
|
oobabooga
|
3d6cb5ed63
|
Minor rewrite
|
2023-04-05 01:21:40 -03:00 |
|
oobabooga
|
f3a2e0b8a9
|
Disable pre_layer when the model type is not llama
|
2023-04-05 01:19:26 -03:00 |
|
catalpaaa
|
4ab679480e
|
allow quantized model to be loaded from model dir (#760)
|
2023-04-04 23:19:38 -03:00 |
|
OWKenobi
|
ee4547cd34
|
Detect "vicuna" as llama model type (#772)
|
2023-04-04 13:23:27 -03:00 |
|
oobabooga
|
1cb9246160
|
Adapt to the new model names
|
2023-03-29 21:47:36 -03:00 |
|
oobabooga
|
010b259dde
|
Update documentation
|
2023-03-28 17:46:00 -03:00 |
|
oobabooga
|
0bec15ebcd
|
Reorder imports
|
2023-03-28 17:34:15 -03:00 |
|
Maya Eary
|
41ec682834
|
Disable kernel threshold for gpt-j
|
2023-03-28 22:45:38 +03:00 |
|
Maya Eary
|
1c075d8d21
|
Fix typo
|
2023-03-28 20:43:50 +03:00 |
|
Maya Eary
|
c8207d474f
|
Generalized load_quantized
|
2023-03-28 20:38:55 +03:00 |
|
oobabooga
|
49c10c5570
|
Add support for the latest GPTQ models with group-size (#530)
**Warning: old 4-bit weights will not work anymore!**
See here how to get up to date weights: https://github.com/oobabooga/text-generation-webui/wiki/LLaMA-model#step-2-get-the-pre-converted-weights
|
2023-03-26 00:11:33 -03:00 |
|
EyeDeck
|
dcfd866402
|
Allow loading of .safetensors through GPTQ-for-LLaMa
|
2023-03-23 21:31:34 -04:00 |
|
oobabooga
|
db4219a340
|
Update comments
|
2023-03-20 16:40:08 -03:00 |
|
oobabooga
|
7618f3fe8c
|
Add --gptq-preload for 4-bit offloading (#460)
This works in a 4GB card now:
```
python server.py --model llama-7b-hf --gptq-bits 4 --gptq-pre-layer 20
```
|
2023-03-20 16:30:56 -03:00 |
|
oobabooga
|
9a3bed50c3
|
Attempt at fixing 4-bit with CPU offload
|
2023-03-20 15:11:56 -03:00 |
|
askmyteapot
|
53b6a66beb
|
Update GPTQ_Loader.py
Correcting decoder layer for renamed class.
|
2023-03-17 18:34:13 +10:00 |
|
oobabooga
|
265ba384b7
|
Rename a file, add deprecation warning for --load-in-4bit
|
2023-03-14 07:56:31 -03:00 |
|