Bartowski | 104573f7d4 | Update cache_4bit documentation (#5649) (Co-authored-by: oobabooga <112222186+oobabooga@users.noreply.github.com>) | 2024-03-07 13:08:21 -03:00 |
oobabooga | 2ec1d96c91 | Add cache_4bit option for ExLlamaV2 (#5645) | 2024-03-06 23:02:25 -03:00 |
oobabooga | d6bd71db7f | ExLlamaV2: fix loading when autosplit is not set | 2024-02-17 12:54:37 -08:00 |
oobabooga | a6730f88f7 | Add --autosplit flag for ExLlamaV2 (#5524) | 2024-02-16 15:26:10 -03:00 |
oobabooga | f1f2c4c3f4 | Add --num_experts_per_token parameter (ExLlamav2) (#4955) | 2023-12-17 12:08:33 -03:00 |
oobabooga | c0655475ae | Add cache_8bit option | 2023-11-02 11:23:04 -07:00 |
oobabooga | 77abd9b69b | Add no_flash_attn option | 2023-11-02 11:08:53 -07:00 |
oobabooga | fbac6d21ca | Add missing exception | 2023-10-20 23:53:24 -07:00 |
turboderp | ae8cd449ae | ExLlamav2_HF: Convert logits to FP32 (#4310) | 2023-10-18 23:16:05 -03:00 |
Forkoz | 8cce1f1126 | Exllamav2 lora support (#4229) (Co-authored-by: oobabooga <112222186+oobabooga@users.noreply.github.com>) | 2023-10-14 16:12:41 -03:00 |
tdrussell | cb26163a20 | Fix off-by-one error in exllama_hf caching logic (#4145) | 2023-10-05 12:20:56 -03:00 |
oobabooga | 03dc69edc5 | ExLlama_HF (v1 and v2) prefix matching | 2023-09-19 13:12:19 -07:00 |
oobabooga | 605ec3c9f2 | Add a warning about ExLlamaV2 without flash-attn | 2023-09-18 12:26:35 -07:00 |
Panchovix | 34dc7306b8 | Fix NTK (alpha) and RoPE scaling for exllamav2 and exllamav2_HF (#3897) | 2023-09-13 02:35:09 -03:00 |
oobabooga | b7adf290fc | Fix ExLlama-v2 path issue | 2023-09-12 17:42:22 -07:00 |
oobabooga | 18e6b275f3 | Add alpha_value/compress_pos_emb to ExLlama-v2 | 2023-09-12 15:02:47 -07:00 |
oobabooga | c2a309f56e | Add ExLlamaV2 and ExLlamav2_HF loaders (#3881) | 2023-09-12 14:33:07 -03:00 |