Commit Graph

285 Commits

Author SHA1 Message Date
GralchemOz
8a39f579d8
transformers: Add eager attention option to make Gemma-2 work properly (#6188) 2024-07-01 12:08:08 -03:00
oobabooga
577a8cd3ee
Add TensorRT-LLM support (#5715) 2024-06-24 02:30:03 -03:00
oobabooga
bd7cc4234d
Backend cleanup (#6025) 2024-05-21 13:32:02 -03:00
oobabooga
9f77ed1b98
--idle-timeout flag to unload the model if unused for N minutes (#6026) 2024-05-19 23:29:39 -03:00
oobabooga
e61055253c Bump llama-cpp-python to 0.2.69, add --flash-attn option 2024-05-03 04:31:22 -07:00
oobabooga
51fb766bea
Add back my llama-cpp-python wheels, bump to 0.2.65 (#5964) 2024-04-30 09:11:31 -03:00
oobabooga
70845c76fb
Add back the max_updates_second parameter (#5937) 2024-04-26 10:14:51 -03:00
oobabooga
9b623b8a78
Bump llama-cpp-python to 0.2.64, use official wheels (#5921) 2024-04-23 23:17:05 -03:00
oobabooga
cbd65ba767
Add a simple min_p preset, make it the default (#5836) 2024-04-09 12:50:16 -03:00
oobabooga
168a0f4f67 UI: do not load the "gallery" extension by default 2024-04-06 12:43:21 -07:00
oobabooga
d423021a48
Remove CTransformers support (#5807) 2024-04-04 20:23:58 -03:00
oobabooga
2a92a842ce
Bump gradio to 4.23 (#5758) 2024-03-26 16:32:20 -03:00
oobabooga
28076928ac
UI: Add a new "User description" field for user personality/biography (#5691) 2024-03-11 23:41:57 -03:00
oobabooga
056717923f Document StreamingLLM 2024-03-10 19:15:23 -07:00
oobabooga
afb51bd5d6
Add StreamingLLM for llamacpp & llamacpp_HF (2nd attempt) (#5669) 2024-03-09 00:25:33 -03:00
Bartowski
104573f7d4
Update cache_4bit documentation (#5649)
---------

Co-authored-by: oobabooga <112222186+oobabooga@users.noreply.github.com>
2024-03-07 13:08:21 -03:00
oobabooga
2ec1d96c91
Add cache_4bit option for ExLlamaV2 (#5645) 2024-03-06 23:02:25 -03:00
oobabooga
2174958362
Revert gradio to 3.50.2 (#5640) 2024-03-06 11:52:46 -03:00
oobabooga
63a1d4afc8
Bump gradio to 4.19 (#5522) 2024-03-05 07:32:28 -03:00
oobabooga
a6730f88f7
Add --autosplit flag for ExLlamaV2 (#5524) 2024-02-16 15:26:10 -03:00
oobabooga
76d28eaa9e
Add a menu for customizing the instruction template for the model (#5521) 2024-02-16 14:21:17 -03:00
oobabooga
080f7132c0
Revert gradio to 3.50.2 (#5513) 2024-02-15 20:40:23 -03:00
oobabooga
7123ac3f77
Remove "Maximum UI updates/second" parameter (#5507) 2024-02-14 23:34:30 -03:00
oobabooga
acfbe6b3b3 Minor doc changes 2024-02-06 06:35:01 -08:00
oobabooga
8a6d9abb41 Small fixes 2024-02-06 06:26:27 -08:00
oobabooga
2a1063eff5 Revert "Remove non-HF ExLlamaV2 loader (#5431)"
This reverts commit cde000d478.
2024-02-06 06:21:36 -08:00
oobabooga
8c35fefb3b
Add custom sampler order support (#5443) 2024-02-06 11:20:10 -03:00
Forkoz
2a45620c85
Split by rows instead of layers for llama.cpp multi-gpu (#5435) 2024-02-04 23:36:40 -03:00
oobabooga
cde000d478
Remove non-HF ExLlamaV2 loader (#5431) 2024-02-04 01:15:51 -03:00
oobabooga
e055967974
Add prompt_lookup_num_tokens parameter (#5296) 2024-01-17 17:09:36 -03:00
oobabooga
53dc1d8197 UI: Do not save unchanged settings to settings.yaml 2024-01-09 18:59:04 -08:00
oobabooga
2aad91f3c9
Remove deprecated command-line flags (#5131) 2023-12-31 02:07:48 -03:00
oobabooga
2734ce3e4c
Remove RWKV loader (#5130) 2023-12-31 02:01:40 -03:00
oobabooga
0e54a09bcb
Remove exllamav1 loaders (#5128) 2023-12-31 01:57:06 -03:00
oobabooga
8e397915c9
Remove --sdp-attention, --xformers flags (#5126) 2023-12-31 01:36:51 -03:00
oobabooga
8c60495878 UI: add "Maximum UI updates/second" parameter 2023-12-24 09:17:40 -08:00
oobabooga
2706149c65
Organize the CMD arguments by group (#5027) 2023-12-21 00:33:55 -03:00
oobabooga
9992f7d8c0 Improve several log messages 2023-12-19 20:54:32 -08:00
oobabooga
de138b8ba6
Add llama-cpp-python wheels with tensor cores support (#5003) 2023-12-19 17:30:53 -03:00
oobabooga
0a299d5959
Bump llama-cpp-python to 0.2.24 (#5001) 2023-12-19 15:22:21 -03:00
oobabooga
a23a004434 Update the example template 2023-12-18 17:47:35 -08:00
Water
674be9a09a
Add HQQ quant loader (#4888)
---------

Co-authored-by: oobabooga <112222186+oobabooga@users.noreply.github.com>
2023-12-18 21:23:16 -03:00
oobabooga
f1f2c4c3f4
Add --num_experts_per_token parameter (ExLlamav2) (#4955) 2023-12-17 12:08:33 -03:00
oobabooga
3bbf6c601d AutoGPTQ: Add --disable_exllamav2 flag (Mixtral CPU offloading needs this) 2023-12-15 06:46:13 -08:00
oobabooga
1c531a3713 Minor cleanup 2023-12-12 13:25:21 -08:00
oobabooga
39d2fe1ed9
Jinja templates for Instruct and Chat (#4874) 2023-12-12 17:23:14 -03:00
oobabooga
8c8825b777 Add QuIP# to README 2023-12-08 08:40:42 -08:00
oobabooga
2c5a1e67f9
Parameters: change max_new_tokens & repetition_penalty_range defaults (#4842) 2023-12-07 20:04:52 -03:00
oobabooga
98361af4d5
Add QuIP# support (#4803)
It has to be installed manually for now.
2023-12-06 00:01:01 -03:00
oobabooga
131a5212ce UI: update context upper limit to 200000 2023-12-04 15:48:34 -08:00