oobabooga
|
0b193b8553
|
Downloader: handle one more retry case after 5770e06c48
|
2024-05-04 19:25:22 -07:00 |
|
oobabooga
|
cb31998605
|
Add a template for NVIDIA ChatQA models
|
2024-05-03 08:19:04 -07:00 |
|
oobabooga
|
e9c9483171
|
Improve the logging messages while loading models
|
2024-05-03 08:10:44 -07:00 |
|
oobabooga
|
e61055253c
|
Bump llama-cpp-python to 0.2.69, add --flash-attn option
|
2024-05-03 04:31:22 -07:00 |
|
oobabooga
|
0476f9fe70
|
Bump ExLlamaV2 to 0.0.20
|
2024-05-01 16:20:50 -07:00 |
|
oobabooga
|
ae0f28530c
|
Bump llama-cpp-python to 0.2.68
|
2024-05-01 08:40:50 -07:00 |
|
oobabooga
|
1eba888af6
|
Update FUNDING.yml
|
2024-05-01 05:54:21 -07:00 |
|
oobabooga
|
51fb766bea
|
Add back my llama-cpp-python wheels, bump to 0.2.65 (#5964)
|
2024-04-30 09:11:31 -03:00 |
|
oobabooga
|
5770e06c48
|
Add a retry mechanism to the model downloader (#5943)
|
2024-04-27 12:25:28 -03:00 |
|
oobabooga
|
dfdb6fee22
|
Set llm_int8_enable_fp32_cpu_offload=True for --load-in-4bit
To allow for 32-bit CPU offloading (it's very slow).
|
2024-04-26 09:39:27 -07:00 |
|
oobabooga
|
70845c76fb
|
Add back the max_updates_second parameter (#5937)
|
2024-04-26 10:14:51 -03:00 |
|
oobabooga
|
6761b5e7c6
|
Improved instruct style (with syntax highlighting & LaTeX rendering) (#5936)
|
2024-04-26 10:13:11 -03:00 |
|
oobabooga
|
9c04365f54
|
Detect the airoboros-3_1-yi-34b-200k template
|
2024-04-25 16:50:54 -07:00 |
|
oobabooga
|
8b1dee3ec8
|
Detect platypus-yi-34b, CausalLM-RP-34B, 34b-beta instruction templates
|
2024-04-24 21:47:43 -07:00 |
|
oobabooga
|
4aa481282b
|
Detect the xwin-lm-70b-v0.1 instruction template
|
2024-04-24 17:02:20 -07:00 |
|
oobabooga
|
c9b0df16ee
|
Lint
|
2024-04-24 09:55:00 -07:00 |
|
oobabooga
|
4094813f8d
|
Lint
|
2024-04-24 09:53:41 -07:00 |
|
oobabooga
|
64e2a9a0a7
|
Fix the Phi-3 template when used in the UI
|
2024-04-24 01:34:11 -07:00 |
|
oobabooga
|
f0538efb99
|
Remove obsolete --tensorcores references
|
2024-04-24 00:31:28 -07:00 |
|
Colin
|
f3c9103e04
|
Revert walrus operator for params['max_memory'] (#5878)
|
2024-04-24 01:09:14 -03:00 |
|
Jari Van Melckebeke
|
c725d97368
|
nvidia docker: make sure gradio listens on 0.0.0.0 (#5918)
|
2024-04-23 23:17:55 -03:00 |
|
oobabooga
|
9b623b8a78
|
Bump llama-cpp-python to 0.2.64, use official wheels (#5921)
|
2024-04-23 23:17:05 -03:00 |
|
Ashley Kleynhans
|
0877741b03
|
Bumped ExLlamaV2 to version 0.0.19 to resolve #5851 (#5880)
|
2024-04-19 19:04:40 -03:00 |
|
oobabooga
|
f27e1ba302
|
Add a /v1/internal/chat-prompt endpoint (#5879)
|
2024-04-19 00:24:46 -03:00 |
|
oobabooga
|
b30bce3b2f
|
Bump transformers to 4.40
|
2024-04-18 16:19:31 -07:00 |
|
Philipp Emanuel Weidmann
|
a0c69749e6
|
Revert sse-starlette version bump because it breaks API request cancellation (#5873)
|
2024-04-18 15:05:00 -03:00 |
|
mamei16
|
8985a8538b
|
Fix whisper STT (#5856)
|
2024-04-14 10:55:58 -03:00 |
|
dependabot[bot]
|
597556cb77
|
Bump sse-starlette from 1.6.5 to 2.1.0 (#5831)
|
2024-04-11 18:54:05 -03:00 |
|
oobabooga
|
e158299fb4
|
Fix loading sharded GGUF models through llamacpp_HF
|
2024-04-11 14:50:05 -07:00 |
|
wangshuai09
|
fd4e46bce2
|
Add Ascend NPU support (basic) (#5541)
|
2024-04-11 18:42:20 -03:00 |
|
zaypen
|
a90509d82e
|
Model downloader: Take HF_ENDPOINT into consideration (#5571)
|
2024-04-11 18:28:10 -03:00 |
|
Ashley Kleynhans
|
70c637bf90
|
Fix saving of UI defaults to settings.yaml - Fixes #5592 (#5794)
|
2024-04-11 18:19:16 -03:00 |
|
oobabooga
|
3e3a7c4250
|
Bump llama-cpp-python to 0.2.61 & fix the crash
|
2024-04-11 14:15:34 -07:00 |
|
oobabooga
|
5f5ceaf025
|
Revert "Bump llama-cpp-python to 0.2.61"
This reverts commit 3ae61c0338.
|
2024-04-11 13:24:57 -07:00 |
|
dependabot[bot]
|
bd71a504b8
|
Update gradio requirement from ==4.25.* to ==4.26.* (#5832)
|
2024-04-11 02:24:53 -03:00 |
|
Victorivus
|
c423d51a83
|
Fix issue #5783 for character images with transparency (#5827)
|
2024-04-11 02:23:43 -03:00 |
|
Alex O'Connell
|
b94cd6754e
|
UI: Respect model and lora directory settings when downloading files (#5842)
|
2024-04-11 01:55:02 -03:00 |
|
oobabooga
|
17c4319e2d
|
Fix loading command-r context length metadata
|
2024-04-10 21:39:59 -07:00 |
|
oobabooga
|
3ae61c0338
|
Bump llama-cpp-python to 0.2.61
|
2024-04-10 21:39:46 -07:00 |
|
oobabooga
|
cbd65ba767
|
Add a simple min_p preset, make it the default (#5836)
|
2024-04-09 12:50:16 -03:00 |
|
oobabooga
|
ed4001e324
|
Bump ExLlamaV2 to 0.0.18
|
2024-04-08 18:05:16 -07:00 |
|
oobabooga
|
f6828de3f2
|
Downgrade llama-cpp-python to 0.2.56
|
2024-04-07 07:00:12 -07:00 |
|
Jared Van Bortel
|
39ff9c9dcf
|
requirements: add psutil (#5819)
|
2024-04-06 23:02:20 -03:00 |
|
oobabooga
|
d02744282b
|
Minor logging change
|
2024-04-06 18:56:58 -07:00 |
|
oobabooga
|
dfb01f9a63
|
Bump llama-cpp-python to 0.2.60
|
2024-04-06 18:32:36 -07:00 |
|
oobabooga
|
096f75a432
|
Documentation: remove obsolete RWKV docs
|
2024-04-06 14:06:39 -07:00 |
|
oobabooga
|
dd6e4ac55f
|
Prevent double <BOS_TOKEN> with Command R+
|
2024-04-06 13:14:32 -07:00 |
|
oobabooga
|
1bdceea2d4
|
UI: Focus on the chat input after starting a new chat
|
2024-04-06 12:57:57 -07:00 |
|
oobabooga
|
168a0f4f67
|
UI: do not load the "gallery" extension by default
|
2024-04-06 12:43:21 -07:00 |
|
oobabooga
|
64a76856bd
|
Metadata: Fix loading Command R+ template with multiple options
|
2024-04-06 07:32:17 -07:00 |
|