Commit Graph

3763 Commits

Author SHA1 Message Date
oobabooga
564a3e1553 Remove the awkward "Tab" keyboard shortcut 2024-06-23 22:31:07 -07:00
oobabooga
577a8cd3ee Add TensorRT-LLM support (#5715) 2024-06-24 02:30:03 -03:00
oobabooga
536f8d58d4 Do not expose alpha_value to llama.cpp & rope_freq_base to transformers
To avoid confusion
2024-06-23 22:09:24 -07:00
oobabooga
b48ab482f8 Remove obsolete "gptq_for_llama_info" message 2024-06-23 22:05:19 -07:00
oobabooga
5e8dc56f8a Fix after previous commit 2024-06-23 21:58:28 -07:00
Louis Del Valle
57119c1b30 Update block_requests.py to resolve unexpected type error (500 error) (#5976) 2024-06-24 01:56:51 -03:00
oobabooga
125bb7b03b Revert "Bump llama-cpp-python to 0.2.78"
This reverts commit b6eaf7923e.
2024-06-23 19:54:28 -07:00
CharlesCNorton
5993904acf Fix several typos in the codebase (#6151) 2024-06-22 21:40:25 -03:00
GodEmperor785
2c5a9eb597 Change limits of RoPE scaling sliders in UI (#6142) 2024-06-19 21:42:17 -03:00
oobabooga
5904142777 Merge remote-tracking branch 'refs/remotes/origin/dev' into dev 2024-06-19 17:41:09 -07:00
oobabooga
b10d735176 Minor CSS linting 2024-06-19 17:40:33 -07:00
Guanghua Lu
229d89ccfb Make logs more readable, no more \u7f16\u7801 (#6127) 2024-06-15 23:00:13 -03:00
oobabooga
fd7c3c5bb0 Don't git pull on installation (to make past releases installable) 2024-06-15 06:38:05 -07:00
oobabooga
b6eaf7923e Bump llama-cpp-python to 0.2.78 2024-06-14 21:22:09 -07:00
oobabooga
9420973b62 Downgrade PyTorch to 2.2.2 (#6124) 2024-06-14 16:42:03 -03:00
Forkoz
1576227f16 Fix GGUFs with no BOS token present, mainly qwen2 models. (#6119)
---------

Co-authored-by: oobabooga <112222186+oobabooga@users.noreply.github.com>
2024-06-14 13:51:01 -03:00
dependabot[bot]
fdd8fab9cf Bump hqq from 0.1.7.post2 to 0.1.7.post3 (#6090) 2024-06-14 13:46:35 -03:00
oobabooga
10601850d9 Fix after previous commit 2024-06-13 19:54:12 -07:00
oobabooga
0f3a423de1 Alternative solution to "get next logits" deadlock (#6106) 2024-06-13 19:34:16 -07:00
oobabooga
9aef01551d Revert "Use reentrant generation lock (#6107)"
This reverts commit b675151f25.
2024-06-13 17:53:07 -07:00
oobabooga
8930bfc5f4 Bump PyTorch, ExLlamaV2, flash-attention (#6122) 2024-06-13 20:38:31 -03:00
oobabooga
386500aa37 Avoid unnecessary calls UI -> backend, to make it faster 2024-06-12 20:52:42 -07:00
Forkoz
1d79aa67cf Fix flash-attn UI parameter to actually store true. (#6076) 2024-06-13 00:34:54 -03:00
Belladore
3abafee696 DRY sampler improvements (#6053) 2024-06-12 23:39:11 -03:00
theo77186
b675151f25 Use reentrant generation lock (#6107) 2024-06-12 23:25:05 -03:00
oobabooga
a36fa73071 Lint 2024-06-12 19:00:21 -07:00
oobabooga
2d196ed2fe Remove obsolete pre_layer parameter 2024-06-12 18:56:44 -07:00
Belladore
46174a2d33 Fix error when bos_token_id is None. (#6061) 2024-06-12 22:52:27 -03:00
Belladore
a363cdfca1 Fix missing bos token for some models (including Llama-3) (#6050) 2024-05-27 09:21:30 -03:00
oobabooga
8df68b05e9 Remove MinPLogitsWarper (it's now a transformers built-in) 2024-05-27 05:03:30 -07:00
oobabooga
4f1e96b9e3 Downloader: Add --model-dir argument, respect --model-dir in the UI 2024-05-23 20:42:46 -07:00
oobabooga
ad54d524f7 Revert "Fix stopping strings for llama-3 and phi (#6043)"
This reverts commit 5499bc9bc8.
2024-05-22 17:18:08 -07:00
oobabooga
5499bc9bc8 Fix stopping strings for llama-3 and phi (#6043) 2024-05-22 13:53:59 -03:00
rohitanshu
8aaa0a6f4e Fixed minor typo in docs - Training Tab.md (#6038) 2024-05-21 14:52:22 -03:00
oobabooga
9e189947d1 Minor fix after bd7cc4234d (thanks @belladoreai) 2024-05-21 10:37:30 -07:00
oobabooga
ae86292159 Fix getting Phi-3-small-128k-instruct logits 2024-05-21 10:35:00 -07:00
oobabooga
bd7cc4234d Backend cleanup (#6025) 2024-05-21 13:32:02 -03:00
oobabooga
6a1682aa95 README: update command-line flags with raw --help output
This helps me keep this up-to-date more easily.
2024-05-19 20:28:46 -07:00
Philipp Emanuel Weidmann
852c943769 DRY: A modern repetition penalty that reliably prevents looping (#5677) 2024-05-19 23:53:47 -03:00
oobabooga
9f77ed1b98 --idle-timeout flag to unload the model if unused for N minutes (#6026) 2024-05-19 23:29:39 -03:00
altoiddealer
818b4e0354 Let grammar escape backslashes (#5865) 2024-05-19 20:26:09 -03:00
Tisjwlf
907702c204 Fix gguf multipart file loading (#5857) 2024-05-19 20:22:09 -03:00
Guanghua Lu
d7bd3da35e Add Llama 3 instruction template (#5891) 2024-05-19 20:17:26 -03:00
A0nameless0man
5cb59707f3 fix: grammar not support utf-8 (#5900) 2024-05-19 20:10:39 -03:00
Jari Van Melckebeke
8456d13349 [docs] small docker changes (#5917) 2024-05-19 20:09:37 -03:00
Samuel Wein
b63dc4e325 UI: Warn user if they are trying to load a model from no path (#6006) 2024-05-19 20:05:17 -03:00
dependabot[bot]
2de586f586 Update accelerate requirement from ==0.27.* to ==0.30.* (#5989) 2024-05-19 20:03:18 -03:00
chr
6b546a2c8b llama.cpp: increase the max threads from 32 to 256 (#5889) 2024-05-19 20:02:19 -03:00
oobabooga
a38a37b3b3 llama.cpp: default n_gpu_layers to the maximum value for the model automatically 2024-05-19 10:57:42 -07:00
oobabooga
a4611232b7 Make --verbose output less spammy 2024-05-18 09:57:00 -07:00