Commit Graph

166 Commits

Author SHA1 Message Date
oobabooga
bcf0075278 Merge pull request #235 from xanthousm/Quality_of_life-main
--auto-launch and "Is typing..."
2023-03-12 03:12:56 -03:00
oobabooga
341e135036 Various fixes in chat mode 2023-03-12 02:53:08 -03:00
oobabooga
b0e8cb8c88 Various fixes in chat mode 2023-03-12 02:31:45 -03:00
oobabooga
0bd5430988 Use 'with' statement to better handle streaming memory 2023-03-12 02:04:28 -03:00
oobabooga
37f0166b2d Fix memory leak in new streaming (second attempt) 2023-03-11 23:14:49 -03:00
oobabooga
92fe947721 Merge branch 'main' into new-streaming 2023-03-11 19:59:45 -03:00
oobabooga
2743dd736a Add *Is typing...* to impersonate as well 2023-03-11 10:50:18 -03:00
Xan
96c51973f9 --auto-launch and "Is typing..."
- Added `--auto-launch` arg to open web UI in the default browser when ready.
- Changed chat.py to display user input immediately and "*Is typing...*" as a temporary reply while generating text. Most noticeable when using `--no-stream`.
2023-03-11 22:50:59 +11:00
oobabooga
026d60bd34 Remove default preset that didn't do anything 2023-03-10 14:01:02 -03:00
oobabooga
e9dbdafb14 Merge branch 'main' into pt-path-changes 2023-03-10 11:03:42 -03:00
oobabooga
706a03b2cb Minor changes 2023-03-10 11:02:25 -03:00
oobabooga
de7dd8b6aa Add comments 2023-03-10 10:54:08 -03:00
oobabooga
e461c0b7a0 Move the import to the top 2023-03-10 10:51:12 -03:00
deepdiffuser
9fbd60bf22 add no_split_module_classes to prevent tensor split error 2023-03-10 05:30:47 -08:00
deepdiffuser
ab47044459 add multi-gpu support for 4bit gptq LLaMA 2023-03-10 04:52:45 -08:00
rohvani
2ac2913747 fix reference issue 2023-03-09 20:13:23 -08:00
rohvani
826e297b0e add llama-65b-4bit support & multiple pt paths 2023-03-09 18:31:32 -08:00
oobabooga
9849aac0f1 Don't show .pt models in the list 2023-03-09 21:54:50 -03:00
oobabooga
74102d5ee4 Insert to the path instead of appending 2023-03-09 20:51:22 -03:00
oobabooga
2965aa1625 Check if the .pt file exists 2023-03-09 20:48:51 -03:00
oobabooga
828a524f9a Add LLaMA 4-bit support 2023-03-09 15:50:26 -03:00
oobabooga
59b5f7a4b7 Improve usage of stopping_criteria 2023-03-08 12:13:40 -03:00
oobabooga
add9330e5e Bug fixes 2023-03-08 11:26:29 -03:00
oobabooga
33fb6aed74 Minor bug fix 2023-03-08 03:08:16 -03:00
oobabooga
ad2970374a Readability improvements 2023-03-08 03:00:06 -03:00
oobabooga
72d539dbff Better separate the FlexGen case 2023-03-08 02:54:47 -03:00
oobabooga
0e16c0bacb Remove redeclaration of a function 2023-03-08 02:50:49 -03:00
oobabooga
ab50f80542 New text streaming method (much faster) 2023-03-08 02:46:35 -03:00
oobabooga
8e89bc596b Fix encode() for RWKV 2023-03-07 23:15:46 -03:00
oobabooga
19a34941ed Add proper streaming to RWKV 2023-03-07 18:17:56 -03:00
oobabooga
8660227e1b Add top_k to RWKV 2023-03-07 17:24:28 -03:00
oobabooga
153dfeb4dd Add --rwkv-cuda-on parameter, bump rwkv version 2023-03-06 20:12:54 -03:00
oobabooga
6904a507c6 Change some parameters 2023-03-06 16:29:43 -03:00
oobabooga
20bd645f6a Fix bug in multigpu setups (attempt 3) 2023-03-06 15:58:18 -03:00
oobabooga
09a7c36e1b Minor improvement while running custom models 2023-03-06 15:36:35 -03:00
oobabooga
24c4c20391 Fix bug in multigpu setups (attempt #2) 2023-03-06 15:23:29 -03:00
oobabooga
d88b7836c6 Fix bug in multigpu setups 2023-03-06 14:58:30 -03:00
oobabooga
5bed607b77 Increase repetition frequency/penalty for RWKV 2023-03-06 14:25:48 -03:00
oobabooga
bf56b6c1fb Load settings.json without the need for --settings settings.json
This is for setting UI defaults
2023-03-06 10:57:45 -03:00
oobabooga
e91f4bc25a Add RWKV tokenizer 2023-03-06 08:45:49 -03:00
oobabooga
c855b828fe Better handle <USER> 2023-03-05 17:01:47 -03:00
oobabooga
2af66a4d4c Fix <USER> in pygmalion replies 2023-03-05 16:08:50 -03:00
oobabooga
a54b91af77 Improve readability 2023-03-05 10:21:15 -03:00
oobabooga
8e706df20e Fix a memory leak when text streaming is on 2023-03-05 10:12:43 -03:00
oobabooga
c33715ad5b Move towards HF LLaMA implementation 2023-03-05 01:20:31 -03:00
oobabooga
bd8aac8fa4 Add LLaMA 8-bit support 2023-03-04 13:28:42 -03:00
oobabooga
c93f1fa99b Count the tokens more conservatively 2023-03-04 03:10:21 -03:00
oobabooga
ed8b35efd2 Add --pin-weight parameter for FlexGen 2023-03-04 01:04:02 -03:00
oobabooga
05e703b4a4 Print the performance information more reliably 2023-03-03 21:24:32 -03:00
oobabooga
5a79863df3 Increase the sequence length, decrease batch size
I have no idea what I am doing
2023-03-03 15:54:13 -03:00