Commit Graph

1503 Commits

Author SHA1 Message Date
Cebtenzzre
5fe685427a chat: clearer CPU fallback messages 2023-10-06 11:35:14 -04:00
Adam Treat
eec906aa05 Speculative fix for build on mac. 2023-10-05 18:37:33 -04:00
Aaron Miller
9325075f80 fix stray comma in models2.json
Signed-off-by: Aaron Miller <apage43@ninjawhale.com>
2023-10-05 18:32:23 -04:00
Adam Treat
a9acdd25de Push a new version number for llmodel backend now that it is based on gguf. 2023-10-05 18:18:07 -04:00
Adam Treat
f028f67c68 Add starcoder, rift and sbert to our models2.json. 2023-10-05 18:16:19 -04:00
Aaron Miller
a10f3aea5e python/embed4all: use gguf model, allow passing kwargs/overriding model 2023-10-05 18:16:19 -04:00
Cebtenzzre
8bb6a6c201 rebase on newer llama.cpp 2023-10-05 18:16:19 -04:00
Adam Treat
4528f73479 Reorder and refresh our models2.json. 2023-10-05 18:16:19 -04:00
Cebtenzzre
d87573ea75 remove old llama.cpp submodules 2023-10-05 18:16:19 -04:00
Cebtenzzre
cc6db61c93 backend: fix build with Visual Studio generator
Use the $<CONFIG> generator expression instead of CMAKE_BUILD_TYPE. This
is needed because Visual Studio is a multi-configuration generator, so
we do not know what the build type will be until `cmake --build` is
called.

Fixes #1470
2023-10-05 18:16:19 -04:00
Adam Treat
f605a5b686 Add q8_0 kernels to kompute shaders and bump to latest llama/gguf. 2023-10-05 18:16:19 -04:00
Cebtenzzre
1534df3e9f backend: do not use Vulkan with non-LLaMA models 2023-10-05 18:16:19 -04:00
Cebtenzzre
672cb850f9 differentiate between init failure and unsupported models 2023-10-05 18:16:19 -04:00
Cebtenzzre
a5b93cf095 more accurate fallback descriptions 2023-10-05 18:16:19 -04:00
Cebtenzzre
75deee9adb chat: make sure to clear fallback reason on success 2023-10-05 18:16:19 -04:00
Cebtenzzre
2eb83b9f2a chat: report reason for fallback to CPU 2023-10-05 18:16:19 -04:00
Adam Treat
906699e8e9 Bump to latest llama/gguf branch. 2023-10-05 18:16:19 -04:00
Adam Treat
ea66669cef Switch to new models2.json for new gguf release and bump our version to
2.5.0.
2023-10-05 18:16:19 -04:00
Cebtenzzre
088afada49 llamamodel: fix static vector in LLamaModel::endTokens 2023-10-05 18:16:19 -04:00
Adam Treat
b4d82ea289 Bump to the latest fixes for vulkan in llama. 2023-10-05 18:16:19 -04:00
Adam Treat
12f943e966 Fix regenerate button to be deterministic and bump the llama version to latest we have for gguf. 2023-10-05 18:16:19 -04:00
Cebtenzzre
40c78d2f78 python binding: print debug message to stderr 2023-10-05 18:16:19 -04:00
Adam Treat
5d346e13d7 Add q6_k kernels for vulkan. 2023-10-05 18:16:19 -04:00
Adam Treat
4eefd386d0 Refactor for subgroups on mat * vec kernel. 2023-10-05 18:16:19 -04:00
Cebtenzzre
3c2aa299d8 gptj: remove unused variables 2023-10-05 18:16:19 -04:00
Cebtenzzre
f9deb87d20 convert scripts: add feed-forward length for better compatiblilty
This GGUF key is used by all llama.cpp models with upstream support.
2023-10-05 18:16:19 -04:00
Cebtenzzre
cc7675d432 convert scripts: make gptj script executable 2023-10-05 18:16:19 -04:00
Cebtenzzre
0493e6eb07 convert scripts: use bytes_to_unicode from transformers 2023-10-05 18:16:19 -04:00
Cebtenzzre
a49a1dcdf4 chatllm: grammar fix 2023-10-05 18:16:19 -04:00
Cebtenzzre
d5d72f0361 gpt-j: update inference to match latest llama.cpp insights
- Use F16 KV cache
- Store transposed V in the cache
- Avoid unnecessary Q copy

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

ggml upstream commit 0265f0813492602fec0e1159fe61de1bf0ccaf78
2023-10-05 18:16:19 -04:00
Cebtenzzre
050e7f076e backend: port GPT-J to GGUF 2023-10-05 18:16:19 -04:00
Cebtenzzre
31b20f093a modellist: fix the system prompt 2023-10-05 18:16:19 -04:00
Cebtenzzre
8f3abb37ca fix references to removed model types 2023-10-05 18:16:19 -04:00
Cebtenzzre
4219c0e2e7 convert scripts: make them directly executable 2023-10-05 18:16:19 -04:00
Cebtenzzre
ce7be1db48 backend: use llamamodel.cpp for Falcon 2023-10-05 18:16:19 -04:00
Cebtenzzre
cca9e6ce81 convert_mpt_hf_to_gguf.py: better tokenizer decoding 2023-10-05 18:16:19 -04:00
Cebtenzzre
25297786db convert scripts: load model as late as possible 2023-10-05 18:16:19 -04:00
Cebtenzzre
fd47088f2b conversion scripts: cleanup 2023-10-05 18:16:19 -04:00
Cebtenzzre
6277eac9cc backend: use llamamodel.cpp for StarCoder 2023-10-05 18:16:19 -04:00
Cebtenzzre
aa706ab1ff backend: use gguf branch of llama.cpp-mainline 2023-10-05 18:16:19 -04:00
Cebtenzzre
17fc9e3e58 backend: port Replit to GGUF 2023-10-05 18:16:19 -04:00
Cebtenzzre
7c67262a13 backend: port MPT to GGUF 2023-10-05 18:16:19 -04:00
Cebtenzzre
42bcb814b3 backend: port BERT to GGUF 2023-10-05 18:16:19 -04:00
Cebtenzzre
4392bf26e0 pyllmodel: print specific error message 2023-10-05 18:16:19 -04:00
Cebtenzzre
34f2ec2b33 gpt4all.py: GGUF 2023-10-05 18:16:19 -04:00
Cebtenzzre
1d29e4696c llamamodel: metal supports all quantization types now 2023-10-05 18:16:19 -04:00
Aaron Miller
507753a37c macos build fixes 2023-10-05 18:16:19 -04:00
Adam Treat
d90d003a1d Latest rebase on llama.cpp with gguf support. 2023-10-05 18:16:19 -04:00
Akarshan Biswas
5f3d739205 appdata: update software description 2023-10-05 10:12:43 -04:00
Akarshan Biswas
b4cf12e1bd Update to 2.4.19 2023-10-05 10:12:43 -04:00