Jared Van Bortel
406e88b59a
implement local Nomic Embed via llama.cpp ( #2086 )
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-03-13 18:09:24 -04:00
Jared Van Bortel
5c248dbec9
models: new MPT model file without duplicated token_embd.weight ( #2006 )
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-03-08 17:18:38 -05:00
Jared Van Bortel
c19b763e03
llmodel_c: expose fakeReply to the bindings ( #2061 )
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-03-06 13:32:24 -05:00
Jared Van Bortel
f500bcf6e5
llmodel: default to a blank line between reply and next prompt ( #1996 )
Also make some related adjustments to the provided Alpaca-style prompt templates
and system prompts.
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-02-26 13:11:15 -05:00
Jared Van Bortel
007d469034
bert: fix layer norm epsilon value ( #1946 )
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-02-26 13:09:01 -05:00
Adam Treat
f720261d46
Fix another vulnerable spot for crashes.
Signed-off-by: Adam Treat <treat.adam@gmail.com>
2024-02-26 12:04:16 -06:00
chrisbarrera
f8b1069a1c
add min_p sampling parameter ( #2014 )
Signed-off-by: Christopher Barrera <cb@arda.tx.rr.com>
Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
2024-02-24 17:51:34 -05:00
Jared Van Bortel
e7f2ff189f
fix some compilation warnings on macOS
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-02-22 15:09:06 -05:00
Jared Van Bortel
88e330ef0e
llama.cpp: enable Kompute support for 10 more model arches ( #2005 )
These are Baichuan, Bert and Nomic Bert, CodeShell, GPT-2, InternLM,
MiniCPM, Orion, Qwen, and StarCoder.
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-02-22 14:34:42 -05:00
Jared Van Bortel
fc6c5ea0c7
llama.cpp: gemma: allow offloading the output tensor ( #1997 )
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-02-22 14:06:18 -05:00
Jared Van Bortel
4fc4d94be4
fix chat-style prompt templates ( #1970 )
Also use a new version of Mistral OpenOrca.
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-02-21 15:45:32 -05:00
Jared Van Bortel
7810b757c9
llamamodel: add gemma model support
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-02-21 13:36:31 -06:00
Adam Treat
d948a4f2ee
Complete revamp of model loading to allow for more discrete control by
the user of the models' loading behavior.
Signed-off-by: Adam Treat <treat.adam@gmail.com>
2024-02-21 10:15:20 -06:00
Jared Van Bortel
6fdec808b2
backend: update llama.cpp for faster state serialization
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-02-13 17:39:18 -05:00
Jared Van Bortel
a1471becf3
backend: update llama.cpp for Intel GPU blacklist
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-02-12 13:16:24 -05:00
Jared Van Bortel
eb1081d37e
cmake: fix LLAMA_DIR use before set
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-02-09 22:00:14 -05:00
Jared Van Bortel
e60b388a2e
cmake: fix backwards LLAMA_KOMPUTE default
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-02-09 21:53:32 -05:00
Jared Van Bortel
fc7e5f4a09
ci: fix missing Kompute support in python bindings ( #1953 )
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-02-09 21:40:32 -05:00
Jared Van Bortel
bf493bb048
Mixtral crash fix and python bindings v2.2.0 ( #1931 )
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-02-06 11:01:15 -05:00
Jared Van Bortel
92c025a7f6
llamamodel: add 12 new architectures for CPU inference ( #1914 )
Baichuan, BLOOM, CodeShell, GPT-2, Orion, Persimmon, Phi and Phi-2,
Plamo, Qwen, Qwen2, Refact, StableLM
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-02-05 16:49:31 -05:00
Jared Van Bortel
10e3f7bbf5
Fix VRAM leak when model loading fails ( #1901 )
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-02-01 15:45:45 -05:00
Jared Van Bortel
eadc3b8d80
backend: bump llama.cpp for VRAM leak fix when switching models
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-01-31 17:24:01 -05:00
Jared Van Bortel
6db5307730
update llama.cpp for unhandled Vulkan OOM exception fix
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-01-31 16:44:58 -05:00
Jared Van Bortel
0a40e71652
Maxwell/Pascal GPU support and crash fix ( #1895 )
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-01-31 16:32:32 -05:00
Jared Van Bortel
b11c3f679e
bump llama.cpp-mainline for C++11 compat
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-01-31 15:02:34 -05:00
Jared Van Bortel
061d1969f8
expose n_gpu_layers parameter of llama.cpp ( #1890 )
Also dynamically limit the GPU layers and context length fields to the maximum supported by the model.
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-01-31 14:17:44 -05:00
Jared Van Bortel
f549d5a70a
backend : quick llama.cpp update to fix fallback to CPU
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-01-29 17:16:40 -05:00
Jared Van Bortel
38c61493d2
backend: update to latest commit of llama.cpp Vulkan PR
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-01-29 15:47:26 -06:00
Jared Van Bortel
26acdebafa
convert: replace GPTJConfig with AutoConfig ( #1866 )
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-01-22 12:14:55 -05:00
Jared Van Bortel
a9c5f53562
update llama.cpp for nomic-ai/llama.cpp#12
Fixes #1477
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-01-17 14:05:33 -05:00
Jared Van Bortel
b7c92c5afd
sync llama.cpp with latest Vulkan PR and newer upstream ( #1819 )
2024-01-16 16:36:21 -05:00
Jared Van Bortel
7e9786fccf
chat: set search path early
This fixes the issues with installed versions of v2.6.0.
2024-01-11 12:04:18 -05:00
AT
96cee4f9ac
Explicitly clear the kv cache each time we eval tokens to match n_past. ( #1808 )
2024-01-03 14:06:08 -05:00
ThiloteE
2d566710e5
Address review
2024-01-03 11:13:07 -06:00
ThiloteE
a0f7d7ae0e
Fix for "LLModel ERROR: Could not find CPU LLaMA implementation" v2
2024-01-03 11:13:07 -06:00
ThiloteE
38d81c14d0
Fixes https://github.com/nomic-ai/gpt4all/issues/1760 LLModel ERROR: Could not find CPU LLaMA implementation.
Inspired by Microsoft docs for LoadLibraryExA (https://learn.microsoft.com/en-us/windows/win32/api/libloaderapi/nf-libloaderapi-loadlibraryexa ).
When using LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR, the lpFileName parameter must specify a fully qualified path, and that path must use backslashes (\), not forward slashes (/).
2024-01-03 11:13:07 -06:00
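The LoadLibraryExA requirement described in the commit above (a fully qualified path that uses backslashes) can be illustrated with a small Python sketch. It is not the GPT4All fix, which lives in the C++ loader; the helpers `is_fully_qualified` and `to_dll_load_path` are hypothetical, and "fully qualified" is assumed here to mean a drive letter or UNC share followed by a root separator. The standard-library `ntpath` module implements Windows path rules on any OS. The sketch only prepares the path string and never calls LoadLibraryExA.

```python
import ntpath  # Windows path semantics; importable on any OS


def is_fully_qualified(path: str) -> bool:
    # Assumed definition: a drive ("C:") or UNC share ("\\server\share")
    # followed by a root separator. "C:x.dll" and "\x.dll" do not qualify.
    drive, rest = ntpath.splitdrive(path)
    return bool(drive) and rest.startswith(("\\", "/"))


def to_dll_load_path(path: str, base_dir: str) -> str:
    """Hypothetical helper: return `path` fully qualified, with backslashes.

    `base_dir` (e.g. the application's directory) must itself be fully
    qualified; relative paths are resolved against it.
    """
    if not is_fully_qualified(base_dir):
        raise ValueError(f"base_dir must be fully qualified: {base_dir!r}")
    if not is_fully_qualified(path):
        path = ntpath.join(base_dir, path)
    # normpath replaces "/" with "\" and collapses "." and ".." segments.
    full = ntpath.normpath(path)
    if not is_fully_qualified(full):
        raise ValueError(f"cannot make {path!r} fully qualified")
    return full


print(to_dll_load_path("lib/model-impl.dll", "C:/Program Files/App"))
```

Rejecting a relative `base_dir` up front keeps the helper from silently producing a path that the loader would refuse under LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR.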
Jared Van Bortel
d1c56b8b28
Implement configurable context length ( #1749 )
2023-12-16 17:58:15 -05:00
Jared Van Bortel
3acbef14b7
fix AVX support by removing direct linking to AVX2 libs ( #1750 )
2023-12-13 12:11:09 -05:00
Jared Van Bortel
0600f551b3
chatllm: do not attempt to serialize incompatible state ( #1742 )
2023-12-12 11:45:03 -05:00
Jared Van Bortel
1df3da0a88
update llama.cpp for clang warning fix
2023-12-11 13:07:41 -05:00
Jared Van Bortel
dfd8ef0186
backend: use ggml_new_graph for GGML backend v2 ( #1719 )
2023-12-06 14:38:53 -05:00
Jared Van Bortel
9e28dfac9c
Update to latest llama.cpp ( #1706 )
2023-12-01 16:51:15 -05:00
Adam Treat
cce5fe2045
Fix macos build.
2023-11-17 11:59:31 -05:00
Adam Treat
371e2a5cbc
LocalDocs version 2 with text embeddings.
2023-11-17 11:59:31 -05:00
Jared Van Bortel
d4ce9f4a7c
llmodel_c: improve quality of error messages ( #1625 )
2023-11-07 11:20:14 -05:00
cebtenzzre
64101d3af5
update llama.cpp-mainline
2023-11-01 09:47:39 -04:00
Adam Treat
ffef60912f
Update to llama.cpp
2023-10-30 11:40:16 -04:00
Adam Treat
f5f22fdbd0
Update llama.cpp for latest bugfixes.
2023-10-28 17:47:55 -04:00
cebtenzzre
7bcd9e8089
update llama.cpp-mainline
2023-10-27 19:29:36 -04:00
cebtenzzre
fd0c501d68
backend: support GGUFv3 ( #1582 )
2023-10-27 17:07:23 -04:00
Adam Treat
14b410a12a
Update to latest version of llama.cpp which fixes issue 1507.
2023-10-27 12:08:35 -04:00
Adam Treat
ab96035bec
Update to llama.cpp submodule for some vulkan fixes.
2023-10-26 13:46:38 -04:00
cebtenzzre
e90263c23f
make scripts executable ( #1555 )
2023-10-24 09:28:21 -04:00
Aaron Miller
f414c28589
llmodel: whitelist library name patterns
This fixes some issues that were being seen on installed Windows builds of 2.5.0.
Only load DLLs that might actually be model implementation DLLs; otherwise we pull all sorts of unrelated DLLs into the process before they might expect to be loaded.
Signed-off-by: Aaron Miller <apage43@ninjawhale.com>
2023-10-23 21:40:14 -07:00
cebtenzzre
4338e72a51
MPT: use upstream llama.cpp implementation ( #1515 )
2023-10-19 15:25:17 -04:00
cebtenzzre
0fe2e19691
llamamodel: re-enable error messages by default ( #1537 )
2023-10-19 13:46:33 -04:00
cebtenzzre
017c3a9649
python: prepare version 2.0.0rc1 ( #1529 )
2023-10-18 20:24:54 -04:00
cebtenzzre
9a19c740ee
kompute: fix library loading issues with kp_logger ( #1517 )
2023-10-16 16:58:17 -04:00
Aaron Miller
f79557d2aa
speedup: just use mat*vec shaders for mat*mat
So far, my from-scratch mat*mat shaders are still slower than just running more
invocations of the existing Metal-ported mat*vec shaders. It should
theoretically be possible to write a mat*mat that is faster (for actual
mat*mat cases) than an optimal mat*vec, but it will need to be at
*least* as fast as the mat*vec op and then take special care to be
cache-friendly and save memory bandwidth, since the number of compute ops is
the same.
2023-10-16 13:45:51 -04:00
cebtenzzre
22de3c56bd
convert scripts: fix AutoConfig typo ( #1512 )
2023-10-13 14:16:51 -04:00
Aaron Miller
2490977f89
q6k, q4_1 mat*mat
2023-10-12 14:56:54 -04:00
Aaron Miller
afaa291eab
python bindings should be quiet by default
* disable llama.cpp logging unless the GPT4ALL_VERBOSE_LLAMACPP environment
variable is nonempty
* make the verbose flag for retrieve_model default to false (while keeping it
overridable via the GPT4All constructor)
You should be able to run a basic test:
```python
import gpt4all
model = gpt4all.GPT4All('/Users/aaron/Downloads/rift-coder-v0-7b-q4_0.gguf')
print(model.generate('def fib(n):'))
```
and see no non-model output when successful
2023-10-11 14:14:36 -07:00
cebtenzzre
7b611b49f2
llmodel: print an error if the CPU does not support AVX ( #1499 )
2023-10-11 15:09:40 -04:00
Aaron Miller
043617168e
do not process prompts on gpu yet
2023-10-11 13:15:50 -04:00
Aaron Miller
64001a480a
mat*mat for q4_0, q8_0
2023-10-11 13:15:50 -04:00
cebtenzzre
7a19047329
llmodel: do not call magic_match unless build variant is correct ( #1488 )
2023-10-11 11:30:48 -04:00
Cebtenzzre
5fe685427a
chat: clearer CPU fallback messages
2023-10-06 11:35:14 -04:00
Adam Treat
eec906aa05
Speculative fix for build on mac.
2023-10-05 18:37:33 -04:00
Adam Treat
a9acdd25de
Push a new version number for llmodel backend now that it is based on gguf.
2023-10-05 18:18:07 -04:00
Cebtenzzre
8bb6a6c201
rebase on newer llama.cpp
2023-10-05 18:16:19 -04:00
Cebtenzzre
d87573ea75
remove old llama.cpp submodules
2023-10-05 18:16:19 -04:00
Cebtenzzre
cc6db61c93
backend: fix build with Visual Studio generator
Use the $<CONFIG> generator expression instead of CMAKE_BUILD_TYPE. This
is needed because Visual Studio is a multi-configuration generator, so
we do not know what the build type will be until `cmake --build` is
called.
Fixes #1470
2023-10-05 18:16:19 -04:00
Adam Treat
f605a5b686
Add q8_0 kernels to kompute shaders and bump to latest llama/gguf.
2023-10-05 18:16:19 -04:00
Cebtenzzre
672cb850f9
differentiate between init failure and unsupported models
2023-10-05 18:16:19 -04:00
Adam Treat
906699e8e9
Bump to latest llama/gguf branch.
2023-10-05 18:16:19 -04:00
Cebtenzzre
088afada49
llamamodel: fix static vector in LLamaModel::endTokens
2023-10-05 18:16:19 -04:00
Adam Treat
b4d82ea289
Bump to the latest fixes for vulkan in llama.
2023-10-05 18:16:19 -04:00
Adam Treat
12f943e966
Fix regenerate button to be deterministic and bump the llama version to latest we have for gguf.
2023-10-05 18:16:19 -04:00
Adam Treat
5d346e13d7
Add q6_k kernels for vulkan.
2023-10-05 18:16:19 -04:00
Adam Treat
4eefd386d0
Refactor for subgroups on mat * vec kernel.
2023-10-05 18:16:19 -04:00
Cebtenzzre
3c2aa299d8
gptj: remove unused variables
2023-10-05 18:16:19 -04:00
Cebtenzzre
f9deb87d20
convert scripts: add feed-forward length for better compatibility
This GGUF key is used by all llama.cpp models with upstream support.
2023-10-05 18:16:19 -04:00
Cebtenzzre
cc7675d432
convert scripts: make gptj script executable
2023-10-05 18:16:19 -04:00
Cebtenzzre
0493e6eb07
convert scripts: use bytes_to_unicode from transformers
2023-10-05 18:16:19 -04:00
Cebtenzzre
d5d72f0361
gpt-j: update inference to match latest llama.cpp insights
- Use F16 KV cache
- Store transposed V in the cache
- Avoid unnecessary Q copy
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
ggml upstream commit 0265f0813492602fec0e1159fe61de1bf0ccaf78
2023-10-05 18:16:19 -04:00
Cebtenzzre
050e7f076e
backend: port GPT-J to GGUF
2023-10-05 18:16:19 -04:00
Cebtenzzre
8f3abb37ca
fix references to removed model types
2023-10-05 18:16:19 -04:00
Cebtenzzre
4219c0e2e7
convert scripts: make them directly executable
2023-10-05 18:16:19 -04:00
Cebtenzzre
ce7be1db48
backend: use llamamodel.cpp for Falcon
2023-10-05 18:16:19 -04:00
Cebtenzzre
cca9e6ce81
convert_mpt_hf_to_gguf.py: better tokenizer decoding
2023-10-05 18:16:19 -04:00
Cebtenzzre
25297786db
convert scripts: load model as late as possible
2023-10-05 18:16:19 -04:00
Cebtenzzre
fd47088f2b
conversion scripts: cleanup
2023-10-05 18:16:19 -04:00
Cebtenzzre
6277eac9cc
backend: use llamamodel.cpp for StarCoder
2023-10-05 18:16:19 -04:00
Cebtenzzre
17fc9e3e58
backend: port Replit to GGUF
2023-10-05 18:16:19 -04:00
Cebtenzzre
7c67262a13
backend: port MPT to GGUF
2023-10-05 18:16:19 -04:00
Cebtenzzre
42bcb814b3
backend: port BERT to GGUF
2023-10-05 18:16:19 -04:00
Cebtenzzre
1d29e4696c
llamamodel: metal supports all quantization types now
2023-10-05 18:16:19 -04:00
Aaron Miller
507753a37c
macos build fixes
2023-10-05 18:16:19 -04:00
Adam Treat
d90d003a1d
Latest rebase on llama.cpp with gguf support.
2023-10-05 18:16:19 -04:00
Adam Treat
99c106e6b5
Fix a bug seen on AMD RADEON cards with vulkan backend.
2023-09-26 11:59:47 -04:00