Commit Graph

208 Commits

Author SHA1 Message Date
Jared Van Bortel
92c025a7f6
llamamodel: add 12 new architectures for CPU inference (#1914)
Baichuan, BLOOM, CodeShell, GPT-2, Orion, Persimmon, Phi and Phi-2,
Plamo, Qwen, Qwen2, Refact, StableLM

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-02-05 16:49:31 -05:00
Jared Van Bortel
10e3f7bbf5
Fix VRAM leak when model loading fails (#1901)
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-02-01 15:45:45 -05:00
Jared Van Bortel
eadc3b8d80 backend: bump llama.cpp for VRAM leak fix when switching models
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-01-31 17:24:01 -05:00
Jared Van Bortel
6db5307730 update llama.cpp for unhandled Vulkan OOM exception fix
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-01-31 16:44:58 -05:00
Jared Van Bortel
0a40e71652
Maxwell/Pascal GPU support and crash fix (#1895)
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-01-31 16:32:32 -05:00
Jared Van Bortel
b11c3f679e bump llama.cpp-mainline for C++11 compat
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-01-31 15:02:34 -05:00
Jared Van Bortel
061d1969f8
expose n_gpu_layers parameter of llama.cpp (#1890)
Also dynamically limit the GPU layers and context length fields to the maximum supported by the model.

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-01-31 14:17:44 -05:00
Jared Van Bortel
f549d5a70a backend : quick llama.cpp update to fix fallback to CPU
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-01-29 17:16:40 -05:00
Jared Van Bortel
38c61493d2 backend: update to latest commit of llama.cpp Vulkan PR
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-01-29 15:47:26 -06:00
Jared Van Bortel
26acdebafa
convert: replace GPTJConfig with AutoConfig (#1866)
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-01-22 12:14:55 -05:00
Jared Van Bortel
a9c5f53562 update llama.cpp for nomic-ai/llama.cpp#12
Fixes #1477

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-01-17 14:05:33 -05:00
Jared Van Bortel
b7c92c5afd
sync llama.cpp with latest Vulkan PR and newer upstream (#1819) 2024-01-16 16:36:21 -05:00
Jared Van Bortel
7e9786fccf chat: set search path early
This fixes the issues with installed versions of v2.6.0.
2024-01-11 12:04:18 -05:00
AT
96cee4f9ac
Explicitly clear the kv cache each time we eval tokens to match n_past. (#1808) 2024-01-03 14:06:08 -05:00
ThiloteE
2d566710e5 Address review 2024-01-03 11:13:07 -06:00
ThiloteE
a0f7d7ae0e Fix for "LLModel ERROR: Could not find CPU LLaMA implementation" v2 2024-01-03 11:13:07 -06:00
ThiloteE
38d81c14d0 Fixes https://github.com/nomic-ai/gpt4all/issues/1760 LLModel ERROR: Could not find CPU LLaMA implementation.
Inspired by Microsoft docs for LoadLibraryExA (https://learn.microsoft.com/en-us/windows/win32/api/libloaderapi/nf-libloaderapi-loadlibraryexa).
When using LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR, the lpFileName parameter must specify a fully qualified path, also it needs to be backslashes (\), not forward slashes (/).
2024-01-03 11:13:07 -06:00
Jared Van Bortel
d1c56b8b28
Implement configurable context length (#1749) 2023-12-16 17:58:15 -05:00
Jared Van Bortel
3acbef14b7
fix AVX support by removing direct linking to AVX2 libs (#1750) 2023-12-13 12:11:09 -05:00
Jared Van Bortel
0600f551b3
chatllm: do not attempt to serialize incompatible state (#1742) 2023-12-12 11:45:03 -05:00
Jared Van Bortel
1df3da0a88 update llama.cpp for clang warning fix 2023-12-11 13:07:41 -05:00
Jared Van Bortel
dfd8ef0186
backend: use ggml_new_graph for GGML backend v2 (#1719) 2023-12-06 14:38:53 -05:00
Jared Van Bortel
9e28dfac9c
Update to latest llama.cpp (#1706) 2023-12-01 16:51:15 -05:00
Adam Treat
cce5fe2045 Fix macos build. 2023-11-17 11:59:31 -05:00
Adam Treat
371e2a5cbc LocalDocs version 2 with text embeddings. 2023-11-17 11:59:31 -05:00
Jared Van Bortel
d4ce9f4a7c
llmodel_c: improve quality of error messages (#1625) 2023-11-07 11:20:14 -05:00
cebtenzzre
64101d3af5 update llama.cpp-mainline 2023-11-01 09:47:39 -04:00
Adam Treat
ffef60912f Update to llama.cpp 2023-10-30 11:40:16 -04:00
Adam Treat
f5f22fdbd0 Update llama.cpp for latest bugfixes. 2023-10-28 17:47:55 -04:00
cebtenzzre
7bcd9e8089 update llama.cpp-mainline 2023-10-27 19:29:36 -04:00
cebtenzzre
fd0c501d68
backend: support GGUFv3 (#1582) 2023-10-27 17:07:23 -04:00
Adam Treat
14b410a12a Update to latest version of llama.cpp which fixes issue 1507. 2023-10-27 12:08:35 -04:00
Adam Treat
ab96035bec Update to llama.cpp submodule for some vulkan fixes. 2023-10-26 13:46:38 -04:00
cebtenzzre
e90263c23f
make scripts executable (#1555) 2023-10-24 09:28:21 -04:00
Aaron Miller
f414c28589 llmodel: whitelist library name patterns
this fixes some issues that were being seen on installed windows builds of 2.5.0

only load dlls that actually might be model impl dlls, otherwise we pull all sorts of random junk into the process before it might expect to be

Signed-off-by: Aaron Miller <apage43@ninjawhale.com>
2023-10-23 21:40:14 -07:00
cebtenzzre
4338e72a51
MPT: use upstream llama.cpp implementation (#1515) 2023-10-19 15:25:17 -04:00
cebtenzzre
0fe2e19691
llamamodel: re-enable error messages by default (#1537) 2023-10-19 13:46:33 -04:00
cebtenzzre
017c3a9649
python: prepare version 2.0.0rc1 (#1529) 2023-10-18 20:24:54 -04:00
cebtenzzre
9a19c740ee
kompute: fix library loading issues with kp_logger (#1517) 2023-10-16 16:58:17 -04:00
Aaron Miller
f79557d2aa speedup: just use mat*vec shaders for mat*mat
so far my from-scratch mat*mats are still slower than just running more
invocations of the existing Metal ported mat*vec shaders - it should be
theoretically possible to make a mat*mat that's faster (for actual
mat*mat cases) than an optimal mat*vec, but it will need to be at
*least* as fast as the mat*vec op and then take special care to be
cache-friendly and save memory bandwidth, as the # of compute ops is the
same
2023-10-16 13:45:51 -04:00
cebtenzzre
22de3c56bd
convert scripts: fix AutoConfig typo (#1512) 2023-10-13 14:16:51 -04:00
Aaron Miller
2490977f89 q6k, q4_1 mat*mat 2023-10-12 14:56:54 -04:00
Aaron Miller
afaa291eab python bindings should be quiet by default
* disable llama.cpp logging unless GPT4ALL_VERBOSE_LLAMACPP envvar is
  nonempty
* make verbose flag for retrieve_model default false (but also be
  overridable via gpt4all constructor)

should be able to run a basic test:

```python
import gpt4all
model = gpt4all.GPT4All('/Users/aaron/Downloads/rift-coder-v0-7b-q4_0.gguf')
print(model.generate('def fib(n):'))
```

and see no non-model output when successful
2023-10-11 14:14:36 -07:00
cebtenzzre
7b611b49f2
llmodel: print an error if the CPU does not support AVX (#1499) 2023-10-11 15:09:40 -04:00
Aaron Miller
043617168e do not process prompts on gpu yet 2023-10-11 13:15:50 -04:00
Aaron Miller
64001a480a mat*mat for q4_0, q8_0 2023-10-11 13:15:50 -04:00
cebtenzzre
7a19047329
llmodel: do not call magic_match unless build variant is correct (#1488) 2023-10-11 11:30:48 -04:00
Cebtenzzre
5fe685427a chat: clearer CPU fallback messages 2023-10-06 11:35:14 -04:00
Adam Treat
eec906aa05 Speculative fix for build on mac. 2023-10-05 18:37:33 -04:00
Adam Treat
a9acdd25de Push a new version number for llmodel backend now that it is based on gguf. 2023-10-05 18:18:07 -04:00