cebtenzzre
e90263c23f
make scripts executable ( #1555 )
2023-10-24 09:28:21 -04:00
Aaron Miller
f414c28589
llmodel: whitelist library name patterns
...
this fixes some issues that were being seen on installed windows builds of 2.5.0
only load dlls that actually might be model impl dlls, otherwise we pull all sorts of random junk into the process before it might expect to be
Signed-off-by: Aaron Miller <apage43@ninjawhale.com>
2023-10-23 21:40:14 -07:00
cebtenzzre
4338e72a51
MPT: use upstream llama.cpp implementation ( #1515 )
2023-10-19 15:25:17 -04:00
cebtenzzre
0fe2e19691
llamamodel: re-enable error messages by default ( #1537 )
2023-10-19 13:46:33 -04:00
cebtenzzre
017c3a9649
python: prepare version 2.0.0rc1 ( #1529 )
2023-10-18 20:24:54 -04:00
cebtenzzre
9a19c740ee
kompute: fix library loading issues with kp_logger ( #1517 )
2023-10-16 16:58:17 -04:00
Aaron Miller
f79557d2aa
speedup: just use mat*vec shaders for mat*mat
...
so far my from-scratch mat*mats are still slower than just running more
invocations of the existing Metal ported mat*vec shaders - it should be
theoretically possible to make a mat*mat that's faster (for actual
mat*mat cases) than an optimal mat*vec, but it will need to be at
*least* as fast as the mat*vec op and then take special care to be
cache-friendly and save memory bandwidth, as the # of compute ops is the
same
2023-10-16 13:45:51 -04:00
cebtenzzre
22de3c56bd
convert scripts: fix AutoConfig typo ( #1512 )
2023-10-13 14:16:51 -04:00
Aaron Miller
2490977f89
q6k, q4_1 mat*mat
2023-10-12 14:56:54 -04:00
Aaron Miller
afaa291eab
python bindings should be quiet by default
...
* disable llama.cpp logging unless GPT4ALL_VERBOSE_LLAMACPP envvar is
nonempty
* make verbose flag for retrieve_model default false (but also be
overridable via gpt4all constructor)
should be able to run a basic test:
```python
import gpt4all
model = gpt4all.GPT4All('/Users/aaron/Downloads/rift-coder-v0-7b-q4_0.gguf')
print(model.generate('def fib(n):'))
```
and see no non-model output when successful
2023-10-11 14:14:36 -07:00
cebtenzzre
7b611b49f2
llmodel: print an error if the CPU does not support AVX ( #1499 )
2023-10-11 15:09:40 -04:00
Aaron Miller
043617168e
do not process prompts on gpu yet
2023-10-11 13:15:50 -04:00
Aaron Miller
64001a480a
mat*mat for q4_0, q8_0
2023-10-11 13:15:50 -04:00
cebtenzzre
7a19047329
llmodel: do not call magic_match unless build variant is correct ( #1488 )
2023-10-11 11:30:48 -04:00
Cebtenzzre
5fe685427a
chat: clearer CPU fallback messages
2023-10-06 11:35:14 -04:00
Adam Treat
eec906aa05
Speculative fix for build on mac.
2023-10-05 18:37:33 -04:00
Adam Treat
a9acdd25de
Push a new version number for llmodel backend now that it is based on gguf.
2023-10-05 18:18:07 -04:00
Cebtenzzre
8bb6a6c201
rebase on newer llama.cpp
2023-10-05 18:16:19 -04:00
Cebtenzzre
d87573ea75
remove old llama.cpp submodules
2023-10-05 18:16:19 -04:00
Cebtenzzre
cc6db61c93
backend: fix build with Visual Studio generator
...
Use the $<CONFIG> generator expression instead of CMAKE_BUILD_TYPE. This
is needed because Visual Studio is a multi-configuration generator, so
we do not know what the build type will be until `cmake --build` is
called.
Fixes #1470
2023-10-05 18:16:19 -04:00
Adam Treat
f605a5b686
Add q8_0 kernels to kompute shaders and bump to latest llama/gguf.
2023-10-05 18:16:19 -04:00
Cebtenzzre
672cb850f9
differentiate between init failure and unsupported models
2023-10-05 18:16:19 -04:00
Adam Treat
906699e8e9
Bump to latest llama/gguf branch.
2023-10-05 18:16:19 -04:00
Cebtenzzre
088afada49
llamamodel: fix static vector in LLamaModel::endTokens
2023-10-05 18:16:19 -04:00
Adam Treat
b4d82ea289
Bump to the latest fixes for vulkan in llama.
2023-10-05 18:16:19 -04:00
Adam Treat
12f943e966
Fix regenerate button to be deterministic and bump the llama version to latest we have for gguf.
2023-10-05 18:16:19 -04:00
Adam Treat
5d346e13d7
Add q6_k kernels for vulkan.
2023-10-05 18:16:19 -04:00
Adam Treat
4eefd386d0
Refactor for subgroups on mat * vec kernel.
2023-10-05 18:16:19 -04:00
Cebtenzzre
3c2aa299d8
gptj: remove unused variables
2023-10-05 18:16:19 -04:00
Cebtenzzre
f9deb87d20
convert scripts: add feed-forward length for better compatiblilty
...
This GGUF key is used by all llama.cpp models with upstream support.
2023-10-05 18:16:19 -04:00
Cebtenzzre
cc7675d432
convert scripts: make gptj script executable
2023-10-05 18:16:19 -04:00
Cebtenzzre
0493e6eb07
convert scripts: use bytes_to_unicode from transformers
2023-10-05 18:16:19 -04:00
Cebtenzzre
d5d72f0361
gpt-j: update inference to match latest llama.cpp insights
...
- Use F16 KV cache
- Store transposed V in the cache
- Avoid unnecessary Q copy
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
ggml upstream commit 0265f0813492602fec0e1159fe61de1bf0ccaf78
2023-10-05 18:16:19 -04:00
Cebtenzzre
050e7f076e
backend: port GPT-J to GGUF
2023-10-05 18:16:19 -04:00
Cebtenzzre
8f3abb37ca
fix references to removed model types
2023-10-05 18:16:19 -04:00
Cebtenzzre
4219c0e2e7
convert scripts: make them directly executable
2023-10-05 18:16:19 -04:00
Cebtenzzre
ce7be1db48
backend: use llamamodel.cpp for Falcon
2023-10-05 18:16:19 -04:00
Cebtenzzre
cca9e6ce81
convert_mpt_hf_to_gguf.py: better tokenizer decoding
2023-10-05 18:16:19 -04:00
Cebtenzzre
25297786db
convert scripts: load model as late as possible
2023-10-05 18:16:19 -04:00
Cebtenzzre
fd47088f2b
conversion scripts: cleanup
2023-10-05 18:16:19 -04:00
Cebtenzzre
6277eac9cc
backend: use llamamodel.cpp for StarCoder
2023-10-05 18:16:19 -04:00
Cebtenzzre
17fc9e3e58
backend: port Replit to GGUF
2023-10-05 18:16:19 -04:00
Cebtenzzre
7c67262a13
backend: port MPT to GGUF
2023-10-05 18:16:19 -04:00
Cebtenzzre
42bcb814b3
backend: port BERT to GGUF
2023-10-05 18:16:19 -04:00
Cebtenzzre
1d29e4696c
llamamodel: metal supports all quantization types now
2023-10-05 18:16:19 -04:00
Aaron Miller
507753a37c
macos build fixes
2023-10-05 18:16:19 -04:00
Adam Treat
d90d003a1d
Latest rebase on llama.cpp with gguf support.
2023-10-05 18:16:19 -04:00
Adam Treat
99c106e6b5
Fix a bug seen on AMD RADEON cards with vulkan backend.
2023-09-26 11:59:47 -04:00
Jacob Nguyen
e86c63750d
Update llama.cpp.cmake
...
Signed-off-by: Jacob Nguyen <76754747+jacoobes@users.noreply.github.com>
2023-09-16 11:42:56 -07:00
Adam Treat
84905aa281
Fix for crashes on systems where vulkan is not installed properly.
2023-09-16 12:19:46 -04:00
Adam Treat
045f6e6cdc
Link against ggml in bin so we can get the available devices without loading a model.
2023-09-15 14:45:25 -04:00
Adam Treat
aa33419c6e
Fallback to CPU more robustly.
2023-09-14 16:53:11 -04:00
Adam Treat
9013a089bd
Bump to new llama with new bugfix.
2023-09-14 10:02:11 -04:00
Adam Treat
3076e0bf26
Only show GPU when we're actually using it.
2023-09-14 09:59:19 -04:00
Adam Treat
cf4eb530ce
Sync to a newer version of llama.cpp with bugfix for vulkan.
2023-09-13 21:01:44 -04:00
Adam Treat
4b9a345aee
Update the submodule.
2023-09-13 17:05:46 -04:00
Aaron Miller
6f038c136b
init at most one vulkan device, submodule update
...
fixes issues w/ multiple of the same gpu
2023-09-13 12:49:53 -07:00
Adam Treat
8f99dca70f
Bring the vulkan backend to the GUI.
2023-09-13 11:26:10 -04:00
Aaron Miller
f0735efa7d
vulkan python bindings on windows fixes
2023-09-12 14:16:02 -07:00
Adam Treat
c953b321b7
Don't link against libvulkan.
2023-09-12 14:26:56 -04:00
Aaron Miller
c4d23512e4
remove extra dynamic linker deps when building with vulkan
2023-09-11 08:44:39 -07:00
Adam Treat
85e34598f9
more circleci
2023-08-31 15:29:54 -04:00
Adam Treat
f578fa6cdf
Fix for windows.
2023-08-31 15:29:54 -04:00
Adam Treat
17d3e4976c
Add a comment indicating future work.
2023-08-31 15:29:54 -04:00
Adam Treat
987546c63b
Nomic vulkan backend licensed under the Software for Open Models License (SOM), version 1.0.
2023-08-31 15:29:54 -04:00
Adam Treat
d55cbbee32
Update to newer llama.cpp and disable older forks.
2023-08-31 15:29:54 -04:00
Aaron Miller
0bc2274869
bump llama.cpp version + needed fixes for that
2023-08-31 15:29:54 -04:00
aaron miller
33c22be2aa
starcoder: use ggml_graph_plan
2023-08-31 15:29:54 -04:00
Cosmic Snow
108d950874
Fix Windows unable to load models on older Windows builds
...
- Replace high-level IsProcessorFeaturePresent
- Reintroduce low-level compiler intrinsics implementation
2023-08-09 09:27:43 +02:00
Adam Treat
6d03b3e500
Add starcoder support.
2023-07-27 09:15:16 -04:00
cosmic-snow
2d02c65177
Handle edge cases when generating embeddings ( #1215 )
...
* Handle edge cases when generating embeddings
* Improve Python handling & add llmodel_c.h note
- In the Python bindings fail fast with a ValueError when text is empty
- Advice other bindings authors to do likewise in llmodel_c.h
2023-07-17 13:21:03 -07:00
Aaron Miller
1c4a244291
bump mem allocation a bit
2023-07-14 09:48:57 -04:00
Adam Treat
ee4186d579
Fixup bert python bindings.
2023-07-14 09:48:57 -04:00
cosmic-snow
6200900677
Fix Windows MSVC arch detection ( #1194 )
...
- in llmodel.cpp to fix AVX-only handling
Signed-off-by: cosmic-snow <134004613+cosmic-snow@users.noreply.github.com>
2023-07-13 14:44:17 -04:00
Adam Treat
4963db8f43
Bump the version numbers for both python and c backend.
2023-07-13 14:21:46 -04:00
Adam Treat
0efdbfcffe
Bert
2023-07-13 14:21:46 -04:00
Adam Treat
315a1f2aa2
Move it back as internal class.
2023-07-13 14:21:46 -04:00
Adam Treat
ae8eb297ac
Add sbert backend.
2023-07-13 14:21:46 -04:00
Adam Treat
1f749d7633
Clean up backend code a bit and hide impl. details.
2023-07-13 14:21:46 -04:00
Adam Treat
33557b1f39
Move the implementation out of llmodel class.
2023-07-13 14:21:46 -04:00
Aaron Miller
432b7ebbd7
include windows.h just to be safe
2023-07-12 12:46:46 -04:00
Aaron Miller
95b8fb312e
windows/msvc: use high level processor feature detection API
...
see https://learn.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-isprocessorfeaturepresent
2023-07-12 12:46:46 -04:00
Aaron Miller
f0faa23ad5
cmakelists: always export build commands ( #1179 )
...
friendly for using editors with clangd integration that don't also
manage the build themselves
2023-07-12 10:49:24 -04:00
Aaron Miller
4a24b586df
llama.cpp: metal buffer freeing
2023-06-30 21:07:21 -03:00
Aaron Miller
137bc2c367
replit: free metal context
2023-06-30 21:07:21 -03:00
Aaron Miller
57dc0c8953
adjust eval buf sizes to pass long input test
2023-06-30 21:07:21 -03:00
Aaron Miller
7a5f6e4726
limit prompt batch size to 128
2023-06-30 21:07:21 -03:00
Aaron Miller
883775bc5f
move 230511 submodule to nomic fork, fix alibi assert
2023-06-30 21:07:21 -03:00
Andriy Mulyar
46a0762bd5
Python Bindings: Improved unit tests, documentation and unification of API ( #1090 )
...
* Makefiles, black, isort
* Black and isort
* unit tests and generation method
* chat context provider
* context does not reset
* Current state
* Fixup
* Python bindings with unit tests
* GPT4All Python Bindings: chat contexts, tests
* New python bindings and backend fixes
* Black and Isort
* Documentation error
* preserved n_predict for backwords compat with langchain
---------
Co-authored-by: Adam Treat <treat.adam@gmail.com>
2023-06-30 16:02:02 -04:00
Aaron Miller
40a3faeb05
Use ggml scratch bufs for mpt and gptj models ( #1104 )
...
* backend/gptj: use scratch buffers
reduces total memory required and makes eval buf not grow with n_past
* backend/mpt: use scratch bufs
* fix format-related compile warnings
2023-06-30 10:53:45 -07:00
Aaron Miller
8d19ef3909
backend: factor out common elements in model code ( #1089 )
...
* backend: factor out common structs in model code
prepping to hack on these by hopefully making there be fewer places to fix the same bug
rename
* use common buffer wrapper instead of manual malloc
* fix replit compile warnings
2023-06-28 17:35:07 -07:00
Aaron Miller
28d41d4f6d
falcon: use *model-local* eval & scratch bufs ( #1079 )
...
fixes memory leaks copied from ggml/examples based implementation
2023-06-27 16:09:11 -07:00
Zach Nussbaum
2565f6a94a
feat: add conversion script
2023-06-27 14:06:39 -03:00
Aaron Miller
198b5e4832
add Falcon 7B model
...
Tested with https://huggingface.co/TheBloke/falcon-7b-instruct-GGML/blob/main/falcon7b-instruct.ggmlv3.q4_0.bin
2023-06-27 14:06:39 -03:00
Aaron Miller
db34a2f670
llmodel: skip attempting Metal if model+kvcache > 53% of system ram
2023-06-26 19:46:49 -03:00
Aaron Miller
b19a3e5b2c
add requiredMem method to llmodel impls
...
most of these can just shortcut out of the model loading logic llama is a bit worse to deal with because we submodule it so I have to at least parse the hparams, and then I just use the size on disk as an estimate for the mem size (which seems reasonable since we mmap() the llama files anyway)
2023-06-26 18:27:58 -03:00
Adam Treat
a0f80453e5
Use sysinfo in backend.
2023-06-26 14:14:49 -04:00
niansa/tuxifan
47323f8591
Update replit.cpp
...
replit_tokenizer_detokenize returnins std::string now
Signed-off-by: niansa/tuxifan <tuxifan@posteo.de>
2023-06-26 14:49:58 -03:00
niansa
0855c0df1d
Fixed Replit implementation compile warnings
2023-06-26 14:49:58 -03:00
Aaron Miller
1290b32451
update to latest mainline llama.cpp
...
add max_size param to ggml_metal_add_buffer - introduced in https://github.com/ggerganov/llama.cpp/pull/1826
2023-06-26 14:40:52 -03:00