Jared Van Bortel
0455b80b7f
Embed4All: optionally count tokens, misc fixes ( #2145 )
...
Key changes:
* python: optionally return token count in Embed4All.embed
* python and docs: models2.json -> models3.json
* Embed4All: require explicit prefix for unknown models
* llamamodel: fix shouldAddBOS for Bert and Nomic Bert
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-03-20 11:24:02 -04:00
Jared Van Bortel
699410014a
fix non-AVX CPU detection ( #2141 )
...
* chat: fix non-AVX CPU detection on Windows
* bindings: throw exception instead of logging to console
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-03-19 10:56:14 -04:00
Jared Van Bortel
406e88b59a
implement local Nomic Embed via llama.cpp ( #2086 )
...
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-03-13 18:09:24 -04:00
chrisbarrera
f8b1069a1c
add min_p sampling parameter ( #2014 )
...
Signed-off-by: Christopher Barrera <cb@arda.tx.rr.com>
Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
2024-02-24 17:51:34 -05:00
Jared Van Bortel
4fc4d94be4
fix chat-style prompt templates ( #1970 )
...
Also use a new version of Mistral OpenOrca.
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-02-21 15:45:32 -05:00
Adam Treat
d948a4f2ee
Complete revamp of model loading to allow for more discreet control by
...
the user of the models loading behavior.
Signed-off-by: Adam Treat <treat.adam@gmail.com>
2024-02-21 10:15:20 -06:00
Jared Van Bortel
061d1969f8
expose n_gpu_layers parameter of llama.cpp ( #1890 )
...
Also dynamically limit the GPU layers and context length fields to the maximum supported by the model.
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-01-31 14:17:44 -05:00
Jared Van Bortel
38c61493d2
backend: update to latest commit of llama.cpp Vulkan PR
...
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-01-29 15:47:26 -06:00
Jared Van Bortel
7e9786fccf
chat: set search path early
...
This fixes the issues with installed versions of v2.6.0.
2024-01-11 12:04:18 -05:00
Jared Van Bortel
d1c56b8b28
Implement configurable context length ( #1749 )
2023-12-16 17:58:15 -05:00
Jared Van Bortel
3acbef14b7
fix AVX support by removing direct linking to AVX2 libs ( #1750 )
2023-12-13 12:11:09 -05:00
Jared Van Bortel
9e28dfac9c
Update to latest llama.cpp ( #1706 )
2023-12-01 16:51:15 -05:00
Cebtenzzre
5fe685427a
chat: clearer CPU fallback messages
2023-10-06 11:35:14 -04:00
Cebtenzzre
672cb850f9
differentiate between init failure and unsupported models
2023-10-05 18:16:19 -04:00
Adam Treat
d90d003a1d
Latest rebase on llama.cpp with gguf support.
2023-10-05 18:16:19 -04:00
Adam Treat
045f6e6cdc
Link against ggml in bin so we can get the available devices without loading a model.
2023-09-15 14:45:25 -04:00
Adam Treat
3076e0bf26
Only show GPU when we're actually using it.
2023-09-14 09:59:19 -04:00
Adam Treat
987546c63b
Nomic vulkan backend licensed under the Software for Open Models License (SOM), version 1.0.
2023-08-31 15:29:54 -04:00
Adam Treat
0efdbfcffe
Bert
2023-07-13 14:21:46 -04:00
Adam Treat
315a1f2aa2
Move it back as internal class.
2023-07-13 14:21:46 -04:00
Adam Treat
1f749d7633
Clean up backend code a bit and hide impl. details.
2023-07-13 14:21:46 -04:00
Adam Treat
33557b1f39
Move the implementation out of llmodel class.
2023-07-13 14:21:46 -04:00
Aaron Miller
7a5f6e4726
limit prompt batch size to 128
2023-06-30 21:07:21 -03:00
Aaron Miller
b19a3e5b2c
add requiredMem method to llmodel impls
...
most of these can just shortcut out of the model loading logic llama is a bit worse to deal with because we submodule it so I have to at least parse the hparams, and then I just use the size on disk as an estimate for the mem size (which seems reasonable since we mmap() the llama files anyway)
2023-06-26 18:27:58 -03:00
Aaron Miller
88616fde7f
llmodel: change tokenToString to not use string_view ( #968 )
...
fixes a definite use-after-free and likely avoids some other
potential ones - std::string will convert to a std::string_view
automatically but as soon as the std::string in question goes out of
scope it is already freed and the string_view is pointing at freed
memory - this is *mostly* fine if its returning a reference to the
tokenizer's internal vocab table but it's, imo, too easy to return a
reference to a dynamically constructed string with this as replit is
doing (and unfortunately needs to do to convert the internal whitespace
replacement symbol back to a space)
2023-06-13 07:14:02 -04:00
niansa/tuxifan
14e9ccbc6a
Do auto detection by default in C++ API
...
Signed-off-by: niansa/tuxifan <tuxifan@posteo.de>
2023-06-09 17:01:19 +02:00
Adam Treat
8a9ad258f4
Fix symbol resolution on windows.
2023-06-05 11:19:02 -04:00
Adam Treat
301d2fdbea
Fix up for newer models on reset context. This fixes the model from totally failing after a reset context.
2023-06-04 19:31:20 -04:00
AT
bbe195ee02
Backend prompt dedup ( #822 )
...
* Deduplicated prompt() function code
2023-06-04 08:59:24 -04:00
Richard Guo
c54c42e3fb
fixed finding model libs
2023-06-02 12:32:26 -04:00
Adam Treat
a41bd6ac0a
Trying to shrink the copy+paste code and do more code sharing between backend model impl.
2023-06-02 07:20:59 -04:00
niansa
5175db2781
Fixed double-free in LLModel::Implementation destructor
2023-06-01 11:19:08 -04:00
niansa/tuxifan
fc60f0c09c
Cleaned up implementation management ( #787 )
...
* Cleaned up implementation management
* Initialize LLModel::m_implementation to nullptr
* llmodel.h: Moved dlhandle fwd declare above LLModel class
2023-06-01 16:51:46 +02:00
Adam Treat
1eca524171
Add fixme's and clean up a bit.
2023-06-01 07:57:10 -04:00
niansa
a3d08cdcd5
Dlopen better implementation management (Version 2)
2023-06-01 07:44:15 -04:00
AT
48275d0dcc
Dlopen backend 5 ( #779 )
...
Major change to the backend that allows for pluggable versions of llama.cpp/ggml. This was squashed merged from dlopen_backend_5 where the history is preserved.
2023-05-31 17:04:01 -04:00
Juuso Alasuutari
81fdc28e58
llmodel: constify LLModel::threadCount()
2023-05-22 08:54:46 -04:00
Adam Treat
d918b02c29
Move the llmodel C API to new top-level directory and version it.
2023-05-10 11:46:40 -04:00