Jared Van Bortel
38c61493d2
backend: update to latest commit of llama.cpp Vulkan PR
...
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-01-29 15:47:26 -06:00
Jared Van Bortel
7e9786fccf
chat: set search path early
...
This fixes the issues with installed versions of v2.6.0.
2024-01-11 12:04:18 -05:00
Jared Van Bortel
d1c56b8b28
Implement configurable context length (#1749)
2023-12-16 17:58:15 -05:00
Jared Van Bortel
3acbef14b7
fix AVX support by removing direct linking to AVX2 libs (#1750)
2023-12-13 12:11:09 -05:00
Jared Van Bortel
9e28dfac9c
Update to latest llama.cpp (#1706)
2023-12-01 16:51:15 -05:00
Cebtenzzre
5fe685427a
chat: clearer CPU fallback messages
2023-10-06 11:35:14 -04:00
Cebtenzzre
672cb850f9
differentiate between init failure and unsupported models
2023-10-05 18:16:19 -04:00
Adam Treat
d90d003a1d
Latest rebase on llama.cpp with gguf support.
2023-10-05 18:16:19 -04:00
Adam Treat
045f6e6cdc
Link against ggml in bin so we can get the available devices without loading a model.
2023-09-15 14:45:25 -04:00
Adam Treat
3076e0bf26
Only show GPU when we're actually using it.
2023-09-14 09:59:19 -04:00
Adam Treat
987546c63b
Nomic vulkan backend licensed under the Software for Open Models License (SOM), version 1.0.
2023-08-31 15:29:54 -04:00
Adam Treat
0efdbfcffe
Bert
2023-07-13 14:21:46 -04:00
Adam Treat
315a1f2aa2
Move it back as internal class.
2023-07-13 14:21:46 -04:00
Adam Treat
1f749d7633
Clean up backend code a bit and hide impl. details.
2023-07-13 14:21:46 -04:00
Adam Treat
33557b1f39
Move the implementation out of llmodel class.
2023-07-13 14:21:46 -04:00
Aaron Miller
7a5f6e4726
limit prompt batch size to 128
2023-06-30 21:07:21 -03:00
Aaron Miller
b19a3e5b2c
add requiredMem method to llmodel impls
...
Most of these can just shortcut out of the model loading logic. llama is a bit worse to deal with because we submodule it, so I have to at least parse the hparams, and then I just use the size on disk as an estimate for the mem size (which seems reasonable since we mmap() the llama files anyway).
2023-06-26 18:27:58 -03:00
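The size-on-disk estimate described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the actual requiredMem implementation: the function name estimateRequiredMem and its signature are hypothetical, and the hparams parsing the commit mentions is left out.

```cpp
#include <cstddef>
#include <filesystem>
#include <system_error>

// Hypothetical helper (not the real LLModel::requiredMem signature).
// Uses the model file's size on disk as a rough memory estimate, on the
// assumption that the file is mmap()ed and so maps roughly 1:1 into memory.
std::size_t estimateRequiredMem(const std::filesystem::path &modelPath) {
    std::error_code ec;
    const auto size = std::filesystem::file_size(modelPath, ec);
    // Return 0 when the file is missing or unreadable instead of throwing.
    return ec ? 0 : static_cast<std::size_t>(size);
}
```

A caller could compare the returned value against available RAM before attempting a full load; a result of 0 here only means the size could not be determined.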
Aaron Miller
88616fde7f
llmodel: change tokenToString to not use string_view (#968)
...
Fixes a definite use-after-free and likely avoids some other
potential ones. A std::string converts to a std::string_view
automatically, but as soon as the std::string in question goes out of
scope it is freed and the string_view is left pointing at freed
memory. This is *mostly* fine if it's returning a reference to the
tokenizer's internal vocab table, but it is, imo, too easy to return a
reference to a dynamically constructed string this way, as replit is
doing (and unfortunately needs to do to convert the internal whitespace
replacement symbol back to a space).
2023-06-13 07:14:02 -04:00
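The lifetime problem described in the commit above can be illustrated with a small sketch. Everything here is hypothetical and only stands in for the real code: the Vocab alias, the restoreWhitespace helper, and the use of '_' as the whitespace replacement symbol are assumptions made for the example, not the project's actual types or symbols.

```cpp
#include <string>
#include <string_view>
#include <unordered_map>

// Hypothetical stand-in for a tokenizer vocab table.
using Vocab = std::unordered_map<int, std::string>;

// Hypothetical helper: builds a NEW string with the internal whitespace
// marker (assumed to be '_' here) converted back to a space.
std::string restoreWhitespace(const std::string &piece) {
    std::string out = piece;
    for (char &c : out) {
        if (c == '_') c = ' ';
    }
    return out;
}

// Risky pattern, left commented out on purpose: the temporary std::string
// converts implicitly to std::string_view and is destroyed when the
// function returns, so the caller receives a dangling view.
//
// std::string_view tokenToStringView(const Vocab &vocab, int id) {
//     return restoreWhitespace(vocab.at(id));
// }

// Safer pattern: return by value so the caller owns the bytes.
std::string tokenToString(const Vocab &vocab, int id) {
    return restoreWhitespace(vocab.at(id));
}
```

Returning std::string costs a copy (or a move) per token, but it removes the need for callers to reason about whether the view points into the vocab table or into a temporary.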
niansa/tuxifan
14e9ccbc6a
Do auto detection by default in C++ API
...
Signed-off-by: niansa/tuxifan <tuxifan@posteo.de>
2023-06-09 17:01:19 +02:00
Adam Treat
8a9ad258f4
Fix symbol resolution on Windows.
2023-06-05 11:19:02 -04:00
Adam Treat
301d2fdbea
Fix up for newer models on reset context. This keeps the model from totally failing after a reset context.
2023-06-04 19:31:20 -04:00
AT
bbe195ee02
Backend prompt dedup (#822)
...
* Deduplicated prompt() function code
2023-06-04 08:59:24 -04:00
Richard Guo
c54c42e3fb
fixed finding model libs
2023-06-02 12:32:26 -04:00
Adam Treat
a41bd6ac0a
Trying to shrink the copy+paste code and do more code sharing between backend model impl.
2023-06-02 07:20:59 -04:00
niansa
5175db2781
Fixed double-free in LLModel::Implementation destructor
2023-06-01 11:19:08 -04:00
niansa/tuxifan
fc60f0c09c
Cleaned up implementation management (#787)
...
* Cleaned up implementation management
* Initialize LLModel::m_implementation to nullptr
* llmodel.h: Moved dlhandle fwd declare above LLModel class
2023-06-01 16:51:46 +02:00
Adam Treat
1eca524171
Add fixme's and clean up a bit.
2023-06-01 07:57:10 -04:00
niansa
a3d08cdcd5
Dlopen better implementation management (Version 2)
2023-06-01 07:44:15 -04:00
AT
48275d0dcc
Dlopen backend 5 (#779)
...
Major change to the backend that allows for pluggable versions of llama.cpp/ggml. This was squash-merged from dlopen_backend_5, where the history is preserved.
2023-05-31 17:04:01 -04:00
Juuso Alasuutari
81fdc28e58
llmodel: constify LLModel::threadCount()
2023-05-22 08:54:46 -04:00
Adam Treat
d918b02c29
Move the llmodel C API to new top-level directory and version it.
2023-05-10 11:46:40 -04:00