* rebase onto llama.cpp commit ggerganov/llama.cpp@d46dbc76f
* support for CUDA backend (enabled by default)
* partial support for Occam's Vulkan backend (disabled by default)
* partial support for HIP/ROCm backend (disabled by default)
* sync llama.cpp.cmake with upstream llama.cpp CMakeLists.txt
* changes to GPT4All backend, bindings, and chat UI to handle choice of llama.cpp backend (Kompute or CUDA)
* ship CUDA runtime with installed version
* make device selection in the UI on macOS actually do something
* model whitelist: remove dbrx, mamba, persimmon, plamo; add internlm and starcoder2
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
* llamamodel: only print device used in verbose mode
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
* python: expose backend and device via GPT4All properties
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
* backend: const correctness fixes
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
* python: bump version
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
* python: typing fixups
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
* python: fix segfault with closed GPT4All
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
---------
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
* actually submit larger batches with increased n_ctx
* fix crash when llama_tokenize returns no tokens
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
Other changes:
* fix memory leak in llmodel_available_gpu_devices
* drop model argument from llmodel_available_gpu_devices
* breaking: make GPT4All/Embed4All arguments past model_name keyword-only
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
Signed-off-by: jacob <jacoobes@sern.dev>
Signed-off-by: limez <limez@protonmail.com>
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
Co-authored-by: limez <limez@protonmail.com>
Co-authored-by: Jared Van Bortel <jared@nomic.ai>
* make sure encoding is identity for Range requests
* use a .part file for partial downloads
* verify using file size and MD5 from models3.json
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
* chat: fix non-AVX CPU detection on Windows
* bindings: throw exception instead of logging to console
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
Key changes:
* honor empty system prompt argument
* current_chat_session is now read-only and defaults to None
* deprecate fallback prompt template for unknown models
* fix mistakes from #2086
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
Signed-off-by: Tare Ebelo <75279482+TareHimself@users.noreply.github.com>
Signed-off-by: jacob <jacoobes@sern.dev>
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
Co-authored-by: jacob <jacoobes@sern.dev>
Co-authored-by: Jared Van Bortel <jared@nomic.ai>
Also dynamically limit the GPU layers and context length fields to the maximum supported by the model.
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
The only reason to use as_file is to support copying a file from a
frozen package. We don't currently support this anyway, and as_file
isn't supported until Python 3.9, so get rid of it.
Fixes#1605