* Don't stop generating at end of context
* Use llama_kv_cache ops to shift context (sketched below)
* Fix and improve reverse prompt detection
* Replace prompt recalc callback with a flag to disallow context shift
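A minimal sketch of the context shift the bullets above describe, assuming the `llama_kv_cache_seq_rm` / `llama_kv_cache_seq_add` names used by llama.cpp around this revision (earlier releases call the shift op `llama_kv_cache_seq_shift`); the helper name and discard policy are illustrative, not the exact GPT4All implementation:

```cpp
#include <llama.h>

// Illustrative context shift: discard the oldest half of the evaluated tokens
// (keeping the first n_keep) and slide the remainder down so decoding can
// continue past the end of the context window. Sequence id 0 is assumed.
static void shift_context(llama_context * ctx, int & n_past, int n_keep) {
    const int n_discard = (n_past - n_keep) / 2;
    if (n_discard <= 0)
        return;

    // erase KV cells for positions [n_keep, n_keep + n_discard)
    llama_kv_cache_seq_rm (ctx, 0, n_keep, n_keep + n_discard);
    // shift positions [n_keep + n_discard, n_past) back by n_discard
    llama_kv_cache_seq_add(ctx, 0, n_keep + n_discard, n_past, -n_discard);

    n_past -= n_discard;
}
```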
* Adds support for GPT-NeoX, Gemma 2, OpenELM, ChatGLM, and Jais architectures (all with Kompute support)
* Also enables Kompute support for StarCoder2, XVERSE, Command R, and OLMo
* Includes a number of Kompute resource management fixes
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
* chat: remove unused oscompat source files
These files are no longer needed now that the hnswlib index is gone.
This fixes an issue with the Windows build as there was a compilation
error in oscompat.cpp.
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
* llm: fix pragma to be recognized by MSVC
Replaces this MSVC warning:
C:\msys64\home\Jared\gpt4all\gpt4all-chat\llm.cpp(53,21): warning C4081: expected '('; found 'string'
With this:
C:\msys64\home\Jared\gpt4all\gpt4all-chat\llm.cpp : warning : offline installer build will not check for updates!
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
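For context, C4081 comes from MSVC's stricter parsing of the message pragma: MSVC requires a parenthesized argument, while GCC and Clang also accept a bare string. A hedged sketch of the portable form (the guard macro is hypothetical and this is not the exact llm.cpp change):

```cpp
// Illustrative only: the parenthesized form is accepted by MSVC, GCC, and
// Clang alike, and prefixing __FILE__ makes the message read like a warning
// line in MSVC's build log.
#if defined(GPT4ALL_OFFLINE_INSTALLER) // hypothetical guard, for illustration
#   pragma message(__FILE__ ": warning: offline installer build will not check for updates!")
#endif
```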
* usearch: fork usearch to fix `CreateFile` build error
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
* dlhandle: fix incorrect assertion on Windows
SetErrorMode returns the previous value of the error mode flags, not an
indicator of success.
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
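A minimal sketch of the corrected pattern (illustrative; the helper name is hypothetical, not the actual dlhandle code): save the previous mode and restore it, rather than asserting on the return value.

```cpp
#include <windows.h>

// SetErrorMode() returns the *previous* error-mode flags; zero is a perfectly
// valid previous value, so treating the return value as a success flag is wrong.
static void load_without_error_dialogs() {
    UINT old_mode = SetErrorMode(SEM_FAILCRITICALERRORS); // suppress error popups

    // ... load the library here ...

    SetErrorMode(old_mode); // restore whatever mode was previously set
}
```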
* llamamodel: fix UB in LLamaModel::embedInternal
It is undefined behavior to increment an STL iterator past the end of
the container. Use offsets to do the math instead.
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
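A small sketch of this class of fix (illustrative, not the exact `embedInternal` code): chunking with offsets and clamped lengths never forms an iterator past `end()`.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Process `tokens` in chunks of at most `chunk` elements. Clamping the chunk
// length with std::min keeps every index in bounds, so no iterator is ever
// advanced past end().
static void process_in_chunks(const std::vector<int> & tokens, std::size_t chunk) {
    for (std::size_t off = 0; off < tokens.size(); off += chunk) {
        const std::size_t len = std::min(chunk, tokens.size() - off);
        auto first = tokens.begin() + off;
        auto last  = first + len; // valid: never past end()
        // ... embed the range [first, last) ...
        (void)last;
    }
}
```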
* cmake: install embedding model to bundle's Resources dir on macOS
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
* ci: fix macOS build by explicitly installing Rosetta
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
---------
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
* rebase onto llama.cpp commit ggerganov/llama.cpp@d46dbc76f
* support for CUDA backend (enabled by default)
* partial support for Occam's Vulkan backend (disabled by default)
* partial support for HIP/ROCm backend (disabled by default)
* sync llama.cpp.cmake with upstream llama.cpp CMakeLists.txt
* changes to GPT4All backend, bindings, and chat UI to handle choice of llama.cpp backend (Kompute or CUDA)
* ship the CUDA runtime with the installed version of GPT4All
* make device selection in the UI on macOS actually do something
* model whitelist: remove dbrx, mamba, persimmon, plamo; add internlm and starcoder2
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
Other changes:
- Always display first start dialog if privacy options are unset (e.g. if the user closed GPT4All without selecting them)
- LocalDocs scanQueue is now always deferred
- Fix a potential crash in magic_match
- LocalDocs indexing is now started after the first start dialog is dismissed so usage stats are included
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
* llamamodel: only print device used in verbose mode
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
* python: expose backend and device via GPT4All properties
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
* backend: const correctness fixes
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
* python: bump version
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
* python: typing fixups
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
* python: fix segfault with closed GPT4All
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
---------
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
* actually submit larger batches with increased n_ctx
* fix crash when llama_tokenize returns no tokens
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
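A hedged sketch of the guard for the tokenizer crash, assuming the llama_tokenize signature in the llama.cpp revision referenced above (the helper itself is illustrative):

```cpp
#include <cstdint>
#include <string>
#include <vector>
#include <llama.h>

// Tokenize `text`, returning an empty vector instead of letting the caller
// index into a zero-length result. A negative return from llama_tokenize
// means the output buffer was too small.
static std::vector<llama_token> tokenize_or_empty(const llama_model * model, const std::string & text) {
    std::vector<llama_token> tokens(text.size() + 8); // generous upper bound
    const int32_t n = llama_tokenize(model, text.c_str(), int32_t(text.size()),
                                     tokens.data(), int32_t(tokens.size()),
                                     /*add_special*/ true, /*parse_special*/ false);
    if (n <= 0)
        return {}; // empty input (or error): callers must handle "no tokens"
    tokens.resize(n);
    return tokens;
}
```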
Key changes:
* honor empty system prompt argument
* current_chat_session is now read-only and defaults to None
* deprecate fallback prompt template for unknown models
* fix mistakes from #2086
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
Also dynamically limit the GPU layers and context length fields to the maximum supported by the model.
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
* disable llama.cpp logging unless the GPT4ALL_VERBOSE_LLAMACPP envvar is nonempty
* make the verbose flag for retrieve_model default to False (still overridable via the GPT4All constructor)
You should be able to run a basic test:
```python
import gpt4all
model = gpt4all.GPT4All('/Users/aaron/Downloads/rift-coder-v0-7b-q4_0.gguf')
print(model.generate('def fib(n):'))
```
and see no non-model output when it succeeds