* Read CMAKE_CUDA_ARCHITECTURES directly
* Disable CUBINs for python build in CI
* Search for CUDA 11 as well as CUDA 12
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
Adds workflow signing Windows installers with
EV certificate from Azure Key Vault via
AzureSignTool
Adds CMake to sign Windows binaries as they're processed
Installs dotnet 8 as required by AST
Signed-off-by: John Parent <john.parent@kitware.com>
Adds verification functionality to codesign script
Adds required context to enable XCode to perform the signing
Adds install time check + signing for all binaries
Adds instructions allowing macdeployqt to sign the finalized app bundle
Signed-off-by: John Parent <john.parent@kitware.com>
Adds basic CircleCI workflow to sign, notarize,
and staple MacOS app bundle and associated DMG,
then publishes signed binary in CircleCI artifacts
Signed-off-by: Adam Treat <treat.adam@gmail.com>
* chat: remove unused oscompat source files
These files are no longer needed now that the hnswlib index is gone.
This fixes an issue with the Windows build as there was a compilation
error in oscompat.cpp.
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
* llm: fix pragma to be recognized by MSVC
Replaces this MSVC warning:
C:\msys64\home\Jared\gpt4all\gpt4all-chat\llm.cpp(53,21): warning C4081: expected '('; found 'string'
With this:
C:\msys64\home\Jared\gpt4all\gpt4all-chat\llm.cpp : warning : offline installer build will not check for updates!
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
* usearch: fork usearch to fix `CreateFile` build error
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
* dlhandle: fix incorrect assertion on Windows
SetErrorMode returns the previous value of the error mode flags, not an
indicator of success.
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
* llamamodel: fix UB in LLamaModel::embedInternal
It is undefined behavior to increment an STL iterator past the end of
the container. Use offsets to do the math instead.
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
* cmake: install embedding model to bundle's Resources dir on macOS
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
* ci: fix macOS build by explicitly installing Rosetta
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
---------
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
As discussed on Discord, this PR was not ready to be merged. CI fails on
it.
This reverts commit a602f7fde7.
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
* remove outdated comments
Signed-off-by: limez <limez@protonmail.com>
* simpler build from source
Signed-off-by: limez <limez@protonmail.com>
* update unix build script to create .so runtimes correctly
Signed-off-by: limez <limez@protonmail.com>
* configure ci build type, use RelWithDebInfo for dev build script
Signed-off-by: limez <limez@protonmail.com>
* add clean script
Signed-off-by: limez <limez@protonmail.com>
* fix streamed token decoding / emoji
Signed-off-by: limez <limez@protonmail.com>
* remove deprecated nCtx
Signed-off-by: limez <limez@protonmail.com>
* update typings
Signed-off-by: jacob <jacoobes@sern.dev>
update typings
Signed-off-by: jacob <jacoobes@sern.dev>
* readme,mspell
Signed-off-by: jacob <jacoobes@sern.dev>
* cuda/backend logic changes + name napi methods like their js counterparts
Signed-off-by: limez <limez@protonmail.com>
* convert llmodel example into a test, separate test suite that can run in ci
Signed-off-by: limez <limez@protonmail.com>
* update examples / naming
Signed-off-by: limez <limez@protonmail.com>
* update deps, remove the need for binding.ci.gyp, make node-gyp-build fallback easier testable
Signed-off-by: limez <limez@protonmail.com>
* make sure the assert-backend-sources.js script is published, but not the others
Signed-off-by: limez <limez@protonmail.com>
* build correctly on windows (regression on node-gyp-build)
Signed-off-by: Jacob Nguyen <76754747+jacoobes@users.noreply.github.com>
* codespell
Signed-off-by: limez <limez@protonmail.com>
* make sure dlhandle.cpp gets linked correctly
Signed-off-by: limez <limez@protonmail.com>
* add include for check_cxx_compiler_flag call during aarch64 builds
Signed-off-by: limez <limez@protonmail.com>
* x86 > arm64 cross compilation of runtimes and bindings
Signed-off-by: limez <limez@protonmail.com>
* default to cpu instead of kompute on arm64
Signed-off-by: limez <limez@protonmail.com>
* formatting, more minimal example
Signed-off-by: limez <limez@protonmail.com>
---------
Signed-off-by: limez <limez@protonmail.com>
Signed-off-by: jacob <jacoobes@sern.dev>
Signed-off-by: Jacob Nguyen <76754747+jacoobes@users.noreply.github.com>
Co-authored-by: Jacob Nguyen <76754747+jacoobes@users.noreply.github.com>
Co-authored-by: jacob <jacoobes@sern.dev>
* rebase onto llama.cpp commit ggerganov/llama.cpp@d46dbc76f
* support for CUDA backend (enabled by default)
* partial support for Occam's Vulkan backend (disabled by default)
* partial support for HIP/ROCm backend (disabled by default)
* sync llama.cpp.cmake with upstream llama.cpp CMakeLists.txt
* changes to GPT4All backend, bindings, and chat UI to handle choice of llama.cpp backend (Kompute or CUDA)
* ship CUDA runtime with installed version
* make device selection in the UI on macOS actually do something
* model whitelist: remove dbrx, mamba, persimmon, plamo; add internlm and starcoder2
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
Signed-off-by: Tare Ebelo <75279482+TareHimself@users.noreply.github.com>
Signed-off-by: jacob <jacoobes@sern.dev>
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
Co-authored-by: jacob <jacoobes@sern.dev>
Co-authored-by: Jared Van Bortel <jared@nomic.ai>
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
Signed-off-by: Konstantin Semenenko <mail@ksemenenko.com>
Co-authored-by: Konstantin Semenenko <mail@ksemenenko.com>
* adding some native methods to cpp wrapper
* gpu seems to work
* typings and add availibleGpus method
* fix spelling
* fix syntax
* more
* normalize methods to conform to py
* remove extra dynamic linker deps when building with vulkan
* bump python version (library linking fix)
* Don't link against libvulkan.
* vulkan python bindings on windows fixes
* Bring the vulkan backend to the GUI.
* When device is Auto (the default) then we will only consider discrete GPU's otherwise fallback to CPU.
* Show the device we're currently using.
* Fix up the name and formatting.
* init at most one vulkan device, submodule update
fixes issues w/ multiple of the same gpu
* Update the submodule.
* Add version 2.4.15 and bump the version number.
* Fix a bug where we're not properly falling back to CPU.
* Sync to a newer version of llama.cpp with bugfix for vulkan.
* Report the actual device we're using.
* Only show GPU when we're actually using it.
* Bump to new llama with new bugfix.
* Release notes for v2.4.16 and bump the version.
* Fallback to CPU more robustly.
* Release notes for v2.4.17 and bump the version.
* Bump the Python version to python-v1.0.12 to restrict the quants that vulkan recognizes.
* Link against ggml in bin so we can get the available devices without loading a model.
* Send actual and requested device info for those who have opt-in.
* Actually bump the version.
* Release notes for v2.4.18 and bump the version.
* Fix for crashes on systems where vulkan is not installed properly.
* Release notes for v2.4.19 and bump the version.
* fix typings and vulkan build works on win
* Add flatpak manifest
* Remove unnecessary stuffs from manifest
* Update to 2.4.19
* appdata: update software description
* Latest rebase on llama.cpp with gguf support.
* macos build fixes
* llamamodel: metal supports all quantization types now
* gpt4all.py: GGUF
* pyllmodel: print specific error message
* backend: port BERT to GGUF
* backend: port MPT to GGUF
* backend: port Replit to GGUF
* backend: use gguf branch of llama.cpp-mainline
* backend: use llamamodel.cpp for StarCoder
* conversion scripts: cleanup
* convert scripts: load model as late as possible
* convert_mpt_hf_to_gguf.py: better tokenizer decoding
* backend: use llamamodel.cpp for Falcon
* convert scripts: make them directly executable
* fix references to removed model types
* modellist: fix the system prompt
* backend: port GPT-J to GGUF
* gpt-j: update inference to match latest llama.cpp insights
- Use F16 KV cache
- Store transposed V in the cache
- Avoid unnecessary Q copy
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
ggml upstream commit 0265f0813492602fec0e1159fe61de1bf0ccaf78
* chatllm: grammar fix
* convert scripts: use bytes_to_unicode from transformers
* convert scripts: make gptj script executable
* convert scripts: add feed-forward length for better compatiblilty
This GGUF key is used by all llama.cpp models with upstream support.
* gptj: remove unused variables
* Refactor for subgroups on mat * vec kernel.
* Add q6_k kernels for vulkan.
* python binding: print debug message to stderr
* Fix regenerate button to be deterministic and bump the llama version to latest we have for gguf.
* Bump to the latest fixes for vulkan in llama.
* llamamodel: fix static vector in LLamaModel::endTokens
* Switch to new models2.json for new gguf release and bump our version to
2.5.0.
* Bump to latest llama/gguf branch.
* chat: report reason for fallback to CPU
* chat: make sure to clear fallback reason on success
* more accurate fallback descriptions
* differentiate between init failure and unsupported models
* backend: do not use Vulkan with non-LLaMA models
* Add q8_0 kernels to kompute shaders and bump to latest llama/gguf.
* backend: fix build with Visual Studio generator
Use the $<CONFIG> generator expression instead of CMAKE_BUILD_TYPE. This
is needed because Visual Studio is a multi-configuration generator, so
we do not know what the build type will be until `cmake --build` is
called.
Fixes#1470
* remove old llama.cpp submodules
* Reorder and refresh our models2.json.
* rebase on newer llama.cpp
* python/embed4all: use gguf model, allow passing kwargs/overriding model
* Add starcoder, rift and sbert to our models2.json.
* Push a new version number for llmodel backend now that it is based on gguf.
* fix stray comma in models2.json
Signed-off-by: Aaron Miller <apage43@ninjawhale.com>
* Speculative fix for build on mac.
* chat: clearer CPU fallback messages
* Fix crasher with an empty string for prompt template.
* Update the language here to avoid misunderstanding.
* added EM German Mistral Model
* make codespell happy
* issue template: remove "Related Components" section
* cmake: install the GPT-J plugin (#1487)
* Do not delete saved chats if we fail to serialize properly.
* Restore state from text if necessary.
* Another codespell attempted fix.
* llmodel: do not call magic_match unless build variant is correct (#1488)
* chatllm: do not write uninitialized data to stream (#1486)
* mat*mat for q4_0, q8_0
* do not process prompts on gpu yet
* python: support Path in GPT4All.__init__ (#1462)
* llmodel: print an error if the CPU does not support AVX (#1499)
* python bindings should be quiet by default
* disable llama.cpp logging unless GPT4ALL_VERBOSE_LLAMACPP envvar is
nonempty
* make verbose flag for retrieve_model default false (but also be
overridable via gpt4all constructor)
should be able to run a basic test:
```python
import gpt4all
model = gpt4all.GPT4All('/Users/aaron/Downloads/rift-coder-v0-7b-q4_0.gguf')
print(model.generate('def fib(n):'))
```
and see no non-model output when successful
* python: always check status code of HTTP responses (#1502)
* Always save chats to disk, but save them as text by default. This also changes
the UI behavior to always open a 'New Chat' and setting it as current instead
of setting a restored chat as current. This improves usability by not requiring
the user to wait if they want to immediately start chatting.
* Update README.md
Signed-off-by: umarmnaq <102142660+umarmnaq@users.noreply.github.com>
* fix embed4all filename
https://discordapp.com/channels/1076964370942267462/1093558720690143283/1161778216462192692
Signed-off-by: Aaron Miller <apage43@ninjawhale.com>
* Improves Java API signatures maintaining back compatibility
* python: replace deprecated pkg_resources with importlib (#1505)
* Updated chat wishlist (#1351)
* q6k, q4_1 mat*mat
* update mini-orca 3b to gguf2, license
Signed-off-by: Aaron Miller <apage43@ninjawhale.com>
* convert scripts: fix AutoConfig typo (#1512)
* publish config https://docs.npmjs.com/cli/v9/configuring-npm/package-json#publishconfig (#1375)
merge into my branch
* fix appendBin
* fix gpu not initializing first
* sync up
* progress, still wip on destructor
* some detection work
* untested dispose method
* add js side of dispose
* Update gpt4all-bindings/typescript/index.cc
Co-authored-by: cebtenzzre <cebtenzzre@gmail.com>
Signed-off-by: Jacob Nguyen <76754747+jacoobes@users.noreply.github.com>
* Update gpt4all-bindings/typescript/index.cc
Co-authored-by: cebtenzzre <cebtenzzre@gmail.com>
Signed-off-by: Jacob Nguyen <76754747+jacoobes@users.noreply.github.com>
* Update gpt4all-bindings/typescript/index.cc
Co-authored-by: cebtenzzre <cebtenzzre@gmail.com>
Signed-off-by: Jacob Nguyen <76754747+jacoobes@users.noreply.github.com>
* Update gpt4all-bindings/typescript/src/gpt4all.d.ts
Co-authored-by: cebtenzzre <cebtenzzre@gmail.com>
Signed-off-by: Jacob Nguyen <76754747+jacoobes@users.noreply.github.com>
* Update gpt4all-bindings/typescript/src/gpt4all.js
Co-authored-by: cebtenzzre <cebtenzzre@gmail.com>
Signed-off-by: Jacob Nguyen <76754747+jacoobes@users.noreply.github.com>
* Update gpt4all-bindings/typescript/src/util.js
Co-authored-by: cebtenzzre <cebtenzzre@gmail.com>
Signed-off-by: Jacob Nguyen <76754747+jacoobes@users.noreply.github.com>
* fix tests
* fix circleci for nodejs
* bump version
---------
Signed-off-by: Aaron Miller <apage43@ninjawhale.com>
Signed-off-by: umarmnaq <102142660+umarmnaq@users.noreply.github.com>
Signed-off-by: Jacob Nguyen <76754747+jacoobes@users.noreply.github.com>
Co-authored-by: Aaron Miller <apage43@ninjawhale.com>
Co-authored-by: Adam Treat <treat.adam@gmail.com>
Co-authored-by: Akarshan Biswas <akarshan.biswas@gmail.com>
Co-authored-by: Cebtenzzre <cebtenzzre@gmail.com>
Co-authored-by: Jan Philipp Harries <jpdus@users.noreply.github.com>
Co-authored-by: umarmnaq <102142660+umarmnaq@users.noreply.github.com>
Co-authored-by: Alex Soto <asotobu@gmail.com>
Co-authored-by: niansa/tuxifan <tuxifan@posteo.de>