mirror of
https://github.com/nomic-ai/gpt4all.git
synced 2024-10-01 01:06:10 -04:00
da95bcfb4b
* adding some native methods to cpp wrapper * gpu seems to work * typings and add availibleGpus method * fix spelling * fix syntax * more * normalize methods to conform to py * remove extra dynamic linker deps when building with vulkan * bump python version (library linking fix) * Don't link against libvulkan. * vulkan python bindings on windows fixes * Bring the vulkan backend to the GUI. * When device is Auto (the default) then we will only consider discrete GPU's otherwise fallback to CPU. * Show the device we're currently using. * Fix up the name and formatting. * init at most one vulkan device, submodule update fixes issues w/ multiple of the same gpu * Update the submodule. * Add version 2.4.15 and bump the version number. * Fix a bug where we're not properly falling back to CPU. * Sync to a newer version of llama.cpp with bugfix for vulkan. * Report the actual device we're using. * Only show GPU when we're actually using it. * Bump to new llama with new bugfix. * Release notes for v2.4.16 and bump the version. * Fallback to CPU more robustly. * Release notes for v2.4.17 and bump the version. * Bump the Python version to python-v1.0.12 to restrict the quants that vulkan recognizes. * Link against ggml in bin so we can get the available devices without loading a model. * Send actual and requested device info for those who have opt-in. * Actually bump the version. * Release notes for v2.4.18 and bump the version. * Fix for crashes on systems where vulkan is not installed properly. * Release notes for v2.4.19 and bump the version. * fix typings and vulkan build works on win * Add flatpak manifest * Remove unnecessary stuffs from manifest * Update to 2.4.19 * appdata: update software description * Latest rebase on llama.cpp with gguf support. * macos build fixes * llamamodel: metal supports all quantization types now * gpt4all.py: GGUF * pyllmodel: print specific error message * backend: port BERT to GGUF * backend: port MPT to GGUF * backend: port Replit to GGUF * backend: use gguf branch of llama.cpp-mainline * backend: use llamamodel.cpp for StarCoder * conversion scripts: cleanup * convert scripts: load model as late as possible * convert_mpt_hf_to_gguf.py: better tokenizer decoding * backend: use llamamodel.cpp for Falcon * convert scripts: make them directly executable * fix references to removed model types * modellist: fix the system prompt * backend: port GPT-J to GGUF * gpt-j: update inference to match latest llama.cpp insights - Use F16 KV cache - Store transposed V in the cache - Avoid unnecessary Q copy Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> ggml upstream commit 0265f0813492602fec0e1159fe61de1bf0ccaf78 * chatllm: grammar fix * convert scripts: use bytes_to_unicode from transformers * convert scripts: make gptj script executable * convert scripts: add feed-forward length for better compatiblilty This GGUF key is used by all llama.cpp models with upstream support. * gptj: remove unused variables * Refactor for subgroups on mat * vec kernel. * Add q6_k kernels for vulkan. * python binding: print debug message to stderr * Fix regenerate button to be deterministic and bump the llama version to latest we have for gguf. * Bump to the latest fixes for vulkan in llama. * llamamodel: fix static vector in LLamaModel::endTokens * Switch to new models2.json for new gguf release and bump our version to 2.5.0. * Bump to latest llama/gguf branch. * chat: report reason for fallback to CPU * chat: make sure to clear fallback reason on success * more accurate fallback descriptions * differentiate between init failure and unsupported models * backend: do not use Vulkan with non-LLaMA models * Add q8_0 kernels to kompute shaders and bump to latest llama/gguf. * backend: fix build with Visual Studio generator Use the $<CONFIG> generator expression instead of CMAKE_BUILD_TYPE. This is needed because Visual Studio is a multi-configuration generator, so we do not know what the build type will be until `cmake --build` is called. Fixes #1470 * remove old llama.cpp submodules * Reorder and refresh our models2.json. * rebase on newer llama.cpp * python/embed4all: use gguf model, allow passing kwargs/overriding model * Add starcoder, rift and sbert to our models2.json. * Push a new version number for llmodel backend now that it is based on gguf. * fix stray comma in models2.json Signed-off-by: Aaron Miller <apage43@ninjawhale.com> * Speculative fix for build on mac. * chat: clearer CPU fallback messages * Fix crasher with an empty string for prompt template. * Update the language here to avoid misunderstanding. * added EM German Mistral Model * make codespell happy * issue template: remove "Related Components" section * cmake: install the GPT-J plugin (#1487) * Do not delete saved chats if we fail to serialize properly. * Restore state from text if necessary. * Another codespell attempted fix. * llmodel: do not call magic_match unless build variant is correct (#1488) * chatllm: do not write uninitialized data to stream (#1486) * mat*mat for q4_0, q8_0 * do not process prompts on gpu yet * python: support Path in GPT4All.__init__ (#1462) * llmodel: print an error if the CPU does not support AVX (#1499) * python bindings should be quiet by default * disable llama.cpp logging unless GPT4ALL_VERBOSE_LLAMACPP envvar is nonempty * make verbose flag for retrieve_model default false (but also be overridable via gpt4all constructor) should be able to run a basic test: ```python import gpt4all model = gpt4all.GPT4All('/Users/aaron/Downloads/rift-coder-v0-7b-q4_0.gguf') print(model.generate('def fib(n):')) ``` and see no non-model output when successful * python: always check status code of HTTP responses (#1502) * Always save chats to disk, but save them as text by default. This also changes the UI behavior to always open a 'New Chat' and setting it as current instead of setting a restored chat as current. This improves usability by not requiring the user to wait if they want to immediately start chatting. * Update README.md Signed-off-by: umarmnaq <102142660+umarmnaq@users.noreply.github.com> * fix embed4all filename https://discordapp.com/channels/1076964370942267462/1093558720690143283/1161778216462192692 Signed-off-by: Aaron Miller <apage43@ninjawhale.com> * Improves Java API signatures maintaining back compatibility * python: replace deprecated pkg_resources with importlib (#1505) * Updated chat wishlist (#1351) * q6k, q4_1 mat*mat * update mini-orca 3b to gguf2, license Signed-off-by: Aaron Miller <apage43@ninjawhale.com> * convert scripts: fix AutoConfig typo (#1512) * publish config https://docs.npmjs.com/cli/v9/configuring-npm/package-json#publishconfig (#1375) merge into my branch * fix appendBin * fix gpu not initializing first * sync up * progress, still wip on destructor * some detection work * untested dispose method * add js side of dispose * Update gpt4all-bindings/typescript/index.cc Co-authored-by: cebtenzzre <cebtenzzre@gmail.com> Signed-off-by: Jacob Nguyen <76754747+jacoobes@users.noreply.github.com> * Update gpt4all-bindings/typescript/index.cc Co-authored-by: cebtenzzre <cebtenzzre@gmail.com> Signed-off-by: Jacob Nguyen <76754747+jacoobes@users.noreply.github.com> * Update gpt4all-bindings/typescript/index.cc Co-authored-by: cebtenzzre <cebtenzzre@gmail.com> Signed-off-by: Jacob Nguyen <76754747+jacoobes@users.noreply.github.com> * Update gpt4all-bindings/typescript/src/gpt4all.d.ts Co-authored-by: cebtenzzre <cebtenzzre@gmail.com> Signed-off-by: Jacob Nguyen <76754747+jacoobes@users.noreply.github.com> * Update gpt4all-bindings/typescript/src/gpt4all.js Co-authored-by: cebtenzzre <cebtenzzre@gmail.com> Signed-off-by: Jacob Nguyen <76754747+jacoobes@users.noreply.github.com> * Update gpt4all-bindings/typescript/src/util.js Co-authored-by: cebtenzzre <cebtenzzre@gmail.com> Signed-off-by: Jacob Nguyen <76754747+jacoobes@users.noreply.github.com> * fix tests * fix circleci for nodejs * bump version --------- Signed-off-by: Aaron Miller <apage43@ninjawhale.com> Signed-off-by: umarmnaq <102142660+umarmnaq@users.noreply.github.com> Signed-off-by: Jacob Nguyen <76754747+jacoobes@users.noreply.github.com> Co-authored-by: Aaron Miller <apage43@ninjawhale.com> Co-authored-by: Adam Treat <treat.adam@gmail.com> Co-authored-by: Akarshan Biswas <akarshan.biswas@gmail.com> Co-authored-by: Cebtenzzre <cebtenzzre@gmail.com> Co-authored-by: Jan Philipp Harries <jpdus@users.noreply.github.com> Co-authored-by: umarmnaq <102142660+umarmnaq@users.noreply.github.com> Co-authored-by: Alex Soto <asotobu@gmail.com> Co-authored-by: niansa/tuxifan <tuxifan@posteo.de>
61 lines
1.8 KiB
C++
61 lines
1.8 KiB
C++
#include "prompt.h"
|
|
|
|
|
|
TsfnContext::TsfnContext(Napi::Env env, const PromptWorkContext& pc)
|
|
: deferred_(Napi::Promise::Deferred::New(env)), pc(pc) {
|
|
}
|
|
namespace {
|
|
static std::string *res;
|
|
}
|
|
|
|
bool response_callback(int32_t token_id, const char *response) {
|
|
*res += response;
|
|
return token_id != -1;
|
|
}
|
|
bool recalculate_callback (bool isrecalculating) {
|
|
return isrecalculating;
|
|
};
|
|
bool prompt_callback (int32_t tid) {
|
|
return true;
|
|
};
|
|
|
|
// The thread entry point. This takes as its arguments the specific
|
|
// threadsafe-function context created inside the main thread.
|
|
void threadEntry(TsfnContext* context) {
|
|
static std::mutex mtx;
|
|
std::lock_guard<std::mutex> lock(mtx);
|
|
res = &context->pc.res;
|
|
// Perform a call into JavaScript.
|
|
napi_status status =
|
|
context->tsfn.BlockingCall(&context->pc,
|
|
[](Napi::Env env, Napi::Function jsCallback, PromptWorkContext* pc) {
|
|
llmodel_prompt(
|
|
pc->inference_,
|
|
pc->question.c_str(),
|
|
&prompt_callback,
|
|
&response_callback,
|
|
&recalculate_callback,
|
|
&pc->prompt_params
|
|
);
|
|
});
|
|
|
|
if (status != napi_ok) {
|
|
Napi::Error::Fatal(
|
|
"ThreadEntry",
|
|
"Napi::ThreadSafeNapi::Function.NonBlockingCall() failed");
|
|
}
|
|
// Release the thread-safe function. This decrements the internal thread
|
|
// count, and will perform finalization since the count will reach 0.
|
|
context->tsfn.Release();
|
|
}
|
|
|
|
void FinalizerCallback(Napi::Env env,
|
|
void* finalizeData,
|
|
TsfnContext* context) {
|
|
// Resolve the Promise previously returned to JS
|
|
context->deferred_.Resolve(Napi::String::New(env, context->pc.res));
|
|
// Wait for the thread to finish executing before proceeding.
|
|
context->nativeThread.join();
|
|
delete context;
|
|
}
|