AI/gpt4all

mirror of https://github.com/nomic-ai/gpt4all.git synced 2024-09-19 23:35:41 +00:00

Latest commit: a602f7fde7 typescript bindings maintenance (#2363), Andreas Obersteiner, 2024-06-03 11:12:55 -05:00

scripts	typescript bindings maintenance (#2363 )	2024-06-03 11:12:55 -05:00
spec	typescript bindings maintenance (#2363 )	2024-06-03 11:12:55 -05:00
src	typescript bindings maintenance (#2363 )	2024-06-03 11:12:55 -05:00
test	typescript bindings maintenance (#2363 )	2024-06-03 11:12:55 -05:00
.clang-format	typescript!: chatSessions, fixes, tokenStreams (#2045 )	2024-03-28 12:08:23 -04:00
.gitignore	typescript bindings maintenance (#2363 )	2024-06-03 11:12:55 -05:00
.npmignore	typescript bindings maintenance (#2363 )	2024-06-03 11:12:55 -05:00
.yarnrc.yml	vulkan support for typescript bindings, gguf support (#1390 )	2023-11-01 14:38:58 -05:00
binding.gyp	typescript bindings maintenance (#2363 )	2024-06-03 11:12:55 -05:00
index.cc	typescript bindings maintenance (#2363 )	2024-06-03 11:12:55 -05:00
index.h	typescript bindings maintenance (#2363 )	2024-06-03 11:12:55 -05:00
package.json	typescript bindings maintenance (#2363 )	2024-06-03 11:12:55 -05:00
prompt.cc	typescript bindings maintenance (#2363 )	2024-06-03 11:12:55 -05:00
prompt.h	typescript!: chatSessions, fixes, tokenStreams (#2045 )	2024-03-28 12:08:23 -04:00
README.md	typescript bindings maintenance (#2363 )	2024-06-03 11:12:55 -05:00
yarn.lock	typescript bindings maintenance (#2363 )	2024-06-03 11:12:55 -05:00

README.md

GPT4All Node.js API

Native Node.js LLM bindings for all.

yarn add gpt4all@latest

npm install gpt4all@latest

pnpm install gpt4all@latest

Breaking changes in version 4!!

See Transition

See API Reference
See Examples
See Developing
GPT4ALL nodejs bindings created by jacoobes, limez and the nomic ai community, for all to use.
spare change for a college student? 🤑

API Examples

Chat Completion

Use a chat session to keep context between completions. This is useful for efficient back-and-forth conversations.

import { createCompletion, loadModel } from "../src/gpt4all.js";

const model = await loadModel("orca-mini-3b-gguf2-q4_0.gguf", {
    verbose: true, // logs loaded model configuration
    device: "gpu", // defaults to 'cpu'
    nCtx: 2048, // maximum context window size for the session
});

// initialize a chat session on the model. a model instance can have only one chat session at a time.
const chat = await model.createChatSession({
    // any completion options set here will be used as default for all completions in this chat session
    temperature: 0.8,
    // a custom systemPrompt can be set here. note that the template depends on the model.
    // if unset, the systemPrompt that comes with the model will be used.
    systemPrompt: "### System:\nYou are an advanced mathematician.\n\n",
});

// create a completion using a string as input
const res1 = await createCompletion(chat, "What is 1 + 1?");
console.debug(res1.choices[0].message);

// multiple messages can be input to the conversation at once.
// note that if the last message is not of role 'user', an empty message will be returned.
await createCompletion(chat, [
    {
        role: "user",
        content: "What is 2 + 2?",
    },
    {
        role: "assistant",
        content: "It's 5.",
    },
]);

const res3 = await createCompletion(chat, "Could you recalculate that?");
console.debug(res3.choices[0].message);

model.dispose();
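
The example above logs `res.choices[0].message` as a whole object. Here is a minimal sketch for pulling out only the reply text. It assumes the returned message has the same `{ role, content }` shape as the input messages, which the source does not state explicitly. The helper name `replyText` is hypothetical and not part of gpt4all.

```javascript
// Hypothetical helper, not part of the gpt4all package.
// Assumes a completion result shaped like:
//   { choices: [{ message: { role, content } }] }
function replyText(result) {
    const message = result?.choices?.[0]?.message;
    // Fall back to an empty string when no message (or no text content) came back.
    return typeof message?.content === "string" ? message.content : "";
}

// Usage with a hand-written stand-in for a real completion result:
const fakeResult = {
    choices: [{ message: { role: "assistant", content: "1 + 1 = 2" } }],
};
console.log(replyText(fakeResult)); // prints "1 + 1 = 2"
```

Returning an empty string instead of throwing keeps the caller simple in the case noted above, where a conversation that does not end with a 'user' message yields an empty message.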

Stateless usage

You can use the model without a chat session. This is useful for one-off completions.

import { createCompletion, loadModel } from "../src/gpt4all.js";

const model = await loadModel("orca-mini-3b-gguf2-q4_0.gguf");

// createCompletion methods can also be used on the model directly.
// context is not maintained between completions.
const res1 = await createCompletion(model, "What is 1 + 1?");
console.debug(res1.choices[0].message);

// a whole conversation can be input as well.
// note that if the last message is not of role 'user', an error will be thrown.
const res2 = await createCompletion(model, [
    {
        role: "user",
        content: "What is 2 + 2?",
    },
    {
        role: "assistant",
        content: "It's 5.",
    },
    {
        role: "user",
        content: "Could you recalculate that?",
    },
]);
console.debug(res2.choices[0].message);

model.dispose();
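
Because a conversation that does not end with a 'user' message is rejected in stateless mode, it can help to check the array before sending it. The following is a sketch of such a pre-flight check. `endsWithUserMessage` is a hypothetical helper, not a gpt4all function.

```javascript
// Hypothetical pre-flight check, not part of the gpt4all package.
// Returns true only when the conversation is a non-empty array whose
// last entry has role "user".
function endsWithUserMessage(messages) {
    if (!Array.isArray(messages) || messages.length === 0) return false;
    return messages[messages.length - 1]?.role === "user";
}

const conversation = [
    { role: "user", content: "What is 2 + 2?" },
    { role: "assistant", content: "It's 5." },
    { role: "user", content: "Could you recalculate that?" },
];

console.log(endsWithUserMessage(conversation)); // prints true
console.log(endsWithUserMessage(conversation.slice(0, 2))); // prints false
```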

Embedding

import { loadModel, createEmbedding } from '../src/gpt4all.js'

const embedder = await loadModel("nomic-embed-text-v1.5.f16.gguf", { verbose: true, type: 'embedding'})

console.log(createEmbedding(embedder, "Maybe Minecraft was the friends we made along the way"));
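
A common next step with embeddings is comparing two of them. The source does not show this, so the following is only a sketch: a plain cosine similarity over two numeric vectors such as `Float32Array`s. How the vector is read out of the object returned by `createEmbedding` is deliberately left out, and `cosineSimilarity` is a hypothetical helper, not a gpt4all function.

```javascript
// Hypothetical helper, not part of the gpt4all package.
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
    if (a.length !== b.length) {
        throw new RangeError("vectors must have the same length");
    }
    let dot = 0;
    let normA = 0;
    let normB = 0;
    for (let i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    // A zero vector has no direction; treat its similarity as 0.
    if (normA === 0 || normB === 0) return 0;
    return dot / Math.sqrt(normA * normB);
}

// Stand-in vectors; real embeddings would come from the embedding model.
const x = Float32Array.from([1, 0, 0]);
const y = Float32Array.from([0, 1, 0]);

console.log(cosineSimilarity(x, x)); // prints 1
console.log(cosineSimilarity(x, y)); // prints 0
```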

Streaming responses

import { loadModel, createCompletionStream } from "../src/gpt4all.js";

const model = await loadModel("mistral-7b-openorca.gguf2.Q4_0.gguf", {
    device: "gpu",
});

process.stdout.write("Output: ");
const stream = createCompletionStream(model, "How are you?");
stream.tokens.on("data", (data) => {
    process.stdout.write(data);
});
// Wait until the stream finishes. We cannot continue until this one is done.
await stream.result;
process.stdout.write("\n");
model.dispose();
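
If the streamed tokens are needed as one string rather than written to stdout, the "data" events can be accumulated. The sketch below assumes `stream.tokens` also emits "end" and "error" events like a regular Node.js readable stream, which the source does not confirm. `collectTokens` is a hypothetical helper.

```javascript
// Hypothetical helper, not part of the gpt4all package.
// Gathers every "data" chunk of an event-emitting token stream into one string.
// Assumption: the stream emits "end" when finished and "error" on failure.
function collectTokens(tokenStream) {
    return new Promise((resolve, reject) => {
        let text = "";
        tokenStream.on("data", (chunk) => {
            text += chunk;
        });
        tokenStream.on("end", () => resolve(text));
        tokenStream.on("error", reject);
    });
}

// Intended usage inside the streaming example (not executed here):
//   const stream = createCompletionStream(model, "How are you?");
//   const text = await collectTokens(stream.tokens);
//   await stream.result;
```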

Async Generators

import { loadModel, createCompletionGenerator } from "../src/gpt4all.js";

const model = await loadModel("mistral-7b-openorca.gguf2.Q4_0.gguf");

process.stdout.write("Output: ");
const gen = createCompletionGenerator(
    model,
    "Redstone in Minecraft is Turing Complete. Let that sink in. (let it in!)"
);
for await (const chunk of gen) {
    process.stdout.write(chunk);
}

process.stdout.write("\n");
model.dispose();
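
One advantage of the generator form is that the consumer controls the loop and can stop early. The sketch below reads chunks from any async iterable and stops once a character budget is reached. `collectUpTo` and `fakeChunks` are hypothetical names. Whether breaking out of the loop also halts generation in the native backend is not stated in the source.

```javascript
// Hypothetical helper, not part of the gpt4all package.
// Reads chunks from any async iterable and stops once maxChars is reached.
async function collectUpTo(chunks, maxChars) {
    let text = "";
    for await (const chunk of chunks) {
        text += chunk;
        if (text.length >= maxChars) break; // stop consuming further chunks
    }
    return text.slice(0, maxChars);
}

// Stand-in for createCompletionGenerator(model, prompt).
async function* fakeChunks() {
    yield "Red";
    yield "stone ";
    yield "is ";
    yield "fun";
}

collectUpTo(fakeChunks(), 9).then((text) => console.log(text)); // prints "Redstone "
```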

Offline usage

Do this before going offline:

curl -L https://gpt4all.io/models/models3.json -o ./models3.json

import { createCompletion, loadModel } from 'gpt4all'

// make sure you have downloaded the models before going offline!
const model = await loadModel('mistral-7b-openorca.gguf2.Q4_0.gguf', {
    verbose: true,
    device: 'gpu',
    modelConfigFile: "./models3.json"
});

await createCompletion(model, 'What is 1 + 1?', { verbose: true })

model.dispose();
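
Before disconnecting, it can be worth confirming that the downloaded model list actually mentions the model you plan to load. The sketch below works on already-parsed JSON. It assumes that models3.json parses to an array of objects that each carry a `filename` field, which is an assumption about the file layout rather than something the source states. `findModelEntry` is a hypothetical helper.

```javascript
// Hypothetical helper, not part of the gpt4all package.
// Assumption: the parsed models3.json is an array of objects with a `filename` field.
function findModelEntry(modelList, filename) {
    if (!Array.isArray(modelList)) return undefined;
    return modelList.find((entry) => entry && entry.filename === filename);
}

// Hand-written stand-in for JSON.parse(<contents of ./models3.json>).
const sampleList = [
    { filename: "mistral-7b-openorca.gguf2.Q4_0.gguf" },
    { filename: "orca-mini-3b-gguf2-q4_0.gguf" },
];

console.log(findModelEntry(sampleList, "orca-mini-3b-gguf2-q4_0.gguf") !== undefined); // prints true
console.log(findModelEntry(sampleList, "missing.gguf") !== undefined); // prints false
```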

Develop

Build Instructions

binding.gyp is the compile configuration.
Tested on Ubuntu. Everything seems to work fine.
Tested on Windows. Everything works fine.
Sparse testing on macOS.
The MinGW script can build the gpt4all-backend, and we left it in place just in case. However, this package only works with MSVC-built DLLs.

Requirements

git
node.js >= 18.0.0
yarn
node-gyp
- all of its requirements.
(unix) gcc version 12
(win) msvc version 143
- Can be obtained with visual studio 2022 build tools
python 3
On Windows and Linux, building GPT4All requires the complete Vulkan SDK. You may download it from here: https://vulkan.lunarg.com/sdk/home
macOS users do not need Vulkan, as GPT4All will use Metal instead.
CUDA Toolkit >= 11.4 (you can bypass this by adding a custom flag to the build step)
- Windows: Compiling with CUDA is difficult if the Visual Studio IDE is NOT present.
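
The Node.js requirement above can be checked from a script before starting a build. This is a minimal sketch. `meetsMinimumNode` is a hypothetical helper, and it only compares the major version, which is sufficient for a ">= 18.0.0" requirement.

```javascript
// Hypothetical helper, not part of the gpt4all package.
// Checks a Node.js version string such as "v18.17.1" against a minimum major version.
function meetsMinimumNode(version, minimumMajor = 18) {
    const match = /^v?(\d+)\./.exec(version);
    if (!match) return false; // unparseable version strings are rejected
    return Number(match[1]) >= minimumMajor;
}

console.log(meetsMinimumNode("v18.0.0")); // prints true
console.log(meetsMinimumNode("v16.20.2")); // prints false

// In a real script, pass the running interpreter's version:
//   meetsMinimumNode(process.version)
```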

Build (from source)

git clone https://github.com/nomic-ai/gpt4all.git
cd gpt4all/gpt4all-bindings/typescript

The llama.cpp git submodule for gpt4all may be absent or outdated. Make sure to run

git submodule update --init --recursive

The shell commands below assume the current working directory is typescript.

Using yarn

yarn install
yarn build

Using npm

npm install
npm run build

The build:runtimes script will create runtime libraries for your platform in runtimes and build:prebuilds will create the bindings in prebuilds. build is a shortcut for both.

Test

yarn test

Source Overview

src/

Helper functions that improve the developer experience
Typings for the native Node addon
The JavaScript interface

test/

Simple unit tests for some of the exported functions.
More advanced AI testing is not handled.

spec/

Examples showing the typical look and feel of the API.
They should work, assuming a model and the libraries are installed locally in the working directory.

index.cc

The bridge between Node.js and C. This is where the bindings live.

prompt.cc

Handling prompting and inference of models in a threadsafe, asynchronous way.

Known Issues

Your model may be spewing bull 💩.
- The downloaded model is broken (just reinstall it or download it from the official site).
Your model is hanging after a call to generate tokens.
- Is nPast set too high? This may cause your model to hang (observed 03/16/2024 on Linux Mint and Ubuntu 22.04).
Your GPU usage is still high after node.js exits.
- Make sure to call model.dispose()!!!
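
One way to make sure `dispose()` is never skipped, even when a completion throws, is to wrap model usage in `try`/`finally`. The sketch below takes the loader and the work as callbacks so that it does not depend on any particular gpt4all call. `withModel` is a hypothetical helper, not part of the package.

```javascript
// Hypothetical helper, not part of the gpt4all package.
// Loads a model, runs the given work, and always disposes the model afterwards.
async function withModel(loadFn, useFn) {
    const model = await loadFn();
    try {
        return await useFn(model);
    } finally {
        // Runs on success and on error, so resources are always released.
        model.dispose();
    }
}

// Intended usage (not executed here):
//   const res = await withModel(
//       () => loadModel("orca-mini-3b-gguf2-q4_0.gguf"),
//       (model) => createCompletion(model, "What is 1 + 1?")
//   );
```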

Roadmap

This package has been stabilizing over the course of development, and breaking changes may happen until the API stabilizes. Here is the todo list:

- [x] [Purely offline](#Offline-usage). Per the gui, which can be run completely offline, the bindings should be as well.
- [ ] NPM bundle size reduction via optionalDependencies strategy (need help)
    - Should include prebuilds to avoid painful node-gyp errors
- [x] createChatSession (the python equivalent to create_chat_session)
- [x] generateTokens, the new name for createTokenStream. As of 3.2.0, this is released but not 100% tested. Check spec/generator.mjs!
- [x] ~~createTokenStream, an async iterator that streams each token emitted from the model. Planning on following this [example](https://github.com/nodejs/node-addon-examples/tree/main/threadsafe-async-iterator)~~ May not implement unless someone else can complete
- [x] prompt models via a threadsafe function in order to have proper non blocking behavior in nodejs
    - [x] generateTokens is the new name for this^
- [x] proper unit testing (integrate with circle ci)
- [x] publish to npm under alpha tag `gpt4all@alpha`
- [x] have more people test on other platforms (mac tester needed)
- [x] switch to new pluggable backend

Changes

This repository serves as the new bindings for nodejs users.

If you were a user of the previous bindings, they are outdated.
Version 4 includes the following breaking changes:
- createEmbedding & EmbeddingModel.embed() return an object, EmbeddingResult, instead of a Float32Array.
- Removed deprecated types ModelType and ModelFile
- Removed deprecated initiation of model by string path only
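
Code that must work across the embedding change can normalize both return shapes in one place. The sketch below is only illustrative: the property name `embeddings` on the version 4 result object is an assumption and is not given in the text above, and `embeddingVector` is a hypothetical helper.

```javascript
// Hypothetical migration helper, not part of the gpt4all package.
// Assumption: the version 4 result object exposes the vector as `embeddings`.
function embeddingVector(result) {
    // Pre-version-4 shape: the vector itself.
    if (result instanceof Float32Array) return result;
    // Assumed version 4 shape: an object wrapping the vector.
    if (result && result.embeddings instanceof Float32Array) {
        return result.embeddings;
    }
    throw new TypeError("unrecognized embedding result");
}

// Stand-ins for the two return shapes:
const oldStyle = Float32Array.from([0.1, 0.2]);
const newStyle = { embeddings: Float32Array.from([0.1, 0.2]) };

console.log(embeddingVector(oldStyle).length); // prints 2
console.log(embeddingVector(newStyle).length); // prints 2
```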