Mirror of https://github.com/nomic-ai/gpt4all.git
typescript: publish alpha on npm and lots of cleanup, documentation, and more (#913)
* fix typo so padding can be accessed * Small cleanups for settings dialog. * Fix the build. * localdocs * Fixup the rescan. Fix debug output. * Add remove folder implementation. * Remove this signal as unnecessary for now. * Cleanup of the database, better chunking, better matching. * Add new reverse prompt for new localdocs context feature. * Add a new muted text color. * Turn off the debugging messages by default. * Add prompt processing and localdocs to the busy indicator in UI. * Specify a large number of suffixes we will search for now. * Add a collection list to support a UI. * Add a localdocs tab. * Start fleshing out the localdocs ui. * Begin implementing the localdocs ui in earnest. * Clean up the settings dialog for localdocs a bit. * Add more of the UI for selecting collections for chats. * Complete the settings for localdocs. * Adds the collections to serialize and implement references for localdocs. * Store the references separately so they are not sent to datalake. * Add context link to references. * Don't use the full path in reference text. * Various fixes to remove unnecessary warnings. * Add a newline * ignore rider and vscode dirs * create test project and basic model loading tests * make sample print usage and cleaner * Get the backend as well as the client building/working with msvc. * Libraries named differently on msvc. * Bump the version number. * This time remember to bump the version right after a release. * rm redundant json * More precise condition * Nicer handling of missing model directory. Correct exception message. * Log where the model was found * Concise model matching * reduce nesting, better error reporting * convert to f-strings * less magic number * 1. Cleanup the interrupted download 2. with-syntax * Redundant else * Do not ignore explicitly passed 4 threads * Correct return type * Add optional verbosity * Correct indentation of the multiline error message * one funcion to append .bin suffix * hotfix default verbose optioin * export hidden types and fix prompt() type * tiny typo (#739) * Update README.md (#738) * Update README.md fix golang gpt4all import path Signed-off-by: Nandakumar <nandagunasekaran@gmail.com> * Update README.md Signed-off-by: Nandakumar <nandagunasekaran@gmail.com> --------- Signed-off-by: Nandakumar <nandagunasekaran@gmail.com> * fix(training instructions): model repo name (#728) Signed-off-by: Chase McDougall <chasemcdougall@hotmail.com> * C# Bindings - Prompt formatting (#712) * Added support for custom prompt formatting * more docs added * bump version * clean up cc files and revert things * LocalDocs documentation initial (#761) * LocalDocs documentation initial * Improved localdocs documentation (#762) * Improved localdocs documentation * Improved localdocs documentation * Improved localdocs documentation * Improved localdocs documentation * New tokenizer implementation for MPT and GPT-J Improves output quality by making these tokenizers more closely match the behavior of the huggingface `tokenizers` based BPE tokenizers these models were trained with. Featuring: * Fixed unicode handling (via ICU) * Fixed BPE token merge handling * Complete added vocabulary handling * buf_ref.into() can be const now * add tokenizer readme w/ instructions for convert script * Revert "add tokenizer readme w/ instructions for convert script" This reverts commit9c15d1f83e
. * Revert "buf_ref.into() can be const now" This reverts commit840e011b75
. * Revert "New tokenizer implementation for MPT and GPT-J" This reverts commitee3469ba6c
. * Fix remove model from model download for regular models. * Fixed formatting of localdocs docs (#770) * construct and return the correct reponse when the request is a chat completion * chore: update typings to keep consistent with python api * progress, updating createCompletion to mirror py api * update spec, unfinished backend * prebuild binaries for package distribution using prebuildify/node-gyp-build * Get rid of blocking behavior for regenerate response. * Add a label to the model loading visual indicator. * Use the new MyButton for the regenerate response button. * Add a hover and pressed to the visual indication of MyButton. * Fix wording of this accessible description. * Some color and theme enhancements to make the UI contrast a bit better. * Make the comboboxes align in UI. * chore: update namespace and fix prompt bug * fix linux build * add roadmap * Fix offset of prompt/response icons for smaller text. * Dlopen backend 5 (#779) Major change to the backend that allows for pluggable versions of llama.cpp/ggml. This was squashed merged from dlopen_backend_5 where the history is preserved. * Add a custom busy indicator to further align look and feel across platforms. * Draw the indicator for combobox to ensure it looks the same on all platforms. * Fix warning. * Use the proper text color for sending messages. * Fixup the plus new chat button. * Make all the toolbuttons highlight on hover. * Advanced avxonly autodetection (#744) * Advanced avxonly requirement detection * chore: support llamaversion >= 3 and ggml default * Dlopen better implementation management (Version 2) * Add fixme's and clean up a bit. * Documentation improvements on LocalDocs (#790) * Update gpt4all_chat.md Signed-off-by: Andriy Mulyar <andriy.mulyar@gmail.com> * typo Signed-off-by: Andriy Mulyar <andriy.mulyar@gmail.com> --------- Signed-off-by: Andriy Mulyar <andriy.mulyar@gmail.com> * Adapt code * Makefile changes (WIP to test) * Debug * Adapt makefile * Style * Implemented logging mechanism (#785) * Cleaned up implementation management (#787) * Cleaned up implementation management * Initialize LLModel::m_implementation to nullptr * llmodel.h: Moved dlhandle fwd declare above LLModel class * Fix compile * Fixed double-free in LLModel::Implementation destructor * Allow user to specify custom search path via $GPT4ALL_IMPLEMENTATIONS_PATH (#789) * Drop leftover include * Add ldl in gpt4all.go for dynamic linking (#797) * Logger should also output to stderr * Fix MSVC Build, Update C# Binding Scripts * Update gpt4all_chat.md (#800) Signed-off-by: Andriy Mulyar <andriy.mulyar@gmail.com> * C# Bindings - improved logging (#714) * added optional support for .NET logging * bump version and add missing alpha suffix * avoid creating additional namespace for extensions * prefer NullLogger/NullLoggerFactory over null-conditional ILogger to avoid errors --------- Signed-off-by: mvenditto <venditto.matteo@gmail.com> * Make localdocs work with server mode. * Better name for database results. * Fix for stale references after we regenerate. * Don't hardcode these. * Fix bug with resetting context with chatgpt model. * Trying to shrink the copy+paste code and do more code sharing between backend model impl. * Remove this as it is no longer useful. * Try and fix build on mac. * Fix mac build again. 
* Add models/release.json to github repo to allow PRs * Fixed spelling error in models.json to make CI happy Signed-off-by: niansa/tuxifan <tuxifan@posteo.de> * updated bindings code for updated C api * load all model libs * model creation is failing... debugging * load libs correctly * fixed finding model libs * cleanup * cleanup * more cleanup * small typo fix * updated binding.gyp * Fixed model type for GPT-J (#815) Signed-off-by: niansa/tuxifan <tuxifan@posteo.de> * Fixed tons of warnings and clazy findings (#811) * Some tweaks to UI to make window resizing smooth and flow nicely. * Min constraints on about dialog. * Prevent flashing of white on resize. * Actually use the theme dark color for window background. * Add the ability to change the directory via text field not just 'browse' button. * add scripts to build dlls * markdown doc gen * add scripts, nearly done moving breaking changes * merge with main * oops, fixed comment * more meaningful name * leave for testing * Only default mlock on macOS where swap seems to be a problem Repeating the change that once was done in https://github.com/nomic-ai/gpt4all/pull/663 but then was overriden by9c6c09cbd2
Signed-off-by: Peter Gagarinov <pgagarinov@users.noreply.github.com> * Add a collection immediately and show a placeholder + busy indicator in localdocs settings. * some tweaks to optional types and defaults * mingw script for windows compilation * Update README.md huggingface -> Hugging Face Signed-off-by: Ikko Eltociear Ashimine <eltociear@gmail.com> * Backend prompt dedup (#822) * Deduplicated prompt() function code * Better error handling when the model fails to load. * We no longer have an avx_only repository and better error handling for minimum hardware requirements. (#833) * Update build_and_run.md (#834) Signed-off-by: AT <manyoso@users.noreply.github.com> * Trying out a new feature to download directly from huggingface. * Try again with the url. * Allow for download of models hosted on third party hosts. * Fix up for newer models on reset context. This fixes the model from totally failing after a reset context. * Update to latest llama.cpp * Remove older models that are not as popular. (#837) * Remove older models that are not as popular. * Update models.json Signed-off-by: Andriy Mulyar <andriy.mulyar@gmail.com> --------- Signed-off-by: Andriy Mulyar <andriy.mulyar@gmail.com> Co-authored-by: Andriy Mulyar <andriy.mulyar@gmail.com> * Update models.json (#838) Signed-off-by: Andriy Mulyar <andriy.mulyar@gmail.com> * Update models.json Signed-off-by: Andriy Mulyar <andriy.mulyar@gmail.com> * feat: finalyl compiled on windows (MSVC) goadman * update README and spec and promisfy createCompletion * update d.ts * Make installers work with mac/windows for big backend change. * Need this so the linux installer packages it as a dependency. * Try and fix mac. * Fix compile on mac. * These need to be installed for them to be packaged and work for both mac and windows. * Fix installers for windows and linux. * Fix symbol resolution on windows. * updated pypi version * Release notes for version 2.4.5 (#853) * Update README.md (#854) Signed-off-by: AT <manyoso@users.noreply.github.com> * Documentation for model sideloading (#851) * Documentation for model sideloading Signed-off-by: Andriy Mulyar <andriy.mulyar@gmail.com> * Update gpt4all_chat.md Signed-off-by: Andriy Mulyar <andriy.mulyar@gmail.com> --------- Signed-off-by: Andriy Mulyar <andriy.mulyar@gmail.com> * Speculative fix for windows llama models with installer. * Revert "Speculative fix for windows llama models with installer." This reverts commitadd725d1eb
. * Revert "Fix bug with resetting context with chatgpt model." (#859) This reverts commite0dcf6a14f
. * Fix llama models on linux and windows. * Bump the version. * New release notes * Set thread counts after loading model (#836) * Update gpt4all_faq.md (#861) Signed-off-by: Andriy Mulyar <andriy.mulyar@gmail.com> * Supports downloading officially supported models not hosted on gpt4all R2 * Replit Model (#713) * porting over replit code model to gpt4all * replaced memory with kv_self struct * continuing debug * welp it built but lot of sus things * working model loading and somewhat working generate.. need to format response? * revert back to semi working version * finally got rid of weird formatting * figured out problem is with python bindings - this is good to go for testing * addressing PR feedback * output refactor * fixed prompt reponse collection * cleanup * addressing PR comments * building replit backend with new ggmlver code * chatllm replit and clean python files * cleanup * updated replit to match new llmodel api * match llmodel api and change size_t to Token * resolve PR comments * replit model commit comment * Synced llama.cpp.cmake with upstream (#887) * Fix for windows. * fix: build script * Revert "Synced llama.cpp.cmake with upstream (#887)" This reverts commit5c5e10c1f5
. * Update README.md (#906) Add PyPI link and add clickable, more specific link to documentation Signed-off-by: Claudius Ellsel <claudius.ellsel@live.de> * Update CollectionsDialog.qml (#856) Phrasing for localdocs Signed-off-by: Andriy Mulyar <andriy.mulyar@gmail.com> * sampling: remove incorrect offset for n_vocab (#900) no effect, but avoids a *potential* bug later if we use actualVocabSize - which is for when a model has a larger embedding tensor/# of output logits than actually trained token to allow room for adding extras in finetuning - presently all of our models have had "placeholder" tokens in the vocab so this hasn't broken anything, but if the sizes did differ we want the equivalent of `logits[actualVocabSize:]` (the start point is unchanged), not `logits[-actualVocabSize:]` (this.) * non-llama: explicitly greedy sampling for temp<=0 (#901) copied directly from llama.cpp - without this temp=0.0 will just scale all the logits to infinity and give bad output * work on thread safety and cleaning up, adding object option * chore: cleanup tests and spec * refactor for object based startup * more docs * Circleci builds for Linux, Windows, and macOS for gpt4all-chat. * more docs * Synced llama.cpp.cmake with upstream * add lock file to ignore codespell * Move usage in Python bindings readme to own section (#907) Have own section for short usage example, as it is not specific to local build Signed-off-by: Claudius Ellsel <claudius.ellsel@live.de> * Always sync for circleci. * update models json with replit model * Forgot to bump. * Change the default values for generation in GUI * Removed double-static from variables in replit.cpp The anonymous namespace already makes it static. Signed-off-by: niansa/tuxifan <tuxifan@posteo.de> * Generator in Python Bindings - streaming yields tokens at a time (#895) * generator method * cleanup * bump version number for clarity * added replace in decode to avoid unicodedecode exception * revert back to _build_prompt * Do auto detection by default in C++ API Signed-off-by: niansa/tuxifan <tuxifan@posteo.de> * remove comment * add comments for index.h * chore: add new models and edit ignore files and documentation * llama on Metal (#885) Support latest llama with Metal --------- Co-authored-by: Adam Treat <adam@nomic.ai> Co-authored-by: niansa/tuxifan <tuxifan@posteo.de> * Revert "llama on Metal (#885)" This reverts commitb59ce1c6e7
. * add more readme stuff and debug info * spell * Metal+LLama take two (#929) Support latest llama with Metal --------- Co-authored-by: Adam Treat <adam@nomic.ai> Co-authored-by: niansa/tuxifan <tuxifan@posteo.de> * add prebuilts for windows * Add new solution for context links that does not force regular markdown (#938) in responses which is disruptive to code completions in responses. * add prettier * split out non llm related methods into util.js, add listModels method * add prebuild script for creating all platforms bindings at once * check in prebuild linux/so libs and allow distribution of napi prebuilds * apply autoformatter * move constants in config.js, add loadModel and retrieveModel methods * Clean up the context links a bit. * Don't interfere with selection. * Add code blocks and python syntax highlighting. * Spelling error. * Add c++/c highighting support. * Fix some bugs with bash syntax and add some C23 keywords. * Bugfixes for prompt syntax highlighting. * Try and fix a false positive from codespell. * When recalculating context we can't erase the BOS. * Fix Windows MSVC AVX builds - bug introduced in557c82b5ed
- currently getting: `warning C5102: ignoring invalid command-line macro definition '/arch:AVX2'` - solution is to use `_options(...)` not `_definitions(...)` * remove .so unneeded path --------- Signed-off-by: Nandakumar <nandagunasekaran@gmail.com> Signed-off-by: Chase McDougall <chasemcdougall@hotmail.com> Signed-off-by: Andriy Mulyar <andriy.mulyar@gmail.com> Signed-off-by: mvenditto <venditto.matteo@gmail.com> Signed-off-by: niansa/tuxifan <tuxifan@posteo.de> Signed-off-by: Peter Gagarinov <pgagarinov@users.noreply.github.com> Signed-off-by: Ikko Eltociear Ashimine <eltociear@gmail.com> Signed-off-by: AT <manyoso@users.noreply.github.com> Signed-off-by: Claudius Ellsel <claudius.ellsel@live.de> Co-authored-by: Justin Wang <justinwang46@gmail.com> Co-authored-by: Adam Treat <treat.adam@gmail.com> Co-authored-by: redthing1 <redthing1@alt.icu> Co-authored-by: Konstantin Gukov <gukkos@gmail.com> Co-authored-by: Richard Guo <richardg7890@gmail.com> Co-authored-by: Joseph Mearman <joseph@mearman.co.uk> Co-authored-by: Nandakumar <nandagunasekaran@gmail.com> Co-authored-by: Chase McDougall <chasemcdougall@hotmail.com> Co-authored-by: mvenditto <venditto.matteo@gmail.com> Co-authored-by: Andriy Mulyar <andriy.mulyar@gmail.com> Co-authored-by: Aaron Miller <apage43@ninjawhale.com> Co-authored-by: FoivosC <christoulakis.foivos@adlittle.com> Co-authored-by: limez <limez@protonmail.com> Co-authored-by: AT <manyoso@users.noreply.github.com> Co-authored-by: niansa/tuxifan <tuxifan@posteo.de> Co-authored-by: niansa <anton-sa@web.de> Co-authored-by: mudler <mudler@mocaccino.org> Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com> Co-authored-by: Tim Miller <innerlogic4321@gmail.com> Co-authored-by: Peter Gagarinov <pgagarinov@users.noreply.github.com> Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com> Co-authored-by: Claudius Ellsel <claudius.ellsel@live.de> Co-authored-by: pingpongching <golololologol02@gmail.com> Co-authored-by: Adam Treat <adam@nomic.ai> Co-authored-by: Cosmic Snow <cosmic-snow@mailfence.com>
This commit is contained in:
parent 44bf91855d
commit 8d53614444
@@ -1,3 +1,3 @@
 [codespell]
 ignore-words-list = blong, belong
-skip = .git,*.pdf,*.svg
+skip = .git,*.pdf,*.svg,*.lock
gpt4all-bindings/typescript/.gitignore (vendored)
@@ -1,2 +1,3 @@
 node_modules/
 build/
+prebuilds/
@@ -1,3 +1,4 @@
 test/
 spec/
+scripts/
 build
gpt4all-bindings/typescript/README.md
@@ -2,12 +2,32 @@
 The original [GPT4All typescript bindings](https://github.com/nomic-ai/gpt4all-ts) are now out of date.
 
 - created by [jacoobes](https://github.com/jacoobes) and [nomic ai](https://home.nomic.ai) :D, for all to use.
-- will maintain this repository when possible, new feature requests will be handled through nomic
 
+### Code (alpha)
+```js
+import { LLModel, createCompletion, DEFAULT_DIRECTORY, DEFAULT_LIBRARIES_DIRECTORY } from '../src/gpt4all.js'
+
+const ll = new LLModel({
+    model_name: 'ggml-vicuna-7b-1.1-q4_2.bin',
+    model_path: './',
+    library_path: DEFAULT_LIBRARIES_DIRECTORY
+});
+
+const response = await createCompletion(ll, [
+    { role : 'system', content: 'You are meant to be annoying and unhelpful.' },
+    { role : 'user', content: 'What is 1 + 1?' }
+]);
+
+```
+### API
+- The nodejs api has made strides to mirror the python api. It is not 100% mirrored, but many pieces of the api resemble its python counterpart.
+- [docs](./docs/api.md)
 ### Build Instructions
 
-- As of 05/21/2023, Tested on windows (MSVC) only. (somehow got it to work on MSVC 🤯)
+- As of 05/21/2023, Tested on windows (MSVC). (somehow got it to work on MSVC 🤯)
 - binding.gyp is compile config
+- Tested on Ubuntu. Everything seems to work fine
+- MingW works as well to build the gpt4all-backend. HOWEVER, this package works only with MSVC built dlls.
 
 ### Requirements
 - git
@@ -31,6 +51,15 @@ cd gpt4all-bindings/typescript
 ```sh
 git submodule update --init --depth 1 --recursive
 ```
+**AS OF NEW BACKEND** to build the backend,
+```sh
+yarn build:backend
+```
+This will build platform-dependent dynamic libraries, and will be located in runtimes/(platform)/native The only current way to use them is to put them in the current working directory of your application. That is, **WHEREVER YOU RUN YOUR NODE APPLICATION**
+- llama-xxxx.dll is required.
+- According to whatever model you are using, you'll need to select the proper model loader.
+- For example, if you running an Mosaic MPT model, you will need to select the mpt-(buildvariant).(dynamiclibrary)
+
 ### Test
 ```sh
 yarn test
@@ -48,9 +77,22 @@ yarn test
 
 #### spec/
 - Average look and feel of the api
-- Should work assuming a model is installed locally in working directory
+- Should work assuming a model and libraries are installed locally in working directory
 
 #### index.cc
 - The bridge between nodejs and c. Where the bindings are.
+#### prompt.cc
+- Handling prompting and inference of models in a threadsafe, asynchronous way.
+#### docs/
+- Autogenerated documentation using the script `yarn docs:build`
 
+### Roadmap
+This package is in active development, and breaking changes may happen until the api stabilizes. Here's what's the todo list:
+
+- [x] prompt models via a threadsafe function in order to have proper non blocking behavior in nodejs
+- [ ] createTokenStream, an async iterator that streams each token emitted from the model. Planning on following this [example](https://github.com/nodejs/node-addon-examples/tree/main/threadsafe-async-iterator)
+- [ ] proper unit testing (integrate with circle ci)
+- [ ] publish to npm under alpha tag `gpt4all@alpha`
+- [ ] have more people test on other platforms (mac tester needed)
+- [x] switch to new pluggable backend
 
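One of the roadmap items above is threadsafe, non-blocking prompting. A minimal sketch of what that buys in practice: a timer keeps firing while a completion is generated. The imports and the model file mirror the README example and are placeholders, not fixed requirements.

```js
import { LLModel, createCompletion, DEFAULT_LIBRARIES_DIRECTORY } from '../src/gpt4all.js'

const model = new LLModel({
  model_name: 'ggml-vicuna-7b-1.1-q4_2.bin', // placeholder model in the working directory
  model_path: './',
  library_path: DEFAULT_LIBRARIES_DIRECTORY,
})

// Because prompting runs off the main thread, the event loop stays responsive.
const heartbeat = setInterval(() => console.log('event loop is alive'), 500)

const completion = await createCompletion(model, [
  { role: 'user', content: 'Write one sentence about llamas.' },
])

clearInterval(heartbeat)
console.log(completion.choices[0].message.content)
```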
binding.gyp
@@ -1,45 +1,55 @@
 {
   "targets": [
     {
-      "target_name": "gpt4allts", # gpt4all-ts will cause compile error
-      "cflags!": [ "-fno-exceptions" ],
-      "cflags_cc!": [ "-fno-exceptions" ],
+      "target_name": "gpt4all", # gpt4all-ts will cause compile error
+      "cflags_cc!": [ "-fno-exceptions"],
       "include_dirs": [
         "<!@(node -p \"require('node-addon-api').include\")",
-        "../../gpt4all-backend/llama.cpp/", # need to include llama.cpp because the include paths for examples/common.h include llama.h relatively
         "../../gpt4all-backend",
       ],
-      "sources": [ # is there a better way to do this
-        "../../gpt4all-backend/llama.cpp/examples/common.cpp",
-        "../../gpt4all-backend/llama.cpp/ggml.c",
-        "../../gpt4all-backend/llama.cpp/llama.cpp",
-        "../../gpt4all-backend/utils.cpp",
+      "sources": [
+        # PREVIOUS VERSION: had to required the sources, but with newest changes do not need to
+        #"../../gpt4all-backend/llama.cpp/examples/common.cpp",
+        #"../../gpt4all-backend/llama.cpp/ggml.c",
+        #"../../gpt4all-backend/llama.cpp/llama.cpp",
+        # "../../gpt4all-backend/utils.cpp",
         "../../gpt4all-backend/llmodel_c.cpp",
-        "../../gpt4all-backend/gptj.cpp",
-        "../../gpt4all-backend/llamamodel.cpp",
-        "../../gpt4all-backend/mpt.cpp",
-        "stdcapture.cc",
+        "../../gpt4all-backend/llmodel.cpp",
+        "prompt.cc",
         "index.cc",
       ],
       "conditions": [
         ['OS=="mac"', {
           'defines': [
-            'NAPI_CPP_EXCEPTIONS'
-          ],
+            'LIB_FILE_EXT=".dylib"',
+            'NAPI_CPP_EXCEPTIONS',
+          ]
         }],
         ['OS=="win"', {
           'defines': [
+            'LIB_FILE_EXT=".dll"',
             'NAPI_CPP_EXCEPTIONS',
-            "__AVX2__" # allows SIMD: https://discord.com/channels/1076964370942267462/1092290790388150272/1107564673957630023
           ],
           "msvs_settings": {
             "VCCLCompilerTool": {
               "AdditionalOptions": [
                 "/std:c++20",
-                "/EHsc"
+                "/EHsc",
               ],
             },
           },
+        }],
+        ['OS=="linux"', {
+          'defines': [
+            'LIB_FILE_EXT=".so"',
+            'NAPI_CPP_EXCEPTIONS',
+          ],
+          'cflags_cc!': [
+            '-fno-rtti',
+          ],
+          'cflags_cc': [
+            '-std=c++20'
+          ]
         }]
       ]
     }]
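The `LIB_FILE_EXT` defines above encode the backend library suffix per platform. Expressed from the JavaScript side, the same mapping might look like the sketch below; the helper name and the base library name are made up for illustration (the README only promises `llama-xxxx.dll` / `mpt-(buildvariant)` style names).

```js
// Hypothetical helper mirroring the LIB_FILE_EXT defines in binding.gyp.
function backendLibraryName(base) {
  const ext = { win32: '.dll', darwin: '.dylib', linux: '.so' }[process.platform]
  if (!ext) throw new Error(`unsupported platform: ${process.platform}`)
  return `${base}${ext}`
}

console.log(backendLibraryName('llama-default')) // e.g. 'llama-default.dll' on Windows
```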
gpt4all-bindings/typescript/docs/api.md (new file, 623 lines)

<!-- Generated by documentation.js. Update this documentation by updating the source code. -->

### Table of Contents

* [download][1]
  * [Parameters][2]
  * [Examples][3]
* [DownloadOptions][4]
  * [location][5]
  * [debug][6]
  * [url][7]
* [DownloadController][8]
  * [cancel][9]
  * [promise][10]
* [ModelType][11]
* [ModelFile][12]
  * [gptj][13]
  * [llama][14]
  * [mpt][15]
* [type][16]
* [LLModel][17]
  * [constructor][18]
    * [Parameters][19]
  * [type][20]
  * [name][21]
  * [stateSize][22]
  * [threadCount][23]
  * [setThreadCount][24]
    * [Parameters][25]
  * [raw\_prompt][26]
    * [Parameters][27]
  * [isModelLoaded][28]
  * [setLibraryPath][29]
    * [Parameters][30]
  * [getLibraryPath][31]
* [createCompletion][32]
  * [Parameters][33]
  * [Examples][34]
* [CompletionOptions][35]
  * [verbose][36]
  * [hasDefaultHeader][37]
  * [hasDefaultFooter][38]
* [PromptMessage][39]
  * [role][40]
  * [content][41]
* [prompt\_tokens][42]
* [completion\_tokens][43]
* [total\_tokens][44]
* [CompletionReturn][45]
  * [model][46]
  * [usage][47]
  * [choices][48]
* [CompletionChoice][49]
  * [message][50]
* [LLModelPromptContext][51]
  * [logits\_size][52]
  * [tokens\_size][53]
  * [n\_past][54]
  * [n\_ctx][55]
  * [n\_predict][56]
  * [top\_k][57]
  * [top\_p][58]
  * [temp][59]
  * [n\_batch][60]
  * [repeat\_penalty][61]
  * [repeat\_last\_n][62]
  * [context\_erase][63]
* [createTokenStream][64]
  * [Parameters][65]
* [DEFAULT\_DIRECTORY][66]
* [DEFAULT\_LIBRARIES\_DIRECTORY][67]

## download

Initiates the download of a model file of a specific model type.
By default this downloads without waiting. use the controller returned to alter this behavior.

### Parameters

* `model` **[ModelFile][12]** The model file to be downloaded.
* `options` **[DownloadOptions][4]** to pass into the downloader. Default is { location: (cwd), debug: false }.

### Examples

```javascript
const controller = download('ggml-gpt4all-j-v1.3-groovy.bin')
controller.promise().then(() => console.log('Downloaded!'))
```

* Throws **[Error][68]** If the model already exists in the specified location.
* Throws **[Error][68]** If the model cannot be found at the specified url.

Returns **[DownloadController][8]** object that allows controlling the download process.

## DownloadOptions

Options for the model download process.

### location

location to download the model.
Default is process.cwd(), or the current working directory

Type: [string][69]

### debug

Debug mode -- check how long it took to download in seconds

Type: [boolean][70]

### url

Remote download url. Defaults to `https://gpt4all.io/models`

Type: [string][69]

## DownloadController

Model download controller.

### cancel

Cancel the request to download from gpt4all website if this is called.

Type: function (): void

### promise

Convert the downloader into a promise, allowing people to await and manage its lifetime

Type: function (): [Promise][71]\<void>
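The generated reference never shows `DownloadController` in action. Below is a minimal sketch using only the `cancel` and `promise` members documented above; the model filename, the timeout, and the bare `gpt4all` import specifier are illustrative assumptions.

```js
import { download } from 'gpt4all' // or '../src/gpt4all.js' inside this repo

// Kick off a download without awaiting it; per the docs this returns a DownloadController.
const controller = download('ggml-gpt4all-j-v1.3-groovy.bin', { debug: true })

// Illustrative safety valve: give up after 60 seconds.
const timer = setTimeout(() => controller.cancel(), 60_000)

try {
  await controller.promise() // resolves once the file has finished downloading
  console.log('Downloaded!')
} finally {
  clearTimeout(timer)
}
```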
## ModelType

Type of the model

Type: (`"gptj"` | `"llama"` | `"mpt"`)

## ModelFile

Full list of models available

### gptj

List of GPT-J Models

Type: (`"ggml-gpt4all-j-v1.3-groovy.bin"` | `"ggml-gpt4all-j-v1.2-jazzy.bin"` | `"ggml-gpt4all-j-v1.1-breezy.bin"` | `"ggml-gpt4all-j.bin"`)

### llama

List Llama Models

Type: (`"ggml-gpt4all-l13b-snoozy.bin"` | `"ggml-vicuna-7b-1.1-q4_2.bin"` | `"ggml-vicuna-13b-1.1-q4_2.bin"` | `"ggml-wizardLM-7B.q4_2.bin"` | `"ggml-stable-vicuna-13B.q4_2.bin"` | `"ggml-nous-gpt4-vicuna-13b.bin"`)

### mpt

List of MPT Models

Type: (`"ggml-mpt-7b-base.bin"` | `"ggml-mpt-7b-chat.bin"` | `"ggml-mpt-7b-instruct.bin"`)

## type

Model architecture. This argument currently does not have any functionality and is just used as descriptive identifier for user.

Type: [ModelType][11]

## LLModel

LLModel class representing a language model.
This is a base class that provides common functionality for different types of language models.

### constructor

Initialize a new LLModel.

#### Parameters

* `path` **[string][69]** Absolute path to the model file.

<!---->

* Throws **[Error][68]** If the model file does not exist.

### type

either 'gpt', mpt', or 'llama' or undefined

Returns **([ModelType][11] | [undefined][72])**

### name

The name of the model.

Returns **[ModelFile][12]**

### stateSize

Get the size of the internal state of the model.
NOTE: This state data is specific to the type of model you have created.

Returns **[number][73]** the size in bytes of the internal state of the model

### threadCount

Get the number of threads used for model inference.
The default is the number of physical cores your computer has.

Returns **[number][73]** The number of threads used for model inference.

### setThreadCount

Set the number of threads used for model inference.

#### Parameters

* `newNumber` **[number][73]** The new number of threads.

Returns **void**

### raw\_prompt

Prompt the model with a given input and optional parameters.
This is the raw output from std out.
Use the prompt function exported for a value

#### Parameters

* `q` **[string][69]** The prompt input.
* `params` **Partial<[LLModelPromptContext][51]>?** Optional parameters for the prompt context.

Returns **any** The result of the model prompt.

### isModelLoaded

Whether the model is loaded or not.

Returns **[boolean][70]**

### setLibraryPath

Where to search for the pluggable backend libraries

#### Parameters

* `s` **[string][69]**

Returns **void**

### getLibraryPath

Where to get the pluggable backend libraries

Returns **[string][69]**
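A short sketch tying the `LLModel` accessors above together. It assumes the object-style constructor shown in the package README; the model file name is a placeholder and the `gpt4all` import specifier is an assumption.

```js
import { LLModel, DEFAULT_LIBRARIES_DIRECTORY } from 'gpt4all'

const model = new LLModel({
  model_name: 'ggml-mpt-7b-chat.bin',        // placeholder model file inside model_path
  model_path: './',
  library_path: DEFAULT_LIBRARIES_DIRECTORY, // where the pluggable backends are searched
})

console.log(model.isModelLoaded())  // true once construction succeeded
console.log(model.type())           // e.g. 'mpt', or undefined if no model_type was given
console.log(model.name())           // 'ggml-mpt-7b-chat.bin'
console.log(model.stateSize())      // size of the internal model state, in bytes
model.setThreadCount(4)             // use 4 inference threads
console.log(model.threadCount())    // 4
console.log(model.getLibraryPath()) // current backend library search path
```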
## createCompletion

The nodejs equivalent to python binding's chat\_completion

### Parameters

* `llmodel` **[LLModel][17]** The language model object.
* `messages` **[Array][74]<[PromptMessage][39]>** The array of messages for the conversation.
* `options` **[CompletionOptions][35]** The options for creating the completion.

### Examples

```javascript
const llmodel = new LLModel(model)
const messages = [
  { role: 'system', message: 'You are a weather forecaster.' },
  { role: 'user', message: 'should i go out today?' } ]
const completion = await createCompletion(llmodel, messages, {
  verbose: true,
  temp: 0.9,
})
console.log(completion.choices[0].message.content)
// No, it's going to be cold and rainy.
```

Returns **[CompletionReturn][45]** The completion result.

## CompletionOptions

**Extends Partial\<LLModelPromptContext>**

The options for creating the completion.

### verbose

Indicates if verbose logging is enabled.

Type: [boolean][70]

### hasDefaultHeader

Indicates if the default header is included in the prompt.

Type: [boolean][70]

### hasDefaultFooter

Indicates if the default footer is included in the prompt.

Type: [boolean][70]
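Continuing the `createCompletion` example above, a hedged sketch of passing these options; the flag values are illustrative, not recommendations.

```js
const completion = await createCompletion(llmodel, messages, {
  verbose: true,            // log extra detail while generating
  hasDefaultHeader: false,  // leave out the default prompt header
  hasDefaultFooter: false,  // leave out the default prompt footer
  temp: 0.2,                // any LLModelPromptContext field is also accepted
})
```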
## PromptMessage

A message in the conversation, identical to OpenAI's chat message.

### role

The role of the message.

Type: (`"system"` | `"assistant"` | `"user"`)

### content

The message content.

Type: [string][69]

## prompt\_tokens

The number of tokens used in the prompt.

Type: [number][73]

## completion\_tokens

The number of tokens used in the completion.

Type: [number][73]

## total\_tokens

The total number of tokens used.

Type: [number][73]

## CompletionReturn

The result of the completion, similar to OpenAI's format.

### model

The model name.

Type: [ModelFile][12]

### usage

Token usage report.

Type: {prompt\_tokens: [number][73], completion\_tokens: [number][73], total\_tokens: [number][73]}

### choices

The generated completions.

Type: [Array][74]<[CompletionChoice][49]>
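To make the completion result shape concrete, a small sketch of reading the fields documented above (assuming `model` and `messages` from the earlier example):

```js
const result = await createCompletion(model, messages)

console.log(result.model)                   // model name, e.g. 'ggml-vicuna-7b-1.1-q4_2.bin'
console.log(result.usage.prompt_tokens)     // tokens consumed by the prompt
console.log(result.usage.completion_tokens) // tokens generated in the reply
console.log(result.usage.total_tokens)      // sum of the two

for (const choice of result.choices) {
  // each CompletionChoice carries a PromptMessage under `message`
  console.log(choice.message.role, choice.message.content)
}
```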
## CompletionChoice

A completion choice, similar to OpenAI's format.

### message

Response message

Type: [PromptMessage][39]

## LLModelPromptContext

Model inference arguments for generating completions.

### logits\_size

The size of the raw logits vector.

Type: [number][73]

### tokens\_size

The size of the raw tokens vector.

Type: [number][73]

### n\_past

The number of tokens in the past conversation.

Type: [number][73]

### n\_ctx

The number of tokens possible in the context window.

Type: [number][73]

### n\_predict

The number of tokens to predict.

Type: [number][73]

### top\_k

The top-k logits to sample from.

Type: [number][73]

### top\_p

The nucleus sampling probability threshold.

Type: [number][73]

### temp

The temperature to adjust the model's output distribution.

Type: [number][73]

### n\_batch

The number of predictions to generate in parallel.

Type: [number][73]

### repeat\_penalty

The penalty factor for repeated tokens.

Type: [number][73]

### repeat\_last\_n

The number of last tokens to penalize.

Type: [number][73]

### context\_erase

The percentage of context to erase if the context window is exceeded.

Type: [number][73]
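Because `CompletionOptions` extends `Partial<LLModelPromptContext>`, these inference arguments can be passed straight to `createCompletion`. The values below are illustrative only:

```js
const response = await createCompletion(model, messages, {
  n_predict: 128,       // cap the number of generated tokens
  n_ctx: 2048,          // context window size
  top_k: 40,            // sample from the 40 most likely tokens
  top_p: 0.9,           // nucleus sampling threshold
  temp: 0.7,            // output temperature
  n_batch: 8,           // prompt tokens processed in parallel
  repeat_penalty: 1.18, // penalty factor for repeated tokens
  repeat_last_n: 64,    // how many recent tokens the penalty considers
  context_erase: 0.5,   // fraction of context dropped when the window overflows
})
```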
## createTokenStream

TODO: Help wanted to implement this

### Parameters

* `llmodel` **[LLModel][17]**
* `messages` **[Array][74]<[PromptMessage][39]>**
* `options` **[CompletionOptions][35]**

Returns **function (ll: [LLModel][17]): AsyncGenerator<[string][69]>**
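`createTokenStream` is not implemented yet (see the TODO above). If it ends up matching the documented return type, consuming it could look like this purely hypothetical sketch:

```js
// Hypothetical usage: the function is still on the roadmap.
const stream = createTokenStream(model, messages, { n_predict: 256 })

// The documented return value is a function of an LLModel that yields tokens.
for await (const token of stream(model)) {
  process.stdout.write(token)
}
```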
## DEFAULT\_DIRECTORY

From python api:
models will be stored in (homedir)/.cache/gpt4all/\`

Type: [string][69]

## DEFAULT\_LIBRARIES\_DIRECTORY

From python api:
The default path for dynamic libraries to be stored.
You may separate paths by a semicolon to search in multiple areas.
This searches DEFAULT\_DIRECTORY/libraries, cwd/libraries, and finally cwd.

Type: [string][69]
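A small sketch of widening the backend search path with the semicolon convention described above; the extra runtimes folder is an illustrative path, not a fixed location, and the `gpt4all` import specifier is an assumption.

```js
import { LLModel, DEFAULT_LIBRARIES_DIRECTORY } from 'gpt4all'

// Look in a project-local runtimes folder first, then fall back to the defaults.
const librarySearchPath = ['./runtimes/linux-x64/native', DEFAULT_LIBRARIES_DIRECTORY].join(';')

const model = new LLModel({
  model_name: 'ggml-gpt4all-j-v1.3-groovy.bin',
  model_path: './',
  library_path: librarySearchPath,
})
```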
[1]: #download
[2]: #parameters
[3]: #examples
[4]: #downloadoptions
[5]: #location
[6]: #debug
[7]: #url
[8]: #downloadcontroller
[9]: #cancel
[10]: #promise
[11]: #modeltype
[12]: #modelfile
[13]: #gptj
[14]: #llama
[15]: #mpt
[16]: #type
[17]: #llmodel
[18]: #constructor
[19]: #parameters-1
[20]: #type-1
[21]: #name
[22]: #statesize
[23]: #threadcount
[24]: #setthreadcount
[25]: #parameters-2
[26]: #raw_prompt
[27]: #parameters-3
[28]: #ismodelloaded
[29]: #setlibrarypath
[30]: #parameters-4
[31]: #getlibrarypath
[32]: #createcompletion
[33]: #parameters-5
[34]: #examples-1
[35]: #completionoptions
[36]: #verbose
[37]: #hasdefaultheader
[38]: #hasdefaultfooter
[39]: #promptmessage
[40]: #role
[41]: #content
[42]: #prompt_tokens
[43]: #completion_tokens
[44]: #total_tokens
[45]: #completionreturn
[46]: #model
[47]: #usage
[48]: #choices
[49]: #completionchoice
[50]: #message
[51]: #llmodelpromptcontext
[52]: #logits_size
[53]: #tokens_size
[54]: #n_past
[55]: #n_ctx
[56]: #n_predict
[57]: #top_k
[58]: #top_p
[59]: #temp
[60]: #n_batch
[61]: #repeat_penalty
[62]: #repeat_last_n
[63]: #context_erase
[64]: #createtokenstream
[65]: #parameters-6
[66]: #default_directory
[67]: #default_libraries_directory
[68]: https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Error
[69]: https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String
[70]: https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Boolean
[71]: https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Promise
[72]: https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/undefined
[73]: https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Number
[74]: https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Array
index.cc
@@ -1,68 +1,95 @@
-#include <napi.h>
-#include <iostream>
-#include "llmodel_c.h"
-#include "llmodel.h"
-#include "gptj.h"
-#include "llamamodel.h"
-#include "mpt.h"
-#include "stdcapture.h"
+#include "index.h"
 
-class NodeModelWrapper : public Napi::ObjectWrap<NodeModelWrapper> {
-public:
-  static Napi::Object Init(Napi::Env env, Napi::Object exports) {
-    Napi::Function func = DefineClass(env, "LLModel", {
+Napi::FunctionReference NodeModelWrapper::constructor;
+
+Napi::Function NodeModelWrapper::GetClass(Napi::Env env) {
+  Napi::Function self = DefineClass(env, "LLModel", {
     InstanceMethod("type", &NodeModelWrapper::getType),
+    InstanceMethod("isModelLoaded", &NodeModelWrapper::IsModelLoaded),
     InstanceMethod("name", &NodeModelWrapper::getName),
     InstanceMethod("stateSize", &NodeModelWrapper::StateSize),
     InstanceMethod("raw_prompt", &NodeModelWrapper::Prompt),
     InstanceMethod("setThreadCount", &NodeModelWrapper::SetThreadCount),
     InstanceMethod("threadCount", &NodeModelWrapper::ThreadCount),
+    InstanceMethod("getLibraryPath", &NodeModelWrapper::GetLibraryPath),
   });
-    Napi::FunctionReference* constructor = new Napi::FunctionReference();
-    *constructor = Napi::Persistent(func);
-    env.SetInstanceData(constructor);
-    exports.Set("LLModel", func);
-    return exports;
+  // Keep a static reference to the constructor
+  //
+  constructor = Napi::Persistent(self);
+  constructor.SuppressDestruct();
+  return self;
 }
 
-  Napi::Value getType(const Napi::CallbackInfo& info)
+Napi::Value NodeModelWrapper::getType(const Napi::CallbackInfo& info)
 {
+  if(type.empty()) {
+    return info.Env().Undefined();
+  }
   return Napi::String::New(info.Env(), type);
 }
 
-  NodeModelWrapper(const Napi::CallbackInfo& info) : Napi::ObjectWrap<NodeModelWrapper>(info)
+NodeModelWrapper::NodeModelWrapper(const Napi::CallbackInfo& info) : Napi::ObjectWrap<NodeModelWrapper>(info)
 {
   auto env = info.Env();
-  std::string weights_path = info[0].As<Napi::String>().Utf8Value();
+  fs::path model_path;
 
-  const char *c_weights_path = weights_path.c_str();
-  inference_ = create_model_set_type(c_weights_path);
+  std::string full_weight_path;
+  //todo
+  std::string library_path = ".";
+  std::string model_name;
+  if(info[0].IsString()) {
+    model_path = info[0].As<Napi::String>().Utf8Value();
+    full_weight_path = model_path.string();
+    std::cout << "DEPRECATION: constructor accepts object now. Check docs for more.\n";
+  } else {
+    auto config_object = info[0].As<Napi::Object>();
+    model_name = config_object.Get("model_name").As<Napi::String>();
+    model_path = config_object.Get("model_path").As<Napi::String>().Utf8Value();
+    if(config_object.Has("model_type")) {
+      type = config_object.Get("model_type").As<Napi::String>();
+    }
+    full_weight_path = (model_path / fs::path(model_name)).string();
+
+    if(config_object.Has("library_path")) {
+      library_path = config_object.Get("library_path").As<Napi::String>();
+    } else {
+      library_path = ".";
+    }
+  }
+  llmodel_set_implementation_search_path(library_path.c_str());
+  llmodel_error* e = nullptr;
+  inference_ = std::make_shared<llmodel_model>(llmodel_model_create2(full_weight_path.c_str(), "auto", e));
+  if(e != nullptr) {
+    Napi::Error::New(env, e->message).ThrowAsJavaScriptException();
+    return;
+  }
+  if(GetInference() == nullptr) {
+    std::cerr << "Tried searching libraries in \"" << library_path << "\"" << std::endl;
+    std::cerr << "Tried searching for model weight in \"" << full_weight_path << "\"" << std::endl;
+    Napi::Error::New(env, "Had an issue creating llmodel object, inference is null").ThrowAsJavaScriptException();
+    return;
+  }
 
-  auto success = llmodel_loadModel(inference_, c_weights_path);
+  auto success = llmodel_loadModel(GetInference(), full_weight_path.c_str());
   if(!success) {
     Napi::Error::New(env, "Failed to load model at given path").ThrowAsJavaScriptException();
     return;
   }
-  name = weights_path.substr(weights_path.find_last_of("/\\") + 1);
+  name = model_name.empty() ? model_path.filename().string() : model_name;
 
 };
-  ~NodeModelWrapper() {
-    // destroying the model manually causes exit code 3221226505, why?
-    // However, bindings seem to operate fine without destructing pointer
-    //llmodel_model_destroy(inference_);
+//NodeModelWrapper::~NodeModelWrapper() {
+//GetInference().reset();
+//}
+
+Napi::Value NodeModelWrapper::IsModelLoaded(const Napi::CallbackInfo& info) {
+  return Napi::Boolean::New(info.Env(), llmodel_isModelLoaded(GetInference()));
 }
 
-  Napi::Value IsModelLoaded(const Napi::CallbackInfo& info) {
-    return Napi::Boolean::New(info.Env(), llmodel_isModelLoaded(inference_));
-  }
-
-  Napi::Value StateSize(const Napi::CallbackInfo& info) {
+Napi::Value NodeModelWrapper::StateSize(const Napi::CallbackInfo& info) {
   // Implement the binding for the stateSize method
-    return Napi::Number::New(info.Env(), static_cast<int64_t>(llmodel_get_state_size(inference_)));
+  return Napi::Number::New(info.Env(), static_cast<int64_t>(llmodel_get_state_size(GetInference())));
 }
 
 /**
  * Generate a response using the model.
@@ -73,16 +100,14 @@ public:
  * @param recalculate_callback A callback function for handling recalculation requests.
  * @param ctx A pointer to the llmodel_prompt_context structure.
  */
-  Napi::Value Prompt(const Napi::CallbackInfo& info) {
-
+Napi::Value NodeModelWrapper::Prompt(const Napi::CallbackInfo& info) {
   auto env = info.Env();
-
   std::string question;
   if(info[0].IsString()) {
     question = info[0].As<Napi::String>().Utf8Value();
   } else {
-    Napi::Error::New(env, "invalid string argument").ThrowAsJavaScriptException();
-    return env.Undefined();
+    Napi::Error::New(info.Env(), "invalid string argument").ThrowAsJavaScriptException();
+    return info.Env().Undefined();
   }
   //defaults copied from python bindings
   llmodel_prompt_context promptContext = {
@@ -101,127 +126,90 @@ public:
   };
   if(info[1].IsObject())
   {
     auto inputObject = info[1].As<Napi::Object>();
 
     // Extract and assign the properties
     if (inputObject.Has("logits") || inputObject.Has("tokens")) {
-      Napi::Error::New(env, "Invalid input: 'logits' or 'tokens' properties are not allowed").ThrowAsJavaScriptException();
-      return env.Undefined();
+      Napi::Error::New(info.Env(), "Invalid input: 'logits' or 'tokens' properties are not allowed").ThrowAsJavaScriptException();
+      return info.Env().Undefined();
     }
     // Assign the remaining properties
-    if(inputObject.Has("n_past")) {
+    if(inputObject.Has("n_past"))
       promptContext.n_past = inputObject.Get("n_past").As<Napi::Number>().Int32Value();
-    }
-    if(inputObject.Has("n_ctx")) {
-      promptContext.n_ctx = inputObject.Get("n_ctx").As<Napi::Number>().Int32Value();
-    }
-    if(inputObject.Has("n_predict")) {
-      promptContext.n_predict = inputObject.Get("n_predict").As<Napi::Number>().Int32Value();
-    }
-    if(inputObject.Has("top_k")) {
-      promptContext.top_k = inputObject.Get("top_k").As<Napi::Number>().Int32Value();
-    }
-    if(inputObject.Has("top_p")) {
-      promptContext.top_p = inputObject.Get("top_p").As<Napi::Number>().FloatValue();
-    }
-    if(inputObject.Has("temp")) {
-      promptContext.temp = inputObject.Get("temp").As<Napi::Number>().FloatValue();
-    }
-    if(inputObject.Has("n_batch")) {
-      promptContext.n_batch = inputObject.Get("n_batch").As<Napi::Number>().Int32Value();
-    }
-    if(inputObject.Has("repeat_penalty")) {
-      promptContext.repeat_penalty = inputObject.Get("repeat_penalty").As<Napi::Number>().FloatValue();
-    }
-    if(inputObject.Has("repeat_last_n")) {
-      promptContext.repeat_last_n = inputObject.Get("repeat_last_n").As<Napi::Number>().Int32Value();
-    }
-    if(inputObject.Has("context_erase")) {
-      promptContext.context_erase = inputObject.Get("context_erase").As<Napi::Number>().FloatValue();
-    }
+    if(inputObject.Has("n_ctx"))
+      promptContext.n_ctx = inputObject.Get("n_ctx").As<Napi::Number>().Int32Value();
+    if(inputObject.Has("n_predict"))
+      promptContext.n_predict = inputObject.Get("n_predict").As<Napi::Number>().Int32Value();
+    if(inputObject.Has("top_k"))
+      promptContext.top_k = inputObject.Get("top_k").As<Napi::Number>().Int32Value();
+    if(inputObject.Has("top_p"))
+      promptContext.top_p = inputObject.Get("top_p").As<Napi::Number>().FloatValue();
+    if(inputObject.Has("temp"))
+      promptContext.temp = inputObject.Get("temp").As<Napi::Number>().FloatValue();
+    if(inputObject.Has("n_batch"))
+      promptContext.n_batch = inputObject.Get("n_batch").As<Napi::Number>().Int32Value();
+    if(inputObject.Has("repeat_penalty"))
+      promptContext.repeat_penalty = inputObject.Get("repeat_penalty").As<Napi::Number>().FloatValue();
+    if(inputObject.Has("repeat_last_n"))
+      promptContext.repeat_last_n = inputObject.Get("repeat_last_n").As<Napi::Number>().Int32Value();
+    if(inputObject.Has("context_erase"))
+      promptContext.context_erase = inputObject.Get("context_erase").As<Napi::Number>().FloatValue();
   }
-  // custom callbacks are weird with the gpt4all c bindings: I need to turn Napi::Functions into raw c function pointers,
-  // but it doesn't seem like its possible? (TODO, is it possible?)
+  //copy to protect llmodel resources when splitting to new thread
 
-  // if(info[1].IsFunction()) {
-  //   Napi::Callback cb = *info[1].As<Napi::Function>();
-  // }
-
-  // For now, simple capture of stdout
-  // possible TODO: put this on a libuv async thread. (AsyncWorker)
-  CoutRedirect cr;
-  llmodel_prompt(inference_, question.c_str(), &prompt_callback, &response_callback, &recalculate_callback, &promptContext);
-  return Napi::String::New(env, cr.getString());
+  llmodel_prompt_context copiedPrompt = promptContext;
+  std::string copiedQuestion = question;
+  PromptWorkContext pc = {
+    copiedQuestion,
+    inference_.load(),
+    copiedPrompt,
+  };
+  auto threadSafeContext = new TsfnContext(env, pc);
+  threadSafeContext->tsfn = Napi::ThreadSafeFunction::New(
+    env,                           // Environment
+    info[2].As<Napi::Function>(),  // JS function from caller
+    "PromptCallback",              // Resource name
+    0,                             // Max queue size (0 = unlimited).
+    1,                             // Initial thread count
+    threadSafeContext,             // Context,
+    FinalizerCallback,             // Finalizer
+    (void*)nullptr                 // Finalizer data
+  );
+  threadSafeContext->nativeThread = std::thread(threadEntry, threadSafeContext);
+  return threadSafeContext->deferred_.Promise();
 }
 
-  void SetThreadCount(const Napi::CallbackInfo& info) {
+void NodeModelWrapper::SetThreadCount(const Napi::CallbackInfo& info) {
   if(info[0].IsNumber()) {
-    llmodel_setThreadCount(inference_, info[0].As<Napi::Number>().Int64Value());
+    llmodel_setThreadCount(GetInference(), info[0].As<Napi::Number>().Int64Value());
   } else {
     Napi::Error::New(info.Env(), "Could not set thread count: argument 1 is NaN").ThrowAsJavaScriptException();
     return;
   }
 }
-  Napi::Value getName(const Napi::CallbackInfo& info) {
+
+Napi::Value NodeModelWrapper::getName(const Napi::CallbackInfo& info) {
   return Napi::String::New(info.Env(), name);
 }
-  Napi::Value ThreadCount(const Napi::CallbackInfo& info) {
-    return Napi::Number::New(info.Env(), llmodel_threadCount(inference_));
+Napi::Value NodeModelWrapper::ThreadCount(const Napi::CallbackInfo& info) {
+  return Napi::Number::New(info.Env(), llmodel_threadCount(GetInference()));
 }
 
-private:
-  llmodel_model inference_;
-  std::string type;
-  std::string name;
-
-  //wrapper cb to capture output into stdout.then, CoutRedirect captures this
-  // and writes it to a file
-  static bool response_callback(int32_t tid, const char* resp)
-  {
-    if(tid != -1) {
-      std::cout<<std::string(resp);
-      return true;
-    }
-    return false;
+Napi::Value NodeModelWrapper::GetLibraryPath(const Napi::CallbackInfo& info) {
+  return Napi::String::New(info.Env(),
+    llmodel_get_implementation_search_path());
 }
 
-  static bool prompt_callback(int32_t tid) { return true; }
-  static bool recalculate_callback(bool isrecalculating) { return isrecalculating; }
-  // Had to use this instead of the c library in order
-  // set the type of the model loaded.
-  // causes side effect: type is mutated;
-  llmodel_model create_model_set_type(const char* c_weights_path)
-  {
-    uint32_t magic;
-    llmodel_model model;
-    FILE *f = fopen(c_weights_path, "rb");
-    fread(&magic, sizeof(magic), 1, f);
-
-    if (magic == 0x67676d6c) {
-      model = llmodel_gptj_create();
-      type = "gptj";
-    }
-    else if (magic == 0x67676a74) {
-      model = llmodel_llama_create();
-      type = "llama";
-    }
-    else if (magic == 0x67676d6d) {
-      model = llmodel_mpt_create();
-      type = "mpt";
-    }
-    else {fprintf(stderr, "Invalid model file\n");}
-    fclose(f);
-
-    return model;
+llmodel_model NodeModelWrapper::GetInference() {
+  return *inference_.load();
 }
-};
 
 //Exports Bindings
 Napi::Object Init(Napi::Env env, Napi::Object exports) {
-  return NodeModelWrapper::Init(env, exports);
+  exports["LLModel"] = NodeModelWrapper::GetClass(env);
+  return exports;
 }
 
 NODE_API_MODULE(NODE_GYP_MODULE_NAME, Init)
gpt4all-bindings/typescript/index.h (new file)
@ -0,0 +1,45 @@
#include <napi.h>
#include "llmodel.h"
#include <iostream>
#include "llmodel_c.h"
#include "prompt.h"
#include <atomic>
#include <memory>
#include <filesystem>
namespace fs = std::filesystem;

class NodeModelWrapper: public Napi::ObjectWrap<NodeModelWrapper> {
public:
    NodeModelWrapper(const Napi::CallbackInfo &);
    //~NodeModelWrapper();
    Napi::Value getType(const Napi::CallbackInfo& info);
    Napi::Value IsModelLoaded(const Napi::CallbackInfo& info);
    Napi::Value StateSize(const Napi::CallbackInfo& info);
    /**
     * Prompting the model. This entails spawning a new thread and adding the response tokens
     * into a thread local string variable.
     */
    Napi::Value Prompt(const Napi::CallbackInfo& info);
    void SetThreadCount(const Napi::CallbackInfo& info);
    Napi::Value getName(const Napi::CallbackInfo& info);
    Napi::Value ThreadCount(const Napi::CallbackInfo& info);
    /*
     * The path that is used to search for the dynamic libraries
     */
    Napi::Value GetLibraryPath(const Napi::CallbackInfo& info);
    /**
     * Creates the LLModel class
     */
    static Napi::Function GetClass(Napi::Env);
    llmodel_model GetInference();

private:
    /**
     * The underlying inference that interfaces with the C interface
     */
    std::atomic<std::shared_ptr<llmodel_model>> inference_;

    std::string type;
    // corresponds to LLModel::name() in typescript
    std::string name;
    static Napi::FunctionReference constructor;
};
@ -1,19 +1,32 @@
{
    "name": "gpt4all",
    "version": "2.0.0",
    "packageManager": "yarn@3.5.1",
    "main": "src/gpt4all.js",
    "repository": "nomic-ai/gpt4all",
    "scripts": {
        "test": "node ./test/index.mjs",
        "build:backend": "node scripts/build.js",
        "install": "node-gyp-build",
        "prebuild": "node scripts/prebuild.js",
        "docs:build": "documentation build ./src/gpt4all.d.ts --parse-extension d.ts --format md --output docs/api.md"
    },
    "dependencies": {
        "mkdirp": "^3.0.1",
        "node-addon-api": "^6.1.0",
        "node-gyp-build": "^4.6.0"
    },
    "devDependencies": {
        "@types/node": "^20.1.5",
        "documentation": "^14.0.2",
        "prebuildify": "^5.0.1",
        "prettier": "^2.8.8"
    },
    "engines": {
        "node": ">= 18.x.x"
    },
    "prettier": {
        "endOfLine": "lf",
        "tabWidth": 4
    }
}
gpt4all-bindings/typescript/prompt.cc (new file)
@ -0,0 +1,62 @@
#include "prompt.h"

TsfnContext::TsfnContext(Napi::Env env, const PromptWorkContext& pc)
    : deferred_(Napi::Promise::Deferred::New(env)), pc(pc) {
}

std::mutex mtx;
static thread_local std::string res;
bool response_callback(int32_t token_id, const char *response) {
    res += response;
    return token_id != -1;
}
bool recalculate_callback (bool isrecalculating) {
    return isrecalculating;
};
bool prompt_callback (int32_t tid) {
    return true;
};

// The thread entry point. This takes as its arguments the specific
// threadsafe-function context created inside the main thread.
void threadEntry(TsfnContext* context) {
    std::lock_guard<std::mutex> lock(mtx);
    // Perform a call into JavaScript.
    napi_status status =
        context->tsfn.NonBlockingCall(&context->pc,
            [](Napi::Env env, Napi::Function jsCallback, PromptWorkContext* pc) {
                llmodel_prompt(
                    *pc->inference_,
                    pc->question.c_str(),
                    &prompt_callback,
                    &response_callback,
                    &recalculate_callback,
                    &pc->prompt_params
                );
                jsCallback.Call({ Napi::String::New(env, res) });
                res.clear();
            });

    if (status != napi_ok) {
        Napi::Error::Fatal(
            "ThreadEntry",
            "Napi::ThreadSafeNapi::Function.NonBlockingCall() failed");
    }

    // Release the thread-safe function. This decrements the internal thread
    // count, and will perform finalization since the count will reach 0.
    context->tsfn.Release();
}

void FinalizerCallback(Napi::Env env,
                       void* finalizeData,
                       TsfnContext* context) {
    // Join the thread
    context->nativeThread.join();
    // Resolve the Promise previously returned to JS via the CreateTSFN method.
    context->deferred_.Resolve(Napi::Boolean::New(env, true));
    delete context;
}
gpt4all-bindings/typescript/prompt.h (new file)
@ -0,0 +1,42 @@
#ifndef TSFN_CONTEXT_H
#define TSFN_CONTEXT_H

#include "napi.h"
#include "llmodel_c.h"
#include <thread>
#include <mutex>
#include <iostream>
#include <atomic>
#include <memory>
struct PromptWorkContext {
    std::string question;
    std::shared_ptr<llmodel_model> inference_;
    llmodel_prompt_context prompt_params;
};

struct TsfnContext {
public:
    TsfnContext(Napi::Env env, const PromptWorkContext &pc);
    std::thread nativeThread;
    Napi::Promise::Deferred deferred_;
    PromptWorkContext pc;
    Napi::ThreadSafeFunction tsfn;

    // Some data to pass around
    // int ints[ARRAY_LENGTH];
};

// The thread entry point. This takes as its arguments the specific
// threadsafe-function context created inside the main thread.
void threadEntry(TsfnContext* context);

// The thread-safe function finalizer callback. This callback executes
// at destruction of thread-safe function, taking as arguments the finalizer
// data and threadsafe-function context.
void FinalizerCallback(Napi::Env env, void* finalizeData, TsfnContext* context);

bool response_callback(int32_t token_id, const char *response);
bool recalculate_callback (bool isrecalculating);
bool prompt_callback (int32_t tid);

#endif // TSFN_CONTEXT_H
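Taken together, TsfnContext, threadEntry, and FinalizerCallback let the native Prompt binding run llmodel_prompt off the main thread, hand the accumulated response back through a JavaScript callback, and resolve the returned promise when the worker finishes. As a hedged illustration (not part of the diff), the JavaScript side consumes this roughly as follows; createCompletion in src/gpt4all.js below does essentially the same thing:

// raw_prompt(q, params, callback) is declared in src/gpt4all.d.ts; the promise
// returned by the native Prompt binding resolves once FinalizerCallback runs.
function promptOnce(llmodel, text, params = {}) {
    return new Promise((resolve) => {
        // The callback receives the full response string accumulated by
        // response_callback in the thread-local `res` buffer above.
        llmodel.raw_prompt(text, params, (response) => resolve(response));
    });
}

// const answer = await promptOnce(llmodel, 'Hello!', { temp: 0.7 });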
gpt4all-bindings/typescript/scripts/build.js (new file)
@ -0,0 +1,17 @@
const { spawn } = require("node:child_process");
const { resolve } = require("path");
const args = process.argv.slice(2);
const platform = process.platform;

//windows 64bit or 32
if (platform === "win32") {
    const path = "scripts/build_msvc.bat";
    spawn(resolve(path), ["/Y", ...args], { shell: true, stdio: "inherit" });
    process.on("data", (s) => console.log(s.toString()));
} else if (platform === "linux" || platform === "darwin") {
    const path = "scripts/build_unix.sh";
    const bash = spawn(`sh`, [path, ...args]);
    bash.stdout.on("data", (s) => console.log(s.toString()), {
        stdio: "inherit",
    });
}
gpt4all-bindings/typescript/scripts/build_mingw.ps1 (new file)
@ -0,0 +1,16 @@
$ROOT_DIR = '.\runtimes\win-x64'
$BUILD_DIR = '.\runtimes\win-x64\build\mingw'
$LIBS_DIR = '.\runtimes\win-x64\native'

# cleanup env
Remove-Item -Force -Recurse $ROOT_DIR -ErrorAction SilentlyContinue | Out-Null
mkdir $BUILD_DIR | Out-Null
mkdir $LIBS_DIR | Out-Null

# build
cmake -G "MinGW Makefiles" -S ..\..\gpt4all-backend -B $BUILD_DIR -DLLAMA_AVX2=ON
cmake --build $BUILD_DIR --parallel --config Release

# copy native dlls
# cp "C:\ProgramData\chocolatey\lib\mingw\tools\install\mingw64\bin\*dll" $LIBS_DIR
cp "$BUILD_DIR\bin\*.dll" $LIBS_DIR
gpt4all-bindings/typescript/scripts/build_unix.sh (new file)
@ -0,0 +1,31 @@
#!/bin/sh

SYSNAME=$(uname -s)

if [ "$SYSNAME" = "Linux" ]; then
    BASE_DIR="runtimes/linux-x64"
    LIB_EXT="so"
elif [ "$SYSNAME" = "Darwin" ]; then
    BASE_DIR="runtimes/osx"
    LIB_EXT="dylib"
elif [ -n "$SYSNAME" ]; then
    echo "Unsupported system: $SYSNAME" >&2
    exit 1
else
    echo "\"uname -s\" failed" >&2
    exit 1
fi

NATIVE_DIR="$BASE_DIR/native"
BUILD_DIR="$BASE_DIR/build"

rm -rf "$BASE_DIR"
mkdir -p "$NATIVE_DIR" "$BUILD_DIR"

cmake -S ../../gpt4all-backend -B "$BUILD_DIR" &&
cmake --build "$BUILD_DIR" -j --config Release && {
    cp "$BUILD_DIR"/libllmodel.$LIB_EXT "$NATIVE_DIR"/
    cp "$BUILD_DIR"/libgptj*.$LIB_EXT "$NATIVE_DIR"/
    cp "$BUILD_DIR"/libllama*.$LIB_EXT "$NATIVE_DIR"/
    cp "$BUILD_DIR"/libmpt*.$LIB_EXT "$NATIVE_DIR"/
}
gpt4all-bindings/typescript/scripts/prebuild.js (new file)
@ -0,0 +1,50 @@
const prebuildify = require("prebuildify");

async function createPrebuilds(combinations) {
    for (const { platform, arch } of combinations) {
        const opts = {
            platform,
            arch,
            napi: true,
        };
        try {
            await createPrebuild(opts);
            console.log(
                `Build succeeded for platform ${opts.platform} and architecture ${opts.arch}`
            );
        } catch (err) {
            console.error(
                `Error building for platform ${opts.platform} and architecture ${opts.arch}:`,
                err
            );
        }
    }
}

function createPrebuild(opts) {
    return new Promise((resolve, reject) => {
        prebuildify(opts, (err) => {
            if (err) {
                reject(err);
            } else {
                resolve();
            }
        });
    });
}

const prebuildConfigs = [
    { platform: "win32", arch: "x64" },
    { platform: "win32", arch: "arm64" },
    // { platform: 'win32', arch: 'armv7' },
    { platform: "darwin", arch: "x64" },
    { platform: "darwin", arch: "arm64" },
    // { platform: 'darwin', arch: 'armv7' },
    { platform: "linux", arch: "x64" },
    { platform: "linux", arch: "arm64" },
    { platform: "linux", arch: "armv7" },
];

createPrebuilds(prebuildConfigs)
    .then(() => console.log("All builds succeeded"))
    .catch((err) => console.error("Error building:", err));
@ -1,14 +1,15 @@
import { LLModel, createCompletion, DEFAULT_DIRECTORY, DEFAULT_LIBRARIES_DIRECTORY } from '../src/gpt4all.js'

const ll = new LLModel({
    model_name: 'ggml-vicuna-7b-1.1-q4_2.bin',
    model_path: './',
    library_path: DEFAULT_LIBRARIES_DIRECTORY
});

try {
    class Extended extends LLModel {

    }
} catch(e) {
    console.log("Extending from native class gone wrong " + e)
}

@ -20,13 +21,26 @@ ll.setThreadCount(5);
console.log("thread count " + ll.threadCount());
ll.setThreadCount(4);
console.log("thread count " + ll.threadCount());
console.log("name " + ll.name());
console.log("type: " + ll.type());
console.log("Default directory for models", DEFAULT_DIRECTORY);
console.log("Default directory for libraries", DEFAULT_LIBRARIES_DIRECTORY);

console.log(await createCompletion(
    ll,
    [
        { role : 'system', content: 'You are a girl who likes playing league of legends.' },
        { role : 'user', content: 'What is the best top laner to play right now?' },
    ],
    { verbose: false }
));

console.log(await createCompletion(
    ll,
    [
        { role : 'user', content: 'What is the best bottom laner to play right now?' },
    ],
))
gpt4all-bindings/typescript/src/config.js (new file)
@ -0,0 +1,22 @@
const os = require("node:os");
const path = require("node:path");

const DEFAULT_DIRECTORY = path.resolve(os.homedir(), ".cache/gpt4all");

const librarySearchPaths = [
    path.join(DEFAULT_DIRECTORY, "libraries"),
    path.resolve("./libraries"),
    path.resolve(
        __dirname,
        "..",
        `runtimes/${process.platform}-${process.arch}/native`
    ),
    process.cwd(),
];

const DEFAULT_LIBRARIES_DIRECTORY = librarySearchPaths.join(";");

module.exports = {
    DEFAULT_DIRECTORY,
    DEFAULT_LIBRARIES_DIRECTORY,
};
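As a quick, hedged sketch of how these defaults are meant to be consumed (loadModel is defined in src/gpt4all.js further down; the model name is only an example), both values can be overridden per call:

import { loadModel, DEFAULT_DIRECTORY, DEFAULT_LIBRARIES_DIRECTORY } from 'gpt4all'

// These are already the built-in defaults; they are spelled out here for clarity.
const model = await loadModel('ggml-gpt4all-j-v1.3-groovy.bin', {
    modelPath: DEFAULT_DIRECTORY,               // ~/.cache/gpt4all
    librariesPath: DEFAULT_LIBRARIES_DIRECTORY, // semicolon-separated search list
    allowDownload: true,
    verbose: false,
})
console.log(model.name())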
gpt4all-bindings/typescript/src/gpt4all.d.ts (vendored)
@ -1,162 +1,310 @@
/// <reference types="node" />
declare module "gpt4all";

export * from "./util.d.ts";

/** Type of the model */
type ModelType = "gptj" | "llama" | "mpt" | "replit";

/**
 * Full list of models available
 */
interface ModelFile {
    /** List of GPT-J Models */
    gptj:
        | "ggml-gpt4all-j-v1.3-groovy.bin"
        | "ggml-gpt4all-j-v1.2-jazzy.bin"
        | "ggml-gpt4all-j-v1.1-breezy.bin"
        | "ggml-gpt4all-j.bin";
    /** List of Llama Models */
    llama:
        | "ggml-gpt4all-l13b-snoozy.bin"
        | "ggml-vicuna-7b-1.1-q4_2.bin"
        | "ggml-vicuna-13b-1.1-q4_2.bin"
        | "ggml-wizardLM-7B.q4_2.bin"
        | "ggml-stable-vicuna-13B.q4_2.bin"
        | "ggml-nous-gpt4-vicuna-13b.bin"
        | "ggml-v3-13b-hermes-q5_1.bin";
    /** List of MPT Models */
    mpt:
        | "ggml-mpt-7b-base.bin"
        | "ggml-mpt-7b-chat.bin"
        | "ggml-mpt-7b-instruct.bin";
    /** List of Replit Models */
    replit: "ggml-replit-code-v1-3b.bin";
}

//mirrors py options
interface LLModelOptions {
    /**
     * Model architecture. This argument currently does not have any functionality and is just used as a descriptive identifier for the user.
     */
    type?: ModelType;
    model_name: ModelFile[ModelType];
    model_path: string;
    library_path?: string;
}

/**
 * LLModel class representing a language model.
 * This is a base class that provides common functionality for different types of language models.
 */
declare class LLModel {
    /**
     * Initialize a new LLModel.
     * @param path Absolute path to the model file.
     * @throws {Error} If the model file does not exist.
     */
    constructor(path: string);
    constructor(options: LLModelOptions);

    /** either 'gptj', 'mpt', or 'llama', or undefined */
    type(): ModelType | undefined;

    /** The name of the model. */
    name(): ModelFile;

    /**
     * Get the size of the internal state of the model.
     * NOTE: This state data is specific to the type of model you have created.
     * @return the size in bytes of the internal state of the model
     */
    stateSize(): number;

    /**
     * Get the number of threads used for model inference.
     * The default is the number of physical cores your computer has.
     * @returns The number of threads used for model inference.
     */
    threadCount(): number;

    /**
     * Set the number of threads used for model inference.
     * @param newNumber The new number of threads.
     */
    setThreadCount(newNumber: number): void;

    /**
     * Prompt the model with a given input and optional parameters.
     * The raw response string is delivered to the provided callback;
     * use createCompletion for a structured result.
     * @param q The prompt input.
     * @param params Optional parameters for the prompt context.
     * @param callback Receives the raw model response.
     */
    raw_prompt(q: string, params: Partial<LLModelPromptContext>, callback: (res: string) => void): void; // TODO work on return type

    /**
     * Whether the model is loaded or not.
     */
    isModelLoaded(): boolean;

    /**
     * Where to search for the pluggable backend libraries
     */
    setLibraryPath(s: string): void;
    /**
     * Where to get the pluggable backend libraries
     */
    getLibraryPath(): string;
}

interface LoadModelOptions {
    modelPath?: string;
    librariesPath?: string;
    allowDownload?: boolean;
    verbose?: boolean;
}

declare function loadModel(
    modelName: string,
    options?: LoadModelOptions
): Promise<LLModel>;

/**
 * The nodejs equivalent to python binding's chat_completion
 * @param {LLModel} llmodel - The language model object.
 * @param {PromptMessage[]} messages - The array of messages for the conversation.
 * @param {CompletionOptions} options - The options for creating the completion.
 * @returns {CompletionReturn} The completion result.
 * @example
 * const llmodel = new LLModel(model)
 * const messages = [
 * { role: 'system', content: 'You are a weather forecaster.' },
 * { role: 'user', content: 'should i go out today?' } ]
 * const completion = await createCompletion(llmodel, messages, {
 *  verbose: true,
 *  temp: 0.9,
 * })
 * console.log(completion.choices[0].message.content)
 * // No, it's going to be cold and rainy.
 */
declare function createCompletion(
    llmodel: LLModel,
    messages: PromptMessage[],
    options?: CompletionOptions
): Promise<CompletionReturn>;

/**
 * The options for creating the completion.
 */
interface CompletionOptions extends Partial<LLModelPromptContext> {
    /**
     * Indicates if verbose logging is enabled.
     * @default true
     */
    verbose?: boolean;

    /**
     * Indicates if the default header is included in the prompt.
     * @default true
     */
    hasDefaultHeader?: boolean;

    /**
     * Indicates if the default footer is included in the prompt.
     * @default true
     */
    hasDefaultFooter?: boolean;
}

/**
 * A message in the conversation, identical to OpenAI's chat message.
 */
interface PromptMessage {
    /** The role of the message. */
    role: "system" | "assistant" | "user";

    /** The message content. */
    content: string;
}

/**
 * The result of the completion, similar to OpenAI's format.
 */
interface CompletionReturn {
    /** The model name.
     * @type {ModelFile}
     */
    model: ModelFile[ModelType];

    /** Token usage report. */
    usage: {
        /** The number of tokens used in the prompt. */
        prompt_tokens: number;

        /** The number of tokens used in the completion. */
        completion_tokens: number;

        /** The total number of tokens used. */
        total_tokens: number;
    };

    /** The generated completions. */
    choices: CompletionChoice[];
}

/**
 * A completion choice, similar to OpenAI's format.
 */
interface CompletionChoice {
    /** Response message */
    message: PromptMessage;
}

/**
 * Model inference arguments for generating completions.
 */
interface LLModelPromptContext {
    /** The size of the raw logits vector. */
    logits_size: number;

    /** The size of the raw tokens vector. */
    tokens_size: number;

    /** The number of tokens in the past conversation. */
    n_past: number;

    /** The number of tokens possible in the context window.
     * @default 1024
     */
    n_ctx: number;

    /** The number of tokens to predict.
     * @default 128
     */
    n_predict: number;

    /** The top-k logits to sample from.
     * @default 40
     */
    top_k: number;

    /** The nucleus sampling probability threshold.
     * @default 0.9
     */
    top_p: number;

    /** The temperature to adjust the model's output distribution.
     * @default 0.72
     */
    temp: number;

    /** The number of predictions to generate in parallel.
     * @default 8
     */
    n_batch: number;

    /** The penalty factor for repeated tokens.
     * @default 1
     */
    repeat_penalty: number;

    /** The number of last tokens to penalize.
     * @default 10
     */
    repeat_last_n: number;

    /** The percentage of context to erase if the context window is exceeded.
     * @default 0.5
     */
    context_erase: number;
}
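Because CompletionOptions (declared above) extends Partial<LLModelPromptContext>, any of the sampling fields documented in LLModelPromptContext can be passed straight to createCompletion. A hedged usage sketch, not part of the diff (model name and values are illustrative only):

import { loadModel, createCompletion } from 'gpt4all'

const model = await loadModel('ggml-gpt4all-j-v1.3-groovy.bin')
const completion = await createCompletion(model, [
    { role: 'user', content: 'Name three uses for a brick.' },
], {
    temp: 0.5,      // overrides the 0.72 default
    top_k: 40,
    top_p: 0.9,
    n_predict: 64,
    verbose: false,
})
console.log(completion.choices[0].message.content)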
/**
 * TODO: Help wanted to implement this
 */
declare function createTokenStream(
    llmodel: LLModel,
    messages: PromptMessage[],
    options: CompletionOptions
): (ll: LLModel) => AsyncGenerator<string>;

/**
 * From python api:
 * models will be stored in (homedir)/.cache/gpt4all/
 */
declare const DEFAULT_DIRECTORY: string;

/**
 * From python api:
 * The default path for dynamic libraries to be stored.
 * You may separate paths by a semicolon to search in multiple areas.
 * This searches DEFAULT_DIRECTORY/libraries, cwd/libraries, and finally cwd.
 */
declare const DEFAULT_LIBRARIES_DIRECTORY: string;

export {
    ModelType,
    ModelFile,
    LLModel,
    LLModelPromptContext,
    PromptMessage,
    CompletionOptions,
    LoadModelOptions,
    loadModel,
    createCompletion,
    createTokenStream,
    DEFAULT_DIRECTORY,
    DEFAULT_LIBRARIES_DIRECTORY,
};
@ -1,112 +1,138 @@
"use strict";
/// This file implements the gpt4all.d.ts file endings.
/// Written in commonjs to support both ESM and CJS projects.
const { existsSync } = require("fs");
const path = require("node:path");
const { LLModel } = require("node-gyp-build")(path.resolve(__dirname, ".."));
const {
    retrieveModel,
    downloadModel,
    appendBinSuffixIfMissing,
} = require("./util.js");
const config = require("./config.js");

async function loadModel(modelName, options = {}) {
    const loadOptions = {
        modelPath: config.DEFAULT_DIRECTORY,
        librariesPath: config.DEFAULT_LIBRARIES_DIRECTORY,
        allowDownload: true,
        verbose: true,
        ...options,
    };

    await retrieveModel(modelName, {
        modelPath: loadOptions.modelPath,
        allowDownload: loadOptions.allowDownload,
        verbose: loadOptions.verbose,
    });

    const libSearchPaths = loadOptions.librariesPath.split(";");

    let libPath = null;

    for (const searchPath of libSearchPaths) {
        if (existsSync(searchPath)) {
            libPath = searchPath;
            break;
        }
    }

    const llmOptions = {
        model_name: appendBinSuffixIfMissing(modelName),
        model_path: loadOptions.modelPath,
        library_path: libPath,
    };

    if (loadOptions.verbose) {
        console.log("Creating LLModel with options:", llmOptions);
    }
    const llmodel = new LLModel(llmOptions);

    return llmodel;
}

function createPrompt(messages, hasDefaultHeader, hasDefaultFooter) {
    let fullPrompt = "";

    for (const message of messages) {
        if (message.role === "system") {
            const systemMessage = message.content + "\n";
            fullPrompt += systemMessage;
        }
    }
    if (hasDefaultHeader) {
        fullPrompt += `### Instruction:
The prompt below is a question to answer, a task to complete, or a conversation
to respond to; decide which and write an appropriate response.
\n### Prompt:
`;
    }
    for (const message of messages) {
        if (message.role === "user") {
            const user_message = "\n" + message["content"];
            fullPrompt += user_message;
        }
        if (message["role"] == "assistant") {
            const assistant_message = "\nResponse: " + message["content"];
            fullPrompt += assistant_message;
        }
    }
    if (hasDefaultFooter) {
        fullPrompt += "\n### Response:";
    }

    return fullPrompt;
}
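To make the prompt format concrete, here is a small illustrative sketch (not part of the diff) of what createPrompt produces; createPrompt is module-internal, so assume it is in scope:

const messages = [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "What is 2 + 2?" },
    { role: "assistant", content: "4" },
    { role: "user", content: "And doubled?" },
];

// With hasDefaultHeader = true and hasDefaultFooter = true this yields, roughly:
//   "You are a helpful assistant.\n"
// + "### Instruction: ...decide which and write an appropriate response.\n### Prompt:\n"
// + "\nWhat is 2 + 2?"
// + "\nResponse: 4"
// + "\nAnd doubled?"
// + "\n### Response:"
const fullPrompt = createPrompt(messages, true, true);
console.log(fullPrompt);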

async function createCompletion(
    llmodel,
    messages,
    options = {
        hasDefaultHeader: true,
        hasDefaultFooter: false,
        verbose: true,
    }
) {
    // build the full prompt string from the message list
    const fullPrompt = createPrompt(
        messages,
        options.hasDefaultHeader ?? true,
        options.hasDefaultFooter
    );
    if (options.verbose) {
        console.log("Sent: " + fullPrompt);
    }
    const promisifiedRawPrompt = new Promise((resolve, rej) => {
        llmodel.raw_prompt(fullPrompt, options, (s) => {
            resolve(s);
        });
    });
    return promisifiedRawPrompt.then((response) => {
        return {
            llmodel: llmodel.name(),
            usage: {
                prompt_tokens: fullPrompt.length,
                completion_tokens: response.length, //TODO
                total_tokens: fullPrompt.length + response.length, //TODO
            },
            choices: [
                {
                    message: {
                        role: "assistant",
                        content: response,
                    },
                },
            ],
        };
    });
}

module.exports = {
    ...config,
    LLModel,
    createCompletion,
    downloadModel,
    retrieveModel,
    loadModel,
};
gpt4all-bindings/typescript/src/util.d.ts (vendored, new file)
@ -0,0 +1,69 @@
/// <reference types="node" />
declare module "gpt4all";

/**
 * Initiates the download of a model file of a specific model type.
 * By default this downloads without waiting. Use the controller returned to alter this behavior.
 * @param {string} modelName - The model file to be downloaded.
 * @param {DownloadModelOptions} options - to pass into the downloader. Default is { modelPath: DEFAULT_DIRECTORY, debug: false }.
 * @returns {DownloadController} object that allows controlling the download process.
 *
 * @throws {Error} If the model already exists in the specified location.
 * @throws {Error} If the model cannot be found at the specified url.
 *
 * @example
 * const controller = download('ggml-gpt4all-j-v1.3-groovy.bin')
 * controller.promise().then(() => console.log('Downloaded!'))
 */
declare function downloadModel(
    modelName: string,
    options?: DownloadModelOptions
): DownloadController;

/**
 * Options for the model download process.
 */
export interface DownloadModelOptions {
    /**
     * Location to download the model.
     * Default is DEFAULT_DIRECTORY ((homedir)/.cache/gpt4all).
     */
    modelPath?: string;

    /**
     * Debug mode -- check how long it took to download in seconds
     * @default false
     */
    debug?: boolean;

    /**
     * Remote download url. Defaults to `https://gpt4all.io/models`
     * @default https://gpt4all.io/models
     */
    url?: string;
}

declare function listModels(): Promise<Record<string, string>[]>;

interface RetrieveModelOptions {
    allowDownload?: boolean;
    verbose?: boolean;
    modelPath?: string;
}

declare function retrieveModel(
    model: string,
    options?: RetrieveModelOptions
): Promise<string>;

/**
 * Model download controller.
 */
interface DownloadController {
    /** Cancel the request to download from gpt4all website if this is called. */
    cancel: () => void;
    /** Convert the downloader into a promise, allowing people to await and manage its lifetime */
    promise: () => Promise<void>;
}

export { downloadModel, DownloadModelOptions, DownloadController, listModels, retrieveModel, RetrieveModelOptions };
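retrieveModel ties listModels and downloadModel together (see src/util.js below): it resolves to a local model path, downloading the file first when it is missing and allowDownload permits it. A hedged usage sketch, not part of the diff (the model name is only an example):

import { retrieveModel } from 'gpt4all'

// Resolves to the local path of the model file, downloading it first
// if allowDownload is true and the file is not already present.
const modelPath = await retrieveModel('ggml-gpt4all-j-v1.3-groovy.bin', {
    allowDownload: true,
    verbose: true,
})
console.log(modelPath)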
gpt4all-bindings/typescript/src/util.js (new file)
@ -0,0 +1,156 @@
const { createWriteStream, existsSync } = require("fs");
const { performance } = require("node:perf_hooks");
const path = require("node:path");
const { mkdirp } = require("mkdirp");
const { DEFAULT_DIRECTORY, DEFAULT_LIBRARIES_DIRECTORY } = require("./config.js");

async function listModels() {
    const res = await fetch("https://gpt4all.io/models/models.json");
    const modelList = await res.json();
    return modelList;
}

function appendBinSuffixIfMissing(name) {
    if (!name.endsWith(".bin")) {
        return name + ".bin";
    }
    return name;
}

// readChunks() reads from the provided reader and yields the results into an async iterable
// https://css-tricks.com/web-streams-everywhere-and-fetch-for-node-js/
function readChunks(reader) {
    return {
        async *[Symbol.asyncIterator]() {
            let readResult = await reader.read();
            while (!readResult.done) {
                yield readResult.value;
                readResult = await reader.read();
            }
        },
    };
}

function downloadModel(
    modelName,
    options = {}
) {
    const downloadOptions = {
        modelPath: DEFAULT_DIRECTORY,
        debug: false,
        url: "https://gpt4all.io/models",
        ...options,
    };

    const modelFileName = appendBinSuffixIfMissing(modelName);
    const fullModelPath = path.join(downloadOptions.modelPath, modelFileName);
    const modelUrl = `${downloadOptions.url}/${modelFileName}`

    if (existsSync(fullModelPath)) {
        throw Error(`Model already exists at ${fullModelPath}`);
    }

    const abortController = new AbortController();
    const signal = abortController.signal;

    //wrapper function to get the readable stream from request
    // const baseUrl = options.url ?? "https://gpt4all.io/models";
    const fetchModel = () =>
        fetch(modelUrl, {
            signal,
        }).then((res) => {
            if (!res.ok) {
                throw Error(`Failed to download model from ${modelUrl} - ${res.statusText}`);
            }
            return res.body.getReader();
        });

    //a promise that executes and writes to a stream. Resolves when done writing.
    const res = new Promise((resolve, reject) => {
        fetchModel()
            //Resolves an array of a reader and writestream.
            .then((reader) => [reader, createWriteStream(fullModelPath)])
            .then(async ([readable, wstream]) => {
                console.log("Downloading @ ", fullModelPath);
                let perf;
                if (options.debug) {
                    perf = performance.now();
                }
                for await (const chunk of readChunks(readable)) {
                    wstream.write(chunk);
                }
                if (options.debug) {
                    console.log(
                        "Time taken: ",
                        (performance.now() - perf).toFixed(2),
                        " ms"
                    );
                }
                resolve(fullModelPath);
            })
            .catch(reject);
    });

    return {
        cancel: () => abortController.abort(),
        promise: () => res,
    };
};

async function retrieveModel (
    modelName,
    options = {}
) {
    const retrieveOptions = {
        modelPath: DEFAULT_DIRECTORY,
        allowDownload: true,
        verbose: true,
        ...options,
    };

    await mkdirp(retrieveOptions.modelPath);

    const modelFileName = appendBinSuffixIfMissing(modelName);
    const fullModelPath = path.join(retrieveOptions.modelPath, modelFileName);
    const modelExists = existsSync(fullModelPath);

    if (modelExists) {
        return fullModelPath;
    }

    if (!retrieveOptions.allowDownload) {
        throw Error(`Model does not exist at ${fullModelPath}`);
    }

    const availableModels = await listModels();
    const foundModel = availableModels.find((model) => model.filename === modelFileName);

    if (!foundModel) {
        throw Error(`Model "${modelName}" is not available.`);
    }

    if (retrieveOptions.verbose) {
        console.log(`Downloading ${modelName}...`);
    }

    const downloadController = downloadModel(modelName, {
        modelPath: retrieveOptions.modelPath,
        debug: retrieveOptions.verbose,
    });

    const downloadPath = await downloadController.promise();

    if (retrieveOptions.verbose) {
        console.log(`Model downloaded to ${downloadPath}`);
    }

    return downloadPath
}

module.exports = {
    appendBinSuffixIfMissing,
    downloadModel,
    retrieveModel,
};
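Since downloadModel returns a controller backed by an AbortController, an in-flight download can be cancelled as well as awaited. A hedged sketch, not part of the diff (the model name is an example; the file must not already exist at the target path, or downloadModel throws immediately):

import { downloadModel } from 'gpt4all'

const controller = downloadModel('ggml-mpt-7b-chat.bin', { debug: true })

// Give up after five seconds; the underlying fetch is aborted through the
// AbortController held inside downloadModel.
const timer = setTimeout(() => controller.cancel(), 5000)

try {
    const modelPath = await controller.promise()
    console.log('Saved to', modelPath)
} catch (err) {
    console.error('Download did not complete:', err)
} finally {
    clearTimeout(timer)
}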
@ -1,14 +0,0 @@ (file deleted)
#include "stdcapture.h"

CoutRedirect::CoutRedirect() {
    old = std::cout.rdbuf(buffer.rdbuf()); // redirect cout to buffer stream
}

std::string CoutRedirect::getString() {
    return buffer.str(); // get string
}

CoutRedirect::~CoutRedirect() {
    std::cout.rdbuf(old); // reverse redirect
}
@ -1,21 +0,0 @@ (file deleted)
//https://stackoverflow.com/questions/5419356/redirect-stdout-stderr-to-a-string
#ifndef COUTREDIRECT_H
#define COUTREDIRECT_H

#include <iostream>
#include <streambuf>
#include <string>
#include <sstream>

class CoutRedirect {
public:
    CoutRedirect();
    std::string getString();
    ~CoutRedirect();

private:
    std::stringstream buffer;
    std::streambuf* old;
};

#endif // COUTREDIRECT_H
@ -1,38 +1,5 @@
import * as assert from 'node:assert'
import { download } from '../src/gpt4all.js'

assert.rejects(async () => download('poo.bin').promise());
File diff suppressed because it is too large