Mirror of https://github.com/nomic-ai/gpt4all.git, synced 2024-10-01 01:06:10 -04:00
typescript: publish alpha on npm and lots of cleanup, documentation, and more (#913)
* fix typo so padding can be accessed * Small cleanups for settings dialog. * Fix the build. * localdocs * Fixup the rescan. Fix debug output. * Add remove folder implementation. * Remove this signal as unnecessary for now. * Cleanup of the database, better chunking, better matching. * Add new reverse prompt for new localdocs context feature. * Add a new muted text color. * Turn off the debugging messages by default. * Add prompt processing and localdocs to the busy indicator in UI. * Specify a large number of suffixes we will search for now. * Add a collection list to support a UI. * Add a localdocs tab. * Start fleshing out the localdocs ui. * Begin implementing the localdocs ui in earnest. * Clean up the settings dialog for localdocs a bit. * Add more of the UI for selecting collections for chats. * Complete the settings for localdocs. * Adds the collections to serialize and implement references for localdocs. * Store the references separately so they are not sent to datalake. * Add context link to references. * Don't use the full path in reference text. * Various fixes to remove unnecessary warnings. * Add a newline * ignore rider and vscode dirs * create test project and basic model loading tests * make sample print usage and cleaner * Get the backend as well as the client building/working with msvc. * Libraries named differently on msvc. * Bump the version number. * This time remember to bump the version right after a release. * rm redundant json * More precise condition * Nicer handling of missing model directory. Correct exception message. * Log where the model was found * Concise model matching * reduce nesting, better error reporting * convert to f-strings * less magic number * 1. Cleanup the interrupted download 2. with-syntax * Redundant else * Do not ignore explicitly passed 4 threads * Correct return type * Add optional verbosity * Correct indentation of the multiline error message * one funcion to append .bin suffix * hotfix default verbose optioin * export hidden types and fix prompt() type * tiny typo (#739) * Update README.md (#738) * Update README.md fix golang gpt4all import path Signed-off-by: Nandakumar <nandagunasekaran@gmail.com> * Update README.md Signed-off-by: Nandakumar <nandagunasekaran@gmail.com> --------- Signed-off-by: Nandakumar <nandagunasekaran@gmail.com> * fix(training instructions): model repo name (#728) Signed-off-by: Chase McDougall <chasemcdougall@hotmail.com> * C# Bindings - Prompt formatting (#712) * Added support for custom prompt formatting * more docs added * bump version * clean up cc files and revert things * LocalDocs documentation initial (#761) * LocalDocs documentation initial * Improved localdocs documentation (#762) * Improved localdocs documentation * Improved localdocs documentation * Improved localdocs documentation * Improved localdocs documentation * New tokenizer implementation for MPT and GPT-J Improves output quality by making these tokenizers more closely match the behavior of the huggingface `tokenizers` based BPE tokenizers these models were trained with. Featuring: * Fixed unicode handling (via ICU) * Fixed BPE token merge handling * Complete added vocabulary handling * buf_ref.into() can be const now * add tokenizer readme w/ instructions for convert script * Revert "add tokenizer readme w/ instructions for convert script" This reverts commit9c15d1f83e
. * Revert "buf_ref.into() can be const now" This reverts commit840e011b75
. * Revert "New tokenizer implementation for MPT and GPT-J" This reverts commitee3469ba6c
. * Fix remove model from model download for regular models. * Fixed formatting of localdocs docs (#770) * construct and return the correct reponse when the request is a chat completion * chore: update typings to keep consistent with python api * progress, updating createCompletion to mirror py api * update spec, unfinished backend * prebuild binaries for package distribution using prebuildify/node-gyp-build * Get rid of blocking behavior for regenerate response. * Add a label to the model loading visual indicator. * Use the new MyButton for the regenerate response button. * Add a hover and pressed to the visual indication of MyButton. * Fix wording of this accessible description. * Some color and theme enhancements to make the UI contrast a bit better. * Make the comboboxes align in UI. * chore: update namespace and fix prompt bug * fix linux build * add roadmap * Fix offset of prompt/response icons for smaller text. * Dlopen backend 5 (#779) Major change to the backend that allows for pluggable versions of llama.cpp/ggml. This was squashed merged from dlopen_backend_5 where the history is preserved. * Add a custom busy indicator to further align look and feel across platforms. * Draw the indicator for combobox to ensure it looks the same on all platforms. * Fix warning. * Use the proper text color for sending messages. * Fixup the plus new chat button. * Make all the toolbuttons highlight on hover. * Advanced avxonly autodetection (#744) * Advanced avxonly requirement detection * chore: support llamaversion >= 3 and ggml default * Dlopen better implementation management (Version 2) * Add fixme's and clean up a bit. * Documentation improvements on LocalDocs (#790) * Update gpt4all_chat.md Signed-off-by: Andriy Mulyar <andriy.mulyar@gmail.com> * typo Signed-off-by: Andriy Mulyar <andriy.mulyar@gmail.com> --------- Signed-off-by: Andriy Mulyar <andriy.mulyar@gmail.com> * Adapt code * Makefile changes (WIP to test) * Debug * Adapt makefile * Style * Implemented logging mechanism (#785) * Cleaned up implementation management (#787) * Cleaned up implementation management * Initialize LLModel::m_implementation to nullptr * llmodel.h: Moved dlhandle fwd declare above LLModel class * Fix compile * Fixed double-free in LLModel::Implementation destructor * Allow user to specify custom search path via $GPT4ALL_IMPLEMENTATIONS_PATH (#789) * Drop leftover include * Add ldl in gpt4all.go for dynamic linking (#797) * Logger should also output to stderr * Fix MSVC Build, Update C# Binding Scripts * Update gpt4all_chat.md (#800) Signed-off-by: Andriy Mulyar <andriy.mulyar@gmail.com> * C# Bindings - improved logging (#714) * added optional support for .NET logging * bump version and add missing alpha suffix * avoid creating additional namespace for extensions * prefer NullLogger/NullLoggerFactory over null-conditional ILogger to avoid errors --------- Signed-off-by: mvenditto <venditto.matteo@gmail.com> * Make localdocs work with server mode. * Better name for database results. * Fix for stale references after we regenerate. * Don't hardcode these. * Fix bug with resetting context with chatgpt model. * Trying to shrink the copy+paste code and do more code sharing between backend model impl. * Remove this as it is no longer useful. * Try and fix build on mac. * Fix mac build again. 
* Add models/release.json to github repo to allow PRs * Fixed spelling error in models.json to make CI happy Signed-off-by: niansa/tuxifan <tuxifan@posteo.de> * updated bindings code for updated C api * load all model libs * model creation is failing... debugging * load libs correctly * fixed finding model libs * cleanup * cleanup * more cleanup * small typo fix * updated binding.gyp * Fixed model type for GPT-J (#815) Signed-off-by: niansa/tuxifan <tuxifan@posteo.de> * Fixed tons of warnings and clazy findings (#811) * Some tweaks to UI to make window resizing smooth and flow nicely. * Min constraints on about dialog. * Prevent flashing of white on resize. * Actually use the theme dark color for window background. * Add the ability to change the directory via text field not just 'browse' button. * add scripts to build dlls * markdown doc gen * add scripts, nearly done moving breaking changes * merge with main * oops, fixed comment * more meaningful name * leave for testing * Only default mlock on macOS where swap seems to be a problem Repeating the change that once was done in https://github.com/nomic-ai/gpt4all/pull/663 but then was overriden by9c6c09cbd2
Signed-off-by: Peter Gagarinov <pgagarinov@users.noreply.github.com> * Add a collection immediately and show a placeholder + busy indicator in localdocs settings. * some tweaks to optional types and defaults * mingw script for windows compilation * Update README.md huggingface -> Hugging Face Signed-off-by: Ikko Eltociear Ashimine <eltociear@gmail.com> * Backend prompt dedup (#822) * Deduplicated prompt() function code * Better error handling when the model fails to load. * We no longer have an avx_only repository and better error handling for minimum hardware requirements. (#833) * Update build_and_run.md (#834) Signed-off-by: AT <manyoso@users.noreply.github.com> * Trying out a new feature to download directly from huggingface. * Try again with the url. * Allow for download of models hosted on third party hosts. * Fix up for newer models on reset context. This fixes the model from totally failing after a reset context. * Update to latest llama.cpp * Remove older models that are not as popular. (#837) * Remove older models that are not as popular. * Update models.json Signed-off-by: Andriy Mulyar <andriy.mulyar@gmail.com> --------- Signed-off-by: Andriy Mulyar <andriy.mulyar@gmail.com> Co-authored-by: Andriy Mulyar <andriy.mulyar@gmail.com> * Update models.json (#838) Signed-off-by: Andriy Mulyar <andriy.mulyar@gmail.com> * Update models.json Signed-off-by: Andriy Mulyar <andriy.mulyar@gmail.com> * feat: finalyl compiled on windows (MSVC) goadman * update README and spec and promisfy createCompletion * update d.ts * Make installers work with mac/windows for big backend change. * Need this so the linux installer packages it as a dependency. * Try and fix mac. * Fix compile on mac. * These need to be installed for them to be packaged and work for both mac and windows. * Fix installers for windows and linux. * Fix symbol resolution on windows. * updated pypi version * Release notes for version 2.4.5 (#853) * Update README.md (#854) Signed-off-by: AT <manyoso@users.noreply.github.com> * Documentation for model sideloading (#851) * Documentation for model sideloading Signed-off-by: Andriy Mulyar <andriy.mulyar@gmail.com> * Update gpt4all_chat.md Signed-off-by: Andriy Mulyar <andriy.mulyar@gmail.com> --------- Signed-off-by: Andriy Mulyar <andriy.mulyar@gmail.com> * Speculative fix for windows llama models with installer. * Revert "Speculative fix for windows llama models with installer." This reverts commitadd725d1eb
. * Revert "Fix bug with resetting context with chatgpt model." (#859) This reverts commite0dcf6a14f
. * Fix llama models on linux and windows. * Bump the version. * New release notes * Set thread counts after loading model (#836) * Update gpt4all_faq.md (#861) Signed-off-by: Andriy Mulyar <andriy.mulyar@gmail.com> * Supports downloading officially supported models not hosted on gpt4all R2 * Replit Model (#713) * porting over replit code model to gpt4all * replaced memory with kv_self struct * continuing debug * welp it built but lot of sus things * working model loading and somewhat working generate.. need to format response? * revert back to semi working version * finally got rid of weird formatting * figured out problem is with python bindings - this is good to go for testing * addressing PR feedback * output refactor * fixed prompt reponse collection * cleanup * addressing PR comments * building replit backend with new ggmlver code * chatllm replit and clean python files * cleanup * updated replit to match new llmodel api * match llmodel api and change size_t to Token * resolve PR comments * replit model commit comment * Synced llama.cpp.cmake with upstream (#887) * Fix for windows. * fix: build script * Revert "Synced llama.cpp.cmake with upstream (#887)" This reverts commit5c5e10c1f5
. * Update README.md (#906) Add PyPI link and add clickable, more specific link to documentation Signed-off-by: Claudius Ellsel <claudius.ellsel@live.de> * Update CollectionsDialog.qml (#856) Phrasing for localdocs Signed-off-by: Andriy Mulyar <andriy.mulyar@gmail.com> * sampling: remove incorrect offset for n_vocab (#900) no effect, but avoids a *potential* bug later if we use actualVocabSize - which is for when a model has a larger embedding tensor/# of output logits than actually trained token to allow room for adding extras in finetuning - presently all of our models have had "placeholder" tokens in the vocab so this hasn't broken anything, but if the sizes did differ we want the equivalent of `logits[actualVocabSize:]` (the start point is unchanged), not `logits[-actualVocabSize:]` (this.) * non-llama: explicitly greedy sampling for temp<=0 (#901) copied directly from llama.cpp - without this temp=0.0 will just scale all the logits to infinity and give bad output * work on thread safety and cleaning up, adding object option * chore: cleanup tests and spec * refactor for object based startup * more docs * Circleci builds for Linux, Windows, and macOS for gpt4all-chat. * more docs * Synced llama.cpp.cmake with upstream * add lock file to ignore codespell * Move usage in Python bindings readme to own section (#907) Have own section for short usage example, as it is not specific to local build Signed-off-by: Claudius Ellsel <claudius.ellsel@live.de> * Always sync for circleci. * update models json with replit model * Forgot to bump. * Change the default values for generation in GUI * Removed double-static from variables in replit.cpp The anonymous namespace already makes it static. Signed-off-by: niansa/tuxifan <tuxifan@posteo.de> * Generator in Python Bindings - streaming yields tokens at a time (#895) * generator method * cleanup * bump version number for clarity * added replace in decode to avoid unicodedecode exception * revert back to _build_prompt * Do auto detection by default in C++ API Signed-off-by: niansa/tuxifan <tuxifan@posteo.de> * remove comment * add comments for index.h * chore: add new models and edit ignore files and documentation * llama on Metal (#885) Support latest llama with Metal --------- Co-authored-by: Adam Treat <adam@nomic.ai> Co-authored-by: niansa/tuxifan <tuxifan@posteo.de> * Revert "llama on Metal (#885)" This reverts commitb59ce1c6e7
. * add more readme stuff and debug info * spell * Metal+LLama take two (#929) Support latest llama with Metal --------- Co-authored-by: Adam Treat <adam@nomic.ai> Co-authored-by: niansa/tuxifan <tuxifan@posteo.de> * add prebuilts for windows * Add new solution for context links that does not force regular markdown (#938) in responses which is disruptive to code completions in responses. * add prettier * split out non llm related methods into util.js, add listModels method * add prebuild script for creating all platforms bindings at once * check in prebuild linux/so libs and allow distribution of napi prebuilds * apply autoformatter * move constants in config.js, add loadModel and retrieveModel methods * Clean up the context links a bit. * Don't interfere with selection. * Add code blocks and python syntax highlighting. * Spelling error. * Add c++/c highighting support. * Fix some bugs with bash syntax and add some C23 keywords. * Bugfixes for prompt syntax highlighting. * Try and fix a false positive from codespell. * When recalculating context we can't erase the BOS. * Fix Windows MSVC AVX builds - bug introduced in557c82b5ed
- currently getting: `warning C5102: ignoring invalid command-line macro definition '/arch:AVX2'` - solution is to use `_options(...)` not `_definitions(...)` * remove .so unneeded path --------- Signed-off-by: Nandakumar <nandagunasekaran@gmail.com> Signed-off-by: Chase McDougall <chasemcdougall@hotmail.com> Signed-off-by: Andriy Mulyar <andriy.mulyar@gmail.com> Signed-off-by: mvenditto <venditto.matteo@gmail.com> Signed-off-by: niansa/tuxifan <tuxifan@posteo.de> Signed-off-by: Peter Gagarinov <pgagarinov@users.noreply.github.com> Signed-off-by: Ikko Eltociear Ashimine <eltociear@gmail.com> Signed-off-by: AT <manyoso@users.noreply.github.com> Signed-off-by: Claudius Ellsel <claudius.ellsel@live.de> Co-authored-by: Justin Wang <justinwang46@gmail.com> Co-authored-by: Adam Treat <treat.adam@gmail.com> Co-authored-by: redthing1 <redthing1@alt.icu> Co-authored-by: Konstantin Gukov <gukkos@gmail.com> Co-authored-by: Richard Guo <richardg7890@gmail.com> Co-authored-by: Joseph Mearman <joseph@mearman.co.uk> Co-authored-by: Nandakumar <nandagunasekaran@gmail.com> Co-authored-by: Chase McDougall <chasemcdougall@hotmail.com> Co-authored-by: mvenditto <venditto.matteo@gmail.com> Co-authored-by: Andriy Mulyar <andriy.mulyar@gmail.com> Co-authored-by: Aaron Miller <apage43@ninjawhale.com> Co-authored-by: FoivosC <christoulakis.foivos@adlittle.com> Co-authored-by: limez <limez@protonmail.com> Co-authored-by: AT <manyoso@users.noreply.github.com> Co-authored-by: niansa/tuxifan <tuxifan@posteo.de> Co-authored-by: niansa <anton-sa@web.de> Co-authored-by: mudler <mudler@mocaccino.org> Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com> Co-authored-by: Tim Miller <innerlogic4321@gmail.com> Co-authored-by: Peter Gagarinov <pgagarinov@users.noreply.github.com> Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com> Co-authored-by: Claudius Ellsel <claudius.ellsel@live.de> Co-authored-by: pingpongching <golololologol02@gmail.com> Co-authored-by: Adam Treat <adam@nomic.ai> Co-authored-by: Cosmic Snow <cosmic-snow@mailfence.com>
This commit is contained in:
parent 44bf91855d
commit 8d53614444
@@ -1,3 +1,3 @@
[codespell]
ignore-words-list = blong, belong
skip = .git,*.pdf,*.svg,*.lock
gpt4all-bindings/typescript/.gitignore (vendored)
@@ -1,2 +1,3 @@
node_modules/
build/
prebuilds/

@@ -1,3 +1,4 @@
test/
spec/
scripts/
build
|
@@ -2,12 +2,32 @@
The original [GPT4All typescript bindings](https://github.com/nomic-ai/gpt4all-ts) are now out of date.

- created by [jacoobes](https://github.com/jacoobes) and [nomic ai](https://home.nomic.ai) :D, for all to use.
- we will maintain this repository when possible; new feature requests will be handled through nomic

### Code (alpha)

```js
import { LLModel, createCompletion, DEFAULT_DIRECTORY, DEFAULT_LIBRARIES_DIRECTORY } from '../src/gpt4all.js'

const ll = new LLModel({
    model_name: 'ggml-vicuna-7b-1.1-q4_2.bin',
    model_path: './',
    library_path: DEFAULT_LIBRARIES_DIRECTORY
});

const response = await createCompletion(ll, [
    { role: 'system', content: 'You are meant to be annoying and unhelpful.' },
    { role: 'user', content: 'What is 1 + 1?' }
]);
```

### API

- The Node.js API aims to mirror the Python API. It is not a 1:1 match yet, but many pieces of the API resemble their Python counterparts.
- [docs](./docs/api.md)

### Build Instructions

- As of 05/21/2023, tested on Windows (MSVC). (somehow got it to work on MSVC 🤯)
- binding.gyp is the compile config
- Tested on Ubuntu. Everything seems to work fine
- MinGW can also build the gpt4all-backend. HOWEVER, this package works only with MSVC-built dlls.

### Requirements

- git
@@ -31,6 +51,15 @@ cd gpt4all-bindings/typescript

```sh
git submodule update --init --depth 1 --recursive
```

**AS OF NEW BACKEND**: to build the backend,

```sh
yarn build:backend
```

This builds the platform-dependent dynamic libraries into runtimes/(platform)/native. The only current way to use them is to put them in the current working directory of your application, that is, **WHEREVER YOU RUN YOUR NODE APPLICATION**.

- llama-xxxx.dll is required.
- Depending on the model you are using, you'll need to select the proper model loader.
- For example, if you are running a Mosaic MPT model, you will need to select the mpt-(buildvariant).(dynamiclibrary); a rough copy sketch follows this list.
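A rough illustration of that copy step (the runtimes layout and library names here are assumptions based on the description above, not a shipped helper):

```js
// Copy the freshly built backend libraries next to wherever your app runs.
const fs = require('fs');
const path = require('path');

// Assumed layout: runtimes/<platform>/native, e.g. runtimes/win32/native
const built = path.resolve(__dirname, 'runtimes', process.platform, 'native');
for (const lib of fs.readdirSync(built)) {
    // e.g. llama-default.dll plus the loader matching your model, such as mpt-default.dll
    fs.copyFileSync(path.join(built, lib), path.join(process.cwd(), lib));
}
```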
### Test

```sh
yarn test
@@ -48,9 +77,22 @@ yarn test

#### spec/
- Average look and feel of the api
- Should work assuming a model and libraries are installed locally in the working directory

#### index.cc
- The bridge between nodejs and c. Where the bindings are.

#### prompt.cc
- Handles prompting and inference of models in a threadsafe, asynchronous way.

#### docs/
- Autogenerated documentation using the script `yarn docs:build`

### Roadmap

This package is in active development, and breaking changes may happen until the api stabilizes. Here's the current todo list:

- [x] prompt models via a threadsafe function in order to have proper non-blocking behavior in nodejs
- [ ] createTokenStream, an async iterator that streams each token emitted from the model (a rough sketch follows this list). Planning on following this [example](https://github.com/nodejs/node-addon-examples/tree/main/threadsafe-async-iterator)
- [ ] proper unit testing (integrate with circle ci)
- [ ] publish to npm under alpha tag `gpt4all@alpha`
- [ ] have more people test on other platforms (mac tester needed)
- [x] switch to new pluggable backend
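A rough sketch of the createTokenStream item above (purely hypothetical: the current binding's callback receives the whole response at once, so a real implementation would need a per-token threadsafe callback):

```js
// Hypothetical shape: bridge a per-token callback into an async iterator.
async function* createTokenStream(llmodel, prompt, options = {}) {
    const queue = [];
    let finished = false;
    let wake = () => {};

    // Assumes raw_prompt calls back once per generated token and resolves when done.
    const run = llmodel
        .raw_prompt(prompt, options, (token) => { queue.push(token); wake(); })
        .then(() => { finished = true; wake(); });

    while (!finished || queue.length > 0) {
        if (queue.length === 0) {
            // Wait until the callback pushes another token or generation finishes.
            await new Promise((resolve) => { wake = resolve; });
        } else {
            yield queue.shift();
        }
    }
    await run;
}
```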
@ -1,45 +1,55 @@
|
||||
{
|
||||
"targets": [
|
||||
{
|
||||
"target_name": "gpt4allts", # gpt4all-ts will cause compile error
|
||||
"cflags!": [ "-fno-exceptions" ],
|
||||
"cflags_cc!": [ "-fno-exceptions" ],
|
||||
"target_name": "gpt4all", # gpt4all-ts will cause compile error
|
||||
"cflags_cc!": [ "-fno-exceptions"],
|
||||
"include_dirs": [
|
||||
"<!@(node -p \"require('node-addon-api').include\")",
|
||||
"../../gpt4all-backend/llama.cpp/", # need to include llama.cpp because the include paths for examples/common.h include llama.h relatively
|
||||
"../../gpt4all-backend",
|
||||
],
|
||||
"sources": [ # is there a better way to do this
|
||||
"../../gpt4all-backend/llama.cpp/examples/common.cpp",
|
||||
"../../gpt4all-backend/llama.cpp/ggml.c",
|
||||
"../../gpt4all-backend/llama.cpp/llama.cpp",
|
||||
"../../gpt4all-backend/utils.cpp",
|
||||
"sources": [
|
||||
# PREVIOUS VERSION: had to required the sources, but with newest changes do not need to
|
||||
#"../../gpt4all-backend/llama.cpp/examples/common.cpp",
|
||||
#"../../gpt4all-backend/llama.cpp/ggml.c",
|
||||
#"../../gpt4all-backend/llama.cpp/llama.cpp",
|
||||
# "../../gpt4all-backend/utils.cpp",
|
||||
"../../gpt4all-backend/llmodel_c.cpp",
|
||||
"../../gpt4all-backend/gptj.cpp",
|
||||
"../../gpt4all-backend/llamamodel.cpp",
|
||||
"../../gpt4all-backend/mpt.cpp",
|
||||
"stdcapture.cc",
|
||||
"../../gpt4all-backend/llmodel.cpp",
|
||||
"prompt.cc",
|
||||
"index.cc",
|
||||
],
|
||||
"conditions": [
|
||||
['OS=="mac"', {
|
||||
'defines': [
|
||||
'NAPI_CPP_EXCEPTIONS'
|
||||
],
|
||||
'LIB_FILE_EXT=".dylib"',
|
||||
'NAPI_CPP_EXCEPTIONS',
|
||||
]
|
||||
}],
|
||||
['OS=="win"', {
|
||||
'defines': [
|
||||
'LIB_FILE_EXT=".dll"',
|
||||
'NAPI_CPP_EXCEPTIONS',
|
||||
"__AVX2__" # allows SIMD: https://discord.com/channels/1076964370942267462/1092290790388150272/1107564673957630023
|
||||
],
|
||||
"msvs_settings": {
|
||||
"VCCLCompilerTool": {
|
||||
"AdditionalOptions": [
|
||||
"/std:c++20",
|
||||
"/EHsc"
|
||||
],
|
||||
},
|
||||
"/EHsc",
|
||||
],
|
||||
},
|
||||
},
|
||||
}],
|
||||
['OS=="linux"', {
|
||||
'defines': [
|
||||
'LIB_FILE_EXT=".so"',
|
||||
'NAPI_CPP_EXCEPTIONS',
|
||||
],
|
||||
'cflags_cc!': [
|
||||
'-fno-rtti',
|
||||
],
|
||||
'cflags_cc': [
|
||||
'-std=c++20'
|
||||
]
|
||||
}]
|
||||
]
|
||||
}]
|
||||
|
gpt4all-bindings/typescript/docs/api.md (new file, 623 lines)
@ -0,0 +1,623 @@
|
||||
<!-- Generated by documentation.js. Update this documentation by updating the source code. -->
|
||||
|
||||
### Table of Contents
|
||||
|
||||
* [download][1]
|
||||
* [Parameters][2]
|
||||
* [Examples][3]
|
||||
* [DownloadOptions][4]
|
||||
* [location][5]
|
||||
* [debug][6]
|
||||
* [url][7]
|
||||
* [DownloadController][8]
|
||||
* [cancel][9]
|
||||
* [promise][10]
|
||||
* [ModelType][11]
|
||||
* [ModelFile][12]
|
||||
* [gptj][13]
|
||||
* [llama][14]
|
||||
* [mpt][15]
|
||||
* [type][16]
|
||||
* [LLModel][17]
|
||||
* [constructor][18]
|
||||
* [Parameters][19]
|
||||
* [type][20]
|
||||
* [name][21]
|
||||
* [stateSize][22]
|
||||
* [threadCount][23]
|
||||
* [setThreadCount][24]
|
||||
* [Parameters][25]
|
||||
* [raw\_prompt][26]
|
||||
* [Parameters][27]
|
||||
* [isModelLoaded][28]
|
||||
* [setLibraryPath][29]
|
||||
* [Parameters][30]
|
||||
* [getLibraryPath][31]
|
||||
* [createCompletion][32]
|
||||
* [Parameters][33]
|
||||
* [Examples][34]
|
||||
* [CompletionOptions][35]
|
||||
* [verbose][36]
|
||||
* [hasDefaultHeader][37]
|
||||
* [hasDefaultFooter][38]
|
||||
* [PromptMessage][39]
|
||||
* [role][40]
|
||||
* [content][41]
|
||||
* [prompt\_tokens][42]
|
||||
* [completion\_tokens][43]
|
||||
* [total\_tokens][44]
|
||||
* [CompletionReturn][45]
|
||||
* [model][46]
|
||||
* [usage][47]
|
||||
* [choices][48]
|
||||
* [CompletionChoice][49]
|
||||
* [message][50]
|
||||
* [LLModelPromptContext][51]
|
||||
* [logits\_size][52]
|
||||
* [tokens\_size][53]
|
||||
* [n\_past][54]
|
||||
* [n\_ctx][55]
|
||||
* [n\_predict][56]
|
||||
* [top\_k][57]
|
||||
* [top\_p][58]
|
||||
* [temp][59]
|
||||
* [n\_batch][60]
|
||||
* [repeat\_penalty][61]
|
||||
* [repeat\_last\_n][62]
|
||||
* [context\_erase][63]
|
||||
* [createTokenStream][64]
|
||||
* [Parameters][65]
|
||||
* [DEFAULT\_DIRECTORY][66]
|
||||
* [DEFAULT\_LIBRARIES\_DIRECTORY][67]
|
||||
|
||||
## download
|
||||
|
||||
Initiates the download of a model file of a specific model type.
|
||||
By default this downloads without waiting. Use the returned controller to alter this behavior.
|
||||
|
||||
### Parameters
|
||||
|
||||
* `model` **[ModelFile][12]** The model file to be downloaded.
|
||||
* `options` **[DownloadOptions][4]** to pass into the downloader. Default is { location: (cwd), debug: false }.
|
||||
|
||||
### Examples
|
||||
|
||||
```javascript
|
||||
const controller = download('ggml-gpt4all-j-v1.3-groovy.bin')
|
||||
controller.promise().then(() => console.log('Downloaded!'))
|
||||
```
|
||||
|
||||
* Throws **[Error][68]** If the model already exists in the specified location.
|
||||
* Throws **[Error][68]** If the model cannot be found at the specified url.
|
||||
|
||||
Returns **[DownloadController][8]** object that allows controlling the download process.
|
||||
|
||||
## DownloadOptions
|
||||
|
||||
Options for the model download process.
|
||||
|
||||
### location
|
||||
|
||||
location to download the model.
|
||||
Default is process.cwd(), or the current working directory
|
||||
|
||||
Type: [string][69]
|
||||
|
||||
### debug
|
||||
|
||||
Debug mode -- reports how long the download took, in seconds.
|
||||
|
||||
Type: [boolean][70]
|
||||
|
||||
### url
|
||||
|
||||
Remote download url. Defaults to `https://gpt4all.io/models`
|
||||
|
||||
Type: [string][69]
|
||||
|
||||
## DownloadController
|
||||
|
||||
Model download controller.
|
||||
|
||||
### cancel
|
||||
|
||||
Cancels the download request to the gpt4all website when called.
|
||||
|
||||
Type: function (): void
|
||||
|
||||
### promise
|
||||
|
||||
Converts the download into a promise, allowing callers to await it and manage its lifetime.
|
||||
|
||||
Type: function (): [Promise][71]\<void>
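
For example (model name as in the download example above; whether a cancelled download rejects this promise is not specified here), the two members can be combined to give up after a timeout:

```javascript
const controller = download('ggml-gpt4all-j-v1.3-groovy.bin')
// Cancel if the download has not finished within ten minutes.
const timer = setTimeout(() => controller.cancel(), 10 * 60 * 1000)
await controller.promise()
clearTimeout(timer)
```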
|
||||
|
||||
## ModelType
|
||||
|
||||
Type of the model
|
||||
|
||||
Type: (`"gptj"` | `"llama"` | `"mpt"`)
|
||||
|
||||
## ModelFile
|
||||
|
||||
Full list of models available
|
||||
|
||||
### gptj
|
||||
|
||||
List of GPT-J Models
|
||||
|
||||
Type: (`"ggml-gpt4all-j-v1.3-groovy.bin"` | `"ggml-gpt4all-j-v1.2-jazzy.bin"` | `"ggml-gpt4all-j-v1.1-breezy.bin"` | `"ggml-gpt4all-j.bin"`)
|
||||
|
||||
### llama
|
||||
|
||||
List of LLaMA Models
|
||||
|
||||
Type: (`"ggml-gpt4all-l13b-snoozy.bin"` | `"ggml-vicuna-7b-1.1-q4_2.bin"` | `"ggml-vicuna-13b-1.1-q4_2.bin"` | `"ggml-wizardLM-7B.q4_2.bin"` | `"ggml-stable-vicuna-13B.q4_2.bin"` | `"ggml-nous-gpt4-vicuna-13b.bin"`)
|
||||
|
||||
### mpt
|
||||
|
||||
List of MPT Models
|
||||
|
||||
Type: (`"ggml-mpt-7b-base.bin"` | `"ggml-mpt-7b-chat.bin"` | `"ggml-mpt-7b-instruct.bin"`)
|
||||
|
||||
## type
|
||||
|
||||
Model architecture. This argument currently has no functionality and is only used as a descriptive identifier for the user.
|
||||
|
||||
Type: [ModelType][11]
|
||||
|
||||
## LLModel
|
||||
|
||||
LLModel class representing a language model.
|
||||
This is a base class that provides common functionality for different types of language models.
|
||||
|
||||
### constructor
|
||||
|
||||
Initialize a new LLModel.
|
||||
|
||||
#### Parameters
|
||||
|
||||
* `path` **[string][69]** Absolute path to the model file.
|
||||
|
||||
<!---->
|
||||
|
||||
* Throws **[Error][68]** If the model file does not exist.
|
||||
|
||||
### type
|
||||
|
||||
either 'gptj', 'mpt', or 'llama', or undefined
|
||||
|
||||
Returns **([ModelType][11] | [undefined][72])** 
|
||||
|
||||
### name
|
||||
|
||||
The name of the model.
|
||||
|
||||
Returns **[ModelFile][12]** 
|
||||
|
||||
### stateSize
|
||||
|
||||
Get the size of the internal state of the model.
|
||||
NOTE: This state data is specific to the type of model you have created.
|
||||
|
||||
Returns **[number][73]** the size in bytes of the internal state of the model
|
||||
|
||||
### threadCount
|
||||
|
||||
Get the number of threads used for model inference.
|
||||
The default is the number of physical cores your computer has.
|
||||
|
||||
Returns **[number][73]** The number of threads used for model inference.
|
||||
|
||||
### setThreadCount
|
||||
|
||||
Set the number of threads used for model inference.
|
||||
|
||||
#### Parameters
|
||||
|
||||
* `newNumber` **[number][73]** The new number of threads.
|
||||
|
||||
Returns **void** 
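
A minimal usage sketch (the `llmodel` instance is from the constructor above):

```javascript
// Drop to half of the currently reported threads, but never below one.
llmodel.setThreadCount(Math.max(1, Math.floor(llmodel.threadCount() / 2)))
```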
|
||||
|
||||
### raw\_prompt
|
||||
|
||||
Prompt the model with a given input and optional parameters.
|
||||
This is the raw output captured from stdout.
|
||||
Use the exported prompt function for a processed value.
|
||||
|
||||
#### Parameters
|
||||
|
||||
* `q` **[string][69]** The prompt input.
|
||||
* `params` **Partial<[LLModelPromptContext][51]>?** Optional parameters for the prompt context.
|
||||
|
||||
Returns **any** The result of the model prompt.
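
A hedged sketch of a call, inferred from the binding code elsewhere in this commit (index.cc and prompt.cc forward the prompt string, a prompt context, and a callback to a worker thread); prefer createCompletion for everyday use:

```javascript
// Inferred, not guaranteed: the callback receives the captured response text and the
// returned promise resolves once generation has finished.
await llmodel.raw_prompt(
    '### Human:\nWhat is 1 + 1?\n### Assistant:\n',
    { n_predict: 48, temp: 0.7 },
    (text) => process.stdout.write(text)
)
```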
|
||||
|
||||
### isModelLoaded
|
||||
|
||||
Whether the model is loaded or not.
|
||||
|
||||
Returns **[boolean][70]** 
|
||||
|
||||
### setLibraryPath
|
||||
|
||||
Where to search for the pluggable backend libraries
|
||||
|
||||
#### Parameters
|
||||
|
||||
* `s` **[string][69]** 
|
||||
|
||||
Returns **void** 
|
||||
|
||||
### getLibraryPath
|
||||
|
||||
Where to get the pluggable backend libraries
|
||||
|
||||
Returns **[string][69]** 
|
||||
|
||||
## createCompletion
|
||||
|
||||
The Node.js equivalent of the Python binding's chat\_completion.
|
||||
|
||||
### Parameters
|
||||
|
||||
* `llmodel` **[LLModel][17]** The language model object.
|
||||
* `messages` **[Array][74]<[PromptMessage][39]>** The array of messages for the conversation.
|
||||
* `options` **[CompletionOptions][35]** The options for creating the completion.
|
||||
|
||||
### Examples
|
||||
|
||||
```javascript
|
||||
const llmodel = new LLModel(model)
|
||||
const messages = [
|
||||
{ role: 'system', content: 'You are a weather forecaster.' },
|
||||
{ role: 'user', content: 'should i go out today?' } ]
|
||||
const completion = await createCompletion(llmodel, messages, {
|
||||
verbose: true,
|
||||
temp: 0.9,
|
||||
})
|
||||
console.log(completion.choices[0].message.content)
|
||||
// No, it's going to be cold and rainy.
|
||||
```
|
||||
|
||||
Returns **[CompletionReturn][45]** The completion result.
|
||||
|
||||
## CompletionOptions
|
||||
|
||||
**Extends Partial\<LLModelPromptContext>**
|
||||
|
||||
The options for creating the completion.
|
||||
|
||||
### verbose
|
||||
|
||||
Indicates if verbose logging is enabled.
|
||||
|
||||
Type: [boolean][70]
|
||||
|
||||
### hasDefaultHeader
|
||||
|
||||
Indicates if the default header is included in the prompt.
|
||||
|
||||
Type: [boolean][70]
|
||||
|
||||
### hasDefaultFooter
|
||||
|
||||
Indicates if the default footer is included in the prompt.
|
||||
|
||||
Type: [boolean][70]
|
||||
|
||||
## PromptMessage
|
||||
|
||||
A message in the conversation, identical to OpenAI's chat message.
|
||||
|
||||
### role
|
||||
|
||||
The role of the message.
|
||||
|
||||
Type: (`"system"` | `"assistant"` | `"user"`)
|
||||
|
||||
### content
|
||||
|
||||
The message content.
|
||||
|
||||
Type: [string][69]
|
||||
|
||||
## prompt\_tokens
|
||||
|
||||
The number of tokens used in the prompt.
|
||||
|
||||
Type: [number][73]
|
||||
|
||||
## completion\_tokens
|
||||
|
||||
The number of tokens used in the completion.
|
||||
|
||||
Type: [number][73]
|
||||
|
||||
## total\_tokens
|
||||
|
||||
The total number of tokens used.
|
||||
|
||||
Type: [number][73]
|
||||
|
||||
## CompletionReturn
|
||||
|
||||
The result of the completion, similar to OpenAI's format.
|
||||
|
||||
### model
|
||||
|
||||
The model name.
|
||||
|
||||
Type: [ModelFile][12]
|
||||
|
||||
### usage
|
||||
|
||||
Token usage report.
|
||||
|
||||
Type: {prompt\_tokens: [number][73], completion\_tokens: [number][73], total\_tokens: [number][73]}
|
||||
|
||||
### choices
|
||||
|
||||
The generated completions.
|
||||
|
||||
Type: [Array][74]<[CompletionChoice][49]>
|
||||
|
||||
## CompletionChoice
|
||||
|
||||
A completion choice, similar to OpenAI's format.
|
||||
|
||||
### message
|
||||
|
||||
Response message
|
||||
|
||||
Type: [PromptMessage][39]
|
||||
|
||||
## LLModelPromptContext
|
||||
|
||||
Model inference arguments for generating completions.
|
||||
|
||||
### logits\_size
|
||||
|
||||
The size of the raw logits vector.
|
||||
|
||||
Type: [number][73]
|
||||
|
||||
### tokens\_size
|
||||
|
||||
The size of the raw tokens vector.
|
||||
|
||||
Type: [number][73]
|
||||
|
||||
### n\_past
|
||||
|
||||
The number of tokens in the past conversation.
|
||||
|
||||
Type: [number][73]
|
||||
|
||||
### n\_ctx
|
||||
|
||||
The number of tokens possible in the context window.
|
||||
|
||||
Type: [number][73]
|
||||
|
||||
### n\_predict
|
||||
|
||||
The number of tokens to predict.
|
||||
|
||||
Type: [number][73]
|
||||
|
||||
### top\_k
|
||||
|
||||
The top-k logits to sample from.
|
||||
|
||||
Type: [number][73]
|
||||
|
||||
### top\_p
|
||||
|
||||
The nucleus sampling probability threshold.
|
||||
|
||||
Type: [number][73]
|
||||
|
||||
### temp
|
||||
|
||||
The temperature to adjust the model's output distribution.
|
||||
|
||||
Type: [number][73]
|
||||
|
||||
### n\_batch
|
||||
|
||||
The number of predictions to generate in parallel.
|
||||
|
||||
Type: [number][73]
|
||||
|
||||
### repeat\_penalty
|
||||
|
||||
The penalty factor for repeated tokens.
|
||||
|
||||
Type: [number][73]
|
||||
|
||||
### repeat\_last\_n
|
||||
|
||||
The number of last tokens to penalize.
|
||||
|
||||
Type: [number][73]
|
||||
|
||||
### context\_erase
|
||||
|
||||
The percentage of context to erase if the context window is exceeded.
|
||||
|
||||
Type: [number][73]
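
Because CompletionOptions extends Partial\<LLModelPromptContext>, any of these fields can be passed straight to createCompletion; a small sketch reusing the messages from the earlier example:

```javascript
const completion = await createCompletion(llmodel, messages, {
    // sampling/context knobs documented above, all optional
    n_predict: 128,
    top_k: 40,
    top_p: 0.9,
    temp: 0.7,
    repeat_penalty: 1.18
})
```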
|
||||
|
||||
## createTokenStream
|
||||
|
||||
TODO: Help wanted to implement this
|
||||
|
||||
### Parameters
|
||||
|
||||
* `llmodel` **[LLModel][17]** 
|
||||
* `messages` **[Array][74]<[PromptMessage][39]>** 
|
||||
* `options` **[CompletionOptions][35]** 
|
||||
|
||||
Returns **function (ll: [LLModel][17]): AsyncGenerator<[string][69]>** 
|
||||
|
||||
## DEFAULT\_DIRECTORY
|
||||
|
||||
From python api:
|
||||
models will be stored in `(homedir)/.cache/gpt4all/`
|
||||
|
||||
Type: [string][69]
|
||||
|
||||
## DEFAULT\_LIBRARIES\_DIRECTORY
|
||||
|
||||
From python api:
|
||||
The default path for dynamic libraries to be stored.
|
||||
You may separate paths by a semicolon to search in multiple areas.
|
||||
This searches DEFAULT\_DIRECTORY/libraries, cwd/libraries, and finally cwd.
|
||||
|
||||
Type: [string][69]
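
For example (the extra folder is illustrative), several search locations can be joined with semicolons and handed to the LLModel constructor:

```javascript
const model = new LLModel({
    model_name: 'ggml-vicuna-7b-1.1-q4_2.bin',
    model_path: './',
    // search the default library locations first, then a project-local folder
    library_path: DEFAULT_LIBRARIES_DIRECTORY + ';./backend-libs'
})
```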
|
||||
|
||||
[1]: #download
|
||||
|
||||
[2]: #parameters
|
||||
|
||||
[3]: #examples
|
||||
|
||||
[4]: #downloadoptions
|
||||
|
||||
[5]: #location
|
||||
|
||||
[6]: #debug
|
||||
|
||||
[7]: #url
|
||||
|
||||
[8]: #downloadcontroller
|
||||
|
||||
[9]: #cancel
|
||||
|
||||
[10]: #promise
|
||||
|
||||
[11]: #modeltype
|
||||
|
||||
[12]: #modelfile
|
||||
|
||||
[13]: #gptj
|
||||
|
||||
[14]: #llama
|
||||
|
||||
[15]: #mpt
|
||||
|
||||
[16]: #type
|
||||
|
||||
[17]: #llmodel
|
||||
|
||||
[18]: #constructor
|
||||
|
||||
[19]: #parameters-1
|
||||
|
||||
[20]: #type-1
|
||||
|
||||
[21]: #name
|
||||
|
||||
[22]: #statesize
|
||||
|
||||
[23]: #threadcount
|
||||
|
||||
[24]: #setthreadcount
|
||||
|
||||
[25]: #parameters-2
|
||||
|
||||
[26]: #raw_prompt
|
||||
|
||||
[27]: #parameters-3
|
||||
|
||||
[28]: #ismodelloaded
|
||||
|
||||
[29]: #setlibrarypath
|
||||
|
||||
[30]: #parameters-4
|
||||
|
||||
[31]: #getlibrarypath
|
||||
|
||||
[32]: #createcompletion
|
||||
|
||||
[33]: #parameters-5
|
||||
|
||||
[34]: #examples-1
|
||||
|
||||
[35]: #completionoptions
|
||||
|
||||
[36]: #verbose
|
||||
|
||||
[37]: #hasdefaultheader
|
||||
|
||||
[38]: #hasdefaultfooter
|
||||
|
||||
[39]: #promptmessage
|
||||
|
||||
[40]: #role
|
||||
|
||||
[41]: #content
|
||||
|
||||
[42]: #prompt_tokens
|
||||
|
||||
[43]: #completion_tokens
|
||||
|
||||
[44]: #total_tokens
|
||||
|
||||
[45]: #completionreturn
|
||||
|
||||
[46]: #model
|
||||
|
||||
[47]: #usage
|
||||
|
||||
[48]: #choices
|
||||
|
||||
[49]: #completionchoice
|
||||
|
||||
[50]: #message
|
||||
|
||||
[51]: #llmodelpromptcontext
|
||||
|
||||
[52]: #logits_size
|
||||
|
||||
[53]: #tokens_size
|
||||
|
||||
[54]: #n_past
|
||||
|
||||
[55]: #n_ctx
|
||||
|
||||
[56]: #n_predict
|
||||
|
||||
[57]: #top_k
|
||||
|
||||
[58]: #top_p
|
||||
|
||||
[59]: #temp
|
||||
|
||||
[60]: #n_batch
|
||||
|
||||
[61]: #repeat_penalty
|
||||
|
||||
[62]: #repeat_last_n
|
||||
|
||||
[63]: #context_erase
|
||||
|
||||
[64]: #createtokenstream
|
||||
|
||||
[65]: #parameters-6
|
||||
|
||||
[66]: #default_directory
|
||||
|
||||
[67]: #default_libraries_directory
|
||||
|
||||
[68]: https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Error
|
||||
|
||||
[69]: https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String
|
||||
|
||||
[70]: https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Boolean
|
||||
|
||||
[71]: https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Promise
|
||||
|
||||
[72]: https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/undefined
|
||||
|
||||
[73]: https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Number
|
||||
|
||||
[74]: https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Array
|
@ -1,68 +1,95 @@
|
||||
#include <napi.h>
|
||||
#include <iostream>
|
||||
#include "llmodel_c.h"
|
||||
#include "llmodel.h"
|
||||
#include "gptj.h"
|
||||
#include "llamamodel.h"
|
||||
#include "mpt.h"
|
||||
#include "stdcapture.h"
|
||||
#include "index.h"
|
||||
|
||||
class NodeModelWrapper : public Napi::ObjectWrap<NodeModelWrapper> {
|
||||
public:
|
||||
static Napi::Object Init(Napi::Env env, Napi::Object exports) {
|
||||
Napi::Function func = DefineClass(env, "LLModel", {
|
||||
InstanceMethod("type", &NodeModelWrapper::getType),
|
||||
InstanceMethod("name", &NodeModelWrapper::getName),
|
||||
InstanceMethod("stateSize", &NodeModelWrapper::StateSize),
|
||||
InstanceMethod("raw_prompt", &NodeModelWrapper::Prompt),
|
||||
InstanceMethod("setThreadCount", &NodeModelWrapper::SetThreadCount),
|
||||
InstanceMethod("threadCount", &NodeModelWrapper::ThreadCount),
|
||||
Napi::FunctionReference NodeModelWrapper::constructor;
|
||||
|
||||
Napi::Function NodeModelWrapper::GetClass(Napi::Env env) {
|
||||
Napi::Function self = DefineClass(env, "LLModel", {
|
||||
InstanceMethod("type", &NodeModelWrapper::getType),
|
||||
InstanceMethod("isModelLoaded", &NodeModelWrapper::IsModelLoaded),
|
||||
InstanceMethod("name", &NodeModelWrapper::getName),
|
||||
InstanceMethod("stateSize", &NodeModelWrapper::StateSize),
|
||||
InstanceMethod("raw_prompt", &NodeModelWrapper::Prompt),
|
||||
InstanceMethod("setThreadCount", &NodeModelWrapper::SetThreadCount),
|
||||
InstanceMethod("threadCount", &NodeModelWrapper::ThreadCount),
|
||||
InstanceMethod("getLibraryPath", &NodeModelWrapper::GetLibraryPath),
|
||||
});
|
||||
|
||||
Napi::FunctionReference* constructor = new Napi::FunctionReference();
|
||||
*constructor = Napi::Persistent(func);
|
||||
env.SetInstanceData(constructor);
|
||||
|
||||
exports.Set("LLModel", func);
|
||||
return exports;
|
||||
// Keep a static reference to the constructor
|
||||
//
|
||||
constructor = Napi::Persistent(self);
|
||||
constructor.SuppressDestruct();
|
||||
return self;
|
||||
}
|
||||
|
||||
Napi::Value getType(const Napi::CallbackInfo& info)
|
||||
|
||||
Napi::Value NodeModelWrapper::getType(const Napi::CallbackInfo& info)
|
||||
{
|
||||
if(type.empty()) {
|
||||
return info.Env().Undefined();
|
||||
}
|
||||
return Napi::String::New(info.Env(), type);
|
||||
}
|
||||
|
||||
NodeModelWrapper(const Napi::CallbackInfo& info) : Napi::ObjectWrap<NodeModelWrapper>(info)
|
||||
NodeModelWrapper::NodeModelWrapper(const Napi::CallbackInfo& info) : Napi::ObjectWrap<NodeModelWrapper>(info)
|
||||
{
|
||||
auto env = info.Env();
|
||||
std::string weights_path = info[0].As<Napi::String>().Utf8Value();
|
||||
fs::path model_path;
|
||||
|
||||
const char *c_weights_path = weights_path.c_str();
|
||||
|
||||
inference_ = create_model_set_type(c_weights_path);
|
||||
std::string full_weight_path;
|
||||
//todo
|
||||
std::string library_path = ".";
|
||||
std::string model_name;
|
||||
if(info[0].IsString()) {
|
||||
model_path = info[0].As<Napi::String>().Utf8Value();
|
||||
full_weight_path = model_path.string();
|
||||
std::cout << "DEPRECATION: constructor accepts object now. Check docs for more.\n";
|
||||
} else {
|
||||
auto config_object = info[0].As<Napi::Object>();
|
||||
model_name = config_object.Get("model_name").As<Napi::String>();
|
||||
model_path = config_object.Get("model_path").As<Napi::String>().Utf8Value();
|
||||
if(config_object.Has("model_type")) {
|
||||
type = config_object.Get("model_type").As<Napi::String>();
|
||||
}
|
||||
full_weight_path = (model_path / fs::path(model_name)).string();
|
||||
|
||||
if(config_object.Has("library_path")) {
|
||||
library_path = config_object.Get("library_path").As<Napi::String>();
|
||||
} else {
|
||||
library_path = ".";
|
||||
}
|
||||
}
|
||||
llmodel_set_implementation_search_path(library_path.c_str());
|
||||
llmodel_error* e = nullptr;
|
||||
inference_ = std::make_shared<llmodel_model>(llmodel_model_create2(full_weight_path.c_str(), "auto", e));
|
||||
if(e != nullptr) {
|
||||
Napi::Error::New(env, e->message).ThrowAsJavaScriptException();
|
||||
return;
|
||||
}
|
||||
if(GetInference() == nullptr) {
|
||||
std::cerr << "Tried searching libraries in \"" << library_path << "\"" << std::endl;
|
||||
std::cerr << "Tried searching for model weight in \"" << full_weight_path << "\"" << std::endl;
|
||||
Napi::Error::New(env, "Had an issue creating llmodel object, inference is null").ThrowAsJavaScriptException();
|
||||
return;
|
||||
}
|
||||
|
||||
auto success = llmodel_loadModel(inference_, c_weights_path);
|
||||
auto success = llmodel_loadModel(GetInference(), full_weight_path.c_str());
|
||||
if(!success) {
|
||||
Napi::Error::New(env, "Failed to load model at given path").ThrowAsJavaScriptException();
|
||||
return;
|
||||
}
|
||||
name = weights_path.substr(weights_path.find_last_of("/\\") + 1);
|
||||
|
||||
name = model_name.empty() ? model_path.filename().string() : model_name;
|
||||
};
|
||||
~NodeModelWrapper() {
|
||||
// destroying the model manually causes exit code 3221226505, why?
|
||||
// However, bindings seem to operate fine without destructing pointer
|
||||
//llmodel_model_destroy(inference_);
|
||||
//NodeModelWrapper::~NodeModelWrapper() {
|
||||
//GetInference().reset();
|
||||
//}
|
||||
|
||||
Napi::Value NodeModelWrapper::IsModelLoaded(const Napi::CallbackInfo& info) {
|
||||
return Napi::Boolean::New(info.Env(), llmodel_isModelLoaded(GetInference()));
|
||||
}
|
||||
|
||||
Napi::Value IsModelLoaded(const Napi::CallbackInfo& info) {
|
||||
return Napi::Boolean::New(info.Env(), llmodel_isModelLoaded(inference_));
|
||||
}
|
||||
|
||||
Napi::Value StateSize(const Napi::CallbackInfo& info) {
|
||||
Napi::Value NodeModelWrapper::StateSize(const Napi::CallbackInfo& info) {
|
||||
// Implement the binding for the stateSize method
|
||||
return Napi::Number::New(info.Env(), static_cast<int64_t>(llmodel_get_state_size(inference_)));
|
||||
return Napi::Number::New(info.Env(), static_cast<int64_t>(llmodel_get_state_size(GetInference())));
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Generate a response using the model.
|
||||
@ -73,16 +100,14 @@ public:
|
||||
* @param recalculate_callback A callback function for handling recalculation requests.
|
||||
* @param ctx A pointer to the llmodel_prompt_context structure.
|
||||
*/
|
||||
Napi::Value Prompt(const Napi::CallbackInfo& info) {
|
||||
|
||||
Napi::Value NodeModelWrapper::Prompt(const Napi::CallbackInfo& info) {
|
||||
auto env = info.Env();
|
||||
|
||||
std::string question;
|
||||
if(info[0].IsString()) {
|
||||
question = info[0].As<Napi::String>().Utf8Value();
|
||||
} else {
|
||||
Napi::Error::New(env, "invalid string argument").ThrowAsJavaScriptException();
|
||||
return env.Undefined();
|
||||
Napi::Error::New(info.Env(), "invalid string argument").ThrowAsJavaScriptException();
|
||||
return info.Env().Undefined();
|
||||
}
|
||||
//defaults copied from python bindings
|
||||
llmodel_prompt_context promptContext = {
|
||||
@ -101,127 +126,90 @@ public:
|
||||
};
|
||||
if(info[1].IsObject())
|
||||
{
|
||||
auto inputObject = info[1].As<Napi::Object>();
|
||||
auto inputObject = info[1].As<Napi::Object>();
|
||||
|
||||
// Extract and assign the properties
|
||||
if (inputObject.Has("logits") || inputObject.Has("tokens")) {
|
||||
Napi::Error::New(env, "Invalid input: 'logits' or 'tokens' properties are not allowed").ThrowAsJavaScriptException();
|
||||
return env.Undefined();
|
||||
}
|
||||
if (inputObject.Has("logits") || inputObject.Has("tokens")) {
|
||||
Napi::Error::New(info.Env(), "Invalid input: 'logits' or 'tokens' properties are not allowed").ThrowAsJavaScriptException();
|
||||
return info.Env().Undefined();
|
||||
}
|
||||
// Assign the remaining properties
|
||||
if(inputObject.Has("n_past")) {
|
||||
promptContext.n_past = inputObject.Get("n_past").As<Napi::Number>().Int32Value();
|
||||
}
|
||||
if(inputObject.Has("n_ctx")) {
|
||||
promptContext.n_ctx = inputObject.Get("n_ctx").As<Napi::Number>().Int32Value();
|
||||
}
|
||||
if(inputObject.Has("n_predict")) {
|
||||
promptContext.n_predict = inputObject.Get("n_predict").As<Napi::Number>().Int32Value();
|
||||
}
|
||||
if(inputObject.Has("top_k")) {
|
||||
promptContext.top_k = inputObject.Get("top_k").As<Napi::Number>().Int32Value();
|
||||
}
|
||||
if(inputObject.Has("top_p")) {
|
||||
promptContext.top_p = inputObject.Get("top_p").As<Napi::Number>().FloatValue();
|
||||
}
|
||||
if(inputObject.Has("temp")) {
|
||||
promptContext.temp = inputObject.Get("temp").As<Napi::Number>().FloatValue();
|
||||
}
|
||||
if(inputObject.Has("n_batch")) {
|
||||
promptContext.n_batch = inputObject.Get("n_batch").As<Napi::Number>().Int32Value();
|
||||
}
|
||||
if(inputObject.Has("repeat_penalty")) {
|
||||
promptContext.repeat_penalty = inputObject.Get("repeat_penalty").As<Napi::Number>().FloatValue();
|
||||
}
|
||||
if(inputObject.Has("repeat_last_n")) {
|
||||
promptContext.repeat_last_n = inputObject.Get("repeat_last_n").As<Napi::Number>().Int32Value();
|
||||
}
|
||||
if(inputObject.Has("context_erase")) {
|
||||
promptContext.context_erase = inputObject.Get("context_erase").As<Napi::Number>().FloatValue();
|
||||
}
|
||||
if(inputObject.Has("n_past"))
|
||||
promptContext.n_past = inputObject.Get("n_past").As<Napi::Number>().Int32Value();
|
||||
if(inputObject.Has("n_ctx"))
|
||||
promptContext.n_ctx = inputObject.Get("n_ctx").As<Napi::Number>().Int32Value();
|
||||
if(inputObject.Has("n_predict"))
|
||||
promptContext.n_predict = inputObject.Get("n_predict").As<Napi::Number>().Int32Value();
|
||||
if(inputObject.Has("top_k"))
|
||||
promptContext.top_k = inputObject.Get("top_k").As<Napi::Number>().Int32Value();
|
||||
if(inputObject.Has("top_p"))
|
||||
promptContext.top_p = inputObject.Get("top_p").As<Napi::Number>().FloatValue();
|
||||
if(inputObject.Has("temp"))
|
||||
promptContext.temp = inputObject.Get("temp").As<Napi::Number>().FloatValue();
|
||||
if(inputObject.Has("n_batch"))
|
||||
promptContext.n_batch = inputObject.Get("n_batch").As<Napi::Number>().Int32Value();
|
||||
if(inputObject.Has("repeat_penalty"))
|
||||
promptContext.repeat_penalty = inputObject.Get("repeat_penalty").As<Napi::Number>().FloatValue();
|
||||
if(inputObject.Has("repeat_last_n"))
|
||||
promptContext.repeat_last_n = inputObject.Get("repeat_last_n").As<Napi::Number>().Int32Value();
|
||||
if(inputObject.Has("context_erase"))
|
||||
promptContext.context_erase = inputObject.Get("context_erase").As<Napi::Number>().FloatValue();
|
||||
}
|
||||
// custom callbacks are weird with the gpt4all c bindings: I need to turn Napi::Functions into raw c function pointers,
|
||||
// but it doesn't seem like its possible? (TODO, is it possible?)
|
||||
//copy to protect llmodel resources when splitting to new thread
|
||||
|
||||
// if(info[1].IsFunction()) {
|
||||
// Napi::Callback cb = *info[1].As<Napi::Function>();
|
||||
// }
|
||||
|
||||
|
||||
// For now, simple capture of stdout
|
||||
// possible TODO: put this on a libuv async thread. (AsyncWorker)
|
||||
CoutRedirect cr;
|
||||
llmodel_prompt(inference_, question.c_str(), &prompt_callback, &response_callback, &recalculate_callback, &promptContext);
|
||||
return Napi::String::New(env, cr.getString());
|
||||
llmodel_prompt_context copiedPrompt = promptContext;
|
||||
std::string copiedQuestion = question;
|
||||
PromptWorkContext pc = {
|
||||
copiedQuestion,
|
||||
inference_.load(),
|
||||
copiedPrompt,
|
||||
};
|
||||
auto threadSafeContext = new TsfnContext(env, pc);
|
||||
threadSafeContext->tsfn = Napi::ThreadSafeFunction::New(
|
||||
env, // Environment
|
||||
info[2].As<Napi::Function>(), // JS function from caller
|
||||
"PromptCallback", // Resource name
|
||||
0, // Max queue size (0 = unlimited).
|
||||
1, // Initial thread count
|
||||
threadSafeContext, // Context,
|
||||
FinalizerCallback, // Finalizer
|
||||
(void*)nullptr // Finalizer data
|
||||
);
|
||||
threadSafeContext->nativeThread = std::thread(threadEntry, threadSafeContext);
|
||||
return threadSafeContext->deferred_.Promise();
|
||||
}
|
||||
|
||||
void SetThreadCount(const Napi::CallbackInfo& info) {
|
||||
void NodeModelWrapper::SetThreadCount(const Napi::CallbackInfo& info) {
|
||||
if(info[0].IsNumber()) {
|
||||
llmodel_setThreadCount(inference_, info[0].As<Napi::Number>().Int64Value());
|
||||
llmodel_setThreadCount(GetInference(), info[0].As<Napi::Number>().Int64Value());
|
||||
} else {
|
||||
Napi::Error::New(info.Env(), "Could not set thread count: argument 1 is NaN").ThrowAsJavaScriptException();
|
||||
return;
|
||||
}
|
||||
}
|
||||
Napi::Value getName(const Napi::CallbackInfo& info) {
|
||||
|
||||
Napi::Value NodeModelWrapper::getName(const Napi::CallbackInfo& info) {
|
||||
return Napi::String::New(info.Env(), name);
|
||||
}
|
||||
Napi::Value ThreadCount(const Napi::CallbackInfo& info) {
|
||||
return Napi::Number::New(info.Env(), llmodel_threadCount(inference_));
|
||||
Napi::Value NodeModelWrapper::ThreadCount(const Napi::CallbackInfo& info) {
|
||||
return Napi::Number::New(info.Env(), llmodel_threadCount(GetInference()));
|
||||
}
|
||||
|
||||
private:
|
||||
llmodel_model inference_;
|
||||
std::string type;
|
||||
std::string name;
|
||||
|
||||
|
||||
//wrapper cb to capture output into stdout.then, CoutRedirect captures this
|
||||
// and writes it to a file
|
||||
static bool response_callback(int32_t tid, const char* resp)
|
||||
{
|
||||
if(tid != -1) {
|
||||
std::cout<<std::string(resp);
|
||||
return true;
|
||||
}
|
||||
return false;
|
||||
Napi::Value NodeModelWrapper::GetLibraryPath(const Napi::CallbackInfo& info) {
|
||||
return Napi::String::New(info.Env(),
|
||||
llmodel_get_implementation_search_path());
|
||||
}
|
||||
|
||||
static bool prompt_callback(int32_t tid) { return true; }
|
||||
static bool recalculate_callback(bool isrecalculating) { return isrecalculating; }
|
||||
// Had to use this instead of the c library in order
|
||||
// set the type of the model loaded.
|
||||
// causes side effect: type is mutated;
|
||||
llmodel_model create_model_set_type(const char* c_weights_path)
|
||||
{
|
||||
|
||||
uint32_t magic;
|
||||
llmodel_model model;
|
||||
FILE *f = fopen(c_weights_path, "rb");
|
||||
fread(&magic, sizeof(magic), 1, f);
|
||||
|
||||
if (magic == 0x67676d6c) {
|
||||
model = llmodel_gptj_create();
|
||||
type = "gptj";
|
||||
}
|
||||
else if (magic == 0x67676a74) {
|
||||
model = llmodel_llama_create();
|
||||
type = "llama";
|
||||
}
|
||||
else if (magic == 0x67676d6d) {
|
||||
model = llmodel_mpt_create();
|
||||
type = "mpt";
|
||||
}
|
||||
else {fprintf(stderr, "Invalid model file\n");}
|
||||
fclose(f);
|
||||
|
||||
return model;
|
||||
llmodel_model NodeModelWrapper::GetInference() {
|
||||
return *inference_.load();
|
||||
}
|
||||
};
|
||||
|
||||
//Exports Bindings
|
||||
Napi::Object Init(Napi::Env env, Napi::Object exports) {
|
||||
return NodeModelWrapper::Init(env, exports);
|
||||
exports["LLModel"] = NodeModelWrapper::GetClass(env);
|
||||
return exports;
|
||||
}
|
||||
|
||||
|
||||
|
||||
NODE_API_MODULE(NODE_GYP_MODULE_NAME, Init)
|
||||
|
gpt4all-bindings/typescript/index.h (new file, 45 lines)
@ -0,0 +1,45 @@
|
||||
#include <napi.h>
|
||||
#include "llmodel.h"
|
||||
#include <iostream>
|
||||
#include "llmodel_c.h"
|
||||
#include "prompt.h"
|
||||
#include <atomic>
|
||||
#include <memory>
|
||||
#include <filesystem>
|
||||
namespace fs = std::filesystem;
|
||||
|
||||
class NodeModelWrapper: public Napi::ObjectWrap<NodeModelWrapper> {
|
||||
public:
|
||||
NodeModelWrapper(const Napi::CallbackInfo &);
|
||||
//~NodeModelWrapper();
|
||||
Napi::Value getType(const Napi::CallbackInfo& info);
|
||||
Napi::Value IsModelLoaded(const Napi::CallbackInfo& info);
|
||||
Napi::Value StateSize(const Napi::CallbackInfo& info);
|
||||
/**
|
||||
* Prompting the model. This entails spawning a new thread and adding the response tokens
|
||||
* into a thread local string variable.
|
||||
*/
|
||||
Napi::Value Prompt(const Napi::CallbackInfo& info);
|
||||
void SetThreadCount(const Napi::CallbackInfo& info);
|
||||
Napi::Value getName(const Napi::CallbackInfo& info);
|
||||
Napi::Value ThreadCount(const Napi::CallbackInfo& info);
|
||||
/*
|
||||
* The path that is used to search for the dynamic libraries
|
||||
*/
|
||||
Napi::Value GetLibraryPath(const Napi::CallbackInfo& info);
|
||||
/**
|
||||
* Creates the LLModel class
|
||||
*/
|
||||
static Napi::Function GetClass(Napi::Env);
|
||||
llmodel_model GetInference();
|
||||
private:
|
||||
/**
|
||||
* The underlying inference that interfaces with the C interface
|
||||
*/
|
||||
std::atomic<std::shared_ptr<llmodel_model>> inference_;
|
||||
|
||||
std::string type;
|
||||
// corresponds to LLModel::name() in typescript
|
||||
std::string name;
|
||||
static Napi::FunctionReference constructor;
|
||||
};
|
@@ -1,19 +1,32 @@
{
    "name": "gpt4all-ts",
    "name": "gpt4all",
    "version": "2.0.0",
    "packageManager": "yarn@3.5.1",
    "gypfile": true,
    "main": "src/gpt4all.js",
    "repository": "nomic-ai/gpt4all",
    "scripts": {
        "test": "node ./test/index.mjs"
        "test": "node ./test/index.mjs",
        "build:backend": "node scripts/build.js",
        "install": "node-gyp-build",
        "prebuild": "node scripts/prebuild.js",
        "docs:build": "documentation build ./src/gpt4all.d.ts --parse-extension d.ts --format md --output docs/api.md"
    },
    "dependencies": {
        "bindings": "^1.5.0",
        "node-addon-api": "^6.1.0"
        "mkdirp": "^3.0.1",
        "node-addon-api": "^6.1.0",
        "node-gyp-build": "^4.6.0"
    },
    "devDependencies": {
        "@types/node": "^20.1.5"
        "@types/node": "^20.1.5",
        "documentation": "^14.0.2",
        "prebuildify": "^5.0.1",
        "prettier": "^2.8.8"
    },
    "engines": {
        "node": ">= 18.x.x"
    },
    "prettier": {
        "endOfLine": "lf",
        "tabWidth": 4
    }

}

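Worth noting in the scripts block: "install" now defers to node-gyp-build, which is what lets consumers pick up a prebuildify-generated binary for their platform or fall back to a local compile. A small sketch of the lookup the runtime performs (the same call appears in src/gpt4all.js later in this diff):

    // How the native addon is resolved at require time (sketch).
    const path = require("node:path");
    // Resolves either a locally compiled build/Release binary or a matching
    // entry under prebuilds/<platform>-<arch>/.
    const { LLModel } = require("node-gyp-build")(path.resolve(__dirname, ".."));
    console.log(typeof LLModel); // "function" once a native build is found
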
gpt4all-bindings/typescript/prompt.cc  (new file, 62 lines)
@@ -0,0 +1,62 @@
#include "prompt.h"


TsfnContext::TsfnContext(Napi::Env env, const PromptWorkContext& pc)
    : deferred_(Napi::Promise::Deferred::New(env)), pc(pc) {
}

std::mutex mtx;
static thread_local std::string res;
bool response_callback(int32_t token_id, const char *response) {
    res += response;
    return token_id != -1;
}
bool recalculate_callback (bool isrecalculating) {
    return isrecalculating;
};
bool prompt_callback (int32_t tid) {
    return true;
};

// The thread entry point. This takes as its arguments the specific
// threadsafe-function context created inside the main thread.
void threadEntry(TsfnContext* context) {
    std::lock_guard<std::mutex> lock(mtx);
    // Perform a call into JavaScript.
    napi_status status =
        context->tsfn.NonBlockingCall(&context->pc,
        [](Napi::Env env, Napi::Function jsCallback, PromptWorkContext* pc) {
            llmodel_prompt(
                *pc->inference_,
                pc->question.c_str(),
                &prompt_callback,
                &response_callback,
                &recalculate_callback,
                &pc->prompt_params
            );
            jsCallback.Call({ Napi::String::New(env, res) });
            res.clear();
        });

    if (status != napi_ok) {
        Napi::Error::Fatal(
            "ThreadEntry",
            "Napi::ThreadSafeNapi::Function.NonBlockingCall() failed");
    }

    // Release the thread-safe function. This decrements the internal thread
    // count, and will perform finalization since the count will reach 0.
    context->tsfn.Release();
}

void FinalizerCallback(Napi::Env env,
                       void* finalizeData,
                       TsfnContext* context) {
    // Join the thread
    context->nativeThread.join();
    // Resolve the Promise previously returned to JS via the CreateTSFN method.
    context->deferred_.Resolve(Napi::Boolean::New(env, true));
    delete context;
}

gpt4all-bindings/typescript/prompt.h  (new file, 42 lines)
@@ -0,0 +1,42 @@
#ifndef TSFN_CONTEXT_H
#define TSFN_CONTEXT_H

#include "napi.h"
#include "llmodel_c.h"
#include <thread>
#include <mutex>
#include <iostream>
#include <atomic>
#include <memory>
struct PromptWorkContext {
    std::string question;
    std::shared_ptr<llmodel_model> inference_;
    llmodel_prompt_context prompt_params;
};

struct TsfnContext {
public:
    TsfnContext(Napi::Env env, const PromptWorkContext &pc);
    std::thread nativeThread;
    Napi::Promise::Deferred deferred_;
    PromptWorkContext pc;
    Napi::ThreadSafeFunction tsfn;

    // Some data to pass around
    // int ints[ARRAY_LENGTH];

};

// The thread entry point. This takes as its arguments the specific
// threadsafe-function context created inside the main thread.
void threadEntry(TsfnContext* context);

// The thread-safe function finalizer callback. This callback executes
// at destruction of thread-safe function, taking as arguments the finalizer
// data and threadsafe-function context.
void FinalizerCallback(Napi::Env env, void* finalizeData, TsfnContext* context);

bool response_callback(int32_t token_id, const char *response);
bool recalculate_callback (bool isrecalculating);
bool prompt_callback (int32_t tid);
#endif // TSFN_CONTEXT_H
gpt4all-bindings/typescript/scripts/build.js  (new file, 17 lines)
@@ -0,0 +1,17 @@
const { spawn } = require("node:child_process");
const { resolve } = require("path");
const args = process.argv.slice(2);
const platform = process.platform;

//windows 64bit or 32
if (platform === "win32") {
    const path = "scripts/build_msvc.bat";
    spawn(resolve(path), ["/Y", ...args], { shell: true, stdio: "inherit" });
    process.on("data", (s) => console.log(s.toString()));
} else if (platform === "linux" || platform === "darwin") {
    const path = "scripts/build_unix.sh";
    const bash = spawn(`sh`, [path, ...args]);
    bash.stdout.on("data", (s) => console.log(s.toString()), {
        stdio: "inherit",
    });
}

gpt4all-bindings/typescript/scripts/build_mingw.ps1  (new file, 16 lines)
@@ -0,0 +1,16 @@
$ROOT_DIR = '.\runtimes\win-x64'
$BUILD_DIR = '.\runtimes\win-x64\build\mingw'
$LIBS_DIR = '.\runtimes\win-x64\native'

# cleanup env
Remove-Item -Force -Recurse $ROOT_DIR -ErrorAction SilentlyContinue | Out-Null
mkdir $BUILD_DIR | Out-Null
mkdir $LIBS_DIR | Out-Null

# build
cmake -G "MinGW Makefiles" -S ..\..\gpt4all-backend -B $BUILD_DIR -DLLAMA_AVX2=ON
cmake --build $BUILD_DIR --parallel --config Release

# copy native dlls
# cp "C:\ProgramData\chocolatey\lib\mingw\tools\install\mingw64\bin\*dll" $LIBS_DIR
cp "$BUILD_DIR\bin\*.dll" $LIBS_DIR
gpt4all-bindings/typescript/scripts/build_unix.sh  (new file, 31 lines)
@@ -0,0 +1,31 @@
#!/bin/sh

SYSNAME=$(uname -s)

if [ "$SYSNAME" = "Linux" ]; then
    BASE_DIR="runtimes/linux-x64"
    LIB_EXT="so"
elif [ "$SYSNAME" = "Darwin" ]; then
    BASE_DIR="runtimes/osx"
    LIB_EXT="dylib"
elif [ -n "$SYSNAME" ]; then
    echo "Unsupported system: $SYSNAME" >&2
    exit 1
else
    echo "\"uname -s\" failed" >&2
    exit 1
fi

NATIVE_DIR="$BASE_DIR/native"
BUILD_DIR="$BASE_DIR/build"

rm -rf "$BASE_DIR"
mkdir -p "$NATIVE_DIR" "$BUILD_DIR"

cmake -S ../../gpt4all-backend -B "$BUILD_DIR" &&
cmake --build "$BUILD_DIR" -j --config Release && {
    cp "$BUILD_DIR"/libllmodel.$LIB_EXT "$NATIVE_DIR"/
    cp "$BUILD_DIR"/libgptj*.$LIB_EXT "$NATIVE_DIR"/
    cp "$BUILD_DIR"/libllama*.$LIB_EXT "$NATIVE_DIR"/
    cp "$BUILD_DIR"/libmpt*.$LIB_EXT "$NATIVE_DIR"/
}

gpt4all-bindings/typescript/scripts/prebuild.js  (new file, 50 lines)
@@ -0,0 +1,50 @@
const prebuildify = require("prebuildify");

async function createPrebuilds(combinations) {
    for (const { platform, arch } of combinations) {
        const opts = {
            platform,
            arch,
            napi: true,
        };
        try {
            await createPrebuild(opts);
            console.log(
                `Build succeeded for platform ${opts.platform} and architecture ${opts.arch}`
            );
        } catch (err) {
            console.error(
                `Error building for platform ${opts.platform} and architecture ${opts.arch}:`,
                err
            );
        }
    }
}

function createPrebuild(opts) {
    return new Promise((resolve, reject) => {
        prebuildify(opts, (err) => {
            if (err) {
                reject(err);
            } else {
                resolve();
            }
        });
    });
}

const prebuildConfigs = [
    { platform: "win32", arch: "x64" },
    { platform: "win32", arch: "arm64" },
    // { platform: 'win32', arch: 'armv7' },
    { platform: "darwin", arch: "x64" },
    { platform: "darwin", arch: "arm64" },
    // { platform: 'darwin', arch: 'armv7' },
    { platform: "linux", arch: "x64" },
    { platform: "linux", arch: "arm64" },
    { platform: "linux", arch: "armv7" },
];

createPrebuilds(prebuildConfigs)
    .then(() => console.log("All builds succeeded"))
    .catch((err) => console.error("Error building:", err));

@@ -1,14 +1,15 @@
import { LLModel, prompt, createCompletion } from '../src/gpt4all.js'

const ll = new LLModel("./ggml-vicuna-7b-1.1-q4_2.bin");
import { LLModel, createCompletion, DEFAULT_DIRECTORY, DEFAULT_LIBRARIES_DIRECTORY } from '../src/gpt4all.js'

const ll = new LLModel({
    model_name: 'ggml-vicuna-7b-1.1-q4_2.bin',
    model_path: './',
    library_path: DEFAULT_LIBRARIES_DIRECTORY
});

try {
    class Extended extends LLModel {

    }

} catch(e) {
    console.log("Extending from native class gone wrong " + e)
}
@@ -20,13 +21,26 @@ ll.setThreadCount(5);
console.log("thread count " + ll.threadCount());
ll.setThreadCount(4);
console.log("thread count " + ll.threadCount());
console.log("name " + ll.name());
console.log("type: " + ll.type());
console.log("Default directory for models", DEFAULT_DIRECTORY);
console.log("Default directory for libraries", DEFAULT_LIBRARIES_DIRECTORY);


console.log(createCompletion(
console.log(await createCompletion(
    ll,
    prompt`${"header"} ${"prompt"}`, {
        verbose: true,
        prompt: 'hello! Say something thought provoking.'
    }
    [
        { role : 'system', content: 'You are a girl who likes playing league of legends.' },
        { role : 'user', content: 'What is the best top laner to play right now?' },
    ],
    { verbose: false }
));


console.log(await createCompletion(
    ll,
    [
        { role : 'user', content: 'What is the best bottom laner to play right now?' },
    ],
))

gpt4all-bindings/typescript/src/config.js  (new file, 22 lines)
@@ -0,0 +1,22 @@
const os = require("node:os");
const path = require("node:path");

const DEFAULT_DIRECTORY = path.resolve(os.homedir(), ".cache/gpt4all");

const librarySearchPaths = [
    path.join(DEFAULT_DIRECTORY, "libraries"),
    path.resolve("./libraries"),
    path.resolve(
        __dirname,
        "..",
        `runtimes/${process.platform}-${process.arch}/native`
    ),
    process.cwd(),
];

const DEFAULT_LIBRARIES_DIRECTORY = librarySearchPaths.join(";");

module.exports = {
    DEFAULT_DIRECTORY,
    DEFAULT_LIBRARIES_DIRECTORY,
};

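DEFAULT_LIBRARIES_DIRECTORY is therefore not a single directory but a semicolon-separated search list. A short sketch of resolving it to the first path that actually exists, which is the same walk loadModel performs in src/gpt4all.js further down in this diff:

    const { existsSync } = require("node:fs");
    const { DEFAULT_LIBRARIES_DIRECTORY } = require("./config.js");

    // First entry that exists on disk wins; null means no backend libraries were found.
    const libPath =
        DEFAULT_LIBRARIES_DIRECTORY.split(";").find((p) => existsSync(p)) ?? null;
    console.log("native libraries resolved to:", libPath);
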
gpt4all-bindings/typescript/src/gpt4all.d.ts  (vendored, 382 lines changed)
@@ -1,162 +1,310 @@
/// <reference types="node" />
declare module 'gpt4all-ts';
declare module "gpt4all";

export * from "./util.d.ts";

/** Type of the model */
type ModelType = "gptj" | "llama" | "mpt" | "replit";


interface LLModelPromptContext {

    // Size of the raw logits vector
    logits_size: number;

    // Size of the raw tokens vector
    tokens_size: number;

    // Number of tokens in past conversation
    n_past: number;

    // Number of tokens possible in context window
    n_ctx: number;

    // Number of tokens to predict
    n_predict: number;

    // Top k logits to sample from
    top_k: number;

    // Nucleus sampling probability threshold
    top_p: number;

    // Temperature to adjust model's output distribution
    temp: number;

    // Number of predictions to generate in parallel
    n_batch: number;

    // Penalty factor for repeated tokens
    repeat_penalty: number;

    // Last n tokens to penalize
    repeat_last_n: number;

    // Percent of context to erase if we exceed the context window
    context_erase: number;
/**
 * Full list of models available
 */
interface ModelFile {
    /** List of GPT-J Models */
    gptj:
        | "ggml-gpt4all-j-v1.3-groovy.bin"
        | "ggml-gpt4all-j-v1.2-jazzy.bin"
        | "ggml-gpt4all-j-v1.1-breezy.bin"
        | "ggml-gpt4all-j.bin";
    /** List Llama Models */
    llama:
        | "ggml-gpt4all-l13b-snoozy.bin"
        | "ggml-vicuna-7b-1.1-q4_2.bin"
        | "ggml-vicuna-13b-1.1-q4_2.bin"
        | "ggml-wizardLM-7B.q4_2.bin"
        | "ggml-stable-vicuna-13B.q4_2.bin"
        | "ggml-nous-gpt4-vicuna-13b.bin"
        | "ggml-v3-13b-hermes-q5_1.bin";
    /** List of MPT Models */
    mpt:
        | "ggml-mpt-7b-base.bin"
        | "ggml-mpt-7b-chat.bin"
        | "ggml-mpt-7b-instruct.bin";
    /** List of Replit Models */
    replit: "ggml-replit-code-v1-3b.bin";
}


//mirrors py options
interface LLModelOptions {
    /**
     * Model architecture. This argument currently does not have any functionality and is just used as descriptive identifier for user.
     */
    type?: ModelType;
    model_name: ModelFile[ModelType];
    model_path: string;
    library_path?: string;
}
/**
 * LLModel class representing a language model.
 * This is a base class that provides common functionality for different types of language models.
 */
declare class LLModel {
    //either 'gpt', mpt', or 'llama'
    type() : ModelType;
    //The name of the model
    name(): ModelFile;
    /**
     * Initialize a new LLModel.
     * @param path Absolute path to the model file.
     * @throws {Error} If the model file does not exist.
     */
    constructor(path: string);
    constructor(options: LLModelOptions);

    /** either 'gpt', mpt', or 'llama' or undefined */
    type(): ModelType | undefined;

    /** The name of the model. */
    name(): ModelFile;

    /**
     * Get the size of the internal state of the model.
     * NOTE: This state data is specific to the type of model you have created.
     * @return the size in bytes of the internal state of the model
     */
    stateSize(): number;

    /**
     * Get the number of threads used for model inference.
     * The default is the number of physical cores your computer has.
     * @returns The number of threads used for model inference.
     */
    threadCount() : number;
    threadCount(): number;

    /**
     * Set the number of threads used for model inference.
     * @param newNumber The new number of threads.
     */
    setThreadCount(newNumber: number): void;
    /**
     * Prompt the model with a given input and optional parameters.
     * This is the raw output from std out.
     * Use the prompt function exported for a value
     * @param q The prompt input.
     * @param params Optional parameters for the prompt context.
     * @returns The result of the model prompt.
     */
    raw_prompt(q: string, params?: Partial<LLModelPromptContext>) : unknown; //todo work on return type

}

interface DownloadController {
    //Cancel the request to download from gpt4all website if this is called.
    cancel: () => void;
    //Convert the downloader into a promise, allowing people to await and manage its lifetime
    promise: () => Promise<void>
}


export interface DownloadConfig {
    /**
     * location to download the model.
     * Default is process.cwd(), or the current working directory
     * Prompt the model with a given input and optional parameters.
     * This is the raw output from std out.
     * Use the prompt function exported for a value
     * @param q The prompt input.
     * @param params Optional parameters for the prompt context.
     * @returns The result of the model prompt.
     */
    location: string;
    raw_prompt(q: string, params: Partial<LLModelPromptContext>, callback: (res: string) => void): void; // TODO work on return type

    /**
     * Debug mode -- check how long it took to download in seconds
     * Whether the model is loaded or not.
     */
    debug: boolean;
    isModelLoaded(): boolean;

    /**
     * Default link = https://gpt4all.io/models`
     * This property overrides the default.
     * Where to search for the pluggable backend libraries
     */
    link?: string
}
/**
 * Initiates the download of a model file of a specific model type.
 * By default this downloads without waiting. use the controller returned to alter this behavior.
 * @param {ModelFile[ModelType]} m - The model file to be downloaded.
 * @param {Record<string, unknown>} op - options to pass into the downloader. Default is { location: (cwd), debug: false }.
 * @returns {DownloadController} A DownloadController object that allows controlling the download process.
 */
declare function download(m: ModelFile[ModelType], op: { location: string, debug: boolean, link?:string }): DownloadController


type ModelType = 'gptj' | 'llama' | 'mpt';

/*
 * A nice interface for intellisense of all possibly models.
 */
interface ModelFile {
    'gptj': | "ggml-gpt4all-j-v1.3-groovy.bin"
            | "ggml-gpt4all-j-v1.2-jazzy.bin"
            | "ggml-gpt4all-j-v1.1-breezy.bin"
            | "ggml-gpt4all-j.bin";
    'llama':| "ggml-gpt4all-l13b-snoozy.bin"
            | "ggml-vicuna-7b-1.1-q4_2.bin"
            | "ggml-vicuna-13b-1.1-q4_2.bin"
            | "ggml-wizardLM-7B.q4_2.bin"
            | "ggml-stable-vicuna-13B.q4_2.bin"
            | "ggml-nous-gpt4-vicuna-13b.bin"
    'mpt':  | "ggml-mpt-7b-base.bin"
            | "ggml-mpt-7b-chat.bin"
            | "ggml-mpt-7b-instruct.bin"
    setLibraryPath(s: string): void;
    /**
     * Where to get the pluggable backend libraries
     */
    getLibraryPath(): string;
}

interface ExtendedOptions {
interface LoadModelOptions {
    modelPath?: string;
    librariesPath?: string;
    allowDownload?: boolean;
    verbose?: boolean;
    system?: string;
    header?: string;
    prompt: string;
    promptEntries?: Record<string, unknown>
}

type PromptTemplate = (...args: string[]) => string;
declare function loadModel(
    modelName: string,
    options?: LoadModelOptions
): Promise<LLModel>;

/**
 * The nodejs equivalent to python binding's chat_completion
 * @param {LLModel} llmodel - The language model object.
 * @param {PromptMessage[]} messages - The array of messages for the conversation.
 * @param {CompletionOptions} options - The options for creating the completion.
 * @returns {CompletionReturn} The completion result.
 * @example
 * const llmodel = new LLModel(model)
 * const messages = [
 * { role: 'system', message: 'You are a weather forecaster.' },
 * { role: 'user', message: 'should i go out today?' } ]
 * const completion = await createCompletion(llmodel, messages, {
 *  verbose: true,
 *  temp: 0.9,
 * })
 * console.log(completion.choices[0].message.content)
 * // No, it's going to be cold and rainy.
 */
declare function createCompletion(
    model: LLModel,
    pt: PromptTemplate,
    options: LLModelPromptContext&ExtendedOptions
) : string
    llmodel: LLModel,
    messages: PromptMessage[],
    options?: CompletionOptions
): Promise<CompletionReturn>;

function prompt(
    strings: TemplateStringsArray
): PromptTemplate
/**
 * The options for creating the completion.
 */
interface CompletionOptions extends Partial<LLModelPromptContext> {
    /**
     * Indicates if verbose logging is enabled.
     * @default true
     */
    verbose?: boolean;

    /**
     * Indicates if the default header is included in the prompt.
     * @default true
     */
    hasDefaultHeader?: boolean;

export { LLModel, LLModelPromptContext, ModelType, download, DownloadController, prompt, ExtendedOptions, createCompletion }
    /**
     * Indicates if the default footer is included in the prompt.
     * @default true
     */
    hasDefaultFooter?: boolean;
}

/**
 * A message in the conversation, identical to OpenAI's chat message.
 */
interface PromptMessage {
    /** The role of the message. */
    role: "system" | "assistant" | "user";

    /** The message content. */
    content: string;
}

/**
 * The result of the completion, similar to OpenAI's format.
 */
interface CompletionReturn {
    /** The model name.
     * @type {ModelFile}
     */
    model: ModelFile[ModelType];

    /** Token usage report. */
    usage: {
        /** The number of tokens used in the prompt. */
        prompt_tokens: number;

        /** The number of tokens used in the completion. */
        completion_tokens: number;

        /** The total number of tokens used. */
        total_tokens: number;
    };

    /** The generated completions. */
    choices: CompletionChoice[];
}

/**
 * A completion choice, similar to OpenAI's format.
 */
interface CompletionChoice {
    /** Response message */
    message: PromptMessage;
}

/**
 * Model inference arguments for generating completions.
 */
interface LLModelPromptContext {
    /** The size of the raw logits vector. */
    logits_size: number;

    /** The size of the raw tokens vector. */
    tokens_size: number;

    /** The number of tokens in the past conversation. */
    n_past: number;

    /** The number of tokens possible in the context window.
     * @default 1024
     */
    n_ctx: number;

    /** The number of tokens to predict.
     * @default 128
     * */
    n_predict: number;

    /** The top-k logits to sample from.
     * @default 40
     * */
    top_k: number;

    /** The nucleus sampling probability threshold.
     * @default 0.9
     * */
    top_p: number;

    /** The temperature to adjust the model's output distribution.
     * @default 0.72
     * */
    temp: number;

    /** The number of predictions to generate in parallel.
     * @default 8
     * */
    n_batch: number;

    /** The penalty factor for repeated tokens.
     * @default 1
     * */
    repeat_penalty: number;

    /** The number of last tokens to penalize.
     * @default 10
     * */
    repeat_last_n: number;

    /** The percentage of context to erase if the context window is exceeded.
     * @default 0.5
     * */
    context_erase: number;
}

/**
 * TODO: Help wanted to implement this
 */
declare function createTokenStream(
    llmodel: LLModel,
    messages: PromptMessage[],
    options: CompletionOptions
): (ll: LLModel) => AsyncGenerator<string>;
/**
 * From python api:
 * models will be stored in (homedir)/.cache/gpt4all/`
 */
declare const DEFAULT_DIRECTORY: string;
/**
 * From python api:
 * The default path for dynamic libraries to be stored.
 * You may separate paths by a semicolon to search in multiple areas.
 * This searches DEFAULT_DIRECTORY/libraries, cwd/libraries, and finally cwd.
 */
declare const DEFAULT_LIBRARIES_DIRECTORY: string;
interface PromptMessage {
    role: "system" | "assistant" | "user";
    content: string;
}
export {
    ModelType,
    ModelFile,
    LLModel,
    LLModelPromptContext,
    PromptMessage,
    CompletionOptions,
    LoadModelOptions,
    loadModel,
    createCompletion,
    createTokenStream,
    DEFAULT_DIRECTORY,
    DEFAULT_LIBRARIES_DIRECTORY,
};

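Put together, the declarations above describe the intended happy path: resolve or download a model, then ask for an OpenAI-style chat completion. A hedged end-to-end sketch against this declared API (the model name and sampling options are illustrative):

    import { loadModel, createCompletion } from "gpt4all";

    // Downloads into DEFAULT_DIRECTORY on first use while allowDownload stays on.
    const model = await loadModel("ggml-gpt4all-j-v1.3-groovy.bin", { verbose: false });

    const completion = await createCompletion(model, [
        { role: "system", content: "You are a helpful assistant." },
        { role: "user", content: "Name three uses for a paperclip." },
    ], { temp: 0.7 });

    console.log(completion.choices[0].message.content);
    console.log(completion.usage.total_tokens);
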
@@ -1,112 +1,138 @@
"use strict";

/// This file implements the gpt4all.d.ts file endings.
/// Written in commonjs to support both ESM and CJS projects.
const { existsSync } = require("fs");
const path = require("node:path");
const { LLModel } = require("node-gyp-build")(path.resolve(__dirname, ".."));
const {
    retrieveModel,
    downloadModel,
    appendBinSuffixIfMissing,
} = require("./util.js");
const config = require("./config.js");

const { LLModel } = require('bindings')('../build/Release/gpt4allts');
const { createWriteStream, existsSync } = require('fs');
const { join } = require('path');
const { performance } = require('node:perf_hooks');


// readChunks() reads from the provided reader and yields the results into an async iterable
// https://css-tricks.com/web-streams-everywhere-and-fetch-for-node-js/
function readChunks(reader) {
    return {
        async* [Symbol.asyncIterator]() {
            let readResult = await reader.read();
            while (!readResult.done) {
                yield readResult.value;
                readResult = await reader.read();
            }
        },
async function loadModel(modelName, options = {}) {
    const loadOptions = {
        modelPath: config.DEFAULT_DIRECTORY,
        librariesPath: config.DEFAULT_LIBRARIES_DIRECTORY,
        allowDownload: true,
        verbose: true,
        ...options,
    };

    await retrieveModel(modelName, {
        modelPath: loadOptions.modelPath,
        allowDownload: loadOptions.allowDownload,
        verbose: loadOptions.verbose,
    });

    const libSearchPaths = loadOptions.librariesPath.split(";");

    let libPath = null;

    for (const searchPath of libSearchPaths) {
        if (existsSync(searchPath)) {
            libPath = searchPath;
            break;
        }
    }

    const llmOptions = {
        model_name: appendBinSuffixIfMissing(modelName),
        model_path: loadOptions.modelPath,
        library_path: libPath,
    };

    if (loadOptions.verbose) {
        console.log("Creating LLModel with options:", llmOptions);
    }
    const llmodel = new LLModel(llmOptions);

    return llmodel;
}

exports.LLModel = LLModel;
function createPrompt(messages, hasDefaultHeader, hasDefaultFooter) {
    let fullPrompt = "";

    for (const message of messages) {
        if (message.role === "system") {
            const systemMessage = message.content + "\n";
            fullPrompt += systemMessage;
        }
    }
    if (hasDefaultHeader) {
        fullPrompt += `### Instruction:
        The prompt below is a question to answer, a task to complete, or a conversation
        to respond to; decide which and write an appropriate response.
        \n### Prompt:
        `;
    }
    for (const message of messages) {
        if (message.role === "user") {
            const user_message = "\n" + message["content"];
            fullPrompt += user_message;
        }
        if (message["role"] == "assistant") {
            const assistant_message = "\nResponse: " + message["content"];
            fullPrompt += assistant_message;
        }
    }
    if (hasDefaultFooter) {
        fullPrompt += "\n### Response:";
    }

exports.download = function (
    name,
    options = { debug: false, location: process.cwd(), link: undefined }
    return fullPrompt;
}

async function createCompletion(
    llmodel,
    messages,
    options = {
        hasDefaultHeader: true,
        hasDefaultFooter: false,
        verbose: true,
    }
) {
    const abortController = new AbortController();
    const signal = abortController.signal;

    const pathToModel = join(options.location, name);
    if(existsSync(pathToModel)) {
        throw Error("Path to model already exists");
    }

    //wrapper function to get the readable stream from request
    const fetcher = (name) => fetch(options.link ?? `https://gpt4all.io/models/${name}`, {
        signal,
    })
    .then(res => {
        if(!res.ok) {
            throw Error("Could not find "+ name + " from " + `https://gpt4all.io/models/` )
        }
        return res.body.getReader()
    })

    //a promise that executes and writes to a stream. Resolves when done writing.
    const res = new Promise((resolve, reject) => {
        fetcher(name)
        //Resolves an array of a reader and writestream.
        .then(reader => [reader, createWriteStream(pathToModel)])
        .then(
            async ([readable, wstream]) => {
                console.log('(CLI might hang) Downloading @ ', pathToModel);
                let perf;
                if(options.debug) {
                    perf = performance.now();
                }
                for await (const chunk of readChunks(readable)) {
                    wstream.write(chunk);
                }
                if(options.debug) {
                    console.log("Time taken: ", (performance.now()-perf).toFixed(2), " ms");
                }
                resolve();
            }
        ).catch(reject);
    });

    return {
        cancel : () => abortController.abort(),
        promise: () => res
    }
}


//https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Template_literals#tagged_templates
exports.prompt = function prompt(strings, ...keys) {
    return (...values) => {
        const dict = values[values.length - 1] || {};
        const result = [strings[0]];
        keys.forEach((key, i) => {
            const value = Number.isInteger(key) ? values[key] : dict[key];
            result.push(value, strings[i + 1]);
        });
        return result.join("");
    };
}


exports.createCompletion = function (llmodel, promptMaker, options) {
    //creating the keys to insert into promptMaker.
    const entries = {
        system: options.system ?? '',
        header: options.header ?? "### Instruction: The prompt below is a question to answer, a task to complete, or a conversation to respond to; decide which and write an appropriate response.\n### Prompt: ",
        prompt: options.prompt,
        ...(options.promptEntries ?? {})
    };

    const fullPrompt = promptMaker(entries)+'\n### Response:';

    if(options.verbose) {
        console.log("sending prompt: " + `"${fullPrompt}"`)
    const fullPrompt = createPrompt(
        messages,
        options.hasDefaultHeader ?? true,
        options.hasDefaultFooter
    );
    if (options.verbose) {
        console.log("Sent: " + fullPrompt);
    }

    return llmodel.raw_prompt(fullPrompt, options);
    const promisifiedRawPrompt = new Promise((resolve, rej) => {
        llmodel.raw_prompt(fullPrompt, options, (s) => {
            resolve(s);
        });
    });
    return promisifiedRawPrompt.then((response) => {
        return {
            llmodel: llmodel.name(),
            usage: {
                prompt_tokens: fullPrompt.length,
                completion_tokens: response.length, //TODO
                total_tokens: fullPrompt.length + response.length, //TODO
            },
            choices: [
                {
                    message: {
                        role: "assistant",
                        content: response,
                    },
                },
            ],
        };
    });
}

module.exports = {
    ...config,
    LLModel,
    createCompletion,
    downloadModel,
    retrieveModel,
    loadModel,
};

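For reference, createPrompt above flattens the chat messages into a single instruction-style string before it reaches raw_prompt; the sketch below spells out roughly what that string looks like for a system + user exchange with the default header on and no footer (whitespace approximate, not part of the diff).

    const messages = [
        { role: "system", content: "You are a weather forecaster." },
        { role: "user", content: "Should I go out today?" },
    ];
    // createPrompt(messages, true, false) yields approximately:
    //   "You are a weather forecaster.\n" +
    //   "### Instruction:\nThe prompt below is a question to answer, a task to complete, or a conversation\n" +
    //   "to respond to; decide which and write an appropriate response.\n\n### Prompt:\n" +
    //   "\nShould I go out today?"
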
gpt4all-bindings/typescript/src/util.d.ts  (vendored, new file, 69 lines)
@@ -0,0 +1,69 @@
/// <reference types="node" />
declare module "gpt4all";

/**
 * Initiates the download of a model file of a specific model type.
 * By default this downloads without waiting. use the controller returned to alter this behavior.
 * @param {ModelFile} model - The model file to be downloaded.
 * @param {DownloadOptions} options - to pass into the downloader. Default is { location: (cwd), debug: false }.
 * @returns {DownloadController} object that allows controlling the download process.
 *
 * @throws {Error} If the model already exists in the specified location.
 * @throws {Error} If the model cannot be found at the specified url.
 *
 * @example
 * const controller = download('ggml-gpt4all-j-v1.3-groovy.bin')
 * controller.promise().then(() => console.log('Downloaded!'))
 */
declare function downloadModel(
    modelName: string,
    options?: DownloadModelOptions
): DownloadController;

/**
 * Options for the model download process.
 */
export interface DownloadModelOptions {
    /**
     * location to download the model.
     * Default is process.cwd(), or the current working directory
     */
    modelPath?: string;

    /**
     * Debug mode -- check how long it took to download in seconds
     * @default false
     */
    debug?: boolean;

    /**
     * Remote download url. Defaults to `https://gpt4all.io/models`
     * @default https://gpt4all.io/models
     */
    url?: string;
}

declare function listModels(): Promise<Record<string, string>[]>;

interface RetrieveModelOptions {
    allowDownload?: boolean;
    verbose?: boolean;
    modelPath?: string;
}

declare async function retrieveModel(
    model: string,
    options?: RetrieveModelOptions
): Promise<string>;

/**
 * Model download controller.
 */
interface DownloadController {
    /** Cancel the request to download from gpt4all website if this is called. */
    cancel: () => void;
    /** Convert the downloader into a promise, allowing people to await and manage its lifetime */
    promise: () => Promise<void>;
}

export { downloadModel, DownloadModelOptions, DownloadController, listModels, retrieveModel, RetrieveModelOptions };

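The DownloadController shape pairs an immediate cancel handle with a lazily awaited promise. A hedged sketch of using it with a timeout-based cancel (the model name and timeout are illustrative):

    import { downloadModel } from "gpt4all";

    const controller = downloadModel("ggml-gpt4all-j-v1.3-groovy.bin", { debug: true });
    const timer = setTimeout(() => controller.cancel(), 60_000); // give up after a minute

    try {
        await controller.promise(); // resolves once the file has been written
        console.log("download finished");
    } catch (err) {
        console.error("download failed or was cancelled:", err);
    } finally {
        clearTimeout(timer);
    }
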
gpt4all-bindings/typescript/src/util.js  (new file, 156 lines)
@@ -0,0 +1,156 @@
const { createWriteStream, existsSync } = require("fs");
const { performance } = require("node:perf_hooks");
const path = require("node:path");
const { mkdirp } = require("mkdirp");
const { DEFAULT_DIRECTORY, DEFAULT_LIBRARIES_DIRECTORY } = require("./config.js");

async function listModels() {
    const res = await fetch("https://gpt4all.io/models/models.json");
    const modelList = await res.json();
    return modelList;
}

function appendBinSuffixIfMissing(name) {
    if (!name.endsWith(".bin")) {
        return name + ".bin";
    }
    return name;
}

// readChunks() reads from the provided reader and yields the results into an async iterable
// https://css-tricks.com/web-streams-everywhere-and-fetch-for-node-js/
function readChunks(reader) {
    return {
        async *[Symbol.asyncIterator]() {
            let readResult = await reader.read();
            while (!readResult.done) {
                yield readResult.value;
                readResult = await reader.read();
            }
        },
    };
}

function downloadModel(
    modelName,
    options = {}
) {
    const downloadOptions = {
        modelPath: DEFAULT_DIRECTORY,
        debug: false,
        url: "https://gpt4all.io/models",
        ...options,
    };

    const modelFileName = appendBinSuffixIfMissing(modelName);
    const fullModelPath = path.join(downloadOptions.modelPath, modelFileName);
    const modelUrl = `${downloadOptions.url}/${modelFileName}`

    if (existsSync(fullModelPath)) {
        throw Error(`Model already exists at ${fullModelPath}`);
    }

    const abortController = new AbortController();
    const signal = abortController.signal;

    //wrapper function to get the readable stream from request
    // const baseUrl = options.url ?? "https://gpt4all.io/models";
    const fetchModel = () =>
        fetch(modelUrl, {
            signal,
        }).then((res) => {
            if (!res.ok) {
                throw Error(`Failed to download model from ${modelUrl} - ${res.statusText}`);
            }
            return res.body.getReader();
        });

    //a promise that executes and writes to a stream. Resolves when done writing.
    const res = new Promise((resolve, reject) => {
        fetchModel()
            //Resolves an array of a reader and writestream.
            .then((reader) => [reader, createWriteStream(fullModelPath)])
            .then(async ([readable, wstream]) => {
                console.log("Downloading @ ", fullModelPath);
                let perf;
                if (options.debug) {
                    perf = performance.now();
                }
                for await (const chunk of readChunks(readable)) {
                    wstream.write(chunk);
                }
                if (options.debug) {
                    console.log(
                        "Time taken: ",
                        (performance.now() - perf).toFixed(2),
                        " ms"
                    );
                }
                resolve(fullModelPath);
            })
            .catch(reject);
    });

    return {
        cancel: () => abortController.abort(),
        promise: () => res,
    };
};

async function retrieveModel (
    modelName,
    options = {}
) {
    const retrieveOptions = {
        modelPath: DEFAULT_DIRECTORY,
        allowDownload: true,
        verbose: true,
        ...options,
    };

    await mkdirp(retrieveOptions.modelPath);

    const modelFileName = appendBinSuffixIfMissing(modelName);
    const fullModelPath = path.join(retrieveOptions.modelPath, modelFileName);
    const modelExists = existsSync(fullModelPath);

    if (modelExists) {
        return fullModelPath;
    }

    if (!retrieveOptions.allowDownload) {
        throw Error(`Model does not exist at ${fullModelPath}`);
    }

    const availableModels = await listModels();
    const foundModel = availableModels.find((model) => model.filename === modelFileName);

    if (!foundModel) {
        throw Error(`Model "${modelName}" is not available.`);
    }

    if (retrieveOptions.verbose) {
        console.log(`Downloading ${modelName}...`);
    }

    const downloadController = downloadModel(modelName, {
        modelPath: retrieveOptions.modelPath,
        debug: retrieveOptions.verbose,
    });

    const downloadPath = await downloadController.promise();

    if (retrieveOptions.verbose) {
        console.log(`Model downloaded to ${downloadPath}`);
    }

    return downloadPath

}


module.exports = {
    appendBinSuffixIfMissing,
    downloadModel,
    retrieveModel,
};

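retrieveModel is the piece loadModel leans on: it creates the model directory, returns early when the file is already on disk, and otherwise consults models.json before delegating to downloadModel. A small usage sketch (the model name is illustrative and must be listed in models.json for the download branch to succeed):

    const { retrieveModel } = require("./util.js");

    retrieveModel("ggml-gpt4all-j-v1.3-groovy.bin", {
        allowDownload: true, // set to false to fail fast instead of downloading
        verbose: true,
    })
        .then((modelPath) => console.log("model available at", modelPath))
        .catch((err) => console.error(err));
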
@@ -1,14 +0,0 @@

#include "stdcapture.h"

CoutRedirect::CoutRedirect() {
    old = std::cout.rdbuf(buffer.rdbuf()); // redirect cout to buffer stream
}

std::string CoutRedirect::getString() {
    return buffer.str(); // get string
}

CoutRedirect::~CoutRedirect() {
    std::cout.rdbuf(old); // reverse redirect
}

@@ -1,21 +0,0 @@
//https://stackoverflow.com/questions/5419356/redirect-stdout-stderr-to-a-string
#ifndef COUTREDIRECT_H
#define COUTREDIRECT_H

#include <iostream>
#include <streambuf>
#include <string>
#include <sstream>

class CoutRedirect {
public:
    CoutRedirect();
    std::string getString();
    ~CoutRedirect();

private:
    std::stringstream buffer;
    std::streambuf* old;
};

#endif // COUTREDIRECT_H

@@ -1,38 +1,5 @@
import * as assert from 'node:assert'
import { prompt, download } from '../src/gpt4all.js'

{

    const somePrompt = prompt`${"header"} Hello joe, my name is Ron. ${"prompt"}`;
    assert.equal(
        somePrompt({ header: 'oompa', prompt: 'holy moly' }),
        'oompa Hello joe, my name is Ron. holy moly'
    );

}

{

    const indexedPrompt = prompt`${0}, ${1} ${0}`;
    assert.equal(
        indexedPrompt('hello', 'world'),
        'hello, world hello'
    );

    assert.notEqual(
        indexedPrompt(['hello', 'world']),
        'hello, world hello'
    );

}

{
    assert.equal(
        (prompt`${"header"} ${"prompt"}`)({ header: 'hello', prompt: 'poo' }), 'hello poo',
        "Template prompt not equal"
    );

}
import { download } from '../src/gpt4all.js'


assert.rejects(async () => download('poo.bin').promise());

File diff suppressed because it is too large