V3 docs max (#2488)

* new skeleton Signed-off-by: Max Cembalest <max@nomic.ai> * v3 docs Signed-off-by: Max Cembalest <max@nomic.ai> --------- Signed-off-by: Max Cembalest <max@nomic.ai>
2024-10-01 01:06:10 -04:00 · 2024-07-01 13:00:14 -04:00 · 2024-07-01 13:00:14 -04:00 · 5306595176
commit 5306595176
parent bd307abfe6
57 changed files with 865 additions and 170 deletions
--- a/README.md
+++ b/README.md
@ -1,13 +1,34 @@
 <h1 align="center">GPT4All</h1>
-<p align="center">Privacy-oriented software for chatting with large language models that run on your own computer.</p>
+
+<p align="center">GPT4All runs large language models (LLMs) privately on everyday desktops & laptops. <br> <br> No API calls or GPUs required - you can just download the application and <a href="https://docs.gpt4all.io/gpt4all_desktop/quickstart.html#quickstart">get started</a>
+
+
 <p align="center">
-  <a href="https://gpt4all.io">Official Website</a> &bull; <a href="https://docs.gpt4all.io">Documentation</a> &bull; <a href="https://discord.gg/mGZE39AS3e">Discord</a>
+  <a href="https://gpt4all.io/installers/gpt4all-installer-win64.exe">
+    <img src="gpt4all-bindings/python/docs/assets/windows.png" width="80" height="80"><br>
+    Download for Windows
+  </a>
+</p>
+
+<p align="center">
+  <a href="https://gpt4all.io/installers/gpt4all-installer-darwin.dmg">
+    <img src="gpt4all-bindings/python/docs/assets/mac.png" width="80" height="90"><br>
+    Download for MacOS
+  </a>
+</p>
+
+<p align="center">
+  <a href="https://gpt4all.io/installers/gpt4all-installer-linux.run">
+    <img src="gpt4all-bindings/python/docs/assets/linux.png" width="80" height="80"><br>
+    Download for Linux
+  </a>
+</p>
+
+<p align="center">
+  <a href="https://gpt4all.io">Website</a> &bull; <a href="https://docs.gpt4all.io">Documentation</a> &bull; <a href="https://discord.gg/mGZE39AS3e">Discord</a>
 </p>
 <p align="center">
-  Official Download Links: <a href="https://gpt4all.io/installers/gpt4all-installer-win64.exe">Windows</a> &mdash; <a href="https://gpt4all.io/installers/gpt4all-installer-darwin.dmg">macOS</a> &mdash; <a href="https://gpt4all.io/installers/gpt4all-installer-linux.run">Ubuntu</a>
-</p>
-<p align="center">
-  <b>NEW:</b> <a href="https://forms.nomic.ai/gpt4all-release-notes-signup">Subscribe to our mailing list</a> for updates and news!
+  <a href="https://forms.nomic.ai/gpt4all-release-notes-signup">Subscribe to the newsletter</a>
 </p>
 <p align="center">
 GPT4All is made possible by our compute partner <a href="https://www.paperspace.com/">Paperspace</a>.
@ -16,31 +37,30 @@ GPT4All is made possible by our compute partner <a href="https://www.paperspace.
 <a href="https://www.phorm.ai/query?projectId=755eecd3-24ad-49cc-abf4-0ab84caacf63"><img src="https://img.shields.io/badge/Phorm-Ask_AI-%23F2777A.svg" alt="phorm.ai"></a>
 </p>

-<p align="center">
-  <img width="auto" height="400" src="https://github.com/nomic-ai/gpt4all/assets/14168726/495fce3e-769b-4e5a-a394-99f072ac4d29">
-</p>
-<p align="center">
-Run on an M2 MacBook Pro (not sped up!)
-</p>


-## About GPT4All
+## Install GPT4All Python

-GPT4All is an ecosystem to run **powerful** and **customized** large language models that work locally on consumer grade CPUs and NVIDIA and AMD GPUs. Note that your CPU needs to support [AVX instructions](https://en.wikipedia.org/wiki/Advanced_Vector_Extensions).
+`gpt4all` gives you access to LLMs with our Python client around [`llama.cpp`](https://github.com/ggerganov/llama.cpp) implementations. 

-Learn more in the [documentation](https://docs.gpt4all.io).
+Nomic contributes to open source software like [`llama.cpp`](https://github.com/ggerganov/llama.cpp) to make LLMs accessible and efficient **for all**.

-A GPT4All model is a 3GB - 8GB file that you can download and plug into the GPT4All software. **Nomic AI** supports and maintains this software ecosystem to enforce quality and security alongside spearheading the effort to allow any person or enterprise to easily deploy their own on-edge large language models.
+```bash
+pip install gpt4all
+```
+
+```python
+from gpt4all import GPT4All
+model = GPT4All("Meta-Llama-3-8B-Instruct.Q4_0.gguf") # downloads / loads a 4.66GB LLM
+with model.chat_session():
+    print(model.generate("How can I run LLMs efficiently on my laptop?", max_tokens=1024))
+```


-### Installation
-
-The recommended way to install GPT4All is to use one of the online installers linked above in this README, which are also available at the [GPT4All website](https://gpt4all.io/). These require an internet connection at install time, are slightly easier to use on macOS due to code signing, and provide a version of GPT4All that can check for updates.
-
-An alternative way to install GPT4All is to use one of the offline installers available on the [Releases page](https://github.com/nomic-ai/gpt4all/releases). These do not require an internet connection at install time, and can be used to install an older version of GPT4All if so desired. But using these requires acknowledging a security warning on macOS, and they provide a version of GPT4All that is unable to notify you of updates, so you should enable notifications for Releases on this repository (Watch > Custom > Releases) or sign up for announcements in our [Discord server](https://discord.gg/mGZE39AS3e).
-
-
-### What's New
+### Release History
+- **July 2nd, 2024**: V3.0.0 Release
+    - New UI/UX: fresh redesign of the chat application GUI and user experience
+    - LocalDocs: bring information from files on-device into chats
 - **October 19th, 2023**: GGUF Support Launches with Support for:
    - Mistral 7b base model, an updated model gallery on [gpt4all.io](https://gpt4all.io), several new local code models including Rift Coder v1.5
    - [Nomic Vulkan](https://blog.nomic.ai/posts/gpt4all-gpu-inference-with-vulkan) support for Q4\_0 and Q4\_1 quantizations in GGUF.
@ -51,21 +71,9 @@ An alternative way to install GPT4All is to use one of the offline installers av

 [Docker-based API server]: https://github.com/nomic-ai/gpt4all/tree/cef74c2be20f5b697055d5b8b506861c7b997fab/gpt4all-api

-
-### Building From Source
-
-* Follow the instructions [here](gpt4all-chat/build_and_run.md) to build the GPT4All Chat UI from source.
-
-
-### Bindings
-
-* :snake: <a href="https://github.com/nomic-ai/gpt4all/tree/main/gpt4all-bindings/python">Official Python Bindings</a> [![Downloads](https://static.pepy.tech/badge/gpt4all/week)](https://pepy.tech/project/gpt4all)
-* :computer: <a href="https://github.com/nomic-ai/gpt4all/tree/main/gpt4all-bindings/typescript">Typescript Bindings</a>
-
-
 ### Integrations

-* :parrot::link: [Langchain](https://python.langchain.com/en/latest/modules/models/llms/integrations/gpt4all.html)
+* :parrot::link: [Langchain](https://python.langchain.com/v0.2/docs/integrations/providers/gpt4all/)
 * :card_file_box: [Weaviate Vector Database](https://github.com/weaviate/weaviate) - [module docs](https://weaviate.io/developers/weaviate/modules/retriever-vectorizer-modules/text2vec-gpt4all)
 * :telescope: [OpenLIT (OTel-native Monitoring)](https://github.com/openlit/openlit) - [Docs](https://docs.openlit.io/latest/integrations/gpt4all)

@ -79,58 +87,6 @@ Please make sure to tag all of the above with relevant project identifiers or yo
 Example tags: `backend`, `bindings`, `python-bindings`, `documentation`, etc.


-## GPT4All 2024 Roadmap
-To contribute to the development of any of the below roadmap items, make or find the corresponding issue and cross-reference the [in-progress task](https://github.com/orgs/nomic-ai/projects/2/views/1).
-
-Each item should have an issue link below.
-
- Chat UI Language Localization (localize UI into the native languages of users)
-    - [ ] Chinese
-    - [ ] German
-    - [ ] French
-    - [ ] Portuguese
-    - [ ] Your native language here. 
- UI Redesign: an internal effort at Nomic to improve the UI/UX of gpt4all for all users.
-    - [ ] Design new user interface and gather community feedback
-    - [ ] Implement the new user interface and experience.
- Installer and Update Improvements
-    - [ ] Seamless native installation and update process on OSX
-    - [ ] Seamless native installation and update process on Windows
-    - [ ] Seamless native installation and update process on Linux
- Model discoverability improvements:
-    - [x] Support huggingface model discoverability
-    - [ ] Support Nomic hosted model discoverability
- LocalDocs (towards a local perplexity)
-    - Multilingual LocalDocs Support
-        - [ ] Create a multilingual experience
-        - [ ] Incorporate a multilingual embedding model
-        - [ ] Specify a preferred multilingual LLM for localdocs
-    - Improved RAG techniques
-        - [ ] Query augmentation and re-writing
-        - [ ] Improved chunking and text extraction from arbitrary modalities
-            - [ ] Custom PDF extractor past the QT default (charts, tables, text)
-        - [ ] Faster indexing and local exact search with v1.5 hamming embeddings and reranking (skip ANN index construction!)
-    - Support queries like 'summarize X document'
-    - Multimodal LocalDocs support with Nomic Embed
-    - Nomic Dataset Integration with real-time LocalDocs
-        - [ ] Include an option to allow the export of private LocalDocs collections to Nomic Atlas for debugging data/chat quality
-        - [ ] Allow optional sharing of LocalDocs collections between users.
-        - [ ] Allow the import of a LocalDocs collection from an Atlas Datasets
-            - Chat with live version of Wikipedia, Chat with Pubmed, chat with the latest snapshot of world news.
- First class Multilingual LLM Support
-    - [ ] Recommend and set a default LLM for German
-    - [ ] Recommend and set a default LLM for English
-    - [ ] Recommend and set a default LLM for Chinese
-    - [ ] Recommend and set a default LLM for Spanish
-
- Server Mode improvements
-    - Improved UI and new requested features:
-        - [ ] Fix outstanding bugs and feature requests around networking configurations.
-        - [ ] Support Nomic Embed inferencing
-        - [ ] First class documentation
-        - [ ] Improving developer use and quality of server mode (e.g. support larger batches)
-
-
 ## Technical Reports

 <p align="center">
--- a/gpt4all-bindings/python/docs/assets/add.png
+++ b/gpt4all-bindings/python/docs/assets/add.png
--- a/gpt4all-bindings/python/docs/assets/add_model_gpt4.png
+++ b/gpt4all-bindings/python/docs/assets/add_model_gpt4.png
--- a/gpt4all-bindings/python/docs/assets/baelor.png
+++ b/gpt4all-bindings/python/docs/assets/baelor.png
--- a/gpt4all-bindings/python/docs/assets/before_first_chat.png
+++ b/gpt4all-bindings/python/docs/assets/before_first_chat.png
--- a/gpt4all-bindings/python/docs/assets/closed_chat_panel.png
+++ b/gpt4all-bindings/python/docs/assets/closed_chat_panel.png
--- a/gpt4all-bindings/python/docs/assets/configure_doc_collection.png
+++ b/gpt4all-bindings/python/docs/assets/configure_doc_collection.png
--- a/gpt4all-bindings/python/docs/assets/download.png
+++ b/gpt4all-bindings/python/docs/assets/download.png
--- a/gpt4all-bindings/python/docs/assets/download_llama.png
+++ b/gpt4all-bindings/python/docs/assets/download_llama.png
--- a/gpt4all-bindings/python/docs/assets/explore.png
+++ b/gpt4all-bindings/python/docs/assets/explore.png
--- a/gpt4all-bindings/python/docs/assets/explore_models.png
+++ b/gpt4all-bindings/python/docs/assets/explore_models.png
--- a/gpt4all-bindings/python/docs/assets/good_tyrion.png
+++ b/gpt4all-bindings/python/docs/assets/good_tyrion.png
--- a/gpt4all-bindings/python/docs/assets/got_docs_ready.png
+++ b/gpt4all-bindings/python/docs/assets/got_docs_ready.png
--- a/gpt4all-bindings/python/docs/assets/got_done.png
+++ b/gpt4all-bindings/python/docs/assets/got_done.png
--- a/gpt4all-bindings/python/docs/assets/gpt4all_home.png
+++ b/gpt4all-bindings/python/docs/assets/gpt4all_home.png
--- a/gpt4all-bindings/python/docs/assets/installed_models.png
+++ b/gpt4all-bindings/python/docs/assets/installed_models.png
--- a/gpt4all-bindings/python/docs/assets/linux.png
+++ b/gpt4all-bindings/python/docs/assets/linux.png
--- a/gpt4all-bindings/python/docs/assets/local_embed.gif
+++ b/gpt4all-bindings/python/docs/assets/local_embed.gif
--- a/gpt4all-bindings/python/docs/assets/mac.png
+++ b/gpt4all-bindings/python/docs/assets/mac.png
--- a/gpt4all-bindings/python/docs/assets/models_page_icon.png
+++ b/gpt4all-bindings/python/docs/assets/models_page_icon.png
--- a/gpt4all-bindings/python/docs/assets/new_docs_annotated.png
+++ b/gpt4all-bindings/python/docs/assets/new_docs_annotated.png
--- a/gpt4all-bindings/python/docs/assets/new_docs_annotated_filled.png
+++ b/gpt4all-bindings/python/docs/assets/new_docs_annotated_filled.png
--- a/gpt4all-bindings/python/docs/assets/new_first_chat.png
+++ b/gpt4all-bindings/python/docs/assets/new_first_chat.png
--- a/gpt4all-bindings/python/docs/assets/no_docs.png
+++ b/gpt4all-bindings/python/docs/assets/no_docs.png
--- a/gpt4all-bindings/python/docs/assets/no_models.png
+++ b/gpt4all-bindings/python/docs/assets/no_models.png
--- a/gpt4all-bindings/python/docs/assets/no_models_tiny.png
+++ b/gpt4all-bindings/python/docs/assets/no_models_tiny.png
--- a/gpt4all-bindings/python/docs/assets/open_chat_panel.png
+++ b/gpt4all-bindings/python/docs/assets/open_chat_panel.png
--- a/gpt4all-bindings/python/docs/assets/open_local_docs.png
+++ b/gpt4all-bindings/python/docs/assets/open_local_docs.png
--- a/gpt4all-bindings/python/docs/assets/open_sources.png
+++ b/gpt4all-bindings/python/docs/assets/open_sources.png
--- a/gpt4all-bindings/python/docs/assets/search_mistral.png
+++ b/gpt4all-bindings/python/docs/assets/search_mistral.png
--- a/gpt4all-bindings/python/docs/assets/search_settings.png
+++ b/gpt4all-bindings/python/docs/assets/search_settings.png
--- a/gpt4all-bindings/python/docs/assets/syrio_snippets.png
+++ b/gpt4all-bindings/python/docs/assets/syrio_snippets.png
--- a/gpt4all-bindings/python/docs/assets/three_model_options.png
+++ b/gpt4all-bindings/python/docs/assets/three_model_options.png
--- a/gpt4all-bindings/python/docs/assets/windows.png
+++ b/gpt4all-bindings/python/docs/assets/windows.png
--- a/gpt4all-bindings/python/docs/css/custom.css
+++ b/gpt4all-bindings/python/docs/css/custom.css
@ -1,5 +1,5 @@
-/* Remove the `In` and `Out` block in rendered Jupyter notebooks */
-.md-container .jp-Cell-outputWrapper .jp-OutputPrompt.jp-OutputArea-prompt,
-.md-container .jp-Cell-inputWrapper .jp-InputPrompt.jp-InputArea-prompt {
-  display: none !important;
-}
+.md-content h1,
+.md-content h2 {
+  margin-top: 0.5em;
+  margin-bottom: 0.5em;
+}
--- a/gpt4all-bindings/python/docs/gpt4all_desktop/chats.md
+++ b/gpt4all-bindings/python/docs/gpt4all_desktop/chats.md
@ -0,0 +1,124 @@
+# Chats
+
+Chats are conversations with [language models](models.md) that run locally on your device.
+
+!!! note "Example Chats"
+
+    === "Llama 3"
+        Download `Llama 3` and prompt:
+        ```
+        explain why the sky is blue in a way that is correct and makes sense to a child
+        ```
+
+        With the default sampling settings, you should see text resembling the following:
+
+        ```
+        What a great question!
+        So, you know how we can see different colors like red, yellow, green, and orange? Well, when sunlight enters Earth's atmosphere, it starts to interact with tiny particles called molecules of gases like nitrogen (N2) and oxygen (02). These molecules are really small, but they're all around us!
+        Now, here's the cool part: these molecules scatter, or bounce off, shorter wavelengths of light more than longer wavelengths. Think of it like a game of pool where the cue ball hits other balls with different sizes and speeds.
+        The shortest wavelength of visible light is violet (about 400 nanometers), followed by blue (around 450-495 nanometers). These shorter wavelengths are scattered in all directions, kind of like how a basketball player might bounce the ball around on the court. This scattering effect makes the sky appear blue to our eyes!
+        Here's why it doesn't look as bright or intense as a blue crayon: there are many more molecules in the atmosphere that scatter longer wavelengths (like red and orange) than shorter ones, so those colors get spread out and become less noticeable.
+        To sum it up:
+        1. Sunlight enters Earth's atmosphere.
+        2. Tiny gas molecule particles interact with light, scattering shorter wavelengths like blue and violet.
+        3. This scattered blue light reaches our eyes from all directions, making the sky appear blue!
+        Now, go outside and enjoy that beautiful blue sky!
+        ```
+
+    === "Nous Hermes 2 Mistral DPO"
+        Download `Nous Hermes 2 Mistral DPO` and prompt:
+        ```
+        write me a react app i can run from the command line to play a quick game
+        ```
+
+        With the default sampling settings, you should see text and code blocks resembling the following:
+        
+        ```
+        I'll provide you with a simple React app that allows you to play a guessing game. This app will generate a random number between 1 and 10, and ask you to guess it. You can run this app locally using Create React App. Follow the steps below:
+        Open your terminal or command line interface.
+        Navigate to the directory where you want to create the project (e.g., cd Documents/Projects).
+        Run the following commands:
+        ```
+
+        ```bash
+        npx create-react-app guessing-game
+        cd guessing-game
+        ```
+
+        ```
+        Open the project in your preferred code editor by running code . (for VS Code) or any other command to open the folder with your text editor.
+        Replace the content of the src/App.js file with the following code:
+        ```
+
+        ```javascript
+        import React, { useState } from 'react';
+
+        function App() {
+        const [guess, setGuess] = useState('');
+        const [message, setMessage] = useState('');
+        const [numberToGuess, setNumberToGuess] = useState(Math.floor(Math.random() * 10) + 1);
+
+        function handleChange(e) {
+            setGuess(e.target.value);
+        }
+
+        function handleSubmit(e) {
+            e.preventDefault();
+            if (parseInt(guess) === numberToGuess) {
+            setMessage('You guessed it right!');
+            } else {
+            setMessage(`Wrong guess, the correct number was ${numberToGuess}. Try again next time.`);
+            }
+        }
+
+        return (
+            <div>
+            <h1>Guess a Number</h1>
+            <p>{message}</p>
+            <form onSubmit={handleSubmit}>
+                <input type="number" value={guess} onChange={handleChange} />
+                <button type="submit">Submit</button>
+            </form>
+            </div>
+        );
+        }
+
+        export default App;
+        ```
+
+        ```
+        Save the file and go back to your terminal or command line interface. Run npm start. This will start a local development server, and you can access the app in your browser at http://localhost:3000.
+        Play the guessing game by entering a number between 1 and 10 into the input field and clicking "Submit". The app will tell you if your guess is correct or not.
+        Remember that this is just a simple example, and you can expand upon it to make the game more interesting with additional features like high scores, multiple difficulty levels, etc.
+        ```
+
+## New Chat
+
+Choose a model with the dropdown at the top of the Chats page
+
+If you don't have any models, [download one](models.md#download-models). Once you have models, you can start chats by loading your default model, which you can configure in [settings](settings.md#application-settings)
+
+![Choose a model](../assets/three_model_options.png)
+
+## LocalDocs
+
+Open the [LocalDocs](localdocs.md) panel with the button in the top-right corner to bring your files into the chat. With LocalDocs, your chats are enhanced with semantically related snippets from your files included in the model's context.
+
+![Open LocalDocs](../assets/open_local_docs.png)
+
+## Chat History
+
+View your chat history with the button in the top-left corner of the Chats page.
+
+<table>
+<tr>
+    <td>
+    <img src="../assets/closed_chat_panel.png" alt="Close chats" style="width:100%">
+    </td>
+    <td>
+    <img src="../assets/open_chat_panel.png" alt="Open chats" style="width:100%">
+    </td>
+</tr>
+</table>
+
+You can change a chat name or delete it from your chat history at any time.
--- a/gpt4all-bindings/python/docs/gpt4all_desktop/localdocs.md
+++ b/gpt4all-bindings/python/docs/gpt4all_desktop/localdocs.md
@ -0,0 +1,48 @@
+# LocalDocs
+
+LocalDocs brings the information you have from files on-device into your LLM chats - **privately**.
+
+## Create LocalDocs
+
+!!! note "Create LocalDocs"
+
+    1. Click `+ Add Collection`.
+    
+    2. Name your collection and link it to a folder.
+
+        <table>
+        <tr>
+            <td>
+            <img src="../assets/new_docs_annotated.png" alt="new GOT Docs" style="width:100%">
+            </td>
+            <td>
+            <img src="../assets/new_docs_annotated_filled.png" alt="new GOT Docs filled out" style="width:100%">
+            </td>
+        </tr>
+        </table>
+
+    3. Click `Create Collection`. Progress for the collection is displayed on the LocalDocs page. 
+
+        ![Embedding in progress](../assets/baelor.png)
+
+        You will see a green `Ready` indicator when the entire collection is ready. 
+
+        Note: you can still chat with the files that are ready before the entire collection is ready.
+
+        ![Embedding complete](../assets/got_done.png)
+
+        Later on if you modify your LocalDocs settings you can rebuild your collections with your new settings.
+
+    4. In your chats, open `LocalDocs` with button in top-right corner to give your LLM context from those files.
+
+        ![LocalDocs result](../assets/syrio_snippets.png)
+
+    5. See which files were referenced by clicking `Sources` below the LLM responses.
+
+        ![Sources](../assets/open_sources.png)
+
+## How It Works
+
+A LocalDocs collection uses Nomic AI's free and fast on-device embedding models to index your folder into text snippets that each get an **embedding vector**. These vectors allow us to find snippets from your files that are semantically similar to the questions and prompts you enter in your chats. We then include those semantically similar snippets in the prompt to the LLM.
+
+To try the embedding models yourself, we recommend using the [Nomic Python SDK](https://docs.nomic.ai/atlas/guides/embeddings)
--- a/gpt4all-bindings/python/docs/gpt4all_desktop/models.md
+++ b/gpt4all-bindings/python/docs/gpt4all_desktop/models.md
@ -0,0 +1,79 @@
+# Models
+
+GPT4All is optimized to run LLMs in the 3-13B parameter range on consumer-grade hardware.
+
+LLMs are downloaded to your device so you can run them locally and privately. With our backend anyone can interact with LLMs efficiently and securely on their own hardware.
+
+## Download Models
+
+!!! note "Download Models"
+
+    <div style="text-align: center; margin-top: 20px;">
+        <table style="margin-left: auto; margin-right: auto;">
+            <tr>
+                <td style="text-align: right; padding-right: 10px;">1.</td>
+                <td style="text-align: left;">Click `Models` in the menu on the left (below `Chats` and above `LocalDocs`)</td>
+                <td><img src="../assets/models_page_icon.png" alt="Models Page Icon" style="width: 80px; height: auto;"></td>
+            </tr>
+            <tr>
+                <td style="text-align: right; padding-right: 10px;">2.</td>
+                <td style="text-align: left;">Click `+ Add Model` to navigate to the `Explore Models` page</td>
+                <td><img src="../assets/add.png" alt="Add Model button" style="width: 100px; height: auto;"></td>
+            </tr>
+            <tr>
+                <td style="text-align: right; padding-right: 10px;">3.</td>
+                <td style="text-align: left;">Search for models available online</td>
+                <td><img src="../assets/explore.png" alt="Explore Models search" style="width: 120px; height: auto;"></td>
+            </tr>
+            <tr>
+                <td style="text-align: right; padding-right: 10px;">4.</td>
+                <td style="text-align: left;">Hit `Download` to save a model to your device</td>
+                <td><img src="../assets/download.png" alt="Download Models button" style="width: 120px; height: auto;"></td>
+            </tr>
+            <tr>
+                <td style="text-align: right; padding-right: 10px;">5.</td>
+                <td style="text-align: left;">Once the model is downloaded you will see it in `Models`.</td>
+                <td><img src="../assets/installed_models.png" alt="Download Models button" style="width: 120px; height: auto;"></td>
+            </tr>
+        </table>
+    </div>
+
+## Explore Models
+
+GPT4All connects you with LLMs from HuggingFace with a [`llama.cpp`](https://github.com/ggerganov/llama.cpp) backend so that they will run efficiently on your hardware. Many of these models can be identified by the file type `.gguf`.
+
+![Explore models](../assets/search_mistral.png)
+
+## Example Models
+
+Many LLMs are available at various sizes, quantizations, and licenses. 
+
+- LLMs with more parameters tend to be better at coherently responding to instructions
+
+- LLMs with a smaller quantization (e.g. 4bit instead of 16bit) are much faster and less memory intensive, and tend to have slightly worse performance
+
+- Licenses vary in their terms for personal and commercial use
+
+Here are a few examples:
+
+| Model| Filesize| RAM Required| Parameters| Developer| License| MD5 Sum (Unique Hash)|
+|------|---------|-------------|-----------|----------|--------|----------------------|
+| Llama 3 Instruct  | 4.66 GB| 8 GB| 8 Billion| Meta| [Llama 3 License](https://llama.meta.com/llama3/license/)| c87ad09e1e4c8f9c35a5fcef52b6f1c9|
+| Nous Hermes 2 Mistral DPO| 4.21 GB| 8 GB| 7 Billion| Mistral & Nous Research | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)| Coa5f6b4eabd3992da4d7fb7f020f921eb|
+| Phi-3 Mini Instruct | 2.03 GB| 4 GB| 4 billion| Microsoft| [MIT](https://opensource.org/license/mit)| f8347badde9bfc2efbe89124d78ddaf5|
+| Mini Orca (Small)| 1.84 GB| 4 GB| 3 billion| Microsoft | [CC-BY-NC-SA-4.0](https://spdx.org/licenses/CC-BY-NC-SA-4.0)| 0e769317b90ac30d6e09486d61fefa26|
+| GPT4All Snoozy| 7.36 GB| 16 GB| 13 billion| Nomic AI| [GPL](https://www.gnu.org/licenses/gpl-3.0.en.html)| 40388eb2f8d16bb5d08c96fdfaac6b2c|
+
+### Search Results
+
+You can click the gear icon in the search bar to sort search results by their # of likes, # of downloads, or date of upload (all from HuggingFace).
+
+![Sort search results](../assets/search_settings.png)
+
+## Connect Model APIs
+
+You can add your API key for remote model providers.
+
+**Note**: this does not download a model file to your computer to use securely. Instead, this way of interacting with models has your prompts leave your computer to the API provider and returns the response to your computer.
+
+![Connect APIs](../assets/add_model_gpt4.png)
--- a/gpt4all-bindings/python/docs/gpt4all_desktop/quickstart.md
+++ b/gpt4all-bindings/python/docs/gpt4all_desktop/quickstart.md
@ -0,0 +1,40 @@
+# GPT4All Desktop
+
+The GPT4All Desktop Application allows you to download and run large language models (LLMs) locally & privately on your device.
+
+With GPT4All, you can chat with models, turn your local files into information sources for models [(LocalDocs)](localdocs.md), or browse models available online to download onto your device.
+
+## Quickstart
+
+!!! note "Quickstart"
+
+    1. Install GPT4All for your operating system and open the application.
+
+        <div style="text-align: center; margin-top: 20px;">
+            [Download for Windows](https://gpt4all.io/installers/gpt4all-installer-win64.exe) &nbsp;&nbsp;&nbsp;&nbsp;
+            [Download for Mac](https://gpt4all.io/installers/gpt4all-installer-darwin.dmg) &nbsp;&nbsp;&nbsp;&nbsp;
+            [Download for Linux](https://gpt4all.io/installers/gpt4all-installer-linux.run)
+        </div>
+
+    2. Hit `Start Chatting`. ![GPT4All home page](../assets/gpt4all_home.png)
+
+    3. Click `+ Add Model`.
+
+    4. Download a model. We recommend starting with Llama 3, but you can [browse more models](models.md). ![Download a model](../assets/download_llama.png)
+
+    5. Once downloaded, go to Chats (below Home and above Models in the menu on the left).  
+
+    6. Click "Load Default Model" (will be Llama 3 or whichever model you downloaded). 
+
+        <table>
+        <tr>
+            <td>
+            <img src="../assets/before_first_chat.png" alt="Before first chat" style="width:100%">
+            </td>
+            <td>
+            <img src="../assets/new_first_chat.png" alt="New first chat" style="width:100%">
+            </td>
+        </tr>
+        </table>
+
+    7. Try the [example chats](chats.md) or your own prompts!
--- a/gpt4all-bindings/python/docs/gpt4all_desktop/settings.md
+++ b/gpt4all-bindings/python/docs/gpt4all_desktop/settings.md
@ -0,0 +1,75 @@
+# Settings
+
+## Application Settings
+
+!!! note "General Application Settings"
+
+    | Setting | Description | Default Value |
+    | --- | --- | --- |
+    | **Theme** | Color theme for the application. Options are `Light`, `Dark`, and `LegacyDark` | `Light` |
+    | **Font Size** | Font size setting for text throughout the application. Options are Small, Medium, and Large | Small |
+    | **Device** | Device that will run your models. Options are `Auto` (GPT4All chooses), `Metal` (Apple Silicon M1+), `CPU`, and `GPU` | `Auto` |
+    | **Default Model** | Choose your preferred LLM to load by default on startup| Auto |
+    | **Download Path** | Select a destination on your device to save downloaded models | Windows: `C:\Users\{username}\AppData\Local\nomic.ai\GPT4All`<br><br>Mac: `/Users/{username}/Library/Application Support/nomic.ai/GPT4All/`<br><br>Linux: `/home/{username}/.local/share/nomic.ai/GPT4All` |
+    
+    | **Enable Datalake** | Opt-in to sharing interactions with GPT4All community (**anonymous** and **optional**) | Off |
+
+!!! note "Advanced Application Settings"
+
+    | Setting | Description | Default Value |
+    | --- | --- | --- |
+    | **CPU Threads** | Number of concurrently running CPU threads (more can speed up responses) | 4 |
+    | **Save Chat Context** | Save chat context to disk to pick up exactly where a model left off. | Off |
+    | **Enable Local Server** | Allow any application on your device to use GPT4All via an OpenAI-compatible GPT4All API | Off |
+    | **API Server Port** | Local HTTP port for the local API server | 4891 |
+
+## Model Settings
+
+!!! note "Model / Character Settings"
+
+    | Setting | Description | Default Value |
+    | --- | --- | --- |
+    | **Name** | Unique name of this model / character| set by model uploader |
+    | **System Prompt** | General instructions for the chats this model will be used for | set by model uploader |
+    | **Prompt Template** | Format of user <-> assistant interactions for the chats this model will be used for | set by model uploader |
+
+### Clone
+
+You can **clone** an existing model, which allows you to save a configuration of a model file with different prompt templates and sampling settings.
+
+### Sampling Settings
+
+!!! note "Model Sampling Settings"
+
+    | Setting             | Description                          | Default Value |
+    |----------------------------|------------------------------------------|-----------|
+    | **Context Length**         | Maximum length of input sequence in tokens        | 2048      |
+    | **Max Length**             | Maximum length of response in tokens     | 4096      |
+    | **Prompt Batch Size**      | Token batch size for parallel processing | 128      |
+    | **Temperature**            | Lower temperature gives more likely generations | 0.7       |
+    | **Top P**                  | Prevents choosing highly unlikely tokens  | 0.4       |
+    | **Top K**                  | Size of selection pool for tokens         | 40        |
+    | **Min P**                  | Minimum relative probability              | 0         |
+    | **Repeat Penalty Tokens**  | Length to apply penalty                   | 64        |
+    | **Repeat Penalty**         | Penalize repetitiveness                   | 1.18      |
+    | **GPU Layers**             | How many model layers to load into VRAM     | 32        |
+
+## LocalDocs Settings
+
+!!! note "General LocalDocs Settings"
+
+    | Setting | Description | Default Value |
+    | --- | --- | --- |
+    | **Allowed File Extensions** | Choose which file types will be indexed into LocalDocs collections as text snippets with embedding vectors | `.txt`, `.pdf`, `.md`, `.rst` |
+    | **Use Nomic Embed API** | Use Nomic API to create LocalDocs collections fast and off-device; [Nomic API Key](https://atlas.nomic.ai/) required | Off |
+    | **Embeddings Device** | Device that will run embedding models. Options are `Auto` (GPT4All chooses), `Metal` (Apple Silicon M1+), `CPU`, and `GPU` | `Auto` |
+    | **Show Sources** | Titles of source files retrieved by LocalDocs will be displayed directly in your chats.| On |
+
+!!! note "Advanced LocalDocs Settings"
+
+    Note that increasing these settings can increase the likelihood of factual responses, but may result in slower generation times.
+
+    | Setting | Description | Default Value |
+    | --- | --- | --- |
+    | **Document Snippet Size** | Number of string characters per document snippet | 512 |
+    | **Maximum Document Snippets Per Prompt** | Upper limit for the number of snippets from your files LocalDocs can retrieve for LLM context | 3 |
--- a/gpt4all-bindings/python/docs/gpt4all_help/faq.md
+++ b/gpt4all-bindings/python/docs/gpt4all_help/faq.md
@ -0,0 +1,49 @@
+# Frequently Asked Questions
+
+## Models
+
+### Which language models are supported?
+
+Our backend supports models with a `llama.cpp` implementation which have been uploaded to [HuggingFace](https://huggingface.co/).
+
+### Which embedding models are supported?
+
+The following embedding models can be used within the application and with the `Embed4All` class from the `gpt4all` Python library. The default context length as GGUF files is 2048 but can be [extended](https://huggingface.co/nomic-ai/nomic-embed-text-v1.5-GGUF#description).
+
+| Name               | Initializing with `Embed4All`                            | Context Length | Embedding Length | File Size |
+|--------------------|------------------------------------------------------|---------------:|-----------------:|----------:|
+| [SBert](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2)| ```pythonemb = Embed4All("all-MiniLM-L6-v2.gguf2.f16.gguf")```|            512 |              384 |    44 MiB |
+| [Nomic Embed v1](https://huggingface.co/nomic-ai/nomic-embed-text-v1-GGUF)   | nomic&#x2011;embed&#x2011;text&#x2011;v1.f16.gguf|           2048 |              768 |   262 MiB |
+| [Nomic Embed v1.5](https://huggingface.co/nomic-ai/nomic-embed-text-v1.5-GGUF) | nomic&#x2011;embed&#x2011;text&#x2011;v1.5.f16.gguf|           2048 |           64-768 |   262 MiB |
+
+## Software
+
+### What software do I need?
+
+All you need is to [install GPT4all](../index.md) onto you Windows, Mac, or Linux computer.
+
+### Which SDK languages are supported?
+
+Our SDK is in Python for usability, but these are light bindings around [`llama.cpp`](https://github.com/ggerganov/llama.cpp) implementations that we contribute to for efficiency and accessibility on everyday computers.
+
+### Is there an API?
+
+Yes, you can run your model in server-mode with our [OpenAI-compatible API](https://platform.openai.com/docs/api-reference/completions), which you can configure in [settings](../gpt4all_desktop/settings.md#application-settings)
+
+### Can I monitor a GPT4All deployment?
+
+Yes, GPT4All [integrates](../gpt4all_python/monitoring.md) with [OpenLIT](https://github.com/openlit/openlit) so you can deploy LLMs with user interactions and hardware usage automatically monitored for full observability.
+
+### Is there a command line interface (CLI)?
+
+[Yes](https://github.com/nomic-ai/gpt4all/tree/main/gpt4all-bindings/cli), we have a lightweight use of the Python client as a CLI. We welcome further contributions!
+
+## Hardware
+
+### What hardware do I need?
+
+GPT4All can run on CPU, Metal (Apple Silicon M1+), and GPU.
+
+### What are the system requirements?
+
+Your CPU needs to support [AVX or AVX2 instructions](https://en.wikipedia.org/wiki/Advanced_Vector_Extensions) and you need enough RAM to load a model into memory. If the 
--- a/gpt4all-bindings/python/docs/gpt4all_help/troubleshooting.md
+++ b/gpt4all-bindings/python/docs/gpt4all_help/troubleshooting.md
@ -0,0 +1,27 @@
+# Troubleshooting
+
+## Error Loading Models
+
+It is possible you are trying to load a model from HuggingFace whose weights are not compatible with our [backend](https://github.com/nomic-ai/gpt4all/tree/main/gpt4all-bindings).
+
+Try downloading one of the officially supported models mentioned our [website](https://gpt4all.io/). If the problem persists, please share your experience on our [Discord](https://discord.com/channels/1076964370942267462).
+
+## Bad Responses 
+
+Try the [example chats](../gpt4all_desktop/chats.md) to double check that your system is implementing models correctly.
+
+### Responses Incoherent
+
+If you are seeing something **not at all** resembling the [example chats](../gpt4all_desktop/chats.md) - for example, if the responses you are seeing look nonsensical - try [downloading a different model](../gpt4all_desktop/models.md), and please share your experience on our [Discord](https://discord.com/channels/1076964370942267462).
+
+### Responses Incorrect
+
+LLMs can be unreliable. It's helpful to know what their training data was - they are less likely to be correct when asking about data they were not trained on unless you give the necessary information in the prompt as **context**.
+
+Giving LLMs additional context, like chatting using [LocalDocs](../gpt4all_desktop/localdocs.md), can help merge the language model's ability to understand text with the files that you trust to contain the information you need. 
+
+Including information in a prompt is not a guarantee that it will be used correctly, but the more clear and concise your prompts, and the more relevant your prompts are to your files, the better.
+
+### LocalDocs Issues
+
+Occasionally a model - particularly a smaller or overall weaker LLM - may not use the relevant text snippets from the files that were referenced via LocalDocs. If you are seeing this, it can help to use phrases like "in the docs" or "from the provided files" when prompting your model.
--- a/gpt4all-bindings/python/docs/gpt4all_python/home.md
+++ b/gpt4all-bindings/python/docs/gpt4all_python/home.md
@ -0,0 +1,152 @@
+# GPT4All Python SDK
+
+## Installation
+
+To get started, pip-install the `gpt4all` package into your python environment.
+
+```bash
+pip install gpt4all
+```
+
+We recommend installing `gpt4all` into its own virtual environment using `venv` or `conda`
+
+## Load LLM
+
+Models are loaded by name via the `GPT4All` class. If it's your first time loading a model, it will be downloaded to your device and saved so it can be quickly reloaded next time you create a `GPT4All` model with the same name.
+
+!!! note "Load LLM"
+
+    ```python
+    from gpt4all import GPT4All
+    model = GPT4All("Meta-Llama-3-8B-Instruct.Q4_0.gguf") # downloads / loads a 4.66GB LLM
+    with model.chat_session():
+        print(model.generate("How can I run LLMs efficiently on my laptop?", max_tokens=1024))
+    ```
+
+## Chat Session Generation
+
+Most of the language models you will be able to access from HuggingFace have been trained as assistants. This guides language models to not just answer with relevant text, but *helpful* text.
+
+If you want your LLM's responses to be helpful in the typical sense, we recommend you apply the chat templates the models were finetuned with. Information about specific prompt templates is typically available on the official HuggingFace page for the model.
+
+!!! note "Example LLM Chat Session Generation"
+
+    === "Code"
+
+        Load `Llama 3` and enter the following prompt in a chat session:
+
+        ```python
+        from gpt4all import GPT4All
+        model = GPT4All("Meta-Llama-3-8B-Instruct.Q4_0.gguf")
+        with model.chat_session():
+            print(model.generate("quadratic formula"))
+        ```
+
+    === "Output"
+   
+        With the default sampling settings, you should see something resembling the following:
+        ```
+        The quadratic formula!
+
+        The quadratic formula is a mathematical formula that provides the solutions to a quadratic equation of the form:
+
+        ax^2 + bx + c = 0
+
+        where a, b, and c are constants. The formula is:
+
+        x = (-b ± √(b^2 - 4ac)) / 2a
+
+        Let's break it down:
+
+        * x is the variable we're trying to solve for.
+        * a, b, and c are the coefficients of the quadratic equation.
+        * ± means "plus or minus".
+        * √ denotes the square root.
+
+        To use the formula, simply plug in the values of a, b, and c into the expression above. The resulting value(s) will be the solutions to the original quadratic equation!
+
+        For example, let's say we have the quadratic equation:
+
+        x^2 + 5x + 6 = 0
+
+        We can plug these values into the formula as follows:
+
+        a = 1
+        b = 5
+        ```
+
+## Example Models
+
+| Model| Filesize| RAM Required| Parameters| Developer| License| MD5 Sum (Unique Hash)|
+|------|---------|-------------|-----------|----------|--------|----------------------|
+| `Meta-Llama-3-8B-Instruct.Q4_0.gguf`  | 4.66 GB| 8 GB| 8 Billion| Meta| [Llama 3 License](https://llama.meta.com/llama3/license/)| c87ad09e1e4c8f9c35a5fcef52b6f1c9|
+| Nous Hermes 2 Mistral DPO| 4.21 GB| 8 GB| 7 Billion| Mistral & Nous Research | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)| Coa5f6b4eabd3992da4d7fb7f020f921eb|
+| Phi-3 Mini Instruct | 2.03 GB| 4 GB| 4 billion| Microsoft| [MIT](https://opensource.org/license/mit)| f8347badde9bfc2efbe89124d78ddaf5|
+| Mini Orca (Small)| 1.84 GB| 4 GB| 3 billion| Microsoft | [CC-BY-NC-SA-4.0](https://spdx.org/licenses/CC-BY-NC-SA-4.0)| 0e769317b90ac30d6e09486d61fefa26|
+| GPT4All Snoozy| 7.36 GB| 16 GB| 13 billion| Nomic AI| [GPL](https://www.gnu.org/licenses/gpl-3.0.en.html)| 40388eb2f8d16bb5d08c96fdfaac6b2c|
+
+## Direct Generation
+
+Directly calling `model.generate()` prompts the model without applying any templates. 
+
+Note: this can result in responses that are less like helpful responses and more like mirroring the tone of your prompt. In general, a language model outside of a chat session is less of a helpful assistant and more of a lens into the distribution of the model's training data.
+
+As an example, see how the model's response changes when we give the same prompt as above without applying a chat session:
+
+!!! note "Example LLM Direct Generation"
+
+    === "Code"
+    
+        Load `Llama 3` and enter the following prompt:
+
+        ```python
+        from gpt4all import GPT4All
+        model = GPT4All("Meta-Llama-3-8B-Instruct.Q4_0.gguf")
+        print(model.generate("quadratic formula"))
+        ```
+    === "Output"
+
+        With the default sampling settings, you should see something resembling the following:
+        ```
+        . The equation is in the form of a + bx = c, where a and b are constants.
+        The solution to this problem involves using the quadratic formula which states that for any quadratic equation ax^2+bx+c=0, its solutions can be found by:
+        x = (-b ± √(b^2-4ac)) / 2a
+        In your case, since you have a + bx = c, we need to rewrite it in the form of ax^2+bx+c=0. To do this, subtract both sides from c, so that:
+        c - (a + bx) = 0
+        Now, combine like terms on the left side and simplify:
+        ax^2 + (-b)x + (c-a) = 0\n\nSo now we have a quadratic equation in standard form: ax^2+bx+c=0. We can use this to find its solutions using the quadratic formula:
+        
+        x = ((-b ± √((-b)^2
+        ```
+
+Why did it respond differently? Because language models, before being fine-tuned as assistants, are trained to be more like a data mimic than a helpful assistant. Therefore our responses ends up more like a typical continuation of math-style text rather than a helpful answer in dialog. 
+
+## Embeddings
+
+Nomic trains and open-sources free embedding models that will run very fast on your hardware.
+
+The easiest way to run the text embedding model locally uses the [`nomic`](https://github.com/nomic-ai/nomic) python library to interface with our fast [C/C++ implementations](ref.md#gpt4all.gpt4all.Embed4All).
+
+!!! note "Example Embeddings Generation"
+
+    === "Code"
+
+        Importing `embed` from the [`nomic`](https://github.com/nomic-ai/nomic) library, you can call `embed.text()` with `inference_mode="local"`. This downloads an embedding model and saves it for later.
+
+        ```python
+        from nomic import embed
+        embeddings = embed.text(["String 1", "String 2"], inference_mode="local")['embeddings']
+        print("Number of embeddings created:", len(embeddings))
+        print("Number of dimensions per embedding:", len(embeddings[0]))
+        ```
+    
+    === "Output"
+
+        ```
+        Number of embeddings created: 2
+        Number of dimensions per embedding: 768
+        ```
+
+![Nomic embed text local inference](../assets/local_embed.gif)
+
+To learn more about making embeddings locally with `nomic`, visit our [embeddings guide](https://docs.nomic.ai/atlas/guides/embeddings#local-inference).
--- a/gpt4all-bindings/python/docs/gpt4all_python/monitoring.md
+++ b/gpt4all-bindings/python/docs/gpt4all_python/monitoring.md
@ -0,0 +1,47 @@
+# GPT4All Monitoring
+
+GPT4All integrates with [OpenLIT](https://github.com/openlit/openlit) open telemetry instrumentation to perform real-time monitoring of your LLM application and hardware.
+
+Monitoring can enhance your GPT4All deployment with auto-generated traces for
+
+- performance metrics
+
+- user interactions
+
+- GPU metrics like utilization, memory, temperature, power usage
+
+## Setup Monitoring
+
+!!! note "Setup Monitoring"
+
+    With [OpenLIT](https://github.com/openlit/openlit), you can automatically monitor metrics for your LLM deployment:
+
+    ```shell
+    pip install openlit
+    ```
+
+    ```python
+    from gpt4all import GPT4All
+    import openlit
+
+    openlit.init()  # start
+    # openlit.init(collect_gpu_stats=True)  # or, start with optional GPU monitoring
+
+    model = GPT4All(model_name='orca-mini-3b-gguf2-q4_0.gguf')
+
+    # Start a chat session and send queries
+    with model.chat_session():
+        response1 = model.generate(prompt='hello', temp=0)
+        response2 = model.generate(prompt='write me a short poem', temp=0)
+        response3 = model.generate(prompt='thank you', temp=0)
+
+        print(model.current_chat_session)
+    ```
+
+## OpenLIT UI
+
+Connect to OpenLIT's UI to start exploring performance metrics. Visit the OpenLIT [Quickstart Guide](https://docs.openlit.io/latest/quickstart) for step-by-step details.
+
+## Grafana, DataDog, & Other Integrations
+
+If you use tools like , you can integrate the data collected by OpenLIT. For instructions on setting up these connections, check the OpenLIT [Connections Guide](https://docs.openlit.io/latest/connections/intro).
--- a/gpt4all-bindings/python/docs/gpt4all_python/ref.md
+++ b/gpt4all-bindings/python/docs/gpt4all_python/ref.md
@ -0,0 +1,4 @@
+# GPT4All Python SDK Reference
+::: gpt4all.gpt4all.GPT4All
+
+::: gpt4all.gpt4all.Embed4All
--- a/gpt4all-bindings/python/docs/index.md
+++ b/gpt4all-bindings/python/docs/index.md
@ -1,66 +1,28 @@
-# GPT4All
-Welcome to the GPT4All technical documentation.
+# GPT4All Documentation

-GPT4All is an open-source software ecosystem that allows anyone to train and deploy **powerful** and **customized** large language models (LLMs) on **everyday hardware**.
-Nomic AI oversees contributions to the open-source ecosystem ensuring quality, security and maintainability.
+GPT4All runs large language models (LLMs) privately on everyday desktops & laptops. 

-GPT4All software is optimized to run inference of 3-13 billion parameter large language models on the CPUs of laptops, desktops and servers.
+No API calls or GPUs required - you can just download the application and [get started](gpt4all_desktop/quickstart.md#quickstart).

-=== "GPT4All Example"
-    ``` py
+!!! note "Desktop Application"
+    GPT4All runs LLMs as an application on your computer. Nomic's embedding models can bring information from your local documents and files into your chats. It's fast, on-device, and completely **private**.
+
+    <div style="text-align: center; margin-top: 20px;">
+        [Download for Windows](https://gpt4all.io/installers/gpt4all-installer-win64.exe) &nbsp;&nbsp;&nbsp;&nbsp;
+        [Download for Mac](https://gpt4all.io/installers/gpt4all-installer-darwin.dmg) &nbsp;&nbsp;&nbsp;&nbsp;
+        [Download for Linux](https://gpt4all.io/installers/gpt4all-installer-linux.run)
+    </div>
+
+!!! note "Python SDK"
+    Use GPT4All in Python to program with LLMs implemented with the [`llama.cpp`](https://github.com/ggerganov/llama.cpp) backend and [Nomic's C backend](https://github.com/nomic-ai/gpt4all/tree/main/gpt4all-backend). Nomic contributes to open source software like [`llama.cpp`](https://github.com/ggerganov/llama.cpp) to make LLMs accessible and efficient **for all**.
+
+    ```bash
+    pip install gpt4all
+    ```
+
+    ```python
    from gpt4all import GPT4All
-    model = GPT4All("orca-mini-3b-gguf2-q4_0.gguf")
-    output = model.generate("The capital of France is ", max_tokens=3)
-    print(output)
+    model = GPT4All("Meta-Llama-3-8B-Instruct.Q4_0.gguf") # downloads / loads a 4.66GB LLM
+    with model.chat_session():
+        print(model.generate("How can I run LLMs efficiently on my laptop?", max_tokens=1024))
    ```
-=== "Output"
-    ```
-    1. Paris
-    ```
-See [Python Bindings](gpt4all_python.md) to use GPT4All.
-
-### Navigating the Documentation
-In an effort to ensure cross-operating-system and cross-language compatibility, the [GPT4All software ecosystem](https://github.com/nomic-ai/gpt4all)
-is organized as a monorepo with the following structure:
-
- **gpt4all-backend**: The GPT4All backend maintains and exposes a universal, performance optimized C API for running inference with multi-billion parameter Transformer Decoders.
-This C API is then bound to any higher level programming language such as C++, Python, Go, etc.
- **gpt4all-bindings**: GPT4All bindings contain a variety of high-level programming languages that implement the C API. Each directory is a bound programming language. The [CLI](gpt4all_cli.md) is included here, as well.
- **gpt4all-chat**: GPT4All Chat is an OS native chat application that runs on macOS, Windows and Linux. It is the easiest way to run local, privacy aware chat assistants on everyday hardware. You can download it on the [GPT4All Website](https://gpt4all.io) and read its source code in the monorepo.
-
-Explore detailed documentation for the backend, bindings and chat client in the sidebar.
-## Models
-The GPT4All software ecosystem is compatible with the following Transformer architectures:
-
- `Falcon`
- `LLaMA` (including `OpenLLaMA`)
- `MPT` (including `Replit`)
- `GPT-J`
-
-You can find an exhaustive list of supported models on the [website](https://gpt4all.io) or in the [models directory](https://raw.githubusercontent.com/nomic-ai/gpt4all/main/gpt4all-chat/metadata/models3.json)
-
-
-GPT4All models are artifacts produced through a process known as neural network quantization.
-A multi-billion parameter Transformer Decoder usually takes 30+ GB of VRAM to execute a forward pass.
-Most people do not have such a powerful computer or access to GPU hardware. By running trained LLMs through quantization algorithms, 
-some GPT4All models can run on your laptop using only 4-8GB of RAM enabling their wide-spread usage.
-Bigger models might still require more RAM, however.
-
-Any model trained with one of these architectures can be quantized and run locally with all GPT4All bindings and in the
-chat client. You can add new variants by contributing to the gpt4all-backend.
-
-## Frequently Asked Questions
-Find answers to frequently asked questions by searching the [Github issues](https://github.com/nomic-ai/gpt4all/issues) or in the [documentation FAQ](gpt4all_faq.md).
-
-## Getting the most of your local LLM
-
-**Inference Speed**
-of a local LLM depends on two factors: model size and the number of tokens given as input. 
-It is not advised to prompt local LLMs with large chunks of context as their inference speed will heavily degrade.
-You will likely want to run GPT4All models on GPU if you would like to utilize context windows larger than 750 tokens. Native GPU support for GPT4All models is planned.
-
-**Inference Performance:**
-Which model is best? That question depends on your use-case. The ability of an LLM to faithfully follow instructions is conditioned
-on the quantity and diversity of the pre-training data it trained on and the diversity, quality and factuality of the data the LLM
-was fine-tuned on. A goal of GPT4All is to bring the most powerful local assistant model to your desktop and Nomic AI is actively
-working on efforts to improve their performance and quality.
--- a/gpt4all-bindings/python/docs/old/gpt4all_chat.md
+++ b/gpt4all-bindings/python/docs/old/gpt4all_chat.md
--- a/gpt4all-bindings/python/docs/old/gpt4all_cli.md
+++ b/gpt4all-bindings/python/docs/old/gpt4all_cli.md
--- a/gpt4all-bindings/python/docs/old/gpt4all_faq.md
+++ b/gpt4all-bindings/python/docs/old/gpt4all_faq.md
--- a/gpt4all-bindings/python/docs/old/gpt4all_monitoring.md
+++ b/gpt4all-bindings/python/docs/old/gpt4all_monitoring.md
--- a/gpt4all-bindings/python/docs/old/gpt4all_nodejs.md
+++ b/gpt4all-bindings/python/docs/old/gpt4all_nodejs.md
--- a/gpt4all-bindings/python/docs/old/gpt4all_python.md
+++ b/gpt4all-bindings/python/docs/old/gpt4all_python.md
--- a/gpt4all-bindings/python/docs/old/gpt4all_python_embedding.md
+++ b/gpt4all-bindings/python/docs/old/gpt4all_python_embedding.md
--- a/gpt4all-bindings/python/docs/old/index.md
+++ b/gpt4all-bindings/python/docs/old/index.md
@ -0,0 +1,71 @@
+# GPT4All
+Welcome to the GPT4All documentation LOCAL EDIT
+
+GPT4All is an open-source software ecosystem for anyone to run large language models (LLMs) **privately** on **everyday laptop & desktop computers**. No API calls or GPUs required.
+
+The GPT4All Desktop Application is a touchpoint to interact with LLMs and integrate them with your local docs & local data for RAG (retrieval-augmented generation). No coding is required, just install the application, download the models of your choice, and you are ready to use your LLM.
+
+Your local data is **yours**. GPT4All handles the retrieval privately and on-device to fetch relevant data to support your queries to your LLM.
+
+Nomic AI oversees contributions to GPT4All to ensure quality, security, and maintainability. Additionally, Nomic AI has open-sourced code for training and deploying your own customized LLMs internally.
+
+GPT4All software is optimized to run inference of 3-13 billion parameter large language models on the CPUs of laptops, desktops and servers.
+
+=== "GPT4All Example"
+    ``` py
+    from gpt4all import GPT4All
+    model = GPT4All("orca-mini-3b-gguf2-q4_0.gguf")
+    output = model.generate("The capital of France is ", max_tokens=3)
+    print(output)
+    ```
+=== "Output"
+    ```
+    1. Paris
+    ```
+See [Python Bindings](gpt4all_python.md) to use GPT4All.
+
+### Navigating the Documentation
+In an effort to ensure cross-operating-system and cross-language compatibility, the [GPT4All software ecosystem](https://github.com/nomic-ai/gpt4all)
+is organized as a monorepo with the following structure:
+
+- **gpt4all-backend**: The GPT4All backend maintains and exposes a universal, performance optimized C API for running inference with multi-billion parameter Transformer Decoders.
+This C API is then bound to any higher level programming language such as C++, Python, Go, etc.
+- **gpt4all-bindings**: GPT4All bindings contain a variety of high-level programming languages that implement the C API. Each directory is a bound programming language. The [CLI](gpt4all_cli.md) is included here, as well.
+- **gpt4all-chat**: GPT4All Chat is an OS native chat application that runs on macOS, Windows and Linux. It is the easiest way to run local, privacy aware chat assistants on everyday hardware. You can download it on the [GPT4All Website](https://gpt4all.io) and read its source code in the monorepo.
+
+Explore detailed documentation for the backend, bindings and chat client in the sidebar.
+## Models
+The GPT4All software ecosystem is compatible with the following Transformer architectures:
+
+- `Falcon`
+- `LLaMA` (including `OpenLLaMA`)
+- `MPT` (including `Replit`)
+- `GPT-J`
+
+You can find an exhaustive list of supported models on the [website](https://gpt4all.io) or in the [models directory](https://raw.githubusercontent.com/nomic-ai/gpt4all/main/gpt4all-chat/metadata/models3.json)
+
+
+GPT4All models are artifacts produced through a process known as neural network quantization.
+A multi-billion parameter Transformer Decoder usually takes 30+ GB of VRAM to execute a forward pass.
+Most people do not have such a powerful computer or access to GPU hardware. By running trained LLMs through quantization algorithms, 
+some GPT4All models can run on your laptop using only 4-8GB of RAM enabling their wide-spread usage.
+Bigger models might still require more RAM, however.
+
+Any model trained with one of these architectures can be quantized and run locally with all GPT4All bindings and in the
+chat client. You can add new variants by contributing to the gpt4all-backend.
+
+## Frequently Asked Questions
+Find answers to frequently asked questions by searching the [Github issues](https://github.com/nomic-ai/gpt4all/issues) or in the [documentation FAQ](gpt4all_faq.md).
+
+## Getting the most of your local LLM
+
+**Inference Speed**
+of a local LLM depends on two factors: model size and the number of tokens given as input. 
+It is not advised to prompt local LLMs with large chunks of context as their inference speed will heavily degrade.
+You will likely want to run GPT4All models on GPU if you would like to utilize context windows larger than 750 tokens. Native GPU support for GPT4All models is planned.
+
+**Inference Performance:**
+Which model is best? That question depends on your use-case. The ability of an LLM to faithfully follow instructions is conditioned
+on the quantity and diversity of the pre-training data it trained on and the diversity, quality and factuality of the data the LLM
+was fine-tuned on. A goal of GPT4All is to bring the most powerful local assistant model to your desktop and Nomic AI is actively
+working on efforts to improve their performance and quality.
--- a/gpt4all-bindings/python/mkdocs.yml
+++ b/gpt4all-bindings/python/mkdocs.yml
@ -1,4 +1,4 @@
-site_name: GPT4All Documentation
+site_name: GPT4All
 repo_url: https://github.com/nomic-ai/gpt4all
 repo_name: nomic-ai/gpt4all
 site_url: https://docs.gpt4all.io
@ -8,17 +8,20 @@ copyright: Copyright &copy; 2023 Nomic, Inc
 use_directory_urls: false

 nav:
-    - 'index.md'
-    - 'GPT4All Chat Client': 'gpt4all_chat.md'
-    - 'Bindings':
-      - 'GPT4All in Python':
-        - 'Generation': 'gpt4all_python.md'
-        - 'Embedding': 'gpt4all_python_embedding.md'
-        - 'Monitoring with OpenLIT': 'gpt4all_monitoring.md'
-      - 'GPT4ALL in NodeJs': 'gpt4all_nodejs.md'
-      - 'gpt4all_cli.md'
-    - 'Wiki':
-      - 'gpt4all_faq.md'
+    - 'index.md' 
+    - 'Quickstart' : 'gpt4all_desktop/quickstart.md'
+    - 'Chats' : 'gpt4all_desktop/chats.md'
+    - 'Models' : 'gpt4all_desktop/models.md'
+    - 'LocalDocs' : 'gpt4all_desktop/localdocs.md'
+    - 'Settings' : 'gpt4all_desktop/settings.md'
+    - 'Python SDK':
+      - 'gpt4all_python/home.md'
+      - 'Monitoring': 'gpt4all_python/monitoring.md'
+      - 'SDK Reference': 'gpt4all_python/ref.md'
+    - 'Help':
+      - 'FAQ': 'gpt4all_help/faq.md'
+      - 'Troubleshooting': 'gpt4all_help/troubleshooting.md'
+

 theme:
  name: material
@ -27,6 +30,7 @@ theme:
  logo: assets/nomic.png
  favicon: assets/favicon.ico
  features:
+    - content.code.copy
    - navigation.instant
    - navigation.tracking
    - navigation.sections
@ -52,11 +56,13 @@ markdown_extensions:
  - admonition
  - codehilite:
      css_class: highlight
+  - markdown_captions

 extra_css:
    - css/custom.css

 plugins:
+  - search
  - mkdocstrings:
      handlers:
        python:
@ -65,8 +71,11 @@ plugins:
            heading_level: 4
            show_root_full_path: false
            docstring_section_style: list
-  #- material/social:
-  #    cards_font: Roboto
+  - material/social:
+      cards_layout_options:
+          font_family: Roboto
+          description: GPT4All runs LLMs efficiently on your hardware
+

  #- mkdocs-jupyter:
  #    ignore_h1_titles: True
--- a/gpt4all-bindings/python/setup.py
+++ b/gpt4all-bindings/python/setup.py
@ -68,7 +68,7 @@ def get_long_description():

 setup(
    name=package_name,
-    version="2.7.0",
+    version="3.0.0",
    description="Python bindings for GPT4All",
    long_description=get_long_description(),
    long_description_content_type="text/markdown",
@ -107,6 +107,7 @@ setup(
            'wheel',
            'setuptools',
            'mkdocs-material',
+            'mkdocs-material[imaging]',
            'mkautodoc',
            'mkdocstrings[python]',
            'mkdocs-jupyter',
--- a/roadmap.md
+++ b/roadmap.md
@ -0,0 +1,51 @@
+
+# GPT4All 2024 Roadmap
+To contribute to the development of any of the below roadmap items, make or find the corresponding issue and cross-reference the [in-progress task](https://github.com/orgs/nomic-ai/projects/2/views/1).
+
+Each item should have an issue link below.
+
+- Chat UI Language Localization (localize UI into the native languages of users)
+    - [ ] Chinese
+    - [ ] German
+    - [ ] French
+    - [ ] Portuguese
+    - [ ] Your native language here. 
+- UI Redesign: an internal effort at Nomic to improve the UI/UX of gpt4all for all users.
+    - [ ] Design new user interface and gather community feedback
+    - [ ] Implement the new user interface and experience.
+- Installer and Update Improvements
+    - [ ] Seamless native installation and update process on OSX
+    - [ ] Seamless native installation and update process on Windows
+    - [ ] Seamless native installation and update process on Linux
+- Model discoverability improvements:
+    - [x] Support huggingface model discoverability
+    - [ ] Support Nomic hosted model discoverability
+- LocalDocs (towards a local perplexity)
+    - Multilingual LocalDocs Support
+        - [ ] Create a multilingual experience
+        - [ ] Incorporate a multilingual embedding model
+        - [ ] Specify a preferred multilingual LLM for localdocs
+    - Improved RAG techniques
+        - [ ] Query augmentation and re-writing
+        - [ ] Improved chunking and text extraction from arbitrary modalities
+            - [ ] Custom PDF extractor past the QT default (charts, tables, text)
+        - [ ] Faster indexing and local exact search with v1.5 hamming embeddings and reranking (skip ANN index construction!)
+    - Support queries like 'summarize X document'
+    - Multimodal LocalDocs support with Nomic Embed
+    - Nomic Dataset Integration with real-time LocalDocs
+        - [ ] Include an option to allow the export of private LocalDocs collections to Nomic Atlas for debugging data/chat quality
+        - [ ] Allow optional sharing of LocalDocs collections between users.
+        - [ ] Allow the import of a LocalDocs collection from an Atlas Datasets
+            - Chat with live version of Wikipedia, Chat with Pubmed, chat with the latest snapshot of world news.
+- First class Multilingual LLM Support
+    - [ ] Recommend and set a default LLM for German
+    - [ ] Recommend and set a default LLM for English
+    - [ ] Recommend and set a default LLM for Chinese
+    - [ ] Recommend and set a default LLM for Spanish
+
+- Server Mode improvements
+    - Improved UI and new requested features:
+        - [ ] Fix outstanding bugs and feature requests around networking configurations.
+        - [ ] Support Nomic Embed inferencing
+        - [ ] First class documentation
+        - [ ] Improving developer use and quality of server mode (e.g. support larger batches)