AI/gpt4all

mirror of https://github.com/nomic-ai/gpt4all.git synced 2024-10-01 01:06:10 -04:00

Python bindings: Custom callbacks, chat session improvement, refactoring (#1145 )

* Added the following features: \n 1) Now prompt_model uses the positional argument callback to return the response tokens. \n 2) Due to the callback argument of prompt_model, prompt_model_streaming only manages the queue and threading now, which reduces duplication of the code. \n 3) Added optional verbose argument to prompt_model which prints out the prompt that is passed to the model. \n 4) Chat sessions can now have a header, i.e. an instruction before the transcript of the conversation. The header is set at the creation of the chat session context. \n 5) generate function now accepts an optional callback. \n 6) When streaming and using chat session, the user doesn't need to save assistant's messages by himself. This is done automatically.

* added _empty_response_callback so I don't have to check if callback is None

* added docs

* now if the callback stop generation, the last token is ignored

* fixed type hints, reimplemented chat session header as a system prompt, minor refactoring, docs: removed section about manual update of chat session for streaming

* forgot to add some type hints!

* keep the config of the model in GPT4All class which is taken from models.json if the download is allowed

* During chat sessions, the model-specific systemPrompt and promptTemplate are applied.

* implemented the changes

* Fixed typing. Now the user can set a prompt template that will be applied even outside of a chat session. The template can also have multiple placeholders that can be filled by passing a dictionary to the generate function

* reversed some changes concerning the prompt templates and their functionality

* fixed some type hints, changed list[float] to List[Float]

* fixed type hints, changed List[Float] to List[float]

* fix typo in the comment: Pepare => Prepare

---------

Signed-off-by: 385olt <385olt@gmail.com>

2023-07-19 18:36:49 -04:00

2.9 KiB

Raw Blame History

GPT4All Python Generation API

The GPT4All python package provides bindings to our C/C++ model backend libraries. The source code and local build instructions can be found here.

Quickstart

pip install gpt4all

=== "GPT4All Example" py from gpt4all import GPT4All model = GPT4All("orca-mini-3b.ggmlv3.q4_0.bin") output = model.generate("The capital of France is ", max_tokens=3) print(output) === "Output" 1. Paris

Chatting with GPT4All

Local LLMs can be optimized for chat conversions by reusing previous computational history.

Use the GPT4All chat_session context manager to hold chat conversations with the model.

=== "GPT4All Example" py model = GPT4All(model_name='orca-mini-3b.ggmlv3.q4_0.bin') with model.chat_session(): response = model.generate(prompt='hello', top_k=1) response = model.generate(prompt='write me a short poem', top_k=1) response = model.generate(prompt='thank you', top_k=1) print(model.current_chat_session) === "Output" json [ { 'role': 'user', 'content': 'hello' }, { 'role': 'assistant', 'content': 'What is your name?' }, { 'role': 'user', 'content': 'write me a short poem' }, { 'role': 'assistant', 'content': "I would love to help you with that! Here's a short poem I came up with:\nBeneath the autumn leaves,\nThe wind whispers through the trees.\nA gentle breeze, so at ease,\nAs if it were born to play.\nAnd as the sun sets in the sky,\nThe world around us grows still." }, { 'role': 'user', 'content': 'thank you' }, { 'role': 'assistant', 'content': "You're welcome! I hope this poem was helpful or inspiring for you. Let me know if there is anything else I can assist you with." } ] When using GPT4All models in the chat_session context:

The model is given a prompt template which makes it chatty.
Internal K/V caches are preserved from previous conversation history speeding up inference.

Generation Parameters

::: gpt4all.gpt4all.GPT4All.generate

Streaming Generations

To interact with GPT4All responses as the model generates, use the streaming = True flag during generation.

=== "GPT4All Streaming Example" py from gpt4all import GPT4All model = GPT4All("orca-mini-3b.ggmlv3.q4_0.bin") tokens = [] for token in model.generate("The capital of France is", max_tokens=20, streaming=True): tokens.append(token) print(tokens) === "Output" [' Paris', ' is', ' a', ' city', ' that', ' has', ' been', ' a', ' major', ' cultural', ' and', ' economic', ' center', ' for', ' over', ' ', '2', ',', '0', '0']

::: gpt4all.gpt4all.GPT4All

2.9 KiB Raw Blame History