AI/gpt4all

mirror of https://github.com/nomic-ai/gpt4all.git synced 2024-10-01 01:06:10 -04:00

gpt4all: an ecosystem of open-source chatbots trained on a massive collection of clean assistant data, including code, stories, and dialogue

Richard Guo 7beb082673 contributing and readme		2023-05-11 12:31:08 -04:00
.github	contributing and readme	2023-05-11 12:31:08 -04:00
gpt4all-api	mono repo structure	2023-05-01 15:45:23 -04:00
gpt4all-backend	Move the llmodel C API to new top-level directory and version it.	2023-05-10 11:46:40 -04:00
gpt4all-bindings	rough draft of monorepo plan	2023-05-01 15:45:39 -04:00
gpt4all-chat	Update README.md	2023-05-10 12:10:33 -04:00
gpt4all-docker	mono repo structure	2023-05-01 15:45:23 -04:00
gpt4all-training	contributing and readme	2023-05-11 12:31:08 -04:00
.gitignore	Fix ignore for build dirs.	2023-05-10 10:51:47 -04:00
.gitmodules	Move the llmodel C API to new top-level directory and version it.	2023-05-10 11:46:40 -04:00
CONTRIBUTING.md	contributing and readme	2023-05-11 12:31:08 -04:00
gpt4all-lora-demo.gif	GIF	2023-03-28 15:54:44 -04:00
LICENSE.txt	Add MIT license.	2023-04-06 11:28:59 -04:00
monorepo_plan.md	Update monorepo_plan.md	2023-05-05 09:32:45 -04:00
README.md	contributing and readme	2023-05-11 12:31:08 -04:00

README.md

GPT4All

Open-source assistant-style large language models that run locally on CPU

GPT4All Website

Discord

📗 Technical Report 3: GPT4All Snoozy and Groovy

📗 Technical Report 2: GPT4All-J

📗 Technical Report 1: GPT4All

🐍 Official Python Bindings

💻 Official Typescript Bindings

💬 Official Chat Interface

💬 Official Web Chat Interface

🦜️🔗 Official Langchain Backend

GPT4All is made possible by our compute partner Paperspace.

GPT4All: An ecosystem of open-source on-edge large language models.

Run on an M1 Mac (not sped up!)

Contributing

GPT4All welcomes contributions, involvement, and discussion from the open source community! Please see CONTRIBUTING.md and follow the issue, bug report, and PR markdown templates.

Note: Please make sure to tag all of the above with the relevant project identifiers.

Chat Client

Run any GPT4All model natively on your home desktop with the auto-updating desktop chat client. See the website for an exhaustive list of models.

GPT4All Website

Direct Installer Links:

Mac/OSX

Windows

Ubuntu

If you have older hardware that supports only AVX and not AVX2, use these installers instead:

Mac/OSX - avx-only

Windows - avx-only

Ubuntu - avx-only
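
To decide between the standard and the avx-only installers, it can help to look at the CPU feature flags. The following is a minimal sketch, not part of GPT4All: the function names `pick_installer` and `read_linux_cpu_flags` are hypothetical, the flag lookup assumes a Linux x86 machine that exposes `/proc/cpuinfo`, and returning `None` when AVX is missing is an assumption, since the README does not say what to do on such hardware.

```python
from typing import Iterable, Optional, Set


def pick_installer(cpu_flags: Iterable[str]) -> Optional[str]:
    """Map CPU feature flags to the installer variant named in the README."""
    flags = {flag.lower() for flag in cpu_flags}
    if "avx2" in flags:
        return "standard"   # the regular Mac/Windows/Ubuntu installers
    if "avx" in flags:
        return "avx-only"   # the avx-only installers for older hardware
    return None             # assumption: neither installer variant applies


def read_linux_cpu_flags(path: str = "/proc/cpuinfo") -> Set[str]:
    """Read the feature flags of the first CPU entry (Linux x86 only)."""
    try:
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                if line.lower().startswith("flags"):
                    return set(line.split(":", 1)[1].split())
    except OSError:
        pass  # file missing or unreadable, e.g. on macOS or Windows
    return set()


if __name__ == "__main__":
    print(pick_installer(read_linux_cpu_flags()))
```

On macOS or Windows the flag reader returns an empty set, so the flags would have to come from another source before calling `pick_installer`.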

Find the most up-to-date information on the GPT4All Website

Python Bindings

pip install gpt4all

import gpt4all
gptj = gpt4all.GPT4All("ggml-gpt4all-j-v1.3-groovy")
messages = [{"role": "user", "content": "Name 3 colors"}]
gptj.chat_completion(messages)
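
The `messages` argument above is a plain list of role/content dictionaries, so a conversation can be assembled before it is handed to `chat_completion`. The sketch below only builds that list and does not import `gpt4all`. The helper `add_message` is hypothetical, and the `"assistant"` role is an assumption: the README itself only shows `"user"`.

```python
from typing import Dict, List

# Assumption: only "user" appears in the README; "assistant" is added here
# to illustrate a multi-turn history and may not match the bindings exactly.
ALLOWED_ROLES = {"user", "assistant"}


def add_message(messages: List[Dict[str, str]], role: str, content: str) -> List[Dict[str, str]]:
    """Return a new message list with one validated role/content entry appended."""
    if role not in ALLOWED_ROLES:
        raise ValueError(f"unsupported role: {role!r}")
    if not content.strip():
        raise ValueError("content must not be empty")
    return messages + [{"role": role, "content": content}]


if __name__ == "__main__":
    history = add_message([], "user", "Name 3 colors")
    print(history)
```

The resulting list has the same shape as the `messages` value in the example above and could be passed to `gptj.chat_completion(history)`.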

Training GPT4All-J

Please see GPT4All-J Technical Report for details.

GPT4All-J Training Data

We are releasing the curated training data for anyone to replicate GPT4All-J here: GPT4All-J Training Data
- Atlas Map of Prompts
- Atlas Map of Responses

We have released updated versions of our GPT4All-J model and training data.

- v1.0: The original model trained on the v1.0 dataset
- v1.1-breezy: Trained on a filtered dataset where we removed all instances of "AI language model"
- v1.2-jazzy: Trained on a filtered dataset where we also removed instances like "I'm sorry, I can't answer..." and "AI language model"

The models and data versions can be specified by passing a revision argument.

For example, to load the v1.2-jazzy model and dataset, run:

from datasets import load_dataset
from transformers import AutoModelForCausalLM

# The dataset and the model live in separate Hugging Face repositories;
# the same revision tag selects matching versions of both.
dataset = load_dataset("nomic-ai/gpt4all-j-prompt-generations", revision="v1.2-jazzy")
model = AutoModelForCausalLM.from_pretrained("nomic-ai/gpt4all-j", revision="v1.2-jazzy")
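
Because a mistyped revision string only fails once the download is attempted, it can be convenient to check the version name first. This is a small sketch under stated assumptions: `revision_kwargs` is a hypothetical helper, and it assumes the revision tags are spelled exactly like the version names listed above (the README only confirms this for `v1.2-jazzy`).

```python
# Version names as listed in this README; assumed to match the revision tags.
KNOWN_REVISIONS = ("v1.0", "v1.1-breezy", "v1.2-jazzy")


def revision_kwargs(version: str) -> dict:
    """Return the keyword argument that selects a GPT4All-J release."""
    if version not in KNOWN_REVISIONS:
        known = ", ".join(KNOWN_REVISIONS)
        raise ValueError(f"unknown version {version!r}; expected one of: {known}")
    return {"revision": version}


if __name__ == "__main__":
    # Example: load_dataset("nomic-ai/gpt4all-j-prompt-generations",
    #                       **revision_kwargs("v1.2-jazzy"))
    print(revision_kwargs("v1.2-jazzy"))
```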

GPT4All-J Training Instructions

accelerate launch \
  --dynamo_backend=inductor \
  --num_processes=8 \
  --num_machines=1 \
  --machine_rank=0 \
  --deepspeed_multinode_launcher standard \
  --mixed_precision=bf16 \
  --use_deepspeed \
  --deepspeed_config_file=configs/deepspeed/ds_config_gptj.json \
  train.py --config configs/train/finetune_gptj.yaml
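
When the number of GPUs differs from the eight processes used above, the command can be assembled programmatically rather than edited by hand. The sketch below rebuilds the same argument list in Python; `build_train_command` is a hypothetical helper, only `num_processes` and `mixed_precision` are exposed as parameters, and every flag and path is copied from the command above. Whether other process counts work with the provided DeepSpeed config is not stated in the README.

```python
import shlex
from typing import List


def build_train_command(num_processes: int = 8, mixed_precision: str = "bf16") -> List[str]:
    """Assemble the accelerate launch command from the README as an argument list."""
    if num_processes < 1:
        raise ValueError("num_processes must be at least 1")
    return [
        "accelerate", "launch",
        "--dynamo_backend=inductor",
        f"--num_processes={num_processes}",
        "--num_machines=1",
        "--machine_rank=0",
        "--deepspeed_multinode_launcher", "standard",
        f"--mixed_precision={mixed_precision}",
        "--use_deepspeed",
        "--deepspeed_config_file=configs/deepspeed/ds_config_gptj.json",
        "train.py",
        "--config", "configs/train/finetune_gptj.yaml",
    ]


if __name__ == "__main__":
    # Print a shell-ready command line; pass the list to subprocess.run to execute it.
    print(shlex.join(build_train_command(num_processes=4)))
```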

Citation

If you utilize this repository, models or data in a downstream project, please consider citing it with:

@misc{gpt4all,
  author = {Yuvanesh Anand and Zach Nussbaum and Brandon Duderstadt and Benjamin Schmidt and Andriy Mulyar},
  title = {GPT4All: Training an Assistant-style Chatbot with Large Scale Data Distillation from GPT-3.5-Turbo},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/nomic-ai/gpt4all}},
}
