## OpenAI compatible API
The main API for this project is meant to be a drop-in replacement for the OpenAI API, including Chat and Completions endpoints.
* It is 100% offline and private.
* It doesn't create any logs.
* It doesn't connect to OpenAI.
* It doesn't use the openai-python library.
### Starting the API
Add `--api` to your command-line flags.
* To create a public Cloudflare URL, add the `--public-api` flag.
* To listen on your local network, add the `--listen` flag.
* To change the port, which is 5000 by default, use `--api-port 1234` (change 1234 to your desired port number).
* To use SSL, add `--ssl-keyfile key.pem --ssl-certfile cert.pem`. Note that it doesn't work with `--public-api`.
* To use an API key for authentication, add `--api-key yourkey` (see the sketch after this list).
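
If `--api-key` is set, clients must send the key with every request. Here is a minimal sketch from Python, assuming the server accepts the key as a standard OpenAI-style bearer token:

```python
import requests

API_KEY = "yourkey"  # must match the value passed to --api-key

response = requests.post(
    "http://127.0.0.1:5000/v1/completions",
    headers={
        "Content-Type": "application/json",
        # Assumption: the key is sent as an OpenAI-style Authorization header
        "Authorization": f"Bearer {API_KEY}",
    },
    json={"prompt": "Hello,", "max_tokens": 20},
)
print(response.json())
```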
### Examples
For documentation on all the endpoints, parameters, and their types, consult `http://127.0.0.1:5000/docs` or the [typing.py](https://github.com/oobabooga/text-generation-webui/blob/main/extensions/openai/typing.py) file.
The official examples in the [OpenAI documentation](https://platform.openai.com/docs/api-reference) should also work, and the same parameters apply (although the API here has more optional parameters).
#### Completions
```shell
curl http://127.0.0.1:5000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "This is a cake recipe:\n\n1.",
    "max_tokens": 200,
    "temperature": 1,
    "top_p": 0.9,
    "seed": 10
  }'
```
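
The same request from Python, using only `requests`. The response follows the OpenAI completions shape, so the generated text should be in `choices[0]['text']`:

```python
import requests

url = "http://127.0.0.1:5000/v1/completions"
headers = {"Content-Type": "application/json"}
data = {
    "prompt": "This is a cake recipe:\n\n1.",
    "max_tokens": 200,
    "temperature": 1,
    "top_p": 0.9,
    "seed": 10
}

response = requests.post(url, headers=headers, json=data)
# The generated continuation lives in choices[0]['text']
print(response.json()['choices'][0]['text'])
```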
#### Chat completions
Works best with instruction-following models. If the `instruction_template` variable is not provided, it will be guessed automatically from the model name using the regex patterns in `models/config.yaml`.
```shell
curl http://127.0.0.1:5000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": "Hello!"
      }
    ],
    "mode": "instruct",
    "instruction_template": "Alpaca"
  }'
```
#### Chat completions with characters
```shell
curl http://127.0.0.1:5000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": "Hello! Who are you?"
      }
    ],
    "mode": "chat",
    "character": "Example"
  }'
```
#### SSE streaming
```shell
curl http://127.0.0.1:5000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": "Hello!"
      }
    ],
    "mode": "instruct",
    "instruction_template": "Alpaca",
    "stream": true
  }'
```
#### Logits
```shell
curl -k http://127.0.0.1:5000/v1/internal/logits \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Who is best, Asuka or Rei? Answer:",
    "use_samplers": false
  }'
```
#### Logits after sampling parameters
```shell
curl -k http://127.0.0.1:5000/v1/internal/logits \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Who is best, Asuka or Rei? Answer:",
    "use_samplers": true,
    "top_k": 3
  }'
```
#### List models
```shell
curl -k http://127.0.0.1:5000/v1/internal/model/list \
  -H "Content-Type: application/json"
```
#### Load model
```shell
curl -k http://127.0.0.1:5000/v1/internal/model/load \
  -H "Content-Type: application/json" \
  -d '{
    "model_name": "model_name",
    "args": {
      "load_in_4bit": true,
      "n_gpu_layers": 12
    },
    "settings": {
      "instruction_template": "Alpaca"
    }
  }'
```
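
A sketch of driving both endpoints from Python to switch models programmatically. It assumes the list response contains a `model_names` array; consult `http://127.0.0.1:5000/docs` for the exact schema:

```python
import requests

base = "http://127.0.0.1:5000/v1/internal/model"
headers = {"Content-Type": "application/json"}

# List the available models (assumption: the response has a 'model_names' field)
model_names = requests.get(f"{base}/list", headers=headers).json()["model_names"]
print(model_names)

# Load the first model; the loader args below are illustrative, not required
requests.post(
    f"{base}/load",
    headers=headers,
    json={
        "model_name": model_names[0],
        "args": {"n_gpu_layers": 12},
        "settings": {"instruction_template": "Alpaca"},
    },
)
```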
#### Python chat example
```python
import requests

url = "http://127.0.0.1:5000/v1/chat/completions"

headers = {
    "Content-Type": "application/json"
}

history = []

while True:
    user_message = input("> ")
    history.append({"role": "user", "content": user_message})
    data = {
        "mode": "chat",
        "character": "Example",
        "messages": history
    }

    response = requests.post(url, headers=headers, json=data, verify=False)
    assistant_message = response.json()['choices'][0]['message']['content']
    history.append({"role": "assistant", "content": assistant_message})
    print(assistant_message)
```
#### Python chat example with streaming
Start the script with `python -u` to see the output in real time.
```python
import requests
import sseclient  # pip install sseclient-py
import json

url = "http://127.0.0.1:5000/v1/chat/completions"

headers = {
    "Content-Type": "application/json"
}

history = []

while True:
    user_message = input("> ")
    history.append({"role": "user", "content": user_message})
    data = {
        "mode": "instruct",
        "stream": True,
        "messages": history
    }

    stream_response = requests.post(url, headers=headers, json=data, verify=False, stream=True)
    client = sseclient.SSEClient(stream_response)

    assistant_message = ''
    for event in client.events():
        payload = json.loads(event.data)
        chunk = payload['choices'][0]['message']['content']
        assistant_message += chunk
        print(chunk, end='')

    print()
    history.append({"role": "assistant", "content": assistant_message})
```
#### Python completions example with streaming
Start the script with `python -u` to see the output in real time.
```python
import json

import requests
import sseclient  # pip install sseclient-py

url = "http://127.0.0.1:5000/v1/completions"

headers = {
    "Content-Type": "application/json"
}

data = {
    "prompt": "This is a cake recipe:\n\n1.",
    "max_tokens": 200,
    "temperature": 1,
    "top_p": 0.9,
    "seed": 10,
    "stream": True,
}

stream_response = requests.post(url, headers=headers, json=data, verify=False, stream=True)
client = sseclient.SSEClient(stream_response)

print(data['prompt'], end='')
for event in client.events():
    payload = json.loads(event.data)
    print(payload['choices'][0]['text'], end='')

print()
```
### Environment variables
The following environment variables can be used (they take precedence over everything else):
| Variable Name | Description | Example Value |
|------------------------|------------------------------------|----------------------------|
| `OPENEDAI_PORT` | Port number | 5000 |
| `OPENEDAI_CERT_PATH` | SSL certificate file path | cert.pem |
| `OPENEDAI_KEY_PATH` | SSL key file path | key.pem |
| `OPENEDAI_DEBUG` | Enable debugging (set to 1) | 1 |
| `SD_WEBUI_URL` | WebUI URL (used by endpoint) | http://127.0.0.1:7861 |
| `OPENEDAI_EMBEDDING_MODEL` | Embedding model (if applicable) | sentence-transformers/all-mpnet-base-v2 |
| `OPENEDAI_EMBEDDING_DEVICE` | Embedding device (if applicable) | cuda |
#### Persistent settings with `settings.yaml`
You can also set the following variables in your `settings.yaml` file:
```yaml
openai-embedding_device: cuda
openai-embedding_model: "sentence-transformers/all-mpnet-base-v2"
openai-sd_webui_url: http://127.0.0.1:7861
openai-debug: 1
```
### Third-party application setup
You can usually force an application that uses the OpenAI API to connect to the local API by using the following environment variables:
```shell
OPENAI_API_HOST=http://127.0.0.1:5000
```
or
```shell
OPENAI_API_KEY=sk-111111111111111111111111111111111111111111111111
OPENAI_API_BASE=http://127.0.0.1:5000/v1
```
With the [official Python openai client](https://github.com/openai/openai-python), the address can be set like this:
```python
import openai

openai.api_key = "..."
openai.api_base = "http://127.0.0.1:5000/v1"
openai.api_version = "2023-05-15"
```
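
Note that `openai.api_base` only exists in openai-python v0.x. From v1.0 onwards, the base URL is passed to the client constructor instead:

```python
from openai import OpenAI

client = OpenAI(
    api_key="sk-111111111111111111111111111111111111111111111111",
    base_url="http://127.0.0.1:5000/v1"
)

response = client.chat.completions.create(
    model="any",  # placeholder; this API uses the locally loaded model regardless
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)
```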
If using .env files to save the `OPENAI_API_BASE` and `OPENAI_API_KEY` variables, make sure the .env file is loaded before the openai module is imported:
```python
from dotenv import load_dotenv

load_dotenv()  # make sure the environment variables are set before import

import openai
```
With the [official Node.js openai client](https://github.com/openai/openai-node) it is slightly more complex because the environment variables are not used by default, so small source code changes may be required to use them, like so:
```js
const { Configuration, OpenAIApi } = require("openai");

const configuration = new Configuration({
  apiKey: process.env.OPENAI_API_KEY,
  basePath: process.env.OPENAI_API_BASE,
});
const openai = new OpenAIApi(configuration);
```
For apps made with the [chatgpt-api Node.js client library](https://github.com/transitive-bullshit/chatgpt-api):
```js
const api = new ChatGPTAPI({
  apiKey: process.env.OPENAI_API_KEY,
  apiBaseUrl: process.env.OPENAI_API_BASE
});
```
### Embeddings (alpha)
The embeddings endpoint requires `sentence-transformers` to be installed; chat and completions work without it. It currently uses the Hugging Face model `sentence-transformers/all-mpnet-base-v2`, which produces 768-dimensional embeddings (the same dimensionality as the text-davinci-002 embeddings). This differs from OpenAI's current default `text-embedding-ada-002` model, which produces 1536-dimensional embeddings. The model is reasonably small and fast, but both the model and the embedding size may change in the future.
| model name             | dimensions | input max tokens | speed (sentences/sec) | size   | avg. performance |
| ---------------------- | ---------- | ---------------- | --------------------- | ------ | ---------------- |
| text-embedding-ada-002 | 1536       | 8192             | -                     | -      | -                |
| text-davinci-002       | 768        | 2046             | -                     | -      | -                |
| all-mpnet-base-v2      | 768        | 384              | 2800                  | 420 MB | 63.3             |
| all-MiniLM-L6-v2       | 384        | 256              | 14200                 | 80 MB  | 58.8             |
In short, the all-MiniLM-L6-v2 model is roughly 5x faster, takes about 5x less RAM, and its 384-dimensional embeddings need half the storage, while still offering good quality. Stats are from https://www.sbert.net/docs/pretrained_models.html. To change the model from the default, set the environment variable `OPENEDAI_EMBEDDING_MODEL`, e.g. `OPENEDAI_EMBEDDING_MODEL=all-MiniLM-L6-v2`.
Warning: You cannot mix embeddings from different models even if they have the same dimensions. They are not comparable.
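
A minimal request against the embeddings endpoint from Python. The response follows the OpenAI embeddings shape, so each vector should be under `data[i]['embedding']`:

```python
import requests

url = "http://127.0.0.1:5000/v1/embeddings"
headers = {"Content-Type": "application/json"}
data = {"input": ["Hello world!", "Hi there!"]}

response = requests.post(url, headers=headers, json=data)
# One embedding vector per input string
vectors = [item['embedding'] for item in response.json()['data']]
print(len(vectors), "embeddings of dimension", len(vectors[0]))
```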
### Compatibility & not-so-compatibility
Note: the table below may be obsolete.
| API endpoint              | tested with                        | notes                                                                        |
| ------------------------- | ---------------------------------- | ---------------------------------------------------------------------------- |
| /v1/chat/completions      | openai.ChatCompletion.create()     | Use it with instruction-following models                                     |
| /v1/embeddings            | openai.Embedding.create()          | Uses SentenceTransformer embeddings                                          |
| /v1/images/generations    | openai.Image.create()              | Bare bones, no model configuration, response_format='b64_json' only          |
| /v1/moderations           | openai.Moderation.create()         | Basic initial support via embeddings                                         |
| /v1/models                | openai.Model.list()                | Lists models; currently loaded model first, plus some compatibility options  |
| /v1/models/{id}           | openai.Model.get()                 | Returns whatever you ask for                                                 |
| /v1/edits                 | openai.Edit.create()               | Removed; use /v1/chat/completions instead                                    |
| /v1/text_completion       | openai.Completion.create()         | Legacy endpoint; variable quality based on the model                         |
| /v1/completions           | openai api completions.create      | Legacy endpoint (v0.25)                                                      |
| /v1/engines/\*/embeddings | python-openai v0.25                | Legacy endpoint                                                              |
| /v1/engines/\*/generate   | openai engines.generate            | Legacy endpoint                                                              |
| /v1/engines               | openai engines.list                | Legacy; lists models                                                         |
| /v1/engines/{model_name}  | openai engines.get -i {model_name} | You can use this legacy endpoint to load models via the API or command line  |
| /v1/images/edits          | openai.Image.create_edit()         | Not yet supported                                                            |
| /v1/images/variations     | openai.Image.create_variation()    | Not yet supported                                                            |
| /v1/audio/\*              | openai.Audio.\*                    | Supported                                                                    |
| /v1/files\*               | openai.Files.\*                    | Not yet supported                                                            |
| /v1/fine-tunes\*          | openai.FineTune.\*                 | Not yet supported                                                            |
| /v1/search                | openai.search, engines.search      | Not yet supported                                                            |
#### Applications
Almost everything needs the `OPENAI_API_KEY` and `OPENAI_API_BASE` environment variables set, but there are some exceptions.
Note: the table below may be obsolete.
| Compatibility | Application/Library | Website | Notes |
| ------------- | ---------------------- | ------------------------------------------------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| ✅❌ | openai-python (v0.25+) | https://github.com/openai/openai-python | only the endpoints from above are working. OPENAI_API_BASE=http://127.0.0.1:5001/v1 |
| ✅❌ | openai-node | https://github.com/openai/openai-node | only the endpoints from above are working. environment variables don't work by default, but can be configured (see above) |
| ✅❌ | chatgpt-api | https://github.com/transitive-bullshit/chatgpt-api | only the endpoints from above are working. environment variables don't work by default, but can be configured (see above) |
| ✅ | anse | https://github.com/anse-app/anse | API Key & URL configurable in UI, Images also work |
| ✅ | shell_gpt | https://github.com/TheR1D/shell_gpt | OPENAI_API_HOST=http://127.0.0.1:5001 |
| ✅ | gpt-shell | https://github.com/jla/gpt-shell | OPENAI_API_BASE=http://127.0.0.1:5001/v1 |
| ✅ | gpt-discord-bot | https://github.com/openai/gpt-discord-bot | OPENAI_API_BASE=http://127.0.0.1:5001/v1 |
| ✅ | OpenAI for Notepad++ | https://github.com/Krazal/nppopenai | api_url=http://127.0.0.1:5001 in the config file, or environment variables |
| ✅ | vscode-openai | https://marketplace.visualstudio.com/items?itemName=AndrewButson.vscode-openai | OPENAI_API_BASE=http://127.0.0.1:5001/v1 |
| ✅❌ | langchain | https://github.com/hwchase17/langchain | OPENAI_API_BASE=http://127.0.0.1:5001/v1 even with a good 30B-4bit model the result is poor so far. It assumes zero shot python/json coding. Some model tailored prompt formatting improves results greatly. |
| ✅❌ | Auto-GPT | https://github.com/Significant-Gravitas/Auto-GPT | OPENAI_API_BASE=http://127.0.0.1:5001/v1 Same issues as langchain. Also assumes a 4k+ context |
| ✅❌ | babyagi | https://github.com/yoheinakajima/babyagi | OPENAI_API_BASE=http://127.0.0.1:5001/v1 |
| ❌ | guidance | https://github.com/microsoft/guidance | logit_bias and logprobs not yet supported |