text-generation-webui/extensions/openai/README.md

# An OpenedAI API (openai like)

This extension creates an API that works kind of like openai (ie. api.openai.com).
It's incomplete so far but perhaps is functional enough for you.

## Setup & installation 

Optional (for flask_cloudflared, embeddings):

```
pip3 install -r requirements.txt
```

It listens on tcp port 5001 by default. You can use the OPENEDAI_PORT environment variable to change this.

To enable the bare bones image generation (txt2img) set: SD_WEBUI_URL to point to your Stable Diffusion API ([Automatic1111](https://github.com/AUTOMATIC1111/stable-diffusion-webui)).

Example:
```
SD_WEBUI_URL=http://127.0.0.1:7861
```

Make sure you enable it in server launch parameters. Just make sure they include:

```
--extensions openai
```

### Embeddings (alpha)

Embeddings requires ```sentence-transformers``` installed, but chat and completions will function without it loaded. The embeddings endpoint is currently using the HuggingFace model: ```sentence-transformers/all-mpnet-base-v2``` for embeddings. This produces 768 dimensional embeddings (the same as the text-davinci-002 embeddings), which is different from OpenAI's current default ```text-embedding-ada-002``` model which produces 1536 dimensional embeddings. The model is small-ish and fast-ish. This model and embedding size may change in the future.

| model name | dimensions | input max tokens | speed | size | Avg. performance | 
| --- | --- | --- | --- | --- | --- |
| text-embedding-ada-002 | 1536 | 8192| - | - | - |
| text-davinci-002 | 768 | 2046 | - | - | - |
| all-mpnet-base-v2 | 768 | 384 | 2800 | 420M | 63.3 |
| all-MiniLM-L6-v2 | 384 | 256 | 14200 | 80M | 58.8 |

In short, the all-MiniLM-L6-v2 model is 5x faster, 5x smaller ram, 2x smaller storage, and still offers good quality. Stats from (https://www.sbert.net/docs/pretrained_models.html). To change the model from the default you can set the environment variable OPENEDAI_EMBEDDING_MODEL, ex. "OPENEDAI_EMBEDDING_MODEL=all-MiniLM-L6-v2".

Warning: You cannot mix embeddings from different models even if they have the same dimensions. They are not comparable.

### Client Application Setup

Almost everything you use it with will require you to set a dummy OpenAI API key environment variable.

With the [official python openai client](https://github.com/openai/openai-python), you can set the OPENAI_API_BASE environment variable before you import the openai module, like so:

```
OPENAI_API_KEY=sk-dummy
OPENAI_API_BASE=http://127.0.0.1:5001/v1
```

If needed, replace 127.0.0.1 with the IP/port of your server.

If using .env files to save the OPENAI_API_BASE and OPENAI_API_KEY variables, you can ensure compatibility by loading the .env file before loading the openai module, like so in python:

```
from dotenv import load_dotenv
load_dotenv()
import openai
```

With the [official Node.js openai client](https://github.com/openai/openai-node) it is slightly more more complex because the environment variables are not used by default, so small source code changes may be required to use the environment variables, like so:

```
const openai = OpenAI(Configuration({
  apiKey: process.env.OPENAI_API_KEY,
  basePath: process.env.OPENAI_API_BASE,
}));
```

For apps made with the [chatgpt-api Node.js client library](https://github.com/transitive-bullshit/chatgpt-api):

```
const api = new ChatGPTAPI({
  apiKey: process.env.OPENAI_API_KEY,
  apiBaseUrl: process.env.OPENAI_API_BASE,
})
```

## Compatibility & not so compatibility

| API endpoint | tested with | notes |
| --- | --- | --- |
| /v1/models | openai.Model.list() | Lists models, Currently loaded model first, plus some compatibility options |
| /v1/models/{id} | openai.Model.get() | returns whatever you ask for, model does nothing yet anyways |
| /v1/text_completion | openai.Completion.create() | the most tested, only supports single string input so far, variable quality based on the model |
| /v1/chat/completions | openai.ChatCompletion.create() | Quality depends a lot on the model |
| /v1/edits | openai.Edit.create() | Works the best of all, perfect for instruction following models |
| /v1/images/generations | openai.Image.create() | Bare bones, no model configuration, response_format='b64_json' only. |
| /v1/embeddings | openai.Embedding.create() | Using Sentence Transformer, dimensions are different and may never be directly comparable to openai embeddings. |
| /v1/moderations | openai.Moderation.create() | does nothing. successfully. |
| /v1/completions | openai api completions.create | Legacy endpoint (v0.25) |
| /v1/engines/*/embeddings | python-openai v0.25 | Legacy endpoint |
| /v1/engines/*/generate | openai engines.generate | Legacy endpoint |
| /v1/engines | openai engines.list | Legacy Lists models |
| /v1/engines/{model_name} | openai engines.get -i {model_name} | You can use this legacy endpoint to load models via the api |
| /v1/images/edits | openai.Image.create_edit() | not yet supported |
| /v1/images/variations | openai.Image.create_variation() | not yet supported |
| /v1/audio/\* | openai.Audio.\* | not yet supported |
| /v1/files\* | openai.Files.\* | not yet supported |
| /v1/fine-tunes\* | openai.FineTune.\* | not yet supported |
| /v1/search | openai.search, engines.search | not yet supported |

The model name setting is ignored in completions, but you may need to adjust the maximum token length to fit the model (ie. set to <2048 tokens instead of 4096, 8k, etc). To mitigate some of this, the max_tokens value is halved until it is less than truncation_length for the model (typically 2k).

Streaming, temperature, top_p, max_tokens, stop, should all work as expected, but not all parameters are mapped correctly.

Some hacky mappings:

| OpenAI | text-generation-webui | note |
| --- | --- | --- |
| frequency_penalty | encoder_repetition_penalty | this seems to operate with a different scale and defaults, I tried to scale it based on range & defaults, but the results are terrible. hardcoded to 1.18 until there is a better way |
| presence_penalty | repetition_penalty | same issues as frequency_penalty, hardcoded to 1.0 |
| best_of | top_k | |
| stop | custom_stopping_strings | this is also stuffed with ['\n###', "\n{user prompt}", "{user prompt}" ] for good measure. |
| n | 1 | hardcoded, it may be worth implementing this but I'm not sure how yet |
| 1.0 | typical_p | hardcoded |
| 1 | num_beams | hardcoded |
| max_tokens | max_new_tokens | For Text Completions max_tokens is set smaller than the truncation_length minus the prompt length. This can cause no input to be generated if the prompt is too large. For ChatCompletions, the older chat messages may be dropped to fit the max_new_tokens requested |
| logprobs | - | ignored |
| logit_bias | - | ignored |
| messages.name | - | ignored |
| user | - | ignored |

defaults are mostly from openai, so are different. I use the openai defaults where I can and try to scale them to the webui defaults with the same intent.

### Models

This has been successfully tested with Koala, Alpaca, gpt4-x-alpaca, GPT4all-snoozy,  wizard-vicuna, stable-vicuna and Vicuna 1.1 - ie. Instruction Following models. If you test with other models please let me know how it goes. Less than satisfying results (so far): RWKV-4-Raven, llama, mpt-7b-instruct/chat

### Applications

Everything needs OPENAI_API_KEY=dummy set.

| Compatibility | Application/Library | url | notes / setting |
| --- | --- | --- | --- |
| ✅❌ | openai-python (v0.25+) | https://github.com/openai/openai-python | only the endpoints from above are working. OPENAI_API_BASE=http://127.0.0.1:5001/v1 |
| ✅❌ | openai-node | https://github.com/openai/openai-node | only the endpoints from above are working. environment variables don't work by default, but can be configured (see above) |
| ✅❌ | chatgpt-api | https://github.com/transitive-bullshit/chatgpt-api | only the endpoints from above are working. environment variables don't work by default, but can be configured (see above) |
| ✅ | anse | https://github.com/anse-app/anse | API Key & URL configurable in UI |
| ✅ | shell_gpt | https://github.com/TheR1D/shell_gpt | OPENAI_API_HOST=http://127.0.0.1:5001 |
| ✅ | gpt-shell | https://github.com/jla/gpt-shell | OPENAI_API_BASE=http://127.0.0.1:5001/v1 |
| ✅ | gpt-discord-bot | https://github.com/openai/gpt-discord-bot | OPENAI_API_BASE=http://127.0.0.1:5001/v1 |
| ✅ | OpenAI for Notepad++| https://github.com/Krazal/nppopenai | api_url=http://127.0.0.1:5001 in the config file |
| ✅❌ | langchain | https://github.com/hwchase17/langchain | OPENAI_API_BASE=http://127.0.0.1:5001/v1 even with a good 30B-4bit model the result is poor so far. It assumes zero shot python/json coding. Some model tailored prompt formatting improves results greatly. |
| ✅❌ | Auto-GPT | https://github.com/Significant-Gravitas/Auto-GPT | OPENAI_API_BASE=http://127.0.0.1:5001/v1 Same issues as langchain. Also assumes a 4k+ context |
| ✅❌ | babyagi | https://github.com/yoheinakajima/babyagi | OPENAI_API_BASE=http://127.0.0.1:5001/v1 |

## Future plans
* better error handling
* model changing, esp. something for swapping loras or embedding models
* consider switching to FastAPI + starlette for SSE (openai SSE seems non-standard)
* do something about rate limiting or locking requests for completions, most systems will only be able handle a single request at a time before OOM

## Bugs? Feedback? Comments? Pull requests?

To enable debugging and get copious output you can set the OPENEDAI_DEBUG=1 environment variable.

Are all appreciated, please @matatonic and I'll try to get back to you as soon as possible.
add openai compatible api (#1475) 2023-05-02 21:49:53 -04:00			`# An OpenedAI API (openai like)`

			`This extension creates an API that works kind of like openai (ie. api.openai.com).`
			`It's incomplete so far but perhaps is functional enough for you.`

			`## Setup & installation`

			`Optional (for flask_cloudflared, embeddings):`

			```
			`pip3 install -r requirements.txt`
			```

[extension/openai] add edits & image endpoints & fix prompt return in non --chat modes (#1935) 2023-05-11 10:06:39 -04:00			`It listens on tcp port 5001 by default. You can use the OPENEDAI_PORT environment variable to change this.`

			`To enable the bare bones image generation (txt2img) set: SD_WEBUI_URL to point to your Stable Diffusion API ([Automatic1111](https://github.com/AUTOMATIC1111/stable-diffusion-webui)).`

			`Example:`
			```
			`SD_WEBUI_URL=http://127.0.0.1:7861`
			```

[extensions/openai] various fixes (#2533) 2023-06-06 00:43:04 -04:00			`Make sure you enable it in server launch parameters. Just make sure they include:`

			```
			`--extensions openai`
			```

add openai compatible api (#1475) 2023-05-02 21:49:53 -04:00			`### Embeddings (alpha)`

			Embeddings requires ```sentence-transformers``` installed, but chat and completions will function without it loaded. The embeddings endpoint is currently using the HuggingFace model: ```sentence-transformers/all-mpnet-base-v2``` for embeddings. This produces 768 dimensional embeddings (the same as the text-davinci-002 embeddings), which is different from OpenAI's current default ```text-embedding-ada-002``` model which produces 1536 dimensional embeddings. The model is small-ish and fast-ish. This model and embedding size may change in the future.

			`\| model name \| dimensions \| input max tokens \| speed \| size \| Avg. performance \|`
			`\| --- \| --- \| --- \| --- \| --- \| --- \|`
			`\| text-embedding-ada-002 \| 1536 \| 8192\| - \| - \| - \|`
			`\| text-davinci-002 \| 768 \| 2046 \| - \| - \| - \|`
			`\| all-mpnet-base-v2 \| 768 \| 384 \| 2800 \| 420M \| 63.3 \|`
			`\| all-MiniLM-L6-v2 \| 384 \| 256 \| 14200 \| 80M \| 58.8 \|`

			`In short, the all-MiniLM-L6-v2 model is 5x faster, 5x smaller ram, 2x smaller storage, and still offers good quality. Stats from (https://www.sbert.net/docs/pretrained_models.html). To change the model from the default you can set the environment variable OPENEDAI_EMBEDDING_MODEL, ex. "OPENEDAI_EMBEDDING_MODEL=all-MiniLM-L6-v2".`

			`Warning: You cannot mix embeddings from different models even if they have the same dimensions. They are not comparable.`

			`### Client Application Setup`

			`Almost everything you use it with will require you to set a dummy OpenAI API key environment variable.`

			`With the [official python openai client](https://github.com/openai/openai-python), you can set the OPENAI_API_BASE environment variable before you import the openai module, like so:`

			```
[extensions/openai] various fixes (#2533) 2023-06-06 00:43:04 -04:00			`OPENAI_API_KEY=sk-dummy`
add openai compatible api (#1475) 2023-05-02 21:49:53 -04:00			`OPENAI_API_BASE=http://127.0.0.1:5001/v1`
			```

			`If needed, replace 127.0.0.1 with the IP/port of your server.`

			`If using .env files to save the OPENAI_API_BASE and OPENAI_API_KEY variables, you can ensure compatibility by loading the .env file before loading the openai module, like so in python:`

			```
			`from dotenv import load_dotenv`
			`load_dotenv()`
			`import openai`
			```

			`With the [official Node.js openai client](https://github.com/openai/openai-node) it is slightly more more complex because the environment variables are not used by default, so small source code changes may be required to use the environment variables, like so:`

			```
			`const openai = OpenAI(Configuration({`
			`apiKey: process.env.OPENAI_API_KEY,`
			`basePath: process.env.OPENAI_API_BASE,`
			`}));`
			```

			`For apps made with the [chatgpt-api Node.js client library](https://github.com/transitive-bullshit/chatgpt-api):`

			```
			`const api = new ChatGPTAPI({`
			`apiKey: process.env.OPENAI_API_KEY,`
			`apiBaseUrl: process.env.OPENAI_API_BASE,`
			`})`
			```

			`## Compatibility & not so compatibility`

			`\| API endpoint \| tested with \| notes \|`
			`\| --- \| --- \| --- \|`
extensions/openai: docs update, model loader, minor fixes (#2557) 2023-06-17 18:15:24 -04:00			`\| /v1/models \| openai.Model.list() \| Lists models, Currently loaded model first, plus some compatibility options \|`
add openai compatible api (#1475) 2023-05-02 21:49:53 -04:00			`\| /v1/models/{id} \| openai.Model.get() \| returns whatever you ask for, model does nothing yet anyways \|`
extensions/openai: docs update, model loader, minor fixes (#2557) 2023-06-17 18:15:24 -04:00			`\| /v1/text_completion \| openai.Completion.create() \| the most tested, only supports single string input so far, variable quality based on the model \|`
			`\| /v1/chat/completions \| openai.ChatCompletion.create() \| Quality depends a lot on the model \|`
			`\| /v1/edits \| openai.Edit.create() \| Works the best of all, perfect for instruction following models \|`
[extension/openai] add edits & image endpoints & fix prompt return in non --chat modes (#1935) 2023-05-11 10:06:39 -04:00			`\| /v1/images/generations \| openai.Image.create() \| Bare bones, no model configuration, response_format='b64_json' only. \|`
add openai compatible api (#1475) 2023-05-02 21:49:53 -04:00			`\| /v1/embeddings \| openai.Embedding.create() \| Using Sentence Transformer, dimensions are different and may never be directly comparable to openai embeddings. \|`
			`\| /v1/moderations \| openai.Moderation.create() \| does nothing. successfully. \|`
extensions/openai: docs update, model loader, minor fixes (#2557) 2023-06-17 18:15:24 -04:00			`\| /v1/completions \| openai api completions.create \| Legacy endpoint (v0.25) \|`
			`\| /v1/engines/*/embeddings \| python-openai v0.25 \| Legacy endpoint \|`
			`\| /v1/engines/*/generate \| openai engines.generate \| Legacy endpoint \|`
			`\| /v1/engines \| openai engines.list \| Legacy Lists models \|`
			`\| /v1/engines/{model_name} \| openai engines.get -i {model_name} \| You can use this legacy endpoint to load models via the api \|`
			`\| /v1/images/edits \| openai.Image.create_edit() \| not yet supported \|`
			`\| /v1/images/variations \| openai.Image.create_variation() \| not yet supported \|`
			`\| /v1/audio/\* \| openai.Audio.\* \| not yet supported \|`
			`\| /v1/files\* \| openai.Files.\* \| not yet supported \|`
			`\| /v1/fine-tunes\* \| openai.FineTune.\* \| not yet supported \|`
			`\| /v1/search \| openai.search, engines.search \| not yet supported \|`
add openai compatible api (#1475) 2023-05-02 21:49:53 -04:00
			`The model name setting is ignored in completions, but you may need to adjust the maximum token length to fit the model (ie. set to <2048 tokens instead of 4096, 8k, etc). To mitigate some of this, the max_tokens value is halved until it is less than truncation_length for the model (typically 2k).`

			`Streaming, temperature, top_p, max_tokens, stop, should all work as expected, but not all parameters are mapped correctly.`

			`Some hacky mappings:`

			`\| OpenAI \| text-generation-webui \| note \|`
			`\| --- \| --- \| --- \|`
			`\| frequency_penalty \| encoder_repetition_penalty \| this seems to operate with a different scale and defaults, I tried to scale it based on range & defaults, but the results are terrible. hardcoded to 1.18 until there is a better way \|`
			`\| presence_penalty \| repetition_penalty \| same issues as frequency_penalty, hardcoded to 1.0 \|`
			`\| best_of \| top_k \| \|`
extensions/openai: docs update, model loader, minor fixes (#2557) 2023-06-17 18:15:24 -04:00			`\| stop \| custom_stopping_strings \| this is also stuffed with ['\n###', "\n{user prompt}", "{user prompt}" ] for good measure. \|`
add openai compatible api (#1475) 2023-05-02 21:49:53 -04:00			`\| n \| 1 \| hardcoded, it may be worth implementing this but I'm not sure how yet \|`
			`\| 1.0 \| typical_p \| hardcoded \|`
			`\| 1 \| num_beams \| hardcoded \|`
extensions/openai: docs update, model loader, minor fixes (#2557) 2023-06-17 18:15:24 -04:00			`\| max_tokens \| max_new_tokens \| For Text Completions max_tokens is set smaller than the truncation_length minus the prompt length. This can cause no input to be generated if the prompt is too large. For ChatCompletions, the older chat messages may be dropped to fit the max_new_tokens requested \|`
add openai compatible api (#1475) 2023-05-02 21:49:53 -04:00			`\| logprobs \| - \| ignored \|`
extensions/openai: docs update, model loader, minor fixes (#2557) 2023-06-17 18:15:24 -04:00			`\| logit_bias \| - \| ignored \|`
			`\| messages.name \| - \| ignored \|`
			`\| user \| - \| ignored \|`
add openai compatible api (#1475) 2023-05-02 21:49:53 -04:00
			`defaults are mostly from openai, so are different. I use the openai defaults where I can and try to scale them to the webui defaults with the same intent.`

[extension/openai] add edits & image endpoints & fix prompt return in non --chat modes (#1935) 2023-05-11 10:06:39 -04:00			`### Models`

			`This has been successfully tested with Koala, Alpaca, gpt4-x-alpaca, GPT4all-snoozy, wizard-vicuna, stable-vicuna and Vicuna 1.1 - ie. Instruction Following models. If you test with other models please let me know how it goes. Less than satisfying results (so far): RWKV-4-Raven, llama, mpt-7b-instruct/chat`

add openai compatible api (#1475) 2023-05-02 21:49:53 -04:00			`### Applications`

			`Everything needs OPENAI_API_KEY=dummy set.`

			`\| Compatibility \| Application/Library \| url \| notes / setting \|`
			`\| --- \| --- \| --- \| --- \|`
extensions/openai: docs update, model loader, minor fixes (#2557) 2023-06-17 18:15:24 -04:00			`\| ✅❌ \| openai-python (v0.25+) \| https://github.com/openai/openai-python \| only the endpoints from above are working. OPENAI_API_BASE=http://127.0.0.1:5001/v1 \|`
add openai compatible api (#1475) 2023-05-02 21:49:53 -04:00			`\| ✅❌ \| openai-node \| https://github.com/openai/openai-node \| only the endpoints from above are working. environment variables don't work by default, but can be configured (see above) \|`
			`\| ✅❌ \| chatgpt-api \| https://github.com/transitive-bullshit/chatgpt-api \| only the endpoints from above are working. environment variables don't work by default, but can be configured (see above) \|`
extensions/openai: cross_origin + chunked_response (updated fix) (#2423) 2023-05-30 20:54:24 -04:00			`\| ✅ \| anse \| https://github.com/anse-app/anse \| API Key & URL configurable in UI \|`
add openai compatible api (#1475) 2023-05-02 21:49:53 -04:00			`\| ✅ \| shell_gpt \| https://github.com/TheR1D/shell_gpt \| OPENAI_API_HOST=http://127.0.0.1:5001 \|`
			`\| ✅ \| gpt-shell \| https://github.com/jla/gpt-shell \| OPENAI_API_BASE=http://127.0.0.1:5001/v1 \|`
			`\| ✅ \| gpt-discord-bot \| https://github.com/openai/gpt-discord-bot \| OPENAI_API_BASE=http://127.0.0.1:5001/v1 \|`
extensions/openai: docs update, model loader, minor fixes (#2557) 2023-06-17 18:15:24 -04:00			`\| ✅ \| OpenAI for Notepad++\| https://github.com/Krazal/nppopenai \| api_url=http://127.0.0.1:5001 in the config file \|`
add openai compatible api (#1475) 2023-05-02 21:49:53 -04:00			`\| ✅❌ \| langchain \| https://github.com/hwchase17/langchain \| OPENAI_API_BASE=http://127.0.0.1:5001/v1 even with a good 30B-4bit model the result is poor so far. It assumes zero shot python/json coding. Some model tailored prompt formatting improves results greatly. \|`
			`\| ✅❌ \| Auto-GPT \| https://github.com/Significant-Gravitas/Auto-GPT \| OPENAI_API_BASE=http://127.0.0.1:5001/v1 Same issues as langchain. Also assumes a 4k+ context \|`
			`\| ✅❌ \| babyagi \| https://github.com/yoheinakajima/babyagi \| OPENAI_API_BASE=http://127.0.0.1:5001/v1 \|`

			`## Future plans`
			`* better error handling`
			`* model changing, esp. something for swapping loras or embedding models`
			`* consider switching to FastAPI + starlette for SSE (openai SSE seems non-standard)`
			`* do something about rate limiting or locking requests for completions, most systems will only be able handle a single request at a time before OOM`
[extension/openai] add edits & image endpoints & fix prompt return in non --chat modes (#1935) 2023-05-11 10:06:39 -04:00
			`## Bugs? Feedback? Comments? Pull requests?`

extensions/openai: docs update, model loader, minor fixes (#2557) 2023-06-17 18:15:24 -04:00			`To enable debugging and get copious output you can set the OPENEDAI_DEBUG=1 environment variable.`

[extension/openai] add edits & image endpoints & fix prompt return in non --chat modes (#1935) 2023-05-11 10:06:39 -04:00			`Are all appreciated, please @matatonic and I'll try to get back to you as soon as possible.`