GPT4All REST API
This directory contains the source code to run and build docker images that run a FastAPI app for serving inference from GPT4All models. The API matches the OpenAI API spec.
Tutorial
Starting the app
First, build the FastAPI Docker image. You only need to do this on the initial build or when you add new dependencies to the requirements.txt file:
DOCKER_BUILDKIT=1 docker build -t gpt4all_api --progress plain -f gpt4all_api/Dockerfile.buildkit .
Then, start the backend with:
docker compose up --build
This runs both the API and the locally hosted GPU inference server. To run the API without the GPU inference server, use:
docker compose up --build gpt4all_api
To run the API with the GPU inference server, you need to set environment variables (such as MODEL_ID). Edit the .env file and run:
docker compose --env-file .env up --build
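Because a missing variable only shows up later in the container logs, it can help to check the .env file before running the command above. The sketch below assumes a simple KEY=VALUE .env format and treats MODEL_ID as the only required variable, since it is the only one this README names; the helper names are hypothetical, so extend REQUIRED_VARS with whatever your compose file actually reads.

```python
from pathlib import Path

# Hypothetical list of required variables; MODEL_ID is the only one named in this README.
REQUIRED_VARS = ("MODEL_ID",)


def parse_env_file(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments.

    Minimal sketch: does not handle every dotenv feature
    (e.g. multi-line values or variable expansion)."""
    env: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip optional surrounding quotes from the value.
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


def missing_vars(env: dict[str, str], required=REQUIRED_VARS) -> list[str]:
    """Return the required variables that are absent or empty."""
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    path = Path(".env")
    env = parse_env_file(path.read_text()) if path.exists() else {}
    missing = missing_vars(env)
    if missing:
        raise SystemExit(f"Missing in .env: {', '.join(missing)}")
    print(".env looks complete")
```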
Spinning up your app
Run docker compose up to spin up the backend. Monitor the logs for errors in case you forgot to set an environment variable above.
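Instead of watching the logs by hand, a small polling loop can report when the backend starts answering HTTP requests. This is a standard-library sketch; the URL is an assumption (FastAPI's default /docs page on port 80), so point it at whichever port your compose file exposes. The function name is hypothetical.

```python
import time
import urllib.request


def wait_for_server(url: str, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll `url` until it returns HTTP 200 or `timeout` seconds elapse.

    Returns True once the server responds with 200, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError, HTTPError, refused connections, and socket timeouts
            # are all OSError subclasses: the server is not ready yet, so retry.
            pass
        time.sleep(interval)
    return False


if __name__ == "__main__":
    # Assumed URL: FastAPI's default docs page on port 80.
    url = "http://localhost:80/docs"
    print("up" if wait_for_server(url) else f"no response from {url}")
```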
Development
Run docker compose up --build and edit files in the api directory. The API will hot-reload on changes.
You can run the unit tests with
make test
Viewing API documentation
Once the FastAPI app is started, you can access its documentation and try out its endpoints by going to:
localhost:80/docs
This documentation should match the OpenAI OpenAPI spec located at https://github.com/openai/openai-openapi/blob/master/openapi.yaml
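FastAPI also serves the machine-readable schema behind the /docs page (at /openapi.json by default), so one rough way to check how closely the local API tracks the OpenAI spec is to compare the operations each schema declares. The sketch below assumes both schemas are already loaded as dicts; the function name and the `local_prefix` parameter are hypothetical. The OpenAI spec lists paths such as /completions and puts the /v1 part in its server URL, so local paths may need that prefix stripped before comparing.

```python
# HTTP methods that can appear as keys of an OpenAPI path item;
# other keys (e.g. "parameters", "summary") are skipped.
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def _operations(schema: dict, prefix: str = "") -> set[str]:
    """Collect "METHOD /path" strings from an OpenAPI schema dict."""
    ops = set()
    for path, item in schema.get("paths", {}).items():
        if prefix and path.startswith(prefix):
            path = path[len(prefix):] or "/"
        for method in item:
            if method.lower() in HTTP_METHODS:
                ops.add(f"{method.upper()} {path}")
    return ops


def endpoint_coverage(local_schema: dict, reference_schema: dict,
                      local_prefix: str = "") -> dict[str, set[str]]:
    """Compare operations in a local schema against a reference schema.

    `local_prefix` (e.g. "/v1") is stripped from local paths so they line up
    with reference paths that omit it."""
    local = _operations(local_schema, local_prefix)
    ref = _operations(reference_schema)
    return {"missing": ref - local, "extra": local - ref, "shared": ref & local}
```

For example, the local schema could be loaded with json.load(urllib.request.urlopen("http://localhost:80/openapi.json")) and the OpenAI one with yaml.safe_load from PyYAML; sorted(result["missing"]) then lists the reference operations the local API does not declare.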
Running inference
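# Requires the pre-1.0 openai package (pip install "openai<1.0");
# openai.api_base and openai.Completion were removed in openai 1.0.
import openai

# Point the client at the local GPT4All API instead of api.openai.com.
openai.api_base = "http://localhost:4891/v1"
openai.api_key = "not needed for a local LLM"


def test_completion():
    model = "gpt4all-j-v1.3-groovy"
    prompt = "Who is Michael Jordan?"
    response = openai.Completion.create(
        model=model,
        prompt=prompt,
        max_tokens=50,
        temperature=0.28,
        top_p=0.95,
        n=1,
        echo=True,  # include the prompt in the returned text
        stream=False,
    )
    # With echo=True the returned text starts with the prompt, so it must be longer.
    assert len(response['choices'][0]['text']) > len(prompt)
    print(response)


if __name__ == "__main__":
    test_completion()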
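Because the API follows the OpenAI spec, the same request can also be sent without the openai SDK, whose openai.Completion interface no longer exists in openai 1.0 and later. This standard-library sketch assumes the completions route is /completions under the same base URL as above and that the response carries an OpenAI-style choices list; the helper names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:4891/v1"  # same base URL as the openai example


def build_completion_request(prompt: str, model: str = "gpt4all-j-v1.3-groovy",
                             **params) -> urllib.request.Request:
    """Build a JSON POST request for the (assumed) /completions route."""
    payload = {"model": model, "prompt": prompt, **params}
    return urllib.request.Request(
        f"{BASE_URL}/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str, **params) -> str:
    """Send the request and return the text of the first choice."""
    req = build_completion_request(prompt, **params)
    with urllib.request.urlopen(req, timeout=120) as resp:
        body = json.load(resp)
    return body["choices"][0]["text"]


if __name__ == "__main__":
    print(complete("Who is Michael Jordan?", max_tokens=50, temperature=0.28, top_p=0.95))
```

Splitting request construction from sending keeps the payload easy to inspect or test without a running server.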