// gpt4all/gpt4all-backend/llmodel.h

#ifndef LLMODEL_H
#define LLMODEL_H

#include <string>
#include <functional>
#include <vector>
#include <cstdint>
#include <cstddef>  // size_t

class LLModel {
public:
    explicit LLModel() {}
    virtual ~LLModel() {}

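    virtual bool loadModel(const std::string &modelPath) = 0;
    virtual bool isModelLoaded() const = 0;
    // Serialization of the model's internal state (e.g. to persist a chat).
    // The default implementations return 0, meaning the backend does not
    // support saving or restoring its state.
    virtual size_t stateSize() const { return 0; }                  // bytes needed by saveState()
    virtual size_t saveState(uint8_t *dest) const { return 0; }     // bytes written to dest
    virtual size_t restoreState(const uint8_t *src) { return 0; }   // bytes read from src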
    struct PromptContext {
        std::vector<float> logits;      // logits of current context
        std::vector<int32_t> tokens;    // current tokens in the context window
        int32_t n_past = 0;             // number of tokens in past conversation
        int32_t n_ctx = 0;              // number of tokens possible in context window
        int32_t n_predict = 200;        // maximum number of tokens to generate
        int32_t top_k = 40;             // top-k sampling: keep the k most likely tokens
        float   top_p = 0.9f;           // nucleus (top-p) sampling threshold
        float   temp = 0.9f;            // sampling temperature
        int32_t n_batch = 9;            // number of prompt tokens evaluated per batch
        float   repeat_penalty = 1.10f; // penalty applied to recently seen tokens
        int32_t repeat_last_n = 64;     // last n tokens to penalize
        float   contextErase = 0.75f;   // fraction of the context to erase when the
                                        // context window is exceeded
    };
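    // Processes `prompt` and generates a response, updating ctx.
    //   promptCallback(token):          invoked for each prompt token
    //   responseCallback(token, text):  invoked for each generated token
    //   recalculateCallback(bool):      notified while the context is recalculated
    // Returning false from a callback asks the model to stop.
    virtual void prompt(const std::string &prompt,
        std::function<bool(int32_t)> promptCallback,
        std::function<bool(int32_t, const std::string&)> responseCallback,
        std::function<bool(bool)> recalculateCallback,
        PromptContext &ctx) = 0;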
    virtual void setThreadCount(int32_t n_threads) {}
    virtual int32_t threadCount() { return 1; }

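protected:
    // Re-evaluates promptCtx.tokens, e.g. after the context has been trimmed
    // according to contextErase; `recalculate` is the caller's recalculateCallback.
    virtual void recalculateContext(PromptContext &promptCtx,
        std::function<bool(bool)> recalculate) = 0;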
};

#endif // LLMODEL_H
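The header only declares the interface; each backend (GPT-J and llama.cpp in gpt4all) supplies its own subclass. The sketch below implements it with a hypothetical `EchoModel` whose "tokens" are the bytes of the prompt and whose "response" simply echoes those bytes back, so the callback protocol and the state snapshot functions can be exercised without a real model. The base class here is a condensed copy of `LLModel` (only the members the sketch uses), and treating a `false` return from a callback as "stop" is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <string>
#include <vector>

// Condensed copy of the LLModel interface so this sketch compiles on its own.
class LLModel {
public:
    virtual ~LLModel() {}
    virtual bool loadModel(const std::string &modelPath) = 0;
    virtual bool isModelLoaded() const = 0;
    virtual size_t stateSize() const { return 0; }
    virtual size_t saveState(uint8_t *) const { return 0; }
    virtual size_t restoreState(const uint8_t *) { return 0; }
    struct PromptContext {
        std::vector<int32_t> tokens;
        int32_t n_past = 0;
        int32_t n_predict = 200;
    };
    virtual void prompt(const std::string &prompt,
        std::function<bool(int32_t)> promptCallback,
        std::function<bool(int32_t, const std::string&)> responseCallback,
        std::function<bool(bool)> recalculateCallback,
        PromptContext &ctx) = 0;
protected:
    virtual void recalculateContext(PromptContext &promptCtx,
        std::function<bool(bool)> recalculate) = 0;
};

// Hypothetical toy backend: one token per prompt byte, response echoes the prompt.
class EchoModel : public LLModel {
public:
    bool loadModel(const std::string &modelPath) override {
        m_loaded = !modelPath.empty();  // pretend any non-empty path loads
        return m_loaded;
    }
    bool isModelLoaded() const override { return m_loaded; }

    // The only state of this toy model is a count of prompt tokens processed.
    size_t stateSize() const override { return sizeof m_promptTokens; }
    size_t saveState(uint8_t *dest) const override {
        std::memcpy(dest, &m_promptTokens, sizeof m_promptTokens);
        return sizeof m_promptTokens;
    }
    size_t restoreState(const uint8_t *src) override {
        std::memcpy(&m_promptTokens, src, sizeof m_promptTokens);
        return sizeof m_promptTokens;
    }

    void prompt(const std::string &prompt,
        std::function<bool(int32_t)> promptCallback,
        std::function<bool(int32_t, const std::string&)> responseCallback,
        std::function<bool(bool)> /*recalculateCallback*/,
        PromptContext &ctx) override {
        // Feed the prompt into the context, one token per byte.
        for (unsigned char c : prompt) {
            const int32_t tok = c;
            ctx.tokens.push_back(tok);
            ++ctx.n_past;
            ++m_promptTokens;
            if (!promptCallback(tok)) return;
        }
        // "Generate" by echoing the prompt, at most n_predict tokens.
        int32_t produced = 0;
        for (unsigned char c : prompt) {
            if (produced >= ctx.n_predict) break;
            ++produced;
            ctx.tokens.push_back(c);
            ++ctx.n_past;
            if (!responseCallback(c, std::string(1, static_cast<char>(c)))) break;
        }
    }

protected:
    void recalculateContext(PromptContext &, std::function<bool(bool)>) override {}

private:
    bool m_loaded = false;
    int32_t m_promptTokens = 0;
};
```

A caller builds the reply by appending each piece passed to `responseCallback`, and can snapshot a backend generically by allocating `stateSize()` bytes, calling `saveState()`, and later handing the same buffer to `restoreState()`. Keeping the sampling settings in `PromptContext` rather than in the model lets one loaded model serve several conversations.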
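`PromptContext::contextErase` is the share of the context window erased once a conversation no longer fits in `n_ctx`, after which `recalculateContext()` re-evaluates what remains. A minimal sketch of that trimming step, assuming the oldest tokens are dropped first and `n_past` is reset to the number of surviving tokens; `trimContext` and its `incoming` parameter (the number of tokens about to be added) are hypothetical, not part of the header.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper. Returns true if tokens were erased, in which case the
// caller is expected to re-evaluate the remaining tokens (recalculateContext()).
bool trimContext(std::vector<int32_t> &tokens, int32_t &n_past,
                 int32_t n_ctx, int32_t incoming, float contextErase) {
    // Nothing to do while the incoming tokens still fit in the window.
    if (n_past + incoming <= n_ctx)
        return false;
    // Number of tokens to drop: a fraction of the whole window.
    const auto erase = static_cast<std::size_t>(n_ctx * contextErase);
    const std::size_t n = erase < tokens.size() ? erase : tokens.size();
    // Drop the oldest tokens; the most recent ones survive.
    tokens.erase(tokens.begin(), tokens.begin() + static_cast<std::ptrdiff_t>(n));
    n_past = static_cast<int32_t>(tokens.size());
    return true;
}
```

With `n_ctx = 2048` and the default `contextErase = 0.75f`, a full window loses 0.75 × 2048 = 1536 tokens and keeps the latest 512, so the expensive re-evaluation happens only once every ~1536 new tokens instead of on every overflow.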