gpt4all/gpt4all-backend/llamamodel_impl.h

#ifndef LLAMAMODEL_H_I_KNOW_WHAT_I_AM_DOING_WHEN_INCLUDING_THIS_FILE
#error This file is NOT meant to be included outside of llamamodel.cpp. Doing so is DANGEROUS. Be sure to know what you are doing before proceeding to #define LLAMAMODEL_H_I_KNOW_WHAT_I_AM_DOING_WHEN_INCLUDING_THIS_FILE
#endif
#ifndef LLAMAMODEL_H
#define LLAMAMODEL_H

#include <cstddef>   // size_t
#include <cstdint>   // int32_t, uint8_t
#include <functional>
#include <memory>
#include <string>
#include <vector>
#include "llmodel.h"

// Opaque implementation state, owned through d_ptr below.
struct LLamaPrivate;
class LLamaModel : public LLModel {
public:
    LLamaModel();
    ~LLamaModel();

    bool supportsEmbedding() const override { return false; }
    bool supportsCompletion() const override { return true; }
    bool loadModel(const std::string &modelPath, int n_ctx, int ngl) override;
    bool isModelLoaded() const override;
    size_t requiredMem(const std::string &modelPath, int n_ctx, int ngl) override;
    size_t stateSize() const override;
    size_t saveState(uint8_t *dest) const override;
    size_t restoreState(const uint8_t *src) override;
    void setThreadCount(int32_t n_threads) override;
    int32_t threadCount() const override;
    std::vector<GPUDevice> availableGPUDevices(size_t memoryRequired) const override;
    bool initializeGPUDevice(size_t memoryRequired, const std::string& name) const override;
    bool initializeGPUDevice(int device, std::string *unavail_reason) const override;
    bool hasGPUDevice() override;
    bool usingGPUDevice() override;

private:
    std::unique_ptr<LLamaPrivate> d_ptr;

protected:
    std::vector<Token> tokenize(PromptContext &, const std::string&) const override;
    std::string tokenToString(Token) const override;
    Token sampleToken(PromptContext& ctx) const override;
    bool evalTokens(PromptContext& ctx, const std::vector<int32_t> &tokens) const override;
    int32_t contextLength() const override;
    const std::vector<Token>& endTokens() const override;

    int32_t maxContextLength(std::string const &modelPath) const override;
    int32_t layerCount(std::string const &modelPath) const override;
};

#endif // LLAMAMODEL_H
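
The header above combines two idioms: an opt-in macro that must be defined before the file is included, and an opaque `d_ptr` to a forward-declared struct. The following is a minimal single-file sketch of how the two fit together, not the actual contents of llamamodel.cpp. Every name in it (`Widget`, `WidgetPrivate`, `WIDGET_IMPL_OPT_IN`) is hypothetical and was chosen only for illustration.

```cpp
// Sketch only: Widget, WidgetPrivate and WIDGET_IMPL_OPT_IN are hypothetical names.
#include <memory>
#include <string>

// --- Part 1: what the implementation file does before the #include ---
#define WIDGET_IMPL_OPT_IN

// --- Part 2: what the guarded header contains ---
#ifndef WIDGET_IMPL_OPT_IN
#error This header is private to its implementation file.
#endif

struct WidgetPrivate;  // forward declaration; the layout stays hidden from includers

class Widget {
public:
    Widget();
    ~Widget();  // declared here, defined below where WidgetPrivate is complete

    bool load(const std::string &path);
    bool isLoaded() const;

private:
    std::unique_ptr<WidgetPrivate> d_ptr;
};

// --- Part 3: what the implementation file contains after the #include ---
struct WidgetPrivate {
    bool loaded = false;
    std::string path;
};

Widget::Widget() : d_ptr(std::make_unique<WidgetPrivate>()) {}

// The destructor is defined out of line because std::unique_ptr's default
// deleter requires a complete type at the point where it is instantiated.
Widget::~Widget() = default;

bool Widget::load(const std::string &path) {
    if (path.empty())
        return false;  // reject an empty path; nothing is stored
    d_ptr->path = path;
    d_ptr->loaded = true;
    return true;
}

bool Widget::isLoaded() const { return d_ptr->loaded; }
```

Removing the `#define` in Part 1 makes the `#error` in Part 2 fire at compile time, which is the same effect the guard at the top of this header has on any file that includes it without opting in.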