Mirror of https://github.com/nomic-ai/gpt4all.git, synced 2024-10-01 01:06:10 -04:00
832720dd27
Closer to the behavior of huggingface `tokenizers`: do not attempt to handle additional tokens as if they were part of the original vocabulary, as this cannot prevent them from being split into smaller chunks. Instead, handle added tokens *before* the regular tokenizing pass. Note this is still necessary even with a "proper" tokenizer implementation.
Files changed:

llama.cpp@03ceb39c1e
CMakeLists.txt
gptj.cpp
gptj.h
llamamodel.cpp
llamamodel.h
llmodel_c.cpp
llmodel_c.h
llmodel.h
mpt.cpp
mpt.h
utils.cpp
utils.h