Mirror of https://github.com/nomic-ai/gpt4all.git, synced 2024-09-19 23:35:41 +00:00
bbcee1ced5
Improves output quality by making these tokenizers more closely match the behavior of the huggingface `tokenizers` based BPE tokenizers these models were trained with. Featuring:

* Fixed unicode handling (via ICU)
* Fixed BPE token merge handling
* Complete added vocabulary handling

| File |
|---|
| bpe.cpp |
| bpe.h |
| gptj_tokenizer_config.h |
| mpt_tokenizer_config.h |
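
For readers unfamiliar with the "BPE token merge handling" named in the commit message, the following is a minimal sketch of greedy BPE merging. It is not the code in `bpe.cpp`: the names `MergeRanks` and `bpe_merge` are hypothetical, the merge table is a toy example, and the loop merges one leftmost lowest-rank pair per pass, which is a simplification of what production tokenizers do.

```cpp
#include <climits>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical merge table: maps a pair of adjacent symbols to its rank.
// A lower rank means the merge was learned earlier and is applied first.
using MergeRanks = std::map<std::pair<std::string, std::string>, int>;

// Greedy BPE sketch: repeatedly merge the adjacent pair with the lowest
// rank until no adjacent pair is present in the merge table.
std::vector<std::string> bpe_merge(std::vector<std::string> symbols,
                                   const MergeRanks &ranks) {
    while (symbols.size() > 1) {
        int best_rank = INT_MAX;
        std::size_t best_pos = 0;
        bool found = false;

        // Scan all adjacent pairs and remember the leftmost lowest-rank one.
        for (std::size_t i = 0; i + 1 < symbols.size(); ++i) {
            auto it = ranks.find(std::make_pair(symbols[i], symbols[i + 1]));
            if (it != ranks.end() && it->second < best_rank) {
                best_rank = it->second;
                best_pos = i;
                found = true;
            }
        }
        if (!found) {
            break;  // no mergeable pair remains
        }

        // Replace the pair with its concatenation.
        symbols[best_pos] += symbols[best_pos + 1];
        symbols.erase(symbols.begin() +
                      static_cast<std::ptrdiff_t>(best_pos + 1));
    }
    return symbols;
}
```

With the toy ranks `("l","o") = 0`, `("lo","w") = 1`, and `("e","r") = 2`, the input symbols `l o w e r` merge to `low er`: the first two merges build `low`, the third builds `er`, and the loop stops because `("low","er")` is not in the table.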