mirror of
https://github.com/nomic-ai/gpt4all.git
synced 2024-10-01 01:06:10 -04:00
ee3469ba6c
Improves output quality by making these tokenizers more closely match the behavior of the Hugging Face `tokenizers`-based BPE tokenizers these models were trained with. Featuring:
* Fixed Unicode handling (via ICU)
* Fixed BPE token merge handling
* Complete added vocabulary handling
5 lines
81 B
Plaintext
[codespell]
skip = .git,*.pdf,*.svg,*_tokenizer_config.h
#
# ignore-words-list =
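
To illustrate what the `skip` entry is likely to exclude, here is a minimal Python sketch that parses the config text and tests a few file names against the comma-separated glob patterns. It is an approximation, not codespell's own implementation: the helper names `load_skip_patterns` and `is_skipped` and the sample file names are hypothetical, and codespell's real path handling may differ (for example, in how it treats directories and full paths).

```python
import configparser
from fnmatch import fnmatch

# The config contents, inlined so the sketch is self-contained.
CONFIG_TEXT = """\
[codespell]
skip = .git,*.pdf,*.svg,*_tokenizer_config.h
#
# ignore-words-list =
"""


def load_skip_patterns(text):
    """Return the comma-separated glob patterns from the [codespell] skip key."""
    parser = configparser.ConfigParser()
    parser.read_string(text)  # lines starting with '#' are treated as comments
    raw = parser.get("codespell", "skip", fallback="")
    return [p.strip() for p in raw.split(",") if p.strip()]


def is_skipped(name, patterns):
    """Approximate check: does the bare file or directory name match any glob?"""
    return any(fnmatch(name, pattern) for pattern in patterns)


patterns = load_skip_patterns(CONFIG_TEXT)

# Hypothetical file names, chosen only to exercise each pattern.
for name in [".git", "manual.pdf", "logo.svg",
             "example_tokenizer_config.h", "tokenizer.cpp"]:
    print(name, "skipped" if is_skipped(name, patterns) else "checked")
```

Under these assumptions, only `tokenizer.cpp` would be spell-checked. The `*_tokenizer_config.h` glob presumably exists because generated tokenizer headers hold vocabulary data rather than prose, so spell-checking them would mostly produce false positives.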