From b5edaa26561c713ba04c6d1ea7eb74855307b46e Mon Sep 17 00:00:00 2001
From: Adam Treat
Date: Tue, 30 May 2023 12:58:18 -0400
Subject: [PATCH] Revert "add tokenizer readme w/ instructions for convert
 script"

This reverts commit 5063c2c1b2d0d97a124d18a7e03eb6d0ccc0a5ab.
---
 gpt4all-backend/tokenizer/README.md | 14 --------------
 1 file changed, 14 deletions(-)
 delete mode 100644 gpt4all-backend/tokenizer/README.md

diff --git a/gpt4all-backend/tokenizer/README.md b/gpt4all-backend/tokenizer/README.md
deleted file mode 100644
index 46563a2a..00000000
--- a/gpt4all-backend/tokenizer/README.md
+++ /dev/null
@@ -1,14 +0,0 @@
-# BPE tokenizer
-
-This is a C++ implementation of the encoding/decoding functions of a pretrained GPT-2 style BPE tokenizer. It is meant to be compatible with the GPT-J and MPT-7B tokenizers that were trained with HuggingFace [`tokenizers`](https://github.com/huggingface/tokenizers), and only implements the necessary functionality for those models (it is assumed that strings should always be [normalized](https://en.wikipedia.org/wiki/Unicode_equivalence) to Unicode NFC form and split with the GPT-2 "pretokenizing" [regular expression](https://github.com/karpathy/minGPT/blob/37baab71b9abea1b76ab957409a1cc2fbfba8a26/mingpt/bpe.py#L92)).
-
-## Converting a tokenizer file
-
-`scripts/gen_tokenizer_include.py` can be used to convert a HuggingFace `tokenizers` `tokenizer.json` file into a C++ header file:
-
-```bash
-# get tokenizer.json
-cd gpt4all-backend
-wget -O /tmp/gptj-tokenizer.json https://huggingface.co/nomic-ai/gpt4all-j/raw/main/tokenizer.json
-python ./scripts/gen_tokenizer_include.py /tmp/gptj-tokenizer.json gptj > ./tokenizer/gptj_tokenizer_config.h
-```
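For context on the kind of conversion the deleted README describes, the step could be sketched roughly as below. This is a hypothetical illustration only: the function name `tokenizer_json_to_header`, the emitted table layout, and the assumption that `tokenizer.json` stores merges as `"left right"` strings are all assumptions of the sketch, not the actual behavior of `scripts/gen_tokenizer_include.py`.

```python
import json


def tokenizer_json_to_header(tok: dict, name: str) -> str:
    """Emit a C++ header embedding a vocab and BPE merge list.

    `tok` is the parsed contents of a HuggingFace `tokenizers`
    tokenizer.json file (only the "model" section is used here).
    Hypothetical sketch -- the real script may differ substantially.
    """
    vocab = tok["model"]["vocab"]    # token string -> integer id
    merges = tok["model"]["merges"]  # '"left right"' pairs, priority order

    def cstr(s: str) -> str:
        # Escape a Python string for use as a C++ string literal.
        return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'

    lines = [
        "// Generated tokenizer tables -- do not edit by hand.",
        "#pragma once",
        "#include <string>",
        "#include <utility>",
        "#include <vector>",
        "",
        f"static const std::vector<std::pair<std::string, int>> {name}_vocab = {{",
    ]
    # Emit vocab entries in id order so the table doubles as an id -> token map.
    for token, tid in sorted(vocab.items(), key=lambda kv: kv[1]):
        lines.append(f"    {{{cstr(token)}, {tid}}},")
    lines.append("};")
    lines.append("")
    lines.append(
        f"static const std::vector<std::pair<std::string, std::string>> {name}_merges = {{"
    )
    for merge in merges:
        left, right = merge.split(" ", 1)
        lines.append(f"    {{{cstr(left)}, {cstr(right)}}},")
    lines.append("};")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Tiny in-memory stand-in for a real tokenizer.json.
    tok = {"model": {"vocab": {"he": 0, "llo": 1}, "merges": ["he llo"]}}
    print(tokenizer_json_to_header(tok, "gptj"))
```

Embedding the tables as a generated header keeps the C++ tokenizer free of a JSON dependency at runtime; the trade-off is that the header must be regenerated whenever the upstream `tokenizer.json` changes.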