Generation Parameters
For a technical description of the parameters, the transformers documentation is a good reference.
The best presets, according to the Preset Arena experiment, are:
Instruction following:
- Divine Intellect
- Big O
- simple-1
- Space Alien
- StarChat
- Titanic
- tfs-with-top-a
- Asterism
- Contrastive Search
Chat:
- Midnight Enigma
- Yara
- Shortwave
Temperature
Primary factor to control randomness of outputs. 0 = deterministic (only the most likely token is used). Higher value = more randomness.
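Roughly speaking, temperature rescales the logits before sampling. A minimal numpy sketch of the idea (illustrative, not the exact transformers implementation):

```python
import numpy as np

def sample_with_temperature(logits, temperature=0.7, rng=None):
    """Scale logits by 1/temperature, then sample from the softmax distribution.

    temperature = 0 falls back to greedy decoding (most likely token only);
    higher values flatten the distribution and increase randomness.
    """
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64)
    if temperature == 0:
        return int(np.argmax(logits))
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())   # numerically stable softmax
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))
```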
top_p
If not set to 1, sample only from the most likely tokens whose probabilities add up to this number. Higher value = higher range of possible random results.
top_k
Similar to top_p, but instead selects only the top_k most likely tokens. Higher value = higher range of possible random results.
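top_p and top_k are both truncation filters applied to the token probabilities before sampling. A minimal numpy sketch of the two, assuming the probabilities have already been computed from the logits (illustrative, not the exact transformers code):

```python
import numpy as np

def filter_top_k(probs, top_k):
    """Keep only the top_k most likely tokens; zero out the rest and renormalize."""
    if top_k <= 0 or top_k >= len(probs):
        return probs
    cutoff = np.sort(probs)[-top_k]            # probability of the k-th most likely token
    filtered = np.where(probs >= cutoff, probs, 0.0)
    return filtered / filtered.sum()

def filter_top_p(probs, top_p):
    """Keep the smallest set of most likely tokens whose cumulative probability reaches top_p."""
    if top_p >= 1.0:
        return probs
    order = np.argsort(probs)[::-1]            # most likely first
    cumulative = np.cumsum(probs[order])
    keep = cumulative - probs[order] < top_p   # tokens needed to reach the top_p mass
    keep[0] = True                             # always keep at least the most likely token
    filtered = np.zeros_like(probs)
    filtered[order[keep]] = probs[order[keep]]
    return filtered / filtered.sum()
```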
typical_p
If not set to 1, select only tokens that are at least this much more likely to appear than random tokens, given the prior text.
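The idea behind typical_p comes from locally typical sampling: tokens are ranked by how close their surprisal (-log probability) is to the entropy of the distribution, and the most "typical" tokens are kept until their cumulative probability reaches typical_p. A hedged numpy sketch of that idea (not the exact transformers implementation):

```python
import numpy as np

def filter_typical_p(probs, typical_p=0.95):
    """Keep tokens whose surprisal is closest to the distribution's entropy,
    up to a cumulative probability of typical_p."""
    if typical_p >= 1.0:
        return probs
    probs = np.asarray(probs, dtype=np.float64)
    surprisal = -np.log(probs + 1e-12)
    entropy = np.sum(probs * surprisal)                 # expected surprisal
    order = np.argsort(np.abs(surprisal - entropy))     # most "typical" tokens first
    cumulative = np.cumsum(probs[order])
    keep_count = np.searchsorted(cumulative, typical_p) + 1
    kept = order[:keep_count]
    filtered = np.zeros_like(probs)
    filtered[kept] = probs[kept]
    return filtered / filtered.sum()
```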
epsilon_cutoff
In units of 1e-4; a reasonable value is 3. This sets a probability floor below which tokens are excluded from being sampled. Should be used with top_p, top_k, and eta_cutoff set to 0.
eta_cutoff
The main parameter of the Eta Sampling technique. In units of 1e-4; a reasonable value is 3. Should be used with top_p, top_k, and epsilon_cutoff set to 0.
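Both cutoffs act as probability floors applied before sampling; eta_cutoff additionally shrinks the floor when the distribution has high entropy (many plausible continuations). A hedged numpy sketch based on the published eta-sampling rule, not the exact transformers code; note that the UI values are in units of 1e-4, so a setting of 3 corresponds to 3e-4 here:

```python
import numpy as np

def _apply_floor(probs, floor):
    probs = np.asarray(probs, dtype=np.float64)
    keep = probs >= floor
    keep[np.argmax(probs)] = True            # always keep the most likely token
    filtered = np.where(keep, probs, 0.0)
    return filtered / filtered.sum()

def filter_epsilon(probs, epsilon_cutoff=3e-4):
    """epsilon_cutoff: fixed probability floor (UI value 3 -> 3e-4)."""
    return _apply_floor(probs, epsilon_cutoff)

def filter_eta(probs, eta_cutoff=3e-4):
    """eta_cutoff: the floor adapts to entropy, shrinking when many tokens are plausible."""
    probs = np.asarray(probs, dtype=np.float64)
    entropy = -np.sum(probs * np.log(probs + 1e-12))
    floor = min(eta_cutoff, np.sqrt(eta_cutoff) * np.exp(-entropy))
    return _apply_floor(probs, floor)
```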
repetition_penalty
Exponential penalty factor for repeating prior tokens. 1 means no penalty, higher value = less repetition, lower value = more repetition.
repetition_penalty_range
The number of most recent tokens to consider for the repetition penalty. 0 means the entire context is used.
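A hedged sketch of how an exponential repetition penalty of this kind is commonly applied to the logits (positive logits divided by the penalty, negative ones multiplied), with repetition_penalty_range limiting the window of context tokens considered (illustrative, not the exact transformers implementation):

```python
import numpy as np

def apply_repetition_penalty(logits, prior_token_ids, penalty=1.15, penalty_range=0):
    """Penalize tokens that already appear in the context.

    penalty_range = 0 means the whole context is considered; otherwise only
    the most recent penalty_range tokens are penalized against.
    """
    logits = np.asarray(logits, dtype=np.float64).copy()
    window = prior_token_ids if penalty_range == 0 else prior_token_ids[-penalty_range:]
    for token_id in set(window):
        if logits[token_id] > 0:
            logits[token_id] /= penalty     # make already-seen tokens less likely
        else:
            logits[token_id] *= penalty
    return logits
```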
encoder_repetition_penalty
Also known as the "Hallucinations filter". Used to penalize tokens that are not in the prior text. Higher value = more likely to stay in context, lower value = more likely to diverge.
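One hedged way to picture the effect: tokens that appear in the prior text get boosted, which in relative terms penalizes everything else (an illustration only; the actual transformers implementation differs in details):

```python
import numpy as np

def apply_encoder_repetition_penalty(logits, input_token_ids, penalty=1.0):
    """Boost tokens present in the input/prior text; with penalty > 1 this makes
    the model more likely to stay on-context (the "hallucinations filter" effect)."""
    logits = np.asarray(logits, dtype=np.float64).copy()
    for token_id in set(input_token_ids):
        if logits[token_id] > 0:
            logits[token_id] *= penalty     # penalty > 1 favors in-context tokens
        else:
            logits[token_id] /= penalty
    return logits
```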
no_repeat_ngram_size
If not set to 0, specifies the length of token sequences (n-grams) that are blocked from ever repeating. Higher values block larger phrases; lower values block individual words or letters from repeating. Only 0 or high values are a good idea in most cases.
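A minimal sketch of the mechanism: before sampling each token, ban any candidate that would complete an n-gram already present in the output (illustrative, not the exact transformers implementation):

```python
def ban_repeated_ngrams(logits, generated_ids, ngram_size):
    """Block any token that would complete an n-gram already seen in the output."""
    if ngram_size == 0 or len(generated_ids) + 1 < ngram_size:
        return logits
    n_prefix = ngram_size - 1
    prefix = tuple(generated_ids[len(generated_ids) - n_prefix:])   # last n-1 tokens
    banned = set()
    for i in range(len(generated_ids) - ngram_size + 1):
        ngram = tuple(generated_ids[i:i + ngram_size])
        if ngram[:-1] == prefix:
            banned.add(ngram[-1])           # this token would repeat an existing n-gram
    for token_id in banned:
        logits[token_id] = float("-inf")
    return logits
```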
min_length
Minimum generation length in tokens.
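This is typically enforced by suppressing the end-of-sequence token until enough tokens have been generated; a minimal sketch:

```python
def enforce_min_length(logits, num_generated, min_length, eos_token_id):
    """Forbid the end-of-sequence token until min_length tokens have been generated."""
    if num_generated < min_length:
        logits[eos_token_id] = float("-inf")
    return logits
```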
penalty_alpha
Contrastive Search is enabled by setting this to a value greater than zero and unchecking "do_sample". It should be used with a low value of top_k, for instance top_k = 4.
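For example, with the transformers generate() API, contrastive search looks roughly like this (the model name is just a placeholder):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# "gpt2" is only a placeholder model for illustration.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The generation parameters control", return_tensors="pt")
output_ids = model.generate(
    **inputs,
    penalty_alpha=0.6,   # > 0 enables contrastive search
    top_k=4,             # low top_k, as recommended
    do_sample=False,     # contrastive search is a deterministic decoding method
    max_new_tokens=64,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```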