mirror of
https://github.com/ggml-org/llama.cpp.git
synced 2026-05-15 21:44:05 +00:00
Make backend's top_p sampler inclusive
In addition to matching the algorithm proposed in the original [paper](https://arxiv.org/abs/1904.09751), this resolves the edge case where `max_p > top_p` for a single logit: the mask would otherwise be empty, and we would thus sample from the whole vocabulary with equal likelihood.
```diff
@@ -512,6 +512,7 @@ static void test_backend_top_p_sampling(const char * model_path) {
         }
     }
     GGML_ASSERT(filtered_logits.size() < (size_t) test_ctx.n_vocab);
+    GGML_ASSERT(filtered_logits.size() > 0);

     // Sample using CPU sampler for verification to inspect they are reasonable
     struct llama_sampler_chain_params chain_params = llama_sampler_chain_default_params();
```