Make backend's top_p sampler inclusive

In addition to matching the algorithm proposed in the original
[paper](https://arxiv.org/abs/1904.09751), this resolves the edge case
where `max_p > top_p` for a single logit. Without the fix the mask would
be empty, and we would thus sample from the whole vocabulary with
equal likelihood.
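The paper defines the nucleus as the smallest set of highest-probability tokens whose cumulative probability is at least `top_p`, so the token that crosses the threshold belongs to the set. The sketch below contrasts that inclusive rule with an exclusive one. It is a minimal standalone illustration, not the backend sampler from this commit (which works on tensors and masks); the helper `top_p_keep` and its `inclusive` flag are hypothetical names, and the input is assumed to be already-normalized probabilities rather than raw logits.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper (not a llama.cpp API): returns the indices of the tokens
// kept by top-p filtering. `probs` is assumed to be normalized (sums to 1).
std::vector<std::size_t> top_p_keep(const std::vector<float> & probs, float top_p, bool inclusive) {
    std::vector<std::size_t> idx(probs.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    // Visit tokens from most to least probable.
    std::stable_sort(idx.begin(), idx.end(),
                     [&](std::size_t a, std::size_t b) { return probs[a] > probs[b]; });

    std::vector<std::size_t> kept;
    float cum = 0.0f;
    for (std::size_t i : idx) {
        if (inclusive) {
            // Inclusive: stop only once the tokens already kept cover top_p,
            // so the token that crosses the threshold is kept as well.
            if (cum >= top_p) break;
            kept.push_back(i);
            cum += probs[i];
        } else {
            // Exclusive: keep a token only if the running sum including it
            // stays within top_p. A single token with p > top_p is dropped.
            if (cum + probs[i] > top_p) break;
            cum += probs[i];
            kept.push_back(i);
        }
    }
    return kept;
}
```

With `probs = {0.05f, 0.9f, 0.05f}` and `top_p = 0.5f`, the exclusive rule keeps no tokens at all, which is the empty-mask case described above, while the inclusive rule keeps only the dominant token (index 1). Because the inclusive check compares the sum of the tokens *already* kept, the first token always survives for any `top_p > 0`, which is the property the `GGML_ASSERT(filtered_logits.size() > 0)` check in the test relies on.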
Oliver Simons
2025-12-01 15:24:32 +01:00
parent ae0bb6a6da
commit 217469f07f
2 changed files with 14 additions and 1 deletion


```diff
@@ -512,6 +512,7 @@ static void test_backend_top_p_sampling(const char * model_path) {
 }
 }
 GGML_ASSERT(filtered_logits.size() < (size_t) test_ctx.n_vocab);
+GGML_ASSERT(filtered_logits.size() > 0);
 // Sample using CPU sampler for verification to inspect they are reasonable
 struct llama_sampler_chain_params chain_params = llama_sampler_chain_default_params();
```