Make backend's top_p sampler inclusive

In addition to matching the algorithm proposed in the original [paper](https://arxiv.org/abs/1904.09751), this resolves the edge case where `max_p > top_p` for a single logit, where the mask would otherwise be empty (and we would thus sample from the whole vocabulary with equal likelihood).
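
To illustrate the edge case, here is a minimal CPU-side sketch, not the backend implementation itself. The helper `top_p_keep_count` and its `inclusive` flag are hypothetical names introduced only for this example. It assumes a normalized probability vector and contrasts an exclusive cutoff (drop the token that crosses `top_p`) with an inclusive one (keep it).

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper (not part of llama.cpp): returns how many of the
// highest-probability tokens a top-p filter keeps.
// `probs` is assumed to be a normalized distribution; it is copied and
// sorted in descending order before the cumulative scan.
static size_t top_p_keep_count(std::vector<float> probs, float top_p, bool inclusive) {
    std::sort(probs.begin(), probs.end(), std::greater<float>());

    float  cum  = 0.0f;
    size_t kept = 0;
    for (float p : probs) {
        if (inclusive) {
            // Keep the token first, then stop once the kept mass reaches top_p.
            cum += p;
            ++kept;
            if (cum >= top_p) {
                break;
            }
        } else {
            // Exclusive: drop the token that would push the mass past top_p.
            if (cum + p > top_p) {
                break;
            }
            cum += p;
            ++kept;
        }
    }
    return kept;
}
```

With `probs = {0.95, 0.03, 0.02}` and `top_p = 0.9`, the exclusive variant keeps no tokens while the inclusive variant keeps exactly one, which is the situation the new `filtered_logits.size() > 0` assertion guards against.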
2026-05-15 21:44:05 +00:00 · 2025-12-01 15:24:32 +01:00
parent ae0bb6a6da
commit 217469f07f
2 changed files with 14 additions and 1 deletions
--- a/tests/test-backend-sampler.cpp
+++ b/tests/test-backend-sampler.cpp
@@ -512,6 +512,7 @@ static void test_backend_top_p_sampling(const char * model_path) {
        }
    }
    GGML_ASSERT(filtered_logits.size() < (size_t) test_ctx.n_vocab);
+    GGML_ASSERT(filtered_logits.size() > 0);

    // Sample using CPU sampler for verification to inspect they are reasonable
    struct llama_sampler_chain_params chain_params = llama_sampler_chain_default_params();