llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2026-05-08 10:04:10 +00:00

Files

Georgi Gerganov 1da7b76569 server : fix speculative decoding with context shift (#10641)

* server : fix speculative decoding with context shift

ggml-ci

* server : take into account speculative limits

ggml-ci

* server : add tests

2024-12-04 22:38:20 +02:00

test_basic.py            2024-11-29 21:48:56 +01:00
test_chat_completion.py  2024-12-02 14:45:54 +01:00
test_completion.py       2024-11-26 16:20:18 +01:00
test_ctx_shift.py        2024-11-26 16:20:18 +01:00
test_embedding.py        2024-11-26 16:20:18 +01:00
test_infill.py           2024-11-29 21:48:56 +01:00
test_lora.py             2024-11-26 16:20:18 +01:00
test_rerank.py           2024-11-29 21:48:56 +01:00
test_security.py         2024-11-26 16:20:18 +01:00
test_slot_save.py        2024-11-26 16:20:18 +01:00
test_speculative.py      2024-12-04 22:38:20 +02:00
test_tokenize.py         2024-11-26 16:20:18 +01:00