Commit Graph

116 Commits

Author SHA1 Message Date
Georgi Gerganov
c8f8e2364c cont : simplify 2026-05-11 10:54:07 +03:00
Aman Gupta
a428b010ab spec: support MTP 2026-05-11 11:28:30 +08:00
Georgi Gerganov
db8e326913 spec : introduce common_speculative_process() 2026-05-09 17:12:24 +03:00
Georgi Gerganov
ec8bc44854 cont : minor 2026-05-09 16:38:17 +03:00
Georgi Gerganov
b3bd3bd4cc cont : clean-up 2026-05-09 15:03:20 +03:00
Georgi Gerganov
ce0acf03ea server, spec : clean-up 2026-05-09 10:21:57 +03:00
Georgi Gerganov
f1652197dd server : support parallel drafting 2026-05-08 19:30:31 +03:00
Georgi Gerganov
f88c942861 spec : support parallel drafts 2026-05-08 18:53:33 +03:00
Georgi Gerganov
927d6635d3 cont : prepare params 2026-05-08 17:50:20 +03:00
Georgi Gerganov
8822c122be cont : prepare params 2026-05-08 17:06:24 +03:00
Georgi Gerganov
6582523eaa spec : refactor for multi-sequence speculative context 2026-05-08 15:43:36 +03:00
Georgi Gerganov
efa2f8e5a7 naming : improve consistency 2026-05-08 12:24:57 +03:00
Georgi Gerganov
1dbc054da5 server : fix slot ctx_drft ptr 2026-05-08 11:55:05 +03:00
Georgi Gerganov
e5b1401318 speculative-simple : update 2026-05-08 11:09:34 +03:00
Georgi Gerganov
3b1a8df8fd server : clean-up + dry 2026-05-08 10:20:01 +03:00
Georgi Gerganov
233d1aee69 server : add comment
[no ci]
2026-05-08 08:50:23 +03:00
Georgi Gerganov
6a4b05a030 server : fix mtmd draft processing 2026-05-08 08:02:11 +03:00
Georgi Gerganov
7e118cdce0 cont : process images through the draft context 2026-05-07 21:44:09 +03:00
Georgi Gerganov
ae6703fa89 cont : pass correct n_past for drafting 2026-05-07 21:44:08 +03:00
Georgi Gerganov
0239f4c611 cont : handle non-ckpt models 2026-05-07 21:44:08 +03:00
Georgi Gerganov
c7facb0fe1 cont : async drft eval when possible 2026-05-07 21:44:08 +03:00
Georgi Gerganov
08c8012bde cont : sync main and drft contexts 2026-05-07 21:44:08 +03:00
Georgi Gerganov
de35b1255c server, spec : transition to unified spec context 2026-05-07 21:44:08 +03:00
Georgi Gerganov
1afee5b262 server : improve ctx names
[no ci]
2026-05-07 21:44:08 +03:00
Georgi Gerganov
11fd5e7272 server : draft prompt cache and checkpoints
[no ci]
2026-05-07 21:44:08 +03:00
Georgi Gerganov
c97dc3605e server : sketch the ctx_dft decode loop
[no ci]
2026-05-07 21:44:08 +03:00
Georgi Gerganov
8a50f6f0b9 cont : dedup ctx_seq_rm_type
[no ci]
2026-05-07 21:44:07 +03:00
Georgi Gerganov
77269ad8a7 cont : pass seq_id
[no ci]
2026-05-07 21:44:07 +03:00
Georgi Gerganov
4550f0f08b spec : update common_speculative_init()
[no ci]
2026-05-07 21:44:07 +03:00
Georgi Gerganov
befc7ef635 spec : drop support for incompatible vocabs
[no ci]
2026-05-07 21:44:07 +03:00
Georgi Gerganov
2c9a40849f spec : refactor
[no ci]
2026-05-07 21:44:07 +03:00
Georgi Gerganov
d6e7b033a4 llama : add option to save memory in device buffers (#22679)
* llama : add option to save memory in device buffers

* tests : extend llama-save-load-state
2026-05-05 06:35:07 +03:00
Georgi Gerganov
0754b7b6fe server : avoid checkpoint data host copies (#22558)
* server : avoid checkpoint data host copies

* llama : refactor llama_io_read_i
2026-05-02 18:03:25 +03:00
Georgi Gerganov
80afa33aad spec : fix draft model checkpoints (#22521)
* spec : fix draft model checkpoints

* cont : clean-up

* cont : gate the ngram-mod reset warning behind verbose flag
2026-04-30 08:32:18 +03:00
Georgi Gerganov
683c5acb90 spec : discard last drafted token with low prob (#22506) 2026-04-29 17:00:00 +03:00
Georgi Gerganov
14e733e36f spec : refactor params (#22397)
* spec : refactor params

* cont : fix

* cont : rename "sparam" to "sampling"

* cont : add spec params category

* cont : add info about removed arguments

* cont : skip param length check for spec params

* cont : adapt server tests
2026-04-28 09:07:33 +03:00
Aman Gupta
516e8d7a8a server: use pos_next instead of n_tokens for m-rope (#22439) 2026-04-28 08:41:00 +03:00
Georgi Gerganov
ffdd983fb8 server : fix swa-full logic (#22288) 2026-04-24 10:17:37 +03:00
Yes You Can Have Your Own
793d0a7931 server: rename debug tags to match --cache-idle-slots naming (#22292) 2026-04-24 09:28:44 +03:00
Tarek Dakhran
550d684bd1 server: Enable transcriptions API for LFM2-Audio (#22000) 2026-04-23 10:47:26 +02:00
Georgi Gerganov
bcb5eeb645 speculative-simple : add checkpoint support (#22227)
* speculative-simple : add checkpoint support

* cont : fix build
2026-04-22 15:44:45 +03:00
Ethan Turner
750579ff14 common: Refactoring sampler parameters (#20429) (#22233)
This change refactors the reasoning_budget_message parameter from the
common params into the sampling parameters specifically. It also removes
the reasoning_budget common parameter and standardizes on the existing
reasoning_budget_tokens parameter in the sampling configuration.

Issue: https://github.com/ggml-org/llama.cpp/issues/20429
Original PR: https://github.com/ggml-org/llama.cpp/pull/20297
2026-04-22 10:40:19 +02:00
Piotr Wilkin (ilintar)
134d6e54d4 common/chat, server: refactor, move all conversion functions to common, add tests (#20690)
* Refactor conversion functions
2026-04-22 10:28:45 +02:00
Georgi Gerganov
cf8b0dbda9 server : remove /api endpoints (#22165)
* server : remove /api endpoints

* cont : remove /api/tags
2026-04-20 20:41:19 +03:00
Georgi Gerganov
de71b5f81c server : refactor "use checkpoint" logic (#22114) 2026-04-20 08:42:37 +03:00
Yes You Can Have Your Own
9d49acb2a7 server: rename --clear-idle to --cache-idle-slots (#21741) 2026-04-20 08:30:24 +03:00
Sascha Rogmann
455d8e4be8 server : speculative checkpointing (#19493)
* server : speculative decoding using checkpoints

* server : fix draft check with checkpoints

* server : rename spec vars

* server : log levels

* server : refactored spec logic to speculative.cpp

* server : renamed spec checkpoints option

* server : fix spec checkpoints, logging

* speculative : checkpoints with draft model, logging

* server : n_tokens_cur and create_checkpoint in draft

* server : fix server_speculative_callback (slot.id)

* spec : fix ngram-map/begin idx_last_check

* spec : init ckpt (begin() wasn't called)

* chore: update webui build output

* server : restore sampler in spec checkpoint and clear mem

* cont : avoid --spec-use-checkpoints argument

* cont : remove server_prompt_checkpoint_with_size

* spec : rename (leave_draft_state)

* cont : clean-up

* cont : do not ignore partial drafts even if they are short

* cont : spec callback owned by session

* cont : simplify

* cont : avoid empty speculative session

* cont : simplify

* cont : simplify

* cont : enable mtmd speculative decoding

* cont : keep the spec sampler alive

* cont : simplify

* cont : fix nullptr deref + draft checkpoints

* cont : remove common_speculative_accept_response

* cont : remove callback

* cont : simplify

* cont : minor

* cont : simplify

* cont : fix accepted number

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2026-04-19 10:24:06 +03:00
Cetarthoriphros
9e5647affa server: Expose media_tag on /props endpoint. (#22028) 2026-04-19 00:27:17 +02:00
Georgi Gerganov
6990e2f1f7 libs : rename libcommon -> libllama-common (#21936)
* cmake : allow libcommon to be shared

* cmake : rename libcommon to libllama-common

* cont : set -fPIC for httplib

* cont : export all symbols

* cont : fix build_info exports

* libs : add libllama-common-base

* log : add common_log_get_verbosity_thold()
2026-04-17 11:11:46 +03:00
Xuan-Son Nguyen
408225bb1a server: use random media marker (#21962)
* server: use random media marker

* nits

* remove legacy <__image__> token

* revert special char in random
2026-04-15 23:52:22 +02:00