Georgi Gerganov
c8f8e2364c
cont : simplify
2026-05-11 10:54:07 +03:00
Aman Gupta
a428b010ab
spec: support MTP
2026-05-11 11:28:30 +08:00
Georgi Gerganov
db8e326913
spec : introduce common_speculative_process()
2026-05-09 17:12:24 +03:00
Georgi Gerganov
ec8bc44854
cont : minor
2026-05-09 16:38:17 +03:00
Georgi Gerganov
b3bd3bd4cc
cont : clean-up
2026-05-09 15:03:20 +03:00
Georgi Gerganov
ce0acf03ea
server, spec : clean-up
2026-05-09 10:21:57 +03:00
Georgi Gerganov
f1652197dd
server : support parallel drafting
2026-05-08 19:30:31 +03:00
Georgi Gerganov
f88c942861
spec : support parallel drafts
2026-05-08 18:53:33 +03:00
Georgi Gerganov
927d6635d3
cont : prepare params
2026-05-08 17:50:20 +03:00
Georgi Gerganov
8822c122be
cont : prepare params
2026-05-08 17:06:24 +03:00
Georgi Gerganov
6582523eaa
spec : refactor for multi-sequence speculative context
2026-05-08 15:43:36 +03:00
Georgi Gerganov
efa2f8e5a7
naming : improve consistency
2026-05-08 12:24:57 +03:00
Georgi Gerganov
1dbc054da5
server : fix slot ctx_drft ptr
2026-05-08 11:55:05 +03:00
Georgi Gerganov
e5b1401318
speculative-simple : update
2026-05-08 11:09:34 +03:00
Georgi Gerganov
3b1a8df8fd
server : clean-up + dry
2026-05-08 10:20:01 +03:00
Georgi Gerganov
233d1aee69
server : add comment
...
[no ci]
2026-05-08 08:50:23 +03:00
Georgi Gerganov
6a4b05a030
server : fix mtmd draft processing
2026-05-08 08:02:11 +03:00
Georgi Gerganov
7e118cdce0
cont : process images through the draft context
2026-05-07 21:44:09 +03:00
Georgi Gerganov
ae6703fa89
cont : pass correct n_past for drafting
2026-05-07 21:44:08 +03:00
Georgi Gerganov
0239f4c611
cont : handle non-ckpt models
2026-05-07 21:44:08 +03:00
Georgi Gerganov
c7facb0fe1
cont : async drft eval when possible
2026-05-07 21:44:08 +03:00
Georgi Gerganov
08c8012bde
cont : sync main and drft contexts
2026-05-07 21:44:08 +03:00
Georgi Gerganov
de35b1255c
server, spec : transition to unified spec context
2026-05-07 21:44:08 +03:00
Georgi Gerganov
1afee5b262
server : improve ctx names
...
[no ci]
2026-05-07 21:44:08 +03:00
Georgi Gerganov
11fd5e7272
server : draft prompt cache and checkpoints
...
[no ci]
2026-05-07 21:44:08 +03:00
Georgi Gerganov
c97dc3605e
server : sketch the ctx_dft decode loop
...
[no ci]
2026-05-07 21:44:08 +03:00
Georgi Gerganov
8a50f6f0b9
cont : dedup ctx_seq_rm_type
...
[no ci]
2026-05-07 21:44:07 +03:00
Georgi Gerganov
77269ad8a7
cont : pass seq_id
...
[no ci]
2026-05-07 21:44:07 +03:00
Georgi Gerganov
4550f0f08b
spec : update common_speculative_init()
...
[no ci]
2026-05-07 21:44:07 +03:00
Georgi Gerganov
befc7ef635
spec : drop support for incompatible vocabs
...
[no ci]
2026-05-07 21:44:07 +03:00
Georgi Gerganov
2c9a40849f
spec : refactor
...
[no ci]
2026-05-07 21:44:07 +03:00
Georgi Gerganov
d6e7b033a4
llama : add option to save memory in device buffers ( #22679 )
...
* llama : add option to save memory in device buffers
* tests : extend llama-save-load-state
2026-05-05 06:35:07 +03:00
Georgi Gerganov
0754b7b6fe
server : avoid checkpoint data host copies ( #22558 )
...
* server : avoid checkpoint data host copies
* llama : refactor llama_io_read_i
2026-05-02 18:03:25 +03:00
Georgi Gerganov
80afa33aad
spec : fix draft model checkpoints ( #22521 )
...
* spec : fix draft model checkpoints
* cont : clean-up
* cont : gate the ngram-mod reset warning behind verbose flag
2026-04-30 08:32:18 +03:00
Georgi Gerganov
683c5acb90
spec : discard last drafted token with low prob ( #22506 )
2026-04-29 17:00:00 +03:00
Georgi Gerganov
14e733e36f
spec : refactor params ( #22397 )
...
* spec : refactor params
* cont : fix
* cont : rename "sparam" to "sampling"
* cont : add spec params category
* cont : add info about removed arguments
* cont : skip param length check for spec params
* cont : adapt server tests
2026-04-28 09:07:33 +03:00
Aman Gupta
516e8d7a8a
server: use pos_next instead of n_tokens for m-rope ( #22439 )
2026-04-28 08:41:00 +03:00
Georgi Gerganov
ffdd983fb8
server : fix swa-full logic ( #22288 )
2026-04-24 10:17:37 +03:00
Yes You Can Have Your Own
793d0a7931
server: rename debug tags to match --cache-idle-slots naming ( #22292 )
2026-04-24 09:28:44 +03:00
Tarek Dakhran
550d684bd1
server: Enable transcriptions API for LFM2-Audio ( #22000 )
2026-04-23 10:47:26 +02:00
Georgi Gerganov
bcb5eeb645
speculative-simple : add checkpoint support ( #22227 )
...
* speculative-simple : add checkpoint support
* cont : fix build
2026-04-22 15:44:45 +03:00
Ethan Turner
750579ff14
common: Refactoring sampler parameters ( #20429 ) ( #22233 )
...
This change moves the reasoning_budget_message parameter from the
common params into the sampling parameters. It also removes the
reasoning_budget common parameter and standardizes on the existing
reasoning_budget_tokens parameter in the sampling configuration.
Issue: https://github.com/ggml-org/llama.cpp/issues/20429
Original PR: https://github.com/ggml-org/llama.cpp/pull/20297
2026-04-22 10:40:19 +02:00
Piotr Wilkin (ilintar)
134d6e54d4
common/chat, server: refactor, move all conversion functions to common, add tests ( #20690 )
...
* Refactor conversion functions
2026-04-22 10:28:45 +02:00
Georgi Gerganov
cf8b0dbda9
server : remove /api endpoints ( #22165 )
...
* server : remove /api endpoints
* cont : remove /api/tags
2026-04-20 20:41:19 +03:00
Georgi Gerganov
de71b5f81c
server : refactor "use checkpoint" logic ( #22114 )
2026-04-20 08:42:37 +03:00
Yes You Can Have Your Own
9d49acb2a7
server: rename --clear-idle to --cache-idle-slots ( #21741 )
2026-04-20 08:30:24 +03:00
Sascha Rogmann
455d8e4be8
server : speculative checkpointing ( #19493 )
...
* server : speculative decoding using checkpoints
* server : fix draft check with checkpoints
* server : rename spec vars
* server : log levels
* server : refactored spec logic to speculative.cpp
* server : renamed spec checkpoints option
* server : fix spec checkpoints, logging
* speculative : checkpoints with draft model, logging
* server : n_tokens_cur and create_checkpoint in draft
* server : fix server_speculative_callback (slot.id)
* spec : fix ngram-map/begin idx_last_check
* spec : init ckpt (begin() wasn't called)
* chore: update webui build output
* server : restore sampler in spec checkpoint and clear mem
* cont : avoid --spec-use-checkpoints argument
* cont : remove server_prompt_checkpoint_with_size
* spec : rename (leave_draft_state)
* cont : clean-up
* cont : do not ignore partial drafts even if they are short
* cont : spec callback owned by session
* cont : simplify
* cont : avoid empty speculative session
* cont : simplify
* cont : simplify
* cont : enable mtmd speculative decoding
* cont : keep the spec sampler alive
* cont : simplify
* cont : fix nullptr deref + draft checkpoints
* cont : remove common_speculative_accept_response
* cont : remove callback
* cont : simplify
* cont : minor
* cont : simplify
* cont : fix accepted number
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2026-04-19 10:24:06 +03:00
Cetarthoriphros
9e5647affa
server: Expose media_tag on /props endpoint. ( #22028 )
2026-04-19 00:27:17 +02:00
Georgi Gerganov
6990e2f1f7
libs : rename libcommon -> libllama-common ( #21936 )
...
* cmake : allow libcommon to be shared
* cmake : rename libcommon to libllama-common
* cont : set -fPIC for httplib
* cont : export all symbols
* cont : fix build_info exports
* libs : add libllama-common-base
* log : add common_log_get_verbosity_thold()
2026-04-17 11:11:46 +03:00
Xuan-Son Nguyen
408225bb1a
server: use random media marker ( #21962 )
...
* server: use random media marker
* nits
* remove legacy <__image__> token
* revert special char in random
2026-04-15 23:52:22 +02:00