Georgi Gerganov
1dbc054da5
server : fix slot ctx_drft ptr
2026-05-08 11:55:05 +03:00
Georgi Gerganov
e5b1401318
speculative-simple : update
2026-05-08 11:09:34 +03:00
Georgi Gerganov
3b1a8df8fd
server : clean-up + dry
2026-05-08 10:20:01 +03:00
Georgi Gerganov
233d1aee69
server : add comment
...
[no ci]
2026-05-08 08:50:23 +03:00
Georgi Gerganov
12c7cfbe83
server : fix URL for draft model
2026-05-08 08:03:49 +03:00
Georgi Gerganov
6a4b05a030
server : fix mtmd draft processing
2026-05-08 08:02:11 +03:00
Georgi Gerganov
7e118cdce0
cont : process images through the draft context
2026-05-07 21:44:09 +03:00
Georgi Gerganov
ae6703fa89
cont : pass correct n_past for drafting
2026-05-07 21:44:08 +03:00
Georgi Gerganov
0239f4c611
cont : handle non-ckpt models
2026-05-07 21:44:08 +03:00
Georgi Gerganov
c7facb0fe1
cont : async drft eval when possible
2026-05-07 21:44:08 +03:00
Georgi Gerganov
08c8012bde
cont : sync main and drft contexts
2026-05-07 21:44:08 +03:00
Georgi Gerganov
de35b1255c
server, spec : transition to unified spec context
2026-05-07 21:44:08 +03:00
Georgi Gerganov
1afee5b262
server : improve ctx names
...
[no ci]
2026-05-07 21:44:08 +03:00
Georgi Gerganov
11fd5e7272
server : draft prompt cache and checkpoints
...
[no ci]
2026-05-07 21:44:08 +03:00
Georgi Gerganov
c97dc3605e
server : sketch the ctx_dft decode loop
...
[no ci]
2026-05-07 21:44:08 +03:00
Georgi Gerganov
8a50f6f0b9
cont : dedup ctx_seq_rm_type
...
[no ci]
2026-05-07 21:44:07 +03:00
Georgi Gerganov
77269ad8a7
cont : pass seq_id
...
[no ci]
2026-05-07 21:44:07 +03:00
Georgi Gerganov
4550f0f08b
spec : update common_speculative_init()
...
[no ci]
2026-05-07 21:44:07 +03:00
Georgi Gerganov
befc7ef635
spec : drop support for incompatible vocabs
...
[no ci]
2026-05-07 21:44:07 +03:00
Georgi Gerganov
2c9a40849f
spec : refactor
...
[no ci]
2026-05-07 21:44:07 +03:00
Pascal
cc97e45a14
mtmd: fix whisper audio tail truncation by exposing padded buffer to FFT ( #22770 )
2026-05-07 14:01:01 +02:00
Pascal
f4b5a2ee91
webui: fix ?model= URL param race in router mode ( #22771 )
...
* webui: fix ?model= URL param race in router mode
* chore: update webui build output
2026-05-07 13:09:32 +02:00
viggy
e358d75adb
webui: fix flicker issue on dismiss animation on overlay primitives ( #22773 )
...
* add fill-mode-forwards
* generated diffs
2026-05-07 08:11:31 +02:00
tc-mb
2496f9c149
mtmd : support MiniCPM-V 4.6 ( #22529 )
...
* Support MiniCPM-V 4.6 in new branch
Signed-off-by: tc-mb <tianchi_cai@icloud.com >
* fix code bug
Signed-off-by: tc-mb <tianchi_cai@icloud.com >
* fix pre-commit
Signed-off-by: tc-mb <tianchi_cai@icloud.com >
* fix convert
Signed-off-by: tc-mb <tianchi_cai@icloud.com >
* rename clip_graph_minicpmv4_6
Signed-off-by: tc-mb <tianchi_cai@icloud.com >
* use new TYPE_MINICPMV4_6
Signed-off-by: tc-mb <tianchi_cai@icloud.com >
* use build_attn to allow flash attention support
Signed-off-by: tc-mb <tianchi_cai@icloud.com >
* no use legacy code, restored here.
Signed-off-by: tc-mb <tianchi_cai@icloud.com >
* use the existing tensors name
Signed-off-by: tc-mb <tianchi_cai@icloud.com >
* unused ctx->model.hparams.minicpmv_version
Signed-off-by: tc-mb <tianchi_cai@icloud.com >
* use n_merge for slice alignment
Signed-off-by: tc-mb <tianchi_cai@icloud.com >
* borrow wa_layer_indexes for vit_merger insertion point
Signed-off-by: tc-mb <tianchi_cai@icloud.com >
* fix code style
Signed-off-by: tc-mb <tianchi_cai@icloud.com >
* Update convert_hf_to_gguf.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* use filter_tensors and add model.vision_tower
Signed-off-by: tc-mb <tianchi_cai@icloud.com >
* fix chkhsh
Signed-off-by: tc-mb <tianchi_cai@icloud.com >
* fix type check
Signed-off-by: tc-mb <tianchi_cai@icloud.com >
---------
Signed-off-by: tc-mb <tianchi_cai@icloud.com >
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
2026-05-06 21:54:09 +02:00
Yakine Tahtah
a00e47e422
mtmd: add granite-speech support (ibm-granite/granite-4.0-1b-speech) ( #22101 )
...
* mtmd: add granite-speech support (ibm-granite/granite-4.0-1b-speech)
Conformer encoder with Shaw relative position encoding,
QFormer projector, log-mel spectrogram with frame stacking.
Encoder uses GLU gating, folded batch norm, and SSM depthwise
conv. QFormer compresses encoder output via windowed
cross-attention (window=15, queries=3) into the LLM embedding
space.
Audio preprocessing: reflect-padded STFT, 80-bin mel filterbank,
dynamic range compression, 2x frame stacking (80->160 mel).
GGUF converter handles batch norm folding at export time,
fused K/V split, and Conv1d weight reshaping.
Tested against HF transformers reference: token-for-token match
on 30s/60s audio clips with greedy decoding.
* mtmd: rename gs_ prefixed tensors to generic/architecture names
* mtmd: use tensor_mapping.py for all granite_speech tensors
* convert: fold GraniteSpeechTextModel into GraniteModel
* mtmd: replace n_layer hack with explicit has_standard_layers flag
* mtmd: replace hardcoded magic numbers with GGUF hparams for granite speech
* mtmd: align KEY_A_ define spacing
* convert: register GraniteModel for GraniteSpeechForConditionalGeneration
* convert: fix ty type-check for GraniteSpeechMmprojModel registration
* mtmd: align TN_ define spacing
* mtmd: use generic layer loop for granite speech tensor loading
* mtmd: merge qformer_proj_layer into clip_layer
* mtmd: granite_speech remove redundant ggml_build_forward_expand on inputs
* mtmd: granite_speech add comment explaining why build_attn is not used
* mtmd: granite_speech hard-code eps in cpp, remove from GGUF metadata
* gguf: add spacing between granite_speech tensor mapping blocks
* mtmd: make generic audio layer_norm_eps read optional
* mtmd: granite_speech keep encoder eps in GGUF, only hard-code projector eps
* mtmd: align defines and struct fields in clip-impl.h and clip-model.h
* mtmd: fix alignment and ordering issues across granite speech files
* convert: granite_speech use filter_tensors instead of modify_tensors for skipping
2026-05-06 14:40:59 +02:00
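The "2x frame stacking (80->160 mel)" step in the granite-speech entry above can be illustrated with a small sketch. It is not the mtmd implementation: the name `stack_frames`, the `mel_frame` alias, and the choice to drop a trailing incomplete group are all assumptions made here for illustration.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// One log-mel frame: n_mel values (80 in the commit description).
using mel_frame = std::vector<float>;

// Hypothetical helper: concatenates every `factor` consecutive frames into
// one wider frame, so factor = 2 turns 80-bin frames into 160-value frames.
// Assumption: frames left over at the end (fewer than `factor`) are dropped.
std::vector<mel_frame> stack_frames(const std::vector<mel_frame> & frames, std::size_t factor) {
    std::vector<mel_frame> stacked;
    if (factor == 0) {
        return stacked;
    }

    const std::size_t n_out = frames.size() / factor;
    stacked.reserve(n_out);

    for (std::size_t i = 0; i < n_out; ++i) {
        mel_frame wide;
        for (std::size_t j = 0; j < factor; ++j) {
            const mel_frame & src = frames[i * factor + j];
            wide.insert(wide.end(), src.begin(), src.end());
        }
        stacked.push_back(std::move(wide));
    }

    return stacked;
}
```

With `factor = 2`, the output has half as many frames, each twice as wide, which is the 80->160 change the commit message describes.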
Aleksander Grygier
e3e3f8e46a
webui: Remove Google Favicons & Improve MCP Information logic & UI ( #22719 )
...
* refactor: Remove Google favicon utility
* fix: MCP Server favicon
* refactor: Cleanup
* refactor: MCP Server Information
* fix: Fix MCP Settings UI
* refactor: Cleanup
2026-05-06 11:12:27 +02:00
viggy
07eaf919ed
add tabindex and aria-hidden ( #22699 )
2026-05-06 09:21:58 +02:00
Adrien Gallouët
bf76ac77be
common : only load backends when required ( #22290 )
...
* common : only load backends when required
Signed-off-by: Adrien Gallouët <angt@huggingface.co >
* llama : call ggml_backend_load_all() directly from llama_backend_init()
Signed-off-by: Adrien Gallouët <angt@huggingface.co >
* Add ggml_backend_load_all() where llama_backend_init() is not used
Signed-off-by: Adrien Gallouët <angt@huggingface.co >
---------
Signed-off-by: Adrien Gallouët <angt@huggingface.co >
2026-05-05 09:23:50 +02:00
Georgi Gerganov
2bacb1eb77
server : validate --tools CLI argument against known tool names ( #22538 )
...
Previously, unknown tool names passed via --tools were silently ignored.
Now the server validates each tool name at startup and exits with an
error if an unrecognized tool is specified, listing the available tools.
Assisted-by: llama.cpp:local pi
2026-05-05 06:35:27 +03:00
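The startup check described in the `--tools` entry above could look roughly like the following. This is a minimal sketch, not the server's code: `find_unknown_tools` and `validate_tools_or_throw` are hypothetical names, the sketch throws where the server is described as exiting with an error, and it does not handle any special values the argument may accept.

```cpp
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: returns the requested names that are not known tools,
// in the order they were requested.
std::vector<std::string> find_unknown_tools(const std::vector<std::string> & requested,
                                            const std::set<std::string>    & known) {
    std::vector<std::string> unknown;
    for (const auto & name : requested) {
        if (known.count(name) == 0) {
            unknown.push_back(name);
        }
    }
    return unknown;
}

// Hypothetical helper: fails loudly instead of silently ignoring unknown
// names, and lists the available tools in the error message.
void validate_tools_or_throw(const std::vector<std::string> & requested,
                             const std::set<std::string>    & known) {
    const auto unknown = find_unknown_tools(requested, known);
    if (unknown.empty()) {
        return;
    }

    std::string msg = "unknown tool(s):";
    for (const auto & name : unknown) {
        msg += " " + name;
    }
    msg += "; available tools:";
    for (const auto & name : known) {
        msg += " " + name;
    }
    throw std::invalid_argument(msg);
}
```

A caller would run the check once at startup, before any request is served, so a mistyped tool name is reported immediately rather than surfacing later as a missing tool.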
Georgi Gerganov
d6e7b033a4
llama : add option to save memory in device buffers ( #22679 )
...
* llama : add option to save memory in device buffers
* tests : extend llama-save-load-state
2026-05-05 06:35:07 +03:00
Xuan-Son Nguyen
935a340292
server: implement /models?reload=1 ( #21848 )
2026-05-04 16:23:26 +02:00
JusteLeo
36a694c965
webui : fix circular dependency between chat.service.ts and models.svelte.ts ( #22625 )
2026-05-04 13:38:10 +02:00
Piotr Wilkin (ilintar)
a4701c98f7
common/autoparser: fixes for newline handling / forced tool calls ( #22654 )
...
* chat/autoparser: the fixes
* Move optspace() to chat-peg-parser, comment out server tests invalidated due to content now allowed with forced tool calls.
* Trim whitespace on apply instead
2026-05-04 13:18:11 +02:00
Evan Huus
c84e6d6db5
server: Add a simple get_datetime server tool ( #22649 )
2026-05-04 12:19:41 +02:00
Nick Towle
fa8feaed34
webui: restore missing settings ( #22666 )
2026-05-04 09:04:07 +02:00
Georgi Gerganov
846262d787
docs : update speculative decoding parameters after refactor ( #22397 ) ( #22539 )
...
* docs : update speculative decoding parameters after refactor (#22397 )
Update docs/speculative.md to reflect the new parameter naming scheme
introduced in PR #22397 :
- Replace --draft-max/--draft-min with --spec-draft-n-max/--spec-draft-n-min
- Replace --spec-ngram-size-n/m with per-implementation variants
- Add documentation for all new --spec-ngram-*- parameters
- Update all example commands
Assisted-by: llama.cpp:local pi
* pi : add rule to use gh CLI for GitHub resources
Assisted-by: llama.cpp:local pi
* docs : run llama-gen-docs
* arg : fix typo
2026-05-04 08:52:07 +03:00
Georgi Gerganov
0754b7b6fe
server : avoid checkpoint data host copies ( #22558 )
...
* server : avoid checkpoint data host copies
* llama : refactor llama_io_read_i
2026-05-02 18:03:25 +03:00
Aleksander Grygier
ab6120cde5
webui: Spring Cleaning Refactor v1 ( #22505 )
...
* wip: server_tools
* feat: Integrate with `/tools` endpoint
* feat: Builtin + MCP + JSON Schema Tools WIP
* refactor
* displayName -> display_name
* snake_case everywhere
* rm redundant field
* feat: Improvements
* chore: update webui build output
* refactor: Updates after server updates
* chore: update webui build output
* change arg to --tools all
* feat: UI improvements
* chore: update webui build output
* add readme mention
* llama-gen-docs
* chore: update webui build output
* chore: update webui build output
* chore: update webui build output
* feat: Reorganize settings sections
* feat: Separate dialogs for MCP Servers Settings and Import/Export
* feat: WIP
* feat: WIP
* feat: WIP
* feat: WIP
* feat: WIP
* feat: WIP
* WIP on allozaur/20677-webui-server-tools
* feat: UI improvements
* chore: Update package lock
* chore: Run `npm audit fix`
* feat: UI WIP
* feat: UI
* refactor: Desktop Icon Strip DRY
* feat: Cleaner rendering and transition for ChatScreen
* feat: UI improvements
* feat: UI improvement
* feat: Remove MCP Server "enable" switch from Tools submenu
* chore: Run `npm audit fix`
* feat: WIP
* feat: Logic improvements
* refactor: Cleanup
* refactor: DRY
* test: Fix Chat Sidebar UI Tests
* chore: Update package lock
* refactor: Cleanup
* feat: Chat Message Action Card with Continue and Permission flow implementations
* feat: Add agentic steering messages, draft messages and improve chat UX
* fix: Search results UI
* test: Fix unit test
* feat: UI/UX improvements
* refactor: Simplify `useToolsPanel` access in components
* feat: Implement Processing Info Context API
* feat: Implement 'Go back to chat' functionality for settings
* feat: Enhance MCP Server management in Chat Form Attachments
* style: Minor UI and branding adjustments
* chore: Update webui static build output
* chore: Formatting, linting & type checks
* feat: Draft messages logic
* feat: UI improvements
* feat: Steering Messages improvements
* refactor: Cleanup
* refactor: Cleanup
* feat: Improve UI
* refactor: Settings navigation hook
* refactor: DRY code
* refactor: DRY ChatMessageUser UI components
* refactor: Desktop Icon Strip DRY
* refactor: Tools & permissions
* fix: Navigation condition
* refactor: Cleanup
* refactor: Cleanup
* refactor: Cleanup
* fix: preserve reasoning_content in agentic flow
* refactor: Storybook cleanup
* refactor: isInViewport util function
* refactor: Rename globally `onClick` to `onclick`
* chore: `npm audit fix`
* refactor: Action Icon usage
* refactor: Naming
* refactor: JS in `class` directive
* refactor: Chat components cleanup WIP
* refactor: Components structure
* refactor: Cleanup WIP
* feat: New ChatAttachmentsPreview component
* feat: UI improvements
* feat: UI improvements
* refactor: Cleanup
* refactor: ChatAttachmentsPreview UI/UX
* refactor: Remove dead code
* refactor: Cleanup
* fix: Model Name aliases displaying
* feat: Shortcut improvements
* refactor: Chat Message
* feat: Move Import/Export to settings
* refactor: Cleanup
* refactor: Cleanup
* refactor: Cleanup
* refactor: Cleanup
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co >
2026-05-01 18:36:29 +02:00
Georgi Gerganov
80afa33aad
spec : fix draft model checkpoints ( #22521 )
...
* spec : fix draft model checkpoints
* cont : clean-up
* cont : gate the ngram-mod reset warning behind verbose flag
2026-04-30 08:32:18 +03:00
Georgi Gerganov
683c5acb90
spec : discard last drafted token with low prob ( #22506 )
2026-04-29 17:00:00 +03:00
Pascal
59237bfbbc
webui: fix slow mic stop and WAV encode ( #22480 )
...
* webui: instant mic stop, race-free recorder restart
* webui: faster WAV PCM encode via hoisted channels and Int16Array
* chore: update webui build output
* webui: drop setTimeout(0) hack and harden cancelRecording
* chore: update webui build output
2026-04-29 12:58:35 +02:00
Aleksander Grygier
f42e29fdf1
webui: Server tools ( #21237 )
...
* wip: server_tools
* feat: Integrate with `/tools` endpoint
* feat: Builtin + MCP + JSON Schema Tools WIP
* refactor
* displayName -> display_name
* snake_case everywhere
* rm redundant field
* feat: Improvements
* chore: update webui build output
* refactor: Updates after server updates
* chore: update webui build output
* change arg to --tools all
* feat: UI improvements
* chore: update webui build output
* add readme mention
* llama-gen-docs
* chore: update webui build output
* chore: update webui build output
* chore: update webui build output
* feat: Reorganize settings sections
* feat: Separate dialogs for MCP Servers Settings and Import/Export
* feat: WIP
* feat: WIP
* feat: WIP
* feat: WIP
* feat: WIP
* feat: WIP
* WIP on allozaur/20677-webui-server-tools
* feat: UI improvements
* chore: Update package lock
* chore: Run `npm audit fix`
* feat: UI WIP
* feat: UI
* refactor: Desktop Icon Strip DRY
* feat: Cleaner rendering and transition for ChatScreen
* feat: UI improvements
* feat: UI improvement
* feat: Remove MCP Server "enable" switch from Tools submenu
* chore: Run `npm audit fix`
* feat: WIP
* feat: Logic improvements
* refactor: Cleanup
* refactor: DRY
* test: Fix Chat Sidebar UI Tests
* chore: Update package lock
* refactor: Cleanup
* feat: Chat Message Action Card with Continue and Permission flow implementations
* feat: Add agentic steering messages, draft messages and improve chat UX
* fix: Search results UI
* test: Fix unit test
* feat: UI/UX improvements
* refactor: Simplify `useToolsPanel` access in components
* feat: Implement Processing Info Context API
* feat: Implement 'Go back to chat' functionality for settings
* feat: Enhance MCP Server management in Chat Form Attachments
* style: Minor UI and branding adjustments
* chore: Update webui static build output
* chore: Formatting, linting & type checks
* feat: Draft messages logic
* feat: UI improvements
* feat: Steering Messages improvements
* refactor: Cleanup
* refactor: Cleanup
* feat: Improve UI
* refactor: Settings navigation hook
* refactor: DRY code
* refactor: DRY ChatMessageUser UI components
* refactor: Desktop Icon Strip DRY
* refactor: Tools & permissions
* fix: Navigation condition
* refactor: Cleanup
* refactor: Cleanup
* refactor: Cleanup
* fix: preserve reasoning_content in agentic flow
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co >
2026-04-28 14:35:49 +03:00
Georgi Gerganov
14e733e36f
spec : refactor params ( #22397 )
...
* spec : refactor params
* cont : fix
* cont : rename "sparam" to "sampling"
* cont : add spec params category
* cont : add info about removed arguments
* cont : skip param length check for spec params
* cont : adapt server tests
2026-04-28 09:07:33 +03:00
Aman Gupta
516e8d7a8a
server: use pos_next instead of n_tokens for m-rope ( #22439 )
2026-04-28 08:41:00 +03:00
tha80
983ca8992e
server: (router) Forward form-data to model server ( Fixes #22044 ) ( #22118 )
...
* This commit enables the router to forward form-data to the model server.
Fixes #22044 (enabling use of /v1/audio/transcriptions in router mode)
* Applied the suggestion from Copilot's first comment: using the non-throwing json::parse overload.
* Addressed Copilot's third comment by extending the files representation to also include filename and content-type
* Addressed Copilot's fourth comment by making the RNG thread_local
* Changed variable body from std::string to std::ostringstream in build_multipart_body
as suggested by ngxson in https://github.com/ggml-org/llama.cpp/pull/22118#discussion_r3127099053
* Added sanitize_field lambda in build_multipart_body for key, filename and content_type
as suggested by ngxson in https://github.com/ggml-org/llama.cpp/pull/22118#discussion_r3127104647
* explicitly checking if value/item is string before calling value/item.get<std::string>()
as requested by ngxson in https://github.com/ggml-org/llama.cpp/pull/22118#discussion_r3127111279
* Added double quote to the sanitize lambda and throw on json parse failure
---------
Co-authored-by: Ralph Paßgang <ralph@trust-it.de >
2026-04-27 23:55:00 +02:00
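The review notes in the form-data entry above mention a `build_multipart_body` that writes into a `std::ostringstream`, a `sanitize_field` lambda applied to key, filename and content type, and a `thread_local` RNG. The sketch below shows how those pieces might fit together. It is an illustration under assumptions, not the router's code: the `form_part` struct, the `_sketch` function names, the boundary format, and the exact set of characters stripped (CR, LF, double quote) are all hypothetical.

```cpp
#include <random>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical representation of one form field or uploaded file.
struct form_part {
    std::string key;
    std::string value;        // field value or raw file bytes
    std::string filename;     // empty for plain fields
    std::string content_type; // empty for plain fields
};

// A thread_local generator gives each thread its own RNG state, so concurrent
// requests do not race on a shared engine.
std::string random_boundary_sketch() {
    thread_local std::mt19937 rng{std::random_device{}()};
    static const char hex[] = "0123456789abcdef";
    std::uniform_int_distribution<int> dist(0, 15);

    std::string boundary = "----sketch-boundary-";
    for (int i = 0; i < 16; ++i) {
        boundary += hex[dist(rng)];
    }
    return boundary;
}

std::string build_multipart_body_sketch(const std::vector<form_part> & parts,
                                        const std::string            & boundary) {
    // Assumption: stripping CR, LF and '"' is enough to keep a value from
    // breaking out of its quoted header parameter.
    auto sanitize_field = [](const std::string & s) {
        std::string out;
        for (char c : s) {
            if (c != '\r' && c != '\n' && c != '"') {
                out += c;
            }
        }
        return out;
    };

    std::ostringstream body;
    for (const auto & p : parts) {
        body << "--" << boundary << "\r\n";
        body << "Content-Disposition: form-data; name=\"" << sanitize_field(p.key) << "\"";
        if (!p.filename.empty()) {
            body << "; filename=\"" << sanitize_field(p.filename) << "\"";
        }
        body << "\r\n";
        if (!p.content_type.empty()) {
            body << "Content-Type: " << sanitize_field(p.content_type) << "\r\n";
        }
        body << "\r\n" << p.value << "\r\n";
    }
    body << "--" << boundary << "--\r\n";
    return body.str();
}
```

Only the header fields are sanitized; the part value is written as-is, since it may be binary audio data.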
unraido
ceaf47c4b1
fix: rpc-server cache may not work in Windows environments ( #22394 )
...
* fix: create directory and log cache file name.
* Remove GGML_LOG_INFO conditional compilation.
---------
Co-authored-by: kotaro <kotaro.kusunoki@gmail.com >
2026-04-27 17:25:09 +03:00
Max Krasnyansky
5594d13224
common: fix missing exports in llama-common ( #22340 )
...
* common: refactor common/debug to move abort_on_nan into base_callback_data
Passing bool abort_on_nan as template parameter for common_debug_cb_eval is unnecessary and creates an issue with LTO.
It should just be a member of the base_callback_data instead.
* cont : cleanup
* common : use pimpl in debug.h to reduce header dependencies
Move common_debug_cb_user_data's data members (std::regex,
std::vector<uint8_t>) into a private impl struct in debug.cpp.
This removes the includes of common.h and <regex> from debug.h,
reducing transitive dependencies for any translation unit that
includes the header.
Assisted-by: llama.cpp:local pi
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
2026-04-27 08:06:39 +03:00
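The pimpl change described in the entry above (moving `std::regex` and `std::vector<uint8_t>` members into a private impl struct so the header no longer needs `<regex>`) follows a common C++ pattern. The sketch below shows that pattern in a single file; the class `debug_filter_sketch` and its members are hypothetical and are not the contents of debug.h.

```cpp
// ---- what would live in the header: no <regex> or <vector> needed ----
#include <memory>
#include <string>

class debug_filter_sketch {
public:
    explicit debug_filter_sketch(const std::string & pattern);
    ~debug_filter_sketch(); // defined where impl is a complete type

    bool matches(const std::string & tensor_name) const;

private:
    struct impl;                 // forward declaration only
    std::unique_ptr<impl> pimpl; // heavy members live behind this pointer
};

// ---- what would live in the .cpp: heavy includes stay here ----
#include <cstdint>
#include <regex>
#include <vector>

struct debug_filter_sketch::impl {
    std::regex           pattern;
    std::vector<uint8_t> scratch; // placeholder for buffer-like state
};

debug_filter_sketch::debug_filter_sketch(const std::string & pattern)
    : pimpl(std::make_unique<impl>()) {
    pimpl->pattern = std::regex(pattern);
}

// The destructor must be defined after impl is complete, otherwise
// std::unique_ptr<impl> cannot delete the incomplete type.
debug_filter_sketch::~debug_filter_sketch() = default;

bool debug_filter_sketch::matches(const std::string & tensor_name) const {
    return std::regex_search(tensor_name, pimpl->pattern);
}
```

Translation units that include only the header part see a forward-declared `impl` and a pointer, which is how the pattern trims transitive dependencies.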
Piotr Wilkin (ilintar)
0adede866d
parser: fix structured output bug ( #22302 )
...
* fix very stupid structured output bug
* Things just cannot be too easy.
2026-04-24 23:19:55 +02:00
Georgi Gerganov
ffdd983fb8
server : fix swa-full logic ( #22288 )
2026-04-24 10:17:37 +03:00
Yes You Can Have Your Own
793d0a7931
server: rename debug tags to match --cache-idle-slots naming ( #22292 )
2026-04-24 09:28:44 +03:00