smugman-dot
5d6f18a638
webui: fix LLM title generation for agentic conversations ( #22840 )
2026-05-08 16:36:04 +02:00
Xuan-Son Nguyen
29debb3a6a
server: support Vertex AI compatible API ( #22545 )
* server: support Vertex AI compatible API
* a bit safer
* support other AIP_* env var
* various fixes
* if AIP_MODE is unset, do nothing
* fix test case
* fix windows build
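The "if AIP_MODE is unset, do nothing" guard could be sketched roughly like this (hypothetical helper name; the real check lives in the server code):

```cpp
#include <cstdlib>

// Hypothetical sketch: the Vertex AI compatibility layer is only
// enabled when the AIP_MODE environment variable is set and non-empty.
// The parameter is the raw result of std::getenv("AIP_MODE").
static bool vertex_ai_enabled(const char * mode) {
    return mode != nullptr && *mode != '\0';
}

// Call site would look like: vertex_ai_enabled(std::getenv("AIP_MODE"))
```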
2026-05-08 15:23:04 +02:00
Xuan-Son Nguyen
9dcf835528
server: (router) expose child model info from router's /v1/models ( #22683 )
* server: (router) expose child model info from router's /v1/models
* update docs
2026-05-08 14:42:15 +02:00
Aleksander Grygier
9b2925e1e0
webui: Add Import/Export of Settings configuration + improve architecture ( #22803 )
* refactor: Settings keys as constant object keys
* chore: Run `npm audit fix`
* refactor: Settings Sections UI
* feat: Refactor Settings structure and implement import/export logic
* feat: Introduce ROUTES constant and RouterService
* refactor: Consolidate settings definitions into registry
* refactor: Update settings page routing structure
* chore: Migrate hardcoded URLs to use ROUTES and RouterService
* feat: Enhance model selection logic for settings and chat
* chore: Update webui static build
* refactor: Address PR review comments
* fix: Remove unneeded setting
* fix: Re-add missing settings
* fix: Add missing `/slots` proxy for webui dev mode
* chore: Dev-mode logs
* fix: Data binding
* fix: Steering for non-agentic flow
2026-05-08 11:26:04 +02:00
smugman-dot
aaf4a4d5e0
webui: add option for LLM title generation ( #22265 )
* webui: add LLM title generation option
* webui: use chat_template_kwargs for title gen + fix conversation check
* webui: capture firstUserMessage before async streamChatCompletion to fix race condition
* webui: extract LLM title generation into separate method
* webui: use constants and ChatService for LLM generated titles
* webui: rebuild static output
* webui: add LLM title generation setting to new settings location
* webui: use sendMessage in generateTitle
* webui: rebuild static output
* webui: fix formatting
* webui: configurable title prompt, remove think tag regexes, fix TS error
* webui: group title constants into TITLE object, use TruncatedText for CSS truncation and fix race condition
* webui: rebuild static output
2026-05-07 21:14:03 +02:00
Pascal
f4b5a2ee91
webui: fix ?model= URL param race in router mode ( #22771 )
* webui: fix ?model= URL param race in router mode
* chore: update webui build output
2026-05-07 13:09:32 +02:00
viggy
e358d75adb
webui: fix flicker issue on dismiss animation on overlay primitives ( #22773 )
* add fill-mode-forwards
* generated diffs
2026-05-07 08:11:31 +02:00
Aleksander Grygier
e3e3f8e46a
webui: Remove Google Favicons & Improve MCP Information logic & UI ( #22719 )
* refactor: Remove Google favicon utility
* fix: MCP Server favicon
* refactor: Cleanup
* refactor: MCP Server Information
* fix: Fix MCP Settings UI
* refactor: Cleanup
2026-05-06 11:12:27 +02:00
viggy
07eaf919ed
add tabindex and aria-hidden ( #22699 )
2026-05-06 09:21:58 +02:00
Georgi Gerganov
2bacb1eb77
server : validate --tools CLI argument against known tool names ( #22538 )
Previously, unknown tool names passed via --tools were silently ignored.
Now the server validates each tool name at startup and exits with an
error if an unrecognized tool is specified, listing the available tools.
Assisted-by: llama.cpp:local pi
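The validation described above could be sketched roughly as follows (hypothetical helper and tool names; the actual server code differs):

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical sketch: reject unknown tool names at startup and
// build an error message listing the available tools. Returns an
// empty string when every requested name is known; the caller
// would log the message and exit otherwise.
static std::string validate_tools(
        const std::vector<std::string> & requested,
        const std::vector<std::string> & known) {
    for (const auto & name : requested) {
        if (std::find(known.begin(), known.end(), name) == known.end()) {
            std::string msg = "unknown tool '" + name + "', available tools:";
            for (const auto & k : known) {
                msg += " " + k;
            }
            return msg;
        }
    }
    return "";
}
```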
2026-05-05 06:35:27 +03:00
Georgi Gerganov
d6e7b033a4
llama : add option to save memory in device buffers ( #22679 )
* llama : add option to save memory in device buffers
* tests : extend llama-save-load-state
2026-05-05 06:35:07 +03:00
Xuan-Son Nguyen
935a340292
server: implement /models?reload=1 ( #21848 )
2026-05-04 16:23:26 +02:00
JusteLeo
36a694c965
webui : fix circular dependency between chat.service.ts and models.svelte.ts ( #22625 )
2026-05-04 13:38:10 +02:00
Piotr Wilkin (ilintar)
a4701c98f7
common/autoparser: fixes for newline handling / forced tool calls ( #22654 )
* chat/autoparser: the fixes
* Move optspace() to chat-peg-parser; comment out server tests invalidated by content now being allowed with forced tool calls.
* Trim whitespace on apply instead
2026-05-04 13:18:11 +02:00
Evan Huus
c84e6d6db5
server: Add a simple get_datetime server tool ( #22649 )
2026-05-04 12:19:41 +02:00
Nick Towle
fa8feaed34
webui: restore missing settings ( #22666 )
2026-05-04 09:04:07 +02:00
Georgi Gerganov
846262d787
docs : update speculative decoding parameters after refactor ( #22397 ) ( #22539 )
* docs : update speculative decoding parameters after refactor (#22397 )
Update docs/speculative.md to reflect the new parameter naming scheme
introduced in PR #22397 :
- Replace --draft-max/--draft-min with --spec-draft-n-max/--spec-draft-n-min
- Replace --spec-ngram-size-n/m with per-implementation variants
- Add documentation for all new --spec-ngram-*- parameters
- Update all example commands
Assisted-by: llama.cpp:local pi
* pi : add rule to use gh CLI for GitHub resources
Assisted-by: llama.cpp:local pi
* docs : run llama-gen-docs
* arg : fix typo
2026-05-04 08:52:07 +03:00
Georgi Gerganov
0754b7b6fe
server : avoid checkpoint data host copies ( #22558 )
* server : avoid checkpoint data host copies
* llama : refactor llama_io_read_i
2026-05-02 18:03:25 +03:00
Aleksander Grygier
ab6120cde5
webui: Spring Cleaning Refactor v1 ( #22505 )
* wip: server_tools
* feat: Integrate with `/tools` endpoint
* feat: Builtin + MCP + JSON Schema Tools WIP
* refactor
* displayName -> display_name
* snake_case everywhere
* rm redundant field
* feat: Improvements
* chore: update webui build output
* refactor: Updates after server updates
* chore: update webui build output
* change arg to --tools all
* feat: UI improvements
* chore: update webui build output
* add readme mention
* llama-gen-docs
* chore: update webui build output
* chore: update webui build output
* chore: update webui build output
* feat: Reorganize settings sections
* feat: Separate dialogs for MCP Servers Settings and Import/Export
* feat: WIP
* feat: WIP
* feat: WIP
* feat: WIP
* feat: WIP
* feat: WIP
* WIP on allozaur/20677-webui-server-tools
* feat: UI improvements
* chore: Update package lock
* chore: Run `npm audit fix`
* feat: UI WIP
* feat: UI
* refactor: Desktop Icon Strip DRY
* feat: Cleaner rendering and transition for ChatScreen
* feat: UI improvements
* feat: UI improvement
* feat: Remove MCP Server "enable" switch from Tools submenu
* chore: Run `npm audit fix`
* feat: WIP
* feat: Logic improvements
* refactor: Cleanup
* refactor: DRY
* test: Fix Chat Sidebar UI Tests
* chore: Update package lock
* refactor: Cleanup
* feat: Chat Message Action Card with Continue and Permission flow implementations
* feat: Add agentic steering messages, draft messages and improve chat UX
* fix: Search results UI
* test: Fix unit test
* feat: UI/UX improvements
* refactor: Simplify `useToolsPanel` access in components
* feat: Implement Processing Info Context API
* feat: Implement 'Go back to chat' functionality for settings
* feat: Enhance MCP Server management in Chat Form Attachments
* style: Minor UI and branding adjustments
* chore: Update webui static build output
* chore: Formatting, linting & type checks
* feat: Draft messages logic
* feat: UI improvements
* feat: Steering Messages improvements
* refactor: Cleanup
* refactor: Cleanup
* feat: Improve UI
* refactor: Settings navigation hook
* refactor: DRY code
* refactor: DRY ChatMessageUser UI components
* refactor: Desktop Icon Strip DRY
* refactor: Tools & permissions
* fix: Navigation condition
* refactor: Cleanup
* refactor: Cleanup
* refactor: Cleanup
* fix: preserve reasoning_content in agentic flow
* refactor: Storybook cleanup
* refactor: isInViewport util function
* refactor: Rename globally `onClick` to `onclick`
* chore: `npm audit fix`
* refactor: Action Icon usage
* refactor: Naming
* refactor: JS in `class` directive
* refactor: Chat components cleanup WIP
* refactor: Components structure
* refactor: Cleanup WIP
* feat: New ChatAttachmentsPreview component
* feat: UI improvements
* feat: UI improvements
* refactor: Cleanup
* refactor: ChatAttachmentsPreview UI/UX
* refactor: Remove dead code
* refactor: Cleanup
* fix: Model Name aliases displaying
* feat: Shortcut improvements
* refactor: Chat Message
* feat: Move Import/Export to settings
* refactor: Cleanup
* refactor: Cleanup
* refactor: Cleanup
* refactor: Cleanup
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2026-05-01 18:36:29 +02:00
Georgi Gerganov
80afa33aad
spec : fix draft model checkpoints ( #22521 )
* spec : fix draft model checkpoints
* cont : clean-up
* cont : gate the ngram-mod reset warning behind verbose flag
2026-04-30 08:32:18 +03:00
Georgi Gerganov
683c5acb90
spec : discard last drafted token with low prob ( #22506 )
2026-04-29 17:00:00 +03:00
Pascal
59237bfbbc
webui: fix slow mic stop and WAV encode ( #22480 )
* webui: instant mic stop, race-free recorder restart
* webui: faster WAV PCM encode via hoisted channels and Int16Array
* chore: update webui build output
* webui: drop setTimeout(0) hack and harden cancelRecording
* chore: update webui build output
2026-04-29 12:58:35 +02:00
Aleksander Grygier
f42e29fdf1
webui: Server tools ( #21237 )
* wip: server_tools
* feat: Integrate with `/tools` endpoint
* feat: Builtin + MCP + JSON Schema Tools WIP
* refactor
* displayName -> display_name
* snake_case everywhere
* rm redundant field
* feat: Improvements
* chore: update webui build output
* refactor: Updates after server updates
* chore: update webui build output
* change arg to --tools all
* feat: UI improvements
* chore: update webui build output
* add readme mention
* llama-gen-docs
* chore: update webui build output
* chore: update webui build output
* chore: update webui build output
* feat: Reorganize settings sections
* feat: Separate dialogs for MCP Servers Settings and Import/Export
* feat: WIP
* feat: WIP
* feat: WIP
* feat: WIP
* feat: WIP
* feat: WIP
* WIP on allozaur/20677-webui-server-tools
* feat: UI improvements
* chore: Update package lock
* chore: Run `npm audit fix`
* feat: UI WIP
* feat: UI
* refactor: Desktop Icon Strip DRY
* feat: Cleaner rendering and transition for ChatScreen
* feat: UI improvements
* feat: UI improvement
* feat: Remove MCP Server "enable" switch from Tools submenu
* chore: Run `npm audit fix`
* feat: WIP
* feat: Logic improvements
* refactor: Cleanup
* refactor: DRY
* test: Fix Chat Sidebar UI Tests
* chore: Update package lock
* refactor: Cleanup
* feat: Chat Message Action Card with Continue and Permission flow implementations
* feat: Add agentic steering messages, draft messages and improve chat UX
* fix: Search results UI
* test: Fix unit test
* feat: UI/UX improvements
* refactor: Simplify `useToolsPanel` access in components
* feat: Implement Processing Info Context API
* feat: Implement 'Go back to chat' functionality for settings
* feat: Enhance MCP Server management in Chat Form Attachments
* style: Minor UI and branding adjustments
* chore: Update webui static build output
* chore: Formatting, linting & type checks
* feat: Draft messages logic
* feat: UI improvements
* feat: Steering Messages improvements
* refactor: Cleanup
* refactor: Cleanup
* feat: Improve UI
* refactor: Settings navigation hook
* refactor: DRY code
* refactor: DRY ChatMessageUser UI components
* refactor: Desktop Icon Strip DRY
* refactor: Tools & permissions
* fix: Navigation condition
* refactor: Cleanup
* refactor: Cleanup
* refactor: Cleanup
* fix: preserve reasoning_content in agentic flow
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2026-04-28 14:35:49 +03:00
Georgi Gerganov
14e733e36f
spec : refactor params ( #22397 )
* spec : refactor params
* cont : fix
* cont : rename "sparam" to "sampling"
* cont : add spec params category
* cont : add info about removed arguments
* cont : skip param length check for spec params
* cont : adapt server tests
2026-04-28 09:07:33 +03:00
Aman Gupta
516e8d7a8a
server: use pos_next instead of n_tokens for m-rope ( #22439 )
2026-04-28 08:41:00 +03:00
tha80
983ca8992e
server: (router) Forward form-data to model server ( Fixes #22044 ) ( #22118 )
* This commit enables the router to forward form-data to the model server.
Fixes #22044 (enabling use of the /v1/audio/transcriptions endpoint in router mode)
* Applied the suggestion from Copilot's first comment: use the non-throwing json::parse overload.
* Addressed Copilot's third comment by extending the files representation to also include filename and content-type
* Addressed Copilot's fourth comment by making the RNG thread_local
* Changed variable body from std::string to std::ostringstream in build_multipart_body
as suggested by ngxson in https://github.com/ggml-org/llama.cpp/pull/22118#discussion_r3127099053
* Added sanitize_field lambda in build_multipart_body for key, filename and content_type
as suggested by ngxson in https://github.com/ggml-org/llama.cpp/pull/22118#discussion_r3127104647
* explicitly checking if value/item is string before calling value/item.get<std::string>()
as requested by ngxson in https://github.com/ggml-org/llama.cpp/pull/22118#discussion_r3127111279
* Added double quote to the sanitize lambda and throw on json parse failure
---------
Co-authored-by: Ralph Paßgang <ralph@trust-it.de>
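The sanitize_field lambda mentioned above could look roughly like this (a hypothetical sketch, not the actual implementation): strip characters that would break a multipart/form-data header when embedded in key, filename or content-type.

```cpp
#include <string>

// Hypothetical sketch: remove CR, LF and double quotes so a
// client-supplied value cannot terminate or corrupt the
// Content-Disposition header of a multipart part.
static std::string sanitize_field(const std::string & in) {
    std::string out;
    out.reserve(in.size());
    for (char c : in) {
        if (c != '\r' && c != '\n' && c != '"') {
            out += c;
        }
    }
    return out;
}
```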
2026-04-27 23:55:00 +02:00
Piotr Wilkin (ilintar)
0adede866d
parser: fix structured output bug ( #22302 )
* fix very stupid structured output bug
* Things just cannot be too easy.
2026-04-24 23:19:55 +02:00
Georgi Gerganov
ffdd983fb8
server : fix swa-full logic ( #22288 )
2026-04-24 10:17:37 +03:00
Yes You Can Have Your Own
793d0a7931
server: rename debug tags to match --cache-idle-slots naming ( #22292 )
2026-04-24 09:28:44 +03:00
srkizer
185cbff6f1
server : convert_anthropic_to_oai: also copy chat_template_kwargs ( #22154 )
2026-04-23 13:32:46 -05:00
Song Li
c78fb909b2
server: fix heap-buffer-overflow from negative n_discard (CVE-2026-21869) ( #22267 )
* server: clamp n_discard to non-negative at JSON parse boundary (CVE-2026-21869)
A negative n_discard from client JSON causes heap-buffer-overflow in
update_slots() context-shift loop (CWE-787, CVSS 8.8). Clamp to 0 at
ingress; n_discard=0 already triggers auto-discard (n_left/2).
Ref: GHSA-8947-pfff-2f3c
* cont : cleaner
* cont : cleanerer
* cont : cleanest
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
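The clamp at the JSON parse boundary described in this fix amounts to something like the following sketch (hypothetical helper name; the real code sits in the server's request parsing):

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical sketch of the fix: clamp a client-supplied n_discard
// to non-negative at ingress, so the context-shift loop in
// update_slots() never sees a negative value (CWE-787).
// n_discard == 0 already means "auto-discard" (n_left/2), so
// clamping negatives to 0 preserves existing behavior.
static int32_t sanitize_n_discard(int32_t n_discard_from_json) {
    return std::max<int32_t>(0, n_discard_from_json);
}
```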
2026-04-23 18:39:07 +02:00
kvc0
c807c6e3b0
server: (anthropic API) fix prefix caching ( #21793 )
When testing claude code against llama.cpp, I noticed that only
n_past 18577 was used even when context was 60k or more. The log
in llama-server says:
```
slot update_slots: id 3 | task 10342 | old: ... ; cch= | defa0;You are
slot update_slots: id 3 | task 10342 | new: ... ; cch= | 1c8b4;
```
I observed that the cch value changed every time. Reading about it,
the x-anthropic-billing-header system message seems to be specially
handled inside the Anthropic API. I could remove it, but there
is a meaningful string sometimes included at the end. So instead,
I just replace the changing cch checksum with fffff.
I'm treating this as an anthropic message body API detail - I think this
is the right way to do this, but by all means please correct me!
It's always 5 hexadecimal characters, but I've written the replacement
defensively in case they change the protocol.
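The defensive replacement described here could be sketched as a regex substitution (hypothetical helper and pattern; the exact format of the message body differs from this simplified form):

```cpp
#include <regex>
#include <string>

// Hypothetical sketch: normalize the changing hex "cch" checksum in
// the billing system message to a fixed value so the prefix cache
// can match across requests. Written defensively to accept one or
// more hex digits rather than exactly five.
static std::string normalize_cch(const std::string & msg) {
    static const std::regex re("cch=[0-9a-f]+");
    return std::regex_replace(msg, re, "cch=fffff");
}
```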
2026-04-23 17:45:02 +02:00
Tarek Dakhran
550d684bd1
server: Enable transcriptions API for LFM2-Audio ( #22000 )
2026-04-23 10:47:26 +02:00
Piotr Wilkin (ilintar)
8bccdbbff9
chat: fix parallel_tool_calls default setting based on model capabilities, add tests for parallel tool calls and structured outputs ( #22217 )
* chat: fix parallel_tool_calls default setting based on model capabilities, add tests for parallel tool calls and structured outputs
* Fix ty errors.
* Fix flake8 err
2026-04-22 18:10:56 +02:00
Georgi Gerganov
bcb5eeb645
speculative-simple : add checkpoint support ( #22227 )
* speculative-simple : add checkpoint support
* cont : fix build
2026-04-22 15:44:45 +03:00
Xuan-Son Nguyen
17f6245168
server: ignore reasoning content from transcription api ( #21905 )
2026-04-22 12:10:50 +02:00
Ethan Turner
750579ff14
common: Refactoring sampler parameters ( #20429 ) ( #22233 )
This change moves the reasoning_budget_message parameter from the
common params into the sampling parameters. It also removes the
reasoning_budget common parameter and standardizes on the existing
reasoning_budget_tokens parameter in the sampling configuration.
Issue: https://github.com/ggml-org/llama.cpp/issues/20429
Original PR: https://github.com/ggml-org/llama.cpp/pull/20297
2026-04-22 10:40:19 +02:00
Piotr Wilkin (ilintar)
134d6e54d4
common/chat, server: refactor, move all conversion functions to common, add tests ( #20690 )
* Refactor conversion functions
2026-04-22 10:28:45 +02:00
Xuan-Son Nguyen
04fe84b69d
server: allow cancel loading model ( #21814 )
2026-04-22 00:26:09 +02:00
Georgi Gerganov
cfe9838d26
fit-params : refactor + add option to output estimated memory per device ( #22171 )
* fit-params : add option to output estimated memory per device
* cont : minor
* cont : refactor
* cont : move fit params implementation to libcommon
* cont : header
* cont : headers
* cont : codeowners
2026-04-21 09:54:36 +03:00
xris99
ff6b1062af
server : fix hardcoded proxy connection timeout in router mode ( #18760 ) ( #22003 )
Fixes: https://github.com/ggml-org/llama.cpp/issues/18760
Co-authored-by: Christian <christian@example.com>
2026-04-21 06:41:14 +02:00
Georgi Gerganov
cf8b0dbda9
server : remove /api endpoints ( #22165 )
* server : remove /api endpoints
* cont : remove /api/tags
2026-04-20 20:41:19 +03:00
Georgi Gerganov
de71b5f81c
server : refactor "use checkpoint" logic ( #22114 )
2026-04-20 08:42:37 +03:00
Yes You Can Have Your Own
9d49acb2a7
server: rename --clear-idle to --cache-idle-slots ( #21741 )
2026-04-20 08:30:24 +03:00
Sascha Rogmann
455d8e4be8
server : speculative checkpointing ( #19493 )
* server : speculative decoding using checkpoints
* server : fix draft check with checkpoints
* server : rename spec vars
* server : log levels
* server : refactored spec logic to speculative.cpp
* server : renamed spec checkpoints option
* server : fix spec checkpoints, logging
* speculative : checkpoints with draft model, logging
* server : n_tokens_cur and create_checkpoint in draft
* server : fix server_speculative_callback (slot.id)
* spec : fix ngram-map/begin idx_last_check
* spec : init ckpt (begin() wasn't called)
* chore: update webui build output
* server : restore sampler in spec checkpoint and clear mem
* cont : avoid --spec-use-checkpoints argument
* cont : remove server_prompt_checkpoint_with_size
* spec : rename (leave_draft_state)
* cont : clean-up
* cont : do not ignore partial drafts even if they are short
* cont : spec callback owned by session
* cont : simplify
* cont : avoid empty speculative session
* cont : simplify
* cont : simplify
* cont : enable mtmd speculative decoding
* cont : keep the spec sampler alive
* cont : simplify
* cont : fix nullptr deref + draft checkpoints
* cont : remove common_speculative_accept_response
* cont : remove callback
* cont : simplify
* cont : minor
* cont : simplify
* cont : fix accepted number
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2026-04-19 10:24:06 +03:00
Cetarthoriphros
9e5647affa
server: Expose media_tag on /props endpoint. ( #22028 )
2026-04-19 00:27:17 +02:00
Georgi Gerganov
6990e2f1f7
libs : rename libcommon -> libllama-common ( #21936 )
* cmake : allow libcommon to be shared
* cmake : rename libcommon to libllama-common
* cont : set -fPIC for httplib
* cont : export all symbols
* cont : fix build_info exports
* libs : add libllama-common-base
* log : add common_log_get_verbosity_thold()
2026-04-17 11:11:46 +03:00
Pascal
4adac43f6f
server: tests: fetch random media marker via /apply-template ( #21962 ) ( #21980 )
* server: tests: fetch random media marker via /apply-template (#21962 fix)
* server: allow pinning media marker via LLAMA_MEDIA_MARKER env var
get_media_marker() checks LLAMA_MEDIA_MARKER at first call and uses it
as-is if set, falling back to the random marker otherwise.
Tests no longer need to fetch the marker dynamically via /apply-template:
the fixture sets LLAMA_MEDIA_MARKER=<__media__> so the hardcoded prompts
work as before.
Address review feedback from ngxson
* server: make get_media_marker() thread-safe via magic statics
Use a C++11 static local with a lambda initializer instead of a global
static with an empty-check. The runtime guarantees initialization exactly
once without explicit locking.
Address review feedback from ggerganov
* nits
* nits
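The "magic static" pattern described in this commit could be sketched as follows (a minimal sketch assuming the LLAMA_MEDIA_MARKER env var and the <__media__> fallback from the message above; the real helper falls back to a random marker instead):

```cpp
#include <cstdlib>
#include <string>

// Sketch of thread-safe lazy init: a C++11 static local initialized
// by a lambda. The runtime guarantees the initializer runs exactly
// once, even under concurrent first calls, with no explicit locking.
static const std::string & get_media_marker() {
    static const std::string marker = [] {
        if (const char * env = std::getenv("LLAMA_MEDIA_MARKER")) {
            return std::string(env);
        }
        // Fallback shown as a fixed token here; random in the real code.
        return std::string("<__media__>");
    }();
    return marker;
}
```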
2026-04-16 20:46:21 +03:00
Xuan-Son Nguyen
408225bb1a
server: use random media marker ( #21962 )
* server: use random media marker
* nits
* remove legacy <__image__> token
* revert special char in random
2026-04-15 23:52:22 +02:00
Xuan-Son Nguyen
e489a5ca0e
server: support OAI /v1/audio/transcriptions API ( #21863 )
* server: support OAI /v1/audio/transcriptions API
* address autoreview comments
* correct default response_format value
2026-04-14 11:09:52 +02:00