Piotr Wilkin
1171723f9d
Fix FATTN profiling
2026-05-12 23:58:28 +02:00
Piotr Wilkin
f0210cc40d
Converge implementation with export-graph-ops
2026-05-09 17:24:54 +02:00
Piotr Wilkin
2bbcc61af7
Add missing op parameters to the profiler; add support for test-backend-ops to run performance tests with exactly the tensor shapes from the run
2026-05-09 17:24:54 +02:00
Piotr Wilkin
395c43eadf
fix mul_mat_id stats, add throughput stat, add envvar trigger, fix concurrent mode
2026-05-09 17:24:54 +02:00
Piotr Wilkin
b1252bcd73
fix builds, integrate vulkan profiler, fix copy events, fix export
2026-05-09 17:24:54 +02:00
Piotr Wilkin
8291ecd707
Fix more missing backend stuff (and Python errors)
2026-05-09 17:24:54 +02:00
Piotr Wilkin
391d2bf23a
add second dimension to reported tensors, fix Mac build, add missing initializer to all backends
2026-05-09 17:24:54 +02:00
Piotr Wilkin
2e66d2c130
feat: cool profiler thingy
2026-05-09 17:24:54 +02:00
smugman-dot
5d6f18a638
webui: fix LLM title generation for agentic conversations ( #22840 )
2026-05-08 16:36:04 +02:00
Xuan-Son Nguyen
29debb3a6a
server: support Vertex AI compatible API ( #22545 )
* server: support Vertex AI compatible API
* a bit safer
* support other AIP_* env var
* various fixes
* if AIP_MODE is unset, do nothing
* fix test case
* fix windows build
2026-05-08 15:23:04 +02:00
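The "if AIP_MODE is unset, do nothing" behavior noted above could be gated roughly as follows. This is an illustrative Python sketch only; the function name and the truthiness check are assumptions, and the actual server is C++:

```python
import os

def vertex_ai_compat_enabled() -> bool:
    # Per the commit notes: if AIP_MODE is unset, the Vertex AI
    # compatibility layer does nothing. Treating any non-empty value as
    # "enabled" is an assumption of this sketch; the real check may be
    # stricter about which AIP_* values it accepts.
    return bool(os.environ.get("AIP_MODE"))
```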
Xuan-Son Nguyen
9dcf835528
server: (router) expose child model info from router's /v1/models ( #22683 )
* server: (router) expose child model info from router's /v1/models
* update docs
2026-05-08 14:42:15 +02:00
Aleksander Grygier
9b2925e1e0
webui: Add Import/Export of Settings configuration + improve architecture ( #22803 )
* refactor: Settings keys as constant object keys
* chore: Run `npm audit fix`
* refactor: Settings Sections UI
* feat: Refactor Settings structure and implement import/export logic
* feat: Introduce ROUTES constant and RouterService
* refactor: Consolidate settings definitions into registry
* refactor: Update settings page routing structure
* chore: Migrate hardcoded URLs to use ROUTES and RouterService
* feat: Enhance model selection logic for settings and chat
* chore: Update webui static build
* refactor: Address PR review comments
* fix: Remove unneeded setting
* fix: Re-add missing settings
* fix: Add missing `/slots` proxy for webui dev mode
* chore: Dev-mode logs
* fix: Data binding
* fix: Steering for non-agentic flow
2026-05-08 11:26:04 +02:00
smugman-dot
aaf4a4d5e0
webui: add option for LLM title generation ( #22265 )
* webui: add LLM title generation option
* webui: use chat_template_kwargs for title gen + fix conversation check
* webui: capture firstUserMessage before async streamChatCompletion to fix race condition
* webui: extract LLM title generation into separate method
* webui: use constants and ChatService for LLM generated titles
* webui: rebuild static output
* webui: add LLM title generation setting to new settings location
* webui: use sendMessage in generateTitle
* webui: rebuild static output
* webui: fix formatting
* webui: configurable title prompt, remove think tag regexes, fix TS error
* webui: group title constants into TITLE object, use TruncatedText for CSS truncation and fix race condition
* webui: rebuild static output
2026-05-07 21:14:03 +02:00
Pascal
cc97e45a14
mtmd: fix whisper audio tail truncation by exposing padded buffer to FFT ( #22770 )
2026-05-07 14:01:01 +02:00
Pascal
f4b5a2ee91
webui: fix ?model= URL param race in router mode ( #22771 )
* webui: fix ?model= URL param race in router mode
* chore: update webui build output
2026-05-07 13:09:32 +02:00
viggy
e358d75adb
webui: fix flicker issue on dismiss animation on overlay primitives ( #22773 )
* add fill-mode-forwards
* generated diffs
2026-05-07 08:11:31 +02:00
tc-mb
2496f9c149
mtmd : support MiniCPM-V 4.6 ( #22529 )
* Support MiniCPM-V 4.6 in new branch
Signed-off-by: tc-mb <tianchi_cai@icloud.com>
* fix code bug
Signed-off-by: tc-mb <tianchi_cai@icloud.com>
* fix pre-commit
Signed-off-by: tc-mb <tianchi_cai@icloud.com>
* fix convert
Signed-off-by: tc-mb <tianchi_cai@icloud.com>
* rename clip_graph_minicpmv4_6
Signed-off-by: tc-mb <tianchi_cai@icloud.com>
* use new TYPE_MINICPMV4_6
Signed-off-by: tc-mb <tianchi_cai@icloud.com>
* use build_attn to allow flash attention support
Signed-off-by: tc-mb <tianchi_cai@icloud.com>
* don't use legacy code; restored here
Signed-off-by: tc-mb <tianchi_cai@icloud.com>
* use the existing tensor names
Signed-off-by: tc-mb <tianchi_cai@icloud.com>
* unused ctx->model.hparams.minicpmv_version
Signed-off-by: tc-mb <tianchi_cai@icloud.com>
* use n_merge for slice alignment
Signed-off-by: tc-mb <tianchi_cai@icloud.com>
* borrow wa_layer_indexes for vit_merger insertion point
Signed-off-by: tc-mb <tianchi_cai@icloud.com>
* fix code style
Signed-off-by: tc-mb <tianchi_cai@icloud.com>
* Update convert_hf_to_gguf.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* use filter_tensors and add model.vision_tower
Signed-off-by: tc-mb <tianchi_cai@icloud.com>
* fix chkhsh
Signed-off-by: tc-mb <tianchi_cai@icloud.com>
* fix type check
Signed-off-by: tc-mb <tianchi_cai@icloud.com>
---------
Signed-off-by: tc-mb <tianchi_cai@icloud.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-05-06 21:54:09 +02:00
Yakine Tahtah
a00e47e422
mtmd: add granite-speech support (ibm-granite/granite-4.0-1b-speech) ( #22101 )
* mtmd: add granite-speech support (ibm-granite/granite-4.0-1b-speech)
Conformer encoder with Shaw relative position encoding,
QFormer projector, log-mel spectrogram with frame stacking.
Encoder uses GLU gating, folded batch norm, and SSM depthwise
conv. QFormer compresses encoder output via windowed
cross-attention (window=15, queries=3) into the LLM embedding
space.
Audio preprocessing: reflect-padded STFT, 80-bin mel filterbank,
dynamic range compression, 2x frame stacking (80->160 mel).
GGUF converter handles batch norm folding at export time,
fused K/V split, and Conv1d weight reshaping.
Tested against HF transformers reference: token-for-token match
on 30s/60s audio clips with greedy decoding.
* mtmd: rename gs_ prefixed tensors to generic/architecture names
* mtmd: use tensor_mapping.py for all granite_speech tensors
* convert: fold GraniteSpeechTextModel into GraniteModel
* mtmd: replace n_layer hack with explicit has_standard_layers flag
* mtmd: replace hardcoded magic numbers with GGUF hparams for granite speech
* mtmd: align KEY_A_ define spacing
* convert: register GraniteModel for GraniteSpeechForConditionalGeneration
* convert: fix ty type-check for GraniteSpeechMmprojModel registration
* mtmd: align TN_ define spacing
* mtmd: use generic layer loop for granite speech tensor loading
* mtmd: merge qformer_proj_layer into clip_layer
* mtmd: granite_speech remove redundant ggml_build_forward_expand on inputs
* mtmd: granite_speech add comment explaining why build_attn is not used
* mtmd: granite_speech hard-code eps in cpp, remove from GGUF metadata
* gguf: add spacing between granite_speech tensor mapping blocks
* mtmd: make generic audio layer_norm_eps read optional
* mtmd: granite_speech keep encoder eps in GGUF, only hard-code projector eps
* mtmd: align defines and struct fields in clip-impl.h and clip-model.h
* mtmd: fix alignment and ordering issues across granite speech files
* convert: granite_speech use filter_tensors instead of modify_tensors for skipping
2026-05-06 14:40:59 +02:00
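The 2x frame stacking described in the commit body (80 -> 160 mel bins) can be sketched as below. This is a stdlib-only illustration, not the converter's code; in particular, dropping a trailing incomplete group is an assumption about edge handling:

```python
def stack_frames(mel_frames, factor=2):
    """Concatenate each group of `factor` consecutive mel frames into one
    wider frame (e.g. 2 x 80 bins -> 160 bins), halving the frame rate."""
    # Drop trailing frames that don't fill a complete group (assumption).
    usable = len(mel_frames) - len(mel_frames) % factor
    return [
        sum((mel_frames[i + k] for k in range(factor)), [])
        for i in range(0, usable, factor)
    ]
```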
Aleksander Grygier
e3e3f8e46a
webui: Remove Google Favicons & Improve MCP Information logic & UI ( #22719 )
* refactor: Remove Google favicon utility
* fix: MCP Server favicon
* refactor: Cleanup
* refactor: MCP Server Information
* fix: Fix MCP Settings UI
* refactor: Cleanup
2026-05-06 11:12:27 +02:00
viggy
07eaf919ed
add tabindex and aria-hidden ( #22699 )
2026-05-06 09:21:58 +02:00
Adrien Gallouët
bf76ac77be
common : only load backends when required ( #22290 )
* common : only load backends when required
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* llama : call ggml_backend_load_all() directly from llama_backend_init()
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Add ggml_backend_load_all() where llama_backend_init() is not used
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
---------
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-05-05 09:23:50 +02:00
Georgi Gerganov
2bacb1eb77
server : validate --tools CLI argument against known tool names ( #22538 )
Previously, unknown tool names passed via --tools were silently ignored.
Now the server validates each tool name at startup and exits with an
error if an unrecognized tool is specified, listing the available tools.
Assisted-by: llama.cpp:local pi
2026-05-05 06:35:27 +03:00
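The startup validation this commit describes amounts to checking each requested name against a registry and failing fast instead of silently ignoring it. A minimal Python sketch with a made-up tool set (the server itself is C++, and its actual registry differs):

```python
import sys

# Illustrative only; not the server's real tool registry.
KNOWN_TOOLS = {"get_datetime", "web_search"}

def validate_tool_names(requested):
    # Exit with an error listing the available tools if any name is
    # unknown, rather than silently dropping it.
    unknown = sorted(set(requested) - KNOWN_TOOLS)
    if unknown:
        sys.exit(f"error: unknown tool(s): {', '.join(unknown)}; "
                 f"available: {', '.join(sorted(KNOWN_TOOLS))}")
    return list(requested)
```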
Georgi Gerganov
d6e7b033a4
llama : add option to save memory in device buffers ( #22679 )
* llama : add option to save memory in device buffers
* tests : extend llama-save-load-state
2026-05-05 06:35:07 +03:00
Xuan-Son Nguyen
935a340292
server: implement /models?reload=1 ( #21848 )
2026-05-04 16:23:26 +02:00
JusteLeo
36a694c965
webui : fix circular dependency between chat.service.ts and models.svelte.ts ( #22625 )
2026-05-04 13:38:10 +02:00
Piotr Wilkin (ilintar)
a4701c98f7
common/autoparser: fixes for newline handling / forced tool calls ( #22654 )
* chat/autoparser: the fixes
* Move optspace() to chat-peg-parser; comment out server tests invalidated because content is now allowed with forced tool calls.
* Trim whitespace on apply instead
2026-05-04 13:18:11 +02:00
Evan Huus
c84e6d6db5
server: Add a simple get_datetime server tool ( #22649 )
2026-05-04 12:19:41 +02:00
Nick Towle
fa8feaed34
webui: restore missing settings ( #22666 )
2026-05-04 09:04:07 +02:00
Georgi Gerganov
846262d787
docs : update speculative decoding parameters after refactor ( #22397 ) ( #22539 )
* docs : update speculative decoding parameters after refactor (#22397)
Update docs/speculative.md to reflect the new parameter naming scheme
introduced in PR #22397:
- Replace --draft-max/--draft-min with --spec-draft-n-max/--spec-draft-n-min
- Replace --spec-ngram-size-n/m with per-implementation variants
- Add documentation for all new --spec-ngram-* parameters
- Update all example commands
Assisted-by: llama.cpp:local pi
* pi : add rule to use gh CLI for GitHub resources
Assisted-by: llama.cpp:local pi
* docs : run llama-gen-docs
* arg : fix typo
2026-05-04 08:52:07 +03:00
Georgi Gerganov
0754b7b6fe
server : avoid checkpoint data host copies ( #22558 )
* server : avoid checkpoint data host copies
* llama : refactor llama_io_read_i
2026-05-02 18:03:25 +03:00
Aleksander Grygier
ab6120cde5
webui: Spring Cleaning Refactor v1 ( #22505 )
* wip: server_tools
* feat: Integrate with `/tools` endpoint
* feat: Builtin + MCP + JSON Schema Tools WIP
* refactor
* displayName -> display_name
* snake_case everywhere
* rm redundant field
* feat: Improvements
* chore: update webui build output
* refactor: Updates after server updates
* chore: update webui build output
* change arg to --tools all
* feat: UI improvements
* chore: update webui build output
* add readme mention
* llama-gen-docs
* chore: update webui build output
* chore: update webui build output
* chore: update webui build output
* feat: Reorganize settings sections
* feat: Separate dialogs for MCP Servers Settings and Import/Export
* feat: WIP
* feat: WIP
* feat: WIP
* feat: WIP
* feat: WIP
* feat: WIP
* WIP on allozaur/20677-webui-server-tools
* feat: UI improvements
* chore: Update package lock
* chore: Run `npm audit fix`
* feat: UI WIP
* feat: UI
* refactor: Desktop Icon Strip DRY
* feat: Cleaner rendering and transition for ChatScreen
* feat: UI improvements
* feat: UI improvement
* feat: Remove MCP Server "enable" switch from Tools submenu
* chore: Run `npm audit fix`
* feat: WIP
* feat: Logic improvements
* refactor: Cleanup
* refactor: DRY
* test: Fix Chat Sidebar UI Tests
* chore: Update package lock
* refactor: Cleanup
* feat: Chat Message Action Card with Continue and Permission flow implementations
* feat: Add agentic steering messages, draft messages and improve chat UX
* fix: Search results UI
* test: Fix unit test
* feat: UI/UX improvements
* refactor: Simplify `useToolsPanel` access in components
* feat: Implement Processing Info Context API
* feat: Implement 'Go back to chat' functionality for settings
* feat: Enhance MCP Server management in Chat Form Attachments
* style: Minor UI and branding adjustments
* chore: Update webui static build output
* chore: Formatting, linting & type checks
* feat: Draft messages logic
* feat: UI improvements
* feat: Steering Messages improvements
* refactor: Cleanup
* refactor: Cleanup
* feat: Improve UI
* refactor: Settings navigation hook
* refactor: DRY code
* refactor: DRY ChatMessageUser UI components
* refactor: Desktop Icon Strip DRY
* refactor: Tools & permissions
* fix: Navigation condition
* refactor: Cleanup
* refactor: Cleanup
* refactor: Cleanup
* fix: preserve reasoning_content in agentic flow
* refactor: Storybook cleanup
* refactor: isInViewport util function
* refactor: Rename globally `onClick` to `onclick`
* chore: `npm audit fix`
* refactor: Action Icon usage
* refactor: Naming
* refactor: JS in `class` directive
* refactor: Chat components cleanup WIP
* refactor: Components structure
* refactor: Cleanup WIP
* feat: New ChatAttachmentsPreview component
* feat: UI improvements
* feat: UI improvements
* refactor: Cleanup
* refactor: ChatAttachmentsPreview UI/UX
* refactor: Remove dead code
* refactor: Cleanup
* fix: Model Name aliases displaying
* feat: Shortcut improvements
* refactor: Chat Message
* feat: Move Import/Export to settings
* refactor: Cleanup
* refactor: Cleanup
* refactor: Cleanup
* refactor: Cleanup
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2026-05-01 18:36:29 +02:00
Georgi Gerganov
80afa33aad
spec : fix draft model checkpoints ( #22521 )
* spec : fix draft model checkpoints
* cont : clean-up
* cont : gate the ngram-mod reset warning behind verbose flag
2026-04-30 08:32:18 +03:00
Georgi Gerganov
683c5acb90
spec : discard last drafted token with low prob ( #22506 )
2026-04-29 17:00:00 +03:00
Pascal
59237bfbbc
webui: fix slow mic stop and WAV encode ( #22480 )
* webui: instant mic stop, race-free recorder restart
* webui: faster WAV PCM encode via hoisted channels and Int16Array
* chore: update webui build output
* webui: drop setTimeout(0) hack and harden cancelRecording
* chore: update webui build output
2026-04-29 12:58:35 +02:00
Aleksander Grygier
f42e29fdf1
webui: Server tools ( #21237 )
* wip: server_tools
* feat: Integrate with `/tools` endpoint
* feat: Builtin + MCP + JSON Schema Tools WIP
* refactor
* displayName -> display_name
* snake_case everywhere
* rm redundant field
* feat: Improvements
* chore: update webui build output
* refactor: Updates after server updates
* chore: update webui build output
* change arg to --tools all
* feat: UI improvements
* chore: update webui build output
* add readme mention
* llama-gen-docs
* chore: update webui build output
* chore: update webui build output
* chore: update webui build output
* feat: Reorganize settings sections
* feat: Separate dialogs for MCP Servers Settings and Import/Export
* feat: WIP
* feat: WIP
* feat: WIP
* feat: WIP
* feat: WIP
* feat: WIP
* WIP on allozaur/20677-webui-server-tools
* feat: UI improvements
* chore: Update package lock
* chore: Run `npm audit fix`
* feat: UI WIP
* feat: UI
* refactor: Desktop Icon Strip DRY
* feat: Cleaner rendering and transition for ChatScreen
* feat: UI improvements
* feat: UI improvement
* feat: Remove MCP Server "enable" switch from Tools submenu
* chore: Run `npm audit fix`
* feat: WIP
* feat: Logic improvements
* refactor: Cleanup
* refactor: DRY
* test: Fix Chat Sidebar UI Tests
* chore: Update package lock
* refactor: Cleanup
* feat: Chat Message Action Card with Continue and Permission flow implementations
* feat: Add agentic steering messages, draft messages and improve chat UX
* fix: Search results UI
* test: Fix unit test
* feat: UI/UX improvements
* refactor: Simplify `useToolsPanel` access in components
* feat: Implement Processing Info Context API
* feat: Implement 'Go back to chat' functionality for settings
* feat: Enhance MCP Server management in Chat Form Attachments
* style: Minor UI and branding adjustments
* chore: Update webui static build output
* chore: Formatting, linting & type checks
* feat: Draft messages logic
* feat: UI improvements
* feat: Steering Messages improvements
* refactor: Cleanup
* refactor: Cleanup
* feat: Improve UI
* refactor: Settings navigation hook
* refactor: DRY code
* refactor: DRY ChatMessageUser UI components
* refactor: Desktop Icon Strip DRY
* refactor: Tools & permissions
* fix: Navigation condition
* refactor: Cleanup
* refactor: Cleanup
* refactor: Cleanup
* fix: preserve reasoning_content in agentic flow
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2026-04-28 14:35:49 +03:00
Georgi Gerganov
14e733e36f
spec : refactor params ( #22397 )
* spec : refactor params
* cont : fix
* cont : rename "sparam" to "sampling"
* cont : add spec params category
* cont : add info about removed arguments
* cont : skip param length check for spec params
* cont : adapt server tests
2026-04-28 09:07:33 +03:00
Aman Gupta
516e8d7a8a
server: use pos_next instead of n_tokens for m-rope ( #22439 )
2026-04-28 08:41:00 +03:00
tha80
983ca8992e
server: (router) Forward form-data to model server ( Fixes #22044 ) ( #22118 )
* This commit enables the router to forward form-data to the model server.
Fixes #22044 (enables use of /v1/audio/transcriptions in router mode)
* Applied the suggestion from Copilot's first comment: use the non-throwing json::parse overload.
* Addressed Copilot's third comment by extending the files representation to also include filename and content-type
* Addressed Copilot's fourth comment by making the RNG thread_local
* Changed variable body from std::string to std::ostringstream in build_multipart_body
as suggested by ngxson in https://github.com/ggml-org/llama.cpp/pull/22118#discussion_r3127099053
* Added sanitize_field lambda in build_multipart_body for key, filename and content_type
as suggested by ngxson in https://github.com/ggml-org/llama.cpp/pull/22118#discussion_r3127104647
* Explicitly check that value/item is a string before calling value/item.get<std::string>()
as requested by ngxson in https://github.com/ggml-org/llama.cpp/pull/22118#discussion_r3127111279
* Added double quote to the sanitize lambda and throw on json parse failure
---------
Co-authored-by: Ralph Paßgang <ralph@trust-it.de>
2026-04-27 23:55:00 +02:00
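The sanitize_field lambda mentioned in this commit guards multipart headers against injection from client-supplied values. A rough Python equivalent of the idea; the exact character set stripped by the C++ lambda is an assumption here:

```python
def sanitize_field(value: str) -> str:
    # Strip CR/LF and double quotes so a client-supplied key, filename or
    # content-type cannot break out of its multipart header line, e.g.
    #   Content-Disposition: form-data; name="<value>"; filename="<value>"
    return value.replace("\r", "").replace("\n", "").replace('"', "")
```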
unraido
ceaf47c4b1
fix: rpc-server cache may not work in Windows environments ( #22394 )
* fix: create directory and log cache file name.
* Remove GGML_LOG_INFO conditional compilation.
---------
Co-authored-by: kotaro <kotaro.kusunoki@gmail.com>
2026-04-27 17:25:09 +03:00
Max Krasnyansky
5594d13224
common: fix missing exports in llama-common ( #22340 )
* common: refactor common/debug to move abort_on_nan into base_callback_data
Passing bool abort_on_nan as template parameter for common_debug_cb_eval is unnecessary and creates an issue with LTO.
It should just be a member of the base_callback_data instead.
* cont : cleanup
* common : use pimpl in debug.h to reduce header dependencies
Move common_debug_cb_user_data's data members (std::regex,
std::vector<uint8_t>) into a private impl struct in debug.cpp.
This removes the includes of common.h and <regex> from debug.h,
reducing transitive dependencies for any translation unit that
includes the header.
Assisted-by: llama.cpp:local pi
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2026-04-27 08:06:39 +03:00
Piotr Wilkin (ilintar)
0adede866d
parser: fix structured output bug ( #22302 )
* fix very stupid structured output bug
* Things just cannot be too easy.
2026-04-24 23:19:55 +02:00
Georgi Gerganov
ffdd983fb8
server : fix swa-full logic ( #22288 )
2026-04-24 10:17:37 +03:00
Yes You Can Have Your Own
793d0a7931
server: rename debug tags to match --cache-idle-slots naming ( #22292 )
2026-04-24 09:28:44 +03:00
Ethan Turner
fa0b8a70a8
cli: Remove redundant local sampling variables ( #20429 ) ( #22264 )
...
This change implements the third requested change in issue 20429.
Because defaults.sampling contains the reasoning budget token count and
the reasoning budget message, it's not necessary to assign them to
struct variables.
2026-04-24 00:53:23 +02:00
srkizer
185cbff6f1
server : convert_anthropic_to_oai: also copy chat_template_kwargs ( #22154 )
2026-04-23 13:32:46 -05:00
Song Li
c78fb909b2
server: fix heap-buffer-overflow from negative n_discard (CVE-2026-21869) ( #22267 )
* server: clamp n_discard to non-negative at JSON parse boundary (CVE-2026-21869)
A negative n_discard from client JSON causes heap-buffer-overflow in
update_slots() context-shift loop (CWE-787, CVSS 8.8). Clamp to 0 at
ingress; n_discard=0 already triggers auto-discard (n_left/2).
Ref: GHSA-8947-pfff-2f3c
* cont : cleaner
* cont : cleanerer
* cont : cleanest
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2026-04-23 18:39:07 +02:00
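The fix clamps at the JSON parse boundary, so no negative value can ever reach the context-shift loop. A hedged sketch of that ingress check (the field access and default are illustrative; the server code is C++):

```python
def parse_n_discard(request_body: dict) -> int:
    # Clamp to non-negative at ingress. n_discard == 0 already triggers
    # the auto-discard path (n_left / 2), so mapping negatives to 0
    # preserves sensible behavior while closing the overflow (CWE-787).
    return max(0, int(request_body.get("n_discard", 0)))
```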
kvc0
c807c6e3b0
server: (anthropic API) fix prefix caching ( #21793 )
When testing claude code against llama.cpp, I noticed that only
n_past 18577 was used even when context was 60k or more. The log
in llama-server says:
```
slot update_slots: id 3 | task 10342 | old: ... ; cch= | defa0;You are
slot update_slots: id 3 | task 10342 | new: ... ; cch= | 1c8b4;
```
I observed that the cch value changed every time. Reading about that,
the x-anthropic-billing-header system message seems to be specially
handled inside of the anthropic api. I could remove it, but there
is a meaningful string sometimes included at the end. So instead,
I just replace the changing cch checksum with fffff.
I'm treating this as an anthropic message body API detail - I think this
is the right way to do this, but by all means please correct me!
It's always 5 hexadecimal characters, but I've written the replacement
defensively in case they change the protocol.
2026-04-23 17:45:02 +02:00
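The replacement described above can be sketched with a regex. The `cch=` field shape is inferred from the quoted log lines, and the `+` quantifier is deliberately defensive, matching the commit's note that the checksum is currently always 5 hex characters but the protocol might change:

```python
import re

def normalize_cch(message: str) -> str:
    # Replace the rotating checksum with a constant so the prompt prefix
    # stays byte-identical across requests and prefix caching can hit.
    return re.sub(r"cch=[0-9a-f]+", "cch=fffff", message)
```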
Matthias Straka
0dd7f915fd
cli : cleanup auto-completion code ( #21745 )
2026-04-23 15:03:28 +02:00
Tarek Dakhran
550d684bd1
server: Enable transcriptions API for LFM2-Audio ( #22000 )
2026-04-23 10:47:26 +02:00
Piotr Wilkin (ilintar)
8bccdbbff9
chat: fix parallel_tool_calls default setting based on model capabilities, add tests for parallel tool calls and structured outputs ( #22217 )
* chat: fix parallel_tool_calls default setting based on model capabilities, add tests for parallel tool calls and structured outputs
* Fix ty errors.
* Fix flake8 err
2026-04-22 18:10:56 +02:00