Commit Graph

  • 76a1b7fe8c tests : remove vocab member from test_model_context Daniel Bevenius 2025-12-17 11:46:36 +01:00
  • 9845996919 tests : use smart pointers for model and context Daniel Bevenius 2025-12-17 11:26:05 +01:00
  • 3e7f376b53 Merge branch 'master' into pr/18039 Georgi Gerganov 2025-12-17 12:09:41 +02:00
  • 9a9ea2f6b1 tests : use smart pointers for backend samplers Daniel Bevenius 2025-12-17 11:08:08 +01:00
  • 6853bee680 ci : clean up webui jobs (#18116) Sigbjørn Skjæret 2025-12-17 10:45:40 +01:00
  • 487674fbb3 common: fix --override-kv to support comma-separated values (#18056) Pascal 2025-12-17 10:36:23 +01:00
  • acec774ef6 HIP: Refactor mma for RDNA and CDNA (#17990) yulo 2025-12-17 16:34:54 +08:00
  • 5c0d18881e llama.android : Rewrite Android binding (w/o cpu_features dep) (#17413) b7446 Naco Siren 2025-12-17 00:14:47 -08:00
  • c5d44b8525 llama : fix typo in comment [no ci] Daniel Bevenius 2025-12-17 09:02:30 +01:00
  • 68a1c4dc51 llama : clarify backend_accept/backend_set_input comments [no ci] Daniel Bevenius 2025-12-17 09:00:46 +01:00
  • 4b2a4778f8 arg: allow -kvu flag for llama-perplexity (#18117) b7445 TrevorS 2025-12-16 22:33:02 -08:00
  • 58062860af ggml : use WARP_SIZE/2 for argmax reduction offset (#18092) b7444 Aadeshveer Singh 2025-12-17 09:17:01 +05:30
  • 2973a65ecb gguf-py : allow converting multi-tensor models from read-only locations (#18100) Yuri Khrustalev 2025-12-16 20:27:03 -05:00
  • d0794e89d9 llama-fit-params: force disable mlock (#18103) b7442 Johannes Gäßler 2025-12-17 00:50:12 +01:00
  • 9dcac6cf9f llama-fit-params: lower ctx size for multi GPU (#18101) b7441 Johannes Gäßler 2025-12-17 00:49:34 +01:00
  • 0e49a7b8b4 llama-fit-params: fix underflow for dense models (#18095) b7440 Johannes Gäßler 2025-12-17 00:47:37 +01:00
  • 4164596c76 llama-fit-params: QoL impr. for prints/errors (#18089) b7439 Johannes Gäßler 2025-12-17 00:03:19 +01:00
  • ef83fb8601 model: fix LFM2 missing tensors (#18105) b7438 Xuan-Son Nguyen 2025-12-16 19:07:43 +01:00
  • ac5667dcc6 fix eagle3 logits sync bug & remove ggml_set_sync() ruixiangw 2025-12-16 16:53:28 +00:00
  • 72a41fd960 fix missing tensor Xuan Son Nguyen 2025-12-16 17:34:20 +01:00
  • 7865a1519e Merge branch 'master' into tarek/feat/lfm2-asr-upstream Xuan Son Nguyen 2025-12-16 17:30:59 +01:00
  • a3ebc93d71 remove some redundant ggml_cont Xuan Son Nguyen 2025-12-16 17:20:13 +01:00
  • cea578bc8c rename functions to conformer Xuan Son Nguyen 2025-12-16 16:58:00 +01:00
  • ec98e20021 llama: fix early stop in params_fit if ctx is set (#18070) b7437 Johannes Gäßler 2025-12-16 14:24:00 +01:00
  • 59977eba7b server: fix crash when batch > ubatch with embeddings (#17912) b7436 yifant-code 2025-12-16 07:27:36 -05:00
  • 79dbae034a model-conversion : remove -fa option in model card template [no ci] (#18088) Daniel Bevenius 2025-12-16 13:25:09 +01:00
  • 7f2b2f3c77 arch: refactor LLM_TENSOR_NAMES (#18051) b7434 Xuan-Son Nguyen 2025-12-16 13:22:30 +01:00
  • 7b1db3d3b7 arg: clarify auto kvu/np being set on server (#17997) b7433 Xuan-Son Nguyen 2025-12-16 12:01:27 +01:00
  • a5251ca11d Optimization: Qwen3 next autoregressive pass (#17996) b7432 Piotr Wilkin (ilintar) 2025-12-16 11:59:53 +01:00
  • fb644247de CLI: fixed adding cli and completion into docker containers, improved docs (#18003) Andrew Aladjev 2025-12-16 13:52:23 +03:00
  • 5f5f9b4637 server: Update README.md incorrect argument (#18073) 2114L3 2025-12-16 20:50:43 +10:00
  • 3d86c6c2b5 model: support GLM4V vision encoder (#18042) b7429 Xuan-Son Nguyen 2025-12-16 11:25:26 +01:00
  • 9963b81f63 model-conversion : add note about verifying previous models (#18082) Daniel Bevenius 2025-12-16 11:17:40 +01:00
  • db81d5ec4b model-conversion : use CONVERTED_EMBEDDING_MODEL for embedding_verify_logits (#18079) Daniel Bevenius 2025-12-16 11:17:20 +01:00
  • c05aa69f32 common : add nemotron 3 parsing (#18077) b7426 Aldehir Rojas 2025-12-16 04:05:23 -06:00
  • 279cef27c2 added note for old Intel hardware pre sycl (#18017) Francisco Herrera 2025-12-16 04:45:09 -05:00
  • 5ba95754ee security : add collaborator guidance (#18081) Georgi Gerganov 2025-12-16 11:17:11 +02:00
  • ad1b60abc4 Merge remote-tracking branch 'upstream/master' into backend-sampling Daniel Bevenius 2025-12-16 09:45:08 +01:00
  • e47a082fc9 security : add collaborator guidance gg/security-update Georgi Gerganov 2025-12-16 10:16:46 +02:00
  • 2aa45ef9e3 llama: Include algorithm header needed for C++23 (#18078) b7423 Chris Peterson 2025-12-15 23:37:55 -08:00
  • c560316440 graph : reuse SSM graphs (#16490) b7422 Georgi Gerganov 2025-12-16 09:36:21 +02:00
  • d6742125c3 ci : separate webui from server (#18072) Sigbjørn Skjæret 2025-12-16 08:17:26 +01:00
  • 3034836d36 webui: Improve copy to clipboard with text attachments (#17969) Aleksander Grygier 2025-12-16 07:38:46 +01:00
  • a20979d433 webui: Add setting to always show sidebar on Desktop (#17809) Aleksander Grygier 2025-12-16 07:31:37 +01:00
  • 2995341730 llama : add support for NVIDIA Nemotron 3 Nano (#18058) b7418 Daniel Bevenius 2025-12-16 07:19:26 +01:00
  • 40d9c394f4 Webui: Disable attachment button and model selector button when prompt textbox is disabled. (#17925) Darius Lukas 2025-12-16 01:15:49 -05:00
  • ba9e59739c Address PR feedback Tarek Dakhran 2025-12-15 21:24:32 +01:00
  • f5b132a68c Remove rope_theta setting Tarek Dakhran 2025-12-15 20:03:00 +01:00
  • 0e8779a54c Fix comment Tarek Dakhran 2025-12-15 16:30:04 +01:00
  • 4f5d5212b8 Set rope_theta Tarek Dakhran 2025-12-15 14:43:03 +01:00
  • 145b6280d6 ASR with LFM2-Audio-1.5B Tarek Dakhran 2025-12-15 14:32:13 +01:00
  • d6a1e18c65 convert : move rope_parameters to TextModel class (#18061) b7416 Sigbjørn Skjæret 2025-12-15 22:03:16 +01:00
  • c45f89d551 ggml-hexagon: mm for mtmd (#17894) b7415 Shouyu 2025-12-15 13:53:56 -05:00
  • 9d52f17ae3 model : add KORMo model (#18032) b7414 HelloKS 2025-12-16 02:51:43 +09:00
  • 4529c660c8 kv-cache: Fix state restore fragmented cache (#17982) b7413 ssweens 2025-12-15 09:28:35 -08:00
  • 0f4f35e7be Fix unreadable user markdown colors and truncate long texts in deletion dialogs (#17555) Pascal 2025-12-15 16:34:53 +01:00
  • 165caaf5fb metal: use shared buffers on eGPU (#17866) b7411 Jeremy Demeule 2025-12-15 15:14:49 +01:00
  • 96a181a933 mtmd: refactor audio preprocessing (#17978) b7410 Xuan-Son Nguyen 2025-12-15 14:16:52 +01:00
  • 4a4f7e6550 cli: fixed dead links to tools/main for cli and completion, fixed code owners (#17993) Andrew Aladjev 2025-12-15 13:47:04 +03:00
  • e73d548659 webui: add "delete all conversations" button to import/export tab (#17444) Thomas Jarosch 2025-12-15 11:29:29 +01:00
  • e5737f665f Apply automated code-formating to softmax.cu Oliver Simons 2025-12-15 11:05:17 +01:00
  • 3732b85b09 Fix data-race in soft_max_f32_parallelize_cols_single_row Oliver Simons 2025-12-15 11:01:12 +01:00
  • b1f3a6e5db llama: automatically set parameters not set by the user in such a way that maximizes GPU utilization (#16653) Johannes Gäßler 2025-12-15 09:24:59 +01:00
  • 4aced7a631 [SYCL] Support gpt-oss by OPs add-id, mul_mat for mxfp4, swiglu_oai (#17826) b7406 Neo Zhang Jianyu 2025-12-15 10:35:15 +08:00
  • 745fa0e78b model : add glm-asr support (#17901) b7405 piDack 2025-12-15 10:18:46 +08:00
  • 52392291b2 preset: handle negated arg, reverse the meaning if needed (#18041) b7404 Xuan-Son Nguyen 2025-12-14 22:08:10 +01:00
  • 4574ab6f40 preset: handle negated arg, reverse the meaning if needed xsn/preset_fix_neg_arg Xuan Son Nguyen 2025-12-14 21:44:41 +01:00
  • 8fac4b1cc8 feat: add EAGLE3 speculative decoding support ruixiangw 2025-12-14 18:12:33 +00:00
  • 5c8a717128 convert : refactor rope scaling handling (#18013) Sigbjørn Skjæret 2025-12-14 16:04:37 +01:00
  • 37f5a1093b mtmd: enhance image resizing in llava_uhd (#18014) b7402 Haowei Wu 2025-12-14 22:57:52 +08:00
  • 2652e745ef webui : fix lint Georgi Gerganov 2025-12-14 16:45:07 +02:00
  • 0086c246ee Merge branch 'master' into HEAD Georgi Gerganov 2025-12-14 16:44:30 +02:00
  • 9e6649ecf2 vulkan: fix mul_mat_vec_iq1_s formatting (#18026) b7401 Ruben Ortlam 2025-12-14 14:52:46 +01:00
  • 0759b09c90 graph: add f_attn_temp_offset (#18025) b7400 Xuan-Son Nguyen 2025-12-14 13:05:59 +01:00
  • 357f999381 graph: add f_attn_temp_offset xsn/llama4_scaling_offset Xuan Son Nguyen 2025-12-14 12:12:12 +01:00
  • 22c7f85b9c Merge branch 'master' into HEAD Georgi Gerganov 2025-12-14 10:19:58 +02:00
  • 254098a279 common : refactor common_sampler + grammar logic changes (#17937) b7399 Georgi Gerganov 2025-12-14 10:11:13 +02:00
  • 3238b1400c vulkan: Fix data race/hang in scalar/cm1 flash attention (#17887) b7398 Jeff Bolz 2025-12-14 02:00:00 -06:00
  • 4722671641 vulkan: improve mul_mat_vec_iq1_s speed (#17874) b7397 lovedheart 2025-12-14 08:47:49 +01:00
  • d15d177f43 vulkan: faster q6_k matmul (#17813) Eve 2025-12-14 07:29:37 +00:00
  • 77ad8542bd model-conversion : cast logits to float32 (#18009) Georgi Gerganov 2025-12-14 08:58:13 +02:00
  • 609a2d0268 models : fix YaRN regression + consolidate logic (#18006) b7394 Georgi Gerganov 2025-12-14 08:34:56 +02:00
  • a63cbafbbc ggml : arm repack fix build b7393 Georgi Gerganov 2025-12-13 22:54:14 +02:00
  • 0e59224990 sync : ggml Georgi Gerganov 2025-12-13 10:07:07 +02:00
  • 71fdcf0616 ggml : arm repack fix build (whisper/0) Georgi Gerganov 2025-12-13 08:04:09 +02:00
  • 615655aafe cmake : set CMAKE_RUNTIME_OUTPUT_DIRECTORY for non standalone build (ggml/1394) Congcong Cai 2025-12-12 22:37:38 +08:00
  • c00ff929dc scripts: add script to compare logprobs of llama.cpp against other frameworks (#17947) b7389 Xuan-Son Nguyen 2025-12-13 22:33:29 +01:00
  • 4ed2bae50d server-models.cpp: add missing <filesystem> (#18000) b7388 Sergey Fedorov 2025-12-14 05:02:43 +08:00
  • 292f8e231c model-conversion : cast logits to float32 gg/fix-logits-type Georgi Gerganov 2025-12-13 22:24:21 +02:00
  • 5266379bca llama_context: synchronize before reallocating output buffer (#17974) b7387 Jeff Bolz 2025-12-13 09:19:51 -06:00
  • 4d5ae24c0a arg: fix common_params_parse not accepting negated arg (#17991) b7386 Xuan-Son Nguyen 2025-12-13 12:53:37 +01:00
  • 66ba51252e cmake: correct scope - link ws2_32 for MinGW/w64devkit builds in cpp-httplib (#17972) b7385 Gustavo Rocha Dias 2025-12-13 08:46:36 -03:00
  • 36255a2268 vulkan: support get_rows for i32 (#17941) b7384 Jeff Bolz 2025-12-13 03:12:53 -06:00
  • 3229a23fa6 vulkan: support GGML_OP_DIAG (#17893) b7383 Jeff Bolz 2025-12-13 03:07:49 -06:00
  • 303f8615e9 vulkan: Multi-pass softmax for large number of cols (#17892) b7382 Jeff Bolz 2025-12-13 03:04:29 -06:00
  • 3c6391e748 speculative-simple : free batch on exit (#17985) b7381 Georgi Gerganov 2025-12-13 09:48:34 +02:00
  • 8e4d678528 common : skip model validation when --completion-bash is requested (#17975) b7380 Sigbjørn Skjæret 2025-12-13 08:40:50 +01:00
  • 07a10c1090 vulkan: Allow non-pow2 n_experts in topk_moe (#17872) b7379 Jeff Bolz 2025-12-13 01:40:04 -06:00
  • 2bc94e7928 add llama-completion to completion-bash executables (#17976) b7378 Sigbjørn Skjæret 2025-12-13 08:35:50 +01:00
  • fd1085ffb7 model-conversion : use CONVERTED_MODEL value for converted model [no ci] (#17984) Daniel Bevenius 2025-12-13 08:34:26 +01:00
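A listing like the one above can be approximated directly with `git log`. This is a minimal sketch; the exact column layout above (bullets, tag and branch annotations such as `b7446`) comes from a graph viewer and is not reproduced one-to-one by plain `git log` format strings.

```shell
# Print each commit as: abbreviated hash, subject, author, ISO date.
# Run inside a clone of the repository whose history you want to list.
git log --date=iso --pretty=format:'%h %s %an %ad'

# Add --graph to draw ancestry lines, or limit the range, e.g.:
#   git log --graph --date=iso --pretty=format:'%h %s %an %ad' -n 100
```

The `%h %s %an %ad` placeholders expand to the abbreviated hash, subject line, author name, and author date, matching the fields shown in the listing.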