Commit Graph

  • c0a351cc3b tests : revert server test changes (no longer needed) Georgi Gerganov 2025-12-24 10:45:58 +02:00
  • 0ce03597e8 Merge branch 'master' into HEAD Georgi Gerganov 2025-12-24 10:33:21 +02:00
  • 7f459c98e7 vulkan: use fewer FA rows for small cache runs (#18280) b7527 Ruben Ortlam 2025-12-24 08:59:14 +01:00
  • cf2ffc02bc CANN: Uses yarn_ramp cache in ROPE (#17725) b7526 TianHao324 2025-12-24 14:55:33 +08:00
  • 10355dc7d0 common: add LLAMA_ARG_OVERRIDE_TENSOR env var for -ot arg (#18267) b7525 ddh0 2025-12-24 00:19:12 -06:00
  • 5ee4e43f26 server: return_progress to also report 0% processing state (#18305) b7524 Xuan-Son Nguyen 2025-12-23 21:49:05 +01:00
  • 5b6c9bc0f3 webui: apply webui_settings on first load (#18223) Pascal 2025-12-23 15:48:03 +01:00
  • 849d021104 server: fix crash with model not having BOS/EOS (#18321) b7522 Xuan-Son Nguyen 2025-12-23 14:39:36 +01:00
  • 8e3ead6e4d model-conversion : add device option to run-org-model.py (#18318) Daniel Bevenius 2025-12-23 14:07:25 +01:00
  • 12ee1763a6 rpc : add check for rpc buffer type (#18242) b7520 Chris Rohlf 2025-12-23 04:56:49 -05:00
  • ed75977717 ggml-hexagon: create generalized functions for cpu side op (#17500) b7519 nullname 2025-12-23 15:13:24 +08:00
  • 847c35f7d5 model-conversion : add trust_remote_code for embedding scripts (#18288) Daniel Bevenius 2025-12-23 07:27:37 +01:00
  • a6a552e4ec [SYCL] replace llama-cli by llama-completion to rm the impact to test script (#18290) Neo Zhang 2025-12-23 12:59:12 +08:00
  • 96e33a814e model : fix div-by-zero for Nemotron V2 (#18309) b7516 Alessandro98-git 2025-12-23 03:04:57 +01:00
  • dfc959b886 model : Granite Embedding support (#15641) b7515 Ryan Mangeno 2025-12-22 18:28:19 -05:00
  • 8f48807380 gguf-py : do not align the data start offset (#18291) compilade 2025-12-22 14:25:16 -05:00
  • bf6bc3c155 ggml-hexagon: gelu optimization (#18151) b7513 Shouyu 2025-12-22 13:56:52 -05:00
  • 179fd82a72 gen-docs: automatically update markdown file (#18294) b7512 Xuan-Son Nguyen 2025-12-22 19:30:19 +01:00
  • d34d5ca1e9 llamafile: add rvv support for sgemm kernels (#18199) b7511 Taimur Ahmad 2025-12-22 23:20:23 +05:00
  • eb492bf43f opencl: unpack q4_0 for adreno in get_tensor (#18278) b7510 lhez 2025-12-22 10:19:01 -08:00
  • e3b35ddf1c vulkan: Extend rope fusions to allow mrope (#18264) b7509 Jeff Bolz 2025-12-22 11:03:13 -06:00
  • 5f14aa8e43 gguf-py : do not align the data start offset compilade/fix-safetensors-unaligned Francis Couture-Harpin 2025-12-22 09:49:54 -05:00
  • 6ce863c803 server: prevent data race from HTTP threads (#18263) b7508 Xuan-Son Nguyen 2025-12-22 14:23:34 +01:00
  • 3997c78e33 server: fix data race in to_json_anthropic (#18283) b7507 Xuan-Son Nguyen 2025-12-22 13:21:43 +01:00
  • ee74642982 release: update release workflow to store XCFramework as Zip file (#18284) b7506 Mattt 2025-12-22 04:11:46 -08:00
  • a28310488c convert: rework ftype heuristics (#18214) Aaron Teo 2025-12-22 20:03:49 +08:00
  • 86af848153 server: (docs) remove mention about extra_args (#18262) Xuan-Son Nguyen 2025-12-22 12:22:01 +01:00
  • 147a521636 tool/ex/tests: consistently free ctx, then model (#18168) b7503 Johannes Gäßler 2025-12-22 11:00:37 +01:00
  • f1310ab904 Merge remote-tracking branch 'upstream/master' into backend-sampling Daniel Bevenius 2025-12-22 06:46:54 +01:00
  • e1f15b454f vulkan: Implement set_tensor_async and the event interfaces (#18047) b7502 Jeff Bolz 2025-12-21 14:52:09 -06:00
  • 0e1ccf15c7 llama: fix RPC for -fit on (#18233) b7501 Johannes Gäßler 2025-12-21 19:33:08 +01:00
  • 5e25ddebff move copilot instructions to AGENTS.md (#18259) Xuan-Son Nguyen 2025-12-21 19:09:21 +01:00
  • fd05c51cec vulkan: fix im2col overflowing maxworkgroupcount (#18180) b7499 Jeff Bolz 2025-12-21 03:32:58 -06:00
  • b365c3ff01 vulkan/cuda: fix topk_moe with exp_probs_b (#18071) b7498 Jeff Bolz 2025-12-21 03:27:34 -06:00
  • cb64222b0c vulkan: support GGML_UNARY_OP_XIELU (#18062) b7497 Jeff Bolz 2025-12-21 03:17:58 -06:00
  • 6eb7081860 vulkan: in graph_optimize, try to group ADD operations (#18060) b7496 Jeff Bolz 2025-12-21 03:05:08 -06:00
  • 4117ae5557 Vulkan: some improvement on mul_mat_iq2_xs (#18031) b7495 lovedheart 2025-12-21 09:59:52 +01:00
  • 65e96a2464 docs : fix links in parsing.md (#18245) Daniel Bevenius 2025-12-21 09:35:40 +01:00
  • 9496bbb808 common : reorganize includes to prioritize vendored deps (#18222) b7493 Aldehir Rojas 2025-12-20 21:43:21 -06:00
  • ddcb75dd8a server: add auto-sleep after N seconds of idle (#18228) b7492 Xuan-Son Nguyen 2025-12-21 02:24:42 +01:00
  • 52ab19df63 tests: Avoid floating point precision false positives in SUM (#17471) b7491 Jeff Bolz 2025-12-20 13:46:46 -06:00
  • 5182dd64cd test-backend-ops: improve msvc build time (#18209) b7490 Jeff Bolz 2025-12-20 13:45:45 -06:00
  • 10b4f82d44 Added comments explaining thread block size selection logic based on row count and column size, derived from historical commit context (#18212) b7489 Aadeshveer Singh 2025-12-20 16:58:57 +05:30
  • 408616adbd server : [easy] fix per round speculative decode logging (#18211) b7488 Oleksandr Kuvshynov 2025-12-20 04:57:40 -05:00
  • 9e39a1e6a9 server: support load model on startup, support preset-only options (#18206) b7487 Xuan-Son Nguyen 2025-12-20 09:25:27 +01:00
  • 74e05131e9 ci : remove non-windows zip artifacts (#18201) b7486 Sigbjørn Skjæret 2025-12-19 22:29:46 +01:00
  • f74747d886 ci : only save ccache on master (#18207) Sigbjørn Skjæret 2025-12-19 22:29:37 +01:00
  • ce734a8a2f ggml-hexagon: Implement true Q8_0 quantization on Hexagon NPU for more accurate mixed-precision matmul operations (#17977) b7484 Alfred 2025-12-19 12:42:28 -05:00
  • 14931a826e arg: fix order to use short form before long form (#18196) b7483 Pascal 2025-12-19 18:01:56 +01:00
  • 1da013c66e Build with CCCL 3.2 for CUDA backends Oliver Simons 2025-12-19 16:10:51 +01:00
  • f99ef53d2a llama : Changing off_t to size_t for Windows (#18204) b7482 Julius Tischbein 2025-12-19 15:42:46 +01:00
  • b5ec0fd76c Update CCCL version to v3.2.0-rc2 Oliver Simons 2025-12-19 13:42:27 +01:00
  • cc0a04343e server: friendlier error msg when ctx < input (#18174) b7481 Aman Gupta 2025-12-19 19:10:00 +08:00
  • 98c1c7a7bf presets: refactor, allow cascade presets from different sources, add global section (#18169) b7480 Xuan-Son Nguyen 2025-12-19 12:08:20 +01:00
  • 0a17687c72 Make backend dist sampler use same rnd's as dist sampler Oliver Simons 2025-12-19 11:43:19 +01:00
  • 1750917420 Fix different RNG-states between backend-sampling and llama-sampling Oliver Simons 2025-12-19 11:42:10 +01:00
  • acb73d8340 webui: Add editing attachments in user messages (#18147) Aleksander Grygier 2025-12-19 11:14:07 +01:00
  • bc5195c585 Merge remote-tracking branch 'upstream/master' into backend-sampling Daniel Bevenius 2025-12-19 09:38:01 +01:00
  • 0a271d82b4 model-conversion : add verbose flag in run-org-model.py (#18194) Daniel Bevenius 2025-12-19 08:43:16 +01:00
  • 52fc7fee8a android: fix missing screenshots for Android.md (#18156) Naco Siren 2025-12-18 23:32:04 -08:00
  • cdbada8d10 vulkan: Add perf logger mode with concurrency (#17944) b7476 Jeff Bolz 2025-12-18 23:36:46 -06:00
  • 8ea958d4d9 model : add ASR support for LFM2-Audio-1.5B (conformer) (#18106) b7475 Xuan-Son Nguyen 2025-12-19 00:18:01 +01:00
  • a95df75322 add test model tarek/feat/lfm2-asr-upstream Xuan Son Nguyen 2025-12-18 23:44:01 +01:00
  • 1bd8d1870f clean up Xuan Son Nguyen 2025-12-18 23:39:00 +01:00
  • d244fe1a3e remove redundant reshape Xuan Son Nguyen 2025-12-18 23:35:47 +01:00
  • 4239bf2cde add prefix "a." for conv tensors Xuan Son Nguyen 2025-12-18 23:29:14 +01:00
  • f9ec8858ed webui: display prompt processing stats (#18146) Pascal 2025-12-18 17:55:03 +01:00
  • f716588e63 ggml-cpu: extend support for RVV floating-point kernels (#17318) Taimur Ahmad 2025-12-18 19:02:09 +05:00
  • 4d1316c440 arg: fix ASAN error on sampler_type_names empty (#18167) b7472 Xuan-Son Nguyen 2025-12-18 14:30:32 +01:00
  • ec7b9329ae gguf-py : use copy-on-write mode for localtensor (#18162) Sigbjørn Skjæret 2025-12-18 13:45:38 +01:00
  • 54189c0d39 remove i_major_dual (#18157) b7470 yulo 2025-12-18 19:50:56 +08:00
  • 9ce64aed7d webui: Fix selecting generated output issues during active streaming (#18091) Aleksander Grygier 2025-12-18 11:13:52 +01:00
  • 900316da4e webui: fix chat screen shadow width (#18010) Kim S. 2025-12-18 11:08:42 +01:00
  • 3b3f5fed31 common : disable backend sampling when grammar is involved Georgi Gerganov 2025-12-18 10:52:21 +02:00
  • eefdb0da17 Merge branch 'master' into HEAD Georgi Gerganov 2025-12-18 10:12:47 +02:00
  • 57c1e05643 llama: offload output layer to GPU first (#18148) Johannes Gäßler 2025-12-18 08:12:18 +01:00
  • 9cff4cc554 convert : sort and use file parts from model index if present (#18043) Sigbjørn Skjæret 2025-12-18 07:54:54 +01:00
  • 4d4f4cacd1 llama : Async DirectIO model loading on Linux (#18012) Julius Tischbein 2025-12-18 07:27:19 +01:00
  • 6b1394ed74 prof: fix tensor dims formatter graph-profiler Max Krasnyansky 2025-11-18 17:58:15 -08:00
  • 26ec40967c profiler: output all tensor names Max Krasnyansky 2025-07-24 19:14:41 -07:00
  • 6a5af05973 profiler: initial support for profiling graph ops Max Krasnyansky 2025-07-24 16:18:19 -07:00
  • 0a0bba05e8 ggml-hexagon: swiglu_oai operation (#18114) b7464 Shouyu 2025-12-17 16:38:21 -05:00
  • 5166aaf868 convert : force patch_merger tensors to f16/f32 (#18124) Sigbjørn Skjæret 2025-12-17 22:15:53 +01:00
  • 6ce3d85796 server: (webui) add --webui-config (#18028) Pascal 2025-12-17 21:45:45 +01:00
  • e85e9d7637 server: (router) disable SSL on child process (#18141) Xuan-Son Nguyen 2025-12-17 21:39:08 +01:00
  • 8dcc3662a2 llama-fit-params: fix memory print (#18136) Johannes Gäßler 2025-12-17 21:10:03 +01:00
  • d37fc93505 webui: fix chat header width when sidebar is closed (#17981) Kim S. 2025-12-17 20:05:45 +01:00
  • 4470a0764a ggml-hexagon: gelu operation (#17921) Shouyu 2025-12-17 13:39:32 -05:00
  • 4301e27319 common : restore grammar-based rejection sampling (#18137) Georgi Gerganov 2025-12-17 19:46:00 +02:00
  • a2c199e479 common: clarify instructions for bug reports (#18134) Johannes Gäßler 2025-12-17 18:44:13 +01:00
  • 15dd67d869 model: fix GLM-ASR-Nano-2512 load error (#18130) (#18142) HonestQiao 2025-12-17 23:34:35 +08:00
  • 981475fedc tests : add --device option support to backend sampler tests Daniel Bevenius 2025-12-17 15:27:23 +01:00
  • bde461de8c server: (router) allow child process to report status via stdout (#18110) Xuan-Son Nguyen 2025-12-17 14:54:11 +01:00
  • 5a79c1900f eagle3 : improve naming Georgi Gerganov 2025-12-17 15:49:03 +02:00
  • 8faa87db02 Extend run-org-model.py, add (a) batching (b) loading prompt from file (c) multimodal capacity (#18034) Piotr Wilkin (ilintar) 2025-12-17 14:21:51 +01:00
  • a519aea35c tests : fix batch token position tracking in test_backend_sampler.cpp Daniel Bevenius 2025-12-17 13:49:39 +01:00
  • 6f1f6a961a Github: ask for -v logs for params_fit [no ci] (#18128) Johannes Gäßler 2025-12-17 13:46:48 +01:00
  • 669696e00d ggml-cpu: ARM64: repack version of q8_0 (dotprod and i8mm) (#18096) Alberto Cabrera Pérez 2025-12-17 11:39:13 +00:00
  • 982060fadc model: fix LFM2_MOE missing tensors (#18132) Tarek Dakhran 2025-12-17 12:17:11 +01:00
  • cc31e6a20e tests : extract batch info update to separate method Daniel Bevenius 2025-12-17 11:53:15 +01:00