Commit Graph

  • e2751545b9 cont : inline verification gg/kv-mask-opt-verify Georgi Gerganov 2026-01-17 14:33:07 +02:00
  • e08d3ac323 tests : add test-kq-mask.cpp Georgi Gerganov 2026-01-17 14:04:20 +02:00
  • a89002f07b ggml webgpu: support for backend sampling (#18880) b7761 Reese Levine 2026-01-16 16:12:43 -08:00
  • 388ce82241 ggml : extend ggml_pool_1d + metal (#16429) b7760 Thore Koritzius 2026-01-16 15:59:56 +01:00
  • 490f6f70c0 cont : fix Georgi Gerganov 2026-01-16 16:16:37 +02:00
  • bac56aef91 cont : add explanation + improve Georgi Gerganov 2026-01-16 15:38:30 +02:00
  • 6ba6a3c76f docs : update ops.md for CANN backend (#18654) hipudding 2026-01-16 20:32:17 +08:00
  • 6628f5186a kv-cache : optimize KQ mask construction Georgi Gerganov 2026-01-14 17:35:24 +02:00
  • 0802d4cfb3 ggml-blas: hide warnings from included BLAS headers (#18818) b7758 Perry Naseck 2026-01-16 06:38:25 -05:00
  • c945aaaef2 mtmd : Fix ASR for LFM2.5-Audio-1.5B (#18876) b7757 Tarek Dakhran 2026-01-16 11:23:08 +01:00
  • c15395f73c common : implement new jinja template engine (#18462) b7756 Xuan-Son Nguyen 2026-01-16 11:22:06 +01:00
  • aa1dc3770a Setting mmap and direct_io to false as default in llama-bench.cpp (#18841) b7755 Julius Tischbein 2026-01-16 09:46:51 +01:00
  • 4ea2eaac01 CANN: Remove unused ggml_cann_get_device function (#18625) b7754 Raul Torres 2026-01-16 08:34:09 +00:00
  • e20fa27a02 CANN: fix an issue where get_env was not fully renamed (#18796) b7753 Chenguang Li 2026-01-16 16:24:04 +08:00
  • baa4ba0aec CANN: support gated linear attn (#18653) b7752 hipudding 2026-01-16 16:18:49 +08:00
  • 7b78bfa984 eagle3: add support for RedHtAI eagle3 speculator series models ruixiangw 2026-01-16 00:54:14 +00:00
  • 785a710085 OpenCL: add SOLVE_TRI op support (#18846) b7751 shaofeiqi 2026-01-15 11:17:17 -08:00
  • 6e7fc8a146 cuda : print less debug logs when disabling cuda graphs (#18868) b7750 Georgi Gerganov 2026-01-15 20:53:01 +02:00
  • be8e3d9515 context : do not reserve scheduler for warmups (#18867) b7749 Georgi Gerganov 2026-01-15 19:35:57 +02:00
  • 13f1e4a9ca llama : add adaptive-p sampler (#17927) b7748 ddh0 2026-01-15 11:16:29 -06:00
  • a04c2b06a3 server: improve slots scheduling for n_cmpl (#18789) b7747 Xuan-Son Nguyen 2026-01-15 17:10:28 +01:00
  • 39173bcacb context : reserve new scheduler when graph topology changes (#18547) b7746 Georgi Gerganov 2026-01-15 16:39:17 +02:00
  • 5c662d21a3 CUDA: fix allignment on register spill for FA (#18815) b7745 Johannes Gäßler 2026-01-15 15:14:50 +01:00
  • 8cc0ba957b ggml-cpu: optimize ggml_vec_dot_bf16 for Power9 (#18837) b7744 shalinib-ibm 2026-01-15 15:01:18 +05:30
  • a7e6ddb8bd lora: make sure model keep track of associated adapters (#18490) b7743 Xuan-Son Nguyen 2026-01-15 10:24:28 +01:00
  • 2a13180100 model-loader : support bool array sliding window pattern (#18850) b7742 Sigbjørn Skjæret 2026-01-15 10:12:46 +01:00
  • ec997b4f2b tests : download models only when running ctest (#18843) b7741 Adrien Gallouët 2026-01-15 09:47:29 +01:00
  • cff777f226 hexagon: support for OP_CPY, host buffers now optional, hvx-utils refactoring and optimizations (#18822) b7740 Max Krasnyansky 2026-01-14 21:46:12 -08:00
  • 36f0132464 CUDA: Factor out and re-use block_reduce function (#18785) b7739 Oliver Simons 2026-01-15 03:44:54 +01:00
  • d98b548120 Restore clip's cb() to its rightful glory - extract common debugging elements in llama (#17914) b7738 Piotr Wilkin (ilintar) 2026-01-14 20:29:35 +01:00
  • 8fb7175576 model : clean up and fix EXAONE-MoE configuration (#18840) b7737 Junwon Hwang 2026-01-15 03:38:21 +09:00
  • 60864997fe fit-params : print signed int for -ngl param gg/fit-params-ngl Georgi Gerganov 2026-01-14 19:59:23 +02:00
  • 516a4ca9b5 refactor : remove libcurl, use OpenSSL when available (#18828) b7736 Adrien Gallouët 2026-01-14 18:02:47 +01:00
  • 3e4bb29666 vulkan: Check maxStorageBufferRange in supports_op (#18709) b7735 Jeff Bolz 2026-01-14 03:59:05 -06:00
  • 47f9612492 llama-model: fix unfortunate typo (#18832) Aman Gupta 2026-01-14 17:55:15 +08:00
  • 01cbdfd7eb CUDA : fix typo in clang pragma comment [no ci] (#18830) Daniel Bevenius 2026-01-14 10:31:49 +01:00
  • 635ef78ec5 vulkan: work around Intel fp16 bug in mmq (#18814) Ruben Ortlam 2026-01-14 09:41:23 +01:00
  • 7d587e5544 ggml-metal: do not copy headers for embedded, use current binary dir for embedded (#18705) b7731 Perry Naseck 2026-01-14 02:22:25 -05:00
  • d34aa07193 mmap: add Haiku support by skipping RLIMIT_MEMLOCK check (#18819) b7730 Daniel Benjaminsson 2026-01-14 08:11:05 +01:00
  • f709c7a33f ci, tests : use cmake to download models and remove libcurl dependency (#18791) b7729 Adrien Gallouët 2026-01-14 07:46:27 +01:00
  • 6e36299b47 llama : print_info alignment fix (#18708) b7728 ddh0 2026-01-13 17:05:11 -06:00
  • 60591f01d4 model : add EXAONE MoE (#18543) b7727 Junwon Hwang 2026-01-14 07:28:38 +09:00
  • e4832e3ae4 vocab : fix attribute overrides for harmony (#18806) b7726 Georgi Gerganov 2026-01-13 17:40:13 +02:00
  • 960e5e3b46 llama-mmap: fix direct-io loading fallback EOF exception (#18801) b7725 Ruben Ortlam 2026-01-13 15:57:07 +01:00
  • 20ca2e12c4 model-conversion : remove -c 0 from model card template [no ci] (#18807) Daniel Bevenius 2026-01-13 14:13:10 +01:00
  • ea4a321f2a HIP: add fattn-mma-f16 for RDNA4 (#18481) b7723 yulo 2026-01-13 20:52:16 +08:00
  • 5292965711 Merge branch 'master' into xsn/lora_keep_track xsn/lora_keep_track Georgi Gerganov 2026-01-13 14:44:22 +02:00
  • c1e79e610f doc: ban AI-generated PR descriptions [no ci] (#18765) Johannes Gäßler 2026-01-13 13:43:12 +01:00
  • e047f9ee9d mtmd: fix use_non_causal being reported incorrectly (#18793) b7721 Xuan-Son Nguyen 2026-01-13 12:19:38 +01:00
  • 0a57271ab6 CUDA : fix unused argument when USE_CUDA_GRAPH=OFF (#18800) b7720 Georgi Gerganov 2026-01-13 12:25:53 +02:00
  • 076b0faf7d graph : clean up t5 input builders (#18795) b7719 Gabe Goodhart 2026-01-13 01:43:51 -07:00
  • db79dc06b1 llama-bench: add direct_io parameter (#18778) b7718 Ruben Ortlam 2026-01-13 08:49:10 +01:00
  • 537d4240d4 ci : remove libcurl in releases (#18775) b7717 Adrien Gallouët 2026-01-12 21:43:02 +01:00
  • bcf7546160 server : add arg for disabling prompt caching (#18776) b7716 Radoslav Gerganov 2026-01-12 19:21:34 +02:00
  • 36c5913c45 ci : use openssl for openEuler-latest-cmake-cann (#18779) Adrien Gallouët 2026-01-12 17:29:00 +01:00
  • 8e649571cd vendor : update cpp-httplib to 0.30.1 (#18771) b7714 Adrien Gallouët 2026-01-12 15:58:52 +01:00
  • 4150da9a95 examples : add --kv-unified to batched example (#18774) b7713 Daniel Bevenius 2026-01-12 13:47:58 +01:00
  • 8e2da778da vulkan: change memory_logger to be controlled by an env var (#18769) b7712 Jeff Bolz 2026-01-12 06:32:55 -06:00
  • ce3bf9b1a4 server: update docs for sleeping [no ci] (#18777) Xuan-Son Nguyen 2026-01-12 13:01:24 +01:00
  • 08b5d956fc minor : std::unordered_set over std::set pr/18490 Georgi Gerganov 2026-01-12 13:35:25 +02:00
  • 2bbe4c2cf8 vulkan: Use VK_EXT_shader_64bit_indexing to handle large mat_mul(_id) (#18678) b7710 Jeff Bolz 2026-01-12 05:32:13 -06:00
  • 1051ecd289 vulkan: Disable large coopmat matmul configuration on proprietary AMD driver (#18763) b7709 Ruben Ortlam 2026-01-12 07:29:35 +01:00
  • 0c3b7a9efe model: fix qwen3next broken due to #18683 (#18762) b7708 Xuan-Son Nguyen 2026-01-11 21:00:10 +01:00
  • 0e76501e1d Vulkan: Optimize Matmul parameters for AMD GPUs with Coopmat support (#18749) b7707 Ruben Ortlam 2026-01-11 17:33:33 +01:00
  • 4b060bf240 security: make it clear about subtopics in server (#18754) Xuan-Son Nguyen 2026-01-11 16:51:03 +01:00
  • 9789e28459 debug : include LLAMA_POOLING_TYPE_UNSPECIFIED in pooling check (#18692) b7705 Daniel Bevenius 2026-01-11 16:34:41 +01:00
  • 84ae04f163 tests : refactor test-backend-sampler (#18753) b7704 Georgi Gerganov 2026-01-11 17:31:03 +02:00
  • 506bb6e010 model: try to improve Qwen3 Next (#18683) b7703 Xuan-Son Nguyen 2026-01-11 12:53:33 +01:00
  • 79456a690a readme : update UIs (#18751) thom-dev-fr 2026-01-11 12:46:50 +01:00
  • 28068af789 security: narrow down the scope of what we consider a vulnerability (#18752) Xuan-Son Nguyen 2026-01-11 12:23:36 +01:00
  • 707cbafcaa opencl: add SOFTPLUS op support (#18726) b7700 shaofeiqi 2026-01-10 21:57:44 -08:00
  • 75883cde73 eagle3: add support for gpt-oss-120B eagle3 ruixiangw 2026-01-10 18:33:41 +00:00
  • 13a9f31de3 eagle3: make d2t mapping optional ruixiangw 2026-01-10 18:30:19 +00:00
  • b137718878 test-backend-ops: fix mxfp4 tests on blackwell (#18736) b7699 Aman Gupta 2026-01-11 01:12:57 +08:00
  • d2ff4e23ac HIP: adjust RDNA3.5 MMQ kernel selction logic (#18666) b7698 Johannes Gäßler 2026-01-10 17:19:01 +01:00
  • 657a2e644b cmake : update blas logic (#18205) b7697 Perry Naseck 2026-01-10 11:00:54 -05:00
  • f307926482 server : adjust unified KV cache tests (#18716) Georgi Gerganov 2026-01-10 17:51:56 +02:00
  • 7fdc8c893d scripts : follow api redirects in pr2wt.sh (#18739) Sigbjørn Skjæret 2026-01-10 16:04:05 +01:00
  • 23f82f2420 preset: allow named remote preset (#18728) b7694 Xuan-Son Nguyen 2026-01-10 15:12:29 +01:00
  • 3da288d78d eagle3: load lm_head from target model if not in draft model when convert GGUF ruixiangw 2026-01-10 14:09:50 +00:00
  • 2656c0d265 docs(ggml): update backend ops (#18734) Aaron Teo 2026-01-10 18:48:17 +08:00
  • 600a366478 Corrected: changed s13 = src1->nb[3] instead of nb[2] (#18724) b7692 Michael Wand 2026-01-10 01:16:07 -08:00
  • ea23c15990 common : add --license to display embedded licenses (#18696) b7691 Adrien Gallouët 2026-01-10 09:46:24 +01:00
  • 9ac2693a30 server: fix n_cmpl not skipping processing prompt (#18663) b7690 Xuan-Son Nguyen 2026-01-10 00:00:41 +01:00
  • a61c8bc3bf mtmd: Add Gemma3n multimodal support with MobileNetV5 vision encoder (#18256) b7689 Simranjeet Singh 2026-01-09 22:42:38 +00:00
  • 593da7fa49 opencl: add EXPM1 op (#18704) b7688 shaofeiqi 2026-01-09 10:13:13 -08:00
  • 9e41884dce Updates to webgpu get_memory (#18707) b7687 Reese Levine 2026-01-09 08:17:18 -08:00
  • 4a2751258a server : simplify prompt state transition branches gg/server-refactor Georgi Gerganov 2026-01-09 17:46:03 +02:00
  • ec8fd7876b Webui/file upload (#18694) Pascal 2026-01-09 16:45:32 +01:00
  • a180ba78c7 cmake: only build cli when server is enabled (#18670) b7685 Asbjørn Olling 2026-01-09 16:43:26 +01:00
  • cc5cafecf4 fix : nullptr task dereference Georgi Gerganov 2026-01-09 17:32:39 +02:00
  • aef22e7afc cont : reduce parent checks Georgi Gerganov 2026-01-09 16:44:07 +02:00
  • 9ceb268ee1 cont : remove redundant function Georgi Gerganov 2026-01-09 16:42:29 +02:00
  • a4854f0349 cont : improve n_cmpl logic Georgi Gerganov 2026-01-09 15:30:39 +02:00
  • caff0fd247 server : adjust unified KV cache tests gg/server-test-fix-race Georgi Gerganov 2026-01-09 14:26:14 +02:00
  • 71ba283a65 add eagle3 support for Qwen3 MoE models ruixiangw 2026-01-09 11:54:28 +00:00
  • f2d988db55 cont : cleanup Georgi Gerganov 2026-01-09 13:22:04 +02:00
  • 91fd50be1b Merge branch 'master' into pr/18663 Georgi Gerganov 2026-01-09 13:05:16 +02:00
  • 53eb9435da server : fix timing of prompt/generation (#18713) b7684 Georgi Gerganov 2026-01-09 12:59:50 +02:00
  • d3435efc8a scripts : pr2wt.sh reset to remote head (#18695) Georgi Gerganov 2026-01-09 12:16:40 +02:00