Commit Graph

  • 914dde72ba ggml : unary ops support non-cont src0 + metal F16 unary ops (#19511) b8005 Georgi Gerganov 2026-02-11 18:58:43 +02:00
  • 3136a849db common : remove unused token util functions (#19506) b8004 Daniel Bevenius 2026-02-11 17:41:35 +01:00
  • e463bbdf65 model: Add Kimi-K2.5 support (#19170) b8003 AesSedai 2026-02-11 07:47:30 -08:00
  • 76d9439276 move allocation workaround out of ggml-alloc.c Johannes Gäßler 2026-02-11 15:21:58 +01:00
  • 4dc3d10e80 Remove shfl and AllReduce from backend interface Johannes Gäßler 2026-02-11 14:51:37 +01:00
  • 29c5327d01 GGML: HIP: add RCCL support Carl Philipp Klemm 2026-02-11 13:42:23 +01:00
  • e7fbfc9b80 ci : tmp fixes pr/19378-test-tp Georgi Gerganov 2026-02-11 15:48:40 +02:00
  • 8de41b5b40 NCCL support Johannes Gäßler 2026-02-10 21:01:59 +01:00
  • c531444411 fix output pattern Johannes Gäßler 2026-02-09 22:40:30 +01:00
  • c925563499 re-use buffers + ggml contexts Johannes Gäßler 2026-02-08 23:45:10 +01:00
  • 02325685ae unconditional peer access Johannes Gäßler 2026-02-07 23:34:01 +01:00
  • 2ffa49decc add support for 4/8 GPUs Johannes Gäßler 2026-02-07 19:18:36 +01:00
  • 4b8aa26650 partial Vulkan fix Johannes Gäßler 2026-02-07 00:19:36 +01:00
  • ab69c58aaa support for GPT-OSS, Qwen 3 MoE Johannes Gäßler 2026-02-06 17:09:01 +01:00
  • a0d9dd20ee ggml: backend-agnostic tensor parallelism Johannes Gäßler 2026-01-14 15:52:53 +01:00
  • 53de59f67d build : fix case in dSYMs path for build-macos [no ci] (#19515) Daniel Bevenius 2026-02-11 14:02:29 +01:00
  • 9ab072ebbe metal : extend l2_norm support for non-cont src0 (#19502) b8001 Georgi Gerganov 2026-02-11 14:53:19 +02:00
  • d46bd7ef2d Apply suggestion from @ggerganov (src->buffer to buf_src) v2 Andreas Kieslinger 2026-02-06 09:55:51 +01:00
  • 070933684f Apply suggestion from @ggerganov (src->buffer to buf_src) Andreas Kieslinger 2026-02-06 09:55:19 +01:00
  • 1528c841dc Simplifies synchronizations to adhere to saaasg pattern. aendk 2026-01-19 17:45:33 +01:00
  • ff28ae93a2 Corrects initialization of ggml_backend_sync_mode in ggml_backend_sched_split initialization aendk 2026-01-16 10:43:56 +01:00
  • 01d89f9b96 Reintroduces stricter check for CPU->CUDA backend async copy via GGML_DEVICE_TYPE_CPU. aendk 2026-01-12 15:35:38 +01:00
  • e74b070e30 Makes opt-in to relax use of explicit syncs more general. Backends like vulkan which require a synchronization between HtoD copies and graph execution could also adopt this change now. aendk 2026-01-12 14:16:01 +01:00
  • b7376c3ed7 Minor cleanup aendk 2026-01-09 17:07:19 +01:00
  • d776354dc9 Relax requirement of checks in async CUDA copies from backend and buffer type to just buffer type, to avoid linking issues aendk 2025-12-19 11:30:03 +01:00
  • 79a77277ad Reworked backend detection in ggml-backend.cpp to avoid linking conflicts aendk 2025-12-18 10:25:14 +01:00
  • 44e481bb34 Adds macro guards to allow compilation in non-CUDA builds aendk 2025-12-16 17:41:40 +01:00
  • 91c6026b5c Exchanges synchronous copy with async copy function. aendk 2025-12-16 17:21:00 +01:00
  • 2ad0d391e1 Adds function to relax sync requirements between input copies on supported backends (CUDA for now) aendk 2025-12-16 16:51:54 +01:00
  • dd9f1faf42 Adds CPU-to-CUDA copy capability to ggml_backend_cuda_cpy_tensor_async() aendk 2025-12-15 10:44:38 +01:00
  • a554bdd70f metal : fix event synchronization in cpy_tensor_async (#19402) Georgi Gerganov 2026-02-07 07:37:15 +02:00
  • ada90bf2ba docs: ban AI for issues and discussions [no CI] (#19512) Johannes Gäßler 2026-02-11 12:49:40 +01:00
  • 0c1f39a9ae common : improve download error reporting (#19491) b7999 Adrien Gallouët 2026-02-11 09:27:55 +01:00
  • 73cd5e1b97 hexagon: Add ARGSORT, DIV, SQR, SQRT, SUM_ROWS, GEGLU (#19406) b7998 Max Krasnyansky 2026-02-10 23:21:12 -08:00
  • 8ee538ce73 llama : correct typos 'occured' and 'occurences' (#19414) b7997 thecaptain789 2026-02-11 06:05:31 +00:00
  • 6d95707827 model : fix wavtokenizer embedding notions (#19479) b7996 Georgi Gerganov 2026-02-11 07:52:20 +02:00
  • 89181c0b6d ggml : extend bin bcast for permuted src1 (#19484) b7995 Georgi Gerganov 2026-02-11 07:52:00 +02:00
  • ceaa89b786 metal : consolidate unary ops (#19490) b7994 Georgi Gerganov 2026-02-11 07:51:12 +02:00
  • 2cce9fddb7 llama : refactor sampling_info to use buffer_view template (#19368) b7993 Daniel Bevenius 2026-02-11 05:38:13 +01:00
  • 5372fc6461 wip gg/qwen3-next-opt-tmp Georgi Gerganov 2026-02-10 23:44:42 +02:00
  • 0cc02542a8 wip Georgi Gerganov 2026-02-10 23:36:46 +02:00
  • 612db61886 CUDA : Update CCCL-tag for 3.2 to final release from RC (#19486) b7992 Oliver Simons 2026-02-10 22:31:19 +01:00
  • 08358235a3 wip Georgi Gerganov 2026-02-10 22:46:09 +02:00
  • bd7c16f0a4 metal : extend l2_norm support for non-cont src0 Georgi Gerganov 2026-02-10 22:45:59 +02:00
  • 6bd21ebb29 wip Georgi Gerganov 2026-02-10 22:04:37 +02:00
  • 862d720ad1 wip Georgi Gerganov 2026-02-10 21:53:34 +02:00
  • 89dd9f6a10 wip Georgi Gerganov 2026-02-10 21:24:39 +02:00
  • e480e383fd wip Georgi Gerganov 2026-02-10 20:50:47 +02:00
  • 1c312dc758 wip Georgi Gerganov 2026-02-10 20:33:38 +02:00
  • 835a949286 wip Georgi Gerganov 2026-02-10 20:02:06 +02:00
  • b1264663c2 wip Georgi Gerganov 2026-02-10 19:36:49 +02:00
  • e2c0463eab tests : simplify Georgi Gerganov 2026-02-10 17:33:57 +02:00
  • c82bc9c030 cont : s0 is always 1 Georgi Gerganov 2026-02-10 17:12:27 +02:00
  • 029c30fda4 cont : extend bin support Georgi Gerganov 2026-02-10 12:44:50 +02:00
  • 0b0bfb20f4 tests : extend bin bcast for permuted src1 Georgi Gerganov 2026-02-10 12:23:07 +02:00
  • ff77be289e metal : consolidate unary ops Georgi Gerganov 2026-02-10 14:51:13 +02:00
  • 57487a64c8 [WebGPU] Plug memory leaks and free resources on shutdown (#19315) b7991 Nikhil Jain 2026-02-10 08:04:00 -08:00
  • fc0fe40049 models : support qwen3.5 series (#19468) b7990 JJJYmmm 2026-02-11 00:00:26 +08:00
  • 9a96352729 test: fix IMROPE perf test case (#19465) b7989 Xuan-Son Nguyen 2026-02-10 14:37:50 +01:00
  • b9b56b017e Apply suggestion from @ggerganov (src->buffer to buf_src) v2 pr/17795-test-ci Andreas Kieslinger 2026-02-06 09:55:51 +01:00
  • 05c74eae8a Apply suggestion from @ggerganov (src->buffer to buf_src) Andreas Kieslinger 2026-02-06 09:55:19 +01:00
  • 84252009b2 Simplifies synchronizations to adhere to saaasg pattern. aendk 2026-01-19 17:45:33 +01:00
  • 2789c1b396 Corrects initialization of ggml_backend_sync_mode in ggml_backend_sched_split initialization aendk 2026-01-16 10:43:56 +01:00
  • e03fb8eee7 Reintroduces stricter check for CPU->CUDA backend async copy via GGML_DEVICE_TYPE_CPU. aendk 2026-01-12 15:35:38 +01:00
  • bba41184de Makes opt-in to relax use of explicit syncs more general. Backends like vulkan which require a synchronization between HtoD copies and graph execution could also adopt this change now. aendk 2026-01-12 14:16:01 +01:00
  • 362934a975 Minor cleanup aendk 2026-01-09 17:07:19 +01:00
  • 5a77ac71b4 Relax requirement of checks in async CUDA copies from backend and buffer type to just buffer type, to avoid linking issues aendk 2025-12-19 11:30:03 +01:00
  • 5fba596128 Reworked backend detection in ggml-backend.cpp to avoid linking conflicts aendk 2025-12-18 10:25:14 +01:00
  • 0ae8664b8e Adds macro guards to allow compilation in non-CUDA builds aendk 2025-12-16 17:41:40 +01:00
  • 1f959c5cee Exchanges synchronous copy with async copy function. aendk 2025-12-16 17:21:00 +01:00
  • a187cbdb80 Adds function to relax sync requirements between input copies on supported backends (CUDA for now) aendk 2025-12-16 16:51:54 +01:00
  • cb39afd239 Adds CPU-to-CUDA copy capability to ggml_backend_cuda_cpy_tensor_async() aendk 2025-12-15 10:44:38 +01:00
  • c03a5a46f0 ggml-cpu: arm64: q6_K repack gemm and gemv (and generic) implementations (dotprod) (#19360) b7988 Alberto Cabrera Pérez 2026-02-10 10:47:45 +00:00
  • 6948adc90d ggml : use noexcept overload for is_regular_file in backend registration (#19452) b7987 k4ss4n 2026-02-10 10:57:48 +01:00
  • 404d0c8e80 cont Georgi Gerganov 2026-02-10 11:22:50 +02:00
  • 25dad910ab models : optimizing qwen3next graph Georgi Gerganov 2026-02-05 22:20:12 +02:00
  • 854b09f0d7 convert : move experts permutation from Qwen2MoeModel to Qwen3VLMoeTextModel (#19445) Piotr Wilkin (ilintar) 2026-02-10 09:01:37 +01:00
  • 66d403c480 tts : fix typos in README.md [no ci] (#19463) Daniel Bevenius 2026-02-10 07:30:41 +01:00
  • f0bfe54f55 CANN: Remove unnecessary wrapper for gml_backend_buft_is_cann (#18968) b7984 Raul Torres 2026-02-10 06:19:30 +00:00
  • 52e38faf8c CANN: implement quantized MUL_MAT_ID for MoE models (#19228) b7983 hipudding 2026-02-10 14:18:59 +08:00
  • a0d585537c cuda : extend GGML_OP_PAD to work with non-cont src0 (#19429) b7982 Georgi Gerganov 2026-02-10 08:07:16 +02:00
  • 02ee504f90 fix output pattern Johannes Gäßler 2026-02-09 22:40:30 +01:00
  • 98e57ca422 chat: fix case where template accepts type content only (#19419) b7981 Xuan-Son Nguyen 2026-02-09 22:14:12 +01:00
  • 262364e31d mtmd: Implement tiling for LFM2-VL (#19454) Tarek Dakhran 2026-02-09 17:30:32 +01:00
  • 820ebfa6f4 Server: log when converting requests to chat completions format (#19457) 손희준 2026-02-10 00:22:57 +09:00
  • 5e224bc190 Merge branch 'master' into pr/18039 Georgi Gerganov 2026-02-09 15:40:01 +02:00
  • 292f6908cd spec : remove check rate (#19377) Sascha Rogmann 2026-02-09 14:30:50 +01:00
  • 81ddc60cb3 ci : add metal server workflows (#19293) Georgi Gerganov 2026-02-09 15:09:30 +02:00
  • 972f323e73 revert : "[Model] Qwen3.5 dense and MoE support (no vision) (#19435)" (#19453) b7976 Georgi Gerganov 2026-02-09 14:57:51 +02:00
  • f5e7734ff2 ggml-virtgpu: add backend documentation (#19354) Kevin Pouget 2026-02-09 13:15:42 +01:00
  • 1e8924fd65 cmake : add variable to skip installing tests (#19370) b7974 Hugo 2026-02-09 06:12:02 +00:00
  • 94c66557b8 re-use buffers + ggml contexts Johannes Gäßler 2026-02-08 23:45:10 +01:00
  • 39bf692af1 [Model] Qwen3.5 dense and MoE support (no vision) (#19435) b7973 Piotr Wilkin (ilintar) 2026-02-09 00:24:08 +01:00
  • e06088da0f CUDA: Fix non-contig rope (#19338) b7972 Oliver Simons 2026-02-08 14:12:51 +01:00
  • 6ab02d0908 unconditional peer access Johannes Gäßler 2026-02-07 23:34:01 +01:00
  • 5fa1c190d9 rpc : update from common.cpp (#19400) b7971 Adrien Gallouët 2026-02-08 09:06:45 +01:00
  • eb449cdfa4 server : improve context checkpoint logic (#19408) b7970 Georgi Gerganov 2026-02-08 09:40:04 +02:00
  • 5999b50eb0 llama-quantize : cleanup --help output (#19317) b7969 ddh0 2026-02-08 01:22:38 -06:00
  • 9a5f57795c ci : remove server job from webui and move slow test (#19424) b7968 Sigbjørn Skjæret 2026-02-08 01:20:00 +01:00
  • 96441c955e ci : use -j param correctly when building with sanitizers (#19411) Georgi Gerganov 2026-02-08 00:50:47 +02:00