llama.cpp/ggml/src at b6608

sdgoij/llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2026-05-06 17:14:07 +00:00

Aman Gupta c0bfc57af4 CUDA: mul_mat_id for mmf for bs <= 64 for f16 and bs <= 32 for f32 (#16277)

* CUDA: mul_mat_id for mmf for bs <= 64 for f16 and bs <= 32 for f32

This commit adds mul_mat_id support for ncols_dst >= 16. It does this by
packing ncols_dst tiles into blockDim.y.

My tests on an RTX 3090 show that this is faster than the cuBLAS fallback
for f16 up to bs=64, and for f32 up to bs=32.

* Review: refactor if statement

2025-09-27 18:49:32 +02:00

rename optimize_graph to graph_optimize (#16082)

2025-09-18 13:46:17 -05:00

rename optimize_graph to graph_optimize (#16082)

2025-09-18 13:46:17 -05:00

devops: add s390x & ppc64le CI (#15925)

2025-09-27 02:03:33 +08:00

CUDA: mul_mat_id for mmf for bs <= 64 for f16 and bs <= 32 for f32 (#16277)

2025-09-27 18:49:32 +02:00

HIP: bump requirement to rocm 6.1 (#15296)

2025-08-13 20:44:30 +02:00

metal : report OOM errors (#16274)

2025-09-26 14:14:28 +03:00

CUDA: replace GGML_CUDA_F16 with CUDA arch checks (#15433)

2025-08-20 16:58:49 +02:00

ggml : implement set_rows with i32 index (#16159)

2025-09-22 19:13:00 +02:00

rpc : use ggml logging facilities

2025-09-25 07:20:02 +00:00

ggml : implement set_rows with i32 index (#16159)

2025-09-22 19:13:00 +02:00

vulkan: throw system error instead of SIGABRT during init on older devices (#16156)

2025-09-27 18:26:46 +02:00

ggml : implement set_rows with i32 index (#16159)

2025-09-22 19:13:00 +02:00

zdnn: refactor codebase + add docs (#16178)

2025-09-23 14:53:05 +08:00

CMakeLists.txt

cmake : fix static linking for OpenMP on Unix-like systems (#16031)

2025-09-18 23:07:18 +02:00

ggml-alloc.c

ggml : split graph allocations according to backend max buffer size (#15815)

2025-09-24 16:17:49 +02:00

ggml-backend-impl.h

rename optimize_graph to graph_optimize (#16082)

2025-09-18 13:46:17 -05:00

ggml-backend-reg.cpp

ggml-backend : add GGML_BACKEND_DEVICE_TYPE_IGPU device type (#15797)

2025-09-11 22:47:38 +02:00

ggml-backend.cpp

llama: print memory breakdown on exit (#15860)

2025-09-24 16:53:48 +02:00

ggml-common.h

llama : add gpt-oss (#15091)

2025-08-05 22:10:36 +03:00

ggml-impl.h

ggml : split graph allocations according to backend max buffer size (#15815)

2025-09-24 16:17:49 +02:00

ggml-opt.cpp

finetune: SGD optimizer, more CLI args (#13873)

2025-08-14 12:03:57 +02:00

ggml-quants.c

ggml : fix uninitialized is_on_grid in quantize_row_iq3_xxs_impl (#15928)

2025-09-23 10:25:20 +02:00

ggml-quants.h

llama : add gpt-oss (#15091)

2025-08-05 22:10:36 +03:00

ggml-threading.cpp

ggml : build backends as libraries (#10256)

2024-11-14 18:04:35 +01:00

ggml-threading.h

remove CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS (#10797)

2024-12-12 19:02:49 +01:00

ggml.c

devops: add s390x & ppc64le CI (#15925)

2025-09-27 02:03:33 +08:00

ggml.cpp

ggml : Print backtrace on uncaught C++ exceptions (ggml/1232)

2025-06-01 13:43:57 +03:00

gguf.cpp

gguf: gguf_writer refactor (#15691)

2025-09-05 11:34:28 +02:00