llama.cpp

Mirror of https://github.com/ggml-org/llama.cpp.git, synced 2026-05-01 14:44:05 +00:00

Files

nullname 85dde8dc4a hexagon: optimize HMX matmul operations (#21071)

* optimize hmx_mat_mul functions by calculating row and column tiles upfront

* refactor core_dot_chunk_fp16 to use size_t for tile counts and improve readability

* wip

* set scale outside of loop

* wip

* refactor core_mma_chunk_fp16 and mat_mul_qk_0_d16a32 to use size_t for tile counts

* wip

* wip

* refactor transfer_output_chunk_fp16_to_fp32 to use size_t for dimensions

* refactor core_dot_chunk_fp16 to use size_t for tile row stride calculation

* wip

* refactor hmx_mat_mul functions to use hvx_vec_splat_f16 for column scales initialization

* refactor hmx_mat_mul_permuted_w16a32_batched to streamline scale setting and locking

* refactor core_dot_chunk_fp16 to improve tile stride calculations for output

* refactor hmx_mat_mul functions to use Q6_V_vsplat_R for column scales initialization

* fix compiling error

* wip

* optimize row and column tile indexing in core_mma_chunk_fp16 function

* wip

* Revert "wip"

This reverts commit cde679eff7.

* Add size limit check for HAP_mmap in htp_iface_mmap and drop_mmap functions

* wip

2026-04-16 13:48:34 -07:00

cmake

ggml: backend-agnostic tensor parallelism (experimental) (#19378)

2026-04-09 16:42:19 +02:00

include

CUDA: manage NCCL communicators in context (#21891)

2026-04-15 15:58:40 +02:00

src

hexagon: optimize HMX matmul operations (#21071)

2026-04-16 13:48:34 -07:00

.gitignore

vulkan : cmake integration (#8119)

2024-07-13 18:12:39 +02:00

CMakeLists.txt

[SYCL] Fix Q8_0 reorder: garbage on 2nd prompt + crash on full VRAM (#21638)

2026-04-16 08:34:05 +03:00