llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2026-05-01 22:54:05 +00:00

Files

Kevin Pouget 015deb9048 ggml-virtgpu: make the code thread safe (#19204)

* ggml-virtgpu: regenerate_remoting.py: add the ability to deprecate a function

* ggml-virtgpu: deprecate buffer_type is_host remoting

not necessary

* ggml-virtgpu: stop using static vars as cache

The static init isn't thread safe.

* ggml-virtgpu: protect the use of the shared memory to transfer data

* ggml-virtgpu: make the remote calls thread-safe

* ggml-virtgpu: backend: don't continue if couldn't allocate the tensor memory

* ggml-virtgpu: add a cleanup function for consistency

* ggml-virtgpu: backend: don't crash if buft->iface.get_max_size is missing

* fix style and ordering

* Remove the static variable in apir_device_get_count

* ggml-virtgpu: improve the logging

* fix review minor formatting changes

2026-02-04 10:46:18 +08:00

ggml-alloc.h

llama: automatically set parameters not set by the user in such a way that maximizes GPU utilization (#16653)

2025-12-15 09:24:59 +01:00

ggml-backend.h

vulkan: extend topk_moe to handle sigmoid w/exp_probs_b for nemotron (#18295)

2026-01-01 08:58:27 +01:00

ggml-blas.h

ggml : build backends as libraries (#10256)

2024-11-14 18:04:35 +01:00

ggml-cann.h

docs : Minor cleanups (#19252)