Mirror of https://github.com/ggml-org/llama.cpp.git (synced 2026-05-08 10:04:10 +00:00)
RPC_CMD_SET_TENSOR always returns an empty response, and we send this command four times per token. We can improve text-generation (TG) speed by not waiting for that empty response before proceeding. The performance impact of this change depends on network latency: the higher the round-trip time to the RPC server, the larger the gain.