Mirror of https://github.com/ggml-org/llama.cpp.git (synced 2026-05-08 10:04:10 +00:00)
Store the last computed graph and reuse it when possible. Also, do not return a response from GRAPH_COMPUTE; assume it always completes successfully. If it does not, the server closes the connection. This saves a network round trip to the server.
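
The client-side behavior described above could look roughly like the following. This is a minimal sketch, not the actual RPC backend code: GraphBytes, FakeTransport and GraphClient are hypothetical names invented for illustration, and the sketch assumes the graph can be compared byte-for-byte in its serialized form.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a serialized compute graph.
using GraphBytes = std::vector<uint8_t>;

// Hypothetical transport that only counts outgoing messages.
// A real client would write to a socket instead.
struct FakeTransport {
    int  graphs_sent   = 0;    // full graph uploads
    int  computes_sent = 0;    // compute requests
    bool connected     = true; // set to false when the server closes the connection

    void send_graph(const GraphBytes &) { ++graphs_sent; }
    void send_compute()                 { ++computes_sent; }
};

class GraphClient {
public:
    explicit GraphClient(FakeTransport & t) : transport(t) {}

    // Returns false only if the connection is already closed.
    // No reply is awaited: a compute failure on the server shows up
    // later as a closed connection, not as an error code here.
    bool compute(const GraphBytes & graph) {
        if (!transport.connected) {
            return false;
        }
        // Upload the graph only when it differs from the last one sent.
        if (!has_cached || graph != cached) {
            transport.send_graph(graph);
            cached     = graph;
            has_cached = true;
        }
        transport.send_compute(); // fire and forget
        return true;
    }

private:
    FakeTransport & transport;
    GraphBytes      cached;
    bool            has_cached = false;
};
```

In this sketch, calling compute() twice with the same graph uploads it once and issues two compute requests. The trade-off is that the caller learns about a failed computation only when a later call finds the connection closed.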