llama.cpp/utils.cpp at master-b3f460e

mirror of https://github.com/ggml-org/llama.cpp.git synced 2026-05-08 10:04:10 +00:00

Georgi Gerganov 7a9b6c3a8b Reduce memory usage and allocate enough memory for largest context (#473)

* Reduce memory usage and allocate enough memory for large contexts

* Simpler scratch buffer usage

* Reenable BLAS for quantized mul_mat

* Fix number of layers in 30B and 65B

* Fix KV cache size for F32

2023-03-24 23:17:37 +02:00

9.9 KiB
