Commit Graph

338 Commits

Author SHA1 Message Date
Georgi Gerganov
f588a70da3 context : wrap input tensors in struct
ggml-ci
2025-02-21 15:09:28 +02:00
Georgi Gerganov
ebf1bdf97b context : add logs
ggml-ci
2025-02-21 14:35:23 +02:00
Georgi Gerganov
548c230dff graph : remove worst_case from the API
ggml-ci
2025-02-21 13:29:25 +02:00
Georgi Gerganov
2645a7d9a9 context : add save/load for recurrent context
ggml-ci
2025-02-21 10:28:42 +02:00
Georgi Gerganov
08011c2ca1 context : add llama_kv_cache_recurrent prototype
ggml-ci
2025-02-20 20:55:13 +02:00
Georgi Gerganov
ad870c49f4 context : fix causal input for cache-less case
ggml-ci
2025-02-20 20:01:02 +02:00
Georgi Gerganov
b1554be1d7 context : add cache-less llama_context
ggml-ci
2025-02-20 18:30:04 +02:00
Georgi Gerganov
072280ea6b Merge branch 'master' into gg/llama-kv-cache
ggml-ci
2025-02-20 14:26:43 +02:00
Georgi Gerganov
f95b04a21c model : fix order kvq -> qkv
ggml-ci
2025-02-19 18:52:20 +02:00
Georgi Gerganov
2eacb4c1bf graph : simplify attention api
ggml-ci
2025-02-19 18:43:49 +02:00
Georgi Gerganov
e17e4b72d1 context : add llama_context_recurrent
ggml-ci
2025-02-19 16:07:27 +02:00
Georgi Gerganov
5f11a5502a kv-cache : remove llama_kv_cache_i 2025-02-19 14:36:27 +02:00
Daniel Bevenius
9626d9351a llama : fix indentation in llama-grammar [no ci] (#11943)
This commit adjusts the indentation for the functions `parse_sequence`
and `parse_rule` in src/llama-grammar.cpp.

The motivation is consistency and improve readability.
2025-02-19 06:16:23 +01:00
Georgi Gerganov
f5cedbcaaa kv-cache : prepare for abstraction
ggml-ci
2025-02-18 21:28:58 +02:00
Georgi Gerganov
2bffc2d514 model : pass llama_graph_i as ptr
ggml-ci
2025-02-18 14:57:26 +02:00
Georgi Gerganov
9e50456e19 context : minor simplify
ggml-ci
2025-02-18 14:53:02 +02:00
Georgi Gerganov
befe14f06f llama : reorder encode/decode in sources 2025-02-18 14:47:53 +02:00
Georgi Gerganov
bc6f187e9c cont : use returend tensors from the graph build
ggml-ci
2025-02-18 14:24:17 +02:00
Georgi Gerganov
172f61690c cont : return important tensors
ggml-ci
2025-02-18 13:48:43 +02:00
Georgi Gerganov
c23590319a graph : add llama_graph_result
ggml-ci
2025-02-18 13:48:21 +02:00
Georgi Gerganov
f0d3ff2388 Merge branch 'master' into gg/llama-kv-cache
ggml-ci
2025-02-18 10:14:37 +02:00
Georgi Gerganov
68ff663a04 repo : update links to new url (#11886)
* repo : update links to new url

ggml-ci

* cont : more urls

ggml-ci
2025-02-15 16:40:57 +02:00
Georgi Gerganov
1d801d27b9 graph : update attn/kv_self names 2025-02-14 17:22:55 +02:00
Georgi Gerganov
828064564c context : move common inputs to base class
ggml-ci
2025-02-14 16:48:21 +02:00
Georgi Gerganov
d5e8e1a2ba context : remove batch_manager
ggml-ci
2025-02-14 16:10:55 +02:00
Georgi Gerganov
131743ff4f context : abstract constructor and init
ggml-ci
2025-02-13 17:17:51 +02:00
Georgi Gerganov
ed3cb55abe context : abstract input
ggml-ci
2025-02-13 15:53:15 +02:00
Georgi Gerganov
107d1e2c32 context : move output functionality to base class
ggml-ci
2025-02-13 15:42:14 +02:00
Georgi Gerganov
e08f38df69 context : minor cleanup
ggml-ci
2025-02-13 12:50:53 +02:00
Georgi Gerganov
f7c7757bab context : abstract state read/write
ggml-ci
2025-02-13 12:37:28 +02:00
Georgi Gerganov
3a504d9a0b llama : introduce llama_io interfaces
ggml-ci
2025-02-13 12:25:54 +02:00
Olivier Chafik
c7f460ab88 server: fix tool-call of DeepSeek R1 Qwen, return reasoning_content (Command 7RB & DeepSeek R1) unless --reasoning-format none (#11607)
* extract & return thoughts in reasoning_content field (unless --reasoning-format) for DeepSeek R1 & Command R7B

* tool-calls: add deepseek r1 template (models/templates/llama-cpp-deepseek-r1.jinja) + hackommodate broken official template

* tool-calls: accommodate variety of wrong tool call opening tags both R1 Qwen 32B and 7B distills like to spit out

* server/oai: ensure content is null when there are tool calls, and reasoning_content appears before content for readability

* tool-calls: add DeepSeek R1 Qwen distills to server/README.md & server tests

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-02-13 10:05:16 +00:00
Vinesh Janarthanan
27e8a23300 sampling: add Top-nσ sampler (#11223)
* initial sampling changes:

* completed top nsigma sampler implementation

* apply parameter to only llama-cli

* updated readme

* added tests and fixed nsigma impl

* cleaned up pr

* format

* format

* format

* removed commented tests

* cleanup pr and remove explicit floats

* added top-k sampler to improve performance

* changed sigma to float

* fixed string format to float

* Update src/llama-sampling.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Update common/sampling.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Update src/llama-sampling.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Update src/llama-sampling.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Update src/llama-sampling.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Update src/llama-sampling.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* added llama_sampler_init

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-02-13 08:45:57 +02:00
Daniel Bevenius
3e69319772 llama : update llama_decode_internal ref [no ci] (#11840)
This commit updates the comment in llama_kv_cache.h to reflect the
change of the function name from llama_decode_internal to
llama_decode_impl.
2025-02-13 08:07:51 +02:00
Georgi Gerganov
fbe6a07256 context : rename to llama_context_kv_self 2025-02-12 17:16:44 +02:00
Georgi Gerganov
6ee86e5e0f graph : restore ubatch in build_cb
ggml-ci
2025-02-12 16:29:15 +02:00
bandoti
fef0cbeadf cleanup: fix compile warnings associated with gnu_printf (#11811) 2025-02-12 10:06:53 -04:00
Georgi Gerganov
f63aeecce6 llama : models now build their graphs using llama_graph_i
ggml-ci
2025-02-12 15:08:40 +02:00
Georgi Gerganov
0ab50f1bbb context : prepare llama_model graph build
ggml-ci
2025-02-12 14:09:55 +02:00
Georgi Gerganov
e633dc171a context : introduce llama_graph_i
ggml-ci
2025-02-12 13:49:44 +02:00
Georgi Gerganov
5eae8e5183 context : move build_rope_factors to base class
ggml-ci
2025-02-12 13:32:02 +02:00
Georgi Gerganov
d146a14f77 context : minor naming fix 2025-02-12 12:41:36 +02:00
Georgi Gerganov
8da7f612b7 context : improve llama_context encapsulation
ggml-ci
2025-02-12 12:15:04 +02:00
Georgi Gerganov
b52b79b048 context : move encode/decode to llama-context.cpp 2025-02-12 11:23:38 +02:00
Daniel Bevenius
369be5598a llama : fix typo in llama-grammar.h [no ci] (#11816) 2025-02-12 09:40:01 +02:00
Georgi Gerganov
02ef4be975 context : initial abstraction
ggml-ci
2025-02-11 22:27:21 +02:00
Wilken Gottwalt
19b392d58d llama-mmap: fix missing include (#11796)
Technically the fixed width types come only from iostream and
cstdint/stdint.h headers. memory and vector headers should not provide
these. In GCC 15 the headers are cleaned up and you require the proper
header cstdint.

src/llama-mmap.h:26:5: error: ‘uint32_t’ does not name a type
   26 |     uint32_t read_u32() const;
      |     ^~~~~~~~
2025-02-10 20:58:18 +02:00
Georgi Gerganov
2cd8a903c8 context : make output functions members
ggml-ci
2025-02-10 17:01:27 +02:00
Georgi Gerganov
d1d8d53008 bman : remove ubatch member
ggml-ci
2025-02-10 16:50:14 +02:00
Georgi Gerganov
ef358ee78f context : add decode/encode
ggml-ci
2025-02-10 16:14:13 +02:00