llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2026-05-14 21:14:10 +00:00

Author	SHA1	Message	Date
Georgi Gerganov	f588a70da3	context : wrap input tensors in struct ggml-ci	2025-02-21 15:09:28 +02:00
Georgi Gerganov	ebf1bdf97b	context : add logs ggml-ci	2025-02-21 14:35:23 +02:00
Georgi Gerganov	548c230dff	graph : remove worst_case from the API ggml-ci	2025-02-21 13:29:25 +02:00
Georgi Gerganov	2645a7d9a9	context : add save/load for recurrent context ggml-ci	2025-02-21 10:28:42 +02:00
Georgi Gerganov	08011c2ca1	context : add llama_kv_cache_recurrent prototype ggml-ci	2025-02-20 20:55:13 +02:00
Georgi Gerganov	ad870c49f4	context : fix causal input for cache-less case ggml-ci	2025-02-20 20:01:02 +02:00
Georgi Gerganov	b1554be1d7	context : add cache-less llama_context ggml-ci	2025-02-20 18:30:04 +02:00
Georgi Gerganov	072280ea6b	Merge branch 'master' into gg/llama-kv-cache ggml-ci	2025-02-20 14:26:43 +02:00
Georgi Gerganov	f95b04a21c	model : fix order kvq -> qkv ggml-ci	2025-02-19 18:52:20 +02:00
Georgi Gerganov	2eacb4c1bf	graph : simplify attention api ggml-ci	2025-02-19 18:43:49 +02:00
Georgi Gerganov	e17e4b72d1	context : add llama_context_recurrent ggml-ci	2025-02-19 16:07:27 +02:00
Georgi Gerganov	5f11a5502a	kv-cache : remove llama_kv_cache_i	2025-02-19 14:36:27 +02:00
Daniel Bevenius	9626d9351a	llama : fix indentation in llama-grammar [no ci] (#11943) This commit adjusts the indentation for the functions `parse_sequence` and `parse_rule` in src/llama-grammar.cpp. The motivation is consistency and improve readability.	2025-02-19 06:16:23 +01:00
Georgi Gerganov	f5cedbcaaa	kv-cache : prepare for abstraction ggml-ci	2025-02-18 21:28:58 +02:00
Georgi Gerganov	2bffc2d514	model : pass llama_graph_i as ptr ggml-ci	2025-02-18 14:57:26 +02:00
Georgi Gerganov	9e50456e19	context : minor simplify ggml-ci	2025-02-18 14:53:02 +02:00
Georgi Gerganov	befe14f06f	llama : reorder encode/decode in sources	2025-02-18 14:47:53 +02:00
Georgi Gerganov	bc6f187e9c	cont : use returend tensors from the graph build ggml-ci	2025-02-18 14:24:17 +02:00
Georgi Gerganov	172f61690c	cont : return important tensors ggml-ci	2025-02-18 13:48:43 +02:00
Georgi Gerganov	c23590319a	graph : add llama_graph_result ggml-ci	2025-02-18 13:48:21 +02:00
Georgi Gerganov	f0d3ff2388	Merge branch 'master' into gg/llama-kv-cache ggml-ci	2025-02-18 10:14:37 +02:00
Georgi Gerganov	68ff663a04	repo : update links to new url (#11886) * repo : update links to new url ggml-ci * cont : more urls ggml-ci	2025-02-15 16:40:57 +02:00
Georgi Gerganov	1d801d27b9	graph : update attn/kv_self names	2025-02-14 17:22:55 +02:00
Georgi Gerganov	828064564c	context : move common inputs to base class ggml-ci	2025-02-14 16:48:21 +02:00
Georgi Gerganov	d5e8e1a2ba	context : remove batch_manager ggml-ci	2025-02-14 16:10:55 +02:00
Georgi Gerganov	131743ff4f	context : abstract constructor and init ggml-ci	2025-02-13 17:17:51 +02:00
Georgi Gerganov	ed3cb55abe	context : abstract input ggml-ci	2025-02-13 15:53:15 +02:00
Georgi Gerganov	107d1e2c32	context : move output functionality to base class ggml-ci	2025-02-13 15:42:14 +02:00
Georgi Gerganov	e08f38df69	context : minor cleanup ggml-ci	2025-02-13 12:50:53 +02:00
Georgi Gerganov	f7c7757bab	context : abstract state read/write ggml-ci	2025-02-13 12:37:28 +02:00
Georgi Gerganov	3a504d9a0b	llama : introduce llama_io interfaces ggml-ci	2025-02-13 12:25:54 +02:00
Olivier Chafik	c7f460ab88	`server`: fix tool-call of DeepSeek R1 Qwen, return reasoning_content (Command 7RB & DeepSeek R1) unless `--reasoning-format none` (#11607) * extract & return thoughts in reasoning_content field (unless --reasoning-format) for DeepSeek R1 & Command R7B * tool-calls: add deepseek r1 template (models/templates/llama-cpp-deepseek-r1.jinja) + hackommodate broken official template * tool-calls: accommodate variety of wrong tool call opening tags both R1 Qwen 32B and 7B distills like to spit out * server/oai: ensure content is null when there are tool calls, and reasoning_content appears before content for readability * tool-calls: add DeepSeek R1 Qwen distills to server/README.md & server tests Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-02-13 10:05:16 +00:00
Vinesh Janarthanan	27e8a23300	sampling: add Top-nσ sampler (#11223) * initial sampling changes: * completed top nsigma sampler implementation * apply parameter to only llama-cli * updated readme * added tests and fixed nsigma impl * cleaned up pr * format * format * format * removed commented tests * cleanup pr and remove explicit floats * added top-k sampler to improve performance * changed sigma to float * fixed string format to float * Update src/llama-sampling.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update common/sampling.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update src/llama-sampling.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update src/llama-sampling.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update src/llama-sampling.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update src/llama-sampling.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * added llama_sampler_init --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-02-13 08:45:57 +02:00
Daniel Bevenius	3e69319772	llama : update llama_decode_internal ref [no ci] (#11840) This commit updates the comment in llama_kv_cache.h to reflect the change of the function name from llama_decode_internal to llama_decode_impl.	2025-02-13 08:07:51 +02:00
Georgi Gerganov	fbe6a07256	context : rename to llama_context_kv_self	2025-02-12 17:16:44 +02:00
Georgi Gerganov	6ee86e5e0f	graph : restore ubatch in build_cb ggml-ci	2025-02-12 16:29:15 +02:00
bandoti	fef0cbeadf	cleanup: fix compile warnings associated with gnu_printf (#11811)	2025-02-12 10:06:53 -04:00
Georgi Gerganov	f63aeecce6	llama : models now build their graphs using llama_graph_i ggml-ci	2025-02-12 15:08:40 +02:00
Georgi Gerganov	0ab50f1bbb	context : prepare llama_model graph build ggml-ci	2025-02-12 14:09:55 +02:00
Georgi Gerganov	e633dc171a	context : introduce llama_graph_i ggml-ci	2025-02-12 13:49:44 +02:00
Georgi Gerganov	5eae8e5183	context : move build_rope_factors to base class ggml-ci	2025-02-12 13:32:02 +02:00
Georgi Gerganov	d146a14f77	context : minor naming fix	2025-02-12 12:41:36 +02:00
Georgi Gerganov	8da7f612b7	context : improve llama_context encapsulation ggml-ci	2025-02-12 12:15:04 +02:00
Georgi Gerganov	b52b79b048	context : move encode/decode to llama-context.cpp	2025-02-12 11:23:38 +02:00
Daniel Bevenius	369be5598a	llama : fix typo in llama-grammar.h [no ci] (#11816)	2025-02-12 09:40:01 +02:00
Georgi Gerganov	02ef4be975	context : initial abstraction ggml-ci	2025-02-11 22:27:21 +02:00
Wilken Gottwalt	19b392d58d	llama-mmap: fix missing include (#11796) Technically the fixed width types come only from iostream and cstdint/stdint.h headers. memory and vector headers should not provide these. In GCC 15 the headers are cleaned up and you require the proper header cstdint. src/llama-mmap.h:26:5: error: ‘uint32_t’ does not name a type 26 | uint32_t read_u32() const; | ^~~~~~~~	2025-02-10 20:58:18 +02:00
Georgi Gerganov	2cd8a903c8	context : make output functions members ggml-ci	2025-02-10 17:01:27 +02:00
Georgi Gerganov	d1d8d53008	bman : remove ubatch member ggml-ci	2025-02-10 16:50:14 +02:00
Georgi Gerganov	ef358ee78f	context : add decode/encode ggml-ci	2025-02-10 16:14:13 +02:00

338 Commits