llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2026-05-14 21:14:10 +00:00

Files

Oliver Simons 7668999518 Merge branch 'master' into gpu-sampling

Let's keep `master`'s cumsum implementation for its likely better AMD
perf and add back the pure-CUB implementation in a follow-up commit

2025-12-05 14:41:08 +01:00

batched

cont : keep backend sampling disabled for now

2025-12-04 17:42:09 +02:00

batched.swift

examples : remove references to make in examples [no ci] (#15457)

2025-08-21 06:12:28 +02:00

convert-llama2c-to-ggml

gguf: gguf_writer refactor (#15691)

2025-09-05 11:34:28 +02:00

deprecation-warning

Update deprecation-warning.cpp (#10619)

2024-12-04 23:19:20 +01:00

diffusion

models : Added support for RND1 Diffusion Language Model (#17433)

2025-11-24 14:16:56 +08:00

embedding

Merge branch 'master' into HEAD

2025-11-29 22:38:44 +02:00

eval-callback

refactor : simplify and improve memory management

2025-11-28 16:09:42 +02:00

gen-docs

ggml : move AMX to the CPU backend (#10570)

2024-11-29 21:54:58 +01:00

gguf

examples(gguf): GGUF example outputs (#17025)

2025-11-05 19:58:16 +02:00

gguf-hash

GGUF: C++ refactor, backend support, misc fixes (#11030)

2025-01-07 18:01:58 +01:00

llama.android

llama : deprecate llama_kv_self_ API (#14030)

2025-06-06 14:11:15 +03:00

llama.swiftui

llama : deprecate llama_kv_self_ API (#14030)

2025-06-06 14:11:15 +03:00

lookahead

refactor : simplify and improve memory management

2025-11-28 16:09:42 +02:00

lookup

refactor : simplify and improve memory management

2025-11-28 16:09:42 +02:00

model-conversion

model : Qwen3 Next (#16095)

2025-11-28 12:02:56 +01:00

parallel

refactor : simplify and improve memory management

2025-11-28 16:09:42 +02:00

passkey

examples : remove references to make in examples [no ci] (#15457)

2025-08-21 06:12:28 +02:00

retrieval

refactor : simplify and improve memory management

2025-11-28 16:09:42 +02:00

save-load-state

refactor : simplify and improve memory management

2025-11-28 16:09:42 +02:00

simple

examples : support encoder-decoder models in the simple example (#16002)

2025-09-17 10:29:00 +03:00

simple-chat

simple-chat : fix context-exceeded condition (#14494)

2025-07-02 14:12:07 +03:00

simple-cmake-pkg

examples : add missing code block end marker [no ci] (#17756)

2025-12-04 14:17:30 +01:00

speculative

refactor : simplify and improve memory management

2025-11-28 16:09:42 +02:00

speculative-simple

refactor : simplify and improve memory management

2025-11-28 16:09:42 +02:00

sycl

sycl : support to malloc memory on device more than 4GB, update the doc and script (#17566)

2025-11-29 14:59:44 +02:00

training

refactor : simplify and improve memory management

2025-11-28 16:09:42 +02:00

CMakeLists.txt

codeowners : update + cleanup (#16174)

2025-09-22 18:20:21 +03:00

convert_legacy_llama.py

metadata: Detailed Dataset Authorship Metadata (#8875)

2024-11-13 21:10:38 +11:00

json_schema_pydantic_example.py

py : type-check all Python scripts with Pyright (#8341)

2024-07-07 15:04:39 -04:00

json_schema_to_grammar.py

common : fix json schema with '\' in literals (#17307)

2025-11-29 17:06:32 +01:00

llama.vim

llama : remove KV cache defragmentation logic (#15473)

2025-08-22 12:22:13 +03:00

pydantic_models_to_grammar_examples.py

llama : move end-user examples to tools directory (#13249)

2025-05-02 20:27:13 +02:00

pydantic_models_to_grammar.py

pydantic : replace uses of __annotations__ with get_type_hints (#8474)

2024-07-14 19:51:21 -04:00

reason-act.sh

scripts : make the shell scripts cross-platform (#14341)

2025-06-30 10:17:18 +02:00

regex_to_grammar.py

py : switch to snake_case (#8305)

2024-07-05 07:53:33 +03:00

server_embd.py

llama : fix FA when KV cache is not used (i.e. embeddings) (#12825)

2025-04-08 19:54:51 +03:00

server-llama2-13B.sh

scripts : make the shell scripts cross-platform (#14341)

2025-06-30 10:17:18 +02:00

ts-type-to-grammar.sh

scripts : make the shell scripts cross-platform (#14341)

2025-06-30 10:17:18 +02:00