llama.cpp/examples at 78fbbc2c0788efc8857a2c0dc9802ec689fa12c1

sdgoij/llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2026-05-13 20:44:09 +00:00

Georgi Gerganov 68e7ea3eab spec : parallel drafting support (#22838 )

* spec : refactor

* spec : drop support for incompatible vocabs

* spec : update common_speculative_init()

* cont : pass seq_id

* cont : dedup ctx_seq_rm_type

* server : sketch the ctx_dft decode loop

* server : draft prompt cache and checkpoints

* server : improve ctx names

* server, spec : transition to unified spec context

* cont : sync main and drft contexts

* cont : async drft eval when possible

* cont : handle non-ckpt models

* cont : pass correct n_past for drafting

* cont : process images through the draft context

* spec : handle draft running out of context

* server : fix mtmd draft processing

* server : fix URL for draft model

* server : add comment

* server : clean-up + dry

* speculative-simple : update

* spec : fix n_past type

* server : fix slot ctx_drft ptr

* tools : update readme

* naming : improve consistency

* spec : refactor for multi-sequence speculative context

* cont : prepare params

* cont : prepare params

* spec : support parallel drafts

* server : support parallel drafting

* llama : reuse device buffers when possible

* server, spec : clean-up

* cont : clean-up

* cont : minor

* spec : reset `drafting` flag at the end

* spec : introduce `common_speculative_process()`

* spec : allow for multiple spec types (chain of speculators)

* replace old type field of type common_speculative_type in the
  common_params_speculative struct with a vector to allow multiple
  types to be specified

* introduce common_get_enabled_speculative_impls(const std::vector<enum common_speculative_type>)
  to figure out which implementations the user has enabled

* introduce common_speculative_type_from_names(const std::vector<std::string> & names)
  to parse the already user provided spec types

* all speculators run sequentially, best one wins (we verify its drafted tokens)

* maximize expected accepted tokens for current round by calculating the
  product between the probability of accepting current token (n_acc_tokens / n_gen_drafts)
  and the draft's length
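
The selection rule described in the last two bullets can be sketched as follows. This is only an illustration of the arithmetic stated in the commit message, not the code from the PR: the struct `spec_candidate`, the functions `expected_accepted()` and `pick_best()`, and the `prior` used for a speculator with no history are all hypothetical names and assumptions introduced here.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-speculator state; field names follow the commit message,
// the struct itself is an assumption made for this sketch.
struct spec_candidate {
    int n_acc_tokens;        // numerator of the acceptance ratio
    int n_gen_drafts;        // denominator of the acceptance ratio
    std::vector<int> draft;  // token ids drafted in the current round
};

// Score = (n_acc_tokens / n_gen_drafts) * draft length.
// Assumption: with no history yet, fall back to `prior` as the acceptance ratio.
double expected_accepted(const spec_candidate & c, double prior = 1.0) {
    const double p_acc = c.n_gen_drafts > 0
        ? static_cast<double>(c.n_acc_tokens) / c.n_gen_drafts
        : prior;
    return p_acc * static_cast<double>(c.draft.size());
}

// Returns the index of the highest-scoring candidate, or -1 if there are none.
// Ties go to the earliest candidate, i.e. the first speculator in the chain.
int pick_best(const std::vector<spec_candidate> & cands) {
    int    best       = -1;
    double best_score = -1.0;
    for (std::size_t i = 0; i < cands.size(); ++i) {
        const double score = expected_accepted(cands[i]);
        if (score > best_score) {
            best_score = score;
            best       = static_cast<int>(i);
        }
    }
    return best;
}
```

Under these assumptions, a speculator with a ratio of 8/10 and a 4-token draft scores 3.2, while one with a ratio of 5/10 and an 8-token draft scores 4.0, so the longer draft would be the one sent for verification.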

---------

Co-authored-by: Petros Sideris <petros.sideris@nokia.com>

2026-05-11 19:09:43 +03:00

libs : rename libcommon -> libllama-common (#21936 )

2026-04-17 11:11:46 +03:00

examples : remove references to make in examples [no ci] (#15457 )

2025-08-21 06:12:28 +02:00

convert-llama2c-to-ggml

libs : rename libcommon -> libllama-common (#21936 )

2026-04-17 11:11:46 +03:00

common: fix missing exports in llama-common (#22340 )

2026-04-27 08:06:39 +03:00

deprecation-warning

Fix locale-dependent float printing in GGUF metadata (#17331 )

2026-03-04 09:30:40 +01:00

examples: refactor diffusion generation (#22590 )

2026-05-04 20:19:30 +08:00

libs : rename libcommon -> libllama-common (#21936 )

2026-04-17 11:11:46 +03:00

common: fix missing exports in llama-common (#22340 )

2026-04-27 08:06:39 +03:00

spec : refactor params (#22397 )

2026-04-28 09:07:33 +03:00

Fix locale-dependent float printing in GGUF metadata (#17331 )

2026-03-04 09:30:40 +01:00

Fix locale-dependent float printing in GGUF metadata (#17331 )

2026-03-04 09:30:40 +01:00

libs : rename libcommon -> libllama-common (#21936 )

2026-04-17 11:11:46 +03:00

android : libcommon -> libllama-common (#22076 )

2026-04-18 11:19:40 +02:00

llama : deprecate llama_kv_self_ API (#14030 )

2025-06-06 14:11:15 +03:00

libs : rename libcommon -> libllama-common (#21936 )

2026-04-17 11:11:46 +03:00

spec : refactor params (#22397 )

2026-04-28 09:07:33 +03:00

model-conversion

model-conversion : fix mmproj output file name [no ci] (#22274 )

2026-04-23 15:07:38 +02:00

libs : rename libcommon -> libllama-common (#21936 )

2026-04-17 11:11:46 +03:00

libs : rename libcommon -> libllama-common (#21936 )

2026-04-17 11:11:46 +03:00

libs : rename libcommon -> libllama-common (#21936 )

2026-04-17 11:11:46 +03:00

save-load-state

common : only load backends when required (#22290 )

2026-05-05 09:23:50 +02:00

Fix locale-dependent float printing in GGUF metadata (#17331 )

2026-03-04 09:30:40 +01:00

Fix locale-dependent float printing in GGUF metadata (#17331 )

2026-03-04 09:30:40 +01:00

simple-cmake-pkg

examples : add missing code block end marker [no ci] (#17756 )

2025-12-04 14:17:30 +01:00

spec : fix vocab compat checks in spec example (#22426 )

2026-04-30 08:18:25 +03:00

speculative-simple

spec : parallel drafting support (#22838 )

2026-05-11 19:09:43 +03:00

sycl : fix script error (#22795 )

2026-05-08 06:54:57 +03:00

libs : rename libcommon -> libllama-common (#21936 )

2026-04-17 11:11:46 +03:00

CMakeLists.txt

examples : add debug utility/example (#18464 )

2026-01-07 10:42:19 +01:00

convert_legacy_llama.py

metadata: Detailed Dataset Authorship Metadata (#8875 )

2024-11-13 21:10:38 +11:00

json_schema_pydantic_example.py

py : type-check all Python scripts with Pyright (#8341 )

2024-07-07 15:04:39 -04:00

json_schema_to_grammar.py

ci : switch from pyright to ty (#20826 )

2026-03-21 08:54:34 +01:00

llama.vim

chore : correct typos [no ci] (#20041 )

2026-03-05 08:50:21 +01:00

pydantic_models_to_grammar_examples.py

llama : move end-user examples to tools directory (#13249 )

2025-05-02 20:27:13 +02:00

pydantic_models_to_grammar.py

ci : switch from pyright to ty (#20826 )

2026-03-21 08:54:34 +01:00

reason-act.sh

scripts : make the shell scripts cross-platform (#14341 )

2025-06-30 10:17:18 +02:00

regex_to_grammar.py

py : switch to snake_case (#8305 )

2024-07-05 07:53:33 +03:00

server_embd.py

llama : fix FA when KV cache is not used (i.e. embeddings) (#12825 )

2025-04-08 19:54:51 +03:00

server-llama2-13B.sh

scripts : make the shell scripts cross-platform (#14341 )

2025-06-30 10:17:18 +02:00

ts-type-to-grammar.sh

scripts : make the shell scripts cross-platform (#14341 )

2025-06-30 10:17:18 +02:00