gen-libllama-abi: compile sort-key regex once outside the lambda

Agent-Logs-Url: https://github.com/ggml-org/llama.cpp/sessions/cd21903e-afd2-477a-8285-0a2d46e1398c
Co-authored-by: ggerganov <1991296+ggerganov@users.noreply.github.com>
semver: revert llama_export.h, fix ABI baseline to track full signatures
2026-05-03 23:54:19 +00:00 · 2026-04-15 12:04:44 +00:00 · 2026-04-15 12:02:36 +00:00 · 2026-04-15 11:45:14 +00:00 · 2026-04-15 11:44:00 +00:00
5 changed files with 514 additions and 2 deletions
--- a/.github/workflows/libllama-abi-check.yml
+++ b/.github/workflows/libllama-abi-check.yml
@@ -0,0 +1,99 @@
+name: libllama ABI check
+
+# Checks exported function signatures of libllama against a committed baseline
+# (scripts/libllama.abi) and determines whether a major (signatures
+# removed/changed) or minor (signatures added) version bump is required.
+#
+# The baseline is generated from include/llama.h by scripts/gen-libllama-abi.py.
+# To update the baseline after an intentional ABI change:
+#
+#   python3 scripts/gen-libllama-abi.py include/llama.h > scripts/libllama.abi
+#
+# Then increment LLAMA_VERSION_MAJOR (breaking change) or LLAMA_VERSION_MINOR
+# (backwards-compatible addition) in CMakeLists.txt.
+
+on:
+  workflow_dispatch:
+  push:
+    branches:
+      - master
+    paths:
+      - 'include/llama.h'
+      - 'scripts/libllama.abi'
+      - 'scripts/gen-libllama-abi.py'
+      - 'CMakeLists.txt'
+      - '.github/workflows/libllama-abi-check.yml'
+
+  pull_request:
+    types: [opened, synchronize, reopened]
+    paths:
+      - 'include/llama.h'
+      - 'scripts/libllama.abi'
+      - 'scripts/gen-libllama-abi.py'
+      - 'CMakeLists.txt'
+      - '.github/workflows/libllama-abi-check.yml'
+
+concurrency:
+  group: ${{ github.workflow }}-${{ github.head_ref && github.ref || github.run_id }}
+  cancel-in-progress: true
+
+jobs:
+  abi-check:
+    runs-on: ubuntu-latest
+    permissions:
+      contents: read
+
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v6
+
+      - name: Extract current signatures
+        run: |
+          python3 scripts/gen-libllama-abi.py include/llama.h > /tmp/current.abi
+
+      - name: Compare with baseline
+        id: compare
+        run: |
+          baseline=scripts/libllama.abi
+          current=/tmp/current.abi
+
+          removed=$(LC_ALL=C comm -23 <(LC_ALL=C sort "$baseline") <(LC_ALL=C sort "$current"))
+          added=$(LC_ALL=C comm -13 <(LC_ALL=C sort "$baseline") <(LC_ALL=C sort "$current"))
+
+          if [ -n "$removed" ]; then
+            echo "bump=major" >> "$GITHUB_OUTPUT"
+            echo "### :boom: MAJOR version bump required" >> "$GITHUB_STEP_SUMMARY"
+            echo "" >> "$GITHUB_STEP_SUMMARY"
+            echo "The following exported signatures were **removed or changed** in libllama:" >> "$GITHUB_STEP_SUMMARY"
+            echo '```' >> "$GITHUB_STEP_SUMMARY"
+            echo "$removed" >> "$GITHUB_STEP_SUMMARY"
+            echo '```' >> "$GITHUB_STEP_SUMMARY"
+          elif [ -n "$added" ]; then
+            echo "bump=minor" >> "$GITHUB_OUTPUT"
+            echo "### :sparkles: MINOR version bump required" >> "$GITHUB_STEP_SUMMARY"
+            echo "" >> "$GITHUB_STEP_SUMMARY"
+            echo "The following new signatures were **added** to libllama:" >> "$GITHUB_STEP_SUMMARY"
+            echo '```' >> "$GITHUB_STEP_SUMMARY"
+            echo "$added" >> "$GITHUB_STEP_SUMMARY"
+            echo '```' >> "$GITHUB_STEP_SUMMARY"
+          else
+            echo "bump=patch" >> "$GITHUB_OUTPUT"
+            echo "### :white_check_mark: No ABI change – PATCH version bump only" >> "$GITHUB_STEP_SUMMARY"
+          fi
+
+          if [ -n "$removed" ] || [ -n "$added" ]; then
+            echo "" >> "$GITHUB_STEP_SUMMARY"
+            echo "Regenerate the baseline and bump the version:" >> "$GITHUB_STEP_SUMMARY"
+            echo '```' >> "$GITHUB_STEP_SUMMARY"
+            echo "python3 scripts/gen-libllama-abi.py include/llama.h > scripts/libllama.abi" >> "$GITHUB_STEP_SUMMARY"
+            echo '```' >> "$GITHUB_STEP_SUMMARY"
+            echo "Then increment \`LLAMA_VERSION_MAJOR\` (breaking) or \`LLAMA_VERSION_MINOR\` (additive) in \`CMakeLists.txt\`." >> "$GITHUB_STEP_SUMMARY"
+          fi
+
+      - name: Fail on unacknowledged ABI change
+        if: steps.compare.outputs.bump == 'major' || steps.compare.outputs.bump == 'minor'
+        run: |
+          echo "ABI change detected. Run: python3 scripts/gen-libllama-abi.py include/llama.h > scripts/libllama.abi"
+          echo "Then bump LLAMA_VERSION_MAJOR (breaking) or LLAMA_VERSION_MINOR (additive) in CMakeLists.txt."
+          exit 1
+
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -127,7 +127,13 @@ endif()
 if (NOT DEFINED LLAMA_BUILD_COMMIT)
    set(LLAMA_BUILD_COMMIT        ${BUILD_COMMIT})
 endif()
-set(LLAMA_INSTALL_VERSION 0.0.${LLAMA_BUILD_NUMBER})
+if (NOT DEFINED LLAMA_VERSION_MAJOR)
+    set(LLAMA_VERSION_MAJOR 0)
+endif()
+if (NOT DEFINED LLAMA_VERSION_MINOR)
+    set(LLAMA_VERSION_MINOR 0)
+endif()
+set(LLAMA_INSTALL_VERSION ${LLAMA_VERSION_MAJOR}.${LLAMA_VERSION_MINOR}.${LLAMA_BUILD_NUMBER})

 # override ggml options
 set(GGML_ALL_WARNINGS   ${LLAMA_ALL_WARNINGS})
--- a/scripts/gen-libllama-abi.py
+++ b/scripts/gen-libllama-abi.py
@@ -0,0 +1,174 @@
+#!/usr/bin/env python3
+"""Extract LLAMA_API function signatures from include/llama.h.
+
+Outputs one normalized signature per line, sorted alphabetically by function
+name.  The result is suitable for committing as scripts/libllama.abi and for
+diffing in CI to detect ABI changes.
+
+Usage:
+    python3 scripts/gen-libllama-abi.py [path/to/llama.h]
+"""
+
+import re
+import sys
+
+
+def preprocess(text: str) -> str:
+    """Strip comments and preprocessor directives, keeping newlines so that
+    line numbers still match the original header (character offsets are not
+    preserved, because removed text is replaced only by its newlines)."""
+
+    # Remove /* ... */ block comments (may span lines).
+    text = re.sub(r'/\*.*?\*/', lambda m: '\n' * m.group().count('\n'), text, flags=re.DOTALL)
+
+    # Remove // ... line comments (keep the newline).
+    text = re.sub(r'//[^\n]*', '', text)
+
+    # Remove preprocessor directive lines (lines where the first non-space
+    # char is '#').  Replace with blank lines to preserve line numbers.
+    lines = text.splitlines(keepends=True)
+    result = []
+    for line in lines:
+        if line.lstrip().startswith('#'):
+            result.append('\n' * line.count('\n'))
+        else:
+            result.append(line)
+    return ''.join(result)
+
+
+def normalize(s: str) -> str:
+    """Collapse all whitespace runs to a single space and strip edges."""
+    return re.sub(r'\s+', ' ', s).strip()
+
+
+def extract_signatures(header_text: str) -> list[str]:
+    """Return a sorted list of normalized LLAMA_API function signatures."""
+
+    text = preprocess(header_text)
+    sigs: list[str] = []
+
+    i = 0
+    n = len(text)
+
+    while i < n:
+        # Find the next LLAMA_API token.
+        pos = text.find('LLAMA_API', i)
+        if pos == -1:
+            break
+        i = pos + len('LLAMA_API')
+
+        # Skip leading whitespace after LLAMA_API.
+        while i < n and text[i] in ' \t\r\n':
+            i += 1
+
+        # Determine whether we are inside DEPRECATED(...).
+        #
+        # Case A: DEPRECATED(LLAMA_API ..., "hint");
+        #   – look back before the LLAMA_API token for 'DEPRECATED('
+        # Case B: LLAMA_API DEPRECATED(return_type func(...), "hint");
+        #   – look forward for 'DEPRECATED('
+
+        # Case A: look back (skip whitespace) for 'DEPRECATED('
+        before = text[:pos].rstrip()
+        in_deprecated_wrap = before.endswith('DEPRECATED(')
+
+        if in_deprecated_wrap:
+            # We are inside the argument list of DEPRECATED(LLAMA_API ..., "hint");
+            # Collect everything until the matching ')' that closes DEPRECATED,
+            # then strip the trailing , "hint" part.
+            depth = 1   # we just entered DEPRECATED(
+            start = i   # start of "return_type func_name(..."
+            j = i
+            while j < n and depth > 0:
+                if text[j] == '(':
+                    depth += 1
+                elif text[j] == ')':
+                    depth -= 1
+                j += 1
+            # text[start:j-1] is everything inside DEPRECATED(...).
+            # We need the function signature part, which ends at the last
+            # top-level comma (separating the function from the "hint" string).
+            inner = text[start:j - 1]
+            # Find the last top-level comma.
+            depth2 = 0
+            last_comma = -1
+            for k, ch in enumerate(inner):
+                if ch == '(':
+                    depth2 += 1
+                elif ch == ')':
+                    depth2 -= 1
+                elif ch == ',' and depth2 == 0:
+                    last_comma = k
+            sig_raw = inner[:last_comma] if last_comma != -1 else inner
+            i = j
+        elif text[i:i + len('DEPRECATED(')] == 'DEPRECATED(':
+            # Case B: LLAMA_API DEPRECATED(return_type func(...), "hint");
+            i += len('DEPRECATED(')
+            depth = 1
+            start = i
+            j = i
+            while j < n and depth > 0:
+                if text[j] == '(':
+                    depth += 1
+                elif text[j] == ')':
+                    depth -= 1
+                j += 1
+            inner = text[start:j - 1]
+            depth2 = 0
+            last_comma = -1
+            for k, ch in enumerate(inner):
+                if ch == '(':
+                    depth2 += 1
+                elif ch == ')':
+                    depth2 -= 1
+                elif ch == ',' and depth2 == 0:
+                    last_comma = k
+            sig_raw = inner[:last_comma] if last_comma != -1 else inner
+            i = j
+        else:
+            # Plain: LLAMA_API return_type func_name(...);
+            # Collect until the ';' at parenthesis depth 0.
+            depth = 0
+            start = i
+            j = i
+            while j < n:
+                if text[j] == '(':
+                    depth += 1
+                elif text[j] == ')':
+                    depth -= 1
+                    if depth == 0:
+                        j += 1
+                        break
+                elif text[j] == ';' and depth == 0:
+                    break
+                j += 1
+            sig_raw = text[start:j]
+            # Advance past the ';'
+            i = j
+            while i < n and text[i] in ' \t\r\n;':
+                i += 1
+
+        sig = normalize(sig_raw)
+        if sig and '(' in sig:
+            sigs.append(sig)
+
+    _name_re = re.compile(r'\b(llama_\w+)\s*\(')
+
+    def _sort_key(s: str) -> str:
+        m = _name_re.search(s)
+        return m.group(1) if m else s
+
+    sigs.sort(key=_sort_key)
+    return sigs
+
+
+def main() -> None:
+    header_path = sys.argv[1] if len(sys.argv) > 1 else 'include/llama.h'
+    with open(header_path, encoding='utf-8') as f:
+        text = f.read()
+    for sig in extract_signatures(text):
+        print(sig)
+
+
+if __name__ == '__main__':
+    main()
--- a/scripts/libllama.abi
+++ b/scripts/libllama.abi
@@ -0,0 +1,233 @@
+const llama_token * llama_adapter_get_alora_invocation_tokens (const struct llama_adapter_lora * adapter)
+uint64_t llama_adapter_get_alora_n_invocation_tokens(const struct llama_adapter_lora * adapter)
+void llama_adapter_lora_free(struct llama_adapter_lora * adapter)
+struct llama_adapter_lora * llama_adapter_lora_init( struct llama_model * model, const char * path_lora)
+int32_t llama_adapter_meta_count(const struct llama_adapter_lora * adapter)
+int32_t llama_adapter_meta_key_by_index(const struct llama_adapter_lora * adapter, int32_t i, char * buf, size_t buf_size)
+int32_t llama_adapter_meta_val_str(const struct llama_adapter_lora * adapter, const char * key, char * buf, size_t buf_size)
+int32_t llama_adapter_meta_val_str_by_index(const struct llama_adapter_lora * adapter, int32_t i, char * buf, size_t buf_size)
+bool llama_add_bos_token(const struct llama_vocab * vocab)
+bool llama_add_eos_token(const struct llama_vocab * vocab)
+void llama_attach_threadpool( struct llama_context * ctx, ggml_threadpool_t threadpool, ggml_threadpool_t threadpool_batch)
+void llama_backend_free(void)
+void llama_backend_init(void)
+void llama_batch_free(struct llama_batch batch)
+struct llama_batch llama_batch_get_one( llama_token * tokens, int32_t n_tokens)
+struct llama_batch llama_batch_init( int32_t n_tokens, int32_t embd, int32_t n_seq_max)
+int32_t llama_chat_apply_template( const char * tmpl, const struct llama_chat_message * chat, size_t n_msg, bool add_ass, char * buf, int32_t length)
+int32_t llama_chat_builtin_templates(const char ** output, size_t len)
+struct llama_context_params llama_context_default_params(void)
+size_t llama_copy_state_data( struct llama_context * ctx, uint8_t * dst)
+int32_t llama_decode( struct llama_context * ctx, struct llama_batch batch)
+void llama_detach_threadpool(struct llama_context * ctx)
+int32_t llama_detokenize( const struct llama_vocab * vocab, const llama_token * tokens, int32_t n_tokens, char * text, int32_t text_len_max, bool remove_special, bool unparse_special)
+int32_t llama_encode( struct llama_context * ctx, struct llama_batch batch)
+const char * llama_flash_attn_type_name(enum llama_flash_attn_type flash_attn_type)
+void llama_free(struct llama_context * ctx)
+void llama_free_model(struct llama_model * model)
+float * llama_get_embeddings(struct llama_context * ctx)
+float * llama_get_embeddings_ith(struct llama_context * ctx, int32_t i)
+float * llama_get_embeddings_seq(struct llama_context * ctx, llama_seq_id seq_id)
+float * llama_get_logits(struct llama_context * ctx)
+float * llama_get_logits_ith(struct llama_context * ctx, int32_t i)
+llama_memory_t llama_get_memory (const struct llama_context * ctx)
+const struct llama_model * llama_get_model (const struct llama_context * ctx)
+uint32_t llama_get_sampled_candidates_count_ith(struct llama_context * ctx, int32_t i)
+llama_token * llama_get_sampled_candidates_ith (struct llama_context * ctx, int32_t i)
+uint32_t llama_get_sampled_logits_count_ith(struct llama_context * ctx, int32_t i)
+float * llama_get_sampled_logits_ith (struct llama_context * ctx, int32_t i)
+uint32_t llama_get_sampled_probs_count_ith(struct llama_context * ctx, int32_t i)
+float * llama_get_sampled_probs_ith (struct llama_context * ctx, int32_t i)
+llama_token llama_get_sampled_token_ith(struct llama_context * ctx, int32_t i)
+size_t llama_get_state_size(struct llama_context * ctx)
+struct llama_context * llama_init_from_model( struct llama_model * model, struct llama_context_params params)
+struct llama_model * llama_load_model_from_file( const char * path_model, struct llama_model_params params)
+bool llama_load_session_file( struct llama_context * ctx, const char * path_session, llama_token * tokens_out, size_t n_token_capacity, size_t * n_token_count_out)
+void llama_log_get(ggml_log_callback * log_callback, void ** user_data)
+void llama_log_set(ggml_log_callback log_callback, void * user_data)
+size_t llama_max_devices(void)
+size_t llama_max_parallel_sequences(void)
+size_t llama_max_tensor_buft_overrides(void)
+void llama_memory_breakdown_print(const struct llama_context * ctx)
+bool llama_memory_can_shift(llama_memory_t mem)
+void llama_memory_clear( llama_memory_t mem, bool data)
+void llama_memory_seq_add( llama_memory_t mem, llama_seq_id seq_id, llama_pos p0, llama_pos p1, llama_pos delta)
+void llama_memory_seq_cp( llama_memory_t mem, llama_seq_id seq_id_src, llama_seq_id seq_id_dst, llama_pos p0, llama_pos p1)
+void llama_memory_seq_div( llama_memory_t mem, llama_seq_id seq_id, llama_pos p0, llama_pos p1, int d)
+void llama_memory_seq_keep( llama_memory_t mem, llama_seq_id seq_id)
+llama_pos llama_memory_seq_pos_max( llama_memory_t mem, llama_seq_id seq_id)
+llama_pos llama_memory_seq_pos_min( llama_memory_t mem, llama_seq_id seq_id)
+bool llama_memory_seq_rm( llama_memory_t mem, llama_seq_id seq_id, llama_pos p0, llama_pos p1)
+const char * llama_model_chat_template(const struct llama_model * model, const char * name)
+const char * llama_model_cls_label(const struct llama_model * model, uint32_t i)
+llama_token llama_model_decoder_start_token(const struct llama_model * model)
+struct llama_model_params llama_model_default_params(void)
+int32_t llama_model_desc(const struct llama_model * model, char * buf, size_t buf_size)
+void llama_model_free(struct llama_model * model)
+const struct llama_vocab * llama_model_get_vocab(const struct llama_model * model)
+bool llama_model_has_decoder(const struct llama_model * model)
+bool llama_model_has_encoder(const struct llama_model * model)
+struct llama_model * llama_model_init_from_user( struct gguf_context * metadata, llama_model_set_tensor_data_t set_tensor_data, void * set_tensor_data_ud, struct llama_model_params params)
+bool llama_model_is_diffusion(const struct llama_model * model)
+bool llama_model_is_hybrid(const struct llama_model * model)
+bool llama_model_is_recurrent(const struct llama_model * model)
+struct llama_model * llama_model_load_from_file( const char * path_model, struct llama_model_params params)
+struct llama_model * llama_model_load_from_file_ptr( FILE * file, struct llama_model_params params)
+struct llama_model * llama_model_load_from_splits( const char ** paths, size_t n_paths, struct llama_model_params params)
+int32_t llama_model_meta_count(const struct llama_model * model)
+int32_t llama_model_meta_key_by_index(const struct llama_model * model, int32_t i, char * buf, size_t buf_size)
+const char * llama_model_meta_key_str(enum llama_model_meta_key key)
+int32_t llama_model_meta_val_str(const struct llama_model * model, const char * key, char * buf, size_t buf_size)
+int32_t llama_model_meta_val_str_by_index(const struct llama_model * model, int32_t i, char * buf, size_t buf_size)
+uint32_t llama_model_n_cls_out(const struct llama_model * model)
+int32_t llama_model_n_ctx_train(const struct llama_model * model)
+int32_t llama_model_n_embd (const struct llama_model * model)
+int32_t llama_model_n_embd_inp (const struct llama_model * model)
+int32_t llama_model_n_embd_out (const struct llama_model * model)
+int32_t llama_model_n_head (const struct llama_model * model)
+int32_t llama_model_n_head_kv (const struct llama_model * model)
+int32_t llama_model_n_layer (const struct llama_model * model)
+uint64_t llama_model_n_params(const struct llama_model * model)
+int32_t llama_model_n_swa (const struct llama_model * model)
+uint32_t llama_model_quantize( const char * fname_inp, const char * fname_out, const llama_model_quantize_params * params)
+struct llama_model_quantize_params llama_model_quantize_default_params(void)
+float llama_model_rope_freq_scale_train(const struct llama_model * model)
+enum llama_rope_type llama_model_rope_type(const struct llama_model * model)
+void llama_model_save_to_file( const struct llama_model * model, const char * path_model)
+uint64_t llama_model_size(const struct llama_model * model)
+uint32_t llama_n_batch (const struct llama_context * ctx)
+uint32_t llama_n_ctx (const struct llama_context * ctx)
+uint32_t llama_n_ctx_seq (const struct llama_context * ctx)
+int32_t llama_n_ctx_train(const struct llama_model * model)
+int32_t llama_n_embd (const struct llama_model * model)
+int32_t llama_n_head (const struct llama_model * model)
+int32_t llama_n_layer (const struct llama_model * model)
+uint32_t llama_n_seq_max (const struct llama_context * ctx)
+int32_t llama_n_threads(struct llama_context * ctx)
+int32_t llama_n_threads_batch(struct llama_context * ctx)
+uint32_t llama_n_ubatch (const struct llama_context * ctx)
+int32_t llama_n_vocab (const struct llama_vocab * vocab)
+struct llama_context * llama_new_context_with_model( struct llama_model * model, struct llama_context_params params)
+void llama_numa_init(enum ggml_numa_strategy numa)
+void llama_opt_epoch( struct llama_context * lctx, ggml_opt_dataset_t dataset, ggml_opt_result_t result_train, ggml_opt_result_t result_eval, int64_t idata_split, ggml_opt_epoch_callback callback_train, ggml_opt_epoch_callback callback_eval)
+void llama_opt_init(struct llama_context * lctx, struct llama_model * model, struct llama_opt_params lopt_params)
+bool llama_opt_param_filter_all(const struct ggml_tensor * tensor, void * userdata)
+enum llama_params_fit_status llama_params_fit( const char * path_model, struct llama_model_params * mparams, struct llama_context_params * cparams, float * tensor_split, struct llama_model_tensor_buft_override * tensor_buft_overrides, size_t * margins, uint32_t n_ctx_min, enum ggml_log_level log_level)
+struct llama_perf_context_data llama_perf_context (const struct llama_context * ctx)
+void llama_perf_context_print(const struct llama_context * ctx)
+void llama_perf_context_reset( struct llama_context * ctx)
+struct llama_perf_sampler_data llama_perf_sampler (const struct llama_sampler * chain)
+void llama_perf_sampler_print(const struct llama_sampler * chain)
+void llama_perf_sampler_reset( struct llama_sampler * chain)
+enum llama_pooling_type llama_pooling_type(const struct llama_context * ctx)
+const char * llama_print_system_info(void)
+void llama_sampler_accept( struct llama_sampler * smpl, llama_token token)
+void llama_sampler_apply ( struct llama_sampler * smpl, llama_token_data_array * cur_p)
+void llama_sampler_chain_add( struct llama_sampler * chain, struct llama_sampler * smpl)
+struct llama_sampler_chain_params llama_sampler_chain_default_params(void)
+struct llama_sampler * llama_sampler_chain_get( struct llama_sampler * chain, int32_t i)
+struct llama_sampler * llama_sampler_chain_init(struct llama_sampler_chain_params params)
+int llama_sampler_chain_n (const struct llama_sampler * chain)
+struct llama_sampler * llama_sampler_chain_remove( struct llama_sampler * chain, int32_t i)
+struct llama_sampler * llama_sampler_clone (const struct llama_sampler * smpl)
+void llama_sampler_free ( struct llama_sampler * smpl)
+uint32_t llama_sampler_get_seed(const struct llama_sampler * smpl)
+struct llama_sampler * llama_sampler_init ( struct llama_sampler_i * iface, llama_sampler_context_t ctx)
+struct llama_sampler * llama_sampler_init_adaptive_p( float target, float decay, uint32_t seed)
+struct llama_sampler * llama_sampler_init_dist(uint32_t seed)
+struct llama_sampler * llama_sampler_init_dry( const struct llama_vocab * vocab, int32_t n_ctx_train, float dry_multiplier, float dry_base, int32_t dry_allowed_length, int32_t dry_penalty_last_n, const char ** seq_breakers, size_t num_breakers)
+struct llama_sampler * llama_sampler_init_grammar( const struct llama_vocab * vocab, const char * grammar_str, const char * grammar_root)
+struct llama_sampler * llama_sampler_init_grammar_lazy( const struct llama_vocab * vocab, const char * grammar_str, const char * grammar_root, const char ** trigger_words, size_t num_trigger_words, const llama_token * trigger_tokens, size_t num_trigger_tokens)
+struct llama_sampler * llama_sampler_init_grammar_lazy_patterns( const struct llama_vocab * vocab, const char * grammar_str, const char * grammar_root, const char ** trigger_patterns, size_t num_trigger_patterns, const llama_token * trigger_tokens, size_t num_trigger_tokens)
+struct llama_sampler * llama_sampler_init_greedy(void)
+struct llama_sampler * llama_sampler_init_infill(const struct llama_vocab * vocab)
+struct llama_sampler * llama_sampler_init_logit_bias( int32_t n_vocab, int32_t n_logit_bias, const llama_logit_bias * logit_bias)
+struct llama_sampler * llama_sampler_init_min_p (float p, size_t min_keep)
+struct llama_sampler * llama_sampler_init_mirostat( int32_t n_vocab, uint32_t seed, float tau, float eta, int32_t m)
+struct llama_sampler * llama_sampler_init_mirostat_v2( uint32_t seed, float tau, float eta)
+struct llama_sampler * llama_sampler_init_penalties( int32_t penalty_last_n, float penalty_repeat, float penalty_freq, float penalty_present)
+struct llama_sampler * llama_sampler_init_temp (float t)
+struct llama_sampler * llama_sampler_init_temp_ext (float t, float delta, float exponent)
+struct llama_sampler * llama_sampler_init_top_k (int32_t k)
+struct llama_sampler * llama_sampler_init_top_n_sigma(float n)
+struct llama_sampler * llama_sampler_init_top_p (float p, size_t min_keep)
+struct llama_sampler * llama_sampler_init_typical (float p, size_t min_keep)
+struct llama_sampler * llama_sampler_init_xtc (float p, float t, size_t min_keep, uint32_t seed)
+const char * llama_sampler_name (const struct llama_sampler * smpl)
+void llama_sampler_reset ( struct llama_sampler * smpl)
+llama_token llama_sampler_sample(struct llama_sampler * smpl, struct llama_context * ctx, int32_t idx)
+bool llama_save_session_file( struct llama_context * ctx, const char * path_session, const llama_token * tokens, size_t n_token_count)
+void llama_set_abort_callback(struct llama_context * ctx, ggml_abort_callback abort_callback, void * abort_callback_data)
+int32_t llama_set_adapter_cvec( struct llama_context * ctx, const float * data, size_t len, int32_t n_embd, int32_t il_start, int32_t il_end)
+int32_t llama_set_adapters_lora( struct llama_context * ctx, struct llama_adapter_lora ** adapters, size_t n_adapters, float * scales)
+void llama_set_causal_attn(struct llama_context * ctx, bool causal_attn)
+void llama_set_embeddings(struct llama_context * ctx, bool embeddings)
+void llama_set_n_threads(struct llama_context * ctx, int32_t n_threads, int32_t n_threads_batch)
+bool llama_set_sampler(struct llama_context * ctx, llama_seq_id seq_id, struct llama_sampler * smpl)
+size_t llama_set_state_data( struct llama_context * ctx, const uint8_t * src)
+void llama_set_warmup(struct llama_context * ctx, bool warmup)
+int32_t llama_split_path(char * split_path, size_t maxlen, const char * path_prefix, int32_t split_no, int32_t split_count)
+int32_t llama_split_prefix(char * split_prefix, size_t maxlen, const char * split_path, int32_t split_no, int32_t split_count)
+size_t llama_state_get_data( struct llama_context * ctx, uint8_t * dst, size_t size)
+size_t llama_state_get_size(struct llama_context * ctx)
+bool llama_state_load_file( struct llama_context * ctx, const char * path_session, llama_token * tokens_out, size_t n_token_capacity, size_t * n_token_count_out)
+bool llama_state_save_file( struct llama_context * ctx, const char * path_session, const llama_token * tokens, size_t n_token_count)
+size_t llama_state_seq_get_data( struct llama_context * ctx, uint8_t * dst, size_t size, llama_seq_id seq_id)
+size_t llama_state_seq_get_data_ext( struct llama_context * ctx, uint8_t * dst, size_t size, llama_seq_id seq_id, llama_state_seq_flags flags)
+size_t llama_state_seq_get_size( struct llama_context * ctx, llama_seq_id seq_id)
+size_t llama_state_seq_get_size_ext( struct llama_context * ctx, llama_seq_id seq_id, llama_state_seq_flags flags)
+size_t llama_state_seq_load_file( struct llama_context * ctx, const char * filepath, llama_seq_id dest_seq_id, llama_token * tokens_out, size_t n_token_capacity, size_t * n_token_count_out)
+size_t llama_state_seq_save_file( struct llama_context * ctx, const char * filepath, llama_seq_id seq_id, const llama_token * tokens, size_t n_token_count)
+size_t llama_state_seq_set_data( struct llama_context * ctx, const uint8_t * src, size_t size, llama_seq_id dest_seq_id)
+size_t llama_state_seq_set_data_ext( struct llama_context * ctx, const uint8_t * src, size_t size, llama_seq_id dest_seq_id, llama_state_seq_flags flags)
+size_t llama_state_set_data( struct llama_context * ctx, const uint8_t * src, size_t size)
+bool llama_supports_gpu_offload(void)
+bool llama_supports_mlock (void)
+bool llama_supports_mmap (void)
+bool llama_supports_rpc (void)
+void llama_synchronize(struct llama_context * ctx)
+int64_t llama_time_us(void)
+llama_token llama_token_bos(const struct llama_vocab * vocab)
+llama_token llama_token_cls(const struct llama_vocab * vocab)
+llama_token llama_token_eos(const struct llama_vocab * vocab)
+llama_token llama_token_eot(const struct llama_vocab * vocab)
+llama_token llama_token_fim_mid(const struct llama_vocab * vocab)
+llama_token llama_token_fim_pad(const struct llama_vocab * vocab)
+llama_token llama_token_fim_pre(const struct llama_vocab * vocab)
+llama_token llama_token_fim_rep(const struct llama_vocab * vocab)
+llama_token llama_token_fim_sep(const struct llama_vocab * vocab)
+llama_token llama_token_fim_suf(const struct llama_vocab * vocab)
+enum llama_token_attr llama_token_get_attr(const struct llama_vocab * vocab, llama_token token)
+float llama_token_get_score(const struct llama_vocab * vocab, llama_token token)
+const char * llama_token_get_text(const struct llama_vocab * vocab, llama_token token)
+bool llama_token_is_control(const struct llama_vocab * vocab, llama_token token)
+bool llama_token_is_eog(const struct llama_vocab * vocab, llama_token token)
+llama_token llama_token_nl (const struct llama_vocab * vocab)
+llama_token llama_token_pad(const struct llama_vocab * vocab)
+llama_token llama_token_sep(const struct llama_vocab * vocab)
+int32_t llama_token_to_piece( const struct llama_vocab * vocab, llama_token token, char * buf, int32_t length, int32_t lstrip, bool special)
+int32_t llama_tokenize( const struct llama_vocab * vocab, const char * text, int32_t text_len, llama_token * tokens, int32_t n_tokens_max, bool add_special, bool parse_special)
+llama_token llama_vocab_bos(const struct llama_vocab * vocab)
+llama_token llama_vocab_cls(const struct llama_vocab * vocab)
+llama_token llama_vocab_eos(const struct llama_vocab * vocab)
+llama_token llama_vocab_eot(const struct llama_vocab * vocab)
+llama_token llama_vocab_fim_mid(const struct llama_vocab * vocab)
+llama_token llama_vocab_fim_pad(const struct llama_vocab * vocab)
+llama_token llama_vocab_fim_pre(const struct llama_vocab * vocab)
+llama_token llama_vocab_fim_rep(const struct llama_vocab * vocab)
+llama_token llama_vocab_fim_sep(const struct llama_vocab * vocab)
+llama_token llama_vocab_fim_suf(const struct llama_vocab * vocab)
+bool llama_vocab_get_add_bos(const struct llama_vocab * vocab)
+bool llama_vocab_get_add_eos(const struct llama_vocab * vocab)
+bool llama_vocab_get_add_sep(const struct llama_vocab * vocab)
+enum llama_token_attr llama_vocab_get_attr(const struct llama_vocab * vocab, llama_token token)
+float llama_vocab_get_score(const struct llama_vocab * vocab, llama_token token)
+const char * llama_vocab_get_text(const struct llama_vocab * vocab, llama_token token)
+bool llama_vocab_is_control(const struct llama_vocab * vocab, llama_token token)
+bool llama_vocab_is_eog(const struct llama_vocab * vocab, llama_token token)
+llama_token llama_vocab_mask(const struct llama_vocab * vocab)
+int32_t llama_vocab_n_tokens(const struct llama_vocab * vocab)
+llama_token llama_vocab_nl (const struct llama_vocab * vocab)
+llama_token llama_vocab_pad(const struct llama_vocab * vocab)
+llama_token llama_vocab_sep(const struct llama_vocab * vocab)
+enum llama_vocab_type llama_vocab_type(const struct llama_vocab * vocab)
--- a/src/CMakeLists.txt
+++ b/src/CMakeLists.txt
@@ -153,7 +153,7 @@ add_library(llama

 set_target_properties(llama PROPERTIES
    VERSION ${LLAMA_INSTALL_VERSION}
-    SOVERSION 0
+    SOVERSION ${LLAMA_VERSION_MAJOR}
    MACHO_CURRENT_VERSION 0 # keep macOS linker from seeing oversized version number
 )
commit 4943e3a396
Author: copilot-swe-agent[bot]
Date:   2026-04-15 12:04:44 +00:00

    gen-libllama-abi: compile sort-key regex once outside the lambda

    Agent-Logs-Url: https://github.com/ggml-org/llama.cpp/sessions/cd21903e-afd2-477a-8285-0a2d46e1398c
    Co-authored-by: ggerganov <1991296+ggerganov@users.noreply.github.com>

commit 51b679a5d6
Author: copilot-swe-agent[bot]
Date:   2026-04-15 12:02:36 +00:00

    semver: revert llama_export.h, fix ABI baseline to track full signatures

    - Revert include/llama.h to use the original manual LLAMA_API visibility
      macro block (LLAMA_SHARED / LLAMA_BUILD)
    - Revert src/CMakeLists.txt: remove GenerateExportHeader, restore
      LLAMA_BUILD/LLAMA_SHARED compile definitions and original
      target_include_directories
    - Revert CMakeLists.txt: remove llama_export.h from LLAMA_PUBLIC_HEADERS
    - Add scripts/gen-libllama-abi.py: Python parser that reads include/llama.h
      and extracts normalized full LLAMA_API function signatures (return type +
      name + parameter list), handling both plain and DEPRECATED() patterns
    - Regenerate scripts/libllama.abi with full signatures (233 entries)
    - Update .github/workflows/libllama-abi-check.yml to use the header parser
      script instead of building the library and running nm; the check now runs
      in seconds with no compiler dependency

    Agent-Logs-Url: https://github.com/ggml-org/llama.cpp/sessions/cd21903e-afd2-477a-8285-0a2d46e1398c
    Co-authored-by: ggerganov <1991296+ggerganov@users.noreply.github.com>

commit c00ac13fee
Author: copilot-swe-agent[bot]
Date:   2026-04-15 11:45:14 +00:00

    libllama-abi-check: add explicit read-only permissions to workflow job

    Agent-Logs-Url: https://github.com/ggml-org/llama.cpp/sessions/e9059c50-ffff-4233-a16d-13a7214f7b98
    Co-authored-by: ggerganov <1991296+ggerganov@users.noreply.github.com>

commit 3f3d62ffec
Author: copilot-swe-agent[bot]
Date:   2026-04-15 11:44:00 +00:00

    semver: add proper semantic versioning and ABI check workflow for libllama

    - Add LLAMA_VERSION_MAJOR/MINOR variables to CMakeLists.txt (both default 0)
      replacing the hard-coded 0.0.{build_number} scheme
    - Use GenerateExportHeader in src/CMakeLists.txt to generate llama_export.h
      and replace the manual LLAMA_API visibility macro dance in include/llama.h
    - Set SOVERSION to LLAMA_VERSION_MAJOR so the .so symlink tracks the major
      ABI version (libllama.so.0 -> libllama.so.0.MINOR.PATCH)
    - Install the generated llama_export.h alongside llama.h as a public header
    - Add scripts/libllama.abi: committed baseline of exported llama_* symbols
      (233 symbols extracted from the current build)
    - Add .github/workflows/libllama-abi-check.yml: CI workflow that builds
      libllama, extracts symbols with nm, and compares against the baseline to
      determine whether a MAJOR (symbols removed) or MINOR (symbols added)
      version bump is required

    Agent-Logs-Url: https://github.com/ggml-org/llama.cpp/sessions/e9059c50-ffff-4233-a16d-13a7214f7b98
    Co-authored-by: ggerganov <1991296+ggerganov@users.noreply.github.com>
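
Note on the bump rule: the workflow decides the required bump from two line sets. A signature present in the baseline but missing from the current header means MAJOR, a signature only in the current header means MINOR, and no difference means PATCH. The sketch below restates that rule in Python for local experimentation. It is not part of this change: `classify_bump` is a hypothetical helper name, and the `bool force` parameter in the usage example is invented purely to show a changed signature.

```python
from typing import List, Tuple


def classify_bump(baseline: List[str], current: List[str]) -> Tuple[str, List[str], List[str]]:
    """Classify the version bump implied by two signature lists.

    Hypothetical helper mirroring the workflow's comparison step:
    removed or changed signatures -> "major", additions only -> "minor",
    identical sets -> "patch".
    """
    baseline_set = set(baseline)
    current_set = set(current)

    # Signatures that existed before but are gone (or altered) now.
    removed = sorted(baseline_set - current_set)
    # Signatures that are new (or the altered form of an old one).
    added = sorted(current_set - baseline_set)

    if removed:
        bump = "major"
    elif added:
        bump = "minor"
    else:
        bump = "patch"
    return bump, removed, added


if __name__ == "__main__":
    old = [
        "void llama_backend_init(void)",
        "void llama_free(struct llama_context * ctx)",
    ]
    # 'bool force' is an invented parameter, used only for illustration.
    new = [
        "void llama_backend_init(void)",
        "void llama_free(struct llama_context * ctx, bool force)",
    ]
    bump, removed, added = classify_bump(old, new)
    print(bump, removed, added)
```

Because the comparison is set-based, it does not depend on line order. A changed signature shows up as one removed line plus one added line, which is why it is classified as MAJOR rather than MINOR.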