Mirror of https://github.com/ggml-org/llama.cpp.git, synced 2026-03-17 16:44:07 +00:00.
docs : Minor cleanups (#19252)

* Update old URLs to github.com/ggml-org/
* Bump copyrights
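Both cleanups are mechanical substitutions over the tree. A minimal sketch of the URL rewrite (illustrative only; `migrate` is a hypothetical helper, and the commit itself does not record the tooling used):

```python
OLD = "github.com/ggerganov/"
NEW = "github.com/ggml-org/"

def migrate(text: str) -> str:
    """Rewrite links from the old personal namespace to the ggml-org organization."""
    # Only the organization segment changes, so a plain substring
    # replacement covers every affected URL.
    return text.replace(OLD, NEW)

sample = "See https://github.com/ggerganov/llama.cpp/blob/master/docs/build.md"
print(migrate(sample))
# -> See https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md
```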
Commit 7a4ca3cbd9 (parent b4d05a3d2f), committed via GitHub.
LICENSE
@@ -1,6 +1,6 @@
 MIT License

-Copyright (c) 2023-2024 The ggml authors
+Copyright (c) 2023-2026 The ggml authors

 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
@@ -9,7 +9,7 @@ Download [MiniCPM-o-2_6](https://huggingface.co/openbmb/MiniCPM-o-2_6) PyTorch m
 ### Build llama.cpp
 Readme modification time: 20250206

-If there are differences in usage, please refer to the official build [documentation](https://github.com/ggerganov/llama.cpp/blob/master/docs/build.md)
+If there are differences in usage, please refer to the official build [documentation](https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md)

 Clone llama.cpp:
 ```bash
@@ -8,11 +8,11 @@ Download [MiniCPM-o-4](https://huggingface.co/openbmb/MiniCPM-o-4) PyTorch model
 ### Build llama.cpp
 Readme modification time: 20250206

-If there are differences in usage, please refer to the official build [documentation](https://github.com/ggerganov/llama.cpp/blob/master/docs/build.md)
+If there are differences in usage, please refer to the official build [documentation](https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md)

 Clone llama.cpp:
 ```bash
-git clone https://github.com/ggerganov/llama.cpp
+git clone https://github.com/ggml-org/llama.cpp
 cd llama.cpp
 ```

@@ -8,7 +8,7 @@ Download [MiniCPM-Llama3-V-2_5](https://huggingface.co/openbmb/MiniCPM-Llama3-V-
 ### Build llama.cpp
 Readme modification time: 20250206

-If there are differences in usage, please refer to the official build [documentation](https://github.com/ggerganov/llama.cpp/blob/master/docs/build.md)
+If there are differences in usage, please refer to the official build [documentation](https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md)

 Clone llama.cpp:
 ```bash
@@ -8,7 +8,7 @@ Download [MiniCPM-V-2_6](https://huggingface.co/openbmb/MiniCPM-V-2_6) PyTorch m
 ### Build llama.cpp
 Readme modification time: 20250206

-If there are differences in usage, please refer to the official build [documentation](https://github.com/ggerganov/llama.cpp/blob/master/docs/build.md)
+If there are differences in usage, please refer to the official build [documentation](https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md)

 Clone llama.cpp:
 ```bash
@@ -8,11 +8,11 @@ Download [MiniCPM-V-4](https://huggingface.co/openbmb/MiniCPM-V-4) PyTorch model
 ### Build llama.cpp
 Readme modification time: 20250731

-If there are differences in usage, please refer to the official build [documentation](https://github.com/ggerganov/llama.cpp/blob/master/docs/build.md)
+If there are differences in usage, please refer to the official build [documentation](https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md)

 Clone llama.cpp:
 ```bash
-git clone https://github.com/ggerganov/llama.cpp
+git clone https://github.com/ggml-org/llama.cpp
 cd llama.cpp
 ```

@@ -8,11 +8,11 @@ Download [MiniCPM-V-4_5](https://huggingface.co/openbmb/MiniCPM-V-4_5) PyTorch m
 ### Build llama.cpp
 Readme modification time: 20250826

-If there are differences in usage, please refer to the official build [documentation](https://github.com/ggerganov/llama.cpp/blob/master/docs/build.md)
+If there are differences in usage, please refer to the official build [documentation](https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md)

 Clone llama.cpp:
 ```bash
-git clone https://github.com/ggerganov/llama.cpp
+git clone https://github.com/ggml-org/llama.cpp
 cd llama.cpp
 ```

@@ -1,7 +1,7 @@
 # Migration notice for binary filenames

 > [!IMPORTANT]
-[2024 Jun 12] Binaries have been renamed w/ a `llama-` prefix. `main` is now `llama-cli`, `server` is `llama-server`, etc (https://github.com/ggerganov/llama.cpp/pull/7809)
+[2024 Jun 12] Binaries have been renamed w/ a `llama-` prefix. `main` is now `llama-cli`, `server` is `llama-server`, etc (https://github.com/ggml-org/llama.cpp/pull/7809)

 This migration was important, but it is a breaking change that may not always be immediately obvious to users.

@@ -28,7 +28,7 @@ int main(int argc, char** argv) {
 fprintf(stdout, "\n");
 fprintf(stdout, "WARNING: The binary '%s' is deprecated.\n", filename.c_str());
 fprintf(stdout, " Please use '%s' instead.\n", replacement_filename.c_str());
-fprintf(stdout, " See https://github.com/ggerganov/llama.cpp/tree/master/examples/deprecation-warning/README.md for more information.\n");
+fprintf(stdout, " See https://github.com/ggml-org/llama.cpp/tree/master/examples/deprecation-warning/README.md for more information.\n");
 fprintf(stdout, "\n");

 return EXIT_FAILURE;
@@ -402,7 +402,7 @@ class SchemaConverter:
 Transforms a regular expression pattern into a GBNF rule.

 Input: https://json-schema.org/understanding-json-schema/reference/regular_expressions
-Output: https://github.com/ggerganov/llama.cpp/blob/master/grammars/README.md
+Output: https://github.com/ggml-org/llama.cpp/blob/master/grammars/README.md

 Unsupported features: negative/positive lookaheads, greedy/non-greedy modifiers.

@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2023-2024 The ggml authors
+ * Copyright (c) 2023-2026 The ggml authors
 *
 * Permission is hereby granted, free of charge, to any person obtaining a copy
 * of this software and associated documentation files (the "Software"), to
@@ -6,7 +6,7 @@
 // This documentation is still a work in progress.
 // If you wish some specific topics to be covered, feel free to drop a comment:
 //
-// https://github.com/ggerganov/whisper.cpp/issues/40
+// https://github.com/ggml-org/whisper.cpp/issues/40
 //
 // ## Overview
 //
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2023-2024 The ggml authors
+ * Copyright (c) 2023-2026 The ggml authors
 *
 * Permission is hereby granted, free of charge, to any person obtaining a copy
 * of this software and associated documentation files (the "Software"), to
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2023-2024 The ggml authors
+ * Copyright (c) 2023-2026 The ggml authors
 *
 * Permission is hereby granted, free of charge, to any person obtaining a copy
 * of this software and associated documentation files (the "Software"), to
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2023-2024 The ggml authors
+ * Copyright (c) 2023-2026 The ggml authors
 *
 * Permission is hereby granted, free of charge, to any person obtaining a copy
 * of this software and associated documentation files (the "Software"), to
@@ -1,5 +1,5 @@
 /**
- * Copyright (c) 2023-2024 The ggml authors
+ * Copyright (c) 2023-2026 The ggml authors
 *
 * Permission is hereby granted, free of charge, to any person obtaining a copy
 * of this software and associated documentation files (the "Software"), to
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2023-2024 The ggml authors
+ * Copyright (c) 2023-2026 The ggml authors
 *
 * Permission is hereby granted, free of charge, to any person obtaining a copy
 * of this software and associated documentation files (the "Software"), to
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2023-2024 The ggml authors
+ * Copyright (c) 2023-2026 The ggml authors
 *
 * Permission is hereby granted, free of charge, to any person obtaining a copy
 * of this software and associated documentation files (the "Software"), to
@@ -71,7 +71,7 @@ else()
 # disabling fast math is needed in order to pass tests/test-backend-ops
 # note: adding -fno-inline fixes the tests when using MTL_SHADER_VALIDATION=1
 # note: unfortunately, we have to call it default.metallib instead of ggml.metallib
-# ref: https://github.com/ggerganov/whisper.cpp/issues/1720
+# ref: https://github.com/ggml-org/whisper.cpp/issues/1720
 # note: adding -g causes segmentation fault during compile
 #set(XC_FLAGS -fno-fast-math -fno-inline -g)
 set(XC_FLAGS -fno-fast-math -fno-inline)
@@ -3740,7 +3740,7 @@ static enum ggml_status ggml_backend_opencl_buffer_init_tensor(ggml_backend_buff
 // Reuse extra of the parent tensor. The offset of this view tensor
 // becomes `extra->offset + view_offs` and needs to be calculated when
 // it is used. This changes is needed because of the change to
-// ggml_alloc.c in https://github.com/ggerganov/llama.cpp/pull/7640.
+// ggml_alloc.c in https://github.com/ggml-org/llama.cpp/pull/7640.
 // `buffer` passed in here will always be `tensor->buffer`. It is OK
 // to allocate extras from the same buffer context for ordinary
 // intermediate tensors. But for views into kv cache tensors, doing so
@@ -3390,7 +3390,7 @@ static void ggml_sycl_mul_mat(ggml_backend_sycl_context & ctx, const ggml_tensor


 // mmvq and mmq need the __dp4a instruction which is available for gen12+
-// Workaround in https://github.com/ggerganov/llama.cpp/commit/95f84d5ce8b449a9b16009434aca800df504a02e
+// Workaround in https://github.com/ggml-org/llama.cpp/commit/95f84d5ce8b449a9b16009434aca800df504a02e
 use_mul_mat_q = use_mul_mat_q && (src0->type != GGML_TYPE_IQ2_XXS);
 #ifdef SYCL_USE_XMX
 use_mul_mat_q = use_mul_mat_q && (src1->ne[1] <= MMQ_MAX_BATCH_SIZE);
@@ -330,7 +330,7 @@ void string_to_spv_func(std::string name, std::string in_path, std::string out_p
 std::vector<std::string> cmd = {GLSLC, "-fshader-stage=compute", target_env, in_path, "-o", out_path};
 #endif

-// disable spirv-opt for coopmat shaders for https://github.com/ggerganov/llama.cpp/issues/10734
+// disable spirv-opt for coopmat shaders for https://github.com/ggml-org/llama.cpp/issues/10734
 // disable spirv-opt for bf16 shaders for https://github.com/ggml-org/llama.cpp/issues/15344
 // disable spirv-opt for rope shaders for https://github.com/ggml-org/llama.cpp/issues/16860
 if (!coopmat && name.find("bf16") == std::string::npos && name.find("rope") == std::string::npos) {
@@ -6562,7 +6562,7 @@ static void ggml_compute_backward(
 case GGML_OP_DIAG_MASK_INF: {
 if (src0_needs_grads) {
 /* ggml_diag_mask_inf_impl() shouldn't be here */
-/* ref: https://github.com/ggerganov/llama.cpp/pull/4203#discussion_r1412377992 */
+/* ref: https://github.com/ggml-org/llama.cpp/pull/4203#discussion_r1412377992 */
 const int n_past = ((const int32_t *) tensor->op_params)[0];
 ggml_add_or_set(ctx, cgraph, isrc0, ggml_diag_mask_zero_impl(ctx, grad, n_past, false));
 }
@@ -233,7 +233,7 @@ int32_t llm_chat_apply_template(
 llm_chat_template tmpl,
 const std::vector<const llama_chat_message *> & chat,
 std::string & dest, bool add_ass) {
-// Taken from the research: https://github.com/ggerganov/llama.cpp/issues/5527
+// Taken from the research: https://github.com/ggml-org/llama.cpp/issues/5527
 std::stringstream ss;
 if (tmpl == LLM_CHAT_TEMPLATE_CHATML) {
 // chatml template
@@ -195,7 +195,7 @@ struct llama_hparams {
 uint32_t n_deepstack_layers = 0;

 // needed by encoder-decoder models (e.g. T5, FLAN-T5)
-// ref: https://github.com/ggerganov/llama.cpp/pull/8141
+// ref: https://github.com/ggml-org/llama.cpp/pull/8141
 llama_token dec_start_token_id = LLAMA_TOKEN_NULL;
 uint32_t dec_n_layer = 0;

@@ -90,7 +90,7 @@ static_assert(std::is_trivially_copyable<llm_symbol>::value, "llm_symbol is not
 //
 // SPM tokenizer
 // original implementation:
-// https://github.com/ggerganov/llama.cpp/commit/074bea2eb1f1349a0118239c4152914aecaa1be4
+// https://github.com/ggml-org/llama.cpp/commit/074bea2eb1f1349a0118239c4152914aecaa1be4
 //

 struct llm_bigram_spm {
@@ -285,7 +285,7 @@ struct llm_tokenizer_bpe : llm_tokenizer {
 // original regex from tokenizer.json
 //"(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?\\p{L}+|\\p{N}{1,3}| ?[^\\s\\p{L}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+",

-// adapted: https://github.com/ggerganov/llama.cpp/pull/6920#issuecomment-2080233989
+// adapted: https://github.com/ggml-org/llama.cpp/pull/6920#issuecomment-2080233989
 "(?:'[sS]|'[tT]|'[rR][eE]|'[vV][eE]|'[mM]|'[lL][lL]|'[dD])|[^\\r\\n\\p{L}\\p{N}]?\\p{L}+|\\p{N}{1,3}| ?[^\\s\\p{L}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+",
 };
 break;
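The hunk above only touches the link in the comment, but the pattern it documents is worth a note: the adapted regex replaces the inline case-insensitive group `(?i:'s|'t|…)` of the original `tokenizer.json` pattern with explicit character classes such as `'[sS]`, avoiding a construct that not every regex engine supports. A quick equivalence check for the contraction part (Python used here purely for illustration; llama.cpp does its own regex handling):

```python
import re

# Contraction alternation from the original tokenizer.json pattern,
# written with an inline case-insensitive group:
original = r"(?i:'s|'t|'re|'ve|'m|'ll|'d)"

# The adapted form from this diff, with explicit character classes:
adapted = r"(?:'[sS]|'[tT]|'[rR][eE]|'[vV][eE]|'[mM]|'[lL][lL]|'[dD])"

# Both forms accept and reject the same strings.
for text in ["'s", "'S", "'re", "'Re", "'LL", "'d", "'x"]:
    a = bool(re.fullmatch(original, text))
    b = bool(re.fullmatch(adapted, text))
    assert a == b, text
```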
@@ -2390,7 +2390,7 @@ void llama_vocab::impl::load(llama_model_loader & ml, const LLM_KV & kv) {

 // maintain a list of tokens that cause end-of-generation
 // this is currently determined based on the token text, which is obviously not ideal
-// ref: https://github.com/ggerganov/llama.cpp/issues/9606
+// ref: https://github.com/ggml-org/llama.cpp/issues/9606
 special_eog_ids.clear();

 if (special_fim_pad_id != LLAMA_TOKEN_NULL && special_eog_ids.count(special_fim_pad_id) == 0) {
@@ -3079,7 +3079,7 @@ std::vector<llama_token> llama_vocab::impl::tokenize(
 }

 int32_t llama_vocab::impl::token_to_piece(llama_token token, char * buf, int32_t length, int32_t lstrip, bool special) const {
-// ref: https://github.com/ggerganov/llama.cpp/pull/7587#discussion_r1620983843
+// ref: https://github.com/ggml-org/llama.cpp/pull/7587#discussion_r1620983843
 static const int attr_special = LLAMA_TOKEN_ATTR_UNKNOWN | LLAMA_TOKEN_ATTR_CONTROL;
 const llama_token_attr attr = token_get_attr(token);
 if (!special && (attr & attr_special)) {
@@ -14,7 +14,7 @@ llm_build_deepseek2::llm_build_deepseek2(const llama_model & model, const llm_gr
 const uint32_t kv_lora_rank = hparams.n_lora_kv;

 // We have to pre-scale kq_scale and attn_factor to make the YaRN RoPE work correctly.
-// See https://github.com/ggerganov/llama.cpp/discussions/7416 for detailed explanation.
+// See https://github.com/ggml-org/llama.cpp/discussions/7416 for detailed explanation.
 // And also: https://github.com/ggml-org/llama.cpp/pull/17945 [TAG_DEEPSEEK2_YARN_LOG_MUL_FIX]

 // first cancel the adjustment from llama_hparams::yarn_attn_factor_adjust to get the original attn_factor
@@ -1,4 +1,4 @@
-// ref: https://github.com/ggerganov/llama.cpp/issues/4952#issuecomment-1892864763
+// ref: https://github.com/ggml-org/llama.cpp/issues/4952#issuecomment-1892864763

 #include <cstdio>
 #include <string>
@@ -290,7 +290,7 @@ static void power_iteration(
 ggml_gallocr_free(allocr);

 // TODO @ngxson : The output vector is randomly inverted
-// Solution: https://github.com/ggerganov/llama.cpp/pull/8069#issuecomment-2185328171
+// Solution: https://github.com/ggml-org/llama.cpp/pull/8069#issuecomment-2185328171
 }

 static void run_pca(
@@ -190,7 +190,7 @@ struct lora_merge_ctx {
 gguf_set_val_u32(ctx_out, "general.file_type", LLAMA_FTYPE_MOSTLY_F16);

 // check if all lora adapters have the same tensors
-// TODO: remove this when we can support merging subset of adapters. Ref: https://github.com/ggerganov/llama.cpp/pull/8607#discussion_r1686027777
+// TODO: remove this when we can support merging subset of adapters. Ref: https://github.com/ggml-org/llama.cpp/pull/8607#discussion_r1686027777
 static const char * err_no_subset_adapter = "Input adapters do not have the same list of tensors. This is not yet supported. Please merge the adapter one-by-one instead of merging all at once.";
 if (adapters.size() > 1) {
 for (size_t i = 1; i < adapters.size(); ++i) {
@@ -29,7 +29,7 @@ In addition to the KL divergence the following statistics are calculated with `-
 * Mean change in "correct" token probability. Positive values mean the model gets better at prediction, negative values mean it gets worse.
 * Pearson correlation coefficient of the "correct" token probabilites between models.
 * Percentiles of change in "correct" token probability. Positive values mean the model gets better at prediction, negative values mean it gets worse. Can be used to judge noise vs. quality loss from quantization. If the percentiles are symmetric then the quantization is essentially just adding noise. If the negative values are significantly larger than the positive values then this indicates that the model is actually becoming worse from the quantization.
-* The root mean square of the change in token probabilities. If you were to assume that the quantization simply causes Gaussian noise on the token probabilities then this would be the standard deviation of said noise. The uncertainty on the value is calculated that the change in token probabilities follows a Gaussian distribution. Related discussion: https://github.com/ggerganov/llama.cpp/discussions/2875 .
+* The root mean square of the change in token probabilities. If you were to assume that the quantization simply causes Gaussian noise on the token probabilities then this would be the standard deviation of said noise. The uncertainty on the value is calculated that the change in token probabilities follows a Gaussian distribution. Related discussion: https://github.com/ggml-org/llama.cpp/discussions/2875 .
 * Same top p: Percentage of how often the token was assigned the highest probabilites by both models. The uncertainty is calculated from the Gaussian approximation of the binomial distribution.

 ## LLaMA 3 8b Scoreboard
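The mean and root-mean-square statistics described in the README context above reduce to simple per-token arithmetic; a small sketch with made-up probabilities (the real tool computes these over an evaluation corpus):

```python
import math

# Hypothetical "correct"-token probabilities under the base and the
# quantized model, one entry per evaluated token.
p_base  = [0.81, 0.65, 0.92, 0.40, 0.73]
p_quant = [0.79, 0.68, 0.90, 0.35, 0.74]

deltas = [q - b for b, q in zip(p_base, p_quant)]

# Mean change: negative means the quantized model predicts worse on average.
mean_change = sum(deltas) / len(deltas)

# RMS of the change: under a Gaussian-noise view of quantization error,
# this is the standard deviation of that noise.
rms_change = math.sqrt(sum(d * d for d in deltas) / len(deltas))

print(f"mean change: {mean_change:+.4f}")
print(f"rms change:  {rms_change:.4f}")
```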
@@ -1096,7 +1096,7 @@ return html`
 </section>
 <footer>
 <p><${ModelGenerationInfo} /></p>
-<p>Powered By <a href="https://github.com/ggerganov/llama.cpp#readme" target="_blank">llama.cpp</a> and <a href="https://ggml.ai/" target="_blank">ggml.ai</a></p>
+<p>Powered By <a href="https://github.com/ggml-org/llama.cpp#readme" target="_blank">llama.cpp</a> and <a href="https://ggml.ai/" target="_blank">ggml.ai</a></p>
 </footer>
 </div>
 `;
@@ -1281,7 +1281,7 @@

 <footer>
 <p><${ModelGenerationInfo} /></p>
-<p>Powered by <a href="https://github.com/ggerganov/llama.cpp">llama.cpp</a> and <a href="https://ggml.ai">ggml.ai</a>.</p>
+<p>Powered by <a href="https://github.com/ggml-org/llama.cpp">llama.cpp</a> and <a href="https://ggml.ai">ggml.ai</a>.</p>
 </footer>
 </div>
 `;
@@ -1,5 +1,5 @@
 /* Author: Yazan Agha-Schrader */
-/* Inspiration from llama.cpp logo/banner https://github.com/ggerganov/llama.cpp#readme */
+/* Inspiration from llama.cpp logo/banner https://github.com/ggml-org/llama.cpp#readme */

 .theme-mangotango {

@@ -1032,7 +1032,7 @@

 <footer>
 <p><${ModelGenerationInfo} /></p>
-<p>Powered by <a href="https://github.com/ggerganov/llama.cpp">llama.cpp</a> and <a href="https://ggml.ai">ggml.ai</a>.</p>
+<p>Powered by <a href="https://github.com/ggml-org/llama.cpp">llama.cpp</a> and <a href="https://ggml.ai">ggml.ai</a>.</p>
 </footer>
 </div>
 `;
@@ -1036,7 +1036,7 @@

 <footer>
 <p><${ModelGenerationInfo} /></p>
-<p>Powered by <a href="https://github.com/ggerganov/llama.cpp">llama.cpp</a> and <a href="https://ggml.ai">ggml.ai</a>.</p>
+<p>Powered by <a href="https://github.com/ggml-org/llama.cpp">llama.cpp</a> and <a href="https://ggml.ai">ggml.ai</a>.</p>
 </footer>
 </div>
 `;