Mirror of https://github.com/ggml-org/llama.cpp.git, synced 2026-03-17 16:44:07 +00:00
general: CONTRIBUTING.md - guidelines for quantization schemes (#19762)
* Guidelines for quantization schemes
* Update CONTRIBUTING.md (Co-authored-by: Johannes Gäßler <johannesg@5d6.de>)
* Change required precision from Q8 to FP16/BF16
* Update CONTRIBUTING.md (Co-authored-by: Johannes Gäßler <johannesg@5d6.de>)
* Update CONTRIBUTING.md (Co-authored-by: Johannes Gäßler <johannesg@5d6.de>)
* Update CONTRIBUTING.md (Co-authored-by: Johannes Gäßler <johannesg@5d6.de>)
* Update CONTRIBUTING.md (Co-authored-by: Johannes Gäßler <johannesg@5d6.de>)
* Update CONTRIBUTING.md [no ci]
* Update CONTRIBUTING.md [no ci]

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
committed by GitHub
parent 73c9eb8ced
commit 2948e6049a
@@ -38,6 +38,11 @@ Before submitting your PR:
- Avoid combining unrelated changes in a single PR
- For intricate features, consider opening a feature request first to discuss and align expectations
- When adding support for a new model or feature, focus on **CPU support only** in the initial PR unless you have a good reason not to. Add support for other backends like CUDA in follow-up PRs
- In particular, adding new data types (extension of the `ggml_type` enum) carries with it a disproportionate maintenance burden. As such, to add a new quantization type you will need to meet the following *additional* criteria *at minimum*:
  - convert a small model to GGUF using the new type and upload it to HuggingFace
  - provide [perplexity](https://github.com/ggml-org/llama.cpp/tree/master/tools/perplexity) comparisons to FP16/BF16 (whichever is the native precision) as well as to types of similar size
  - provide KL divergence data, calculated vs. the FP16/BF16 (whichever is the native precision) version, for both the new type and types of similar size
  - provide [performance data](https://github.com/ggml-org/llama.cpp/tree/master/tools/llama-bench) for the new type in comparison to types of similar size on pure CPU
- Consider allowing write access to your branch for faster reviews, as reviewers can push commits directly
- If you are a new contributor, limit your open PRs to 1.
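The measurement workflow the quantization criteria describe can be sketched with the existing llama.cpp tools. This is a non-authoritative sketch: `Q4_X`, the model paths, and `wiki.test.raw` are placeholders for whatever new type and evaluation data you actually use, and the binaries must be built from the repo first.

```shell
# Convert the original HF model to GGUF at its native precision (f16 here;
# use bf16 if that is the model's native precision).
python convert_hf_to_gguf.py ./my-small-model --outtype f16 --outfile model-f16.gguf

# Quantize to the new type (Q4_X is a hypothetical placeholder name).
./llama-quantize model-f16.gguf model-q4_x.gguf Q4_X

# Perplexity of the native-precision baseline; --kl-divergence-base also
# saves its logits so KL divergence can be computed against them later.
./llama-perplexity -m model-f16.gguf -f wiki.test.raw --kl-divergence-base logits-f16.bin

# Perplexity and KL divergence of the new type vs. the saved baseline.
# Repeat this step for existing types of similar size (e.g. Q4_K_M) to compare.
./llama-perplexity -m model-q4_x.gguf -f wiki.test.raw \
    --kl-divergence-base logits-f16.bin --kl-divergence

# CPU performance of the new type; repeat for types of similar size.
./llama-bench -m model-q4_x.gguf
```

Running the perplexity and bench steps for both the new type and its closest existing neighbors on the same hardware is what makes the comparison tables reviewers ask for.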