Compare commits

...

4 Commits

Author SHA1 Message Date
M. Yusuf Sarıgöz
ff137fbbed Bump patch version for release 2024-07-10 12:39:50 +03:00
M. Yusuf Sarıgöz
f6a3321701 Upd gguf-py/readme 2024-07-10 12:38:35 +03:00
Clint Herron
a59f8fdc85 Server: Enable setting default sampling parameters via command-line (#8402)
* Load server sampling parameters from the server context by default.

* Wordsmithing comment
2024-07-09 18:26:40 -04:00
Andy Salerno
fd560fe680 Update README.md to fix broken link to docs (#8399)
Update the "Performance troubleshooting" doc link to be correct - the file was moved into a dir called 'development'
2024-07-09 14:58:44 -04:00
4 changed files with 4 additions and 4 deletions

README.md

@@ -453,7 +453,7 @@ To learn more how to measure perplexity using llama.cpp, [read this documentatio
 - [How to build](./docs/build.md)
 - [Running on Docker](./docs/docker.md)
 - [Build on Android](./docs/android.md)
-- [Performance troubleshooting](./docs/token_generation_performance_tips.md)
+- [Performance troubleshooting](./docs/development/token_generation_performance_tips.md)
 - [GGML tips & tricks](https://github.com/ggerganov/llama.cpp/wiki/GGML-Tips-&-Tricks)
 **Seminal papers and background on the models**

examples/server/server.cpp

@@ -884,7 +884,8 @@ struct server_context {
     bool launch_slot_with_task(server_slot & slot, const server_task & task) {
         slot_params default_params;
-        llama_sampling_params default_sparams;
+        // Sampling parameter defaults are loaded from the global server context (but individual requests can still override them)
+        llama_sampling_params default_sparams = params.sparams;
         auto & data = task.data;
         if (data.count("__oaicompat") != 0) {
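The one-line change above is the whole mechanism: `default_sparams` now starts from the server-wide `params.sparams` (populated from the command line at startup) instead of the struct's built-in defaults, and the per-request JSON in `task.data` can still override individual fields afterwards. A minimal standalone sketch of that resolution order follows; the struct names and the `resolve_sampling` helper are simplified stand-ins for illustration, not llama.cpp's actual API:

```cpp
// Standalone sketch (not llama.cpp's real types): a tiny sampling_params struct
// and a string->float map stand in for llama_sampling_params and task.data.
#include <cstdio>
#include <map>
#include <string>

struct sampling_params {
    float temp  = 0.8f;   // built-in library default
    int   top_k = 40;
};

// Server-wide defaults, as parsed from the command line at startup.
struct server_params {
    sampling_params sparams;
};

// Per-request resolution: start from the server-wide defaults (the pattern the
// diff introduces), then let fields present in the request body win.
sampling_params resolve_sampling(const server_params & params,
                                 const std::map<std::string, float> & request) {
    sampling_params sparams = params.sparams;  // server defaults, not struct defaults
    if (auto it = request.find("temperature"); it != request.end()) {
        sparams.temp = it->second;             // request-level override wins
    }
    if (auto it = request.find("top_k"); it != request.end()) {
        sparams.top_k = static_cast<int>(it->second);
    }
    return sparams;
}

int main() {
    server_params params;
    params.sparams.temp = 0.2f;  // e.g. set via a CLI flag at server startup

    sampling_params a = resolve_sampling(params, {});                      // uses 0.2
    sampling_params b = resolve_sampling(params, {{"temperature", 0.9f}}); // uses 0.9
    std::printf("default temp = %.1f, overridden temp = %.1f\n", a.temp, b.temp);
}
```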

gguf-py/README.md

@@ -79,5 +79,4 @@ python -m twine upload dist/*
 ```
 ## TODO
 - [ ] Add tests
-- [ ] Include conversion scripts as command line entry points in this package.

gguf-py/pyproject.toml

@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "gguf"
-version = "0.9.0"
+version = "0.9.1"
 description = "Read and write ML models in GGUF for GGML"
 authors = ["GGML <ggml@ggml.ai>"]
 packages = [