7227 Commits

Author SHA1 Message Date
Aleksander Grygier
11c26ecfca Merge remote-tracking branch 'ngxson/xsn/server_model_management_v1_2' into allozaur/server_model_management_v1_2 2025-11-24 16:07:05 +01:00
Xuan Son Nguyen
e514b86d2b fix merge 2025-11-24 14:50:42 +01:00
Xuan Son Nguyen
399b39f21b Merge branch 'master' into xsn/server_model_management_v1_2 2025-11-24 14:45:57 +01:00
Xuan-Son Nguyen
b8372eecd9 server: split server.cpp code into server/common/task/queue (#17362)
* add server-task, server-common

* add server-queue

* rm redundant includes

* move enum stop_type to server-task

* server : headers cleanup

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
b7146
2025-11-24 14:41:53 +01:00
Daniel Bevenius
6ab8eacddf examples : add -kvu to batched usage example [no ci] (#17469)
This commit adds the --kv-unified flag to the usage example
in the README.md file for the batched example.

The motivation for this is that without this flag the example will fail
with the following error:
```console
Hello my name is
split_equal: sequential split is not supported when there are coupled
sequences in the input batch (you may need to use the -kvu flag)
decode: failed to find a memory slot for batch of size 4
main: llama_decode() failed
```
2025-11-24 15:38:45 +02:00
Georgi Gerganov
2d50b9d8cb sync : ggml b7144 2025-11-24 15:26:31 +02:00
Daniel Bevenius
697edfeead ggml : remove dirty flag from version string (ggml/1391)
This commit removes the "-dirty" suffix from the GGML version string.

The motivation for this change is to ensure that the version string
works with different ways of checking out ggml and using it in projects.
By removing the dirty flag from the version string, we avoid potential
artifacts like shared libraries getting a -dirty suffix in their names.

Instead, if the project is built from a dirty git state, the dirty flag
will be appended to the commit hash in the GGML_BUILD_COMMIT variable.
This will enable users to still identify that the build was made from
a modified/dirty state even though the version might match a "real"
version.

For example, the commit can be printed as follows:
```c++
    printf("commit: %s\n", ggml_commit());
```
Which would print the following for a dirty build:
```console
commit: 781baf2a-dirty
```

Refs: https://github.com/ggml-org/ggml/pull/1363#issuecomment-3569691546
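
A minimal sketch of how a caller might detect the dirty suffix described above, assuming the commit string has the form shown in the console output (`<hash>-dirty`). The helper `is_dirty_commit` is hypothetical and not part of ggml:
```c++
#include <string_view>

// Hypothetical helper (not part of ggml): reports whether a commit string,
// such as the one returned by ggml_commit(), ends with the "-dirty" suffix.
constexpr bool is_dirty_commit(std::string_view commit) {
    constexpr std::string_view suffix = "-dirty";
    // The string must be at least as long as the suffix, and its tail must match.
    return commit.size() >= suffix.size() &&
           commit.substr(commit.size() - suffix.size()) == suffix;
}
```
Under that assumption, `is_dirty_commit(ggml_commit())` would be true for the dirty build shown above and false for a clean one.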
2025-11-24 15:26:31 +02:00
Xuan Son Nguyen
539cbf003e add stdin_file 2025-11-24 14:21:21 +01:00
Xuan Son Nguyen
2c6b58f785 nits 2025-11-24 12:20:34 +01:00
Alberto Cabrera Pérez
dbb852b549 ggml-cpu: arm64: q4_K repack gemm and gemv implementations (i8mm) (#16739)
* Enabled q4_K_8x8_q8_K path on ARM

* wip: I8mm qs multiplication, pending bias

* cpu : arm : REPACK gemm q4_K8x8 implementation

Signed-off-by: Alberto Cabrera <alberto.cabrera@liquid.ai>

* Guard gemm with proper features, improved superblock scale and min calc

Signed-off-by: Alberto Cabrera <alberto.cabrera@liquid.ai>

* cpu: arm: Implemented REPACK gemv for Q4_K

Signed-off-by: Alberto Cabrera <alberto.cabrera@liquid.ai>

* Removed completed TODO

* Fixed missing guards when selecting optimal repack type for Q4_K

Signed-off-by: Alberto Cabrera <alberto.cabrera@liquid.ai>

* Fixed macro guard for gemv

* Fixed wrong comment in GEMV

* Fixed warning for unused variable

* vdotq_s32 -> ggml_vdotq_s32

Signed-off-by: Alberto Cabrera <alberto.cabrera@liquid.ai>

* Clang-format issues

* Apply suggestions from code review

Co-authored-by: Diego Devesa <slarengh@gmail.com>

* Removed unnecessary GGML_UNUSED

* Fixed guards in q4_k gemm and gemv (repack)

---------

Signed-off-by: Alberto Cabrera <alberto.cabrera@liquid.ai>
Co-authored-by: Diego Devesa <slarengh@gmail.com>
b7142
2025-11-24 13:08:11 +02:00
ixgbe
5f55c385cb ggml: add RISC-V cpu-feats (#17461)
* ggml: add RISC-V cpu-feats

Signed-off-by: Wang Yang <yangwang@iscas.ac.cn>

* fix comment[1]

---------

Signed-off-by: Wang Yang <yangwang@iscas.ac.cn>
b7141
2025-11-24 13:07:14 +02:00
Xuan Son Nguyen
6ed192b4dd add --models-allow-extra-args for security 2025-11-24 12:01:16 +01:00
william pan
4902eebe33 models : Added support for RND1 Diffusion Language Model (#17433)
* Converted RND1 model to GGUF weights

* RND1 llama.cpp support v1

* RND1 llama.cpp support v2 non causal bug

* RND1 llama.cpp support v3 documentation

* RND1 llama.cpp support v4 clean code

* linting issues

* RND1 pr fixes v1

* RND1 pr fixes v2

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Diffusion documentation edits

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
b7140
2025-11-24 14:16:56 +08:00
Max Krasnyansky
923ae3c619 hexagon: add support for ROPE_NEOX (#17458) b7139 2025-11-23 18:55:56 -08:00
Raul Torres
01ad35e6d6 CANN: Define cann_graph_update_required before macro (#17434)
**Description of the problem**

`cann_graph_update_required` is redundantly defined and
initialized as `false` inside two mutually exclusive macro branches.

**Proposed solution**

Define it right before the macro so that it could serve both
branches.
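
The pattern described above might look roughly like the following sketch. The macro `DEMO_USE_GRAPH` and the function `demo_graph_compute` are hypothetical stand-ins for illustration, not the names used in the CANN backend:
```c++
// Hypothetical feature switch standing in for the real build macro.
// Uncomment to select the graph-enabled branch.
// #define DEMO_USE_GRAPH

// Hypothetical function illustrating the refactor: the flag is defined once,
// before the conditional section, instead of once inside each branch.
constexpr bool demo_graph_compute(bool graph_changed) {
    bool graph_update_required = false;  // single definition serves both branches
#ifdef DEMO_USE_GRAPH
    // Graph-enabled branch: an update is needed only if the graph changed.
    graph_update_required = graph_changed;
#else
    // Graph-disabled branch: the flag keeps its default value.
    (void) graph_changed;
#endif
    return graph_update_required;
}
```
With the macro left undefined, the function returns `false` regardless of its argument. Defining `DEMO_USE_GRAPH` makes it return `graph_changed`.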
b7138
2025-11-24 10:02:52 +08:00
Aleksander Grygier
5ef3f990b9 chore: update webui build output 2025-11-24 02:24:27 +01:00
Aleksander Grygier
b2590a7f6c refactor: Cleanup 2025-11-24 02:24:10 +01:00
M. Mediouni
fcb013847c ggml-hexagon: Initial Hexagon v68/v69 support (#17394)
* ggml-hexagon: fix build error with GCC

Add stdexcept include to fix GCC build errors

Signed-off-by: Mohamed Mediouni <mohamed@unpredictable.fr>

* ggml-hexagon: check VTCM acquire failures

Signed-off-by: Mohamed Mediouni <mohamed@unpredictable.fr>

* ggml-hexagon: disable destination bypass on older than v73

v68 errors out if having bypass enabled when the VTCM is the destination.

At least on v68 this made things actually work... not a proper fix though, so to look at later...

Signed-off-by: Mohamed Mediouni <mohamed@unpredictable.fr>

* ggml-hexagon: add initial v68/v69 support

v68 is the Hexagon revision notably used on the Snapdragon 8cx
Gen 3 and the QCM6490.

Also add support for v69.

8MB isn't a supported page size, so relax asked for page size constraint
for HAP_compute_res_attr_set_vtcm_param_v2 to optimal.

Signed-off-by: Mohamed Mediouni <mohamed@unpredictable.fr>

---------

Signed-off-by: Mohamed Mediouni <mohamed@unpredictable.fr>
b7137
2025-11-23 16:54:49 -08:00
Aleksander Grygier
13fe8607c5 refactor: Cleanup 2025-11-24 01:42:42 +01:00
Aleksander Grygier
76557cd5d3 Merge remote-tracking branch 'ngxson/xsn/server_model_management_v1_2' into allozaur/server_model_management_v1_2 2025-11-24 00:36:00 +01:00
Aleksander Grygier
e808f2b2e6 chore: update webui build output 2025-11-23 23:45:08 +01:00
Aleksander Grygier
16747dee5b refactor: UI badges 2025-11-23 23:44:14 +01:00
Aleksander Grygier
188d3236e4 chore: update webui build output 2025-11-23 23:28:49 +01:00
Aleksander Grygier
39fb1c2b17 refactor: Cleanup 2025-11-23 23:28:28 +01:00
nullname
d5bc1ad110 ggml-hexagon: add hex_supported_buffer for better buffer supported check (#17212)
* hexagon: add buffer support checks for hexagon sessions

* refactor: simplify buffer support checks in hexagon operations

* hexagon: update buffer support checks to use tensor structure

* refactor: streamline buffer initialization for DSP queue in hexagon operations

* refactor: simplify buffer initialization in DSP queue for hexagon operations

* refactor: optimize hex_supported_buffer function by fold expression

* wip

* refactor: simplify dspqueue_buffers_init function and its usage in hexagon operations

* fix: improve nan handling at hvx_vec_fast_sigmoid_fp32_guard

* refactor: optimize hvx_vec_inverse_fp32_guard for better nan handling

* refactor: update hvx_vec_fast_sigmoid_fp32_guard to use adjusted exponent limits

* refactor: modify hvx_vec_fast_sigmoid_fp32_guard to accept parameters for improved flexibility

* refactor: update hvx_vec_exp_fp32_guard to accept max_exp and inf parameters to save some instructions

* refactor: move hvx_vec_inverse_fp32_guard implementation to hvx-inverse.c for better perf
b7136
2025-11-23 14:26:36 -08:00
Aleksander Grygier
fb5445e9ce chore: update webui build output 2025-11-23 23:25:05 +01:00
Aleksander Grygier
e92ce07916 refactor: Copy To Clipboard Icon component 2025-11-23 23:23:38 +01:00
Aleksander Grygier
219fd19eb8 chore: update webui build output 2025-11-23 23:09:09 +01:00
Aleksander Grygier
41764b8fa0 refactor: Formatters 2025-11-23 22:54:14 +01:00
Aleksander Grygier
f8ff39c64e refactor: Cleanup 2025-11-23 22:32:31 +01:00
Aleksander Grygier
d5a6671b81 refactor: Cleanup 2025-11-23 22:27:25 +01:00
Aleksander Grygier
49c8062db1 chore: update webui build output 2025-11-23 22:25:34 +01:00
Aleksander Grygier
ef5f9d07b0 feat: Improve Model Selector responsiveness 2025-11-23 22:23:50 +01:00
Aleksander Grygier
1c214e9a49 refactor: Enum imports 2025-11-23 22:16:22 +01:00
Aleksander Grygier
48dbef1729 chore: update webui build output 2025-11-23 21:58:38 +01:00
Aleksander Grygier
b7ba13b6a0 refactor: Attachments data 2025-11-23 21:46:43 +01:00
Aleksander Grygier
1f0cb3ab26 feat: Use model property for displaying the repo/model-name naming format 2025-11-23 21:19:00 +01:00
Xuan Son Nguyen
d65be9170b address review comments 2025-11-23 19:31:21 +01:00
Xuan Son Nguyen
5ad594e6d6 cleaner 2025-11-23 19:02:07 +01:00
Pascal
0c7220db56 webui: minor settings reorganization and add disable autoscroll option (#17452)
* webui: added a dedicated 'Display' settings section that groups visualization options

* webui: added a Display setting to toggle automatic chat scrolling

* chore: update webui build output
2025-11-23 18:42:00 +01:00
Xuan Son Nguyen
2e355c7f8e oai-compat /models endpoint 2025-11-23 17:25:24 +01:00
Xuan Son Nguyen
f95f9c5128 typo docs 2025-11-23 16:14:02 +01:00
Xuan Son Nguyen
74685f4194 allow reusing args if auto_load 2025-11-23 15:42:33 +01:00
Xuan Son Nguyen
f927e21ffc support extra_args on loading model 2025-11-23 15:39:03 +01:00
Xuan Son Nguyen
7ef6312f85 add note 2025-11-23 15:08:31 +01:00
Xuan Son Nguyen
f25bfaba4d expose args and exit_code in API 2025-11-23 14:59:04 +01:00
Sigbjørn Skjæret
96ac5a2329 cuda : support non-contiguous i32 to i32 copy (#17326)
* support non-contiguous i32 to i32 copy

* add tests

* rename cpy_flt to cpy_scalar and reindent params
b7134
2025-11-23 11:13:34 +01:00
Eric Curtin
bc809e9c53 vulkan: Update docker image to Ubuntu 26.04 to enable glslc features (#17439)
26.04 provides these

Signed-off-by: Eric Curtin <eric.curtin@docker.com>
2025-11-23 10:29:36 +01:00
Jeff Bolz
54d83bbe85 vulkan: remove a couple unnecessary switches (#17419) b7132 2025-11-23 06:29:40 +01:00
Aleksander Grygier
6282537a8b Merge remote-tracking branch 'ngxson/xsn/server_model_management_v1_2' into allozaur/server_model_management_v1_2 2025-11-22 23:35:05 +01:00