Commit Graph

7287 Commits

Author SHA1 Message Date
Aleksander Grygier
3470b12b76 chore: update webui build output 2025-11-28 15:09:55 +01:00
Aleksander Grygier
eed1bd9b97 refactor: Enhance model info and attachment handling 2025-11-28 15:08:41 +01:00
Aleksander Grygier
491fe2d3f7 feat: Update logic for PDF as Image 2025-11-28 13:10:00 +01:00
Aleksander Grygier
bc577266b9 docs: Architecture documentation 2025-11-27 22:04:20 +01:00
Aleksander Grygier
db479523ec feat: Condition available models based on modality + better model loading strategy & UX 2025-11-27 19:13:05 +01:00
Aleksander Grygier
9086bc30bd feat: Improve statistic badges 2025-11-27 14:12:21 +01:00
Aleksander Grygier
d73353732f refactor: Architecture cleanup 2025-11-27 14:03:25 +01:00
Aleksander Grygier
78ead49830 Merge remote-tracking branch 'ngxson/xsn/server_model_management_v1_2' into allozaur/server_model_management_v1_2 2025-11-27 13:48:21 +01:00
Aleksander Grygier
6a3d6e79d2 refactor: Services/Stores syntax + logic improvements
Refactors components to access stores directly instead of using exported getter functions.

This change centralizes store access and logic, simplifying component code and improving maintainability by reducing the number of exported functions and promoting direct store interaction.

Removes exported getter functions from `chat.svelte.ts`, `conversations.svelte.ts`, `models.svelte.ts` and `settings.svelte.ts`.
2025-11-27 13:44:49 +01:00
Aleksander Grygier
69065ddc56 fix: UI 2025-11-27 11:27:58 +01:00
Aleksander Grygier
6b95118abc refactor: Processing state reactivity 2025-11-27 11:11:45 +01:00
Aleksander Grygier
2a5922b1f6 chore: update webui build output 2025-11-26 17:52:40 +01:00
Aleksander Grygier
13e7988459 refactor: Model modality handling 2025-11-26 17:51:25 +01:00
Xuan Son Nguyen
1493ee09ea tmp webui build 2025-11-26 17:43:27 +01:00
Aleksander Grygier
d6ee3d133a refactor: Server store 2025-11-26 17:16:41 +01:00
Aleksander Grygier
456828b365 refactor: Chat requests abort handling 2025-11-26 16:48:13 +01:00
Aleksander Grygier
42483f463d refactor: Remove ConversationsService 2025-11-26 16:45:07 +01:00
Xuan Son Nguyen
becc602612 Merge branch 'master' into xsn/server_model_management_v1_2 2025-11-26 16:21:57 +01:00
Xuan Son Nguyen
e2731c3767 set hf_repo/docker_repo as model alias when possible 2025-11-26 15:57:20 +01:00
Xuan Son Nguyen
e40f35fb61 remove support for extra args 2025-11-26 15:43:27 +01:00
Aleksander Grygier
ddf98bdf28 refactor: Improve API header management via utility functions 2025-11-26 15:36:09 +01:00
Aleksander Grygier
9431f358b8 chore: update webui build output 2025-11-26 15:07:12 +01:00
Aleksander Grygier
284557cd2f feat: Improve model loading/unloading status updates 2025-11-26 15:06:11 +01:00
xctan
6ab4e50d9c ggml-cpu : add RISC-V Zvfh impl for ggml_vec_mad_f16 (#17448)
* ggml-cpu : add RISC-V Zvfh impl for ggml_vec_mad_f16

* ggml-cpu : dedup scalar impl

* Update ggml/src/ggml-cpu/vec.h

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
b7164
2025-11-26 15:33:05 +02:00
Adrien Gallouët
2336cc4784 cmake : use EXCLUDE_FROM_ALL to avoid patch-boringssl.cmake (#17520)
We have to separate the code path starting 3.28 because
`FetchContent_Populate` is now deprecated and will be completely removed
in a future version.

Signed-off-by: Adrien Gallouët <angt@huggingface.co>
b7163
2025-11-26 15:15:21 +02:00
Adrien Gallouët
e6923caaec ggml : fix ARM feature verification (#17519)
On arm64 with `cmake` version 3.31.6, the final feature verification fails:

    -- ARM detected flags: -mcpu=neoverse-v2+crc+sve2-aes+sve2-sha3+nossbs
    -- Performing Test GGML_MACHINE_SUPPORTS_dotprod
    -- Performing Test GGML_MACHINE_SUPPORTS_dotprod - Success
    -- Performing Test GGML_MACHINE_SUPPORTS_i8mm
    -- Performing Test GGML_MACHINE_SUPPORTS_i8mm - Success
    -- Performing Test GGML_MACHINE_SUPPORTS_sve
    -- Performing Test GGML_MACHINE_SUPPORTS_sve - Success
    -- Performing Test GGML_MACHINE_SUPPORTS_sme
    -- Performing Test GGML_MACHINE_SUPPORTS_sme - Failed
    -- Performing Test GGML_MACHINE_SUPPORTS_nosme
    -- Performing Test GGML_MACHINE_SUPPORTS_nosme - Success
    -- Checking for ARM features using flags:
    --   -U__ARM_FEATURE_SME
    --   -mcpu=neoverse-v2+crc+sve2-aes+sve2-sha3+nossbs+dotprod+i8mm+sve+nosme
    -- Performing Test HAVE_DOTPROD
    -- Performing Test HAVE_DOTPROD - Failed
    -- Performing Test HAVE_SVE
    -- Performing Test HAVE_SVE - Failed
    -- Performing Test HAVE_MATMUL_INT8
    -- Performing Test HAVE_MATMUL_INT8 - Failed
    -- Performing Test HAVE_FMA
    -- Performing Test HAVE_FMA - Success
    -- Performing Test HAVE_FP16_VECTOR_ARITHMETIC
    -- Performing Test HAVE_FP16_VECTOR_ARITHMETIC - Failed
    -- Performing Test HAVE_SME
    -- Performing Test HAVE_SME - Failed
    -- Adding CPU backend variant ggml-cpu: -U__ARM_FEATURE_SME;-mcpu=neoverse-v2+crc+sve2-aes+sve2-sha3+nossbs+dotprod+i8mm+sve+nosme

We need to explicitly replace `;` with spaces from the list to make
`CMAKE_REQUIRED_FLAGS` work correctly...

Signed-off-by: Adrien Gallouët <angt@huggingface.co>
b7162
2025-11-26 15:14:41 +02:00
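The fix described in e6923caaec depends on the difference between a CMake list (a `;`-separated string) and the space-delimited string that `CMAKE_REQUIRED_FLAGS` expects. The following is a minimal Python model of that conversion, not the project's CMake code; the helper name `cmake_list_to_flags` is hypothetical, and the flag string is copied from the log output quoted in the commit message.

```python
def cmake_list_to_flags(cmake_list: str) -> str:
    """Turn a CMake ;-separated list into a space-separated flag string."""
    # Drop empty items so stray separators do not produce double spaces.
    return " ".join(item for item in cmake_list.split(";") if item)


# Flag list as it appears in the "Adding CPU backend variant" log line.
arch_flags = (
    "-U__ARM_FEATURE_SME;"
    "-mcpu=neoverse-v2+crc+sve2-aes+sve2-sha3+nossbs+dotprod+i8mm+sve+nosme"
)

required_flags = cmake_list_to_flags(arch_flags)
print(required_flags)
```

Passed as one `;`-joined string, the two flags are not seen as separate compiler arguments, which is consistent with the failing `HAVE_*` checks in the log; joined with a space, each flag is a separate argument.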
Aleksander Grygier
d0d7a88d13 chore: update webui build output 2025-11-26 14:14:15 +01:00
Aleksander Grygier
23a91cd257 refactor: Icons 2025-11-26 14:13:17 +01:00
Aleksander Grygier
b1cf8bb814 refactor: Improve server properties management 2025-11-26 14:05:42 +01:00
Jiacheng (Jason) Chen
3e18dba9fd HIP: Patch failed testcase in WMMA-MMQ kernels for RDNA 4 (#17502)
* patch failed test case MUL_MAT(type_a=q4_0,type_b=f32,m=576,n=512,k=576,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) for enabling WMMA on RDNA4

* Quick clean up on mma.cuh to add ggml_cuda_memcpy_1 back in for half2 and bfloat162
b7161
2025-11-26 11:18:48 +01:00
hipudding
eeb5605de2 CANN: Add MROPE and IMROPE support (#17401)
* CANN: ROPE supports both MROPE and IMROPE.

1. Optimize the caching logic of rope_cache_init.
2. Add support for mRoPE and i-mRoPE.

Note that on Ascend 910B devices, it is necessary to disable FA
in CLIP and disable NZ-format conversion. These two issues are
still under investigation.

* Resolve review comments
b7160
2025-11-26 16:44:19 +08:00
o7si
f3a848a3b1 chore: upgrade cpp-httplib from v0.27.0 to v0.28.0 (#17513) b7159 2025-11-26 09:21:06 +02:00
Jeff Bolz
b3b03a7baf vulkan: Implement GGML_OP_CUMSUM (#17479) b7158 2025-11-26 07:08:10 +01:00
Aleksander Grygier
19e5385bd5 chore: update webui build output 2025-11-26 02:14:33 +01:00
Aleksander Grygier
2a280b6082 feat: Model management and selection features WIP 2025-11-26 02:13:31 +01:00
Aleksander Grygier
81b8e1abb4 chore: update webui build output 2025-11-26 00:44:18 +01:00
Aleksander Grygier
22507fed74 refactor: Icons 2025-11-26 00:43:49 +01:00
Aleksander Grygier
5207527e9d fix: Audio attachments 2025-11-26 00:21:36 +01:00
Aleksander Grygier
c680083cce feat: Remove redundant settings + rearrange 2025-11-26 00:08:04 +01:00
Aleksander Grygier
33356f36e4 fix: Regenerate 2025-11-26 00:03:17 +01:00
Aleksander Grygier
82975a1f2d fix: Add untrack inside chat processing info data logic to prevent infinite effect 2025-11-26 00:01:36 +01:00
Aleksander Grygier
013244933b chore: update webui build output 2025-11-25 17:15:48 +01:00
Aleksander Grygier
b9a3129d42 feat: Switching models logic for ChatForm or when regenerating messages + modality detection logic 2025-11-25 17:13:10 +01:00
Aleksander Grygier
4c24ead8e0 chore: update webui build output 2025-11-25 15:06:32 +01:00
Aleksander Grygier
501badc9c4 refactor: Multi-model business logic WIP 2025-11-25 15:04:46 +01:00
Georgi Gerganov
583cb83416 ggml : add ggml_top_k (#17365)
* ggml : add ggml_top_k

* cont : add ggml_argsort_top_k

* metal : add top_k support

* ggml : cleanup

* tests : add virtual err() function for test_case

* ggml : add comments
b7157
2025-11-25 15:31:43 +02:00
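Commit 583cb83416 adds `ggml_top_k` and `ggml_argsort_top_k`. The names suggest a top-k selection built on argsort; the sketch below shows that general idea in plain Python. It is an illustration only: the function names `argsort_desc` and `top_k` are hypothetical, and the ordering and tie-breaking chosen here are assumptions, not documented behaviour of the ggml operators.

```python
def argsort_desc(values):
    """Indices that would sort `values` from largest to smallest.

    Python's sort is stable, so equal values keep their original order.
    """
    return sorted(range(len(values)), key=lambda i: -values[i])


def top_k(values, k):
    """Indices of the k largest entries, largest first (assumed ordering)."""
    if not 0 <= k <= len(values):
        raise ValueError("k must be between 0 and len(values)")
    return argsort_desc(values)[:k]


logits = [0.1, 2.5, -1.0, 2.5, 0.7]
print(top_k(logits, 3))  # -> [1, 3, 4]
```

A full argsort costs O(n log n); a dedicated top-k operator can avoid sorting the entries that are discarded, which is one plausible reason to expose it as a separate operation.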
Aleksei Nikiforov
05872ac885 convert : fix big-endian conversion (#17431)
* Fix convert_hf_to_gguf.py script on s390x

Assume converted model data is originally little-endian.
Byteswap data on s390x after reading it to put values in correct representation
for any transformation needed, like calculating weight tensors.

Then byteswap data to little-endian before passing it to GGUFWriter while
GGUFWriter will byteswap data back to big endian if big endian output is requested.

byteswap(inplace=True) calls don't work with lazy tensor and array wrappers.
Use byteswap with copying data to work around this behaviour.

* Make GGUFWriter accept tensors in native endianness instead of little-endian

With this change if no byteswapping is actually needed, 2 excessive byteswaps can be omitted on s390x

* Fix byteswapping in convert_hf_to_gguf.py for remote models
2025-11-25 14:18:16 +01:00
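The byteswapping described in 05872ac885 can be illustrated with a small standard-library sketch. It assumes, as the commit message does, that the stored data is little-endian, and it swaps into a new buffer rather than in place, mirroring the copy-based workaround. The helpers `byteswap_copy` and `little_endian_to_native` are hypothetical names for illustration and are not functions from `convert_hf_to_gguf.py` or `GGUFWriter`.

```python
import struct
import sys


def byteswap_copy(raw: bytes, item_size: int) -> bytes:
    """Return a new buffer with the bytes of each item reversed."""
    if len(raw) % item_size != 0:
        raise ValueError("buffer length is not a multiple of item_size")
    return b"".join(
        raw[i:i + item_size][::-1] for i in range(0, len(raw), item_size)
    )


def little_endian_to_native(raw: bytes, item_size: int) -> bytes:
    """Convert little-endian data to the host's byte order.

    On a little-endian host nothing needs to change; on a big-endian
    host (for example s390x) every item is byteswapped into a copy.
    """
    if sys.byteorder == "little":
        return raw
    return byteswap_copy(raw, item_size)


values = (1.0, -2.5, 3.25)
stored = struct.pack("<3f", *values)          # little-endian, as on disk
native = little_endian_to_native(stored, 4)   # host byte order
print(struct.unpack("=3f", native))           # -> (1.0, -2.5, 3.25)
```

Swapping twice returns the original bytes, which is why the second bullet of the commit can drop two byteswaps when the writer accepts native-endian tensors.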
Diego Devesa
55ab25caf5 codeowners : remove slaren (#17492) 2025-11-25 13:00:23 +01:00
Aleksander Grygier
f9c911d025 refactor: Remove redundant settings 2025-11-25 10:55:08 +01:00
TianHao324
064c90d843 CANN: supports out_prod operator for F32 and F16 (#17406)
Co-authored-by: tianhao <tianhao42@huawei.com>
b7154
2025-11-25 17:39:06 +08:00