llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2026-05-14 13:04:08 +00:00

Author	SHA1	Message	Date
Aleksander Grygier	3470b12b76	chore: update webui build output	2025-11-28 15:09:55 +01:00
Aleksander Grygier	eed1bd9b97	refactor: Enhance model info and attachment handling	2025-11-28 15:08:41 +01:00
Aleksander Grygier	491fe2d3f7	feat: Update logic for PDF as Image	2025-11-28 13:10:00 +01:00
Aleksander Grygier	bc577266b9	docs: Architecture documentation	2025-11-27 22:04:20 +01:00
Aleksander Grygier	db479523ec	feat: Condition available models based on modality + better model loading strategy & UX	2025-11-27 19:13:05 +01:00
Aleksander Grygier	9086bc30bd	feat: Improve statistic badges	2025-11-27 14:12:21 +01:00
Aleksander Grygier	d73353732f	refactor: Architecture cleanup	2025-11-27 14:03:25 +01:00
Aleksander Grygier	78ead49830	Merge remote-tracking branch 'ngxson/xsn/server_model_management_v1_2' into allozaur/server_model_management_v1_2	2025-11-27 13:48:21 +01:00
Aleksander Grygier	6a3d6e79d2	refactor: Services/Stores syntax + logic improvements Refactors components to access stores directly instead of using exported getter functions. This change centralizes store access and logic, simplifying component code and improving maintainability by reducing the number of exported functions and promoting direct store interaction. Removes exported getter functions from `chat.svelte.ts`, `conversations.svelte.ts`, `models.svelte.ts` and `settings.svelte.ts`.	2025-11-27 13:44:49 +01:00
Aleksander Grygier	69065ddc56	fix: UI	2025-11-27 11:27:58 +01:00
Aleksander Grygier	6b95118abc	refactor: Processing state reactivity	2025-11-27 11:11:45 +01:00
Aleksander Grygier	2a5922b1f6	chore: update webui build output	2025-11-26 17:52:40 +01:00
Aleksander Grygier	13e7988459	refactor: Model modality handling	2025-11-26 17:51:25 +01:00
Xuan Son Nguyen	1493ee09ea	tmp webui build	2025-11-26 17:43:27 +01:00
Aleksander Grygier	d6ee3d133a	refactor: Server store	2025-11-26 17:16:41 +01:00
Aleksander Grygier	456828b365	refactor: Chat requests abort handling	2025-11-26 16:48:13 +01:00
Aleksander Grygier	42483f463d	refactor: Remove ConversationsService	2025-11-26 16:45:07 +01:00
Xuan Son Nguyen	becc602612	Merge branch 'master' into xsn/server_model_management_v1_2	2025-11-26 16:21:57 +01:00
Xuan Son Nguyen	e2731c3767	set hf_repo/docker_repo as model alias when possible	2025-11-26 15:57:20 +01:00
Xuan Son Nguyen	e40f35fb61	remove support for extra args	2025-11-26 15:43:27 +01:00
Aleksander Grygier	ddf98bdf28	refactor: Improve API header management via utility functions	2025-11-26 15:36:09 +01:00
Aleksander Grygier	9431f358b8	chore: update webui build output	2025-11-26 15:07:12 +01:00
Aleksander Grygier	284557cd2f	feat: Improve model loading/unloading status updates	2025-11-26 15:06:11 +01:00
xctan	6ab4e50d9c	ggml-cpu : add RISC-V Zvfh impl for ggml_vec_mad_f16 (#17448) * ggml-cpu : add RISC-V Zvfh impl for ggml_vec_mad_f16 * ggml-cpu : dedup scalar impl * Update ggml/src/ggml-cpu/vec.h --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> b7164	2025-11-26 15:33:05 +02:00
Adrien Gallouët	2336cc4784	cmake : use EXCLUDE_FROM_ALL to avoid patch-boringssl.cmake (#17520) We have to separate the code path starting 3.28 because `FetchContent_Populate` is now deprecated and will be completely removed in a future version. Signed-off-by: Adrien Gallouët <angt@huggingface.co> b7163	2025-11-26 15:15:21 +02:00
Adrien Gallouët	e6923caaec	ggml : fix ARM feature verification (#17519) On arm64 with `cmake` version 3.31.6, the final feature verification fails: -- ARM detected flags: -mcpu=neoverse-v2+crc+sve2-aes+sve2-sha3+nossbs -- Performing Test GGML_MACHINE_SUPPORTS_dotprod -- Performing Test GGML_MACHINE_SUPPORTS_dotprod - Success -- Performing Test GGML_MACHINE_SUPPORTS_i8mm -- Performing Test GGML_MACHINE_SUPPORTS_i8mm - Success -- Performing Test GGML_MACHINE_SUPPORTS_sve -- Performing Test GGML_MACHINE_SUPPORTS_sve - Success -- Performing Test GGML_MACHINE_SUPPORTS_sme -- Performing Test GGML_MACHINE_SUPPORTS_sme - Failed -- Performing Test GGML_MACHINE_SUPPORTS_nosme -- Performing Test GGML_MACHINE_SUPPORTS_nosme - Success -- Checking for ARM features using flags: -- -U__ARM_FEATURE_SME -- -mcpu=neoverse-v2+crc+sve2-aes+sve2-sha3+nossbs+dotprod+i8mm+sve+nosme -- Performing Test HAVE_DOTPROD -- Performing Test HAVE_DOTPROD - Failed -- Performing Test HAVE_SVE -- Performing Test HAVE_SVE - Failed -- Performing Test HAVE_MATMUL_INT8 -- Performing Test HAVE_MATMUL_INT8 - Failed -- Performing Test HAVE_FMA -- Performing Test HAVE_FMA - Success -- Performing Test HAVE_FP16_VECTOR_ARITHMETIC -- Performing Test HAVE_FP16_VECTOR_ARITHMETIC - Failed -- Performing Test HAVE_SME -- Performing Test HAVE_SME - Failed -- Adding CPU backend variant ggml-cpu: -U__ARM_FEATURE_SME;-mcpu=neoverse-v2+crc+sve2-aes+sve2-sha3+nossbs+dotprod+i8mm+sve+nosme We need to explicitly replace `;` with spaces from the list to make `CMAKE_REQUIRED_FLAGS` work correctly... Signed-off-by: Adrien Gallouët <angt@huggingface.co> b7162	2025-11-26 15:14:41 +02:00
Aleksander Grygier	d0d7a88d13	chore: update webui build output	2025-11-26 14:14:15 +01:00
Aleksander Grygier	23a91cd257	refactor: Icons	2025-11-26 14:13:17 +01:00
Aleksander Grygier	b1cf8bb814	refactor: Improve server properties management	2025-11-26 14:05:42 +01:00
Jiacheng (Jason) Chen	3e18dba9fd	HIP: Patch failed testcase in WMMA-MMQ kernels for RDNA 4 (#17502) * patch failed test case MUL_MAT(type_a=q4_0,type_b=f32,m=576,n=512,k=576,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) for enabling WMMA on RDNA4 * Quick clean up on mma.cuh to add ggml_cuda_memcpy_1 back in for half2 and bfloat162 b7161	2025-11-26 11:18:48 +01:00
hipudding	eeb5605de2	CANN: Add MROPE and IMROPE support (#17401) * CANN: ROPE supports both MROPE and IMROPE. 1. Optimize the caching logic of rope_cache_init. 2. Add support for mRoPE and i-mRoPE. Note that on Ascend 910B devices, it is necessary to disable FA in CLIP and disable NZ-format conversion. These two issues are still under investigation. * Resolve review comments b7160	2025-11-26 16:44:19 +08:00
o7si	f3a848a3b1	chore: upgrade cpp-httplib from v0.27.0 to v0.28.0 (#17513) b7159	2025-11-26 09:21:06 +02:00
Jeff Bolz	b3b03a7baf	vulkan: Implement GGML_OP_CUMSUM (#17479) b7158	2025-11-26 07:08:10 +01:00
Aleksander Grygier	19e5385bd5	chore: update webui build output	2025-11-26 02:14:33 +01:00
Aleksander Grygier	2a280b6082	feat: Model management and selection features WIP	2025-11-26 02:13:31 +01:00
Aleksander Grygier	81b8e1abb4	chore: update webui build output	2025-11-26 00:44:18 +01:00
Aleksander Grygier	22507fed74	refactor: Icons	2025-11-26 00:43:49 +01:00
Aleksander Grygier	5207527e9d	fix: Audio attachments	2025-11-26 00:21:36 +01:00
Aleksander Grygier	c680083cce	feat: Remove redundant settings + rearrange	2025-11-26 00:08:04 +01:00
Aleksander Grygier	33356f36e4	fix: Regenerate	2025-11-26 00:03:17 +01:00
Aleksander Grygier	82975a1f2d	fix: Add `untrack` inside chat processing info data logic to prevent infinite effect	2025-11-26 00:01:36 +01:00
Aleksander Grygier	013244933b	chore: update webui build output	2025-11-25 17:15:48 +01:00
Aleksander Grygier	b9a3129d42	feat: Switching models logic for ChatForm or when regenerating messages + modality detection logic	2025-11-25 17:13:10 +01:00
Aleksander Grygier	4c24ead8e0	chore: update webui build output	2025-11-25 15:06:32 +01:00
Aleksander Grygier	501badc9c4	refactor: Multi-model business logic WIP	2025-11-25 15:04:46 +01:00
Georgi Gerganov	583cb83416	ggml : add ggml_top_k (#17365) * ggml : add ggml_top_k * cont : add ggml_argsort_top_k * metal : add top_k support * ggml : cleanup * tests : add virtual err() function for test_case * ggml : add comments b7157	2025-11-25 15:31:43 +02:00
Aleksei Nikiforov	05872ac885	convert : fix big-endian conversion (#17431) * Fix convert_hf_to_gguf.py script on s390x Assume converted model data is originally little-endian. Byteswap data on s390x after reading it to put values in correct presentation for any transformation needed, like calculating weight tensors. Then byteswap data to little-endian before passing it to GGUFWriter while GGUFWriter will byteswap data back to big endian if big endian output is requested. byteswap(inplace=True) calls don't work with lazy tensor and array wrappers. Use byteswap with copying data to workaround this behaviour. * Make GGUFWriter accept tensors in native endianness instead of little-endian With this change if no byteswapping is actually needed, 2 excessive byteswaps can be omitted on s390x * Fix byteswapping in convert_hf_to_gguf.py for remote models	2025-11-25 14:18:16 +01:00
Diego Devesa	55ab25caf5	codeowners : remove slaren (#17492)	2025-11-25 13:00:23 +01:00
Aleksander Grygier	f9c911d025	refactor: Remove redundant settings	2025-11-25 10:55:08 +01:00
TianHao324	064c90d843	CANN: supports out_prod operator for F32 and F16 (#17406) Co-authored-by: tianhao <tianhao42@huawei.com> b7154	2025-11-25 17:39:06 +08:00

7287 Commits