167 Commits

Author SHA1 Message Date
Yuannan
65d7a8bbf0 devops : updated Nix systems (#22869) 2026-05-09 17:15:03 +03:00
Davi Henrique Linhares
00d56b11c3 docker : upgraded the default intel compute-runtime version (#22567) 2026-05-09 10:22:23 +02:00
Neo Zhang Jianyu
4ead6fd957 [SYCL] Update oneapi 2025.3.3, Seperate SYCL build, release Ubuntu 24 package. (#22078)
* upgrade oneAPI to 2025.3.3

* update

* seperate SYCL CI and support release binary package for ubuntu 24

* add dependence

* remove wrong copy lines

* add missed line

* remove other task to test the release for SYCL

* rm more for test release

* fix file name

* correct the error in running

* support build for fp32/fp16

* rm ubuntu-24-sycl-fp16 for duplicated

* refactor build setting

* update guide for ubuntu 24 release package, restore the release.yml for other backend

* user docker replace to install oneAPI

* use download installation package to replace docker

* use wget to download and install oneapi, replace the apt cmd

* enable ccache for oneAPI installation

* fix format error

* enable cache for oneAPI installation

* update guide

* Update .github/workflows/release.yml

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Update .github/workflows/release.yml

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Update .github/workflows/build-sycl.yml

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Update .github/workflows/release.yml

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-04-23 08:21:36 +03:00
Zijun Yu
52f1096f21 openvino: driver setup, CI split, thread safety, and NPU optimizations (#21944)
* Thread safety per request only

* Fix ROPE yarn case

* Fix sticky stateful config

* Use i4/i8 directly for symmetric quant

* Use weightless caching

* Add WeightlessCacheAttribute to reduce NPU memory usage

* Gelu tanh support (#125)

* Imrope support (#126)

* fix(openvino): explicit ov::Tensor frees in ggml_backend_openvino_free

* add GPU,NPU support in OV Dockerfile

* add build-openvino.yml ci

* Fix sticky stateful config

* add concurrency to ov-gpu ci runs. Move OV CI to build-openvino.yml

* fix thread-safety of shared runtime context

* rope type abstraction for frontend translations

* fix editorconfig

---------

Co-authored-by: Mustafa Cavus <mustafa.cavus@intel.com>
Co-authored-by: Dan Hoffman <dhoff749@gmail.com>
Co-authored-by: Ravi Panchumarthy <ravi.panchumarthy@intel.com>
2026-04-21 18:58:34 +03:00
Yuannan
90fb96a7b3 devops : added spirv-headers to nix (#21965) 2026-04-16 11:12:52 +03:00
Jeff Bolz
1f30ac0cea vulkan: Programmatically add RoundingModeRTE to all shaders when the device supports it (#21572)
* vulkan: Programmatically add RoundingModeRTE to all shaders when the device supports it

* use FetchContent to get SPIRV-Headers

* Fetch spirv-headers unconditionally

* remove fetchcontent, rely on installed headers

* fix ubuntu job

* Update docs/build.md
2026-04-14 15:17:45 +02:00
M1DNYT3
277ff5fff7 docker : bump cuda12 to 12.9.1 (#20920)
Co-authored-by: M1DNYT3 <m1dnyt3@MacBookPro.lan>
Co-authored-by: CISC <CISC@users.noreply.github.com>
2026-04-03 15:06:45 +02:00
Tillerino
f851fa5ab0 fix: add openssl to nix dependencies (#21353) (#21355) 2026-04-03 12:21:07 +03:00
Slobodan Josic
7c7d6ce5c7 [HIP] Bump ROCm version to 7.2.1 (#21066)
Bump ROCm version on Linux from 7.2 to 7.2.1
Add gfx1102 target
Delete LLVM workaround since ROCm 7.2.1 has fix for ROCm 7.2 perf regression https://github.com/ROCm/rocm-systems/issues/2865

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-04-03 00:59:20 +02:00
Seungmin Kim
84ae8434d0 CI : Enable CUDA and Vulkan ARM64 runners and fix CI/CD (#21122)
* CI: Enable CUDA and Vulkan ARM64 runners and fix CI/CD

Co-authored-by: Ts-sound <44093942+Ts-sound@users.noreply.github.com>

* Obtain source tag name from git tag

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

---------

Co-authored-by: Ts-sound <44093942+Ts-sound@users.noreply.github.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-03-30 20:24:37 +02:00
Davi Henrique Linhares
2405d59cb6 devops: including compute-runtime for intel.Dockerfile (#21076) 2026-03-29 13:34:03 +08:00
Ts-sound
bf934f28db docker : fix and enable ARM64 image build (#20929)
* CI: fix ARM64 image build error & enable compilation

* Update .github/workflows/docker.yml

Co-authored-by: Aaron Teo <taronaeo@gmail.com>

* CI: revert ggml/src/ggml-cpu/CMakeLists.txt

* Update .github/workflows/docker.yml

Co-authored-by: Aaron Teo <taronaeo@gmail.com>

* CI: update runs-on to ubuntu24.04, and update ARM64 build image ( ubuntu_version: "24.04")

* CI: change cpu.Dockerfile gcc to 14;

* CI : cpu.Dockerfile , update pip install .

* Update .github/workflows/docker.yml

Co-authored-by: Aaron Teo <taronaeo@gmail.com>

---------

Co-authored-by: Aaron Teo <taronaeo@gmail.com>
2026-03-28 01:45:09 +01:00
Kusha Gharahi
ff934e29bc server: Introduce LLAMA_BUILD_WEBUI build flag to allow disabling the embedded web ui (#20158)
* introduce LLAMA_SERVER_NO_WEBUI

* LLAMA_SERVER_NO_WEBUI → LLAMA_BUILD_WEBUI

* LLAMA_BUILD_WEBUI ON by default not based on LLAMA_STANDALONE

* MIssed this

* Add useWebUi to package.nix
2026-03-27 17:25:55 +01:00
KokerZhou
6861f6509a CANN: update docker images to 8.5.0 and improve CANN.md (#20801)
* cann: update docker images to 8.5.0

- bump CANN base image from 8.3.rc2 to 8.5.0
- bump ASCEND_VERSION from 8.1.RC1.alpha001 to 8.5.0

Move to newer stable releases.

* cann: update CANN.md

* Update CANN.md to include BF16 support

Added BF16 support information to the CANN documentation and corrected formatting for the installation instructions.

* Fix formatting issues in CANN.md

Fix 234: Trailing whitespace
2026-03-27 08:53:00 +08:00
Davi Henrique Linhares
fd18364755 devops: upgraded default oneAPI version (#20731) 2026-03-23 21:47:34 +08:00
Gerard Guillemas Martos
fc350fdf96 docker : force Python 3.13 in Vulkan container (#20530)
* ci: force Python 3.13 in Vulkan container

* remove unnecessary `update-alternatives` line
2026-03-14 21:37:09 +01:00
Zijun Yu
9789c4ecdc ggml : add OpenVINO backend (#15307)
* Update build doc

* Add cgraph tensor output name to OV op name

* Update openvino build instructions

* Add initial NPU support

* draft NPU support version 2: prefill + kvcache

* NPU support version 2: prefill + kvcache

* Change due to ggml cgraph changes, not correct yet

* Change due to ggml cgraph changes, llama-3.2 CPU work

* Add AMD64 to CMakeLists

* Change due to ggml cgraph changes, all device work

* Refactor: clean, fix warning

* Update clang-format

* Statful transformation for CPU GPU

* Add SwiGLU

* Fuse to SDPA

* Replace Concat with Broadcast in MulMat for GQA

* Pull out indices creation for kv cache update

* Refactor: remove past_token_len from extra_inputs

* Fix Phi3 SwiGLU and SoftMax

* Pull out sin cos from rope

* Reduce memory: free ov weights node after graph conversion

* Fix CPY due to cgraph change

* Added OpenVINO CI/CD. Updated docs

* Fix llama-cli

* Fix Phi3 ROPE; Add test-backend-ops

* Fix NPU

* Fix llama-bench; Clang-format

* Fix llama-perplexity

* temp. changes for mark decomp

* matmul in fp32

* mulmat input conversion fix

* mulmat type conversion update

* add mark decomp pass

* Revert changes in fuse_to_sdpa

* Update build.md

* Fix test-backend-ops

* Skip test-thread-safety; Run ctest only in ci/run.sh

* Use CiD for NPU

* Optimize tensor conversion, improve TTFT

* Support op SET_ROWS

* Fix NPU

* Remove CPY

* Fix test-backend-ops

* Minor updates for raising PR

* Perf: RMS fused to OV internal RMS op

* Fix after rebasing

- Layout of cache k and cache v are unified: [seq, n_head, head_size]
- Add CPY and FLASH_ATTN_EXT, flash attn is not used yet
- Skip test-backend-ops due to flash attn test crash
- Add mutex around graph conversion to avoid test-thread-safety fali in the future
- Update NPU config
- Update GPU config to disable SDPA opt to make phi-3 run

* Change openvino device_type to GPU; Enable flash_attn

* Update supports_buft and supports_op for quantized models

* Add quant weight conversion functions from genai gguf reader

* Quant models run with accuracy issue

* Fix accuracy: disable cpu_repack

* Fix CI; Disable test-backend-ops

* Fix Q4_1

* Fix test-backend-ops: Treat quantized tensors as weights

* Add NPU Q4_0 support

* NPU perf: eliminate zp

* Dequantize q4_1 q4_k q6_k for NPU

* Add custom quant type: q8_1_c, q4_0_128

* Set m_is_static=false as default in decoder

* Simpilfy translation of get_rows

* Fix after rebasing

* Improve debug util; Eliminate nop ReshapeReshape

* STYLE: make get_types_to_requant a function

* Support BF16 model

* Fix NPU compile

* WA for npu 1st token acc issue

* Apply EliminateZP only for npu

* Add GeGLU

* Fix Hunyuan

* Support iSWA

* Fix NPU accuracy

* Fix ROPE accuracy when freq_scale != 1

* Minor: not add attention_size_swa for non-swa model

* Minor refactor

* Add Q5_K to support phi-3-q4_k_m

* Requantize Q6_K (gs16) to gs32 on GPU

* Fix after rebasing

* Always apply Eliminate_ZP to fix GPU compile issue on some platforms

* kvcachefusion support

* env variable GGML_OPENVINO_DISABLE_SDPA_OPTIMIZATION added

* Fix for Phi3

* Fix llama-cli (need to run with --no-warmup)

* Fix add_sliced_mask; Revert mulmat, softmax; Remove input attention_size, iSWA model not working

* fix after rebasing

* Fix llama-3-8b and phi3-mini q4_0 NPU

* Update to OV-2025.3 and CMakeLists.txt

* Add OV CI cache

* Apply CISC review and update CI to OV2025.3

* Update CI to run OV dep install before build

* Update OV dockerfile to use OV2025.3 and update build docs

* Style: use switch in supports_ops

* Style: middle ptr and ref align, omit optional struct keyword

* NPU Unify PD (#14)

* Stateless. Fix llama-cli llama-server

* Simplify broadcast op in attention

* Replace get_output_tensor+memcpy with set_output_tensor

* NPU unify PD. Unify dynamic and static dims

* Clean placeholders in ggml-openvino.cpp

* NPU unify PD (handled internally)

* change graph to 4d, support multi sequences

* Fix llama-bench

* Fix NPU

* Update ggml-decoder.cpp

Hitting error while compiling on windows:

error C3861: 'unsetenv': identifier not found

Reason: unsetenv() is a POSIX function; it doesn’t exist on Windows. Visual Studio (MSVC) won’t recognize it.

Proposed fix: Use _putenv_s() (Windows equivalent)
This is supported by MSVC and achieves the same effect: it removes the environment variable from the process environment.

This keeps cross-platform compatibility.

* Update ggml-decoder.cpp

* Update ggml-decoder.cpp

* Update ggml-decoder.cpp

* Update ggml-decoder.cpp

* Update ggml-decoder.cpp

* Remove the second decoder for node. Moving the function into the model decoder

* Fix error for naive

* NPU prefill chunking

* NPU fix llama-bench

* fallback naive run with accuracy issue

* NPU support llma-perplexity -b 512 --no-warmup

* Refactor: split ov_graph_compute for dynamic and static

* remove unused API GgmlOvDecoder::get_output_stride(const std::string & name)

* minor update due to ov 2025.4

* remove unused API GgmlOvDecoder::get_output_names()

* remove unused API get_output_shape(const std::string & name)

* Modified API GgmlOvDecoder::get_output_type(const std::string & name)

* Removed API GgmlOvDecoder::get_output_op_params(const std::string & name)

* Removed API get_output_ggml_tensor(const std::string & name)

* Removed API m_outputs

* Removed m_output_names

* Removed API GgmlOvDecoder::get_input_names()

* Removed API GgmlOvDecoder::get_input_stride(const std::string& name)

* Removed API get_input_type

* Removed API get_input_type

* Removed API GgmlOvDecoder::get_input_shape(const std::string & name)

* Removed API GgmlOvDecoder::get_input_op_params(const std::string & name)

* Fix error for decoder cache

* Reuse cached decoder

* GPU remove Q6_K requantization

* NPU fix wrong model output shape

* NPU fix q4 perf regression

* Remove unused variable nodes

* Fix decoder can_reuse for llama-bench

* Update build.md for Windows

* backend buffer: allocate on host

* Use shared_buffer for GPU NPU; Refactor

* Add ov_backend_host_buffer; Use cached remote context

* Put kvcache on GPU

* Use ggml_aligned_malloc

* only use remote tensor for kvcache

* only use remote tensor for kvcache for GPU

* FIX: use remote tensor from singleton

* Update build.md to include OpenCL

* NPU always requant to q4_0_128

* Optimize symmetric quant weight extraction: use single zp

* Use Q8_0_C in token embd, lm_head, and for 5 and 6 bits quant

* Update build.md

* Support -ctk f32

* Initial stateful graph support

* Update ggml/src/ggml-openvino/ggml-decoder.cpp

Co-authored-by: Yamini Nimmagadda <yamini.nimmagadda@intel.com>

* code cleanup

* npu perf fix

* requant to f16 for Q6 embed on NPU

* Update ggml/src/ggml-openvino/ggml-decoder.cpp

* Update ggml/src/ggml-openvino/ggml-openvino-extra.cpp

* Create OPENVINO.md in llama.cpp backend docs

* Update OPENVINO.md

* Update OPENVINO.md

* Update OPENVINO.md

* Update build.md

* Update OPENVINO.md

* Update OPENVINO.md

* Update OPENVINO.md

* kq_mask naming fix

* Syntax correction for workflows build file

* Change ov backend buffer is_host to false

* Fix llama-bench -p -n where p<=256

* Fix --direct-io 0

* Don't put kvcache on GPU in stateful mode

* Remove hardcode names

* Fix stateful shapes

* Simplification for stateful and update output shape processing

* Remove hardcode names

* Avoid re-compilation in llama-bench

* Extract zp directly instead of bias

* Refactor weight tensor processing

* create_weight_node accept non-ov backend buffer

* remove changes in llama-graph.cpp

* stateful masking fix (#38)

Fix for stateful accuracy issues and cl_out_of_resources error in stateful GPU with larger context sizes.

* Fix test-backend-ops crash glu, get_rows, scale, rms_norm, add

* hardcoded name handling for rope_freqs.weight

* Suppress logging and add error handling to allow test-backend-ops to complete

* Fix MUL_MAT with broadcast; Add unsupported MUL_MAT FLASH_ATTN cases

* Use bias instead of zp in test-backend-ops

* Update OV in CI, Add OV CI Tests in GH Actions

* Temp fix for multithreading bug

* Update OV CI, fix review suggestions.

* fix editorconfig-checker, update docs

* Fix tabs to spaces for editorconfig-checker

* fix editorconfig-checker

* Update docs

* updated model link to be GGUF model links

* Remove GGML_CPU_REPACK=OFF

* Skip permuted ADD and MUL

* Removed static variables from utils.cpp

* Removed initializing non-existing variable

* Remove unused structs

* Fix test-backend-ops for OV GPU

* unify api calling

* Update utils.cpp

* When the dim is dynamic, throw an error, need to is stastic forst

* Add interface compute_model_outputs(), which get the model output through computing the node use count & status in the cgraph to avoid the flag using

* No need to return

* Fix test-backend-ops for OV GPU LNL

* Fix test-thread-safety

* use the shape from infer request of output tensor create to avoid issue

* fix dynamic output shape  issue

* fix issue for the unused node in tests

* Remove unused lock

* Add comment

* Update openvino docs

* update to OV release version 2026.0

* add ci ov-gpu self hosted runner

* fix editorconfig

* Fix perplexity

* Rewrite the model inputs finding mechanism  (#54)

* Rewrite the model inputs finding logistic

* Put stateful shape handle in get input shape

* Put the iteration logistic in func

* Added ggml-ci-intel-openvino-gpu and doc update

* .hpp files converted to .h

* fix ggml-ci-x64-intel-openvino-gpu

* Fix for stateful execution bug in llama-bench

* Minor updates after stateful llama-bench fix

* Update ggml/src/ggml-openvino/utils.cpp

Co-authored-by: Yamini Nimmagadda <yamini.nimmagadda@intel.com>

* Remove multiple get_shape calls

* Bring back mutex into compute

* Fix VIEW op, which slice the input node

* Added token_len_per_seq existence check before slicing masks and moved node retrieval inside guarded block to prevent missing-key access

* Temp. fix for test requant errors

* Update to OV ggml-ci to low-perf

* ci : temporary disable "test-llama-archs"

* ci : cache v4 -> v5, checkout v4 -> v6, fix runner tag

* docs : update url

* Fix OV link in docker and Update docs

---------

Co-authored-by: Ravi Panchumarthy <ravi.panchumarthy@intel.com>
Co-authored-by: Cavus Mustafa <mustafa.cavus@intel.com>
Co-authored-by: Arshath <arshath.ramzan@intel.com>
Co-authored-by: XuejunZhai <Xuejun.Zhai@intel.com>
Co-authored-by: Yamini Nimmagadda <yamini.nimmagadda@intel.com>
Co-authored-by: Xuejun Zhai <Xuejun.Zhai@intel>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2026-03-14 07:56:55 +02:00
Mario Limonciello
35715657cb Update ROCm docker container to 7.2 release (#19418)
Also update architectures
2026-02-21 21:53:39 +01:00
Sigbjørn Skjæret
b828e18c75 docker : fix vulkan build (#19352) 2026-02-05 11:10:39 +01:00
Alexis Williams
8a98ba4582 nix: fix allowUnfreePredicate for packages with multiple licenses (#19237)
The allowUnfreePredicate in pkgsCuda was wrapping p.meta.license in a
list unconditionally. This fails when meta.license is already a list
of licenses, as it creates a nested list and then tries to access
.free and .shortName on the inner list.

Use lib.toList instead, which correctly handles both cases:
- Single license attrset -> wraps in list
- List of licenses -> returns unchanged
2026-02-01 22:10:48 +02:00
Matthieu Coudron
41ea26144e nix: fix nix develop .#python-scripts (#19218)
Without this I get:

> * Getting build dependencies for wheel...
> * Building wheel...
> Successfully built gguf-0.17.1-py3-none-any.whl
> Finished creating a wheel...
> Finished executing pypaBuildPhase
> Running phase: pythonRuntimeDepsCheckHook
> Executing pythonRuntimeDepsCheck
> Checking runtime dependencies for gguf-0.17.1-py3-none-any.whl
>   - requests not installed
For full logs, run:
    nix log /nix/store/x0c4a251l68bvdgang9d8v2fsmqay8a4-python3.12-gguf-0.0.0.drv

I changed a bit the style to make it more terse ~> more elegant in my
opinion.
2026-01-31 18:01:46 +02:00
Chenguang Li
e20fa27a02 CANN: fix an issue where get_env was not fully renamed (#18796)
* CANN: fix an issue where get_env was not fully renamed

* ci: add cann with acl group

* ci: define use_acl_graph using GitHub Action

* ci: update cann dockerfile with acl graph
2026-01-16 16:24:04 +08:00
Adrien Gallouët
516a4ca9b5 refactor : remove libcurl, use OpenSSL when available (#18828) 2026-01-14 18:02:47 +01:00
R
ae9f8df778 fix(docker): add missing libglvnd libraries to Vulkan image (#18664)
Add libglvnd0, libgl1, libglx0, libegl1, libgles2 to the Vulkan
Dockerfile base image. These libraries are required by mesa-vulkan-drivers
to properly initialize the Vulkan ICD and detect GPU devices.

Without these libraries, vkEnumeratePhysicalDevices() returns an empty
list, resulting in "ggml_vulkan: No devices found." error.

Fixes #17761
2026-01-07 16:57:42 +01:00
Sigbjørn Skjæret
4849661d98 docker : add CUDA 13.1 image build (#18441)
* add updated cuda-new.Dockerfile for Ubuntu 24.04 compatibilty

* add cuda13 build
2025-12-30 22:28:53 +01:00
Andrew Aladjev
fb644247de CLI: fixed adding cli and completion into docker containers, improved docs (#18003)
Co-authored-by: Andrew Aladjev <andrew.aladjev@gmail.com>
2025-12-16 11:52:23 +01:00
jiahao su
a8c7f33d79 ci : change the cann version and the container pull method (#17953)
fix error format

Update build.yml

Remove unnecessary zip files

fix

update
2025-12-12 20:43:00 +01:00
Sigbjørn Skjæret
b7f5f46e03 docker : include legacy llama-completion binary (#17964) 2025-12-12 19:39:23 +01:00
Eric Curtin
d21a76ac38 devops: Add build-essential to Ubuntu 26.04 image (#17531)
This is no longer passing the build, needs more packages.

Signed-off-by: Eric Curtin <eric.curtin@docker.com>
2025-11-27 18:35:47 +08:00
Eric Curtin
bc809e9c53 vulkan: Update docker image to Ubuntu 26.04 to enable glslc features (#17439)
26.04 provides these

Signed-off-by: Eric Curtin <eric.curtin@docker.com>
2025-11-23 10:29:36 +01:00
jiahao su
ffa277a54c CANN: Add openEuler-cann in build and release (#17192)
Update openEuler version

Remove variable ASCEND_SOC_TYPE

Modify the chip type

Fix case in zip filename

Change "device" to "chip_type"

Modify the value of chip_type
2025-11-18 16:08:55 +08:00
Mike Abbott
92bb442ad9 docker : preserve .so symlinks for docker container builds (#17214) 2025-11-12 20:33:55 +01:00
Nicolas B. Pierron
d2d626938a Install rpc-server when GGML_RPC is ON. (#17149) 2025-11-11 10:53:59 +00:00
Eric Curtin
86fde91e62 Switch to using Ubuntu 25.10 vulkan/mesa (#16497)
Because "Ubuntu packages to be discontinued in Vulkan SDK"

Signed-off-by: Eric Curtin <eric.curtin@docker.com>
2025-11-09 10:25:38 +01:00
Aaron Teo
a864132ba5 devops: fix failing s390x docker build (#16918) 2025-11-02 08:48:46 +08:00
Yuannan
1d49ca3759 nix : removed metal for nix (#16118) 2025-10-06 12:29:56 +03:00
Neo Zhang Jianyu
2be72c2b12 SYCL: Update to oneAPI 2025.2 (#16371)
* update oneapi to 2025.2, use deep-learning-essentials to replace base-tool

* update to 2025.2 use deeplearn essi to replace base toolkit

* add missed dll

* add deep learning essentials

* add sycl-ls

---------

Co-authored-by: Zhang Jianyu <zhang.jianyu@outlook.com>
2025-10-02 10:16:25 +03:00
uvos
c8dedc9999 CI: reenable cdna in rocm docker builds (#16376) 2025-10-01 23:32:39 +02:00
uvos
1fe4e38cc2 ci: Properly install rocwmma for hip builds (#16305)
* CI: Properly install rocwmma for hip builds

on windows we now windows install rocwmma from ubuntu pacakges

* CI: update linux rocm docker build to use rocm 7.0
2025-10-01 20:18:03 +02:00
R0CKSTAR
d9e0e7c819 ci : fix musa docker build (#16306)
Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>
2025-09-28 16:38:15 +02:00
R0CKSTAR
a86a580a66 musa: upgrade musa sdk to 4.3.0 (#16240)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-09-26 02:56:38 +02:00
Aaron Teo
5fb557653b devops: fix s390x docker release failure (#16231) 2025-09-25 11:36:30 +08:00
Aaron Teo
4b9f4cb0f8 devops: add s390x containers (#15915)
* devops: add s390x dockerfile

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: add missing ninja

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: move s390x docker into cpu docker

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: rework s390x docker

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: copy more tools

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: add server build step

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: remove apt clean steps as distroless misses it

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: remove apt commands from distroless

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: fix shared libs in distroless

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: use correct libs path

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: fix shared libs

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: add collector stage

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: fix missing stage ref

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: fix permission issue

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: fix unknown model loading failures

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: attempt at fixing model loading failure

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: fix missing ggml shared object

failure to load model

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: remove move shared objects

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: move libggml-cpu and blas into bin

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: finalise hardened server stage

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: add cli target

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: fix typos

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: fix missing shared libraries in base

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: update debian target

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: formalise llama.cpp loc

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* Revert "devops: formalise llama.cpp loc"

This reverts commit 0a7664af84.

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: formalise llama.cpp loc

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
(cherry picked from commit 0a7664af84)
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: attempt at fixing missing dir

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: attempt at making it cache the build

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: fix copying process

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: make build dir an argument

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* Revert "devops: make build dir an argument"

This reverts commit 438698976b.

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: add build stage for gguf-py

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: move gguf-py installation into build stage

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: break system packages?

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: add rust compiler installer

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: fix rustc not found

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: remove cache mount to allow rustc to persist

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: move rustc installation to another layer

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: move gguf-py installation to full stage, fix copying

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: remove rustc installation in build

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: disable full target for now

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: attempting static build

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: merge s390x dockerfile into cpu for now

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: switch to gcc image for build step

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: remove build essentials

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: install openblas into base target

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: go back to s390x dockerfile

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: remove libggml and libblas

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: add full target

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: add break system packages

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: add libjpeg

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: add missing cmake dep

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: finalise docker images for s390x

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: add custom openblas patch

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: use libopenblas-dev instead of libopenblas-openmp-dev

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: add s390x docker build

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

---------

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-09-23 13:59:34 +08:00
Diego Devesa
dc381aa9a6 docker : enable rocWMMA in ROCm images, add gfx1151 (#15997) 2025-09-15 23:38:52 +02:00
Adam
0fa154e350 rocm.Dockerfile: added gfx1200,gfx1201 architectures to support AMD Radeon RX 9000 series (#15994)
* rocm.Dockerfile: added gfx1200,gfx1201 architectures to support  AMD Radeon RX 9000 series

https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.4.1/reference/system-requirements.html#rdna-os
states the Radeon RX 9000 series is supported support from Ubuntu 24.04.2, and the dockerfile is using 24.04 which is ROCm 6.4.

This fixed the `ROCm error: invalid device function` I was getting when trying to use the rocm container.
2025-09-14 20:43:54 +02:00
R0CKSTAR
b55f06e1aa vulkan.Dockerfile: install vulkan SDK using tarball (#15282)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-08-23 08:58:57 +02:00
Dobri Danchev
618575c582 Fix broken build: require updated pip to support --break-system-packages (#15357)
* Revert "devops : fix compile bug when the BASE_CUDA_DEV_CONTAINER is based on Ubuntu 24.04 (#15005)"

This reverts commit e4e915912c.

* devops: Allow pip to modify externally-managed python environment (system installation)

- Updated pip install commands to include the --break-system-packages
  flag, ensuring compatibility when working with system-managed Python
  environments (PEP 668).

- Note: The --break-system-packages option was introduced in 2023.
  Ensure pip is updated to a recent version before using this flag.

fixes [#15004](https://github.com/danchev/llama.cpp/issues/15004)
2025-08-18 12:50:48 +02:00
simevo
e4e915912c devops : fix compile bug when the BASE_CUDA_DEV_CONTAINER is based on Ubuntu 24.04 (#15005)
fixes #15004

Co-authored-by: Paolo Greppi <paolo.greppi@libpf.com>
2025-08-14 18:45:27 +03:00
Christian Kastner
646944cfa8 docker : Enable GGML_CPU_ALL_VARIANTS for ARM (#15267) 2025-08-14 16:22:58 +02:00
Ali Tariq
648ebcdb73 ci : Added CI with RISC-V RVV1.0 Hardware (#14439)
* Changed the CI file to hw

* Changed the CI file to hw

* Added to sudoers for apt

* Removed the clone command and used checkout

* Added libcurl

* Added gcc-14

* Checking gcc --version

* added gcc-14 symlink

* added CC and C++ variables

* Added the gguf weight

* Changed the weights path

* Added system specification

* Removed white spaces

* ci: Replace Jenkins riscv native build Cloud-V pipeline with GitHub Actions workflow

Removed the legacy .devops/cloud-v-pipeline Jenkins CI configuration and introduced .github/workflows/build-riscv-native.yml for native RISC-V builds using GitHub Actions.

* removed trailing whitespaces

---------

Co-authored-by: Akif Ejaz <akifejaz40@gmail.com>
2025-08-13 13:14:44 +03:00