mirror of https://github.com/ggml-org/llama.cpp.git (synced 2026-03-17 16:44:07 +00:00)
Updated Feature matrix (markdown)
@@ -1,18 +1,19 @@
-|                      | **CPU (AVX2)** | **CPU (ARM NEON)** | **Metal** | **cuBLAS** | **rocBLAS** | **SYCL** | **CLBlast** | **Vulkan** | **Kompute** |
-|:--------------------:|:--------------:|:------------------:|:---------:|:----------:|:----------------:|:--------:|:-----------:|:----------:|:-----------:|
-| **K-quants**         | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ 🐢⁵ | ✅ 🐢⁵ | 🚫 |
-| **I-quants**         | ✅ 🐢⁴ | ✅ 🐢⁴ | ✅ 🐢⁴ | ✅ | ✅ | Partial¹ | 🚫 | 🚫 | 🚫 |
-| **Multi-GPU**        | N/A | N/A | N/A | ✅ | ❓ | 🚫 | ❓ | ✅ | ❓ |
-| **K cache quants**   | ✅ | ❓ | ✅ | ✅ 🐢³ | Partial⁶ 🐢³ | ❓ | ✅ | 🚫 | 🚫 |
-| **MoE architecture** | ✅ | ❓ | ✅ | ✅ | ✅ | ❓ | Partial² | 🚫 | 🚫 |
+|                         | **CPU (AVX2)** | **CPU (ARM NEON)** | **Metal** | **CUDA** | **ROCm** | **SYCL** | **CLBlast** | **Vulkan** | **Kompute** |
+|:-----------------------:|:--------------:|:------------------:|:---------:|:----------:|:----------------:|:--------:|:-----------:|:----------:|:-----------:|
+| **K-quants**            | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ 🐢⁵ | ✅ 🐢⁵ | 🚫 |
+| **I-quants**            | ✅ 🐢⁴ | ✅ 🐢⁴ | ✅ 🐢⁴ | ✅ | ✅ | Partial¹ | 🚫 | 🚫 | 🚫 |
+| **Parallel Multi-GPU⁶** | N/A | N/A | N/A | ✅ | ✅ | 🚫 | ❓ | ❓ | ❓ |
+| **K cache quants**      | ✅ | ❓ | ✅ | ✅ 🐢³ | Partial⁶ 🐢³ | ❓ | ✅ | 🚫 | 🚫 |
+| **MoE architecture**    | ✅ | ❓ | ✅ | ✅ | ✅ | ❓ | Partial² | 🚫 | 🚫 |
 
 * ✅: feature works
 * 🚫: feature does not work
-* ❓: unknown, please contribute if you can test it youself
+* ❓: unknown, please contribute if you can test it yourself
 * 🐢: feature is slow
 * ¹: IQ3_S and IQ1_S, see #5886
 * ²: Only with `-ngl 0`
 * ³: Inference is 50% slower
 * ⁴: Slower than K-quants of comparable size
 * ⁵: Slower than cuBLAS/rocBLAS on similar cards
+* ⁶: By default, all backends can utilize multiple devices by running them sequentially. The CUDA code (which is also used for ROCm via HIP) also has code for running GPUs in parallel via `--split-mode row`. However, this is optimized relatively poorly and is only faster if the interconnect speed is fast vs. the speed of a single GPU.
 * ⁶: Only q8_0 and iq4_nl
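The `--split-mode row` parallelism described in the new footnote ⁶ is chosen at launch time. A minimal sketch of an invocation on a multi-GPU CUDA/ROCm box might look like the following; the model path and layer count are illustrative, not taken from the page:

```shell
# Offload all layers to GPU and split individual tensors row-wise
# across the available devices (CUDA/ROCm only). The default
# --split-mode layer instead assigns whole layers to each GPU and
# runs the devices sequentially. Model path is hypothetical.
./llama-cli -m ./models/model.gguf -ngl 99 --split-mode row -p "Hello"
```

Whether row splitting wins depends on the interconnect: with slow PCIe links the layer-wise default is usually faster, as the footnote notes.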
|
||||