Mirror of https://github.com/ollama/ollama.git, synced 2026-01-29 15:22:02 +03:00
The nvidia_fp32 config for (576, 512) head sizes had nbatch_fa=32, which caused zero-sized arrays when computing array dimensions:

    nbatch_fa / (np * warp_size) = 32 / (2 * 32) = 0

This resulted in CUDA compilation failures on CUDA 12 (Windows and Linux arm64):

- "static assertion failed with nbatch_fa % (np*warp_size) != 0"
- "the size of an array must be greater than zero"

Fix by changing nbatch_fa from 32 to 64 for all (576, 512) configs in the nvidia_fp32 function, matching the nvidia_fp16 and AMD configs.