ollama

mirror of https://github.com/ollama/ollama.git synced 2026-01-29 07:12:03 +03:00


Jeffrey Morgan 912d984346 llama: fix fattn-tile shared memory overflow on sm_50/52 (#13872)

Use nthreads=128 for ncols=4 configurations in the flash attention tile
kernel to keep shared memory usage below the 48KB limit on Maxwell
architectures (sm_50/52).

With nthreads=256 and ncols=4, np=2, which caused shared memory usage to
exceed 48KB. With nthreads=128 and ncols=4, np=1, which keeps shared memory
usage under the limit.
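
The np values quoted above can be reproduced with a small arithmetic sketch. It assumes a warp size of 32 and assumes np is the number of warps per column whenever warps outnumber columns, and 1 otherwise. That relation and the function name `parallel_warps_per_column` are illustrative assumptions chosen because they match the numbers in the commit message; they are not taken from the kernel source, and the sketch does not model the actual shared memory layout.

```python
WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def parallel_warps_per_column(nthreads: int, ncols: int) -> int:
    """Hypothetical helper: warps cooperating on one column (np).

    Assumed relation: np = nwarps / ncols when there are more warps
    than columns, otherwise 1.
    """
    nwarps = nthreads // WARP_SIZE
    return nwarps // ncols if nwarps > ncols else 1


if __name__ == "__main__":
    for nthreads in (256, 128):
        np_value = parallel_warps_per_column(nthreads, ncols=4)
        print(f"nthreads={nthreads} ncols=4 -> np={np_value}")
```

Under these assumptions, 256 threads give 8 warps and np=2, while 128 threads give 4 warps and np=1, matching the values stated in the commit message.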

2026-01-23 19:22:32 -08:00

backend

llama: fix fattn-tile shared memory overflow on sm_50/52 (#13872)

2026-01-23 19:22:32 -08:00

fix: qwen2.5 vl rope (#13486)

2025-12-15 17:30:33 -08:00

backend.go

model: add lfm2 architecture and LFM2.5-1.2B-Thinking support (#13792)

2026-01-20 12:20:53 -08:00

device.go

flash attn: add auto mode for llama engine (#13052)

2025-12-12 13:27:19 -08:00

path.go

cpu: always ensure LibOllamaPath included (#12890)

2025-10-31 14:37:29 -07:00