Jeffrey Morgan a1ca428c90 glm4moelite: fix attention scale calculation (#13893)
Use the original key dimension (qkNopeHeadDim + qkRopeHeadDim = 256) for
the attention scale instead of the MLA absorbed dimension (kvLoraRank +
qkRopeHeadDim = 576).

MLA (multi-head latent attention) absorption is a mathematically
equivalent reorganization of the attention computation, so it should
not change the effective attention scale. The scale should match
training, which uses 1/sqrt(256).
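
A minimal sketch of the before/after scale computation. The struct and
field names are illustrative rather than the exact ones in the Ollama
source, and the per-dimension split (qkRopeHeadDim = 64, hence
qkNopeHeadDim = 192 and kvLoraRank = 512) is an assumption consistent
with the sums 256 and 576 stated above:

```go
package main

import (
	"fmt"
	"math"
)

// Illustrative options struct; the real model code may differ.
type options struct {
	qkNopeHeadDim int // non-RoPE portion of each query/key head (assumed 192)
	qkRopeHeadDim int // RoPE portion of each query/key head (assumed 64)
	kvLoraRank    int // compressed KV rank used by MLA absorption (assumed 512)
}

func main() {
	opts := options{qkNopeHeadDim: 192, qkRopeHeadDim: 64, kvLoraRank: 512}

	// Before: scale derived from the MLA absorbed key dimension
	// (kvLoraRank + qkRopeHeadDim = 576).
	absorbedDim := opts.kvLoraRank + opts.qkRopeHeadDim
	oldScale := 1.0 / math.Sqrt(float64(absorbedDim))

	// After: scale derived from the original key dimension
	// (qkNopeHeadDim + qkRopeHeadDim = 256), matching training.
	keyDim := opts.qkNopeHeadDim + opts.qkRopeHeadDim
	newScale := 1.0 / math.Sqrt(float64(keyDim))

	fmt.Printf("old: 1/sqrt(%d) = %.6f\n", absorbedDim, oldScale) // 0.041667
	fmt.Printf("new: 1/sqrt(%d) = %.6f\n", keyDim, newScale)      // 0.062500
}
```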

This improves tool calling and reduces model looping.
2026-01-24 17:48:09 -08:00