mirror of
https://github.com/ollama/ollama.git
synced 2026-01-29 07:12:03 +03:00
On the llama engine, when we compute the memory layout, we reserve a buffer to leave some headroom for incorrect estimates. This buffer is subtracted from the GPU's free memory, and on GPUs with limited memory the subtraction may underflow.

Fixes #13494
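
For illustration only, the failure mode described above can be sketched as follows. This is a minimal sketch, not the code changed by this commit: it assumes free memory and the reserve are tracked as `uint64` byte counts, and the names `saturatingSub`, `free`, and `reserve` are hypothetical. Clamping at zero is one common guard against this kind of underflow and is not necessarily the approach taken in the actual fix.

```go
package main

import "fmt"

// saturatingSub is a hypothetical helper: it returns a-b,
// clamped to 0 when b is larger than a, so the result never wraps.
func saturatingSub(a, b uint64) uint64 {
	if b > a {
		return 0
	}
	return a - b
}

func main() {
	// Assumed values: a GPU reporting 256 MiB free and a 512 MiB reserve.
	var free uint64 = 256 << 20
	var reserve uint64 = 512 << 20

	// Unsigned subtraction wraps around instead of going negative,
	// so the "available" figure becomes enormous rather than zero.
	naive := free - reserve
	fmt.Println(naive > free) // true: the result wrapped around

	// The clamped version reports no usable memory instead.
	fmt.Println(saturatingSub(free, reserve)) // 0
}
```

With a wrapped value, a layout calculation would believe the GPU has far more memory than it does; with the clamped value, it would see none available on that device.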