ollama

mirror of https://github.com/ollama/ollama.git synced 2026-01-29 07:12:03 +03:00

Files

Jeffrey Morgan 9667c2282f x/imagegen: add naive TeaCache and FP8 quantization support (#13683)

TeaCache:
- Timestep embedding similarity caching for diffusion models
- Polynomial rescaling with configurable thresholds
- Reduces transformer forward passes by ~30-50%
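The caching decision described above can be sketched roughly as follows. This is a minimal illustration of the general TeaCache idea, not the code from this commit: the type `teaCache`, its fields, and the method `ShouldCompute` are hypothetical names, and the coefficients and threshold in `main` are made-up values. The sketch assumes the usual scheme: take the relative L1 change between consecutive timestep embeddings, rescale it with a polynomial, accumulate it, and skip the transformer forward pass while the accumulated value stays below a threshold.

```go
package main

import (
	"fmt"
	"math"
)

// teaCache is a hypothetical type used only for illustration.
type teaCache struct {
	coeffs    []float64 // polynomial coefficients, highest degree first (assumed layout)
	threshold float64   // accumulated-change threshold that forces a recompute
	prev      []float64 // timestep embedding seen on the previous step
	accum     float64   // rescaled change accumulated since the last full pass
}

// relL1 returns mean(|cur-prev|) / mean(|prev|) for equal-length slices.
func relL1(cur, prev []float64) float64 {
	var diff, base float64
	for i := range cur {
		diff += math.Abs(cur[i] - prev[i])
		base += math.Abs(prev[i])
	}
	if base == 0 {
		return math.Inf(1) // no meaningful ratio; caller treats this as "recompute"
	}
	return diff / base
}

// polyval evaluates the polynomial at x using Horner's method.
func polyval(coeffs []float64, x float64) float64 {
	var y float64
	for _, c := range coeffs {
		y = y*x + c
	}
	return y
}

// ShouldCompute reports whether a full forward pass is needed for this
// timestep embedding, or whether a cached result could be reused instead.
func (t *teaCache) ShouldCompute(emb []float64) bool {
	compute := true
	if len(t.prev) > 0 && len(t.prev) == len(emb) {
		d := relL1(emb, t.prev)
		if math.IsInf(d, 1) {
			t.accum = 0 // cannot estimate the change: recompute and reset
		} else {
			t.accum += polyval(t.coeffs, d)
			if t.accum < t.threshold {
				compute = false // change is small: reuse the cached output
			} else {
				t.accum = 0 // change is large: recompute and reset
			}
		}
	}
	t.prev = append(t.prev[:0], emb...)
	return compute
}

func main() {
	// Identity rescaling (y = x) and a threshold of 0.5, both illustrative.
	tc := &teaCache{coeffs: []float64{1, 0}, threshold: 0.5}
	steps := [][]float64{{1, 1}, {1.1, 1.1}, {1.2, 1.2}, {2, 2}}
	for i, emb := range steps {
		fmt.Printf("step %d: compute=%v\n", i, tc.ShouldCompute(emb))
	}
}
```

A lower threshold trades speed for fidelity: fewer steps are skipped, so the output stays closer to the uncached result.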

FP8 quantization:
- Support for FP8 quantized models (8-bit weights with scales)
- QuantizedMatmul on Metal, Dequantize on CUDA
- Client-side quantization via `ollama create --quantize fp8`
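To make "8-bit weights with scales" concrete, here is a small value-level sketch of scaled FP8 quantization. It rests on assumptions the commit message does not state: the E4M3 format (maximum finite value 448, 3 mantissa bits, minimum normal exponent -6) and one scale per group of weights. The names `roundE4M3`, `quantizeGroup`, and `dequantize` are hypothetical, and the code simulates rounding to representable values rather than packing real 8-bit encodings.

```go
package main

import (
	"fmt"
	"math"
)

// e4m3Max is the largest finite magnitude in the assumed FP8 E4M3 format.
const e4m3Max = 448.0

// roundE4M3 rounds v to the nearest value representable in E4M3
// (value-level simulation; no bit packing). Magnitudes above 448 saturate.
func roundE4M3(v float64) float64 {
	if v == 0 || math.IsNaN(v) {
		return v
	}
	a := math.Min(math.Abs(v), e4m3Max)
	_, exp := math.Frexp(a) // a = frac * 2^exp with frac in [0.5, 1)
	e := exp - 1            // floor(log2(a))
	if e < -6 {
		e = -6 // subnormal range shares the minimum normal exponent
	}
	step := math.Ldexp(1, e-3) // spacing between values with 3 mantissa bits
	q := math.RoundToEven(a/step) * step
	return math.Copysign(math.Min(q, e4m3Max), v)
}

// quantizeGroup scales a group so its largest magnitude maps to e4m3Max,
// then rounds each element to an E4M3 value. It returns the rounded
// values and the scale needed to undo the mapping.
func quantizeGroup(w []float64) ([]float64, float64) {
	var maxAbs float64
	for _, x := range w {
		maxAbs = math.Max(maxAbs, math.Abs(x))
	}
	scale := 1.0
	if maxAbs > 0 {
		scale = maxAbs / e4m3Max
	}
	q := make([]float64, len(w))
	for i, x := range w {
		q[i] = roundE4M3(x / scale)
	}
	return q, scale
}

// dequantize multiplies the stored values by the group scale.
func dequantize(q []float64, scale float64) []float64 {
	out := make([]float64, len(q))
	for i, x := range q {
		out[i] = x * scale
	}
	return out
}

func main() {
	w := []float64{0.02, -0.5, 0.13, 0.007}
	q, scale := quantizeGroup(w)
	fmt.Println("scale:", scale)
	fmt.Println("restored:", dequantize(q, scale))
}
```

Under these assumptions, a backend either multiplies by the scale before a regular matmul (a dequantize path) or folds the scale into a fused quantized matmul kernel.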

Other bug fixes:
- Fix `/api/show` API for image generation models
- Server properly returns model info (architecture, parameters, quantization)
- Memory allocation optimizations
- CLI improvements for image generation

2026-01-12 13:45:22 -08:00

examples

ci: restore previous linter rules (#13322)

2025-12-03 18:55:02 -08:00

client_test.go

api/client: handle non-json streaming errors (#13007)

2025-12-01 15:10:16 -08:00

client.go

x/imagegen: add naive TeaCache and FP8 quantization support (#13683)

2026-01-12 13:45:22 -08:00

types_test.go

preserve tool definition and call JSON ordering (#13525)

2026-01-05 18:03:36 -08:00

types_typescript_test.go

tools: support anyOf types

2025-08-05 16:46:24 -07:00

types.go

preserve tool definition and call JSON ordering (#13525)

2026-01-05 18:03:36 -08:00