diff --git a/README.md b/README.md
index 27bfd040d..bda6b4c34 100644
--- a/README.md
+++ b/README.md
@@ -48,7 +48,7 @@ ollama run gemma3
 
 ## Model library
 
-Ollama supports a list of models available on [ollama.com/library](https://ollama.com/library 'ollama model library')
+Ollama supports a list of models available on [ollama.com/library](https://ollama.com/library "ollama model library")
 
 Here are some example models that can be downloaded:
 
@@ -79,7 +79,7 @@ Here are some example models that can be downloaded:
 | Code Llama | 7B | 3.8GB | `ollama run codellama` |
 | Llama 2 Uncensored | 7B | 3.8GB | `ollama run llama2-uncensored` |
 | LLaVA | 7B | 4.5GB | `ollama run llava` |
-| Granite-3.3 | 8B | 4.9GB | `ollama run granite3.3` |
+| Granite-3.3 | 8B | 4.9GB | `ollama run granite3.3` |
 
 > [!NOTE]
 > You should have at least 8 GB of RAM available to run the 7B models, 16 GB to run the 13B models, and 32 GB to run the 33B models.
@@ -260,6 +260,38 @@ Finally, in a separate shell, run a model:
 ./ollama run llama3.2
 ```
 
+## Building with MLX (experimental)
+
+First build the MLX libraries:
+
+```shell
+cmake --preset MLX
+cmake --build --preset MLX --parallel
+cmake --install build --component MLX
+```
+
+Next, build the `ollama-mlx` binary, which is a separate build of the Ollama runtime with MLX support enabled (needs to be in the same directory as `ollama`):
+
+```shell
+go build -tags mlx -o ollama-mlx .
+```
+
+Finally, start the server:
+
+```
+./ollama serve
+```
+
+### Building MLX with CUDA
+
+When building with CUDA, use the preset "MLX CUDA 13" or "MLX CUDA 12" to enable CUDA with default architectures:
+
+```shell
+cmake --preset 'MLX CUDA 13'
+cmake --build --preset 'MLX CUDA 13' --parallel
+cmake --install build --component MLX
+```
+
 ## REST API
 
 Ollama has a REST API for running and managing models.
@@ -422,7 +454,7 @@ See the [API documentation](./docs/api.md) for all endpoints.
 - [AppFlowy](https://github.com/AppFlowy-IO/AppFlowy) (AI collaborative workspace with Ollama, cross-platform and self-hostable)
 - [Lumina](https://github.com/cushydigit/lumina.git) (A lightweight, minimal React.js frontend for interacting with Ollama servers)
 - [Tiny Notepad](https://pypi.org/project/tiny-notepad) (A lightweight, notepad-like interface to chat with ollama available on PyPI)
-- [macLlama (macOS native)](https://github.com/hellotunamayo/macLlama) (A native macOS GUI application for interacting with Ollama models, featuring a chat interface.)
+- [macLlama (macOS native)](https://github.com/hellotunamayo/macLlama) (A native macOS GUI application for interacting with Ollama models, featuring a chat interface.)
 - [GPTranslate](https://github.com/philberndt/GPTranslate) (A fast and lightweight, AI powered desktop translation application written with Rust and Tauri. Features real-time translation with OpenAI/Azure/Ollama.)
 - [ollama launcher](https://github.com/NGC13009/ollama-launcher) (A launcher for Ollama, aiming to provide users with convenient functions such as ollama server launching, management, or configuration.)
 - [ai-hub](https://github.com/Aj-Seven/ai-hub) (AI Hub supports multiple models via API keys and Chat support via Ollama API.)
@@ -494,7 +526,7 @@ See the [API documentation](./docs/api.md) for all endpoints.
 ### Database
 
 - [pgai](https://github.com/timescale/pgai) - PostgreSQL as a vector database (Create and search embeddings from Ollama models using pgvector)
-  - [Get started guide](https://github.com/timescale/pgai/blob/main/docs/vectorizer-quick-start.md)
+  - [Get started guide](https://github.com/timescale/pgai/blob/main/docs/vectorizer-quick-start.md)
 - [MindsDB](https://github.com/mindsdb/mindsdb/blob/staging/mindsdb/integrations/handlers/ollama_handler/README.md) (Connects Ollama models with nearly 200 data platforms and apps)
 - [chromem-go](https://github.com/philippgille/chromem-go/blob/v0.5.0/embed_ollama.go) with [example](https://github.com/philippgille/chromem-go/tree/v0.5.0/examples/rag-wikipedia-ollama)
 - [Kangaroo](https://github.com/dbkangaroo/kangaroo) (AI-powered SQL client and admin tool for popular databases)
@@ -637,6 +669,7 @@ See the [API documentation](./docs/api.md) for all endpoints.
 - [llama.cpp](https://github.com/ggml-org/llama.cpp) project founded by Georgi Gerganov.
 
 ### Observability
+
 - [Opik](https://www.comet.com/docs/opik/cookbook/ollama) is an open-source platform to debug, evaluate, and monitor your LLM applications, RAG systems, and agentic workflows with comprehensive tracing, automated evaluations, and production-ready dashboards. Opik supports native integration to Ollama.
 - [Lunary](https://lunary.ai/docs/integrations/ollama) is the leading open-source LLM observability platform. It provides a variety of enterprise-grade features such as real-time analytics, prompt templates management, PII masking, and comprehensive agent tracing.
 - [OpenLIT](https://github.com/openlit/openlit) is an OpenTelemetry-native tool for monitoring Ollama Applications & GPUs using traces and metrics.
@@ -645,4 +678,5 @@ See the [API documentation](./docs/api.md) for all endpoints.
 - [MLflow Tracing](https://mlflow.org/docs/latest/llms/tracing/index.html#automatic-tracing) is an open source LLM observability tool with a convenient API to log and visualize traces, making it easy to debug and evaluate GenAI applications.
 
 ### Security
+
 - [Ollama Fortress](https://github.com/ParisNeo/ollama_proxy_server)
diff --git a/x/README.md b/x/README.md
deleted file mode 100644
index 5af087380..000000000
--- a/x/README.md
+++ /dev/null
@@ -1,50 +0,0 @@
-# Experimental Features
-
-## MLX Backend
-
-We're working on a new experimental backend based on the [MLX project](https://github.com/ml-explore/mlx)
-
-Support is currently limited to MacOS and Linux with CUDA GPUs. We're looking to add support for Windows CUDA soon, and other GPU vendors.
-
-### Building ollama-mlx
-
-The `ollama-mlx` binary is a separate build of Ollama with MLX support enabled. This enables experimental features like image generation.
-
-#### macOS (Apple Silicon and Intel)
-
-```bash
-# Build MLX backend libraries
-cmake --preset MLX
-cmake --build --preset MLX --parallel
-cmake --install build --component MLX
-
-# Build ollama-mlx binary
-go build -tags mlx -o ollama-mlx .
-```
-
-#### Linux (CUDA)
-
-On Linux, use the preset "MLX CUDA 13" or "MLX CUDA 12" to enable CUDA with the default Ollama NVIDIA GPU architectures enabled:
-
-```bash
-# Build MLX backend libraries with CUDA support
-cmake --preset 'MLX CUDA 13'
-cmake --build --preset 'MLX CUDA 13' --parallel
-cmake --install build --component MLX
-
-# Build ollama-mlx binary
-CGO_CFLAGS="-O3 -I$(pwd)/build/_deps/mlx-c-src" \
-CGO_LDFLAGS="-L$(pwd)/build/lib/ollama -lmlxc -lmlx" \
-go build -tags mlx -o ollama-mlx .
-```
-
-#### Using build scripts
-
-The build scripts automatically create the `ollama-mlx` binary:
-
-- **macOS**: `./scripts/build_darwin.sh` produces `dist/darwin/ollama-mlx`
-- **Linux**: `./scripts/build_linux.sh` produces `ollama-mlx` in the output archives
-
-## Image Generation
-
-Image generation is built into the `ollama-mlx` binary. Run `ollama-mlx serve` to start the server with image generation support enabled.
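For reference, the MLX build steps added to the README above can be run back to back. This is only a minimal sketch of that workflow on macOS: it assumes the main `ollama` binary has already been built in the same directory as described earlier in the README, and `llama3.2` is simply the example model used there.

```shell
# Build the MLX backend libraries
# (on Linux with CUDA, substitute the 'MLX CUDA 13' or 'MLX CUDA 12' preset)
cmake --preset MLX
cmake --build --preset MLX --parallel
cmake --install build --component MLX

# Build the ollama-mlx binary next to the main ollama binary
go build -tags mlx -o ollama-mlx .

# Start the server, then run a model from a separate shell
./ollama serve
./ollama run llama3.2
```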