diff --git a/docs/capabilities/image-generation.mdx b/docs/capabilities/image-generation.mdx
new file mode 100644
index 000000000..40201d2dd
--- /dev/null
+++ b/docs/capabilities/image-generation.mdx
@@ -0,0 +1,205 @@
+---
+title: Image Generation
+---
+
+
+Image generation is experimental and currently only available on macOS. This feature may change in future versions.
+
+
+Image generation models create images from text prompts. Ollama supports diffusion-based image generation models through both Ollama's API and OpenAI-compatible endpoints.
+
+## Usage
+
+
+
+    ```shell
+    ollama run x/z-image-turbo "a sunset over mountains"
+    ```
+    The generated image will be saved to the current directory.
+
+
+    ```shell
+    curl http://localhost:11434/api/generate -d '{
+      "model": "x/z-image-turbo",
+      "prompt": "a sunset over mountains",
+      "stream": false
+    }'
+    ```
+
+
+    ```python
+    import ollama
+    import base64
+
+    response = ollama.generate(
+        model='x/z-image-turbo',
+        prompt='a sunset over mountains',
+    )
+
+    # Save the generated image
+    with open('output.png', 'wb') as f:
+        f.write(base64.b64decode(response['image']))
+
+    print('Image saved to output.png')
+    ```
+
+
+    ```javascript
+    import ollama from 'ollama'
+    import { writeFileSync } from 'fs'
+
+    const response = await ollama.generate({
+      model: 'x/z-image-turbo',
+      prompt: 'a sunset over mountains',
+    })
+
+    // Save the generated image
+    const imageBuffer = Buffer.from(response.image, 'base64')
+    writeFileSync('output.png', imageBuffer)
+
+    console.log('Image saved to output.png')
+    ```
+
+
+
+### Response
+
+The response includes an `image` field containing the base64-encoded image data:
+
+```json
+{
+  "model": "x/z-image-turbo",
+  "created_at": "2024-01-15T10:30:15.000000Z",
+  "image": "iVBORw0KGgoAAAANSUhEUg...",
+  "done": true,
+  "done_reason": "stop",
+  "total_duration": 15000000000,
+  "load_duration": 2000000000
+}
+```
+
+## Image dimensions
+
+Customize the output image size using the `width` and `height` parameters:
+
+
+
+    ```shell
+    curl http://localhost:11434/api/generate -d '{
+      "model": "x/z-image-turbo",
+      "prompt": "a portrait of a robot artist",
+      "width": 768,
+      "height": 1024,
+      "stream": false
+    }'
+    ```
+
+
+    ```python
+    import ollama
+
+    response = ollama.generate(
+        model='x/z-image-turbo',
+        prompt='a portrait of a robot artist',
+        width=768,
+        height=1024,
+    )
+    ```
+
+
+    ```javascript
+    import ollama from 'ollama'
+
+    const response = await ollama.generate({
+      model: 'x/z-image-turbo',
+      prompt: 'a portrait of a robot artist',
+      width: 768,
+      height: 1024,
+    })
+    ```
+
+
+
+## Streaming progress
+
+When streaming is enabled (the default), progress updates are sent during image generation:
+
+```json
+{
+  "model": "x/z-image-turbo",
+  "created_at": "2024-01-15T10:30:00.000000Z",
+  "completed": 5,
+  "total": 20,
+  "done": false
+}
+```
+
+The `completed` and `total` fields indicate the current progress through the diffusion steps.
+
+## Parameters
+
+| Parameter | Description | Default |
+|-----------|-------------|---------|
+| `prompt` | Text description of the image to generate | Required |
+| `width` | Width of the generated image in pixels | Model default |
+| `height` | Height of the generated image in pixels | Model default |
+| `steps` | Number of diffusion steps | Model default |
+
+## OpenAI compatibility
+
+Image generation is also available through the OpenAI-compatible `/v1/images/generations` endpoint:
+
+
+
+    ```shell
+    curl http://localhost:11434/v1/images/generations \
+      -H "Content-Type: application/json" \
+      -d '{
+        "model": "x/z-image-turbo",
+        "prompt": "a sunset over mountains",
+        "size": "1024x1024",
+        "response_format": "b64_json"
+      }'
+    ```
+
+
+    ```python
+    from openai import OpenAI
+
+    client = OpenAI(
+        base_url='http://localhost:11434/v1/',
+        api_key='ollama',  # required but ignored
+    )
+
+    response = client.images.generate(
+        model='x/z-image-turbo',
+        prompt='a sunset over mountains',
+        size='1024x1024',
+        response_format='b64_json',
+    )
+
+    print(response.data[0].b64_json[:50] + '...')
+    ```
+
+
+    ```javascript
+    import OpenAI from 'openai'
+
+    const openai = new OpenAI({
+      baseURL: 'http://localhost:11434/v1/',
+      apiKey: 'ollama', // required but ignored
+    })
+
+    const response = await openai.images.generate({
+      model: 'x/z-image-turbo',
+      prompt: 'a sunset over mountains',
+      size: '1024x1024',
+      response_format: 'b64_json',
+    })
+
+    console.log(response.data[0].b64_json.slice(0, 50) + '...')
+    ```
+
+
+
+See [OpenAI compatibility](/api/openai-compatibility#v1imagesgenerations-experimental) for more details.
diff --git a/docs/docs.json b/docs/docs.json
index 921c9e34e..b6614dbd7 100644
--- a/docs/docs.json
+++ b/docs/docs.json
@@ -93,6 +93,7 @@
           "/capabilities/thinking",
           "/capabilities/structured-outputs",
           "/capabilities/vision",
+          "/capabilities/image-generation",
           "/capabilities/embeddings",
           "/capabilities/tool-calling",
           "/capabilities/web-search"
diff --git a/docs/openapi.yaml b/docs/openapi.yaml
index 4817bcb41..e5cc24a21 100644
--- a/docs/openapi.yaml
+++ b/docs/openapi.yaml
@@ -117,6 +117,15 @@ components:
         top_logprobs:
           type: integer
           description: Number of most likely tokens to return at each token position when logprobs are enabled
+        width:
+          type: integer
+          description: (Experimental) Width of the generated image in pixels. For image generation models only.
+        height:
+          type: integer
+          description: (Experimental) Height of the generated image in pixels. For image generation models only.
+        steps:
+          type: integer
+          description: (Experimental) Number of diffusion steps. For image generation models only.
     GenerateResponse:
       type: object
       properties:
@@ -161,6 +170,15 @@ components:
           items:
             $ref: "#/components/schemas/Logprob"
           description: Log probability information for the generated tokens when logprobs are enabled
+        image:
+          type: string
+          description: (Experimental) Base64-encoded generated image data. For image generation models only.
+        completed:
+          type: integer
+          description: (Experimental) Number of completed diffusion steps. For image generation streaming progress.
+        total:
+          type: integer
+          description: (Experimental) Total number of diffusion steps. For image generation streaming progress.
     GenerateStreamEvent:
       type: object
       properties:
@@ -200,6 +218,15 @@ components:
         eval_duration:
           type: integer
           description: Time spent generating tokens in nanoseconds
+        image:
+          type: string
+          description: (Experimental) Base64-encoded generated image data. For image generation models only.
+        completed:
+          type: integer
+          description: (Experimental) Number of completed diffusion steps. For image generation streaming progress.
+        total:
+          type: integer
+          description: (Experimental) Total number of diffusion steps. For image generation streaming progress.
     ChatMessage:
       type: object
       required: [role, content]
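
Review note: the diff documents the streaming progress fields (`completed`, `total`) and the `image` field, but none of the examples show a client consuming them together. The following is a minimal sketch of how that might look, not part of the change itself. It assumes the stream arrives as one JSON object per line and that some event carries the base64-encoded `image`; the diff does not say which event that is. `read_image_stream` is a hypothetical helper name, and the sample events are made up so the sketch runs without a server.

```python
import base64
import json
from typing import Iterable, List, Optional, Tuple


def read_image_stream(lines: Iterable[str]) -> Tuple[List[Tuple[int, int]], Optional[bytes]]:
    """Collect (completed, total) progress pairs and the decoded image bytes.

    Assumption: each non-empty line is one JSON object shaped like the
    GenerateStreamEvent schema in this diff, and the image is delivered
    base64-encoded in an `image` field on one of the events.
    """
    progress: List[Tuple[int, int]] = []
    image: Optional[bytes] = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank keep-alive or separator lines
        event = json.loads(line)
        if "completed" in event and "total" in event:
            progress.append((event["completed"], event["total"]))
        if event.get("image"):
            image = base64.b64decode(event["image"])
        if event.get("done"):
            break  # stop reading once the server marks the final event
    return progress, image


if __name__ == "__main__":
    # Fabricated sample events for illustration only; not real server output.
    fake_image = base64.b64encode(b"not a real image").decode("ascii")
    sample = [
        json.dumps({"model": "x/z-image-turbo", "completed": 5, "total": 20, "done": False}),
        json.dumps({"model": "x/z-image-turbo", "completed": 20, "total": 20, "done": False}),
        json.dumps({"model": "x/z-image-turbo", "image": fake_image, "done": True}),
    ]
    steps, data = read_image_stream(sample)
    for completed, total in steps:
        print(f"step {completed}/{total}")
    if data is not None:
        print(f"decoded {len(data)} bytes")
```

In real use, `lines` could be the line iterator of an HTTP response from `/api/generate`. The function takes any iterable of strings, so it does not depend on a particular HTTP client.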